For the 7th installment of our Innovations in Language Assessment webinar series, we welcome Dr. Alistair Van Moere, President of MetaMetrics Inc. and Research Professor at The University of North Carolina at Chapel Hill, and Dr. Jing Wei, VP of AI and Product Innovation at MetaMetrics, Inc.!

There are two main modeling approaches in Automated Essay Scoring (AES): feature-based and deep learning. Feature-based approaches analyze characteristics of the writing, which are combined and weighted in statistical models to predict human ratings. In contrast, deep learning approaches find patterns in the writing data, but the specific characteristics or variables in those patterns cannot be explicitly uncovered (they are buried too "deep" in the models' complex layers). The AES literature treats these two approaches as distinct, dichotomous paradigms (Kumar & Boulanger, 2020), but there has been little research comparing them on the same dataset.

This paper analyzes 2,090 written responses from English as a Foreign Language (EFL) learners across three types of writing tasks. Student essays were double-rated on "language use" and "task completion" traits using a 0-4 scale, with the mean of the two ratings serving as the criterion score for model training. The dataset was split into training (80%) and test (20%) sets. Two models were applied: (i) a ridge regression (linear regression with L2 regularization) and (ii) a deep-learning model using Bidirectional Encoder Representations from Transformers (BERT) text embeddings. For the ridge regression, 189 features were initially entered into the model; through an iterative process, the final model retained 14 features. The results show that ridge regression matched deep learning in predictive accuracy on both language use (r = 0.84 for each) and task completion (r = 0.83 vs. 0.84). Further statistical comparisons were made using root mean square error, percent agreement, and quadratic weighted kappa.
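The feature-based pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the synthetic data, feature count, and hyperparameters are stand-ins, and it uses scikit-learn's `Ridge` plus the evaluation metrics named in the abstract (Pearson r, RMSE, and quadratic weighted kappa via `cohen_kappa_score`).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, cohen_kappa_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the essay dataset: 2,090 responses, each with
# 14 engineered features (e.g. length, lexical diversity) and a 0-4
# criterion score (the mean of two human ratings in the study).
n_essays, n_features = 2090, 14
X = rng.normal(size=(n_essays, n_features))
true_weights = rng.normal(size=n_features)
y = np.clip(2.0 + 0.3 * (X @ true_weights) + rng.normal(scale=0.4, size=n_essays), 0, 4)

# 80/20 train/test split, as in the study.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Ridge regression: linear model with an L2 penalty (alpha).
model = Ridge(alpha=1.0).fit(X_train, y_train)
pred = model.predict(X_test)

# Evaluation metrics reported in the study.
r = np.corrcoef(y_test, pred)[0, 1]              # Pearson correlation
rmse = mean_squared_error(y_test, pred) ** 0.5   # root mean square error
qwk = cohen_kappa_score(                         # quadratic weighted kappa
    np.rint(y_test).astype(int),                 # (computed on rounded 0-4 scores)
    np.clip(np.rint(pred), 0, 4).astype(int),
    weights="quadratic")
print(f"r = {r:.2f}  RMSE = {rmse:.2f}  QWK = {qwk:.2f}")
```

The BERT-based alternative would replace the engineered feature matrix `X` with sentence embeddings of the essay text; the split and evaluation steps stay the same, which is what makes the two approaches directly comparable on one dataset.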
The discussion will address descriptive accuracy and relevance to the language testing audience. Feature-engineering and deep learning models will be compared on performance, interpretability, and ease of implementation. Implications will be drawn for how a combination of feature-engineering and deep learning approaches can provide insights for the educational audience.