Paper presentation at Diálogos XVIII.
Arróniz, S. & Restrepo, F. Text Classification for L2 Spanish Compositions. 18th Diálogos Graduate Student Conference. Bloomington, IN. February 2021.
Abstract
In this work we report on building a text classification tool that automatically identifies language proficiency in Spanish L2 compositions. We trained and evaluated four machine learning models built with widely available text classification tools: Doc2Vec, fastText, scikit-learn, and Flair. The data come from CEDEL2, a corpus of L2 Spanish compositions that comprises over 1,900 essays by English L1 learners of Spanish. Two models, scikit-learn and Flair, proved more reliable and performed better in testing, in terms of both F1 score and human ratings. The models will be incorporated into SEÑAL, a program for the computational analysis of Spanish L2 compositions.
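For readers unfamiliar with the F1 metric mentioned in the abstract, the following is a minimal sketch of how an F1 score can be computed for a multi-class proficiency classifier. The abstract does not say which averaging scheme was used, so macro-averaging is an assumption here, and the function name `macro_f1`, the proficiency labels, and the example predictions are all hypothetical and not taken from the study.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Return the unweighted mean of the per-class F1 scores."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equal in length")
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # class p was predicted but is wrong
            fn[g] += 1   # class g was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical proficiency labels for six essays (illustrative only).
gold = ["beginner", "beginner", "intermediate",
        "intermediate", "advanced", "advanced"]
predicted = ["beginner", "intermediate", "intermediate",
             "intermediate", "advanced", "beginner"]

score = macro_f1(gold, predicted)
print(f"macro F1 = {score:.3f}")
```

Macro-averaging weights every proficiency level equally, which can matter when a corpus has few essays at some levels. A weighted or micro average would be an equally plausible choice.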