Examining the Part-of-speech Features in Assessing the Readability of Vietnamese Texts

Acta Linguistica Asiatica

Field Value
 
Title Examining the Part-of-speech Features in Assessing the Readability of Vietnamese Texts
Študija značilnosti besednih vrst in njihovega vpliva na berljivost besedil v vietnamščini
 
Creator Luong, An-Vinh
Nguyen, Diep
Dinh, Dien
 
Subject text readability
text difficulty
Vietnamese text readability
text classification
school textbooks
berljivost besedila
raven enostavnosti
berljivost vietnamskih tekstov
klasifikacija tekstov
šolski učbeniki
 
Description Text readability plays an important role in selecting materials appropriate to a reader's level. The readability of Vietnamese texts has received considerable attention in recent years; however, studies have mainly been limited to simple statistics such as sentence length and word length. In this article, we investigate the role of word-level grammatical characteristics in assessing the difficulty of texts in Vietnamese textbooks. We used machine learning models (for instance, Decision Tree, K-nearest neighbor, and Support Vector Machines) to evaluate the accuracy of classifying texts by readability, using word-level grammatical features along with other statistical characteristics. Empirical results show that including POS-level features increases classification accuracy by 2-4%.
Berljivost besedila ima zelo pomembno vlogo pri izbiri ustreznih gradiv za raven bralca. Berljivost besedil v vietnamskem jeziku pridobiva pozornost šele v zadnjih letih in dosedanje študije so omejene na preproste ocene na osnovi statističnih podatkov za dolžino stavka, dolžino besed in podobnih značilnosti. V tem članku raziskujemo vlogo slovničnih značilnosti na besedni ravni pri ocenjevanju težavnosti besedil v vietnamskih učbenikih. Za oceno natančnosti razvrščanja besedila glede na berljivost smo uporabili modele strojnega učenja (na primer drevo odločitve, K-najbližji sosed, podporni vektorski stroji itd.) Empirični rezultati kažejo, da upoštevanje različnih značilnosti na nivoju besednih vrst poveča natančnost klasifikacije za 2-4%.
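The approach summarized in the description (part-of-speech proportions combined with surface statistics, fed to a classifier) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the tag set `POS_TAGS`, the feature layout, the helper names `extract_features` and `knn_predict`, and the use of a plain unscaled Euclidean k-nearest-neighbor vote. The record does not specify the authors' actual features, tagger, or model settings, and the input is assumed to be already word-segmented and POS-tagged.

```python
import math
from collections import Counter

# Hypothetical tag inventory; the tag set used in the article is not given here.
POS_TAGS = ("N", "V", "A", "P")


def extract_features(sentences):
    """Turn a tagged text into a feature vector.

    `sentences` is a list of sentences, each a list of (word, pos_tag) pairs.
    Returns [avg sentence length, avg word length, ratio of each tag in POS_TAGS].
    """
    tokens = [pair for sentence in sentences for pair in sentence]
    n_tokens = len(tokens)
    if n_tokens == 0:
        raise ValueError("text has no tokens")

    # Surface statistics of the kind earlier readability studies relied on.
    avg_sentence_len = n_tokens / len(sentences)
    avg_word_len = sum(len(word) for word, _ in tokens) / n_tokens

    # Word-level grammatical features: share of tokens carrying each POS tag.
    tag_counts = Counter(tag for _, tag in tokens)
    pos_ratios = [tag_counts[tag] / n_tokens for tag in POS_TAGS]

    return [avg_sentence_len, avg_word_len] + pos_ratios


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`.

    `train` is a list of (feature_vector, label) pairs. Features are compared
    with unscaled Euclidean distance, which is a simplification.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In practice the features would normally be standardized before computing distances, and the effect of the POS features would be measured by comparing classification accuracy with and without the `pos_ratios` part of the vector.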
 
Publisher Znanstvena založba Filozofske fakultete / Ljubljana University Press, Faculty of Arts
 
Date 2020-07-30
 
Type info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
 
Format application/pdf
application/epub+zip
 
Identifier https://revije.ff.uni-lj.si/ala/article/view/9161
10.4312/ala.10.2.127-142
 
Source Acta Linguistica Asiatica; Vol 10 No 2 (2020); 127-142
Acta Linguistica Asiatica; L. 10 Št. 2 (2020); 127-142
2232-3317
10.4312/ala.10.2
 
Language eng
 
Relation https://revije.ff.uni-lj.si/ala/article/view/9161/9042
https://revije.ff.uni-lj.si/ala/article/view/9161/9057
 
Rights Copyright (c) 2020 An-Vinh LUONG, Diep NGUYEN, Dien DINH
http://creativecommons.org/licenses/by-sa/4.0
 
