POS (Parts of Speech) Tagging System for Sindhi Language

International Journal of Computer Science and Emerging Technologies

 
 
Field Value
 
Title POS (Parts of Speech) Tagging System for Sindhi Language
 
Creator Ghazala Gul Junejo, Mir Sajjad Hussain Talpur, Taha Nuzhat, and Shakir Hussain Talpur
 
Subject Natural Language Processing (NLP), Machine learning, core (NLP) library, HMM, Stanford POS taggers.
 
Description Part of Speech (POS) tagging is a fundamental need for any natural language text processing system. However, building such a classifier is quite challenging due to the inherent ambiguity of natural languages, where the same word may be used as a different part of speech in different contexts. Several efforts have been made to build such taggers for many international languages, including English, French, German and Arabic. To build a Sindhi text processing system, a POS tagger for the Sindhi language is much needed. Like Arabic, Sindhi is more challenging for POS tagging due to its word morphology. In this thesis, we describe various techniques that are available for POS tagging and discuss why we may or may not opt for them. We then present a brief survey of the efforts made so far toward POS tagging of the Sindhi language. In this research we aim to create our own POS tagger for Sindhi by training the Stanford POS tagger on a corpus containing more than 5000 Sindhi words. The performance of the trained POS tagger will be evaluated using a separate test corpus containing 2000 Sindhi words. Manual tagging of words (even with the help of semi-automatic tools) for training purposes in such large corpora is a significant effort in itself and will be retained for later studies.
 
Publisher Shah Abdul Latif University, Khairpur
 
Date 2021-03-29
 
Type info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
 
Format application/pdf
 
Identifier http://ijcet.salu.edu.pk/index.php/IJCET/article/view/59
 
Source International Journal of Computer Science and Emerging Technologies ; Vol 4 No 2 (2020): IJCET Vol 4 Issue 2, Dec 2020; 14-22
2522-3348
 
Language eng
 
Relation http://ijcet.salu.edu.pk/index.php/IJCET/article/view/59/50
 
Rights Copyright (c) 2021 International Journal of Computer Science and Emerging Technologies
https://creativecommons.org/licenses/by-nc/4.0
 
