POS (Parts of Speech) Tagging System for Sindhi Language

International Journal of Computer Science and Emerging Technologies

Field Value
Title POS (Parts of Speech) Tagging System for Sindhi Language
Creator Ghazala Gul Junejo, Mir Sajjad Hussain Talpur, Taha Nuzhat, and Shakir Hussain Talpur
Subject Natural Language Processing (NLP), Machine learning, core (NLP) library, HMM, Stanford POS taggers.
Description Part-of-speech (POS) tagging is a fundamental need for any natural language text processing system. However, building such a classifier is challenging because of the inherent ambiguity of natural languages, where the same word may serve as a different part of speech in different contexts. Several efforts have been made to build such taggers for many international languages, including English, French, German, and Arabic. To build a Sindhi text processing system, a POS tagger for the Sindhi language is much needed. As with Arabic, Sindhi POS tagging is more challenging because of its word morphology. In this thesis, we describe various techniques that are available for POS tagging and discuss why we may or may not opt for them. We then present a brief survey of the efforts made so far toward POS tagging of the Sindhi language. In this research we aim to create our own POS tagger for Sindhi by training the Stanford POS tagger on a corpus containing more than 5000 Sindhi words. The performance of the trained POS tagger will be evaluated using a separate test corpus containing 2000 Sindhi words. Manual tagging of words (even with the help of semi-automatic tools) for training purposes in such large corpora is a significant effort in itself and will be retained for later studies.
Publisher Shah Abdul Latif University, Khairpur
Date 2021-03-29
Type info:eu-repo/semantics/article
Format application/pdf
Identifier http://ijcet.salu.edu.pk/index.php/IJCET/article/view/59
Source International Journal of Computer Science and Emerging Technologies ; Vol 4 No 2 (2020): IJCET Vol 4 Issue 2, Dec 2020; 14-22
Language eng
Relation http://ijcet.salu.edu.pk/index.php/IJCET/article/view/59/50
Rights Copyright (c) 2021 International Journal of Computer Science and Emerging Technologies
