Development of Longest-Match Based Stemmer for Texts of Wolaita Language

Girma Yohannis Bade; Hussien Seid

doi:10.11648/j.ijdst.20180403.11

Peer-Reviewed

Published in International Journal on Data Science and Technology (Volume 4, Issue 3)

Received: 19 May 2018; Accepted: 5 July 2018; Published: 30 July 2018

Abstract

This research presents the design, development, and experimental evaluation of a longest-match-based stemmer for Wolaita texts. The objective is to conflate the variant forms of Wolaita words into their stems with better accuracy, using a longest-match approach. To compile the possible combinations of suffixes, the morphology of Wolaita words was analyzed in depth. Data preprocessing and implementation were done in C#. After preprocessing, 12,789 unique words were retained for the experiment. Of these, 1,200 randomly selected words were set aside beforehand for testing. The stemmer was then tested using Paice’s actual error counting method. On the test set, it achieved 91.84% accuracy against manually stemmed words. The result suggests that the rule-based longest-match approach is promising for stemming Wolaita texts.
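
The longest-match idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' C# implementation. The suffix list is an English-like placeholder rather than the Wolaita suffix combinations compiled in the paper. The names `SUFFIXES`, `MIN_STEM_LENGTH`, `stem`, and `accuracy` are hypothetical, and the minimum-stem-length guard is an assumption added to limit over-stemming. The `accuracy` helper is a plain exact-match ratio against manually assigned stems, which may differ from the full error-counting procedure the paper applies.

```python
from typing import Iterable, Sequence

# Placeholder suffixes for illustration only; NOT the paper's Wolaita suffix list.
SUFFIXES = ("ings", "ing", "ed", "s")

# Assumed guard: never strip a suffix if fewer than this many characters remain.
MIN_STEM_LENGTH = 2


def stem(word: str,
         suffixes: Iterable[str] = SUFFIXES,
         min_stem: int = MIN_STEM_LENGTH) -> str:
    """Strip the longest listed suffix that leaves at least min_stem characters."""
    # Trying suffixes from longest to shortest is what makes this "longest match".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if suffix and word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    # No suffix applies, so the word is returned unchanged.
    return word


def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of predicted stems that equal the manually assigned stems."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


if __name__ == "__main__":
    words = ["meetings", "meeting", "sings", "is"]
    print([stem(w) for w in words])
```

With the placeholder list, `meetings` loses `ings` rather than only `s`, because the longer suffix is tried first. `sings` falls back to the shorter suffix `s`, since stripping `ings` would leave a one-character stem.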

Published in	International Journal on Data Science and Technology (Volume 4, Issue 3)
DOI	10.11648/j.ijdst.20180403.11
Page(s)	79-83
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2018. Published by Science Publishing Group

Keywords

Stemmer, Natural Language Processing, Morphology, Longest-Match

Cite This Article

APA Style

Girma Yohannis Bade, Hussien Seid. (2018). Development of Longest-Match Based Stemmer for Texts of Wolaita Language. International Journal on Data Science and Technology, 4(3), 79-83. https://doi.org/10.11648/j.ijdst.20180403.11

ACS Style

Girma Yohannis Bade; Hussien Seid. Development of Longest-Match Based Stemmer for Texts of Wolaita Language. Int. J. Data Sci. Technol. 2018, 4(3), 79-83. doi: 10.11648/j.ijdst.20180403.11

AMA Style

Girma Yohannis Bade, Hussien Seid. Development of Longest-Match Based Stemmer for Texts of Wolaita Language. Int J Data Sci Technol. 2018;4(3):79-83. doi: 10.11648/j.ijdst.20180403.11

BibTeX

@article{10.11648/j.ijdst.20180403.11,
  author = {Girma Yohannis Bade and Hussien Seid},
  title = {Development of Longest-Match Based Stemmer for Texts of Wolaita Language},
  journal = {International Journal on Data Science and Technology},
  volume = {4},
  number = {3},
  pages = {79--83},
  doi = {10.11648/j.ijdst.20180403.11},
  url = {https://doi.org/10.11648/j.ijdst.20180403.11},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdst.20180403.11},
  abstract = {This research presents design, experiment and development of longest-match based Stemmer for Wolaita texts. The objective of this paper is to conflate the variants of Wolaita text words into its stem with better accuracy, using Longest-Match based approach. To help the researcher how to compile the possible combination of suffixes, the deep analysis of Wolaita word morphology has been made. For data preprocess and implementation, C# programming language is used. After preprocessing, 12789 unique words are reserved to experiment this research. Out of these unique words, 1200 words are randomly selected earlier and kept separate for testing purpose. Then the developed stemmer was tested using Paice’s actual error counting method. The output on that test dataset has showed 91.84% accuracy over actual manually stemmed words. The obtained result shows that the rule based longest match approach is promising for stemming Wolaita language texts.},
  year = {2018}
}

RIS

TY - JOUR
T1 - Development of Longest-Match Based Stemmer for Texts of Wolaita Language
AU - Girma Yohannis Bade
AU - Hussien Seid
Y1 - 2018/07/30
PY - 2018
N1 - https://doi.org/10.11648/j.ijdst.20180403.11
DO - 10.11648/j.ijdst.20180403.11
T2 - International Journal on Data Science and Technology
JF - International Journal on Data Science and Technology
JO - International Journal on Data Science and Technology
SP - 79
EP - 83
PB - Science Publishing Group
SN - 2472-2235
UR - https://doi.org/10.11648/j.ijdst.20180403.11
AB - This research presents design, experiment and development of longest-match based Stemmer for Wolaita texts. The objective of this paper is to conflate the variants of Wolaita text words into its stem with better accuracy, using Longest-Match based approach. To help the researcher how to compile the possible combination of suffixes, the deep analysis of Wolaita word morphology has been made. For data preprocess and implementation, C# programming language is used. After preprocessing, 12789 unique words are reserved to experiment this research. Out of these unique words, 1200 words are randomly selected earlier and kept separate for testing purpose. Then the developed stemmer was tested using Paice’s actual error counting method. The output on that test dataset has showed 91.84% accuracy over actual manually stemmed words. The obtained result shows that the rule based longest match approach is promising for stemming Wolaita language texts.
VL - 4
IS - 3
ER -

Author Information

Girma Yohannis Bade

Department of Computer Science, Wolaita Sodo University, Wolaita, Ethiopia

Hussien Seid

Department of Computer Science and IT, Arba-Minch University, Arba-Minch, Ethiopia
