DC Field | Value | Language |
dc.contributor.author | Ebrahim, Mehbub | - |
dc.date.accessioned | 2023-08-02T11:50:49Z | - |
dc.date.available | 2023-08-02T11:50:49Z | - |
dc.date.issued | 2023-06 | - |
dc.identifier.uri | http://hdl.handle.net/123456789/7697 | - |
dc.description.abstract | Stemming is the process of stripping a word of its inflectional and derivational variations. It is
crucial for many applications of natural language processing. When judging the relevance of a
page to a user query that specifies only one word form, the varied word forms used in searching
and indexing should be anticipated. Conflation methods can help improve the efficiency of an IR
system by condensing variant forms into a single form. Stemmers are employed in information
retrieval to standardize as many similar words and word patterns as possible for use in the
retrieval procedure.
This research required a solid understanding of Guragegna grammar as well as an
examination of the language's inflectional and derivational affixes. The Gurage
language generates several word forms from stems through affixation and reduplication (final,
total, and frequentative). Prefixes, suffixes, and infixes are the frequently used affixes. Gurage often
concatenates affixes, which can produce very long words with a lot of semantic content.
This study introduces the first stemming algorithm that conflates Guragegna word variants.
The Gurage stemmer was implemented in Python. The researcher created small
rule sets for related affixes in an attempt to keep the algorithm's structure straightforward.
To develop the stemmer, a list of stop words and the experimental text document were
acquired from various sources, along with a research article that covers the morphology of the
Gurage language.
The iterative, context-sensitive, and recoding methods used in this study's stemmer eliminate
prefixes, suffixes, and reduplicated letters (final, total, and frequentative reduplication). In this
experiment, removal was applied in the order prefix, suffix, and then letter reduplication.
The text used in the evaluation process is contained in the data set. The experimental text has 1,933 words, of
which 1,266 went through the stemming procedure. Of these 1,266 words, 1,097 were
stemmed correctly, an accuracy of 86.65%. The remaining 13.34% of the stemmed words
were wrongly stemmed. Over-stemming accounts for 7.97% (101) of the words, while
under-stemming accounts for 5.37%. | en_US |
dc.language.iso | en | en_US |
dc.publisher | ST. MARY’S UNIVERSITY | en_US |
dc.subject | stemming algorithm; Guragegna stemmer; context-sensitive stemmer; iterative stemmer; Guragegna language | en_US |
dc.title | DEVELOPMENT OF STEMMING ALGORITHM FOR GURAGEGNA TEXT | en_US |
dc.type | Thesis | en_US |
Appears in Collections: | Master of computer science |
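
The evaluation figures in the abstract can be re-derived from the reported counts. The following is a minimal sketch, not the thesis's own evaluation code. It assumes that every wrongly stemmed word is either over-stemmed or under-stemmed, so the under-stemmed count is obtained by subtraction rather than taken from the abstract; all variable names are illustrative.

```python
# Counts reported in the abstract.
total_stemmed = 1266   # words that went through the stemmer
correct = 1097         # words stemmed correctly
over_stemmed = 101     # words reported as over-stemmed

# Derived here (assumption): all remaining errors are under-stemmed.
errors = total_stemmed - correct
under_stemmed = errors - over_stemmed


def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


accuracy = pct(correct, total_stemmed)
error_rate = pct(errors, total_stemmed)
over_rate = pct(over_stemmed, total_stemmed)
under_rate = pct(under_stemmed, total_stemmed)

print(f"accuracy:       {accuracy:.2f}%")
print(f"error rate:     {error_rate:.2f}% ({errors} words)")
print(f"over-stemming:  {over_rate:.2f}% ({over_stemmed} words)")
print(f"under-stemming: {under_rate:.2f}% ({under_stemmed} words)")
```

Under that assumption the recomputed percentages agree with the abstract to within 0.01 percentage point. The small differences in the error and over-stemming rates would be consistent with the reported values having been truncated rather than rounded.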