DC Field | Value | Language |
dc.contributor.author | MARA, MELAKU | - |
dc.date.accessioned | 2019-05-06T08:08:05Z | - |
dc.date.available | 2019-05-06T08:08:05Z | - |
dc.date.issued | 2018-07 | - |
dc.identifier.uri | http://hdl.handle.net/123456789/4462 | - |
dc.description.abstract | Machine translation is a technology for the automatic translation of text or speech from one natural
language to another. There is a need to translate sentences between English and Wolaytta in order to
make English documents available in the Wolaytta language and to reduce the language barrier.
This study therefore develops an English-Wolaytta machine translation system using a
statistical approach.
To achieve the objective of this research, a bilingual corpus of 30,000 parallel sentences was collected
from the spiritual domain, together with a monolingual corpus of size 39,893 from different sources.
The corpora were prepared in a format suitable for the development process (normalization,
tokenization, lowercasing and cleaning) and divided into training, tuning and testing sets. The
parallel sentences were aligned manually, and freely available tools were used for the different
purposes: the SRILM toolkit for language modeling, MGIZA++ for word-level alignment of the corpus
using IBM Models 1-5, and Moses for decoding, all on the Ubuntu operating system, which is well
suited to the Moses environment. In addition, the unsupervised morpheme segmentation tool
Morfessor was used to segment the Wolaytta text.
Experiments were conducted separately on the unsegmented corpus and on the segmented corpus.
The parallel sentences were divided into sets of 5,000, 10,000, 15,000, 20,000, 25,000 and 30,000.
On these sets, the unsegmented corpus achieved BLEU scores of 4.91%, 6.30%, 7.21%, 7.60%, 7.96%
and 8.46%, respectively, and the segmented corpus achieved BLEU scores of 9.83%, 11.38%, 12.70%,
12.77%, 12.93% and 13.21%, respectively. Performance improved both as the size of the corpus
increased and when the parallel sentences were segmented.
Based on these experiments, the researcher observed that performance improves with a larger
corpus and with morphological segmentation. Future research should therefore focus on further
improving the performance of the system by increasing the size of the corpus and applying
morphological segmentation. | en_US |
dc.language.iso | en | en_US |
dc.publisher | St.Mary's University | en_US |
dc.subject | Machine translation | en_US |
dc.subject | English-Wolaytta machine translation system | en_US |
dc.title | ENGLISH-WOLAYTTA MACHINE TRANSLATION USING STATISTICAL APPROACH | en_US |
dc.type | Thesis | en_US |
Appears in Collections: | Master of computer science |
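
The BLEU figures reported in the abstract can be compared side by side with a few lines of arithmetic. The following is only an illustrative sketch, not part of the thesis: the scores and corpus sizes are copied from the abstract, while the variable names and the `gains` helper are assumptions introduced here.

```python
# BLEU scores (%) and corpus sizes copied from the abstract.
# The variable names and the gains() helper are illustrative, not from the thesis.
sizes = [5000, 10000, 15000, 20000, 25000, 30000]
unsegmented = [4.91, 6.30, 7.21, 7.60, 7.96, 8.46]
segmented = [9.83, 11.38, 12.70, 12.77, 12.93, 13.21]


def gains(baseline, improved):
    """Return the absolute BLEU gain (in points) for each pair of scores."""
    return [round(b - a, 2) for a, b in zip(baseline, improved)]


# Gain from morphological segmentation at each corpus size.
segmentation_gain = gains(unsegmented, segmented)

for n, u, s, g in zip(sizes, unsegmented, segmented, segmentation_gain):
    print(f"{n:>6} sentence pairs: unsegmented {u:5.2f}, segmented {s:5.2f}, gain {g:+.2f}")
```

Under these figures, segmentation adds roughly five BLEU points at every corpus size, while growing the corpus from 5,000 to 30,000 sentence pairs adds about three and a half points in either setting.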