ENGLISH-WOLAYTTA MACHINE TRANSLATION USING STATISTICAL APPROACH

MARA, MELAKU

Full metadata record

DC Field	Value	Language
dc.contributor.author	MARA, MELAKU	-
dc.date.accessioned	2019-04-23T11:28:12Z	-
dc.date.available	2019-04-23T11:28:12Z	-
dc.date.issued	2018-07	-
dc.identifier.uri	http://hdl.handle.net/123456789/4453	-
dc.description.abstract	Machine translation is a technology for the automatic translation of text or speech from one natural language to another. There is a need to translate sentences between English and Wolaytta in order to make English documents available in the Wolaytta language and to reduce the language barrier. This study therefore develops an English-Wolaytta machine translation system using a statistical approach. To achieve this objective, a bilingual corpus of 30,000 parallel sentences was collected from the spiritual domain, together with a monolingual corpus of 39,893 entries from different sources. The corpora were prepared in a format suitable for the development process (normalization, tokenization, lower-casing and cleaning) and divided into training, tuning and testing sets. The parallel sentences were aligned manually, and freely available tools were used for the remaining steps: the SRILM toolkit for language modeling, MGIZA++ for word-level alignment of the corpus using IBM Models 1-5, and Moses for decoding, all running on the Ubuntu operating system, which is suitable for the Moses environment. In addition, the unsupervised morpheme segmentation tool Morfessor was used to segment the Wolaytta text. The experiments were run separately, one set on the unsegmented corpus and the other on the segmented corpus, each using subsets of 5,000, 10,000, 15,000, 20,000, 25,000 and 30,000 parallel sentences. On these subsets, the unsegmented corpus achieved BLEU scores of 4.91%, 6.30%, 7.21%, 7.60%, 7.96% and 8.46%, respectively, and the segmented corpus achieved BLEU scores of 9.83%, 11.38%, 12.70%, 12.77%, 12.93% and 13.21%, respectively. Performance improved both as the size of the corpus increased and when the parallel sentences were segmented. Based on these experiments, the researcher concludes that a larger corpus and morphological segmentation lead to better performance.
Therefore, future research should focus on further improving the performance of the system by increasing the size of the corpus and applying morphological segmentation.	en_US
dc.language.iso	en	en_US
dc.publisher	St. Mary's University	en_US
dc.subject	Machine translation	en_US
dc.subject	English documents in Wolaytta language	en_US
dc.title	ENGLISH-WOLAYTTA MACHINE TRANSLATION USING STATISTICAL APPROACH	en_US
dc.type	Thesis	en_US
Appears in Collections:	Business Administration
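
The corpus preparation steps named in the abstract (normalization, tokenization, lower-casing, cleaning, and division into training, tuning and testing sets) can be illustrated with a minimal Python sketch. This is not the thesis's actual pipeline: the function names, the tokenization rule, the cleaning thresholds (`max_len`, `max_ratio`) and the split sizes are all assumptions chosen for illustration, and the sample sentences are placeholders rather than real corpus data.

```python
import re
import unicodedata


def preprocess(line):
    """Normalize, lower-case, and tokenize one sentence into a token list."""
    line = unicodedata.normalize("NFC", line)          # normalization
    line = re.sub(r"\s+", " ", line).strip().lower()   # whitespace + lower-casing
    # Tokenization (assumed rule): words, optionally with inner apostrophes,
    # and each punctuation mark as its own token.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", line)


def clean(pairs, max_len=80, max_ratio=9.0):
    """Drop sentence pairs that are empty, too long, or badly mismatched in length.

    The thresholds are hypothetical defaults, not values from the thesis.
    """
    kept = []
    for src, tgt in pairs:
        if not src or not tgt:
            continue
        if len(src) > max_len or len(tgt) > max_len:
            continue
        if max(len(src), len(tgt)) / min(len(src), len(tgt)) > max_ratio:
            continue
        kept.append((src, tgt))
    return kept


def split(pairs, n_tune, n_test):
    """Split pairs into (training, tuning, testing) sets, in corpus order."""
    n_train = len(pairs) - n_tune - n_test
    if n_train <= 0:
        raise ValueError("not enough sentence pairs for the requested split")
    return (pairs[:n_train],
            pairs[n_train:n_train + n_tune],
            pairs[n_train + n_tune:])


# Placeholder data: the target side stands in for Wolaytta text.
raw_pairs = [
    ("Hello,  World!", "target sentence one ."),
    ("", "target sentence two ."),            # empty source: removed by clean()
    ("A second sentence.", "target sentence three ."),
    ("A third sentence.", "target sentence four ."),
]
pairs = clean([(preprocess(s), preprocess(t)) for s, t in raw_pairs])
train, tune, test = split(pairs, n_tune=1, n_test=1)
```

Keeping the split in corpus order makes the sketch deterministic; a real setup might shuffle first so that the tuning and testing sets are representative of the whole corpus.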

File	Description	Size	Format
last cover.pdf		99.63 kB	Adobe PDF
melaku mara (thesis).pdf		572.58 kB	Adobe PDF
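
To make the BLEU results reported in the abstract easier to compare, the short Python sketch below tabulates them and computes the gain from segmentation at each corpus size. The scores and sizes are taken from the abstract; the helper name `bleu_gains` and the table layout are illustrative assumptions.

```python
# Corpus sizes (parallel sentences) and BLEU scores (%) as reported in the abstract.
sizes = [5000, 10000, 15000, 20000, 25000, 30000]
unsegmented = [4.91, 6.30, 7.21, 7.60, 7.96, 8.46]
segmented = [9.83, 11.38, 12.70, 12.77, 12.93, 13.21]


def bleu_gains(baseline, improved):
    """Absolute BLEU difference (improved minus baseline) for each corpus size."""
    return [round(b - a, 2) for a, b in zip(baseline, improved)]


gains = bleu_gains(unsegmented, segmented)

# Print one row per corpus size: size, unsegmented, segmented, gain.
print(f"{'size':>6} {'unseg':>6} {'seg':>6} {'gain':>6}")
for n, u, s, g in zip(sizes, unsegmented, segmented, gains):
    print(f"{n:>6} {u:>6.2f} {s:>6.2f} {g:>+6.2f}")
```

On these figures, segmentation adds between 4.75 and 5.49 BLEU points at every corpus size, which is more than the 3.55-point gain obtained by growing the unsegmented corpus from 5,000 to 30,000 sentences.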