A Hybrid Sentiment Classification for Amharic Book Reviews

Shikur, Musa


DC Field	Value	Language
dc.contributor.author	Shikur, Musa	-
dc.date.accessioned	2020-04-07T11:21:15Z	-
dc.date.available	2020-04-07T11:21:15Z	-
dc.date.issued	2019-01	-
dc.identifier.uri	http://hdl.handle.net/123456789/5271	-
dc.description.abstract	The emergence of Web technology has generated a massive amount of raw data by enabling Internet users to post their opinions, reviews, and comments on the Web. Processing this raw data to extract useful information can be a very challenging task. Sentiment analysis involves extracting, understanding, classifying, and presenting the emotions and opinions expressed by users. We treated opinion mining as a text classification task and used unigrams as the feature set. Our experiments fall into three groups. In the first group (lexical classifier), we developed an algorithm that classifies reviews based on the count of opinion words they contain; its performance was evaluated by comparing its output with the actual labels of the reviews. In the second group, three popular feature selection methods, Chi-Square, Mutual Information Gain, and the Galavotti-Sebastiani-Simi (GSS) coefficient, were compared on how well they select a subset of the feature set. For these comparisons, three supervised classifiers were used: Naïve Bayes, Logistic Regression, and SVM. Each classifier was run with each of the three feature selection methods at 750, 1000, 1250, and 1500 features, which allowed us to determine which combination of feature selection method, classifier, and number of features works best in our domain. In the third group, we combined the lexical classifier with machine learning sequentially. In this work, hybrid sentiment classification was used to classify Amharic book reviews as positive or negative. The experiments were conducted on 600 Amharic book reviews collected from sources such as Facebook and personal blogs, as well as reviews collected manually from individual book readers.
For machine learning, the experiments show that Naïve Bayes with the Mutual Information Gain feature selection method and 1500 features performs best, with an accuracy of 93.33%. The hybrid approach (87% accuracy) outperforms the lexical approach (74% accuracy) but not the machine learning approach (93.33% accuracy).	en_US
dc.language.iso	en	en_US
dc.publisher	St. Mary's University	en_US
dc.subject	Opinion, Sentiment Analysis	en_US
dc.subject	Lexicon-Based Classifier, Machine Learning, Hybrid Classifier	en_US
dc.title	A Hybrid Sentiment Classification for Amharic Book Reviews	en_US
dc.type	Thesis	en_US
Appears in Collections:	Master of computer science
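The first experimental group labels a review by counting its opinion words, and the third group chains that lexical step with a machine-learning classifier. The record does not give the lexicon, the tokenizer, or how the two stages are chained, so the Python sketch below is illustrative only. The four-word lexicon (POSITIVE, NEGATIVE) and its English glosses, the whitespace-plus-Ethiopic-punctuation tokenizer, the function names lexical_classify and hybrid_classify, and the rule that reviews the lexicon cannot decide (no opinion words, or a tie) are handed to an ML classifier are all assumptions.

```python
from typing import Callable, Optional

# HYPOTHETICAL toy lexicon for illustration; the thesis's actual lexicon is not given.
POSITIVE = {"ጥሩ", "ምርጥ"}    # glosses: "good", "excellent"
NEGATIVE = {"መጥፎ", "አሰልቺ"}  # glosses: "bad", "boring"

# Ethiopic full stop, comma, semicolon, colon, plus a few ASCII marks.
PUNCT = "።፣፤፥.,!?"


def tokenize(review: str) -> list[str]:
    """Split on whitespace and strip punctuation from both ends of each token."""
    tokens = []
    for tok in review.split():
        tok = tok.strip(PUNCT)
        if tok:
            tokens.append(tok)
    return tokens


def lexical_classify(review: str) -> Optional[str]:
    """Label a review by comparing counts of positive and negative opinion words.

    Returns None when the counts are equal (including when there are none).
    """
    tokens = tokenize(review)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None


def hybrid_classify(review: str, ml_classify: Callable[[str], str]) -> str:
    """ASSUMED sequential hybrid: try the lexicon first, fall back to ML on a tie."""
    label = lexical_classify(review)
    return label if label is not None else ml_classify(review)
```

Here ml_classify stands in for any trained model, for example one of the Naïve Bayes, Logistic Regression, or SVM classifiers from the second group. Passing only the undecided reviews to it is one plausible reading of "sequentially", not necessarily the design used in the thesis.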
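The second experimental group ranks unigram features with Chi-Square, Mutual Information Gain, and the GSS coefficient, then keeps the top 750 to 1500. The minimal Python sketch below computes all three scores from the usual 2×2 document counts for a term t and the positive class: A (positive reviews containing t), B (negative reviews containing t), C (positive reviews without t), and D (negative reviews without t), with N = A + B + C + D. Chi-Square is N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)), and GSS is (AD − BC) / N². The sketch assumes binary unigram presence. It also reads Mutual Information Gain as the expected mutual information between term presence and class, which equals information gain in the two-class case. The helper names (contingency, select_top_k) are hypothetical, and the thesis's exact formulas may differ.

```python
import math
from typing import Callable


def contingency(docs: list[set[str]], labels: list[int], term: str) -> tuple[int, int, int, int]:
    """Return (A, B, C, D) for `term` against the positive class (label 1).

    A: positive docs containing term    B: negative docs containing term
    C: positive docs without term       D: negative docs without term
    """
    a = b = c = d = 0
    for tokens, y in zip(docs, labels):
        present = term in tokens
        if y == 1 and present:
            a += 1
        elif y == 1:
            c += 1
        elif present:
            b += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of the 2x2 term/class table (0.0 if a margin is empty)."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def gss(a: int, b: int, c: int, d: int) -> float:
    """GSS coefficient P(t,c)P(~t,~c) - P(t,~c)P(~t,c); signed."""
    n = a + b + c + d
    return (a * d - b * c) / (n * n)


def mutual_information(a: int, b: int, c: int, d: int) -> float:
    """Expected mutual information (in nats) between term presence and class."""
    n = a + b + c + d
    # (joint count, term-side marginal, class-side marginal) for each of the four cells
    cells = [(a, a + b, a + c), (b, a + b, b + d),
             (c, c + d, a + c), (d, c + d, b + d)]
    mi = 0.0
    for joint, t_marg, c_marg in cells:
        if joint:  # empty cells contribute 0 (the limit of p * log p)
            mi += (joint / n) * math.log(joint * n / (t_marg * c_marg))
    return mi


Score = Callable[[int, int, int, int], float]


def select_top_k(docs: list[set[str]], labels: list[int], k: int, score: Score) -> list[str]:
    """Keep the k unigrams with the largest |score|; ties are broken alphabetically."""
    vocab = set().union(*docs)
    scored = {t: abs(score(*contingency(docs, labels, t))) for t in vocab}
    return sorted(scored, key=lambda t: (-scored[t], t))[:k]
```

Taking the absolute value only affects GSS, since Chi-Square and mutual information are never negative. In the two-class case, GSS for the negative class is exactly the negative of GSS for the positive class, so |GSS| equals the larger of the two per-class scores. Under these assumptions, a call such as select_top_k(docs, labels, 1500, mutual_information) corresponds to the best-performing feature setting reported in the abstract.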