A Hybrid Sentiment Classification for Amharic Book Reviews

Shikur, Musa

St. Mary's University Institutional Repository

Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/5271

Title:	A Hybrid Sentiment Classification for Amharic Book Reviews
Authors:	Shikur, Musa
Keywords:	Opinion, Sentiment Analysis, Lexicon-Based Classifier, Machine Learning, Hybrid Classifier
Issue Date:	Jan-2019
Publisher:	St. Mary's University
Abstract:	The emergence of Web technology has generated a massive amount of raw data by enabling Internet users to post their opinions, reviews, and comments on the web. Processing this raw data to extract useful information can be a very challenging task. Sentiment analysis involves extracting, understanding, classifying, and presenting the emotions and opinions expressed by users. We explored opinion mining as a text classification task and employed unigrams as the feature set. Our experiments fall into three groups. In the first group (lexical classifier), we developed an algorithm that classifies reviews based on the count of opinion words. The performance of this algorithm was evaluated by comparing its output with the actual labels of the reviews. In the second group, three popular feature selection methods, Chi-Square, Mutual Information Gain, and the Galavotti-Sebastiani-Simi (GSS) coefficient, were compared on how well they select a subset of the feature set. For these comparisons, three supervised classifiers were used: Naïve Bayes, Logistic Regression, and SVM. Each classifier was run with each of the three feature selection methods using 750, 1000, 1250, and 1500 features. This allowed us to identify which combination of feature selection method, classifier, and number of features works best in our domain. In the third group, we combined the lexical classifier with machine learning sequentially. In this research work, hybrid sentiment classification was used to classify Amharic book reviews as positive or negative. The experiments were conducted using 600 Amharic book reviews collected from sources such as Facebook and personal blogs, as well as manually from individual book readers.
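The lexical and hybrid steps described above can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: the word lists are hypothetical English placeholders (the thesis works with Amharic opinion words), reviews are split on whitespace, and the "sequential" combination is assumed to mean that the machine learning classifier is consulted only when the opinion-word counts are tied. The names lexical_classify and hybrid_classify are illustrative, not taken from the thesis.

```python
from typing import Callable, Optional, Set

# Hypothetical placeholder lexicons; the real Amharic word lists are not given in the abstract.
POSITIVE: Set[str] = {"good", "excellent", "moving"}
NEGATIVE: Set[str] = {"bad", "boring", "weak"}


def lexical_classify(review: str,
                     positive: Set[str] = POSITIVE,
                     negative: Set[str] = NEGATIVE) -> Optional[str]:
    """Label a review by counting opinion words; return None when the counts tie."""
    tokens = review.lower().split()
    pos_count = sum(token in positive for token in tokens)
    neg_count = sum(token in negative for token in tokens)
    if pos_count > neg_count:
        return "positive"
    if neg_count > pos_count:
        return "negative"
    return None  # no decision: counts are equal (possibly both zero)


def hybrid_classify(review: str, ml_classifier: Callable[[str], str]) -> str:
    """Assumed sequential scheme: try the lexicon first, then fall back to the ML model."""
    label = lexical_classify(review)
    return label if label is not None else ml_classifier(review)
```

Here ml_classifier stands in for any trained model wrapped as a function from review text to a label. Other sequential orderings are possible, and the thesis may use a different one.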
For machine learning, the experiments indicate that the Naïve Bayes algorithm, using the Mutual Information Gain feature selection method with 1500 features, performs best, with an accuracy of 93.33%. The experiments also indicate that the hybrid approach (87% accuracy) outperforms the lexical approach (74% accuracy) but not the machine learning approach (93.33% accuracy).
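To make the feature selection step more concrete, the sketch below scores unigrams with two of the methods named in the abstract, using their commonly cited definitions over document counts: the chi-square statistic and the GSS coefficient, GSS(t, c) = P(t, c)·P(¬t, ¬c) − P(t, ¬c)·P(¬t, c). The toy documents, labels, and function names are assumptions for illustration only. The thesis's own preprocessing and its exact Mutual Information Gain formula are not given in the abstract, so that method is left out here.

```python
from typing import Callable, List, Sequence, Set, Tuple

# Toy data (hypothetical): each document is a set of unigrams.
DOCS: List[Set[str]] = [{"good", "book"}, {"good", "story"},
                        {"bad", "book"}, {"bad", "story"}]
LABELS: List[str] = ["positive", "positive", "negative", "negative"]


def contingency(docs: Sequence[Set[str]], labels: Sequence[str],
                term: str, cls: str) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d): term in class, term outside class,
    no term in class, no term outside class."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term, in_class = term in tokens, label == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return a, b, c, d


def gss(a: int, b: int, c: int, d: int) -> float:
    """GSS coefficient estimated from document counts."""
    n = a + b + c + d
    return (a * d - b * c) / (n * n)


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 term/class table; 0.0 if a margin is empty."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def top_k_terms(docs: Sequence[Set[str]], labels: Sequence[str], cls: str, k: int,
                score: Callable[[int, int, int, int], float] = gss) -> List[str]:
    """Keep the k highest-scoring unigrams (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    scored = [(score(*contingency(docs, labels, t, cls)), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]
```

In a setting like the one described, k would take the values 750, 1000, 1250, and 1500, and the selected unigrams would then be used as the feature set for each classifier.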
URI:	http://hdl.handle.net/123456789/5271
Appears in Collections:	Master of computer science

Files in This Item:

File	Description	Size	Format
A Hybrid Sentiment Classification -Muisa-Shikur.pdf		2.08 MB	Adobe PDF
