DC Field | Value | Language |
dc.contributor.author | Shikur, Musa | - |
dc.date.accessioned | 2020-04-07T11:21:15Z | - |
dc.date.available | 2020-04-07T11:21:15Z | - |
dc.date.issued | 2019-01 | - |
dc.identifier.uri | http://hdl.handle.net/123456789/5271 | - |
dc.description.abstract | The emergence of Web technology has generated a massive amount of raw data by enabling Internet
users to post their opinions, reviews, and comments on the web. Processing this raw data to extract
useful information can be very challenging. Sentiment analysis involves extracting,
understanding, classifying, and presenting the emotions and opinions expressed by users. We
explored opinion mining as a text classification task and employed unigrams as the feature set. We
performed experiments that fall into three groups.
In the first group (lexical classifier), we developed an algorithm that classifies reviews based on the
count of opinion words they contain. The performance of this algorithm was evaluated by
comparing the output of the lexical classifier with the actual labels of the reviews. In the
second group of experiments, three popular feature selection methods, Chi-Square, Mutual Information Gain, and the Galavotti-Sebastiani-Simi (GSS) coefficient, were compared for
their performance in selecting a better subset of features. For these comparisons, three supervised
classifiers, Naïve Bayes, Logistic Regression, and SVM, were used. Each of these
three classifiers was tested with all three feature selection methods using 750,
1000, 1250, and 1500 features. This enabled us to determine which combination of
feature selection method, classifier, and number of features works best in our domain. In the
third group of experiments, we combined the lexical classifier with machine learning sequentially.
In this research, hybrid sentiment classification is used to classify Amharic
book reviews as positive or negative. The experiments are conducted on 600 Amharic
book reviews collected from sources such as Facebook and personal blogs, as well as
manually from individual book readers. For machine learning, the experiments indicate that
Naïve Bayes with Mutual Information Gain feature selection and 1500
features performs best, with an accuracy of 93.33%. The experiments also indicate that the
hybrid approach (87% accuracy) outperforms the lexical approach (74% accuracy) but
not the machine learning approach (93.33% accuracy). | en_US |
dc.language.iso | en | en_US |
dc.publisher | St. Mary's University | en_US |
dc.subject | Opinion, Sentiment Analysis | en_US |
dc.subject | Lexicon-Based Classifier, Machine Learning, Hybrid Classifier | en_US |
dc.title | A Hybrid Sentiment Classification for Amharic Book Reviews | en_US |
dc.type | Thesis | en_US |
Appears in Collections: | Master of computer science |
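The abstract describes a lexical classifier that labels a review by counting its opinion words. The record does not give the algorithm's details, so the Python sketch below is only an illustration under stated assumptions: whitespace tokenization with edge punctuation stripped, caller-supplied positive and negative word lists (the English words in `POSITIVE` and `NEGATIVE` are hypothetical placeholders for Amharic lexicon entries), and ties resolved to a configurable default label (`tie_label`, also an assumption). The `accuracy` helper mirrors the evaluation step of comparing predictions with the actual labels.

```python
# Hypothetical placeholder lexicon: English stand-ins for Amharic opinion words.
POSITIVE = {"good", "excellent"}
NEGATIVE = {"bad", "boring"}

# Punctuation stripped from token edges, including the Ethiopic full stop and comma.
PUNCTUATION = ".,!?።፣"


def lexical_classify(review: str, positive: set[str], negative: set[str],
                     tie_label: str = "negative") -> str:
    """Label a review by comparing counts of positive and negative opinion words."""
    tokens = [t.strip(PUNCTUATION) for t in review.split()]
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    # Assumption: ties (including reviews with no opinion words) fall back to a default.
    return tie_label


def accuracy(predicted: list[str], actual: list[str]) -> float:
    """Fraction of predicted labels that match the actual labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    reviews = ["good excellent bad", "bad boring story.", "a plain story"]
    gold = ["positive", "negative", "positive"]
    preds = [lexical_classify(r, POSITIVE, NEGATIVE) for r in reviews]
    print(preds, accuracy(preds, gold))
```

A real Amharic pipeline would likely need proper tokenization and normalization of spelling variants; plain whitespace splitting is used here only to keep the sketch short.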
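Two of the feature selection methods named in the abstract, Chi-Square and the GSS coefficient, can be computed from a per-term, per-class contingency table. Let A be the number of documents of class c containing term t, B the number of other documents containing t, C the number of class-c documents without t, and D the rest, with N = A + B + C + D. Chi-Square is N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)), and GSS is P(t,c)P(not t, not c) - P(t, not c)P(not t, c) = (AD - BC)/N^2. The Python sketch below is a hypothetical illustration, not the thesis implementation: documents are treated as sets of unigrams (presence only), each term is scored by its maximum over classes, and the top `k` terms are kept (as with the 750 to 1500 feature sizes in the abstract). Mutual Information Gain is omitted because its exact formulation varies between sources and the record does not specify one.

```python
from collections.abc import Callable


def contingency(docs: list[set[str]], labels: list[str],
                term: str, cls: str) -> tuple[int, int, int, int]:
    """Return (A, B, C, D) counts for a term and a class."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term, in_class = term in tokens, label == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a: int, b: int, c: int, d: int) -> float:
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def gss(a: int, b: int, c: int, d: int) -> float:
    n = a + b + c + d
    return 0.0 if n == 0 else (a * d - b * c) / (n * n)


def select_top_k(docs: list[set[str]], labels: list[str], k: int,
                 score: Callable[[int, int, int, int], float]) -> list[str]:
    """Keep the k terms with the highest score (max over classes); ties broken alphabetically."""
    classes = sorted(set(labels))
    vocab = sorted({t for tokens in docs for t in tokens})
    scored = [(max(score(*contingency(docs, labels, t, cls)) for cls in classes), t)
              for t in vocab]
    scored.sort(key=lambda st: (-st[0], st[1]))
    return [t for _, t in scored[:k]]


if __name__ == "__main__":
    docs = [{"good", "story"}, {"good", "plot"}, {"bad", "story"}, {"bad", "plot"}]
    labels = ["pos", "pos", "neg", "neg"]
    print(select_top_k(docs, labels, 2, chi_square), select_top_k(docs, labels, 2, gss))
```

Taking the maximum over classes matters for GSS: in a binary task its value for the negative class is the negation of its value for the positive class, so scoring against one class alone would keep only terms indicative of that class.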