Abstract Due to the rapid growth and recent advances in information technology, sentiment analysis (SA) has become an important area of research. Social media platforms such as Facebook and Twitter provide a large volume of comments, posts, and user opinions on topics such as business products, customer services, and user reviews. Analyzing and monitoring such comments and posts is important for improving product functions, improving customer service, and extracting useful information. This thesis adopts a sentiment analysis (SA) model to analyze Arabic user comments collected from Facebook. Preprocessing operations such as tokenization, noise removal, normalization, and stemming are applied. Negation words, intensifier words, and emotions are discussed and handled, as they can improve the performance of SA. Precision, recall, and error rate improved by approximately 20.2%, 18.8%, and 8.2%, respectively, when intensifiers, negations, and emotions were handled on a chosen dataset. The accuracy of predicting positive and negative comments was affected by the users' writing style. The best values were obtained for comments written in modern standard Arabic (MSA), followed by comments written in mixed Arabic, while the worst values were obtained for comments written in informal Arabic. The thesis also presents a study on classifying sentiment using several supervised machine learning approaches: Naïve Bayes (NB), Stochastic Gradient Descent (SGD), Logistic Regression (LR), and Support Vector Machine (SVM). These approaches were used to classify the sentiment and user opinions in the “Restaurant” dataset, which contains about 8000 reviews taken from Facebook. The performance metrics (precision, recall, and F-measure) were best for the SVM classifier and worst for LR. The values for NB and SGD fell between those of SVM and LR.
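As an illustration of how negation and intensifier handling can change a sentiment score, the following is a minimal sketch and not the thesis's actual model. The lexicons (`POLARITY`, `NEGATIONS`, `INTENSIFIERS`), their weights, and the rules "a negation word directly before a polar word flips its sign" and "an intensifier directly after it scales its weight" are all assumptions made for this example. The normalization step shows common Arabic conventions (removing diacritics and tatweel, unifying alef variants) and may differ from the one used in the thesis.

```python
import re

# Hypothetical toy lexicons; the thesis does not list its actual resources.
POLARITY = {"جيد": 1.0, "ممتاز": 2.0, "سيء": -1.0}
NEGATIONS = {"ليس", "لا", "غير"}
INTENSIFIERS = {"جدا": 1.5}

# Arabic diacritics (U+064B..U+0652) and tatweel (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")


def normalize(text):
    """Remove diacritics and tatweel, and unify common letter variants."""
    text = DIACRITICS.sub("", text)
    text = re.sub("[أإآ]", "ا", text)  # alef variants -> bare alef
    return text.replace("ى", "ي")      # alef maqsura -> ya


def tokenize(text):
    """Split normalized text into runs of Arabic letters."""
    return re.findall(r"[\u0621-\u064A]+", normalize(text))


def score(text):
    """Sum the polarity of known words, adjusted by their neighbours."""
    tokens = tokenize(text)
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in POLARITY:
            continue
        value = POLARITY[tok]
        # Assumed rule: an intensifier right after the word scales it.
        if i + 1 < len(tokens) and tokens[i + 1] in INTENSIFIERS:
            value *= INTENSIFIERS[tokens[i + 1]]
        # Assumed rule: a negation word right before the word flips it.
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        total += value
    return total
```

A comment is then labeled positive when `score` is above zero and negative when it is below zero. Informal or dialectal spellings that are missing from the lexicon contribute nothing to the score, which is one plausible reason a lexicon-driven step would do worse on informal Arabic.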
Moreover, different feature selection methods are discussed, applied, and compared. The methods are based on term weighting (uni-gram, bi-gram, tri-gram, and their combinations), correlation between the individual features and the target class, chi-square (χ²), and mutual information. The experiments showed that performance was best for the feature selection method based on uni-gram term weighting. Term weighting based on combining uni-grams and bi-grams gave good results that were close to those based on uni-grams only. The feature selection method based on correlation between the chosen features and the target class also gave promising performance, similar to that based on uni-gram term weighting. It is important to mention that using either a very small or a very large number of features gave poor results. Performance was promising when using the most significant features, which were chosen by applying threshold values. The thesis also discusses, analyzes, and applies two deep learning models for classifying Arabic sentiment: long short-term memory (LSTM) and bidirectional long short-term memory (Bi-LSTM). Several experiments are carried out and compared for classifying the sentiment of two chosen datasets, the Arabic sentiment tweets dataset (ASTD) and the Arabic Twitter dataset (abbreviated as the ArTwitter dataset). The experiments study and monitor the effect of the deep learning hyper-parameters on performance. The hyper-parameters are the number of epochs, batch size, number of hidden layers, number of neurons, vector size, learning rate, and others. The experimental results showed that precision, recall, and F-measure were always better for the Bi-LSTM than for the LSTM. The learning time and prediction time, on the other hand, were lower for the LSTM than for the Bi-LSTM.
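To make the threshold-based feature selection concrete, the sketch below scores uni-gram and bi-gram features with the standard χ² statistic for a 2×2 contingency table (feature present or absent against class positive or not) and keeps those at or above a cutoff. It is only an illustration under stated assumptions: the function names, the binary document-level presence counts, the threshold value, and the English placeholder tokens are choices made for this example and are not taken from the thesis.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(tokens):
    """Uni-gram plus bi-gram features present in one document."""
    return set(ngrams(tokens, 1) + ngrams(tokens, 2))


def chi2_scores(docs, labels, positive="pos"):
    """Chi-square score of each feature against the positive class."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    in_any, in_pos = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for f in features(tokens):
            in_any[f] += 1
            if y == positive:
                in_pos[f] += 1
    scores = {}
    for f, total in in_any.items():
        a = in_pos[f]          # positive docs containing f
        b = total - a          # other docs containing f
        c = n_pos - a          # positive docs without f
        d = n - n_pos - b      # other docs without f
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[f] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select(scores, threshold):
    """Keep only the features whose score reaches the threshold."""
    return {f for f, s in scores.items() if s >= threshold}


# Toy corpus with placeholder tokens (not data from the thesis).
docs = [["good", "food"], ["good", "service"],
        ["bad", "food"], ["bad", "service"]]
labels = ["pos", "pos", "neg", "neg"]
scores = chi2_scores(docs, labels)
selected = select(scores, threshold=3.0)
```

In this toy corpus, words that appear equally often in both classes receive a score of zero and are dropped, while words tied to one class are kept. Raising or lowering `threshold` changes how many features survive, which corresponds to the trade-off described above between too few and too many features.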
Finally, the size, nature, and characteristics of the chosen datasets or testbeds play an important role in the performance of both machine learning and deep learning models.