Title
Opinion Mining for Arabic Text /
Author
Barmeel, Ibrahim Rouby Sayed.
Preparation committee
Researcher / ابراهيم روبى سيد برميل
Supervisor / محمد نور السيد أحمد
Supervisor / محمد بدوى محمد بدوى
Subject
Artificial intelligence. Data mining. Optical data processing. Application software.
Publication date
2023.
Number of pages
116 p. :
Language
English
Degree
Master of Engineering
Specialization
Engineering
Publisher
Approval date
23/8/2023
Place of approval
Menoufia University - Faculty of Electronic Engineering - Computer Science and Engineering

Abstract

Due to the rapid growth and recent advances of information technology, sentiment
analysis (SA) has become an important area of research. Social media platforms such as
Facebook and Twitter provide a large volume of comments, posts, and user opinions on
different themes, such as business products, customer services, and user reviews.
Analyzing and monitoring such comments and posts is very important for improving
product functions, improving customer service, and extracting useful information.
This thesis adopts a sentiment analysis (SA) model to analyze Arabic user
comments collected from Facebook. Several preprocessing operations are applied, such
as tokenization, noise removal, normalization, and stemming. Negation words,
intensifier words, and emotions are discussed and handled, as they can improve the
performance of SA. Precision, recall, and error rate improved by approximately
20.2%, 18.8%, and 8.2%, respectively, when intensifiers, negations, and emotions were
handled on a chosen dataset. The accuracy of predicting positive and negative
comments was affected by the users' writing style. The best values were obtained for
comments written in Modern Standard Arabic (MSA), followed by comments written in
mixed Arabic, while the worst values were for comments written in informal
Arabic.
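The preprocessing and lexicon handling described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the normalization rules are common Arabic conventions, the stemmer only strips the definite article, and the tiny `LEXICON`, `NEGATIONS`, and `INTENSIFIERS` tables are hypothetical placeholders. Emotion handling is left out.

```python
import re

# Tashkeel (U+064B to U+0652) and tatweel (U+0640) are dropped during normalization.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Noise rejection: anything that is not an Arabic letter or whitespace.
NOISE = re.compile(r"[^\u0621-\u064A\s]")

# Hypothetical toy resources, for illustration only.
LEXICON = {"ممتاز": 1.0, "جيد": 1.0, "سيء": -1.0}
NEGATIONS = {"لا", "ليس", "مش"}
INTENSIFIERS = {"جدا": 2.0}


def normalize(text):
    """Remove diacritics and noise, and unify common letter variants."""
    text = DIACRITICS.sub("", text)
    text = re.sub(r"[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")  # ta marbuta -> ha
    return NOISE.sub(" ", text)


def tokenize(text):
    """Whitespace tokenization of the normalized text."""
    return normalize(text).split()


def light_stem(token):
    """Very light stemming: strip a leading definite article from longer words."""
    if token.startswith("\u0627\u0644") and len(token) > 4:
        return token[2:]
    return token


def score(text):
    """Sum lexicon polarities; a negation flips the next sentiment word,
    and an intensifier scales the sentiment word that precedes it."""
    total = 0.0
    negate = False
    last = 0.0  # contribution of the most recent sentiment word
    for tok in (light_stem(t) for t in tokenize(text)):
        if tok in NEGATIONS:
            negate = True
        elif tok in LEXICON:
            last = -LEXICON[tok] if negate else LEXICON[tok]
            total += last
            negate = False
        elif tok in INTENSIFIERS:
            total += last * (INTENSIFIERS[tok] - 1.0)
            last = 0.0
    return total
```

A positive total would be read as a positive comment and a negative total as a negative one. Informal or mixed Arabic uses spellings that a small lexicon like this misses, which is one plausible reason for the lower accuracy reported for those writing styles.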
The thesis also presents a study on classifying sentiment using several supervised machine
learning approaches: Naïve Bayes (NB), Stochastic Gradient Descent (SGD), Logistic
Regression (LR), and Support Vector Machine (SVM). The approaches were used to
classify the sentiment and/or the users' opinions in the “Restaurant” dataset,
which contains about 8000 reviews taken from Facebook. The performance
metrics (precision, recall, and F-measure) were best for the SVM classifier and
worst for LR. The values for NB and SGD fell between those of SVM and LR.
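To make the classification setup and the reported metrics concrete, here is a small sketch of one of the four approaches, a multinomial Naïve Bayes classifier with add-one smoothing, together with precision, recall, and F-measure for one class. It is an illustrative stand-in written from the textbook definitions, not the thesis's code. The class name `NaiveBayes`, the helper `precision_recall_f1`, and the four toy reviews are assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(labels)
        return self

    def predict(self, tokens):
        best, best_logp = None, -math.inf
        for c in self.classes:
            logp = math.log(self.doc_counts[c] / self.n_docs)  # class prior
            total = sum(self.word_counts[c].values())
            for w in tokens:
                if w in self.vocab:  # words never seen in training are ignored
                    logp += math.log(
                        (self.word_counts[c][w] + 1) / (total + len(self.vocab))
                    )
            if logp > best_logp:
                best, best_logp = c, logp
        return best


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure for the class named `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy reviews (already tokenized and normalized).
train_docs = [["اكل", "لذيذ"], ["خدمه", "ممتاز"], ["اكل", "سيء"], ["خدمه", "بطيء"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_docs, train_labels)
```

With this toy model, `model.predict(["اكل", "ممتاز"])` returns `"pos"`. Comparing classifiers as the thesis does would mean computing the same three metrics for each classifier on the same held-out reviews.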
Moreover, different feature selection methods are discussed, applied, and
compared. The methods are based on term weighting (uni-gram, bi-gram, tri-gram, and
their combinations), correlation between the individual features and the target class,
chi-square (χ²), and mutual information. The experiments showed that
performance was best for the feature selection method based on uni-gram term
weighting. Term weighting based on combining uni-grams and bi-grams gave good
results, close to those based on uni-grams only. The feature
selection method based on correlation between the chosen features and the target class
also showed promising performance, similar to that based on uni-gram term
weighting. It is important to mention that using either a very small number or a very
large number of features gave poor results. Performance was promising when using
only the most significant features, chosen by applying threshold values.
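As a rough illustration of threshold-based feature selection, the sketch below extracts uni-gram and bi-gram features and scores each one with the standard chi-square statistic for a 2x2 table of term presence against class. The function names, the toy documents, and the threshold value are assumptions chosen for the example. The thesis's actual weighting schemes and thresholds are not given in the abstract.

```python
def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(tokens):
    """Combined uni-gram and bi-gram feature set of one document."""
    return set(ngrams(tokens, 1) + ngrams(tokens, 2))


def chi_square(feature_sets, labels, term, target):
    """Chi-square statistic of term presence versus membership in the target class."""
    a = b = c = d = 0
    for feats, label in zip(feature_sets, labels):
        present, in_class = term in feats, label == target
        if present and in_class:
            a += 1
        elif present:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(docs, labels, target, threshold):
    """Keep only features whose chi-square score reaches the threshold."""
    feature_sets = [features(doc) for doc in docs]
    vocab = set().union(*feature_sets)
    scores = {t: chi_square(feature_sets, labels, t, target) for t in vocab}
    return {t: s for t, s in scores.items() if s >= threshold}


# Hypothetical toy corpus: the topic words appear in both classes,
# while the opinion words separate the classes perfectly.
docs = [["اكل", "جيد"], ["خدمه", "جيد"], ["اكل", "سيء"], ["خدمه", "سيء"]]
labels = ["pos", "pos", "neg", "neg"]
selected = select_features(docs, labels, target="pos", threshold=3.0)
```

Here only the two opinion words pass the threshold. The topic words score 0 and each bi-gram scores about 1.33. Raising or lowering the threshold changes how many features survive, which is the trade-off the paragraph above describes.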
The thesis also discusses, analyzes, and applies two deep learning models for
classifying Arabic sentiment: long short-term memory (LSTM) and
bidirectional long short-term memory (Bi-LSTM). Several experiments are conducted and
compared for classifying the sentiment of two chosen datasets, the Arabic Sentiment
Tweets Dataset (ASTD) and the Arabic Twitter dataset (abbreviated as ArTwitter).
The experiments study the effect of the deep learning hyper-parameters on
performance. The hyper-parameters include the number of epochs, batch size, number of
hidden layers, number of neurons, vector size, and learning rate, among others. In the
experimental results, precision, recall, and F-measure were always
better for the Bi-LSTM than for the LSTM. The learning time and prediction time,
on the other hand, were lower for the LSTM than for the Bi-LSTM.
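One way to see why a Bi-LSTM is typically slower than an LSTM is to count trainable weights. The sketch below uses the standard single-bias formulation of an LSTM layer (three gates plus the candidate cell state) and treats a Bi-LSTM as two such layers, one per direction. The vector size of 100 and the 64 neurons are illustrative values, not the thesis's settings, and the function names are hypothetical.

```python
def lstm_params(input_dim, hidden):
    """Trainable weights of one LSTM layer: four blocks (input, forget, and output
    gates plus the candidate cell state), each with input weights, recurrent
    weights, and one bias vector."""
    return 4 * (hidden * (input_dim + hidden) + hidden)


def bilstm_params(input_dim, hidden):
    """A bidirectional layer runs one LSTM forward and one backward over the
    sequence, so it holds two independent sets of LSTM weights."""
    return 2 * lstm_params(input_dim, hidden)


# Illustrative sizes (assumptions): 100-dimensional word vectors, 64 neurons.
uni = lstm_params(100, 64)
bi = bilstm_params(100, 64)
```

With these sizes the LSTM layer has 42,240 weights and the Bi-LSTM layer 84,480. Twice the weights and two passes over each sequence are consistent with the longer learning and prediction times reported for the Bi-LSTM, although actual timings also depend on hardware and implementation.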
Finally, the size, nature, and characteristics of the chosen datasets or testbeds play an
important role in the performance of both machine learning and deep learning models.