Title
An efficient technique for documents analysis using data mining
Author
Khater, Seham Hemdan Hamed.
Thesis committee
Researcher / Seham Hemdan Hamed Khater
Supervisor / Hazem El-Bakry
Supervisor / Haitham Abdelmonem El-Ghareeb
Examiner / Hesham Arafat
Subject
Documents analysis.
Publication date
2023.
Number of pages
106 p.
Language
English
Degree
Master's
Specialization
Artificial Intelligence
Approval date
1/1/2023
Approving institution
Mansoura University - Faculty of Computers and Information - Department of Information Systems

Abstract

In recent years, artificial intelligence (AI) has been widely adopted, and machine learning systems have demonstrated superhuman capability in a variety of tasks. However, this increase in performance is often achieved through greater model complexity, so such systems are regarded as "black box" approaches, which raises questions about how they work and, ultimately, how they make decisions. Because of this uncertainty and ambiguity, machine learning systems have had difficulty gaining acceptance in fields such as medicine. As a result, Explainable Artificial Intelligence (XAI), a field that produces more explainable models while maintaining high learning performance, has attracted scientific attention.

Our problem is that there is no transparent decision support system. During the difficult times of the global pandemic, many people turned to social media to communicate their concerns, ideas, and perspectives, and COVID-19 is becoming a source of despair, stress, and anxiety as a result of the negative feelings expressed there. To prevent a flood of negative feelings from spreading during times of crisis, agencies should provide a transparent decision support system that can classify negative words and support each decision with the reasons behind it.

The aim is to offer a transparent decision support system that classifies the sentiment of tweets as positive, neutral, or negative and explains each prediction using XAI techniques. The proposed framework is run on two datasets, the COVID-19 tweets dataset and the Stanford 140 dataset. We started with a data preprocessing phase. For data representation, we used TF-IDF, and we applied four machine learning algorithms (naive Bayes, random forest, logistic regression, and support vector machine) as well as four deep learning models (RNN, LSTM, GRU, and bidirectional RNN). To raise trust in the models, we used LIME and SHAP to improve their explainability.
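TF-IDF weighting, used above for data representation, can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the thesis's implementation: it takes already-tokenized tweets, uses raw term counts as TF, assumes the smoothed IDF idf(t) = ln((1 + n) / (1 + df(t))) + 1 (the form scikit-learn's `TfidfVectorizer` uses by default), and L2-normalizes each vector. The helper name `tfidf` and the sample tweets are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return L2-normalized TF-IDF vectors (as dicts) and the IDF table
    for a list of tokenized documents (hypothetical helper)."""
    n = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (assumed; matches scikit-learn's smooth_idf=True default).
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}

    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts
        weights = {t: c * idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Empty documents get an empty vector instead of dividing by zero.
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors, idf


tweets = [
    "stay home stay safe".split(),
    "lockdown is stressful".split(),
    "stay positive".split(),
]
vecs, idf = tfidf(tweets)
# "stay" appears in two tweets, so its IDF is lower than that of "home".
print(round(idf["stay"], 3), round(idf["home"], 3))
```

In practice a library vectorizer such as scikit-learn's `TfidfVectorizer` would normally be used; the sketch only shows how the weights are formed, with terms that occur in many tweets down-weighted relative to rarer ones.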
The empirical findings show that the logistic regression and SVM models using the TF-IDF feature extraction approach perform best compared with the other models, with average accuracies of 84% and 86%, respectively. The data balancing step raised the accuracy of the random forest model from 47% to 73%, while the other models changed only slightly. The performance of the SVM decreases on dataset 2 as the amount of data increases, whereas the naive Bayes model works well with large amounts of data. The deep learning models performed better than the traditional machine learning models: LSTM and GRU achieve approximately 78% on dataset 1, and the bidirectional RNN achieves 79% on dataset 2. We propose a highly accurate approach for sentiment analysis, and, to increase trust in the model's predictions, we explain the predicted sentiment.

The potential issues of the proposed approach are as follows. Preprocessing helps increase accuracy, because good data leads to good performance. The data must be balanced for the machine learning models so that the reported accuracy is not biased. Adjusting the hyperparameter values, such as the number of layers and the number of epochs, takes a substantial amount of time and effort because it is a trial-and-error process.
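The abstract does not state which balancing method was used. One common option is random oversampling, shown here as a minimal Python sketch under that assumption: samples of the smaller classes are duplicated at random until every class matches the size of the largest class. The helper `random_oversample` and the toy labels are hypothetical and not taken from the thesis.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Randomly duplicate samples of smaller classes until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)  # seeded for reproducible resampling
    target = max(Counter(labels).values())

    # Group the samples by their class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    out_x, out_y = list(samples), list(labels)
    for label, members in by_class.items():
        # Draw with replacement from the existing members of this class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_x.extend(extra)
        out_y.extend([label] * len(extra))
    return out_x, out_y


# Toy example with an imbalanced three-class sentiment distribution.
texts = ["t1", "t2", "t3", "t4", "t5", "t6"]
labels = ["negative", "negative", "negative", "negative", "neutral", "positive"]
bal_x, bal_y = random_oversample(texts, labels)
print(Counter(bal_y))  # each class now has 4 samples
```

Balancing of this kind should be applied only to the training split; otherwise duplicated samples can appear in both training and test data and inflate the measured accuracy.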