Title
Improving the detection of Arabic spam web pages /
Publisher
Mohammed Abdullah Saleh Ahmed
Author
Mohammed Abdullah Saleh Ahmed
Publication date
2014
Number of pages
71 leaves
Index
Only 14 pages are available for public view


Abstract

Most people turn to search engines to find helpful information. However, spam web pages are used to manipulate search engine results. Such pages have multiple negative effects on users and search engines. Few studies have been conducted in the field of Arabic spam web pages. Features play a very important role in detecting Arabic spam web pages. In this thesis, we propose a new set of features to improve the detection of Arabic spam web pages. These features include Global Popular Keywords (GPK) features, Character N-Gram Graph (CNGG) features, and Sentence Level Frequent Words (SLFW) features. We denote the proposed set of features as Suggested Arabic Spam Web-pages features (SASW), in contrast to the state-of-the-art features, which are denoted as Current Arabic Spam Web-pages features (CASW). We combined our features (SASW) with the state-of-the-art features (CASW) to obtain the Current and Suggested Arabic Spam Web-pages features (CSASW), and then fed the resulting features (CSASW) into different classification algorithms, including Ensemble Decision Tree with Bagging and Boosting ensemble methods, Decision Tree J48, and Random Forest classifiers. We achieved an F-measure of about 99.54% with the Random Forest classifier.
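The abstract does not spell out how the features are computed, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It illustrates three ideas mentioned above: extracting character n-grams, building a simple co-occurrence graph over them (one common way to form an n-gram graph; the thesis's CNGG construction may differ), concatenating two feature vectors into a combined one, and computing the F-measure from classification counts. All function names and the parameters `n` and `window` are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_graph(text, n=3, window=2):
    """Build a simple n-gram graph (assumed formulation, not the thesis's).

    Each edge links an n-gram to one of the next `window` n-grams;
    the edge weight counts how often that pair occurs.
    """
    grams = char_ngrams(text, n)
    edges = Counter()
    for i, g in enumerate(grams):
        for h in grams[i + 1:i + 1 + window]:
            edges[(g, h)] += 1
    return edges


def combine_features(casw, sasw):
    """Concatenate existing (CASW) and proposed (SASW) feature vectors."""
    return list(casw) + list(sasw)


def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `char_ngrams("spam", 2)` returns `['sp', 'pa', 'am']`, and `combine_features([1, 2], [3])` returns `[1, 2, 3]`. The combined vector would then be passed to a classifier such as a random forest.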