Author: Mohammed, Marwa Khairy./ Title: A Study of Content-Based Spam Filtering Techniques /

Title

A Study of Content-Based Spam Filtering Techniques /

Author

Mohammed, Marwa Khairy.

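Committee

Researcher / Marwa Khairy Mohammed (مروه خيرى محمد)

Supervisor / Tarek Mostafa Mahmoud (طارق مصطفى محمود)

Supervisor / Alaa Ismail El-Nashar (علاء اسماعيل النشار)

Supervisor / Tarek Abdel-Hafeez Abdel-Rahman (طارق عبد الحفيظ عبد الرحمن)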

Subject

Digital filters (Mathematics). Computer science.

Publication date

2013.

Number of pages

107 p.

Language

English

Degree

Master's

Specialization

Management of Technology and Innovation

Date of approval

1/1/2013

Place of approval

Minia University - Faculty of Science - Computer Science

Abstract

The Internet has become an almost indispensable part of everyday life. As Internet usage continues to grow, online communication and information exchange are gaining immense popularity, affecting users’ social and commercial lives. Along with the growth of e-mail and social networking websites, the volume of spam has increased over the years. Spam is simply unsolicited messaging.
Spam (unwanted) email is now a serious problem. In 2012, an average of 94 billion spam messages were sent per day, representing more than 70% of all incoming email, at a cost to society of about $20 billion according to the paper “The Economics of Spam”. Spam wastes network traffic, storage space and computational power. It also causes legal problems by advertising a variety of products and services, such as pharmaceuticals, jewellery, electronics, software, loans, stocks, gambling, weight loss and pornography, and by carrying phishing (identity theft) and malware distribution attempts.
Many spam-filtering techniques based on supervised machine learning algorithms have been proposed to automatically classify email messages as spam or legitimate (ham). The naive Bayesian classifier is one of the most popular of these learning algorithms and gives promising results in separating spam from legitimate mail.
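As an illustration of how a naive Bayesian classifier separates spam from ham, the following is a minimal sketch and not the implementation used in the thesis. It assumes whitespace tokenization, Laplace (add-one) smoothing and a four-message toy training set; the names `tokenize`, `train`, `classify` and `toy_training_set` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real filters do more.
    return text.lower().split()


def train(messages):
    """messages: list of (text, label) pairs, label is 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(text, model):
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Log prior of the class.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace-smoothed log likelihood of the word given the class.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, for illustration only.
toy_training_set = [
    ("cheap loans win money now", "spam"),
    ("win a free prize now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(toy_training_set)
print(classify("free money now", model))       # spam
print(classify("team meeting monday", model))  # ham
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing keeps a word seen in only one class from forcing the other class's probability to zero.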
Artificial Immune Systems (AIS) is an area of research that bridges the disciplines of immunology, computer science and engineering. The immune system has drawn significant attention as a potential source of inspiration for novel approaches to solving complex computational problems, and many AIS models have been proposed to solve them. Two of the most prevalent are the clonal selection and negative selection algorithms.
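The negative selection idea can be sketched as follows: random candidate detectors are generated, any candidate that matches a known "self" sample is discarded, and the surviving detectors are used to flag non-self input. This is only a sketch under stated assumptions, not the thesis's design: messages are assumed to be encoded as fixed-length binary feature vectors, "self" is assumed to mean legitimate mail, matching is assumed to be Hamming distance within a radius, and all names and parameter values are hypothetical.

```python
import random


def matches(detector, sample, radius):
    # Assumption: a detector matches when the Hamming distance is small.
    distance = sum(d != s for d, s in zip(detector, sample))
    return distance <= radius


def generate_detectors(self_samples, n_detectors, length, radius, rng,
                       max_tries=10000):
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = tuple(rng.randint(0, 1) for _ in range(length))
        # Negative selection: keep only candidates that match no self sample.
        if not any(matches(candidate, s, radius) for s in self_samples):
            detectors.append(candidate)
    return detectors


def is_nonself(sample, detectors, radius):
    # A sample is flagged (e.g. as spam) if any detector matches it.
    return any(matches(d, sample, radius) for d in detectors)


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical feature vectors standing in for legitimate messages.
self_samples = [
    (0, 0, 0, 0, 0, 0, 0, 0),
    (0, 0, 0, 0, 0, 0, 1, 1),
]
detectors = generate_detectors(self_samples, n_detectors=20, length=8,
                               radius=2, rng=rng)

# By construction, no self sample is ever flagged.
print([is_nonself(s, detectors, 2) for s in self_samples])
# Whether an unseen vector is flagged depends on detector coverage.
print(is_nonself((1, 1, 1, 1, 1, 1, 1, 1), detectors, 2))
```

The only guarantee in this sketch is the absence of false alarms on the training self set; detection of non-self samples depends on how many detectors are generated and on the matching radius.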
The main objective of this thesis is to build an efficient content-based system for filtering spam email. The proposed filtering system consists of four phases: a Training phase, a Classification phase, an Optimization phase and a Testing phase. In the Training phase, we use 2500 spam messages and 2500 non-spam messages to train the system. In the Classification phase, we use the Bayesian, Clonal selection and Negative selection algorithms to classify the email messages. In the Optimization phase, we try to improve the performance of the system by combining the three algorithms. In the Testing phase, we use a randomly chosen dataset of 10000 messages from TREC 2007.
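The abstract does not say how the three algorithms are combined in the Optimization phase. One common way to combine classifiers is majority voting, sketched below purely as an assumption; the three stand-in classifiers are hypothetical keyword rules, not the Bayesian, clonal selection or negative selection classifiers themselves.

```python
from collections import Counter


def majority_vote(predictions):
    # Return the label predicted by the most classifiers.
    return Counter(predictions).most_common(1)[0][0]


def combined_classify(message, classifiers):
    # Each classifier is a function mapping a message to 'spam' or 'ham'.
    return majority_vote([clf(message) for clf in classifiers])


# Hypothetical stand-ins for three trained classifiers.
def clf_a(message):
    return "spam" if "free" in message.lower() else "ham"


def clf_b(message):
    return "spam" if "win" in message.lower() else "ham"


def clf_c(message):
    return "spam" if "loan" in message.lower() else "ham"


classifiers = [clf_a, clf_b, clf_c]
print(combined_classify("Win a free prize", classifiers))     # spam
print(combined_classify("Free lunch on Monday", classifiers)) # ham
```

With three voters and two labels there is always a strict majority, so no tie-breaking rule is needed in this sketch.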