Title
Clustering with missing data under different mechanisms: a mathematical programming approach
Publisher
Hend Galal Elsayed Gaber
Author
Hend Galal Elsayed Gaber
Preparation committee
Researcher / Hend Galal Elsayed Gaber
Supervisor / Ahmed Mahmoud Gad
Supervisor / Mahmoud Mostafa Rashwan
Publication date
2019
Number of pages
66 p.
Language
English
Degree
Master's
Specialization
Statistics and Probability
Date of approval
21/8/2019
Place of approval
Cairo University - Faculty of Economics and Political Science - Statistics
Abstract

Cluster analysis is a convenient method for identifying homogeneous groups of observations, called clusters. The goal of clustering is to discover a natural grouping in a set of points or objects without knowledge of any class labels. In real applications, data sets often contain missing data and outliers, which make the clustering problem more challenging. Missing data can affect the results and make accurate estimates difficult to obtain. When missing data are ignored, the results can be biased, unrealistic, and insignificant. In real-life situations, missing values arise for many reasons. For example, a respondent in a household survey may refuse to report income. Sometimes missing values are caused by the researcher, for example when data collection is done improperly, mistakes are made in data entry, or a questionnaire is poorly designed.

This thesis treats the problem of clustering with missing data under different mechanisms. First, it defines cluster analysis, covers general procedures for handling missing data, and reviews the literature on treating missing values in cluster analysis. Finally, it presents a new approach based on formulating a mathematical programming model for cluster analysis in the presence of missing data, without the need to pre-process the data.

Supervised by:
Prof. Ahmed Mahmoud Gad, Professor of Statistics, Department of Statistics, Faculty of Economics and Political Science
Dr. Mahmoud Mostafa Rashwan, Assistant Professor of Statistics, Department of Statistics, Faculty of Economics and Political Science