Title
Enhancing mapping of diseases using GIS and data mining techniques
Author
Naguib, Ahmad Mohamad.
Thesis committee
Researcher / Ahmad Mohamad Naguib
Supervisor / Karam Abd-Elghani Gouda
Supervisor / Mustafa El-Sayed Abdul Salam
Examiner / Wael abdelkader awad
Examiner / Islam ahmed sayed
Subject
Medical geography -- Maps -- Data processing. Communicable diseases -- Maps -- Data processing. Artificial intelligence. Computer graphics.
Publication date
2023.
Number of pages
103 p.
Language
English
Degree
Master's
Specialization
Information Systems
Approval date
16/7/2023
Awarding institution
Benha University - Faculty of Computers and Information - Department of Information Systems

Abstract

Enhancing disease mapping has become a topic of growing research interest, especially in recent years because of the COVID-19 pandemic. Data mining and machine learning techniques play a significant role in clustering hotspot regions for public safety, and researchers have applied many clustering techniques to hotspot detection. In this study we compare four clustering algorithms: K-Means, hierarchical clustering (HC), Fuzzy C-Means (FCM), and density-based spatial clustering of applications with noise (DBSCAN).
According to this comparison, DBSCAN gives more reasonable hotspot detection results than the other clustering algorithms: it groups densely packed data and treats outliers as noise. DBSCAN, however, has two main limitations. First, its two parameters, EPS and MinPts, must be selected manually, and this selection can be difficult, leading to poor clustering quality for datasets with higher dimensionality and larger data volumes. Second, it cannot identify clusters with varying density distributions and partially overlapping boundaries, which are common in both scientific and real-world data.
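To make the roles of EPS and MinPts concrete, here is a minimal textbook DBSCAN in pure Python. It is a sketch for illustration, not the thesis implementation, and the names `region_query`, `dbscan` and `NOISE` are hypothetical. A point with at least MinPts neighbours within distance EPS is a core point, clusters grow through chains of core points, and points that no core point reaches are labelled as noise (-1).

```python
import math

NOISE = -1  # label used for outliers


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: one label per point, NOISE for points in no cluster."""
    labels = [None] * len(points)  # None = not visited yet
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region_query(points, i, eps)
        if len(seeds) < min_pts:   # not a core point
            labels[i] = NOISE      # may later become a border point of a cluster
            continue
        labels[i] = cluster        # i is a core point and starts a new cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = region_query(points, j, eps)
            if len(j_seeds) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_seeds)
        cluster += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
    print(dbscan(pts, eps=0.5, min_pts=3))  # every point is NOISE
```

On the toy points, EPS = 1.5 finds two groups and flags the isolated point as noise, while EPS = 0.5 labels everything as noise, which shows how strongly the result depends on a manual parameter choice.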
To handle these two limitations, we propose a hybrid model named fuzzy bat optimized DBSCAN (FBO-DBSCAN), which combines the bat swarm optimization algorithm with fuzzy set theory. The model architecture is divided into two phases: optimization and fuzzification.
In the first phase, bat optimization is used to determine DBSCAN's two main parameters (MinPts and EPS), and each resulting clustering is evaluated with the silhouette coefficient to select the best parameters.
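The first phase can be sketched as follows, assuming a simplified variant of Yang's bat algorithm: each bat is a candidate (EPS, MinPts) pair, its fitness is the silhouette of the resulting DBSCAN clustering, and loudness and pulse rate balance global moves against local random walks around the best bat. Everything here is illustrative: the function names, the hyperparameters (`n_bats`, `n_iter`, `alpha`, `gamma`, `r0`, the frequency range, the 0.1 step scale), the choice to exclude noise points from the silhouette, and the worst score given to clusterings with fewer than two clusters are assumptions, not values from the thesis.

```python
import math
import random

NOISE = -1  # DBSCAN label for outliers


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: one label per point, NOISE for points in no cluster."""
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels, cluster = [None] * len(points), 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # core point: keep expanding
                seeds.extend(more)
        cluster += 1
    return labels


def silhouette(points, labels):
    """Mean silhouette of the non-noise points; None if there are < 2 clusters."""
    groups = {}
    for i, lab in enumerate(labels):
        if lab != NOISE:
            groups.setdefault(lab, []).append(i)
    if len(groups) < 2:
        return None
    scores = []
    for lab, members in groups.items():
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            a = sum(math.dist(points[i], points[j]) for j in members if j != i)
            a /= len(members) - 1
            b = min(sum(math.dist(points[i], points[j]) for j in other) / len(other)
                    for o, other in groups.items() if o != lab)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def bat_search(points, eps_range, minpts_range, n_bats=10, n_iter=30,
               f_min=0.0, f_max=1.0, alpha=0.9, gamma=0.9, r0=0.5, seed=0):
    """Simplified bat algorithm that maximises the silhouette over (EPS, MinPts)."""
    rng = random.Random(seed)
    lo, hi = (eps_range[0], minpts_range[0]), (eps_range[1], minpts_range[1])

    def clip(x):
        return [min(max(v, low), high) for v, low, high in zip(x, lo, hi)]

    def fitness(x):
        s = silhouette(points, dbscan(points, x[0], round(x[1])))
        return -1.0 if s is None else s  # fewer than 2 clusters: worst score

    pos = [[rng.uniform(low, high) for low, high in zip(lo, hi)] for _ in range(n_bats)]
    vel = [[0.0, 0.0] for _ in range(n_bats)]
    loud, rate = [1.0] * n_bats, [0.0] * n_bats  # loudness A_i, pulse rate r_i
    fit = [fitness(x) for x in pos]
    k = max(range(n_bats), key=fit.__getitem__)
    best, best_fit = pos[k][:], fit[k]
    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()
            # global move: velocity/position update of Yang's bat algorithm
            vel[i] = [v + (x - bx) * freq for v, x, bx in zip(vel[i], pos[i], best)]
            cand = clip([x + v for x, v in zip(pos[i], vel[i])])
            if rng.random() > rate[i]:
                # local random walk around the best bat, scaled by mean loudness
                a_mean = sum(loud) / n_bats
                cand = clip([bx + 0.1 * (high - low) * a_mean * rng.uniform(-1, 1)
                             for bx, low, high in zip(best, lo, hi)])
            cand_fit = fitness(cand)
            if cand_fit >= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] *= alpha                           # quieter...
                rate[i] = r0 * (1 - math.exp(-gamma * t))  # ...and more pulses
            if cand_fit > best_fit:
                best, best_fit = cand[:], cand_fit
    return best[0], round(best[1]), best_fit
```

Because this silhouette ignores noise points, a search like this can favour settings that push awkward boundary points into noise; how the thesis scores noise points is not stated in the abstract.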
In the second phase, the two original parameters are replaced by four new ones (MinPts_min, MinPts_max, EPS_min, EPS_max). This allows core and border points to be regenerated with membership degrees and produces clusters with overlapping boundaries; each cluster contains one or more core points with membership degrees that define the weight for each cluster.
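The abstract does not spell out the membership functions of the second phase. A minimal sketch, assuming a common fuzzy-DBSCAN formulation: a neighbour's contribution falls linearly from 1 at EPS_min to 0 at EPS_max, a point's core degree rises linearly from 0 at MinPts_min to 1 at MinPts_max of that soft neighbour count, and a point's membership in a cluster is the best min(core degree, distance degree) over the cluster's core points. Since this last degree can be non-zero for two clusters at once, cluster boundaries may overlap. The function names are hypothetical.

```python
import math


def ramp_down(d, lo, hi):
    """1 for d <= lo, 0 for d >= hi, linear in between."""
    if d <= lo:
        return 1.0
    if d >= hi:
        return 0.0
    return (hi - d) / (hi - lo)


def core_memberships(points, eps_min, eps_max, minpts_min, minpts_max):
    """Fuzzy core degree of each point, from its soft neighbourhood size."""
    degrees = []
    for p in points:
        # soft neighbour count: close points count fully, farther ones partially
        size = sum(ramp_down(math.dist(p, q), eps_min, eps_max) for q in points)
        if size <= minpts_min:
            degrees.append(0.0)
        elif size >= minpts_max:
            degrees.append(1.0)
        else:
            degrees.append((size - minpts_min) / (minpts_max - minpts_min))
    return degrees


def cluster_membership(p, core_points, core_degrees, eps_min, eps_max):
    """Degree to which point p belongs to a cluster given by its (fuzzy) core points."""
    return max((min(mu, ramp_down(math.dist(p, c), eps_min, eps_max))
                for c, mu in zip(core_points, core_degrees)), default=0.0)
```

For example, with EPS_min = 1, EPS_max = 2, MinPts_min = 1 and MinPts_max = 3, the point (0, 0) in {(0, 0), (0, 1), (1, 0), (5, 5)} is fully core, (0, 1) and (1, 0) are partially core (about 0.79), and (5, 5) is not core at all.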
We use the DS4C dataset of COVID-19 cases in South Korea, taken from the official repository of the Korea Centers for Disease Control and Prevention (KCDC). It contains test data for 244 patients, and its advantage is that each location is recorded precisely as a geolocation (longitude, latitude).
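Because the DS4C locations are (longitude, latitude) pairs, one option is to measure EPS in kilometres with the haversine great-circle distance rather than with Euclidean distance on raw degrees. This is an assumption: the abstract does not say which distance the thesis uses. A minimal sketch with a hypothetical `haversine_km` helper and an assumed mean Earth radius of 6371 km:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (longitude, latitude) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))
```

Used as the distance inside DBSCAN, an EPS of 1.0 would then mean one kilometre, whatever the latitude.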
The study results are presented in two main parts. First, we confirm that DBSCAN is the most suitable algorithm for clustering and detecting hotspots: compared with K-Means, hierarchical clustering, and FCM, DBSCAN achieves the highest silhouette score, 0.68.
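For reference, the standard silhouette coefficient behind these scores: for a point i, a(i) is its mean distance to the other points of its own cluster and b(i) is the smallest mean distance from i to the points of any other cluster. The overall score S is the mean of s(i) over the n scored points; it lies in [-1, 1], and higher values indicate compact, well-separated clusters. How DBSCAN's noise points enter the score is not stated in the abstract.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
S = \frac{1}{n} \sum_{i=1}^{n} s(i), \qquad
-1 \le S \le 1
```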
Second, we test the FBO-DBSCAN model in four evaluation phases. The first evaluation tests the model on labeled datasets with known numbers of clusters to check its accuracy; here our model detects the actual number of clusters with 100% accuracy. The second evaluation checks cluster quality on datasets without noisy data, and the third checks cluster quality on datasets with noisy data. In the second and third evaluations we use the silhouette coefficient to measure cluster quality, and our model achieves the highest silhouette scores, with mean values of 0.88 for datasets without noisy data and 0.75 for datasets with noisy data. The fourth evaluation checks the fuzzy performance index (FPI) of FBO-DBSCAN against FCM, and our model shows a better result, 0.92.
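The abstract does not define the fuzzy performance index. A common definition in fuzzy clustering, assumed here, is the fuzziness performance index FPI = 1 - (c * F - 1) / (c - 1), where c is the number of clusters and F = (1/n) * sum over points k and clusters j of u_kj^2 is the partition coefficient of the n x c membership matrix u. Under this definition FPI is 0 for a crisp partition and 1 when every point is shared equally among all clusters. A minimal sketch with a hypothetical function name:

```python
def fuzziness_performance_index(u):
    """FPI of a membership matrix u[k][j] (point k, cluster j); rows sum to 1, c >= 2."""
    n, c = len(u), len(u[0])
    partition_coefficient = sum(m * m for row in u for m in row) / n
    return 1.0 - (c * partition_coefficient - 1.0) / (c - 1.0)
```

For instance, a crisp two-cluster matrix [[1, 0], [0, 1]] gives 0, while [[0.5, 0.5], [0.5, 0.5]] gives 1.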