Abstract

Outlier detection algorithms occupy a central place in the data mining field. Many applications rely on outlier detection, such as intrusion detection, fraud detection, medical and public health data analysis, and image processing. Clustering-based outlier detection methods are considered among the most challenging outlier detection approaches. Clustering methods were not originally developed for outlier detection; nevertheless, they can be adapted to do so. These algorithms perform well when the datasets they cluster are free of outliers, while the clustering algorithms that are robust to outliers involve expensive computations. Many clustering-based approaches have been developed to detect outliers; however, they suffer from a high and increasing false positive rate even when their detection rate is high. In this thesis, we propose a hybrid clustering-based outlier detection algorithm based on a modified K-Medoids clustering and density measures. The algorithm avoids repeated distance calculations and minimizes the number of outlierness factor calculations. It searches for outliers not only in small clusters but also in large clusters, using a reduced-calculation methodology. The experimental results demonstrate the good performance of the algorithm in terms of detection sensitivity: it increases the detection rate, decreases the false positive rate until it reaches a non-increasing saturation point, and minimizes the number of outlierness factor calculations.

Most outlier detection algorithms developed so far are one-at-a-time algorithms that must rerun from the beginning each time, which makes them infeasible for real-time applications. An on-the-fly clustering-based outlier detection framework, called OFCOD, is also proposed to enable analysts to find outliers effectively and in time, upon request, even within huge datasets. The experimental results showed that this framework is effective in detecting new outliers without re-performing the clustering process.
This makes the framework usable in real-time applications.
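The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes, not the thesis's algorithm or the OFCOD framework. Under stated assumptions, it shows a distance matrix computed once and reused, a K-Medoids clustering, a rule that treats members of small clusters as outliers, a rule that searches inside large clusters, and on-the-fly checking of a new point against the stored medoids without re-clustering. The function names (`fit`, `is_outlier`), the farthest-first seeding, the thresholds `factor` and `min_cluster_size`, and the use of distance to the medoid as a stand-in for the thesis's density measures are all hypothetical choices.

```python
# Hypothetical illustration of clustering-based outlier detection;
# NOT the thesis's algorithm. Standard library only (Python 3.8+).
import math


def pairwise_distances(points):
    """Euclidean distance matrix, computed once and reused by every later step."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = math.dist(points[i], points[j])
    return dist


def assign(dist, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternating K-Medoids with deterministic farthest-first seeding (assumed)."""
    n = len(dist)
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]  # most central point
    while len(medoids) < k:  # next seed: the point farthest from all chosen seeds
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members))
                               if members else old)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


def fit(points, k, factor=3.0, min_cluster_size=2):
    """Cluster once, store per-cluster thresholds, and flag outliers.

    Hypothetical rules: every member of a small cluster is an outlier; in a
    larger cluster, a point is an outlier if its distance to the medoid
    exceeds `factor` times the cluster's mean distance to the medoid.
    """
    dist = pairwise_distances(points)
    medoids, labels = k_medoids(dist, k)
    model = []
    for c, m in enumerate(medoids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        mean_d = sum(dist[i][m] for i in members) / max(len(members), 1)
        model.append({"medoid": points[m],
                      "small": len(members) < min_cluster_size,
                      "threshold": factor * mean_d})
    outliers = [i for i, lab in enumerate(labels)
                if model[lab]["small"] or dist[i][medoids[lab]] > model[lab]["threshold"]]
    return model, outliers


def is_outlier(model, x):
    """On-the-fly check for a new point: k medoid distances, no re-clustering."""
    nearest = min(model, key=lambda cl: math.dist(x, cl["medoid"]))
    return nearest["small"] or math.dist(x, nearest["medoid"]) > nearest["threshold"]


if __name__ == "__main__":
    blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
    blob_b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    model, outliers = fit(blob_a + blob_b + [(30, 30)], k=3)
    print(outliers)                         # [10]: the isolated point forms a small cluster
    print(is_outlier(model, (20, 20)))      # True
    print(is_outlier(model, (10.2, 10.4)))  # False
```

With `k=2` on `blob_a + [(4, 4)] + blob_b`, the point `(4, 4)` joins the first cluster but is still flagged by the in-cluster rule, which is the "large cluster" case. The per-cluster summaries kept in `model` are what let new points be checked in O(k) distance computations; a real system would also need a policy for when accumulated new points justify re-clustering.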