A Comparative Study of Binary Qualitative Response Models Vs. Machine Learning Models :
Beram, Reham Salah Ahmed Moahmmed.
Preparation Committee
Researcher / Reham Salah Ahmed Mohammed Beram
Supervisor / Mostafa Abdel Moneim Mohamed El-Khawaga
Supervisor / Ahmed Mohamed El-Qatouri
Examiner / Mostafa Abdel Moneim El-Khawaga
Logistic Regression. Probit Regression. Discriminant Analysis. Machine learning Algorithms.
Publication date
Number of pages
xx, 169 p.
Statistics and Probability
Date of approval
Place of approval
Alexandria University - Faculty of Business - Statistics, Mathematics and Insurance

This empirical study compares the performance of statistical models (logistic regression, probit regression, and discriminant analysis) with that of machine learning algorithms (support vector machines, classification and regression trees, and k-nearest neighbors) in order to provide a comprehensive understanding of their suitability for classification tasks. The study is based on a simulation and on an application to three real-life datasets: the diabetes dataset, the credit card fraud dataset, and the Egyptian Households Income, Expenditure, and Consumption Survey. The findings have the potential to guide practitioners and researchers in selecting the most appropriate modeling technique for their specific needs, ultimately enhancing the accuracy and reliability of classification outcomes across various domains. The results of the simulation and of the real-data comparisons revealed that the two statistical models, probit and logit, outperformed the other methods in the majority of the simulation scenarios and on the real data. Notably, the well-grounded, theory-based logit and probit regression models yielded the most accurate predictions in 78.5% and 83.6% of the full set of simulated scenarios, respectively. Interestingly, the probit model performed best when the binary response variable was balanced (50% of observations in each group) and when it was highly imbalanced (10% in the positive class and the remaining 90% in the negative class). In the remaining scenarios, the logit model performed best. Further, the SVM algorithm exhibited the best discrimination performance under the most imbalanced dependent variable case (0.9 cutoff) with 5 regressors and a sample size of 1,000, as well as with 10 regressors and a sample size of 10,000. Thus, when the dimensionality of the data increased together with a large sample size, the machine learning classifier was superior.
In addition, the KNN classifier was almost always better than CART in the simulated scenarios.
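
The abstract does not state the data-generating process behind the simulated scenarios, so the following Python sketch is only an illustration under stated assumptions, not the thesis's design. It assumes independent standard normal regressors, equal coefficients of 0.5, and a latent-index model with either a logit or a probit link, and it shows how the intercept can be chosen to produce a balanced (50%) or an imbalanced (10% positive) binary response. All function names, coefficient values, and seeds are hypothetical.

```python
import math
import random
from statistics import NormalDist

def logistic_cdf(z):
    """Inverse logit link: P(y = 1 | index z) under a logistic error term."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Inverse probit link: P(y = 1 | index z) under a standard normal error term."""
    return NormalDist().cdf(z)

def simulate_binary(n, betas, intercept, cdf, seed=0):
    """Draw n rows of N(0, 1) regressors and a Bernoulli response (assumed design)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in betas]
        eta = intercept + sum(b * xi for b, xi in zip(betas, x))
        y = 1 if rng.random() < cdf(eta) else 0
        data.append((x, y))
    return data

def positive_share(data):
    """Fraction of observations in the positive class."""
    return sum(y for _, y in data) / len(data)

def probit_intercept_for_share(target_share, betas):
    """Intercept that yields the target positive share under the probit model.

    With independent N(0, 1) regressors, the marginal probability is
    P(y = 1) = Phi(intercept / sqrt(1 + sum of squared coefficients)).
    """
    scale = math.sqrt(1.0 + sum(b * b for b in betas))
    return scale * NormalDist().inv_cdf(target_share)

# Hypothetical scenario: 5 regressors, sample size 10,000.
betas = [0.5] * 5
balanced_probit = simulate_binary(10_000, betas, 0.0, normal_cdf, seed=1)
balanced_logit = simulate_binary(10_000, betas, 0.0, logistic_cdf, seed=2)
a10 = probit_intercept_for_share(0.10, betas)
imbalanced_probit = simulate_binary(10_000, betas, a10, normal_cdf, seed=3)

share_bal_probit = positive_share(balanced_probit)
share_bal_logit = positive_share(balanced_logit)
share_imb_probit = positive_share(imbalanced_probit)
print(share_bal_probit, share_bal_logit, share_imb_probit)
```

In the imbalanced scenario, a rule that always predicts the negative class is already correct for about 90% of observations, so raw accuracy alone says little there and discrimination measures are more informative.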