Robust Comparative Evaluation of Discriminant Analysis Methods for Predictive Classification: An Empirical and Monte Carlo Simulation

Maryjane Nneoma Chika; Uyodhu Amekauma Victor-Edema; Maxwell Azubuike Ijomah

doi:10.63561/jmns.v3i1.1162

Authors

Maryjane Nneoma Chika, Department of Statistics, Ignatius Ajuru University of Education, Port Harcourt, Nigeria
Uyodhu Amekauma Victor-Edema, Department of Statistics, Ignatius Ajuru University of Education, Port Harcourt, Nigeria
Maxwell Azubuike Ijomah, Department of Mathematics and Statistics, University of Port Harcourt, Nigeria

DOI:

https://doi.org/10.63561/jmns.v3i1.1162

Keywords:

Multivariate Discriminant Analysis, Linear Discriminant Analysis, Predictive Performance

Abstract

The growing complexity of real-world datasets has increased the demand for classification models that balance predictive accuracy with interpretability. Multivariate Discriminant Analysis (MDA), particularly Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA), remains a fundamental statistical approach because of its theoretical clarity and transparency. However, its effectiveness is often limited by modern data challenges such as class imbalance, non-normality, and heterogeneous covariance structures. Existing studies tend to rely on either empirical analysis or simulation alone, which prevents a comprehensive evaluation of model robustness. This study addresses that gap by integrating empirical data analysis with Monte Carlo simulation to assess the predictive performance and robustness of LDA and QDA. It had five objectives: to evaluate the empirical performance of LDA and QDA on real-world health data, to assess their robustness under varying statistical conditions through simulation, to compare classification accuracy across different data structures, to examine the impact of assumption violations, and to determine the consistency between empirical and simulation results. The empirical analysis used the Nigerian Childhood Anemia dataset, while the simulation experiments covered multiple scenarios that varied the distribution, covariance structure, and class balance. Model performance was evaluated using accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). Both models performed poorly on the empirical data, with low accuracy and weak sensitivity to the minority class: LDA correctly identified 27% of anemia cases, while QDA identified 37%. The simulation findings indicated that both models performed well under ideal conditions but deteriorated significantly under non-normality, covariance heterogeneity, and class imbalance. LDA was more stable under mild violations, while QDA performed slightly better under heterogeneous covariance conditions. The study concludes that although classical discriminant methods remain useful when their assumptions hold, their performance declines in complex data environments. It recommends that practitioners incorporate robust or hybrid approaches and apply simulation-based validation to improve model reliability.
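The kind of simulation comparison described in the abstract can be illustrated with a small sketch. The code below is not the authors' design: it is a univariate toy version, assuming two Gaussian classes whose means, standard deviations, class shares, sample size, and replication count are all hypothetical values chosen for illustration. LDA is represented by a pooled-variance rule and QDA by class-specific variances, and each scenario reports average minority-class recall and accuracy.

```python
import math
import random
import statistics


def simulate(n, minority_share, sd_minority, rng):
    """Draw n labelled points (hypothetical setup):
    class 0 ~ N(0, 1), class 1 (minority) ~ N(2, sd_minority)."""
    data = []
    for _ in range(n):
        y = 1 if rng.random() < minority_share else 0
        x = rng.gauss(2.0, sd_minority) if y == 1 else rng.gauss(0.0, 1.0)
        data.append((x, y))
    return data


def fit(train, pooled):
    """Estimate priors, means and variances per class.
    pooled=True shares one variance (LDA); pooled=False keeps both (QDA)."""
    groups = {k: [x for x, y in train if y == k] for k in (0, 1)}
    params = {
        k: {
            "prior": len(xs) / len(train),
            "mean": statistics.fmean(xs),
            "var": statistics.variance(xs),
        }
        for k, xs in groups.items()
    }
    if pooled:
        pooled_var = sum(
            (len(groups[k]) - 1) * params[k]["var"] for k in (0, 1)
        ) / (len(train) - 2)
        for k in (0, 1):
            params[k]["var"] = pooled_var
    return params


def predict(params, x):
    """Assign x to the class with the larger Gaussian discriminant score."""
    def score(k):
        p = params[k]
        return (math.log(p["prior"]) - 0.5 * math.log(p["var"])
                - (x - p["mean"]) ** 2 / (2 * p["var"]))
    return 1 if score(1) > score(0) else 0


def evaluate(params, test):
    """Return (recall on the minority class, overall accuracy)."""
    tp = fn = correct = 0
    for x, y in test:
        y_hat = predict(params, x)
        correct += int(y_hat == y)
        if y == 1:
            tp += int(y_hat == 1)
            fn += int(y_hat == 0)
    return tp / (tp + fn), correct / len(test)


def monte_carlo(minority_share, sd_minority, reps=200, n=400, seed=1):
    """Average recall and accuracy of LDA and QDA over repeated draws."""
    rng = random.Random(seed)
    runs = {"LDA": [], "QDA": []}
    for _ in range(reps):
        train = simulate(n, minority_share, sd_minority, rng)
        test = simulate(n, minority_share, sd_minority, rng)
        for name, pooled in (("LDA", True), ("QDA", False)):
            runs[name].append(evaluate(fit(train, pooled), test))
    return {
        name: (statistics.fmean(r for r, _ in vals),
               statistics.fmean(a for _, a in vals))
        for name, vals in runs.items()
    }


if __name__ == "__main__":
    scenarios = {
        "ideal (balanced, equal variance)": (0.5, 1.0),
        "imbalanced, unequal variance": (0.1, 3.0),
    }
    for label, (share, sd) in scenarios.items():
        for model, (rec, acc) in monte_carlo(share, sd).items():
            print(f"{label:34s} {model}: recall={rec:.3f} accuracy={acc:.3f}")
```

Reporting minority-class recall alongside accuracy matters because, under class imbalance, a rule can reach high accuracy while missing most minority cases. A fuller study along the lines the abstract describes would use multivariate data, non-normal distributions, and the additional metrics it lists (precision, F1-score, AUC).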

