Comparative Study of Low and High Sparsity of a Sparse Principal Component Analysis Model on Some Health Indices
DOI: https://doi.org/10.63561/jmns.v3i1.1212
Keywords: Principal Component Analysis, Prevalence rate of the infections, Statistical Technique, Health, Sparsity
Abstract
This research presents a comparative study of low and high sparsity in a sparse principal component analysis (SPCA) model applied to health indices that describe the prevalence rates of some common diseases in Nigeria. Two multivariate data sets were considered. The Kaiser-Meyer-Olkin (KMO) and Bartlett's tests were used to check whether each data set is adequate for principal component analysis, and the Pearson correlation coefficient was used to measure the relationships among the variables. The effect of SPCA was then investigated at two sparsity levels: low sparsity (15%) and high sparsity (75%). The diseases considered include malaria, tuberculosis (TB), diabetes, diarrhea, anemia, overweight, HIV and stunted growth. The main objective was to explore the potential of the sparse model to identify the key variables and to compare the structures and patterns obtained at the two sparsity levels. The analysis showed that both the KMO value and Bartlett's test statistic decrease as the percentage of sparsity increases: KMO = 0.561 with a Bartlett statistic of 249.655 at 15% sparsity, and KMO = 0.465 with a Bartlett statistic of 99.727 at 75% sparsity. Since a KMO value below 0.5 is conventionally regarded as inadequate, these results indicate that 15% sparsity does not compromise the accuracy and robustness of the statistical analyses, which still account sufficiently for the variation in the data, whereas 75% sparsity reduces the adequacy of the data set for analysis. The study thus shows how the sparsity level affects sampling adequacy, the relationships between the variables, and the structures and patterns of the variables.
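The adequacy checks named above can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not the study's implementation (which the abstract does not describe): the data are assumed to be a list of rows (observations) by columns (health indices), and all function names are hypothetical. It computes the Pearson correlation matrix R, the KMO measure from R and the partial correlations derived from R⁻¹, and Bartlett's statistic χ² = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom.

```python
import math


def correlation_matrix(data):
    """Pearson correlation matrix of the columns of `data` (a list of rows)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]
    norms = [math.sqrt(sum(r[j] ** 2 for r in centred)) for j in range(p)]
    return [[sum(r[i] * r[j] for r in centred) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def inverse_and_logdet(m):
    """Gauss-Jordan elimination with partial pivoting.

    Returns (inverse of m, log|det m|); meant for small, non-singular
    matrices such as a p x p correlation matrix.
    """
    size = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(size)]
         for i, row in enumerate(m)]
    log_abs_det = 0.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        log_abs_det += math.log(abs(pv))  # |det| is the product of pivots
        a[col] = [v / pv for v in a[col]]
        for r in range(size):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[size:] for row in a], log_abs_det


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        total, m = math.exp(-half), 2
    else:
        total, m = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(m + 2) = Q(m) + (x/2)^(m/2) * e^(-x/2) / Gamma(m/2 + 1)
    while m < df:
        total += math.exp((m / 2.0) * math.log(half) - half
                          - math.lgamma(m / 2.0 + 1))
        m += 2
    return total


def kmo_and_bartlett(data):
    """KMO sampling adequacy and Bartlett's test of sphericity."""
    n, p = len(data), len(data[0])
    r = correlation_matrix(data)
    r_inv, log_det = inverse_and_logdet(r)
    # Partial correlations from the inverse correlation matrix
    partial = [[-r_inv[i][j] / math.sqrt(r_inv[i][i] * r_inv[j][j])
                for j in range(p)] for i in range(p)]
    off = [(i, j) for i in range(p) for j in range(p) if i != j]
    r2 = sum(r[i][j] ** 2 for i, j in off)
    q2 = sum(partial[i][j] ** 2 for i, j in off)
    kmo = r2 / (r2 + q2)
    # Bartlett: chi2 = -(n - 1 - (2p + 5) / 6) * ln|R|, df = p(p - 1) / 2
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * log_det
    df = p * (p - 1) // 2
    return kmo, chi2, df, chi2_sf(chi2, df)
```

Calling `kmo_and_bartlett(rows)` returns the KMO value, Bartlett's χ², its degrees of freedom and the p-value. Plain lists keep the sketch dependency-free; for real data sets a numerical library would be the more robust choice.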
Overall, the study identified a smaller subset of the variables (diseases) and found that 15% sparsity in the data sets does not affect the accuracy and robustness of the statistical analyses, which still account sufficiently for the variance in the data.
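To illustrate how a sparse component can single out a smaller subset of variables, the sketch below uses the truncated power method: ordinary power iteration on a covariance or correlation matrix, where after each step all but the `keep` largest-magnitude loadings are set to zero. This is a swapped-in technique chosen for brevity, not necessarily the SPCA algorithm used in the study, and `keep` is a hypothetical parameter: the abstract does not state how its 15% and 75% sparsity levels were imposed.

```python
import math


def truncated_power_pc(cov, keep, iters=500, tol=1e-10):
    """Leading sparse principal component via the truncated power method.

    cov  : p x p covariance or correlation matrix (list of lists)
    keep : number of variables allowed a non-zero loading (1 <= keep <= p)
    Returns a unit-length loading vector with at most `keep` non-zeros.
    """
    p = len(cov)
    if not 1 <= keep <= p:
        raise ValueError("keep must be between 1 and the number of variables")
    v = [1.0 / math.sqrt(p)] * p  # uniform starting vector
    for _ in range(iters):
        # Power step: w = cov @ v
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        # Truncation step: zero all but the `keep` largest |w_i|
        top = set(sorted(range(p), key=lambda i: abs(w[i]), reverse=True)[:keep])
        w = [w[i] if i in top else 0.0 for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("iteration collapsed to the zero vector")
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    return v


def explained_variance(cov, v):
    """Variance captured along the loading vector v, i.e. v^T cov v."""
    p = len(cov)
    return sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
```

The variables with non-zero loadings are the ones the component selects. Setting `keep` equal to the number of variables disables truncation and recovers the ordinary leading principal component, so comparing `explained_variance` across `keep` values shows how much variance is given up as sparsity increases.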