COMPARATIVE ANALYSIS OF MACHINE LEARNING AND DEEP LEARNING PIPELINES FOR STROKE RISK PREDICTION

Thị Ngọc Nguyên Nguyễn, Tấn Đạt Nguyễn

Abstract

Objective: This study compares traditional machine learning (ML) pipelines with modern deep learning (DL) models to identify the optimal approach for stroke risk prediction on tabular data. Subjects and methods: A dual-stream comparative analysis was conducted on 6,387 individuals (public data supplemented with 25% additional private data). The ML stream optimized 5 algorithms (including XGBoost, LightGBM, CatBoost, and Random Forest) in combination with sampling and feature selection. The DL stream trained models specialized for tabular data (TabNet, FT-Transformer, ResNet). Performance was evaluated using 5-fold cross-validation. Results: The pipeline combining LightGBM, RandomOverSampler, and Mutual Information feature selection achieved the best performance: Accuracy 95.2%, Macro F1 70.2%. A t-test indicated that this pipeline was significantly better than the baseline and DL models (p = 0.0403). Conclusion: For medium-sized tabular data, optimized ML models outperform DL approaches. Data augmentation and class imbalance handling are key to improving prediction.
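The gap between accuracy and Macro F1 reported above is typical of imbalanced data, and random oversampling is one common way to address it. The following is a minimal sketch in plain Python, not the study's implementation: the helper names (`random_oversample`, `macro_f1`, `accuracy`) and the toy labels (95 negatives, 5 positives) are hypothetical and do not come from the study's data. The abstract's RandomOverSampler is assumed to refer to the class of that name in the imbalanced-learn library, which performs the same kind of duplication as the helper here.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(rows)  # sample with replacement
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (assumption): 95 negatives, 5 positives, one dummy feature.
X = [[i] for i in range(100)]
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
y_pred = [0] * 100
print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", round(macro_f1(y_true, y_pred), 4))

# Oversampling balances the class counts in the training data.
X_bal, y_bal = random_oversample(X, y_true)
print("class counts after oversampling:", Counter(y_bal))
```

On this toy example the always-negative classifier reaches an accuracy of 0.95 but a Macro F1 of only about 0.49, because the F1 of the positive class is zero. In a cross-validated setting, oversampling would normally be applied to the training folds only, so that duplicated rows do not leak into the held-out fold.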
