COMPARATIVE ANALYSIS OF MACHINE LEARNING AND DEEP LEARNING PIPELINES FOR STROKE RISK PREDICTION
Abstract
Objective: This study compares traditional machine learning (ML) pipelines with modern deep learning (DL) models to identify the optimal approach for stroke risk prediction on tabular data.
Subjects and methods: A dual-stream comparative analysis was conducted on 6,387 individuals (comprising public data and 25% additional private data). The ML stream optimized five algorithms (including XGBoost, LightGBM, CatBoost, and Random Forest) combined with sampling and feature selection. The DL stream trained models specialized for tabular data (TabNet, FT-Transformer, ResNet). Performance was evaluated using 5-fold cross-validation.
Results: The pipeline combining LightGBM, RandomOverSampler, and Mutual Information feature selection achieved the best performance: Accuracy 95.2%, Macro F1 70.2%. A t-test indicated that this pipeline significantly outperformed the baseline and DL models (p = 0.0403).
Conclusion: For medium-sized tabular data, optimized ML models outperform DL approaches. Data augmentation and class-imbalance handling are key to improving prediction.
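The gap between the reported Accuracy and Macro F1 is typical of imbalanced data, and it is the reason oversampling matters here. The following is a minimal pure-Python sketch, not the study's implementation: `random_oversample` is a hypothetical stand-in for what a random oversampler does, `macro_f1` is the standard unweighted mean of per-class F1 scores, and the 95/5 toy labels are invented for illustration only. In a cross-validated setting, oversampling would normally be applied to the training folds only, so that duplicated rows never leak into the evaluation fold.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are the same size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1; a class with no true positives scores 0."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: 95 non-stroke (0) and 5 stroke (1) cases.
y_true = [0] * 95 + [1] * 5
# A degenerate model that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)

# Balance the (toy) training data by duplicating minority rows.
rows = [[i] for i in range(100)]
bal_rows, bal_labels = random_oversample(rows, y_true, seed=42)
balanced_counts = Counter(bal_labels)

print(f"accuracy={acc:.3f} macro_f1={f1:.3f} balanced={dict(balanced_counts)}")
```

On this toy example the always-negative model scores high accuracy while its Macro F1 is below 0.5, because the minority class contributes an F1 of zero. Macro F1 is therefore the more informative metric when the positive class is rare.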
Keywords
Stroke, Machine Learning, Deep Learning, LightGBM, TabNet.