COMPARATIVE ANALYSIS OF MACHINE LEARNING AND DEEP LEARNING PIPELINES FOR STROKE RISK PREDICTION

Thị Ngọc Nguyên Nguyễn; Tấn Đạt Nguyễn

doi:10.51298/vmj.v558i1.16976

Date Published: 28/01/2026

Issue

Vol. 558 No. 1 (2026)

Section

Articles

How to Cite

Nguyễn, T. N. N., & Nguyễn, T. Đ. (2026). COMPARATIVE ANALYSIS OF MACHINE LEARNING AND DEEP LEARNING PIPELINES FOR STROKE RISK PREDICTION. Vietnam Medical Journal, 558(1). https://doi.org/10.51298/vmj.v558i1.16976

Abstract

Objective: This study compares traditional machine learning (ML) pipelines with modern deep learning (DL) models to identify the best-performing approach for stroke risk prediction on tabular data.

Subjects and methods: A dual-stream comparative analysis was conducted on data from 6,387 individuals (comprising public data and 25% additional private data). The ML stream optimized 5 algorithms (including XGBoost, LightGBM, CatBoost, and Random Forest) in combination with sampling and feature selection. The DL stream trained specialized models (TabNet, FT-Transformer, ResNet). Performance was evaluated using 5-fold cross-validation.

Results: The pipeline combining LightGBM, RandomOverSampler, and Mutual Information feature selection achieved the best performance: Accuracy 95.2%, Macro F1 70.2%. A t-test indicated a statistically significant improvement over the baseline and DL models (p = 0.0403).

Conclusion: For medium-sized tabular data, optimized ML models outperform DL approaches. Data augmentation and class imbalance handling are key to improving prediction.
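The gap between Accuracy and Macro F1 reported in the abstract is typical of imbalanced labels. The following is a minimal pure-Python sketch of the two ideas involved: random oversampling of the minority class (the idea behind RandomOverSampler) and the difference between accuracy and macro-averaged F1. The toy labels, the function names, and the 95/5 class split are assumptions chosen for illustration. They are not the study's data or the authors' pipeline.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (assumed for illustration): 95 non-stroke, 5 stroke.
y_true = [0] * 95 + [1] * 5
# A trivial model that always predicts "no stroke".
y_majority = [0] * 100

acc = accuracy(y_true, y_majority)
f1 = macro_f1(y_true, y_majority)
print(f"accuracy={acc:.3f}, macro F1={f1:.3f}")  # accuracy=0.950, macro F1=0.487

# Oversampling balances the class counts of a training set.
_, balanced_labels = random_oversample(list(range(100)), y_true)
balanced_counts = Counter(balanced_labels)
print(balanced_counts[0], balanced_counts[1])  # 95 95
```

In a cross-validated setup, oversampling would normally be applied to the training folds only, so that duplicated minority rows never appear in the fold used for evaluation.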

Keywords

Stroke, Machine Learning, Deep Learning, LightGBM, TabNet.
