Reproducibility Crisis in Cardiovascular Machine Learning: Identification and Correction of Systematic Data Pipeline Errors in Angina Pectoris Recognition
Keywords:
angina pectoris, machine learning, data leakage, ICD-10, reproducibility, MIMIC-IV, cohort contamination, cardiovascular AI
Abstract
The reproducibility of machine learning (ML) models in cardiovascular medicine is frequently undermined by systematic data pipeline errors that are neither reported nor corrected in published work. This study identifies three critical methodological errors in an ML pipeline for angina pectoris recognition from the MIMIC-IV electronic health record database, quantifies their individual and cumulative impact on reported performance metrics, and presents corrected implementations. The errors are: (1) incorrect ICD-10 diagnostic code prefix selection, which excluded the majority of valid angina subtypes from the positive cohort; (2) population-level median imputation applied before train-test splitting, which constitutes data leakage; and (3) inclusion of patients with heart failure, myocardial infarction, and arrhythmias in the negative control cohort, which contaminated the controls and made the classification task artificially easy. Correcting these three errors increased the cohort from 7,800 to 58,486 admissions and reduced the reported ROC-AUC from 0.821 to 0.797, a reduction that reflects the elimination of systematic bias rather than a deterioration of model quality. Importantly, recall improved from 0.760 to 0.818, a genuine clinical improvement in the most critical metric for medical screening. These findings highlight the need for rigorous methodological standards in cardiovascular AI research.
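The three corrections named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The prefix lists (`I20` for angina; `I21`, `I50`, `I47` to `I49` for excluded controls), the rule that an angina code takes precedence over an exclusion code, and the helper names `label_admission` and `split_then_impute` are all assumptions made for illustration; the abstract does not state the exact code lists used.

```python
import random
from statistics import median

# Assumed, illustrative prefix lists; the study's actual lists are not given in the abstract.
ANGINA_PREFIXES = ("I20",)  # ICD-10 category I20 covers the angina pectoris subtypes
EXCLUDE_FROM_CONTROLS = ("I21", "I50", "I47", "I48", "I49")  # MI, heart failure, arrhythmias


def normalize(code):
    """Drop the dot and upper-case, so 'i20.0' and 'I200' compare equal."""
    return code.replace(".", "").strip().upper()


def has_prefix(codes, prefixes):
    """True if any diagnosis code of the admission starts with one of the prefixes."""
    return any(normalize(c).startswith(prefixes) for c in codes)


def label_admission(codes):
    """Return 1 for angina, 0 for a clean control, None for an excluded admission."""
    if has_prefix(codes, ANGINA_PREFIXES):
        return 1  # assumption: an angina code wins over any exclusion code
    if has_prefix(codes, EXCLUDE_FROM_CONTROLS):
        return None  # would contaminate the negative cohort, so it is dropped
    return 0


def split_then_impute(values, test_fraction=0.25, seed=0):
    """Split first, then fill missing values with the median of the training part only."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    # The median never sees the test rows, so no test information leaks into training.
    train_median = median(v for v in train if v is not None)

    def fill(xs):
        return [train_median if v is None else v for v in xs]

    return fill(train), fill(test), train_median
```

Matching on the category prefix rather than on a single full code is what keeps every subtype under that category in the positive cohort, and computing the median after the split is what separates leakage-free imputation from the population-level variant described above.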
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.