Scalable Deep Learning for Single-Cell RNA-seq Big Data: A Hybrid Autoencoder–Attention Framework with Density-Aware Outlier Detection for Rare Cell Type Discovery

Authors

  • David Aphkhazava PhD, Professor, University of Georgia, Tbilisi, Georgia. Orcid: https://orcid.org/0000-0001-6216-64
  • Levan Gulua PhD, Professor, Head of bachelor program of Biomedicine at University of Georgia, Tbilisi, Georgia
  • Mzia Tsiklauri PhD, Affiliated Professor of the Medical Programs of Gr. Robakidze University, Microbiology, Immunology, Virology, Infection Control. Invited Professor of the Medical Programs of Alte University, Tbilisi, Georgia. Invited Professor of the Medical Programs of Caucasus International University, Laboratory Medicine, Tbilisi, Georgia. Member of the Georgian Immunologists Association, Member of the Accreditation Council of the Quality Development Center of the Ministry of Education of Georgia
  • Manana Makharadze Prof., David Agmashenebeli University of Georgia, Tbilisi, Georgia
  • Maia Berodze Assistant Professor at Caucasus International University, Tbilisi, Georgia
  • Nodar Sulashvili MD, PhD, Doctor of Pharmaceutical and Pharmacological Sciences in Medicine, Invited Lecturer (Professor) of Scientific Research-Skills Center at Tbilisi State Medical University; Professor of Medical and Clinical Pharmacology of International School of Medicine at Alte University; Professor of Pharmacology of Faculty of Medicine at Georgian National University SEU; Associate Affiliated Professor of Medical Pharmacology of Faculty of Medicine at Sulkhan-Saba Orbeliani University; Associate Professor of Medical Pharmacology at School of Medicine at David Aghmashenebeli University of Georgia; Associate Professor of Biochemistry and Pharmacology Direction of School of Health Sciences at the University of Georgia; Associate Professor of Pharmacology of Faculty of Dentistry and Pharmacy at Tbilisi Humanitarian Teaching University; Tbilisi, Georgia; Orcid: https://orcid.org/0000-0002-9005-8577.
    «Research Retrieval and Academic Letters» (April 2-3, 2026). Warsaw, Poland
  • Cezar Goletiani Professor at Free University of Tbilisi, Tbilisi, Georgia, Head scientist at Agricultural University of Georgia, Tbilisi, Georgia
  • Nino Nebieridze Associate Professor at Free University of Tbilisi, Tbilisi, Georgia
  • Ketevan Chakhnashvili Clinical Director at Pineo Medical Ecosystem. Vice Dean of School of Medicine at Grigol Robakidze University. Tbilisi, Georgia
  • Lolita Shengelia PhD, Invited lecturer of Georgian National University, Tbilisi, Georgia; Invited lecturer of Georgian American University, Tbilisi, Georgia
  • Mohd Amaan Khan University of Georgia, Tbilisi, Georgia
  • Ayushi Hanumant Datir Alte University, Tbilisi, Georgia
  • George Maglakelidze PhD, Professor, University of Georgia, Tbilisi, Georgia
  • Ilia Atanelishvili Medical University of South Carolina, Charleston, SC, USA

Keywords:

single-cell RNA sequencing, rare cell type detection, deep learning, variational autoencoder, attention mechanism, density-aware clustering, scalable bioinformatics, big data genomics

Abstract

Background: Single-cell RNA sequencing (scRNA-seq) has transformed the study of cellular heterogeneity, enabling high-resolution dissection of complex tissues (Tang et al., 2009; Macosko et al., 2015; Klein et al., 2015). However, the rapid expansion of scRNA-seq datasets, now routinely exceeding millions of cells, has revealed major limitations in existing analytical pipelines (Regev et al., 2017; Tabula Sapiens Consortium, 2022). Conventional clustering methods recover abundant populations effectively but consistently fail to detect rare cell types (typically <1% of total cells), despite their disproportionate biological and clinical importance in oncogenesis, immune surveillance, and developmental transitions (Grün et al., 2015; Jiang et al., 2016; Tirosh et al., 2016).

Methods: We introduce DeepRareSC, a scalable deep learning framework that integrates three complementary components: (i) a denoising variational autoencoder that learns a compact, noise-robust latent representation of sparse expression matrices (Lopez et al., 2018; Eraslan et al., 2019); (ii) a multi-head self-attention module that adaptively reweights informative genes and suppresses dropout artifacts (Vaswani et al., 2017); and (iii) a density-aware outlier detection algorithm that operates in the learned latent space to identify low-frequency populations missed by global clustering (Jindal et al., 2018; Dong & Yuan, 2020). To achieve linear scalability, the pipeline combines mini-batch training, distributed data loading, and GPU-accelerated neighbor-graph construction based on Hierarchical Navigable Small World (HNSW) indexing (Malkov & Yashunin, 2020).
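
As a rough illustration of component (iii), the sketch below scores each cell by its mean distance to its k nearest neighbours in a latent space and flags cells whose score lies far above the median. It is a minimal stand-in under stated assumptions, not the DeepRareSC algorithm: the function names, the median-plus-MAD threshold, the parameters `k` and `n_mads`, and the toy two-dimensional latent coordinates are all hypothetical, and a brute-force neighbour search is used in place of an HNSW index.

```python
import math
import random
from statistics import median


def knn_mean_distance(points, k):
    """Mean Euclidean distance from each point to its k nearest neighbours.

    Brute-force O(n^2) search; an approximate index would replace this at scale.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_low_density(points, k=5, n_mads=5.0):
    """Return indices whose kNN score exceeds median + n_mads * MAD.

    The threshold rule is an assumption made for this sketch.
    """
    scores = knn_mean_distance(points, k)
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    threshold = med + n_mads * mad
    return [i for i, s in enumerate(scores) if s > threshold]


# Toy "latent space" (hypothetical data): 200 abundant cells around the origin
# and 4 rare cells placed far away, i.e. about 2% of the total.
rng = random.Random(0)
abundant = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
rare = [(8.0, 8.0), (8.5, 8.2), (7.8, 8.6), (8.3, 7.7)]
latent = abundant + rare

flagged = flag_low_density(latent, k=5)
rare_indices = set(range(len(abundant), len(latent)))
print("rare cells flagged:", rare_indices <= set(flagged))
print("total cells flagged:", len(flagged))
```

In this toy setup the rare group has fewer members than `k`, so each of its cells must borrow neighbours from the distant abundant cluster, which inflates its score. Cells on the fringe of the abundant cluster may be flagged as well, so a real pipeline would need a further step to separate coherent rare populations from isolated noise.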

Results: Across eight benchmark scRNA-seq datasets and two atlas-scale corpora, DeepRareSC achieved a 23–41% improvement in rare-cell recovery (F1-score) over state-of-the-art baselines (Stuart et al., 2019; Wolf et al., 2018; Tian et al., 2019; Dong & Yuan, 2020), while maintaining competitive runtime and memory efficiency. The framework recovered known rare populations and uncovered previously uncharacterized transitional states in human bone marrow and tumor microenvironment datasets.
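
Rare-cell recovery is reported as an F1-score because overall accuracy is uninformative when the target population is below 1% of cells. The sketch below computes F1 for a single rare label; the function name, the labels, and the counts are illustrative assumptions and are not results from the study.

```python
def rare_cell_f1(true_labels, predicted_labels, rare_label):
    """F1-score for one rare label: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == rare_label and p == rare_label for t, p in pairs)
    fp = sum(t != rare_label and p == rare_label for t, p in pairs)
    fn = sum(t == rare_label and p != rare_label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(true_labels, predicted_labels):
    """Fraction of cells whose predicted label matches the true label."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Hypothetical dataset: 1,000 cells, of which 10 (1%) are rare.
truth = ["rare"] * 10 + ["abundant"] * 990

# Method A never calls a cell rare.
method_a = ["abundant"] * 1000

# Method B finds 8 of the 10 rare cells and makes 2 false calls.
method_b = ["rare"] * 8 + ["abundant"] * 2 + ["rare"] * 2 + ["abundant"] * 988

acc_a = accuracy(truth, method_a)
f1_a = rare_cell_f1(truth, method_a, "rare")
acc_b = accuracy(truth, method_b)
f1_b = rare_cell_f1(truth, method_b, "rare")
print(f"A: accuracy={acc_a:.3f}, rare F1={f1_a:.3f}")
print(f"B: accuracy={acc_b:.3f}, rare F1={f1_b:.3f}")
```

Method A reaches 99% accuracy while recovering no rare cells, whereas the rare-label F1 separates the two methods clearly.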

Conclusions: DeepRareSC offers a scalable, interpretable, and biologically meaningful approach for rare cell discovery, with direct relevance to precision medicine, drug-target identification, and translational research.

Published

2026-05-04

How to Cite

David Aphkhazava, Levan Gulua, Mzia Tsiklauri, Manana Makharadze, Nodar Sulashvili, Cezar Goletiani, Nino Nebieridze, Ketevan Chakhnashvili, Lolita Shengelia, Mohd Amaan Khan, Ayushi Hanumant Datir, George Maglakelidze, & Ilia Atanelishvili. (2026). Scalable Deep Learning for Single-Cell RNA-seq Big Data: A Hybrid Autoencoder–Attention Framework with Density-Aware Outlier Detection for Rare Cell Type Discovery. Modern Scientific Method, (13). Retrieved from https://ojs.publisher.agency/index.php/MSM/article/view/8540

Issue

Section

Biological Sciences