Authors
-
David Aphkhazava
PhD, Professor, University of Georgia, Tbilisi, Georgia. Orcid: https://orcid.org/0000-0001-6216-64
-
Levan Gulua
PhD, Professor, Head of the Bachelor's Program in Biomedicine at the University of Georgia, Tbilisi, Georgia
-
Mzia Tsiklauri
PhD, Affiliated Professor of the Medical Programs of Gr. Robakidze University, Microbiology, Immunology, Virology, Infection Control. Invited Professor of the Medical Programs of Alte University, Tbilisi, Georgia. Invited Professor of the Medical Programs of Caucasus International University, Laboratory Medicine, Tbilisi, Georgia. Member of the Georgian Immunologists Association. Member of the Accreditation Council of the Quality Development Center of the Ministry of Education of Georgia
-
Manana Makharadze
Professor, David Aghmashenebeli University of Georgia, Tbilisi, Georgia
-
Maia Berodze
Assistant Professor at Caucasus International University, Tbilisi, Georgia
-
Nodar Sulashvili
MD, PhD, Doctor of Pharmaceutical and Pharmacological Sciences in Medicine; Invited Lecturer (Professor) of Scientific Research-Skills Center at Tbilisi State Medical University; Professor of Medical and Clinical Pharmacology of International School of Medicine at Alte University; Professor of Pharmacology of Faculty of Medicine at Georgian National University SEU; Associate Affiliated Professor of Medical Pharmacology of Faculty of Medicine at Sulkhan-Saba Orbeliani University; Associate Professor of Medical Pharmacology at School of Medicine at David Aghmashenebeli University of Georgia; Associate Professor of Biochemistry and Pharmacology Direction of School of Health Sciences at the University of Georgia; Associate Professor of Pharmacology of Faculty of Dentistry and Pharmacy at Tbilisi Humanitarian Teaching University; Tbilisi, Georgia. Orcid: https://orcid.org/0000-0002-9005-8577.
«Research Retrieval and Academic Letters» (April 2-3, 2026). Warsaw, Poland
-
Cezar Goletiani
Professor at Free University of Tbilisi, Tbilisi, Georgia, Head scientist at Agricultural University of Georgia, Tbilisi, Georgia
-
Nino Nebieridze
Associate Professor at Free University of Tbilisi, Tbilisi, Georgia
-
Ketevan Chakhnashvili
Clinical Director at Pineo Medical Ecosystem; Vice Dean of School of Medicine at Grigol Robakidze University, Tbilisi, Georgia
-
Lolita Shengelia
PhD, Invited lecturer of Georgian National University, Tbilisi, Georgia; Invited lecturer of Georgian American University, Tbilisi, Georgia
-
Mohd Amaan Khan
University of Georgia, Tbilisi, Georgia
-
Ayushi Hanumant Datir
Alte University, Tbilisi, Georgia
-
George Maglakelidze
PhD, Professor, University of Georgia, Tbilisi, Georgia
-
Ilia Atanelishvili
Medical University of South Carolina, Charleston, SC, USA
Keywords:
single-cell RNA sequencing, rare cell type detection, deep learning, variational autoencoder, attention mechanism, density-aware clustering, scalable bioinformatics, big data genomics
Abstract
Background: Single-cell RNA sequencing (scRNA-seq) has transformed the study of cellular heterogeneity, enabling high-resolution dissection of complex tissues (Tang et al., 2009; Macosko et al., 2015; Klein et al., 2015). However, the rapid expansion of scRNA-seq datasets, now routinely exceeding millions of cells, has revealed major limitations in existing analytical pipelines (Regev et al., 2017; Tabula Sapiens Consortium, 2022). Conventional clustering methods recover abundant populations effectively but consistently fail to detect rare cell types (typically <1% of total cells), despite their disproportionate biological and clinical importance in oncogenesis, immune surveillance, and developmental transitions (Grün et al., 2015; Jiang et al., 2016; Tirosh et al., 2016).
Methods: We introduce DeepRareSC, a scalable deep learning framework that integrates three complementary components: (i) a denoising variational autoencoder that learns a compact, noise-robust latent representation of sparse expression matrices (Lopez et al., 2018; Eraslan et al., 2019); (ii) a multi-head self-attention module that adaptively reweights informative genes and suppresses dropout artifacts (Vaswani et al., 2017); and (iii) a density-aware outlier detection algorithm operating in the learned latent space to identify low-frequency populations missed by global clustering (Jindal et al., 2018; Dong & Yuan, 2020). The pipeline employs mini-batch training, GPU-accelerated graph construction, and distributed data loading based on Hierarchical Navigable Small World indexing (Malkov & Yashunin, 2020) to ensure linear scalability.
Results: Across eight benchmark scRNA-seq datasets and two atlas-scale corpora, DeepRareSC achieved a 23–41% improvement in rare-cell recovery (F1-score) over state-of-the-art baselines (Stuart et al., 2019; Wolf et al., 2018; Tian et al., 2019; Dong & Yuan, 2020), while maintaining competitive runtime and memory efficiency. The framework recovered known rare populations and uncovered previously uncharacterized transitional states in human bone marrow and tumor microenvironment datasets.
Conclusions: DeepRareSC offers a scalable, interpretable, and biologically meaningful approach for rare cell discovery, with direct relevance to precision medicine, drug-target identification, and translational research.
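To illustrate the idea behind component (iii) of the Methods, the following minimal Python sketch scores each cell by the mean distance to its k nearest neighbours and flags cells whose neighbourhood is much sparser than the dataset median. It is not the DeepRareSC implementation: the function names, the brute-force neighbour search, the 3x-median threshold, and the toy 2-D coordinates standing in for latent embeddings are all assumptions made for this example.

```python
import math


def knn_sparsity_scores(points, k=5):
    """Mean Euclidean distance from each point to its k nearest neighbours.

    Larger scores mean sparser neighbourhoods. Brute force, O(n^2):
    suitable only for small illustrative inputs.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_sparse(scores, factor=3.0):
    """Flag points whose score exceeds factor * median score.

    The factor of 3 is a hypothetical choice for this toy example.
    """
    median = sorted(scores)[len(scores) // 2]  # upper median
    return [s > factor * median for s in scores]


def f1_score(predicted, actual):
    """F1 for the positive (rare) class from two boolean lists."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Toy "latent space": 100 abundant cells on a unit grid, 3 rare cells far away.
abundant = [(float(i), float(j)) for i in range(10) for j in range(10)]
rare = [(30.0, 30.0), (30.5, 30.0), (30.0, 30.5)]
points = abundant + rare
truth = [False] * len(abundant) + [True] * len(rare)

flags = flag_sparse(knn_sparsity_scores(points, k=5))
print(f1_score(flags, truth))  # 1.0 on this toy layout
```

In this sketch a group is flagged only if it contains at most k cells, because larger groups supply enough close neighbours to look dense. At atlas scale the brute-force search would presumably be replaced by an approximate nearest-neighbour index such as the HNSW indexing cited in the Methods.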
How to Cite
David Aphkhazava, Levan Gulua, Mzia Tsiklauri, Manana Makharadze, Nodar Sulashvili, Cezar Goletiani, Nino Nebieridze, Ketevan Chakhnashvili, Lolita Shengelia, Mohd Amaan Khan, Ayushi Hanumant Datir, George Maglakelidze, & Ilia Atanelishvili. (2026). Scalable Deep Learning for Single-Cell RNA-seq Big Data: A Hybrid Autoencoder–Attention Framework with Density-Aware Outlier Detection for Rare Cell Type Discovery. Modern Scientific Method, (13). Retrieved from https://ojs.publisher.agency/index.php/MSM/article/view/8540
Section
Biological Sciences