GRAPH AUTOENCODER-BASED CLUSTERING OF URBAN MOBILITY ZONES: SCALABILITY VALIDATION AND CROSS-CITY ANALYSIS ON CHICAGO AND NEW YORK CITY

Authors

  • Abylai Ayan Azamatuly, Master's student, Applied Data Analytics, Astana IT University, Astana, Kazakhstan

Keywords:

graph autoencoder, GCN, urban mobility, NYC Yellow Taxi, Chicago, Top-K sparsification, stratified clustering, cross-city validation, OD matrix, UMAP, HDBSCAN

Abstract

This paper proposes and validates a Graph Autoencoder (GAE)-based pipeline for unsupervised clustering of urban mobility zones from Origin-Destination (OD) taxi data. The pipeline is evaluated on two US cities, Chicago (6.5M trips, 195 zones) and New York City (38.3M trips, 236 Taxi Zones), a 14× difference in scale. For NYC, two methodological extensions are introduced: (1) Top-K graph sparsification, which resolves the GCN over-smoothing caused by the dense NYC OD graph (density 0.837), and (2) a stratified clustering protocol that clusters Manhattan and the Outer Boroughs separately. The resulting ten-cluster NYC typology is externally validated against American Community Survey (ACS) 2019–2023 data, with large effect sizes (η²_H = 0.348 for median income). Monthly OD matrices are highly stable over time (r̄ = 0.993), closely matching Chicago (r̄ = 0.996). All four research hypotheses are confirmed, demonstrating that the proposed framework generalises across cities of fundamentally different scales and spatial structures.
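The abstract reports several quantities (OD graph density, Top-K sparsification, η²_H, r̄) without defining them. The sketch below shows one plausible way to compute each with NumPy and SciPy; it is not the author's code, and every function name is hypothetical. Assumptions: density is the share of ordered zone pairs (self-trips excluded) with at least one trip; Top-K keeps each origin's k heaviest outgoing flows and then symmetrises the result into an undirected GCN adjacency; η²_H uses the common Kruskal–Wallis formula (H − k + 1)/(n − k); and r̄ is taken as the mean pairwise Pearson correlation between flattened monthly OD matrices.

```python
import numpy as np
from scipy.stats import kruskal


def od_density(od: np.ndarray) -> float:
    """Share of ordered zone pairs (i != j) with at least one trip (assumed definition)."""
    n = od.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    return float((od[off_diag] > 0).sum() / (n * (n - 1)))


def top_k_sparsify(od: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical Top-K step: keep each origin's k largest outgoing flows, then symmetrise."""
    w = od.astype(float)  # astype returns a copy, so the caller's matrix is untouched
    np.fill_diagonal(w, 0.0)  # drop intra-zone trips (assumption)
    top = np.argsort(-w, axis=1)[:, :k]  # column indices of the k heaviest flows per row
    keep = np.zeros(w.shape, dtype=bool)
    keep[np.arange(w.shape[0])[:, None], top] = True
    keep &= w > 0  # never turn a zero-flow pair into an edge
    sparse = np.where(keep, w, 0.0)
    return np.maximum(sparse, sparse.T)  # undirected adjacency: edge kept if either end selects it


def eta_squared_h(groups) -> float:
    """Kruskal-Wallis effect size: (H - k + 1) / (n - k)."""
    h, _ = kruskal(*groups)
    n = sum(len(g) for g in groups)
    k = len(groups)
    return float((h - k + 1) / (n - k))


def mean_monthly_correlation(monthly) -> float:
    """Mean pairwise Pearson r between flattened monthly OD matrices (needs >= 2 months)."""
    flat = np.stack([m.ravel() for m in monthly])  # one row per month
    r = np.corrcoef(flat)  # month-by-month correlation matrix
    upper = np.triu_indices(len(monthly), k=1)  # each distinct pair of months once
    return float(r[upper].mean())


if __name__ == "__main__":
    od = np.arange(1, 26, dtype=float).reshape(5, 5)  # toy 5-zone OD matrix
    print(od_density(od))                     # 1.0: every off-diagonal pair has trips
    print(od_density(top_k_sparsify(od, 2)))  # 0.7 after Top-2 sparsification
    print(eta_squared_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # well-separated groups
    print(mean_monthly_correlation([od, 2 * od, od + 1]))  # linear rescalings correlate perfectly
```

Symmetrising with the element-wise maximum keeps an edge whenever either endpoint selects it; keeping only mutual selections (the minimum) would give a sparser graph. Which variant, and which value of k, the paper actually uses is not stated in this abstract.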

Published

2026-05-17

How to Cite

Abylai Ayan Azamatuly. (2026). GRAPH AUTOENCODER-BASED CLUSTERING OF URBAN MOBILITY ZONES: SCALABILITY VALIDATION AND CROSS-CITY ANALYSIS ON CHICAGO AND NEW YORK CITY. World Scientific Reports, (13). Retrieved from https://ojs.publisher.agency/index.php/WSR/article/view/8668

Section

Technical Sciences