GRAPH AUTOENCODER-BASED CLUSTERING OF URBAN MOBILITY ZONES: SCALABILITY VALIDATION AND CROSS-CITY ANALYSIS ON CHICAGO AND NEW YORK CITY
Keywords:
graph autoencoder, GCN, urban mobility, NYC Yellow Taxi, Chicago, Top-K sparsification, stratified clustering, cross-city validation, OD matrix, UMAP, HDBSCAN
Abstract
This paper proposes and validates a Graph Autoencoder (GAE)-based pipeline for unsupervised clustering of urban mobility zones using Origin-Destination (OD) taxi data. The pipeline is evaluated in two US cities, Chicago (6.5M trips, 195 zones) and New York City (38.3M trips, 236 Taxi Zones), a 14× difference in scale. For NYC, two methodological extensions are introduced: (1) Top-K graph sparsification to counter the GCN over-smoothing caused by the dense NYC OD graph (density 0.837), and (2) a stratified clustering protocol that clusters Manhattan and the Outer Boroughs separately. The resulting ten-cluster NYC typology is externally validated against ACS 2019–2023 census data, with large effect sizes (η²_H = 0.348 for median income). Temporal stability of the monthly OD matrices reaches r̄ = 0.993, closely matching Chicago (r̄ = 0.996). All four research hypotheses are confirmed, demonstrating that the proposed framework generalises across cities of fundamentally different scales and spatial structures.
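The abstract does not spell out how OD graph density or Top-K sparsification are computed. A minimal NumPy sketch follows, assuming density is the share of ordered zone pairs (i ≠ j) with at least one trip, and that Top-K keeps each zone's K strongest outgoing flows before making the graph undirected. The function names `od_density` and `top_k_sparsify`, the edge-weighting rule, and the toy matrix are hypothetical; the K used for NYC is not stated here.

```python
import numpy as np


def od_density(od: np.ndarray) -> float:
    """Share of ordered zone pairs (i != j) with at least one trip."""
    n = od.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    return float(np.count_nonzero(od[off_diag]) / (n * (n - 1)))


def top_k_sparsify(od: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical Top-K rule: keep each zone's k strongest outgoing flows.

    Kept edges are made undirected (union of both directions) and weighted by
    the mean of the two directed flows. Requires 1 <= k < n.
    """
    n = od.shape[0]
    flows = od.astype(float)          # astype returns a copy
    np.fill_diagonal(flows, 0.0)      # ignore intra-zone trips
    # Column indices of the k largest flows in each row (unordered within the k).
    top_idx = np.argpartition(-flows, kth=k - 1, axis=1)[:, :k]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], top_idx] = True
    mask = mask & (flows > 0)         # never keep a pair with zero trips
    mask = mask | mask.T              # symmetrise for the GCN
    weights = (flows + flows.T) / 2.0
    return np.where(mask, weights, 0.0)


# Toy 4-zone OD matrix (rows = origins, columns = destinations).
od = np.array([[0, 5, 1, 0],
               [4, 0, 2, 1],
               [1, 3, 0, 6],
               [0, 1, 7, 0]])
print(round(od_density(od), 3))       # 0.833
print(top_k_sparsify(od, k=1))
```

With K = 1 the toy graph keeps only the zone pairs (0, 1) and (2, 3). Taking the union of row-wise top-K edges guarantees that every zone with outgoing trips keeps at least one neighbour, which a single global weight threshold would not.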
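To make the over-smoothing argument concrete, here is a hedged NumPy sketch of a GAE forward pass in the style of Kipf and Welling's graph autoencoder: a two-layer GCN encoder with symmetric normalisation and an inner-product decoder. The layer count, widths, one-hot features, and random weights are assumptions for illustration rather than the paper's configuration, and the training loop is omitted.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """GCN propagation matrix D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gae_forward(adj, x, w1, w2):
    """Two-layer GCN encoder and inner-product decoder (no training loop)."""
    a_norm = normalize_adjacency(adj)
    h = np.maximum(a_norm @ x @ w1, 0.0)        # layer 1: ReLU(A X W1)
    z = a_norm @ h @ w2                         # layer 2: linear embedding Z
    recon = 1.0 / (1.0 + np.exp(-(z @ z.T)))    # decoder: sigmoid(Z Z^T)
    return z, recon


rng = np.random.default_rng(0)
adj = np.array([[0, 4.5, 0, 0],
                [4.5, 0, 0, 0],
                [0, 0, 0, 6.5],
                [0, 0, 6.5, 0]])
x = np.eye(4)                                   # one-hot zone features (assumed)
z, recon = gae_forward(adj, x, rng.normal(size=(4, 8)), rng.normal(size=(8, 2)))
print(z.shape, recon.shape)                     # (4, 2) (4, 4)

# Over-smoothing in the limit: on a complete graph every row is identical.
complete = np.ones((4, 4)) - np.eye(4)
print(normalize_adjacency(complete))            # every entry equals 0.25
```

On a complete unweighted graph the propagation matrix has identical rows, so every zone receives the same embedding after a single layer. A near-complete OD graph pushes the rows towards that limit, which is one intuition for why sparsifying before encoding can help.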
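One reading of the stratified protocol is: split zones into strata, cluster each stratum on its own, and offset the labels so the union forms a single typology. The sketch assumes that reading. Plain Lloyd's k-means stands in for the paper's clusterer (the keywords mention UMAP and HDBSCAN, which are not reproduced here), and the per-stratum cluster counts are hypothetical because the abstract gives only the total of ten.

```python
import numpy as np


def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means, used here only as a stand-in clusterer."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]  # copy, safe to update
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centre.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):               # leave an emptied centre in place
                centers[c] = x[labels == c].mean(axis=0)
    return labels


def stratified_cluster(z, strata, k_per_stratum, seed=0):
    """Cluster each stratum separately, then offset labels to be globally unique."""
    labels = np.full(len(z), -1)
    offset = 0
    for name, k in k_per_stratum.items():
        idx = np.flatnonzero(strata == name)
        labels[idx] = kmeans(z[idx], k, seed=seed) + offset
        offset += k
    return labels


# Toy 2-D embeddings; "M" = Manhattan, "O" = Outer Boroughs.
z = np.array([[0, 0], [0.1, 0], [5, 5], [5.1, 5],
              [10, 0], [10.1, 0], [0, 10], [0.1, 10]])
strata = np.array(["M", "M", "M", "M", "O", "O", "O", "O"])
print(stratified_cluster(z, strata, {"M": 2, "O": 2}))
```

Clustering the strata independently keeps Manhattan's very dense internal flows from dominating the distance structure that the Outer Borough zones are clustered on.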
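The reported η²_H is presumably the Kruskal–Wallis effect size η²_H = (H − k + 1)/(n − k), where H is the Kruskal–Wallis statistic, k the number of groups (clusters) and n the number of observations (zones); this attribution is an assumption, since the abstract gives only the symbol. Below is a self-contained NumPy sketch using average ranks for ties and the standard tie correction. The function names are hypothetical and the toy data are not from the paper.

```python
import numpy as np


def average_ranks(values: np.ndarray) -> np.ndarray:
    """1-based ranks; tied values share the mean of their ranks."""
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0   # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kruskal_eta_squared(values, groups):
    """Return (H, eta^2_H) with eta^2_H = (H - k + 1) / (n - k)."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    n = len(values)
    labels = np.unique(groups)
    k = len(labels)
    ranks = average_ranks(values)
    rank_term = sum(ranks[groups == g].sum() ** 2 / np.sum(groups == g) for g in labels)
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    _, tie_counts = np.unique(values, return_counts=True)
    h /= 1.0 - np.sum(tie_counts ** 3 - tie_counts) / (n ** 3 - n)  # tie correction
    return h, (h - k + 1) / (n - k)


# Toy example: one census variable for 6 zones in 2 clusters.
h, eta2 = kruskal_eta_squared([1, 2, 3, 4, 5, 6], ["a", "a", "a", "b", "b", "b"])
print(round(h, 3), round(eta2, 3))   # 3.857 0.714
```

If SciPy is available, `scipy.stats.kruskal(*samples).statistic` should return the same H, leaving only the one-line η²_H conversion.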
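One plausible way to obtain r̄ is to flatten each month's OD matrix and average the Pearson correlations over all pairs of months. Whether the paper averages all pairs or only consecutive months is not stated, so treat this as an assumption; the name `temporal_stability` is hypothetical.

```python
import numpy as np


def temporal_stability(monthly_od):
    """Mean Pearson r over all pairs of flattened monthly OD matrices."""
    flat = np.stack([np.asarray(m, dtype=float).ravel() for m in monthly_od])
    r = np.corrcoef(flat)                     # months x months correlation matrix
    upper = np.triu_indices(len(flat), k=1)   # each unordered pair of months once
    return float(r[upper].mean())


base = np.arange(16).reshape(4, 4)
months = [base, 2 * base, 3 * base + 1]       # perfectly proportional toy months
print(round(temporal_stability(months), 6))   # 1.0
```

Because Pearson correlation is invariant to positive scaling and offsets, a uniform rise or fall in monthly trip volume leaves r̄ unchanged: it measures the stability of the flow pattern rather than of total demand.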
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.