MoRA: Mobility as the Backbone for Geospatial Representation Learning at Scale
Abstract
Representation learning of geospatial locations remains a core challenge in achieving general geospatial intelligence, with increasingly diverging philosophies and techniques. While Earth observation paradigms excel at depicting locations in their physical states, we propose that a location’s full characterization requires grounding in both its physical attributes and its internal human activity pattern, the latter being particularly crucial for understanding its human-centric functions. We present MoRA, a human-centric geospatial framework that leverages a mobility graph as its core backbone to fuse various data modalities, aiming to learn embeddings that represent the socio-economic context and functional role of a location. MoRA achieves this through the integration of spatial tokenization, GNNs, and asymmetric contrastive learning to align 100M+ POIs, massive remote sensing imagery, and structured demographic statistics with a billion-edge mobility graph, ensuring the three auxiliary modalities are interpreted through the lens of fundamental human dynamics. To rigorously evaluate the effectiveness of MoRA, we construct a benchmark dataset composed of 9 downstream prediction tasks across social and economic domains. Experiments show that MoRA, with four input modalities and a compact 128-dimensional representation space, achieves superior predictive performances than state-of-the-art models by an average of 12.9\%. Echoing LLM scaling laws, we further demonstrate the scaling behavior in geospatial representation learning. We open-source code and pretrained models at: https://anonymous.4open.science/r/MoRA-.