Poster in Workshop: ICLR 2025 Workshop on Tackling Climate Change with Machine Learning: Data-Centric Approaches in ML for Climate Action
Large Language Models as a New Modality for Generalizable Earth Data Monitoring
Tong Nie · Junlin He · Wei Ma
Earth observation data are critical for monitoring progress toward the Sustainable Development Goals (SDGs), yet persistent challenges in data accessibility, multimodal data integration, and geographic bias hinder comprehensive global assessments. Satellite imagery paired with machine learning (SIML) offers cost-effective monitoring, but it struggles with socioeconomic indicators and suffers from data inequity and spatial bias. This paper presents a novel framework that uses large language models (LLMs) as a complementary modality to address these limitations. By extracting geospatial knowledge from pretrained LLMs through structured prompting, which encodes coordinates into rich, task-agnostic embeddings, we enable efficient prediction of diverse Earth monitoring indicators with simple linear regression. Evaluated on 25 global tasks ranging from climate metrics (e.g., temperature) to socioeconomic variables (e.g., poverty rates), our method outperforms state-of-the-art SIML approaches in both accuracy and sample efficiency. Notably, LLM-derived representations exhibit less geographic bias than existing methods and inherently capture socioeconomic context, forming semantically meaningful clusters that align with regional development patterns.
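The pipeline described above (coordinates, then a structured prompt, then an embedding, then a linear probe per indicator) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt template in `build_prompt` is hypothetical, and `placeholder_embedding` is a hypothetical stand-in that returns a deterministic pseudo-random vector instead of querying an LLM, so it carries no geographic knowledge. The probe is closed-form ridge regression with a small penalty `alpha` (our choice, for numerical stability); as `alpha` approaches 0 it reduces to the ordinary linear regression the abstract mentions.

```python
import zlib

import numpy as np


def build_prompt(lat: float, lon: float) -> str:
    """Hypothetical structured prompt; the paper's actual template is not given here."""
    return (
        f"Location: latitude {lat:.4f}, longitude {lon:.4f}. "
        "Describe the country, region, climate, land cover, and economy of this place."
    )


def placeholder_embedding(prompt: str, dim: int = 32) -> np.ndarray:
    """Hypothetical stand-in for an LLM embedding (e.g., pooled hidden states).

    Returns a deterministic pseudo-random vector seeded from the prompt text,
    so it encodes NO geographic knowledge; replace it with a real model.
    """
    rng = np.random.default_rng(zlib.crc32(prompt.encode("utf-8")))
    return rng.standard_normal(dim)


def fit_linear_probe(X: np.ndarray, y: np.ndarray, alpha: float = 1e-3):
    """Closed-form ridge regression with an unpenalized intercept.

    As alpha -> 0 this approaches ordinary least squares.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Apply the fitted linear probe to a matrix of embeddings."""
    return X @ w + b


# Usage: embed random coordinates, then fit a probe to a synthetic target that is
# exactly linear in the embeddings (a sanity check of the probe, not a paper result).
rng = np.random.default_rng(0)
coords = np.column_stack([rng.uniform(-60, 70, 200), rng.uniform(-180, 180, 200)])
X = np.stack([placeholder_embedding(build_prompt(lat, lon)) for lat, lon in coords])
true_w = rng.standard_normal(X.shape[1])
y = X @ true_w + 3.0
w_hat, b_hat = fit_linear_probe(X, y)
```

Because the embeddings are task-agnostic, one embedding matrix `X` could in principle be reused across all indicators, with only a cheap linear probe fitted per target (temperature, poverty rate, and so on). That reuse is what would make such an approach sample-efficient.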