The Natural Geometry of Code: Hyperbolic Representation Learning for Program Reasoning
Abstract
State-of-the-art models for code representation, such as GraphCodeBERT, embed the hierarchical structure of source code into Euclidean space. This approach can lead to significant representation distortion, especially when embedding deep or highly branched hierarchies, limiting the models' ability to capture deep program semantics. We argue that the natural geometry for code is hyperbolic: its exponential volume growth closely matches the tree-like structure of a program's Abstract Syntax Tree (AST), enabling low-distortion hierarchical embeddings. We introduce HypeCodeNet, a geometric deep learning framework that operates natively in hyperbolic space. Formulated in the numerically stable Lorentz model, its manifold-aware components include a hyperbolic embedding layer, a tangent-space message-passing mechanism, and a geodesic-based attention module. On code clone detection, code completion, and link prediction, HypeCodeNet significantly outperforms existing Euclidean models, especially on tasks requiring deep structural understanding. Our work suggests that hyperbolic geometry offers a geometrically sound foundation for code representation and a key to unlocking the structured semantics of code.
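
To make the Lorentz-model vocabulary used above concrete, the following is a minimal sketch of the standard textbook operations that such components typically build on: the Lorentzian inner product, the exponential and logarithmic maps at the origin (which move between the tangent space and the manifold), and the geodesic distance. It is not the HypeCodeNet implementation. The function names are hypothetical, the curvature is assumed fixed at -1, and only the Python standard library is used.

```python
import math

# Illustrative sketch only: standard Lorentz (hyperboloid) model operations
# with curvature assumed to be -1. Function names are hypothetical and are
# not taken from HypeCodeNet.

def minkowski_dot(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def exp_map_origin(v):
    """Map a tangent vector v at the origin (a plain Euclidean vector)
    onto the hyperboloid {x : <x, x>_L = -1, x0 > 0}."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm < 1e-12:
        return [1.0] + [0.0] * len(v)  # the origin of the hyperboloid
    scale = math.sinh(norm) / norm
    return [math.cosh(norm)] + [scale * a for a in v]

def log_map_origin(x):
    """Inverse of exp_map_origin: pull a hyperboloid point back to the
    tangent space at the origin."""
    spatial = x[1:]
    norm = math.sqrt(sum(a * a for a in spatial))
    if norm < 1e-12:
        return [0.0] * len(spatial)
    dist = math.acosh(max(x[0], 1.0))  # clamp guards against rounding below 1
    return [dist * a / norm for a in spatial]

def geodesic_distance(x, y):
    """Hyperbolic distance between two points on the hyperboloid."""
    return math.acosh(max(-minkowski_dot(x, y), 1.0))

# Usage: embed a hypothetical AST root and child, then measure their distance.
root = exp_map_origin([0.0, 0.0])
child = exp_map_origin([0.5, 0.0])
d = geodesic_distance(root, child)
```

In this sketch, a tangent-space operation amounts to applying `log_map_origin`, performing ordinary Euclidean computation, and mapping the result back with `exp_map_origin`, while a geodesic-based score would depend on `geodesic_distance` rather than on a Euclidean dot product.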