Group Representational Position Encoding
Yifan Zhang · Zixiang Chen · Yifeng Liu · Qin Zhen · Rina Hughes · Kangping Xu · Yang Yuan · Quanquan Gu · Andrew Yao
Abstract
We present GRAPE (Group RepresentAtional Position Encoding), a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms: (i) \emph{multiplicative} rotations (Multiplicative GRAPE) in $\mathrm{SO}(d)$ and (ii) \emph{additive} logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\mathrm{GL}$. In Multiplicative GRAPE, a position $n \in \mathbb{Z}$ (or $t\in\mathbb{R}$) acts as $\mathbb{G}(n)=\exp(n\,\omega\,\mathbf{L})$ with a rank-2 skew-symmetric generator $\mathbf{L} \in \mathbb{R}^{d \times d}$, yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the $d/2$ rotation planes are the canonical coordinate pairs equipped with a log-uniform frequency spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling at $O(d)$ and $O(rd)$ cost per head, respectively. In Additive GRAPE, additive logits arise as rank-1 (or low-rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Altogether, GRAPE supplies a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases.
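As a concrete illustration of the multiplicative map, the following sketch (an assumed implementation, not the authors' code) builds a rank-2 skew-symmetric generator acting on a single coordinate plane and the closed form of its matrix exponential, which is the per-pair Givens rotation that RoPE applies when the planes are the canonical coordinate pairs. The helper names `rank2_skew_generator` and `grape_map` are illustrative, not from the paper.

```python
import numpy as np

def rank2_skew_generator(d: int, i: int, j: int) -> np.ndarray:
    """Skew-symmetric generator L (rank 2) acting only on the (i, j) plane."""
    L = np.zeros((d, d))
    L[i, j], L[j, i] = -1.0, 1.0
    return L

def grape_map(d: int, i: int, j: int, n: float, w: float) -> np.ndarray:
    """Closed form of exp(n * w * L): identity except a 2x2 rotation block."""
    G = np.eye(d)
    c, s = np.cos(n * w), np.sin(n * w)
    G[i, i], G[i, j] = c, -s
    G[j, i], G[j, j] = s, c
    return G

# Properties stated in the abstract, checked numerically:
d, i, j, w = 4, 0, 1, 0.1
G2 = grape_map(d, i, j, 2, w)
G3 = grape_map(d, i, j, 3, w)
assert np.allclose(G2 @ G3, grape_map(d, i, j, 5, w))  # relative law: G(n)G(m) = G(n+m)
assert np.allclose(G2.T @ G2, np.eye(d))               # norm-preserving (orthogonal)
```

Stacking $d/2$ such commuting blocks, one per coordinate pair with log-uniformly spaced frequencies $w$, reproduces the standard RoPE rotation matrix.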