

Poster in Affinity Workshop: Tiny Papers Poster Session 6

Can Perplexity Reflect Large Language Model's Ability in Long Text Understanding?

Yutong Hu · Quzhe Huang · Mingxu Tao · Chen Zhang · Yansong Feng

Halle B #296

Abstract:

Recent studies have shown that Large Language Models (LLMs) have the potential to process extremely long text, with evidence that LLMs can perform well on the language modeling task even with 1 million input tokens. As the input context length increases, the perplexity (PPL) of the model is observed to remain low or even decrease. However, in our study, we find that PPL may only reflect the model's ability to model local information rather than to capture long-range dependencies, and thus using PPL alone to show that a model can process very long context is not appropriate. The local focus of perplexity can also explain some existing phenomena, such as the strong extrapolation ability of the position encoding method ALiBi. When evaluating a model's ability on long text, we should pay more attention to the limitations of PPL and avoid over-reliance on it.
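For reference, perplexity is conventionally defined as the exponentiated average per-token negative log-likelihood under the model (this is the standard definition, not a formula taken from the paper itself):

\mathrm{PPL}(x_{1:N}) = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N} \log p_\theta\!\left(x_i \mid x_{<i}\right)\right)

Because the score is an average over individual next-token predictions, it can stay low whenever each token is predictable from its nearby context, which is consistent with the abstract's point that low PPL need not indicate that long-range dependencies are being captured.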
