

Poster

The Cost of Scaling Down Large Language Models: Reducing Model Size Affects Memory before In-context Learning

Tian Jin · Nolan Clement · Xin Dong · Vaishnavh Nagarajan · Michael Carbin · Jonathan Ragan-Kelley · Gintare Karolina Dziugaite

Halle B #133
Tue 7 May 1:45 a.m. PDT — 3:45 a.m. PDT

Abstract:

We study how down-scaling large language model (LLM) size impacts LLM capabilities. We begin by measuring the effects of weight pruning, a popular technique for reducing model size, on two abilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in context. Surprisingly, we find that existing pruning techniques affect these two abilities differently. For example, pruning more than 30% of weights significantly decreases an LLM's ability to recall facts presented during pre-training, yet pruning 60-70% of weights largely preserves an LLM's ability to process information in context, ranging from retrieving answers based on information presented in context to learning parameterized functions such as a linear classifier from a few in-context examples. In short, moderate pruning impairs an LLM's ability to recall facts learnt during pre-training, while its effect on the model's ability to process information presented in context is much less pronounced. The same disparate effects arise when we replace the original model with a smaller dense model of reduced width and depth, which suggests that model size reduction in general underpins the disparity.
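To make "pruning X% of weights" concrete, the sketch below shows one simple instance: one-shot global magnitude pruning of a PyTorch model to a target sparsity. The paper evaluates existing pruning techniques; this routine, its function name, and its restriction to nn.Linear layers are illustrative assumptions, not the authors' method.

```python
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float) -> None:
    """Zero out the smallest-magnitude weights so that roughly `sparsity`
    (e.g. 0.3 for 30%) of all linear-layer weight entries are removed."""
    linear_weights = [m.weight for m in model.modules() if isinstance(m, nn.Linear)]
    if not linear_weights:
        return
    # Pick a single global threshold: the k-th smallest magnitude overall.
    all_mags = torch.cat([w.detach().abs().flatten() for w in linear_weights])
    k = int(sparsity * all_mags.numel())
    if k == 0:
        return
    threshold = torch.kthvalue(all_mags, k).values
    # Zero every weight at or below the threshold, in place.
    with torch.no_grad():
        for w in linear_weights:
            w.mul_((w.abs() > threshold).to(w.dtype))

# Hypothetical usage: prune 30% of the weights of a causal LM, then evaluate
# fact recall and in-context learning on the pruned model.
# magnitude_prune(model, sparsity=0.3)
```

One-shot magnitude pruning is used here only because it is the simplest way to reach a given sparsity level; the disparity the abstract describes is measured across pruning methods and across smaller dense models, not for this particular procedure.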
