

Poster in Workshop: The 3rd DL4C Workshop: Emergent Possibilities and Challenges in Deep Learning for Code

NextCoder: Robust Adaptation of Code LMs to Diverse Code Edits

Tushar Aggarwal · Swayam Singh · Abhijeet Awasthi · Aditya Kanade · Nagarajan Natarajan


Abstract:

Software engineering activities frequently involve edits to existing code. However, contemporary code language models (LMs) lack the ability to handle diverse types of code-edit requirements. In this work, we attempt to overcome this shortcoming through (1) a novel synthetic data generation pipeline and (2) a robust model adaptation algorithm. Starting with seed code examples and diverse editing criteria, our pipeline generates high-quality samples comprising original and modified code, along with natural language instructions in different styles and levels of verbosity. Today's code LMs come bundled with strong abilities, such as code generation and instruction following, which should not be lost during fine-tuning. To ensure this, we propose a novel adaptation algorithm, SeleKT, that (a) leverages a dense gradient-based step to identify the weights that are most important for code editing, and (b) performs a sparse projection onto the base model to avoid overfitting. Using our approach, we obtain a new model, NextCoder (adapted from Qwen2.5-Coder-7B), that achieves strong results on four code-editing benchmarks, outperforming comparably sized models and even several larger ones. We show the generality of our approach by improving DeepSeekCoder-6.7B and Qwen2.5-Coder-7B, compare against other fine-tuning approaches, and demonstrate robustness by showing that code generation abilities are retained after adaptation.
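The abstract describes SeleKT only at a high level, so the following is a minimal sketch of what a sparse projection onto a base model could look like, not the authors' implementation. It assumes that weight importance is measured by the magnitude of the change produced by the dense fine-tuning step, and that all other weights are reset to their base values. The function name `sparse_project`, the `keep_fraction` parameter, and the flat-list weight representation are hypothetical choices made for illustration.

```python
def sparse_project(base, tuned, keep_fraction):
    """Keep only the largest-magnitude weight changes; reset the rest to base.

    base, tuned: equal-length lists of floats (flattened weights).
    keep_fraction: share of weights allowed to differ from the base model.
    All names here are illustrative assumptions, not taken from the paper.
    """
    if len(base) != len(tuned):
        raise ValueError("base and tuned must have the same length")
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")

    # Change introduced by the dense fine-tuning step, per weight.
    deltas = [t - b for b, t in zip(base, tuned)]

    # Number of weights that may keep their fine-tuned value.
    k = int(keep_fraction * len(deltas))

    # Indices ranked by absolute change, largest first (assumed importance score).
    ranked = sorted(range(len(deltas)), key=lambda i: abs(deltas[i]), reverse=True)
    keep = set(ranked[:k])

    # Selected weights keep the tuned value; all others fall back to the base.
    return [tuned[i] if i in keep else base[i] for i in range(len(base))]


base_weights = [0.0, 1.0, -1.0, 2.0]
tuned_weights = [0.5, 1.01, -3.0, 2.02]
projected = sparse_project(base_weights, tuned_weights, keep_fraction=0.5)
print(projected)
```

In this toy example, only the two weights that moved the most keep their fine-tuned values, and the other two stay identical to the base model. This illustrates how a sparse update can limit drift from the base model's existing abilities.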
