Poster in Workshop: The 3rd DL4C Workshop: Emergent Possibilities and Challenges in Deep Learning for Code

CodeEditorBench: Evaluating Code Editing Capability of LLMs

Jiawei Guo · Ziming Li · Xueling Liu · Kaijing Ma · Tianyu Zheng · Zhouliang Yu · Ding Pan · Yizhi Li · Ruibo Liu · Yue Wang · Shuyue Guo · Xingwei Qu · Xiang Yue · Ge Zhang · Wenhu Chen · Jie Fu


Abstract:

Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability. We introduce CodeEditorBench, a pioneering evaluation framework designed to rigorously assess the performance of LLMs on code editing tasks, including debugging, translating, polishing, and requirement switching. Unlike existing benchmarks that focus solely on code generation, CodeEditorBench emphasizes real-world scenarios and practical aspects of software development. We curated diverse coding challenges and scenarios from five sources, covering various programming languages, complexity levels, and editing tasks. Evaluating 19 LLMs revealed that, despite the relative consistency between models' code editing and code generation abilities, notable differences persist. The results highlight the models' limitations in code polishing and requirement-driven code rewriting, and indicate that models specifically tailored for code feedback show significant improvements on code editing tasks. CodeEditorBench aims to catalyze advancements in LLMs by providing a robust platform for assessing code editing capabilities. We will release the dataset and evaluation code to enable the community to study the code editing abilities of LLMs.
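To make the four editing scenarios concrete, the sketch below shows one way an evaluation item and a pass-rate metric could be wired up. The `EditTask` dataclass, its field names, and the execution-based check are illustrative assumptions, not CodeEditorBench's actual schema or evaluation harness.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EditTask:
    # One of the four editing scenarios named in the abstract:
    # "debug", "translate", "polish", or "requirement_switch".
    task_type: str
    source_code: str              # code handed to the model for editing
    instruction: str              # natural-language description of the required edit
    check: Callable[[str], bool]  # returns True if the edited code is acceptable


def pass_rate(edit_fn: Callable[[EditTask], str], tasks: List[EditTask]) -> float:
    """Fraction of tasks whose edited code passes the task's check."""
    if not tasks:
        return 0.0
    return sum(t.check(edit_fn(t)) for t in tasks) / len(tasks)


# --- Example usage with a toy debugging task and a stand-in "model" ---

def inc_check(code: str) -> bool:
    """Execute the edited code and verify the bug is fixed (trusted toy input only)."""
    namespace: dict = {}
    exec(code, namespace)
    return namespace["inc"](3) == 4


task = EditTask(
    task_type="debug",
    source_code="def inc(x): return x + 2",
    instruction="inc should add 1, not 2",
    check=inc_check,
)

# Placeholder for a real LLM call; here it just applies the obvious fix.
toy_model = lambda t: t.source_code.replace("x + 2", "x + 1")

print(pass_rate(toy_model, [task]))  # -> 1.0
```

In this sketch each task carries its own acceptance check (for example, unit tests), so the same pass-rate metric applies uniformly across debugging, translation, polishing, and requirement-switching items.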
