Oral Presentation
in
Workshop: ICLR 2025 Workshop on Bidirectional Human-AI Alignment

CHI Oral 2: Augmenting Image Annotation: A Human–LLM Collaborative Framework for Efficient Object Selection and Label Generation

HE ZHANG


Abstract:

Traditional image annotation relies heavily on human effort for both object selection and label assignment, making the process time-consuming and prone to efficiency loss as annotators fatigue over extended sessions. This paper introduces a novel framework that leverages the visual understanding capabilities of large language models (LLMs), particularly GPT, to assist annotation workflows. In our proposed approach, human annotators focus on selecting objects via bounding boxes, while the LLM autonomously generates relevant labels. This human–AI collaborative framework enhances annotation efficiency by reducing the cognitive and time burden on human annotators. By analyzing the system's performance across various types of annotation tasks, we demonstrate its ability to generalize to tasks such as object recognition, scene description, and fine-grained categorization. Our proposed framework highlights the potential of this approach to redefine annotation workflows, offering a scalable and efficient solution for large-scale data labeling in computer vision. Finally, we discuss how integrating LLMs into the annotation pipeline can advance bidirectional human–AI alignment, as well as the challenges of alleviating the "endless annotation" burden in the face of information overload by shifting some of the work to AI.
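The division of labor described above — the human draws bounding boxes, the model names their contents — can be sketched as a small pipeline. This is an illustrative sketch only, not the paper's implementation: the `labeler` callable stands in for a vision-LLM query (a real system would crop each region from the image and send it to the model), and `mock_labeler`, `annotate`, and the box coordinates are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A bounding box drawn by the human annotator: (x_min, y_min, x_max, y_max).
BBox = Tuple[int, int, int, int]

@dataclass
class Annotation:
    box: BBox
    label: str

def annotate(image_id: str,
             boxes: List[BBox],
             labeler: Callable[[str, BBox], str]) -> List[Annotation]:
    """Human supplies the boxes; the labeler (e.g. a vision LLM) names each region."""
    return [Annotation(box, labeler(image_id, box)) for box in boxes]

# Hypothetical stand-in for a GPT vision call, so the pipeline runs offline:
# it labels a region purely by its width, which a real labeler would not do.
def mock_labeler(image_id: str, box: BBox) -> str:
    return "dog" if box[2] - box[0] > 100 else "cat"

results = annotate("img_001.jpg",
                   [(0, 0, 150, 120), (200, 50, 260, 110)],
                   mock_labeler)
for ann in results:
    print(ann.box, ann.label)
```

Keeping the labeler behind a plain callable is one way to make the LLM swappable: the same human-drawn boxes can be re-labeled by a different model, or by a human fallback, without touching the selection step.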