Verification and Co-Alignment via Heterogeneous Consistency for Preference-Aligned LLM Annotations
Abstract
Large Language Models (LLMs) are increasingly expected to be culturally customizable and personally aligned for natural language understanding (NLU). However, existing methods, from supervised fine-tuning (SFT) to personalized RLHF and prompting, either require costly large-scale annotations or remain constrained by pretraining distributions. Moreover, acquiring annotations that reflect subjective, diverse, and evolving user preferences is both expensive and labor-intensive. To address these limitations, we propose \textit{\textbf{H}eterogeneous-\textbf{C}onsistency \textbf{C}o-Alignment} (\textbf{HCC}), a training-free annotation paradigm that pairs two heterogeneous models: an LLM, rich in knowledge yet often prone to overconfidence, and a task-specialized lightweight model guided by a small user-preference set, which together verify and co-align misaligned outputs over unlabeled corpora. For verification, HCC introduces the reference-free \textit{\textbf{C}onsistent}-\textit{\textbf{A}nd}-\textit{\textbf{I}nconsistent} (\textbf{CAI}) Ratio, an uncertainty signal derived from inter-model agreements (consistent samples) and disagreements (inconsistent samples) that determines when refinement is needed. For co-alignment, HCC employs a non-parametric, embedding-based preference-assignment scheme that recalibrates inconsistent samples according to user preferences. Across eight NLU datasets and both open- and closed-source LLMs, HCC consistently improves annotation quality and, on several tasks, even enables \textit{Llama-3-8B} to surpass \textit{GPT-3.5/4o} after co-alignment. Moreover, the CAI Ratio correlates strongly with accuracy and reliably tracks pre-/post-alignment gains, offering a reference-free signal for scaling preference-aligned annotation.
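To make the verification signal concrete, the following is a minimal sketch of one plausible instantiation of such a consistency ratio; the notation (unlabeled corpus $\mathcal{U}$, annotators $f_{\mathrm{LLM}}$ and $f_{\mathrm{small}}$, and the exact form of the ratio) is an illustrative assumption, not the paper's stated definition.
\[
\mathcal{C} = \{\, x \in \mathcal{U} : f_{\mathrm{LLM}}(x) = f_{\mathrm{small}}(x) \,\}, \qquad
\mathcal{I} = \mathcal{U} \setminus \mathcal{C}, \qquad
\mathrm{CAI} = \frac{|\mathcal{C}|}{|\mathcal{C}| + |\mathcal{I}|}.
\]
Under this reading, a low CAI value flags a batch whose inconsistent samples $\mathcal{I}$ should be routed to the co-alignment step, while a high value indicates the two models already agree and no refinement is needed.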