Oral
Fri Apr 25 12:30 AM -- 12:42 AM (PDT) @ Hall 1 Apex
More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
[OpenReview]
Oral
Fri Apr 25 12:42 AM -- 12:54 AM (PDT) @ Hall 1 Apex
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
[Slides]
[OpenReview]
Oral
Fri Apr 25 12:54 AM -- 01:06 AM (PDT) @ Hall 1 Apex
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
[OpenReview]
Oral
Fri Apr 25 01:06 AM -- 01:18 AM (PDT) @ Hall 1 Apex
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
[OpenReview]
Oral
Fri Apr 25 01:18 AM -- 01:30 AM (PDT) @ Hall 1 Apex
REEF: Representation Encoding Fingerprints for Large Language Models
[Slides]
[OpenReview]
Oral
Fri Apr 25 01:30 AM -- 01:42 AM (PDT) @ Hall 1 Apex
Rethinking Reward Modeling in Preference-based Large Language Model Alignment
[OpenReview]