(6 events)
Oral
Fri Apr 25 12:30 AM -- 12:42 AM (PDT) @ Hall 1 Apex
More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
Aaron J. Li · Satyapriya Krishna · Hima Lakkaraju
[ OpenReview ]
Oral
Fri Apr 25 12:42 AM -- 12:54 AM (PDT) @ Hall 1 Apex
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
Maojia Song · Shang Hong Sim · Rishabh Bhardwaj · Hai Leong Chieu · Navonil Majumder · Soujanya Poria
[ Slides ] [ OpenReview ]
Oral
Fri Apr 25 12:54 AM -- 01:06 AM (PDT) @ Hall 1 Apex
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
Yuheng Zhang · Dian Yu · Baolin Peng · Linfeng Song · Ye Tian · Mingyue Huo · Nan Jiang · Haitao Mi · Dong Yu
[ OpenReview ]
Oral
Fri Apr 25 01:06 AM -- 01:18 AM (PDT) @ Hall 1 Apex
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
Yantao Liu · Zijun Yao · Rui Min · Yixin Cao · Lei Hou · Juanzi Li
[ OpenReview ]
Oral
Fri Apr 25 01:18 AM -- 01:30 AM (PDT) @ Hall 1 Apex
REEF: Representation Encoding Fingerprints for Large Language Models
Jie Zhang · Dongrui Liu · Chen Qian · Linfeng Zhang · Yong Liu · Yu Qiao · Jing Shao
[ Slides ] [ OpenReview ]
Oral
Fri Apr 25 01:30 AM -- 01:42 AM (PDT) @ Hall 1 Apex
Rethinking Reward Modeling in Preference-based Large Language Model Alignment
Hao Sun · Yunyi Shen · Jean-Francois Ton
[ OpenReview ]