Sun 5:50 p.m. - 6:00 p.m. | Welcome and Opening Remarks (Opening Remarks) | Hua Shen · Joyce Chai · Yang Li · Antoine Bosselut
Sun 6:00 p.m. - 6:30 p.m. | Keynote 1 (Invited Talk) | Been Kim
Sun 6:30 p.m. - 7:00 p.m. | Keynote 2 (Invited Talk) | Frauke Kreuter
Sun 7:00 p.m. - 8:10 p.m. | Poster and Coffee Break (Poster Session)
Sun 8:10 p.m. - 8:40 p.m. | Keynote 3 (Invited Talk) | Hung-yi Lee
Sun 8:40 p.m. - 9:10 p.m. | Keynote 4 (Invited Talk) | Brad Myers
Sun 9:10 p.m. - 10:10 p.m. | Lunch Break
Sun 10:10 p.m. - 10:20 p.m. | CHI Oral 1: Policy Prototyping for LLMs: Pluralistic Alignment via Interactive and Collaborative Policymaking (Oral Presentation) | Kevin Feng
Sun 10:20 p.m. - 10:30 p.m. | CHI Oral 2: Augmenting Image Annotation: A Human–LLM Collaborative Framework for Efficient Object Selection and Label Generation (Oral Presentation) | He Zhang
Sun 10:30 p.m. - 11:00 p.m. | Keynote 5 (Invited Talk) | Dan Bohus
Sun 11:00 p.m. - 11:30 p.m. | Keynote 6 (Invited Talk) | Pavel Izmailov
Sun 11:30 p.m. - 11:40 p.m. | ICLR Oral 1: Scalably Solving Assistance Games (Oral Presentation) | Cassidy Laidlaw
Sun 11:40 p.m. - 11:50 p.m. | ICLR Oral 2: Preference Optimization For Concept Bottleneck Models (Oral Presentation) | Emiliano Penaloza
Sun 11:50 p.m. - 12:00 a.m. | ICLR Oral 3: Societal Impacts Research Requires Usage-Based Benchmarks for Creative Tasks (Oral Presentation) | Judy Shen
Mon 12:00 a.m. - 12:10 a.m. | ICLR Oral 4: InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models with Human Feedback (Oral Presentation) | Henry Zhao
Mon 12:10 a.m. - 1:00 a.m. | Poster and Coffee Break (Poster Session)
Mon 1:00 a.m. - 1:10 a.m. | ICLR Oral 5: Representational Alignment Supports Effective Teaching (Oral Presentation) | Ilia Sucholutsky
Mon 1:10 a.m. - 1:20 a.m. | ICLR Oral 6: PARSE-Ego4D: Toward Bidirectionally Aligned Action Recommendations for Egocentric Videos (Oral Presentation) | Steven Abreu
Mon 1:20 a.m. - 1:30 a.m. | ICLR Oral 7: AI-enhanced semantic feature norms for 786 concepts (Oral Presentation) | Siddharth Suresh
Mon 1:30 a.m. - 1:40 a.m. | ICLR Oral 8: SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities (Oral Presentation) | Fengqing Jiang
Mon 1:40 a.m. - 2:10 a.m. | Keynote 7 (Invited Talk) | Richard Ngo
Mon 2:10 a.m. - 2:50 a.m. | Panel Discussion with Experts (Panel Discussion) | Richard Ngo · Frauke Kreuter · Pavel Izmailov · Tammy Masterson
Mon 2:50 a.m. - 3:00 a.m. | Paper Award Announcement (Award Announcement) | Hua Shen · Yang Li · Joyce Chai · Antoine Bosselut
Mon 3:00 a.m. | Closing Remarks (Closing) | Hua Shen · Yang Li · Joyce Chai · Antoine Bosselut
- Preference Optimization For Concept Bottleneck Models (Poster & Oral) | Emiliano Penaloza · Tianyue Zhang · Laurent Charlin · Mateo Espinosa Zarlenga
- Learning From Diverse Experts: Behavior Alignment Through Multi-Objective Inverse Reinforcement Learning (Poster & Oral) | Jun-Jie Yang · Qian-You Zhang · Chia-Heng Hsu · Xi Liu · Ping-Chun Hsieh
- Probing Mechanical Reasoning in Large Vision Language Models (Poster & Oral) | Haoran Sun · Yijiang Li · Qingying Gao · Haiyun Lyu · Dezhi Luo · Hokin Deng
- The Human Visual System Can Inspire New Interaction Paradigms for LLMs (Poster & Oral) | Diana Robinson · Neil Lawrence
- AI-enhanced semantic feature norms for 786 concepts (Poster) | Siddharth Suresh · Kushin Mukherjee · Tyler Giallanza · Xizheng Yu · Mia Patil · Jonathan Cohen · Timothy Rogers
- Scalably Solving Assistance Games (Poster) | Cassidy Laidlaw · Eli Bronstein · Timothy Guo · Dylan Feng · Lukas Berglund · Justin Svegliato · Stuart Russell · Anca Dragan
- Patterns and Mechanisms of Contrastive Activation Engineering (Poster & Oral) | Yixiong Hao · Ayush Panda · Stepan Shabalin · Sheikh Abdur Raheem Ali
- Value Alignment in the Global South: A Multidimensional Approach to Norm Elicitation in Indian Contexts (Poster & Oral) | Atmadeep Ghoshal · Martim Brandao · Ruba Abu-Salma
- Active Human Feedback Collection via Neural Contextual Dueling Bandits (Poster & Oral) | Arun Verma · Xiaoqiang Lin · Zhongxiang Dai · Daniela Rus · Bryan Kian Hsiang Low
- CTRL-Rec: Controlling Recommender Systems With Natural Language (Poster & Oral) | Micah Carroll · Adeline Foote · Marcus Williams · Anca Dragan · W. Bradley Knox · Smitha Milli
- Vision Language Models See What You Want but not What You See (Poster & Oral) | Qingying Gao · Yijiang Li · Haiyun Lyu · Haoran Sun · Dezhi Luo · Hokin Deng
- Data-adaptive Safety Rules for Training Reward Models (Poster & Oral) | Xiaomin Li · Mingye Gao · Zhiwei Zhang · Fan · Weiyu Li
- D3PO: Preference-Based Alignment of Discrete Diffusion Models (Poster & Oral) | Umberto Borso · Davide Paglieri · Jude Wells · Tim Rocktaeschel
- SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities (Poster) | Fengqing Jiang · Zhangchen Xu · Yuetai Li · Luyao Niu · Zhen Xiang · Bo Li · Bill Yuchen Lin · Radha Poovendran
- Symmetry-Breaking Augmentations for Ad Hoc Teamwork (Poster & Oral) | Ravi Hammond · Dustin Craggs · Mingyu Guo · Jakob Foerster · Ian Reid
- A Sociotechnical Perspective on Aligning AI with Pluralistic Human Values (Poster & Oral) | Dalia Ali · Aysenur Kocak · Michèle Wieland · Dora Zhao · Allison Koenecke · Orestis Papakyriakopoulos
- Mitigating Societal Cognitive Overload in the Age of AI: Challenges and Directions (Poster & Oral) | Salem Lahlou
- Policy Prototyping for LLMs: Pluralistic Alignment via Interactive and Collaborative Policymaking (Poster) | Kevin Feng · Inyoung Cheong · Quan Chen · Amy Zhang
- A Benchmark for Scalable Oversight Mechanisms (Poster & Oral) | Abhimanyu Pallavi Sudhir · Jackson Kaunismaa · Arjun Panickssery
- From Intuition to Understanding: Using AI Peers to Overcome Physics Misconceptions (Poster & Oral) | Ruben Weijers · Denton Wu · Hannah Betts · Yuxiang Guan · Vidya Sujaya · Kushal Dev · Reihaneh Rabbany · Jean-François Godbout · Kellin Pelrine · Tamara Jacod · William Delooze · Ying Wu
- Understanding (Un)Reliability of Steering Vectors in Language Models (Poster & Oral) | Joschka Braun · Carsten Eickhoff · David Krueger · Seyed Ali Bahrainian · Dmitrii Krasheninnikov
- Processing, Priming, Probing: Human Interventions for Explainability Alignment (Poster & Oral) | Kenza Amara
- Aligning LLMs with Domain Invariant Reward Models (Poster & Oral) | David Wu · Sanjiban Choudhury
- Drift: Efficient Implicit Personalization of Large Language Models (Poster & Oral) | Minbeom Kim · Kang-il Lee · Seongho Joo · Hwaran Lee · Kyomin Jung
- Bidirectional Alignment for Inclusive Narrative Generation (Poster & Oral) | Ken Kawamura
- Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment (Poster & Oral) | Yifan Zhang · Ge Zhang · Yue Wu · Kangping Xu · Quanquan Gu
- Exploring Persona-dependent LLM Alignment for the Moral Machine Experiment (Poster & Oral) | Jiseon Kim · Jea Kwon · Luiz Felipe Vecchietti · Alice Oh · Meeyoung Cha
- Augmenting Image Annotation: A Human–LLM Collaborative Framework for Efficient Object Selection and Label Generation (Poster) | He Zhang · Xinyi Fu · John Carroll
- CoPL: Collaborative Preference Learning for Personalizing LLMs (Poster & Oral) | Youngbin Choi · Seunghyuk Cho · Minjong Lee · MoonJeong Park · Yesong Ko · Jungseul Ok · Dongwoo Kim
- Broaden your SCOPE! Efficient Conversation Planning for LLMs using Semantic Space (Poster & Oral) | Zhiliang Chen · Xinyuan Niu · Chuan Sheng Foo · Bryan Kian Hsiang Low
- Sycophancy Claims about Language Models: The Missing Human-in-the-Loop (Poster & Oral) | Jan Batzner · Volker Stocker · Stefan Schmid · Gjergji Kasneci
- The Alignment Trilemma: A Theoretical Perspective on Recursive Misalignment and Human-AI Adaptation Dynamics (Poster & Oral) | Tarun Raheja · Nilay Pochhi
- Superalignment with Dynamic Human Values (Poster & Oral) | Florian Mai · David Kaczér · Nicholas Corrêa · Lucie Flek
- InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models with Human Feedback (Poster) | Henry Zhao · Wenqi Pei · Yifei Tao · Haiyang Mei · Mike Zheng Shou
- Human Alignment: How Much We Adapt to LLMs? (Poster & Oral) | Cazalet Tanguy · Ruben Janssens · Tony Belpaeme · Joni Dambre
- Outlier-Aware Preference Optimization for Large Language Models (Poster) | Pragya Srivastava · Sai Nalli · Amit Jayant Deshpande · Amit Sharma
- Towards LVLM-Aided Alignment of Task-Specific Vision Models (Poster & Oral) | Alexander Koebler · Christian Greisinger · Jan Paulus · Ingo Thon · Florian Buettner
- Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective (Poster & Oral) | Jiawei Huang · Bingcong Li · Christoph Dann · Niao He
- TRIG-Bench: A Benchmark for Text-Rich Image Grounding (Poster & Oral) | Ming Li · Ruiyi Zhang · Jian Chen · Tianyi Zhou
- Envision Human-AI Perceptual Alignment from a Multimodal Interaction Perspective (Poster & Oral) | Shu Zhong · Marianna Obrist
- Monitoring LLM Agents for Sequentially Contextual Harm (Poster & Oral) | Chen Yueh-Han · Nitish Joshi · Yulin Chen · He He · Rico Angell
- Representational Difference Clustering (Poster & Oral) | Neehar Kondapaneni · Emily Gu · Oisin Mac Aodha · Pietro Perona
- Position: Interpretability is a Bidirectional Communication Problem (Poster & Oral) | Kola Ayonrinde
- PILAF: Optimal Human Preference Sampling for Reward Modeling (Poster & Oral) | Yunzhen Feng · Ariel Kwiatkowski · Kunhao Zheng · Julia Kempe · Yaqi Duan
- AI Systematically Rewires the Flow of Ideas (Poster & Oral) | Zhonghao He · Tianyi Qiu · Tao Lin · Moshe Glickman · Atoosa Kasirzadeh · John Wihbey · Max Kleiman-Weiner
- PARSE-Ego4D: Toward Bidirectionally Aligned Action Recommendations for Egocentric Videos (Poster) | Steven Abreu · Tiffany Do · Karan Ahuja · Eric Gonzalez · Lee Payne · Daniel McDuff · Mar Gonzalez-Franco
- Online Learning and Equilibrium Computation with Ranking Feedback (Poster & Oral) | Mingyang Liu · Yongshan Chen · Zhiyuan Fan · Gabriele Farina · Asuman Ozdaglar · Kaiqing Zhang
- Societal Alignment Frameworks Can Improve LLM Alignment (Poster & Oral) | Karolina Stanczak · Nicholas Meade · Mehar Bhatia · Hattie Zhou · Konstantin Böttinger · Jeremy Barnes · Jason Stanley · Jessica Montgomery · Richard Zemel · Nicolas Papernot · Nicolas Chapados · Denis Therien · Timothy Lillicrap · Ana Marasovic · Sylvie Delacroix · Gillian Hadfield · Siva Reddy
- TraCeS: Trajectory Based Credit Assignment For Safe Reinforcement Learning (Poster & Oral) | Siow Meng Low · Akshat Kumar
- ValueMap: Mapping Crowdsourced Human Values to Computational Scores for Bi-directional Alignment (Poster & Oral) | Priya DCosta · Rupkatha Hira
- Observability of Latent States in Generative AI Models (Poster & Oral) | Tian Yu Liu · Stefano Soatto · Matteo Marchi · Pratik A Chaudhari · Paulo Tabuada
- Rethinking Anti-Misinformation AI (Poster & Oral) | Vidya Sujaya · Kellin Pelrine · Andreea Musulan · Reihaneh Rabbany
- Negotiative Alignment: An interactive approach to human-AI co-adaptation for clinical applications (Poster & Oral) | Florence X Doo · Nikhil Shah · Pranav Kulkarni · Vishwa Parekh · Heng Huang
- Societal Impacts Research Requires Usage-Based Benchmarks for Creative Tasks (Poster) | Judy Shen · Carlos Guestrin
- Inference-time Alignment in Continuous Space (Poster & Oral) | Yige Yuan · Teng Xiao · Li Yunfan · Bingbing Xu · Shuchang Tao · Yunqi Qiu · Huawei Shen · Xueqi Cheng
- SWEPO: Simultaneous Weighted Preference Optimization for Group Contrastive Alignment (Poster & Oral) | Taneesh Gupta · Rahul Madhavan · Xuchao Zhang · Chetan Bansal · Saravanakumar Rajmohan
- Investigating Alignment Signals in Initial Token Representations (Poster & Oral) | Carl Rosenblatt
- Rethinking AI cultural alignment (Poster & Oral) | Michal Bravansky · Filip Trhlík · Fazl Barez
- A Roadmap for Human-Agent Moral Alignment: Integrating Pre-defined Intrinsic Rewards and Learned Reward Models (Poster & Oral) | Elizaveta Tennant · Stephen Hailes · Mirco Musolesi
- Representational Alignment Supports Effective Teaching (Poster) | Ilia Sucholutsky · Katherine Collins · Maya Malaviya · Nori Jacoby · Weiyang Liu · Theodore Sumers · Michalis Korakakis · Umang Bhatt · Mark Ho · Joshua B Tenenbaum · Bradley Love · Zachary Pardos · Adrian Weller · Thomas L. Griffiths
- We Shape AI, and Thereafter AI Shape Us: Humans Align with AI through Social Influences (Poster & Oral) | Jingshu Li · Tianqi Song · Beichen Xue · Yi-Chieh Lee
- The Lock-in Hypothesis: Stagnation by Algorithm (Poster & Oral) | Tianyi Qiu · Zhonghao He · Tejasveer Chugh · Max Kleiman-Weiner
- Multi-Objective Probabilistic Preference Learning with Soft and Hard Bounds (Poster & Oral) | Edward Chen · Sang Truong · Natalie Dullerud · Sanmi Koyejo · Carlos Guestrin
- Cooperative Agency-Centered LLMs (Poster & Oral) | Iyadunni J. Adenuga
- A Pilot Study of Weak-to-Strong Generalization in Safety, Toxicity, and Legal Reasoning (Poster & Oral) | Ruimeng Ye · Yang Xiao · Bo Hui
- Moral Alignment for LLM Agents (Poster & Oral) | Elizaveta Tennant · Stephen Hailes · Mirco Musolesi
- Shared Similarity Between Humans and Chatbots: Exploring Human Willingness to Seek Social Support From Chatbots (Poster & Oral) | Zicheng Zhu · Tianqi Song · Jefferson Lim · Chi-Lan Yang · Yi-Chieh Lee
- Addressing and Visualizing Misalignments in Human Task-Solving Trajectories (Poster & Oral) | Sejin Kim · Hosung Lee · Sundong Kim
- Vision Language Models Know Law of Conservation without Understanding More-or-Less (Poster) | Dezhi Luo · Haiyun Lyu · Qingying Gao · Haoran Sun · Yijiang Li · Hokin Deng
- Order Independence With Finetuning (Poster) | Katrina Brown · Reid McIlroy-Young