Skip to yearly menu bar Skip to main content


Poster
in
Workshop: ICLR 2025 Workshop on Bidirectional Human-AI Alignment

Exploring Persona-dependent LLM Alignment for the Moral Machine Experiment

Jiseon Kim · Jea Kwon · Luiz Felipe Vecchietti · Alice Oh · Meeyoung Cha


Abstract:

Deploying large language models (LLMs) with agencies in real-world applications raises critical questions about how these models will behave. In particular, how will their decisions align with humans when faced with moral dilemmas? Here, we study the alignment between LLM-driven decisions and human judgment in various contexts of the moral machine experiment, including personas reflecting different sociodemographics. Our findings reveal that the moral decisions of LLMs vary substantially by persona, showing greater shifts in decision boundaries than humans, even for critical tasks. We also report an interesting partisan sorting phenomenon, where political persona predominates the direction and degree of LLM decisions. We discuss the ethical implications and risks associated with deploying these models in applications that involve moral decisions.

Chat is not available.