GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models

Haibo Jin · Ruoxi Chen · Andy Zhou · Yang Zhang · Haohan Wang

Abstract
