Publications

You can also find my publications on my Google Scholar profile.

Stronger Universal and Transfer Attacks by Suppressing Refusals

Published in NAACL 2025 (abridged version accepted at NeurIPS SafeGenAI 2024), 2025

A novel algorithm that leverages a model's refusal representation to automatically generate jailbreak suffixes for LLMs.

Recommended citation: Huang, D., Shah, A., Araujo, A., Wagner, D., & Sitawarin, C. (2025). Stronger universal and transfer attacks by suppressing refusals. NAACL 2025. https://shavidan123.github.io/files/NAACL__Stronger_Universal_and_Transferable_Attacks_by_Refusal_Suppression.pdf