Posts by Collection

publications

Stronger Universal and Transfer Attacks by Suppressing Refusals

Published in NAACL 2025 (abridged version accepted at NeurIPS SafeGenAI 2024), 2025

A novel algorithm that leverages a model's refusal representation to automatically generate jailbreak suffixes for LLMs

Recommended citation: Huang, D., Shah, A., Araujo, A., Wagner, D., & Sitawarin, C. (2025). Stronger universal and transfer attacks by suppressing refusals. NAACL 2025. https://shavidan123.github.io/files/NAACL__Stronger_Universal_and_Transferable_Attacks_by_Refusal_Suppression.pdf

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.