Posts by Collection

publications

Stronger Universal and Transfer Attacks by Suppressing Refusals

Published in NAACL 2025 (abridged version accepted at NeurIPS SafeGenAI 2024), 2025

A novel algorithm that leverages a model's refusal representation to automatically generate jailbreak suffixes for LLMs.

Recommended citation: Huang, D., Shah, A., Araujo, A., Wagner, D., & Sitawarin, C. (2025). Stronger universal and transfer attacks by suppressing refusals. NAACL 2025. https://shavidan123.github.io/files/NAACL__Stronger_Universal_and_Transferable_Attacks_by_Refusal_Suppression.pdf

Rapid Poison: Practical Poisoning Attacks Against the Rapid Response Framework

Published in ICML 2026 (Spotlight), 2026

Practical prompt-injection-based poisoning attacks against the Rapid Response jailbreak detection framework, including a novel Omission Attack that flips ~90% of target labels with only a 1% poisoning rate.

Recommended citation: Huang, D., Chang, J., Shah, A., Mittal, P., & Sitawarin, C. (2026). Rapid Poison: Practical poisoning attacks against the Rapid Response framework. ICML 2026. https://shavidan123.github.io/files/Rapid_Poison__ICLR_Workshop_ (3).pdf

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.