Publications

You can also find my publications on my Google Scholar profile.

Stronger Universal and Transfer Attacks by Suppressing Refusals

Published in NAACL 2025 (an abridged version was accepted at NeurIPS SafeGenAI 2024)

A novel algorithm that leverages models' refusal representations to automatically generate jailbreaking suffixes for LLMs.

Recommended citation: Huang, D., Shah, A., Araujo, A., Wagner, D., & Sitawarin, C. (2025). Stronger universal and transfer attacks by suppressing refusals. NAACL 2025. https://shavidan123.github.io/files/NAACL__Stronger_Universal_and_Transferable_Attacks_by_Refusal_Suppression.pdf

Efficient Mitigation of Bus Bunching through Setter-Based Curriculum Learning

Published in arXiv, 2023

This paper served as the final project for CS285 and was not submitted to any conference for review.

Recommended citation: Shah, A., Tran, D., & Tang, Y. (2023). Efficient mitigation of bus bunching through setter-based curriculum learning. http://shavidan123.github.io/files/CS285_Bus_Bunching_Final_Project.pdf