On-Policy Consistency Training Improves LLM Safety with Minimal Capability Degradation
Published in arXiv, 2026
Recommended citation: Han, A., Fujimoto, K., Shah, A., Nguyen, K., Xu, K., Yueh-Han, C., Sucholutsky, I., & Angell, R. (2026). On-policy consistency training improves LLM safety with minimal capability degradation. arXiv preprint arXiv:2605.21834. https://shavidan123.github.io/files/OPCT_On_Policy_Consistency_Training.pdf