Mar 4, 2024
UC San Diego computer scientists find a better method to detect and prevent toxic prompts cloaked in benign language in large language models.