The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
OpenAI Blog
April 19, 2024
Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow adversaries to overwrite a model's original instructions with their own malicious prompts.
Verticals
airesearch
Originally published on OpenAI Blog on 4/19/2024