Deliberative alignment: reasoning enables safer language models

OpenAI Blog
December 20, 2024
Introducing our new alignment strategy for o1 models, which are directly taught safety specifications and how to reason over them.
Originally published on OpenAI Blog on 12/20/2024