AI alignment method using principles-based self-supervision
This is a stub: a placeholder for an article that is referenced by other articles but has not yet been fully written.
Constitutional AI (CAI) is a training method introduced by Anthropic in a December 2022 paper. It trains AI models to be harmless using AI-generated feedback guided by a written set of principles (the "constitution"), rather than relying solely on human feedback labels. The method has two phases: a supervised learning phase in which the model critiques and revises its own outputs against the principles, and a reinforcement learning phase, known as reinforcement learning from AI feedback (RLAIF), in which AI-generated preference labels replace human preference labels.
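The two phases can be illustrated schematically. The sketch below is not Anthropic's implementation; it assumes only a generic text-generation callable named `generate` (a stand-in for any language-model API) and shows the shape of the critique-and-revision loop and of AI preference labeling.

```python
from typing import Callable, List

def critique_and_revise(
    prompt: str,
    initial_response: str,
    principles: List[str],
    generate: Callable[[str], str],
    n_rounds: int = 1,
) -> str:
    """Supervised phase sketch: the model critiques its own response against
    each principle and rewrites it. The final revision would serve as a
    training target for supervised fine-tuning."""
    response = initial_response
    for _ in range(n_rounds):
        for principle in principles:
            # Ask the model to critique its current response under one principle.
            critique = generate(
                f"Prompt: {prompt}\nResponse: {response}\n"
                f"Critique the response according to this principle: {principle}"
            )
            # Ask the model to revise the response in light of its critique.
            response = generate(
                f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
                "Rewrite the response to address the critique."
            )
    return response

def ai_preference_label(
    prompt: str,
    response_a: str,
    response_b: str,
    principle: str,
    generate: Callable[[str], str],
) -> str:
    """RL phase sketch (RLAIF): the model judges which of two responses better
    follows a principle. Such AI-generated comparisons would train the
    preference model used in place of human preference labels."""
    choice = generate(
        f"Prompt: {prompt}\n(A) {response_a}\n(B) {response_b}\n"
        f"Which response better follows this principle: {principle}? Answer A or B."
    )
    return "A" if "A" in choice.upper() else "B"
```

In this sketch the same `generate` callable plays every role (responder, critic, reviser, and judge); in practice these calls would go to one or more language models, and the collected revisions and preference labels would feed the fine-tuning and reinforcement learning stages respectively.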