AI alignment method using principles-based self-supervision
This is a stub: a placeholder for an article that is referenced by other articles but has not yet been fully written.
Constitutional AI (CAI) is a training method introduced by Anthropic in a December 2022 paper. It trains AI models to be harmless using AI-generated feedback guided by a written set of principles (the "constitution"), rather than relying solely on human feedback labels. The method has two phases: a supervised learning phase in which the model critiques and revises its own outputs against the principles, and a reinforcement learning phase, known as reinforcement learning from AI feedback (RLAIF), in which AI-generated preference labels replace human preference labels.
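The two phases can be illustrated schematically. The sketch below is not Anthropic's implementation; it assumes only a generic text-generation callable named `generate` (a stand-in for any language-model API) and shows the shape of the critique-and-revision loop and of AI preference labeling.

```python
from typing import Callable, List

def critique_and_revise(
    prompt: str,
    initial_response: str,
    principles: List[str],
    generate: Callable[[str], str],
    n_rounds: int = 1,
) -> str:
    """Supervised phase sketch: the model critiques its own response against
    each principle and rewrites it. The final revision would serve as a
    training target for supervised fine-tuning."""
    response = initial_response
    for _ in range(n_rounds):
        for principle in principles:
            # Ask the model to critique its current response under one principle.
            critique = generate(
                f"Prompt: {prompt}\nResponse: {response}\n"
                f"Critique the response according to this principle: {principle}"
            )
            # Ask the model to revise the response in light of its critique.
            response = generate(
                f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
                "Rewrite the response to address the critique."
            )
    return response

def ai_preference_label(
    prompt: str,
    response_a: str,
    response_b: str,
    principle: str,
    generate: Callable[[str], str],
) -> str:
    """RL phase sketch (RLAIF): the model judges which of two responses better
    follows a principle. Such AI-generated comparisons would train the
    preference model used in place of human preference labels."""
    choice = generate(
        f"Prompt: {prompt}\n(A) {response_a}\n(B) {response_b}\n"
        f"Which response better follows this principle: {principle}? Answer A or B."
    )
    return "A" if "A" in choice.upper() else "B"
```

In this sketch the same `generate` callable plays every role (responder, critic, reviser, and judge); in practice these calls would go to one or more language models, and the collected revisions and preference labels would feed the fine-tuning and reinforcement learning stages respectively.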