Research field focused on ensuring AI systems act in accordance with human values
This article is a stub: a placeholder for an article that is referenced by other articles but has not yet been fully written.
AI alignment is a subfield of AI safety research concerned with ensuring that artificial intelligence systems pursue goals and exhibit behaviors consistent with human intentions and values. The field addresses problems including reward hacking, specification gaming, and value misalignment, and encompasses techniques such as reinforcement learning from human feedback (RLHF), Constitutional AI, and interpretability research.
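Since the paragraph names RLHF as a core alignment technique, a brief sketch of its first stage may help. In RLHF, a reward model is first fit to human preference data, typically with a Bradley-Terry pairwise loss over preferred/rejected response pairs; a policy is then optimized against that reward model with reinforcement learning (e.g., PPO). The following is a minimal, self-contained sketch of the reward-modeling step only, using a hypothetical linear reward model over synthetic feature vectors rather than a real language model:

```python
import numpy as np

# Hypothetical toy data: each candidate response is a feature vector, and
# simulated human raters compared pairs and recorded which they preferred.
rng = np.random.default_rng(0)
dim = 4
true_w = rng.normal(size=dim)                # latent "human preference" direction
responses = rng.normal(size=(200, 2, dim))   # 200 pairs of candidate responses
# Label is 1.0 when the first response in a pair is preferred.
prefs = (responses[:, 0] @ true_w > responses[:, 1] @ true_w).astype(float)

# Reward model: a linear score r(x) = w . x, trained with the Bradley-Terry
# pairwise loss  -log sigmoid(r(preferred) - r(rejected)).
w = np.zeros(dim)
lr = 0.1
for _ in range(500):
    diff = (responses[:, 0] - responses[:, 1]) @ w   # r(a) - r(b) per pair
    p = 1.0 / (1.0 + np.exp(-diff))                  # model's P(a preferred over b)
    # Gradient of the logistic loss w.r.t. w, averaged over all pairs.
    grad = ((p - prefs)[:, None] * (responses[:, 0] - responses[:, 1])).mean(axis=0)
    w -= lr * grad

agreement = np.mean((responses[:, 0] @ w > responses[:, 1] @ w) == prefs.astype(bool))
print(f"reward model agrees with human labels on {agreement:.0%} of pairs")
```

The sketch also hints at why reward hacking arises: the learned reward model is only a proxy fit to finite preference data, so a policy optimized hard against it can find responses that score well under the proxy while diverging from what humans actually intended.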