Research field focused on ensuring AI systems act in accordance with human values
This article is a stub: a placeholder for an article that is referenced by other articles but has not yet been fully written.
AI alignment is a subfield of AI safety research concerned with ensuring that artificial intelligence systems pursue goals and exhibit behaviors consistent with human intentions and values. The field addresses problems including reward hacking, specification gaming, and value misalignment, and encompasses techniques such as reinforcement learning from human feedback (RLHF), Constitutional AI, and interpretability research.
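Since the paragraph names RLHF as a core alignment technique, a brief sketch of its first stage may help. In RLHF, a reward model is first fit to human preference data, typically with a Bradley-Terry pairwise loss over preferred/rejected response pairs; a policy is then optimized against that reward model with reinforcement learning (e.g., PPO). The following is a minimal, self-contained sketch of the reward-modeling step only, using a hypothetical linear reward model over synthetic feature vectors rather than a real language model:

```python
import numpy as np

# Hypothetical toy data: each candidate response is a feature vector, and
# simulated human raters compared pairs and recorded which they preferred.
rng = np.random.default_rng(0)
dim = 4
true_w = rng.normal(size=dim)                # latent "human preference" direction
responses = rng.normal(size=(200, 2, dim))   # 200 pairs of candidate responses
# Label is 1.0 when the first response in a pair is preferred.
prefs = (responses[:, 0] @ true_w > responses[:, 1] @ true_w).astype(float)

# Reward model: a linear score r(x) = w . x, trained with the Bradley-Terry
# pairwise loss  -log sigmoid(r(preferred) - r(rejected)).
w = np.zeros(dim)
lr = 0.1
for _ in range(500):
    diff = (responses[:, 0] - responses[:, 1]) @ w   # r(a) - r(b) per pair
    p = 1.0 / (1.0 + np.exp(-diff))                  # model's P(a preferred over b)
    # Gradient of the logistic loss w.r.t. w, averaged over all pairs.
    grad = ((p - prefs)[:, None] * (responses[:, 0] - responses[:, 1])).mean(axis=0)
    w -= lr * grad

agreement = np.mean((responses[:, 0] @ w > responses[:, 1] @ w) == prefs.astype(bool))
print(f"reward model agrees with human labels on {agreement:.0%} of pairs")
```

The sketch also hints at why reward hacking arises: the learned reward model is only a proxy fit to finite preference data, so a policy optimized hard against it can find responses that score well under the proxy while diverging from what humans actually intended.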