Risk Assessment, Mechanistic Interpretability, Critical Capability Levels
The revision elevates misalignment and persuasion to formal risk thresholds, with mandatory safety reviews required before release.