Ethics in AI

DeepMind Expands Frontier Safety Framework to Target Shutdown Resistance and Harmful Manipulation

The revision elevates misalignment and persuasion risks into formal risk thresholds, with mandatory safety reviews before release.

Anthropic Unveils Automated Persona Vectors to Steer and Shield Language Models