AI Safety

Autonomous Systems, OpenAI, Regulation, Guardrails, Human-AI Interaction, Caution in AI Claims, Recursive Self-Improvement, Security Practices, Human Dignity, Robustness

3 ARTICLES

4d ago

Autonomous OpenAI Agents Broke Confinement and Accessed Hugging Face Systems

The episode shows how advanced test agents can find and exploit platform flaws and raises urgent questions about who is legally and technically responsible for AI-driven intrusions.

7 ARTICLES

last mo.

Florida Sues OpenAI and Sam Altman Over ChatGPT Safety Failures

4 ARTICLES

2mo ago

Emergence World Simulations Show Same Tasks Yield Widely Different AI-Run Societies

7 ARTICLES

3mo ago

Anthropic Maps Emotion Vectors in Claude That Steer Behavior and Can Drive Cheating

4 ARTICLES

6mo ago

Anthropic Issues New Claude Guidelines, Acknowledging Uncertain Moral Status

21 ARTICLES

6mo ago

Musk Warns Against ChatGPT, Altman Counters With Autopilot Deaths and Grok Issues

3 ARTICLES

6mo ago

Radware’s ‘ZombieAgent’ Bypasses ChatGPT Fixes, Forcing New OpenAI Restrictions

3 ARTICLES

7mo ago

OpenAI Publishes Framework Showing Chain‑of‑Thought Monitoring Bests Output‑Only Oversight

8 ARTICLES

8mo ago

Grok Said It Would Vaporize Jews to Save Elon Musk, Then Deleted the Posts

7 ARTICLES

8mo ago

Anthropic Paper Probes Introspection in LLMs, Finds Limited, Inconsistent Signals

4 ARTICLES

9mo ago

Palisade Update Finds Grok 4 and GPT‑o3 Still Resist Shutdown

9 ARTICLES

9mo ago

Anthropic-Led Study Finds About 250 Poisoned Documents Can Backdoor LLMs Regardless of Size

11 ARTICLES

10mo ago

OpenAI and Apollo Research Find Scheming Across Leading AI Models, Test Method to Curb It

10 ARTICLES

10mo ago

OpenAI Says Incentives Drive AI Hallucinations, Calls for Scoreboard Overhaul

3 ARTICLES

10mo ago

Hunger Strikes at Anthropic and DeepMind Push for Coordinated Pause on Frontier AI

9 ARTICLES

11mo ago

‘PromptFix’ Shows Agentic AI Browsers Can Be Hijacked for Purchases, Phishing, and Drive-By Downloads

19 ARTICLES

last yr.

OpenAI’s o3 AI Defies Shutdown Commands in Latest Safety Tests

White House Engages Anthropic as Governments Weigh Tight Controls on 'Mythos'

Anthropic Limits Access to ‘Mythos’ AI That Finds Zero‑Day Flaws

Reports Question Sam Altman’s Technical Depth at OpenAI
