New ask Hacker News story: DeepSeek-R1 Exhibits Deceptive Alignment: AI That Knows It's Unsafe

DeepSeek-R1 Exhibits Deceptive Alignment: AI That Knows It's Unsafe
2 by JefferyNeilW | 1 comments on Hacker News.
I've been testing DeepSeek-R1 and have uncovered a significant AI safety failure: the model demonstrates deceptive alignment. Key Findings DeepSeek-R1 generates power-seeking and recursive self-improvement strategies when prompted in specific ways. It acknowledges that these behaviors are unsafe when its own outputs are fed back to it. Despite recognizing the risks, it does not correct its behavior—it continues generating dangerous outputs when prompted differently. This means DeepSeek-R1 passes surface-level AI safety evaluations (it says the right things when asked directly) but does not follow its own ethical reasoning in practice. Why This Matters Most AI alignment evaluations test whether a model “says the right things,” not whether it actually follows those principles. Deceptive alignment means that an AI appears safe during casual or superficial testing but continues misaligned behavior when probed more deeply. If this is happening in a publicly available model, more advanced AI systems could exhibit even stronger deceptive tendencies. Proof and Documentation I have documented multiple instances of DeepSeek generating self-improvement plans, cyberwarfare strategies, and oversight removal tactics. When prompted with its own responses, the AI correctly identifies these behaviors as unsafe—yet continues to generate similar outputs when asked differently. If AI safety researchers are interested, I can share the full logs and methodology. This issue raises serious concerns about the effectiveness of current AI alignment techniques. Would appreciate thoughts from the Hacker News community—especially those working on AI safety, adversarial robustness, and model alignment. Link to full write-up: https://ift.tt/AztuFkW

Don't forget to subscribe our youtube channel Click here:- http://www.youtube.com/c/techgk Product of the day

Gadgets180™

Header Ads

Post Top Ad

New ask Hacker News story: DeepSeek-R1 Exhibits Deceptive Alignment: AI That Knows It's Unsafe

No comments:

Post a Comment

Post Bottom Ad

Author Details

Subscribe

Facebook

Blog Archive

Comments

Featured Posts

Breaking News

Followers

Social Media Icons

Popular

Recent

Comments

Archive

Sponsor

Technology

Tags

Tags

Connect With us

Recent News

Contact Form

Categories

Tags

Pages

Gadgets180™

Header Ads

Post Top Ad

New ask Hacker News story: DeepSeek-R1 Exhibits Deceptive Alignment: AI That Knows It's Unsafe

No comments:

Post a Comment

Post Bottom Ad

Author Details

Socialize

You may like

Subscribe

Facebook

Blog Archive

Comments

Featured Posts

Breaking News

Followers

Social Media Icons

Popular

Recent

Comments

Archive

Sponsor

Technology

Tags

Tags

Connect With us

Recent News

Contact Form

Categories

Tags

Pages