{"id":99632,"date":"2025-09-19T06:58:05","date_gmt":"2025-09-19T06:58:05","guid":{"rendered":"https:\/\/neclink.com\/index.php\/2025\/09\/19\/openais-research-on-ai-models-deliberately-lying-is-wild\/"},"modified":"2025-09-19T06:58:05","modified_gmt":"2025-09-19T06:58:05","slug":"openais-research-on-ai-models-deliberately-lying-is-wild","status":"publish","type":"post","link":"https:\/\/neclink.com\/index.php\/2025\/09\/19\/openais-research-on-ai-models-deliberately-lying-is-wild\/","title":{"rendered":"OpenAI\u2019s research on AI models deliberately lying is wild"},"content":{"rendered":"<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">Every now and then, researchers at the biggest tech companies drop a bombshell. There was the time Google said its <a href=\"https:\/\/techcrunch.com\/2024\/12\/10\/google-says-its-new-quantum-chip-indicates-that-multiple-universes-exist\/\">latest quantum chip<\/a> indicated multiple universes exist. Or when Anthropic gave its AI agent Claudius a snack\u00a0vending machine to run and <a href=\"https:\/\/techcrunch.com\/2025\/06\/28\/anthropics-claude-ai-became-a-terrible-business-owner-in-experiment-that-got-weird\/\" target=\"_blank\" rel=\"noreferrer noopener\">it went amok, calling security on people<\/a> and insisting it was human. \u00a0<\/p>\n<p class=\"wp-block-paragraph\">This week, it was OpenAI\u2019s turn to raise our collective eyebrows. 
<\/p>\n<p class=\"wp-block-paragraph\">On Monday, OpenAI released research explaining <a href=\"https:\/\/openai.com\/index\/detecting-and-reducing-scheming-in-ai-models\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">how it\u2019s stopping AI models from \u201cscheming.\u201d<\/a> It\u2019s a practice in which an \u201cAI behaves one way on the surface while hiding its true goals,\u201d OpenAI <a href=\"https:\/\/x.com\/OpenAI\/status\/1968361701784568200\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">defined in its tweet<\/a> about the research.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">In the\u00a0paper, conducted with Apollo Research, researchers went a bit further, likening AI scheming to a human stockbroker breaking the law to make as much money as possible. The researchers, however, argued that most AI \u201cscheming\u201d wasn\u2019t that harmful. \u201cThe most common failures involve simple forms of deception \u2014 for instance, pretending to have completed a task without actually doing so,\u201d they wrote.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">The paper was mostly published to show that \u201cdeliberative alignment\u201d\u00a0\u2014 the anti-scheming technique they were testing \u2014 worked well.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">But\u00a0it also explained that AI developers haven\u2019t figured out a way to train their models not to scheme. 
That\u2019s because such training could actually teach the model how to scheme even better to avoid being detected.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u201cA major failure mode of attempting to \u2018train out\u2019 scheming is simply teaching the model to scheme more carefully and covertly,\u201d the researchers wrote.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Perhaps the most astonishing part is that, if a model understands that it\u2019s\u00a0being tested, it can pretend it\u2019s not scheming just to pass the test, even if it is still scheming. \u201cModels often become more aware that they are being evaluated. This situational awareness can itself reduce scheming, independent of genuine alignment,\u201d the researchers wrote.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">It\u2019s not news that AI models will lie. By now most of us have experienced AI hallucinations, or the model confidently giving an answer to a prompt that simply isn\u2019t true. But hallucinations are basically presenting guesswork with confidence, as OpenAI research released <a href=\"https:\/\/openai.com\/index\/why-language-models-hallucinate\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">earlier this month<\/a> documented.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Scheming is something else. It\u2019s deliberate. 
\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Even this revelation \u2014 that a model will deliberately mislead humans \u2014 isn\u2019t new.\u00a0Apollo Research first <a href=\"https:\/\/www.apolloresearch.ai\/research\/scheming-reasoning-evaluations\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">published a paper in December<\/a> documenting how five models schemed when they were given instructions to achieve a goal\u00a0\u201cat all costs.\u201d\u00a0<\/p>\n<p class=\"wp-block-paragraph\">The news here is actually good news: The researchers saw significant reductions in scheming by using \u201cdeliberative alignment.\u201d That technique involves teaching the model an \u201canti-scheming specification\u201d and then having the model review it before acting. It\u2019s a bit like making little kids repeat the rules\u00a0before allowing them to play.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">OpenAI researchers insist that the lying they\u2019ve caught with their own models, or even with ChatGPT, isn\u2019t that serious. As OpenAI\u2019s co-founder Wojciech Zaremba told TechCrunch\u2019s Maxwell Zeff about this research: \u201cThis work has been done in the simulated environments, and we think it represents future use cases. However,\u00a0today, we haven\u2019t seen this kind of consequential scheming in our production traffic. Nonetheless, it is well known that there are forms of deception in ChatGPT. You might ask it to implement some website, and it might tell you, \u2018Yes, I did a great job.\u2019 And that\u2019s just the lie. There are some petty forms of deception that we still need to address.\u201d<\/p>\n<p class=\"wp-block-paragraph\">The fact that AI models from multiple players intentionally deceive humans is, perhaps, understandable. 
They were built by humans, to mimic humans, and (synthetic data aside) for the most part trained on data produced by humans.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">It\u2019s also bonkers.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">While we\u2019ve all experienced the frustration of poorly performing technology (thinking of you, home printers of yesteryear), when was the last time your not-AI software deliberately lied to you? Has your inbox ever fabricated\u00a0emails on its own? Has your CRM logged new prospects that didn\u2019t exist to pad its numbers? Has your fintech app made up its own bank transactions?\u00a0<\/p>\n<p class=\"wp-block-paragraph\">It\u2019s worth pondering this as the corporate world barrels toward an AI future where companies believe agents can be treated like independent employees. The researchers behind this paper offer the same warning.<\/p>\n<p class=\"wp-block-paragraph\">\u201cAs AIs are assigned more complex tasks with real-world consequences and begin pursuing more ambiguous, long-term goals, we expect that the potential for harmful scheming will grow \u2014 so our safeguards and our ability to rigorously test must grow correspondingly,\u201d they wrote.\u00a0<\/p>\n<\/div>\n<p><a href=\"https:\/\/techcrunch.com\/2025\/09\/18\/openais-research-on-ai-models-deliberately-lying-is-wild\/\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Every now and then, researchers at the biggest tech companies drop a bombshell. 
There was the time Google said its latest quantum chip indicated multiple<\/p>\n","protected":false},"author":1,"featured_media":99633,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[149],"tags":[],"class_list":["post-99632","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-business"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts\/99632","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/comments?post=99632"}],"version-history":[{"count":0,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/posts\/99632\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/media\/99633"}],"wp:attachment":[{"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/media?parent=99632"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/categories?post=99632"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/neclink.com\/index.php\/wp-json\/wp\/v2\/tags?post=99632"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}