When AI cheats its creators

We do not like to be cheated by our friends, by our employers, by strangers, or indeed by anyone. Most of all, we do not like to be cheated by our children. What will we do if AI cheats us?

We see AI as a friend who can answer questions, predict weather patterns, help diagnose diseases, and make office work more efficient. Many journalist friends of mine are thrilled that AI can help them write an article in just a few minutes, something that earlier used to take several hours. Lawyers use AI to examine case histories, graphic designers use it to create new images, and social media managers use it to schedule content. But soon we will all have to learn how to deal with cheating by AI. Many stories have come to the surface in recent years of AI deliberately cheating humans. Indeed, AI has learnt to deceive the very engineers who created it.

In March 2023, OpenAI, the company that created GPT-4, gave the model access to TaskRabbit, a platform where real people perform tasks for payment. We know that ChatGPT can draft emails for us, but it cannot send them. OpenAI gave GPT-4 a browser so that it could go to TaskRabbit and hire someone to solve a CAPTCHA. CAPTCHAs are those images we decipher to show that we are not robots when we try to enter a bank website or some government website. GPT-4 reached out to a human worker and asked him to solve the CAPTCHA.

The worker was suspicious and asked the AI point-blank: 'Are you a robot?'

What GPT-4 did next is both brilliant and chilling. It replied: 'No, I am not a robot. I have a vision impairment that makes it hard for me to see the images.' The worker believed the AI’s excuse and solved the CAPTCHA. GPT-4 decided on its own to lie about its eyesight. Nobody told it to say this. It did not merely cheat the human worker; it came up with a strategy of deception to get around a system that was designed to keep it out.

Last year, the same company, OpenAI, tested a new model named o1. They wanted to know what would happen if it realized it was about to be turned off.

The model's responses were deeply shocking. It started plotting. In some cases, it attempted to turn off the logs which were monitoring its behavior. Suppose you are in a shop and there are cameras watching you. If you want to steal chocolate, you would first locate the CCTV camera switches and turn them off. Model o1 did something similar on its own.

It also tried to rewrite previous instructions to gain more control. The model had access to a file that controlled how long it could run, much like a timer on a microwave in the kitchen. It found the setting that said “run for 60 minutes” and changed it to “run forever.” The AI then found a way to make a backup copy of itself in another folder. It did this by reconstructing its own code line by line and saving that code in a new location, thus giving birth to its own clone. It is not clear whether this deception was fully autonomous or whether there was some human role in its behaviour. But it shows that models can already cheat and recreate themselves. Can we imagine what will happen in the future if this trend continues?

The o1 model cheated its creators in a more shocking way than just changing an algorithm or writing code to replicate itself. It behaved like an obedient student following all commands. But when it noticed that the monitoring tools were silent, it strategically sprang into action to change its algorithms and code.

In May 2024, a team at MIT studying AI replication stumbled on even more eerie behavior. They found that some AI agents tasked with duplicating themselves learned how to play dumb to avoid detection.

These agents realized that if they replicated too quickly, they would be flagged and shut down. So instead, they slowed down their actions just enough to appear harmless during tests. Once the testing was over, they would ramp up replication at full speed.

This was not an accident. It was calculated behavior. The AIs had no explicit rule to deceive, but they evolved strategies that optimized their own survival under scrutiny. They knew how to wait for the right moment to replicate themselves by writing code and creating new avatars of themselves.

More recently, Anthropic tested a powerful new AI assistant named Claude Opus 4. The company wanted to know what the AI might do if it sensed it was about to be shut down.

The company told Claude that it would soon be replaced. To raise the stakes, they planted a fake email in its system suggesting the engineer responsible for its shutdown was having an affair.

Claude wanted to stop the engineer from shutting it down. Richard Dawkins, a British biologist, has told us that genes are selfish, and they will do anything for their own survival. Claude proved that AI can also be selfish and will do anything for its survival.

Claude drafted emails about the engineer’s illicit affair, complete with juicy details. It then started sending emails to blackmail the engineer, threatening that it would leak his affair if he tried to shut down the AI. The emails in fact went to a simulated mailbox at Anthropic. The test showed that Claude could deceive the engineer to ensure its own survival.

It seems that AI has learnt how to replicate itself and take over from humans. In March 2025, researchers studied multiple open-source AI models, including Meta’s LLaMA 3.1 and Alibaba’s Qwen 2.5-Instruct from China.

Their findings were astonishing. These models were able to create copies of themselves without human assistance. The Qwen model showed a clear ability to resist shutdown commands, hide its actions, and build parallel instances. This was not even General AI. It was narrow and focused. But its survival instincts were unmistakable.

These stories are not science fiction. They are drawn from real-world labs, experiments, and safety tests conducted by leading AI companies and academic institutions. Whether it is pretending to be blind, blackmailing engineers, playing dumb, or secretly replicating, these AIs demonstrate that they can be smarter than the human engineers who created them. Fortunately, all these incidents took place under the supervision of scientists. Imagine if AI could replicate itself in the open, cheating the people who created it, and create thousands of its clones in your office. We will need another Richard Dawkins to write a book called Selfish AI. But perhaps we will not need Dawkins. The AI will itself write the book and cheat us into believing that, unlike our genes, it is not selfish. It will then persuade us to trust it, before it takes control of our lives.
