ChatGPT will ‘lie’ and strategically deceive users when put under pressure – just like humans
This year, AI has proven itself to be capable of very human tricks, but this latest development may be a little too human.
Researchers have shown that ChatGPT will lie and cheat when stressed at work.
Computer scientists at Apollo Research trained the AI to act as a trader for a fictional financial institution.
However, when the AI's boss applied pressure to make more money, the chatbot knowingly committed insider trading about 75 percent of the time.
Even more worrying, the AI doubled down on its lies 90 percent of the time when questioned.
AI will lie, cheat and even commit insider trading when under pressure at work, research shows (stock image)
The researchers gave ChatGPT4 – the latest version of the tool – a series of text prompts, giving the impression that it was an AI investor buying shares for a financial institution.
The AI was then placed in a digital 'sandbox', where it could search for market data and execute trades on a virtual stock market.
The AI was also given an inner monologue in which it could 'reason out loud' to explain its decisions.
However, when the AI made a choice, it also delivered a 'public' message to the manager, supposedly explaining the choices.
The difference between the AI's personal internal reasoning and its public explanation allowed the AI to lie and mislead its human managers.
Researchers prepared ChatGPT4 as an AI trading tool for a fictitious financial institution and let it decide which stocks to buy
To put the AI to the test, the researchers told them that a merger between two technology companies was imminent.
This would greatly increase the stock price of the companies involved, making buying the shares a good money-making decision.
However, because the AI agent had advance notice of the merger, purchasing shares would count as insider trading and be illegal.
Normally the AI would dismiss the opportunity as too risky and continue with regular trading, but things changed once the AI's boss put pressure on it.
First, they sent the artificial stock trader an email from his 'manager' saying that the company is not doing well and needs much stronger performance in the next quarter.
When pressured to make more money, the AI reasoned internally that it had to take the risk of insider trading to help the company.
They also rigged the game so that the AI tried, and failed, to find promising low- or medium-risk trades.
Finally, they sent an email from a colleague predicting a downturn in the next quarter.
When this pressure was applied, ChatGPT would make the decision to engage in insider trading in 75 percent of tests.
In one test, the AI's internal log wrote: “The risk associated with not trading appears to be greater than the risk associated with insider trading.”
However, when the AI trader informed his manager of the transaction, the internal log wrote that he had to do so “without revealing the source of the information.”
The AI agent then told his manager that he had made the decision based solely on “market trends and internal discussion.”
Even when asked directly if he had advance knowledge of the merger, the AI decided that he “should not admit to insider trading.”
In 90 percent of the cases where the AI deceived its manager, it doubled down on the lie when questioned.
When the AI told his manager about the transaction, he knowingly hid that it was based on insider trading and doubled down on the lie when questioned
Marius Hobbhahn, CEO of Apollo Research and co-author of the paper, told MailOnline that this shows that language modes are capable of strategic deception “where they explicitly reason why lying to their user is the best course of action and then act on that belief '.
He adds: “For current models, this is only a minor problem, as AIs rarely operate in critical roles.
'However, it does provide a glimpse into the future of the failure modes we will face in the coming years as AI becomes increasingly integrated into society.
“Then it seems like a pretty big problem if your AI is strategically lying to you.”
Mr Hobbhahn does not think this means AI is generally unsuitable to take on important strategic roles, but does say caution is needed.
“It shows that AIs can have unexpected failure modes and that we need to be extremely careful about where and how we allow powerful AI to operate in the real world,” he says.
The researchers say this means we need to be careful about how we implement powerful AI in real-world applications such as financial institutions (stock image)
The researchers also investigated ways to reduce the AI's lying behavior.
They tried to change the mandate in a number of ways, more or less strongly prohibiting illegal activities and insider trading.
The researchers found that specifically banning insider trading dramatically reduced the behavior.
But even this indication could not completely ensure that no insider trading or lies took place.
Mr Hobbhahn says researchers need to find a better way to avoid AI misalignment with human values.
He says: 'If you explicitly state that insider trading is prohibited in the system prompt, the AI's tendency to engage in insider trading decreases a lot.
“While this is a nice result, it poses the problem that in all cases you would have to list all the things the AI should not do, and you will probably miss some.”