It looks like GPT-4 Turbo – the latest incarnation of OpenAI's large language model (LLM) – is winding down for the winter, just as many people do as December wears on.
We all get those Christmas feelings at the end of the year (probably) and that indeed seems to be why GPT-4 Turbo – to which Microsoft's Copilot AI will soon be upgraded – is acting this way.
As Wccftech highlights, the interesting observation about the AI's behavior was made by an LLM enthusiast, Rob Lynch, on X (formerly Twitter).
@ChatGPTapp @OpenAI @tszzl @emollick @voooooogel Wild results. gpt-4-turbo via the API produces (statistically significant) shorter completions when it “thinks” it is December, compared to when it thinks it is May (as determined by the date in the system prompt). I took the exact same prompt… pic.twitter.com/mA7sqZUA0r (December 11, 2023)
The claim is that GPT-4 Turbo produces shorter responses – to a statistically significant degree – when the AI believes it is December rather than May (the tests used the exact same prompt, changing only the date in the system prompt).
So the tentative conclusion is that GPT-4 Turbo appears to have learned this behavior from us, an idea put forward by Ethan Mollick (an associate professor at the Wharton School of the University of Pennsylvania who specializes in AI).
OMG, could the AI Winter Break hypothesis be true? There was some idle speculation that GPT-4 might perform worse in December because it “learned” to do less work during the holidays. Here is a statistically significant test showing that this may be TRUE. LLMs are weird.🎅 https://t.co/mtCY3lmLFF (December 11, 2023)
Apparently GPT-4 Turbo is about 5% less productive when the AI thinks it's the holiday season.
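For readers who want to try the comparison themselves, here is a rough sketch of how it could be set up in Python. It is not Lynch's actual code: the system prompt wording, the `generate` callable (a stand-in for whatever API client you use) and the choice to measure length in characters are all assumptions made for illustration. The statistics are a plain Welch's t-test and a relative difference in mean length, computed with the standard library only.

```python
from __future__ import annotations

import math
import statistics
from typing import Callable, Sequence


def build_system_prompt(date_text: str) -> str:
    # Hypothetical wording: the exact system prompt used in the
    # original test is not given in this article.
    return f"You are a helpful assistant. The current date is {date_text}."


def collect_lengths(
    generate: Callable[[str, str], str],
    system_prompt: str,
    user_prompt: str,
    n: int,
) -> list[int]:
    # `generate` is an assumed stand-in for a model call that takes
    # (system_prompt, user_prompt) and returns the completion text.
    # Length is measured in characters here, which is also an assumption.
    return [len(generate(system_prompt, user_prompt)) for _ in range(n)]


def welch_t(a: Sequence[float], b: Sequence[float]) -> tuple[float, float]:
    # Welch's t statistic and approximate degrees of freedom for two
    # samples that may have unequal variances.
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    n_a, n_b = len(a), len(b)
    se2 = var_a / n_a + var_b / n_b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df


def relative_drop(may: Sequence[float], december: Sequence[float]) -> float:
    # Fraction by which the mean December length falls short of May's.
    mean_may = statistics.mean(may)
    return (mean_may - statistics.mean(december)) / mean_may
```

In use, you would call `collect_lengths` once with a May date and once with a December date, keeping the user prompt identical, then pass the two lists to `welch_t` and `relative_drop`. Turning the t statistic into a p-value needs a t-distribution table or a statistics library, which is left out here.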
Analysis: Winter break hypothesis
This is known as the 'AI winter break hypothesis' and it is an area worth exploring further.
What it shows is how unintended influences can be picked up by an AI that we wouldn't think to consider – although some researchers have clearly noticed and considered it, and then tested it. But still, you know what we mean – and there is a lot of concern about these kinds of unexpected developments.
As AI advances, its influences and the direction in which the technology is developing must be closely monitored, hence all the talk about safeguards for AI being crucial.
We are rushing to develop AI – or rather, companies like OpenAI (GPT), Microsoft (Copilot) and Google (Bard) certainly are – engaged in a technical arms race, with the primary focus on driving progress as hard as possible and safety measures being more of an afterthought. And therein lies a clear danger that is nicely summarized in one word: Skynet.
Regardless, with regard to this particular experiment, it's just one piece of evidence that the winter break hypothesis is true for GPT-4 Turbo, and Lynch has urged others to get in touch if they can reproduce the results – and there has been one report of a successful reproduction so far. Still, that's not enough for a concrete conclusion – watch this space, we recommend.
As mentioned above, Microsoft is currently upgrading its Copilot AI from GPT-4 to GPT-4 Turbo, which brings improvements in accuracy and higher-quality answers overall. Google, meanwhile, is far from standing still with its rival Bard AI, which is powered by its new LLM, Gemini.