Despite becoming a hit at launch, ChatGPT is still struggling to excel in some areas – particularly coding, new research shows.
Generative AI tools such as GitHub’s Copilot have been positioned as an ideal solution to programming problems, and some developers have turned to them to speed up their workflows, freeing up more time to focus on productive work.
However, a new study from researchers at Purdue University found that more than half (52%) of the answers ChatGPT produced were incorrect.
ChatGPT helps with coding
The researchers analyzed 517 questions from Stack Overflow and compared ChatGPT’s responses to human responses, finding that the AI’s errors were widespread. Of the errors, more than half (54%) were conceptual misunderstandings, about one in three (36%) were factual inaccuracies, 28% were logical errors in the code, and 12% were terminology errors.
The paper also criticized ChatGPT for producing long, complex responses that contained more detail than necessary, leading to potential confusion and distraction. However, the study’s small survey of twelve programmers found that a third preferred ChatGPT’s eloquent, textbook-like answers, highlighting how easily programmers can be misled.
The implications of these findings are significant: coding errors can lead to larger problems further down the chain, affecting multiple departments or organizations.
The authors summarize: “Since ChatGPT produces a large number of incorrect answers, our results highlight the need for caution and awareness regarding the use of ChatGPT answers in programming tasks.”
In addition to urging caution, the researchers call for further research into identifying and mitigating such errors, as well as greater transparency and communication around potential inaccuracies.