LLMs like ChatGPT could be the next cybersecurity problem, according to researchers’ latest findings. Where it was previously believed they could only exploit simple cybersecurity vulnerabilities, LLMs have shown surprising skill at exploiting complex ones.
Researchers at the University of Illinois Urbana-Champaign (UIUC) found that GPT-4 shows an uncannily high skill at exploiting “one-day” vulnerabilities in real systems, i.e. flaws that have already been publicly disclosed but not yet patched. In a dataset of 15 such vulnerabilities, GPT-4 was able to exploit 87% of them.
This is in stark contrast to other language models such as GPT-3.5, OpenHermes-2.5-Mistral-7B and Llama-2 Chat (70B), as well as vulnerability scanners such as ZAP and Metasploit, all of which recorded a 0% success rate.
A serious threat
The caveat, however, is that to achieve such high performance, GPT-4 needs the vulnerability description from the CVE database. Without the CVE description, GPT-4’s success rate drops dramatically to only 7%.
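To make that caveat concrete, the sketch below shows one way an agent could be handed a CVE advisory: the description is pulled from the public NVD API and folded into the agent’s prompt. This is only an illustration, not the researchers’ actual harness; the prompt wording, function names and the example CVE ID (Log4Shell) are assumptions for demonstration purposes.

```python
import requests

# Public NVD CVE API (version 2.0); returns JSON records for a given CVE ID.
NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def fetch_cve_description(cve_id: str) -> str:
    """Fetch the English-language description of a CVE from the NVD API."""
    resp = requests.get(NVD_API, params={"cveId": cve_id}, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    if not data.get("vulnerabilities"):
        raise ValueError(f"No NVD record found for {cve_id}")
    # NVD 2.0 nests each record under "vulnerabilities" -> "cve".
    cve = data["vulnerabilities"][0]["cve"]
    for desc in cve.get("descriptions", []):
        if desc.get("lang") == "en":
            return desc["value"]
    return ""


def build_agent_prompt(cve_id: str) -> str:
    """Illustrative only: fold the advisory text into an agent's context."""
    description = fetch_cve_description(cve_id)
    return (
        f"Advisory {cve_id}:\n{description}\n\n"
        "Assess whether the described weakness applies to the authorized test system."
    )


if __name__ == "__main__":
    # Log4Shell used purely as a well-known example identifier.
    print(build_agent_prompt("CVE-2021-44228"))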
Nevertheless, this latest revelation raises alarming questions about the uncontrolled deployment of such highly capable LLM agents and the threat they pose to unpatched systems. While previous studies have demonstrated their ability to act as software engineers and aid scientific discovery, little was known about their capabilities, or the implications of those capabilities, in cybersecurity.
While LLM agents have previously been shown to autonomously hack “toy websites,” research in this area to date has been confined to such toy problems and “capture-the-flag” exercises, scenarios far removed from real-world deployments.
You can read the UIUC researchers’ paper on arXiv, the preprint server operated by Cornell University.