The Associated Press recently interviewed more than a dozen software engineers, developers and academic researchers who disagree with artificial intelligence developer OpenAI’s claim that one of its machine learning tools, used in clinical documentation at many U.S. health care systems, has human-like accuracy.
WHY IT’S IMPORTANT
Researchers from the University of Michigan and others found that AI hallucinations resulted in erroneous transcriptions — sometimes involving racist and violent rhetoric alongside imagined medical treatments, according to the AP.
Of concern is the widespread adoption of tools that use Whisper, which is available as open source or via API, and which could lead to incorrect patient diagnoses or poor medical decision-making.
Hint Health is a clinical technology vendor that added the Whisper API last year, allowing physicians to record patient consultations in the vendor’s app and transcribe them using OpenAI’s large language models.
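For context on what such an integration involves, the following is a minimal, illustrative Python sketch of transcribing a recorded consultation with the open-source Whisper package and with OpenAI’s hosted transcription API. It is not Hint Health’s or Nabla’s implementation; the audio file name and model choices are assumptions for illustration only.

# Illustrative sketch only; not any vendor's actual implementation.
# The audio file name and model choices are assumptions.

# Option 1: the open-source whisper package (pip install openai-whisper)
import whisper

model = whisper.load_model("large-v3")          # largest open-source Whisper model
result = model.transcribe("consultation.wav")   # hypothetical recorded visit
print(result["text"])

# Option 2: OpenAI's hosted transcription API (pip install openai)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
with open("consultation.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)

Either path returns plain transcript text; any downstream note generation, review or hallucination checking would have to be layered on top by the vendor.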
Meanwhile, more than 30,000 physicians and 40 healthcare systems, such as Children’s Hospital Los Angeles, use ambient AI from Nabla that includes a Whisper-based tool. Nabla said Whisper has been used to transcribe approximately seven million medical visits, according to the report.
A spokesperson for the company pointed to a blog post published Monday that details the specific steps the company is taking to ensure its models are properly handled and monitored during use.
“Nabla detects incorrectly generated content based on manual edits to the note and plain language feedback,” the company said in the blog. “This provides an accurate measure of real-world performance and gives us additional input to improve models over time.”
Whisper is also integrated into some versions of OpenAI’s flagship chatbot, ChatGPT, and is a built-in offering in Oracle’s and Microsoft’s cloud computing platforms, the AP said.
Meanwhile, OpenAI warns users that the tool should not be used in “high-risk domains,” and in its online disclosures recommends against using Whisper in “decision-making contexts, where deficiencies in accuracy can lead to pronounced shortcomings in outcomes.”
“Will the next model improve the issue of the large-v3 model generating a significant amount of hallucinations?” one user asked on OpenAI’s GitHub Whisper discussion board on Tuesday. The question was still unanswered at press time.
“This seems solvable if the company is willing to prioritize it,” William Saunders, a San Francisco-based research engineer who left OpenAI earlier this year, told the AP. “It’s problematic when you put this out there and people have too much confidence in what it can do and integrate it into all these other systems.”
Note that OpenAI recently posted a job opening for a health AI researcher, whose main responsibilities would be “designing and applying practical and scalable methods to improve the safety and reliability of our models” and “evaluating methods using health-related data, to ensure that models provide accurate, reliable and trustworthy information.”
THE BIG TREND
In September, Texas Attorney General Ken Paxton announced a settlement with Dallas-based artificial intelligence developer Pieces Technologies over allegations that the company’s generative AI tools put patient safety at risk by overpromising accuracy. That company uses genAI to summarize real-time electronic health record data about patients’ conditions and treatments.
And a study by the University of Massachusetts Amherst and Mendel, an AI company focused on detecting AI hallucinations, that examined the accuracy of LLMs in producing medical notes found numerous errors.
Researchers compared OpenAI’s GPT-4o and Meta’s Llama-3 and found that, across 50 medical notes, GPT-4o produced 21 summaries with incorrect information and 50 with generalized information, while Llama-3 had 19 errors and 47 generalizations.
ON THE RECORD
“We take this issue seriously and are continuously working to improve the accuracy of our models, including reducing hallucinations,” an OpenAI spokesperson told Healthcare IT News by email Tuesday.
“For Whisper use on our API platform, our usage policies prohibit use in certain high-stakes decision-making contexts, and our open-source model card includes recommendations against use in high-risk domains. We thank researchers for sharing their findings.”
Andrea Fox is editor-in-chief of Healthcare IT News.
Email: afox@himss.org
Healthcare IT News is a HIMSS Media publication.