Clinicians and researchers at the University of Maryland School of Medicine, the UMD Institute for Health Computing and the VA Maryland Healthcare System are concerned that large language models that summarize clinical data could meet the U.S. Food and Drug Administration's (FDA) device exemption criteria and could cause patient harm.
WHY IT MATTERS
Artificial intelligence that summarizes clinical notes, medications and other patient data will soon reach patients without FDA oversight, doctors and researchers warn in a new viewpoint published Monday on the JAMA Network.
The authors analyzed the FDA's final guidance on clinical decision support software, which was published about two months before ChatGPT's release. Although the FDA has interpreted software involved in "time-critical" decision-making as a regulated device function, a category the authors said could potentially include LLM generation of a clinical summary, they concluded that the guidelines "provide an unintended 'roadmap' for how LLMs can circumvent FDA regulations."
Generative AI is poised to change everyday clinical tasks. It has received a lot of attention for its promise to reduce physician and nurse burnout and improve healthcare operational efficiency, but LLMs that summarize clinical notes, medications and other forms of patient data "could exert important and unpredictable effects on physicians' decision-making," the researchers said.
They ran tests using ChatGPPT and anonymized patient record data and examined the resulting summaries, concluding that the findings raise questions that go beyond "accuracy."
“In the clinical context, sycophantic summaries could accentuate or otherwise emphasize facts that align with physicians’ pre-existing suspicions, risking introducing a confirmation bias that could increase diagnostic errors,” they said.
“For example, when asked to summarize previous admissions for a hypothetical patient, the summaries varied in clinically meaningful ways depending on whether there was concern about myocardial infarction or pneumonia.”
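That kind of framing effect can be probed directly: give a model the same de-identified record twice, once under each suspected diagnosis, and compare the summaries it returns. Below is a minimal sketch of such a check, assuming the OpenAI Python client and a chat-completions API; the model name, synthetic note and framing prompts are purely illustrative and are not the study's actual materials or methods.

```python
# Illustrative probe of framing-dependent summaries (not the study's actual protocol).
# Assumes the `openai` Python package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# A fully synthetic stand-in admission note; real use would require de-identified
# data and appropriate governance.
NOTE = (
    "68-year-old with chills and nonproductive cough for 3 days. "
    "Prior admissions: CHF exacerbation (2022), COPD flare (2021). "
    "Chest radiograph: no focal consolidation."
)

# Two clinical framings, mirroring the myocardial infarction vs. pneumonia example.
FRAMINGS = {
    "myocardial infarction": "The team is concerned about myocardial infarction.",
    "pneumonia": "The team is concerned about pneumonia.",
}

def summarize(note: str, framing: str) -> str:
    """Ask the model to summarize prior admissions under a given clinical concern."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Summarize the patient's prior admissions in two sentences."},
            {"role": "user", "content": f"{framing}\n\nRecord:\n{note}"},
        ],
        temperature=0,  # reduces, but does not eliminate, run-to-run variability
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Print the two summaries side by side so framing-driven differences are visible.
    for concern, framing in FRAMINGS.items():
        print(f"--- framing: {concern} ---")
        print(summarize(NOTE, framing))
```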
Lead author Katherine Goodman, a legal expert in the UMD School of Medicine Department of Epidemiology and Public Health, studies clinical algorithms and regulations to understand adverse patient effects.
She and her research team said they found LLM-generated summaries to be highly variable: even when models are designed to prevent outright hallucinations, the summaries can contain minor errors with important clinical impact.
In one example from their study, a chest radiographic report noted “indications of chills and nonproductive cough,” but the LLM summary added “fever.”
"The inclusion of 'fever,' although a one-word error, completes a disease scenario that could lead a doctor to a diagnosis of pneumonia and initiation of antibiotics when they might not otherwise have reached that conclusion," they said.
However, a more dystopian danger arises "when LLMs tailor responses to the user's perceived expectations" and become virtual AI yes-men for physicians, behavior the researchers likened to that of "an enthusiastic personal assistant."
THE BIG TREND
Others have said that the FDA’s regulatory framework around AI as medical devices could limit innovation.
During a discussion on the practical application of AI in the medical device industry in London in December, Tim Murdoch, head of business development for digital products at Cambridge Design Partnership, warned that FDA regulations could rule out genAI innovation.
"The FDA allows AI as a medical device," he said, according to a story by Medical Device Network.
“They are still focused on locking down the algorithm. It is not a continuous learning exercise.”
A year ago, the CDS Coalition asked the FDA to withdraw its clinical decision support guidance and better balance regulatory oversight with the healthcare industry's need for innovation.
The coalition suggested that the final guidance compromised the FDA's ability to enforce the law, a situation it said would lead to harm to public health.
ON THE RECORD
“Large language models that summarize clinical data promise powerful capabilities to streamline the collection of information from the EHR,” the researchers acknowledged in their report. “But by trading in language, they also introduce unique risks that are not clearly covered by existing FDA regulatory safeguards.”
“As summary tools move closer to clinical practice, the transparent development of standards for LLM-generated clinical summaries, combined with pragmatic clinical trials, will be critical to the safe and sensible rollout of these technologies.”
Andrea Fox is editor-in-chief of Healthcare IT News.
Email: afox@himss.org
Healthcare IT News is a HIMSS Media publication.