AI used to transcribe medical visits is having wild and sometimes spooky “hallucinations”
The OpenAI tool sometimes adds made-up, unsettling details.
“ChatGPT can make mistakes. Check important info,” reads a warning at the bottom of OpenAI’s GPT-4o-powered ChatGPT interface.
That message is there because many of today’s AI tools are prone to “hallucinations,” where incorrect or “imagined” facts are included in a response to the user.
That may not matter much if a chatbot invents an extra ingredient in the chocolate-chip cookie recipe you asked for (or, infamously, suggests adding glue to your pizza), but if you’re using an AI tool for more critical tasks, such as transcribing medical interviews, the results could be disastrous.
A new investigation by the Associated Press found that OpenAI’s Whisper transcription tool is being widely used for medical transcription, even though it has been found to hallucinate “racial commentary, violent rhetoric, and even imagined medical treatments,” according to the report. Given how many medical professionals rely on the tool, the rate of hallucinations researchers have documented is troubling.
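For context, Whisper ships as an open-source Python package, and a basic transcription run is only a few lines. The sketch below is illustrative, not how Nabla or any other vendor integrates the model, and the audio file name is a placeholder.

```python
# Minimal sketch of transcription with OpenAI's open-source "whisper"
# package (installed via: pip install openai-whisper).
import whisper

# Load a pretrained checkpoint; "base" is small and fast, while larger
# checkpoints ("medium", "large") are more accurate.
model = whisper.load_model("base")

# Transcribe a recording (placeholder file name). The result is a dict
# containing the full text plus per-segment timing details.
result = model.transcribe("visit_recording.wav")

# The output is plain text: nothing in it marks which words were
# actually spoken and which the model may have invented.
print(result["text"])
```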
The problems persist even in well-recorded, short audio samples. A recent study by computer scientists uncovered 187 hallucinations in more than 13,000 clear audio snippets they examined.
A rate like that would produce tens of thousands of faulty transcriptions across millions of recordings, the researchers said.
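For a rough sense of the arithmetic behind that projection, the sketch below extrapolates the study’s numbers; treating one snippet as roughly one recording is a simplifying assumption for illustration only.

```python
# Back-of-the-envelope extrapolation from the figures reported above.
hallucinations = 187   # hallucinations found in the study
snippets = 13_000      # clear audio snippets examined

rate = hallucinations / snippets    # roughly 1.4% of snippets affected
per_million = rate * 1_000_000      # projected faulty transcriptions per
                                    # million recordings (assuming one
                                    # snippet ~ one recording)

print(f"hallucination rate: {rate:.2%}")                      # -> 1.44%
print(f"faulty per million recordings: {per_million:,.0f}")   # -> 14,385
```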
OpenAI warns against using Whisper in “high-risk domains,” but that hasn’t stopped its adoption in healthcare software such as Nabla, which has reportedly been used to transcribe an estimated 7 million medical visits. The AP report included the following example:
A speaker said, “He, the boy, was going to, I’m not sure exactly, take the umbrella.”
But the transcription software added: “He took a big piece of a cross, a teeny, small piece ... I’m sure he didn’t have a terror knife so he killed a number of people.”
Compounding the problem, Nabla deletes the original recordings of the interviews for “data safety reasons,” leaving no way to verify a transcription against what was actually said.