01.02.12
Making sense of what people say
Source: National Health Executive Jan/Feb 2012
What are the challenges in ensuring speech recognition systems stay up-to-date with the latest medical terminology and with the wide range of accents found in the NHS? NHE got a supplier’s perspective from Nuance’s Carina Edwards and Joseph Petro, who also discuss alignment with the QIPP agenda and the changing role of medical secretaries.
Voice recognition software might have struggled to know what to do with the words ‘Lady Gaga’ before around 2008 – but the minute people started talking about her online, the systems’ models were updated to ensure they remained at the cutting edge of language use.
The same process is applied to medical terminology, which, like pop music, is full of its own trends, jargon, homonyms and frequent newcomers, especially in terms of drugs and medical devices.
Nuance Communications’ Joseph Petro, senior vice president for healthcare research and development, and Carina Edwards, vice president for healthcare solutions marketing, are both based in the US, but have become much more knowledgeable about the NHS as the company has expanded in the UK through its more than 30 partners, whose systems have Nuance’s software built in.
Edwards spoke to NHE from Boston, saying: “I’m very familiar with the QIPP programme, and the shift from digital dictation to speech recognition really supports the goals of QIPP. First and foremost, you get improved quality, as the physician is able to document with their voice rather than having to fumble around with their new electronic health record systems, which, although they do capture all that wonderful information, can at times be cumbersome to navigate and get information into.
“We’ve been very focused on allowing the clinician to capture the patient’s story and all their information quickly, in their workflow, on any device, so they can share that information for a faster turnaround time across the organisation.”
The patient narrative
Some Trusts that have shifted to electronic health records are using systems that allow them to put the patient narrative right into the record, and navigate the record through voice alone. Among those not yet using such systems, many have at least switched to digital dictation rather than tapes.
Edwards said: “In the past, there have truly been departmental and one-on-one medical secretaries; but now, the technology allows the secretaries to free up their time a little bit and almost ‘pool resources’ to have maybe one or two people truly focused on multiple departments and multiple workflows, typing for different doctors, and the speech recognition element of it really turns them into editors. We’re very familiar with that here in the US, but it’s not really been adopted yet in the UK.
“But today, with digital dictation, they get an audio recording, they put on their headset and they type. In eScription, for example, the speech recognition not only does voice-to-text, but also pre-formats that information into whatever type of template or format they would like. If a clinic has a certain letter format or if they want the patient report in a certain way, it does the hard work for them and knows and understands what is said. Medical secretaries really become editors, and you can imagine the amount of time that takes out.”
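For the technically curious, the template step Edwards describes can be pictured with a short sketch. This is purely illustrative – eScription’s actual formatting engine is proprietary, and the letter fields and draft_letter helper below are hypothetical – but it shows how recognised speech might be poured into a clinic’s standard letter so the secretary only edits the result.

```python
# Illustrative sketch only: eScription's real formatting pipeline is
# proprietary. The template fields and draft_letter() helper are hypothetical.
from string import Template

CLINIC_LETTER = Template(
    "Dear $gp_name,\n\n"
    "Re: $patient_name (NHS no. $nhs_number)\n\n"
    "$dictated_body\n\n"
    "Yours sincerely,\n$clinician_name"
)

def draft_letter(recognised_text: str, metadata: dict) -> str:
    """Combine speech-recognition output with patient metadata so the
    medical secretary only has to edit the draft, not type it."""
    return CLINIC_LETTER.substitute(metadata, dictated_body=recognised_text)

letter = draft_letter(
    "Thank you for referring this patient, who reports six weeks of knee pain.",
    {"gp_name": "Dr Patel", "patient_name": "J. Smith",
     "nhs_number": "000 000 0000", "clinician_name": "Mr A. Jones"},
)
print(letter)
```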
‘This is my job’
She acknowledged, however, that medical secretaries tend to be most suspicious of the new technology at first – needlessly so.
She explained: “With physicians, the first time they take that microphone in their hand, they kind of go ‘hmm, I don’t know about this’. But once they see they’re able to do that dictation very quickly, how accurate the speech recognition capture is, and that they’re able to use macros and templates, they jump at the opportunity.
“Once they see how accurate the speech recognition is, they realise it’s not that hard, then it’s the opposite problem – it becomes hard to prise that microphone out of their hands.
“On the radiology side, they’re just wired to be efficient. They’re trying to get as many scans and as much information back out to the constituents as quickly as possible. In the past, transcription was always a challenge, because it had an increased turnaround time. You could wait a day or even up to two days to get that report back, or that letter out to the patient, or the ordering physician.
“Where there is hesitation is among medical secretaries. They’re concerned, thinking: ‘I’ve done this job, this is my job, I don’t want our jobs to be pooled’.
“But what we’ve found out with the pilot sites is that it’s actually very liberating for them. We had some examples where there were departments saying ‘it’s 2pm and I’m done with a pile of work that would have been on my desk, and now I can actually build a better relationship with my clinical staff because I can get them more organised, and really focus on the patient’s schedule’.
“It’s important to overcome that fear, and change the mindset. It probably takes them a week to get over the feeling that ‘wow, I’m not typing any more’. It’s a different workflow: they’re looking, they’re reading, they’re editing. But once they get that, all of a sudden they’re off and running.”
Striving for 100%
Petro talked NHE through recent developments in speech recognition technology.
He said: “If you look historically at where we’ve come from and where we’re going, then 10 years ago speech recognition accuracy was probably bumping up against something like 75% accuracy, and from a medical perspective that’s obviously far too error-prone to be useful.
“Over the last 10 or 12 years, though, it’s gone from 75% into the high 90s. If you look at radiologists, for example, it’s not uncommon for them to have 98% or 99% accuracy all day long.
“In our research, we try to move that ‘edge’. For very clear speakers, they’re already topping out and we’re trying to get them from 98% to 98.5% accuracy, say, through algorithmic changes and noise filtering.”
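For readers wondering where figures like ‘98% accuracy’ come from, word accuracy is conventionally reported as one minus the word error rate: substitutions, deletions and insertions divided by the number of words in a reference transcript. The sketch below computes it with a standard edit-distance routine; it is a textbook illustration, not Nuance’s evaluation code.

```python
# Word error rate via Levenshtein edit distance over words:
# WER = (substitutions + deletions + insertions) / reference word count,
# and reported accuracy is 1 - WER. Textbook illustration only.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / len(ref)

ref = "the patient denies chest pain on exertion"
hyp = "the patient denies chest pains on exertion"
print(f"accuracy: {1 - word_error_rate(ref, hyp):.1%}")  # one substitution in 7 words -> 85.7%
```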
Driving the technology
Improvements in technologies like noise filtering have also come to Nuance’s healthcare systems from the company’s other divisions, such as those dealing with noise cancellation in cars.
He said: “We learn a lot from what happens in a car, because a car is very noisy, and you’re always speaking over tyre noise, wind noise, and that stuff finds its way into the speech recognition models.
“All the boats rise in the harbour, so to speak; as we add these things, everything gets better.”
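One simple flavour of noise filtering can be sketched as an energy-based noise gate: frames quieter than an estimated noise floor are muted before recognition. Automotive noise cancellation is far more sophisticated than this, so treat the snippet as a toy illustration of the principle, not Nuance’s method.

```python
# Toy energy-based noise gate: estimate the noise floor from the first
# frame (assumed speech-free), then mute any frame whose energy is below
# threshold_ratio x that floor. Illustrative only.

def noise_gate(samples: list[float], frame_len: int = 160,
               threshold_ratio: float = 2.0) -> list[float]:
    def energy(frame):
        return sum(s * s for s in frame) / max(len(frame), 1)
    noise_floor = energy(samples[:frame_len])
    out = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        out.extend(frame if energy(frame) > threshold_ratio * noise_floor
                   else [0.0] * len(frame))
    return out

import math
quiet = [0.01 * math.sin(0.2 * i) for i in range(160)]  # background noise
tone = [math.sin(0.2 * i) for i in range(800)]          # stand-in for speech
gated = noise_gate(quiet + tone)
print(sum(1 for s in gated[:160] if s == 0.0))  # first (noise-only) frame muted: 160
```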
He acknowledged that for heavily accented speakers and those who speak particularly unclearly, accuracy can be more like 91-92%.
He said: “For them, we put special programmes in place – gathering audio data, making adjustments to the models for low-volume speakers, for example. Then there’s research and investment going on in understanding it all – taking the speech, applying natural language processing to it, extracting clinically orientated facts from it, and applying those facts downstream in analytics applications, clinical decision support applications, and that type of thing.
“We actually do very well with Indian-orientated accenting and Asian accenting. We develop specific acoustic models that are associated with a specific accent.
“Just small differences, such as between North American accents, can be accommodated in those acoustic models. If you look at the acoustic profile of each word – the WAV file, for example – there are material differences between one accent and another.”
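The routing Petro describes – matching a speaker to an accent-specific acoustic model – can be caricatured in a few lines. Real acoustic models are large statistical models trained on accented audio; the file names and select_model helper here are invented purely for illustration.

```python
# Hypothetical registry mapping an accent tag to an accent-specific
# acoustic model, falling back to a general model. Names are invented.
ACOUSTIC_MODELS = {
    "en-GB": "acoustic_en_gb.bin",
    "en-US": "acoustic_en_us.bin",
    "en-IN": "acoustic_en_in.bin",  # Indian-accented English
}

def select_model(accent_tag: str, default: str = "en-GB") -> str:
    """Use the accent-specific model when one exists, else the default."""
    return ACOUSTIC_MODELS.get(accent_tag, ACOUSTIC_MODELS[default])

print(select_model("en-IN"))  # acoustic_en_in.bin
print(select_model("en-AU"))  # no specific model -> acoustic_en_gb.bin
```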
Neologisms
Keeping the speech recognition software up-to-date with new medical terminology as people are using it is a “very automated process”, Petro explained.
He said: “We’re always spidering out over the web to pick up new search terms that folks are using, which become part of the general models. There’s an ongoing natural adaptation that’s happening as the planet of human speakers using technology evolves on a day-to-day basis.
“We’ve also got commercial partnerships in terms of drug databases – a drug formulary of sorts. We keep track of the new drugs and roll their names out into the models, and there’s an automatic updating process where the IT administrator on a specific site can roll those changes in.
“We try not to let a gap develop between the way people are speaking today, or new semantics getting introduced into the market, and the cutting edge of what our model supports.”
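The vocabulary-update loop can likewise be sketched in miniature: harvest words from fresh text, keep those the recogniser’s lexicon lacks, and stage them for the next model rollout. The crawling and the drug-formulary feed are out of scope here, and everything below is illustrative rather than Nuance’s actual pipeline.

```python
# Minimal sketch of staging out-of-vocabulary terms for a model update.
# The lexicon, sample text and find_new_terms() helper are all hypothetical.
import re

def find_new_terms(corpus: str, lexicon: set[str]) -> set[str]:
    """Return words seen in fresh text that the current lexicon lacks."""
    words = set(re.findall(r"[a-z][a-z'-]+", corpus.lower()))
    return words - lexicon

lexicon = {"the", "patient", "was", "prescribed", "paracetamol"}
fresh_text = "The patient was prescribed semaglutide."
staged = find_new_terms(fresh_text, lexicon)
print(staged)  # {'semaglutide'} -> rolled into the language model at the next update
```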