Achieving human parity in conversational speech recognition has further to go

Microsoft is in the news for claiming its artificial intelligence (AI) technology can now recognize conversational speech at least as well as humans who do so professionally. Its technology now has a word error rate of 5.9%, which Microsoft says matches that of a trained transcriptionist. That equates to roughly one error every 17 words. So does this mean the typed word is going away?

We have our doubts:

  1. To be qualified as a stenographer, you must be able to transcribe with an error rate of 0.1% or lower. That’s one mistake per 1,000 words, or about a single mistake every four pages. Microsoft’s error rate of 5.9% isn’t acceptable for live transcription.
  2. Microsoft was working in a controlled environment, with relatively little background noise.
  3. Speaking for long periods of time is more tiring than writing or typing.
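
The gap in the first point is easier to see with a little arithmetic. Here is a minimal sketch in Python that converts each error rate into the average number of words between mistakes. The function name `words_per_error` and the figure of 250 words per page are our own assumptions for illustration, not numbers published by Microsoft.

```python
def words_per_error(error_rate_percent):
    """Average number of words between errors for a given error rate (in %)."""
    return 100.0 / error_rate_percent

# Assumption: a typed page holds roughly 250 words.
WORDS_PER_PAGE = 250

MICROSOFT_RATE = 5.9      # percent, as reported in the article
STENOGRAPHER_RATE = 0.1   # percent, the qualification threshold cited above

microsoft_gap = words_per_error(MICROSOFT_RATE)
stenographer_gap = words_per_error(STENOGRAPHER_RATE)

print(f"Microsoft: one error every {microsoft_gap:.0f} words")
print(f"Stenographer: one error every {stenographer_gap:.0f} words, "
      f"or about {stenographer_gap / WORDS_PER_PAGE:.0f} pages")
print(f"Microsoft makes about {MICROSOFT_RATE / STENOGRAPHER_RATE:.0f} "
      f"times as many errors")
```

Under those assumptions, the software slips up about once every 17 words, while a qualified stenographer slips up about once every 1,000 words, which is a difference of roughly 59 times.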

So it seems like there’s a lot of work still to be done.
