Our method for creating online courses involves making an audio recording of the presenter, transcribing it, editing the script and then recording the final video presentation. We’ve tried using speech recognition software to create the transcribed script, and it has been a deeply frustrating experience.
While speech recognition is proving successful for searching and issuing commands (using Siri, Google Voice and Amazon Echo), we’re not sure it will replace the keyboard as the way we create written content.
We found the software struggled to distinguish between “DITA” and “data”, and it even stumbled over simpler phrases such as “going to”. We had to correct the errors in the output as we went along, which meant it took a long time to create our script. Too long, in fact.
YouTube’s speech recognition is even worse.
The other issue with speech recognition is that there’s a difference between the way people speak and the way they write. The spoken word contains rhetorical flourishes, repetitions and rhythms; the written word is more succinct. That’s why novels are easier to read than plays, and articles are easier to read than speech transcripts.
For speech recognition to create content optimised for readability, we’d need to speak in the same way that we write.
This means that speech recognition software is not one of our “cool tools”. We can only use its output as a rough draft, and we have other tools for creating rough drafts more quickly.
What’s your experience of speech recognition software?
Please share your thoughts below.