Does voice transcription deserve its bad reputation?

by | Apr 2, 2018 | Advertising and Media

Voice recognition technology has well and truly ‘talked’ its way into the consumer psyche. One of the biggest-selling products over the festive season last year was Amazon Echo. Sales of this device, which boasts an intelligent personal assistant called Alexa, show no signs of slowing down this Christmas. In fact, last year Google revealed that 20 percent of mobile search queries were initiated by voice.

But despite this, one of the first questions we’re asked when we describe what we do is: “Automated voice transcription – how accurate is it really?” A lot of people have had frustrating experiences with speech recognition when ringing call centres, or indeed when talking to Alexa or their Google Home device, so it’s understandable that they have reservations about the usefulness of voice transcription.

Transcription technology in the world today

So why are we bringing this technology into the business world? Many would argue that the development of technology that can transcribe spoken language into text is the next biggest opportunity for all manner of companies. Why? As the pace of communication continues to increase, aided by the proliferation of cloud-based tools and social media platforms, it has never been more important to have a clear record of who said what.

Here at Yack we can see so many applications for voice transcription, especially in a business context. Whether by automatically transcribing virtual meetings and conference calls or by being able to refer back to what was agreed, both employees and managers could save themselves a lot of time.

However, delivering on that comes with challenges. Transcription tools have had a hard time in the past and there has been a long-held consensus that flawless voice recognition is an impossible challenge for computers alone to master.

Why is speech recognition so hard to master?

Firstly, such programs were designed to register any noise coming through a microphone, not just the voices of those involved. The inevitable echoes, the scuffle of an object being moved about or someone sneezing in the next room all affect the accuracy of the final transcription. If you then add into the mix different accents, slang words and unknown acronyms, it’s easy to understand why automated voice transcription is such a tricky thing to pull off without human intervention.
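For readers who want to put a number on that accuracy, one widely used yardstick is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the machine's transcript into the correct one, divided by the number of words actually spoken. The snippet below is only a minimal sketch of that calculation. The function name and the sample sentences are made up for illustration, and it is not a description of how our own transcription engine is scored.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edits needed to turn the reference words seen so far
    # into the first j hypothesis words (classic Levenshtein table, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # a spoken word is missing from the transcript
            insertion = curr[j - 1] + 1  # the transcript contains an extra word
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: what was said versus what the software heard.
spoken = "send the report to alan by friday"
transcribed = "send a report to alan friday"
print(f"WER: {word_error_rate(spoken, transcribed):.2f}")
```

Here one word is misheard and one is dropped out of seven spoken, so the score works out at two in seven. Background noise, accents and unfamiliar acronyms all tend to push that figure up.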

Secondly, for a conference call involving a number of people, the amount of unstructured data created is huge. Typically, you still need a human brain to decipher that data, including attributing what was said to the right person, and then make sense of it all.

Finally, collecting data of any kind comes with privacy concerns and voice recognition brings an added layer to that problem. This kind of technology, as mentioned, could potentially pick up sensitive conversations that might be taking place in the background. Clear parameters will need to be part of the process if voice recognition software is to build and maintain a long-term reputation as a valuable business tool.

So does voice transcription really deserve its bad reputation?

In a way, yes. It’s a work in progress though – and that progress has come on in leaps and bounds in the last two or three years. The development continues to be pushed along rapidly thanks to the investment from some of the biggest players in technology who all believe that voice recognition has a bright future.

From our own more humble perspective, we have worked tirelessly on enhancing our transcription engine and will continue to push its limits. We’re excited that our technology exists in such a fast-changing environment, and can’t wait to see what developments are made in transcription technology in the coming months and years!

This blog post has been taken from an article that our CEO, Alan, wrote for Compare the Cloud recently. Read the full article here.

