
Speech & Sound: The Next "Killer Paradigm Shift"...?


There was a time, not so very long ago, when IT directors and chief information officers dismissed the Internet as something of a passing fad. Somehow though, things took off pretty well with the whole web thing, didn't they? Mobile telephony has also grown to a level of dominance that we could never have predicted when it first started appearing around 30 years ago.

Then came the tablet... just another fad, right? Well, the first few were, but then "Magic Steve" produced the tablet we all love and cherish, didn't he? (OK yes - I know Android is doing well in this space too, you don't need to write in)... so what's coming next?


What is our next killer paradigm?

Many believe that "sound" will be the next killer element of "social computing" in terms of information sharing. After all, we share text in various forms, plus images and video, all the time. Shouldn't that make "sound" our next most logically interesting data-share element?

What kind of sound? Our own spoken voice, recorded speech, random commentary, music, environmental recordings -- it's a long list, and you can certainly add at least one entry of your own if you give it a moment's thought. Yes, we can link to each other's podcasts already, but we are talking about a level beyond that.

The next tier for sound is allied to its close first cousin, "speech", and both could (arguably) be about to move from the playground to the boardroom, and therefore potentially into the CIO's line of sight.


The speech steeplechase

The problem is that in its early years, speech/voice recognition technology was something of a novelty. But look at the facts: fingerprint recognition biometrics only surfaced towards the end of the last millennium, and now we have "secure USB flash drives" that unlock with a finger-swipe. The development curve for these highly user-facing technologies has been in overdrive for the last decade, if not longer.

So speech recognition companies, like Nuance, which produces the off-the-shelf Dragon NaturallySpeaking product, see a future for their technology in several corporate deployment scenarios, grounded in its suitability for the individual user. The company is something of a market leader, with manufacturers from HP to Apple to IBM all working with its technology.

According to Nuance, the human voice is an “incredibly rich, natural and efficient means of communication” -- and the industry is now working to build solutions that enable computers, phones, tablets, automobiles, TVs and consumer electronics to understand the human voice, providing a “natural interface” between man and machine.

So speech recognition could impact the business at a variety of levels:

  • Speech is used in CRM analytics inside call-centre deployments so that customer conversations can be analysed and filtered to discover which keywords customers are using (a simplified sketch of this kind of keyword filtering appears after this list).
  • Healthcare CIOs will already know that CLU (Clinical Language Understanding) technology has a huge role to play in helping healthcare enterprises overcome the challenges of "big data" -- that is, the ability to collect, process, interpret and then utilise information.
  • Nuance is not alone...  Google is also said to be attempting to “pioneer” technology that will ultimately enable users to search by the spoken word. Microsoft has similar plans with Bing.
  • Mobile applications (at both the consumer and enterprise level) present a large number of opportunities for speech recognition to be leveraged. From simple voice commands used to control smartphones, to more powerful voice-driven in-car entertainment and/or so-called “infotainment” systems, speech arguably has a strong new role to play.
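
To make the CRM analytics bullet concrete, here is a minimal Python sketch of the kind of keyword filtering an analytics pipeline might run over customer conversations once a speech recognition engine has transcribed them. The transcripts, watch-list and function names are all invented for illustration, not any vendor's actual API.

```python
from collections import Counter
import re

# Hypothetical transcripts, as they might arrive from a speech
# recognition engine in a call-centre CRM pipeline.
transcripts = [
    "I want to cancel my contract, the billing is wrong again",
    "Can you upgrade my contract? The billing last month was fine",
    "My broadband keeps dropping and billing support never answers",
]

# Keywords the business cares about -- in a real deployment this
# list would come from the CRM analytics configuration.
watch_words = {"cancel", "billing", "upgrade", "broadband", "contract"}

def keyword_counts(texts, keywords):
    """Count how often each watched keyword appears across transcripts."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in keywords:
                counts[word] += 1
    return counts

print(keyword_counts(transcripts, watch_words).most_common())
# [('billing', 3), ('contract', 2), ('cancel', 1), ('upgrade', 1), ('broadband', 1)]
```

A real deployment would of course add stemming, phrase matching and sentiment scoring on top, but the principle -- transcribe, then filter for the words customers actually use -- is the same.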


So how does it work? Nuance explains...

➊ A user speaks a command into a microphone

➋ The system converts the sound input into a digital signal

➌ The signal is analysed and chopped into component speech sounds called “phonemes”

➍ Each phoneme is examined in context with those around it, and statistical probability algorithms are used to determine the intended word from a stored list. This happens for each word

➎ Each word is examined in context with those around it, and statistical probability algorithms are used to determine the intended command

➏ The appropriate response for the command is triggered
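
To make steps ➍ and ➎ a little more concrete, here is a minimal Python sketch of the statistical matching idea: score each candidate word in a stored list by how probable its phoneme pattern is, and pick the best match. The lexicon, phoneme symbols and probabilities below are all invented for illustration -- a real engine such as Nuance's uses far richer acoustic and language models.

```python
# Toy illustration of step 4: given a recognised phoneme sequence,
# score candidate words from a stored list and pick the most probable.

LEXICON = {
    # word -> expected phoneme sequence (simplified ARPAbet-style symbols)
    "call": ["K", "AO", "L"],
    "cool": ["K", "UW", "L"],
    "kill": ["K", "IH", "L"],
}

def phoneme_prob(heard, intended):
    """Probability that `heard` was produced when `intended` was spoken.
    A tiny stand-in for a real acoustic confusion model."""
    return 0.8 if heard == intended else 0.1

def word_score(heard_phonemes, word_phonemes):
    """Multiply per-phoneme probabilities; mismatched lengths score zero."""
    if len(heard_phonemes) != len(word_phonemes):
        return 0.0
    score = 1.0
    for h, w in zip(heard_phonemes, word_phonemes):
        score *= phoneme_prob(h, w)
    return score

def best_word(heard_phonemes):
    """Pick the most probable intended word from the stored list."""
    return max(LEXICON, key=lambda w: word_score(heard_phonemes, LEXICON[w]))

print(best_word(["K", "AO", "L"]))  # -> "call"
```

Step ➎ then repeats the same idea one level up: the recognised words are scored in context against the list of known commands before the response is triggered.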


The CIO's central message

So it seems that many real-world scenarios could be using not only speech recognition technologies, but also their sister disciplines, i.e. text-to-speech, document imaging and electronic dictation services, which do of course throw up their own data storage challenges.

Nuance VP Peter Mahoney has suggested that really robust, industrial-grade speech recognition of the space-age style depicted in Hollywood movies (or, to give it its proper name, "robust natural language" technology) is not far off at all, and that we should see six to ten languages fully supported by this technology as soon as the end of this year.

It’s not Star Trek quite yet, but we’re close!
