‘There’s nothing wrong with technology. It’s when technology is the story and not the artist, that’s the problem.’
– Billy Corgan
The home of freelance writer Martyn Casserly
The last couple of years have seen a huge emphasis put on voice control interfaces. From Apple’s Siri to Google Now and the upcoming Google Glass project, it seems that the future is definitely going to be a louder one. There’s no doubt that as these technologies mature they will become a central part of our interaction with devices, but they still have a fair way to go in terms of accuracy. Siri can be frustrating to use, especially if you have a heavy accent or even a cold. The anguish the system can induce is wonderfully highlighted in the NSFW YouTube video ‘Apple Scotland – iPhone commercial for Siri’, which features a Scotsman trying in vain to ask Siri for eating advice. The main challenge for voice recognition software is that it has to contend with many different factors while interpreting a user’s input. One company that has been working on overcoming these challenges on the desktop is Nuance, whose Dragon Dictate software is one of the most advanced in the industry. I spoke with them recently to discover just what it takes to write a voice interface that we can actually use.
‘Speech recognition is an extraordinarily hard computational problem,’ explains Nuance’s Neil Grant. ‘Effectively you’ve got an astronomical search space. An example would be if you had a seventeen-word phrase – which is an average-length sentence – within a fifty-thousand-word vocabulary. It’s the equivalent of finding the correct phrase out of seven point six times ten to the seventy-nine possibilities. That’s roughly the number of atoms in the observable universe. Now, to put that into context, when Google does a search to find a webpage for you, it’s searching somewhere around one times ten to the twelve web pages, so significantly less.
‘If you’re typing something on a keyboard it’s very simple: it’s binary – you either hit the keystroke or you don’t. With speech there’s far more variability in terms of accents, tonality, environmental conditions, background noise, and microphone quality. One of the ways we tighten that with the desktop speech recognition is that a user has a profile attached to them so the computer understands the nuances of the way they speak. The software can apply this data to achieve higher levels of accuracy, and the more you use it and make corrections, the more it learns and then applies those learnings to your profile.’
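Grant’s figure is easy to check for yourself. If we assume, as his comparison implies, that each of the seventeen word positions could independently be any of the fifty thousand vocabulary words, the number of candidate phrases is simply 50,000 raised to the power of 17. The short Python sketch below illustrates that arithmetic only. It is not how Nuance’s software actually searches, and the variable names are my own.

```python
# Rough size of the search space described above: assume every position in a
# 17-word phrase could be any one of 50,000 vocabulary words.
VOCABULARY_SIZE = 50_000
PHRASE_LENGTH = 17

# Python integers are arbitrary precision, so this is the exact count.
search_space = VOCABULARY_SIZE ** PHRASE_LENGTH

# The roughly 10^12 web pages quoted for a Google search, for comparison.
web_pages = 10 ** 12

print(f"Possible phrases: {search_space:.2e}")
print(f"Times more than web pages: {search_space / web_pages:.1e}")
```

Running it prints 7.63e+79 possible phrases, which matches the "seven point six times ten to the seventy-nine" in the quote, and about 7.6e+67 times the number of web pages. A real recogniser never lists every combination. Language models make the vast majority of word sequences highly unlikely, and that lets the search concentrate on a tiny fraction of them.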
A trained voice profile is a significant factor behind the famed accuracy of Nuance’s software. It also highlights one of the challenges ahead for the mobile software that many of us currently use.
‘Something like Siri is effectively speaker-independent speech recognition,’ says Neil. ‘Now that means it’s not training a profile for you, certainly not in any great depth. You might use it on your phone, then another family member might use it, so it’s dealing with potentially multiple speakers from the same device. It’s a much harder process and means it can’t set itself up in advance for a particular accent.’
Advances in noise-cancelling microphones and the continued refinement of voice control software are bringing rapid improvements in all areas of the technology. Nuance itself now offers iPad and iPhone versions of its software, and the continued updates to Siri and Google Voice Search will no doubt push the software even further in the years ahead. Manufacturers are also beginning to incorporate the technology into newer laptops in response to the ever-encroaching influence of tablets.
‘One of the key specifications set by Intel for the new ultrabooks is embedded speech recognition,’ says Neil. ‘So this is something that is absolutely coming through, and what we will see is speech on these devices becoming more and more ubiquitous.’
One of the eye-catching elements of Siri that Apple aggressively markets is the system-wide integration of commands. Rather than being a stand-alone app, Siri is able to control calendar entries, send emails and tweets, update Facebook, and play specific music to you, all from the same interface. For voice control to really make an impact on everyday computers, it needs to offer a similar level of depth.
‘We can get very, very deep,’ Neil continues. ‘Not only dictation capabilities but real command and control of applications like MS Office. For example, a chap called Stuart Mangan, a rugby player, was involved in a tackle and broke his neck, leaving him paralysed from the neck down. We effectively voice-enabled his entire PC to give him not only his email and documents but, through Nokia PC Suite, the ability to send text messages and make phone calls. He came back to us saying that we’d given him his independence and privacy back.’
The concept of voice control has been a staple of science fiction for decades, and talking computers such as HAL in 2001: A Space Odyssey, or even Holly from Red Dwarf, have been a constant reminder of the convenience and ease with which the interface could work – so long as the computer in question will acquiesce to opening the pod bay doors when you ask. There’s no doubt that this kind of interface is now more of a possibility, but as the way we interact with our technology changes, what impact will this have on the systems of the future?
‘A mouse and a keyboard are not a natural way of interfacing with something,’ states Neil. ‘They’re a solution to a problem, and they’ve been a very successful solution, but the keyboard layout was designed to slow us down. Stephen Fry came out with a very good quote a couple of years ago where he stated it took less time to get your private pilot’s licence than it did to learn to type at sixty words per minute. So we’ve got these interfaces we’re stuck with at the moment – the keyboard and the mouse – which are fine for certain things, but for others there are certainly improvements that can be made. You’re starting to see prototypes coming through, the Google Glass project for one, looking at ultra-mobility – wearable computing – and there is a necessity to change the interface. As your devices become more and more mobile, you’re not going to be able to carry a keyboard around. Obviously voice is the natural step for that.’
I’ve been meaning to do this for a long time.
Technology is a bit of a thing of mine. As I look around my home there are barely a few feet between gadgets, appliances, or other techno-marvels designed to improve my life and serve its various needs. My television is a gateway to the internet, my mobile phone now acts as a personal assistant, email recipient, and podcast player, while my iPad is a pile of books, magazines, and games packed into something small enough to lose under a newspaper. Technology has become a prevailing part of modern life, as you’ll know from reading this blog, which was written in a coffee shop using free wifi and a free software platform from WordPress and then reached you through, I’m guessing, a similar route.
But is this technotopia all that it’s cracked up to be? You see, for all the marvellous advantages I now experience thanks to the likes of Apple, HTC, Samsung, Google, Amazon, and a host of others, living with the future can have its frustrations and disappointments.
So, here’s the reason for the blog then. I’m no computer expert. I can’t code, I get lost among acronyms like TCP/IP, AMOLED, and SCSI, plus I’m not exactly rolling in money by Western standards and have to pick my devices with great care – just like most people who actually use technology on a daily basis. In light of this I figured that we needed a voice. A place to talk about our victories and injuries, somewhere it’s safe to ask stupid questions and find that most other people are wondering the same thing.
Hence ‘Living in the Future’. It shall be a canary in the digital cage, trying new things and pondering the use of stuff that’s been around for a while so that we can carry on our adventures together in safety and, hopefully, a little less confusion.
If you fancy the journey then please come along, lend your voice, and increase the wisdom of us all. Send reviews of things you actually own and have to rely on. Tell us your worries or hopes for the digital age, and share links to helpful articles or tips that you find along the way.
I shall be writing about the things I see and tech that I encounter. Let’s have some fun in the digital playground, and maybe together we can make the future a little less frightening…