The Future of Computer Interfaces: Voice Control

The last couple of years have seen a huge emphasis placed on voice control interfaces. From Apple’s Siri to Google Now and the upcoming Google Glass project, it seems that the future is definitely going to be a louder one. There’s no doubt that as these technologies mature they will become a central part of our interaction with devices, but they still have a fair way to go in terms of accuracy. Siri can be a frustrating system to use, especially if you have a heavy accent or even a cold. The anguish it can induce is wonderfully highlighted in the NSFW YouTube video ‘Apple Scotland – iPhone commercial for Siri’, which features a Scotsman trying in vain to ask Siri for eating advice. The main challenge for the voice-deciphering code is that it has to contend with many different factors while interpreting a user’s input. One company that has been working on overcoming these challenges on the desktop is Nuance, whose Dragon Dictate software is one of the most advanced in the industry. I spoke with them recently to discover just what it takes to write a voice interface that we can actually use.

‘Speech recognition is an extraordinarily hard computational problem,’ explains Nuance’s Neil Grant. ‘Effectively you’ve got an astronomical search space. An example would be if you had a seventeen-word phrase – which is an average-length sentence – within a fifty-thousand-word vocabulary. It’s the equivalent of finding the correct phrase out of seven point six times ten to the seventy-nine possibilities – roughly the number of atoms in the observable universe. Now to put that into context, when Google does a search to find a webpage for you, it’s searching somewhere around one times ten to the twelve web pages, so significantly less.
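Neil’s arithmetic holds up. A quick back-of-the-envelope check in Python, using only the figures quoted above:

```python
# Back-of-the-envelope check of the quoted figures: every ordering of
# 17 words drawn from a 50,000-word vocabulary is a candidate phrase.
vocabulary_size = 50_000
phrase_length = 17

phrase_space = vocabulary_size ** phrase_length
print(f"{phrase_space:.1e}")              # ~7.6e+79 candidate phrases

web_pages = 1e12                          # Google's index, roughly
print(f"{phrase_space / web_pages:.1e}")  # ~7.6e+67 times larger
```

The phrase space really does outstrip Google’s index by nearly seventy orders of magnitude.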

‘If you’re typing something on a keyboard it’s very simple, it’s binary – you either hit the keystroke or you don’t. With speech there’s far more variability in terms of accents, tonality, environmental conditions, background noise, and microphone quality. One of the ways we tighten that up with desktop speech recognition is that a user has a profile attached to them, so the computer understands the nuances of the way they speak. The software can apply this data to achieve higher levels of accuracy, and the more you use it and make corrections, the more it learns and applies those lessons to your profile.’
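Nuance hasn’t published how its profiles work under the hood, but the feedback loop Neil describes is easy to sketch. The toy Python below is entirely illustrative – the UserProfile class and its methods are invented for this example – and simply shows how logging a user’s corrections can bias future recognition towards the way that particular speaker talks:

```python
from collections import defaultdict

class UserProfile:
    """Hypothetical per-user store of recognition corrections."""

    def __init__(self):
        # (what the recogniser heard, what the user meant) -> count
        self.corrections = defaultdict(int)

    def record_correction(self, heard: str, meant: str) -> None:
        self.corrections[(heard, meant)] += 1

    def rescore(self, heard: str, candidates: list[str]) -> str:
        # Prefer whichever candidate this user has corrected to most often.
        return max(candidates, key=lambda c: self.corrections[(heard, c)])

profile = UserProfile()
profile.record_correction("wreck a nice beach", "recognise speech")

print(profile.rescore("wreck a nice beach",
                      ["wreck a nice beach", "recognise speech"]))
# -> "recognise speech"
```

Real engines adapt acoustic and language models rather than counting phrase pairs, but the loop is the same: use, correct, improve.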


This dedicated, per-user training is a significant factor in the famed accuracy of Nuance’s software. It also highlights one of the challenges ahead for the mobile software that many of us currently use.

‘Something like Siri is effectively speaker-independent speech recognition,’ says Neil. ‘Now that means it’s not training a profile for you, certainly not in any great depth. You might use it on your phone, then another family member might use it, so it’s dealing with potentially multiple speakers from the same device. It’s a much harder process and means it can’t set itself up in advance for a particular accent.’

Advances in noise-cancelling microphones and the continued refinement of voice control software are driving rapid improvements in all areas of the technology. Nuance itself now offers iPad and iPhone versions of its software, and the continued updates to Siri and Google Voice Search will no doubt push the software even further in the years ahead. Manufacturers are also beginning to incorporate the technology into newer laptops in response to the ever-encroaching influence of tablets.

‘One of the key specifications set by Intel on the new Ultrabooks is embedded speech recognition,’ says Neil. ‘So this is something that is absolutely coming through, and what we will see is speech on these devices becoming more and more ubiquitous.’

One of the eye-catching elements of Siri that Apple aggressively markets is its system-wide integration of commands. Rather than being a stand-alone app, Siri is able to manage calendar entries, send emails and tweets, update Facebook, and play specific music for you, all from the same interface. For voice control to really make an impact on everyday computers, it needs to offer a similar level of depth.

‘We can get very, very deep,’ Neil continues, ‘not only dictation capabilities but real command and control of applications like MS Office. For example, a chap called Stuart Mangan, a rugby player, was involved in a tackle and broke his neck, leaving him paralysed from the neck down. We effectively voice-enabled his entire PC, giving him not only his email and documents but, through Nokia PC Suite, the ability to send text messages and make phone calls. He came back to us saying that we’d given him his independence and privacy back.’

The concept of voice control has been a staple of science fiction for decades, and the representation of conversational computers such as HAL in 2001: A Space Odyssey, or even Holly from Red Dwarf, has been a constant reminder of the convenience and ease with which the interface could work – so long as the computer in question will acquiesce to opening the pod bay doors when you ask. There’s no doubt that this kind of interface is now more of a possibility, but as the way we interact with our technology changes, what impact will this have on the systems of the future?


‘A mouse and a keyboard are not a natural way of interfacing with something,’ states Neil. ‘They’re a solution to a problem, and they’ve been a very successful solution, but the keyboard layout was designed to slow us down. Stephen Fry came out with a very good quote a couple of years ago where he stated it took less time to get your private pilot’s licence than it did to learn to type at sixty words per minute. So we’ve got these interfaces we’re stuck with at the moment – the keyboard and the mouse – which are fine for certain things, but for others there are certainly improvements that can be made. You’re starting to see prototypes coming through, the Google Glass project for one, looking at ultra-mobility – wearable computing – and there is a necessity to change the interface. As your devices become more and more mobile you’re not going to be able to carry a keyboard around. Obviously voice is the natural step for that.’

It Just Borks…

It’s a strange time to be an Apple user.

After years of being the underdog, and very nearly going out of business, it now sits proudly on top of the hill with more money than most countries. Its latest iPhone sold over 5 million units in the opening weekend alone (which is extremely impressive when you consider that it’s pretty much the most expensive phone you can buy and we’re in the middle of a recession), while the rumoured 8-inch iPad could appear at any moment and sew up the tablet market completely. The Macs that Apple is currently producing are impressive, especially the Airs, and have spawned countless imitations across the rest of the industry. Even Apple TV, which Steve Jobs referred to as a ‘hobby’, is now a very viable and useful product. You’d think we’d never had it so good. But that’s not the full story.

A troubling trend is developing, and it raises the question of whether the big A really has any more ideas up its wealthy, hand-tailored sleeves.

A rarely seen full Apple eclipse.

Beta. There, I said it. Beta.

In computing terms, a Beta release is a product or service that you let people use under the proviso that it isn’t actually finished. It’s a work-in-progress, something full of bugs and problems that you hope to iron out with help from Beta testers – brave volunteers who use the software and report back on any problems they encounter. This is standard practice, and a good one, as it allows real-world testing of a product before you release it into the wild. The idea is that you discover any glaring errors and fix them, saving your customers the aggravation and confusion when the software finally goes on sale. Microsoft do it, Google do it, even educated fleas do it, so it’s no surprise that Apple do it too. What is a surprise, though, is that Apple has started doing it with the flagship features included on its devices.

When Siri came out for the iPhone 4S it looked, or rather sounded, incredible: a voice interface that seemed to have an answer for everything. Apple made it the sole reason to buy the iPhone 4S. The adverts were all about Siri, showing how it could arrange your diary, remind you of anything, and offer light entertainment like some kind of electronic court jester. To make the product even more attractive, the company carefully selected celebrities such as Zooey Deschanel, Martin Scorsese and Samuel L. Jackson to interact with the faceless wonder and turn the service into a celebrity in its own right. Websites and YouTube videos popped up displaying the answers Siri generated to weird and wonderful questions, and it seemed only a matter of time before we would all be talking to our phones rather than to those troublesome people we currently have to deal with.

Then people got the chance to use this technical marvel…and found that it wasn’t really that good. Sure, it could do some stuff, and it was pretty cool as a party trick, but the effortless lifestyle manager the ads portrayed never materialised, and for those of us who live outside the US, which is nearly everybody, the service was missing vital local information – so no matter how much we tried to sound like Zooey, we just couldn’t get soup delivered to our houses.

Siri, how much am I being paid for this ad?

In surveys conducted a few months later it turned out that most iPhone users were hardly using Siri at all, which really isn’t much of a surprise. Apple’s explanation for the limited use of its flagship feature…it was in Beta. That’s right: the one aspect of the iPhone 4S singled out for an expensive ad campaign, the feature that was supposed to herald a new era in communications software, was in fact not finished, not ready for prime time, not even ready to be called a full product.

While I was still scratching my head over that one, Apple brought forth the iPhone 5, now with Apple’s own Maps – which show 3D Flyover views of major cities. Wow. Except if you live outside the US. Oh. As the standout feature on the phone (let’s just look at those 3D views of major American cities once more…oooo, that’s useful), it was trumpeted by Apple as the ‘most beautiful, powerful mapping service ever’ – right up until people used it and found that towns and cities didn’t exist, bridges looked more like Dali paintings, and the directions would quite happily lead you somewhere totally different to your intended destination. People complained, the media collated some of the most hilarious gaffes, and Apple had to issue a formal apology and direct users to third-party map replacements.

Apple hires Salvador Dali to head up its maps division.

The strange thing is that Apple told consumers that the product was still being developed and that if users would send in the errors they found, the service would improve. So what that really means is…it’s a Beta. Again. Two years running, the most profitable technology company on the planet has released unfinished products and expected customers – who paid a small fortune for the ‘best’ phone experience money can buy – to road-test its experiments. That’s not innovation…it’s just plain lazy.

It used to be the case that you paid top money for an Apple product but could be sure you were getting one of the best machine-and-software combinations around. Now it seems the money is nothing more than a high entrance fee for a focus-group product test. At least have the decency to buy us lunch next time…