Looking into the face of future computing

by Alex Watson on August 19, 2010


As innovative as the iPhone’s touchscreen interface is, it still relies on icons and text input. So how long until computers can listen to and see us?

As I’m writing this, my wife is downstairs, occasionally calling up to me to ask how it’s going. The study window is open, and as it’s a warm afternoon, my neighbours are outside, barbecuing and indulging in the kind of parenting that seems to consist solely of yelling at their children.

Contrast this to the way I’m ordering my laptop around: a silent stream of clicks on representative icons, and the odd piano chord-esque roll of keyboard shortcuts. There’s a huge divide between the way we interact with each other and the way we interact with our technology (bar, of course, yelling at the TV when we’re forced to suffer watching the England football team).

The iPhone’s touchscreen interface was certainly innovative, but despite doing away with all bar one button, the phone remains relatively conservative in terms of how you feed data into it. The default way to input information in most applications involves either pressing virtual buttons onscreen or typing on a virtual Qwerty keyboard.

However, just as touchscreens look like they’ll lead to the demise of the mouse and keyboard as physical objects, a new wave of technologies and products is showing signs of challenging the way we traditionally input data into computers. First, though, it is my duty to inform you that, despite the pleas of all three members of the Newton’s devoted fanbase, handwriting recognition seems unlikely to make a comeback, at least as far as Apple is concerned.

Microsoft, on the other hand, has shown far greater faith in handwriting recognition: it’s part of Windows 7 and was to have been central to the interface of a concept machine called Courier. Physically, the Courier looked like a book: two screens hinged by a central spine, with an interface that allowed pen input. Microsoft killed the concept, and at the moment it looks like Steve Jobs has every reason to remain confident in his assertion that ‘if you see a stylus, they blew it’.

In fact, Microsoft’s biggest interest in non-traditional control methods is the recently unveiled Kinect add-on for its Xbox 360 games console. Derided by many as a copycat response to Nintendo’s hugely successful Wii, Kinect looks massively ambitious now that the details have been revealed. Consisting of an array of cameras, depth sensors and microphones, it turns your gestures, facial expressions and speech into the control method for games. It’s due to launch by the end of the year, and while some are still sceptical as to whether people will really want to run around their lounge like extras from a Godzilla movie, the demos on Microsoft’s site do show some circumstances in which Kinect looks appealing.

Google recently rolled out an Android app called Google Goggles. The pitch is simple: ‘Use pictures to search the web.’ The site explaining how Goggles works sounds like something from the pages of Neuromancer: point your phone at some foreign text, an artwork, a bottle of wine or a logo, and the image will be uploaded to Google and relevant translations, contextual information and reviews will be returned.

It must be said that Goggles is currently less than brilliant, but the fact that it exists outside a research lab shows how interested Google is in developing its understanding of non-text content. It’s Android-only at the moment, but Google has expressed an interest in bringing it to the iPhone.

Teaching computers to understand visual data is still a science in its infancy – even something as simple as searching for an image on Google relies as much on the file name of a JPEG as on the actual image content. Still, at least with images, you have some metadata from which the search engine can infer content. When it comes to speech input, computers still have a lot to learn.
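
To make that concrete, here’s a minimal sketch (with entirely hypothetical filenames and tags) of how a metadata-driven image search can rank pictures without ever examining a single pixel – which is roughly the position the column describes:

```python
# A minimal sketch of metadata-driven image search: the 'engine' never looks
# at pixel data, only at the text associated with each file. All filenames
# and tags here are hypothetical.

images = {
    "eiffel_tower_paris.jpg": {"alt": "Eiffel Tower at dusk", "page_title": "Paris holiday"},
    "IMG_4512.jpg":           {"alt": "", "page_title": "My photos"},
    "wine_label_rioja.jpg":   {"alt": "Bottle of Rioja", "page_title": "Wine reviews"},
}

def search(query):
    """Rank images by how many query words appear in their textual metadata."""
    words = query.lower().split()
    results = []
    for name, meta in images.items():
        haystack = " ".join([name, meta["alt"], meta["page_title"]]).lower()
        score = sum(1 for w in words if w in haystack)
        if score:
            results.append((score, name))
    return [name for _, name in sorted(results, reverse=True)]

print(search("eiffel tower"))  # ['eiffel_tower_paris.jpg']
print(search("holiday"))       # IMG_4512.jpg can never surface: it has no useful metadata
```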

The programmer Jeff Atwood recently wrote a post on his blog taking stock of where computer speech recognition stands, and the field really seems to have stalled. At the beginning of the 1990s, commercial speech recognition software would get as many as 90% of words wrong; error rates fell as the decade wore on, with Dragon NaturallySpeaking marking a high point. Since then, progress has plateaued: the best most voice recognition software can manage is to get around 80% of the words it hears right, while humans typically get 96 to 98% correct (for a full overview of the field’s history, Robert Fortner’s blog post is excellent: tinyurl.com/macuserspeech).
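
For the curious, those figures are typically word-level accuracy, derived from the standard word error rate metric: the number of substituted, inserted and deleted words relative to a reference transcript. A minimal sketch of the calculation:

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: (substitutions + insertions + deletions) / reference length,
    computed as the Levenshtein edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits to turn the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

wer = word_error_rate("is this the real life is this just fantasy",
                      "is this the real life is this just fancy")
print(f"{(1 - wer) * 100:.0f}% of words correct")  # 89% — one error in nine words
```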

Still, you’ve probably experienced the joys of dealing with a computer on the phone when calling a large company, and while speech recognition of free-flowing conversation is only around 80% accurate, a computer can cope in this kind of situation because it only has to deal with a very limited set of inputs. Google has been pushing forward with speech and audio recognition, too. Google Voice (which, controversially, Apple didn’t approve for the App Store) unifies all your phones under one number, and the system can transcribe voicemails. The New York Times tested it last year and performance wasn’t too bad – particularly as the aim is only to give you the gist of a message. It did, however, fail when someone read the weird bit from Bohemian Rhapsody in a deadpan voice, turning it into: ‘I see a little, so let’s go about man scott remove scott removed really you into this and then go funder bolt enlightening very very frightening me the L A the L A L D L L A L galloway of gallo label figure role maybe if you go but I’m just a poor boy and nobody loves me.’
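
The reason such systems survive on 80% accuracy is that they never have to transcribe arbitrary speech: they only have to decide which of a handful of expected phrases a caller’s (possibly garbled) words are closest to. A minimal sketch, using hypothetical menu options and Python’s built-in difflib for the fuzzy match:

```python
from difflib import SequenceMatcher

# Hypothetical call-centre menu: the recogniser's output only has to be
# matched against a tiny, fixed set of expected phrases, so even a noisy
# transcription is usually enough to pick the right option.
MENU = ["check my balance", "pay a bill", "report a fault", "speak to an agent"]

def route(transcription):
    """Return the menu option most similar to the (possibly garbled) transcription."""
    return max(MENU, key=lambda option: SequenceMatcher(None, transcription.lower(), option).ratio())

print(route("czech my balanse"))  # 'check my balance', despite two misheard words
print(route("pay the bill"))      # 'pay a bill'
```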

Then again, as long as you’re not mates with the guys from Wayne’s World, you’ll be fine. Speech recognition was a key launch feature of the iPhone 3GS, and it’s an area Apple is clearly interested in – it recently paid around $200 million (about £135 million) for an app called Siri, a virtual personal assistant that you give spoken commands to. Sounds like a frippery for $200 million? Consider this: the app is actually based on research part-funded by the US military, and its roots can be traced back to CALO (Cognitive Assistant that Learns and Organizes), one of the most ambitious artificial intelligence programs in US history. According to the project’s homepage: ‘The goal… is to create cognitive software systems, that is, systems that can reason, learn from experience, be told what to do, explain what they are doing, reflect on their experience, and respond robustly to surprise.’

Siri was spun off from this, and Apple’s interest lies not just in speech recognition, but in the AI and processing behind decoding your orders – making powerful, complex features easy to use is key to what Apple does, and that is surely how Apple sees Siri helping it.

  • HeatherKay

    OS X has handwriting recognition. Plug a graphics tablet in and all the necessary controls are there in System Preferences.

    So, the technology has been there for several versions of the desktop OS. It’s just down to Apple deciding to add it to the iOS side of things.
