In Douglas Adams’s The Hitchhiker’s Guide to the Galaxy, the computer aboard the starship Heart of Gold could be described as intelligent (if “intelligent” is a synonym for “snarky”).
The passengers aboard the ship interacted with the computer by voice commands. The computer understood without getting lost in the ambiguities inherent in human speech. There was no need for video monitors, keyboards, or pointing devices.
This is the long-term, holy-grail promise of artificial intelligence (AI): machines that can interact with humans by natural-language processing, understanding spoken commands and questions and responding in a natural-sounding voice (complete with the intonations and nuances that convey a significant amount of meaning in speech beyond the words themselves).
We aren’t quite there yet. A general-purpose AI-based machine carrying out arbitrary instructions while relying only on speech as an interface is a distant dream. Current AI applications can be quite good at what they do, but each system’s competency is extremely narrow. A system that can distinguish a skin mole from skin cancer by processing a photograph is completely useless at, say, matching fingerprint patterns.
Until we reach that “holy grail” moment, we will continue to need conventional user interfaces (UIs) for AI-based apps. They don’t have to be keyboard-and-mouse GUIs; AI opens up many potential interaction channels. There’s plenty of room for improvement and tons of research in this area.
Let’s examine some novel ways designers are implementing UIs for AI systems:
One area that has seen tremendous progress in recent years is AI-driven handwriting recognition. You’re already seeing this technology in ATMs (automated teller machines) that process check deposits by reading the handwritten amount on the check. The applications for processing digitized images of handwriting to extract meaning are endless. Examples include:
- “Reading” historical documents, such as older census data sheets
- Processing comment cards
- Translating foreign-language signage, restaurant menus, and other written artifacts
These applications make use of “offline recognition”: the handwriting was performed at one time and processed later by the AI system. But there are also “online recognition” systems, which interpret handwritten text as it is being written. This is useful for taking notes with a stylus (or even a finger) on a tablet or phone.
Online recognition might sound narrowly useful for most Western languages, where text is often entered more rapidly on a keyboard than in longhand. For other languages, however, such as Chinese and Japanese, keyboard entry is impractical and cumbersome. A system that translates handwritten characters into Unicode in real time for emails or word-processing documents makes computers much more useful in those parts of the world.
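To make online recognition concrete, here is a deliberately tiny sketch (not a production technique): it represents a single pen stroke as a sequence of (x, y) points, resamples it to a fixed length, and matches it against hypothetical stroke templates by nearest shape. The template set and point format are assumptions for illustration only; real recognizers use far richer models.

```python
import math

def resample(stroke, n=16):
    """Resample a stroke (list of (x, y) points) to n evenly spaced points."""
    dists = [0.0]  # cumulative arc length at each point
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1] or 1.0
    out = []
    for i in range(n):
        target = total * i / (n - 1)
        j = 0  # find the segment containing the target arc length
        while j < len(dists) - 2 and dists[j + 1] < target:
            j += 1
        seg = dists[j + 1] - dists[j] or 1.0
        t = (target - dists[j]) / seg
        (x0, y0), (x1, y1) = stroke[j], stroke[j + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def classify(stroke, templates):
    """Return the label of the template whose resampled shape is closest."""
    pts = resample(stroke)
    def dist(label):
        return sum(math.hypot(px - qx, py - qy)
                   for (px, py), (qx, qy) in zip(pts, resample(templates[label])))
    return min(templates, key=dist)

# Hypothetical single-stroke templates: a vertical bar and a horizontal bar
templates = {
    "1": [(0, 0), (0, 1)],
    "-": [(0, 0), (1, 0)],
}
print(classify([(0.05, 0.0), (0.02, 0.5), (0.0, 1.0)], templates))  # a wobbly vertical stroke
```

Resampling makes strokes of different speeds and lengths comparable point-for-point, which is the key idea that carries over to real systems.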
A fascinating area of relatively new UI research is gesture recognition. It is especially useful for augmented- and virtual-reality (AR and VR) applications, in which a user manipulates virtual objects by “handling” them with real hands.
One approach to gesture recognition is the use of devices such as “wired gloves.” The gloves are festooned with position sensors that constantly provide location data to the computer.
More interesting from an AI standpoint is camera-based gesture recognition. One or more external cameras focus on the user and follow the user’s motions and gestures. This is useful not only for AR and VR applications but also for:
- Automatic sign-language recognition and translation
- Automobile driver monitoring (recognizing the telltale signs of drowsy or distracted driving from facial expressions)
- Body language interpretation
- Self-driving cars, which recognize and predict what pedestrians might do and take action to avoid accidents
Or imagine an orchestra conductor, directing an ensemble of robotic MIDI musicians by traditional hand and baton gestures. Wouldn’t that be cool?
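Camera-based systems typically reduce a video frame to a set of hand landmarks (fingertips, knuckles, and so on) and classify a pose from their geometry. The toy sketch below assumes a hypothetical landmark format, with normalized coordinates where y increases downward (the usual image convention); a finger is counted as extended when its tip sits above its knuckle.

```python
def count_extended_fingers(landmarks):
    """A finger counts as extended if its tip is above (smaller y) its knuckle."""
    fingers = ["index", "middle", "ring", "pinky"]
    return sum(
        1 for f in fingers
        if landmarks[f + "_tip"][1] < landmarks[f + "_knuckle"][1]
    )

def classify_gesture(landmarks):
    n = count_extended_fingers(landmarks)
    if n == 4:
        return "open palm"
    if n == 0:
        return "fist"
    return "unknown"

# Hypothetical landmark frames (keys like "index_tip" are assumptions for this sketch)
open_palm = {f + part: (0.5, 0.2 if part == "_tip" else 0.6)
             for f in ["index", "middle", "ring", "pinky"]
             for part in ["_tip", "_knuckle"]}
fist = {f + part: (0.5, 0.7 if part == "_tip" else 0.6)
        for f in ["index", "middle", "ring", "pinky"]
        for part in ["_tip", "_knuckle"]}
print(classify_gesture(open_palm), classify_gesture(fist))
```

The hard part in practice is the step this sketch skips: extracting reliable landmarks from raw video, which is exactly where the AI lives.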
Another up-and-coming technology is a direct brain-computer interface (BCI), in which an implanted or external network of sensors detects the user’s neural signals and sends the data to a computer. The computer interprets the signals and takes action. In early research, experimental subjects have been able to type words or move a mouse pointer simply by thinking.
At present, this technology has immense challenges. Implanted sensors can cause infections and create scar tissue that harms the patient and degrades the signals.
Brain-wave signals are difficult to detect (especially with externally mounted sensors) and are notoriously noisy. It’s only possible to monitor or stimulate a tiny fraction of the 100 billion neurons in the human brain. And no two humans generate the same pattern for the same thought, so systems must be trained for each user through a long, painstaking process. Even under the best of circumstances, users have managed to “type” only a few letters over several minutes.
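A classic way to fight that noise is to average many repeated trials and then match the result against per-user templates learned during training. The sketch below simulates this with made-up waveforms and Gaussian noise; the waveforms, noise level, and decoding rule are all illustrative assumptions, not real neural data.

```python
import random

random.seed(0)  # deterministic for the demo

TEMPLATES = {  # hypothetical per-user waveforms learned during training
    "left":  [1, 1, 0, 0, -1, -1, 0, 0],
    "right": [-1, -1, 0, 0, 1, 1, 0, 0],
}

def record_trial(command, noise=2.0):
    """Simulate one noisy recording of the user thinking `command`."""
    return [s + random.gauss(0, noise) for s in TEMPLATES[command]]

def average(trials):
    """Average sample-by-sample across trials to raise the signal-to-noise ratio."""
    return [sum(samples) / len(trials) for samples in zip(*trials)]

def decode(signal):
    """Pick the template with the highest correlation (dot product)."""
    return max(TEMPLATES,
               key=lambda c: sum(a * b for a, b in zip(signal, TEMPLATES[c])))

trials = [record_trial("left") for _ in range(50)]
print(decode(average(trials)))
```

Needing 50 trials to recover one binary choice hints at why real BCI “typing” rates are so slow: each reliable bit of output costs many seconds of noisy recording.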
However, research proceeds apace, and the potential applications are tantalizing, including:
- Artificial eyes, which can restore vision to patients with acquired blindness
- Neuroprostheses, which can help paralyzed users control robotic exoskeletons with their thoughts
- Speech-synthesis systems that help patients with damage to the speech centers of their brains to “talk”
Some research in BCI is attempting to use the technology to communicate with comatose patients in persistent vegetative states. If realized, the ability to reach these patients would have extraordinary impacts on them and their families.
And Lest We Forget: Speech
Systems that can understand the full range of human speech and respond in kind are nowhere near reality. Yet. But considerable research in natural-language processing is ongoing and making great progress.
Some systems are already in use: Certain customer-service call centers have robot agents answering incoming calls and determining what the caller needs. Although they won’t pass the Turing Test anytime soon, they can be helpful for routing a call to an appropriate human.
Extracting words from digitized sound signals is commonplace now. Systems such as Apple’s Siri, Microsoft’s Cortana, and Amazon’s Alexa are good at recognizing words and parsing their face-value meaning. But there’s much more to speech than words:
“Hal, open the pod bay doors” has a different meaning than “HAL! OPEN THE POD BAY DOORS!”
Speech recognition systems can distinguish questions from commands, but so far aren’t very good at detecting and responding to the speaker’s mood. This is an important next phase in speech recognition research, and you can bet AI will be the driving force.
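As a toy illustration of the gap between words and mood, consider a heuristic that works only on the transcript. Real systems would analyze the audio itself (pitch, volume, tempo); the scoring rule and threshold below are arbitrary assumptions made up for this sketch.

```python
def estimate_urgency(utterance):
    """Crude text-only proxy for vocal intensity:
    shouting (all caps) and exclamation marks raise the score."""
    letters = [c for c in utterance if c.isalpha()]
    caps_ratio = sum(c.isupper() for c in letters) / max(len(letters), 1)
    exclamations = utterance.count("!")
    return caps_ratio + 0.5 * exclamations

def interpret(utterance):
    return "urgent command" if estimate_urgency(utterance) > 1.0 else "calm request"

print(interpret("Hal, open the pod bay doors"))   # calm request
print(interpret("HAL! OPEN THE POD BAY DOORS!"))  # urgent command
```

The two HAL utterances contain the same words, yet the heuristic (like a human listener) assigns them very different intents; that distinction is exactly what today’s speech systems mostly ignore.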
What Lies Ahead?
What will the future of AI-driven human-computer interaction look like? The field is wide open, and we’re only beginning to explore the possibilities and overcome the limitations. Think of all the nonverbal ways humans communicate with each other, from touching and facial expressions to body language and more.
Now imagine what it would be like for a computerized device to correctly and reliably detect and interpret all of those signals to understand what you really want and take appropriate action.
None of this will be possible without AI-based technologies.
The wide variety of ways humans communicate demands a system that can learn and evolve on the fly. Much research is still needed into ways to train AI systems more quickly and efficiently, with less data; the immense amount of data needed to train current AI systems to perform even simple tasks is a major bottleneck. Training AI systems to make decisions reliably, without reflecting human biases, is another thorny problem still to be resolved.
But once these hurdles are overcome, there’s almost no limit to what we’ll be able to achieve. AI needs UI, but just as important, UI needs AI.