Google Assistant Pushing the Voice First Usability Frontier
A big problem with the Voice First interface, the ultimate invisible interface, is the problem of Discovery: knowing what the assistant can do and what it/he/she can't do. The prevailing UX paradigm up to now, the one based on the visual UI, solves that problem by showing the user what can be done (radio buttons, check boxes, drop-downs, clickable images, etc.), and therefore, by extension, what cannot be done.
In the case of voice, no viable option exists that respects the integrity of the voice-only imperative within the paradigm adopted by the visual UI. Either you awkwardly emulate the visual paradigm and have the assistant tediously speak out the options, one after another, linearly, eating up valuable time and placing a cognitive burden on the user (who has to remember the options), or you force the user to rely on a crutch: a visual cheat sheet that lists what the customer can say. (And even then, given the human tendency to introduce variations in their linguistic articulations, a user may trigger an error simply by speaking a variation of an option that the language model does not cover.)
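The brittleness of the cheat-sheet approach can be sketched with a toy matcher. This is a minimal illustration, not how any real assistant works, and all option phrases and names here are hypothetical:

```python
# Toy illustration: a menu-style voice interface that recognizes only
# the exact option phrases it enumerated to the user (hypothetical phrases).
REGISTERED_OPTIONS = {
    "check my balance": "balance_intent",
    "transfer money": "transfer_intent",
    "pay a bill": "billpay_intent",
}

def rigid_match(utterance: str) -> str:
    """Return the intent for an exact option phrase, or an error marker."""
    return REGISTERED_OPTIONS.get(utterance.lower().strip(), "ERROR_NOT_UNDERSTOOD")

# The exact enumerated phrase works...
print(rigid_match("Check my balance"))    # balance_intent
# ...but a perfectly natural variation of the same request fails.
print(rigid_match("What's my balance?"))  # ERROR_NOT_UNDERSTOOD
```

The second call is the failure mode described above: the user meant exactly what the system supports, yet a harmless rewording triggers an error.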
The only viable solution that respects the Voice First interface is the one that bravely takes on the discovery responsibility head on, fully: the one that declares that henceforth, the user shall say whatever they want to say, and that the assistant shall exert all of its powers to understand what the user said and meant, and shall make it its mission to learn and improve with time, with every failure and every success. Learn from what the user says and infer what they meant, and learn from the aggregate user base, applying such learnings across multiple relevant cohorts.
Which brings us to the title of this article: “Google Assistant Pushing the Voice First Usability Frontier.” In a nutshell, Google is doing exactly that, impressively so, at least in its ambition: they are taking on the burden of discovering actions (the equivalent of skills in the world of Amazon’s Alexa) implicitly, for the user, without forcing the user to know or remember what they can say and how they can say it. The paradigm is: ask, and we will do our best to serve.
Here’s a video illustrating this, in which the Motley Fool Google Action is surfaced without the user explicitly asking for that action:
One word comes to mind for those who know that what Google has decided to tackle is nothing short of a monumental problem, and that word is “Respect”!
The problem is monumental because it is hard to infer intent without triggering both false positives and false negatives. Google is doing its best to guess what you meant, and that will obviously result in more errors than if you were to ask for what you wanted explicitly. Tackle this problem the wrong way and you will very quickly end up with a monstrous UX problem on your hands. But the challenge that Google has decided to take on head-on is nothing short of thrilling, and is clearly the way forward for the Voice First UX. When Amazon’s Alexa hits the 30,000-skill mark (or have they hit it already?), or the 100,000 mark, the debate will be over: users will need to be able to just speak naturally, and the assistant will need to hustle as best it can to make sure that things just work. We will need all the deep, machine learning technology we can muster, and users themselves will need to learn to treat their assistant as the ever-evolving intelligence that it is: one that improves every day, and that therefore deserves our patience as it grows and blossoms into the conversational partner that we all want and deserve.
This here cause is a worthy one, indeed. And as The Bard himself put it: “Dream in light years, challenge miles, walk step by step.”
For more on The Motley Fool action, please visit this page.