Not long ago, controlling a machine by simply talking to it was only science fiction.
Now, with advances like AI and machine learning, the possibilities of voice command are endless and multiplying throughout our daily lives — from our phones and wearables to our televisions and other smart home accessories.
Smart speakers are among the most popular voice command devices, with global sales reaching 146.9 million units in 2019. Today, the top two players in the market, Amazon Echo and Google Home, have developed into excellent voice assistants that work across almost every smart device.
It’s extremely common to see voice command integrated into today’s smart devices, but how does it work?
A voice user interface uses speech recognition to understand spoken commands and answer questions. Voice UIs are the primary means of interacting with virtual assistants on smartphones and smart speakers.
Of course, this technology is not entirely new. Old-school automated attendants could already respond to the pressing of keypad buttons via dual-tone multi-frequency signaling (DTMF). The good news is that the new generation of voice command devices can respond to many different voices, regardless of accent or dialect.
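To make the contrast concrete: DTMF encodes each keypad press as a pair of simultaneous tones, one frequency from the key's row and one from its column, which is all the "understanding" an old automated attendant needed. A minimal sketch of that mapping:

```python
# DTMF assigns each keypad key a (row, column) frequency pair in Hz.
ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)

KEYPAD = (
    ("1", "2", "3", "A"),
    ("4", "5", "6", "B"),
    ("7", "8", "9", "C"),
    ("*", "0", "#", "D"),
)

def dtmf_tones(key: str) -> tuple[int, int]:
    """Return the (low, high) tone pair the attendant hears for a key press."""
    for r, row in enumerate(KEYPAD):
        for c, k in enumerate(row):
            if k == key:
                return (ROW_FREQS[r], COL_FREQS[c])
    raise ValueError(f"not a DTMF key: {key!r}")

print(dtmf_tones("5"))  # (770, 1336)
```

Recognizing one of sixteen fixed tone pairs is trivial next to recognizing open-ended speech across accents, which is why voice UIs only became practical with modern machine learning.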
Today, voice interfaces have made their way into a wide range of devices. From smartphones and computers to cars and smart speakers, most newer devices around us have some form of voice UI enabled. But how are users actually adopting voice assistants in their daily lives?
Before we make any assumptions, let's look at two important variables: retention rate and adoption rate.
Voice devices like the Amazon Echo run voice applications called "Alexa Skills" that users can enable on their devices. A 2017 Voicelabs report found that only 3% of users were still active two weeks after a voice application acquired them, yet the adoption rate of voice assistants in smart speakers rocketed to 93.3%, and that number is only set to grow. From a business point of view, not offering an Alexa Skill today could be compared to not having a website: you simply lose ground to competitors.
High adoption rates, however, are worthless if an application fails to retain users. Retention comes down to app design and user experience: as new technologies are introduced, their adoption depends heavily on efficient, human-centric UI design.
It’s safe to say that a good voice user interface and experience often go hand in hand with visual screens. Today there are three methods of voice interaction — screen-first, voice-only, and voice-first. Like any other interaction, each has its own benefits and limitations.
Screen-first interactions are the most common and can be found on our desktops, phones, and other general-purpose devices. Voice control buttons and options were added, starting around 2012, to enhance applications designed primarily for screens.
Voice-only interaction takes place on devices without a screen: all input and output are based on voice commands and sounds. The best examples are smart speakers like the Amazon Echo and Google Home.
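A voice-only turn boils down to a simple loop: capture audio, transcribe it, decide what was meant, and speak the answer back, since there is no screen to fall back on. Here is a rough sketch; `listen`, `transcribe`, and `speak` are hypothetical stand-ins for a platform's audio and speech APIs, and only the routing logic is concrete:

```python
def route_command(text: str) -> str:
    """Map a transcribed utterance to a spoken reply (illustrative rules only)."""
    text = text.lower()
    if "weather" in text:
        return "Here is today's forecast."
    if "timer" in text:
        return "Timer started."
    # On a voice-only device even errors must be spoken, not shown.
    return "Sorry, I didn't catch that."

def interaction_turn(listen, transcribe, speak):
    """One voice-only turn: audio in, speech out, nothing visual in between."""
    audio = listen()
    speak(route_command(transcribe(audio)))
```

The design point is that every branch, including the failure branch, must produce audible output — a silent error on a screenless device looks identical to the device ignoring you.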
Voice-first interaction, essentially the reverse of screen-first, delivers the best user experience where voice is concerned: apps designed primarily for voice are enhanced by a screen that displays supporting information.
Having an “always listening” microphone lying around a private space like the home has long been a concern for voice command users, even though device makers promise that devices only listen when addressed directly via their wake words or phrases.
Voice-based interactions also make it difficult to feel fully in control, given the many possible ways of phrasing the same request and the different accents and languages spoken across the globe. Tackling these problems comes down to being mindful of user experience: the main challenge is to avoid, as much as possible, asking users to repeat themselves or serving up variants of “Sorry, I do not understand”.
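One common way to absorb phrasing variety is to map many sample utterances onto a single intent, and to reprompt with a concrete hint rather than a bare apology when nothing matches. A minimal sketch, with illustrative intent names and sample phrases:

```python
# Many phrasings, one intent. Intents and samples here are invented examples;
# production assistants use trained models rather than exact-match tables.
INTENTS = {
    "play_music": {"play some music", "put on a song", "start the music"},
    "stop": {"stop", "be quiet", "that's enough"},
}

def normalize(utterance: str) -> str:
    """Lowercase, trim punctuation, and collapse whitespace."""
    return " ".join(utterance.lower().strip("?!. ").split())

def respond(utterance: str) -> str:
    phrase = normalize(utterance)
    for intent, samples in INTENTS.items():
        if phrase in samples:
            return intent
    # A helpful reprompt beats a bare "Sorry, I do not understand".
    return "reprompt: try something like 'play some music'"
```

Even this toy version shows the UX principle: the failure path should teach the user a phrasing that works, so the conversation moves forward instead of stalling.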
To succeed in building voice applications, many brands have drawn their focus to designing voice assistants that, like people, can hold conversations.
Whether you’re looking to invest in building voice applications or integrate voice interactions into your website, Snappymob can help turn your ideas into reality.
We’ve helped companies across different industries and regions up their game in digital to better serve their customers. Talk to us and we’ll explore the possibilities together!
Snappymob is a passionate web and mobile app developer based in Kuala Lumpur, Malaysia. We have designed and developed awesome web and mobile applications for industries and companies around the globe.
We love our craft — the design, development, and business of apps — and this blog is our outlet for discussing what we think and sharing what we know with our community.