January 18, 2021

Designing for Voice User Interfaces — Principles and Best Practices

How are voice-first interfaces designed differently from screen-first interfaces? We explore the ins and outs of UX in voice design.


Most of the digital products we use today are designed for our eyes, simply because as highly visual beings, we rely on our sight a lot.

Whether we’re reading microcopy to understand what a button does, or scanning through textual instructions to familiarize ourselves with a new machine, interacting with devices through visual cues is what we’re used to.

But what happens when visual feedback is secondary? How does that change the way we interact with interfaces, and the way they’re designed for us?

Voice User Interfaces (VUIs) like Amazon Alexa and Google Assistant are designed differently from screen-first interfaces for many reasons — the main one being that the principles of spoken communication differ from those of written communication. Because of that, UX designers cannot approach voice-first interfaces the same way they do screen-first interfaces.

How and why should they be done differently? Let’s dive into some of the best practices in voice design.

Best Practices in Voice Design

According to Amazon’s Alexa Design Guide, voice interfaces should be four things — adaptable, personal, relatable, and available. Both Amazon’s guide and Google Assistant’s Conversation design guide stress that voice-first skills should be natural and user-centric for a seamless experience.

Here’s a breakdown of the best practices in designing user experiences for voice user interfaces.

1. Be Flexible and Receptive

Language is dynamic. Say a user wants to make a call: they might say “Call X”, “Ring X”, “Get X on the phone”, or any of a thousand other variations that map to the same “Call” intent. Your skill must be able to pick up on key details in varied utterances, even when several are embedded in a single answer.

Sometimes, to save time, users string together multiple pieces of information in a single command or response to a question. For example, if one says…

“Set a reminder for my interview with Snappymob at 3pm tomorrow.” 

…they likely don’t want to repeat any of the information they’ve already mentioned. It’d show a lack of receptiveness to follow the exchange with “What time would you like to set your reminder for?” or “What’s the reminder for?”, because then the user would have to repeat “3pm” or “Interview with Snappymob”.

When your interface is able to pick up on additional information that wasn’t asked for, and fill in the gaps unprompted, you create a more intuitive experience for your users.

2. Ask Questions, One at a Time

Simply put, users need to know when it’s their turn to speak. With voice-activated devices, asking questions signals when a response is expected.

Phrasing prompts as statements rather than questions, like “Let me know where you would like to go.” instead of “Where would you like to go?”, makes it unclear whether the user is expected to answer.

To avoid confusion on both sides, it’s also standard to ask for one piece of information at a time. For instance, asking a user “What is your new contact’s name, mobile number, and email address?” might cause information overload and make forming an answer difficult. Instead, ask single-answer questions to fill in the blanks one by one.

3. Present Clear Options

Unlike with visual interfaces, voice-first users aren’t choosing from navigation menus they can visually scan through. It’s therefore important to relay their options in a clear and concise manner. This is where you need to pay attention to language.

Syntax matters, especially when distinguishing between either/or questions and yes/no questions. For example, “Would you like twister fries or french fries to go with your meal?” could sound like a yes-or-no question. To make the options clearer, phrase it as “Which side would you like: Twister fries or french fries?”.

4. Narrow Down to Only the Necessary

This applies to both questions and choices. Text can be scanned through, but spoken words take time to relay — so be succinct.

Only ask for confirmation for actions with high importance, like making calls, sending text messages, and any action that involves recipients or monetary transactions.

Asking for unnecessary confirmation or making users reiterate information can easily frustrate them. For example, an “Are you sure” confirmation might not be necessary for simple commands like “Turn off the lights” or “Tell me the current price for Bitcoin”, but it is for “Send email” and “Call Cindy”.
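A confirmation policy like this can be as simple as a lookup keyed by intent. The intent names below are invented for illustration; real platforms have their own intent schemas.

```python
# Hypothetical policy: only high-stakes intents (messages, calls, payments)
# get an explicit confirmation step before executing.
HIGH_STAKES_INTENTS = {"SendEmail", "SendMessage", "MakeCall", "TransferMoney"}

def needs_confirmation(intent: str) -> bool:
    return intent in HIGH_STAKES_INTENTS

def respond(intent: str, action: str) -> str:
    if needs_confirmation(intent):
        # High-stakes: confirm before acting.
        return f"You asked me to {action}. Should I go ahead?"
    # Low-stakes: just do it, no extra round trip.
    return f"Okay, I'll {action}."

print(respond("MakeCall", "call Cindy"))
print(respond("TurnOffLights", "turn off the lights"))
```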

As for choices, make sure your users aren’t bombarded with ten options at a time. Remember that unlike with text, users can’t scan back and forth through speech to digest a message. Giving them too much to process at once leads to more errors in comprehension and exchange.

5. Restate Questions

As mentioned before, on voice-first interfaces, users aren’t relying primarily on a visual navigation menu. So to make sure they know where they are, it’s always helpful to restate the questions they answer.

For example, when a user asks for the recipe for spaghetti bolognese, it’s clearer to respond with “To make spaghetti bolognese, the ingredients are…” instead of jumping straight into “The ingredients are…”. A simple restatement reassures the user that they’re getting the answer they’re looking for.

6. Be Prepared for Blockers

Being well-prepared includes being prepared for scenarios where things don’t go as planned. Here are some of the most common blockers to think ahead for when designing for voice user interfaces:


Corrections

When a VUI gets a piece of information wrong, users will naturally correct it with common phrases like “No, I said…” or simply a reiteration of their command.

Your interface must be able to pick up on corrections, accept them instantaneously, and restate the command for confirmation.

Comprehension Failures

When the VUI fails to understand the user’s command, it’s important to handle the error with grace.

The standard way to approach this is with a statement followed by a repetition of the question, like “Sorry, I didn’t catch that. Who would you like to share this link with?”. This is to make it clear to the user that they should reiterate their answer following the cue.
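A handler for this might look like the sketch below. The escalation after repeated failures is an assumption beyond the article, though it is a widely used convention in voice design; the wording is illustrative only.

```python
def handle_no_match(last_question: str, failures: int) -> str:
    """Apologize, then restate the pending question so the user knows to answer again."""
    if failures == 0:
        return f"Sorry, I didn't catch that. {last_question}"
    if failures == 1:
        return f"Sorry, I still didn't get that. {last_question}"
    # After repeated failures, exit gracefully instead of looping forever.
    return "I'm having trouble understanding right now. Please try again later."

print(handle_no_match("Who would you like to share this link with?", 0))
```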

Unavailable Functions

When a user asks for a function that doesn’t exist, instead of using the standard “Sorry, I didn’t get that” response repeatedly, let them know what can be done for them instead to move the conversation forward.

Here’s a scenario — a user says “Cook me spaghetti”. Instead of ending the exchange with “Sorry, I couldn’t understand that”, respond with “I can only help you find recipes for spaghetti. Would you like me to look for the best ones?”.
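The redirect above can be sketched as a lookup from unsupported requests to the nearest supported capability. The intent names and fallback wording here are hypothetical:

```python
# Hypothetical map from unsupported intents to the closest thing the skill CAN do.
SUPPORTED_ALTERNATIVES = {
    "CookFood": "I can only help you find recipes for spaghetti. "
                "Would you like me to look for the best ones?",
}

def handle_unsupported(intent: str) -> str:
    # Offer an alternative to move the conversation forward,
    # rather than a dead-end apology.
    return SUPPORTED_ALTERNATIVES.get(
        intent,
        "I can't do that yet, but I can search recipes, set reminders, and make calls.")

print(handle_unsupported("CookFood"))
```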


Non-Responses

Sometimes users forget to answer, or abandon ship halfway through a conversation. Other times, the VUI doesn’t pick up the sound input. In situations like these, re-prompts give the user another chance to answer, in case the non-response wasn’t intentional.

In the re-prompt, it’s important to restate where the user left off, so they know where to pick up. For example, if a user doesn’t respond after saying “Set a reminder”, the re-prompt should restate the intent and what to do next: “I can help you set a reminder. What would you like to be reminded about?”.
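A context-aware re-prompt might be sketched like this, assuming the skill keeps the pending intent in session state (names and wording hypothetical):

```python
# Per-intent re-prompts that restate where the conversation left off.
REPROMPTS = {
    "SetReminder": "I can help you set a reminder. What would you like to be reminded about?",
    "MakeCall": "I can place that call. Who would you like to call?",
}

def reprompt(session: dict) -> str:
    # Fall back to a generic nudge when no context was stored.
    return REPROMPTS.get(session.get("pending_intent"),
                         "Are you still there? How can I help?")

print(reprompt({"pending_intent": "SetReminder"}))
```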

Let Us Help You

Thinking of jumping on the voice-first bandwagon? Snappymob might be the agency you’re looking for.

Our team of expert designers and engineers is passionate about user experience and making products that delight. Talk to us and let us give you a boost in your next project!