How to design a good Voice User Experience (VUX)?

Written by Aurélien Chapuzet

Aurélien is leading content creation and marketing strategies at Vivoka.

What is Voice User Experience (VUX)?

Voice User Experience (VUX) refers to the art of creating voice experiences designed for (and often by) users. Also called “voice design” or “voice UX”, it is the design and development of a user experience in which voice is the primary interaction method. It also covers the wider “voice” scope, including sound design (based on TTS, for instance).

Basically, VUX is a set of principles to provide a friction-free and truly engaging voice experience to the user.

Based on generic UX best practices

As UX comes with UI, VUX comes with VUI. A VUI (Voice User Interface) lets users interact with a system using their voice, without swiping or using their hands. VUX and VUI are as closely tied as UX and UI, and they have to work together to offer an optimal overall experience with voice assistants, speech recognition or any voice-enabled device.

As it stems from generic UX, you can apply some UX design best practices to VUX. Here are some principles that are already used in VUX design:

Hick’s Law states that the more options you offer, the longer users take to decide, so you shouldn’t offer too many at once;
The primacy and recency effects state that people best recall the first and the last items of a series. Accordingly, you should place important information at the beginning and at the end of a dialogue;
Progressive disclosure advises asking for one piece of information at a time: don’t pack several questions into one sentence;
The framing effect is a cognitive bias: people choose between options depending on whether they are presented with positive or negative connotations. Try to avoid negative phrasing, because people can misunderstand it or take it personally.

Still, voice design is different from other forms of UI/UX. Voice is natural for human beings, so it may evoke emotional responses and can convey deeper information such as tone and intention. Moreover, voice is a great way to improve accessibility but, to do so, you have to design it correctly.

Which came first? Voice First and Voice User Experience (VUX)

Over the last few years, we have witnessed the rise of voice technologies and seen their usage grow tenfold. You may have heard about the voice-first movement, which reflects the strong growth of voice assistants and speech recognition. As voice is a natural way of expressing ourselves, people use it for all kinds of communication.

It is therefore natural that Voice User Experience has become essential to offering great experiences with voice products. The two are now strongly interrelated, in the sense that:

VUX wouldn’t exist if the world weren’t becoming more and more voice oriented (voice first);
Voice first wouldn’t be an opportunity if voice interfaces weren’t instinctive and convenient thanks to VUX.

Thus, as with any other user interface, voice UX has to make voice a sustainable option and will have to evolve with users’ expectations. It is the only way for voice first to last in today’s society and establish itself as a strong option.

Close cousin: From the experience comes the concept of Voice Marketing

Voice Marketing covers all brand experiences and contact points enabled by voice technologies, in the same way that digital marketing emerged via the Internet and smartphones.

New ways to connect with customers and promote the brand

This new form of marketing improves the customer experience by being more engaging and fun, and by making products and services more intuitive. In addition, it makes it possible to deliver the brand message and promise through a more conversational channel, closer to a human relationship.

Even if there are multiple ways to communicate with prospects and customers nowadays, voice remains a more direct way of engaging with them. Letting users ask questions out loud or control things hands-free, and having your product or service answer them, is a convenient feature that generates satisfaction.

Examples of Voice Marketing

There are now plenty of new touchpoints in the customer journey that companies need to master. For instance, sonic branding and voice search optimization are becoming increasingly important.

Sonic branding is a recent practice, the audio equivalent of visual branding: the use of sounds and music to reinforce brand awareness and recognition;
Voice Search Optimization (VSO), as you may have guessed, is SEO for voice, as voice search is currently booming. More and more search tools offer it as an alternative to the classic text search bar.

Just as VSO is emerging as a complement to classic SEO, voice advertising is only at its beginning. Voice advertising delivers audio advertisements to listeners, mainly through voice assistants like Alexa. This technique is considered less intrusive than classic formats such as video ads, because the system usually asks beforehand whether the user wants to listen to an ad. At the end of the ad, the system offers to give the user more information. As it is less intrusive, users are more likely to accept it, which should result in better attention while listening. If well designed, with great sonic branding for instance, voice ads could therefore be a driver of brand awareness.

New revenue streams, such as voice ads

New business models should then emerge from these new uses of voice. Some already have! In 2021, Apple announced a voice-based pricing model. Simply put, Apple Music became cheaper for users who access it through Siri:

“The new music experience, designed exclusively for Siri, is just $4.99 per month.”

According to Apple, Siri and Apple Music are natural partners and should therefore be used together. Apple has bet heavily on voice and is doing its best to democratise it. Moreover, as Siri is already actively used on millions of devices worldwide, adoption should be fast, all the more so as Apple is adding voice-directed features and promising a personalised experience.

How to think about the voice experience

You may want to benchmark what your competitors are doing in the field of voice user experience. But, more than that, ask your users what they need. Understand how they would like to use voice and what their motivations are. In order to do that, you need to think about your personas first.

Thinking user-centred and with voice only

Before anything else, you need to think about your personas. You should have at least one, but ideally as many as there are different types of people who could purchase your product or service. This article is not a marketing course, but personas are essential to creating user-centred interfaces: they tell you who will use your product so that you can design it for them.

Personas usually include basic information about your target audience such as name, demographics, goals, needs, motivations, pain points, and so on. But when designing for voice, you also need to include environmental information: background noise, proximity to the microphone, specific jargon, privacy, and where and when users will be interacting with the voice interface. This last element is called a “placeona”.

A “placeona” helps you take into account environmental situations that could stop users from using a voice system. It combines information about a scenario with the senses available to the user while interacting with a system. Is the environmental noise too high to hear a response? Is the user close to the corresponding GUI, with access to visual information? Can they say private information out loud? All these factors may influence the way you develop your VUI.
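One way to make a placeona actionable is to record it as structured data next to the persona and derive interface decisions from it. The sketch below is only an illustration: the field names, the 70 dB threshold and the `choose_feedback` helper are hypothetical, picked for the example rather than taken from any standard or product.

```python
from dataclasses import dataclass


@dataclass
class Placeona:
    """Hypothetical record of where and how a persona uses the voice interface."""
    name: str
    noise_level_db: float      # typical background noise (assumed unit: dB)
    screen_in_view: bool       # can the user see a companion GUI?
    others_can_overhear: bool  # is saying private data out loud a problem?


def choose_feedback(place: Placeona, noisy_above_db: float = 70.0) -> dict:
    """Derive simple output choices from a placeona (illustrative rules only)."""
    too_noisy = place.noise_level_db > noisy_above_db
    return {
        # Speak only when the answer can actually be heard.
        "spoken_response": not too_noisy,
        # Use the screen when it is visible and audio is unreliable or not private.
        "visual_response": place.screen_in_view and (too_noisy or place.others_can_overhear),
        # Never ask for private data aloud where it can be overheard.
        "ask_private_data_by_voice": not place.others_can_overhear,
    }


warehouse = Placeona("warehouse aisle", noise_level_db=82.0,
                     screen_in_view=True, others_can_overhear=True)
kitchen = Placeona("home kitchen", noise_level_db=45.0,
                   screen_in_view=False, others_can_overhear=False)

print(choose_feedback(warehouse))
print(choose_feedback(kitchen))
```

The rules themselves matter less than the habit: writing the environment down forces you to decide what the system does when the user cannot hear it, cannot see it or cannot speak freely.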

How to meet the two main objectives

Having the best usability

Once you have your personas and placeonas ready, keep them in mind while designing your VUI and VUX. As with any other interface, you will have to think about your information architecture (IA) and how your persona would logically use your voice-enabled product or service. Try to keep it easy to use. Avoid commands that are too complex to remember. Moreover, keep in mind that there may be no visual interface to help.

To help you out, you can create flowcharts to map ALL the potential conversations. Even if it seems clear in your head, this can end up being the most difficult part. Try to be as exhaustive as possible. Think about the response in case of error, what happens if there are no results, the response time allowed and what happens when it is exceeded.
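A flowchart can also be written down as data, which makes it possible to check it for gaps automatically. The sketch below assumes a made-up “find a product” conversation; the state names, event names and helper functions are all hypothetical. It flags any state that listens to the user but forgets to handle a misunderstanding or a timeout.

```python
# Hypothetical dialogue flow: each state maps an event to the next state.
# Terminal states have no transitions.
FLOW = {
    "ask_product": {
        "understood": "search",
        "not_understood": "reprompt",
        "timeout": "reprompt",
    },
    "reprompt": {
        "understood": "search",
        "not_understood": "handoff",
        "timeout": "handoff",
    },
    "search": {
        "results": "read_results",
        "no_results": "suggest_alternative",
    },
    "suggest_alternative": {
        "understood": "search",
        "not_understood": "handoff",
        "timeout": "handoff",
    },
    "read_results": {},  # terminal: the answer is spoken
    "handoff": {},       # terminal: pass the user to a human or a screen
}

# Events that every state waiting for user speech must handle.
LISTENING_EVENTS = {"understood", "not_understood", "timeout"}


def missing_transitions(flow: dict) -> dict:
    """Return, per listening state, the user events it forgets to handle."""
    gaps = {}
    for state, transitions in flow.items():
        if "understood" in transitions:  # this state waits for user speech
            missing = LISTENING_EVENTS - set(transitions)
            if missing:
                gaps[state] = sorted(missing)
    return gaps


def walk(flow: dict, start: str, events: list) -> list:
    """Follow a sequence of events and return the states visited."""
    path = [start]
    for event in events:
        path.append(flow[path[-1]][event])
    return path


print(missing_transitions(FLOW))
print(walk(FLOW, "ask_product", ["timeout", "understood", "no_results"]))
```

An empty result from `missing_transitions` means every listening state has an answer for errors and timeouts, which is exactly the exhaustiveness the flowchart is meant to provide.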

Don’t hesitate to run test sessions or surveys with your current customers and prospects, or with potential ones who match your personas’ characteristics.

Promote the brand and grant the best experience to encourage returning users

Voice unlocks new opportunities to reach your customers and prospects. How can you leverage it to create great experiential marketing and improve your users’ loyalty?

In order to be at the right place at the right time, you need to pick your use cases carefully. As we like to say, it is counterproductive to implement voice everywhere just because you can. You have to think about the usefulness of your product and how voice can add value to it.

Also think about your system’s persona. As part of the brand identity, you have to think about how your voice system will respond: what tone, which voice and what “personality” it will have. All of it should be aligned with your brand image and what you want to convey. Keep your persona and placeona in mind here too. Depending on whether you are a B2C or a B2B company, the way you address your users is going to be very different.

What added value will your system offer?
Is there a possibility for up-selling or cross-selling strategies?
Can it speak about additional products or services that may interest the user?
Is it able to give directions in order to optimise tasks and journeys?

Think about what the user could want, whether or not the need is proven.

The design varies with the features of your voice solution

Finally, all of it will depend on the voice solution you want to create. From simple commands to complex conversational AIs, it is your use case that will define the voice design.

If you need a fully embedded solution with simple commands to get users from point A to point B, its VUI (Voice User Interface) will be easier to design as it will be basic and exhaustive. With concise commands, each aimed at a specific task, users will naturally understand and use it. For instance, voice picking solutions are designed to be very easy to use because they are made to help supply chain workers. There is a defined set of commands so that users don’t get confused. Moreover, workers are quickly operational and more efficient thanks to voice-assisted directions. You can check this concrete example of voice picking by KFI.

On the contrary, if you want to design a complex voice AI such as an assistant, your VUX will need extra thinking. It must be clear what people can do with it and how they can do it. Conversational UX needs to be complemented with humanisation. As we said before, you also need to study tone of voice and behaviour to align them with your brand’s image. For example, Toyota is enhancing its in-car voice assistants in order to make interactions more seamless for users. To deliver all its benefits, a voice assistant has to be well thought out and well executed. Always user-centred.

Defining the scope of your solution and its evolution is key to creating a good VUX foundation that will easily evolve and welcome new features in an organic manner.

VUX done right

Wake words

One of the simplest illustrations of VUX is how wake words have evolved into their current form and use. It is through practice that most brands came to combine an interjection like “Hey” with the brand or assistant name, such as Google or Alexa. The user always has to start the conversation, and that begins with a great wake word.

The “Hey + Brand” structure quickly became a best practice. To achieve good results, a wake word should be composed of distinct sounds and be long enough to reduce false positives. That is the design side only. As for the marketing benefits, we can readily agree that constantly saying the brand’s name in order to interact with its product is one of the most efficient ways, if not the most efficient, to passively improve brand awareness.
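To get a first feel for the “long enough” criterion, you can screen candidate wake words with a rough syllable count. This is a deliberately crude sketch: it counts groups of written vowels rather than real phonemes, and the three-syllable minimum is an assumption made for the example, not an industry rule. It says nothing about how distinct the sounds are, which only testing with a real wake word engine can tell you.

```python
import re


def estimate_syllables(phrase: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels (y included)."""
    return len(re.findall(r"[aeiouy]+", phrase.lower()))


def looks_long_enough(wake_word: str, min_syllables: int = 3) -> bool:
    """Flag candidates that are probably too short (threshold is an assumption)."""
    return estimate_syllables(wake_word) >= min_syllables


for candidate in ["Hey Google", "Alexa", "OK"]:
    print(candidate, estimate_syllables(candidate), looks_long_enough(candidate))
```

A very short candidate such as “OK” fails this screen, which matches the intuition that short, common words are triggered by ordinary conversation.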

What a good conversational AI looks like

Conversational AIs aim to offer a more “human-like” and seamless way of interacting with users. They are a great illustration of VUX in the sense that they usually combine more than one voice technology. The most striking example of conversational AI is e-commerce, also known as conversational commerce or v-commerce (for voice commerce). In order to design great conversational commerce (or any conversational AI), you need to think of it as a real conversation. Think about how you would like somebody to answer your questions and try to keep this as human-like as possible.

Take Starbucks as an example. It is now simple to ask your voice assistant to just “tell Starbucks to start my usual order”. This isn’t necessarily a phrase you would have thought of, and yet people will use it. You need to think about the different ways of asking for the same thing so that your conversational AI is able to understand and respond to any of them. That way, users won’t get frustrated repeating their commands in order to get what they want. The same applies to the variety of your conversational AI’s answers. If the user has to repeat themselves more than once, don’t make your AI ask them the same way over and over again.
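Both ideas, accepting several phrasings and varying the re-prompts, can be sketched in a few lines. The patterns and sentences below are invented for illustration, and a production assistant would rely on a proper natural language understanding component rather than regular expressions. The point is only the shape: many inputs map to one intent, and each failed attempt gets a different, more helpful prompt.

```python
import re

# Hypothetical phrasings that should all trigger the same "usual order" intent.
ORDER_USUAL_PATTERNS = [
    r"\b(start|place|make) my usual( order)?\b",
    r"\border (my|the) usual\b",
    r"\b(i'll have|get me) (my|the) usual\b",
]

# Re-prompts become more specific with each failed attempt instead of repeating.
REPROMPTS = [
    "Sorry, what would you like to order?",
    "I didn't catch that. You can say, for example, 'start my usual order'.",
    "I'm still having trouble. I can show the menu on your phone instead.",
]


def matches_usual_order(utterance: str) -> bool:
    """Return True when the utterance matches one of the known phrasings."""
    text = utterance.lower()
    return any(re.search(pattern, text) for pattern in ORDER_USUAL_PATTERNS)


def reprompt(failed_attempts: int) -> str:
    """Pick a re-prompt for the given number of failures, capped at the last one."""
    index = min(failed_attempts, len(REPROMPTS)) - 1
    return REPROMPTS[max(index, 0)]


print(matches_usual_order("Tell Starbucks to start my usual order"))
print(matches_usual_order("Get me the usual"))
print(reprompt(1))
print(reprompt(2))
```

Ending the escalation with an alternative (here, a screen) avoids trapping the user in a loop of identical questions.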

TTS with emotion and prosody

Prosody is the way people speak: the rhythm, stress and intonation with which they pronounce words and syllables. In order to sound natural, the voices you use have to display their own personality. Everyone speaks differently depending on their prosody and their emotions. Having your voice assistant answer users with its own voice and speaking style will therefore make interactions far more human and feel more personalised.

This is what we call a system persona. As explained earlier, your voice interface should be personified in order to connect with the user. According to Don Norman, an American researcher who wrote the book “Emotional Design: Why We Love (or Hate) Everyday Things”, people are more inclined to relate to a product when they are able to connect with it on a personal level. This personal level is emotional design, and you can achieve it in the field of voice technology.

The importance of SSML on TTS

Speech Synthesis Markup Language (SSML) is a markup language whose tags are used to customise aspects of speech such as pronunciation, inflection, pitch and more. It helps you match the voice to the message you share and the way you want it to be said. It can add pauses, emphasise words, set the volume of the speaking voice and so on. Without SSML, a text-to-speech engine would not say some words as they should be said. More than that, you would have little control over intonation and the voice could sound robotic.
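To give an idea of what this looks like in practice, here is a short SSML message built and checked in Python. The elements used (`speak`, `break`, `emphasis`, `prosody`, `say-as`) are defined in the W3C SSML specification, but the sentence, the code “VK42” and the attribute values are made up for the example, and support for each element and for `say-as` values varies from one TTS engine to another. The script only verifies that the markup is well-formed XML; it does not synthesise audio.

```python
import xml.etree.ElementTree as ET

# Illustrative SSML: a pause, an emphasised phrase, a slower and slightly
# higher passage, and a code spelled out character by character.
ssml = (
    '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    'Your order is ready. '
    '<break time="400ms"/>'
    'Pick it up <emphasis level="strong">before six</emphasis>, '
    '<prosody rate="slow" pitch="+2st">at the counter near the entrance</prosody>. '
    'Your code is <say-as interpret-as="characters">VK42</say-as>.'
    '</speak>'
)

# Parsing checks well-formedness only, not whether a given engine supports the tags.
root = ET.fromstring(ssml)

# Tag names without the XML namespace prefix, in document order.
tags_used = [child.tag.split("}")[1] for child in root]

# The plain text an engine would read, with all markup removed.
spoken_text = "".join(root.itertext())

print(tags_used)
print(spoken_text)
```

Validating the markup before sending it to the engine is a cheap safeguard, since a single unclosed tag is enough for many engines to reject the whole request.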

Google Assistant vs. Alexa

A great comparison is the one between Alexa and Google Assistant. One has a name and the other not really. Alexa therefore seems more human than Google Assistant, because we think of it as a real assistant even though we know it is only a voice interface. And guess what? Alexa has always had a little more market share than Google Assistant.

It is also a question of how the assistant responds and what level of detail it gives. TechRadar conducted a benchmark between Alexa and Google Assistant and asked them both basic general knowledge questions. To “What is the tallest mountain in the world?”, Alexa gave details about Mount Everest, which is the highest mountain in the world. But it also spoke about Mauna Kea, which is technically the tallest mountain from base to peak, even if most of it is underwater. Google Assistant, by contrast, simply listed some of the highest mountains on earth.

This, once again, depends on the level of personality you want to give any of your voice-enabled devices and the level of professionalism required.

Using voice where it brings real value

An overall good practice for voice interfaces, as you will have understood, is not to overuse them. Voice has to be an option that complements other features such as digital interfaces. Admittedly, voice technology is a growing trend. It is of great use when it comes to hands-free operation and accessibility. Yet, it can quickly seem overdone if put in a device where it doesn’t add any value.

Of course, everything seems cooler when controlled by voice. But in reality, can you imagine what it would look like if everything really was voice-enabled? Voice technology is great because it is an additional feature that is not yet overused, even if there are already some unnecessary use cases. Controlling your house’s lights one by one by voice is not really what we can call user value. It will mostly seem to be a gadget to show off. By contrast, a complete scenario such as “Prepare the room to watch a movie” would orchestrate a multitude of devices and actions (dimming lights, closing shutters…) to deliver the expected result.

Another bad example is the one from Zendesk:

“When a customer asks a bot for help returning an item, the bot can direct them toward a piece of content that explains the company’s return policy.”

What is the added value of this kind of interaction? None. The conversational AI doesn’t even bother to read out the terms and conditions of returns. On top of a request already rooted in dissatisfaction (the customer wants to return something), the user gets even more irritated by a feature that is supposed to help them. If you cannot do it correctly, don’t do it at all. Otherwise, it may end up hurting your business more than anything.

Tips to avoid a bad VUX

Prefer short responses from the conversational AI

Even if you have great speech synthesis and people like to hear it speak, users don’t have infinite memory. If you have important information to share, make sure it appears at the beginning or at the end of a response and don’t make the response too long to process. Make every word count and don’t speak for the sake of speaking, because you don’t want users to lose attention. This one is closely related to the next point.

Do not present too many options to the users

Remember Hick’s Law? As we said at the beginning of this article, you should avoid offering too many options to your users. They won’t remember everything, which could lead to a longer decision process or a request to repeat the options. Moreover, according to progressive disclosure, you should ask for one element at a time, so there’s no need to flood the user’s mind with information.
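A simple way to apply this is to cap the number of options spoken per turn and let the user ask for more. In the sketch below, the limit of three options, the wording of the prompt and the `present_options` helper are all assumptions chosen for the example. Test the right limit with your own users.

```python
def present_options(options: list, max_per_turn: int = 3, start: int = 0) -> str:
    """Build a spoken prompt offering at most `max_per_turn` options at once."""
    batch = options[start:start + max_per_turn]
    if not batch:
        return "There are no more options."
    if len(batch) == 1:
        spoken = batch[0]
    else:
        # Join as "a, b or c" so the last option is clearly marked.
        spoken = ", ".join(batch[:-1]) + " or " + batch[-1]
    prompt = f"You can choose {spoken}."
    if start + max_per_turn < len(options):
        # Tell the user that more options exist without listing them all.
        prompt += " Or say 'more' to hear other options."
    return prompt


drinks = ["espresso", "latte", "cappuccino", "mocha", "tea"]
print(present_options(drinks))
print(present_options(drinks, start=3))
```

The first call offers three drinks and an invitation to hear more, and the second call reads the remaining two.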

Provide clear options about a command

When the user has a choice to make, make it clear what they have to say in order to take action and, when it is necessary, provide them information about what their choice will result in.

Give the user enough time to think and answer

It may seem logical, but I bet you too have been irritated at not being able to finish a command for lack of time, or by a voice assistant that assumes you’re done speaking when you’re not. To avoid that, make sure to allow a long enough response time so that users can think about what is right for them. Moreover, when the interface is waiting for a response from the user, make it clear that it is. Leave no doubt that the user is supposed to speak. It can be a visual or an audio cue that informs the user the system is waiting for them to take an action.
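In practice this comes down to two separate timers: how long to wait for the user to start speaking, and how long a pause must be before the system decides they have finished. The sketch below shows that logic with illustrative values (8 seconds and 1.5 seconds). These numbers and the `listening_state` function are assumptions for the example, and real speech engines expose their own settings for this.

```python
from typing import Optional


def listening_state(seconds_since_prompt: float,
                    seconds_since_last_speech: Optional[float],
                    end_of_speech_silence: float = 1.5,
                    no_input_timeout: float = 8.0) -> str:
    """Decide what a listening interface should do next (thresholds are illustrative)."""
    if seconds_since_last_speech is None:
        # The user has not said anything yet: give them time to think.
        if seconds_since_prompt >= no_input_timeout:
            return "reprompt"
        return "keep_listening"
    # The user has spoken: wait for a long enough pause before cutting them off.
    if seconds_since_last_speech >= end_of_speech_silence:
        return "process_utterance"
    return "keep_listening"


print(listening_state(3.0, None))   # still thinking
print(listening_state(9.0, None))   # silent for too long
print(listening_state(4.0, 0.4))    # short pause in the middle of a sentence
print(listening_state(4.0, 2.0))    # long pause after speaking
```

Keeping the two thresholds separate lets you be generous with thinking time without making the system feel slow once the user has actually answered.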

Ask for the user’s confirmation

Users will want the system to act just like a real human being, and this is achieved through empathy and active listening. They want to feel listened to and understood by their devices. It is therefore good practice to ask for their confirmation when specific commands are spoken. In order to do that, think about all the possible interactions. As we said before: create dialogue flows. When the user makes an important decision, make what is at stake clear: “Are you sure you want to [do this]?” or “You want to [do that], right?”.
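A confirmation step can be sketched as a prompt plus a cautious interpretation of the reply. The word lists and function names below are hypothetical and far from complete. The design choice worth noting is that any sign of refusal or doubt wins over a “yes”, so a sensitive action never runs on an ambiguous answer.

```python
import re

# Hypothetical, deliberately small vocabularies for the example.
YES_WORDS = {"yes", "yeah", "yep", "sure", "confirm"}
NO_WORDS = {"no", "nope", "cancel", "stop", "not", "don't"}


def confirmation_prompt(action: str) -> str:
    """Build the question asked before an important action."""
    return f"Are you sure you want to {action}?"


def interpret_confirmation(reply: str) -> str:
    """Return 'confirmed', 'rejected' or 'unclear' for a spoken reply."""
    words = set(re.findall(r"[a-z']+", reply.lower()))
    if words & NO_WORDS:
        return "rejected"   # doubt or refusal always wins over a "yes"
    if words & YES_WORDS:
        return "confirmed"
    return "unclear"        # ask again rather than guess


print(confirmation_prompt("delete all alarms"))
print(interpret_confirmation("Yes, please."))
print(interpret_confirmation("I'm not sure"))
print(interpret_confirmation("Maybe later"))
```

An “unclear” result should lead to a rephrased question, not to a repeat of the same prompt.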

Why is VUX so important?

Another parallel with UX: long overlooked, it quickly became a spearhead of competitive advantage

Advantages of VUX in the short term

Customer satisfaction

In today’s society, designing a voice assistant may be a great asset to your product or service for the main reasons we covered in this article. If your product or service is already well established, adding a voice control feature may accelerate your growth and thus help you get a quick return on investment. Think about it: have you ever seen so many people sending and listening to voice messages? Even in public!

Standardisation

Just like classic graphic design, voice design is already being standardised in order to offer a consistent experience to users. This helps adoption grow, but it also allows professionals to give guidelines, just as we did in this article. Because users expect specific features and actions when speaking specific commands, you won’t gain anything by changing the rules. On the contrary, it may even be counter-intuitive for them if you dare change their habits.

Advantages of VUX in the long run

First of all, VUX is the basis of the adoption of voice technology. The easier it is to use, the more people get accustomed to it and use it naturally. Even if the technology isn’t a novelty anymore, its life cycle is still at its beginning and there are plenty of users to win over.

Quicker decision process

A well-designed VUX may help users with the decision process. As mentioned earlier, decision making must be easy and clear so that users know what they are committing to when answering their voice assistant or speaking voice commands. The better the VUX, the more confident the user will be in using it.

Good foundation to proper Voice marketing

We presented what voice marketing is and what the current turning points of this technology are. It is still in its infancy, but make sure not to miss out on it, as even the GAFAM giants are very interested in it. Starting from a good VUX could help you quickly take advantage of this new trend.

Thoughts on Voice Marketing?

Voice marketing isn’t yet a saturated space and is an innovative way of interacting with users. It is a fast-growing trend which is becoming more and more important, all the more so in a voice-first society. At Vivoka, we strongly believe that voice is the next big evolution in our daily lives. Marketers had therefore better get accustomed to using it, as it allows them to reach people through a medium other than visual interfaces. More original and easily memorable (you can easily sing a jingle even if you don’t see the images from the ad, whereas a logo is more difficult to recall and “recreate”), it is a great way of staying present in the mind of the user and increasing spontaneous brand awareness. What are you waiting for? Create your branded voice assistant now!
