This article is produced by the Original Incentive Program of 人人都是产品经理 (Everyone Is a Product Manager).
What comes to mind when you think of voice assistants? Xiao Ai (小爱同学), Tmall Genie, Amazon Alexa, or NIO NOMI? In-car voice assistants are probably familiar to you by now. Speech recognition technology is developing rapidly: today, once we wake the voice assistant, a series of voice commands can take care of most of what we need in the cockpit.
I. The concept of VUI
1. VUI Interpretation and Brief History
VUI stands for Voice User Interface: an interface that lets users control an application on a computer or mobile device by voice. Its history can be roughly divided into two periods.
The first period of VUI was marked by the birth of the Interactive Voice Response (IVR) system, which could understand people's conversations and perform corresponding tasks through telephone lines.
For example, when we call a mobile carrier's hotline such as 10086 or 10010, we can talk directly to the automated voice customer service system and ask about our bill or service plan. Besides mobile carriers, voice customer service systems are also widely used by airlines, banks, and hotels.
We are now in the second period of VUI. Voice assistants such as Amazon Echo, Xiaomi Smart Speaker, and Tmall Genie are on the market, and some of these devices are voice-only, while assistants such as Siri, Google Now, and Cortana, which combine voice with visual information, also hold a share of the market.
In this second period, voice assistants can already handle many tasks, but there is still much they cannot do, so we are also at the threshold of the next stage. We can look forward to where VUI is heading: while we drive, the in-car voice assistant proactively tells us, based on our individual preferences, which mall has discounts today or which Lawson still has the small cake that is usually sold out; or, when we are in a bad mood, it takes the initiative to soothe us or make suggestions.
2. What a VUI designer does
Just as Internet UX designers focus on a smooth experience between users and apps or phones, VUI designers focus on the interaction between users and voice applications.
The book Designing Voice User Interfaces puts it this way: "VUI designers think about the entire conversation between the system and the end user, from start to finish. They think about the problems being solved and what users need to achieve their goals."
The VUI designer's workflow is similar to the general UX design process: user research, persona creation, prototyping, user flow creation, usability testing, and iterative design.
There is, however, a big difference between VUI design and interface design. Voice interaction designers must account for the complexity of speech and understand how people respond emotionally to different voices and sounds, which draws on psychology, sociology, linguistics, and other fields.
To sum up, a VUI designer's work falls into four parts: user research, design, scenario data collection, and iterative optimization.
II. The in-car voice image
Voice user interfaces without personality do not exist.
(Cohen, Giangola and Balogh, 2004)
As Cohen, Giangola, and Balogh say in Voice User Interface Design, if you don't give your VUI a personality, your users will. So it is better to define the VUI's personality early in the design.
A good VUI experience is essentially a good conversation, and a good conversation depends on factors such as who is speaking, what is said, the tone, and the timbre. The in-car voice image can be divided into a visual part and an auditory part. In most in-car voice systems, the visual channel is the ideal carrier for the auditory channel, and the two complement each other.
1. Voice visual image
1) Car voice visual image type
Voice visual images fall into three types: traditional (figurative), abstract, and anthropomorphic.
Voice assistants on mobile phones and home speakers mostly use abstract images, cars from the new EV startups mostly use anthropomorphic images, and concept cars mostly use traditional images.
The three types are distinct, but they also overlap to some degree. Their specific characteristics and examples are as follows.
(1) Traditional (figurative): two-dimensional, flat, simple colors, simple animation, sound-wave or microphone shapes.
(2) Abstract: 3D, irregular shapes, rich colors, flowing motion, lighting effects, spheres, a cool high-tech feel (example: Microsoft's voice assistant Cortana).
(3) Anthropomorphic: cartoon style, distinct facial features, rich expressions, a branded IP character, vivid and lively (example: XPeng's voice assistant Xiao P).
2) Evolution of vehicle voice visual image style
The style of in-car voice visual images has evolved from traditional to abstract or anthropomorphic.
The visual history of Siri in the following figure shows this clearly: Siri's image has gradually changed from the original skeuomorphic microphone to an abstract circle with shifting, pulsing light effects. Siri is a typical example of the evolution from traditional to abstract.
3) Proportion of visual types
The following figure shows the rough proportions of traditional, abstract, and anthropomorphic voice visual images on mobile devices and in cars.
On mobile devices, abstract visual images account for the highest proportion. In cars, the traditional voice image currently accounts for the highest proportion, but as the share of new energy vehicles grows and technology develops, these figures will change.
An anthropomorphic visual image is a common form for a VUI, but not every VUI needs one: the visual images of Siri and Cortana, for example, are animated glowing circles. An anthropomorphic image is not a prerequisite for good voice interaction.
4) Diversity of voice images
The voice image is not limited to showing the basic states of the voice assistant (wake-up listening and so on); it can also be varied while the assistant is speaking. Possible directions include:
- Voice image combined with different images;
- Voice image combined with different emotions, personalization (such as birthdays);
- Voice image combined with different intelligent scenes (weather, nap mode).
2. Vehicle voice image design process
The in-car voice image design process covers four directions: personality traits, the expression layer, sound design, and basic states. Each direction contains several design modules.
- Expression layer: image design, animation design, color, facial expression;
- Sound design: timbre, intonation, tone, speech rhythm;
- Basic states: recognition and parsing, wake-up listening, result feedback.
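The basic states listed above can be read as a small state machine. The following is a minimal sketch of one way to model them in Python, not the flow of any particular product: the IDLE state, the event names (wake_word, speech_end, timeout, result_ready, done), and the order of the transitions are all assumptions added for illustration.

```python
from enum import Enum, auto


class VoiceState(Enum):
    IDLE = auto()         # assumption: resting state, not in the list above
    LISTENING = auto()    # wake-up listening
    RECOGNIZING = auto()  # recognition and parsing
    FEEDBACK = auto()     # result feedback


# Hypothetical events and transitions, chosen only to illustrate the idea.
TRANSITIONS = {
    (VoiceState.IDLE, "wake_word"): VoiceState.LISTENING,
    (VoiceState.LISTENING, "speech_end"): VoiceState.RECOGNIZING,
    (VoiceState.LISTENING, "timeout"): VoiceState.IDLE,
    (VoiceState.RECOGNIZING, "result_ready"): VoiceState.FEEDBACK,
    (VoiceState.FEEDBACK, "done"): VoiceState.IDLE,
}


def next_state(state, event):
    """Return the next state; stay in place if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


def run(events, start=VoiceState.IDLE):
    """Feed a sequence of events through the state machine."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state


if __name__ == "__main__":
    print(run(["wake_word", "speech_end", "result_ready"]))
```

Each state is a natural hook for the visual image: the animation, color, or expression of the assistant can be keyed to the current state.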
3. How to get the personality traits of speech
When we start a VUI design, we need personality keywords for the voice assistant that match the brand of our project.
We can run workshops inside the company or do research at the brand's 4S dealerships to collect voice keywords, then analyze and refine them one by one against the user's usage scenarios. The auditory image created by VUI designers needs to fit the attributes of their own product.
4. Auditory voice image (two studies)
What kind of auditory voice image does the general public like? Here we look at the research results of two teams, one in China and one abroad.
The first comes from the Baidu intelligent driving team, which placed a survey in the activity section of Baidu Maps and collected 3,745 valid samples. The team broke the auditory image down into three dimensions (basic attributes, personality traits, and voice qualities) and, based on what respondents liked in each dimension, composed four concrete voice images: the sweet girl, the gentle mature woman (yujie), the lively girl, and the cheerful man.
According to the survey, for basic attributes, the largest group of users prefers a female voice, and more than 80% of users favor a younger-sounding voice aged 18-32. More than half of users want the voice product to act as their assistant, while fewer than half want it to act as an anchor/host.
In terms of personality, users prefer a lively, cheerful sanguine temperament (one of the four types in the humoral theory of temperament).
In terms of voice qualities, users prefer a sweet timbre and an easygoing, warm tone. Steady, attentive personality traits come next and are also liked by most users.
After reading the interesting research results of Baidu's intelligent driving team, let's look at the research ideas of Dr. Michael Braun's team in Germany.
Preliminary research phase: Michael Braun designed 8 voice assistant personalities along the two basic dimensions of interpersonal communication, "dominant/submissive" and "hostile/friendly", and invited 19 non-HCI professionals to talk to the 8 assistants in 6 driving scenarios (3 driving-related and 3 entertainment-related) to understand their preferences for voice assistant personality traits.
Based on the above research, Michael Braun adjusted the personality design of the voice assistant and designed 4 new personality models, namely friend, admirer, aunt, and servant.
Real-life driving experiment: The experiment invited 55 respondents aged 23-60, including 45 men and 10 women, and more than half of the respondents had used voice assistants in their daily lives.
Each respondent drove a mid-size sedan on a stretch of road in Munich. Following the order of the driving scenarios, the respondent completed two drives, one interacting with the personality matched to them and one with the default personality. After each drive, the respondent verbally rated the quality of the interaction.
Analysis of experimental results: When users were matched to assistant personalities, 21 of the 55 respondents matched the friend personality, 16 matched the servant, 15 matched the aunt, and 3 matched the admirer.
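To make these counts easier to compare, the short sketch below converts them into percentage shares. Only the four counts come from the text above; the function name share_percentages is a hypothetical helper introduced for illustration.

```python
def share_percentages(counts):
    """Convert raw counts into percentage shares, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Counts as reported in the text above.
matches = {"friend": 21, "servant": 16, "aunt": 15, "admirer": 3}
total = sum(matches.values())       # 55 respondents
shares = share_percentages(matches)

if __name__ == "__main__":
    for name, pct in shares.items():
        print(f"{name}: {pct}%")
```

Friend accounts for roughly 38% of respondents, servant and aunt for a little under 30% each, and admirer for about 5%.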
Conclusions:
- Correctly matching the voice assistant's personality to the user's personality is crucial: users matched with an assistant personality that suits them report higher satisfaction and liking, while a mismatch easily causes dissatisfaction.
- In non-driving scenarios (such as entertainment scenarios), users prefer personalized personalities that suit them; in safety-related driving scenarios, the default personality is more favored.
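The second conclusion can be expressed as a simple selection rule. This is a minimal sketch under stated assumptions: the scenario labels "driving" and "entertainment", the function choose_personality, and the name "default" are hypothetical, and the study reports a user preference rather than a fixed rule.

```python
# Hypothetical scenario label; the study distinguishes driving-related
# scenarios from entertainment-related ones.
SAFETY_RELATED = {"driving"}


def choose_personality(scenario_type, matched_personality,
                       default_personality="default"):
    """Pick the assistant personality for a scenario.

    Safety-related driving scenarios fall back to the default personality;
    every other scenario uses the personality matched to the user.
    """
    if scenario_type in SAFETY_RELATED:
        return default_personality
    return matched_personality


if __name__ == "__main__":
    print(choose_personality("driving", "friend"))
    print(choose_personality("entertainment", "friend"))
```

Keeping the rule in one function makes it easy to adjust which scenarios count as safety-related as research on a specific user group comes in.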
There is a classical saying: "Shoes need not be identical; what matters is that they fit the feet." In other words, shoes do not all have to be the same size; the key is that they fit the wearer.
The same is true of the auditory voice image: the one that suits the user is the best one. Every car has its own target users, so as a VUI designer you should choose the voice personality research method that best fits your product's user population in order to arrive at the most suitable auditory voice image.
Some readers may also be curious: how is the voice assistant's voice synthesized?
One way is to have a voice actor record hundreds of key phrases, extract the key acoustic features from the recordings, and then synthesize speech from them.
III. Beware of users "falling into the uncanny valley"
When people use voice assistants, some bad experiences are easy to trigger, and the uncanny valley is one of them. We have to avoid it when designing a VUI. So what is the uncanny valley theory?
Uncanny Valley Theory:
The uncanny valley theory was proposed by the Japanese roboticist Masahiro Mori in 1970. It states that when we see something that closely resembles a human being but is not quite human, we feel a deep sense of unease or fear. Take the zombie at the bottom of the valley in the picture above: it has a human appearance, but its nature is very different, and that is why it frightens us.
One way to avoid the uncanny valley is to reduce human features in our designs, or to use cartoon or animal characters instead.
IV. Conclusion
The future is already arriving. Brain-computer interfaces, the metaverse, and all kinds of new terms fill our lives, and technology is developing faster than we can imagine.
The same is true of voice technology: companies keep iterating on their voice systems to position themselves for the next generation of intelligent connected vehicles. We should be ready to meet the new design challenges of the intelligent era.
References
- Designing Voice User Interfaces: Principles of Conversational Experiences
- https://zhuanlan.zhihu.com/p/78890262
- https://www.researchgate.net/profile/Michael-Braun-20
- https://mp.weixin.qq.com/s/8Y8vj4NCeIJ2Oq5dWX626Q
Author: Blade Fish; Public Account: HMI Design Marker Pen
This article was originally published by @BladeFish on Everyone Is a Product Manager. Reproduction without permission is prohibited.
The title image is from Pexels, under the CC0 license.