When watching a movie in a theater, we can feel sound reaching our ears from the left, the right, behind us, and even overhead, which makes for a richer listening experience. This technique of giving sound a sense of spatial direction is called surround sound technology, and it lets the listener experience a soundstage close to the real scene.
So how can surround sound be implemented? The simplest idea is to place many speakers around the listener, so that sound from each speaker genuinely arrives at the ears from a different location. This is exactly how cinema audio systems are designed.
For individuals, however, this drives up the cost of equipment. Unlike a cinema with its elaborate sound system, headphones can achieve a similar effect with only two drivers, one for each ear. The technique of making sound from just two earpieces appear to come from any direction in space is called virtual surround sound, also known as immersive spatial audio, and it is the focus of what follows.
Image source: WWDC 2020
The purpose of spatial audio is to give the human ear a more realistic sense of space when sound is played back. To understand spatial audio technology, we therefore first need to answer a question: how do humans judge the direction of a sound?
How the human ear determines the direction of sound
As we all know, a single ear is enough to perceive the loudness, pitch, and timbre of a sound. To discern the direction of a sound, however, we have to rely on both ears, because the two ears register an interaural time difference and an interaural level difference. The time difference is the difference in how long the sound takes to reach each ear, and the level difference is the difference in the sound energy each ear receives.
For example, in the following scenario, when the sound source is on our right, our right ear hears the sound first, and the sound reaches the left ear slightly later. The farther a sound wave travels through the air, the more energy it loses, so the right ear also receives more sound energy than the left ear.
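As a rough illustration (not a model from the article itself), both cues can be estimated from a toy two-ear geometry. All numbers here are assumptions: an average ear spacing, the speed of sound in air, and a level difference computed from path length alone, ignoring head shadowing.

```python
import math

# Illustrative two-point model of the head; every constant is an assumption.
EAR_SPACING = 0.21       # distance between the ears in metres (assumed average)
SPEED_OF_SOUND = 343.0   # metres per second in air at about 20 degrees C

def time_difference(azimuth_deg):
    """Interaural time difference in seconds for a distant source.

    Azimuth 0 deg = straight ahead, 90 deg = directly to the right.
    """
    return (EAR_SPACING / SPEED_OF_SOUND) * math.sin(math.radians(azimuth_deg))

def level_difference_db(near_m, far_m):
    """Level difference due to path length alone (inverse-square spreading).

    Real interaural level differences are dominated by head shadowing,
    which this toy model deliberately ignores.
    """
    return 20.0 * math.log10(far_m / near_m)

# A source directly to the right arrives roughly 0.6 ms earlier at the right ear.
print(f"ITD at 90 deg: {time_difference(90) * 1e6:.0f} microseconds")
# A source 1 m from the right ear is about 1.21 m from the left ear.
print(f"Level difference from spreading: {level_difference_db(1.0, 1.21):.2f} dB")
```

The time difference of around 0.6 ms matches the commonly cited upper bound for human heads; the spreading-only level difference of under 2 dB shows why head shadowing, not distance, provides most of the real level cue.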
Image source: Google I/O
So, relying only on the time difference and the level difference, can we localize a sound source anywhere in three-dimensional space?
Don't worry, let's take a look at the following scene first.
In the following illustration, when the sound comes from directly in front of us or directly behind us, both the time difference and the energy difference at the two ears are zero. In other words, when both differences vanish, these cues alone cannot tell us whether the sound comes from the front or the back.
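The ambiguity can be seen directly in a toy time-difference model (the head geometry numbers below are assumptions): a source straight ahead and one straight behind produce the same zero time difference, and more generally any mirror pair of angles is indistinguishable by this cue.

```python
import math

EAR_SPACING = 0.21       # metres, assumed average ear spacing
SPEED_OF_SOUND = 343.0   # metres per second

def time_difference(azimuth_deg):
    """Toy interaural time difference: 0 deg = front, 90 deg = right, 180 deg = back."""
    return (EAR_SPACING / SPEED_OF_SOUND) * math.sin(math.radians(azimuth_deg))

# Front and back both give (numerically) zero ITD -- indistinguishable.
print(time_difference(0.0))                          # 0.0
print(abs(time_difference(180.0)) < 1e-9)            # True

# The same ambiguity holds for mirror pairs such as 30 deg vs 150 deg.
print(abs(time_difference(30.0) - time_difference(150.0)) < 1e-9)  # True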
So how do the ears distinguish front from back? In fact, from the moment a sound is emitted to the moment we hear it, it goes through three stages: a transmission process, a physiological process, and a psychological process [1]. Since the physiological and psychological processes are almost beyond our control, here we are concerned only with the transmission process.
The transmission process, also called the physical process, is the process by which sound waves from the source travel through the medium to the pinna (the outer ear), then pass through the ear canal to the eardrum and set it vibrating. This is an extremely complex process, and differences in the structure of each person's pinna change the waveform that the sound acquires on its way in.
Clearly, the transmission process for a source in front of us differs from that for a source behind us, because the pinna is not front-to-back symmetric. Sound arriving from the front is reflected by the pinna and enters the ear canal directly, while sound from directly behind has to bend around the pinna before it can enter the ear canal. It is precisely this difference that lets us tell whether a sound comes from in front or behind.
The pinna acts as a device that "encrypts" sound, and after long practice our brains have fully mastered the "decryption", so the front-or-back position of a sound source can be picked out with ease.
Now we finally have the answer: binaural localization of sound sources in three-dimensional space relies on the "encryption" performed by the pinna [2,3].
Virtual surround sound for headphones
More precisely, it is not only the pinna that encrypts sound; body parts such as the contours of the head and the shoulders do so as well. Because all of these effects are tied to the head, this "encryption" is described by the head-related transfer function, or HRTF [4,5].
The HRTF can be understood as the way our head "encrypts" sound, and the encryption differs for every direction. It is precisely because the head encrypts sound from different directions differently that our brain can decrypt the direction of a sound.
To decrypt the encryption for different source directions, researchers measure or compute HRTFs for many directions [4,6] and assemble them into a database.
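Such a database stores, for each measured direction, a pair of impulse responses (one per ear). The sketch below only illustrates the lookup structure; the three-tap "filters" are invented placeholders, not real measurements.

```python
import numpy as np

# Toy stand-in for a measured HRIR database: one (left-ear, right-ear)
# impulse-response pair per azimuth. All filter taps are placeholders.
hrir_db = {
    -45: (np.array([0.0, 1.0, 0.3]), np.array([0.6, 0.2, 0.0])),   # left side
      0: (np.array([1.0, 0.2, 0.0]), np.array([1.0, 0.2, 0.0])),   # front
     45: (np.array([0.6, 0.2, 0.0]), np.array([0.0, 1.0, 0.3])),   # right side
}

def lookup(azimuth_deg):
    """Nearest-neighbour lookup; real systems interpolate between directions."""
    nearest = min(hrir_db, key=lambda a: abs(a - azimuth_deg))
    return hrir_db[nearest]

left, right = lookup(40)   # snaps to the 45-degree measurement
print(left, right)
```

Real databases (often distributed in the SOFA format) hold hundreds of directions per subject; nearest-neighbour lookup is the crudest usable strategy, and interpolation between neighbouring directions is the usual refinement.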
Image credit: Veer Gallery
Once we put on headphones, the sound travels straight down the ear canal to the eardrum. Without the head's "encryption", the sound from the headphones carries no sense of direction.
With the development of acoustic signal processing, however, we can simulate the head's encryption electronically in the playback chain. If our processing encrypts the sound in the same way the HRTF would, then the brain can decrypt direction information from the electronically encrypted sound, and is successfully "tricked".
It is with this idea in mind that engineers developed spatial audio methods based on HRTF databases. They use digital signal processing to apply the measured HRTFs, encrypting the headphone signal for a specific direction so that the sound appears to come from that direction.
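The "encryption" step itself amounts to convolving the mono signal with the left-ear and right-ear impulse responses of the chosen direction. In the sketch below, both the signal samples and the three-tap filters are invented stand-ins for measured HRIRs.

```python
import numpy as np

# Mono input signal (placeholder samples).
mono = np.array([1.0, 0.5, -0.5, -1.0, 0.0])

# Invented HRIR pair for "45 degrees to the right": the right-ear filter is
# louder and earlier than the left-ear one. Real HRIRs are measured, not made up.
hrir_right_ear = np.array([1.0, 0.3, 0.0])
hrir_left_ear  = np.array([0.0, 0.5, 0.2])   # delayed and attenuated

# Convolving with each ear's filter yields the two headphone channels.
left_channel  = np.convolve(mono, hrir_left_ear)
right_channel = np.convolve(mono, hrir_right_ear)

# The right channel carries more energy, as expected for a source on the right.
print(np.sum(right_channel**2) > np.sum(left_channel**2))  # True
```

In a real renderer the convolution runs per block with an FFT (e.g. `scipy.signal.fftconvolve`), since measured HRIRs are hundreds of taps long, but the principle is the same.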
Image source: Baidu Encyclopedia
For example, at a live concert the violin might sit 45° to the listener's left and the piano 45° to the listener's right. Both the violin's sound and the piano's sound are encrypted by the listener's head, so the live sound has a clear sense of direction.
If an online audience wants the same immersive experience through headphones, the digital processing can apply the HRTF for 45° left to the violin's sound and the HRTF for 45° right to the piano's sound. This "tricks" the brain, and the sound in the headphones takes on a convincing sense of direction.
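Following the concert example, a binaural mix simply filters each instrument with the impulse-response pair of its direction and sums the results per ear. The instrument signals and filters below are toy placeholders, not real recordings or measurements.

```python
import numpy as np

# Toy mono stems (placeholder samples).
violin = np.array([1.0, 0.8, 0.6, 0.4])
piano  = np.array([0.5, -0.5, 0.5, -0.5])

# Invented HRIR pairs: (left-ear filter, right-ear filter) for each direction.
hrir_left45  = (np.array([1.0, 0.3]), np.array([0.0, 0.5]))  # source 45 deg left
hrir_right45 = (np.array([0.0, 0.5]), np.array([1.0, 0.3]))  # source 45 deg right

def spatialize(signal, hrir_pair):
    """Filter one mono source into a (left, right) channel pair."""
    h_left, h_right = hrir_pair
    return np.convolve(signal, h_left), np.convolve(signal, h_right)

vl, vr = spatialize(violin, hrir_left45)    # violin rendered on the left
pl, pr = spatialize(piano, hrir_right45)    # piano rendered on the right

# Sum per ear to get the final two-channel headphone feed.
left_out, right_out = vl + pl, vr + pr
print(left_out.shape, right_out.shape)
```

Each extra source just adds one more convolve-and-sum, which is why the same pipeline scales from two instruments to a full orchestra.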
Because the sound is not produced in real space but is "encrypted" virtually through signal processing, the technique is called virtual surround sound.
In recent years, with the increasing application of wearable devices such as headphones, virtual surround sound technology has been widely used, and it is also called immersive spatial audio technology by technology companies.
The reproduced content represents the views of the author only and does not represent the position of the Institute of Physics, Chinese Academy of Sciences.
Source: China Science Expo
Edit: Lychee jelly