In part one of this article, we covered microphone sensitivity and pickup range, and also touched on the basic principle behind beamforming. If you haven’t read the first part of this article, you can do so here. Today, I’ll dive into the myths about how wide an array should be and how to improve beamforming performance.
Previously, I discussed the basic parameters that determine the sharpness, or width, of an array’s listening beam. As mentioned, the most important parameters are the total aperture of the array (the distance between its two ends) and the frequency of the signal. For a given frequency, the wider the array, the narrower the beam. Likewise, for a given width, the higher the frequency, the narrower the beam.
Does it matter how many microphones are in the array? Absolutely! First, the distance between adjacent microphones has to be small enough to prevent ambiguity in the direction of arrival; we call this ambiguity spatial aliasing. A common rule of thumb is to keep the spacing below half the wavelength of the highest frequency of interest. Second, the more microphones we have, the better we can average out and reduce non-directional noise, which is an added bonus of the array. Let’s take a look at some of the common myths surrounding these parameters.
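Both points can be put into numbers with a short sketch. It computes the largest microphone spacing that avoids spatial aliasing up to a chosen frequency, using the common rule that spacing should stay below half the wavelength of the highest frequency of interest and the 340 m/s speed of sound used later in this article, and it computes the ideal improvement from averaging N microphones. That second figure assumes the noise is uncorrelated between microphones (for example, the elements’ own self-noise); room noise is partly correlated between nearby microphones, especially at low frequencies, so the real gain is usually smaller. The function names are illustrative only.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, the value used in this article

def max_spacing_m(f_max_hz):
    """Upper bound on adjacent-microphone spacing that avoids spatial aliasing
    up to f_max_hz for any steering direction (half-wavelength rule)."""
    wavelength_m = SPEED_OF_SOUND / f_max_hz
    return wavelength_m / 2

def uncorrelated_noise_gain_db(n_mics):
    """Ideal SNR improvement from averaging n_mics channels when the wanted
    signal adds coherently and the noise is uncorrelated between microphones."""
    return 10 * math.log10(n_mics)

if __name__ == "__main__":
    # Speech up to 8 kHz calls for spacing below about 2.1 cm; averaging
    # 8 microphones gains at most about 9.0 dB against uncorrelated noise.
    print(f"{max_spacing_m(8000) * 100:.1f} cm, {uncorrelated_noise_gain_db(8):.1f} dB")
```

Doubling the number of microphones adds only about 3 dB under this idealized assumption, which is one reason aperture and microphone count are discussed together.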
Myth – The Wider the Array, the Better the Beamforming
Very true. The width (or aperture) of the array and the number of microphones in it are extremely important indicators of its potential performance.
What if we want to listen in a direction other than broadside (perpendicular to the array axis)? We can either physically rotate the array or electronically delay the signals coming out of the microphones. The delays align the signals originating from the wanted direction, effectively rotating, or steering, the array.
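The steering delays can be sketched in a few lines. This is a minimal illustration that assumes a uniform linear array with microphone i at position i × spacing along the array axis, a distant source (so the sound arrives as a plane wave), and an angle measured from broadside toward the high-index end of the array; the function name and conventions are hypothetical, not taken from any particular product.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s

def steering_delays(n_mics, spacing_m, steer_deg):
    """Per-microphone delays (seconds) that align a plane wave arriving from
    steer_deg, measured from broadside toward the highest-index microphone."""
    s = math.sin(math.radians(steer_deg))
    # Arrival time at microphone i relative to microphone 0. For a positive
    # angle the high-index end is nearer the source, so the wave gets there first.
    arrival = [-i * spacing_m * s / SPEED_OF_SOUND for i in range(n_mics)]
    latest = max(arrival)
    # Delay every channel so that its arrival lines up with the latest one.
    return [latest - t for t in arrival]

if __name__ == "__main__":
    for delay in steering_delays(4, 0.05, 30.0):
        print(f"{delay * 1e6:.1f} us")
```

At broadside (0 degrees) every delay is zero, which is the unsteered array. Shifting all delays so they are non-negative means each one can be implemented as an ordinary (causal) delay line.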
These fundamentals describe the physics behind a simple beamformer, often referred to as “delay-and-sum”. Let’s discuss some of the pros and cons of this simple beamformer, and what alternatives are available.
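A bare-bones delay-and-sum beamformer shifts each channel by its steering delay and averages the results. The sketch below takes any list of non-negative per-channel delays in seconds and, to stay short, rounds them to whole samples; practical implementations typically use fractional-delay filters or frequency-domain phase shifts instead. The function name is illustrative.

```python
def delay_and_sum(channels, delays_s, sample_rate_hz):
    """Delay each channel by its (rounded) delay and average the results.

    channels: list of equal-length lists of samples, one per microphone.
    delays_s: non-negative delay in seconds for each channel.
    """
    n_samples = len(channels[0])
    out = [0.0] * n_samples
    for signal, delay in zip(channels, delays_s):
        shift = round(delay * sample_rate_hz)  # integer-sample approximation
        # Sample n of the delayed channel is sample (n - shift) of the input;
        # samples before the delay line fills are treated as zero.
        for n in range(shift, n_samples):
            out[n] += signal[n - shift]
    return [v / len(channels) for v in out]
```

If an impulse reaches four microphones two samples apart and the delays compensate for exactly that, the output is a single full-height peak. With zero delays, the same input is spread into four quarter-height peaks, which is the attenuation of off-axis sound in miniature.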
Pros: Simple to implement. The process has no negative effect on the quality of the audio signal: no artifacts and no processing noise.
Cons: You need a large array to achieve decent directionality. The directionality depends on frequency, so any signal originating from an off-axis direction will be attenuated by a different amount at each frequency. This distorts the signal, an effect also known as coloring.
Reality Check: Sound propagates through air at about 340 m/s. As a result, the wavelength of a 500 Hz signal (roughly the middle of the speech spectrum) is about 0.68 m. If we use a 1 m wide array (which is pretty wide), the delay-and-sum beamformer will produce a beam roughly 30 to 35 degrees wide at this frequency, measured between its 3 dB points, where the power has dropped by 50%.
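The beamwidth figure in the reality check can be estimated numerically. The sketch below evaluates the far-field power response of an unsteered (broadside) delay-and-sum uniform linear array and scans outward from broadside until the power falls to half. With c = 340 m/s, the wavelength at 500 Hz is 0.68 m. The microphone counts are assumptions chosen for illustration; for a fixed 1 m aperture, adding microphones pushes the result toward the continuous-aperture value of about 35 degrees.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, as in the text

def array_power(theta_deg, n_mics, aperture_m, freq_hz):
    """Normalized power response (1.0 on axis) of an unsteered (broadside)
    delay-and-sum uniform linear array to a far-field source at theta_deg."""
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber, rad/m
    spacing = aperture_m / (n_mics - 1)
    u = math.sin(math.radians(theta_deg))
    total = sum(cmath.exp(1j * k * i * spacing * u) for i in range(n_mics))
    return abs(total / n_mics) ** 2

def half_power_beamwidth_deg(n_mics, aperture_m, freq_hz, step_deg=0.01):
    """Full width between the -3 dB (half-power) points, found by a simple scan."""
    theta = 0.0
    while theta < 90.0 and array_power(theta, n_mics, aperture_m, freq_hz) > 0.5:
        theta += step_deg
    return 2 * theta

if __name__ == "__main__":
    print(f"wavelength at 500 Hz: {SPEED_OF_SOUND / 500:.2f} m")
    for n in (8, 101):
        print(n, "mics:", round(half_power_beamwidth_deg(n, 1.0, 500.0), 1), "deg")
```

Changing the aperture or frequency in the same call reproduces both rules stated earlier: halving the aperture widens the beam, and raising the frequency narrows it.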
How can we improve the performance of the simple delay-and-sum beamformer? For starters, we can reduce the dependency of the beam on frequency, which is what causes the coloring of the interferences. We do this with a method called constant beamwidth design. Instead of just summing the microphone outputs, we apply a digital filter to each one and then sum the filtered outputs (a filter-and-sum beamformer) to alter the shape of the beam. Carefully designed filters can reshape the beams so that their dependency on frequency is reduced or even eliminated. This is not a simple task, but the tools are available. It’s fair to mention that by reducing the dependency on frequency, we may widen the beam even further, which brings us to the next challenge.
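Structurally, filter-and-sum differs from delay-and-sum only in that each channel passes through its own FIR filter before the sum. The sketch below shows just that structure. The coefficients are placeholders: designing them for a constant beamwidth, the hard part mentioned above, is outside its scope. A single tap of 1/N on every channel, or a 1/N tap shifted by each channel’s steering delay, reduces it to delay-and-sum.

```python
def fir_filter(signal, taps):
    """Direct-form FIR filter: y[n] = sum_k taps[k] * x[n - k], zero initial state."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out

def filter_and_sum(channels, filters):
    """Filter each microphone channel with its own FIR taps, then sum."""
    filtered = [fir_filter(x, h) for x, h in zip(channels, filters)]
    return [sum(samples) for samples in zip(*filtered)]
```

Because every channel can have a different frequency response, the filters can, for example, shorten the effective aperture at high frequencies and lengthen it at low ones, which is the lever constant beamwidth designs pull on.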
We’re still left with the need for very large arrays to achieve a high level of directionality. Available solutions include super-directional arrays based on sophisticated filter designs, as mentioned before. With the proper filters, we can shape the beam and push it to be narrower. In addition, there are techniques called differential microphone arrays. These rely on subtracting the microphone signals instead of summing them, creating a listening beam along the array axis (we call it end-fire). Super-directional arrays and differential arrays have their own downsides. They tend to be more susceptible to mismatch between the microphone elements and may produce a less flat frequency response.
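To see why differential arrays trade flatness for directionality, here is a sketch of the simplest case: two closely spaced microphones on the array axis, with the rear microphone’s signal delayed and subtracted from the front one. Setting the internal delay equal to the acoustic travel time across the pair gives a cardioid-like pattern with a null toward the rear. The 2 cm spacing used in the usage note is an assumption. At low frequencies the two signals are nearly identical, so the on-axis output shrinks roughly in proportion to frequency; the equalization needed to compensate also boosts any mismatch and self-noise.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s

def differential_response(theta_deg, freq_hz, spacing_m, internal_delay_s):
    """Magnitude response of a two-microphone delay-and-subtract (first-order
    differential) end-fire array: front minus delayed rear microphone.
    theta_deg = 0 points along the array axis toward the front microphone."""
    omega = 2 * math.pi * freq_hz
    # Total delay between the two subtracted copies: acoustic travel time
    # across the pair for this direction plus the internal electronic delay.
    tau = internal_delay_s + spacing_m * math.cos(math.radians(theta_deg)) / SPEED_OF_SOUND
    return abs(2 * math.sin(omega * tau / 2))
```

With `spacing_m=0.02` and `internal_delay_s=0.02 / 340`, the response at 180 degrees is zero, the front response exceeds the side response, and the on-axis response at 200 Hz is several times smaller than at 1 kHz.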
There are other, non-linear methods that have been developed to obtain high directionality and a greater level of noise reduction. The basic idea is to determine whether the signal we’re observing originated from the wanted direction and, if not, to eliminate it aggressively. There are many ways to do this. One of them is referred to as adaptive beamforming, a technique (or rather a group of techniques) that was discussed broadly in the late sixties and has been improved over the years. These more aggressive beamformers eliminate significantly more noise and can produce very narrow beams. They are, however, sensitive to the matching of the microphones and the integrity of the acoustic and mechanical design. Any deficiencies will translate into distortion of the wanted signal. Even if we know how to overcome these potential problems, these solutions leave residual noise tainted with artifacts and “processing noise”. Unfortunately, our brains are sensitive to interference that doesn’t sound natural. We can more easily deal with, and ignore, interference (noise) that we’re used to hearing. We’re far less capable of dealing with processing noise of a digital nature. As a result, the advantage of higher directionality is sometimes outweighed by the disadvantage of distorted signals and leftover noise that we don’t know how to ignore.
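Adaptive beamformers come in many forms. One common building block, used for example in the generalized sidelobe canceller, is an adaptive filter that learns how noise from a noise-only reference leaks into the beam output and subtracts it. The sketch below is a minimal normalized-LMS (NLMS) version of that step, offered to illustrate the general idea rather than the method described in this article; the tap count, step size, and leakage path in the demo are arbitrary assumptions. If the reference also contains some of the wanted signal (for instance, because the microphones are mismatched), the filter starts cancelling the wanted signal too, which is one source of the distortion described above.

```python
import random

def nlms_cancel(primary, reference, n_taps=8, step=0.5, eps=1e-8):
    """Normalized LMS (NLMS) adaptive noise canceller.

    primary:   beam output = wanted signal + noise that leaked into the beam.
    reference: noise-only reference (hypothetically, a blocking-matrix output).
    Returns the error signal: primary minus the adaptive estimate of its noise.
    """
    weights = [0.0] * n_taps
    history = [0.0] * n_taps  # recent reference samples, newest first
    out = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        # Normalizing by input power keeps the step size independent of level.
        norm = sum(h * h for h in history) + eps
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        out.append(error)
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(3000)]
    leak = [0.6, -0.3, 0.1]  # hypothetical path from reference into the beam
    # Noise-only primary, so the residual shows how much noise is cancelled.
    leaked = [sum(h * noise[n - k] for k, h in enumerate(leak) if n >= k)
              for n in range(len(noise))]
    residual = nlms_cancel(leaked, noise)
    before = sum(v * v for v in leaked[2000:])
    after = sum(v * v for v in residual[2000:])
    print(f"residual / original noise power: {after / before:.1e}")
```

In this idealized demo the reference is perfectly clean, so cancellation is nearly complete; real rooms and real microphones are far less cooperative.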
Myth – Highly Directional Arrays Will Improve Our Pickup Range and Quality
If they do, it comes at a price. Linear super-directional methods will color our signal and are more sensitive to mismatch between the microphone elements. Non-linear adaptive beamformers will improve the signal-to-noise ratio, but the result may be less intelligible and less suitable for conferencing. In many cases, they’ll also “attack” the wanted signal itself and cause distortion.
It seems like there’s no silver bullet: wherever we go, we run into downsides and negative effects. To a large extent, this is true. The key is to use elements of all of the above methods, in moderation. In our team’s quest for the most effective solution, we started many years ago with very aggressive beamformers. Over the years, we learned how to tame them so that the audio quality would be acceptable for conferencing. We then reverted to more traditional linear approaches and are now back to a mixture of all these methods. It’s a very complex stew.
What about two-dimensional arrays? What are their advantages and disadvantages? A linear array, in which all the microphones are placed along a line, provides directionality only in the plane in which it lies. A horizontal array will have horizontal directionality but no preference regarding the vertical position of the source. To get directionality in other planes, you need to place microphones in those planes as well. If you place the array at the end of a conference room, the vertical direction doesn’t matter much. However, if you place an array on the ceiling, where talkers can be anywhere around and below it, it has to be two-dimensional. Therefore, in some installations, a planar (two-dimensional) array is a necessity. In others, it offers no advantage. Note that planar arrays can require up to four times the processing of a linear array, which is a heavy price to pay if it’s not necessary.
What you should know when considering a beamforming solution:
- Beamforming is good! It eliminates some of the interfering noise. In particular, it can reduce non-diffuse (directional) noise, which non-beamforming noise canceling techniques often can’t take care of. It will also reduce the level of reverb you pick up.
- Many different techniques can be called beamforming, and that variety is reflected in a wide range of performance levels. In other words, beamforming is good, but not all beamformers are the same. Your vendor isn’t going to make it easy for you to assess every aspect of their beamformers.
- A wide aperture and a large number of microphones are important. You can get impressive performance out of a smaller array, but it’s always better to work with a large array with many microphones.
- Evaluating performance isn’t simple. In most cases, the beamformer output is fed into a mixer and goes through AGC (automatic gain control). To analyze it scientifically, you would have to freeze the beams, put the array in an anechoic chamber, and measure a number of beam patterns. This isn’t something you can typically do unless you’re the developer. The best approach is to “taste the pudding”: listen to the output and see if you like it. Listen for artifacts, switching noises, and all kinds of unnatural effects. Try the array from a distance, but also at close range. Introduce noises and see whether it’s thrown off. Trust companies that have the experience and knowledge, and do your research on companies that come out with an array solution out of nowhere.
I’d like to wrap up by addressing one final topic. People often ask about “multiple talkers”. The answer is the same as it would be if you placed many microphones around the table and fed them to a mixer. An unintelligent mixer would simply sum all the microphones, picking up all the talkers simultaneously along with more noise, which isn’t good. A poorly designed mixer would switch between the microphones, selecting the active one in a way that is noticeable. A well-designed mixer will select the best microphone, or combination of microphones, at any given moment and make the switch sound smooth and seamless. So when multiple people are talking at the same time, a good mixer should select a combination of microphones and update its selection fast enough to keep up with the current situation. If the selection is made quickly and the switching is done smoothly, the output will include all the necessary signals: all the talkers will be picked up with the minimum level of noise. Now, instead of using many microphones around the table, suppose you use many beams, each looking in a different direction. Your mixer should be able to determine the right beam to use and how to combine it with other beams to make the switching smooth and seamless. The concern about “multiple talkers” is legitimate, but it relates to the mixer, not the beamformer itself.
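To make the “smooth and seamless” idea concrete, here is a sketch of one possible approach, not the mixer described in this article. Each beam gets a gain that drifts toward a target favoring the currently loudest beams, so a new talker fades in over a few frames instead of clicking in, and two simultaneous talkers end up sharing the output. The per-beam level estimates, the smoothing factor, and the sharpness exponent are all hypothetical; a production mixer involves considerably more logic than this.

```python
def update_gains(gains, beam_levels, smoothing=0.9, sharpness=4.0):
    """One update step of a hypothetical smooth beam mixer.

    beam_levels: current speech level estimate for each beam (non-negative).
    Target weights favor the loudest beams (higher sharpness means closer to
    picking a single beam); each gain then moves only part of the way toward
    its target, so changes in selection fade in and out.
    """
    powered = [level ** sharpness for level in beam_levels]
    total = sum(powered)
    if total == 0.0:
        return list(gains)  # nothing active: hold the current mix
    targets = [p / total for p in powered]
    return [smoothing * g + (1.0 - smoothing) * t for g, t in zip(gains, targets)]

def mix(beam_frames, gains):
    """Weighted sum of one frame of samples from each beam."""
    return [sum(g * frame[n] for g, frame in zip(gains, beam_frames))
            for n in range(len(beam_frames[0]))]
```

If the gains start out summing to one, they keep summing to one, so the overall level stays steady while the selection moves from beam to beam; `smoothing` sets how quickly the mix follows a change of talker.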
I’m sure some of you are expert beamformer designers and may have found this article a little too simplified. For those of you who are less proficient, I hope it explained some of the terms and concepts you run into when coming across yet another microphone array solution. What I hope you take from this is that it’s not a simple subject. It’s made even more complicated by the external parameters that have to be taken into consideration when implementing these solutions in real-life situations. These include the room acoustics, microphone mismatch, the mechanical design of the array, the position of the array, and its proximity to other objects, to name just a few. A good microphone array is one that behaves well in most cases, in most locations, most of the time. This is the art of fine-tuning performance, and it can only come from many years of experience.
If you have any questions or wish to discuss further, I’ll be more than happy to help. Find me on our company’s blog, or through the contact link on our website.