Very few people truly understand audio and its layered complexities. Often, the people who know exactly what they are talking about are engineers, because the essence of the thing matters more to them than anything else. Coming out of a conversation with Harry Jones, Sonos Sound Experience Engineer, one point he made stayed with me: their products get better the longer you have them. Take the Sonos Ace headphones, released a year ago, which recently received a significant software update adding Adaptive ANC (active noise cancellation), TrueCinema, and Multiuser TV Audio Swap.
“It’s not something you buy once and that’s the experience you get, because our products get better over time through software updates,” he said.
Jones said Sonos’ intention is to “continue to explore and push the boundaries” with audio processing and personalization. One thing is clear: as the American audio giant builds its product portfolio for the coming years and improves the products users already own, it has no intention of falling back on the buzzword of artificial intelligence (AI) to build something that does not suit the user or improve the sound experience. Jones talked about the search for perfect sound, how listener feedback is balanced against benchmark results, and the nuances of getting sound right in your home. Edited excerpts.
At Sonos, how do you balance what people like against what measures well in benchmarks and on test equipment? If listeners say they would prefer something else, how do you decide?
This is a really good question, because it gets at what sets us apart as a brand. When we start development on a new product, we have a group of extremely talented and experienced audio engineers, and they will work on different tunings. Tuning itself is something we spend a lot of time on, and you’re right that a lot of it comes down to measurement mics, anechoic chambers, and lab environments – that gets us 70% to 80% of the way there. Many would stop at that point. But we know that a lot changes beyond that, spatial audio being a good example. Those productions are mixed in rooms with 20 loudspeakers, which is the perfect environment for that kind of thing, but we have to do it with a single box. If I could have it my way – and I’m sure you would too – we would all have 20 loudspeakers in the living room. But that is not a realistic expectation.
So where we come in is being able to present the original intent of the music or film or TV piece in a way that satisfies the original artist. It’s absolutely important to us, as a matter of philosophy, and we’ve felt over the years that the only way to really get there is with direct input from the people who created the content themselves. The people who mixed your favorite music or worked as sound designers on your favorite TV series have likely worked on our speakers. That creative community is important to us and something we’re very proud of. We have our own Soundboard group, a core group of mixers, engineers and producers across music, film and TV. They are Oscar-winning and Grammy-winning people who know their industry well. Beyond that, we have an extensive network of over 450 creators who are truly at the top of their game.
Without them, we wouldn’t be able to represent that original art the way we do. Obviously, if listeners prefer something else, we offer things like TruePlay or EQ adjustments in the app. But our job is really to ship the best possible tunings, with input from trusted voices across the industry. In that sense, our collaboration with creators is extremely important.
Sonos systems are heard in a variety of indoor listening environments – noisier apartments in India, versus quieter and often softly furnished homes in the West. Should sound systems attempt to ‘fix’ room acoustics or work with them, and how might AI-driven tuning evolve beyond today’s TruePlay over the next few years?
It raises the question of real-world scenarios – it’s not enough for us to just say our speakers sound amazing, which they do (laughs), because it really depends on the room you put them in. Over the years, it has become quite clear how important TruePlay is for actually correcting some of that, or at least nudging things in the right direction when room acoustics are poor. Obviously, we can’t fight physics, but TruePlay helps the speaker’s acoustics by accounting for its location, and it helps the tuning we’re known for stand out. The basic idea with TruePlay is that getting great sound at home shouldn’t be difficult. You shouldn’t have to ask why your speaker doesn’t sound very good. You should be able to put it in most places, run TruePlay, and get that extra quality.
If there’s any trend I’ve seen over the past few years that’s been low-hanging fruit for improving the sound experience, it’s definitely placement. We’ve seen the trend of media walls, where people will install soundbars in small alcoves or on shelves. There are up-firing drivers and side-firing drivers, and people don’t always understand how they work. So you get a huge accumulation of energy within a very small acoustic space, and people say they can hear dialogue and nothing else. A little more knowledge would really help – placing the soundbar on a TV stand instead of in a cavity, for example. Even in simple terms, having the soundbar in front of you and the surrounds behind you, and then TruePlay can take you to the next level once the placement is right.
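TruePlay itself is proprietary, but the general idea of measurement-based room correction can be illustrated with a minimal sketch: compare a measured in-room response against a target curve and derive per-band EQ gains. The flat target, frequency bands and gain limits below are illustrative assumptions, not Sonos values.

```python
# Minimal sketch of measurement-based room correction: derive per-band EQ
# gains that push a measured in-room response toward a target curve, clamped
# so no band is boosted or cut too aggressively.
import numpy as np

def correction_eq(measured_db, target_db, max_boost_db=6.0, max_cut_db=12.0):
    """Per-band correction gains in dB (clamped to safe boost/cut limits)."""
    gains = target_db - measured_db
    return np.clip(gains, -max_cut_db, max_boost_db)

# Example: a room that exaggerates bass and swallows some treble.
bands_hz = np.array([63, 125, 250, 500, 1000, 2000, 4000, 8000])
measured_db = np.array([6.0, 4.0, 1.0, 0.0, 0.0, -1.0, -3.0, -2.0])
target_db = np.zeros_like(measured_db)  # flat target, purely illustrative

print(dict(zip(bands_hz.tolist(), correction_eq(measured_db, target_db).tolist())))
# Cuts the boomy low end and gently lifts the recessed highs.
```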
In terms of how we’re working with AI to deal with the environment, we’ve made a massive leap forward with our new speech enhancement feature. We’re now able to separate the speech from everything else and give the user four levels of tuning, with the intensity increasing as you go up. We understand that conversations are becoming harder to follow, and there are a myriad of reasons for that. We don’t pretend this feature can solve everything, but you could be listening to a really difficult mix – a rushed film schedule where post-production has to happen in a very short period of time, so everything is recorded and mixed hurriedly. TV is like a conveyor belt where it just has to go out; you go to Netflix and there are 18 seasons of something. It’s all happening very quickly.
Apart from that, some people have hearing difficulties. We worked with a charity in the UK called the Royal National Institute for Deaf People, or RNID, and they really helped us define that top-level, maximum setting in the app, so we can make watching movies and TV more accessible. There are a number of things we are trying to target in this area. Beyond that, the investment in a Sonos system – the right kind of modularity, everything working together – is a really important point when it comes to distributing audio throughout your home: flexibility, ease of use and having that optimal sound experience no matter where you are.
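A rough, hypothetical sketch of the multi-level idea described above, assuming the dialogue has already been separated from the rest of the mix. The per-level gain values are illustrative assumptions, and the separation model itself (the hard part) is not shown.

```python
# Illustrative sketch: once dialogue is separated from the rest of a mix,
# each enhancement level simply remixes the stems with the dialogue pushed
# further forward and the background pulled back.
import numpy as np

# Hypothetical dialogue boost / background cut per level, in dB.
LEVELS_DB = {1: (1.5, -1.5), 2: (3.0, -3.0), 3: (4.5, -6.0), 4: (6.0, -9.0)}

def db_to_gain(db: float) -> float:
    return 10.0 ** (db / 20.0)

def enhance(dialogue: np.ndarray, background: np.ndarray, level: int) -> np.ndarray:
    """Remix separated stems at the chosen enhancement level (1-4)."""
    speech_db, bed_db = LEVELS_DB[level]
    return dialogue * db_to_gain(speech_db) + background * db_to_gain(bed_db)

# Synthetic stems standing in for separated speech and music/effects.
t = np.linspace(0, 1, 48000)
dialogue = 0.3 * np.sin(2 * np.pi * 300 * t)
background = 0.3 * np.sin(2 * np.pi * 80 * t)
mix = enhance(dialogue, background, level=4)  # the maximum, accessibility-focused setting
```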
Do more speakers really mean better sound, as conventional wisdom suggests, or is it just one factor in how sound is tuned and perceived?
I don’t believe that more speakers automatically means better sound. We focus a lot of energy on how our speakers perform in standalone configurations. We will, of course, tune the speaker on its own first, and that then becomes the benchmark for a stereo pair. Since we are such a modular audio company, things have to sound good on their own too. We spend more than half our time on a single unit. Once you’ve got that one thing right and you add another to it, the stereo presentation largely takes care of itself if it’s set up correctly. Of course, having a second speaker is definitely better than having one, simply because it gives a more impressive spatial performance. But it also depends on the user’s budget, taste and space. Multi-room audio is a really important aspect, where you decide exactly how many speakers you want and add to your system over time.
It’s a whole ecosystem that gets better the longer you have it. In India, we want to establish that complete experience with dedicated listening suites, because we deliberately want people to hear the system in those environments. We got a chance to celebrate the revival of India’s culture and traditional music, which was wonderful. These really demonstrate how our products can work together as well as separately. And I think that’s a really important distinction, because, as I said, we’ll spend a long, long time on the individual tuning of a speaker. A second speaker will probably give you a little more output and a better spatial experience, but in my opinion, more does not always mean better.
AI has become an element of almost everything. Where would you set the limit for AI in audio processing? Should the industry shift from chasing the mirage of ‘perfect’ sound to focusing on personalization of experiences?
It’s interesting, because as a company we’ve really tried over the years to honor the artistic vision behind music, movies and TV – like I said, getting creators involved early so they can give their input on what we’re doing. I must say that we will only use AI when it genuinely makes the sound experience better. We will only use it when it is honestly worth using, and will never bring it in as a gimmick or just to keep up with the industry. We’re not about that at all. We’ll look at an AI implementation of something and do our best to make it the best possible experience, rather than using it for its own sake, especially in a marketing context where buzzwords sell. AI has become a widely discussed way to sell more products; that’s not what we do.
The way I look at AI, it’s a great tool in our tool set. It’s a very complex field, with LLMs and various machine learning models. Many artists and audio engineers I get to talk to through my work at Sonos now use it extensively in their workflow – anything from source separation, to creating Dolby Atmos mixes of old records, to AI mastering. There is a lot of AI involved throughout the production process and then on the transmission side. It’s just a matter of how we use it properly.
As for finding the right sound, I don’t think we’ve ever had a problem with that. We’ve never had any trouble working with artists and getting it right, because we’re sitting right next to them and can ask them very direct questions. We never allow ourselves to be complacent; we don’t want flattering answers. A lot of companies do that – they’ll defend and justify what they’ve done. We want to hear the honest truth about what they think of our speaker in that particular instance. In that sense, I think we’re in a really good place. I think AI can potentially help with more personalized content, especially if you think about AI speech announcements. But it is by no means central to getting good sound. A good voice is a good voice. People have different opinions on this, but for me, you can never replace the human ear with AI.