Audio DSP Intelligence
For the past five years, the audio industry has witnessed a tectonic shift in the digital signal processing platforms available to developers of audio products. Now we are seeing another significant evolution as the industry migrates to extremely sophisticated use cases built on edge AI and on-device resources. All the while, users' expectations of audio features continue to rise. Voice recognition and voice engines were among the biggest incentives for manufacturers to boost the processing power of built-in DSP and audio systems. Voice applications required embedded signal processing to ensure that speech captured by microphones, even in the most unpredictable conditions, could be clearly recognized. But the original voice assistant implementations almost always depended on connectivity to the cloud and powerful centralized computing infrastructures, an approach that has since proven unsustainable as a business model and impossible for any audio manufacturer to offer without recurring costs.
Fortunately, those same voice efforts have contributed greatly to the sophisticated systems we now see embedded in true wireless earbuds and consumer wearable designs, where there is simply no connectivity to offload any of the processing. Users expect increasingly advanced features to simply work, all within ultra-low power budgets. The considerable DSP power in those platforms now also benefits new generations of consumer audio products that can accept simple voice commands processed on-device while simultaneously running sophisticated room correction, noise reduction, and previously unimaginable algorithms. Apple recently provided a remarkable example with new features for its existing AirPods line of products, simply “activated” through a firmware update. Benefiting from its own silicon and full control over hardware and software integration, Apple introduced Adaptive Audio, a new listening mode that dynamically blends Active Noise Cancellation and Transparency (pass-through) to tailor the noise control experience to the conditions of the user's environment in real time.
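To make the idea concrete, the following is a minimal sketch, in C, of how such a blend could be driven by ambient level alone. It is not Apple's Adaptive Audio algorithm; the function names, thresholds, and smoothing constant are assumptions chosen purely for illustration, and a real implementation would weigh many more cues (speech detection, user motion, media content).

/*
 * Illustrative sketch only: a hypothetical, simplified blend of ANC and
 * transparency (pass-through) outputs, driven by the ambient noise level.
 * All names and constants here are invented for illustration.
 */
#include <math.h>
#include <stddef.h>

#define NOISE_FLOOR_DB  40.0f   /* below this, favor transparency  */
#define NOISE_CEIL_DB   75.0f   /* above this, favor full ANC      */
#define SMOOTHING       0.995f  /* one-pole smoothing of the blend */

static float blend_state = 0.0f;  /* 0 = transparency, 1 = full ANC */

/* Estimate the ambient level (uncalibrated dB) from the outer microphone. */
static float ambient_level_db(const float *outer_mic, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += outer_mic[i] * outer_mic[i];
    float rms = sqrtf(acc / (float)n) + 1e-12f;
    return 20.0f * log10f(rms) + 94.0f;   /* arbitrary reference offset */
}

/*
 * Crossfade between a pre-computed transparency block and a pre-computed
 * ANC block, with the mix ratio following the measured ambient level.
 */
void adaptive_blend(const float *outer_mic,
                    const float *transparency_out,
                    const float *anc_out,
                    float *out, size_t n)
{
    float db = ambient_level_db(outer_mic, n);

    /* Map the ambient level to a 0..1 target blend. */
    float target = (db - NOISE_FLOOR_DB) / (NOISE_CEIL_DB - NOISE_FLOOR_DB);
    if (target < 0.0f) target = 0.0f;
    if (target > 1.0f) target = 1.0f;

    /* Smooth the transition so the mode change is inaudible. */
    blend_state = SMOOTHING * blend_state + (1.0f - SMOOTHING) * target;

    for (size_t i = 0; i < n; i++)
        out[i] = blend_state * anc_out[i] +
                 (1.0f - blend_state) * transparency_out[i];
}

Even in this toy form, the block-by-block structure shows why such features demand both signal processing headroom and always-on efficiency from the underlying silicon.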
Naturally, users will soon expect other systems to offer this processing power and similar features. But not all manufacturers have access to an integrated Apple H2 SoC built on a cutting-edge 7nm process with lower power consumption (Apple's previous chip, the H1, used a 16nm process node, and competitors in the space mostly still use 28nm technology). As any audio manufacturer quickly finds when trying to implement AirPods-like features in a true wireless stereo (TWS) design, the limitations in memory and processing power determine what new products can and cannot do. Those limitations are inherent to the existing chips from the major alternative vendors of Bluetooth systems for TWS products, and to the traditional DSP platforms available for audio designs.

As always, the audio industry alone cannot determine what chips can or cannot do. Fortunately, the hyper-enthusiasm surrounding Generative AI is coming to the rescue. Cadence Design Systems, like every other major semiconductor company, is racing to support Generative AI solutions and run large language models (LLMs). This is creating the incentives for a new generation of chips that no audio manufacturer, or even many large technology companies, could develop on their own. Cadence is working on new solutions optimized for artificial intelligence (AI)-based speech recognition processing, expanding its family of Tensilica HiFi DSP solutions for audio and voice. New generations of HiFi DSPs typically double the available audio processing power while also adding neural network (NN) processing improvements. This is good news for automotive developers focused on voice and immersive audio experiences, and it also means that many other systems will likely benefit from new chips with additional HiFi DSP cores, energy efficiency for always-on use cases, and many improvements designed to accelerate neural network inferencing in addition to traditional DSP workloads. These new-generation DSPs are optimized to run machine learning applications and can power wake word detection, audio scene detection, speech recognition, and noise reduction, among other audio features.
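As an example of the kind of always-on NN workload these DSPs are built to accelerate, here is a minimal sketch, in C, of a first-stage wake word scorer. It assumes a tiny fully connected network trained offline; the function names, feature layout, and placeholder weights are hypothetical and do not reflect any vendor SDK, where optimized NN kernels would be used instead.

/*
 * Minimal sketch of an always-on, on-device wake word stage, assuming a
 * tiny fully connected network whose weights were trained offline.
 * Names such as ww_score() are invented for illustration only.
 */
#include <math.h>

#define N_FEATURES  40   /* e.g., one frame of log-mel energies */
#define N_HIDDEN    16

/* Placeholder weights; a real model would be trained and quantized offline. */
static const float W1[N_HIDDEN][N_FEATURES] = { { 0 } };
static const float b1[N_HIDDEN] = { 0 };
static const float W2[N_HIDDEN] = { 0 };
static const float b2 = 0.0f;

/* One dense layer with ReLU, then a sigmoid output: wake word probability. */
static float ww_score(const float features[N_FEATURES])
{
    float hidden[N_HIDDEN];
    for (int h = 0; h < N_HIDDEN; h++) {
        float acc = b1[h];
        for (int f = 0; f < N_FEATURES; f++)
            acc += W1[h][f] * features[f];
        hidden[h] = acc > 0.0f ? acc : 0.0f;       /* ReLU */
    }
    float out = b2;
    for (int h = 0; h < N_HIDDEN; h++)
        out += W2[h] * hidden[h];
    return 1.0f / (1.0f + expf(-out));             /* sigmoid */
}

/* Called once per audio frame; only wakes the larger recognizer when this
 * low-power first stage is confident enough. */
int wake_word_frame(const float features[N_FEATURES], float threshold)
{
    return ww_score(features) > threshold;
}

The point of the sketch is the structure, not the model: a small, continuously running classifier gates a much larger recognizer, which is exactly the pattern that always-on NN acceleration in these DSP cores is designed to serve efficiently.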
Ultimately, these solutions will evolve from infrastructure to specialized AI chips for connected applications and, finally, to edge AI, capable of unprecedented power all on their own. That is precisely the inspiration that has already led to the founding of several new chip companies focused on delivering purpose-built silicon for edge AI deployment. But even those young pioneers are now facing the massive transition required to support deep learning AI models that run directly on-device, including in speakers, headphones, and hearables. Platforms designed specifically for edge applications, running trained models optimized for audio processing from large numbers of sensors, will likely not be viable unless they are based on large-scale solutions created with other uses in mind, as always.