The future of mobile communications: IVAS audio call

Voice is our most important means of communication, and telephony has allowed us to connect through our voice for more than a century. The telephone conversation as we know it has evolved from analogue to digital, from landline to mobile, and from low voice quality to natural voice quality. However, one major advance was still missing: how to deliver fully authentic, immersive sound live.

The introduction of the IVAS (Immersive Voice and Audio Services) codec, standardized by 3GPP in Release 18 in June this year, represents a major advancement in audio technology. Unlike traditional monophonic voice calls, IVAS enables the transmission of immersive, three-dimensional audio, providing a richer, more lifelike communications experience. This innovation is made possible using new audio formats optimized for a spatial audio experience. An example of this is a new Metadata-Assisted Spatial Audio format, MASA, which uses only two audio channels and metadata for spatial audio descriptions. Spatial audio calling allows users to experience sound as if it were happening in real life, complete with features such as head tracking.

Below we explore the challenges of bringing live 3D calling to mobile phones, the requirements that spatial communications and the new IVAS codec address, and the game-changing impact that live 3D audio will have for people, mobile operators and businesses.

Kai Havukainen

Head of Product Management, Nokia Technologies.

Bringing 3D calling to mobile phones

The last major innovation in voice calling was the EVS codec, introduced in 2014 and recognized by consumers as HD Voice+. While it significantly improved call quality, like all previous codecs, it only provided a monophonic listening experience.

With the introduction of 3D audio calling, the biggest leap in voice calling technology in decades, comes the challenge of creating an authentic, immersive experience in everyday communications. Although voice technology has evolved significantly, from analogue to digital, from fixed to mobile and from low quality to natural voice quality, spatial audio, where sounds are perceived as coming naturally from all sides, is much harder to transmit and reproduce in mobile environments.

Achieving this level of immersive sound experience has been easier in controlled environments such as cinemas and video games, where sound design is a core element, but reproducing it in everyday mobile conversations introduces a range of technical hurdles, including real-time spatial sound processing, hardware limitations and ensuring compatibility between devices.

The Immersive Voice and Audio Services (IVAS) speech codec is therefore the most important step forward in audio technology for voice calls in decades.

How to address and overcome spatial communication challenges

Several challenges had to be overcome before Immersive Voice could become a robust spatial audio solution. An important one is noise cancellation, which is crucial for speech intelligibility in environments such as concerts or in nature. Traditional noise reduction methods tend to filter out only persistent sounds, such as the hum of air conditioning or traffic noise, and leave other background noise untouched. Wind interference also poses a challenge because it introduces unwanted noise and causes fluctuations in audio levels.

However, recent developments in machine learning and intelligent noise reduction have addressed these issues. For example, immersive audio technology is designed to intelligently adjust how much background noise is reduced depending on the environment, and to provide users with control, allowing individuals to manually adjust noise cancellation levels. This ensures that the essential sounds are transmitted while minimizing unwanted background noise.

Immersive audio setups with multiple microphones and speakers also face a major obstacle: acoustic echo. This happens when microphones pick up sound from nearby speakers, causing unwanted feedback. The problem is even more challenging in spatial audio setups, where the placement and number of speakers affect sound quality and the device’s ability to capture spatial audio. Traditional methods of acoustic echo cancellation (AEC) often do not work well in these complex environments. To solve this, a machine learning-based spatial AEC solution was created, which removes the loudspeaker signal from the microphone input using a reference signal, that is, the audio being sent to the speakers. This improves audio quality, especially for spatial audio in real-time speech applications.
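To make the role of the reference signal concrete, here is a minimal sketch of echo cancellation. It is not the machine learning-based spatial AEC described above: it swaps in a classical single-channel NLMS (normalized least mean squares) adaptive filter, and the function names, filter length, step size and simulated echo path are all assumptions chosen for illustration.

```python
import random


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo from the microphone signal.

    mic: microphone samples (near-end sound plus loudspeaker echo)
    ref: reference samples (the signal that was sent to the loudspeaker)
    Classical NLMS stand-in, not the ML-based method from the article.
    """
    weights = [0.0] * taps   # adaptive estimate of the echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    output = []
    for mic_sample, ref_sample in zip(mic, ref):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        residual = mic_sample - echo_estimate
        energy = sum(x * x for x in history) + eps
        step = mu * residual / energy
        weights = [w + step * x for w, x in zip(weights, history)]
        output.append(residual)
    return output


def mean_power(samples):
    return sum(s * s for s in samples) / len(samples)


def demo():
    """Simulate a pure-echo microphone signal and measure the suppression."""
    rng = random.Random(0)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
    echo_path = [0.6, 0.3, -0.2]  # hypothetical room response
    mic = [
        sum(h * ref[i - k] for k, h in enumerate(echo_path) if i - k >= 0)
        for i in range(len(ref))
    ]
    cleaned = nlms_echo_cancel(mic, ref)
    # Compare power over the last 500 samples, after the filter has adapted.
    return mean_power(mic[-500:]), mean_power(cleaned[-500:])


if __name__ == "__main__":
    before, after = demo()
    print(f"echo power before: {before:.6f}, after: {after:.6f}")
```

In this toy setup the microphone hears only echo, so the residual power should fall close to zero once the filter has adapted. A real spatial system has to cope with several loudspeakers and microphones, near-end speech and changing acoustics, which is where the learned approach mentioned above comes in.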

Get to know the IVAS codec

To bring spatial audio to mobile calling as well as to over-the-top (OTT) services, the 3rd Generation Partnership Project (3GPP) recently adopted a new voice codec standard. The IVAS codec standard, developed through the collaboration of 13 companies, was included in 3GPP Release 18 and builds on the widely used Enhanced Voice Services (EVS) codec. Importantly, the IVAS codec maintains full backward compatibility, ensuring seamless interoperability with existing voice services.

One of the key innovations during the IVAS standardization was the creation of a new parametric audio format, Metadata-Assisted Spatial Audio (MASA), specifically designed for devices with limited form factors, such as smartphones. The IVAS codec integrates a built-in renderer that supports head-tracked binaural audio and multi-speaker playback using the MASA format.

Additionally, an immersive voice client SDK can serve as the IVAS front end, capturing spatial audio from device microphones and converting it to the standardized MASA format. This technology enables true 3D immersive audio experiences for different types of voice calls.
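The idea of two audio channels plus spatial metadata, combined with head tracking, can be sketched as follows. This is a deliberately simplified, hypothetical data structure and not the standardized MASA layout or the IVAS renderer: `SpatialFrame`, its fields and the sign convention (positive azimuth to the listener's left) are assumptions. It only shows how a renderer might counter-rotate source directions when the listener turns their head, so that sounds stay fixed in the room.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpatialFrame:
    """Hypothetical frame: two audio channels plus direction metadata."""
    left: List[float]
    right: List[float]
    band_azimuths_deg: List[float]  # one estimated direction per frequency band


def wrap_degrees(angle: float) -> float:
    """Wrap an angle into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def compensate_head_yaw(frame: SpatialFrame, head_yaw_deg: float) -> SpatialFrame:
    """Rotate all source directions opposite to the listener's head yaw."""
    rotated = [wrap_degrees(a - head_yaw_deg) for a in frame.band_azimuths_deg]
    return SpatialFrame(frame.left, frame.right, rotated)


if __name__ == "__main__":
    frame = SpatialFrame(left=[0.0], right=[0.0], band_azimuths_deg=[30.0, -170.0])
    # The listener turns 30 degrees towards the first source.
    print(compensate_head_yaw(frame, 30.0).band_azimuths_deg)  # [0.0, 160.0]
```

After the head turn, the source that was 30 degrees to the left is straight ahead, which is the behaviour a listener expects from head-tracked playback. The audio channels themselves are passed through unchanged here; an actual renderer would use the updated directions to produce the binaural output.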

The power of 3D Live Audio: what it means for people, operators and companies

New immersive 3D audio is revolutionizing the audio experience for consumers, businesses and industries. For consumers, it increases engagement in interactions with friends and family by sharing local sounds, live-streamed or recorded, and provides full immersion in synchronized metaverse experiences. For enterprises, 3D audio calling unlocks new possibilities, from enhanced customer experience through directional audio to transforming team collaboration and decision-making. In industrial environments, audio analytics can power automated processes such as predictive maintenance, streamlining operations and increasing efficiency.

To enable these experiences under a variety of network conditions, service providers need scalable solutions that optimize performance regardless of bandwidth limitations. The 3GPP IVAS standard codec is suitable for bitrates ranging from 13.2 to 512 kbit/s, ensuring immersive audio quality whether used in busy networks or in high-quality streaming environments. This scalability allows service providers to support more users while delivering rich audio experiences.
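As a rough illustration of what this bitrate range means in practice, the sketch below converts a codec bitrate into payload size per frame and into the number of simultaneous streams that fit a given capacity. The 20 ms frame length is an assumption, the helper names are hypothetical, and packet header overhead (RTP, UDP, IP) is ignored, so real capacity figures would be lower.

```python
import math

FRAME_MS = 20  # assumed frame duration in milliseconds


def bits_per_frame(bitrate_kbps: float, frame_ms: int = FRAME_MS) -> int:
    """kbit/s multiplied by milliseconds gives bits."""
    return round(bitrate_kbps * frame_ms)


def bytes_per_frame(bitrate_kbps: float, frame_ms: int = FRAME_MS) -> int:
    """Codec payload per frame, rounded up to whole bytes."""
    return math.ceil(bits_per_frame(bitrate_kbps, frame_ms) / 8)


def streams_in_budget(budget_kbps: float, bitrate_kbps: float) -> int:
    """How many streams at this bitrate fit the budget (payload only)."""
    return int(budget_kbps // bitrate_kbps)


if __name__ == "__main__":
    for rate in (13.2, 512):
        print(rate, "kbit/s ->", bytes_per_frame(rate), "bytes per frame")
    print(streams_in_budget(1000, 13.2), "streams at 13.2 kbit/s in 1 Mbit/s")
    print(streams_in_budget(1000, 512), "stream at 512 kbit/s in 1 Mbit/s")
```

Under these assumptions, the lowest rate needs 33 bytes of payload per frame and the highest 1280 bytes, which is why the same link can carry dozens of low-rate calls but only one stream at the top rate.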

Looking to the future, voice-based user behavior is expected to continue to evolve. In addition to traditional conversations, spatial audio communication will expand to include semi-synchronous messaging through popular apps, with people sending voice clips to each other, and wider use of group conversations. With the rise of extended reality (XR) devices and services across industries, the scope of voice communications will become even broader, with immersion as its defining feature. A key factor in this evolution will be the standardization and integration of the IVAS codec into the latest 5G-Advanced standard, which is essential to ensure the interoperability necessary to bring 3D calling to any phone at the touch of a button.


This article was produced as part of Ny BreakingPro’s Expert Insights channel, where we profile the best and brightest minds in today’s technology industry. The views expressed here are those of the author and are not necessarily those of Ny BreakingPro or Future plc. If you are interested in contributing, you can read more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro
