MLow: Meta’s low-bitrate audio codec


  • At Meta, we support real-time communication (RTC) for billions of people through our apps, including WhatsApp, Instagram, and Messenger.
  • We’re working to make RTC accessible by providing a high-quality experience to everyone, even those who don’t have the fastest connections or newest phones.
  • As more people rely on our products to make calls over the years, we’re working on new ways to ensure all calls have solid audio quality.
  • We created the Meta Low Bitrate (MLow) codec: a new tool that improves audio quality, especially for people on slow connections.
Figure 1: Increasing complexity or bitrate generally improves quality, but good codecs achieve higher quality while balancing the other two.

RTC products use many building blocks to deliver a complete experience, and one of the essential components is the audio/video codec. Codecs compress the captured audio/video data so that it can be sent efficiently over the Internet to the recipient, which preserves the real-time experience. For example, the raw audio captured for a typical call runs at 768 kbps (mono, 48 kHz sampling, 16-bit depth), which modern codecs can compress down to 25-30 kbps. This compression often costs some quality (loss of information), but good codecs find a balance between the trio of quality, bitrate, and complexity by exploiting deep knowledge of the nature of the audio signal as well as psychoacoustics.
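The 768 kbps figure follows directly from the capture parameters quoted above. A minimal Python sketch of the arithmetic (the function name is illustrative, not part of any codec API):

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels=1):
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000

# Mono, 48 kHz sampling, 16-bit depth, as quoted in the text.
raw_kbps = pcm_bitrate_kbps(48_000, 16, channels=1)
print(raw_kbps)  # 768.0

# Compression ratio at the 25-30 kbps range quoted for modern codecs.
for coded_kbps in (30, 25):
    print(f"{coded_kbps} kbps is {raw_kbps / coded_kbps:.1f}x smaller than raw")
# 30 kbps is 25.6x smaller than raw
# 25 kbps is 30.7x smaller than raw
```

By the same arithmetic, the 6 kbps operating point discussed later in this post is a 128x reduction from the raw signal.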

Building a good codec is quite difficult, and that’s why we don’t see new codecs emerging very often. The last good, widely known open source codec was Opus, released in 2012, which became the codec of choice for a wide variety of applications on the Internet. Meta has used Opus for all of its RTC needs and, so far, it has served us well, helping to deliver quality calls to billions of users around the world.

Our motivation for building a new codec

Given the heavy use of RTC in Meta products, we can see how a codec performs across a range of network scenarios and how it affects the end-user experience. In particular, we observed that a significant portion of calls have poor network connections for all or part of the call. Generally, a bandwidth estimation (BWE) module detects the network quality, and as the network quality degrades, we need to reduce the operating bitrate of the codec to avoid network congestion and keep the audio smooth, which shifts the balance of the trio referenced above. To complicate matters, a video call on a poor network leaves little room for audio and pushes the audio bitrate even lower. Opus’ lowest operating point is 6 kbps, at which it operates in NarrowBand mode (0 to 4 kHz). That range does not capture all the frequencies produced by the human voice, so the result does not sound as clear or natural. Here’s an example of Opus’ audio at 6 kbps and the corresponding reference file for comparison.

Raw reference signal:

Opus @ 6 kbps NarrowBand (NB):

Over the past couple of years, we have seen the development of new machine learning (ML)-based audio codecs that deliver good quality audio at very low bitrates. In October 2022, Meta released Encodec, which achieves remarkably clear audio at very low bitrates. Although these AI/ML-based codecs can achieve high quality at low bitrates, this often comes at a high computational cost. As a result, only very high-end (expensive) mobile phones can run these codecs reliably, while users on low-end devices continue to experience audio quality issues under low-bitrate conditions. The net impact of these computationally expensive codecs is therefore limited to a small portion of users.

A significant number of our users still use low-end devices. For example, more than 20% of our calls are made on ARMv7 devices, and tens of millions of daily WhatsApp calls are made on devices more than 10 years old. Given the codecs readily available and our commitment to ensuring that all users, regardless of the device they use, enjoy a quality calling experience, we clearly need a codec with very low computational requirements that still provides high-quality audio at the lowest bitrates.

The MLow codec

We began development of a new codec at the end of 2021. After nearly two years of active development and testing, we are proud to announce the Meta Low Bitrate audio codec, aka MLow, which achieves twice the quality of Opus (POLQA MOS of 1.89 for Opus vs. 3.9 for MLow at 6 kbps WB). More importantly, we achieve this quality while keeping MLow’s computational complexity 10 percent lower than that of Opus.

Figure 2 below plots Mean Opinion Score (MOS) on a scale of 1 to 5 and compares POLQA scores for Opus and MLow at different bitrates. As the graph shows, MLow has a large advantage over Opus at the lowest bitrates, and its quality saturates at a lower bitrate than Opus’ does.

Figure 2: POLQA score comparing Opus (WB) to MLow at different bitrates on a large file dataset.

We have already fully launched MLow for all Instagram and Messenger calls and are actively rolling it out to WhatsApp. We’ve already seen an incredible improvement in user engagement with better audio quality.

Here are some audio clips to listen to. We suggest using your favorite headset to appreciate the striking differences in audio quality.

Audio clips: Opus at 6 kbps NB, MLow at 6 kbps WB, and the reference.

Being able to encode high-quality audio at lower bitrates also unlocks more effective forward error correction (FEC) strategies. Compared to Opus, with MLow we can afford to pack FEC at much lower bitrates, which significantly helps improve audio quality in packet loss scenarios.

Here are two audio samples at 14 kbps with significant packet loss of 30% on the receiver side.

Opus:

Note that at these bitrates, Opus is not capable of encoding any in-band FEC. It needs a minimum of 19 kbps to encode any in-band FEC at 10% packet loss, so at 14 kbps its audio recovery suffers.
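To get a feel for why room for redundancy matters at 30% packet loss, here is a rough Python sketch. It is not MLow’s or Opus’ actual FEC scheme, which this post does not describe. It assumes a simplified model in which each audio frame travels in `copies` consecutive packets (the original plus redundant copies piggybacked on later packets) and packets are lost independently. Real networks tend to lose packets in bursts, so treat the numbers as optimistic.

```python
import random

def residual_loss(p_loss, copies):
    """Chance a frame is unrecoverable if it rides in `copies` packets
    and each packet is lost independently with probability p_loss."""
    return p_loss ** copies

def simulate(p_loss, copies, n_frames=200_000, seed=0):
    """Monte Carlo check of residual_loss under the same assumptions."""
    rng = random.Random(seed)
    # lost[i] is True when packet i was dropped by the network.
    lost = [rng.random() < p_loss for _ in range(n_frames + copies)]
    # Frame i rides in packets i .. i+copies-1; it is gone only if all are lost.
    missing = sum(
        all(lost[i + k] for k in range(copies)) for i in range(n_frames)
    )
    return missing / n_frames

for copies in (1, 2, 3):
    analytic = residual_loss(0.30, copies)
    measured = simulate(0.30, copies)
    print(f"copies={copies}  analytic={analytic:.3f}  simulated={measured:.3f}")
```

Under these assumptions, one redundant copy cuts the fraction of unrecoverable frames from 30% to about 9%, and a second copy to about 2.7%. Each copy costs bitrate, which is why a codec that sounds good at very low bitrates leaves more of a fixed budget for redundancy.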

MLow internals

MLow builds on the concepts of a classic CELP (Code Excited Linear Prediction) codec, with advances in excitation generation, parameter quantization, and coding schemes. Figure 3 is a high-level view of the internal workings of the codec. On the left, an input signal (raw PCM audio) feeds the encoder, which splits the signal into low and high frequency bands. Each band is then encoded separately, while information shared between the bands is used to achieve better compression. All output passes through a range encoder, which compresses it further and generates the encoded payload. The decoder performs the inverse operations on the received payload to generate the output audio signal.

Figure 3: High-level MLow encoder and decoder architecture.
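The split-then-merge structure can be illustrated with the simplest possible two-band filter bank. The Python sketch below uses a Haar pair (pairwise averages and differences); this is an assumption chosen for brevity, since the post does not say which filters MLow uses, and a production codec would use much sharper filters. The function names are hypothetical.

```python
def split_bands(pcm):
    """Split an even-length signal into half-rate low and high bands."""
    low = [(pcm[i] + pcm[i + 1]) / 2 for i in range(0, len(pcm), 2)]
    high = [(pcm[i] - pcm[i + 1]) / 2 for i in range(0, len(pcm), 2)]
    return low, high

def merge_bands(low, high):
    """Inverse of split_bands: rebuild the full-rate signal."""
    pcm = []
    for l, h in zip(low, high):
        pcm.extend((l + h, l - h))
    return pcm

def energy(band):
    return sum(x * x for x in band)

# A short, smooth test frame (values chosen to be exact in floating point).
frame = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
low, high = split_bands(frame)

# The decoder side reverses the encoder side exactly.
assert merge_bands(low, high) == frame

# For a smooth signal, most of the energy lands in the low band.
print(energy(low), energy(high))
```

In this toy example the high band carries far less energy than the low band, which hints at why a split-band codec can spend fewer bits on the high band.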

With these split-band optimizations, we are able to encode the high band using very few bits, allowing MLow to deliver SuperWideBand audio (32 kHz sampling) at a much lower bitrate.
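For reference, the band names in this post map to audio bandwidth through the Nyquist limit: a signal sampled at rate f_s can represent frequencies up to f_s / 2. The short Python sketch below tabulates this. The 0 to 4 kHz NarrowBand range and the 32 kHz SuperWideBand sampling rate come from the text; the 8 kHz and 16 kHz sampling rates for NB and WB are the conventional values and are assumed here rather than stated by the post.

```python
def audio_bandwidth_khz(sample_rate_khz):
    """Highest representable frequency (Nyquist limit) for a sample rate."""
    return sample_rate_khz / 2

# Sampling rates in kHz. NB and WB rates are conventional values (assumed);
# the SWB rate is the one quoted in the text.
BANDS = {
    "NarrowBand (NB)": 8,
    "WideBand (WB)": 16,
    "SuperWideBand (SWB)": 32,
}

for name, fs in BANDS.items():
    top = audio_bandwidth_khz(fs)
    print(f"{name}: {fs} kHz sampling, up to {top:g} kHz of audio")
```

Going from NB to SWB at 32 kHz sampling quadruples the representable frequency range, from 4 kHz to 16 kHz.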

What’s next?

MLow has significantly improved audio quality on low-end devices while still ensuring calls are end-to-end encrypted. We’re really excited about what we’ve accomplished in just the last two years: from developing a new codec to successfully shipping it to billions of users around the world. We continue to work on improving audio recovery on networks with heavy packet loss by sending more redundant audio, which MLow allows us to do efficiently. We’re excited to share more as we continue working to make quality audio calls easier for all of our users.




