Audio compression

EnCodec: High-fidelity Neural Audio Compression

We introduce EnCodec, a state-of-the-art, real-time, high-fidelity audio codec built on neural networks. EnCodec is trained to compress any kind of audio and reconstruct the original signal with high fidelity. It consists of an autoencoder with a residual vector quantization bottleneck that produces several parallel streams of audio tokens with a fixed vocabulary. The streams capture different levels of information about the audio waveform, so the audio can be reconstructed with high fidelity when all streams are used.
We further propose a Multi-Band Diffusion framework that generates any audio modality with higher perceived quality from EnCodec’s low-bitrate discrete representations.
In addition to the audio compression use case, EnCodec's compressed representation can be used as input for audio language modeling tasks, as demonstrated in the AudioGen and MusicGen works.
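The residual vector quantization bottleneck described above can be illustrated with a toy example. This is a minimal sketch, not the EnCodec implementation: the codebooks here are random rather than learned, the function names (`rvq_encode`, `rvq_decode`, `make_codebooks`) are hypothetical, and each toy codebook includes a zero vector so that adding a stage can never increase the error. It shows how each stage quantizes what the previous stages left over, yielding one token per stream, and how decoding from more streams refines the reconstruction.

```python
import random


def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def dist(entry):
        return sum((e - v) ** 2 for e, v in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(codebooks, vec):
    """Quantize vec with a cascade of codebooks; returns one token per stage."""
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        # The next stage only sees what this stage failed to capture.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def rvq_decode(codebooks, tokens):
    """Sum the selected entries; passing fewer tokens gives a coarser result."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, tokens):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def make_codebooks(num_stages, size, dim, seed=0):
    """Toy random codebooks (assumption: a real codec learns these)."""
    rng = random.Random(seed)
    codebooks = []
    for stage in range(num_stages):
        scale = 0.5 ** stage  # later stages model finer detail
        entries = [[0.0] * dim]  # zero entry: a stage may choose to add nothing
        entries += [[rng.uniform(-scale, scale) for _ in range(dim)]
                    for _ in range(size - 1)]
        codebooks.append(entries)
    return codebooks


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


codebooks = make_codebooks(num_stages=4, size=16, dim=8)
rng = random.Random(1)
frame = [rng.uniform(-1, 1) for _ in range(8)]  # stand-in for one latent frame

tokens = rvq_encode(codebooks, frame)
errors = [squared_error(frame, rvq_decode(codebooks, tokens[:k]))
          for k in range(len(tokens) + 1)]
print("tokens per stream:", tokens)
print("error using 0..4 streams:", errors)
```

Decoding from a prefix of the streams is what allows one model to serve several bitrates: dropping the later streams lowers the bitrate and yields a coarser reconstruction.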

EnCodec for compression

On this website, we show the performance of EnCodec as a generic audio codec at different bandwidths, for both monophonic and stereophonic audio. We have released the training code for EnCodec, so anyone can train a compression model tailored to their application.

48 kHz stereophonic audio

EnCodec is the first neural audio codec for stereophonic audio at 48 kHz, which is a standard quality for music. The 48 kHz model supports bandwidths of 3, 6, 12 and 24 kbps.

24 kHz monophonic audio

We additionally show how EnCodec performs on 24 kHz monophonic audio, which already covers plenty of use cases. The 24 kHz model can compress audio to 1.5, 3, 6, 12 or 24 kbps.
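The bitrates listed above follow from how many token streams are kept. The sketch below works through that arithmetic. The frame rate of 75 latent frames per second and the 1024-entry codebooks are not stated on this page; they are assumptions based on the 24 kHz configuration described in the EnCodec paper, and `bitrate_kbps` is a hypothetical helper name.

```python
import math


def bitrate_kbps(num_codebooks, frame_rate_hz, codebook_size):
    """Bitrate of a token stream: streams x frames per second x bits per token."""
    bits_per_token = math.log2(codebook_size)
    return num_codebooks * frame_rate_hz * bits_per_token / 1000


# Assumptions (from the EnCodec paper, not from this page):
FRAME_RATE_HZ = 75    # latent frames per second for the 24 kHz model
CODEBOOK_SIZE = 1024  # entries per codebook, i.e. 10 bits per token

for num_codebooks in (2, 4, 8, 16, 32):
    kbps = bitrate_kbps(num_codebooks, FRAME_RATE_HZ, CODEBOOK_SIZE)
    print(f"{num_codebooks:2d} codebooks -> {kbps} kbps")
```

Under these assumptions, each stream contributes 0.75 kbps, so the five supported bitrates correspond to keeping 2, 4, 8, 16 or 32 streams.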

EnCodec and Diffusion

We also explored diffusion-based approaches for EnCodec and trained a diffusion model that is fully compatible with the discrete latent space of EnCodec. More specifically, we introduce a high-fidelity Multi-Band Diffusion framework (MBD) that generates any type of audio modality from low-bitrate discrete representations.

EnCodec with Multi-Band Diffusion (MBD-EnCodec) produces even fewer audio artifacts, at the expense of more computation.

Multi-Band Diffusion EnCodec for compression at 24 kHz

Let’s listen to samples from MBD-EnCodec at 3 kbps compared to the original EnCodec decoder.

MBD-EnCodec can operate at different bitrates as well. Check out these music samples with bitrates ranging from 1.5 to 6 kbps.