On this website, we show the performance of EnCodec as a generic audio codec at different bandwidths, for both monophonic and stereophonic audio. We released the training code for EnCodec, allowing anyone to train their own compression model tailored to their applications.
EnCodec is the first neural audio codec for stereophonic audio at 48 kHz, which is a standard quality for music. The 48 kHz model supports bandwidths of 3, 6, 12 and 24 kbps.
We additionally show how EnCodec performs on 24 kHz monophonic audio, which already covers plenty of use cases. The 24 kHz model can compress audio to 1.5, 3, 6, 12 or 24 kbps.
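To put these bandwidths in perspective, the short sketch below estimates how much smaller each setting is than uncompressed audio. It assumes 16-bit linear PCM as the uncompressed baseline, which is our assumption for illustration and not something stated on this page. The helper names (`pcm_bitrate_kbps`, `compression_table`) are hypothetical and not part of the EnCodec code.

```python
# Rough compression ratios for the bandwidths listed above.
# Assumption: the uncompressed reference is 16-bit linear PCM.

def pcm_bitrate_kbps(sample_rate_hz: int, channels: int, bits_per_sample: int = 16) -> float:
    """Bitrate of uncompressed PCM audio in kbps (1 kbps = 1000 bits/s)."""
    return sample_rate_hz * channels * bits_per_sample / 1000


# (sample rate in Hz, channel count, supported bandwidths in kbps), taken from the text above.
MODELS = {
    "48 kHz stereo": (48_000, 2, [3, 6, 12, 24]),
    "24 kHz mono": (24_000, 1, [1.5, 3, 6, 12, 24]),
}


def compression_table(models=MODELS):
    """Return (model name, target kbps, compression ratio) for every setting."""
    rows = []
    for name, (sample_rate, channels, bandwidths) in models.items():
        raw_kbps = pcm_bitrate_kbps(sample_rate, channels)
        for kbps in bandwidths:
            rows.append((name, kbps, raw_kbps / kbps))
    return rows


if __name__ == "__main__":
    for name, kbps, ratio in compression_table():
        print(f"{name} at {kbps} kbps: about {ratio:.0f}x smaller than 16-bit PCM")
```

Under this assumption, 24 kHz mono PCM runs at 384 kbps and 48 kHz stereo PCM at 1536 kbps, so even the highest setting of 24 kbps is a substantial reduction in both cases.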
We also explored diffusion-based approaches for EnCodec and trained a diffusion model that is fully compatible with the discrete latent space of EnCodec. More specifically, we introduce a high-fidelity Multi-Band Diffusion framework (MBD) that generates any type of audio modality from low-bitrate discrete representations.
EnCodec with Multi-Band Diffusion (MBD-EnCodec) shows even fewer audio artifacts, at the expense of more computation.
Let’s listen to samples from MBD-EnCodec at 3 kbps compared to the original EnCodec decoder.
MBD-EnCodec can operate at different bitrates as well. Check out these music samples with bitrates ranging from 1.5 to 6 kbps.