Music generation

MusicGen: Simple and Controllable Music Generation

We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen consists of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need to cascade several models, e.g., hierarchically or through upsampling. Following this approach, we demonstrate how MusicGen can generate high-quality samples, both mono and stereo, while being conditioned on a textual description or melodic features, allowing better control over the generated output.
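To make the idea of a token interleaving pattern concrete, here is a minimal sketch of one possible layout, a "delay"-style pattern in which stream k is shifted right by k steps so that a single LM step can emit one token per stream. The function names, the `PAD` placeholder, and the list-of-lists token layout are assumptions made for illustration, not the model's actual implementation.

```python
PAD = -1  # hypothetical placeholder id for positions that hold no token


def delay_interleave(codes):
    """Shift stream k right by k steps.

    codes: list of K lists (one per token stream), each holding T token ids.
    Returns K lists of length T + K - 1, padded with PAD.
    """
    num_streams = len(codes)
    num_steps = len(codes[0])
    total = num_steps + num_streams - 1
    delayed = []
    for k, stream in enumerate(codes):
        # k pads in front, the tokens, then pads to reach the common length.
        row = [PAD] * k + list(stream) + [PAD] * (total - num_steps - k)
        delayed.append(row)
    return delayed


def delay_deinterleave(delayed):
    """Undo delay_interleave and recover the aligned K x T token grid."""
    num_streams = len(delayed)
    num_steps = len(delayed[0]) - (num_streams - 1)
    return [row[k:k + num_steps] for k, row in enumerate(delayed)]
```

With this layout, column t of the delayed grid contains the token of stream 0 for step t, the token of stream 1 for step t - 1, and so on, so one decoding step per column is enough and no second model is needed to fill in the remaining streams.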

Text-to-music generation

MusicGen is an audio generation model specifically tailored for music generation. Music tracks are more complex than environmental sounds, and generating samples that stay coherent over long-term structure is especially important when creating novel musical pieces. Our modeling approach naturally extends to stereophonic music generation.

A grand orchestral arrangement with thunderous percussion, epic brass fanfares, and soaring strings, creating a cinematic atmosphere fit for a heroic battle.

Classic reggae track with an electronic guitar solo

drum and bass beat with intense percussions

A dynamic blend of hip-hop and orchestral elements, with sweeping strings and brass, evoking the vibrant energy of the city.

Violins and synths that inspire awe at the finiteness of life and the universe.

80s electronic track with melodic synthesizers, catchy beat and groovy bass

Smooth jazz, with a saxophone solo, piano chords, and snare full drums

Rock with saturated guitars, a heavy bass line and crazy drum break and fills.

Melody-guided music generation

In the MusicGen work, we explore additional ways of controlling music generation and introduce a novel unsupervised melody-guided generation approach based on chromagrams. Given a music sample, we extract the main melody through chromagrams. Chromagrams are features that capture harmonic and melodic characteristics of music and that are robust to changes in instrumentation and timbre. This signal is used to guide the generation to follow the extracted melody while remaining faithful to the provided text description.
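As a rough illustration of why chromagrams are insensitive to instrumentation, the sketch below folds spectral peaks into the 12 pitch classes of the chromatic scale, so that the same note played in different octaves lands in the same bin. It assumes equal temperament with A4 at 440 Hz and takes already-detected `(frequency, magnitude)` peaks as input; the function names and the input format are hypothetical, and a real pipeline would compute the frames from a spectrogram.

```python
import math


def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to a pitch class in 0..11 (0 = C, 9 = A)."""
    # MIDI note number of the nearest equal-tempered pitch (A4 = 69).
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi % 12


def chroma_frame(peaks):
    """Fold (frequency_hz, magnitude) peaks of one frame into 12 chroma bins.

    The bins are normalized so that the strongest pitch class equals 1.0.
    """
    bins = [0.0] * 12
    for freq, mag in peaks:
        if freq > 0:
            bins[pitch_class(freq)] += mag
    strongest = max(bins)
    return [b / strongest for b in bins] if strongest > 0 else bins
```

For example, peaks at 440 Hz and 880 Hz both contribute to the A bin, whichever instrument produced them, which is the property that lets the melody be carried over to a new arrangement.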

90s rock song with electric guitar and heavy drums

An 80s driving pop song with heavy drums and synth pads in the background

An energetic hip-hop music piece, with synth sounds and strong bass. There is a rhythmic hi-hat pattern in the drums.

90s rock song with electric guitar and heavy drums

An 80s driving pop song with heavy drums and synth pads in the background

An energetic hip-hop music piece, with synth sounds and strong bass. There is a rhythmic hi-hat pattern in the drums.

90s rock song with electric guitar and heavy drums

An 80s driving pop song with heavy drums and synth pads in the background

An energetic hip-hop music piece, with synth sounds and strong bass. There is a rhythmic hi-hat pattern in the drums.

Long generation

MusicGen models are trained on 30-second chunks of audio, but it is possible to generate longer sequences with a simple windowing approach. We use a fixed 30-second window and slide it forward in steps of 10 seconds, keeping the last 20 seconds that were generated as context, to generate the 2-minute-long tracks below.
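The windowing scheme can be sketched as a short loop. This is a minimal illustration under stated assumptions: `generate_chunk` is a hypothetical callable standing in for the model, and audio is represented as a list with one item per second so that the bookkeeping is easy to follow.

```python
def generate_long(generate_chunk, total_s=120, window_s=30, stride_s=10):
    """Generate total_s seconds with a sliding window.

    generate_chunk(context, new_s) is a hypothetical model call that returns
    new_s seconds of audio continuing the given context (a list of seconds).
    """
    context_s = window_s - stride_s  # 20 seconds of context are kept
    # The first window is generated from scratch, without any context.
    track = list(generate_chunk([], window_s))
    while len(track) < total_s:
        context = track[-context_s:]
        new_s = min(stride_s, total_s - len(track))
        # Each step sees 20 s of context and adds 10 s of new audio,
        # so the model never handles more than one 30-second window.
        track.extend(generate_chunk(context, new_s))
    return track
```

With the default values, a 2-minute track takes one initial 30-second generation followed by nine 10-second extensions.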

Lofi slow bpm electro chill with organic samples

A light and cheery EDM track, with syncopated drums, airy pads, and strong emotions

A grand orchestral arrangement with thunderous percussion, epic brass fanfares, and soaring strings, creating a cinematic atmosphere fit for a heroic battle.

Text-to-music generation with diffusion-based EnCodec

The audio quality of MusicGen can be further enhanced by relying on our latest Multi-Band Diffusion (MBD) EnCodec decoder. The MBD-enhanced EnCodec generates samples with even fewer audio artifacts, at the expense of more computation.

Bluesy guitar instrumental with soulful licks and a driving rhythm section

Progressive rock drum and bass solo

Punk Rock song with loud drum and power guitar