Generative Audio
Jukebox is a neural network developed by OpenAI that generates music as raw audio in a variety of genres and artist styles. Its autoencoder is based on VQ-VAE-2, a variant of the Vector Quantized Variational Autoencoder (VQ-VAE) that uses only feedforward encoders and decoders. The autoencoder compresses audio into sequences of discrete codes drawn from a learned codebook.
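The quantization step at the heart of a VQ-VAE replaces each encoder output vector with its nearest entry in a learned codebook. The stored code is the index of that entry. The sketch below shows only this generic nearest-neighbour lookup. It is not Jukebox's own code. The names `quantize`, `latents`, and `codebook` are hypothetical, and codebook training (commitment loss, codebook updates) is omitted.

```python
import numpy as np
from typing import Tuple


def quantize(latents: np.ndarray, codebook: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Map each latent vector to its nearest codebook entry (squared L2 distance).

    latents:  (T, D) array of encoder outputs, one vector per time step.
    codebook: (K, D) array of learned embedding vectors (hypothetical here).
    Returns (codes, quantized): integer indices of shape (T,) and the
    selected codebook vectors of shape (T, D).
    """
    # Squared distance between every latent and every codebook vector: shape (T, K)
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # Index of the closest codebook vector for each time step
    codes = dists.argmin(axis=1)
    return codes, codebook[codes]


# Usage: three latent vectors quantized against a toy 3-entry codebook
toy_codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
toy_latents = np.array([[0.1, -0.1], [4.8, 5.2], [0.9, 1.2]])
codes, quantized = quantize(toy_latents, toy_codebook)
print(codes)  # [0 2 1]
```

A downstream model works only with the integer codes. A decoder maps the corresponding codebook vectors back toward the signal domain.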
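The model uses three VQ-VAE levels that independently encode the input audio at different temporal resolutions. The top level is the most compressed: it captures the highest degree of abstraction and retains only the essential musical information. The bottom level is the least compressed and produces the highest-quality audio. A cascade of transformers then generates music from the top level down to the bottom level. A prior produces the top-level codes, upsampler transformers generate each lower level conditioned on the level above, and the VQ-VAE decoder converts the bottom-level codes into a waveform.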
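The top-down cascade can be sketched as a simple data flow. The trained transformers are replaced by placeholders, so only the ordering and the sequence lengths at each level are meaningful. The hop lengths (audio samples per code), the codebook size, and the helpers `top_level_prior`, `upsampler`, and `sample_cascade` are illustrative assumptions, not Jukebox's actual API.

```python
import numpy as np
from typing import Dict

# Assumed, illustrative settings: audio samples per code at each level,
# ordered from top (most compressed) to bottom (least compressed).
HOP_LENGTHS = {"top": 128, "middle": 32, "bottom": 8}
CODEBOOK_SIZE = 2048  # assumed number of codebook entries per level

rng = np.random.default_rng(0)


def top_level_prior(n_codes: int) -> np.ndarray:
    """Hypothetical stand-in for the top-level transformer prior (random codes)."""
    return rng.integers(0, CODEBOOK_SIZE, size=n_codes)


def upsampler(upper_codes: np.ndarray, n_codes: int) -> np.ndarray:
    """Hypothetical stand-in for an upsampler transformer.

    A real upsampler would sample n_codes new tokens while conditioning on
    upper_codes. This placeholder just stretches the upper codes to the
    target length so that the conditioning path is visible.
    """
    idx = np.arange(n_codes) * len(upper_codes) // n_codes
    return upper_codes[idx]


def sample_cascade(n_samples: int) -> Dict[str, np.ndarray]:
    """Generate codes top-down: prior for the top level, then upsamplers."""
    codes = {"top": top_level_prior(n_samples // HOP_LENGTHS["top"])}
    codes["middle"] = upsampler(codes["top"], n_samples // HOP_LENGTHS["middle"])
    codes["bottom"] = upsampler(codes["middle"], n_samples // HOP_LENGTHS["bottom"])
    # In a real system, codes["bottom"] would now go to the bottom-level
    # VQ-VAE decoder to produce the waveform.
    return codes


# Usage: one second of 44.1 kHz audio
codes = sample_cascade(44100)
print({level: len(c) for level, c in codes.items()})
# {'top': 344, 'middle': 1378, 'bottom': 5512}
```

The sketch shows why the cascade order matters. The short top-level sequence fixes the overall structure cheaply, and each longer sequence below it is filled in under that constraint.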
We use Jukebox to generate several audio files in different genres and artist styles. The best samples are saved and manually paired with the videos generated by our pipeline.