These vocabulary flashcards cover the key technical concepts of generative audio modelling, neural codecs, specific AI models (MusicLM, MusicGen, Jukebox), and the ethical frameworks for Responsible AI Music (RAIM).
Symbolic Music Generation
Generative music modelling using compact representations such as MIDI, MusicXML, ABC, or Kern, where music is treated as a language modelling problem.
Audio Music Generation
The process of generating sound based on spectrograms (2D approach) or waveforms (1D approach), capturing rich performance and production information at the cost of high dimensionality.
The Audio Bottleneck
The computational challenge of predicting audio sample-by-sample due to high density; standard CD-quality audio features 44,100 individual data points per second.
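The numbers on this card can be checked with quick arithmetic; the 50-tokens-per-second codec rate below is an illustrative assumption, not a property of any specific codec:

```python
# Back-of-the-envelope arithmetic for the audio bottleneck.
sample_rate = 44_100          # CD-quality samples per second
seconds = 3 * 60              # a three-minute track

raw_samples = sample_rate * seconds
print(raw_samples)            # 7,938,000 values to predict one by one

# A neural codec emitting (hypothetically) 50 tokens per second instead:
codec_tokens = 50 * seconds
print(codec_tokens)           # 9,000 tokens — a far shorter sequence to model
```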
Autoencoder (AE)
A neural network module that maps high-dimensional input to a smaller, compressed "Latent Space" (embedding) via an Encoder and reconstructs it via a Decoder.
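A minimal sketch of the encoder/decoder shape contract, with random (untrained) linear maps standing in for learned networks; the dimensions are illustrative, not from any real model:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, latent_dim = 1024, 16           # illustrative sizes

# Real autoencoders stack nonlinear layers and learn these weights by
# minimising reconstruction error; single random matrices suffice to show shapes.
W_enc = rng.normal(size=(input_dim, latent_dim))
W_dec = rng.normal(size=(latent_dim, input_dim))

x = rng.normal(size=(input_dim,))          # high-dimensional input
z = x @ W_enc                              # compressed latent embedding
x_hat = z @ W_dec                          # reconstruction

print(z.shape, x_hat.shape)                # (16,) (1024,)
```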
Variational Autoencoder (VAE)
A generative AE that maps inputs to a continuous probability distribution (via mean and variance) rather than a fixed point, forcing the model to learn a smooth latent space.
Reconstruction Loss
A component of the VAE loss function that measures how accurately the decoder rebuilt the original input from the latent representation.
KL Divergence
A regularization term in the VAE loss function that forces learned distributions to stay close to a standard normal distribution to prevent gaps in the latent space.
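The two loss cards above combine into the standard VAE objective. A sketch using mean-squared error for reconstruction and the closed-form KL divergence between a diagonal Gaussian and the standard normal:

```python
import numpy as np

def vae_loss(x, x_hat, mu, logvar):
    """Reconstruction term plus KL regulariser, as on the two cards above."""
    recon = np.mean((x - x_hat) ** 2)                        # reconstruction loss (MSE)
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))  # KL to N(0, I)
    return recon + kl

mu = np.zeros(4)
logvar = np.zeros(4)                 # exactly N(0, I), so the KL term is zero
x = np.ones(8)
print(vae_loss(x, x, mu, logvar))    # 0.0 — perfect reconstruction, no divergence
```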
Neural Audio Codecs (NACs)
Neural networks that learn the optimal way to convert raw waveforms into an "alphabet" of discrete acoustic tokens for efficient AI processing.
Vector Quantisation (VQ)
The process of turning continuous VAE embeddings into discrete tokens by "snapping" a latent representation to the nearest cluster center in a fixed codebook.
Codebook
A fixed set of cluster centers in a latent space used in Vector Quantisation to represent audio samples as discrete indices.
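The "snapping" operation from the VQ and Codebook cards is a nearest-neighbour lookup. A sketch with a random (untrained) codebook of illustrative size:

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 16))      # 512 cluster centres in a 16-d latent space

def quantise(z, codebook):
    """Snap a continuous embedding to the index of its nearest codebook entry."""
    dists = np.linalg.norm(codebook - z, axis=1)
    return int(np.argmin(dists))

z = rng.normal(size=16)                    # continuous encoder output
idx = quantise(z, codebook)                # discrete token (a codebook index)
z_q = codebook[idx]                        # quantised embedding passed to the decoder
print(idx, z_q.shape)
```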
VQ-VAE
A Variational Autoencoder that utilizes a vector-quantized latent space to learn discrete representations of data.
Residual Vector Quantisation (RVQ)
A technique using multiple sequential codebooks where the first level captures coarse features and subsequent levels quantize the "residual" error to capture fine details like timbre and reverb.
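The residual loop can be sketched directly: each level quantises whatever error the previous levels left behind. Codebooks here are random for illustration; in a trained RVQ the reconstruction error shrinks level by level:

```python
import numpy as np

rng = np.random.default_rng(0)
num_levels, codebook_size, dim = 4, 256, 16        # illustrative sizes
codebooks = rng.normal(size=(num_levels, codebook_size, dim))

def rvq_encode(z, codebooks):
    """Quantise z with a stack of codebooks; each level encodes the residual."""
    residual = z.copy()
    indices = []
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        indices.append(idx)
        residual = residual - cb[idx]     # the next level sees only what is left
    return indices, z - residual          # token stack and summed reconstruction

z = rng.normal(size=dim)
tokens, z_q = rvq_encode(z, codebooks)
print(tokens, z_q.shape)
```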
OpenAI Jukebox
A 2020 generative model using three levels of hierarchical VQ-VAE to capture both global musical structure and fine acoustic details, such as singing voices.
Semantic Tokens
Tokens extracted using models like w2v-BERT that capture long-term coherence, including melody, rhythm, and speech content (the "what" of the audio).
Acoustic Tokens
Tokens extracted from Neural Audio Codecs that capture high-fidelity details required for reconstruction (the "how" of the audio).
MusicLM
A Google generative model that builds on the AudioLM pipeline by adding text conditioning through a joint text-audio embedding model called MuLan.
MusicGen
A Meta generative model that uses token interleaving (delay pattern) and the EnCodec tokenizer to maintain musical structure without separate semantic tokens.
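The delay pattern can be illustrated as a layout transform: codebook level k is shifted k steps, so at each decoding step level k sees levels 0..k-1 of the same frame. Token values and the padding symbol below are dummies, not MusicGen's actual vocabulary:

```python
# A sketch of delay-pattern token interleaving over RVQ token frames.
num_codebooks, num_frames = 4, 6
PAD = -1

# frames[t][k] = token for frame t at codebook level k (dummy values)
frames = [[10 * t + k for k in range(num_codebooks)] for t in range(num_frames)]

# Shift each codebook row right by its level index.
steps = num_frames + num_codebooks - 1
pattern = [[PAD] * steps for _ in range(num_codebooks)]
for k in range(num_codebooks):
    for t in range(num_frames):
        pattern[k][t + k] = frames[t][k]

for row in pattern:
    print(row)
```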
Vocoder
A specialized neural network, such as HiFi-GAN, used to translate 2D spectrogram representations back into 1D audio waveforms.
Fréchet Audio Distance (FAD)
An evaluation metric for acoustic quality that compares the statistical distribution of AI-generated audio against a dataset of real studio-quality music; lower scores indicate higher fidelity.
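The core of FAD is the Fréchet distance between two Gaussians fitted to embedding statistics. A simplified sketch assuming diagonal covariances (the real metric uses full covariance matrices of audio embeddings):

```python
import numpy as np

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Fréchet distance between two diagonal-covariance Gaussians —
    a simplified stand-in for the full FAD computation."""
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.sum(var1 + var2 - 2 * np.sqrt(var1 * var2)))

mu, var = np.zeros(3), np.ones(3)
print(frechet_distance_diag(mu, var, mu, var))   # 0.0 — identical distributions
```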
CLAP Score
A metric for semantic alignment (Contrastive Language-Audio Pretraining) that measures how well generated audio matches the provided text prompt.
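At its core the score is a cosine similarity between a text embedding and an audio embedding; producing those embeddings requires a trained CLAP model, so the vectors below are dummies:

```python
import numpy as np

def clap_style_score(text_emb, audio_emb):
    """Cosine similarity between text and audio embeddings — the operation
    behind a CLAP score (the real metric embeds both with a CLAP model)."""
    return float(np.dot(text_emb, audio_emb)
                 / (np.linalg.norm(text_emb) * np.linalg.norm(audio_emb)))

t = np.array([1.0, 0.0, 1.0])
a = np.array([1.0, 0.0, 1.0])
print(clap_style_score(t, a))    # 1.0 — perfectly aligned embeddings
```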
RAIM Framework
The Responsible AI Music framework, an interdisciplinary effort to identify features for the ethical and responsible development of generative music systems.
Human Agency and Oversight
A RAIM principle ensuring individuals can influence and monitor AI systems while maintaining control over the creative process.
Robustness and Safety
A RAIM requirement that generative systems remain reliable and secure, preventing the production of harmful content or exposure to adversarial attacks.
Privacy and Data Governance
A RAIM requirement ensuring that training data is legally acquired and that the system does not leak private information or violate copyrights.
Transparency
A principle encompassing explainability regarding the data, model architecture, and business models used in generative AI systems.
Diversity, Fairness, and Non-Discrimination
The implementation of mechanisms to avoid unfair bias and ensure accessibility and fair treatment for all users of generative systems.
Accountability
A requirement ensuring AI systems are responsible for their design, implementation, and impact throughout their entire lifecycle.