COMP346: Music Intelligence - Generative Audio Modelling & Responsible AI Music

Description and Tags

These vocabulary flashcards cover the key technical concepts of generative audio modelling, neural codecs, specific AI models (MusicLM, MusicGen, Jukebox), and the ethical frameworks for Responsible AI Music (RAIM).

Last updated 6:22 PM on 5/14/26

27 Terms

1

Symbolic Music Generation

Generative music modelling using compact representations such as MIDI, MusicXML, ABC, or Kern, where music is treated as a language modelling problem.

2

Audio Music Generation

The process of generating sound based on spectrograms (2D approach) or waveforms (1D approach), capturing rich performance and production info at the cost of high dimensionality.

3

The Audio Bottleneck

The computational challenge of predicting audio sample-by-sample due to its high temporal density; standard CD-quality audio contains 44,100 individual data points per second.
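
The scale of this bottleneck is easy to verify with back-of-the-envelope arithmetic (a quick illustration, not part of the original deck):

```python
# Why raw audio is expensive to model token-by-token.
# Assumes mono CD-quality audio at 44,100 samples per second, as stated above.
SAMPLE_RATE = 44_100

def samples_for(seconds: float) -> int:
    """Number of individual amplitude values an autoregressive model
    would have to predict for the given duration of audio."""
    return int(SAMPLE_RATE * seconds)

print(samples_for(1))    # 44,100 predictions for one second
print(samples_for(180))  # ~7.9 million for a three-minute song
```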

4

Autoencoder (AE)

A neural network module that maps high-dimensional input to a smaller, compressed "Latent Space" (embedding) via an Encoder and reconstructs it via a Decoder.

5

Variational Autoencoder (VAE)

A generative AE that maps inputs to a continuous probability distribution (via mean and variance) rather than a fixed point, forcing the model to learn a smooth latent space.

6

Reconstruction Loss

A component of the VAE loss function that measures how accurately the decoder rebuilt the original input from the latent representation.

7

KL Divergence

A regularization term in the VAE loss function that forces learned distributions to stay close to a standard normal distribution to prevent gaps in the latent space.
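
The two loss terms from the previous cards can be sketched numerically. This is an illustrative toy example (not from the deck), using the closed-form KL divergence between a diagonal Gaussian N(mu, sigma^2) and the standard normal N(0, I):

```python
import numpy as np

def reconstruction_loss(x, x_hat):
    """Mean squared error: how accurately the decoder rebuilt the input."""
    return np.mean((x - x_hat) ** 2)

def kl_divergence(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, I) ), summed over latent dimensions.
    Zero exactly when the learned distribution is already standard normal."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - log_var - 1.0)

# A latent code already at the standard normal pays no KL penalty:
print(kl_divergence(np.zeros(8), np.zeros(8)))  # 0.0
```

In practice the VAE objective is the sum of these two terms, trading reconstruction fidelity against a smooth, gap-free latent space.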

8

Neural Audio Codecs (NACs)

Neural networks that learn the optimal way to convert raw waveforms into an "alphabet" of discrete acoustic tokens for efficient AI processing.

9

Vector Quantisation (VQ)

The process of turning continuous VAE embeddings into discrete tokens by "snapping" a latent representation to the nearest cluster center in a fixed codebook.
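
The "snapping" step is just a nearest-neighbour lookup. A hypothetical toy example with a two-entry codebook (not part of the original deck):

```python
import numpy as np

def quantize(embeddings, codebook):
    """Return the index of the closest codebook vector for each embedding."""
    # dists[i, j] = Euclidean distance from embeddings[i] to codebook[j]
    dists = np.linalg.norm(embeddings[:, None, :] - codebook[None, :, :], axis=-1)
    return np.argmin(dists, axis=-1)

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])  # two cluster centers
z = np.array([[0.1, -0.2], [0.9, 1.2]])        # continuous VAE embeddings
print(quantize(z, codebook))                   # [0 1]
```

Each continuous embedding is then stored as a single discrete index, which is what makes language-model-style training over audio possible.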

10

Codebook

A fixed set of cluster centers in a latent space used in Vector Quantisation to represent audio samples as discrete indices.

11

VQ-VAE

A Variational Autoencoder that utilizes a vector-quantized latent space to learn discrete representations of data.

12

Residual Vector Quantisation (RVQ)

A technique using multiple sequential codebooks where the first level captures coarse features and subsequent levels quantize the "residual" error to capture fine details like timbre and reverb.
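
The coarse-then-residual idea can be sketched with two toy codebooks (an illustrative example, not from the deck): the first level picks the nearest coarse center, and the second level quantises only the leftover error.

```python
import numpy as np

def rvq_encode(z, codebooks):
    """Quantise z level by level; each stage works on the running residual."""
    residual = z.copy()
    indices = []
    for cb in codebooks:
        dists = np.linalg.norm(residual[:, None, :] - cb[None, :, :], axis=-1)
        idx = np.argmin(dists, axis=-1)
        indices.append(idx)
        residual = residual - cb[idx]   # pass the remaining error to the next level
    return indices, residual

coarse = np.array([[0.0, 0.0], [1.0, 1.0]])   # level 1: coarse features
fine = np.array([[0.0, 0.0], [0.1, 0.1]])     # level 2: fine corrections
z = np.array([[1.1, 1.1]])
idx, res = rvq_encode(z, [coarse, fine])
```

Here level 1 snaps to [1.0, 1.0] and level 2 mops up the remaining [0.1, 0.1], leaving zero residual; real codecs stack many such levels to recover details like timbre and reverb.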

13

OpenAI Jukebox

A 2020 generative model using three levels of hierarchical VQ-VAE to capture both global musical structure and fine acoustic details, such as singing voices.

14

Semantic Tokens

Tokens extracted using models like w2v-BERT that capture long-term coherence, including melody, rhythm, and speech content (the "what" of the audio).

15

Acoustic Tokens

Tokens extracted from Neural Audio Codecs that capture high-fidelity details required for reconstruction (the "how" of the audio).

16

MusicLM

A Google generative model that builds on the AudioLM pipeline by adding text conditioning through a joint text-audio embedding model called MuLan.

17

MusicGen

A Meta generative model that uses token interleaving (delay pattern) and the EnCodec tokenizer to maintain musical structure without separate semantic tokens.

18

Vocoder

A specialized neural network, such as HiFi-GAN, used to translate 2D spectrogram representations back into 1D audio waveforms.

19

Fréchet Audio Distance (FAD)

An evaluation metric for acoustic quality that compares the statistical distribution of AI-generated audio against a dataset of real studio-quality music; lower scores indicate higher fidelity.
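
Underlying FAD is the Fréchet distance between two Gaussians fitted to embedding statistics. A sketch of just that computation (the embedding network itself, e.g. a VGGish-style model, is outside this snippet and the means/covariances here are toy values):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * (sigma1 @ sigma2)^(1/2))."""
    diff = mu1 - mu2
    covmean = np.real(sqrtm(sigma1 @ sigma2))  # discard tiny imaginary parts
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

mu, sigma = np.zeros(2), np.eye(2)
print(frechet_distance(mu, sigma, mu, sigma))  # ~0.0: identical statistics
```

Identical distributions score zero; the further the generated audio's embedding statistics drift from the reference set's, the higher the FAD.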

20

CLAP Score

A semantic-alignment metric based on Contrastive Language-Audio Pretraining that measures how well generated audio matches the provided text prompt.

21

RAIM Framework

The Responsible AI Music framework, an interdisciplinary effort to identify features for the ethical and responsible development of generative music systems.

22

Human Agency and Oversight

A RAIM principle ensuring individuals can influence and monitor AI systems while maintaining control over the creative process.

23

Robustness and Safety

A RAIM requirement that generative systems remain reliable and secure, neither producing harmful content nor being vulnerable to adversarial attacks.

24

Privacy and Data Governance

A RAIM requirement ensuring that training data is legally acquired and that the system does not leak private information or violate copyrights.

25

Transparency

A principle encompassing explainability regarding the data, model architecture, and business models used in generative AI systems.

26

Diversity, Fairness, and Non-Discrimination

The implementation of mechanisms to avoid unfair bias and ensure accessibility and fair treatment for all users of generative systems.

27

Accountability

A requirement ensuring that responsibility can be assigned for an AI system's design, implementation, and impact throughout its entire lifecycle.