Music affects emotional states, which has motivated research into AI-driven music generation. AI-based affective music generation (AI-AMG) systems have potential impact in entertainment, healthcare, and interactive applications.
The review categorizes existing AI-AMG systems by their core music generation algorithms, discusses the musical features these systems manipulate, and outlines open challenges, aiming to provide insights for developing controllable AI-AMG systems.
Rule-Based Approaches. Key Features: These systems rely on a set of predefined rules derived from music theory, psychology, and cultural considerations; the rules dictate how musical features such as tempo, mode, and chord progressions are manipulated to evoke specific emotional responses.
Examples in Literature: Early works focused on classical music rules, using simple algorithms to generate pieces conveying basic emotions. More recent studies have incorporated more elaborate rule sets that mimic contemporary genres, broadening the range of emotions they can express.
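To make the rule-based idea concrete, here is a minimal sketch that maps a valence-arousal target to coarse musical parameters. The specific rules (tempo range, mode choice, stock progressions) are illustrative assumptions, not rules taken from any reviewed system.

```python
# Minimal sketch of a rule-based affective mapping (illustrative assumptions only).
from dataclasses import dataclass

@dataclass
class MusicalParameters:
    tempo_bpm: int            # faster tempo ~ higher arousal
    mode: str                 # "major" ~ positive valence, "minor" ~ negative
    chord_progression: list   # progression chosen per mode

def rules_from_emotion(valence: float, arousal: float) -> MusicalParameters:
    """Map a valence-arousal point in [-1, 1]^2 to coarse musical features."""
    tempo = int(100 + 40 * arousal)                 # roughly 60-140 BPM (hypothetical rule)
    mode = "major" if valence >= 0 else "minor"
    progression = (["I", "V", "vi", "IV"] if mode == "major"
                   else ["i", "iv", "VI", "V"])
    return MusicalParameters(tempo, mode, progression)

# Example: a happy, energetic target (valence=0.8, arousal=0.7)
print(rules_from_emotion(0.8, 0.7))
```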
Data-Driven Approaches. Key Features: These systems leverage large datasets to learn patterns in music that correlate with emotional expression; machine learning models are trained on recorded or annotated pieces known to evoke particular emotions and then generate new music in that style.
Examples in Literature: The use of neural networks and deep learning has surged since 2015, producing sophisticated models capable of composing pieces that listeners perceive as emotionally expressive.
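As an illustration of the data-driven route, the following is a minimal sketch of an emotion-conditioned sequence model in PyTorch. The architecture, token vocabulary, and the use of four emotion labels (e.g., valence-arousal quadrants) are assumptions for demonstration, not a specific published model.

```python
# Minimal sketch of a data-driven, emotion-conditioned sequence model.
import torch
import torch.nn as nn

class EmotionConditionedLSTM(nn.Module):
    def __init__(self, vocab_size=128, n_emotions=4, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.note_embed = nn.Embedding(vocab_size, embed_dim)      # symbolic note tokens
        self.emotion_embed = nn.Embedding(n_emotions, embed_dim)   # e.g. 4 valence-arousal quadrants
        self.lstm = nn.LSTM(2 * embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)               # next-token logits

    def forward(self, notes, emotion):
        # notes: (batch, time) token ids; emotion: (batch,) label ids
        e = self.emotion_embed(emotion).unsqueeze(1).expand(-1, notes.size(1), -1)
        x = torch.cat([self.note_embed(notes), e], dim=-1)
        h, _ = self.lstm(x)
        return self.head(h)

# Example forward pass on random data
model = EmotionConditionedLSTM()
logits = model(torch.randint(0, 128, (2, 16)), torch.tensor([0, 3]))
print(logits.shape)  # torch.Size([2, 16, 128])
```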
Optimization-Based Approaches. Key Features: These methods define an objective function related to emotional expression and iteratively alter musical features to optimize it; genetic algorithms and other machine learning optimization frameworks are frequently employed.
Examples in Literature: Recent studies highlight systems that optimize compositions toward target emotions, creating pieces that not only meet the emotional criteria but also adhere to musical quality standards.
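A toy example of the optimization-based approach: a small genetic algorithm that evolves a melody toward a target valence-arousal point. The fitness proxy below (pitch height for valence, pitch variety for arousal) is a deliberately crude assumption used only to show the loop structure.

```python
# Minimal sketch of an optimization-based approach: a toy genetic algorithm.
import random

def predicted_emotion(melody):
    """Crude stand-in for an emotion model: pitch height ~ valence, pitch variety ~ arousal."""
    valence = (sum(melody) / len(melody) - 60) / 12.0
    arousal = len(set(melody)) / len(melody)
    return valence, arousal

def fitness(melody, target):
    v, a = predicted_emotion(melody)
    return -((v - target[0]) ** 2 + (a - target[1]) ** 2)   # negative distance to target

def mutate(melody, rate=0.2):
    return [p + random.choice([-2, -1, 1, 2]) if random.random() < rate else p
            for p in melody]

def evolve(target, pop_size=50, length=16, generations=200):
    population = [[random.randint(48, 84) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, target), reverse=True)
        parents = population[: pop_size // 2]
        population = parents + [mutate(random.choice(parents)) for _ in parents]
    return max(population, key=lambda m: fitness(m, target))

best = evolve(target=(0.5, 0.8))   # aim for positive valence, high arousal
print(predicted_emotion(best))
```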
Hybrid Approaches. Key Features: These methods combine elements of rule-based and data-driven systems to capitalize on the strengths of both, for example using learned models to guide or rank rule-based output, allowing greater flexibility and emotional nuance.
Examples in Literature: Several studies report hybrid approaches yielding better results in listener engagement and emotional response, demonstrating the potential for more dynamic music generation.
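One common hybrid pattern, sketched below under assumed interfaces, is to generate candidates with simple rules and re-rank them with a learned emotion scorer; `emotion_scorer` here is a stub standing in for a trained model.

```python
# Minimal sketch of a hybrid approach: rule-based candidates re-ranked by a learned scorer.
import random

def rule_based_candidates(valence, n=10):
    """Generate n candidate progressions with simple mode rules (illustrative)."""
    major = [["I", "IV", "V", "I"], ["I", "vi", "IV", "V"]]
    minor = [["i", "iv", "V", "i"], ["i", "VI", "III", "VII"]]
    pool = major if valence >= 0 else minor
    return [random.choice(pool) for _ in range(n)]

def emotion_scorer(progression, target_valence):
    """Stand-in for a learned model scoring how well a candidate fits the target."""
    return random.random()  # replace with a trained predictor in a real system

def hybrid_generate(target_valence):
    candidates = rule_based_candidates(target_valence)
    return max(candidates, key=lambda c: emotion_scorer(c, target_valence))

print(hybrid_generate(0.6))
```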
Advantages over human-generated music include avoiding copyright issues, producing novel blends of genres and musical elements, and adapting in real time to a listener's state. Potential fields of impact include:
Healthcare: Using music therapy for anxiety and depression.
Co-creativity: Collaborative composition between humans and AI.
Entertainment: Enhancing gaming and storytelling experiences using emotionally tailored music.
Previous reviews mainly focused on music generation systems without emphasizing emotional aspects. This review highlights the shift towards systems that explicitly create and evaluate emotional content in music.
A systematic search was conducted across Google Scholar, Scopus, and IEEE Xplore using queries targeting emotion-based music generation systems, yielding 63 relevant articles spanning 1990-2023, with most advances appearing after 2015.
Input module: maps user input (e.g., text, video) to emotional data usable by the system; emotions are expressed as discrete categories or continuous values (e.g., valence-arousal).
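A minimal sketch of such an input mapping, assuming a keyword-based front end and hand-placed valence-arousal coordinates (the coordinates are rough illustrative placements, not values from the review):

```python
# Minimal sketch of an input module mapping discrete emotion labels to valence-arousal.
EMOTION_TO_VA = {
    "happy":   ( 0.8,  0.6),
    "sad":     (-0.7, -0.5),
    "angry":   (-0.6,  0.8),
    "relaxed": ( 0.6, -0.4),
}

def parse_user_input(text: str) -> tuple[float, float]:
    """Return (valence, arousal) for the first known emotion word found, else neutral."""
    for word in text.lower().split():
        if word in EMOTION_TO_VA:
            return EMOTION_TO_VA[word]
    return (0.0, 0.0)

print(parse_user_input("Play something happy for the party"))  # (0.8, 0.6)
```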
Composition module: composes music that expresses the intended emotions; the music is described through features such as tempo, melody, harmony, and rhythm, and can use symbolic (e.g., MIDI) or audio representations.
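For the symbolic case, a piece can be represented as a list of note events that could later be serialized to MIDI. The sketch below assumes a simple event tuple and a placeholder generation rule tied to the tempo/mode parameters from earlier:

```python
# Minimal sketch of a symbolic representation: a piece as a list of note events.
from typing import NamedTuple

class NoteEvent(NamedTuple):
    pitch: int       # MIDI pitch number
    start: float     # onset in beats
    duration: float  # length in beats
    velocity: int    # loudness 0-127

def compose_scale_fragment(mode: str, tempo_bpm: int) -> list[NoteEvent]:
    """Emit one bar of quarter notes from a C major/minor scale (illustrative placeholder)."""
    scale = [60, 62, 64, 65] if mode == "major" else [60, 62, 63, 65]
    velocity = 100 if tempo_bpm > 100 else 70   # louder for higher-arousal tempi (assumption)
    return [NoteEvent(p, i * 1.0, 1.0, velocity) for i, p in enumerate(scale)]

print(compose_scale_fragment("minor", tempo_bpm=120))
```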
Evaluation module: evaluates the emotional effectiveness of the generated music using:
Algorithm-Based Assessment (ABA): analytical comparison of the generated music's features against emotion templates to produce metrics (a toy distance sketch follows this list).
Human Study-Based Assessment (HBA): listener evaluations to gauge perceived emotions. Most reviewed systems lack formal evaluations, which limits the reliability of their emotional claims.
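A toy version of the ABA idea, assuming hand-made per-emotion feature templates and a normalized Euclidean distance as the metric (both are illustrative assumptions):

```python
# Minimal sketch of algorithm-based assessment (ABA): distance to an emotion template.
import math

EMOTION_TEMPLATES = {
    "happy": {"tempo_bpm": 130, "mode_major": 1.0, "mean_velocity": 100},
    "sad":   {"tempo_bpm": 70,  "mode_major": 0.0, "mean_velocity": 60},
}

def aba_score(piece_features: dict, target_emotion: str) -> float:
    """Lower is better: normalized Euclidean distance to the target emotion template."""
    template = EMOTION_TEMPLATES[target_emotion]
    scales = {"tempo_bpm": 60.0, "mode_major": 1.0, "mean_velocity": 40.0}
    return math.sqrt(sum(
        ((piece_features[k] - template[k]) / scales[k]) ** 2 for k in template
    ))

generated = {"tempo_bpm": 120, "mode_major": 1.0, "mean_velocity": 95}
print(aba_score(generated, "happy"))
```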
Key features impacting emotions (a combined estimation sketch follows this list):
Tempo: Primarily signals arousal; manipulated through explicit rules or learned by neural networks.
Mode/Scale: Influences valence depending on major (positive) or minor (negative) keys.
Chord Progressions: Affect emotional expression; often selected using probabilistic methods.
Instrument Volume: Impacts both valence and arousal.
Rhythm: Shapes arousal and overall emotion; simple to moderately complex rhythmic treatments can be effective.
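The following sketch ties these features together as a rough valence-arousal estimator; the linear form and weights are illustrative assumptions, not coefficients reported in the review:

```python
# Minimal sketch relating the listed musical features to valence-arousal estimates.
def estimate_emotion(tempo_bpm, is_major, mean_volume, rhythm_density):
    """Return a rough (valence, arousal) estimate in [-1, 1] (illustrative weights)."""
    arousal = min(1.0, max(-1.0,
        0.5 * (tempo_bpm - 100) / 60          # faster tempo -> higher arousal
        + 0.3 * (mean_volume - 64) / 64       # louder -> higher arousal
        + 0.2 * (rhythm_density - 2.0)))      # denser rhythm (notes/beat) -> higher arousal
    valence = min(1.0, max(-1.0,
        0.7 * (1.0 if is_major else -1.0)     # major mode -> positive valence
        + 0.3 * (mean_volume - 64) / 64))     # volume also shifts valence slightly
    return valence, arousal

print(estimate_emotion(tempo_bpm=140, is_major=True, mean_volume=90, rhythm_density=3.0))
```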
Control: Difficulty in allowing users to specify emotional content accurately.
Adaptability: Need for music to dynamically change with narrative elements.
Hybridization: Lack of clarity in combining approaches; systematic methods are needed.
Long-Term Structure: Coherence across long music pieces is often a challenge.
Manipulating Listener Expectations: Understanding and leveraging musical expectations to elicit emotions.
Focus on interdisciplinary approaches and the relationship between music features and emotional expression.
Use reinforcement learning and conditional architectures to improve control over emotional expression in music generation (a minimal reward sketch follows this list).
Develop larger datasets with reliable emotion annotations for training AI-AMG systems.
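As referenced in the reinforcement-learning direction above, one way to frame emotional control is a reward that scores generated fragments against the target valence-arousal point; the predictor below is a hypothetical stand-in for any trained valence-arousal regressor.

```python
# Minimal sketch of a reward for emotionally controlled generation via reinforcement learning.
def emotion_reward(fragment, target_va, predict_va):
    """Higher (less negative) reward when the predicted emotion matches the target."""
    v, a = predict_va(fragment)
    return -((v - target_va[0]) ** 2 + (a - target_va[1]) ** 2)

def toy_predict_va(fragment):
    """Stand-in predictor for illustration only; a real system would use a trained model."""
    return (0.4, 0.6)

print(emotion_reward(fragment=[60, 64, 67], target_va=(0.7, 0.5), predict_va=toy_predict_va))
```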
This review maps the landscape of controllable AI-AMG systems, summarizing their components, methodologies, and challenges, and suggests future pathways for enhancing these computational systems.