L6: Multimodal AI

Last updated 2:53 PM on 4/26/26

Unimodal AI

A system that can process, understand and generate information from only one type of data, i.e. one “modality”

Multimodal AI

AI system that can process, understand and generate information across multiple modalities

Challenges of multimodal AI

Generating a plausible world
- Must satisfy physics, aesthetics, and language meaning
Not just realism
- Control over structure, space, time and continuity

How does Multimodal AI work?

Tokenisation in Multimodal AI

Text tokenisation: Text → Sequence of tokens (whole or partial words)
- E.g. “Un” “believ” “able”
Image tokenisation: Raw pixels → Visual patches
Audio tokenisation: Sound waves → Sequence of time slices
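
The three kinds of tokenisation above can be sketched in a few lines of Python. This is only a toy illustration: the function names, the tiny vocabulary, the patch size and the frame length are all assumptions made for the example, and real systems use learned vocabularies and far larger inputs.

```python
def tokenise_text(word, vocab):
    """Greedy longest-match split of a word into pieces from a toy vocabulary."""
    tokens, i = [], 0
    while i < len(word):
        # Try the longest remaining substring first; fall back to one character.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                tokens.append(word[i:j])
                i = j
                break
    return tokens


def image_to_patches(image, size):
    """Cut a 2D grid of pixel values into non-overlapping size x size patches."""
    h, w = len(image), len(image[0])
    return [
        [row[c:c + size] for row in image[r:r + size]]
        for r in range(0, h - size + 1, size)
        for c in range(0, w - size + 1, size)
    ]


def audio_to_frames(samples, frame_len):
    """Split waveform samples into consecutive time slices (a trailing partial slice is dropped)."""
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


# Hypothetical toy inputs, chosen only for illustration.
toy_vocab = {"Un", "believ", "able"}
text_tokens = tokenise_text("Unbelievable", toy_vocab)

toy_image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
patches = image_to_patches(toy_image, 2)

toy_audio = list(range(10))
frames = audio_to_frames(toy_audio, 4)
```

In each case the raw input ends up as a sequence of small units (word pieces, patches, time slices), which is the form a model can then process.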

Transformer (Thinker)

Diffusion model (Creator)

Diffusion model

Making pictures messy
1. Add random noise to images and observe what the noisy images look like
Learn to clean up
1. Use many noisy images from the previous stage to learn how to remove noise
2. Training → Prediction
Create from scratch
1. Start with random noise
2. Remove the noise step by step to create a new image
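
The "make messy, then clean up" idea can be sketched as follows. The cards do not give a formula, so this sketch assumes a common mixing rule, noisy = sqrt(a) * clean + sqrt(1 - a) * noise, where `a` (called `alpha_bar` here) controls how much of the original survives. The names `add_noise` and `remove_noise` are hypothetical, and the "prediction" cheats by reusing the true noise where a trained network would supply its own estimate.

```python
import math
import random


def add_noise(x0, alpha_bar, rng):
    """Making pictures messy: blend a clean signal with Gaussian noise."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
        for x, e in zip(x0, eps)
    ]
    return xt, eps


def remove_noise(xt, eps_pred, alpha_bar):
    """Cleaning up: invert the blend above, given an estimate of the noise."""
    return [
        (x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
        for x, e in zip(xt, eps_pred)
    ]


rng = random.Random(0)
clean = [0.0, 0.5, 1.0, 0.5]  # stand-in for a row of image pixels (assumed toy data)
noisy, true_noise = add_noise(clean, alpha_bar=0.5, rng=rng)

# A trained model would predict the noise from `noisy`; here the true noise is
# reused so the sketch stays self-contained.
recovered = remove_noise(noisy, true_noise, alpha_bar=0.5)
```

With a perfect noise estimate the clean signal comes back exactly. Training aims to make the model's noise prediction good enough that repeating this clean-up from pure random noise yields a new image.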