L6: Multimodal AI

Last updated 2:53 PM on 4/26/26

8 Terms

1

Unimodal AI

AI system that can process, understand and generate information from only one type of data, i.e. one “modality” (for example, text only)

2

Multimodal AI

AI system that can process, understand and generate information across multiple modalities

3

Challenges of multimodal AI

  • Generating a plausible world

    • Must satisfy physics, aesthetics, and language meaning

  • Not just realism

    • Control over structure, space, time and continuity

4

How does Multimodal AI work?

  1. Input is tokenised

  2. Transformer + Diffusion model work together to generate output

5

Tokenisation in Multimodal AI

  • Text tokenisation: Text → Sequence of tokens (often partial words)

    • E.g. “Un” “believ” “able”

  • Image tokenisation: Raw pixels → Visual patches

  • Audio tokenisation: Sound waves → Sequence of time slices
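
A minimal sketch of the three tokenisation styles above, under simplifying assumptions: the text splitter is a greedy longest-match over a tiny hand-made vocabulary (real systems learn their vocabulary from data), and the patch size and frame length are arbitrary illustrative values. The function names are hypothetical, not taken from any library.

```python
import numpy as np

def tokenise_text(word, vocab):
    """Greedy longest-match split of a word into sub-word pieces found in vocab."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j].lower() in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here: emit the single character as a token.
            pieces.append(word[i])
            i += 1
    return pieces

def tokenise_image(pixels, patch):
    """Cut an (H, W, C) array into non-overlapping patch x patch tiles, each flattened."""
    h, w, c = pixels.shape
    rows, cols = h // patch, w // patch
    tiles = pixels[:rows * patch, :cols * patch].reshape(rows, patch, cols, patch, c)
    return tiles.transpose(0, 2, 1, 3, 4).reshape(rows * cols, patch * patch * c)

def tokenise_audio(wave, frame):
    """Cut a 1-D waveform into consecutive time slices of `frame` samples."""
    n = len(wave) // frame
    return wave[:n * frame].reshape(n, frame)

text_tokens = tokenise_text("Unbelievable", {"un", "believ", "able"})
image_tokens = tokenise_image(np.zeros((32, 32, 3)), patch=16)
audio_tokens = tokenise_audio(np.zeros(16000), frame=400)
```

In all three cases the result is a sequence of pieces, which is what lets a single model handle different modalities in the same way.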

6

Transformer (Thinker)

  • Understands the meaning of the input

  • Turns input (image / text / audio) → Concepts → Conditioning signals
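
As a rough illustration of how tokens could become a conditioning signal, the sketch below applies one layer of scaled dot-product self-attention (the core operation of a transformer) and then averages the result into a single vector. The random weights, the sizes, and the mean-pooling step are assumptions made for the example; a real model uses many trained layers and may pass on the full sequence rather than one pooled vector.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Each token looks at every other token and mixes in the relevant ones."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # similarity between token pairs
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))              # 5 input tokens, 8 numbers each (illustrative)
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))  # untrained stand-in weights

mixed, weights = self_attention(tokens, w_q, w_k, w_v)
conditioning = mixed.mean(axis=0)             # assumed pooling into one conditioning vector
```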

7

Diffusion model (Creator)

  • Starts from pure noise and conditioning signals

  • Gradually denoises it into a realistic output (image, video, sound)

8

Diffusion model

  1. Making pictures messy

    1. Add random noise to images and observe what noisy images look like

  2. Learn to clean up

    1. Use many images from step 1 to learn how to remove noise

    2. Training → Prediction

  3. Create from scratch

    1. Start with random noise

    2. Gradually remove the noise to create a new image
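
The first two steps above can be sketched with the standard noising formula used by denoising diffusion models, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. This is a toy under stated assumptions: the linear noise schedule and T = 1000 are one commonly used choice, the 8×8 array stands in for an image, and `predict_noise` is a hypothetical placeholder for the trained network, which is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)     # noise added at each step (assumed linear schedule)
alpha_bar = np.cumprod(1.0 - betas)    # fraction of the original signal left after t steps

def add_noise(x0, t):
    """Step 1 (make pictures messy): return the image at noise level t and the noise used."""
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

def training_loss(predict_noise, x0, t):
    """Step 2 (learn to clean up): score a model on guessing the noise that was added."""
    xt, eps = add_noise(x0, t)
    return np.mean((predict_noise(xt, t) - eps) ** 2)

def recover_x0(xt, eps_hat, t):
    """Invert the noising formula, given an estimate of the noise."""
    return (xt - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])

x0 = rng.uniform(-1, 1, size=(8, 8))   # stand-in "image"
xt, eps = add_noise(x0, 500)
restored = recover_x0(xt, eps, 500)    # a perfect noise guess recovers the image exactly
untrained_loss = training_loss(lambda x, t: np.zeros_like(x), x0, 500)
```

Step 3 would start from `rng.normal(size=x0.shape)` and apply a trained noise predictor repeatedly, from the highest noise level down to zero; that loop is left out because it needs a trained model.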