Lecture 9

Last updated 11:55 PM on 4/10/26

18 Terms

1
New cards

LLM Crohn’s Interviews: Evidence-based decision making

● Engine of large language models

● Fundamental contribution/invention of transformer model: attention

○ Past neural networks: long sentences are a problem

○ Now: system can handle an entire field

■ Focused on a few words at a time, carefully structured by a “few other tools”

○ Transformer model sits on a foundation of a neural network

2

Strategy: Use Disease Expertise to interrogate LLM decision making

3

What is an Embedding Space

● Created dimensions to meaningfully distinguish a pear and an apple

○ To not confuse one with another

● Location of pear in coordinate space = embedding or vector

○ Vector: an array of coordinates

● Embedding space - Assigning meaning to words or concepts

○ The closer things are, the more similar they are

○ Can be more than 3 dimensions

○ ChatGPT: ~2,000 dimensions

● Gravity and Fear (emotion) on Figure B

○ Embedding space can be for dissimilar embeddings

○ Can work on concepts, not just objects
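The pear/apple idea can be sketched with toy vectors. The three dimensions and their values below are invented for illustration; real models use thousands of dimensions.

```python
import math

# Hypothetical 3-D embeddings (dimensions might be: sweetness, roundness, is-machine).
embeddings = {
    "apple": [0.9, 0.8, 0.0],
    "pear":  [0.8, 0.6, 0.0],
    "car":   [0.0, 0.3, 0.9],
}

def distance(a, b):
    """Euclidean distance: the closer two embeddings are, the more similar they are."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Pear sits much closer to apple than to car, so the space "knows" they are alike.
print(distance(embeddings["pear"], embeddings["apple"]))
print(distance(embeddings["pear"], embeddings["car"]))
```

The same distance works for concepts (gravity, fear) as for objects, since every embedding is just a point in the space.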

4

Embedding: Setting up similarity scores

● Apple the fruit or Apple the company?

○ Relate to other words in a sentence

5

Similarity Scoring

● Similarity from the origin of the embedding space is expressed as an angle

○ Look at the angle that describes the two points of reference

● When two words appear in the same sentence: score how close they are in the embedding space
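The angle-based score described above is cosine similarity; a minimal sketch, using the same kind of made-up embedding vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors drawn from the origin.
    1.0 = same direction (very similar); 0.0 = perpendicular (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: "apple" and "pear" point in nearly the same direction,
# while "apple" and "car" are closer to perpendicular.
print(cosine_similarity([0.9, 0.8, 0.0], [0.8, 0.6, 0.0]))
print(cosine_similarity([0.9, 0.8, 0.0], [0.0, 0.3, 0.9]))
```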

6

Transformer based large language models

● Most language models use transformer-based models

○ Transformer is the revolution

○ Transformer plays an integrative role between two areas:

■ Embedding space - helps in understanding meaning of concepts or words based on how similar they are in that space

■ Neural network - foundational core

● Pretraining is just training

○ Neural network is trained on a large volume of data/literature from different disciplines

○ Develop correlations between words within this training

■ Apple associated with sweetness, but not typically associated with automotive

○ Transformer model integrates pretraining and embedding space through its attention mechanism to read a sentence

■ Sees apple = knows what the apple is referring to
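The attention mechanism above can be sketched for a single query in plain Python. This is a simplified scaled dot-product attention, not any specific model's implementation; the vectors are toy values.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query: score the query against
    every key, softmax the scores, and return the weighted sum of values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

# The query most resembles the first key, so the output leans toward the first value.
out = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
print(out)
```

This is how "sees apple = knows what the apple is referring to" works mechanically: the surrounding words' keys pull the weighting toward the right meaning.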

7

The problem in protein structure to be solved

● Challenge: Amino acid sequence and protein

○ How does the amino acid sequence give rise to 3d structure of protein?

○ Practical POV: DNA is relevant in its capacity to produce proteins

■ Proteins are the endgame

■ No point in discussing DNA if it does not affect proteins in any way

○ Better understand amino acid sequence

■ To create new 3d structures of proteins and modify existing 3d structures of proteins (a lot of work and expensive)

○ Protein reconstruction and development is nowhere near as easy as sequencing DNA

○ We need to understand this transition better

8

Protein folding (conformation) Review:

● AA sequence: primary structure

○ Can form secondary structures

● Main concern: tertiary structures → conformation

○ Undergoing changes all the time, not static!

● Quaternary structure: assembly of protein using other proteins

○ Not really going through this

9

AlphaFold Evidence that AI can crack fundamental problems in protein folding

● First time a computer program successfully predicted the 3d (tertiary) structure of a protein from its primary structure (breakthrough)

● LLM and Protein Structure

○ Concept: continuous information (one word follows another word follows another word)

○ In a protein, one amino acid follows another aa follows another aa…

○ Repurposing LLM based on transformers, but talking about amino acids

■ Aa became the language of LLMs
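The "amino acids as language" repurposing can be illustrated with a toy tokenizer. The alphabetical vocabulary ordering is an arbitrary choice for this sketch, not any model's actual scheme.

```python
# The 20 standard amino acids, treated as the "vocabulary" of a protein language model.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_IDS = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def tokenize(sequence):
    """Map a primary amino-acid sequence to integer token ids,
    the same way an LLM tokenizer maps words to ids."""
    return [TOKEN_IDS[aa] for aa in sequence]

print(tokenize("MKV"))  # → [10, 8, 17]
```

Once a sequence is a list of tokens, the same transformer machinery that predicts the next word can be trained to model which residue follows which.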

10

DeepMind (google) new product

● AlphaFold: application of AI that allows us to predict how proteins will fold, allowing us to discover new proteins

● What’s going on with AlphaFold

○ Integrates evolutionary information, and proteins from known protein structures

○ Key: Evoformer, the use of a transformer for the serial positioning of amino acids

● Integration of information from biology AND pure geometry

11

AlphaFold

  • Workflow of AlphaFold

  • Integrates evolutionary information and proteins from known protein structures

12

Predicting Protein Conformation: Two data sources used by AlphaFold

13

Geometry Tower: Triplet of attention, or the “Triangle Inequality”

● Comparing the locations of 2 aa = find the configuration of the 2 aa, the way they interact, and how they stabilize each other

● Can be done with multiple aa at the same time
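The geometric constraint behind this can be sketched directly: any three residue positions in a real 3d structure must obey the triangle inequality. The coordinates below are hypothetical C-alpha positions, not real data.

```python
import math

def dist(a, b):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def satisfies_triangle_inequality(p, q, r, tol=1e-9):
    """Check d(p, r) <= d(p, q) + d(q, r). A predicted set of pairwise
    distances that violates this cannot be a physical conformation."""
    return dist(p, r) <= dist(p, q) + dist(q, r) + tol

# Hypothetical C-alpha coordinates for three residues (angstroms).
ca1, ca2, ca3 = (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)
print(satisfies_triangle_inequality(ca1, ca2, ca3))  # True for real coordinates
```

Checking all triplets of residues at once is what lets the geometry tower keep its predicted distances mutually consistent.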

14

Using evolutionary history to add to Data about amino acid positions to predict protein structure

● Positions are conserved when they appear consistently across species

● Stability of protein when found in a row
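Conservation across species can be sketched with a toy multiple-sequence alignment (the sequences below are invented for illustration):

```python
from collections import Counter

def column_conservation(alignment):
    """For each column of an aligned set of sequences, return the fraction
    of species sharing the most common amino acid. Columns near 1.0 are
    conserved, hinting that the position matters for structure or function."""
    n = len(alignment)
    scores = []
    for col in zip(*alignment):
        most_common_count = Counter(col).most_common(1)[0][1]
        scores.append(most_common_count / n)
    return scores

# Toy alignment of the same region from four species.
msa = ["MKVA", "MKVG", "MKLA", "MKVA"]
print(column_conservation(msa))  # → [1.0, 1.0, 0.75, 0.75]
```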

15

Alphafold: Seismic acceleration of protein structure discovery

● Drawing on information on what amino acids are conserved

● What aa are conserved in context of other aa

● Range of aa combination in terms of sequence and 3d structure

● From ~150,000 known protein structures before to ~250 million proteins nowadays

● Sets a stage for practical application

16

AI Method Claims for AlphaFold (how it is patented)

● Limited patent: early stages were not patented

○ Publish first before patenting

■ Issues with prior art; 1 year grace period before deciding to patent in USA and Canada

● 1st claim (method claim): take primary structure of protein, combine with known evo data, and combine with neural network training

○ Primary sequence data + evo data + neural network = 3d protein structure

○ Not algorithmic; physical outcome

■ 3d protein structure AND primary structure of polypeptide

17

Creating Novel proteins: RF diffusion

● Type in a protein along with its desired properties

○ “Generate protein”

● Creating novel functional proteins

18

Benefits of predicting protein tertiary structure


1. Discovering drugs

● Biologics: drugs that are proteins

2. Effect of genetic variants

● What genetic variation means in terms of impact

3. Modeling protein-protein interactions

● Drug and binding site (back to biologics)

4. Engineering artificial proteins