LLM (Large Language Model)
massive neural network designed to understand and generate human text
essential for NLP (natural language processing)
Latency
Latency and factors affecting latency
time delay between input and output
factors affecting it:
Model complexity: number of layers
Hardware: type and speed of hardware (e.g. CPU vs. GPU/TPU)
Batch size
Slow data transfer
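A minimal sketch of measuring that delay in Python (the predict function here is a made-up stand-in for a real model):

```python
import time

def predict(x):
    # Made-up stand-in for a real model's forward pass.
    return sum(i * i for i in range(100_000)) + x

start = time.perf_counter()            # input arrives
result = predict(42)                   # model produces the output
latency = time.perf_counter() - start  # time delay between input and output
print(f"latency: {latency * 1000:.2f} ms")
```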
Critical path algorithm and optimization
cpa (critical path analysis): management tool used to identify the sequence of dependent tasks; the longest chain of dependent tasks sets the minimum total time
optimizing:
data loading: pre-fetching, faster storage solution (see the sketch after this card)
loss computation: ensure efficient code
forward pass: optimize by using efficient algorithms
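One hedged illustration of the data-loading point above, assuming PyTorch is available (the dataset and sizes are made up): worker processes pre-fetch batches so the model is not left waiting on slow data transfer.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Made-up dataset: 1,000 samples with 20 features each.
    dataset = TensorDataset(torch.randn(1000, 20), torch.randint(0, 2, (1000,)))

    # num_workers > 0 lets background workers pre-fetch batches;
    # pin_memory speeds up host-to-GPU transfer.
    loader = DataLoader(dataset, batch_size=64, shuffle=True,
                        num_workers=2, pin_memory=True, prefetch_factor=2)

    for features, labels in loader:
        pass  # forward pass and loss computation would go here

if __name__ == "__main__":  # guard needed when worker processes are spawned
    main()
```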
NLU
Natural language understanding: helps machines understand what humans mean
NLU processes messy, unstructured text and turns it into useful information
advantages: accuracy, efficiency, adaptability
pipeline: tokenization - text cleaning - POS tagging - stop-word removal - lemmatization (see the toy sketch after this card)
Linguistic nuances
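A toy, self-contained sketch of that pipeline (the stop-word list and lemma table are tiny stand-ins for real language resources, and POS tagging is omitted for brevity):

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "are", "and"}           # tiny stand-in list
LEMMAS = {"running": "run", "cats": "cat", "better": "good"}  # tiny stand-in table

def nlu_pipeline(text):
    tokens = text.split()                                        # tokenization
    tokens = [re.sub(r"[^a-z]", "", t.lower()) for t in tokens]  # text cleaning
    tokens = [t for t in tokens if t and t not in STOP_WORDS]    # stop-word removal
    return [LEMMAS.get(t, t) for t in tokens]                    # lemmatization

print(nlu_pipeline("The cats are running!"))  # ['cat', 'run']
```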
5 steps of NLP (natural language processing)
NLP: each step builds on the previous one to enhance understanding
Lexical analysis: breaks down input into individual words
Syntactic analysis: interprets grammar and structure
Semantic analysis: focuses on understanding the meaning
Discourse integration: integrates the meaning of the sentence with the larger context of the conversation
Pragmatic analysis: considers social, legal, cultural context
Architecture
what is machine learning
combines data and algorithms to predict future behavior (ex. face ID, shazam, recommendations)
analyzes past data to find patterns
we need: data, a computer, programming language
neural networks and their layers
neural networks: a type of ML, simplified version of the human brain
helps identify categories, predicts outcomes, finds patterns
Improve over time by learning from experience
Input layer: accepts information
Hidden layer: analyzes info, decision making
Output layer: gives prediction
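A small NumPy sketch of those three layers (layer sizes and weights are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                         # input layer: accepts 4 features
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)  # hidden layer parameters
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)  # output layer parameters

hidden = np.tanh(W1 @ x + b1)                  # hidden layer: analyzes the info
output = W2 @ hidden + b2                      # output layer: gives the prediction
print(output)
```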
loss, gradient, gradient descent function
loss: number that shows how well/bad the model’s predictions match target values
lower loss=better predictions
gradient: how sensitive the loss is to a change in each weight; tells us what to update to reduce loss
gradient descent: optimization method that repeatedly steps the weights in the direction opposite the gradient, like a map showing which way the model should go
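A minimal sketch of gradient descent on a made-up one-parameter loss, showing how the gradient tells us which way to step to reduce the loss:

```python
def loss(w):
    return (w - 3) ** 2        # made-up loss, lowest at w = 3

def gradient(w):
    return 2 * (w - 3)         # derivative of the loss with respect to w

w, learning_rate = 0.0, 0.1
for _ in range(25):
    w -= learning_rate * gradient(w)   # step against the gradient

print(round(w, 3), round(loss(w), 6))  # w approaches 3, loss approaches 0
```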
backpropagation and terms in the hidden layer
method used to train models by comparing the output to the correct answer and adjusting the weights in the hidden layers, improving accuracy and reducing loss
weights: control how much influence one neuron has on another, critical for learning, change during training to make better predictions
bias: value added to make better predictions
activation function: decides if a neuron should be active or inactive
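A hedged single-neuron sketch of backpropagation with made-up numbers: compare the output to the correct answer, then adjust the weight and bias using the gradient:

```python
import math

def sigmoid(z):                    # activation function
    return 1 / (1 + math.exp(-z))

x, target = 0.5, 1.0               # one input and its correct answer (made up)
w, b = 0.2, 0.0                    # weight and bias, changed during training

for _ in range(200):
    y = sigmoid(w * x + b)               # forward pass: the neuron's prediction
    dz = 2 * (y - target) * y * (1 - y)  # gradient at the neuron (chain rule)
    w -= 0.5 * dz * x                    # adjust weight to reduce the error
    b -= 0.5 * dz                        # adjust bias to reduce the error

print(round(sigmoid(w * x + b), 3))  # prediction has moved toward the target 1.0
```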
vanishing gradient problem and causes
gradient becomes very small, making updates too small and effectively stopping training
causes:
use of sigmoid function for activation
small initial weights
deep networks with many layers: more layers = more precise results, but may lead to this problem
chain rule: used to find the derivative of a nested (composite) function; backpropagation multiplies one such derivative per layer, which is why gradients can shrink
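A small numeric sketch of why this happens: the backpropagated gradient is a product of one sigmoid derivative per layer, and each of those factors is at most 0.25, so with many layers the product becomes tiny (layer count and activation values here are made up):

```python
activations = [0.6] * 20          # pretend sigmoid outputs of 20 layers

gradient = 1.0
for s in activations:
    gradient *= s * (1 - s)       # sigmoid derivative s*(1-s) <= 0.25 per layer

print(gradient)                   # roughly 0.24**20: vanishingly small
```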
hyperparameters and hypertuning
hyperparameters: a set of settings chosen before training
Number of layers
more layers=more precision
but can lead to the vanishing gradient problem
more processing power, memory
Learning rate
higher learning rate = reaches a correct response faster
but can overshoot and make the NN stop learning too soon
hypertuning: manually trying different combinations of hyperparameters to find the best one (see the sketch after this card)
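A hedged sketch of that tuning loop over a small grid of settings (train_and_evaluate is a hypothetical stand-in for a real training run):

```python
import itertools

def train_and_evaluate(num_layers, learning_rate):
    # Hypothetical stand-in: a real version would train a model
    # and return its validation score.
    return 1.0 - abs(learning_rate - 0.01) - 0.01 * abs(num_layers - 4)

grid = itertools.product([2, 4, 8], [0.1, 0.01, 0.001])    # layers x learning rates
best = max(grid, key=lambda cfg: train_and_evaluate(*cfg))
print("best hyperparameters:", best)                       # (4, 0.01) under this toy score
```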
RNN (recurrent neural network) and its processes
type of neural network; great at handling sequential data by remembering past information to make better predictions
ex. google autocomplete, translator, chatbots
BPTT (backpropagation through time): special backpropagation used to train RNN
by going back in time and figuring out which steps caused errors
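A minimal NumPy sketch of the recurrence described above (sizes and weights are made up): the hidden state carries past information forward so each step can use it.

```python
import numpy as np

rng = np.random.default_rng(1)
W_x = rng.normal(size=(5, 3))        # input-to-hidden weights (made-up sizes)
W_h = rng.normal(size=(5, 5))        # hidden-to-hidden weights: the memory path

h = np.zeros(5)                      # hidden state: remembers past inputs
sequence = rng.normal(size=(4, 3))   # a made-up sequence of 4 input vectors

for x_t in sequence:
    h = np.tanh(W_x @ x_t + W_h @ h) # each step mixes new input with memory

print(h)                             # final state summarizes the whole sequence
```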
PROS of RNN
good sequence handling
good memory
flexibility
CONS
vanishing/exploding gradients: when learning long sequences, the gradient is propagated back through many steps and can shrink toward zero (or blow up)
computationally intensive, time consuming
forgets information from long ago (poor long-term memory)
LSTM (long short-term memory network)
type of RNN that handles the vanishing gradient problem by storing information in memory cells
decides what to do with memory:
input gate: decides which values are updated
forget gate: decides which information is discarded
output gate: decides which information is used for the output
targeted updating: updating only a certain part of memory
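A hedged NumPy sketch of a single LSTM step with the three gates (all sizes and weights are made up):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(2)
n, m = 4, 3                                # hidden size and input size (made up)
W_i, W_f, W_o, W_c = (rng.normal(size=(n, n + m)) for _ in range(4))

x = rng.normal(size=m)                     # current input
h_prev, c_prev = np.zeros(n), np.zeros(n)  # previous output and memory cell
z = np.concatenate([h_prev, x])

i = sigmoid(W_i @ z)                       # input gate: which values get updated
f = sigmoid(W_f @ z)                       # forget gate: which memory is discarded
o = sigmoid(W_o @ z)                       # output gate: what is used for output
c = f * c_prev + i * np.tanh(W_c @ z)      # targeted update of the memory cell
h = o * np.tanh(c)                         # new output
print(h)
```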
TNN (Transformer Neural Network)
type of NN that uses:
Position encoding: indicates position of words in a sentence
Self Attention: adds weight to words based on importance
multi-head attention: applies self-attention mechanisms to parts of sentences in parallel (see the attention sketch after this card)
PROS
parallel processing=faster training
long-range dependencies: can learn relationships between words far apart
can handle large data sets due to distribution, optimization
CONS
not suitable for every task
complex, resource heavy-requires computational power
more research still needed
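A minimal NumPy sketch of the self-attention computation referenced above (scaled dot-product attention over made-up word vectors):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 8))              # 4 "words", each an 8-dim vector (made up)
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v      # queries, keys, values
scores = Q @ K.T / np.sqrt(K.shape[1])   # relevance of every word to every other word
scores -= scores.max(axis=1, keepdims=True)                            # stability
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)   # softmax
attended = weights @ V                   # each word becomes a weighted mix of all words

print(attended.shape)                    # (4, 8): same shape, now context-aware
```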
architecture of TNN and residual connections
encoder layers (6 layers)
decoder layers (6 layers)
Residual connections: technique that lets information skip layers by adding a layer's input directly to its output
makes learning more efficient
acts as a shortcut that helps with the vanishing gradient problem
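A tiny sketch of a residual connection (the layer here is just a made-up transform):

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.normal(size=(8, 8)) * 0.1   # a made-up layer's weights

def layer(x):
    return np.tanh(W @ x)

x = rng.normal(size=8)
out = layer(x) + x   # residual connection: the input is added directly to the output
print(out)
```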
Datasets
Datasets and the types
collections of data used to train and test LLMs
Real Data: collected from real world events, observations
PROS:
authentic, credible
diverse-including rare events
CONS:
collecting challenges
privacy, legal concerns
Synthetic Data: generated artificially, mimics real data
PROS:
cost effective
privacy safe
customizable
CONS:
lack of realism, which can lead to skepticism
complex generation process
may not capture rare events
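A hedged sketch of generating synthetic data that mimics the statistics of a (made-up) real dataset:

```python
import numpy as np

rng = np.random.default_rng(5)
real = rng.normal(loc=170, scale=10, size=500)   # made-up "real" measurements

# Synthetic data: sample from a distribution fitted to the real data's statistics.
synthetic = rng.normal(loc=real.mean(), scale=real.std(), size=500)

print(round(real.mean(), 1), round(synthetic.mean(), 1))   # similar means
```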
6 Types of biases
Confirmation bias: Favoring info that supports your beliefs
Historical bias: Bias from outdated data
Labelling bias: Prejudging based on assigned labels
Linguistic bias: Biased meaning from word choice
Sampling bias: Data not representing the whole group
Selection bias: Choosing data in a non-random, unfair way
Processing power
processing power and its main tasks
computational capacity/ability of hardware to perform a large number of complex calculations
Main tasks:
Pre-processing: preparing raw data for training
Training the model: by optimizing parameters
Deploying the model
good processing power requires
GPU (graphics processing unit): multiple cores and programmability
TPU (tensor processing unit):
special chip developed by Google, built for AI
high computation, high performance with lower energy consumption
clustering and the benefits
clustering: combining multiple GPUs for more computational power
scalability
fault tolerance: failure of a single unit doesn't halt the process
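A hedged PyTorch sketch of spreading work across several GPUs when they are available (DataParallel is one simple option; the model here is just a placeholder):

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 10)                  # placeholder model

if torch.cuda.is_available():
    if torch.cuda.device_count() > 1:
        # Replicates the model across GPUs and splits each batch between them
        # (clustering GPUs for more computational power).
        model = nn.DataParallel(model)
    model = model.cuda()

batch = torch.randn(64, 128)
if torch.cuda.is_available():
    batch = batch.cuda()

print(model(batch).shape)                   # (64, 10)
```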
Ethical considerations
what to do to fix ethical concerns
data anonymization (see the sketch at the end of this section)
secure data handling
compliance with regulations
bias detection
Accountability and responsibility:
establish a governance structure: who is in charge
ensure there is human oversight: checking and validation
monitor for misinformation
transparency to the public
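A minimal sketch of the data anonymization point referenced above (the record and its fields are made up): personally identifying fields are replaced with one-way hashes before the data is used.

```python
import hashlib

def anonymize(record, sensitive_fields=("name", "email")):
    cleaned = dict(record)
    for field in sensitive_fields:
        if field in cleaned:
            # One-way hash: keeps records linkable without revealing identity.
            cleaned[field] = hashlib.sha256(cleaned[field].encode()).hexdigest()[:12]
    return cleaned

record = {"name": "Ada Lovelace", "email": "ada@example.com", "age": 36}
print(anonymize(record))
```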