cs paper 3

27 Terms

1
New cards

LLM (Large Language Model)

massive neural network designed to understand and generate human text

  • essential for NLP (natural language processing)

2
New cards
Latency

3
New cards

Latency and factors affecting latency

time delay between input and output

factors affecting it:

  1. Model complexity: number of layers and parameters

  2. Hardware: speed of the processor (CPU/GPU) running the model

  3. Batch size: how many inputs are processed at once

  4. Data transfer speed: slow movement of data to and from the hardware
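The delay between input and output can be measured directly. A minimal sketch, where `model_infer` is a hypothetical stand-in for a real forward pass:

```python
import time

def model_infer(batch):
    # hypothetical stand-in for a real model's forward pass
    return [x * 2 for x in batch]

def measure_latency(fn, batch, runs=100):
    # average wall-clock time delay between input and output
    start = time.perf_counter()
    for _ in range(runs):
        fn(batch)
    return (time.perf_counter() - start) / runs

avg = measure_latency(model_infer, list(range(32)))
```

Averaging over many runs smooths out noise from the factors listed above (hardware, batch size, data transfer).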

4
New cards

Critical path algorithm and optimization

cpa: management tool used to identify the sequence of dependent tasks (the longest stretch of dependent tasks, which sets the minimum completion time)

optimizing:

  • data loading: pre-fetching, faster storage solution

  • loss computation: ensure efficient code

  • forward pass: optimize by using efficient algorithms

5
New cards

NLU

Natural language understanding: helps machines understand what humans mean

NLU processes messy, unstructured text and turns it into useful information

advantages: accuracy, efficiency, adaptability

pipeline: tokenization → text cleaning → POS tagging → stop-word removal → lemmatization
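The pipeline above can be sketched in a few lines. This is a toy version: the stop-word list is tiny, POS tagging is skipped, and crude suffix stripping stands in for real lemmatization:

```python
import re

STOP_WORDS = {"the", "a", "is", "and", "of"}  # tiny illustrative list

def nlu_pipeline(text):
    # tokenization + text cleaning: lowercase, keep only word characters
    tokens = re.findall(r"[a-z']+", text.lower())
    # stop-word removal
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # crude suffix stripping as a stand-in for real lemmatization
    return [t[:-1] if t.endswith("s") else t for t in tokens]

print(nlu_pipeline("The cats and the dogs"))  # -> ['cat', 'dog']
```

This is how messy, unstructured text becomes a clean list of content words a model can use.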

6
New cards
Linguistic nuances

7
New cards

5 steps of NLP (natural language processing)

NLP: each step builds on the previous one to enhance understanding

  1. Lexical analysis: breaks down input into individual words

  2. Syntactic analysis: interprets grammar and structure

  3. Semantic analysis: focuses on understanding the meaning

  4. Discourse integration: integrates the meaning of the sentence with the larger context of the conversation

  5. Pragmatic analysis: considers social, legal, cultural context

8
New cards
Architecture

9
New cards

what is machine learning

  • combines data and algorithms to predict future behavior (ex. face ID, shazam, recommendations)

  • analyzes past data to find patterns

  • we need: data, a computer, programming language

10
New cards

neural networks and its layers

neural networks: a type of ML, simplified version of the human brain

  • helps identify categories, predicts outcomes, finds patterns

  • Improve over time by learning from experience

  1. Input layer: accepts information

  2. Hidden layer: analyzes info, decision making

  3. Output layer: gives prediction
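The three layers can be sketched as a tiny forward pass. The weights here are arbitrary illustrative numbers, not a trained model:

```python
def relu(xs):
    # activation: keeps positive signals, zeroes out negative ones
    return [max(0.0, v) for v in xs]

def dense(x, weights, biases):
    # one fully connected layer: each output neuron mixes every input
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]

# input layer: 2 features -> hidden layer: 2 neurons -> output layer: 1 value
x = [1.0, 2.0]
hidden = relu(dense(x, [[0.5, -0.2], [0.3, 0.8]], [0.1, 0.0]))
output = dense(hidden, [[1.0, -1.0]], [0.0])
```

The input layer accepts the raw numbers, the hidden layer transforms them, and the output layer produces the prediction.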

11
New cards

loss, gradient, gradient descent function

loss: number that shows how well/bad the model’s predictions match target values

  • lower loss=better predictions

gradient: how sensitive the loss is to a change in each weight; tells us which weights to update, and in which direction, to reduce loss

gradient descent: an algorithm that repeatedly adjusts the weights in the downhill direction of the loss, like a map showing which way the model should go
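These three ideas fit in a few lines of code. A minimal one-weight example, using a made-up loss with its minimum at w = 3:

```python
# loss(w) = (w - 3)^2 has its minimum at w = 3
def loss(w):
    return (w - 3) ** 2

def gradient(w):
    # slope of the loss: tells us which way to move w to reduce loss
    return 2 * (w - 3)

w = 0.0
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * gradient(w)   # step downhill

print(round(w, 4))  # -> 3.0
```

Each step moves the weight against the gradient, so the loss shrinks toward its minimum.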

12
New cards

backpropagation and terms in the hidden layer

method used to train models by reducing loss and improving accuracy: the model's output is compared to the correct answer, and the weights in the hidden layers are adjusted accordingly

weights: control how much influence one neuron has on another, critical for learning, change during training to make better predictions

bias: value added to make better predictions

activation function: decides if a neuron should be active or inactive

13
New cards

vanishing gradient problem and causes

gradient becomes very small, making weight updates too small and effectively stopping training

causes:

  1. use of sigmoid function for activation

  2. small initial weights

  3. deep networks with many layers, more layers=more precise results, but may lead to this problem

chain rule: finds the slope of a composed function by multiplying the slopes of its parts; backpropagation applies it layer by layer, which is why small per-layer slopes multiply into a vanishing gradient
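Cause 1 can be shown numerically: the sigmoid's slope is at most 0.25, so the chain rule's product across many layers collapses toward zero. A minimal sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # never larger than 0.25

# the chain rule multiplies per-layer slopes, so in a deep sigmoid
# network the overall gradient shrinks toward zero
grad = 1.0
for _ in range(20):
    grad *= sigmoid_grad(0.0)   # 0.25, the sigmoid's steepest slope

print(grad)  # 0.25 ** 20, roughly 9.1e-13
```

Even at the sigmoid's steepest point, 20 layers shrink the gradient by twelve orders of magnitude.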

14
New cards

hyperparameters and hypertuning

hyperparameters: a set of settings chosen before training

  1. Number of layers

    • more layers=more precision

    • but can lead to the vanishing gradient problem

    • more processing power, memory

  2. Learning rate

    • higher learning rate = reaches a correct response faster

    • but too high a rate can overshoot the minimum or make the NN stop learning too soon

hypertuning: choosing the best combination of hyperparameters, often by manual trial and error
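The learning-rate trade-off can be seen in a toy tuning loop. This sketch reuses the one-weight loss (w - 3)^2 and tries several candidate rates; the values are illustrative:

```python
# final loss after 50 gradient-descent steps with a given learning rate
def final_loss(learning_rate, steps=50):
    w = 0.0
    for _ in range(steps):
        w -= learning_rate * 2 * (w - 3)   # gradient of (w - 3)^2
    return (w - 3) ** 2

# tiny rates learn too slowly; an overly large rate (1.1) diverges
candidates = [0.001, 0.01, 0.1, 1.1]
best = min(candidates, key=final_loss)
print(best)  # -> 0.1
```

The middle rate wins: small rates leave the loss high after the step budget, and the large rate makes the weight oscillate away from the minimum.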

15
New cards

RNN (recurrent neural network) and its processes

type of neural network; great at handling sequential data by remembering past information to make better predictions

ex. google autocomplete, translator, chatbots

BPTT (backpropagation through time): special backpropagation used to train RNN

  • by going back in time and figuring out which steps caused errors

PROS of RNN

  • good sequence handling

  • good memory

  • flexibility

CONS

  • vanishing, exploding gradients: learning long sequences means backpropagating through many steps, so the gradients shrink (or blow up) along the way

  • intensive, time consuming

  • forgets long-term information from early in the sequence
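The "remembering past information" idea is just a hidden state carried from step to step. A minimal single-neuron sketch with arbitrary, untrained weights:

```python
import math

def rnn_step(x, h, w_x=0.5, w_h=0.8, b=0.0):
    # one recurrent step: the new hidden state mixes the current input
    # with the remembered state from the previous step
    return math.tanh(w_x * x + w_h * h + b)

h = 0.0                         # empty memory before the sequence starts
for x in [1.0, 0.5, -1.0]:      # a short input sequence
    h = rnn_step(x, h)          # h now summarizes everything seen so far
```

Because `h` is reused at every step, early inputs influence later predictions; because that influence passes through many multiplications, it also fades, which is exactly the long-term-memory weakness listed above.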

16
New cards

LSTM (long short-term memory network)

type of RNN, handles vanishing gradient by storing information in the memory cells

  • three gates decide what to do with memory:

    1. input gate: decides which values are updated

    2. forget gate: decides which values are discarded

    3. output gate: decides which information is used for output

target updating: updating only a certain part of memory
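The gate idea reduces to a single update rule on the memory cell. A simplified scalar sketch (real LSTMs compute the gate values from the input and hidden state; here they are fixed for illustration):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))   # squashes gate values into (0, 1)

def lstm_memory_update(cell, candidate, input_gate, forget_gate):
    # forget gate discards part of the old memory; input gate decides
    # how much of the new candidate value gets written in
    return forget_gate * cell + input_gate * candidate

cell = 2.0   # old memory
cell = lstm_memory_update(cell, candidate=1.0,
                          input_gate=sigmoid(2.0),    # mostly open
                          forget_gate=sigmoid(-2.0))  # mostly closed
```

Because the old memory is carried forward by multiplication with a gate rather than squashed through an activation at every step, the gradient has a much gentler path backward through time, which is how the LSTM sidesteps the vanishing gradient.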

17
New cards

TNN (Transformer Neural Network)

type of NN that uses:

  1. Positional encoding: indicates the position of words in a sentence

  2. Self-attention: weights words based on their importance

  3. Multi-head attention: applies several self-attention mechanisms in parallel to different parts of the sentence

PROS

  • parallel processing=faster training

  • long-range dependencies: can learn relationships between words far apart

  • can handle large data sets due to distribution, optimization

CONS

  • not suitable for every task

  • complex, resource heavy-requires computational power

  • more research still needed
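Self-attention, the core mechanism above, can be sketched as scaled dot-product attention over toy 2-dimensional word vectors (the numbers are illustrative, not real embeddings):

```python
import math

def softmax(xs):
    # turns raw scores into weights that sum to 1
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    # each output is a weighted mix of the values, with weights set by
    # how well a word's query matches every other word's key
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)          # importance of each word
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs

out = self_attention(
    [[1.0, 0.0]],                  # one query vector
    [[1.0, 0.0], [0.0, 1.0]],      # keys for two words
    [[1.0, 2.0], [3.0, 4.0]],      # values for two words
)
```

Every query attends to every key in one matrix-style computation with no step-by-step recurrence, which is what makes the parallel processing and long-range learning in the PROS list possible.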

18
New cards

architecture of TNN and residual connections

encoder layers (6 layers)

decoder layers (6 layers)

Residual connections: technique that lets information skip layers by adding a layer's input directly to its output

  • makes learning more efficient

  • acts as a shortcut that mitigates the vanishing gradient problem
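A residual connection is one addition. A minimal sketch, where `layer` is a hypothetical stand-in for any sub-layer:

```python
def layer(x):
    # hypothetical stand-in transformation for one sub-layer
    return [0.1 * v for v in x]

def residual_block(x):
    # residual connection: the input is added straight to the layer's
    # output, giving gradients a shortcut path past the layer
    return [v + h for v, h in zip(x, layer(x))]

out = residual_block([1.0, 2.0])   # approximately [1.1, 2.2]
```

Even if `layer` learns almost nothing (or its gradient vanishes), the input still flows through unchanged, so deep stacks of encoder/decoder layers remain trainable.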

19
New cards
Datasets

20
New cards

Datasets and the types

collections of data used to train and test LLM

Real Data: collected from real world events, observations

PROS:

  • authentic, credible

  • diverse-including rare events

CONS:

  • collecting challenges

  • privacy, legal concerns

Synthetic Data: generated artificially, mimics real data

PROS:

  • cost effective

  • privacy safe

  • customizable

CONS:

  • lack of realism, which invites skepticism

  • complex generating process

  • does not take rare events into account

21
New cards

6 Types of biases

Confirmation bias: Favoring info that supports your beliefs

Historical bias: Bias from outdated data

Labelling bias: Prejudging based on assigned labels

Linguistic bias: Biased meaning from word choice

Sampling bias: Data not representing the whole group

Selection bias: Choosing data in a non-random, unfair way

22
New cards
Processing power

23
New cards

processing power and its main tasks

computational capacity/ability of hardware to perform a large number of complex calculations

Main tasks:

  1. Pre-processing: preparing raw data for training

  2. Training the model: by optimizing parameters

  3. Deploying the model

24
New cards

good processing power requires

  1. GPU (graphics processing unit): many parallel cores and programmability

  2. TPU (tensor processing unit):

    • special chip developed by google built for AI

    • high computation, high performance with lower energy consumption

25
New cards

clustering and the benefits

clustering: putting together multiple GPU=more computational power

  • scalability

  • fault tolerance-failure of a single unit doesn’t halt the process

26
New cards
Ethical considerations

27
New cards

what to do to fix ethical concerns

  • data anonymization

  • secure data handling

  • compliance with regulations

  • bias detection

Accountability and responsibility:

  • establish a governance structure: who is in charge

  • ensure there is human oversight: checking and validation

  • monitor for misinformation

  • transparency to the public