Lecture 20 - Machine Learning III

Cross-Validation

Used when the data in a dataset varies over the period in which it was collected. If an algorithm is trained on earlier data, but the trends and patterns in later data are different, its results would not be valid. Essentially, we use the same dataset to train and retrain the ML model.

At each iteration, a different subset of the dataset is reserved for testing, while the rest is used for training. There are many iterations and the model has to be trained and tested on each one, which is very time-consuming.

Results in much more reliable ensemble models

  1. Reserve some portion of the dataset for testing

  2. Use the rest of the dataset to train the model

  3. Test the model using the reserved portion

  4. Repeat, reserving a different portion each time
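
The four steps above can be sketched as k-fold cross-validation. This is only a minimal illustration using the standard library: the names `k_fold_indices` and `cross_validate` are hypothetical, and the "model" is a stand-in that just predicts the mean of the training targets, so that the train/test loop stays visible.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs, one pair per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Split the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]  # step 1: reserve one fold for testing
        train_idx = [j for f_no, fold in enumerate(folds) if f_no != i for j in fold]
        yield train_idx, test_idx

def cross_validate(targets, k=5):
    """Return one mean absolute error per fold for a toy mean-predictor."""
    errors = []
    for train_idx, test_idx in k_fold_indices(len(targets), k):
        # Step 2: "train" on the rest (the toy model only stores the mean).
        mean_y = sum(targets[j] for j in train_idx) / len(train_idx)
        # Step 3: test on the reserved fold.
        mae = sum(abs(targets[j] - mean_y) for j in test_idx) / len(test_idx)
        errors.append(mae)
    return errors  # step 4: the loop has repeated this for every fold

targets = [2 * x + 1 for x in range(20)]
fold_errors = cross_validate(targets, k=5)
print("error per fold:", fold_errors)
print("average error:", sum(fold_errors) / len(fold_errors))
```

Every sample is used for testing exactly once, which is why the averaged error is more trustworthy than a single train/test split, and also why the cost grows with the number of folds.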

Genetic Programming

An algorithm inspired by natural selection that applies operations modeled on biological genetics.

Essentially, the algorithm generates an initial random set of solutions. This population undergoes “genetic operations” to hopefully bring forth a new population of better solutions, and the process repeats to eventually generate the best solution.
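
The generate, select and vary loop described above might look roughly like this. As a simplifying assumption, the sketch evolves fixed-length lists of 0s and 1s (a genetic-algorithm style representation) rather than the program trees that genetic programming usually evolves. The function `evolve`, its parameters, and the fitness function `sum` (count of 1s) are all hypothetical choices for illustration.

```python
import random

def evolve(fitness, genome_len=20, pop_size=30, generations=40,
           mutation_rate=0.05, seed=0):
    """Evolve bit lists toward higher fitness and return the best one found."""
    rng = random.Random(seed)
    # Initial random set of solutions.
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Survival: the fitter half of the population persists unchanged.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.sample(survivors, 2)
            # Crossover: the child takes a prefix from one parent
            # and the remaining suffix from the other.
            cut = rng.randrange(1, genome_len)
            child = mum[:cut] + dad[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

best = evolve(fitness=sum)
print(best, "fitness:", sum(best))
```

Because the survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next, but nothing guarantees that the loop reaches the global best.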

Genetic Operations

Includes:

  • GP mutation

  • GP crossover

All of these serve the goal of GP survival: the persistence of the best solution(s).

GP Mutation

Used to ensure diversity in the solutions; it stops lags from happening and persisting.
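
A hedged sketch of one common form of mutation, assuming solutions are expression trees stored as nested tuples such as `('+', ('*', 'x', 'x'), 1)`. The notes do not specify a representation, so the tree encoding, `OPS`, `LEAVES`, `random_tree` and `mutate` are all illustrative assumptions.

```python
import random

OPS = ['+', '-', '*']      # hypothetical set of internal-node operators
LEAVES = ['x', 1, 2, 3]    # hypothetical set of leaf values

def random_tree(rng, depth=2):
    """Build a small random expression tree."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(LEAVES)
    return (rng.choice(OPS),
            random_tree(rng, depth - 1),
            random_tree(rng, depth - 1))

def mutate(tree, rng, rate=0.2):
    """Return a copy of the tree in which each visited node is replaced,
    with probability `rate`, by a freshly generated random subtree."""
    if rng.random() < rate:
        return random_tree(rng)
    if isinstance(tree, tuple):
        op, left, right = tree
        return (op, mutate(left, rng, rate), mutate(right, rng, rate))
    return tree

rng = random.Random(0)
parent = ('+', ('*', 'x', 'x'), 1)
child = mutate(parent, rng)
print("parent:", parent)
print("child: ", child)
```

The fresh random subtrees are what inject new material into the population, which is how mutation keeps the solutions diverse.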

GP Crossover

Information from two separate solutions is crossed over and swapped.
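
One way to picture the swap, again assuming (this is not stated in the notes) that solutions are expression trees stored as nested tuples: pick a random subtree in each parent and exchange them. All helper names here are hypothetical.

```python
import random

def subtree_paths(tree, path=()):
    """List the path (sequence of child positions) to every node in the tree."""
    paths = [path]
    if isinstance(tree, tuple):
        for i in (1, 2):  # position 0 holds the operator, 1 and 2 the children
            paths += subtree_paths(tree[i], path + (i,))
    return paths

def get_subtree(tree, path):
    """Follow a path down from the root and return the subtree found there."""
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of the tree with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(a, b, rng):
    """Swap one randomly chosen subtree of `a` with one of `b`."""
    path_a = rng.choice(subtree_paths(a))
    path_b = rng.choice(subtree_paths(b))
    sub_a, sub_b = get_subtree(a, path_a), get_subtree(b, path_b)
    return replace_subtree(a, path_a, sub_b), replace_subtree(b, path_b, sub_a)

mum = ('+', ('*', 'x', 'x'), 1)
dad = ('-', 'x', ('*', 2, 'x'))
child_1, child_2 = crossover(mum, dad, random.Random(0))
print(child_1)
print(child_2)
```

The parents are left untouched because tuples are immutable; the two children between them contain exactly the same nodes as the two parents, only rearranged.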

Pros & Cons of Genetic Programming

Pros:

  • It can aid understanding: solutions are easier to understand and interpret (e.g. if all solutions contain a variable, that variable may be important to the output)

  • Produces multiple solutions

  • Can be “stochastic”

Cons:

  • Very computationally heavy

  • We might never get the best solution (the search can get stuck in a lag and, because generation is random, it could find a local best instead of the global best)

  • It is slow and not the most powerful algorithm