Data Science Interview Review


Description and Tags

Data science topics, though not limited to data science


10 Terms

1

Data Science Process Lifecycle

  1. Framing the problem

  2. Collecting data

  3. Exploratory Data Analysis (EDA)

  4. Model building

  5. Model deployment

  6. Communicating results

2

Data Science Process Framework - CRISP-DM (Cross-Industry Standard Process for Data Mining)

  • Business Understanding

  • Data Understanding

  • Data Preparation

  • Modeling

  • Evaluation

  • Deployment

3

Data Science Framework - OSEMN

  • Obtain Data

  • Scrub Data

  • Explore Data

  • Model Data

  • Interpret Results

4

Model Overfitting

When a model learns the training data too well, capturing noise and idiosyncrasies rather than the underlying pattern, so it performs poorly on unseen data

5

Model Underfitting

Happens when a model is too simple to capture the underlying patterns in the data, so it performs poorly on both the training data and unseen data

6

Regularization Technique

A technique used to avoid overfitting by making the model simpler. One common way to apply regularization is to add a penalty on the weights to the loss function, which encourages the model to shrink unimportant weights.

7

L1 Regularization

We add the sum of the absolute values of the weights to the loss function.

  • Loss (L1): Cost function + λ * Σ|weights|

  • Penalizes weights by adding a term to the loss function that is the absolute value of each weight. This shrinks small parameters until they hit exactly zero and stay there for the remaining epochs, yielding sparse models.
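A minimal sketch of how the L1 penalty could be added to a loss value (the function name `l1_loss` and the strength `lam` are illustrative, not taken from any particular library):

```python
def l1_loss(base_cost, weights, lam=0.01):
    """Base cost plus lam times the sum of absolute weight values."""
    return base_cost + lam * sum(abs(w) for w in weights)

# The penalty grows linearly with each weight's magnitude, which is
# what lets small weights be driven exactly to zero during training.
print(l1_loss(1.0, [0.5, -2.0, 0.0], lam=0.1))  # 1.0 + 0.1 * 2.5 = 1.25
```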

8

L2 Regularization

We add the sum of the squares of the weights to the loss function.

  • Loss (L2): Cost function + λ * Σ(weights²)

  • Penalizes large parameters, preventing any single parameter from becoming too large. Weights shrink toward zero but never become exactly zero; adding the squared weights to the loss keeps the model from overfitting to any single feature.
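The same sketch for the L2 penalty (again, `l2_loss` and `lam` are illustrative names, not a specific library's API):

```python
def l2_loss(base_cost, weights, lam=0.01):
    """Base cost plus lam times the sum of squared weights."""
    return base_cost + lam * sum(w ** 2 for w in weights)

# Squaring punishes the large weight (-2.0) far more than the small
# one (0.5): the penalty here is lam * (0.25 + 4.0 + 0.0).
print(round(l2_loss(1.0, [0.5, -2.0, 0.0], lam=0.1), 3))  # 1.425
```

Comparing with the L1 example above: the gradient of w² vanishes as w approaches zero, which is why L2 shrinks weights but rarely zeroes them out.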

9

Gradient Descent

A generic optimization algorithm capable of finding optimal solutions to a wide range of problems. The general idea of gradient descent is to tweak parameters iteratively in order to minimize a cost function.
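The idea can be sketched in a few lines; the quadratic f(x) = (x − 3)², its gradient 2(x − 3), and the learning rate are illustrative choices:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Iteratively step against the gradient to minimize a function."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # move opposite the slope of the cost
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # converges to 3.0, the minimizer
```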

10

Statistical Power

Refers to the ability of a statistical test or analysis to detect an effect or relationship if one truly exists in the population being studied. In other words, it is the probability of correctly rejecting a false null hypothesis.

  • A test with high power has a greater chance of detecting a true effect, whereas a test with low power is less likely to detect one even if it exists

  • Depends on factors such as the sample size, the significance level, the effect size, and the variability of the data

  • High statistical power is desirable in research because it increases the likelihood of obtaining accurate and reliable results

  • Typically reported as a number between 0 and 1

  • 0.80 (80%) is typically considered desirable

  • Applications: Experimental design, hypothesis testing, sample size determination, meta-analysis (evidence across multiple studies)
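As a sketch of how power relates to effect size, sample size, and significance level, here is an analytic power calculation for a two-sided, two-sample z-test using only the standard library (the effect size d = 0.5 and n = 64 per group are illustrative numbers, not prescribed values):

```python
from statistics import NormalDist

def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)       # e.g. 1.96 for alpha = 0.05
    shift = d * (n_per_group / 2) ** 0.5    # shift of the test statistic under H1
    return z.cdf(shift - z_crit)            # P(reject H0 | true effect d)

# With d = 0.5 and 64 subjects per group, power lands near the
# conventional 0.80 threshold.
print(round(two_sample_power(0.5, 64), 3))
```

Larger n or larger d raises the power, while a stricter alpha lowers it, matching the factors listed above.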