Data science topics, though not limited to data science
Data Science Process Lifecycle
Framing the problem
Collecting data
Exploratory Data Analysis (EDA)
Model building
Model deployment
Communicating results
Data Science Process Framework - CRISP-DM (Cross-Industry Standard Process for Data Mining)
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
Data Science Framework - OSEMN
Obtain Data
Scrub Data
Explore Data
Model Data
Interpret Results
Model Overfitting
When a model learns the training data too well, including its noise, so it performs well on the training data but fails to generalize to new data
Model Underfitting
Happens when a model is too simple to capture the underlying patterns in the data, so it performs poorly on both the training and test data
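A minimal sketch contrasting both behaviors, assuming scikit-learn and illustrative noisy sine data: a degree-1 polynomial underfits (high training and test error), while a degree-15 polynomial overfits (low training error, higher test error).

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Illustrative data: noisy samples of a sine curve
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, 80).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.3, 80)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # too simple, reasonable, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),  # training error
          mean_squared_error(y_te, model.predict(X_te)))  # test error
```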
Regularization Technique
A technique used to avoid overfitting by making the model simpler. One common way to apply regularization is to add a penalty on the weights to the loss function, so that minimizing the loss also pushes unimportant weights toward small values.
L1 Regularization
We add the sum of the absolute values of the weights to the loss function:
Loss(L1) = Cost function + λ × Σ |wᵢ|
Penalizes weights by adding the sum of their absolute values to the loss function. This drives small parameters to exactly zero, where they tend to stay for the remaining epochs, effectively removing those features from the model.
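A minimal sketch of the idea, assuming a linear model with mean squared error as the base cost function; `lam` stands in for λ, the regularization strength:

```python
import numpy as np

# L1-regularized loss: base cost plus lambda times sum of |weights|
def l1_loss(X, y, weights, lam=0.1):
    mse = np.mean((X @ weights - y) ** 2)        # base cost function
    l1_penalty = lam * np.sum(np.abs(weights))   # λ × Σ |wᵢ|
    return mse + l1_penalty
```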
L2 Regularization
We add the sum of the squares of the weights to the loss function:
Loss(L2) = Cost function + λ × Σ wᵢ²
Penalizes large parameters, preventing any single parameter from growing too large. Unlike L1, the weights never become exactly zero; the squared penalty keeps the model from overfitting to any single feature.
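The same sketch with the squared penalty instead:

```python
import numpy as np

# L2-regularized loss: base cost plus lambda times sum of squared weights
def l2_loss(X, y, weights, lam=0.1):
    mse = np.mean((X @ weights - y) ** 2)    # base cost function
    l2_penalty = lam * np.sum(weights ** 2)  # λ × Σ wᵢ²
    return mse + l2_penalty
```

In scikit-learn, Lasso and Ridge implement L1- and L2-regularized linear regression, respectively.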
Gradient Descent
Is a generic optimization algorithm capable of finding optimal solutions to a wide range of problems. The general idea of gradient descent is to tweak parameters iteratively in order to minimize a cost function.
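A minimal sketch for linear regression with MSE loss; the learning rate and iteration count are illustrative assumptions:

```python
import numpy as np

# Gradient descent: repeatedly step the weights opposite the gradient
def gradient_descent(X, y, lr=0.1, n_iters=1000):
    weights = np.zeros(X.shape[1])
    for _ in range(n_iters):
        error = X @ weights - y
        grad = 2 / len(y) * X.T @ error  # gradient of MSE w.r.t. weights
        weights -= lr * grad             # tweak parameters down the slope
    return weights
```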
Statistical Power
Refers to the ability of a statistical test or analysis to detect an effect or relationship if one truly exists in the population being studied. In other words, it is the probability of correctly rejecting a false null hypothesis.
A test with high power has a greater chance of detecting a true effect, whereas a test with low power is less likely to detect a true effect even if one exists
Depends on factors such as the sample size, the significance level, the effect size, and the variability of the data
High statistical power is desirable in research because it increases the likelihood of obtaining accurate and reliable results
Typically reported as a number between 0 and 1
0.80 (80%) is typically considered desirable
Applications: Experimental design, hypothesis testing, sample size determination, meta-analysis (combining evidence across multiple studies)
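A minimal power-analysis sketch using statsmodels; the effect size, significance level, and target power below are illustrative values:

```python
from statsmodels.stats.power import TTestIndPower

# Solve for the sample size per group of a two-sample t-test
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,  # Cohen's d (assumed)
                                   alpha=0.05,       # significance level
                                   power=0.80)       # desired power
print(round(n_per_group))  # ≈ 64 samples needed per group
```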