Data science vocab

Three M's
Munging - wrangling the data
Modeling - making an algorithm that describes what's going on
Making the case - telling the story of the data

"all models are wrong but some are useful" -George E.P. Box

X is called the explanatory variable
Y is the target variable
The output of a model is a prediction of the target variable

Model = mathematical way to predict a target variable

Overfitting = when the model fits the data points it was built on but does not generalize to new data (capturing too much noise)

Underfitting = not capturing enough signal
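
A rough sketch of overfitting vs. underfitting, using only the Python standard library. Everything here is made up for illustration: the signal (2x + 1), the noise level, and the three toy models. The "mean" model ignores X (underfit), the "line" model matches the shape of the signal, and the "memorizer" just looks up the nearest training point, so it copies the noise too (overfit).

```python
import random

def mse(model, xs, ys):
    """Mean squared error of the model's predictions against the true targets."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_mean(xs, ys):
    """Underfit: ignore x and always predict the average y."""
    avg = sum(ys) / len(ys)
    return lambda x: avg

def fit_line(xs, ys):
    """Least-squares straight line y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_memorizer(xs, ys):
    """Overfit: return the y of the closest training x (noise included)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

rng = random.Random(0)  # fixed seed so the run is repeatable

def make_data(xs):
    # data = signal + noise: signal is 2x + 1, noise is Gaussian with sd 1
    return [2 * x + 1 + rng.gauss(0, 1) for x in xs]

train_x = [i * 0.5 for i in range(20)]         # 0.0, 0.5, ..., 9.5
test_x = [i * 0.5 + 0.25 for i in range(20)]   # new points in between
train_y, test_y = make_data(train_x), make_data(test_x)

results = {}
for name, fit in [("mean", fit_mean), ("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    # store (error on training data, error on new data)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(name, results[name])
```

The memorizer gets zero error on the data it was built on, which looks great until it sees new points. The thing to compare is the second number in each pair (error on new data), since that is what "generalizable" is about.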

Data = signal + noise
Data = the new oil

regression = modeling where the target variable is numerical

classification = modeling where the target variable is categorical
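
A tiny sketch of the difference, assuming a made-up housing example. The function names, the numbers 50,000 and 120, and the 2,000 cutoff are all hypothetical; the only point is the type of the target that comes out.

```python
def predict_price(square_feet):
    """Regression: the target (price) is a number."""
    return 50_000 + 120.0 * square_feet

def predict_size_label(square_feet):
    """Classification: the target (size label) is a category."""
    return "large" if square_feet >= 2000 else "small"

print(predict_price(1500))       # 230000.0
print(predict_size_label(1500))  # small
```

Same X (square feet) in both cases; what makes it regression or classification is the target.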

NUMBER ONE THING = predicting the target variable accurately

noise = unwanted variation

rows for entries

columns for variables/fields

feature engineering = making a helper column to flesh out the data so that it's more useful to you
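
A minimal sketch of feature engineering on a small made-up table, with each row as a dict (one entry) and each key as a column. The column names and values are invented for illustration. Two helper columns get added: price per item, and whether the order fell on a weekend.

```python
from datetime import date

# rows = entries, keys = columns (all values are made up)
rows = [
    {"order_date": date(2024, 1, 5), "total": 40.0, "items": 4},
    {"order_date": date(2024, 1, 6), "total": 90.0, "items": 3},
    {"order_date": date(2024, 1, 8), "total": 15.0, "items": 1},
]

for row in rows:
    # helper column 1: combine two existing columns into a ratio
    row["price_per_item"] = row["total"] / row["items"]
    # helper column 2: pull a yes/no flag out of the date
    # (weekday() is 0 for Monday ... 6 for Sunday)
    row["is_weekend"] = row["order_date"].weekday() >= 5

for row in rows:
    print(row["order_date"], row["price_per_item"], row["is_weekend"])
```

Neither new column adds information that wasn't already in the data; they just put it in a form a model (or a person) can use more directly.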

parameter = a number that determines how a model functions
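
For example, a straight-line model y = slope * x + intercept has two parameters. A small sketch, where `make_line_model` is a hypothetical helper and the parameter values are made up:

```python
def make_line_model(slope, intercept):
    """Build a model y = slope * x + intercept; slope and intercept are its parameters."""
    def model(x):
        return slope * x + intercept
    return model

model_a = make_line_model(slope=2.0, intercept=1.0)
model_b = make_line_model(slope=0.5, intercept=10.0)

# same kind of model, same input, different parameters -> different predictions
print(model_a(4.0))  # 9.0
print(model_b(4.0))  # 12.0
```

Fitting a model to data amounts to picking the parameter values that make its predictions match the target best.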

Pearson's r - correlation (linear)
Spearman's rho - correlation (rank-based)

correlation = association involving exactly 2 variables
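
A sketch of both correlation measures written out by hand in plain Python, so the definitions are visible. The function names and the sample data (y = x cubed) are made up for illustration. Pearson's r is computed on the raw values; Spearman's rho is taken here as Pearson's r computed on the ranks.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: covariance scaled by the spread of each variable."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread

def ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks instead of the raw values."""
    return pearson_r(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # always increasing, but not a straight line

print(pearson_r(xs, ys))
print(spearman_rho(xs, ys))
```

Because y always goes up when x goes up, the ranks line up perfectly and Spearman's rho comes out at 1, while Pearson's r lands somewhat below 1 since the relationship is not a straight line.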

seaborn builds on matplotlib
seaborn is easier to use and looks cooler

SDLC = software development life cycle

Colab = Google's hosted Jupyter notebook