Data science vocab
Three M's
Munging - wrangling the data
Modeling - making an algorithm that describes what's going on
Making the case - telling the story of the data
"all models are wrong but some are useful" -George E.P. Box
X is called the explanatory variable
Y is the target variable
The output of a model is the target variable
Model = mathematical way to predict a target variable
Overfitting = when the model fits the data points it was built on but does not generalize to new data | capturing too much noise
Underfitting = not capturing enough signal
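Rough sketch of over- vs underfitting, using made-up numbers (the data and the three toy models below are assumptions for illustration, not from the lecture). One model ignores X (underfit), one is a straight line, and one memorizes the training points (overfit):

```python
from statistics import mean

# Made-up training data: roughly y = 2x + 1 (signal) plus small noise.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1.2, 2.9, 5.1, 6.8, 9.3, 10.9]
# Made-up unseen data following the same trend.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [2.1, 3.9, 6.2, 7.9, 10.1]

def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

x_bar, y_bar = mean(train_x), mean(train_y)

# Underfit: ignores x and always predicts the average y (misses the signal).
def underfit(x):
    return y_bar

# Straight line fitted by least squares (captures the trend).
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(train_x, train_y))
         / sum((x - x_bar) ** 2 for x in train_x))
intercept = y_bar - slope * x_bar

def line(x):
    return slope * x + intercept

# Overfit: memorizes the training points, noise included.
# For an unseen x it returns the y of the nearest training x.
def memorizer(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

for name, model in [("underfit", underfit), ("line", line), ("memorizer", memorizer)]:
    print(name, "train:", mse(model, train_x, train_y),
          "test:", mse(model, test_x, test_y))
```

The memorizer has zero error on the points it has seen but does worse than the line on new points; the underfit model is bad on both.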
Data = signal + noise
Data = "the new oil"
regression = modeling where the target variable is numerical
classification = modeling where the target variable is categorical
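Small sketch of the difference, with invented study-hours data (the numbers, the names predict_score / predict_passed, and the nearest-centroid rule are all assumptions picked to keep the example short):

```python
from statistics import mean

hours = [1, 2, 3, 4, 5, 6]                       # X, explanatory variable
score = [52, 58, 65, 71, 78, 85]                 # numerical target
passed = ["fail", "fail", "fail", "pass", "pass", "pass"]  # categorical target

# Regression: least-squares line, output is a number.
h_bar, s_bar = mean(hours), mean(score)
slope = (sum((h - h_bar) * (s - s_bar) for h, s in zip(hours, score))
         / sum((h - h_bar) ** 2 for h in hours))

def predict_score(h):
    return s_bar + slope * (h - h_bar)

# Classification: nearest class centroid, output is a category.
centroids = {
    label: mean(h for h, p in zip(hours, passed) if p == label)
    for label in sorted(set(passed))
}

def predict_passed(h):
    return min(centroids, key=lambda label: abs(h - centroids[label]))

print(predict_score(3.5))   # a number
print(predict_passed(5))    # a category label
```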
NUMBER ONE THING = predicting the target variable accurately
noise = unwanted variation
rows for entries
columns for variables/fields
feature engineering = making a helper column to flesh out the data so that it's useful to you
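Quick sketch of a helper column, using plain Python dicts as rows (the housing data and the column name price_per_sqft are made up for illustration):

```python
# Each dict is a row (one entry); each key is a column (variable/field).
rows = [
    {"house": "A", "price": 300000, "sqft": 1500},
    {"house": "B", "price": 450000, "sqft": 3000},
    {"house": "C", "price": 200000, "sqft": 800},
]

# Feature engineering: derive a new column from existing ones.
for row in rows:
    row["price_per_sqft"] = row["price"] / row["sqft"]

for row in rows:
    print(row["house"], row["price_per_sqft"])
```

The new column makes the houses comparable in a way that raw price or raw size alone does not.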
parameter = a number that determines how a model functions
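To illustrate: in a straight-line model y = slope * x + intercept, the slope and intercept are the parameters. A minimal sketch (make_line is a hypothetical helper, not from any library):

```python
def make_line(slope, intercept):
    """Build a straight-line model from its two parameters."""
    def predict(x):
        return slope * x + intercept
    return predict

# Same kind of model, different parameters, different behavior.
model_a = make_line(slope=2.0, intercept=1.0)
model_b = make_line(slope=0.5, intercept=1.0)

print(model_a(4))  # 9.0
print(model_b(4))  # 3.0
```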
Pearson's r - correlation (linear association)
Spearman's r (rho) - correlation (rank-based, monotonic association)
correlation = association involving exactly 2 variables
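Sketch of both coefficients written out by hand, to show how they differ (the data is made up; the ranks helper assumes no tied values, which real data often has):

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's r: strength of the linear association between two variables."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank each value from 1 (smallest) upward. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_r(xs, ys):
    """Spearman's coefficient: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # always increasing, but not a straight line

print("Pearson: ", pearson_r(x, y))
print("Spearman:", spearman_r(x, y))
```

Here y always goes up when x goes up, so Spearman's coefficient is 1, while Pearson's r comes out below 1 because the relationship is curved.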
seaborn builds on matplotlib
seaborn is easier to use and looks cooler
SDLC
Colab Jupyter notebook