Name 4 ways to do EDA (Exploratory Data Analysis)
Compare columns (feature variables) against each other
Compare variables against your target (what we want to predict)
Understand the data dictionary
Talk with SMEs (subject matter experts) so that you become an SME on this data
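A minimal sketch of the first two items (feature vs. feature, feature vs. target). The toy DataFrame, its column names, and its values are made up for illustration and are not from the heart-disease data:

```python
import pandas as pd

# Hypothetical toy data, invented for this sketch.
df = pd.DataFrame({
    "sex": [1, 1, 0, 0, 1, 0, 1, 0],
    "age": [63, 37, 41, 56, 57, 44, 52, 60],
    "target": [1, 1, 1, 0, 0, 1, 0, 0],
})

# Feature vs. feature: pairwise correlation between two columns.
feature_corr = df[["age", "sex"]].corr()
print(feature_corr)

# Feature vs. target: counts of each sex value within each target class.
sex_vs_target = pd.crosstab(df["target"], df["sex"])
print(sex_vs_target)

# Feature vs. target: average of a numeric column per target class.
mean_age_by_target = df.groupby("target")["age"].mean()
print(mean_age_by_target)
```

The data dictionary and SME conversations are what tell you how to read these tables, for example what each encoded value actually means.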
EDA (Exploratory Data Analysis): 4 things you need to know about the data
What kind of data is this (categorical, numeric, etc.)? Which tools and models make the most sense to use?
Are there outliers?
How will missing data be dealt with?
Which features (variables) should be used (kept or discarded)?
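One possible way to check each of the four questions with pandas. This is a sketch on a made-up toy DataFrame (names and values are hypothetical); the IQR rule, median fill, and correlation ranking are just common starting points, not the only options:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for this sketch.
df = pd.DataFrame({
    "age": [29, 41, 35, 52, 47, 38, 120],   # 120 is a deliberate outlier
    "sex": ["F", "M", "M", "F", "M", "F", "M"],
    "chol": [210.0, 245.0, np.nan, 230.0, 198.0, 260.0, 225.0],
    "target": [0, 1, 0, 1, 1, 0, 1],
})

# 1. What kind of data is this? Inspect the dtype of every column.
print(df.dtypes)

# 2. Are there outliers? Flag rows outside 1.5 * IQR of the quartiles.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]
print(outliers)

# 3. How will missing data be dealt with? Count it, then (here) fill with the median.
missing = df.isna().sum()
print(missing)
df["chol"] = df["chol"].fillna(df["chol"].median())

# 4. Which features should be kept? A first look: correlation of numeric columns with the target.
corr_with_target = df.select_dtypes(include="number").corr()["target"].drop("target")
print(corr_with_target)
```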
How to use code to explore the target variable during EDA
dataframe['target'].value_counts()
# Of the 303 patients (each is one row of data),
# 165 have heart disease and 138 do not.
# That is roughly a 54/46 split, so the target classes are approximately balanced.
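To see the balance as proportions instead of raw counts, value_counts accepts normalize=True. A small sketch; the DataFrame here is a stand-in holding only a target column built to match the counts quoted in these notes, and the encoding 1 = heart disease, 0 = no heart disease is an assumption:

```python
import pandas as pd

# Stand-in for the real data: only the target column, matching the quoted counts.
# Assumed encoding: 1 = heart disease, 0 = no heart disease.
dataframe = pd.DataFrame({"target": [1] * 165 + [0] * 138})

# Raw counts per class.
counts = dataframe["target"].value_counts()
print(counts)

# Share of each class (fractions that sum to 1).
shares = dataframe["target"].value_counts(normalize=True)
print(shares.round(3))
```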
Use matplotlib.pyplot (imported as plt) to visualize the target counts from above. Setup: dataframe = pd.read_csv("data/heart-disease.csv") and %matplotlib inline
dataframe['target'].value_counts()
# Of the 303 patients (each is one row of data),
# 165 have heart disease and 138 do not.
# That is roughly a 54/46 split, so the target classes are approximately balanced.
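The value counts can be drawn as a bar chart. A minimal sketch, assuming the encoding 1 = heart disease, 0 = no heart disease; the stand-in DataFrame, colors, and labels are illustrative choices. In a notebook you would load the CSV and use %matplotlib inline instead of the Agg backend line:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs outside a notebook; in Jupyter use %matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for dataframe = pd.read_csv("data/heart-disease.csv"):
# only the target column, built to match the counts quoted in these notes.
dataframe = pd.DataFrame({"target": [1] * 165 + [0] * 138})

# Count each class, then plot one bar per class.
counts = dataframe["target"].value_counts()
ax = counts.plot(kind="bar", color=["salmon", "lightblue"])
ax.set_title("Heart disease frequency")
ax.set_xlabel("target (assumed: 1 = disease, 0 = no disease)")
ax.set_ylabel("Number of patients")
plt.tight_layout()
```

Two bars of similar height are a quick visual confirmation that the classes are roughly balanced.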