Exploring data cards


4 Terms

1

Name 4 ways to do EDA (Exploratory Data Analysis)

  • compare columns against each other (feature variables)

  • compare variables against your target (what we want to predict)

  • Understand the data dictionary

  • talk with SMEs (subject-matter experts) so that you become an SME on this data
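The first two bullets above can be sketched in pandas. This is a minimal illustration on a made-up toy table, not the card author's dataset: the column names `sex` and `age` and all values are assumptions chosen only to show the calls.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "sex": [1, 0, 1, 1, 0, 0],
    "age": [63, 37, 41, 56, 57, 45],
    "target": [1, 1, 0, 1, 0, 0],
})

# Compare a feature against the target: row counts per (sex, target) pair.
sex_vs_target = pd.crosstab(df["sex"], df["target"])

# Compare columns against each other: pairwise Pearson correlation.
corr_matrix = df.corr()

print(sex_vs_target)
print(corr_matrix)
```

A crosstab suits categorical features, while a correlation matrix only makes sense for numeric columns.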

2

EDA (exploratory data analysis): 4 things you need to know about the data

  • what kind of data is this (categorical, numeric, etc.)? Which tools and models make the most sense to use?

  • Are there outliers?

  • How will missing data be dealt with?

  • Which features (variables) should be used (kept or discarded)?
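The first three questions above can each be checked with a line or two of pandas. The sketch below uses a made-up table (the columns `age`, `chol`, and `chest_pain` and their values are assumptions), and the 1.5 × IQR rule is just one common outlier heuristic, not something the card prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one extreme age and two missing values.
df = pd.DataFrame({
    "age": [29, 35, 41, 38, 33, 120],
    "chol": [210.0, np.nan, 250.0, 230.0, 199.0, 240.0],
    "chest_pain": ["typical", "atypical", "typical", None, "none", "typical"],
})

# What kind of data is this? Inspect the dtype of each column.
dtypes = df.dtypes

# How much data is missing? Count missing values per column.
missing_per_column = df.isna().sum()

# Are there outliers? Flag ages outside 1.5 * IQR of the quartiles.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]

print(dtypes)
print(missing_per_column)
print(outliers)
```

Which features to keep or discard usually follows from these checks plus domain knowledge, so it is not reduced to a single call here.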

3

How to use code to EDA the target variable

dataframe['target'].value_counts()

# Of the 303 patients (each is a row of data):

# 165 have heart disease vs. 138 who do not

# Roughly 50/50 (about 54% vs. 46%, or roughly 150 in each target group), so the target classes are balanced
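To put a number on "roughly 50/50", `value_counts(normalize=True)` returns proportions instead of raw counts. The sketch below rebuilds a stand-in target column from the counts quoted on the card rather than reading the CSV. It assumes that 1 means heart disease and 0 means no heart disease.

```python
import pandas as pd

# Stand-in target column built from the counts on the card (165 vs. 138).
# Assumption: 1 = heart disease, 0 = no heart disease.
target = pd.Series([1] * 165 + [0] * 138, name="target")

counts = target.value_counts()                    # raw counts per class
proportions = target.value_counts(normalize=True) # share of rows per class

print(counts)
print(proportions.round(3))
```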

4

Use matplotlib.pyplot (imported as plt) to visualize the target variable, given the earlier setup: dataframe = pd.read_csv("data/heart-disease.csv") and %matplotlib inline

dataframe['target'].value_counts()

# Of the 303 patients (each is a row of data):

# 165 have heart disease vs. 138 who do not

# Roughly 50/50 (about 54% vs. 46%, or roughly 150 in each target group), so the target classes are balanced
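One way to visualize these counts is a bar chart. The sketch below is one possible answer, not necessarily the one the card intended. It uses a stand-in target column built from the counts above instead of the CSV, and the axis labels and title are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in target column built from the counts on the card (165 vs. 138).
target = pd.Series([1] * 165 + [0] * 138, name="target")
counts = target.value_counts()

# One bar per target class, with the bar height equal to the class count.
fig, ax = plt.subplots()
bars = ax.bar([str(v) for v in counts.index], counts.values)
ax.set_xlabel("target")
ax.set_ylabel("number of patients")
ax.set_title("Heart disease target counts")
```

In a notebook with %matplotlib inline, the `matplotlib.use("Agg")` line can be dropped and the figure renders below the cell.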