Exploring data cards

Tags: Computer Engineering, AP Computer Science Principles, Big Idea 2: Data

4 Terms

Name 4 ways to do EDA (Exploratory Data Analysis)

Compare feature columns (variables) against each other
Compare feature variables against the target (the variable we want to predict)
Understand the data dictionary (what each column means)
Talk with subject-matter experts (SMEs) so that you become an SME on this data yourself
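The first two ways (features vs. each other, features vs. the target) can be sketched in pandas as below. This is a minimal illustration, not the card author's method: the toy DataFrame and its columns (`age`, `chol`, `sex`, `target`) are hypothetical stand-ins for a real dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset;
# column names and values are invented for illustration.
df = pd.DataFrame({
    "age":    [63, 37, 41, 56, 57, 57, 56, 44],
    "chol":   [233, 250, 204, 236, 354, 192, 294, 263],
    "sex":    [1, 1, 0, 1, 0, 1, 0, 1],
    "target": [1, 1, 1, 1, 0, 0, 1, 0],
})

# Features vs. each other: pairwise Pearson correlation of numeric columns.
feature_corr = df[["age", "chol"]].corr()
print(feature_corr)

# Categorical feature vs. target: count of rows for each (target, sex) pair.
sex_vs_target = pd.crosstab(df["target"], df["sex"])
print(sex_vs_target)

# Numeric feature vs. target: mean age within each target group.
age_by_target = df.groupby("target")["age"].mean()
print(age_by_target)
```

The other two ways (the data dictionary and talking with SMEs) are about context rather than code, so they have no direct pandas equivalent.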

EDA (Exploratory Data Analysis): 4 things you need to know about the data

What kind of data is this (categorical, numeric, etc.)? Which tools and models make the most sense to use?
Are there outliers?
How will missing data be dealt with?
Which features (variables) should be kept, and which discarded?
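The first three questions can be checked with a few pandas calls. A minimal sketch, assuming a hypothetical toy DataFrame: the 1.5 × IQR outlier rule and median filling are common choices picked for illustration, not the only (or necessarily the right) ones for a given dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are made up for illustration.
df = pd.DataFrame({
    "age":  [63, 37, 41, 56, np.nan, 57, 56, 44],
    "chol": [233, 250, 204, 236, 354, 192, 294, 564],
    "cp":   ["typical", "atypical", "atypical", "none",
             "typical", "none", "none", "atypical"],
})

# What kind of data is this? Numeric dtypes vs. object/string (often categorical).
print(df.dtypes)

# How much data is missing? Count of NaN values per column.
missing = df.isna().sum()
print(missing)

# Are there outliers? The 1.5 * IQR rule flags values far outside the middle 50%.
q1, q3 = df["chol"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["chol"] < q1 - 1.5 * iqr) | (df["chol"] > q3 + 1.5 * iqr)]
print(outliers)

# One possible way to deal with missing numeric data: fill with the column median.
df["age"] = df["age"].fillna(df["age"].median())
```

Dropping rows (`df.dropna()`) is the other simple option for missing data; which one fits depends on how much data would be lost.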

How to use code to explore (EDA) the target variable

import pandas as pd
dataframe = pd.read_csv("data/heart-disease.csv")
dataframe['target'].value_counts()
# Of the 303 patients (each row is one patient),
# 165 have heart disease vs. 138 who do not.
# That is a roughly 50/50 split (about 54% vs. 46%), so the target classes are roughly balanced.
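To put a number on "roughly 50/50", `value_counts(normalize=True)` returns each class's share instead of its raw count. A minimal sketch that rebuilds the target column from the counts on this card; the 1 = heart disease / 0 = no heart disease encoding is an assumption.

```python
import pandas as pd

# Hypothetical target column rebuilt from the card's counts:
# 165 patients with heart disease (assumed coded 1), 138 without (assumed coded 0).
target = pd.Series([1] * 165 + [0] * 138, name="target")

counts = target.value_counts()                 # absolute count per class
shares = target.value_counts(normalize=True)   # fraction of all rows per class

print(counts)
print(shares.round(3))
```

The shares come out to about 0.545 and 0.455, close enough that no rebalancing is usually needed for a first model.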

Using matplotlib.pyplot (imported as plt), visualize the target variable from the previous card. Setup: dataframe = pd.read_csv("data/heart-disease.csv") and %matplotlib inline (in a Jupyter notebook).

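One way to answer this card is to plot the class counts as a bar chart. This is a sketch, not the card's original answer: it replaces the CSV with a hypothetical DataFrame built from the previous card's counts, assumes 1 = heart disease, and picks the colors and labels arbitrarily.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for dataframe = pd.read_csv("data/heart-disease.csv"),
# built only from the class counts on the previous card.
dataframe = pd.DataFrame({"target": [1] * 165 + [0] * 138})

counts = dataframe["target"].value_counts()

# Series.plot draws with matplotlib; kind="bar" gives one bar per class.
ax = counts.plot(kind="bar", color=["salmon", "lightblue"])
ax.set_title("Heart disease frequency")
ax.set_xlabel("target (1 = disease, 0 = no disease; encoding assumed)")
ax.set_ylabel("Number of patients")
plt.xticks(rotation=0)  # keep class labels horizontal
plt.tight_layout()
```

In a notebook with `%matplotlib inline`, drop the `matplotlib.use("Agg")` line and the chart renders below the cell automatically.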