EDA - Exploratory data analysis: Becoming more familiar with the data

4 Terms

1

What is EDA about?

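Understanding the data by:

  • Understanding the data dictionary (what each column means and how it is encoded)

  • Deciding how to deal with missing data: dataframe.info() shows each column's non-null count and dtype

  • Finding outliers: dataframe.describe() shows each column's min, max, and quartiles

  • Deciding which features to keep and which to discard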
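A minimal sketch of these first checks follows. It uses a small made-up DataFrame, so the column names and values are hypothetical. `info()` prints the non-null count and dtype of each column, `isna().sum()` counts missing values directly, and `describe()` gives the summary statistics. The 1.5 × IQR rule used to flag outliers is one common convention, swapped in here for illustration; the card itself does not prescribe a rule.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: chol has one missing value and one suspiciously large value
dataframe = pd.DataFrame({
    "age": [29, 45, 54, 61, 38, 50],
    "chol": [204.0, 236.0, np.nan, 564.0, 211.0, 250.0],
    "target": [1, 1, 0, 0, 1, 0],
})

# Dtypes and non-null counts: fewer non-null values than rows means missing data
dataframe.info()

# Missing values per column, counted directly
missing = dataframe.isna().sum()
print(missing)

# count, mean, std, min, 25%, 50%, 75%, max for each numeric column (NaN skipped)
summary = dataframe.describe()
print(summary)

# Flag outliers in one column with the 1.5 * IQR rule (an assumed convention)
q1, q3 = dataframe["chol"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = dataframe[(dataframe["chol"] < q1 - 1.5 * iqr) | (dataframe["chol"] > q3 + 1.5 * iqr)]
print(outliers)
```

The flagged row is only a candidate. Whether it is a data-entry error or a real extreme value is one of the questions to take to the subject-matter experts.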

2

Positive vs. Negative correlation

  • 1.00 is a perfect positive correlation: a variable such as age is 100% correlated with itself. 0.00 means there is no linear correlation, and -1.00 would be a perfect negative correlation.

  • cp (chest pain type) vs. target has a correlation of +0.43, so cp has a potential positive correlation with the target

    • As cp increases, the target (has heart disease) also tends to increase

  • exang (exercise-induced angina) vs. target has a correlation of -0.44, so exang has a potential negative correlation with the target

    • As exang increases, the target (has heart disease) tends to decrease; as exang decreases, the target tends to increase
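To see what the sign means, here is a sketch with two hypothetical features. One is built to rise with the target and the other to fall with it. The numbers are synthetic, not the heart-disease data behind the +0.43 and -0.44 values. `Series.corr()` computes the Pearson correlation coefficient by default.

```python
import pandas as pd

# Hypothetical data: feature_up tends to rise with target, feature_down tends to fall
dataframe = pd.DataFrame({
    "feature_up":   [0, 1, 1, 2, 3, 3],
    "feature_down": [1, 1, 1, 0, 0, 0],
    "target":       [0, 0, 1, 1, 1, 1],
})

# Pearson correlation of each feature with the target
up = dataframe["feature_up"].corr(dataframe["target"])
down = dataframe["feature_down"].corr(dataframe["target"])
self_corr = dataframe["target"].corr(dataframe["target"])  # a variable vs. itself

print(f"feature_up vs target:   {up:+.2f}")
print(f"feature_down vs target: {down:+.2f}")
print(f"target vs target:       {self_corr:+.2f}")
```

A positive value only says the two move together in this sample. It does not show that one causes the other.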

3

Explain ways to use EDA

  • dataframe.describe() summarizes each numeric column: count, mean, std (standard deviation), min, quartiles (25%, 50%, 75%), and max

  • dataframe['target'].value_counts() counts the rows in each class of the dependent (target) variable, showing whether the data is balanced

  • Finding questions to ask the subject-matter experts (SMEs)

  • Using a correlation matrix, dataframe.corr(), to see how each variable is correlated with every other variable

  • Visualizing the correlation matrix as a heat map to see at a glance how the variables relate to each other
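The checks on this card can be sketched as follows, again on hypothetical toy data; the column `max_hr` is made up for illustration. `value_counts(normalize=True)` returns class proportions instead of raw counts, which makes imbalance easy to spot. The heat map here is drawn with plain matplotlib (`imshow`), which is an assumption; seaborn's `heatmap` is a common alternative if it is installed.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data
dataframe = pd.DataFrame({
    "age":    [29, 45, 54, 61, 38, 50, 42, 57],
    "max_hr": [190, 170, 150, 120, 180, 140, 165, 130],
    "target": [1, 1, 0, 0, 1, 0, 1, 0],
})

# Summary statistics for every numeric column
print(dataframe.describe())

# Class balance of the dependent variable: raw counts and proportions
counts = dataframe["target"].value_counts()
proportions = dataframe["target"].value_counts(normalize=True)
print(counts, proportions, sep="\n")

# Pearson correlation between every pair of numeric columns
corr = dataframe.corr()
print(corr.round(2))

# Heat map of the correlation matrix, annotated with each coefficient
n = len(corr.columns)
fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(n))
ax.set_xticklabels(corr.columns, rotation=45)
ax.set_yticks(range(n))
ax.set_yticklabels(corr.columns)
for i in range(n):
    for j in range(n):
        ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha="center", va="center")
fig.colorbar(im, ax=ax)
fig.tight_layout()
# Call plt.show() in an interactive session, or fig.savefig("...") to write a file
```

Fixing the color scale to `vmin=-1, vmax=1` keeps 0.00 at the middle of the colormap, so positive and negative correlations get visibly different colors.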

4

Explain what a crosstab is used for

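  • To compare a feature variable against the target variable

  • A crosstab (cross-tabulation) counts how often each combination of values of two variables occurs and lays the counts out in a matrix

  • pd.crosstab(dataframe['target'], dataframe.sex) gives one row per target value and one column per sex value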
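The following is a minimal sketch of `pd.crosstab` on hypothetical rows. It assumes `target` is 1 for heart disease and 0 otherwise, and that `sex` is 1 for male and 0 for female; that encoding is an assumption. `margins=True` adds row and column totals, and `normalize="columns"` turns the counts into proportions within each sex.

```python
import pandas as pd

# Hypothetical rows (assumed encoding: sex 1 = male, 0 = female; target 1 = heart disease)
dataframe = pd.DataFrame({
    "sex":    [1, 1, 1, 0, 0, 1, 0, 1],
    "target": [1, 0, 0, 1, 1, 1, 0, 0],
})

# Rows = target value, columns = sex value, cells = number of rows with that combination
counts = pd.crosstab(dataframe["target"], dataframe.sex)
print(counts)

# Same table with row and column totals ("All")
print(pd.crosstab(dataframe["target"], dataframe.sex, margins=True))

# Share of each target value within each sex (each column sums to 1)
rates = pd.crosstab(dataframe["target"], dataframe.sex, normalize="columns")
print(rates.round(2))
```

Normalizing by column helps when the groups have different sizes. Here there are five rows with sex 1 and only three with sex 0, so raw counts alone would be misleading.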