EDA - Exploratory data analysis: Becoming more familiar with the data


4 Terms

1

What is EDA about?

Understanding the data by

  • Understanding the data dictionary

  • Checking for missing data and deciding how to deal with it: dataframe.info()

  • Finding outliers: dataframe.describe()

  • Which features to keep and discard
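The two calls on this card can be tried on a small table. This is a minimal sketch, not the real dataset: the values are made up, the `chol` column is a hypothetical feature added only so there is something missing, and the 1.5 × IQR rule is one common convention for flagging outliers, not something the card prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (NOT the real heart-disease dataset).
# "chol" is an assumed column name; 150 is a planted age outlier.
dataframe = pd.DataFrame({
    "age": [29, 41, 54, 63, 58, 47, 150, 52],
    "chol": [204.0, np.nan, 239.0, 233.0, np.nan, 243.0, 250.0, 212.0],
    "target": [1, 1, 0, 1, 0, 0, 1, 0],
})

# info() prints each column's dtype and non-null count, so columns
# with missing values stand out.
dataframe.info()

# isna().sum() gives the number of missing values per column.
missing_counts = dataframe.isna().sum()

# describe() shows count, mean, std, min, quartiles and max; a max far
# from the 75% quartile hints at an outlier.
summary = dataframe.describe()

# Assumed rule of thumb: flag values more than 1.5 * IQR outside the quartiles.
q1 = dataframe["age"].quantile(0.25)
q3 = dataframe["age"].quantile(0.75)
iqr = q3 - q1
outliers = dataframe[(dataframe["age"] < q1 - 1.5 * iqr) |
                     (dataframe["age"] > q3 + 1.5 * iqr)]
print(missing_counts)
print(outliers)
```

Whether to drop, fill, or keep the flagged rows is a judgment call that depends on the data dictionary and on what the subject matter experts say.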

2

Positive vs. Negative correlation

  • 1.00 is a perfect positive correlation. For example, age vs. age is 1.00 because a variable is 100% correlated with itself. 0.00 means there is no linear correlation.

  • cp vs. target is +0.43, so cp has a potential positive correlation with the target

    • As cp increases, the target value (has heart disease) tends to increase

  • exang vs. target is -0.44, so exang has a potential negative correlation with the target

    • As exang increases, the target value (has heart disease) tends to decrease; as exang decreases, the target tends to increase
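The sign of a correlation can be checked on a toy table. This is a sketch under stated assumptions: the values below are invented for illustration and will not reproduce the +0.43 and -0.44 from the card; only the column names cp, exang and target come from it.

```python
import pandas as pd

# Hypothetical toy values; only the column names come from the card.
dataframe = pd.DataFrame({
    "cp":     [0, 0, 1, 2, 2, 3, 0, 1],
    "exang":  [1, 1, 0, 0, 0, 1, 1, 0],
    "target": [0, 0, 1, 1, 1, 1, 0, 0],
})

# corr() uses Pearson correlation by default; selecting the "target"
# column gives each variable's correlation with the target.
corr_with_target = dataframe.corr()["target"]

# Positive sign: the variable and the target tend to rise together.
# Negative sign: the target tends to fall as the variable rises.
print(corr_with_target.sort_values(ascending=False))
```

A correlation only describes how two columns move together in the data; on its own it does not show that one causes the other.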

3

Explain ways to use EDA

  • dataframe.describe() shows summary statistics for each numeric column: count, mean, standard deviation, min, max

  • dataframe['target'].value_counts() shows whether the classes of the dependent variable are balanced

  • Finding questions to ask the SMEs (subject matter experts)

  • Using a correlation matrix, dataframe.corr(), to see how each variable is correlated with every other variable

  • Visualizing the correlation matrix as a heat map to see at a glance how variables relate to each other
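The calls listed on this card can be run together as follows. This is a minimal sketch on made-up values (only the column names age, cp and target come from the cards), and the `normalize=True` step is an optional extra that the card does not mention.

```python
import pandas as pd

# Hypothetical toy values for illustration only.
dataframe = pd.DataFrame({
    "age":    [63, 37, 41, 56, 57, 57, 56, 44],
    "cp":     [3, 2, 1, 1, 0, 0, 1, 1],
    "target": [1, 1, 1, 1, 0, 0, 1, 0],
})

# Summary statistics for every numeric column.
summary = dataframe.describe()

# Class counts for the dependent variable; similar counts per class
# suggest a balanced dataset.
class_counts = dataframe["target"].value_counts()

# Optional: the same counts expressed as proportions.
class_share = dataframe["target"].value_counts(normalize=True)

# Pairwise correlation of every column with every other column.
corr_matrix = dataframe.corr()

print(summary)
print(class_counts)
print(corr_matrix)
```

The resulting `corr_matrix` is a square DataFrame, so it can be handed to a plotting library's heat map function (for example `seaborn.heatmap`) to produce the visual described in the last bullet.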

4

Explain what a crosstab is used for

  • To compare a feature variable against the target variable

  • A crosstab compares two variables and puts them in a matrix

  • pd.crosstab(dataframe['target'], dataframe.sex)
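The crosstab call above can be tried on a toy table. This is a sketch with invented values; what the codes in sex and target stand for is defined by the data dictionary, which is not shown here.

```python
import pandas as pd

# Hypothetical toy values; the meaning of each code is an assumption
# left to the data dictionary.
dataframe = pd.DataFrame({
    "sex":    [1, 1, 0, 1, 0, 0, 1, 1],
    "target": [1, 0, 1, 0, 1, 1, 0, 1],
})

# Rows are target values, columns are sex values, and each cell counts
# the records with that combination.
table = pd.crosstab(dataframe["target"], dataframe["sex"])
print(table)
```

Each cell is a count of rows, so the cells add up to the number of rows in the dataframe.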
