Distributions and Histograms!

24 Terms

1

Scatter plot

numerical vs numerical

2

line plot

sequential numerical (time) vs numerical

3

bar chart

categorical vs numerical

4

histogram

distribution of numerical

5

What is the distribution of a variable?

“How often does a variable take on a certain value?”

Both categorical and numerical variables have this.

6

Categorical variables

Bar charts show the distribution of a categorical variable.
You can use .plot(kind="barh", y="Distance") ← y names the numerical column

Can also include .plot(kind=..., y=..., legend=False, xlabel="Count", title="Distribution of Exoplanet Types")

If you don't pass x, the index is used for the categories
legend=False hides the legend (by default one appears, usually in the top right, and it may or may not be useful)
xlabel= the label of the x-axis (here "Count", the numerical axis of a horizontal bar chart)
title= the title

figsize=(3, 10) takes a sequence of two values: the first is the width you want, the second is the height

Keep in mind, this is a horizontal bar chart, and the first row is drawn at the bottom. So if you sort with ascending=False, the bars read top to bottom in ascending order!
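
A minimal sketch of the horizontal bar chart described on this card, assuming plain pandas and matplotlib rather than any course-specific wrapper. The DataFrame, its `Type` and `Count` columns, and every number in it are made up for illustration. The x-axis label is set on the returned axes object, which is one way to be explicit about which axis gets the label.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical data: made-up counts of exoplanets per type.
types = pd.DataFrame(
    {"Type": ["Gas Giant", "Neptune-like", "Super Earth", "Terrestrial"],
     "Count": [1480, 1800, 1600, 190]}
).set_index("Type")

# barh draws the first row at the bottom, so sort ascending
# to end up with the largest bar at the top of the chart.
ordered = types.sort_values(by="Count", ascending=True)

ax = ordered.plot(
    kind="barh",
    y="Count",        # numerical column; the index supplies the categories
    legend=False,     # hide the legend
    title="Distribution of Exoplanet Types",
    figsize=(6, 3),   # (width, height)
)
ax.set_xlabel("Count")  # label the numerical (horizontal) axis
```

Sorting with ascending=True here is deliberate: it is the mirror image of the ascending=False surprise mentioned on the card.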

7

What does .describe() do? (Series method)

Returns a Series of summary statistics for a numerical Series: count, mean, std, min, quartiles, max.
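
A small sketch of what `.describe()` returns, assuming a plain pandas Series. The `distances` values are made up for illustration.

```python
import pandas as pd

# Hypothetical distances (made-up values).
distances = pd.Series([4.2, 10.5, 12.0, 39.5, 100.0])

summary = distances.describe()  # a Series indexed by statistic name
print(summary)

# Individual statistics are looked up by label.
print(summary["count"], summary["mean"], summary["max"])
```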

8

Can you represent the radius of exoplanets in a bar chart?

NO. Radius is numerical, so the horizontal axis should be a number line, not a set of categories. There should be more space between certain bars than others.

ex. a bar chart spaces its bars evenly, so a planet that is 80% larger than another would sit right next to it, as if the difference did not matter

Instead, use a density histogram

9

What is a density histogram (e.g. for radius)?

Looks like a bar chart, but the x-axis is a number line (rather than a set of categories!)

Also, the y-axis may say "Frequency", but the heights are densities, not counts. A bar at 2.8 on the y-axis does not mean there are 2.8 planets within that range; it is not telling us the count in that bin

10

What is binning?

Grouping nearby values into one bin. A bin [a, b) includes a but not b

This is the convention of binning: greater than or equal to the left endpoint and less than the right endpoint

Doesn’t distinguish between the individual values in a bin, just counts them together
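
A minimal sketch of the [a, b) rule in plain Python. The helper name `bin_index` and the sample radii are hypothetical. The sketch applies the half-open rule to every bin, so it does not model any special handling of the last bin's right endpoint.

```python
from bisect import bisect_right

def bin_index(value, edges):
    """Return the index of the [left, right) bin containing value, or None."""
    if value < edges[0] or value >= edges[-1]:
        return None  # outside every half-open bin
    return bisect_right(edges, value) - 1

edges = [0, 5, 10, 15]               # bins: [0, 5), [5, 10), [10, 15)
radii = [0, 4.9, 5, 9.99, 10, 15]    # made-up values near the endpoints
bins = [bin_index(r, edges) for r in radii]
print(bins)
```

A value equal to a left endpoint (5 or 10) lands in the bin that starts there, not the one that ends there.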

11

Plotting density histograms

df.plot(
    kind="hist",
    y=column_name,
    density=True,
)

ec="w" gives the bars a white EDGE COLOR so they are easier to tell apart

Requires ONLY ONE column (y, the numerical variable)

By default it uses 10 bins of equal width, some of which may be empty
You can change the bins by including the argument bins=#
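
A runnable sketch of the call on this card, assuming plain pandas and matplotlib. The `planets` DataFrame and its `Radius` column are randomly generated stand-ins, not real exoplanet data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical data: 200 made-up radii.
rng = np.random.default_rng(0)
planets = pd.DataFrame({"Radius": rng.exponential(scale=3.0, size=200)})

ax = planets.plot(
    kind="hist",
    y="Radius",     # the one numerical column to summarize
    density=True,   # bar heights are densities, not counts
    bins=20,        # 20 equal-width bins instead of the default 10
    ec="w",         # white edges between bars
)
ax.set_ylabel("Density")  # replace the default y-axis label

# Total area of the bars (width times height, summed) should be 1.
total_area = sum(p.get_width() * p.get_height() for p in ax.patches)
print(total_area)
```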

12

What does the bins=20 argument do?

creates 20 bins of equal width for your histogram

13

You can specify the exact starting and ending points of the bins. How?

Set bins= to a sequence, such as a list of all the endpoints you want to use.

bins=[]

14

What are the y-axis values for a density histogram?

Height of a bar = proportion of the values in that bin divided by the bar’s WIDTH (proportion per unit on the x-axis)
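
A short sketch of how the bar heights come out of the data, using NumPy. The `values` and the unequal `edges` are made up so that the division by width is visible.

```python
import numpy as np

values = np.array([1, 2, 2, 3, 6, 7, 8, 9, 14, 18])  # made-up data
edges = np.array([0, 5, 10, 20])                      # bin widths: 5, 5, 10

counts, _ = np.histogram(values, bins=edges)
widths = np.diff(edges)
proportions = counts / counts.sum()
heights = proportions / widths   # density = proportion per unit of x

# NumPy's own density option gives the same heights.
np_heights, _ = np.histogram(values, bins=edges, density=True)
print(heights, np_heights)
```

The last bin holds 20% of the values but is twice as wide, so its bar is only a quarter as tall as the others.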

15

Normally histogram bins are [a, b), but what about the last bin?

[a, b]

16

Do bins cut off values?

Yes, if your bins don’t cover the full range of the data, the values outside them get cut off

17

What does bins=np.arange(4) do?

This works! np.arange(4) gives the endpoints 0, 1, 2, 3, so it creates the bins [0, 1), [1, 2), [2, 3]
CUTS OFF anything above 3, including 4!
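
A quick check of this behavior with NumPy's own histogram function, assuming it follows the same bin convention as the plot. The `values` array is made up to sit on and around the endpoints.

```python
import numpy as np

edges = np.arange(4)                          # array([0, 1, 2, 3])
values = np.array([0, 0.5, 1, 2, 3, 3.5, 4])  # made-up data

counts, used_edges = np.histogram(values, bins=edges)
# [0, 1) holds 0 and 0.5; [1, 2) holds 1; [2, 3] holds 2 and 3.
# 3.5 and 4 lie above the last endpoint and are not counted.
dropped = len(values) - counts.sum()
print(counts, dropped)
```

To keep 4, the endpoints would need to reach at least 4, for example np.arange(5).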

18

Also, what is a histogram’s total area?

The total area of all the bars is 1, which explains the weird y-axis. This is a DENSITY histogram, so it makes sense!

19

Proportion vs percentage

proportion = 0-1, percentage 0%-100%

20

How do you find the area of a bar?

Calculate the width, then multiply it by the height. The area is the proportion of values in that bin.

Not an exact match b/c you’re estimating the height visually
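
The arithmetic as a tiny sketch. The helper name `bar_area` and the bar's endpoints and height are hypothetical, standing in for numbers read off a plot.

```python
def bar_area(left, right, height):
    """Area of one histogram bar: width times height."""
    return (right - left) * height

# Hypothetical bar read off a plot: spans 5 to 10, height about 0.08.
proportion = bar_area(left=5, right=10, height=0.08)
percentage = proportion * 100
print(round(proportion, 2), round(percentage))
```

So roughly 40% of the values would fall in that bin, up to the error in reading the height.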

21

The y-axis always says “Frequency”, but that’s wrong for a density histogram. How to fix?

You can use ylabel to fix it, but usually you don’t; you just recognize that it’s a density histogram and know how to interpret it.

22

How to make multiple plots on the same axes

You can .get([]) a list of several columns and display them at the same time!

Alternatively, if you omit y, you get ALL the columns displayed at the same time (if they can be, i.e. they are numerical!)
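
A sketch of both options, assuming plain pandas and matplotlib. The DataFrame and its `Radius` and `Mass` columns are randomly generated stand-ins, and `alpha` is added only so overlapping bars stay visible.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical data: two made-up numerical columns.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "Radius": rng.normal(10, 2, size=100),
    "Mass": rng.normal(12, 3, size=100),
})

# Option 1: pick the columns explicitly, then plot them together.
ax = df.get(["Radius", "Mass"]).plot(
    kind="hist", density=True, bins=15, ec="w", alpha=0.6
)

# Option 2: omit y, so every column of this all-numerical DataFrame is plotted.
ax_all = df.plot(kind="hist", density=True, bins=15, ec="w", alpha=0.6)

labels = [t.get_text() for t in ax.get_legend().get_texts()]
labels_all = [t.get_text() for t in ax_all.get_legend().get_texts()]
print(labels, labels_all)
```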
