Module 8 – Visualizing Variability

36 Terms

1

random variable

a quantity with values not known with certainty

2

Variation

difference in a variable measured over observations.

3

frequency distribution

describes the values of a variable and how often they appear in the data.

4

categorical variable

data consist of labels or names for which arithmetical manipulation is impossible.

5

quantitative variable

data consist of numerical values for which arithmetical manipulation is possible.

6

Ways to make a frequency distribution in Excel

Pivot table
COUNTIF function

7

relative frequency of a bin

equals the proportion of items belonging to that bin

Relative Freq. of a bin = Freq. of the bin / n, where n is the total number of observations

8

percent frequency of a bin

the relative frequency times 100.

Percent Freq. of a bin = 100 ∙ Relative Freq. of a bin
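
The frequency, relative frequency, and percent frequency formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the course's Excel method: each distinct value is treated as its own bin, and the function name `frequency_table` and the sample data are made up for the example.

```python
from collections import Counter

def frequency_table(values):
    """Return {value: (frequency, relative frequency, percent frequency)}."""
    n = len(values)               # total number of observations
    counts = Counter(values)      # frequency of each bin (here: each distinct value)
    table = {}
    for value, freq in counts.items():
        rel = freq / n            # relative frequency = frequency / n
        table[value] = (freq, rel, 100 * rel)  # percent frequency = 100 * relative
    return table

# Hypothetical sample: payment method recorded for 8 orders
orders = ["card", "cash", "card", "card", "app", "cash", "card", "app"]
table = frequency_table(orders)
for value, (freq, rel, pct) in table.items():
    print(f"{value:5s} freq={freq} rel={rel:.3f} pct={pct:.1f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick sanity check on any frequency table.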

9

A probability distribution

characterizes the variability of a random variable.

10

Benford’s Law

states that in many data sets, the proportion of observations whose first digit is d (d = 1 to 9) is approximately log10(1 + 1/d): about 30.1% for the digit 1, falling to about 4.6% for the digit 9.
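
Benford's Law is usually written as P(d) = log10(1 + 1/d) for a first digit d. The sketch below computes those proportions and compares them with the first digits of the powers of 2, a sequence commonly used to illustrate the law. The helper names and the choice of test data are assumptions made for this example.

```python
import math
from collections import Counter

def benford_proportion(d):
    """Expected proportion of values whose first digit is d (1 to 9)."""
    return math.log10(1 + 1 / d)

def first_digit(n):
    """First digit of a positive integer."""
    return int(str(n)[0])

# Illustrative data set: the first 1000 powers of 2
values = [2 ** k for k in range(1000)]
counts = Counter(first_digit(v) for v in values)
observed = {d: counts[d] / len(values) for d in range(1, 10)}

for d in range(1, 10):
    print(f"digit {d}: expected {benford_proportion(d):.3f}, observed {observed[d]:.3f}")
```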

11

A histogram

column chart with no spaces between the columns.

12

Kernel density chart

is a continuous alternative to a histogram.

It employs a smoothing technique known as kernel density estimation.

Not available in Excel.
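
Since kernel density estimation is not available in Excel, a small Python sketch may help show what the smoothing does. It assumes a Gaussian kernel and a hand-picked bandwidth of 0.5; both are illustrative choices, as are the function name `gaussian_kde` and the sample values.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function estimating the density at x with a Gaussian kernel."""
    n = len(data)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Each observation contributes a small bell curve centered on itself
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data) / norm

    return density

# Hypothetical sample of 7 measurements
sample = [2.1, 2.4, 2.9, 3.0, 3.2, 4.8, 5.1]
density = gaussian_kde(sample, bandwidth=0.5)

# Evaluate the smooth curve on a grid (these points would be plotted as a line)
for x in [1, 2, 3, 4, 5, 6]:
    print(f"x={x}: density={density(x):.3f}")
```

A larger bandwidth gives a smoother curve and a smaller one follows the data more closely, much like choosing wider or narrower bins in a histogram.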

13

Skewness

represents the lack of symmetry in a quantitative distribution.

14

skewed left

the left tail extends farther than the right one (example: exam scores)

15

symmetric histogram

the two tails mirror each other

16

skewed right histogram

the right tail extends farther than the left one (example: housing prices)

17

a highly skewed histogram

one of the tails extends much farther than the other one (example: data on wealth and salaries are usually highly skewed right)
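
The direction of skew described in the cards above can also be checked numerically. The sketch below uses the moment coefficient of skewness (third central moment divided by the 1.5 power of the second), one common measure among several; it is negative for left skew, near zero for symmetry, and positive for right skew. All data values are invented for illustration.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets
exam_scores = [35, 70, 78, 82, 85, 88, 90, 92]               # long left tail
symmetric = [1, 2, 3, 4, 5]                                  # tails mirror each other
house_prices = [180, 200, 210, 220, 240, 260, 300, 950]      # long right tail (in $1000s)

for name, data in [("exam scores", exam_scores),
                   ("symmetric", symmetric),
                   ("house prices", house_prices)]:
    print(f"{name}: skewness = {skewness(data):.2f}")
```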

18

frequency polygon

is a visualization tool useful for comparing distributions

19

Like a histogram, a frequency polygon plots

the count of observations in a set of bins

20

Unlike a histogram, a frequency polygon

uses lines instead of columns to connect the counts of different bins.

21

In Excel, use the ____ option to create a frequency polygon.

Line Chart

22

trellis display

a vertical or horizontal arrangement of individual charts of the same type, size, scale, and formatting that differ only by the data they display.

23

A trellis display can be useful

when comparing three or more distributions that would otherwise appear cluttered if plotted using several frequency polygons on the same chart.

24

Using the standard deviation to describe variability (empirical rule, for roughly bell-shaped distributions):

≈ 68% of data values lie within one standard deviation of the mean

≈ 95% of data values lie within two standard deviations of the mean

≈ 99.7% of data values lie within three standard deviations of the mean
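
The three percentages above can be checked with a quick simulation. The sketch assumes normally distributed data (mean 50, standard deviation 10, both arbitrary) generated with a fixed random seed; the rule is not expected to hold for strongly skewed data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable
data = [random.gauss(50, 10) for _ in range(10_000)]  # simulated bell-shaped data

mean = statistics.mean(data)
sd = statistics.pstdev(data)

def share_within(k):
    """Proportion of values within k standard deviations of the mean."""
    inside = sum(1 for x in data if abs(x - mean) <= k * sd)
    return inside / len(data)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} sd: {share:.1%}")
```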

25

Violin chart

advanced visualization that combines the statistical descriptors of a box and whisker chart with a rotated and mirrored kernel density chart.

26

Statistical inference

is the process of collecting sample data to make estimates of or draw conclusions about one or more characteristics of a population.

27

confidence interval

a range of plausible values for a population parameter, such as the mean or the proportion of a population of interest, reported at a given confidence level.

28

margin of error

represents the uncertainty in the parameter estimate at a given confidence level, such as 95% or 99%.

29

confidence interval on a mean:

sample mean ± margin of error

30

confidence interval on a proportion:

sample proportion ± margin of error
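
As a worked example of "sample proportion ± margin of error", the sketch below uses the normal-approximation formula, margin = z * sqrt(p(1 - p) / n). That formula is an assumption brought in for illustration (the cards do not specify one), and the survey numbers are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval on a proportion."""
    p_hat = successes / n                             # sample proportion
    z = NormalDist().inv_cdf(0.5 + confidence / 2)    # about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)   # margin of error
    return p_hat - margin, p_hat + margin, margin

# Hypothetical survey: 120 of 400 respondents answered "yes"
low, high, margin = proportion_interval(120, 400)
print(f"0.300 ± {margin:.3f}  ->  ({low:.3f}, {high:.3f})")
```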

31

The margin of error for a confidence interval on a mean depends on three factors:

1) The confidence level

2) The variability of sample values (the standard deviation)

3) The sample size
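
The effect of each of the three factors can be seen by varying them one at a time. This sketch assumes the large-sample formula, margin = z * s / sqrt(n), with a z critical value; for small samples a t critical value is typically used instead. The function name and the numbers are illustrative.

```python
import math
from statistics import NormalDist

def margin_of_error_mean(confidence, sd, n):
    """Large-sample margin of error for a confidence interval on a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # grows with the confidence level
    return z * sd / math.sqrt(n)                    # grows with sd, shrinks with n

base = margin_of_error_mean(0.95, sd=10, n=100)
print(f"baseline (95%, sd=10, n=100): {base:.2f}")
print(f"higher confidence (99%):      {margin_of_error_mean(0.99, 10, 100):.2f}")
print(f"more variability (sd=20):     {margin_of_error_mean(0.95, 20, 100):.2f}")
print(f"larger sample (n=400):        {margin_of_error_mean(0.95, 10, 400):.2f}")
```

Quadrupling the sample size halves the margin of error, because the sample size enters through its square root.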

32

Time series data

is a sequence of observations on a variable measured at successive points in time.

33

A time series chart

is a line chart with the time unit displayed on the horizontal axis and the values of the variable on the vertical axis.

34

percent frequency distribution

probability distribution

35

in a histogram, column height represents the

frequency of the corresponding bin

36

in a histogram, the absence of space between columns

reflects the continuous nature of the variable of interest