Unit 1: Exploring One-Variable Data

118 Terms

1

What is statistics?

the science and art of collecting, analyzing, and drawing conclusions from data

2

What is data analysis?

the process of organizing, displaying, summarizing, and asking questions about data

3

What are individuals?

objects described in sets of data

4

What are variables?

attributes that can take different values for different individuals

5

What does a categorical variable do?

assigns labels that place each individual into a particular group, called a category

6

What does a quantitative variable do?

takes numerical values that are quantities, such as counts or measurements

7

How do you tell whether a variable is categorical or quantitative?

if it makes sense to average the values, the variable is quantitative; if it doesn’t, the variable is categorical

8

What are examples of a categorical variable?

color, type, phone number, ID number

9

What are examples of a quantitative variable?

age, money, minutes, miles

10

What are we interested in when looking at variables?

the pattern of variation

11

What is the distribution of a variable?

it tells us what values the variable takes and how often it takes those values

12

What should you do when analyzing data?

  • examine each variable by itself, then study the relationships among the variables

  • start with a graph, then add numerical summaries

13

What are descriptive statistics?

the process of exploratory data analysis: describing the data at hand with graphs and numerical summaries

14

What are inferential statistics?

the process of drawing conclusions that go beyond the data at hand

15

What types of graphs are useful for displaying the distribution of a categorical variable?

bar graphs and pie charts

16

How do bar graphs work?

they compare several quantities by comparing the heights of bars that represent those quantities

17

Why should you draw the bars of a bar graph equally wide?

because our eyes react to the width of the bars as well as to their heights

18

What should you keep in mind when making or reading graphs?

  • beware pictographs

  • watch the scales

19

When is it inappropriate to use a pie chart?

when the categories don’t make up parts of a single whole, such as when the data come from different variables

20

What is a two-way table?

a table of counts that summarizes data on the relationship between two categorical variables for some group, with the counts organized by row and column categories

21

What does a marginal distribution do?

it describes the distribution of one categorical variable among all individuals in the table, giving the percent or proportion of individuals in each category

22

How do you examine a marginal distribution?

  1. use the data in the table to calculate the marginal distribution (in percentages) of the row or column totals

  2. make a graph to display the marginal distribution
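A minimal Python sketch of step 1, assuming the two-way table is stored as a nested dictionary (rows first, then columns). The table, its category names, and the function `marginal_distribution` are hypothetical, made up for illustration.

```python
# Hypothetical two-way table: rows = grade level, columns = preferred snack.
# The counts are invented for illustration only.
table = {
    "junior": {"chips": 30, "fruit": 20},
    "senior": {"chips": 10, "fruit": 40},
}

def marginal_distribution(table, by="row"):
    """Return the percent of ALL individuals in each row (or column) category."""
    totals = {}
    for row, cols in table.items():
        for col, count in cols.items():
            key = row if by == "row" else col
            totals[key] = totals.get(key, 0) + count   # row or column totals
    grand_total = sum(totals.values())
    return {key: 100 * count / grand_total for key, count in totals.items()}

print(marginal_distribution(table, by="row"))     # {'junior': 50.0, 'senior': 50.0}
print(marginal_distribution(table, by="column"))  # {'chips': 40.0, 'fruit': 60.0}
```

The percentages from step 1 are what a bar graph or pie chart in step 2 would display.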

23

What does a conditional distribution do?

it describes the values of one variable among individuals who have a specific value of another variable

24

How do you examine or compare conditional distributions?

  1. select the row(s) or column(s) of interest

  2. use the data in the table to calculate the conditional distribution (in percentages) of the row(s) or column(s)

  3. make a graph to display the conditional distribution

    • use a side-by-side bar graph or segmented bar graph to compare distributions
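A minimal Python sketch of step 2, assuming a two-way table stored as a nested dictionary. The counts and the function name `conditional_distribution` are invented for illustration.

```python
# Hypothetical two-way table: rows = grade level, columns = preferred snack.
table = {
    "junior": {"chips": 30, "fruit": 20},
    "senior": {"chips": 10, "fruit": 40},
}

def conditional_distribution(table, row):
    """Percent of individuals in each column category, among those in `row` only."""
    counts = table[row]
    row_total = sum(counts.values())          # divide by the ROW total, not the grand total
    return {col: 100 * count / row_total for col, count in counts.items()}

for grade in table:
    print(grade, conditional_distribution(table, grade))
# junior {'chips': 60.0, 'fruit': 40.0}
# senior {'chips': 20.0, 'fruit': 80.0}
```

Because the conditional distributions of snack choice differ for juniors and seniors, this made-up table would suggest an association between grade and snack choice.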

25

What is marginal relative frequency?

the percent or proportion of individuals that have a specific value for one categorical variable

26

What is joint relative frequency?

the percent or proportion of individuals that have a specific value for one categorical variable and a specific value for another one

27

What is conditional relative frequency?

the percent or proportion of individuals that have a specific value for one categorical variable among individuals who share the same value for another categorical variable

28

What is a side-by-side bar graph?

it displays the distribution of a categorical variable for each value of another categorical variable; bars are grouped together based on the values of one categorical variable and placed side by side

29

What is a segmented bar graph?

it displays the distribution of a categorical variable as segments of a rectangle, with the area of each segment proportional to the percent of individuals in the corresponding category

30

When does an association occur?

when knowing the value of one variable helps us predict the value of the other

31

What is a mosaic plot?

a modified segmented bar graph in which the width of each rectangle is proportional to the number of individuals in the corresponding category

32

How does a dot plot display data?

it shows each value as a dot above its location on a number line

33

How do you make a dot plot?

  1. Draw a horizontal axis and label it with the name of the quantitative variable

  2. Scale the axis from the minimum to the maximum value

  3. Mark a dot above the location on the horizontal axis corresponding to each data value
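As a rough illustration, the steps above can be mimicked with a text dot plot in Python. This sketch assumes integer-valued data; `text_dot_plot`, its `label` parameter, and the sample values are hypothetical.

```python
from collections import Counter

def text_dot_plot(values, label):
    """Return a text dot plot of integer data: stacked dots above a horizontal axis."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)          # scale from the minimum to the maximum
    width = max(len(str(v)) for v in axis) + 1          # characters per axis position
    rows = []
    for level in range(max(counts.values()), 0, -1):    # top row of dots first
        rows.append("".join(("•" if counts[v] >= level else " ").rjust(width) for v in axis))
    rows.append("".join(str(v).rjust(width) for v in axis))  # axis values
    rows.append(label)                                   # axis label (the variable's name)
    return "\n".join(rows)

print(text_dot_plot([2, 3, 3, 4, 4, 4, 6], label="Number of pets"))
```

Each observation produces exactly one dot, so the tallest stack marks the most common value.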

34

What should you always ask after making a graph?

“what do I see?”

35

When is a distribution roughly symmetric?

if the right and left sides of the graph are approximately mirror images of each other

36

When is a distribution skewed to the right?

if the right side of the graph is much longer than the left side

37

When is a distribution skewed to the left?

if the left side of the graph is much longer than the right side

38

In which direction is a skewed distribution skewed?

the long tail

39

When is the distribution of a quantitative variable unimodal?

if it has a single peak

40

When is the distribution of a quantitative variable bimodal?

if it has two distinct clusters and peaks

41

When is the distribution of a quantitative variable approximately uniform?

if the frequencies are about the same for all values

42

What do we look for in any graph?

the overall pattern and any clear departures from that pattern

43

How do we describe the overall pattern of a distribution?

by its:

  • shape

  • center

  • variability

44

What do we call an important kind of departure from the overall pattern of a distribution?

outlier

45

What is it important to remember when comparing distributions?

to give context and use comparative language

46

How do you make a stemplot?

  1. Separate each observation into a stem (all but the final digit) and a leaf (the final digit)

  2. Write the stems in a vertical column with the smallest at the top. Draw a vertical line to the right of the column

  3. Write each leaf in the row to the right of the stem

  4. Arrange the leaves in increasing order out from the stem

  5. Provide a key that identifies the variable and explains what the stems and leaves represent
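A small Python sketch of these five steps, assuming non-negative whole-number data so that the stem is `v // 10` and the leaf is `v % 10`. The function name, the data, and the key text are hypothetical. Empty stems in the middle are kept so that gaps in the data stay visible.

```python
from collections import defaultdict

def stemplot(values, key):
    """Return a stemplot of non-negative integers as text, one stem per line."""
    leaves = defaultdict(list)
    for v in values:
        leaves[v // 10].append(v % 10)        # step 1: stem = all but last digit, leaf = last digit
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):   # step 2: smallest stem on top, none skipped
        row = "".join(str(leaf) for leaf in sorted(leaves[stem]))  # steps 3-4: leaves in increasing order
        lines.append(f"{stem:>2} | {row}")
    lines.append(f"Key: {key}")               # step 5: explain what stems and leaves represent
    return "\n".join(lines)

print(stemplot([23, 35, 31, 42, 27, 38, 20, 45], key="3 | 5 = 35 years"))
```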

47

How can we get a better picture of a distribution with “bunched up” data values?

by splitting stems

48

How can we compare two distributions of the same quantitative variable?

by using a back-to-back stemplot

49

How does a histogram display data?

it shows each interval of values as a bar, with the heights of the bars showing the frequencies or relative frequencies of values in each interval

50

How do you make a histogram?

  1. Choose equal-width intervals that span the data

  2. Make a table that shows the frequency or relative frequency of individuals in each interval

  3. Draw horizontal and vertical axes. Label the axes

  4. Scale the axes

  5. Draw bars above the intervals. The bar heights correspond to the frequency or relative frequency of individuals in that interval
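A minimal sketch of steps 1, 2, and 5 in Python, printing bars as rows of `#` instead of drawing axes. It assumes integer data and a starting value at or below the minimum; `histogram_table` and the data are hypothetical. Each interval includes its left endpoint but not its right.

```python
def histogram_table(values, start, width):
    """Frequency and relative frequency of integer data in equal-width intervals [lo, lo + width)."""
    n_bins = (max(values) - start) // width + 1    # enough intervals to span the data (start <= min)
    n = len(values)
    table = []
    for i in range(n_bins):
        lo, hi = start + i * width, start + (i + 1) * width
        count = sum(lo <= v < hi for v in values)   # left endpoint included, right excluded
        table.append((lo, hi, count, count / n))
    return table

data = [2, 5, 7, 11, 12, 14, 15, 18, 21, 29]        # hypothetical observations
for lo, hi, count, rel in histogram_table(data, start=0, width=10):
    print(f"[{lo:>2}, {hi:>2})  freq={count}  rel={rel:.2f}  " + "#" * count)
```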

51

What is the most common measure of center?

mean

52

How do you find the mean?

by adding all values in a set of observations and then dividing that sum by the number of observations
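In Python, this definition translates directly; the function name and the values are only an example.

```python
def mean(values):
    """Add all observations, then divide by the number of observations."""
    return sum(values) / len(values)

print(mean([4, 8, 15, 16, 23, 42]))  # 18.0
```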

53

What is the median of a distribution?

the midpoint of the distribution: the value with about half the observations below it and about half above it

54

What does the symbol x̄ represent?

the mean of a sample

55

What does the symbol μ represent?

the mean of a population

56

What is a statistic?

a number that describes some characteristic of a sample

57

What is a parameter?

a number that describes some characteristic of a population

58

When is a statistical measure resistant?

if it isn’t sensitive to extreme values

59

How do you find the median of a distribution?

  1. Arrange all observations from smallest to largest

  2. If the number of observations n is odd, the median is the middle observation in the ordered list

  3. If the number of observations n is even, the median is the average of the two center observations in the ordered list
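The same three steps as a Python sketch (the function name and data are illustrative):

```python
def median(values):
    ordered = sorted(values)                       # step 1: smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # step 2: odd n, middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2   # step 3: even n, average of the two centers

print(median([7, 1, 5]))      # 5
print(median([7, 1, 5, 10]))  # 6.0
```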

60

When are the mean and median of a distribution similar?

if the distribution is roughly symmetric and has no outliers

61

How does the skewness of a distribution affect its mean and median?

if the distribution is strongly skewed, the mean is pulled in the direction of the skew (toward the long tail), but the median isn’t

62

How do the mean and median react to outliers?

the median is resistant to outliers but the mean isn’t
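A quick check with Python's standard `statistics` module illustrates this. The values are hypothetical: replacing the largest value with an extreme one drags the mean far upward while the median stays put.

```python
import statistics

incomes = [40, 42, 45, 47, 50]          # hypothetical values, in thousands of dollars
with_outlier = incomes[:-1] + [500]     # replace the largest value with an extreme one

print(statistics.mean(incomes), statistics.median(incomes))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
# The mean jumps from 44.8 to 134.8; the median stays at 45.
```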

63

What is the range of a distribution?

the distance between the minimum value and the maximum value

64

Is range a resistant measure of variability?

no

65

What does standard deviation measure?

the typical distance of the values in a distribution from the mean

66

How do you calculate standard deviation?

  1. Find the mean of the distribution

  2. Calculate the deviation of each value from the mean

  3. Square each of the deviations

  4. Add all the squared deviations and divide by n - 1

  5. The result is the sample variance

  6. Take the square root of the variance to get the standard deviation
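The six steps written out as a Python sketch (the function name and data are illustrative). It should agree with `statistics.stdev` from the standard library, which also divides by n - 1.

```python
import math

def sample_sd(values):
    n = len(values)
    x_bar = sum(values) / n                       # step 1: mean
    deviations = [x - x_bar for x in values]      # step 2: deviation of each value
    squared = [d ** 2 for d in deviations]        # step 3: square each deviation
    variance = sum(squared) / (n - 1)             # steps 4-5: sample variance
    return math.sqrt(variance)                    # step 6: square root

print(sample_sd([2, 4, 4, 4, 5, 5, 7, 9]))        # sqrt(32 / 7), about 2.138
```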

67

What is the formula for standard deviation?

$s_x = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$

68

What is the variance?

the square of the standard deviation, i.e., the value you have before taking the square root

69

What is standard deviation always greater than or equal to?

0

70

What do larger values of standard deviation indicate?

greater variation

71

Is standard deviation a resistant measure of variability?

no

72

What do the quartiles of a distribution do?

divide the ordered data set into four groups having roughly the same number of values

73

How do you find the quartiles of a distribution?

arrange the data values from smallest to greatest and find the median; Q1 is the median of the values below the median, and Q3 is the median of the values above it

74

What is the first quartile Q1 of a distribution?

the median of the data values that are to the left of the median in the ordered list

75

What is the third quartile Q3 of a distribution?

the median of the data values that are to the right of the median in the ordered list

76

What is the interquartile range (IQR)?

the distance between the first and third quartiles of a distribution

IQR = Q3 - Q1
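A Python sketch of the quartile method described in these cards: Q1 and Q3 are the medians of the lower and upper halves, and when n is odd the overall median is left out of both halves. The function names and data are illustrative, and the sketch assumes at least two values. Calculators and software packages may use other quartile conventions and report slightly different values.

```python
def median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]            # values to the left of the median
    upper = ordered[(n + 1) // 2 :]      # values to the right of the median
    return median(lower), median(upper)

def iqr(values):
    q1, q3 = quartiles(values)
    return q3 - q1

print(quartiles([13, 1, 9, 3, 11, 5, 7]))   # (3, 11)
print(iqr([1, 2, 3, 4, 5, 6, 7, 8]))        # 4.0
```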

77

What is the rule for outliers?

an observation is an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile

low outliers < Q1 - 1.5 x IQR | high outliers > Q3 + 1.5 x IQR
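A minimal sketch of the 1.5 x IQR rule in Python. For simplicity it takes Q1 and Q3 as inputs; for the hypothetical data below (11 values), Q1 = 18 and Q3 = 33 when found as medians of the lower and upper halves. The function names are illustrative.

```python
def outlier_fences(q1, q3):
    """Return (low fence, high fence) for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]

data = [2, 15, 18, 20, 23, 25, 27, 30, 33, 40, 60]   # Q1 = 18, Q3 = 33 for this list
print(outlier_fences(18, 33))        # (-4.5, 55.5)
print(find_outliers(data, 18, 33))   # [60]
```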

78

Why do we look for outliers?

  • they might be inaccurate data values

  • they can indicate a remarkable occurrence

  • they can heavily influence the values of some summary statistics, like the mean, range, and standard deviation

79

What does the five-number summary of a distribution consist of?

the minimum, the first quartile Q1, the median, the third quartile Q3, and the maximum

80

What is a boxplot?

a visual representation of the five-number summary

81

How do you make a boxplot?

  1. Find the five-number summary

  2. Identify the outliers using the 1.5 x IQR rule

  3. Draw and label the horizontal axis

  4. Scale the axis

  5. Draw a box (from the first quartile to the third quartile)

  6. Mark the median

  7. Draw whiskers from the box to the smallest and largest values that are not outliers

  8. Outliers are marked with a special symbol such as an asterisk
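A sketch of the whisker rule in Python, using hypothetical data for which Q1 = 18 and Q3 = 33: the whiskers end at the most extreme values that are not outliers, and anything beyond the fences is marked separately. The function name `boxplot_parts` is illustrative.

```python
def boxplot_parts(values, q1, q3):
    """Return (lower whisker end, upper whisker end, outliers) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in values if low_fence <= x <= high_fence]
    outliers = [x for x in values if x < low_fence or x > high_fence]
    return min(inside), max(inside), outliers

data = [2, 15, 18, 20, 23, 25, 27, 30, 33, 40, 60]   # Q1 = 18, Q3 = 33 for this list
print(boxplot_parts(data, 18, 33))   # (2, 40, [60])
```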

82

What is a percentile used for?

to describe the location of a value in a distribution

83

How do you find the percentile of a value?

count the number of values less than or equal to it, then divide by the total number of values
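A minimal Python version of this rule (the function name and scores are hypothetical). Some textbooks count only the values strictly less than the given value, which gives a slightly different percentile.

```python
def percentile_of(value, values):
    """Percent of observations less than or equal to `value`."""
    count = sum(1 for x in values if x <= value)
    return 100 * count / len(values)

scores = [55, 62, 70, 71, 78, 80, 84, 88, 91, 97]
print(percentile_of(84, scores))  # 70.0
```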

84

What is a cumulative relative frequency graph?

a graph that plots a point corresponding to the percentile of a given value in a distribution of quantitative data and connects consecutive points using line segments

85
[Image not preserved: an example cumulative relative frequency graph]

cumulative relative frequency graph

86

What does a z-score tell us?

how many standard deviations from the mean an observation falls and in what direction

87
$z = \dfrac{\text{value} - \text{mean}}{\text{standard deviation}}$

formula for z-score
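A tiny Python sketch of the z-score formula; the function name and the test-score numbers (mean 80, standard deviation 4) are made up.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(86, 80, 4))   # 1.5  -> 1.5 standard deviations above the mean
print(z_score(74, 80, 4))   # -1.5 -> 1.5 standard deviations below the mean
```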

88

What is a standardized score often called?

z-score

89

What does transforming data do?

  • converts the observations from their original units of measurement to another scale

  • can affect the shape, center, and variability of a distribution

90

What are the effects of adding/subtracting a constant to/from each value in a distribution?

adding/subtracting the same positive number a to/from each observation:

  • adds/subtracts a to/from measures of center and location (mean, five-number summary, percentiles)

  • does not change measures of variability (range, IQR, standard deviation)

  • does not change the shape

91

What are the effects of multiplying/dividing each value in a distribution by a constant?

multiplying/dividing each observation by the same positive number b:

  • multiplies/divides measures of center and location (mean, five-number summary, percentiles) by b

  • multiplies/divides measures of variability (range, IQR, standard deviation) by b

  • does not change the shape
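A quick Python check of both transformation rules using the standard `statistics` module. The data are hypothetical; with a = 5 and b = 2, the mean and median shift or scale accordingly, while the range and standard deviation are unchanged by adding but doubled by multiplying.

```python
import statistics

data = [10, 12, 15, 19, 24]          # hypothetical measurements
shifted = [x + 5 for x in data]      # add a = 5 to every observation
scaled = [2 * x for x in data]       # multiply every observation by b = 2

for name, d in [("original", data), ("plus 5", shifted), ("times 2", scaled)]:
    spread = max(d) - min(d)         # range
    print(f"{name:>8}: mean={statistics.mean(d)}, median={statistics.median(d)}, "
          f"range={spread}, sd={statistics.stdev(d):.3f}")
```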

92

What is a density curve?

a curve that

  • is always on or above the horizontal axis

  • has an area of exactly 1 underneath it

93

What does a density curve describe?

the overall pattern of a distribution

94

What does the area under the density curve and above any interval of values on the horizontal axis estimate?

the proportion of all observations that fall in that interval
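A numerical sketch in Python, assuming a hypothetical density curve f(x) = 2x on the interval from 0 to 1 (a triangle with area 1 that never dips below the axis). The midpoint-rule helper `area` is an illustration, not a standard library function.

```python
def density(x):
    """Hypothetical density curve: f(x) = 2x on [0, 1], 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area(f, a, b, steps=100_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

print(round(area(density, 0, 1), 4))     # 1.0  (total area under a density curve)
print(round(area(density, 0, 0.5), 4))   # 0.25 (about 25% of observations fall in [0, 0.5])
```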

95
[Image not preserved: an example density curve]

density curve

96

What is the mean of a density curve?

the point at which the curve would balance if made of solid material

97

What is the median of a density curve?

the equal-areas point, the point that divides the area under the curve in half
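For a concrete case, take the hypothetical density curve f(x) = 2x on [0, 1]. Its median follows from triangle area alone, while its balance point (mean) is computed here with an integral, which goes beyond what the course requires:

```latex
\text{Median } m:\quad \text{area to the left of } m = \tfrac{1}{2}\cdot m \cdot 2m = m^2 = \tfrac{1}{2}
\;\Rightarrow\; m = \sqrt{0.5} \approx 0.707

\text{Mean:}\quad \mu = \int_0^1 x \cdot 2x \, dx = \left[\tfrac{2x^3}{3}\right]_0^1 = \tfrac{2}{3} \approx 0.667
```

This curve is skewed to the left (its long tail stretches toward 0), so the mean is pulled toward the tail and lies below the median.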

98
[Image not preserved: a symmetric density curve with its mean and median marked]

mean and median of a symmetric curve: they are equal, at the center of the curve

99
[Image not preserved: a right-skewed density curve with its mean and median marked]

mean and median of a right-skewed curve: the mean is pulled toward the long right tail, so it lies to the right of the median

100

What is a density curve an idealized description of?

a distribution of data