Unit 1: Exploring One-Variable Data

1

What is statistics?

the science and art of collecting, analyzing, and drawing conclusions from data

New cards
2

What is data analysis?

the process of organizing, displaying, summarizing, and asking questions about data

New cards
3

What are individuals?

objects described in sets of data

New cards
4

What are variables?

attributes that can take different values for different individuals

New cards
5

What does a categorical variable do?

assigns labels that place each individual into a particular group, called a category

New cards
6

What does a quantitative variable do?

takes number values that are quantities

New cards
7

How do you tell whether a variable is categorical or quantitative?

if it makes sense to take the average of the variable’s values, it’s quantitative; if it doesn’t, it’s categorical (numbers used only as labels, such as ID numbers, are categorical)

New cards
8

What are examples of a categorical variable?

color, type, phone number, ID number

New cards
9

What are examples of a quantitative variable?

age, money, minutes, miles

New cards
10

What are we interested in when looking at variables?

the pattern of variation

New cards
11

What is the distribution of a variable?

it tells us what values the variable takes and how often it takes those values

New cards
12

What should you do when analyzing data?

  • examine each variable by itself, then study the relationships among the variables

  • start with a graph, then add numerical summaries

New cards
13

What are descriptive statistics?

the process of exploratory data analysis: describing and summarizing the data at hand

New cards
14

What are inferential statistics?

the process of drawing conclusions that go beyond the data at hand

New cards
15

What types of graphs are useful when analyzing the distribution of a categorical variable?

bar graphs and pie charts

New cards
16

How do bar graphs work?

they compare several quantities by comparing the heights of bars that represent those quantities

New cards
17

Why should you draw the bars of a bar graph equally wide?

because our eyes react to the widths of the bars (and so to their areas) as well as to their heights

New cards
18

What should you keep in mind when analyzing data?

  • beware pictographs

  • watch the scales

New cards
19

When is it inappropriate to use a pie chart?

when the categories are not parts of a single whole, for example when the data come from different variables

New cards
20

What is a two-way table?

a table of counts that summarizes data on the relationship between 2 categorical variables for some group, organizing counts according to a row and a column

New cards
21

What does a marginal distribution do?

it gives the percent or proportion of individuals that have a specific value for one categorical variable

New cards
22

How do you examine a marginal distribution?

  1. use the data in the table to calculate the marginal distribution (in percentages) of the row or column totals

  2. make a graph to display the marginal distribution

New cards
23

What does a conditional distribution do?

it describes the values of one variable among individuals who have a specific value of another variable

New cards
24

How do you examine or compare conditional distributions?

  1. select the row(s) or column(s) of interest

  2. use the data in the table to calculate the conditional distribution (in percentages) of the row(s) or column(s)

  3. make a graph to display the conditional distribution

    • use a side-by-side bar graph or segmented bar graph to compare distributions
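
The marginal and conditional distribution steps above can be sketched in Python. The two-way table below is made up for illustration (the row and column labels and all counts are assumptions, not data from these cards), and the function names are hypothetical.

```python
# Hypothetical two-way table of counts: rows are one categorical variable,
# columns are another. All counts are invented for illustration.
table = {
    "female": {"fly": 10, "invisible": 30},
    "male": {"fly": 45, "invisible": 15},
}

def marginal_distribution(table):
    """Percent of all individuals in each column category (column total / grand total)."""
    grand_total = sum(sum(row.values()) for row in table.values())
    totals = {}
    for row in table.values():
        for category, count in row.items():
            totals[category] = totals.get(category, 0) + count
    return {category: 100 * total / grand_total for category, total in totals.items()}

def conditional_distribution(table, row_label):
    """Percent in each column category among the individuals in one row."""
    row = table[row_label]
    row_total = sum(row.values())
    return {category: 100 * count / row_total for category, count in row.items()}

print(marginal_distribution(table))               # {'fly': 55.0, 'invisible': 45.0}
print(conditional_distribution(table, "female"))  # {'fly': 25.0, 'invisible': 75.0}
print(conditional_distribution(table, "male"))    # {'fly': 75.0, 'invisible': 25.0}
```

In this made-up table the two conditional distributions differ, so knowing the row category would help predict the column category.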

New cards
25

What is marginal relative frequency?

the percent or proportion of individuals that have a specific value for one categorical variable

New cards
26

What is joint relative frequency?

the percent or proportion of individuals that have a specific value for one categorical variable and a specific value for another one

New cards
27

What is conditional relative frequency?

the percent or proportion of individuals that have a specific value for one categorical variable among individuals who share the same value for another categorical variable

New cards
28

What is a side-by-side bar graph?

it displays the distribution of a categorical variable for each value of another categorical variable; bars are grouped together based on the values of one of the categorical variables and placed side by side

New cards
29

What is a segmented bar graph?

it displays the distribution of a categorical variable as segments of a rectangle, with the area of each segment proportional to the percent of individuals in the corresponding category

New cards
30

When does an association occur?

when knowing the value of one variable helps us predict the value of the other

New cards
31

What is a mosaic plot?

a modified segmented bar graph in which the width of each rectangle is proportional to the number of individuals in the corresponding category

New cards
32

How does a dot plot display data?

it shows each value as a dot above its location on a number line

New cards
33

How do you make a dot plot?

  1. Draw a horizontal axis and label it with the name of the quantitative variable

  2. Scale the axis from the minimum to the maximum value

  3. Mark a dot above the location on the horizontal axis corresponding to each data value
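
As a rough illustration of these steps, here is a minimal text-only sketch. It assumes small non-negative whole-number data, and `text_dot_plot` is a hypothetical helper name, not part of any library.

```python
from collections import Counter

def text_dot_plot(values):
    """Return a text dot plot: one '*' per data value, stacked above a number line."""
    counts = Counter(values)
    low, high = min(values), max(values)
    axis_values = range(low, high + 1)   # scale the axis from the minimum to the maximum
    width = len(str(high)) + 1           # column width for each axis position
    tallest = max(counts.values())
    lines = []
    # Build the rows from the top of the tallest stack down to the axis.
    for level in range(tallest, 0, -1):
        row = "".join(
            ("*" if counts[v] >= level else " ").rjust(width) for v in axis_values
        )
        lines.append(row.rstrip())
    # The last row is the labeled axis itself.
    lines.append("".join(str(v).rjust(width) for v in axis_values))
    return "\n".join(lines)

print(text_dot_plot([1, 2, 2, 4]))
```

A real dot plot would also label the axis with the variable name, which this sketch leaves out.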

New cards
34

What should you always ask after making a graph?

“what do I see?”

New cards
35

When is a distribution roughly symmetric?

if the right and left sides of the graph are approximately mirror images of each other

New cards
36

When is a distribution skewed to the right?

if the right side of the graph is much longer than the left side

New cards
37

When is a distribution skewed to the left?

if the left side of the graph is much longer than the right side

New cards
38

In which direction is a distribution skewed?

toward the long tail

New cards
39

When is the distribution of a quantitative variable unimodal?

if it has a single peak

New cards
40

When is the distribution of a quantitative variable bimodal?

if it has two distinct clusters and peaks

New cards
41

When is the distribution of a quantitative variable approximately uniform?

if the frequencies are about the same for all values

New cards
42

What do we look for in any graph?

the overall pattern and any clear departures from that pattern

New cards
43

How do we describe the overall pattern of a distribution?

by its:

  • shape

  • center

  • variability

New cards
44

What do we call an important kind of departure from the overall pattern of a distribution?

outlier

New cards
45

What is it important to remember when comparing distributions?

to give context and use comparative language

New cards
46

How do you make a stemplot?

  1. Separate each observation into a stem (all but the final digit) and a leaf (the final digit)

  2. Write the stems in a vertical column with the smallest at the top. Draw a vertical line to the right of the column

  3. Write each leaf in the row to the right of the stem

  4. Arrange the leaves in increasing order out from the stem

  5. Provide a key that identifies the variable and explains what the stems and leaves represent
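
The stemplot steps above can be sketched as follows. This assumes non-negative whole numbers, so the final digit is the leaf and everything before it is the stem; `stemplot` is a hypothetical helper name and the data are made up.

```python
from collections import defaultdict

def stemplot(values):
    """Return the rows of a stemplot for non-negative whole numbers."""
    leaves_by_stem = defaultdict(list)
    # Sorting first means each stem's leaves end up in increasing order.
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # stem = all but the final digit, leaf = final digit
        leaves_by_stem[stem].append(leaf)
    rows = []
    # Include every stem from smallest to largest, even stems with no leaves.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

for row in stemplot([12, 15, 21, 34, 30, 12]):
    print(row)
print("Key: 1 | 2 represents a value of 12")
```

Keeping the empty stems preserves gaps in the distribution, which matters when judging shape.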

New cards
47

How can we get a better picture of a distribution with “bunched up” data values?

by splitting stems

New cards
48

How can we compare two distributions of the same quantitative variable?

by using a back-to-back stemplot

New cards
49

How does a histogram display data?

it shows each interval of values as a bar, with the heights of the bars showing the frequencies or relative frequencies of values in each interval

New cards
50

How do you make a histogram?

  1. Choose equal-width intervals that span the data

  2. Make a table that shows the frequency or relative frequency of individuals in each interval

  3. Draw horizontal and vertical axes. Label the axes

  4. Scale the axes

  5. Draw bars above the intervals. The bar heights correspond to the frequency or relative frequency of individuals in that interval
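
Steps 1 and 2 (choosing equal-width intervals and tabulating frequencies) can be sketched in Python. The data, the starting value, and the interval width are assumptions chosen for illustration, and `frequency_table` is a hypothetical helper name.

```python
def frequency_table(values, start, width, bins):
    """Count the values in each of `bins` equal-width intervals [left, right)."""
    counts = [0] * bins
    for value in values:
        index = int((value - start) // width)   # which interval the value falls in
        if not 0 <= index < bins:
            raise ValueError(f"{value} is outside the chosen intervals")
        counts[index] += 1
    total = len(values)
    # Each row: (left endpoint, right endpoint, frequency, relative frequency)
    return [
        (start + i * width, start + (i + 1) * width, count, count / total)
        for i, count in enumerate(counts)
    ]

data = [1, 3, 4, 7, 8, 9, 12, 14]   # made-up data
for left, right, count, relative in frequency_table(data, start=0, width=5, bins=3):
    # A row of '#' characters stands in for the bar drawn above each interval.
    print(f"{left} to <{right}: {'#' * count} ({count}, {relative:.3f})")
```

Using half-open intervals means a value that sits exactly on a boundary is counted in the interval to its right, which is one common convention.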

New cards
51

What is the most common measure of center?

mean

New cards
52

How do you find the mean?

by adding all values in a set of observations and then dividing that sum by the number of observations

New cards
53

What is the median of a distribution?

the midpoint of the distribution: the value with about half of the observations below it and half above it

New cards
54

What does the symbol x̄ represent?

the mean of a sample

New cards
55

What does the symbol μ represent?

the mean of a population

New cards
56

What is a statistic?

a number that describes some characteristic of a sample

New cards
57

What is a parameter?

a number that describes some characteristic of a population

New cards
58

When is a statistical measure resistant?

if it isn’t sensitive to extreme values

New cards
59

How do you find the median of a distribution?

  1. Arrange all observations from smallest to largest

  2. If the number of observations n is odd, the median is the middle observation in the ordered list

  3. If the number of observations n is even, the median is the average of the two center observations in the ordered list

New cards
60

When are the mean and median of a distribution similar?

if the distribution is roughly symmetric and has no outliers

New cards
61

How does the skewness of a distribution affect its mean and median?

if the distribution is strongly skewed, the mean will be pulled in the direction of the skew (toward the long tail) but the median won’t

New cards
62

How do the mean and median react to outliers?

the median is resistant to outliers but the mean isn’t
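
A short Python sketch of the mean and median procedures from the cards above, with made-up data showing how each reacts to an outlier. The function names are illustrative; Python's standard `statistics` module offers equivalents.

```python
def mean(values):
    """Add all the values, then divide by the number of observations."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the ordered list, or the average of the two center values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

scores = [2, 4, 5, 7, 7]         # made-up data
with_outlier = [2, 4, 5, 7, 70]  # same data with the largest value replaced by an outlier

print(mean(scores), median(scores))              # 5.0 5
print(mean(with_outlier), median(with_outlier))  # 17.6 5
```

The single extreme value more than triples the mean while the median stays at 5, which is what "resistant" means here.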

New cards
63

What is the range of a distribution?

the distance between the minimum value and the maximum value

New cards
64

Is range a resistant measure of variability?

no

New cards
65

What does standard deviation measure?

the typical distance of the values in a distribution from the mean

New cards
66

How do you calculate standard deviation?

  1. Find the mean of the distribution

  2. Calculate the deviation of each value from the mean

  3. Square each of the deviations

  4. Add all the squared deviations and divide by n - 1; the result is the sample variance

  5. Take the square root of the variance
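
The calculation above can be followed step by step in Python. The data set is made up so that the arithmetic comes out evenly, and `sample_variance_and_sd` is a hypothetical helper name.

```python
import math

def sample_variance_and_sd(values):
    """Return (sample variance, sample standard deviation) using the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n                      # find the mean
    deviations = [x - mean for x in values]     # deviation of each value from the mean
    squared = [d ** 2 for d in deviations]      # square each deviation
    variance = sum(squared) / (n - 1)           # add them up and divide by n - 1
    return variance, math.sqrt(variance)        # the square root gives the standard deviation

# Mean is 4; squared deviations are 4, 4, 0, 4, 4 (sum 16); 16 / 4 = 4; sqrt(4) = 2
print(sample_variance_and_sd([2, 2, 4, 6, 6]))  # (4.0, 2.0)
```

The function needs at least two values, since with one value the divisor n - 1 would be zero.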

New cards
67

What is the formula for standard deviation?

$s_x = \sqrt{\dfrac{1}{n-1}\sum (x_i - \bar{x})^2}$

New cards
68

What is the variance?

the sum of the squared deviations divided by n - 1; it is the standard deviation before you take the square root

New cards
69

What is standard deviation always greater than or equal to?

0

New cards
70

What do larger values of standard deviation indicate?

greater variation

New cards
71

Is standard deviation a resistant measure of variability?

no

New cards
72

What do the quartiles of a distribution do?

divide the ordered data set into four groups having roughly the same number of values

New cards
73

How do you find the quartiles of a distribution?

arrange the data values from smallest to greatest and find the median

New cards
74

What is the first quartile Q1 of a distribution?

the median of the data values that are to the left of the median in the ordered list

New cards
75

What is the third quartile Q3 of a distribution?

the median of the data values that are to the right of the median in the ordered list

New cards
76

What is the interquartile range (IQR)?

the distance between the first and third quartiles of a distribution

IQR = Q3 - Q1
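
A minimal sketch of the quartile method described in these cards, where the median itself is left out of both halves when the number of values is odd. Software packages often use slightly different quartile rules, so results may not match a calculator or library exactly. The function name and data are illustrative.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-each-half method."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]         # values to the left of the median
    upper_half = ordered[(n + 1) // 2 :]   # values to the right of the median
    return median(lower_half), median(ordered), median(upper_half)

q1, med, q3 = quartiles([1, 3, 5, 7, 9, 11, 13])   # made-up data
iqr = q3 - q1
print(q1, med, q3, iqr)   # 3 7 11 8
```

The function assumes at least two values so that both halves are non-empty.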

New cards
77

What is the rule for outliers?

an observation is an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile

low outliers < Q1 - 1.5 x IQR | high outliers > Q3 + 1.5 x IQR
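
The 1.5 x IQR rule can be sketched in Python as below. The quartiles are passed in rather than computed, and both the data and the name `find_outliers` are made up for illustration.

```python
def find_outliers(values, q1, q3):
    """Return the values more than 1.5 x IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr    # anything below this is a low outlier
    high_fence = q3 + 1.5 * iqr   # anything above this is a high outlier
    return [x for x in values if x < low_fence or x > high_fence]

data = [1, 3, 5, 7, 9, 11, 30]   # made-up data with Q1 = 3 and Q3 = 11
# IQR = 8, so the fences are 3 - 12 = -9 and 11 + 12 = 23
print(find_outliers(data, q1=3, q3=11))   # [30]
```

A value that lands exactly on a fence is not flagged, since the rule requires it to fall beyond the fence.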

New cards
78

Why do we look for outliers?

  • they might be inaccurate data values

  • they can indicate a remarkable occurrence

  • they can heavily influence the values of some summary statistics, like the mean, range, and standard deviation

New cards
79

What does the five-number summary of a distribution consist of?

the minimum, the first quartile Q1, the median, the third quartile Q3, and the maximum

New cards
80

What is a boxplot?

a visual representation of the five number summary

New cards
81

How do you make a boxplot?

  1. Find the five-number summary

  2. Identify the outliers using the 1.5 x IQR rule

  3. Draw and label the horizontal axis

  4. Scale the axis

  5. Draw a box (from the first quartile to the third quartile)

  6. Mark the median

  7. Draw whiskers (to the smallest and largest values that are not outliers)

  8. Outliers are marked with a special symbol such as an asterisk

New cards
82

What is a percentile used for?

to describe the location of a value in a distribution

New cards
83

How do you find the percentile of a value?

count the number of values less than or equal to it, divide by the total number of values, and express the result as a percent
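
As a small illustration of this rule, the sketch below uses the "less than or equal to" definition from this card (some textbooks count only the values strictly less than the given value). The scores are made up and `percentile_of` is a hypothetical helper name.

```python
def percentile_of(value, values):
    """Percent of the values that are less than or equal to `value`."""
    at_or_below = sum(1 for x in values if x <= value)
    return 100 * at_or_below / len(values)

scores = [60, 70, 70, 80, 85, 90, 90, 95, 100, 100]   # made-up test scores
print(percentile_of(85, scores))   # 50.0, since 5 of the 10 scores are at or below 85
```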

New cards
84

What is a cumulative relative frequency graph?

a graph that plots a point corresponding to the percentile of a given value in a distribution of quantitative data and connects consecutive points using line segments

New cards
85

[image of a cumulative relative frequency graph]

cumulative relative frequency graph

New cards
86

What does a z-score tell us?

how many standard deviations from the mean an observation falls and in what direction

New cards
87

What is the formula for a z-score?

$z = \dfrac{\text{value} - \text{mean}}{\text{standard deviation}}$

New cards
88

What is a standardized score often called?

z-score
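
A z-score is the value minus the mean, divided by the standard deviation. A minimal Python sketch, with a made-up mean and standard deviation:

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / standard_deviation

# Made-up distribution with mean 80 and standard deviation 4
print(z_score(86, 80, 4))   # 1.5  (1.5 standard deviations above the mean)
print(z_score(74, 80, 4))   # -1.5 (1.5 standard deviations below the mean)
```

The sign carries the direction and the size carries the distance, matching the two things a z-score is said to tell us.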

New cards
89

What does transforming data do?

  • converts the observations from the original units of measurement to another scale

  • can affect the shape, center, and variability of a distribution

New cards
90

What are the effects of adding/subtracting a constant to/from a distribution?

adding/subtracting the same positive number a to/from each observation:

  • adds/subtracts a to/from measures of center and location (mean, five-number summaries, percentile)

  • does not change measures of variability (range, IQR, standard deviation)

  • does not change the shape

New cards
91

What are the effects of multiplying/dividing each value in a distribution by a constant?

multiplying/dividing each observation by the same positive number b:

  • multiplies/divides measures of center and location (mean, five number summaries, percentiles) by b

  • multiplies/divides measures of variability (range, IQR, standard deviation) by b

  • does not change the shape
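
The effects of adding a constant and of multiplying by a constant can be checked numerically. The sketch below uses made-up data with a = 10 and b = 3, and `summarize` is a hypothetical helper built on Python's standard `statistics` module.

```python
from statistics import mean, median, stdev

def summarize(values):
    """Two measures of center and two measures of variability."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "sd": stdev(values),   # sample standard deviation (n - 1 divisor)
    }

data = [2, 2, 4, 6, 6]              # made-up data
shifted = [x + 10 for x in data]    # add a = 10 to every observation
scaled = [x * 3 for x in data]      # multiply every observation by b = 3

print(summarize(data))      # mean 4, median 4, range 4, sd 2
print(summarize(shifted))   # center moves up by 10; range and sd are unchanged
print(summarize(scaled))    # center, range, and sd are all multiplied by 3
```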

New cards
92

What is a density curve?

a curve that

  • is always on or above the horizontal axis

  • has an area of exactly 1 underneath it

New cards
93

What does a density curve describe?

the overall pattern of a distribution

New cards
94

What does the area under the density curve and above any interval of values on the horizontal axis estimate?

the proportion of all observations that fall in that interval

New cards
95

[image of a density curve]

density curve

New cards
96

What is the mean of a density curve?

the point at which the curve would balance if made of solid material

New cards
97

What is the median of a density curve?

the equal-areas point, the point that divides the area under the curve in half
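
A worked example may help tie the last few cards together. The curve $f(x) = 2x$ on $0 \le x \le 1$ is an assumed density curve chosen for illustration (it is not from these cards), and the integrals go slightly beyond the geometric reasoning the cards use.

```latex
\begin{align*}
\text{Total area:}\quad & \int_0^1 2x\,dx = 1
  && \text{(a valid density curve)}\\
\text{Proportion in } [0,\,0.5]\text{:}\quad & \int_0^{0.5} 2x\,dx = 0.25
  && \text{(25\% of observations)}\\
\text{Median } m\text{:}\quad & \int_0^{m} 2x\,dx = m^2 = 0.5
  \;\Rightarrow\; m = \tfrac{1}{\sqrt{2}} \approx 0.707
  && \text{(equal-areas point)}\\
\text{Mean:}\quad & \int_0^1 x \cdot 2x\,dx = \tfrac{2}{3} \approx 0.667
  && \text{(balance point)}
\end{align*}
```

This curve has its long tail on the left, and the mean (about 0.667) falls to the left of the median (about 0.707), so the mean is pulled toward the tail.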

New cards
98

[image of a symmetric density curve]

mean and median of a symmetric curve: both lie at the center of the curve

New cards
99
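
[image of a right-skewed density curve]

mean and median of a right-skewed curve: the mean is pulled toward the long right tail, so it lies to the right of the median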

New cards
100

What is a density curve an idealized description of?

a distribution of data

New cards
