AP STATS UNIT 1 - DATA ANALYSIS

83 Terms

1

Statistics

The science and art of collecting, analyzing, and drawing conclusions from data.

2

Individual

An object described in a set of data. Individuals can be people, animals, or things.

3

Variable

An attribute that can take different values for different individuals.

4

Categorical variable

Assigns labels that place each individual into a particular group, called a category.

5

Quantitative variable

Takes number values that are quantities—counts or measurements.

6

Discrete variable

A quantitative variable that takes a fixed set of possible values with gaps between them. (ex. Number of siblings)

7

Continuous variable

A quantitative variable that can take any value in an interval on the number line. (ex. height of person)

8

Distribution

The distribution of a variable tells us what values the variable takes and how often it takes those values.

9

Frequency table

Shows the number of individuals having each value.

10

Relative frequency table

Shows the proportion or percent of individuals having each value.
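A minimal Python sketch of both kinds of table, assuming the data arrive as a plain list of category labels (the `colors` list is made-up example data):

```python
from collections import Counter

def frequency_tables(values):
    """Return (frequency table, relative frequency table) as dicts keyed by category."""
    counts = Counter(values)          # number of individuals having each value
    n = len(values)
    relative = {category: count / n for category, count in counts.items()}
    return dict(counts), relative

# Hypothetical data: favorite color of 8 students
colors = ["red", "blue", "blue", "green", "red", "blue", "red", "blue"]
freq, rel = frequency_tables(colors)
print(freq)  # {'red': 3, 'blue': 4, 'green': 1}
print(rel)   # {'red': 0.375, 'blue': 0.5, 'green': 0.125}
```

The relative frequencies always add up to 1 (100%).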

11

Bar graph

Shows each category as a bar. The heights of the bars show the category frequencies or relative frequencies.

12

Pie chart

Shows each category as a slice of the "pie." The areas of the slices are proportional to the category frequencies or relative frequencies.

13

Two-way table

A table of counts that summarizes data on the relationship between two categorical variables for some group of individuals.

14

Marginal relative frequency

Gives the percent or proportion of individuals that have a specific value for one categorical variable.

15

Joint relative frequency

Gives the percent or proportion of individuals that have a specific value for one categorical variable and a specific value for another categorical variable.

16

Conditional relative frequency

Gives the percent or proportion of individuals that have a specific value for one categorical variable among individuals who share the same value of another categorical variable (the condition).
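A sketch of the three kinds of relative frequency computed from a two-way table, stored here as a nested dict (an assumption; any table layout works). The counts are hypothetical:

```python
# Hypothetical two-way table of counts: rows = class, columns = preferred pet
table = {
    "junior": {"cats": 10, "dogs": 30},
    "senior": {"cats": 20, "dogs": 40},
}

# Grand total of all individuals in the table
total = sum(sum(row.values()) for row in table.values())

def marginal(row):
    """Marginal relative frequency: proportion of ALL individuals in this row category."""
    return sum(table[row].values()) / total

def joint(row, col):
    """Joint relative frequency: proportion of ALL individuals in this row AND this column."""
    return table[row][col] / total

def conditional(col, given_row):
    """Conditional relative frequency: proportion with `col` AMONG individuals in `given_row`."""
    return table[given_row][col] / sum(table[given_row].values())

print(marginal("junior"))             # 0.4
print(joint("senior", "cats"))        # 0.2
print(conditional("dogs", "junior"))  # 0.75
```

Comparing conditional relative frequencies across rows is how you check for an association: in this made-up table, 75% of juniors prefer dogs versus about 67% (40 of 60) of seniors.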

17

Side-by-side bar graph

Displays the distribution of a categorical variable for each value of another categorical variable. The bars are grouped together based on the values of one of the categorical variables and placed side by side.

18

Segmented bar graph

Displays the distribution of a categorical variable as segments of a rectangle, with the area of each segment proportional to the percent of individuals in the corresponding category. The segments in each bar must add up to 100%. If the segmented bars for different groups look different, there is evidence of an association between the two variables.

19

Mosaic plot

A modified segmented bar graph in which the width of each rectangle is proportional to the number of individuals in the corresponding category.

20

Association

There is an association between two variables if knowing the value of one variable helps us predict the value of the other. If knowing the value of one variable does not help us predict the value of the other, then there is no association between the variables.

21

Dotplot

Shows each data value as a dot above its location on a number line.

22

Symmetric distribution

A distribution is roughly symmetric if the right side of the graph (containing the half of observations with the largest values) is approximately a mirror image of the left side. For a symmetric distribution, the mean and median are approximately equal, and the mean is typically used as the measure of center.

23

Skewed distribution

A distribution is skewed to the right if the right side of the graph is much longer than the left side. A distribution is skewed to the left if the left side of the graph is much longer than the right side. For a skewed distribution, the median is usually the better measure of center.

24

Stemplot

Shows each data value separated into two parts: a stem, which consists of all but the final digit, and a leaf, the final digit. The stems are ordered from lowest to highest and arranged in a vertical column. The leaves are arranged in increasing order out from the appropriate stems.
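A small Python sketch that builds a stemplot, assuming non-negative whole-number data so the leaf is the ones digit and the stem is everything else (for example, 72 has stem 7 and leaf 2). The `scores` list is hypothetical:

```python
from collections import defaultdict

def stemplot(data):
    """Return stemplot rows for non-negative integers: stem = all but last digit, leaf = last digit."""
    leaves = defaultdict(list)
    for value in data:
        leaves[value // 10].append(value % 10)
    lines = []
    # List every stem from lowest to highest, including stems with no leaves
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_str = "".join(str(leaf) for leaf in sorted(leaves[stem]))
        lines.append(f"{stem} | {leaf_str}")
    return lines

# Hypothetical quiz scores
scores = [72, 85, 91, 68, 77, 85, 90, 62, 79]
for line in stemplot(scores):
    print(line)
# 6 | 28
# 7 | 279
# 8 | 55
# 9 | 01
```

Key for this output: 6 | 2 means 62.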

25

Histogram

Shows each interval of values as a bar. The heights of the bars show the frequencies or relative frequencies of values in each interval. Good for large sets of data.

26

Mean

The mean of a distribution of quantitative data is the average of all the individual data values. To find the mean, add all the values and divide by the total number of data values.
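The same calculation in Python, as a sketch (the data list is hypothetical); Python's built-in `statistics.mean` gives the same result:

```python
def mean(values):
    """Add all the values and divide by the number of values."""
    return sum(values) / len(values)

print(mean([2, 4, 4, 5, 10]))  # 5.0
```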

27

Statistic

A number that describes some characteristic of a sample.

28

Parameter

A number that describes some characteristic of a population.

29

Resistant

A statistical measure is resistant if it isn't sensitive to extreme values. (ex. median and IQR)

30

Non-resistant

Statistical measures that can be greatly influenced by extreme values/outliers in a dataset. (ex. mean, SD, range)
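A quick sketch of the difference, using hypothetical data in which only the largest value is changed into an outlier: the median stays put while the mean and range change a lot.

```python
import statistics

def summary(values):
    """Return (mean, median, range) for a list of numbers."""
    return statistics.mean(values), statistics.median(values), max(values) - min(values)

# Hypothetical data; the second list turns the largest value into an outlier
original = [10, 12, 13, 15, 20]
with_outlier = [10, 12, 13, 15, 200]

print(summary(original))      # mean 14, median 13, range 10
print(summary(with_outlier))  # mean 50, median 13, range 190
```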

31

Median

The midpoint of a distribution, the number such that about half the observations are smaller and about half are larger. To find the median, arrange the data values from smallest to largest. If the number n of data values is odd, the median is the middle value in the ordered list. If n is even, the median is the average of the two middle values in the ordered list.
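The odd/even rule written out as a short Python function (hypothetical data; `statistics.median` in Python's standard library follows the same rule):

```python
def median(values):
    """Sort, then take the middle value (n odd) or the average of the two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```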

32

Range

The range of a distribution is the distance between the minimum value and the maximum value. That is, Range = Maximum - Minimum

33

Standard deviation

Measures the typical distance of the values in a distribution from the mean. It is calculated by averaging the squared deviations from the mean (for sample data, divide the sum of squared deviations by n - 1 instead of n) and then taking the square root.

34

Variance

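The average squared deviation from the mean (dividing by n - 1 for sample data). The variance is the square of the standard deviation.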
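A sketch of both calculations. It assumes sample data, so it divides by n - 1 (the formula for s on the AP Statistics formula sheet); dividing by n instead gives the population version. The data list is hypothetical:

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """Standard deviation s: the square root of the variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5; squared deviations sum to 32
print(sample_variance(data))      # 32/7, about 4.571
print(sample_sd(data))            # about 2.138
```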

35

Quartiles

The quartiles of a distribution divide the ordered data set into four groups having roughly the same number of values. To find the quartiles, arrange the data values from smallest to largest and find the median.

36

First quartile Q1

The first quartile Q1 is the median of the data values that are to the left of the median in the ordered list.

37

Third quartile Q3

The third quartile Q3 is the median of the data values that are to the right of the median in the ordered list.

38

Interquartile range (IQR)

The distance between the first and third quartiles of a distribution. In symbols: IQR = Q3 - Q1

39

Five-number summary

The five-number summary of a distribution of quantitative data consists of the minimum, the first quartile Q1, the median, the third quartile Q3, and the maximum.
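A sketch of the five-number summary and IQR using the method on these cards: Q1 and Q3 are the medians of the values on either side of the median, leaving the median itself out when n is odd. Calculators and other software sometimes use slightly different quartile rules, so results can differ a little. Assumes at least two data values; the data are hypothetical:

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]         # values to the left of the median
    upper = ordered[(n + 1) // 2 :]   # values to the right of the median
    return (ordered[0], median_of_sorted(lower), median_of_sorted(ordered),
            median_of_sorted(upper), ordered[-1])

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]
mn, q1, med, q3, mx = five_number_summary(data)
print(mn, q1, med, q3, mx)  # 1 3.5 7 11.0 15
print("IQR =", q3 - q1)     # IQR = 7.5
```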

40

How to write the distribution

SOCV: Shape, Outliers (and other unusual features such as gaps and clusters), Center, Variability (spread), all described in context.

41

Explanatory variable (independent variable)

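The variable that may help explain or predict changes in a response variable; in an experiment, it is the manipulated variable.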

42

Response variable (dependent variable)

Measures an outcome of a study; it may change as a result of the explanatory (manipulated) variable.

43

Boxplot

A visual representation of the five-number summary.

44

Unusual

Outliers, Peaks, Gaps

45

Center

Mean (x-bar), median

46

Spread

Range, interquartile range (IQR), standard deviation, variance

47

Shape

Skewed right/left, Symmetric, Unimodal, Bimodal

48

Distribution is skewed right

The mean is usually greater than the median.

49

Distribution is skewed left

The mean is usually less than the median.

50

Boxplot Advantages

- Organizes large amounts of data into the five-number summary plus outliers.
- Splits the data into quartiles.

51

Boxplot Disadvantages

- Doesn't show every individual value.
- Can hide certain features of the shape of the distribution (clusters and gaps).
- Only for quantitative data.

52

Histogram Disadvantages

- Doesn't show every individual value.
- Use only with quantitative data.

53

Determining relative position (for distributions with any shape)

Percentile and Standardized Score.

54

Percentile

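The percent of values in the distribution that are less than or equal to a given value. It can be found directly from the data, or from a Normal model when the distribution is approximately Normal.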
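A sketch of this definition (percent of values less than or equal to a given value), with hypothetical test scores:

```python
def percentile_rank(values, x):
    """Percent of values that are less than or equal to x."""
    count = sum(1 for v in values if v <= x)
    return 100 * count / len(values)

# Hypothetical test scores for 10 students
scores = [55, 62, 68, 70, 74, 78, 81, 85, 90, 96]
print(percentile_rank(scores, 81))  # 70.0
```

Interpretation: a score of 81 is at the 70th percentile; about 70% of the scores are less than or equal to 81.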

55

Percentile Interpretation Example

“The value of __ is at the pth percentile. About p% of the values are less than or equal to __.”

56

Standardized Score (z-score)

Shows the position of a value relative to the other values in the distribution: z = (value - mean) / standard deviation.
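The formula as a one-line Python function; the mean of 80 and SD of 5 are made-up numbers:

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical: a test with mean 80 and standard deviation 5
print(z_score(90, 80, 5))  # 2.0
print(z_score(75, 80, 5))  # -1.0
```

So a score of 90 is 2 standard deviations above the mean, and 75 is 1 standard deviation below it.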

57

Standardized Score Interpretation Example

“The value of ___ is (z-score) standard deviations above/below the mean”

58

Normal distribution

- Mound-shaped (bell curve) and symmetric.
- Determined by its mean and SD.

59

Empirical rule (for normal distributions)

In a Normal distribution, approximately 68% of the values fall within one standard deviation of the mean, 95% within two, and 99.7% within three.
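The 68-95-99.7 values are rounded. As a check, this sketch computes the exact proportions for a Normal distribution, using the fact that the proportion within k standard deviations of the mean equals erf(k/√2), via Python's `math.erf`:

```python
import math

def within_k_sd(k):
    """Exact proportion of a Normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```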

60

Low outlier(s)

Any value less than Q1 - (1.5*IQR).

61

High outlier(s)

Any value greater than Q3 + (1.5*IQR).
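A sketch of the 1.5 × IQR rule; the quartiles and data values are hypothetical:

```python
def outlier_fences(q1, q3):
    """Return (low fence, high fence) for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Values below the low fence or above the high fence."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Hypothetical: Q1 = 20 and Q3 = 30, so IQR = 10
print(outlier_fences(20, 30))                      # (5.0, 45.0)
print(find_outliers([2, 18, 25, 33, 50], 20, 30))  # [2, 50]
```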

62

Types of ways to show distribution:

Quantitative data: dotplot, stemplot, histogram, boxplot. Categorical data: bar graph, pie chart, segmented bar graph, mosaic plot.

63

Dot plot Advantages

- Shows every individual value.
- Shows range, shape, minimum and maximum, gaps and clusters, and outliers easily.
- Quick analysis.

64

Dot plot Disadvantages

- Not great for larger sets of data.
- Only for quantitative data.

65

Stemplot Advantages

- Concise representation of data.
- Shows range, shape, outliers, minimum and maximum, gaps, and clusters easily.

66

Stemplot Disadvantages

- The key can be hard to understand at times.
- Not practical for very large data sets.
- Only for quantitative (discrete or continuous) data.

67

Segmented Bar Graph Advantages

Helps display how a larger category is divided into smaller subcategories and how each part relates to the whole.

68

Segmented Bar Graph Disadvantages

- Doesn't show the total number of individuals in each category.
- Only for categorical data.

69

Mosaic Graph Advantages

Identifies associations between categorical variables. For example, the variables are independent (no association) when the segments for each category have the same heights in every bar.

70

Mosaic Graph Disadvantages

- Hard to focus on either the heights or the widths individually.
- Strictly categorical; doesn't work well with continuous data.

71

Bar Graph Advantages

- Easy to compare multiple data sets.

72

Bar Graph Disadvantages

- Only for categorical data.

73

Pie Chart Advantages

- Visually appealing.
- Shows the percent of the total for each category.

74

Pie Chart Disadvantages

- Doesn't show exact counts.
- Hard to compare multiple data sets.
- Only for categorical data whose categories make up a whole.

75

Standardizing a distribution

Converting every value to a z-score keeps the same shape but gives a distribution with mean 0 and SD 1.
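A sketch that standardizes a hypothetical data set with Python's `statistics` module (using the sample SD) and checks that the result has mean 0 and SD 1. Because every value is shifted and scaled the same way, the order and relative spacing of the values, and therefore the shape, do not change:

```python
import statistics

def standardize(values):
    """Convert each value to a z-score using the mean and sample standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - m) / s for x in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]
z = standardize(data)
print(statistics.mean(z), statistics.stdev(z))  # approximately 0 and 1 (up to rounding error)
```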

76

Linear transformation of shape

Stays the same.

77

Linear transformation of center (mean/median)

Adding or subtracting a constant adds or subtracts that constant from the center; multiplying or dividing by a constant multiplies or divides the center by that constant.

78

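Linear transformation of variability

Adding or subtracting a constant does not change variability (range, IQR, SD); multiplying or dividing by a constant b multiplies or divides these measures by |b|.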

79

Linear transformation of standard deviation

Adding or subtracting a constant does not change the SD; multiplying or dividing the data set by a constant b multiplies or divides the SD by |b|.
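A sketch comparing summary statistics before and after adding a constant and after multiplying by a constant (hypothetical data, sample SD):

```python
import statistics

def describe(values):
    """Return (mean, median, sample SD, range) of a list of numbers."""
    return (statistics.mean(values), statistics.median(values),
            statistics.stdev(values), max(values) - min(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]
shifted = [x + 10 for x in data]  # add a constant
scaled = [3 * x for x in data]    # multiply by a constant

print(describe(data))
print(describe(shifted))  # mean and median up by 10; SD and range unchanged
print(describe(scaled))   # mean, median, SD, and range all multiplied by 3
```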

80

Cumulative Relative Frequency Graph

Q1, the median, and Q3 are the values where the graph reaches cumulative relative frequencies of 25%, 50%, and 75%.
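A sketch of reading quartiles from a cumulative relative frequency graph, assuming the graph is drawn by connecting the points at the interval edges with straight line segments. The intervals and counts are hypothetical:

```python
from itertools import accumulate

# Hypothetical grouped data: intervals 0-10, 10-20, ..., 40-50 and their counts
edges = [0, 10, 20, 30, 40, 50]
counts = [2, 4, 6, 5, 3]

n = sum(counts)
# Cumulative relative frequency at each interval edge (0 at the very first edge)
cum_rel = [0.0] + [c / n for c in accumulate(counts)]

def read_graph(p):
    """Value where the straight-line cumulative relative frequency graph reaches p (0 < p <= 1)."""
    for i in range(1, len(edges)):
        if cum_rel[i] >= p:
            x0, x1 = edges[i - 1], edges[i]
            y0, y1 = cum_rel[i - 1], cum_rel[i]
            # Interpolate along the segment between the two edges
            return x0 + (p - y0) / (y1 - y0) * (x1 - x0)
    return edges[-1]

q1, med, q3 = read_graph(0.25), read_graph(0.50), read_graph(0.75)
print(round(q1, 2), round(med, 2), round(q3, 2))  # about 17.5, 26.67, 36.0
```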

81

Uniform distribution

Flat (rectangular) shape. Because it is symmetric, the mean is approximately equal to the median.

82

Normal Distribution Calculation: Finding proportion on calculator

normalcdf(lower, upper, mean, SD) gives the proportion of values between lower and upper.

83

Normal Distribution Calculation: Finding boundary value on calculator

invNorm(area to the left, mean, SD) gives the boundary value. Remember to enter the area as a decimal (for example, 0.90 rather than 90%).
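The TI-84 commands have a counterpart in Python's standard library, `statistics.NormalDist` (Python 3.8+). A sketch with a hypothetical Normal distribution of heights (mean 64 inches, SD 2.5 inches):

```python
from statistics import NormalDist

# Hypothetical: heights are Normal with mean 64 and SD 2.5 (inches)
heights = NormalDist(mu=64, sigma=2.5)

# Like normalcdf(60, 70, 64, 2.5): proportion of heights between 60 and 70
proportion = heights.cdf(70) - heights.cdf(60)

# Like invNorm(0.90, 64, 2.5): height with 90% of the distribution below it
boundary = heights.inv_cdf(0.90)

print(round(proportion, 4), round(boundary, 2))  # about 0.937 and 67.2
```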
