AP STATS UNIT 1 - DATA ANALYSIS

83 Terms

1
New cards

Statistics

The science and art of collecting, analyzing, and drawing conclusions from data.

2
New cards

Individual

An object described in a set of data. Individuals can be people, animals, or things.

3
New cards

Variable

An attribute that can take different values for different individuals.

4
New cards

Categorical variable

Assigns labels that place each individual into a particular group, called a category.

5
New cards

Quantitative variable

Takes number values that are quantities—counts or measurements.

6
New cards

Discrete variable

A quantitative variable that takes a fixed set of possible values with gaps between them. (ex. Number of siblings)

7
New cards

Continuous variable

A quantitative variable that can take any value in an interval on the number line. (ex. height of person)

8
New cards

Distribution

Of a variable, tells us what values the variable takes and how often it takes those values.

9
New cards

Frequency table

Shows the number of individuals having each value.

10
New cards

Relative frequency table

Shows the proportion or percent of individuals having each value.
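
A quick sketch of both kinds of table in Python. The `colors` list is made-up data for illustration only.

```python
from collections import Counter

# Hypothetical categorical data: favorite color of 8 students
colors = ["red", "blue", "red", "green", "blue", "red", "blue", "red"]

freq = Counter(colors)                           # frequency table: count per category
n = len(colors)
rel_freq = {c: k / n for c, k in freq.items()}   # relative frequency table: proportion per category

for c in freq:
    print(f"{c:6} count={freq[c]}  proportion={rel_freq[c]:.3f}")
```

The proportions in a relative frequency table should always add up to 1 (or 100%).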

11
New cards

Bar graph

Shows each category as a bar. The heights of the bars show the category frequencies or relative frequencies.

12
New cards

Pie chart

Shows each category as a slice of the "pie." The areas of the slices are proportional to the category frequencies or relative frequencies.

13
New cards

Two-way table

A table of counts that summarizes data on the relationship between two categorical variables for some group of individuals.

14
New cards

Marginal relative frequency

Gives the percent or proportion of individuals that have a specific value for one categorical variable.

15
New cards

Joint relative frequency

Gives the percent or proportion of individuals that have a specific value for one categorical variable and a specific value for another categorical variable.

16
New cards

Conditional relative frequency

Gives the percent or proportion of individuals that have a specific value for one categorical variable among individuals who share the same value of another categorical variable (the condition).
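
The three kinds of relative frequency can be compared on one small two-way table. This is only a sketch: the counts and the variable names (`table`, grade level, pet preference) are invented for illustration.

```python
# Hypothetical two-way table of counts: rows = grade level, columns = pet preference
table = {
    "junior": {"cats": 10, "dogs": 30},
    "senior": {"cats": 20, "dogs": 40},
}
total = sum(sum(row.values()) for row in table.values())   # 100 individuals

senior_total = sum(table["senior"].values())               # 60 seniors

# Marginal: proportion of ALL individuals who are seniors
marginal_senior = senior_total / total                     # 60 / 100

# Joint: proportion of ALL individuals who are seniors AND prefer cats
joint_senior_cats = table["senior"]["cats"] / total        # 20 / 100

# Conditional: proportion who prefer cats AMONG seniors only
cond_cats_given_senior = table["senior"]["cats"] / senior_total   # 20 / 60

print(marginal_senior, joint_senior_cats, round(cond_cats_given_senior, 3))
```

The only thing that changes between the three is the denominator: the grand total for marginal and joint, the row (or column) total for conditional.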

17
New cards

Side-by-side bar graph

Displays the distribution of a categorical variable for each value of another categorical variable. The bars are grouped together based on the values of one of the categorical variables and placed side by side.

18
New cards

Segmented bar graph

Displays the distribution of a categorical variable as segments of a rectangle, with the area of each segment proportional to the percent of individuals in the corresponding category. The segments in each bar must add up to 100%. If the bars for different groups look different, there is an association between the variables.

19
New cards

Mosaic plot

A modified segmented bar graph in which the width of each rectangle is proportional to the number of individuals in the corresponding category.

20
New cards

Association

There is an association between two variables if knowing the value of one variable helps us predict the value of the other. If knowing the value of one variable does not help us predict the value of the other, then there is no association between the variables.

21
New cards

Dotplot

Shows each data value as a dot above its location on a number line.

22
New cards

Symmetric distribution

A distribution is roughly symmetric if the right side of the graph (containing the half of observations with the largest values) is approximately a mirror image of the left side. The mean is approximately equal to the median, so the mean is the usual measure of center.

23
New cards

Skewed distribution

A distribution is skewed to the right if the right side of the graph is much longer than the left side. A distribution is skewed to the left if the left side of the graph is much longer than the right side. (center is median)

24
New cards

Stemplot

Shows each data value separated into two parts: a stem, which consists of all but the final digit, and a leaf, the final digit. The stems are ordered from lowest to highest and arranged in a vertical column. The leaves are arranged in increasing order out from the appropriate stems.

25
New cards

Histogram

Shows each interval of values as a bar. The heights of the bars show the frequencies or relative frequencies of values in each interval. Good for large sets of data.

26
New cards

Mean

The mean of a distribution of quantitative data is the average of all the individual data values. To find the mean, add all the values and divide by the total number of data values.

27
New cards

Statistic

A number that describes some characteristic of a sample.

28
New cards

Parameter

A number that describes some characteristic of a population.

29
New cards

Resistant

A statistical measure is resistant if it isn't sensitive to extreme values. (ex. median and IQR)

30
New cards

Non-resistant

Statistical measures that can be greatly influenced by extreme values/outliers in a dataset. (ex. mean, SD, range)

31
New cards

Median

The midpoint of a distribution, the number such that about half the observations are smaller and about half are larger. To find the median, arrange the data values from smallest to largest. If the number n of data values is odd, the median is the middle value in the ordered list. If the number n of data values is even, the median is the average of the two middle values in the ordered list.
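
A small sketch comparing the mean and the median, using Python's standard `statistics` module on made-up numbers. It also shows why the median is called resistant and the mean is not.

```python
import statistics

data = [2, 3, 5, 7, 8]            # hypothetical values, n is odd
with_outlier = data + [100]       # same data plus one extreme value, n is even

mean_plain = statistics.mean(data)              # (2+3+5+7+8) / 5 = 5
median_plain = statistics.median(data)          # middle value of the ordered list = 5

mean_out = statistics.mean(with_outlier)        # 125 / 6, pulled far upward
median_out = statistics.median(with_outlier)    # average of the two middle values, (5+7)/2 = 6

print(mean_plain, median_plain, round(mean_out, 2), median_out)
```

One extreme value moves the mean from 5 to about 20.83 but moves the median only from 5 to 6.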

32
New cards

Range

The range of a distribution is the distance between the minimum value and the maximum value. That is, Range = Maximum - Minimum

33
New cards

Standard deviation

Measures the typical distance of the values in a distribution from the mean. It’s calculated by finding an average of the squared deviations and then taking the square root.

34
New cards

Variance

The average squared deviation.
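
A sketch of variance and standard deviation with the `statistics` module on made-up data. One assumption to flag: the cards say "average" without giving the divisor, so both versions are shown. `pvariance`/`pstdev` divide by n, while `variance`/`stdev` divide by n - 1, which is the usual choice for a sample.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values, mean = 5

# Squared deviations from the mean: 9, 1, 1, 1, 0, 0, 4, 16 (sum = 32)
pop_var = statistics.pvariance(data)   # 32 / 8 = 4
pop_sd = statistics.pstdev(data)       # sqrt(4) = 2.0

samp_var = statistics.variance(data)   # 32 / 7
samp_sd = statistics.stdev(data)       # sqrt(32 / 7)

print(pop_var, pop_sd, round(samp_var, 3), round(samp_sd, 3))
```

In both versions the standard deviation is simply the square root of the variance.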

35
New cards

Quartiles

The quartiles of a distribution divide the ordered data set into four groups having roughly the same number of values. To find the quartiles, arrange the data values from smallest to largest and find the median.

36
New cards

First quartile Q1

The first quartile Q1 is the median of the data values that are to the left of the median in the ordered list.

37
New cards

Third quartile Q3

The third quartile Q3 is the median of the data values that are to the right of the median in the ordered list.

38
New cards

Interquartile range (IQR)

The distance between the first and third quartiles of a distribution. In symbols: IQR = Q3 - Q1

39
New cards

Five-number summary

The five-number summary of a distribution of quantitative data consists of the minimum, the first quartile Q1, the median, the third quartile Q3, and the maximum.
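
The quartile cards above can be turned into a short function. This is a sketch of the "median of each half" method described on the cards (the median itself is left out of both halves when n is odd). The function name and data are invented, and other software may use slightly different quartile rules.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                                # values to the left of the median
    upper = xs[half + 1:] if n % 2 else xs[half:]    # values to the right of the median
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

data = [1, 3, 4, 6, 7, 9, 12, 15, 20]                # hypothetical values, n = 9
minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1

print(minimum, q1, med, q3, maximum, iqr)
```

For this data the lower half is 1, 3, 4, 6 and the upper half is 9, 12, 15, 20, so Q1 = 3.5, Q3 = 13.5, and IQR = 10.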

40
New cards

How to write the distribution

SOCV - Shape, Outliers (and other unusual features such as gaps and clusters), Center, Variability + context

41
New cards

Explanatory variable (independent variable)

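A variable that may help explain or predict changes in the response variable. In an experiment, it is the variable the researcher manipulates.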

42
New cards

Response variable (dependent variable)

Measures the outcome of a study. It may change in response to the explanatory (manipulated) variable.

43
New cards

Boxplot

A visual representation of the five-number summary.

44
New cards

Unusual

Outliers, Peaks, Gaps

45
New cards

Center

Mean/X bar, Median

46
New cards

Spread

Range, IQR, Standard Deviation, Variance

47
New cards

Shape

Skewed right/left, Symmetric, Unimodal, Bimodal

48
New cards

Distribution is skewed right

Mean is greater than median.

49
New cards

Distribution is skewed left

Mean is less than median.

50
New cards

Boxplot Advantages

- Organizes large amounts of data into five number summary + outliers. - Splits data into quartiles.

51
New cards

Boxplot Disadvantages

- Doesn’t show every individual value. - Can hide certain features of the shape of a distribution (clusters and gaps). - Only for quantitative data.

52
New cards

Histogram Disadvantages

- Doesn’t show every individual value. - Only for quantitative data.

53
New cards

Determining relative position (for distributions with any shape)

Percentile and Standardized Score.

54
New cards

Percentile

Percent of values in a distribution that are less than or equal to a given value. Find it from the actual data, or from a normal model if the distribution is approximately normal.

55
New cards

Percentile Interpretation Example

“The value of __ is at the pth percentile. About p% of the values are less than or equal to __.”
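
A sketch of finding a percentile directly from data, using the "less than or equal to" definition from the Percentile card. The function name `percentile_of` and the scores are made up for illustration.

```python
def percentile_of(value, data):
    """Percent of data values that are less than or equal to `value`."""
    at_or_below = sum(x <= value for x in data)
    return 100 * at_or_below / len(data)

scores = [55, 60, 65, 70, 70, 80, 85, 90, 95, 100]   # hypothetical test scores
p = percentile_of(80, scores)                        # 6 of the 10 scores are <= 80

print(f"The value of 80 is at the {p:.0f}th percentile.")
```

Here 6 of the 10 scores are at or below 80, so 80 is at the 60th percentile.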

56
New cards

Standardized Score (z-score)

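Shows a value's position relative to the other values in a distribution: the number of standard deviations the value lies above or below the mean. z = (value - mean) / SD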

57
New cards

Standardized Score Interpretation Example

“The value of ___ is (z-score) standard deviations above/below the mean”
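
A sketch of the z-score calculation, z = (value - mean) / SD. The function name and the numbers (a score of 85 in a distribution with mean 70 and SD 10) are assumptions for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

z_high = z_score(85, mean=70, sd=10)   # (85 - 70) / 10
z_low = z_score(62, mean=70, sd=10)    # (62 - 70) / 10

for value, z in [(85, z_high), (62, z_low)]:
    direction = "above" if z >= 0 else "below"
    print(f"The value of {value} is {abs(z)} standard deviations {direction} the mean.")
```

A positive z-score means the value is above the mean and a negative z-score means it is below.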

58
New cards

Normal distribution

- Mound-shaped (bell curve) and symmetric. - Determined by mean and SD.

59
New cards

Empirical rule (for normal distributions)

In a normal distribution, about 68% of data values fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three.
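
The 68/95/99.7 figures can be checked against an exact normal curve. This sketch uses `statistics.NormalDist` from Python's standard library (Python 3.8 or later).

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal: mean 0, SD 1

# Proportion of values within k standard deviations of the mean, for k = 1, 2, 3
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, proportion in within.items():
    print(f"within {k} SD: {proportion:.4f}")
```

The exact proportions are roughly 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.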

60
New cards

Low outlier(s)

Any value less than Q1 - (1.5*IQR)

61
New cards

High outlier(s)

Any value greater than Q3 + (1.5*IQR)
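
A sketch of the 1.5*IQR rule from the two outlier cards. The function name, the quartile values, and the data are invented for illustration.

```python
def outlier_fences(q1, q3):
    """Return the (low, high) cutoffs given by the 1.5*IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

low, high = outlier_fences(q1=10, q3=20)    # IQR = 10, so cutoffs are 10 - 15 and 20 + 15

data = [-8, 4, 12, 15, 19, 22, 36]          # hypothetical values
outliers = [x for x in data if x < low or x > high]

print(low, high, outliers)
```

Values exactly equal to a cutoff are not flagged here, only values strictly beyond it.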

62
New cards

Types of ways to show distribution:

Dot plot, stemplot, histogram, box plot, segmented bar graph, mosaic, bar graph, pie charts

63
New cards

Dot plot Advantages

- Shows every individual value. - Shows range, shape, minimum & maximum, gaps & clusters, and outliers easily. - Quick analysis.

64
New cards

Dot plot Disadvantages

- Not great for larger sets of data. - Limited to quantitative data.

65
New cards

Stemplot Advantages

- Concise representation of data. - Shows range, shape, outliers, minimum & maximum, gaps, & clusters easily. - Shows every individual value.

66
New cards

Stemplot Disadvantages

- Key can be hard to understand at times. - Limited to quantitative data (discrete or continuous).

67
New cards

Segmented Bar Graph Advantages

Shows how a larger category is divided into smaller sub-categories and how each sub-category relates to the whole.

68
New cards

Segmented Bar Graph Disadvantages

- Doesn't show the total frequency of individuals in each category. - Limited to categorical data.

69
New cards

Mosaic Graph Advantages

Shows associations between categorical variables. For example, no association (independence) is suggested when the segments for a given category have the same height in every bar.

70
New cards

Mosaic Graph Disadvantages

- Hard to focus on either the heights or widths individually. - Strictly categorical and don’t work well with continuous data.

71
New cards

Bar Graph Advantages

- Easy to compare multiple data sets.

72
New cards

Bar Graph Disadvantages

- Best used with categorical data.

73
New cards

Pie Chart Advantages

- Visually appealing. - Shows percent of total for each category.

74
New cards

Pie Chart Disadvantages

- No exact numerical data. - Hard to compare multiple data sets. - Works best with categorical data.

75
New cards

Standardizing a distribution

Same shape, mean=0, SD=1.
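
A sketch of standardizing a whole data set: every value is replaced by its z-score. The data is made up, and `pstdev` (divide by n) is an assumption. Whichever SD formula is used for the z-scores, using the same one afterward gives SD = 1.

```python
import statistics

data = [60, 70, 80, 90, 100]              # hypothetical values
mu = statistics.mean(data)                # 80
sigma = statistics.pstdev(data)           # SD with divisor n

z = [(x - mu) / sigma for x in data]      # standardized values

z_mean = statistics.mean(z)
z_sd = statistics.pstdev(z)

print([round(v, 3) for v in z], round(z_mean, 6), round(z_sd, 6))
```

The z-scores keep the same evenly spaced, symmetric pattern as the original values, but are now centered at 0 with SD 1.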

76
New cards

Linear transformation of shape

Stays the same.

77
New cards

linear transformation of center (mean/median)

Changes when every value in the data set is added to, subtracted from, multiplied by, or divided by a constant.

78
New cards

linear transformation of variability

Only adjusts when data set is multiplied/divided by a constant.

79
New cards

linear transformation of standard deviation

Only adjusts when data set is multiplied/divided by a constant.
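
The linear transformation cards can be checked on a small made-up data set. This sketch adds 10 to every value, then separately multiplies every value by 2, and compares center and spread each time.

```python
import statistics

data = [1, 2, 3, 4, 5]                    # hypothetical values
shifted = [x + 10 for x in data]          # add a constant to every value
scaled = [x * 2 for x in data]            # multiply every value by a constant

def summary(xs):
    """Return (mean, median, SD, range) for a list of numbers."""
    return (statistics.mean(xs), statistics.median(xs),
            statistics.pstdev(xs), max(xs) - min(xs))

orig_mean, orig_med, orig_sd, orig_range = summary(data)
shift_mean, shift_med, shift_sd, shift_range = summary(shifted)
scale_mean, scale_med, scale_sd, scale_range = summary(scaled)

print(summary(data))
print(summary(shifted))   # center moves up by 10, spread unchanged
print(summary(scaled))    # center and spread both double
```

Adding 10 moves the mean and median from 3 to 13 but leaves the SD and range alone. Multiplying by 2 doubles the mean, median, SD, and range.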

80
New cards

Cumulative Relative Frequency Graph

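On the graph, Q1 is the value at a cumulative relative frequency of 25%, the median is the value at 50%, and Q3 is the value at 75%.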

81
New cards

Uniform distribution

Mean = Median.

82
New cards

Normal Distribution Calculation: Finding proportion on calculator

normalcdf(lower bound, upper bound, mean, SD)

83
New cards

Normal Distribution Calculation: Finding boundary value on calculator

invNorm(area to the left, mean, SD) (remember to change percentage to decimal)
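
For practice away from the calculator, the two calculator commands can be imitated with `statistics.NormalDist` from Python's standard library. This is only a sketch: the function names `normalcdf` and `inv_norm` are chosen here to mirror the calculator, and the mean of 100 and SD of 15 are made-up numbers.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Proportion of a normal distribution that lies between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area, mean=0.0, sd=1.0):
    """Boundary value with the given area (as a decimal) to its left."""
    return NormalDist(mean, sd).inv_cdf(area)

# Proportion of values between 85 and 115 when mean = 100 and SD = 15
proportion = normalcdf(85, 115, mean=100, sd=15)

# Value at the 90th percentile (area 0.90 to the left) of the same distribution
boundary = inv_norm(0.90, mean=100, sd=15)

print(round(proportion, 4), round(boundary, 2))
```

The area passed to `inv_norm` must be a decimal between 0 and 1, just like on the calculator.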