STATS 100 Exam 1a/

0.0(0)
Studied by 0 people
call kaiCall Kai
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
GameKnowt Play
Card Sorting

1/137

encourage image

There's no tags or description

Looks like no tags are added yet.

Last updated 1:59 AM on 3/30/26
Name
Mastery
Learn
Test
Matching
Spaced
Call with Kai

No analytics yet

Send a link to your students to track their progress

138 Terms

1
New cards

Statistics is

the science of variability

2
New cards

3 short statistical sayings:

What was compared?

Who’s not here?

Incorporate “ish”-ness

3
New cards

Data is

a representation of someone or something

4
New cards

Tidy data is

a way of mapping the real world to a data set

5
New cards

Observations

an entire row, they are the things we care about

6
New cards

Attributes

An entire column, a unique aspect of each observation that we are interested in, they are the pieces of info we are interested in

7
New cards

Measures are

the way in which we collect info about observations

8
New cards

Quantitative data

data in which the values of an attribute for an observation are numbers representing a quantity of something

Key Giveaway: “How many?”, values with decimals, scales

Ex: temp., blood pressure

9
New cards

Categorical data

Data in which the values of an attribute for an observation are selected from a set of different category labels

Key Giveaway: selected from a drop-down list, no order to the values

Ex: hair color, blood type, country of origin

10
New cards

Text Data

Data in which the values of an attribute for an observation is text, typically from an open-ended question. Can be a single word or full sentence

Key Giveaway: Write-in data

Ex: customer reviews, social media posts

11
New cards

Rating scale data

Data in which the values of an attribute for an observation are selected from a predetermined rating scale ~ a list that has an order to it. Can even use numbers (e.g. rate pain 1-10)

Key Giveaway: Clear order to the values

Ex: “to what extent do you agree or disagree?”, how much pain are you in between 1 and 5?)

12
New cards

Time Series Data

Data in which the values of an attribute for an observation indicate a movement in time (such as a year, month, or day)

Key Giveaway: “When?”

Ex: date of birth, year, financial quarterr

13
New cards

Reliability

the extent to which the data you collect from a measure truly represents and reflects the real world characteristics of the observations. (process of measurement)

14
New cards

Data validation

is the act ensuring that the values collected from each observation for each attribute are reliable/valid

15
New cards

Problem

Identify a statistical question

16
New cards

Plan

Choos a sample design, study design, and measures

17
New cards

Data

Collect and process data

18
New cards

Analysis

Look for patterns with summary tables, graphs, and statistical models

19
New cards

Conclusion

Interpret the results and generate new questions about the real world context

20
New cards

Measurement

the process by which we collect information about observations

21
New cards

Data cleaning

Removing invalid values from a dataset

22
New cards

Prospective

Build checks into the survey

23
New cards

Retrospective

Clean the data

24
New cards

Univariate analysis

the analysis of a single attribute or variable at a time

25
New cards

Standard Deviation

Measures how spread out the data points are from the mean of a dataset.

Low = data points clustered closely around the mean

High = data points are spread out over wider range

26
New cards

5 Number Summary

Contains 5 numbers that help statisticians and data scientists understand the different values that the different observations have for an attribute

27
New cards

Minimum

the smallest value that any observation has for the attribute

28
New cards

1st quartile

25th percentile value. 25% of observations have a value below the 1st quartile

29
New cards

The median or the 2nd quartile

the 50th percentile value. ½ of all observations have a value below the median

30
New cards

3rd quartile

the 75th percentile value. 75% of all observations have a value below the 3rd quartile

31
New cards

Maximum

is the largest value that any observation has for the attribute

32
New cards

Frequencies

the number of times a specific value of event occurs in a dataset, how common something is

33
New cards

Percentages or relative frequencies

ratio or percentage of the number of times a value of the data sets occurs out of the total number of outcomes

34
New cards

Dot plot

a graph where each observation is displayed as 1 point on the graph

height of a dot plot = frequency for a particular value

35
New cards

Density plot

similar to a dot plot with a line drawn across the top of all of the stacks of dots

36
New cards

Word cloud

graphically depicts all of the words across all of the responses from all of the observations. The size of the word varies.

Larger the word, the more frequent it is

37
New cards

Bar graph

based on frequency table, and has 1 bar for every response option

The bar’s height is equal to each response option’s frequency

38
New cards

Aggregate characteristics

the characteristics of a group of observations

  • for text, rating scale, and categorical data, the main aggregate characteristics we focus on are the frequencies and percentages of each of the response options

  • When examining quantitative data, we focus on the shape of the data, the spread of the data, and the location of the data

39
New cards

Distribution

The pattern that the responses from all the observations make

40
New cards

Normal distributions

A bell-curve shape

  • most of the observations have a value near the average

  • Approximately 95% of the observations will have a value within 2 standard deviations of the mean

41
New cards

Skewed distribution

look like they’ve had 1 side stretched out

not symmetric

42
New cards

Right Skew

distributions look like the right side of a normal distribution has been stretched, which indicates that some units have very large values

43
New cards

Left Skew

Distributions look like the left side of a normal distribution has been stretched, which indicates that some units have very small values

44
New cards

Multimodal distributions

multiple peaks

often seen when there are actually group differences in the attribute

45
New cards

2-way table

is similar to a frequency table, except that 1 attribute’s frequencies are presented as different rows in the table, and a second attribute’s frequencies are presented as columns

46
New cards

Column percents

are relative frequencies based only on the total from a single column

  • Used to compare the distribution of a categorical or rating scale attribute between 2 groups

47
New cards

Line graph

Places time on the x axis, and the average value of an attribute or a percentage on the vertical axis

48
New cards

Ratio of Standard Deviations

Equal to the largest standard deviation between the 2 groups divided by the smallest standard deviation

  • Used to compare the spread of the distribution between 2 groups

  • If the ratio is approximately 3 or higher, we say that 1 distribution is MORE spread out than the other

49
New cards

Effect Size

Equal to the difference in the means, divided by the larger standard deviation

  • Used to compare the relative difference in the means between 2 groups

  • 0.75 or MORE indicate a LARGE difference in the averages between the groups

  • 0.25 or LESS indicate a SMALL difference in the averages between the groups

  • Between 0 and 0.10 indicate that there is a negligible or no meaningful difference in the averages between groups

50
New cards

Grouped Bar Graph

Has 1 bar graph created separately for each of the different response options for the 2nd attribute being considered in the 2-way table

51
New cards

Grouped density plots

have 1 density curve for a quantitative attribute for each group all on the same plot

52
New cards

Scatter plot

A plot in which each observation is placed as a point on a graph according to their value for each of the 2 quantitative attributes

  • Usually the casual factor goes on the horizontal axis

53
New cards

Smoothed Trend Line

A line through the average value of the vertical axis attribute across all values of the horizontal axis attribute

54
New cards

Correlation Coefficient

A statistic that summarizes how related 2 attributes are to each other

X always increases

  • Values close to 0 indicate that there is NO association between the attributes

  • Values close to -1 indicate that there is a STRONG NEGATIVE association between the attributes, as x increases, y decreases

  • Values close to +1 indicate that there is a STRONG POSITIVE association between the attributes, as x increases, y increases

55
New cards

Quantitative Measures

Ex: temp., blood pressure

Summary: count, mean, sd, 5-number summary

Visualizations: density curve

56
New cards

Categorical Measure

Ex: hair color, blood type, country of origin

Summary: frequency table

Visualizations: bar chart

57
New cards

Rating Scale

Ex: “to what extent do you agree or disagree?”, how much pain are you in between 1 and 5?

Summary: frequency table (with the order preserved)

Visualizations: bar chart (with the order preserved)

58
New cards

Text Measure

Ex: customer reviews, social media posts

Summary: frequency table

Visualizations: word cloud

59
New cards

Time Measure

Line graph is ‘when’

60
New cards

Modality

The number of ‘peaks’ a distribution has

61
New cards

Middle 95%

Represents most of the observations

62
New cards

Empirical Rule

The middle 95% of a normal distribution lies within 2 standard deviations of the mean

63
New cards

Middle 99%

3 standard deviations of the mean

64
New cards

Middle 68%

1 standard deviation of the mean

65
New cards

Standard Deviation Equation

(Upper limit of middle 95% - lower limit of middle 95%) / 4

66
New cards

Mean

math average (sum / by count) Add all the data points together and / by total # of points

67
New cards

Median

exact middle value when data is ordered. If even # of data points it is the average of the 2 middle #’'‘s

68
New cards

Mode

The value that appears the most frequently. A set can have no mode, 1 mode, or multiple modes. The peak of a distribution

69
New cards

Absolute Difference

Subtracting 1 column percent from the other, and taking the absolute value (i.e., positive numbers only)

Whenever the absolute difference is GREATER THAN 10%, we interpret that difference to indicate a real-world difference between the 2 groups

As you get closer to 0 the more similar they are

70
New cards

Relative Difference

When column percents are less than 50% we focus on computing the relative difference

By dividing 1 column percent from the other

Whenever the relative difference is LESS THAN 0.80 or GREATER THAN 1.25 we interpret that difference to indicate a real world difference between the 2 groups

As you get closer to 1 the more similar the data is

71
New cards

Ratio of Standard Deviation Equation

Larger sd / Smaller sd

If the ratio is approximately 3 OR HIGHER we say that 1 distribution is more spread out than the other

72
New cards

Effect Size Equation

(Group 1 mean - Group 2 mean) / larger standard deviation

  • 0.10ish or less as an indicator of NO difference in the averages between the groups

  • 0.25ish as an indication of a SMALL difference in the averages between the groups

  • 0.75ish as an indicator of a LARGE difference in the average between the groups

73
New cards

If 2 attributes are not correlated at all…

The correlation will be 0

74
New cards

If 2 attributes are perfectly correlated with a positive trend…

the correlation will be 1

75
New cards

If 2 attributes are perfectly correlated with a negative trend…

the correlation will be -1

76
New cards

The closer a correlation is to 1 or -1…

the stronger the correlation

77
New cards

If the points are all very close to the trend line

Then the relationship is very strong

78
New cards

If the points are all very spread out and far away from the trend line

Then the relationship is very weak

79
New cards

Interaction plot

simplifies comparing distributions between 2 sets of different groups by focusing just on the relative frequencies or mean

Don’t forget error bars

80
New cards

2 types of distinguishing characteristics of quantitative distributions that statisticians and data scientists focus on:

  1. the number of ‘peaks’ the distribution has

  2. whether or not the distribution is symmetric

81
New cards

2 types of modality

  1. unimodal distributions: distributions with only 1 ‘peak’

  2. multimodal distributions: distributions with more than 1 ‘peak’

82
New cards

The empirical rule only applies to:

normal distributions (unimodal symmetric bell-curves)

83
New cards

In symmetric distributions, the location of the ‘peak’ will be equal to:

The values of both the mean and median

84
New cards

In skewed distributions, the value of the median and mean will be:

pulled away from the mode in the same direction of the skew, with the mean being affected more than the median

85
New cards

In left skew distributions, the few observations with very small values ‘pull’ the mean and median to the:

left of the mode. we focus on the value of the median to tell us the location of the center of the distribution, or the most typical value

86
New cards

In right skewed distributions the few observations with very large values ‘pull’ the mean and median to the:

Right of the mode

87
New cards

U-shaped distribution

Bimodal frequency distribution where data points are concentrated at the extreme ends (low and high values) and sparse in the middle, creating a U-shape on the graph.

Most of the observations are concentrated at the 2 extremes (the lowest and highest values) of the range.

88
New cards
<p>Uniform Distribution</p>

Uniform Distribution

A flat shape. Type of probability distribution where all possible outcomes or values within a specific range are equally likely to occur. Looks like a rectangle.

89
New cards
<p>Distribution of Categorical or Rating Scale</p>

Distribution of Categorical or Rating Scale

Based on: Groups as defined by a categorical attribute

Summaries: 2-way table, column percents

90
New cards
<p>Distribution of Quantitative</p>

Distribution of Quantitative

Based on: Groups as defined by a categorical attribute

Summaries: summary table with 1 row per group, effect size, ratio of SDs

91
New cards
<p>Distribution or Quantitative or Categorical or Rating Scale</p>

Distribution or Quantitative or Categorical or Rating Scale

Based on: Time periods as defined by a time series attribute

Summaries: means or percentages by time period

92
New cards
<p>Distribution of Quantitative (2)</p>

Distribution of Quantitative (2)

Based on: Another Quantitative attribute

Summaries: Correlation

93
New cards
<p>Distribution or Quantitative or Categorical or Rating Scale (2)</p>

Distribution or Quantitative or Categorical or Rating Scale (2)

Based on: Time periods as defined by a time series attribute AND Groups as defined by a categorical attribute

Summaries: means or percentages by time period and group

94
New cards
<p>Distribution of Quantitative (3)</p>

Distribution of Quantitative (3)

Based on: Another Quantitative attribute AND Groups as defined by a categorical attribute

Summaries: Correlation

95
New cards
<p>Distribution or Quantitative or Categorical or Rating Scale (3)</p>

Distribution or Quantitative or Categorical or Rating Scale (3)

Based on: 2 sets of Groups as defined by 2 categorical attribute

Summaries: means or percentages by group

96
New cards

Causal Inference

an inference about which factor or factors may be responsible for causing an effect on some observed outcome

97
New cards

Observed outcome

What actually happened in the real world

98
New cards

Counterfactual question

What would have happened in the parallel universe

99
New cards

Counterfactual questions are

“What if"?” questions

100
New cards

Treatment observation

observations that were exposed to some causal factor

Explore top flashcards

flashcards
Niemiecki - 7.03
65
Updated 382d ago
0.0(0)
flashcards
Kapitel 2.2
52
Updated 1212d ago
0.0(0)
flashcards
Ap world unit 8 vocab
64
Updated 1082d ago
0.0(0)
flashcards
Human Evolution Unit 1
46
Updated 1132d ago
0.0(0)
flashcards
Executive Branch Zehe Test Prep
60
Updated 837d ago
0.0(0)
flashcards
exam 2 - id
48
Updated 168d ago
0.0(0)
flashcards
Niemiecki - 7.03
65
Updated 382d ago
0.0(0)
flashcards
Kapitel 2.2
52
Updated 1212d ago
0.0(0)
flashcards
Ap world unit 8 vocab
64
Updated 1082d ago
0.0(0)
flashcards
Human Evolution Unit 1
46
Updated 1132d ago
0.0(0)
flashcards
Executive Branch Zehe Test Prep
60
Updated 837d ago
0.0(0)
flashcards
exam 2 - id
48
Updated 168d ago
0.0(0)