Stats 5 Midterm


Description and Tags

hi ryan

Last updated 4:45 AM on 4/29/26

97 Terms

1
New cards

Census

we collect data for every individual in a population

2
New cards

Parameter

numerical summary of the population- values are usually unknown

3
New cards

Statistic

numerical summary of a sample- use sample statistics to estimate the value of population parameters

4
New cards

Descriptive Statistics

Methods for summarizing the collected data

  • describe data through tables, graphs, and numerical summaries such as averages or percentages

  • allow an overview of data to determine statistical methods researcher should use

5
New cards

Inferential Statistics

Methods that take a result from a sample, extend it to the population, and measure the reliability of the result

  • contains uncertainty

6
New cards

Qualitative

Categorical variables

  • classification based on attribute or characteristic

7
New cards

Quantitative

Numerical variables

  • values can be added or subtracted and provide meaningful results

8
New cards

Nominal Variable

Qualitative variable used to represent names, labels, or categories

  • used to differentiate between different categories

  • do not represent any quantity or order

9
New cards

Ordinal Variable

Qualitative variable whose values not only represent categories but also indicate an order or ranking

  • Although the values can be arranged in a certain sequence, the intervals between them are not defined

10
New cards

Binary Level

Qualitative- “Yes or No”

11
New cards

Continuous Variable

Quantitative- can take infinitely many values within its possible range

  • Ex. 3.564 grams

12
New cards

Discrete Variable

Quantitative- observations can only take separate, countable values

  • Ex. 8 legs

13
New cards

Process of Stats

  1. Identify research objective

  2. Collect data needed for Qs

  3. Describe data

  4. Perform inference

14
New cards

Selection Bias

When study participants are not representative of the target population, leading to skewed, inaccurate, and non-generalizable results

15
New cards

Underestimation of Effectiveness

bias where a study or statistic consistently reports a lower value for an effect, treatment, or relationship than actually exists in the population

16
New cards

Simple Random Sampling

Let chance determine the sample

  • A simple random sample of n subjects from a population of size N is one in which each possible sample of size n has the same chance of being selected.
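
A minimal sketch of drawing a simple random sample in Python. The population of 20 subject IDs and the seed are made up for illustration; `random.sample` picks n distinct items so that every subset of size n is equally likely.

```python
import random

# Hypothetical population of N = 20 subject IDs (illustrative only).
population = list(range(1, 21))

# Fixed seed only so the sketch is repeatable.
rng = random.Random(5)

# Draw n distinct subjects; every possible sample of size n is equally likely.
n = 5
srs = rng.sample(population, n)
print(srs)
```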

17
New cards

Stratified Sampling

divides the population into separate groups, called strata, and then selects a simple random sample from each stratum

  • This guarantees that each stratum is represented in the sample.
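
A small sketch of stratified sampling, assuming made-up strata (class years) and a fixed sample size of 2 per stratum. It takes a simple random sample inside each stratum, so every stratum shows up in the result.

```python
import random

# Hypothetical strata: class year -> student IDs (illustrative data).
strata = {
    "freshman": ["F1", "F2", "F3", "F4", "F5", "F6"],
    "sophomore": ["S1", "S2", "S3", "S4"],
    "junior": ["J1", "J2", "J3", "J4", "J5"],
}

rng = random.Random(1)  # fixed seed for repeatability
per_stratum = 2         # assumed sample size taken from each stratum

# Simple random sample within every stratum.
stratified_sample = {
    name: rng.sample(members, per_stratum) for name, members in strata.items()
}
print(stratified_sample)
```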


18
New cards

Cluster Sampling

divides the population into a large number of clusters, such as city blocks

  • Then a simple random sample of clusters is selected, and all individuals in the selected clusters are included in the sample.


19
New cards

Systematic Sampling

sample obtained by selecting every kth individual from a population
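
A sketch of systematic sampling with k = 10 on a made-up list of 100 IDs. Choosing a random starting point among the first k individuals is a common convention and is an assumption here, not something the card states.

```python
import random

population = list(range(1, 101))  # hypothetical IDs 1..100
k = 10                            # take every 10th individual

rng = random.Random(3)            # fixed seed for repeatability
start = rng.randrange(k)          # assumed: random start among the first k

# Slice with step k to pick every kth individual from the start.
systematic_sample = population[start::k]
print(systematic_sample)
```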


20
New cards

Convenience Sampling

individuals are easily obtained and not based on randomness

  • when individuals are self-selected, these are also called voluntary response samples

21
New cards

Sampling Bias

technique used to obtain the sample favors one part of the population over another

22
New cards

Nonresponse Bias

individuals selected who do not respond to the survey have different opinions from those who do respond.

23
New cards

Response Bias

when the answers on a survey do not reflect the true opinions of the respondent

  • caused by respondents lying, by the wording of the questions, or by the interviewer asking the question in a confusing or misleading way

24
New cards

Explanatory Variables

IV (independent variable)- the factors whose effect on a response variable is being studied

25
New cards

Element of a Good Experiment

  • Control (comparison) group

  • Randomization

  • Blinding the Study

  • Replication

26
New cards

Replication

when each treatment is applied to more than one experimental unit

  • ensures the effect of a treatment is not due to some characteristic of a single experimental unit

27
New cards

Frequency Distribution

lists each category of data and the number of observations in each

28
New cards

Relative Frequency Distribution

lists each category of data together with the relative frequency, the proportion of observations in each category

29
New cards

Relative Frequency

taking the frequency for a particular category and dividing by the total number of observations
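
A short sketch of a frequency table and its relative frequencies, using a made-up list of colors. `Counter.most_common()` lists categories from most to least frequent, which is the ordering a Pareto chart uses.

```python
from collections import Counter

# Hypothetical categorical data.
colors = ["red", "blue", "red", "green", "blue", "red", "red", "green"]

frequency = Counter(colors)       # category -> number of observations
total = sum(frequency.values())   # total number of observations

# Relative frequency = frequency / total number of observations.
relative_frequency = {c: count / total for c, count in frequency.items()}

for category, count in frequency.most_common():
    print(f"{category:<6} {count:>2}  {relative_frequency[category]:.3f}")
```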

30
New cards

Bar Graph

bar for each category, where the height of each bar is either the frequency or the relative frequency of the category.

31
New cards

Pareto Chart

bar graph with categories ordered by their frequency or relative frequency, from tallest bar to shortest bar

  • useful for comparing the categories of a qualitative variable, with emphasis on comparing the different parts to each other but not necessarily to the whole

32
New cards

Pie Chart

circle divided into sectors for each category

  • area of each sector is proportional to the relative frequency of the category

  • useful for showing the division with emphasis on comparing the part to the whole

33
New cards

Histogram

constructed by drawing rectangles for each class of data.

  • The height of each rectangle is the frequency or relative frequency of the class

34
New cards

Dot Plot

horizontal axis that spans from the minimum to the maximum data values

  • each observation is drawn as a dot above its corresponding value on the axis

35
New cards

Stem-and-Leaf Plot

splits each value into a stem (all digits except the last) and a leaf (the last digit), listing the leaves in order next to their stem

36
New cards

Shape of Distribution

symmetry or skewness, the number of peaks, any clusters or gaps, and outliers

37
New cards

Uniform

frequency of each class is relatively the same

38
New cards

Bell-Shaped

A distribution that is symmetric and unimodal is described as bell-shaped.

39
New cards

How can we Describe Distribution

  1. Shape – symmetry or skewness, the number of peaks, any clusters or gaps, outliers

  2. Center – mean or median

  3. Spread – describes the variability in the data

    1. values are either clustered together or spread out

    2. measured by range, variance, standard deviation, and interquartile range

40
New cards

Left-Skewed with Mean and Median

Mean < Median

41
New cards

Symmetric With Mean and Median

Median = Mean

42
New cards

Right-Skewed with Mean and Median

Mean > Median
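
A quick check of the mean/median rules on three made-up data sets. The extreme value pulls the mean toward the tail while the median barely moves.

```python
from statistics import mean, median

# Hypothetical data sets chosen to show each shape.
right_skewed = [1, 2, 2, 3, 3, 4, 20]       # long tail to the right
left_skewed = [1, 17, 18, 18, 19, 19, 20]   # long tail to the left
symmetric = [1, 2, 3, 4, 5, 6, 7]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, "mean =", mean(data), "median =", median(data))
```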

43
New cards

Measures of Spread

  • Range

  • IQR

  • Variance

  • Standard Deviation

44
New cards

Range

difference between the largest and the smallest observations

  • R = maximum – minimum

  • affected severely by outliers.

45
New cards

Deviation from Mean

how far each value is from the mean

  • Deviation = Value - Mean

  • Positive deviation = above average

  • Negative deviation = below average

46
New cards

Standard Deviation

measure of how spread out the numbers in a data set are and how much the data varies from the mean

  • A small SD means the values are close to the mean.

  • A large SD means the values cover a wider range.

  • used to assess risk or volatility in fields like finance, quality control, and research.

47
New cards

Standard Dev Formula

s = √( Σ(x - x̄)² / (n - 1) ), where x̄ is the sample mean and n is the sample size

48
New cards

Variance Formula

s² = Σ(x - x̄)² / (n - 1), the sample variance (the square of the sample standard deviation)
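
A step-by-step sketch of the variance and standard deviation, assuming the sample versions (dividing by n - 1) and a made-up data set. The hand computation is checked against `statistics.variance` and `statistics.stdev`, which also use n - 1.

```python
from math import sqrt
from statistics import stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
n = len(data)
x_bar = sum(data) / n            # sample mean

# Deviation from the mean for each value; these always sum to 0.
deviations = [x - x_bar for x in data]

# Sample variance: sum of squared deviations divided by n - 1.
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)

# Sample standard deviation: square root of the sample variance.
sample_sd = sqrt(sample_variance)

print(sample_variance, variance(data))
print(sample_sd, stdev(data))
```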

49
New cards

Outliers

observation that is unusually large or small relative to the other values in a data set

  • may have been observed, recorded, or entered into the computer incorrectly.

  • may come from a different population.

  • may be correct but represent a rare event.

50
New cards

Percentile

For n observations arranged in order, the pth percentile is a number such that p% of the observations fall below it and (100 – p)% fall above it.

  • Ex. 90th percentile

    • 90% of test takers scored below

    • 10% of test takers scored higher

51
New cards

Quartiles

most common percentiles, dividing data into four equal parts:

  • First Quartile (Q₁): The 25th percentile (25% of the data fall below it).

  • Second Quartile (Q₂): The 50th percentile, which is also the median.

  • Third Quartile (Q₃): The 75th percentile (75% of the data fall below it; the top 25% lie above it)

52
New cards

Interquartile Range IQR

distance between the first and third quartiles

  • IQR = Q3 – Q1

  • measure of spread; the more spread out the data the larger the IQR.

  • represents the range of the middle 50% of observations

53
New cards

Fences

To spot outliers

  • Lower fence = Q1 - 1.5 x IQR

  • Upper fence = Q3 + 1.5 x IQR
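
A sketch of the fence rule on a made-up data set. Quartile conventions differ between textbooks and software; this uses `statistics.quantiles` with `method="inclusive"` as one reasonable choice, so hand-computed quartiles from a course method may differ slightly.

```python
from statistics import quantiles

data = [2, 4, 5, 7, 8, 9, 10, 12, 40]  # hypothetical data with one extreme value

# Quartiles under the "inclusive" convention (an assumption; other methods exist).
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any observation beyond a fence is flagged as an outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```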

54
New cards

For distributions that are symmetric report the

Mean and Standard deviation

55
New cards

For distributions that are skewed report the

Median and IQR

56
New cards

Boxplot

A Graphical Representation of the Five-Number Summary

  • box extending from the lower quartile (Q1) to the upper quartile (Q3).

  • line at the median (Q2).

  • whiskers to smallest and largest observation that is not an outlier

  • data below the lower fence or above the upper fence are considered outliers and are marked with an asterisk

57
New cards

Boxplots used for

  • Large data sets (5+ data points)

  • Unimodal Distributions (can obscure bimodal)

58
New cards

A Boxplot that is symmetric will have

  • median near center of the box

  • left and right whiskers same length


59
New cards

A Boxplot that is skewed right will have

  • median left of the center

  • right whisker will be longer than the left whisker (or there may be high outliers)


60
New cards

A Boxplot that is skewed left will have

  • median right of the center

  • left whisker will be longer than the right whisker (or there may be low outliers)


61
New cards

Comparing two Boxplots

  • Provide a measure of center for each, and which one is larger/smaller

  • Provide a measure of spread for each, and which is larger/smaller

(Skewed- compare medians and IQR, Symmetric- means and SD)

62
New cards

Univariate Analysis

Analysis of a single variable to understand its distribution or characteristics

63
New cards

Categorical Visualization Methods

  • Bar graph

  • Pie chart

  • Pareto chart

64
New cards

Quantitative Visualization Methods

  • Stem plot

  • Histogram

  • Boxplot

65
New cards

Bivariate Analysis

Analysis of relationship between two variables

66
New cards

Two Quantitative Variables Visualization Method

  • Scatter Plot

67
New cards

One Quantitative and one Categorical Variable Visualization Methods

  • Boxplot

  • Bar Graph

68
New cards

Response Variable

DV (dependent variable)- measures the outcome of the study

69
New cards

Scatter Plot

graph showing the relationship between two quantitative variables measured on the same individual

  • if roughly straight-line trend, the relationship between x and y is said to be approximately linear

70
New cards

When Two Variables have Linear Relationship

we can describe the direction of their association:

  • Positive association: As x increases, y also tends to increase

  • Negative association: As x increases, y tends to decrease

  • No association: As x increases, there is no clear pattern in changes of y

71
New cards

Correlation Coefficient Formula

r = Σ[ ((x - x̄) / s_x) · ((y - ȳ) / s_y) ] / (n - 1), where s_x and s_y are the sample standard deviations of x and y
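
A sketch of computing the linear correlation coefficient r from standardized values, assuming the usual sample form (z-scores multiplied pairwise, summed, and divided by n - 1). The five (x, y) pairs are made up.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]  # hypothetical explanatory values
y = [2, 4, 5, 4, 5]  # hypothetical response values
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample standard deviations (divide by n - 1).
s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Multiply the z-scores pairwise, sum, and divide by n - 1.
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)) / (n - 1)
print(round(r, 4))
```

Here r is positive and fairly close to 1, so the made-up data show a moderately strong positive linear association.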

72
New cards

Linear Correlation Coefficient

Direction:

  • r > 0 = positive association

  • r < 0 = negative association

Form:

  • r close to 0 = little or no linear relationship (a nonlinear relationship may still exist)

Strength:

  • The closer r is to ±1, the stronger the linear relationship.

  • The closer r is to 0, the weaker the linear relationship.

*NOT RESISTANT (r is strongly affected by outliers)

73
New cards

Correlation Coefficient and Critical Value

If the absolute value of the correlation coefficient is greater than the critical value, we say a linear relation exists between the two variables.

  • Otherwise, there is no linear relation

74
New cards

Least-Squares Regression

used to predict the value of the response variable (y) based on a given value of the explanatory variable (x)

  • Creates linear equation (regression line)

  • NOT FOR VALUES OUTSIDE RANGE OF DATA COLLECTED (extrapolation)

75
New cards

Extrapolation

the estimation of values beyond a known dataset's range

  • can lead to answers that don’t make sense bc we cannot be certain of the behavior of data for which we have no observations

76
New cards

Residuals

prediction error for any given value of x.

  • Formula: Residual = Observed y - Predicted y

  • In a scatterplot, the residual is the vertical distance between a data point and the regression line (smaller distance the better)

77
New cards

R2 (coefficient of determination)

Evaluates the strength of fit of a linear model

  • calculated as the square of the correlation coefficient

  • tells us what percent of the variability in the response variable is explained by the model
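
A sketch tying the regression cards together on made-up data: it fits the least-squares line, computes each residual as observed y minus predicted y, and shows that r² matches the share of variability in y explained by the line. The slope and intercept formulas are the standard least-squares ones; variable names are illustrative.

```python
x = [1, 2, 3, 4, 5]  # hypothetical explanatory values
y = [2, 4, 5, 4, 5]  # hypothetical response values
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products around the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Least-squares regression line: predicted y = intercept + slope * x.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual = observed y - predicted y.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# R^2 two ways: square of r, and 1 - (unexplained variation / total variation).
r = s_xy / (s_xx * s_yy) ** 0.5
r_squared = r ** 2
explained = 1 - sum(e ** 2 for e in residuals) / s_yy

print(slope, intercept)
print(residuals)
print(r_squared, explained)
```

With these numbers about 60% of the variability in y is explained by the line; predictions should only be made for x between 1 and 5, the range of the data.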

78
New cards

Limitations of Regression Models

  • Approximation- average value of y given x, not exact

  • Influence of Other Variables

  • Random Variation- will be unexplained random variation in y

  • Line of Means- regression line predicts the mean value of y for each specific x

79
New cards

Probability Experiment

act or process of observation with uncertain results that can be repeated

  • Probability of outcome- proportion of times that the outcome would occur in a long run of observations

  • Probabilities are ALWAYS between 0 and 1
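
A small simulation of the long-run idea, assuming a fair coin and a fixed seed for repeatability. The proportion of heads settles near 0.5 as the number of flips grows.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable
flips = 100_000

# Each flip is heads with probability 0.5 (assumed fair coin).
heads = sum(rng.random() < 0.5 for _ in range(flips))

proportion = heads / flips
print(proportion)
```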

80
New cards

Independent Random Experiment

if the outcome of any one trial is not affected by the outcome of any other trial

  • Ex. No matter how many times heads or tails have appeared before, the 11th flip is still a new event, and the probability of getting heads or tails remains 50%

81
New cards

Probability Model

Description of probability experiment, includes:

  • list of all possible outcomes (sample space (S))

  • probability for each outcome

82
New cards

Sample Space

(denoted S) is the set of all possible outcomes

  • Ex. Rolling a Die: S = {1, 2, 3, 4, 5, 6}

  • Ex. Flipping a Coin: S = {H, T}

83
New cards

Tree Diagram

Diagram used to list the sample space when an experiment consists of more than one stage

  • Ex. S = {HH, HT, TH, TT}

84
New cards

All Possible Outcomes Without Replacement

Multiply the number of options at the first and second stages of the tree diagram

85
New cards

All Possible Outcomes With Replacement

Multiplication Rule for Counting: n^r

  • repeating task with n outcomes r times

(3 marble types drawn 2 times: 3² = 9)
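
A sketch that lists the outcomes for both counting cases. It assumes ordered draws and, for the without-replacement case, exactly one marble of each color (both assumptions made for illustration).

```python
from itertools import permutations, product

marbles = ["red", "green", "blue"]  # hypothetical: one marble of each color

# With replacement: n^r = 3^2 ordered outcomes.
with_replacement = list(product(marbles, repeat=2))

# Without replacement: 3 options for the first draw, 2 for the second.
without_replacement = list(permutations(marbles, 2))

print(len(with_replacement), len(without_replacement))
```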

86
New cards

Event

any collection of outcomes from a probability experiment

  • Denoted A or B

  • impossible event- the probability of the event is 0

  • certain event- the probability of the event is 1

  • unusual event- low probability of occurring

    • Typically, an event with a probability less than 0.05 (or 5%)

87
New cards

Simple Events

events with only one outcome, denoted e

88
New cards

Equally Likely Outcomes

when each outcome has the same chance of occurring

89
New cards

Probability Range

Probability of event A (denoted as P(A)) must be between 0 and 1

  • 0 ≤ P(A) ≤ 1

90
New cards

Sum of Probabilities

Sum of probabilities of all possible outcomes must equal 1

  • If S = {e1, e2, e3}, P(e1) + P(e2) + P(e3) = 1

91
New cards

Complement

consists of all outcomes in the sample space that are not in event A.

  • denoted A^c

  • P(A^c) = 1 - P(A)

92
New cards

Mutually Exclusive Events

Disjoint- do not have any outcomes in common

  • P(A or B) = P(A ∪ B) = P(A) + P(B)

93
New cards

Intersection Event

the intersection of two events A and B consists of the outcomes that are in both A and B.

  • ONLY THE OVERLAP (A and B at once)

  • P(A ∩ B) = P(A) x P(B) only if A and B are independent; in general P(A ∩ B) = P(A|B) x P(B)

94
New cards

Union Event

the union of two events A and B consists of the outcomes that are in A or B.

  • P(A or B) = probability that A, B, or both occur

  • P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
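
A sketch that checks the complement and addition rules on one roll of a fair die, using Python sets for events and exact fractions. The events A (even) and B (greater than 3) are made up for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space for one roll of a fair die


def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))


A = {2, 4, 6}  # hypothetical event: roll is even
B = {4, 5, 6}  # hypothetical event: roll is greater than 3

p_union = prob(A | B)       # outcomes in A or B (or both)
p_inter = prob(A & B)       # outcomes in both A and B
p_complement = prob(S - A)  # outcomes not in A

print(p_union, prob(A) + prob(B) - p_inter)
print(p_complement, 1 - prob(A))
```

Subtracting P(A ∩ B) avoids counting the overlap {4, 6} twice.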

95
New cards

Conditional Probability

Of the cases in which B occurred, P(A|B) is the proportion in which A also occurred

  • P(A|B) = P(A and B) / P(B)

  • Finding probability of event when you know what the outcome was for another event
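
A sketch of conditional probability on one roll of a fair die, with made-up events A (even) and B (greater than 3). It computes P(A|B) from the formula and again by counting only the outcomes inside B.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space for one roll of a fair die
A = {2, 4, 6}           # hypothetical event: roll is even
B = {4, 5, 6}           # hypothetical event: roll is greater than 3

p_b = Fraction(len(B), len(S))
p_a_and_b = Fraction(len(A & B), len(S))

# Formula: P(A|B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b

# Same answer by treating B as the new, smaller sample space.
by_counting = Fraction(len(A & B), len(B))

print(p_a_given_b, by_counting)
```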


96
New cards

General Multiplication Rule

  • P(A and B) = P(A|B) x P(B) or

  • P(A and B) = P(B|A) x P(A)


97
New cards

3 Ways to Determine if A and B are independent events

  1. Is P(A|B) = P(A)?

  2. Is P(B|A) = P(B)?

  3. Is P(A and B) = P(A)∙P(B)?

If any one of these is true, the others are also true and the events are independent
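
A sketch that runs all three independence checks on two flips of a fair coin, with made-up events A (first flip is heads) and B (second flip is heads). Exact fractions avoid rounding issues in the comparisons.

```python
from fractions import Fraction
from itertools import product

# Sample space for two coin flips: HH, HT, TH, TT (equally likely).
S = list(product("HT", repeat=2))


def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))


A = {o for o in S if o[0] == "H"}  # hypothetical event: first flip is heads
B = {o for o in S if o[1] == "H"}  # hypothetical event: second flip is heads

p_a, p_b, p_ab = prob(A), prob(B), prob(A & B)

check_1 = p_ab / p_b == p_a   # Is P(A|B) = P(A)?
check_2 = p_ab / p_a == p_b   # Is P(B|A) = P(B)?
check_3 = p_ab == p_a * p_b   # Is P(A and B) = P(A) x P(B)?

print(check_1, check_2, check_3)
```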