stats midterm

99 Terms

1
New cards

Simulation

A method used to imitate a real-world statistical situation using random processes

2
New cards

Random process

A process that uses chance to determine outcomes, often with random numbers

3
New cards

Repeated trials

Running a simulation many times to observe the distribution of outcomes

4
New cards

Response variable (simulation)

The outcome recorded for each trial

5
New cards

Sampling variability

The natural differences that occur from sample to sample
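
A minimal sketch of the simulation cards above, assuming a made-up situation (rolling two dice and asking how often the sum is 7). The function names `simulate_trial` and `run_simulation` are hypothetical; the random process is Python's `random` module, the response variable is the sum, and the repeated trials give an estimate that varies a little from run to run.

```python
import random

def simulate_trial(rng):
    """One trial: roll two dice and record the response variable (their sum)."""
    return rng.randint(1, 6) + rng.randint(1, 6)

def run_simulation(n_trials, seed=0):
    """Repeat the trial many times; return the proportion of trials with sum 7."""
    rng = random.Random(seed)
    outcomes = [simulate_trial(rng) for _ in range(n_trials)]
    return sum(1 for total in outcomes if total == 7) / n_trials

# Different seeds give slightly different estimates, all near the true value 6/36.
for seed in range(3):
    print(seed, run_simulation(10_000, seed=seed))
```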

6
New cards
7
New cards

Population

The entire group we want information about

8
New cards

Sample

A subset of the population used to draw conclusions

9
New cards

Census

A study that includes every member of the population

10
New cards

Sample size

The number of individuals in a sample

11
New cards

Representative sample

A sample that accurately reflects the population

12
New cards

Simple random sample (SRS)

Every possible sample of the same size has an equal chance of being chosen

13
New cards

Sampling frame

The list of individuals from which a sample is drawn

14
New cards

Stratified random sample

The population is divided into homogeneous groups (strata), and an SRS is taken from each stratum

15
New cards

Cluster sample

The population is divided into clusters, some clusters are randomly selected, and everyone in the selected clusters is surveyed

16
New cards

Systematic sample

Selecting every nth individual after a random start
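
The four sampling methods above can be sketched side by side. This is an illustration only: the function names and the idea of storing the sampling frame as a Python list (and strata or clusters as dictionaries) are assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS: every possible sample of size n is equally likely."""
    return rng.sample(frame, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take an SRS from each stratum (strata: dict of name -> list of members)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters, then include everyone in them."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

def systematic_sample(frame, step, rng):
    """Pick a random start among the first `step` individuals, then every step-th."""
    start = rng.randrange(step)
    return frame[start::step]

rng = random.Random(42)
frame = list(range(100))  # hypothetical sampling frame of 100 ID numbers
print(simple_random_sample(frame, 5, rng))
print(systematic_sample(frame, 20, rng))
```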

17
New cards

Pilot survey

A trial run of a survey used to improve the final version

18
New cards

Voluntary response bias

Bias that arises when individuals decide for themselves whether to participate, which tends to overrepresent people with strong opinions

19
New cards

Convenience sampling

Choosing individuals because they are easy to reach

20
New cards

Undercoverage

Some groups in the population are left out or undersampled

21
New cards

Nonresponse bias

Bias from differences between respondents and nonrespondents

22
New cards

Response bias

Bias caused by anything in the survey design that influences answers, such as question wording or interviewer behavior

23
New cards
24
New cards

Observational study

Researchers observe but do not impose treatments

25
New cards

Retrospective study

Uses data from past events

26
New cards

Prospective study

Collects data going forward in time

27
New cards

Experiment

A study that imposes treatments to determine cause and effect

28
New cards

Explanatory variable

The variable thought to explain or cause changes in the response variable

29
New cards

Response variable

The variable that is measured as an outcome

30
New cards

Control group

A group that receives no treatment, a placebo, or the standard treatment, and serves as a baseline for comparison

31
New cards

Randomization

Randomly assigning subjects to treatments so that unknown sources of variation are spread evenly across groups

32
New cards

Replication

Applying each treatment to enough subjects to reduce the effect of chance variation

33
New cards

Blocking

Grouping similar subjects into blocks before random assignment, so that differences between blocks do not obscure treatment effects

34
New cards

Matched pairs design

A design comparing two treatments on the same or similar subjects
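
Randomization and blocking can be sketched as follows. The helper names `randomize` and `randomize_in_blocks`, the subject labels, and the treatment names are all hypothetical; the only idea taken from the cards is that chance decides who gets which treatment, and that with blocking the random assignment happens separately inside each block.

```python
import random

def randomize(subjects, treatments, rng):
    """Completely randomized design: shuffle, then deal subjects out in turn."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

def randomize_in_blocks(blocks, treatments, rng):
    """Randomized block design: run the random assignment within each block."""
    return {name: randomize(members, treatments, rng)
            for name, members in blocks.items()}

rng = random.Random(7)
blocks = {"younger": ["A", "B", "C", "D"], "older": ["E", "F", "G", "H"]}
print(randomize_in_blocks(blocks, ["drug", "placebo"], rng))
```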

35
New cards
36
New cards

Factor

A variable manipulated in an experiment

37
New cards

Levels

The different values of a factor

38
New cards

Treatment

A specific combination of factor levels

39
New cards

Confounding variable

A variable associated with the explanatory variable that also affects the response, so their effects cannot be separated

40
New cards

Lurking variable

A hidden variable that affects both explanatory and response variables

41
New cards
42
New cards

Blinding

Subjects do not know which treatment they receive

43
New cards

Double blinding

Neither subjects nor evaluators know the treatment

44
New cards

Placebo

A fake treatment used for comparison

45
New cards

Placebo effect

A response caused by belief in treatment rather than the treatment itself

46
New cards

Sample space

The set of all possible outcomes

47
New cards

Law of large numbers

As the number of independent trials increases, the observed relative frequency of an event approaches its true probability
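
A small sketch of the law of large numbers, assuming a simulated coin with success probability `p` (the function name `running_proportion` is hypothetical). It tracks the proportion of successes after each flip, which wanders early on and settles near `p` as the number of flips grows.

```python
import random

def running_proportion(n_flips, p=0.5, seed=0):
    """Return the proportion of successes after each of n_flips trials."""
    rng = random.Random(seed)
    successes = 0
    proportions = []
    for i in range(1, n_flips + 1):
        if rng.random() < p:  # one Bernoulli trial with success probability p
            successes += 1
        proportions.append(successes / i)
    return proportions

props = running_proportion(20_000)
for n in (10, 100, 1_000, 20_000):
    print(n, props[n - 1])
```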

48
New cards

Complement rule

P(A) + P(not A) = 1

49
New cards

Mutually exclusive events

Events that cannot happen at the same time

50
New cards

Independent events

Events for which knowing that one occurred does not change the probability of the other: P(B | A) = P(B)

51
New cards

Conditional probability

The probability of an event given that another event has occurred: P(B | A) = P(A and B) ÷ P(A)
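
The probability rules above can be checked by listing a sample space. The sketch below assumes two fair dice and uses hypothetical helper names (`prob`, `cond_prob`); exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely (first die, second die) outcomes.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = favorable outcomes / total outcomes."""
    return Fraction(sum(1 for o in sample_space if event(o)), len(sample_space))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

def first_is_six(o): return o[0] == 6
def second_is_six(o): return o[1] == 6
def sum_is_twelve(o): return sum(o) == 12
def first_is_one(o): return o[0] == 1

# Complement rule: P(A) + P(not A) = 1
print(prob(first_is_six) + prob(lambda o: not first_is_six(o)))
# Independent: conditioning on the first die does not change the second
print(cond_prob(second_is_six, first_is_six), prob(second_is_six))
# Not independent: a sum of 12 becomes more likely once the first die is 6
print(cond_prob(sum_is_twelve, first_is_six), prob(sum_is_twelve))
# Mutually exclusive: a first die of 1 and a sum of 12 cannot both happen
print(prob(lambda o: first_is_one(o) and sum_is_twelve(o)))
```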

52
New cards

Tree diagram

A diagram that shows all possible outcomes and probabilities

53
New cards

Probability model

A list of outcomes and their probabilities

54
New cards

Random variable

A variable whose value depends on chance

55
New cards

Expected value

The long-run average outcome (weighted mean)

56
New cards

Variance

The average squared distance from the mean

57
New cards

Standard deviation

The square root of the variance; measures spread in the same units as the variable
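
For a discrete random variable, expected value, variance, and standard deviation follow directly from the probability model. The payout values below are invented for illustration, and the function names are hypothetical.

```python
from math import sqrt

def expected_value(model):
    """Weighted mean: sum of value * probability."""
    return sum(x * p for x, p in model.items())

def variance(model):
    """Probability-weighted average of squared deviations from the mean."""
    mu = expected_value(model)
    return sum((x - mu) ** 2 * p for x, p in model.items())

def standard_deviation(model):
    return sqrt(variance(model))

# Hypothetical probability model: payout -> probability (must sum to 1)
model = {0: 0.5, 1: 0.3, 10: 0.2}
print(expected_value(model), variance(model), standard_deviation(model))
```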

58
New cards

Bernoulli trials

Trials with two outcomes, constant probability, and independence

59
New cards

Geometric distribution

Models the number of Bernoulli trials needed to get the first success: P(X = n) = qⁿ⁻¹p, where q = 1 − p

60
New cards

Binomial distribution

Models the number of successes in a fixed number of Bernoulli trials
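
The geometric and binomial probabilities can be computed from their standard formulas. The function names below are hypothetical; `p` is the probability of success on each Bernoulli trial.

```python
from math import comb

def geometric_pmf(n, p):
    """P(first success occurs on trial n) = (1 - p)^(n - 1) * p."""
    return (1 - p) ** (n - 1) * p

def binomial_pmf(k, n, p):
    """P(exactly k successes in n trials) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Fair coin: first head on the 3rd flip, and exactly 2 heads in 4 flips
print(geometric_pmf(3, 0.5))
print(binomial_pmf(2, 4, 0.5))
```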

61
New cards

10% condition

The sample is no more than 10% of the population, so sampling without replacement can be treated as independent trials

62
New cards

Normal model

A bell-shaped distribution defined by mean and standard deviation

63
New cards

Success/failure condition

np ≥ 10 and nq ≥ 10, where q = 1 − p; required before approximating a binomial with a Normal model

64
New cards

z-score

The number of standard deviations a value is from the mean
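
A rough sketch tying the last few cards together: check the conditions, then use a z-score with a Normal model to approximate a binomial probability. The function names are hypothetical, and the approximation shown uses no continuity correction, which is an assumption about how the course does it.

```python
from math import sqrt
from statistics import NormalDist

def ten_percent_ok(n, population_size):
    """10% condition: the sample is no more than 10% of the population."""
    return n <= 0.10 * population_size

def success_failure_ok(n, p):
    """Success/failure condition: np >= 10 and nq >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

def normal_approx_at_most(k, n, p):
    """Approximate P(X <= k) for X ~ Binomial(n, p) with a Normal model."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return NormalDist().cdf(z_score(k, mean, sd))

n, p = 100, 0.5
if success_failure_ok(n, p) and ten_percent_ok(n, population_size=5_000):
    print(normal_approx_at_most(45, n, p))
```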

65
New cards

Scatterplot

A graph that shows the association between two quantitative variables

66
New cards

Explanatory variable

The variable on the x-axis that explains or predicts

67
New cards

Response variable

The variable on the y-axis that responds or is predicted

68
New cards

Form

The overall shape of a scatterplot (linear or non-linear)

69
New cards

Direction

Whether the association is positive, negative, or unclear

70
New cards

Strength

How closely the points follow a pattern; less scatter means stronger association

71
New cards

Unusual features

Outliers, clusters, or gaps in a scatterplot

72
New cards

Correlation coefficient (r)

A unitless measure, between −1 and 1, of the strength and direction of a linear relationship
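
One common way to compute r is as the sum of products of z-scores divided by n − 1. The sketch below assumes that formula and uses a hypothetical function name; the toy data are invented. Because the values are standardized first, changing the units of x or y leaves r unchanged.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum(z_x * z_y) / (n - 1), using sample standard deviations."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(correlation(xs, ys))
print(correlation([100 * x for x in xs], ys))  # same r after rescaling x
```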

73
New cards

Linear condition

The requirement that a scatterplot show a roughly linear form before correlation is used

74
New cards

Correlation

A numerical description of linear association that must include r

75
New cards

Association

A relationship between variables that does not require mentioning r

76
New cards

Least squares regression line

The line that minimizes the sum of squared residuals and best fits the data

77
New cards

Regression equation

An equation that models the relationship between explanatory and response variables

78
New cards

Slope (b₁)

The predicted change in y for each one-unit increase in x

79
New cards

Intercept (b₀)

The predicted value of y when x = 0

80
New cards

Residual

Observed value minus predicted value: e = y − ŷ

81
New cards

Positive residual

Observed value is greater than predicted value

82
New cards

Negative residual

Observed value is less than predicted value

83
New cards

Slope formula

b₁ = r × SD(y) ÷ SD(x)

84
New cards

Intercept formula

b₀ = mean(y) − b₁(mean(x))
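
The slope and intercept formulas can be applied directly. This sketch uses invented toy data and hypothetical function names; it computes r first, then b₁ = r × SD(y) ÷ SD(x) and b₀ = mean(y) − b₁(mean(x)).

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Correlation from sums of squared and cross deviations."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Return (b0, b1) using the slope and intercept formulas."""
    b1 = pearson_r(xs, ys) * stdev(ys) / stdev(xs)
    b0 = mean(ys) - b1 * mean(xs)
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```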

85
New cards

Residual plot

A graph of residuals versus the explanatory variable (or versus the predicted values)

86
New cards

Good residual plot

Residuals randomly scattered around zero, indicating a linear model is appropriate

87
New cards

Coefficient of determination (R²)

The proportion of variability in y explained by the linear model with x

88
New cards

Interpretation of R²

If R² = 0.80, then 80% of the variation in y is explained by x
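
Residuals and R² can be computed once a line is known. The sketch below assumes the line was fitted by least squares to the same invented toy data (b₀ = 2.2, b₁ = 0.6); under that assumption R² = 1 − (sum of squared residuals) ÷ (total sum of squares). The function names are hypothetical.

```python
def residuals(xs, ys, b0, b1):
    """Residual = observed y minus predicted y for each point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def r_squared(xs, ys, b0, b1):
    """Proportion of the variation in y accounted for by the fitted line."""
    y_bar = sum(ys) / len(ys)
    sse = sum(e ** 2 for e in residuals(xs, ys, b0, b1))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
for x, e in zip(xs, residuals(xs, ys, 2.2, 0.6)):
    sign = "positive" if e > 0 else "negative"
    print(x, round(e, 2), sign)  # positive: observed above the line
print(r_squared(xs, ys, 2.2, 0.6))
```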

89
New cards

Extrapolation

Using a regression line to predict values outside the range of observed data

90
New cards

Why extrapolation is risky

The linear pattern may not continue beyond the observed x-values, so predictions made there are unreliable

91
New cards

High leverage point

A point with an x-value far from the mean of x

92
New cards

Outlier

A point with a y-value far from its predicted value

93
New cards

Influential point

A point whose removal noticeably changes the slope or intercept of the regression line
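
One way to see leverage versus influence is to refit the line without the point and compare slopes. The data and function names below are invented for illustration: both added points have an x-value far from the mean of x, but only the one that breaks the pattern changes the slope.

```python
def fit_line(points):
    """Least squares (b0, b1) for a list of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def slope_change_without(points, index):
    """How much the slope changes when the point at `index` is removed."""
    _, full_slope = fit_line(points)
    _, reduced_slope = fit_line(points[:index] + points[index + 1:])
    return reduced_slope - full_slope

base = [(1, 1), (2, 2), (3, 3), (4, 4)]
on_trend = base + [(20, 20)]   # high leverage, follows the pattern
off_trend = base + [(20, 0)]   # high leverage, breaks the pattern

print(slope_change_without(on_trend, 4))   # little or no change
print(slope_change_without(off_trend, 4))  # large change: influential
```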

94
New cards

Non-influential point

A point whose removal changes the regression line very little; a high leverage point that follows the overall trend can still be non-influential

95
New cards

Grouped data point

A point representing an average of several values; averaged data show less scatter, which can make an association look stronger than it is for individuals

96
New cards

Regression output (constant)

The y-intercept of the regression line

97
New cards

Regression output (coefficient)

The slope associated with the explanatory variable

98
New cards

s (standard deviation of residuals)

The typical size of a residual: s = √(Σe² ÷ (n − 2))

99
New cards

Interpretation of s

The model’s predictions are typically within s units of the actual values
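
A short sketch of computing s, assuming the usual formula for simple linear regression, s = √(Σe² ÷ (n − 2)). The data, the fitted coefficients, and the function name `residual_sd` are all invented for the example.

```python
from math import sqrt

def residual_sd(xs, ys, b0, b1):
    """s = sqrt(sum of squared residuals / (n - 2))."""
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / (len(xs) - 2))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
# b0 = 2.2 and b1 = 0.6 are the least squares values for this toy data
s = residual_sd(xs, ys, b0=2.2, b1=0.6)
print(round(s, 3))  # predictions are typically off by about this many y-units
```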