ST 311 Final


104 Terms

1

Statistics (the science)

The science of planning studies and experiments, obtaining data, and organizing, summarizing, analyzing, and interpreting those data and then drawing conclusions based on them

2

1st step of conducting a statistical analysis

Prepare: Consider the population, data types, and sampling method

3

2nd step of conducting a statistical analysis

Analyze: Describe the data you collected and use appropriate statistical methods to help with drawing conclusions

4

3rd step of conducting a statistical analysis

Conclude: Using statistical inference, make reasonable judgments and answer broad questions

5

Data

Collections of observations, such as measurements, counts, descriptions, or survey responses

6

Population

The complete collection of all data that we would like to better understand or describe

7

Sample

A subset of members selected from a population

8

Parameter

a numerical measurement describing some characteristic of a population

9

Statistic

a numerical measurement describing some characteristic of a sample

10

Quantitative Data

consists of numbers representing counts or measurements

11

Qualitative/Categorical Data

consists of names or labels (not numbers that represent counts or measurements)

12

Discrete Data

result when the data values are quantitative and the number of values is finite or “countable.”

13

Continuous Data

result from infinitely many possible quantitative values, where the collection of values is not countable.

14

Biased Samples

Samples that are more likely to produce some outcomes than others. The resulting statistics may be too high or too low.

15

Convenience Samples

easy to collect, often have some bias or do not represent the population in general.

16

Volunteer Responses

a self-selected sample of people who respond to a general appeal

17

Simple Random Sample (SRS)

A sample of n subjects is selected in such a way that every possible sample of the same size n has the same chance (or probability) of being chosen.
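A minimal sketch of drawing a simple random sample, assuming the population is available as a complete list. The ID numbers, the helper name `simple_random_sample`, and the seed are hypothetical and only there for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every size-n subset is equally likely."""
    rng = random.Random(seed)  # seed only makes the example repeatable
    return rng.sample(list(population), n)

students = list(range(1, 101))  # hypothetical population of 100 ID numbers
srs = simple_random_sample(students, 10, seed=1)
print(srs)
```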

18

Stratified Sample

Subdivide the population into at least two different subgroups (or strata) so that the subjects within the same subgroup share the same characteristics. Then draw a sample from each subgroup (or stratum). The number sampled from each stratum may be made proportional to that stratum's share of the population.
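A rough sketch of proportional stratified sampling. The strata names and sizes are made up, and `stratified_sample` is a hypothetical helper; rounding the per-stratum counts can make the total differ slightly from the requested size in general.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Sample from every stratum, in proportion to its share of the population."""
    rng = random.Random(seed)
    pop_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(total_n * len(members) / pop_size)  # proportional allocation
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: 400 freshmen and 100 seniors, identified by number
strata = {"freshman": list(range(0, 400)), "senior": list(range(400, 500))}
picked = stratified_sample(strata, 50, seed=2)
print({name: len(members) for name, members in picked.items()})
```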

19

Cluster Sample

Divide the population area into naturally occurring sections (or clusters) then randomly select some of those clusters and choose all the members from those selected clusters.
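One possible way to sketch cluster sampling: pick whole clusters at random and keep everyone in them. The city blocks and the helper name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose clusters, then take ALL members of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: households grouped by city block
blocks = {
    "block_A": ["a1", "a2", "a3"],
    "block_B": ["b1", "b2"],
    "block_C": ["c1", "c2", "c3", "c4"],
    "block_D": ["d1"],
}
selected = cluster_sample(blocks, 2, seed=4)
print(selected)
```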

20

Systematic Sample

Select some starting point and then select every kth element in the population. This works well when units are in some order (assembly lines, houses on a block, etc.).
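A small sketch of systematic sampling, assuming the units are already in order and that the starting point is chosen at random among the first k units (the random start is an assumption; the card only says "some starting point"). The house numbers are hypothetical.

```python
import random

def systematic_sample(units, k, seed=None):
    """Pick a random start among the first k units, then take every kth unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return units[start::k]

houses = list(range(1, 41))  # hypothetical: 40 houses in street order
chosen = systematic_sample(houses, 5, seed=3)
print(chosen)
```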

21

Multistage Sample

Collect data by using some combination of the basic sampling methods.

22

Bad Sampling Frame

When attempting to list all members of a population, some subjects are missing. It can be difficult to obtain a full, complete list.

23

Undercoverage

The sampling frame is missing groups from the population or the groups have smaller representation in the sample than in the population.

24

Non-response Bias

Some part of the population chooses not to respond, or subjects were selected but are not able to be contacted.

25

Response Bias

Responses given to questions or surveys are not truthful. This may occur when people are unwilling to reveal personal matters, admit to illegal activity, or otherwise tailor their responses to “please” the investigator.

26

Wording and Order

The way questions are worded may be leading or inflammatory to elicit a particular response. The order in which questions are asked may influence the answers.

27

x-bar

the sample mean

28

Mu

the population mean

29

Mean

  • Uses every data value

  • Highly affected by outliers

  • Not good for skewed data sets (but is best for symmetric data!)

30

Median

  • Not affected by outliers

  • Can use with any data set

31

Mode

  • Not necessarily in the center

  • Not affected by outliers

  • Only useful for multimodal or qualitative data
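The mean, median, and mode cards above can be checked with a quick sketch. The salary figures are invented for illustration; adding one extreme value shows the mean moving a lot while the median and mode stay put.

```python
import statistics

salaries = [40, 42, 45, 45, 48]   # hypothetical salaries, in thousands
with_outlier = salaries + [400]   # same data plus one extreme value

for label, data in [("original", salaries), ("with outlier", with_outlier)]:
    print(
        label,
        "mean:", round(statistics.mean(data), 2),
        "median:", statistics.median(data),
        "mode:", statistics.mode(data),
    )
```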

32

Histogram

a graph with a horizontal scale representing classes of quantitative data values and a vertical scale representing frequencies

33

Dotplot

shows each value in a dataset as a dot above a number line, no y-axis

34

Standard Deviation (sigma)

a measure of how much data values deviate from the mean

35

Variance

Standard Deviation squared
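A short sketch relating standard deviation and variance, using made-up data. It assumes the population formulas (divide by N) for sigma; the sample versions, which divide by n - 1, are shown for comparison.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data values

pop_sd = statistics.pstdev(data)       # population standard deviation (sigma)
pop_var = statistics.pvariance(data)   # population variance (sigma squared)
sample_sd = statistics.stdev(data)     # sample standard deviation (divides by n - 1)
sample_var = statistics.variance(data) # sample variance

print(pop_sd, pop_var)
print(round(sample_sd, 3), round(sample_var, 3))
# Variance is the standard deviation squared in both cases
print(math.isclose(pop_sd ** 2, pop_var), math.isclose(sample_sd ** 2, sample_var))
```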

36

Experiment

The process of applying some treatment and then observing its effects is called an experiment. Has a control group and a treatment group.

37

Observational Study

The process of observing and measuring specific characteristics without attempting to modify the individuals being studied

38

Response Variable

measures an outcome of a study

39

Explanatory Variable

explains or influences changes in the response variable

40

Reasons for variability in responses

  • Treatment effects

  • Experimental error

  • Confounding variables

  • Lurking variables

41

Control in Experimental Design

control the effects of lurking/confounding variables and other sources of variability on the response by carefully planning the study

42

Randomization in Experimental Design

randomly assign experimental units to treatments to reduce or eliminate bias

43

Replication in Experimental Design

measure the effect of each treatment on many units to reduce chance variation in the results

44

Completely Randomized Design

participants are randomly assigned to treatments (including control groups)
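A minimal sketch of random assignment in a completely randomized design. The participant labels, treatment names, and the helper `randomize` are hypothetical; shuffling and then dealing units out in turn is just one simple way to get near-equal group sizes.

```python
import random

def randomize(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

participants = [f"P{i}" for i in range(1, 11)]  # hypothetical 10 participants
groups = randomize(participants, ["control", "treatment"], seed=5)
print(groups)
```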

45

Randomized Block Design

the experimenter divides participants into subgroups called blocks, such that the variability within blocks is less than the variability between blocks. Then, participants within each block are randomly assigned to treatment groups

46

Matched Pairs Design

used when the experiment has only two treatment groups; and participants can be grouped into pairs, based on one or more blocking variables. Then, within each pair, participants are randomly assigned to different treatments.

47

Bias of the Subjects

subjects may want to please the researcher or hope for a specific outcome

48

Hawthorne Effect

When people behave differently because they know they are being watched

49

Bias of the Researcher

They may assign subjects to groups or report results in a biased way, and may treat people or animals differently when holding certain expectations of their treatment

50

Blinding

when individuals associated with an experiment are not aware of how subjects have been assigned

51

Single Blind Study

those who could influence the results are blinded

52

Double Blind Study

those who evaluate the results are blinded as well as those who could influence them

53

z-score

the number of standard deviations away from the mean a certain data value is

54

Positive z-score

data value is above average

55

Negative z-score

data value is below average

56

Standardizing

The process of converting a data value (x) to a z-score
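Standardizing can be sketched in a couple of lines using z = (x - mu) / sigma. The test-score numbers (mean 100, standard deviation 15) are assumed purely for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical scores with mean 100 and standard deviation 15
print(z_score(130, 100, 15))  # above average, so the z-score is positive
print(z_score(85, 100, 15))   # below average, so the z-score is negative
```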

57

Significantly low values

considered significant or unusual if they are (µ − 2σ) or lower

58

Significantly high values

considered significant or unusual if they are (µ + 2σ) or higher

59

Values not significant

Between (µ − 2σ) and (µ + 2σ)
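The three "significant value" cards above can be sketched as one function. The helper name `classify` and the example numbers are hypothetical; following the cards, a value exactly at a cutoff counts as significant.

```python
def classify(x, mu, sigma):
    """Apply the 2-standard-deviation rule from the cards above."""
    if x <= mu - 2 * sigma:
        return "significantly low"
    if x >= mu + 2 * sigma:
        return "significantly high"
    return "not significant"

# Hypothetical distribution with mean 100 and standard deviation 15,
# so the cutoffs are 70 and 130
for value in (65, 100, 130):
    print(value, classify(value, 100, 15))
```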

60

Density Curve

Probability is represented by the area underneath it

61

Normal Distribution properties

  • Mean, median, and mode are equal

  • Normal curve is bell-shaped and symmetric about the mean

  • Total area under the curve is equal to 1

  • Normal curve approaches, but never touches, the x-axis

62

Standard Normal Distribution

distribution of z-scores

63

Percentile

the x-value below which a given proportion (probability) of the distribution falls; to find it, look up the z-score for that probability and solve x = µ + zσ
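A sketch of finding a percentile of a normal distribution with x = mu + z * sigma. It uses `statistics.NormalDist` from the Python standard library in place of a z-table; the mean of 100 and standard deviation of 15 are assumed values.

```python
from statistics import NormalDist

def normal_percentile(prob, mu, sigma):
    """Return the x-value with area `prob` to its left under Normal(mu, sigma)."""
    z = NormalDist().inv_cdf(prob)  # z-score with area `prob` to its left
    return mu + z * sigma

# Hypothetical: 90th percentile when mu = 100 and sigma = 15
x90 = normal_percentile(0.90, 100, 15)
print(round(x90, 2))
```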

64

Probability Distribution

describes how likely the values of the variable are to occur

65

Binomial Random Variable four criteria

  • There are a fixed number of trials/observations (n)

  • The trials are independent of each other

  • Each outcome is either a success (s), the outcome being counted, or a failure (f)

  • The probability of a success P(S) = p is constant for each trial
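When the four criteria above hold, the probability of exactly k successes follows the binomial formula C(n, k) * p^k * (1 - p)^(n - k). A minimal sketch, with n = 4 coin flips and p = 0.5 as an assumed example:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 4, 0.5  # hypothetical: 4 fair coin flips, success = heads
distribution = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(distribution)       # probabilities for 0, 1, 2, 3, 4 heads
print(sum(distribution))  # the probabilities add up to 1
```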

66

Summarize the shapes of Binomial Distributions

  • For small n, the shape tends to be skewed

  • As n increases, we see more bell-shaped/symmetric distributions (for any p).

  • When p is closer to 0 or 1, the shape starts to skew

67

p

population proportion

68

p-hat

sample proportion

69

In order to look at the distribution of a statistic, we need to know

the possible values of the random variable and how likely they are to occur

70

Standard Error

The standard deviation of the sample mean; it gets smaller as the sample size gets larger
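For the sample mean, the standard error is sigma / sqrt(n). A quick sketch, assuming a population standard deviation of 10, shows it shrinking as n grows:

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / sqrt(n)

sigma = 10  # hypothetical population standard deviation
for n in (25, 100, 400):
    print(n, standard_error(sigma, n))
```

Quadrupling the sample size only halves the standard error, because n sits under a square root.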

71

Point Estimate of a Parameter

the value of the sample statistic that corresponds to that parameter

72

Level of Confidence (C)

the probability that the interval estimate contains/captures the population parameter, assuming the estimation process is repeated many times

73

Confidence Interval (CI)

a range/interval of values used to estimate the true value of a population parameter

74

Margin of Error (MOE)

tells us the amount of random sampling error in our results and how far we might be off
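A sketch of a confidence interval built as point estimate ± margin of error. It assumes a z-interval for a mean with known population standard deviation (a course may use a t-interval when sigma is unknown); the sample mean, sigma, and n are made-up values.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Return (lower, upper, margin) for a mean, assuming sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    margin = z_star * sigma / sqrt(n)  # margin of error = z* times standard error
    return xbar - margin, xbar + margin, margin

# Hypothetical sample: mean 50 from n = 100 observations, sigma = 10
low95, high95, moe95 = z_interval(50, 10, 100, confidence=0.95)
low90, high90, moe90 = z_interval(50, 10, 100, confidence=0.90)
print(round(low95, 2), round(high95, 2), round(moe95, 2))
print(round(low90, 2), round(high90, 2), round(moe90, 2))
```

The 90% interval comes out narrower than the 95% interval, since a lower confidence level uses a smaller critical value.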

75

How to narrow a confidence interval

Decrease the confidence level (or increase the sample size)

76

Standard error gets smaller as the sample size

increases

77

Null Hypothesis

H0: Only claims using =. We assume the equality value in the null hypothesis is true and conduct the test under this assumption. 

78

Alternative Hypothesis

HA: The complement of the null. Only strict inequalities may be used in the alternative

79

Type I Error

if the null hypothesis is rejected when it is actually true

80

Type II Error

if the null hypothesis is not rejected when it is actually false

81

Left-Tailed Test

we are only interested in showing that the parameter is less than a particular value

82

Right-Tailed Test

we are only interested in showing that the parameter is more than a particular value

83

Two-Tailed Test

we are interested in showing that the parameter is not equal to a particular value (less than or more than)

84

P-value (probability value)

the probability of observing this value or something more extreme, under the assumed distribution of the null hypothesis

85

If the p-value is less than α

reject the null

86

If the p-value is greater than α

fail to reject the null
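The p-value and decision-rule cards above can be sketched for one simple case: a one-sample z-test for a mean with known sigma. That choice of test, the helper name `z_test_p_value`, and all of the numbers are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(xbar, mu0, sigma, n, tail="two"):
    """P-value for H0: mu = mu0, computed under the null distribution."""
    z = (xbar - mu0) / (sigma / sqrt(n))  # test statistic
    cdf = NormalDist().cdf(z)
    if tail == "left":
        return cdf
    if tail == "right":
        return 1 - cdf
    return 2 * min(cdf, 1 - cdf)  # two-tailed: both directions count as extreme

alpha = 0.05  # hypothetical significance level
p_value = z_test_p_value(xbar=103, mu0=100, sigma=15, n=100, tail="two")
decision = "reject the null" if p_value < alpha else "fail to reject the null"
print(round(p_value, 4), decision)
```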

87

If 0 is not included in the confidence interval for the difference of means

Then the means are significantly different

88

Confidence intervals and 2-sided hypothesis tests are

equivalent

89

Correlation Coefficient: r

a measure of the strength and the direction of a linear relationship between two variables
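A from-scratch sketch of the correlation coefficient, r = Sxy / sqrt(Sxx * Syy), where each S term is a sum of products of deviations from the means. The paired data are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient between paired lists xs and ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]    # hypothetical explanatory values
scores = [2, 4, 5, 4, 5]   # hypothetical response values
r = pearson_r(hours, scores)
print(round(r, 4))
```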

90

Strong r values

values greater than 0.8 or smaller than -0.8

91

Moderate r values

values between 0.5 and 0.8 or -0.5 and -0.8

92

Weak r values

values between -0.5 and 0.5 (closer to 0)

93

Residual

observed y (points) - predicted y (points on line)

94

Regression line

Best fitting straight line of the sample data

95

β0

the intercept of the population regression model; the expected value (mean) of Y when x = 0

96

β1

the slope of the population regression model, and is the expected change in Y relative to one unit change in x

97

The smaller the SSE (sum of squares error),

the better the line fits
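The regression line, residual, and SSE cards above fit together in one small sketch. The sample estimates are named `b0` and `b1` here (hypothetical names) to keep them distinct from the population values β0 and β1; the data are invented.

```python
def least_squares(xs, ys):
    """Return (b0, b1), the intercept and slope of the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

xs = [1, 2, 3, 4, 5]   # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]   # hypothetical observed responses

b0, b1 = least_squares(xs, ys)
predicted = [b0 + b1 * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # observed - predicted
sse = sum(e ** 2 for e in residuals)  # sum of squared errors

print(round(b0, 3), round(b1, 3))
print([round(e, 3) for e in residuals])
print(round(sse, 3))
```

Any other straight line through these points would give a larger SSE, which is what "best fitting" means here.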

98

Condition 1 for Linear Regression: Linear data

If the data do not have a linear association/correlation, then a linear regression model is not a good choice

99

Condition 2 for Linear Regression: Constant Variance

The errors/deviations around the regression line should be the same at each value of x

100

Coefficient of Determination: r-squared

the proportion of observed y variation that can be explained by the simple linear regression model
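A sketch of r-squared computed as 1 - SSE/SST, where SST is the total variation of y around its mean. The data and the helper name `r_squared` are hypothetical; the least-squares fit is repeated inside the function so the example stands alone.

```python
def r_squared(xs, ys):
    """Proportion of variation in y explained by the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)  # total variation in y
    return 1 - sse / sst

x_vals = [1, 2, 3, 4, 5]   # hypothetical explanatory values
y_vals = [2, 4, 5, 4, 5]   # hypothetical observed responses
r2 = r_squared(x_vals, y_vals)
print(round(r2, 3))
```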