Statistics Final Flashcards (ST 311 NCSU)

197 Terms

1

Statistics

the science of planning studies and experiments, obtaining data, and organizing, summarizing, analyzing, and interpreting those data and then drawing conclusions based on them

2

Conducting a statistical study includes 3 phases:

  1. Prepare: consider the population, data types, and sampling method

  2. Analyze: describe the data you collected and use appropriate statistical methods to help with drawing conclusions

  3. Conclude: using statistical inference, make reasonable judgements and answer broad questions

3

Data

collections of observations, such as measurements, counts, descriptions, or survey responses

4

Population

the complete collection of all measurements or data that are being considered. Typically, a population is the complete collection of all data we would like to better understand or describe. We also call it the population of interest

5

Sample

a subset of members selected from a population (ideally at random)

6

Parameter

a numerical measurement describing some characteristic of a population

7

Statistic

a numerical measurement describing some characteristic of a sample

8

Quantitative (numerical) data

consists of numbers representing counts or measurements (2 types: discrete or continuous)

9

Categorical data (qualitative)

consists of names or labels (NOT numbers)

10

Discrete data (quantitative)

result when the data values are quantitative and the number of values is finite or countable (ex: # of tosses of a coin before getting tails)

11

Continuous data (numerical)

result from infinitely many possible quantitative values where the collection of values is not countable (ex: the arm spans in inches of high school seniors)

12

Our goal is to answer a question about a ___

population

13

We want our sample to be random and ___ of the population

representative

14

Simple Random Sample (SRS)

A sample of n subjects is selected in such a way that every possible sample of the same size n has the same chance (probability) of being chosen
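
A minimal Python sketch of drawing an SRS, assuming the population can be listed (here, 100 hypothetical student IDs). The helper name simple_random_sample is made up for illustration; it relies on the standard library's random.sample, which picks n distinct items so that every subset of size n is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Hypothetical helper: draw an SRS of size n without replacement."""
    rng = random.Random(seed)          # seeded generator so results can be reproduced
    return rng.sample(population, n)   # every size-n subset is equally likely

students = list(range(1, 101))         # hypothetical population of 100 student IDs
sample = simple_random_sample(students, 10, seed=311)
print(sample)
```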

15

Stratified Sample

Subdivide the population into 2+ subgroups (or strata) so that the subjects in the same subgroups share the same characteristics. Then draw a sample from each subgroup. The number sampled from each stratum may be done proportionally with respect to population size.
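
A sketch of proportional stratified sampling in Python, assuming two hypothetical strata (60 undergraduates, 40 graduate students). The helper stratified_sample is illustrative only: it takes an SRS inside each stratum, sized to that stratum's share of the population. Rounding can make the stratum sizes sum to slightly more or less than the requested total for other inputs.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Hypothetical helper: proportional allocation, then an SRS within each stratum."""
    rng = random.Random(seed)
    pop_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(total_n * len(members) / pop_size)   # stratum's proportional share
        sample[name] = rng.sample(members, k)          # SRS inside the stratum
    return sample

strata = {
    "undergrad": [f"U{i}" for i in range(60)],
    "grad": [f"G{i}" for i in range(40)],
}
result = stratified_sample(strata, 10, seed=1)
print({name: len(chosen) for name, chosen in result.items()})  # {'undergrad': 6, 'grad': 4}
```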

16

Cluster Sample

Divide the population area into naturally occurring sections (or clusters), then randomly select some of these clusters and choose all the members from those selected clusters
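
To contrast this with stratified sampling, here is a small Python sketch, assuming hypothetical dorm floors as the clusters. The helper cluster_sample is illustrative: chance decides which whole clusters are chosen, and every member of a chosen cluster is kept.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Hypothetical helper: randomly pick whole clusters, keep all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)   # random cluster labels
    return {label: clusters[label] for label in chosen}   # every member of each

clusters = {
    "Floor1": ["Ana", "Ben", "Cai"],
    "Floor2": ["Dee", "Eli"],
    "Floor3": ["Fay", "Gus", "Hal", "Ivy"],
    "Floor4": ["Jo", "Kim"],
}
picked = cluster_sample(clusters, 2, seed=7)
print(picked)
```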

17

Systematic Sample

Select some starting point and then select every kth element in the population. Works well when the units are arranged in order, as on an assembly line
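
A Python sketch of systematic sampling, assuming 50 hypothetical items in assembly-line order and k = 10. Choosing the starting point at random among the first k units is a common practice assumed here; the helper name is made up.

```python
import random

def systematic_sample(population, k, seed=None):
    """Hypothetical helper: random start among the first k units, then every kth unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)          # starting position 0..k-1
    return population[start::k]       # every kth element from the start

assembly_line = [f"item{i}" for i in range(1, 51)]
chosen = systematic_sample(assembly_line, 10, seed=3)
print(chosen)  # 5 items, each 10 positions apart
```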

18

Multistage sample

Collect data by using some combination of the basic sampling methods

19

Convenience Sampling

Select the first k subjects that you come across

20

Bad Sampling Frame

When attempting to list all members of a population, some subjects are missing. It can be difficult to make a complete list

21

Non-response bias

Some part of the population chooses not to respond, or subjects were selected but are not able to be contacted

22

Response bias

Responses to questions are not truthful. This may occur when people are unwilling to reveal personal matters, admit to illegal activity, or tailor their responses to “please” the investigator

23

Wording and Order Bias

The way questions are worded may be leading/inflammatory to elicit a response. Or the order of questions influences answers.

24

Measure of center

a value at or near the center or middle of a data set; a “typical” value for a group

EX: mean, median, mode

25

Σ

denotes a sum, “sigma”

26

x

denotes individual data value

27

n

denotes # of values in a sample, “sample size”

28

N

denotes number of values in a population

29

x̄

denotes the sample mean, “x bar”

30

μ

denotes the population mean, “mu” (pronounced “mew”)

31

Mean

found by adding all values and dividing by the number of values in the set. A sample mean is the mean of a sample. A population mean is the mean of an entire population.
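
A quick Python check of the definition, assuming a small hypothetical sample: add the values, divide by n, and compare with the standard library's statistics.mean.

```python
from statistics import mean

data = [4, 8, 6, 5, 3, 7, 9]     # hypothetical sample, n = 7
x_bar = sum(data) / len(data)    # add all values, then divide by n
print(x_bar)                     # 6.0
print(x_bar == mean(data))       # True: matches the standard library
```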

32

Median

the value that is in the middle when the data are listed in ascending order (with an even number of values, the mean of the two middle values). It separates the bottom 50% of the data from the top 50%: roughly half of all values are below it, and half are above it.
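
A Python sketch of finding the median by hand: sort, then take the middle value (odd n) or the mean of the two middle values (even n). The helper my_median is hypothetical; the last line compares it with statistics.median.

```python
from statistics import median

def my_median(values):
    """Hypothetical helper: median of a non-empty list."""
    s = sorted(values)                 # list in ascending order
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]                  # odd n: the single middle value
    return (s[mid - 1] + s[mid]) / 2   # even n: mean of the two middle values

print(my_median([7, 1, 5, 3, 9]))      # 5
print(my_median([7, 1, 5, 3]))         # 4.0
print(median([7, 1, 5, 3]))            # 4.0, same as the standard library
```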

33

Mode

the value that occurs with the greatest frequency. There could be no mode (when no value repeats). One mode: unimodal; two modes: bimodal; more than two modes: multimodal
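
A Python sketch of the mode rules, assuming the textbook convention that a data set has no mode when no value repeats. The helper modes is hypothetical; it uses collections.Counter rather than statistics.multimode, because multimode returns every value when all of them occur once.

```python
from collections import Counter

def modes(values):
    """Hypothetical helper: all values tied for the highest frequency; [] if none repeat."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                              # every value occurs once: no mode
    return [v for v, c in counts.items() if c == top]

print(modes([2, 3, 3, 5]))       # [3]     unimodal
print(modes([1, 1, 4, 4, 6]))    # [1, 4]  bimodal
print(modes([1, 2, 3]))          # []      no mode
```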

34

Histogram

the graph of a frequency distribution, a graph of bars of equal width drawn adjacent to each other, a horizontal scale representing classes of quantitative data values, a vertical scale (height) represents frequency

35

Dotplot

shows each value in a dataset as a dot above a number line

36

Measures of variation (or spread)

Range, IQR, variance, standard deviation

37

Range

max data value - min data value (highly affected by outliers)
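
A small Python illustration of why the range is sensitive to outliers, using hypothetical data: adding one extreme value changes the range from 4 to 49.

```python
data = [12, 15, 11, 14, 13]                        # hypothetical data
range_plain = max(data) - min(data)
print(range_plain)                                 # 4

with_outlier = data + [60]                         # add a single outlier
range_outlier = max(with_outlier) - min(with_outlier)
print(range_outlier)                               # 49: one outlier inflates the range
```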

38

Interquartile Range (IQR)

uses quartiles to provide a range of values that is not as affected by potential outliers as the range

The quartiles (Q1, Q2, Q3) split the sorted data into four parts, so about 1/4 of the data lies between 2 consecutive quartiles

IQR = Q3 − Q1

39

The 3 quartiles together with the min and max values constitute the 5-number summary:

  1. minimum

  2. Q1 (median of the first half of the dataset)

  3. Median

  4. Q3 (median of the second half of the dataset)

  5. Maximum
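
A Python sketch of the 5-number summary and IQR using the “median of each half” rule. For an odd number of values it assumes the overall median is left out of both halves; that is one common textbook convention, and software packages often use other quartile rules that give slightly different Q1 and Q3. The helper name and data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Hypothetical helper: (min, Q1, median, Q3, max) via medians of the two halves."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]           # first half (excludes the middle value when n is odd)
    upper = s[(n + 1) // 2 :]     # second half (excludes the middle value when n is odd)
    return s[0], median(lower), median(s), median(upper), s[-1]

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]   # hypothetical data, n = 9
mn, q1, med, q3, mx = five_number_summary(data)
print(mn, q1, med, q3, mx)               # 1 3.5 7 11.0 15
print("IQR =", q3 - q1)                  # IQR = 7.5
```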

40

Variance

(Standard deviation)²

41

Standard deviation

sqrt(variance)

Defined as a measure of how much data values deviate from the mean. Its value is never negative and is zero ONLY when all data values are the same; larger values indicate greater amounts of variation; SD can increase a lot with one or more outliers; the units of SD are the same as the units of the original data values

42

Population variance

σ² = Σ(x − μ)² / N

43

σ or s

standard deviation (σ for a population, s for a sample)

44

Sample variance

s² = Σ(x − x̄)² / (n − 1)
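
A Python sketch comparing population and sample variance on the same hypothetical data: both sum the squared deviations from the mean, but the population version divides by N and the sample version by n − 1. The standard deviations are the square roots, and the last two lines cross-check against statistics.pvariance and statistics.variance.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data; its mean is 5

center = sum(data) / len(data)                    # μ (or x̄ if the data are a sample)
squared_devs = sum((x - center) ** 2 for x in data)

pop_var = squared_devs / len(data)                # σ²: divide by N
samp_var = squared_devs / (len(data) - 1)         # s²: divide by n − 1
pop_sd = math.sqrt(pop_var)                       # σ = sqrt(σ²)
samp_sd = math.sqrt(samp_var)                     # s = sqrt(s²)

print(pop_var, pop_sd)      # 4.0 2.0
print(round(samp_var, 3))   # 4.571

print(math.isclose(pop_var, statistics.pvariance(data)))   # True
print(math.isclose(samp_var, statistics.variance(data)))   # True
```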

45

Experiment

the process of applying some treatment and then observing the effect

  • almost always compares 2+ groups: treatment and control group

  • the individuals in an experiment are called units

46

Control group

the group that receives no treatment, used as a baseline for comparison

47

Units

the individuals in an experiment

48

Observational study

the process of observing and measuring specific characteristics without attempting to modify the individuals studied

  • tell “what’s happening” but can’t establish cause-and-effect relationships

  • using existing reliable records also counts as an observational study

49

Response variable

measures the outcome of a study

50

explanatory variable

explains/influences changes in the response variable

51

Design of experiment

plan for collecting the sample

52

Treatment

a specific experimental condition applied to the units/subjects

53

Variability in Experiments

There will be variability from treatment effects, experimental error, lurking variables, and confounding variables

54

Treatment effects

different treatments cause different outcomes

55

Experimental error

variability among observed values of the response variable for units receiving the same treatment; it should be kept as small as possible

56

Lurking variables

a variable that is not among the explanatory variables in a study but still has an impact on the response variable

57

Confounding variables

2 variables are confounded when their effects on the response variable can’t be distinguished from each other

58

Principles of Experiment Design

Control, randomization, and replication

59

Control

Control the effects of lurking/confounding variables by carefully planning

60

Randomization

randomly assign experimental units to treatments to decrease bias

61

Replication

measure the effect of each treatment on many units to reduce the influence of chance variation

62

Completely Randomized Design

participants randomly assigned to treatments, so lurking variables affect each group equally
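
A Python sketch of a completely randomized design, assuming 12 hypothetical participants and two groups. The helper completely_randomized shuffles the participants and then deals them out to the treatments in turn, so chance alone decides who lands in which group and the groups end up (nearly) equal in size.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Hypothetical helper: shuffle units, then deal them to treatments in turn."""
    rng = random.Random(seed)
    shuffled = units[:]                 # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

participants = [f"P{i}" for i in range(1, 13)]
groups = completely_randomized(participants, ["treatment", "control"], seed=42)
print(groups)
```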

63

Randomized Block Design

the experimenter divides participants into subgroups called blocks, so that variability within blocks is less than variability between blocks. Then, the participants within each block are randomly assigned to treatment groups.
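
A Python sketch of a randomized block design, assuming age group as a hypothetical blocking variable with four participants per block. The helper randomized_block randomizes separately inside each block, so each treatment appears equally often in every block.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Hypothetical helper: random assignment to treatments done within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, members in blocks.items():
        shuffled = members[:]
        rng.shuffle(shuffled)                      # randomize only inside this block
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

blocks = {"under 30": ["A", "B", "C", "D"], "30 and over": ["E", "F", "G", "H"]}
plan = randomized_block(blocks, ["drug", "placebo"], seed=5)
for unit, (block, treatment) in sorted(plan.items()):
    print(unit, block, treatment)
```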

64

Matched Pairs Design

a special case of randomized block design, used when only 2 treatment groups are present. Participants are grouped in pairs based on one or more blocking variables. Then, within each pair, participants are randomly assigned to different treatments

65

Placebo

an inactive (fake) drug or treatment that subjects believe is real

66

Placebo effect

tendency to react to a drug/treatment regardless of whether it actually works

67

Bias of Subjects

subjects may want to please the researcher or hope for a specific outcome (Hawthorne effect: people behave differently because they know they are being watched)

68

Bias of Researchers

people behave in ways that favor what they believe; researchers may assign subjects to groups or report results in a biased way

69

Blinding

when individuals in an experiment are not aware of how subjects are assigned to treatments, so they are less likely to respond with bias

70

Single-blind study

those who could influence the results are blinded

71

Double-blind study

those who evaluate the results are blinded too

72

z-score

the number of standard deviations a data value lies above or below the mean

73

positive z-score

data value is above average

74

negative z-score

data value is below average

75

Standardizing

the process of converting a data value (often labeled x) to a z-score

76

z = (x − μ) / σ

converting x-value to z-score
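
A one-function Python sketch of standardizing, assuming a hypothetical exam with mean 70 and standard deviation 8. The sign of the result shows whether the value is above or below the mean.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

print(z_score(86, 70, 8))   # 2.0  -> 2 SDs above the mean
print(z_score(62, 70, 8))   # -1.0 -> 1 SD below the mean
```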

77

Empirical Rule

When a distribution is bell-shaped/normal, the mean and standard deviation have the following relationship:

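About 68% of the data is within 1 standard deviation (SD) of the mean, 95% is within 2 SDs, and 99.7% is within 3 SDs (34% lies between the mean and 1 SD below it, and 34% between the mean and 1 SD above it).

The 34, 14, 2.5 rule: on each side of the mean, about 34% of the data lies within 1 SD, about 13.5% (≈14%) lies between 1 and 2 SDs, and about 2.5% lies beyond 2 SDs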
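
The 68/95/99.7 percentages can be checked numerically with the standard library's statistics.NormalDist (Python 3.8+): the area within k SDs of the mean is cdf(k) − cdf(−k) for the standard normal curve. A minimal sketch:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # z-scores follow N(0, 1)

coverage = {}
for k in (1, 2, 3):
    # area under the curve between k SDs below and k SDs above the mean
    coverage[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {coverage[k]:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```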

78

Significantly low value

values are generally considered significant or unusual if they are (μ − 2σ) or lower

79

Significantly high value

values are generally considered significant or unusual if they are (μ + 2σ) or higher

80

Values not significant

between (μ − 2σ) and (μ + 2σ)
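
A Python sketch of the μ ± 2σ guideline, assuming hypothetical adult heights with mean 170 cm and SD 7 cm, so the cutoffs are 156 cm and 184 cm. The helper name classify is made up for illustration.

```python
def classify(x, mu, sigma):
    """Hypothetical helper: label x using the mu ± 2 sigma guideline."""
    if x <= mu - 2 * sigma:
        return "significantly low"
    if x >= mu + 2 * sigma:
        return "significantly high"
    return "not significant"

print(classify(150, 170, 7))   # significantly low
print(classify(175, 170, 7))   # not significant
print(classify(186, 170, 7))   # significantly high
```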

81

We will use a significance % of ___ as a general guide for significant values

5%

82

Density curve

the curve we get if we scale the bell curve model so that the total area under the curve equals 1

83

In a continuous probability distribution, probability is the ____ the density curve.

area under

84

Probability Statement

P(a ≤ x ≤ b), where a is the smaller number and b is the bigger number
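
A Python sketch of evaluating such a probability statement for a normal model, assuming a hypothetical X ~ N(100, 15). The area between a and b is the cumulative area up to b minus the cumulative area up to a, computed with statistics.NormalDist.cdf.

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=15)   # hypothetical model: X ~ N(100, 15)
a, b = 85, 115                     # smaller number, bigger number

# P(a ≤ X ≤ b) = area under the density curve between a and b
prob = X.cdf(b) - X.cdf(a)
print(round(prob, 4))   # 0.6827 (85 and 115 are 1 SD either side of 100)
```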

85

The graph of a normal distribution is called the ___

normal curve

86

In a normal curve…

The mean, median, and mode are EQUAL

The normal curve is bell-shaped and symmetric about the mean.

The total area under the normal curve is EQUAL TO 1.

The normal curve approaches, but never touches, the x-axis as it extends further away from the mean.

87

Distribution of z-scores

Standard normal distribution

88

Notation

X ~ N(μ, σ), where the ~ symbol reads “is distributed as”

89

The random variable X is normally distributed with mean μ and SD σ, and the standard normal distribution is written

Z ~ N(0,1)

90

Distribution

describes the possible values of a variable, how often they occur, and what pattern they create

91

Probability distribution

does the same thing as other distributions but describes how likely (instead of how often) the values of the variable are to occur

92

Continuous Random Variable

has an uncountable number of possible outcomes, represented by an interval on the number line

93

Discrete Random Variable

has a finite or countable number of possible outcomes that can be listed. Countable means there might be infinitely many values, but they can be associated with a counting process.

94

Criteria for Binomial Distribution

  1. There is a fixed number of trials/observations, labeled n.

  2. The trials are independent (the outcome of any individual trial doesn’t affect the probabilities in the other trials)

  3. Each outcome can be classified as a success or failure. The outcome that a random variable is counting is labeled the success.

  4. The probability of a success is constant for each trial. The probability of success is denoted by P(S) = p.
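
When all four criteria hold, X ~ Bin(n, p) and P(X = k) = C(n, k) · p^k · q^(n − k) with q = 1 − p. A Python sketch using math.comb (Python 3.8+), with the number of heads in 4 fair coin tosses as the example; the helper name binomial_pmf is made up for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Hypothetical helper: P(X = k) for X ~ Bin(n, p)."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)   # ways to place k successes × their probability

dist = [binomial_pmf(k, 4, 0.5) for k in range(5)]   # heads in 4 fair coin tosses
print(dist)        # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(dist))   # 1.0
```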

95

Binomial Notation

X ~ Bin (n,p)

96

parameters of the distribution

number of trials (n), probability of success (p)

97

Expected Value

E(X), the mean of a random variable

98

The expected value of a random variable is a ___

weighted mean of the outcomes

99

The expected value of a discrete random variable is equal to the ____ of the random variable

mean
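
A Python sketch of the expected value as a weighted mean, assuming a hypothetical discrete random variable with values 0 to 3: each outcome is weighted by its probability, and because the probabilities sum to 1 no further division is needed.

```python
values = [0, 1, 2, 3]            # hypothetical outcomes
probs = [0.1, 0.3, 0.4, 0.2]     # their probabilities (sum to 1)

expected = sum(x * p for x, p in zip(values, probs))   # weighted mean of the outcomes
print(round(expected, 4))        # 1.7
```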

100

Binomial Variance

σ² = n · p · q, where q = 1 − p
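
A Python sketch applying the binomial formulas, assuming a hypothetical 100 free throws with success probability 0.8. It also uses the standard fact that a binomial mean is μ = n · p, and takes the SD as the square root of the variance. The helper name is made up for illustration.

```python
import math

def binomial_mean_var(n, p):
    """Hypothetical helper: mean, variance, and SD of X ~ Bin(n, p)."""
    q = 1 - p
    mean = n * p              # mu = n p
    var = n * p * q           # sigma^2 = n p q
    return mean, var, math.sqrt(var)

mean, var, sd = binomial_mean_var(100, 0.8)
print(mean, round(var, 2), round(sd, 2))   # 80.0 16.0 4.0
```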