statistics 121 test 1

1
population
the entire group of individuals that is the target of our interest; generally too big to actually measure or observe
2
sample
subgroup of the population that we actually examine or observe, measure, and collect data from
3
individual
single entity that is being observed
4
variable
characteristic measured on each individual
5
quantitative variable
variable whose possible values are meaningful numbers
6
categorical variable
variable whose possible responses are non-quantitative categories (words/labels/attributes)
7
measurement
value of a variable for an individual
8
data
measurements for a set of individuals (Goal of Statistics: convert this to useful information)
9
data set
data identified with contextual information (who was observed, what was measured, why the study was done), often given in a table
10
EDA (exploratory data analysis) goals
  • organize and summarize data

  • discover features, patterns and striking deviations

  • interpret patterns in context

  • include visual displays and numerical values

11
single variable pattern
distribution of a variable: summary of data one variable at a time (all the possible values and how often they occur)
12
process of statistical problem solving
  1. Collect data

  2. Summarize data

  3. Interpret data

13
parameter
numerical fact about the variable in the population
14
statistic
numerical fact about the variable in the sample
15
convenience sampling
select individuals in the easiest possible way
16
volunteer response sampling
individuals select themselves
17
quota sampling
force the sample to meet specified quotas
18
simple random sample (SRS)
every possible set of a specified size has an equal chance of being selected
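Here is a minimal Python sketch of drawing an SRS. It assumes a hypothetical population of 100 labeled students. `random.Random.sample` draws without replacement, so every possible set of n individuals is equally likely to be chosen. The seed is fixed only to make the draw reproducible.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return an SRS of size n: every set of n individuals is equally likely."""
    rng = random.Random(seed)           # seeded generator for reproducibility
    return rng.sample(population, n)    # n distinct individuals, no replacement

# Hypothetical population of 100 students
students = [f"student_{i}" for i in range(1, 101)]
sample = simple_random_sample(students, 10, seed=1)
print(sample)
```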
19
cluster sampling
a random sample of clusters is taken and all individuals in selected clusters are included in sample
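This sketch of cluster sampling uses assumed data: the classrooms and names below are hypothetical. The random step happens only at the cluster level. Every individual in each chosen cluster goes into the sample.

```python
import random

def cluster_sample(clusters, k, seed=None):
    """Pick k clusters at random and keep every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)   # SRS of cluster names
    individuals = [person for name in chosen for person in clusters[name]]
    return chosen, individuals

# Hypothetical population grouped into classrooms (the clusters)
classrooms = {
    "room_A": ["Ann", "Ben", "Cal"],
    "room_B": ["Dee", "Eli"],
    "room_C": ["Fay", "Gus", "Hal", "Ivy"],
    "room_D": ["Jo", "Kim"],
}
chosen_rooms, sample = cluster_sample(classrooms, 2, seed=3)
print(chosen_rooms, sample)
```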
20
stratified random sample
divide the population into strata (groups of similar individuals), select an SRS from each stratum, and combine these SRSs into one sample
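Here is a sketch of a stratified random sample for contrast with cluster sampling. An SRS is drawn inside every stratum, and the results are combined. The class-year strata and the per-stratum sample sizes are illustrative assumptions, not values from the course.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw an SRS of sizes[name] from each stratum and combine them."""
    rng = random.Random(seed)
    combined = []
    for name, members in strata.items():
        combined.extend(rng.sample(members, sizes[name]))  # SRS within the stratum
    return combined

# Hypothetical strata of a student body, by class year
strata = {
    "freshman": [f"F{i}" for i in range(40)],
    "sophomore": [f"S{i}" for i in range(30)],
    "junior": [f"J{i}" for i in range(20)],
}
sizes = {"freshman": 4, "sophomore": 3, "junior": 2}  # assumed sample sizes
sample = stratified_sample(strata, sizes, seed=7)
print(sample)
```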
21
multi-stage sample
take a sample at each hierarchical level of the population
22
treatment
the condition applied to a subject in an experiment (one of the subcategories/values of the explanatory variable)
23
lurking variables
variables that affect both the explanatory and response variables but are not measured or included as a planned factor in the study
24
control
an effort to reduce the effects of lurking variables
25
confounding
situation in which effects of lurking variables cannot be distinguished from effects of factors
26
historical comparison experiments
study involving only one treatment, where treated subjects are compared to untreated subjects from some external source
27
unreplicated experiments
assigns only one subject to each treatment
28
confounded experiments
treatment groups are handled differently in some way OTHER than the treatment
29
undercoverage
some individuals have no possibility of being selected
30
non-response
some selected individuals choose not to be in the sample because they refuse to provide information or cannot be contacted
31
misleading response
people lie or give inaccurate answers (often about sensitive issues)
32
interviewer effect
person asking questions influences responses (for in-person/phone surveys)
33
question order effect
the order that questions are asked promotes certain responses
34
question wording
the way a question is asked leads, misleads, or confuses respondents
35
open questions
allow for almost unlimited possible responses (short answer), less restrictive but more difficult to analyze
36
closed questions
limit response options (multiple choice); easier to analyze but may be biased by the options provided; should include an "other/unsure" option
37
observational studies
individuals are not assigned to treatments (they self-select or are simply observed); cannot conclude causation
38
experiment
study in which the researcher assigns individuals to treatments; causation can be concluded if the experiment is valid
39
subject
individual to which treatment is applied
40
response variable
characteristic measured on each subject; outcome of interest
41
explanatory variable
characteristic/measurement used to predict or explain changes in the response variable; variable we think could help us know about the response (measured earlier or more easily); also called the independent variable
42
factor
planned explanatory variable
43
comparison
two or more groups; controls lurking variables by including comparison treatments
44
randomization
randomly assign subjects to groups; neutralizes effects of lurking variables by assigning subjects to treatments using a random device
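Here is a minimal sketch of randomization. The 12 subjects and the treatment names are hypothetical. The subjects are shuffled with a random device and then dealt into the treatment groups, so each group ends up with the same number of subjects. This also provides replication.

```python
import random

def randomize(subjects, treatments, seed=None):
    """Randomly assign subjects to treatment groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)                     # the random device
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)  # deal in turn
    return groups

subjects = [f"subject_{i}" for i in range(1, 13)]   # 12 hypothetical subjects
groups = randomize(subjects, ["new drug", "placebo"], seed=42)
for treatment, members in groups.items():
    print(treatment, members)
```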
45
replication
two or more subjects in each group; assign more than one subject to each treatment so that important effects can be detected
46
double blinding
neither subjects nor the researchers in direct contact with the subjects know which treatment is received
47
placebo effect
favorable response of a human subject to a placebo because of trust in the medical provider or belief that the treatment will work
48
diagnostic bias
diagnosis of subjects is biased by preconceived notions about the effectiveness of the treatment (person administering treatments expects certain responses)
49
lack of realism
realism is compromised by the conditions of the study
50
Hawthorne effect
people in an experiment behave differently than they normally would because they know they are being studied, so results may not reflect real life
51
non-compliance
subjects fail to submit to the assigned treatment or refuse to follow the protocol of the experiment
52
principles of data ethics
• safety and well-being of the subjects must be protected
• all individuals must give their informed consent before data are collected
• individual data must be kept confidential
53
randomized controlled experiment
randomly assign subjects to treatments, grouped by treatment
54
randomized block design
randomly assign to treatments within blocks, grouped by treatment or by block
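Here is a sketch of a randomized block design. It assumes hypothetical blocks by sex and two treatments labeled A and B. Randomization happens separately inside each block, so each block contributes equally to both treatments.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Randomize treatments separately inside each block."""
    rng = random.Random(seed)
    plan = {}
    for block, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)                 # randomize within this block only
        for i, subject in enumerate(shuffled):
            plan[subject] = (block, treatments[i % len(treatments)])
    return plan

# Hypothetical blocks: subjects grouped by sex before randomizing
blocks = {"men": ["m1", "m2", "m3", "m4"], "women": ["w1", "w2", "w3", "w4"]}
plan = randomized_block_design(blocks, ["A", "B"], seed=5)
for subject, (block, treatment) in sorted(plan.items()):
    print(subject, block, treatment)
```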
55
benefits of randomized block design (RBD)
  • removes confounding of lurking variables

  • reduces chance variation by removing variation associated with the blocking variable

  • yields more precise estimates of chance variation

56
matched pairs
two treatments; matched individuals or two measurements per subject
57
three principles of experiments (applied to matched pairs)
  • randomization: randomly assign the two treatments within each matched pair, or randomize the order in which each individual receives the two treatments

  • replication = number of pairs

  • comparison: compare the two treatments

58
analysis of distribution of quantitative data
  • always plot data first

  • look for an overall pattern and for striking deviations

  • look at shape, center, spread of distribution

  • add numerical summaries to supplement graph

  • if pattern is regular, use mathematical model to describe data

59
symmetric and bell shaped distribution examples
blood pressure, IQ, biological factors
60
symmetric and bell shaped distribution
mean, median, and mode are (approximately) the same
61
right skewed distribution
concentration of data on left, tail extends to the right; mean > median
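Here is a small illustration with made-up salaries, in thousands of dollars. One very large value forms a long right tail and pulls the mean well above the median.

```python
import statistics

# Hypothetical salaries in $1000s: one very large value forms the right tail
salaries = [32, 35, 38, 40, 41, 45, 47, 50, 55, 250]

mean = statistics.mean(salaries)      # pulled toward the tail
median = statistics.median(salaries)  # resistant to the extreme value
print(mean, median)                   # → 63.3 43.0
print(mean > median)                  # → True
```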
62
right skewed distribution examples
salary, home price, number of children, economic variables
63
left skewed distribution
concentration of data on right and the tail on the left; median > mean
64
left skewed distribution examples
test scores, Olympic high jump
65
bimodal distribution
a distribution with two modes
66
bimodal distribution examples
speed limits, restaurant patrons
67
flat or uniform distribution
values occur about equally often, so the graph is roughly level across all values
68
flat or uniform distribution examples
rolling a die, day of the month born
69
center
typical, middle value; half of data to each side
70
spread
consistency/inconsistency of data; look for maximum and minimum
71
outliers
values that fall far outside the overall pattern of the data; ask:

  • is data point miscoded?

  • unusual conditions?

  • should data point be excluded?

72
mode
most frequently occurring score, corresponds to a peak
73
median
the middle score in a distribution; half the scores are above it and half are below it
74
mean
center of gravity; the arithmetic average of a distribution, obtained by adding the scores and then dividing by the number of scores
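You can compute the three measures of center with Python's standard `statistics` module. The scores below are invented for illustration. The last line checks the "center of gravity" idea: deviations from the mean balance out to zero.

```python
import statistics

scores = [2, 3, 3, 5, 7, 8, 14]   # invented scores

print(statistics.mode(scores))    # most frequent value → 3
print(statistics.median(scores))  # middle of the sorted data → 5
print(statistics.mean(scores))    # sum / count = 42 / 7 → 6

# The mean is the balance point: deviations from it sum to zero
mean = statistics.mean(scores)
print(sum(x - mean for x in scores))  # → 0
```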
75
mean vs median
  • construct graph to evaluate skewness and outliers

  • use median if distribution is markedly skewed or outliers are present

  • use mean if distribution is roughly symmetric

76
range
maximum - minimum
77
interquartile range (IQR)
the difference between the third and first quartiles (IQR = Q3 − Q1); the range of the middle 50% of the data
78
standard deviation
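roughly, the typical distance of values from the mean; formally, the square root of the average squared deviation from the mean (divide by n − 1 for a sample)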
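Here is a sketch of the sample standard deviation s, computed by hand as the square root of the sum of squared deviations divided by n − 1. It is checked against `statistics.stdev`. The data values are arbitrary.

```python
import math
import statistics

def sample_sd(values):
    """s = sqrt(sum((x - x_bar)**2) / (n - 1))"""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # arbitrary example values, mean 5
print(sample_sd(data))
print(statistics.stdev(data))      # standard-library value for comparison
```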
79
first quartile (Q1)
a value below which about 25% of the data fall; the median of the data values below the overall median
80
second quartile (Q2)
median
81
third quartile (Q3)
a value below which about 75% of the data fall; the median of the data values above the overall median
82
5 number summary vs 2 number summary
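use the 5-number summary for skewed distributions or when outliers are present, and the 2-number summary (mean and standard deviation) for roughly symmetric ones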
83
5 number summary
minimum, Q1, median, Q3, maximum
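Here is a sketch of the five-number summary using the quartile rule described on these cards. Q1 and Q3 are the medians of the lower and upper halves, and the overall median is left out when n is odd. This is one common convention. Other software, such as `statistics.quantiles`, may use a different rule and give slightly different quartiles. The range and IQR follow directly from the summary.

```python
import statistics

def five_number_summary(values):
    """(min, Q1, median, Q3, max); Q1/Q3 are medians of the lower/upper halves.
    Assumes at least 2 values."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]               # values below the median position
    upper = data[half + n % 2:]       # values above it (skip the median if n is odd)
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]   # invented data, n = 9
low, q1, med, q3, high = five_number_summary(data)
print(low, q1, med, q3, high)   # → 1 3.5 7 11.0 15
print("range:", high - low)     # → range: 14
print("IQR:", q3 - q1)          # → IQR: 7.5
```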
84
random phenomenon
individual outcome unpredictable, but outcomes from large number of repetitions follow regular pattern
85
sample space
the set of all possible outcomes
86
event
a collection of possible outcomes
87
probability of an outcome
The proportion of times that an outcome occurs in many, many repetitions of the random phenomenon
88
probability rules
  • 0 ≤ P(A) ≤ 1

  • summation of all probabilities is 1

  • if two events cannot occur simultaneously, the probability of one or the other equals the sum of separate probabilities

  • probability of event not occurring equals one minus the probability of event occurring

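The four rules can be checked on a simple probability model. This example assumes a fair six-sided die. It uses exact `Fraction` arithmetic so the equalities hold without rounding error.

```python
from fractions import Fraction

# Assumed probability model: one roll of a fair six-sided die
model = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """P(event) for a set of outcomes: add up the outcome probabilities."""
    return sum(model[outcome] for outcome in event)

assert all(0 <= p <= 1 for p in model.values())   # 0 ≤ P(A) ≤ 1
assert sum(model.values()) == 1                    # all probabilities sum to 1

low, high = {1, 2}, {5, 6}                         # cannot occur together
assert prob(low | high) == prob(low) + prob(high)  # addition rule

even = {2, 4, 6}
not_even = set(model) - even
assert prob(not_even) == 1 - prob(even)            # complement rule
print(prob(even), prob(not_even))                  # → 1/2 1/2
```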
89
theoretical probability
number of favorable outcomes divided by total number of possible outcomes (when all outcomes are equally likely)
90
empirical probability
number of times the event occurred divided by the total number of repetitions
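This sketch contrasts the two kinds of probability for the event "roll an even number" on a fair die, which is an assumed example. The theoretical value counts favorable outcomes, assuming all six faces are equally likely. The empirical value comes from simulated rolls.

```python
import random
from fractions import Fraction

faces = range(1, 7)   # fair die: six equally likely outcomes
even = {2, 4, 6}

# Theoretical: favorable outcomes / total possible outcomes
theoretical = Fraction(len(even), len(faces))

# Empirical: times the event occurred / total repetitions
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(600)]
empirical = sum(1 for r in rolls if r in even) / len(rolls)

print("theoretical:", theoretical)   # → theoretical: 1/2
print("empirical:", empirical)       # close to 0.5; varies with the seed
```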
91
law of large numbers
As the number of repetitions of a probability experiment increases, the proportion with which a certain outcome is observed gets closer to the theoretical probability of the outcome
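Here is a quick simulation of the law of large numbers with a fair coin, which is an assumed example. The proportion of heads moves toward the theoretical 0.5 as the number of flips grows, although any single short run can be far off.

```python
import random

def proportion_heads(n, seed=0):
    """Flip a simulated fair coin n times and return the proportion of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, proportion_heads(n))
```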
92
probability
the long-run relative frequency with which an event will occur
93
probability distribution
all possible values (outcomes) of a random variable together with their associated probabilities
94
random variable
a variable whose value is a numerical outcome of a random phenomenon
95
continuous random variable
a random variable that can take any value in an interval; its possible values cannot all be listed
96
discrete random variable
random variable whose possible values can be listed as distinct values
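This sketch ties together the ideas of random variable, discrete random variable and probability distribution. It uses an assumed example: X is the sum of two fair dice. X takes the distinct values 2 through 12, and its distribution lists each value with its probability.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely ordered outcomes of rolling two fair dice
sample_space = list(product(range(1, 7), repeat=2))

# Random variable X = sum of the two dice (discrete: values 2, 3, ..., 12)
counts = Counter(a + b for a, b in sample_space)

# Probability distribution of X: each possible value with its probability
distribution = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
```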
97
𝜇
mean of a population
98
x-bar
mean of a sample
99
s
standard deviation of a sample
100
𝜎
standard deviation of a population
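This sketch shows the parameter/statistic distinction behind these symbols, using a small hypothetical population of heights. μ and σ are computed from the whole population, and `statistics.pstdev` divides by N. x-bar and s come from an SRS, and `statistics.stdev` divides by n − 1.

```python
import random
import statistics

# Hypothetical population: heights (cm) of all 13 members of a small club
population = [150, 155, 158, 160, 162, 165, 167, 170, 172, 175, 178, 180, 185]

mu = statistics.mean(population)         # μ: population mean (parameter)
sigma = statistics.pstdev(population)    # σ: population SD (parameter), divides by N

sample = random.Random(2).sample(population, 5)   # SRS of n = 5
x_bar = statistics.mean(sample)          # x-bar: sample mean (statistic)
s = statistics.stdev(sample)             # s: sample SD (statistic), divides by n - 1

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"x-bar = {x_bar:.2f}, s = {s:.2f}")
```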