1

population

the entire group of individuals that is the target of our interest; generally too big to actually measure or observe

2

sample

subgroup of the population which we can examine or observe, measure and collect data from

3

individual

single entity that is being observed

4

variable

characteristic measured on each individual

5

quantitative variable

variable whose possible values are meaningful numbers

6

categorical variable

variable whose possible responses are non-quantitative categories (words/labels/attributes)

7

measurement

value of a variable for an individual

8

data

measurements for a set of individuals (Goal of Statistics: convert this to useful information)

9

data set

data identified with contextual information (who was observed, what was measured, why the study was done), often given in a table

10

EDA (exploratory data analysis) goals

organize and summarize data

discover features, patterns and striking deviations

interpret patterns in context

include visual displays and numerical values

11

single variable pattern

distribution of a variable: summary of data one variable at a time (all the possible values and how often they occur)

12

process of statistical problem solving

Collect data

Summarize data

Interpret data

13

parameter

numerical fact about the variable in the population

14

statistic

numerical fact about the variable in the sample

15

convenience sampling

select individuals in the easiest possible way

16

volunteer response sampling

individuals select themselves

17

quota sampling

force the sample to meet specified quotas

18

simple random sample (SRS)

every possible set of a specified size has an equal chance of being selected

19

cluster sampling

a random sample of clusters is taken and all individuals in selected clusters are included in sample

20

stratified random sample

select a random sample (SRS) from each stratum and combine these SRSs together
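
The three random sampling schemes defined in cards 18 to 20 can be sketched in Python. This is only an illustration under stated assumptions: the population, the stratum labels, the cluster names, and the function names are all hypothetical, and the fixed seed is there just to make the sketch repeatable.

```python
import random

# Hypothetical population: 12 individuals, each tagged with a stratum label.
population = [{"id": i, "stratum": "A" if i < 6 else "B"} for i in range(12)]


def simple_random_sample(individuals, n, rng):
    """SRS: every possible set of n individuals is equally likely."""
    return rng.sample(individuals, n)


def stratified_sample(individuals, n_per_stratum, rng):
    """Take an SRS within each stratum, then combine the SRSs."""
    strata = {}
    for person in individuals:
        strata.setdefault(person["stratum"], []).append(person)
    combined = []
    for label in sorted(strata):
        combined.extend(rng.sample(strata[label], n_per_stratum))
    return combined


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every individual in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]


rng = random.Random(0)  # fixed seed (assumption) so the sketch is repeatable
srs = simple_random_sample(population, 4, rng)
strat = stratified_sample(population, 2, rng)
clusters = {"c1": population[0:4], "c2": population[4:8], "c3": population[8:12]}
clus = cluster_sample(clusters, 2, rng)
```

Note the difference in what is randomized: the stratified sample draws individuals from every stratum, while the cluster sample draws whole groups and keeps everyone inside them.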

21

multi-stage sample

take a sample at each hierarchical level of the population

22

treatment

the condition applied to a subject in an experiment (one of the subcategories/values of the explanatory variable)

23

lurking variables

variables that affect both the explanatory and response variables but are not measured or included as a planned factor in the study

24

control

an effort to reduce the effects of lurking variables

25

confounding

situation in which effects of lurking variables cannot be distinguished from effects of factors

26

historical comparison experiments

study involving only one treatment, where treated subjects are compared to untreated subjects from some external source

27

unreplicated experiments

only one subject is assigned to each treatment

28

confounded experiments

treatment groups are handled differently in some way OTHER than the treatment

29

undercoverage

some individuals have no possibility of being selected

30

non-response

some selected individuals choose not to be in the sample because they refuse to provide information or cannot be contacted

31

misleading response

people lie or give inaccurate answers (often about sensitive issues)

32

interviewer effect

person asking questions influences responses (for in-person/phone surveys)

33

question order effect

the order in which questions are asked promotes certain responses

34

question wording

the way a question is asked leads, misleads, or confuses

35

open questions

allow for almost unlimited possible responses (short answer), less restrictive but more difficult to analyze

36

closed questions

limit response options (multiple choice); easier to analyze but may be biased by the options provided. Should include an "other/unsure" option

37

observational studies

individuals are not assigned to treatments (they are self-selected); causation cannot be concluded

38

experiment

study where individuals are assigned to treatments; causation can be concluded if the experiment is valid

39

subject

individual to which treatment is applied

40

response variable

characteristic measured on each subject; outcome of interest

41

explanatory variable

characteristic/measurement that is used to predict or explain changes in the response variable; variable we think could help us know about the response (measured earlier or more easily); independent variable

42

factor

planned explanatory variable

43

comparison

two or more groups; controls lurking variables by including comparison treatments

44

randomization

randomly assign subjects to groups; neutralizes effects of lurking variables by assigning subjects to treatments using a random device

45

replication

two or more subjects in each group; assign more than one subject to each treatment to detect important effects

46

double blinding

neither subjects nor the researchers in direct contact with the subjects know which treatment is received

47

placebo effect

favorable response of a human subject to a placebo because of trust in the medical provider or belief that the treatment will work

48

diagnostic bias

diagnosis of subjects is biased by preconceived notions about the effectiveness of the treatment (person administering treatments expects certain responses)

49

lack of realism

realism is compromised by the conditions of the study

50

Hawthorne effect

people in an experiment behave differently than they normally would, so the results do not reflect real life

51

non-compliance

subjects fail to submit to the assigned treatment or refuse to follow the protocol of the experiment

52

principles of data ethics

safety and well-being of the subjects must be protected

all individuals must give their informed consent before data are collected

individual data must be kept confidential

53

randomized controlled experiment

randomly assign subjects to treatments, grouped by treatment

54

randomized block design

randomly assign to treatments within blocks, grouped by treatment or by block
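
Random assignment, with and without blocking, can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the subject IDs, the block names, the treatment labels, and both function names are hypothetical.

```python
import random


def randomize(subjects, treatments, rng):
    """Completely randomized design: shuffle, then deal subjects out to treatments."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


def randomize_within_blocks(blocks, treatments, rng):
    """Randomized block design: run a separate randomization inside each block."""
    return {
        name: randomize(members, treatments, rng)
        for name, members in blocks.items()
    }


rng = random.Random(1)  # fixed seed (assumption) so the sketch is repeatable
treatments = ["drug", "placebo"]  # hypothetical treatment labels
blocks = {
    "younger": ["s1", "s2", "s3", "s4"],  # hypothetical blocking variable: age group
    "older": ["s5", "s6", "s7", "s8"],
}
design = randomize_within_blocks(blocks, treatments, rng)
```

Because the randomization is repeated inside each block, every treatment group contains the same number of subjects from each block, so the blocking variable cannot be confounded with the treatment.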

55

benefits of randomized block design (RBD)

removes confounding of lurking variables

reduces chance variation by removing variation associated with the blocking variable

yields more precise estimates of chance variation

56

matched pairs

two treatments; matched individuals or two measurements per subject

57

three principles of experiments (applied to matched pairs)

randomly assign two treatments to two individuals or randomize the order of treatment application to each individual

replication = number of pairs

compare the two treatments

58

analysis of distribution of quantitative data

always plot data first

look for an overall pattern and for striking deviations

look at shape, center, spread of distribution

add numerical summaries to supplement graph

if pattern is regular, use mathematical model to describe data

59

symmetric and bell shaped distribution examples

blood pressure, IQ, biological factors

60

symmetric and bell shaped distribution

mean, median, and mode are (approximately) the same

61

right skewed distribution

concentration of data on left, tail extends to the right; mean > median

62

right skewed distribution examples

salary, home price, children, economic variables

63

left skewed distribution

concentration of data on right and the tail on the left; median > mean

64

left skewed distribution examples

test scores, Olympic high jump

65

bimodal distribution

a distribution with two modes

66

bimodal distribution examples

speed limits, restaurant patrons

67

flat or uniform distribution

relatively equal across graph

68

flat or uniform distribution examples

rolling a die, day of the month born

69

center

typical, middle value; half of data to each side

70

spread

consistency/inconsistency of data; look for maximum and minimum

71

outliers

values that are far outside most of data

is data point miscoded?

unusual conditions?

should data point be excluded?

72

mode

most frequently occurring score, corresponds to a peak

73

median

the middle score in a distribution; half the scores are above it and half are below it

74

mean

center of gravity; the arithmetic average of a distribution, obtained by adding the scores and then dividing by the number of scores

75

mean vs median

construct graph to evaluate skewness and outliers

use median if distribution is markedly skewed or outliers are present

use mean if distribution is roughly symmetric
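
A small Python sketch shows why the median is preferred when outliers are present. The two data sets are made up for illustration; only the standard-library `statistics` module is used.

```python
from statistics import mean, median

# Hypothetical data: the second list replaces one value with an extreme outlier.
symmetric = [48, 49, 50, 51, 52]
with_outlier = [48, 49, 50, 51, 500]

mean_symmetric = mean(symmetric)        # 50
median_symmetric = median(symmetric)    # 50
mean_outlier = mean(with_outlier)       # 139.6, pulled toward the outlier
median_outlier = median(with_outlier)   # 50, unchanged by the outlier
```

A single extreme value moves the mean from 50 to 139.6, while the median stays at 50.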

76

range

maximum - minimum

77

interquartile range (IQR)

the difference between the third and first quartiles (Q3 - Q1)

78

standard deviation

roughly the typical distance of values from the mean (the square root of the average squared deviation from the mean)
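
For reference, the usual formulas behind this card are written out below, using the symbols from the last four cards of the set (x-bar and s for a sample, 𝜇 and 𝜎 for a population). The sample size n, the population size N, and the individual values x_i are notation introduced here and do not appear on the cards.

```latex
\[
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
\]
```

The sample version divides by n - 1 rather than n, so the standard deviation is not literally the average distance from the mean.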

79

first quartile (Q1)

a number for which 25% of the data is less than that number; same as the median of the data which are less than the overall median

80

second quartile (Q2)

median

81

third quartile (Q3)

a number for which 75% of the data is less than that number; same as the median of the part of the data which is greater than the median

82

5 number summary vs 2 number summary

use the 5 number summary for skewed distributions and the 2 number summary (mean and standard deviation) for symmetric distributions

83

5 number summary

minimum, Q1, median, Q3, maximum
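
The quartile definitions on cards 79 to 81 (Q1 and Q3 as medians of the lower and upper halves) can be turned into a short Python sketch. The function name and the sample data are hypothetical. Statistical software often uses other quartile conventions, so its results may differ slightly from this method.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the median position
    upper = data[half + n % 2:]   # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Hypothetical data set with an odd number of values.
summary = five_number_summary([1, 3, 5, 7, 9, 11, 13])
minimum, q1, med, q3, maximum = summary
iqr = q3 - q1             # interquartile range
data_range = maximum - minimum
```

For this data the summary is (1, 3, 7, 11, 13), giving an IQR of 8 and a range of 12. The function assumes at least two values.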

84

random phenomenon

individual outcome unpredictable, but outcomes from large number of repetitions follow regular pattern

85

sample space

the set of all possible outcomes

86

event

a collection of possible outcomes

87

probability of an outcome

The proportion of times that an outcome occurs in many, many repetitions of the random phenomenon

88

probability rules

0 ≤ P(A) ≤ 1

summation of all probabilities is 1

if two events cannot occur simultaneously, the probability of one or the other equals the sum of separate probabilities

probability of event not occurring equals one minus the probability of event occurring
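
In symbols, the four rules can be written as below. The event names A and B and the sample space S are notation introduced here for illustration, and `\text` assumes the amsmath package.

```latex
\[
0 \le P(A) \le 1,
\qquad
P(S) = 1,
\]
\[
P(A \text{ or } B) = P(A) + P(B) \quad \text{if } A \text{ and } B \text{ cannot occur together},
\qquad
P(\text{not } A) = 1 - P(A)
\]
```

The bounds in the first rule are inclusive: an impossible event has probability 0 and a certain event has probability 1.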

89

theoretical probability

number of favorable outcomes divided by total number of possible outcomes

90

empirical probability

number of times the outcome occurred divided by the total number of repetitions

91

law of large numbers

As the number of repetitions of a probability experiment increases, the proportion with which a certain outcome is observed gets closer to the theoretical probability of the outcome
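
The law of large numbers can be illustrated with a quick simulation of rolling a fair die. This is a sketch under stated assumptions: the function name, the seed, and the repetition counts are arbitrary choices made here, and the exact estimates depend on the seed.

```python
import random


def empirical_probability(n_rolls, target, rng):
    """Proportion of n_rolls fair-die rolls that land on the target face."""
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == target)
    return hits / n_rolls


rng = random.Random(42)  # fixed seed (assumption) so the sketch is repeatable
theoretical = 1 / 6      # one favorable outcome out of six equally likely outcomes

for n in (10, 100, 10_000, 100_000):
    estimate = empirical_probability(n, 6, rng)
    # Print the repetition count, the empirical probability, and its gap from 1/6.
    print(n, round(estimate, 4), round(abs(estimate - theoretical), 4))
```

With few rolls the empirical probability can be far from 1/6. With many rolls it tends to settle close to the theoretical value, although any single run can still wander.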

92

probability

the long-run relative frequency with which an event will occur

93

probability distribution

all possible events and their associated probabilities

94

random variable

a variable whose value is a numerical outcome of a random phenomenon

95

continuous random variable

a random variable that can take on any value in an interval; its possible values cannot be listed

96

discrete random variable

variable whose possible values are a list of distinct values

97

𝜇

mean of a population

98

x-bar

mean of a sample

99

s

standard deviation of a sample

100

𝜎

standard deviation of a population
