individual/unit
the objects described by a set of data; may be people, but may also be animals or things
variable
any characteristic of an individual, can take different values for different individuals
categorical variable
places an individual into one of several groups or categories
numerical/quantitative variables
takes numerical values for which arithmetic operations such as adding and averaging make sense.
population
in a statistical study is the entire group of individuals about which we want information
sample
part of the population from which we actually collect information and is used to draw conclusions about the whole
sample survey
survey some group of individuals by studying only some of its members, selected not because they are of special interest but because they represent the larger group
kind of observational study
response variable
a variable that measures an outcome or result of a study
observational study
observes individuals and measures variables of interest but does not attempt to influence the responses. purpose is to describe some group or situation
census
a sample survey that attempts to include the entire population in the sample
experiment
deliberately imposes some treatment on individuals in order to observe their response. purpose is to study whether the treatment causes a change in the response
biased
systematically favors certain outcomes
convenience sampling
selection of whichever individuals are easiest to reach
voluntary response sample
a sample that chooses itself by responding to a general appeal. write-in or call-in opinion polls are examples of this
convenience sampling and voluntary response sample are often…
biased
simple random sampling (SRS)
of size “n” consists of “n” individuals from the population chosen in such a way that every set of “n” individuals has an equal chance to be the sample actually selected
a SRS gives…
each individual an equal chance to be chosen (avoiding bias) and every possible sample an equal chance to be chosen
what are the different ways to obtain a simple random sample?
names in a hat, random integer generator, table of random digits
what are two steps to obtain a simple random sample?
1) assign a numerical label to every individual in the population
2) use random digits to select labels at random
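The two steps above can be sketched in Python. This is a minimal illustration, assuming a software random number generator stands in for a table of random digits; the function name `simple_random_sample` and the list of students are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n by labeling individuals and picking labels at random."""
    rng = random.Random(seed)
    # Step 1: assign a numerical label to every individual in the population.
    labels = list(range(len(population)))
    # Step 2: select n distinct labels at random (every set of n labels is equally likely).
    chosen = rng.sample(labels, n)
    return [population[label] for label in chosen]

# Hypothetical population of 10 students.
students = ["Ana", "Ben", "Cal", "Dee", "Eli", "Fay", "Gus", "Hal", "Ivy", "Jon"]
sample = simple_random_sample(students, 3, seed=1)
print(sample)
```

Fixing `seed` only makes the draw repeatable for checking; leave it as `None` for an actual selection.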
parameter
a number that describes the population. it is a fixed number, but in practice we don’t know the actual value of this number because we cannot access the entire population
statistic
a number that describes a sample. the value can be determined and is known once we have taken a sample, but its value can change from sample to sample. often used to estimate an unknown parameter
variability
describes how the values of the sample statistic will vary when we take many samples
what does large variability mean?
the result of the sampling is not repeatable
what does a good sampling method have?
both small bias and small variability
to reduce bias…
use random sampling
to reduce the variability of an SRS…
use a larger sample
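A small simulation can illustrate why a larger sample reduces variability. This is only a sketch: it assumes a very large population in which a proportion `p` has some trait, so each draw can be treated as independent, and the function name and sample sizes are made up for illustration.

```python
import random
import statistics

def spread_of_sample_proportion(p, n, num_samples=2000, seed=0):
    """Standard deviation of the sample proportion over many simulated samples of size n."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Count how many of the n sampled individuals have the trait.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return statistics.pstdev(proportions)

small_n_spread = spread_of_sample_proportion(0.5, 100)
large_n_spread = spread_of_sample_proportion(0.5, 400)
print(small_n_spread, large_n_spread)
```

In this setup the spread for samples of 400 should come out at roughly half the spread for samples of 100.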
large samples…
almost always give an estimate that is close to the truth, provided the sample is chosen at random
margin of error
represents the natural sampling variability
to cut the margin of error in half…
we must use a sample four times as large
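The factor of four follows if the margin of error shrinks like one over the square root of the sample size. The sketch below assumes the common "quick method" for a 95% margin of error on a sample proportion, 1/sqrt(n); that formula is not stated on these cards and is used here only as an illustration.

```python
import math

def quick_margin_of_error(n):
    """Approximate 95% margin of error for a sample proportion (assumed quick method)."""
    return 1 / math.sqrt(n)

moe_1000 = quick_margin_of_error(1000)  # about 0.032, roughly 3 percentage points
moe_4000 = quick_margin_of_error(4000)  # four times the sample, half the margin
print(moe_1000, moe_4000)
```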
confidence statement
has two parts: a margin of error and a level of confidence. the margin of error says how close the sample statistic lies to the population parameter. the level of confidence says what percentage of all possible samples satisfy the margin of error
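One way to see what the level of confidence means is to simulate many samples and count how often the sample statistic lands within the margin of error of the true parameter. This is a rough sketch, assuming a very large population, a known true proportion `p`, and the quick margin of error 1/sqrt(n); none of these specifics come from the cards.

```python
import math
import random

def coverage(p, n, num_samples=2000, seed=0):
    """Fraction of simulated samples whose proportion is within the margin of error of p."""
    rng = random.Random(seed)
    margin = 1 / math.sqrt(n)  # assumed quick 95% margin of error
    hits = 0
    for _ in range(num_samples):
        p_hat = sum(1 for _ in range(n) if rng.random() < p) / n
        if abs(p_hat - p) <= margin:
            hits += 1
    return hits / num_samples

result = coverage(p=0.5, n=400)
print(result)
```

With `p` near 0.5 the fraction should land close to 95%. For other values of `p` this quick margin is conservative, so the fraction tends to be higher.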
the conclusion of a confidence statement…
always applies to the population, not to the sample
our conclusion about the population is…
never completely certain
a sample survey can choose…
to use a confidence level other than 95%
a smaller margin of error with the same confidence?
take a larger sample
the variability of a statistic from a random sample is essentially…
unaffected by the size of the population as long as the population is at least 20 times larger than the sample
confidence statement
to say how accurate our conclusions about the population are
sampling errors
errors caused by the act of taking a sample. they cause sample results to be different from the results of a census of the population
random sampling error
the variation due to chance in choosing a random sample. the margin of error in a confidence statement includes only random sampling error
nonsampling errors
errors not related to the act of selecting a sample from the population. they can be present even in a census
undercoverage
occurs when some groups in the population have no chance of being included in the sample
sampling frame
list of individuals from which we will draw our sample
nonresponse
the failure to obtain data from an individual selected for a sample. most nonresponse happens because subjects can’t be contacted or because some subjects who are contacted refuse to cooperate
what is the most serious problem facing sample surveys?
nonresponse
processing errors
mistakes in mechanical tasks such as doing arithmetic or entering responses into a computer
multiple inclusions
occur if some population members appear multiple times in the sampling frame so that they have a higher chance of being sampled
erroneous inclusions
can occur if the frame includes units that are not in the population of interest so that the invalid units have a chance of being in the sample
probability sample
a sample chosen by chance. we must know what samples are possible and what chance, or probability, each possible sample has
frame errors
can occur because the sampling frame is not an accurate representation of the population
response errors
incorrect answers by respondents
what are examples of nonsampling errors?
processing errors, response errors, and nonresponse
explanatory variable
a variable that we think explains or causes changes in the response variable
subjects
individuals studied in an experiment
treatment
any specific experimental condition applied to the subjects
lurking variable
a variable that has an important effect on the relationship among the variables in a study but is not one of the explanatory variables
confounded
when two variables’ effects on a response cannot be distinguished from each other. may be explanatory variables or lurking variables
placebo
dummy treatment with no active ingredients
double-blind
an experiment in which neither the subjects nor the physicians recording the symptoms know which treatment was received
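randomized comparative experiment
one that compares two or more treatments, with subjects assigned to the treatments at random. the simplest version compares just two treatments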
control group
the group that receives the placebo. comparing the treatment group with the control group allows us to control the effects of lurking variables
what are the basic principles of statistical design of experiments?
control the effects of lurking variables, randomize, and use enough subjects
statistically significant
an observed effect of a size that would rarely occur by chance
a good comparative study…
measures and adjusts for confounding variables
discrete data
data that is counted
continuous data
data that is measured in some unit of measure and can take any value within an interval
qualitative
descriptive
completely randomized experimental design
all the experimental subjects are allocated at random among all treatments
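A completely randomized design can be sketched by shuffling the subjects and dealing them out to the treatments. The function name, subject labels, and treatment names below are hypothetical.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Allocate all subjects at random among all treatments, in near-equal group sizes."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # random order decides who gets which treatment
    groups = {treatment: [] for treatment in treatments}
    for i, subject in enumerate(shuffled):
        # Deal subjects out to the treatments in turn, like dealing cards.
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

subjects = [f"S{i}" for i in range(1, 13)]  # 12 hypothetical subjects
groups = completely_randomized(subjects, ["drug", "placebo"], seed=42)
print(groups)
```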
matched pairs design
compares just two treatments. choose pairs of subjects that are as closely matched as possible. assign one of the treatments to each subject in a pair by tossing a coin or reading odd and even digits
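The coin toss within each pair can be sketched as follows. This is an illustration only; the pair names and the treatment labels "A" and "B" are hypothetical.

```python
import random

def assign_matched_pairs(pairs, treatments=("A", "B"), seed=None):
    """Within each matched pair, toss a coin to decide which subject gets which treatment."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        if rng.random() < 0.5:  # heads: first subject gets the first treatment
            assignment[first], assignment[second] = treatments
        else:                   # tails: the order is reversed
            assignment[second], assignment[first] = treatments
    return assignment

pairs = [("Ann", "Amy"), ("Bob", "Bill"), ("Cat", "Cory")]  # hypothetical matched pairs
assignment = assign_matched_pairs(pairs, seed=3)
print(assignment)
```

Every pair always ends up with one subject on each treatment; only the order within the pair is random.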
block
group of experimental subjects that are known before the experiment to be similar in some way that is expected to affect the response to the treatments
block design
the random assignment of subjects to treatments is carried out separately within each block
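A block design repeats the random assignment separately inside each block. The sketch below assumes two hypothetical blocks of six subjects each and two treatments.

```python
import random

def block_design(blocks, treatments, seed=None):
    """Randomly assign subjects to treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for members in blocks.values():
        shuffled = list(members)
        rng.shuffle(shuffled)  # randomize only within this block
        for i, subject in enumerate(shuffled):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

# Hypothetical blocks formed before the experiment.
blocks = {
    "men": ["M1", "M2", "M3", "M4", "M5", "M6"],
    "women": ["W1", "W2", "W3", "W4", "W5", "W6"],
}
assignment = block_design(blocks, ["drug", "placebo"], seed=7)
print(assignment)
```

Because the shuffle happens per block, each block contributes equally to every treatment group, which a single shuffle of all subjects would not guarantee.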
Hawthorne Effect
the tendency of some people to work harder and perform better when they are participants in an experiment. individuals may change their behavior due to the attention they are receiving from researchers rather than because of any manipulation of independent variables
what is the most common weakness in experiments?
we can’t generalize the conclusions widely
measure
we measure a property of a person or thing when we assign a value to represent the property
instrument
used to make a measurement
units
used to record the measurements
valid
a measurement is a valid measure of a property if it is relevant or appropriate as a representation of that property
rate
the rate (a fraction, proportion, or percentage) at which something occurs is often a more valid measure than a simple count of occurrences
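A short example shows how a count and a rate can point in opposite directions. The two towns and all of their numbers are made up purely for illustration.

```python
def rate_per_1000(count, group_size):
    """Occurrences per 1,000 individuals in the group."""
    return 1000 * count / group_size

# Hypothetical data: accidents in two towns of very different size.
town_a_rate = rate_per_1000(50, 10_000)  # 50 accidents among 10,000 drivers
town_b_rate = rate_per_1000(80, 40_000)  # 80 accidents among 40,000 drivers
print(town_a_rate, town_b_rate)          # prints 5.0 2.0
```

Town B has the larger count, but Town A has the higher rate once group size is taken into account.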
predictive validity
it can be used to predict success on tasks that are related to the property measured
reliable
if the random error is small
random error
repeated measurements on the same individual give different results
a reliable measurement process has what variance?
a small variance
the average of several repeated measurements of the same individual is more or less reliable?
more reliable (less variable) than a single measurement
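A simulation can illustrate why averaging helps. This sketch assumes the random error is normally distributed around a true value; the true value of 70, the error size of 2, and the choice of four repeats are all hypothetical.

```python
import random
import statistics

rng = random.Random(0)
TRUE_VALUE = 70.0  # hypothetical true value of the property
ERROR_SD = 2.0     # hypothetical size of the random error

def measure():
    """One measurement: the true value plus random error."""
    return rng.gauss(TRUE_VALUE, ERROR_SD)

single = [measure() for _ in range(5000)]
averaged = [statistics.fmean([measure() for _ in range(4)]) for _ in range(5000)]

sd_single = statistics.pstdev(single)
sd_averaged = statistics.pstdev(averaged)
print(sd_single, sd_averaged)
```

Under these assumptions the average of four measurements should vary about half as much as a single measurement.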
what to look for when determining if the numbers make sense?
missing information, inconsistencies, incorrect arithmetic, implausible numbers, numbers that are too regular or agree too well, a hidden agenda
distribution of a variable
tells us what values it takes and how often it takes these values
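For a categorical variable, the distribution is simply a tally of each value together with its share of the total. A minimal sketch, using made-up responses:

```python
from collections import Counter

# Hypothetical responses to "favorite color" from six individuals.
colors = ["red", "blue", "red", "green", "blue", "red"]

distribution = Counter(colors)  # what values occur, and how often
percents = {value: 100 * count / len(colors) for value, count in distribution.items()}

print(distribution["red"], distribution["blue"], distribution["green"])  # prints 3 2 1
print(percents["red"])                                                   # prints 50.0
```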
pie chart
show how a whole is divided into parts
bar graph
all bars are the same width and do not touch each other
can also compare the size of the categories that are not parts of one whole
what does the height of the bars in a bar graph represent?
the count or percentage of individuals in that category
pie charts and bar graphs show the distribution of?
categorical variables
pictogram
a bar graph in which pictures replace the bars
line graph
used to display how a quantitative variable changes over time
time on x-axis
trend
a long-term upward or downward movement over time
seasonal variation
a pattern that repeats itself at known regular intervals of time
seasonally adjusted
the expected seasonal variation is removed before the data are published
linear correlation coefficient
r, it measures the strength and direction of the linear relationship between x and y
R squared
coefficient of determination, measures the explanatory ability of the line of best fit
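Both quantities can be computed by hand for a small data set. The sketch below uses the standard formula for r and relies on the fact that, for a least-squares line with a single explanatory variable, R squared equals r squared. The data values are made up.

```python
import math

def correlation(xs, ys):
    """Linear correlation coefficient r between paired values xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]  # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]  # hypothetical responses

r = correlation(xs, ys)  # about 0.775: a fairly strong positive linear relationship
r_squared = r ** 2       # about 0.6
print(r, r_squared)
```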
skewed right
bump is on the left and it tapers off to the right
skewed left
bump is on the right and it tapers off to the left
symmetrical
peak in the middle
uniform
bars are the same height
to describe the overall pattern of a distribution:
describe the center and the variability and describe the shape of the histogram with a few words
histogram
bars touch each other, quantitative variables
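The bars of a histogram come from counting how many values fall into each class of equal width. A minimal text-only sketch, assuming made-up test scores and classes of width 10 starting at 50:

```python
def histogram_counts(values, start, width, num_classes):
    """Count how many values fall into each of num_classes equal-width classes."""
    counts = [0] * num_classes
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < num_classes:  # ignore values outside the classes
            counts[index] += 1
    return counts

scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 82, 85, 91]  # hypothetical scores
counts = histogram_counts(scores, start=50, width=10, num_classes=5)

for i, count in enumerate(counts):
    low = 50 + 10 * i
    print(f"{low}-{low + 9}: {'#' * count}")  # one text bar per class
```

The counts here are 2, 3, 5, 2, 1, which gives a roughly symmetrical shape with its peak in the 70-79 class.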