data
facts that convey information; a set of values or characters, together with some context describing them (ex. yearly average global surface air temperatures)
population
contains all of the items or individuals of interest that you seek to study
sample
a subset of the target population from which conclusions about the target population will be drawn (ex. 20 WPI students)
census
to obtain observations from every sampling unit in the population
sampling unit
an entity on which measurements or observations can be made (ex. a single student at WPI)
target population
a collection of sampling units about which we want to draw conclusions (ex. all WPI students)
sampling frame
a list of all sampling units in the target population (ex. a list of all WPI student IDs; phone numbers)
sampling design
how you get your sample; a pattern, arrangement or method used for selecting a sample of sampling units from the target population (ex. simple random sample)
sampling plan
the operational plan, including the sampling design, for actually obtaining or accessing the sampling units for the study (ex. interview, survey, phone interview)
experimental unit
a physical entity that is the primary unit of interest in a specific research objective (ex. the test sheets at the ink company)
response
a measurement or observation of interest that is made on an experimental unit (ex. improperly inked area in mm^2)
factor
a quantity that is thought to influence the response (experimental & nuisance)
experimental factor
a factor that is purposely varied by the experimenter (ex. ink flow setting; pressure plate setting)
nuisance factor
a factor that cannot be controlled by the experimenter. May or may not be known to the experimenter (ex. temperature of the room; quality of the paper)
factor level
a value assumed by a factor in an experiment (ex. ink flow- high/low; pressure plates- high/medium/low)
treatments
the combinations of levels of experimental factors for which the response will be observed (ex. low+low, low+med, low+high, etc.)
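A minimal sketch of how the treatments follow from the factor levels, using the ink example from these cards. The variable names are illustrative only.

```python
from itertools import product

# Factor levels taken from the ink example in these cards.
ink_flow = ["low", "high"]
pressure_plate = ["low", "medium", "high"]

# Each treatment is one combination of a level from every experimental factor.
treatments = list(product(ink_flow, pressure_plate))

print(len(treatments))  # 2 levels x 3 levels = 6 treatments
for flow, plate in treatments:
    print(f"ink flow={flow}, pressure plate={plate}")
```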
effect
the change in the average response between two factor levels or between two combinations of factor levels (ex. the effect of the high pressure plate setting (23.025) over the medium pressure plate setting (25) is 23.025 - 25 = -1.975)
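The arithmetic in the example can be checked with a short sketch. The two averages are the ones quoted in the card; the variable names are illustrative.

```python
# Average responses quoted in the card (improperly inked area, mm^2).
mean_high = 23.025    # high pressure plate setting
mean_medium = 25.0    # medium pressure plate setting

# Effect of high over medium = difference of the two average responses.
effect = mean_high - mean_medium
print(round(effect, 3))  # -1.975
```

A negative effect here means the average response is lower at the high setting than at the medium setting.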
controlled experiment
study in which treatments are imposed on experimental units in order to observe a response (ex. keeping ink flow the same; maintaining temperature)
non-probability samples
items included are chosen without regard to their probability of occurrence
convenience sampling
items are selected based only on the fact that they are easy, inexpensive, or convenient to sample
judgment sample
get opinions of preselected experts in the subject matter (ex. study of an illness- a doctor)
probability sample
method of choosing a sample using a pre-specified chance mechanism
simple random sampling
each possible sample has the same chance of selection; good if units are homogeneous and easily accessed (ex. table of random numbers; computer random number generators)
SRS with replacement
selected individuals are returned to frame for possible reselection
SRS without replacement
selected individual is NOT returned to the frame
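A minimal sketch of simple random sampling with and without replacement, using Python's standard random module. The frame of 800 ID numbers and the sample size of 20 are assumptions made up for illustration.

```python
import random

# Hypothetical sampling frame: ID numbers 1..800 standing in for real units.
frame = list(range(1, 801))
rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Without replacement: a selected unit is not returned, so it appears at most once.
without_repl = rng.sample(frame, k=20)

# With replacement: every draw is made from the full frame, so repeats are possible.
with_repl = rng.choices(frame, k=20)

print(sorted(without_repl))
print(sorted(with_repl))
```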
stratified random sampling
units are divided into distinct strata (homogeneous subgroups), and a simple random sample taken separately in each subgroup (ex. break school up by major, then SRS)
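One way to sketch stratified random sampling: group the units by stratum, then draw a simple random sample inside each group. The function name, the (student_id, major) frame, and the per-stratum sample size are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units by stratum, then take a simple random sample in each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Sample without replacement inside each stratum (capped at the stratum size).
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical frame of (student_id, major) pairs.
majors = ["CS", "ME", "BIO"]
students = [(i, majors[i % 3]) for i in range(90)]

sample = stratified_sample(students, stratum_of=lambda s: s[1], n_per_stratum=5, seed=1)
for major, chosen in sample.items():
    print(major, chosen)
```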
cluster sampling
population is divided into “k” groups called clusters; researchers then randomly select n of those clusters to include in the sample (ex. randomly choose 5 flights that depart that day and survey all passengers)
single-stage cluster sampling
all elements within the selected clusters are included in the sample
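A rough sketch of single-stage cluster sampling, loosely following the flight example: pick whole clusters at random, then keep every element in them. The flight and passenger labels are invented for illustration.

```python
import random

def single_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters clusters and keep every element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: 12 flights with 10 passengers each.
flights = {f"flight_{i}": [f"f{i}_p{j}" for j in range(10)] for i in range(12)}

picked = single_stage_cluster_sample(flights, n_clusters=5, seed=0)
for flight, passengers in picked.items():
    print(flight, len(passengers))
```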
multi-stage cluster sampling
sample is taken in stages (ex. a sample of 300 counties in the US is chosen, then a sample of townships within each of the selected counties, then a sample of city blocks within the selected townships, then a sample of households)
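A simplified two-stage version of the idea, as a sketch: stage one samples clusters, stage two samples units inside each chosen cluster. More stages would repeat the same step at each level. The county and household labels and the sample sizes are hypothetical.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: sample units within each chosen cluster."""
    rng = random.Random(seed)
    stage_one = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in stage_one
    }

# Hypothetical frame: 20 counties with 50 households each.
counties = {
    f"county_{i}": [f"c{i}_household_{j}" for j in range(50)] for i in range(20)
}

chosen = two_stage_sample(counties, n_clusters=4, n_per_cluster=10, seed=3)
for county, households in chosen.items():
    print(county, households[:3])
```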
systematic samples
designed to observe every “k”th individual in a process (ex. suppose you take a systematic sample of n=40 from a population of N=800 employees. Subgroups will each contain k=800/40=20 employees. Select a random number from 1 to 20. Suppose you chose 8. Selections are 8, 28, 48, 68, …, 788)
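The employee example can be reproduced with a short sketch. The function name and its parameters are illustrative, and it assumes N is a multiple of n.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Pick every k-th unit, k = N // n, from a random start in 1..k."""
    k = N // n  # interval between selections (assumes N is a multiple of n)
    if start is None:
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(n)]

# Reproduce the card's example: N=800, n=40, random start of 8.
picks = systematic_sample(N=800, n=40, start=8)
print(picks[:4], picks[-1])  # [8, 28, 48, 68] 788
```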
sampling errors
error obtained when, due to chance, a sample quantity gives a result different from the analogous population quantity. Not the result of a mistake, and will occur even when everything in the sampling plan has been done correctly
non-sampling errors
errors that do not come from chance alone, such as being unable to sample from the entire population or taking misleading or false measurements (ex. selection bias, nonresponse bias, response bias)
selection bias
bias that results from an unrepresentative sample (ex. radio station asking listeners if they like that radio station)
response bias
occurs in studies of human populations when questions are phrased in a manner which is difficult to understand or in a fashion that makes a particular answer seem more desirable to the respondent (ex. asking if someone is using drugs)
nonresponse bias
failure to collect data on all items in the sample; cannot assume that a person who did not respond is similar to those who did
variable
the name given to what is being measured, counted, or observed when data are collected (ex. height, weight, GPA)
parameter
summarizes the value of a specific variable for a population (ex. average height of a man in the USA)
statistic
summarizes the value of a specific variable for sample data (ex. average height of a man in MA)
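A small simulation can illustrate the difference between a parameter and a statistic, and the sampling error between them. The population below is simulated, not real height data, and the mean of 175 cm, spread of 7 cm, and sample size of 50 are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed only so the sketch is repeatable

# Hypothetical population of heights in cm (simulated, not real data).
population = [rng.gauss(175, 7) for _ in range(10_000)]
parameter = statistics.mean(population)   # summarizes the whole population

# Simple random sample of 50 units, without replacement.
sample = rng.sample(population, 50)
statistic = statistics.mean(sample)       # summarizes the sample only

# Sampling error: the chance difference between statistic and parameter.
sampling_error = statistic - parameter
print(f"parameter={parameter:.2f}, statistic={statistic:.2f}, error={sampling_error:.2f}")
```

Changing the seed gives a different sample and a different statistic, while the parameter of this fixed population stays the same.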
categorical variable
qualitative; variables take categories as their values (ex. eye color)
numerical variables
quantitative; variables have values that represent a counted or measurement quantity (ex. height 120, 130, 170)
Ordinal
There is order and ranking in the variable (ex. freshman, sophomore, junior, senior; military ranking)
Nominal
No order or ranking is implied (ex. eye color)
Interval
No true zero point, so values can drop below zero and ratios of values are not meaningful (ex. temperature in °C or °F)
Ratio
Has a true zero point, so values cannot drop below zero and ratios of values are meaningful (ex. number of pets, weight)
Descriptive statistics
methods that primarily help summarize and present data
Inferential statistics
Methods that use data collected from a sample to reach conclusions about a population
Discrete variables
numerical variables whose values come from counting (ex. number of family members or pets)
Continuous variables
numerical variables whose values come from measuring and can fall anywhere within an interval (ex. height or weight)