4.1 and 4.2 vocabulary

4.1 - Sampling and Surveys

The population in a statistical study is the entire group of individuals about which we want information. A census collects data from every individual in the population. A sample is the part of the population from which we actually collect information.

A sample survey is a study that collects data from a sample to learn about the population from which the sample was selected. We use information from a sample to draw conclusions about the entire population.

Convenience sampling selects individuals from the population who are easy to reach.

The design of a statistical study shows bias if it is very likely to underestimate or very likely to overestimate the value you want to know.

Voluntary response sampling allows people to choose to be in the sample by responding to a general invitation.

Random sampling involves using a chance process to determine which members of a population are included in the sample.

A simple random sample (SRS) of size n is chosen in such a way that every group of n individuals in the population has an equal chance to be selected as the sample.

How to Choose an SRS with Technology

Label. Give each individual in the population a distinct numerical label from 1 to N, where N is the number of individuals in the population.
Randomize. Use a random number generator to obtain n different integers from 1 to N, where n is the sample size.
Select. Choose the individuals that correspond to the randomly selected integers.
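
The three steps above can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name choose_srs, the seed parameter, and the list of 30 students are all made up for the example.

```python
import random

def choose_srs(population, n, seed=None):
    """Return a simple random sample of n individuals from a list."""
    rng = random.Random(seed)  # seed is optional; it only makes the run repeatable
    N = len(population)
    # Label: the individual at list position i gets the label i + 1 (labels 1 to N).
    # Randomize: draw n different integers from 1 to N.
    labels = rng.sample(range(1, N + 1), n)
    # Select: pick the individuals that carry the chosen labels.
    return [population[label - 1] for label in labels]

# Hypothetical population of 30 students
students = [f"student_{i}" for i in range(1, 31)]
sample = choose_srs(students, 5, seed=1)
print(sample)
```

Because random.sample draws without repeats, no student can appear in the sample twice.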

When sampling without replacement, an individual from a population can be selected only once. When sampling with replacement, an individual from a population can be selected more than once.
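
The difference can be seen with Python's standard random module. This is only a sketch; the population of the integers 1 to 10 is made up for illustration.

```python
import random

rng = random.Random(0)            # fixed seed so the run is repeatable
population = list(range(1, 11))   # hypothetical population labeled 1 to 10

# Without replacement: each individual can be selected at most once.
without_replacement = rng.sample(population, 5)

# With replacement: the same individual may be selected more than once.
with_replacement = rng.choices(population, k=5)

print(without_replacement)
print(with_replacement)
```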

A table of random digits is a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these properties:

Each entry in the table is equally likely to be any of the 10 digits 0 through 9.
The entries are independent of each other. That is, knowledge of one part of the table gives no information about any other part.
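
One common way to use such a table to choose an SRS is to read the digits in groups as wide as the largest label, skipping groups that are out of range or already chosen. The sketch below assumes that procedure; the function names and the digit string are illustrative and are not taken from these notes.

```python
import random

def make_digit_table(length, seed=None):
    """Build a string of digits, each equally likely and independent of the rest."""
    rng = random.Random(seed)
    return "".join(str(rng.randrange(10)) for _ in range(length))

def srs_from_digits(digits, N, n):
    """Read fixed-width groups of digits and keep the first n distinct labels in 1..N."""
    width = len(str(N))  # for N = 30, read two digits at a time (01 to 30)
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        # Skip labels outside 1..N and labels already chosen.
        if 1 <= label <= N and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                return chosen
    raise ValueError("ran out of digits before choosing n labels")

# Illustrative digits read as 19 22 39 50 34 05 75 62 87 13
print(srs_from_digits("19223950340575628713", 30, 4))  # prints [19, 22, 5, 13]
```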

Strata are groups of individuals in a population who share characteristics thought to be associated with the variables being measured in a study. Stratified random sampling selects a sample by choosing an SRS from each stratum and combining the SRSs into one overall sample.
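
A minimal sketch of stratified random sampling follows. It assumes the same number of individuals is taken from every stratum, which the definition does not require; the function name, the grade-level strata, and the student labels are hypothetical.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Choose an SRS of n_per_stratum from each stratum and combine the SRSs."""
    rng = random.Random(seed)
    # Sort the individuals into strata.
    strata = {}
    for individual in population:
        strata.setdefault(stratum_of(individual), []).append(individual)
    # Take a separate SRS inside each stratum, then pool them.
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], n_per_stratum))
    return sample

# Hypothetical school: 25 students in each of grades 9 to 12
students = [(f"student_{grade}_{i}", grade)
            for grade in (9, 10, 11, 12) for i in range(25)]
sample = stratified_sample(students, lambda s: s[1], 5, seed=2)
print(sample)
```

Every grade is guaranteed to contribute exactly 5 students, which an SRS of 20 from the whole school would not guarantee.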

A cluster is a group of individuals in the population that are located near each other. Cluster sampling selects a sample by randomly choosing clusters and including each member of the selected clusters in the sample.
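
Cluster sampling can be sketched as follows. The homerooms used as clusters and the function name are assumptions made for the example.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and include every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # an SRS of cluster names
    return [member for name in chosen for member in clusters[name]]

# Hypothetical school: 10 homerooms of 20 students each
homerooms = {
    f"room_{r}": [f"room_{r}_student_{i}" for i in range(1, 21)]
    for r in range(1, 11)
}
sample = cluster_sample(homerooms, 3, seed=3)
print(len(sample))  # 3 homerooms of 20 students each
```

The chance process acts on the clusters, not on the individuals: a student is in the sample only if the whole homeroom is chosen.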

Systematic random sampling selects a sample from an ordered arrangement of the population by randomly selecting one of the first k individuals and choosing every kth individual thereafter.
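
A short sketch of systematic random sampling is given below. The ordered list of 100 labels and the choice k = 10 are made up for illustration.

```python
import random

def systematic_sample(ordered_population, k, seed=None):
    """Randomly pick one of the first k individuals, then take every kth one after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1, that is, one of the first k individuals
    return ordered_population[start::k]

# Hypothetical ordered arrangement: 100 people labeled 1 to 100 in line order
line = list(range(1, 101))
sample = systematic_sample(line, 10, seed=4)
print(sample)
```

Only the starting point is random; once it is chosen, the rest of the sample is determined.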

Most large-scale sample surveys use multistage sampling, which combines two or more sampling methods.

Sampling errors come from the act of choosing a sample. Random sampling error and undercoverage are common types of sampling error. Undercoverage occurs when some members of the population are less likely to be chosen or cannot be chosen in a sample. The list from which the sample is actually chosen is called the sampling frame.

Nonresponse occurs when an individual chosen for the sample can’t be contacted or refuses to participate.

Response bias occurs when there is a systematic pattern of inaccurate answers to a survey question.

4.2 - Experiments

An observational study observes individuals and measures variables of interest but does not attempt to influence the responses.

A response variable measures an outcome of a study. An explanatory variable may help explain or predict changes in a response variable.

Confounding occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other.

An experiment deliberately imposes treatments (conditions) on individuals to measure their responses.

A placebo is a treatment that has no active ingredient but is otherwise like other treatments.

A specific condition applied to the individuals in an experiment is called a treatment. If an experiment has several explanatory variables, a treatment is a combination of specific values of these variables. An experimental unit is the object to which a treatment is randomly assigned. When the experimental units are human beings, they often are called subjects.

In an experiment, a factor is an explanatory variable that is manipulated and may cause a change in the response variable. The different values of a factor are called levels.

In an experiment, a control group is used to provide a baseline for comparing the effects of other treatments. Depending on the purpose of the experiment, a control group may be given an inactive treatment (placebo), an active treatment, or no treatment at all.

The placebo effect describes the fact that some subjects in an experiment will respond favorably to any treatment, even an inactive treatment.

In a double-blind experiment, neither the subjects nor those who interact with them and measure the response variable know which treatment a subject is receiving. In a single-blind experiment, either the subjects or the people who interact with them and measure the response variable don’t know which treatment a subject is receiving.

In an experiment, random assignment means that experimental units are assigned to treatments using a chance process.

In an experiment, control means keeping other variables constant for all experimental units.

In an experiment, replication means giving each treatment to enough experimental units so that a difference in the effects of the treatments can be distinguished from chance variation due to the random assignment.

The Basic Principles of Experimental Design

Comparison. Use a design that compares two or more treatments.
Random assignment. Use chance to assign experimental units to treatments (or treatments to experimental units). Doing so helps create roughly equivalent groups of experimental units by balancing the effects of other variables among the treatment groups.
Control. Keep other variables the same for all groups, especially variables that are likely to affect the response variable. Control helps avoid confounding and reduces variability in the response variable.
Replication. Give each treatment to enough experimental units so that any differences in the effects of the treatments can be distinguished from chance differences between the groups.

In a completely randomized design, the experimental units are assigned to the treatments completely at random.
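
One way to carry out a completely randomized design is to shuffle the experimental units and deal them out to the treatments. The sketch below assumes groups of (nearly) equal size, which the definition does not require; the function name, subjects, and treatment names are hypothetical.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle all units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # the chance process
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical experiment: 12 subjects, two treatments
subjects = [f"subject_{i}" for i in range(1, 13)]
groups = completely_randomized(subjects, ["new drug", "placebo"], seed=5)
print(groups)
```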

A block is a group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments. In a randomized block design, the random assignment of experimental units to treatments is carried out separately within each block.
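
A randomized block design repeats the random assignment inside each block. The following is a sketch under assumed names: the age-group blocks, the volunteers, and the two treatments are all made up for the example.

```python
import random

def randomized_block(units, block_of, treatments, seed=None):
    """Randomly assign units to treatments separately within each block."""
    rng = random.Random(seed)
    # Form the blocks from a characteristic known before the experiment.
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of(unit), []).append(unit)
    # Shuffle and deal out treatments one block at a time.
    assignment = {}
    for name in sorted(blocks):
        members = list(blocks[name])
        rng.shuffle(members)
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical volunteers: 10 younger and 10 older
volunteers = [(f"volunteer_{i}", "younger" if i <= 10 else "older")
              for i in range(1, 21)]
assignment = randomized_block(volunteers, lambda v: v[1],
                              ["new drug", "placebo"], seed=6)
print(assignment)
```

Each age group ends up split evenly between the two treatments, so age cannot be lopsided across the treatment groups.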

A matched pairs design is a common experimental design for comparing two treatments that uses blocks of size 2. In some matched pairs designs, two very similar experimental units are paired and the two treatments are randomly assigned within each pair. In others, each experimental unit receives both treatments in a random order.
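
The first kind of matched pairs design, in which two similar units are paired, can be sketched as below. The pairs and the treatment labels "A" and "B" are hypothetical. For the second kind, the same shuffle would instead decide the order in which one unit receives the two treatments.

```python
import random

def matched_pairs(pairs, treatments=("A", "B"), seed=None):
    """Within each pair, randomly decide which unit gets which treatment."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)  # a separate chance process for every pair
        assignment[first], assignment[second] = order
    return assignment

# Hypothetical pairs of very similar subjects (for example, twins)
pairs = [(f"twin_{i}a", f"twin_{i}b") for i in range(1, 6)]
assignment = matched_pairs(pairs, seed=7)
print(assignment)
```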