AP stats unit 4 flashcards


4.1 Samples and Surveys


Population- The population in a statistical study is the entire group of individuals about which we want information.


Sample- the part of the population from which we actually collect information. We use information from a sample to draw conclusions about the entire population.


BIAS:

The design of a statistical study shows bias if it systematically favors certain outcomes. [You must discuss direction when discussing bias.]  [Failure to use random sampling often results in bias, or systematic errors in the way the sample represents the population.]


Convenience Sample - The researcher chooses a sample that is readily available in some non-random way. [A convenience sample is biased because the method will consistently overestimate or underestimate the value you want to know.]


Voluntary Response Sample - consists of people who choose themselves by responding to a general appeal. Voluntary response samples are biased because people with strong opinions, especially negative opinions, are most likely to respond.


Undercoverage Bias - Undercoverage occurs when some groups in the population are left out of the process of choosing the sample.


Non-response bias - Nonresponse occurs when an individual chosen for the sample can’t be contacted or refuses to participate.


Response Bias - situations where people do not answer questions truthfully for some reason. (A systematic pattern of incorrect responses in a sample survey leads to response bias.)


Wording Bias - occurs when the way a question is phrased influences the responses given by participants, leading to skewed or inaccurate results


Simple random sample (SRS) - a sample of size n chosen from the population in such a way that every group of n individuals has an equal chance of being selected.


Stratified random sample - involves dividing the population into separate strata, based on shared characteristics or attributes, then choosing a separate SRS from each stratum and combining them to form the sample.


Cluster sampling - you divide a population into clusters, such as districts or schools, and then randomly select some of these clusters as your sample. Select all people in each chosen cluster, not just some of them (this is how it differs from a stratified random sample).
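The three sampling methods above can be compared side by side in a short Python sketch. The population, the "grade" strata, and the "homeroom" clusters are all made up for illustration, and the function names are hypothetical.

```python
import random

# Hypothetical population: 12 students, each with a grade (stratum)
# and a homeroom (cluster). All values are invented for illustration.
population = [
    {"id": i, "grade": 9 + i % 4, "homeroom": "ABC"[i % 3]}
    for i in range(12)
]

rng = random.Random(1)  # fixed seed so the sketch is repeatable

def simple_random_sample(units, n):
    """Every group of n units is equally likely to be chosen."""
    return rng.sample(units, n)

def stratified_sample(units, key, n_per_stratum):
    """Take an SRS of n_per_stratum units from within every stratum."""
    strata = {}
    for u in units:
        strata.setdefault(u[key], []).append(u)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, n_per_stratum))
    return chosen

def cluster_sample(units, key, n_clusters):
    """Randomly pick whole clusters, then keep everyone in them."""
    labels = sorted({u[key] for u in units})
    picked = set(rng.sample(labels, n_clusters))
    return [u for u in units if u[key] in picked]

srs = simple_random_sample(population, 4)
strat = stratified_sample(population, "grade", 1)   # one student per grade
clus = cluster_sample(population, "homeroom", 1)    # one whole homeroom
```

Notice that the stratified sample always contains some individuals from every stratum, while the cluster sample contains everyone from only the chosen clusters.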


Sampling Error - Sampling errors come from the act of choosing a sample.


Sample survey - selects a sample from the population of all individuals about which we desire information.

Systematic random sample - selected at regular intervals from a list of the population, beginning at a randomly chosen starting point, so that each member has an equal chance of being included in the sample. (ex: pick a random start among the first ten people, then choose every tenth person after that)
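A minimal Python sketch of a systematic sample, assuming a numbered roster of 100 people and an interval of 10 (both invented for illustration; `systematic_sample` is a hypothetical helper name):

```python
import random

def systematic_sample(units, k, rng):
    """Pick a random start among the first k units, then take every k-th unit."""
    start = rng.randrange(k)   # random starting position, 0 through k-1
    return units[start::k]

roster = list(range(1, 101))   # hypothetical roster of 100 people
sample = systematic_sample(roster, 10, random.Random(0))
```

With 100 people and k = 10, the sample always has 10 people spaced exactly 10 apart; only the starting point is left to chance.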

Multistage sample - selects the sample in at least 2 stages, for example by randomly choosing clusters and then taking a random sample within each chosen cluster.





4.2 Experiments

observational study- An observational study observes individuals and measures variables of interest but does not attempt to influence the responses.


Experimental study - An experiment deliberately imposes some treatment on individuals to measure their responses.


Lurking variable- a variable that is not among the explanatory or response variables in a study but that may influence the response variable.


Confounding - occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other.


When our goal is to understand cause and effect, experiments are the only source of fully convincing data.


An experiment is a statistical study in which we actually do something (a treatment) to people, animals, or objects (the experimental units) to observe the response.


Here is the basic vocabulary of experiments:


Treatment- A specific condition applied to the individuals in an experiment. If an experiment has several explanatory variables, a treatment is a combination of specific values of these variables.


Experimental units - the smallest collection of individuals to which treatments are applied. When the units are human beings, they are often called subjects.


Placebo - is a “dummy pill” or inactive treatment that is indistinguishable from the real treatment.


random assignment - means that experimental units are assigned to treatments at random, that is, using some sort of chance process.


Comparative experiment - some units receive one treatment and similar units receive another. Most well-designed experiments compare two or more treatments. (effects of two or more treatments are compared)

Completely randomized design - the treatments are assigned to all the experimental units completely by chance. (Each experimental unit (subject) is randomly assigned to one treatment.)
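One way to carry out the random assignment in a completely randomized design is to shuffle the units and deal them out to the treatments. This is only a sketch: the 20 subjects and the "drug"/"placebo" labels are hypothetical.

```python
import random

def completely_randomized(units, treatments, rng):
    """Shuffle all units, then deal them out to the treatments in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

subjects = [f"S{i}" for i in range(1, 21)]   # 20 hypothetical subjects
groups = completely_randomized(subjects, ["drug", "placebo"], random.Random(42))
```

Each subject lands in exactly one group, and chance alone decides which one.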


Control group - a group that receives an inactive treatment or an existing baseline treatment.





Three Principles of Experimental Design

Randomized comparative experiments are designed to give good evidence that differences in the treatments actually cause the differences we see in the response.


1. Control for lurking variables that might affect the response: Use a comparative design and ensure that the only systematic difference between the groups is the treatment administered.


2. Random assignment: Use impersonal chance to assign experimental units to treatments. This helps create roughly equivalent groups of experimental units by balancing the effects of lurking variables that aren’t controlled on the treatment groups.


3. Replication: Use enough experimental units in each group so that any differences in the effects of the treatments can be distinguished from chance differences between the groups.


Block - a group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments. (ONLY EXPERIMENT; NOT SAMPLE)


Randomized block design - the random assignment of experimental units to treatments is carried out separately within each block. [Form blocks based on the most important unavoidable sources of variability (lurking variables) among the experimental units. Randomization will average out the effects of the remaining lurking variables and allow an unbiased comparison of the treatments.]
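A rough Python sketch of a randomized block design, assuming subjects are blocked by a made-up age group and there are two hypothetical treatments "A" and "B". The only difference from a completely randomized design is that the shuffle happens separately inside each block.

```python
import random

def randomized_block(units, block_of, treatments, rng):
    """Randomly assign treatments separately within each block."""
    blocks = {}
    for u in units:
        blocks.setdefault(block_of(u), []).append(u)
    assignment = {}
    for members in blocks.values():
        shuffled = list(members)
        rng.shuffle(shuffled)          # chance decides the order within the block
        for i, u in enumerate(shuffled):
            assignment[u] = treatments[i % len(treatments)]
    return assignment

# 16 hypothetical subjects: 8 in the "young" block, 8 in the "old" block
units = [(f"S{i}", "young" if i <= 8 else "old") for i in range(1, 17)]
assignment = randomized_block(units, lambda u: u[1], ["A", "B"], random.Random(11))
```

Because assignment is done block by block, each treatment ends up with the same number of young and old subjects, so age cannot be mixed up with the treatment effect.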


Matched-pairs design - a common form of blocking for comparing just two treatments. Similar units are paired, and chance is used to determine which unit in each pair gets each treatment.

[Sometimes, a “pair” in a matched-pairs design consists of a single unit that receives both treatments. Since the order of the treatments can influence the response, chance is used to determine which treatment is applied first for each unit.] (IN THIS VERSION, EACH SUBJECT IS GIVEN BOTH TREATMENTS)
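A small Python sketch of the first kind of matched-pairs design, where each pair is two similar units. The pairs and the treatment labels "A" and "B" are invented for illustration.

```python
import random

def assign_matched_pairs(pairs, treatments, rng):
    """Within each pair, chance decides which unit gets which treatment."""
    assignment = {}
    for a, b in pairs:
        first, second = rng.sample(treatments, 2)   # the two treatments in random order
        assignment[a] = first
        assignment[b] = second
    return assignment

# Three hypothetical pairs of similar units
pairs = [("twin1a", "twin1b"), ("twin2a", "twin2b"), ("twin3a", "twin3b")]
result = assign_matched_pairs(pairs, ["A", "B"], random.Random(7))
```

For the single-unit version, the same random ordering could be read as "which treatment comes first" for that one subject.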


Statistically significant - an observed effect so large that it would rarely occur by chance. A statistically significant association in data from a well-designed experiment does imply causation. [When a finding is significant, it simply means you can feel confident that it’s real, not that you just got lucky (or unlucky) in choosing the sample.]
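One way to see what "would rarely occur by chance" means is to re-do the random assignment many times and count how often chance alone produces a difference as large as the one observed. The sketch below assumes made-up response values and a one-sided comparison of group means; `permutation_p_value` is a hypothetical helper, not a standard function.

```python
import random

def permutation_p_value(group_a, group_b, reps, rng):
    """Share of random re-assignments whose mean difference
    (A minus B) is at least as large as the observed one."""
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)   # pretend the treatment labels were dealt at random
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if diff >= observed:
            extreme += 1
    return extreme / reps

# Made-up response values for illustration only
treatment = [14, 15, 17, 18, 19, 20]
control = [9, 10, 11, 12, 13, 16]
p = permutation_p_value(treatment, control, 5000, random.Random(3))
```

A small value of `p` says that a difference this large rarely shows up from the random assignment alone, which is the idea behind calling the result statistically significant.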

4.3 Using Studies Wisely


Explanatory variable - the variable being manipulated (independent variable)

Response variable - the variable that measures the outcome of the study. (dependent variable)

Experimental units = subjects (when the units are human beings)

Population parameter: a numerical value that describes a characteristic of an entire population.

Population proportion: fraction of the population that has a certain characteristic.
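A short Python sketch of the difference between a population proportion and the proportion in one sample, assuming an invented population of 10,000 people in which 30% have the characteristic:

```python
import random

rng = random.Random(5)

# Hypothetical population: 1 = has the characteristic, 0 = does not
population = [1] * 3000 + [0] * 7000
p = sum(population) / len(population)   # population proportion (a parameter)

sample = rng.sample(population, 200)    # SRS of 200 people
p_hat = sum(sample) / len(sample)       # proportion in the sample
```

Here `p` is fixed at 0.3, while `p_hat` changes from sample to sample and is usually close to `p` but not exactly equal to it.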

The Challenges of Establishing Causation

When we can’t do an experiment, we can use the following criteria for establishing causation.

• The association is strong.

• The association is consistent.

• Larger values of the explanatory variable are associated with stronger responses.

• The alleged cause precedes the effect in time.

• The alleged cause is plausible.


Inference for Experiments

Tips for sound experiments: 

  1. Determine the treatments

  2. Describe how you would assign treatments to the experimental units (this includes a full description of how you would randomly assign them)

  3. Explanation of what you would measure and how you would measure it.


Inference - Well-designed experiments randomly assign individuals to treatment groups. Most experiments don’t select experimental units at random, which limits such experiments to inference about cause and effect. Observational studies don’t randomly assign individuals to groups, which rules out inference about cause and effect. Observational studies that use random sampling can make inferences about the population.

• Lack of realism in an experiment can prevent us from generalizing its results.


Basic Data Ethics

• All planned studies must be reviewed in advance by an institutional review board charged with protecting the safety and well-being of the subjects.

• All individuals who are subjects in a study must give their informed consent before data are collected.

• All individual data must be kept confidential. Only statistical summaries for groups of subjects may be made public.