Statistics
A body of principles for designing the process of data collection
and making inferences about the population from information in the sample
Data analysis steps
Set clearly defined goals for study
Make a plan of what data to collect & how to collect it
Collect the data
Data summary & preliminary analysis
Apply appropriate methods for formal data analysis
Interpret info & draw conclusions
Observational units (cases)
the entities on which we observe and measure characteristics; each one is a row in a dataset
Variables
the characteristics that have been measured or observed; each one is a column in a dataset
Wide format
Repeated measurements or variables as distinct columns
Long format
Repeated measurements or variables stacked as distinct rows, with one column naming the measurement and another holding its value
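A small sketch of the two layouts in base R. The data frame and its column names (id, week1, week2, week, score) are made up for illustration; they are not from the course materials.

```r
# Hypothetical data: a score measured twice (week 1 and week 2) on 3 units.
# Wide format: one row per unit, one column per repeated measurement.
wide <- data.frame(
  id    = 1:3,
  week1 = c(5, 6, 7),
  week2 = c(6, 8, 9)
)

# Long format: one row per unit-by-week measurement.
long <- data.frame(
  id    = rep(wide$id, times = 2),
  week  = rep(c("week1", "week2"), each = nrow(wide)),
  score = c(wide$week1, wide$week2)
)

dim(wide)  # 3 rows, 3 columns
dim(long)  # 6 rows, 3 columns
```

The same six measurements appear in both layouts; only the arrangement changes.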
Missing data is usually handled by...
listwise deletion: removes the entire case (row) from the dataset when it has a missing value
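A minimal sketch of listwise deletion in base R, assuming missing values are coded as NA. The data frame and its columns (height, weight) are hypothetical.

```r
# Hypothetical data with missing values (NA).
df <- data.frame(
  height = c(170, NA, 165, 180),
  weight = c(65, 70, NA, 82)
)

# Listwise deletion: keep only the rows that have no missing values.
complete <- df[complete.cases(df), ]
nrow(complete)  # 2 rows remain (the first and the last)

# na.omit(df) keeps the same rows.
```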
Continuous variable
Quantitative, NON-COUNTABLE variable that can take any value in an interval, so there are infinitely many possible values within a given range. Ex. temperature of patients w/ flu, length, mass, price, time
Discrete variable
Quantitative, COUNTABLE variable that takes values from a set of separate values (such as whole numbers). Ex. counts, age (if you take it in years)
Ordinal variable
Qualitative variable whose categories have a natural order. Ex. a "do you agree" scale from 1 to 5
Nominal variable
Qualitative variable whose categories have no natural order. Ex. blood type. Movie ratings (e.g. 1 to 5 stars) have an order, so they fit better under ordinal.
Population of units vs population vs sample
Sample: subset of measurements from population that is actually collected
Population: the set of all measurements (or records of some qualitative trait), one for each unit in the population of units
Census: look at every unit in the population
Population of units: the collection of units in which we have scientific interest; this is what we normally mean by "population"
Ex. Population of units: all the undergrad students at Cornell; population: heights of all undergrad students at Cornell; sample: heights of undergrad students taking ILRST/STSCI2100 (continuous variable)
Ex. Population of units: all the trees in Sapsucker Woods; population: species of all trees in Sapsucker Woods; sample: species of all trees within 100 feet of Sapsucker Woods pond (nominal variable)
Sample size (n)
the number of observations in the sample
Types of bias (selection, measurement/response, nonresponse)
Selection bias: bias incurred when a sample systematically excludes some part of the population (ex. only picking the first few rows)
Measurement or response bias: bias incurred when a method of observation produces values different from the true value for the observational unit (ex. a thermometer giving wonky values)
Nonresponse bias: bias incurred when data is not collected from all observational units selected for the study
Observational study
A study that observes characteristics of an existing population.
Simple random sample
A sample selected in a way that gives every different sample of size n an equal chance of being selected.
Stratified sampling
Dividing a population into subgroups (strata) and then taking a separate random sample from each stratum.
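A rough sketch of stratified sampling in base R. The population (60 students in three class years) and the choice of 5 units per stratum are assumptions made for illustration.

```r
set.seed(1)  # only so the illustration is reproducible

# Hypothetical population: 60 students in three class years (the strata).
pop <- data.frame(
  id   = 1:60,
  year = rep(c("first", "second", "third"), each = 20)
)

# Take a separate simple random sample of 5 ids within each stratum.
by_stratum <- split(pop$id, pop$year)
chosen <- unlist(lapply(by_stratum, function(ids) sample(ids, 5)))

strat_sample <- pop[pop$id %in% chosen, ]
table(strat_sample$year)  # 5 units from each stratum
```

Every stratum is guaranteed to appear in the sample, which a simple random sample of 15 would not guarantee.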
Cluster sampling
Dividing a population into subgroups (clusters) and forming a sample by randomly selecting clusters and including all individuals or objects in the selected clusters in the sample.
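A rough sketch of cluster sampling in base R. The population (40 students in 8 dorms of 5) and the choice of 2 clusters are hypothetical.

```r
set.seed(2)  # only so the illustration is reproducible

# Hypothetical population: 40 students living in 8 dorms (the clusters).
pop <- data.frame(
  id   = 1:40,
  dorm = rep(paste0("dorm", 1:8), each = 5)
)

# Randomly select 2 whole clusters, then keep everyone in them.
picked_dorms <- sample(unique(pop$dorm), 2)
cluster_sample <- pop[pop$dorm %in% picked_dorms, ]

nrow(cluster_sample)  # 10: all 5 students from each of the 2 dorms
```

The randomness here is in which clusters are picked, not in which individuals are picked within a cluster.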
1 in k systematic sampling
A sample selected from an ordered arrangement of a population by choosing a starting point at random from the first k individuals on the list and then selecting every kth individual thereafter.
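A minimal sketch of 1 in k systematic sampling in base R, assuming the population is an ordered list numbered 1 to N. The values N = 100 and k = 10 are made up.

```r
set.seed(3)  # only so the illustration is reproducible

N <- 100  # hypothetical population size (an ordered list of 100 units)
k <- 10   # select every 10th unit

# Random starting point among the first k units, then every kth after it.
start <- sample(1:k, 1)
sys_sample <- seq(from = start, to = N, by = k)

length(sys_sample)  # 10 units, since N / k = 10
```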
Confounding variable
A variable that is related both to group membership and to the response variable.
Extraneous variable
A variable that is not an explanatory variable in the study but is thought to affect the response variable.
Direct control
Holding extraneous variables constant so that their effects are not confounded with those of the experimental conditions.
Blocking
Using extraneous variables to create groups that are similar with respect to those variables and then assigning treatments at random within each block, thereby filtering out the effect of the blocking variables.
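A rough sketch of random assignment within blocks in base R. The setup (12 units, 3 blocks of 4, treatments "A" and "B") and the name exp_units are assumptions for illustration.

```r
set.seed(4)  # only so the illustration is reproducible

# Hypothetical experiment: 12 units in 3 blocks of 4, with 2 treatments.
exp_units <- data.frame(
  unit  = 1:12,
  block = rep(c("block1", "block2", "block3"), each = 4)
)

# Within each block, shuffle a balanced set of treatment labels.
exp_units$treatment <- NA_character_
for (b in unique(exp_units$block)) {
  rows <- exp_units$block == b
  exp_units$treatment[rows] <- sample(rep(c("A", "B"), each = 2))
}

table(exp_units$block, exp_units$treatment)  # 2 of each treatment per block
```

Because each block gets both treatments equally often, differences between blocks cannot be mistaken for treatment effects.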
Replication
A strategy for ensuring that there is an adequate number of observations on each experimental treatment.
Placebo treatment
A treatment that resembles the other treatments in an experiment in all apparent ways but that has no active ingredients.
Control group
A group that receives no treatment.
Single-blind experiment
An experiment in which the subjects do not know which treatment they received but the individuals measuring the response do know which treatment was received, or an experiment in which the subjects do know which treatment they received but the individuals measuring the response do not know which treatment was received.
Double-blind experiment
An experiment in which neither the subjects nor the individuals who measure the response know which treatment was received.
Simple Random Sample
Sampling method where 1) every set of n units of the population has the same chance of being sampled and 2) selections are independent
Typically generated in R with sample()
Sampling with/without replacement
Sampling without replacement: Selecting units so that once a unit is chosen it cannot be chosen again (common in SRS).
Sampling with replacement: Selecting units where chosen units are put back and can be selected again.
R example. SRS: sample(1:103, 11) picks 11 unique integers from 1 to 103 (SRS without replacement).
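To contrast the two, here is a short sketch using the same sample() call with and without the replace argument. The variable names srs and with_repl are made up.

```r
set.seed(5)  # only so the illustration is reproducible

# Without replacement (the default): all 11 values are distinct.
srs <- sample(1:103, 11)
anyDuplicated(srs)  # 0 means no repeats

# With replacement: the same unit may be drawn more than once.
with_repl <- sample(1:103, 11, replace = TRUE)
length(with_repl)  # still 11 draws, but repeats are possible
```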
Pros/cons of SRS
Pros: simple to analyze; equal chance for each unit; conceptually straightforward.
Cons: can be costly or logistically hard; may still give high variability or unbalanced samples by chance.
Observational study
A study where the researcher does not control exposures/treatments; they observe naturally occurring conditions. Harder to establish causality because of confounding variables.
Experiment
Study where researcher applies treatments under controlled conditions & randomizes assignments to evaluate causal effects
Response variable
Main outcome measured in a study; dependent variable (e.g. amount of bacteria)
Treatment (factor)
A condition applied in an experiment (e.g. packaging conditions: a, b, c, d)
Experimental unit
The physical entity to which a treatment is assigned (the unit of randomization).
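A rough sketch of randomly assigning treatments to experimental units in base R, using the four packaging conditions a, b, c, d. The number of units (12) and the balanced design with 3 units per treatment are assumptions.

```r
set.seed(6)  # only so the illustration is reproducible

# Hypothetical setup: 12 experimental units, 4 packaging conditions.
n_units <- 12
treatments <- c("a", "b", "c", "d")

# Shuffle a balanced list of labels so each treatment gets 3 units.
assignment <- data.frame(
  unit      = 1:n_units,
  treatment = sample(rep(treatments, each = n_units / length(treatments)))
)

table(assignment$treatment)  # 3 units per treatment
```

Each row is one experimental unit, and the response variable would be measured on each unit after its treatment is applied.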