Stats Notes 1_3.pdf
Three Things to Understand Data
Variables have distinct types.
One variable is often used to explain another.
Data helps answer specific questions.
Cases and Variables
Definition of Cases
Cases are individual subjects or entities in a dataset (e.g., participants in a marathon).
Definition of Variables
Variables are characteristics measured for each case, usually represented as columns in a data table.
Types of Variables
Explanatory Variables
Variables used to explain or predict variation in another variable (the response variable).
Response Variables
Variables that are explained or predicted by explanatory variables.
Collecting Data Example
Involvement in Truckee Marathon
Information to be collected:
Finish Time
Event (Marathon/Half-Marathon)
Hometown
Age
Gender
Participant's Name
Organizing Data
Data Organization Techniques
Data can be organized on index cards or structured in a spreadsheet.
CSV File Example
Analyzing TruckeeMarathons2017.csv:
Rows represent cases (participants).
Columns represent variables.
Explore potential questions from the dataset.
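A minimal sketch of how rows map to cases and columns to variables. The column names and the two rows below are invented for illustration; the real TruckeeMarathons2017.csv may be laid out differently.

```python
import csv
import io

# Hypothetical stand-in for TruckeeMarathons2017.csv; names and values are invented.
SAMPLE_CSV = """Name,Event,Hometown,Age,Gender,FinishTime
Runner A,Marathon,Truckee,34,F,4:02:15
Runner B,Half-Marathon,Reno,41,M,1:58:40
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
variables = reader.fieldnames   # one column per variable
cases = list(reader)            # one row (a dict) per case

print("Variables:", variables)
print("Number of cases:", len(cases))
for case in cases:
    print(case["Name"], "ran the", case["Event"])
```

To try this on a real file, replace io.StringIO(SAMPLE_CSV) with an opened file object.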
Types of Variables in Detail
Categorical Variables
Divide cases into distinct groups; each case belongs to exactly one category (e.g., gender recorded as M/F).
Quantitative Variables
Measure a numerical quantity for each case; arithmetic operations such as adding and averaging are meaningful.
Ordinal Variables
Categorical variables whose categories have a natural order, but where the distances between categories are not necessarily equal (e.g., star ratings).
Practical Considerations with Variables
Examples of Variables
Categorical but not ordinal: Gender
Ordinal: Rating scales (1-5 stars)
Problems to Consider
Averaging ordinal data, like Amazon ratings, may not be appropriate because the distances between categories are not well defined (the gap from 1 to 2 stars need not equal the gap from 4 to 5).
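A small sketch of why the mean can mislead for ordinal data. The ratings are invented for illustration; the point is only that the mean lands near a category almost nobody chose, while the median, mode, and counts use just the ordering and frequencies.

```python
from statistics import mean, median, mode

# Hypothetical 1-5 star ratings for one product (invented for illustration).
ratings = [5, 5, 5, 1, 1, 5, 4, 1, 5]

print("Mean:", round(mean(ratings), 2))  # treats the gap 1->2 as equal to 4->5
print("Median:", median(ratings))        # uses only the ordering
print("Mode:", mode(ratings))            # most common category

# Counts per category often summarize ordinal data more honestly.
counts = {stars: ratings.count(stars) for stars in range(1, 6)}
print("Counts:", counts)
```

Here the mean is about 3.6 even though no one gave 3 stars; the counts show the ratings are split between very low and very high.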
Case Study Example: Doctor's Office
Variables Collected
Insurance Company: Categorical
Weight: Quantitative
Height: Quantitative
Temperature: Quantitative
Pain Level: Ordinal
Age: Quantitative
Examining Correlations
Sleep and Grades Example
Observation: More sleep often correlates with better grades.
Definition of Explanatory and Response Variables
Explanatory Variable: Sleep (independent variable)
Response Variable: Grades (dependent variable)
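A rough sketch of measuring the sleep and grades association with the Pearson correlation coefficient. The six data points are invented for illustration, and the helper pearson_r is written out here rather than taken from any library.

```python
from math import sqrt

# Hypothetical data for six students (invented for illustration).
sleep_hours = [5.0, 6.0, 6.5, 7.0, 8.0, 8.5]  # explanatory variable
gpa = [2.4, 2.9, 3.0, 3.2, 3.6, 3.5]          # response variable

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(sleep_hours, gpa)
print(f"r = {r:.2f}")  # a positive r: more sleep goes with higher GPA in this toy data
```

A positive r describes an association only; on its own it does not show that sleep causes better grades.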
Building Towards Analysis
Identifying Variables in Questions
Predict party affiliation from religious affiliation.
Compare depreciation rates between domestic and foreign cars.
Assess the effect of Tylenol on fever.
Determine how heart rate affects systolic blood pressure.
Introduction to Sampling
Importance of Samples
Samples provide insights into populations.
Bias in Sampling
Bias can occur during sample collection, distorting results.
Random Sampling
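Random sampling guards against selection bias by giving every unit an equal chance of selection; it does not by itself remove other sources of bias, such as non-response or poorly worded questions.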
Population Definitions
Population: A Complete Set
A population includes all individuals or objects relevant to a study.
Sampling: Choosing a Subset
A sample is a subset selected to represent the entire population.
Statistical Inference Understanding
Definition of Statistical Inference
It involves using sample data to draw conclusions about a population.
Assessing Pros and Cons of Sampling Strategies
| Type | Pros | Cons |
|---|---|---|
| Census | Complete information | Difficult to collect |
| Sample | Easier to collect | Validity of population inference may vary |
Identifying Bias
Definition of Sampling Bias
Occurs when selection methods distort population representation.
Selection Bias
Introduced when randomization fails, leading to non-representative samples.
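A simulation sketch of how a non-random selection method distorts a sample. The population of finish times is generated, not real data, and "the first 50 finishers" is an assumed convenience sample chosen for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: finish times (minutes) for 1,000 runners, fastest first.
population = sorted(rng.gauss(270, 40) for _ in range(1000))

biased_sample = population[:50]             # convenience sample: first 50 finishers
random_sample = rng.sample(population, 50)  # every runner equally likely to be chosen

pop_mean = mean(population)
biased_mean = mean(biased_sample)
random_mean = mean(random_sample)

print(f"Population mean:    {pop_mean:.1f}")
print(f"Biased sample mean: {biased_mean:.1f}")
print(f"Random sample mean: {random_mean:.1f}")
```

The biased sample badly underestimates the typical finish time, while the random sample lands close to the population mean.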
Preservation Bias Example
Understanding Survivorship Bias
Focusing only on the subjects that remain visible overlooks those that did not survive a selection process.
Examples
Dinosaur size estimates.
Cavemen lifestyle.
Armor placement on planes.
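A toy simulation of the plane example. It assumes each plane takes one hit in a random section and that engine hits are far more likely to bring the plane down; the survival chances are invented purely for the demonstration.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable
SECTIONS = ["engine", "fuselage", "wings", "tail"]
# Assumed survival chances after a hit; these numbers are invented for the demo.
SURVIVAL_CHANCE = {"engine": 0.4, "fuselage": 0.9, "wings": 0.9, "tail": 0.9}

all_hits = [rng.choice(SECTIONS) for _ in range(10_000)]  # one hit per plane
returned_hits = [s for s in all_hits if rng.random() < SURVIVAL_CHANCE[s]]

def engine_share(hits):
    """Fraction of the recorded hits that landed on the engine."""
    return hits.count("engine") / len(hits)

share_all = engine_share(all_hits)
share_returned = engine_share(returned_hits)
print(f"Engine hits among all planes:       {share_all:.2f}")
print(f"Engine hits among returning planes: {share_returned:.2f}")
```

Looking only at returning planes makes engine hits seem rare, which is exactly the wrong reason to leave the engine unarmored.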
Participation and Non-Response Biases
Bias occurs when the people who choose to participate or respond differ systematically from those who do not, in ways that affect the outcome.
Examples: Test preparation effects, mail surveys.
Designing Samples and Surveys Effectively
Biased Survey Questions
Questions framed to skew responses.
Example Questions
Views on school board funding.
Pet vaccination stance.
Random Sampling Techniques
Definition of Simple Random Sample
Every group of size n has the same chance of being chosen.
Random Selection Problem
Strategies for random selection (e.g., drawing slips from a hat).
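Drawing slips from a hat can be sketched with Python's standard random.sample, which draws without replacement so that every group of size n is equally likely. The helper name simple_random_sample and the 30 slips are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every group of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

hat = [f"slip {i}" for i in range(1, 31)]  # 30 hypothetical slips in the hat
drawn = simple_random_sample(hat, 5, seed=42)
print(drawn)
```

Passing a seed makes the draw repeatable for checking; omit it for a fresh draw each time.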
Understanding Experimental Designs
Experimental vs. Observational Studies
Controlled Experiments: Researcher actively assigns the explanatory variable (the treatment).
Observational Studies: Researcher observes without manipulation.
Association vs. Causation
Definitions
Association: Values of one variable tend to be related to values of another.
Causation: Actively changing one variable affects another.
Investigating Confounding Variables
Definition
A third variable associated with both the explanatory and response variables, which can create or distort an apparent relationship between them.
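A simulation sketch of confounding. Both variables are built from a shared confounder plus independent noise, so neither affects the other, yet they come out correlated. All values are generated, and subtracting the confounder directly works only because the simulation defines its contribution; real studies cannot adjust this simply.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
confounder = [rng.gauss(0, 1) for _ in range(n)]
# Both variables depend on the confounder, but not on each other.
explanatory = [z + rng.gauss(0, 1) for z in confounder]
response = [z + rng.gauss(0, 1) for z in confounder]

r_raw = pearson_r(explanatory, response)
# Remove the confounder's contribution and look again.
r_adjusted = pearson_r(
    [x - z for x, z in zip(explanatory, confounder)],
    [y - z for y, z in zip(response, confounder)],
)
print(f"Correlation ignoring the confounder: {r_raw:.2f}")
print(f"Correlation after removing it:       {r_adjusted:.2f}")
```

The raw correlation is clearly positive, but it vanishes once the confounder is accounted for.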
Designing Experiments Responsibly
Example Questions
Designing sleep impact experiments while considering ethics.
Placebos and Blinding in Experiments
Definitions
Placebo Effect: Perceived benefit from believing in treatment efficacy.
Blinding: Keeping participants, researchers, or both unaware of which treatment each participant receives.
Types of Randomized Experiments
Randomized Comparative Experiment
Random assignment to treatment groups for comparison.
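Random assignment can be sketched by shuffling the cases and splitting the shuffled list in half. The function name randomly_assign and the 20 participant IDs are hypothetical.

```python
import random

def randomly_assign(cases, seed=None):
    """Shuffle the cases and split them into two treatment groups."""
    rng = random.Random(seed)
    shuffled = list(cases)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 hypothetical IDs
treatment, control = randomly_assign(participants, seed=1)
print("Treatment:", sorted(treatment))
print("Control:  ", sorted(control))
```

Because chance alone decides the groups, confounding variables tend to balance out between them.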
Matched Pairs Experiment
Each case receives both treatments (or two closely matched cases each receive one), and the paired differences are analyzed.
Example Problem
Design experiments for comparing poison ivy lotions.
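One possible matched pairs design for the lotion problem, as a sketch only: each subject gets lotion A on one arm and lotion B on the other, with the arm chosen at random. The subjects and the recovery times below are invented for illustration.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the run is repeatable
subjects = ["S1", "S2", "S3", "S4", "S5"]

# Randomize which arm gets lotion A; the other arm gets lotion B.
arm_for_a = {s: rng.choice(["left", "right"]) for s in subjects}

# Hypothetical days until the rash clears under each lotion (invented numbers).
days_a = {"S1": 6, "S2": 8, "S3": 5, "S4": 7, "S5": 9}
days_b = {"S1": 7, "S2": 8, "S3": 7, "S4": 9, "S5": 10}

# Each subject serves as their own comparison: analyze the paired differences.
differences = [days_b[s] - days_a[s] for s in subjects]
mean_difference = mean(differences)
print("Arm receiving lotion A:", arm_for_a)
print("Differences (B - A):", differences)
print("Mean difference:", mean_difference)
```

Pairing removes person-to-person variation (skin sensitivity, severity of exposure) from the comparison, which a two-group randomized comparative design would leave in.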