Stats Notes 1_3.pdf

Three Things to Understand Data

Variables have distinct types.
One variable is often used to explain another.
Data helps answer specific questions.

Cases and Variables

Definition of Cases

Cases are individual subjects or entities in a dataset (e.g., participants in a marathon).

Definition of Variables

Variables are characteristics measured for each case, usually represented as columns in a data table.

Types of Variables

Explanatory Variables

Variables used to explain or predict variation in another variable, called the response variable.

Response Variables

Variables that are explained or predicted by explanatory variables.

Collecting Data Example

Involvement in Truckee Marathon

Information to be collected:
- Finish Time
- Event (Marathon/Half-Marathon)
- Hometown
- Age
- Gender
- Participant's Name

Organizing Data

Data Organization Techniques

Data can be organized on index cards or structured in a spreadsheet.

CSV File Example

Analyzing TruckeeMarathons2017.csv:
- Rows represent cases (participants).
- Columns represent variables.
- Explore potential questions from the dataset.
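The rows-as-cases, columns-as-variables layout can be sketched with Python's standard `csv` module. This is only an illustration: the column names and the three rows below are invented, and the actual contents of TruckeeMarathons2017.csv may differ.

```python
import csv
import io

# Hypothetical stand-in for a few rows of a marathon results file.
# Column names and values are invented for illustration only.
SAMPLE_CSV = """Name,Event,Hometown,Age,Gender,FinishMinutes
Runner A,Marathon,Truckee,34,F,245.5
Runner B,Half-Marathon,Reno,41,M,118.0
Runner C,Marathon,Sacramento,29,M,221.3
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
cases = list(reader)            # each row (a dict) is one case
variables = reader.fieldnames   # each column name is one variable

print(len(cases), "cases;", len(variables), "variables")
for case in cases:
    print(case["Name"], case["Event"], case["FinishMinutes"])
```

To read a real file instead, the `io.StringIO(...)` object would be replaced by an open file handle, assuming the file has a header row.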

Types of Variables in Detail

Categorical Variables

A categorical variable divides cases into distinct groups; each case belongs to exactly one category (e.g., M or F for Gender).

Quantitative Variables

A quantitative variable measures a numerical quantity for each case, so arithmetic such as adding or averaging is meaningful.

Ordinal Variables

An ordinal variable is a categorical variable whose categories have a natural order, but the distances between categories are not consistent (e.g., star ratings).

Practical Considerations with Variables

Examples of Variables

Categorical but not ordinal: Gender
Ordinal: Rating scales (1-5 stars)

Problems to Consider

Averaging ordinal data, like Amazon ratings, may not be appropriate because the distance between categories is not meaningful: the gap between 1 and 2 stars need not equal the gap between 4 and 5 stars.
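As a rough illustration of the alternatives to averaging, the sketch below summarizes star ratings with counts and a median, which rely only on the ordering of the categories. The ratings are invented for illustration and do not come from any real product.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical 1-5 star ratings for one product (invented data).
ratings = [5, 5, 5, 5, 1, 1, 1, 4, 5]

counts = Counter(ratings)    # number of reviews at each star level
middle = median(ratings)     # uses only the ordering of the ratings
average = mean(ratings)      # assumes equal spacing between star levels

print("counts by star level:", sorted(counts.items()))
print("median rating:", middle)
print("mean rating:", round(average, 2))
```

Here the mean lands between 3 and 4 stars even though no reviewer gave 3 stars and most gave 5, which is one reason the counts and the median can be more informative for ordinal data.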

Case Study Example: Doctor's Office

Variables Collected

Insurance Company: Categorical
Weight: Quantitative
Height: Quantitative
Temperature: Quantitative
Pain Level: Ordinal
Age: Quantitative

Examining Correlations

Sleep and Grades Example

Observation: More sleep often correlates with better grades.

Definition of Explanatory and Response Variables

Explanatory Variable: Sleep (independent variable)
Response Variable: Grades (dependent variable)

Building Towards Analysis

Identifying Variables in Questions

Predict party affiliation from religious affiliation.
Compare depreciation rates between domestic and foreign cars.
Assess the effect of Tylenol on fever.
Determine how heart rate affects systolic blood pressure.

Introduction to Sampling

Importance of Samples

Samples provide insights into populations.

Bias in Sampling

Bias can occur during sample collection, distorting results.

Random Sampling

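Random sampling gives every unit in the population an equal chance of selection, which guards against sampling bias. It does not remove other sources of bias, such as non-response or poorly worded questions.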

Population Definitions

Population: A Complete Set

A population includes all individuals or objects relevant to a study.

Sampling: Choosing a Subset

A sample is a subset selected to represent the entire population.

Statistical Inference Understanding

Definition of Statistical Inference

Statistical inference uses sample data to draw conclusions about a population.

Assessing Pros and Cons of Sampling Strategies

Type	Pros	Cons
Census	Complete Information	Difficulties in collection
Sample	Easier to collect	Validity of population inference may vary

Identifying Bias

Definition of Sampling Bias

Sampling bias occurs when the method of selecting a sample causes the sample to differ systematically from the population.

Selection Bias

Selection bias is introduced when the sample is not chosen at random (or randomization breaks down), leading to a non-representative sample.
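The difference between a representative and a non-representative sample can be sketched with a small simulation. The population below is invented (200 evenly spaced finish times), and the "first 20 finishers" rule is just one hypothetical example of a non-random selection method.

```python
import random
from statistics import mean

# Hypothetical population: finish times in minutes for 200 runners,
# listed in finishing order (invented, evenly spaced values).
population = list(range(180, 380))
population_mean = mean(population)      # 279.5

# Biased selection: only the first 20 finishers are sampled.
biased_sample = population[:20]

# Random selection: 20 runners drawn with a seeded generator.
rng = random.Random(1)                  # fixed seed so the draw is repeatable
random_sample = rng.sample(population, 20)

print("population mean:   ", population_mean)
print("biased sample mean:", mean(biased_sample))
print("random sample mean:", mean(random_sample))
```

The biased sample contains only the fastest runners, so its mean is far below the population mean, while the random sample's mean will typically fall much closer to it.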

Preservation Bias Example

Understanding Survivorship Bias

Focusing only on the subjects that remain visible overlooks those that did not survive a selection process.

Examples

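Dinosaur size estimates (large bones are more likely to be preserved and found than small ones).
Cavemen lifestyle (artifacts survive better in caves than in the open, so cave dwelling may be overrepresented).
Armor placement on planes (damage seen on returning planes shows where a plane can be hit and still make it back).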

Participation and Non-Response Biases

Bias occurs when the people who choose to participate, or who respond, differ from those who do not in ways that affect the outcome.
Examples: Test preparation effects, mail surveys.

Designing Samples and Surveys Effectively

Biased Survey Questions

Biased survey questions are worded in a way that pushes respondents toward a particular answer.

Example Questions

Views on school board funding.
Pet vaccination stance.

Random Sampling Techniques

Definition of Simple Random Sample

A simple random sample of size n is chosen so that every group of n units from the population has the same chance of being the sample.

Random Selection Problem

Strategies for random selection (e.g., drawing slips from a hat).
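Drawing slips from a hat can be mimicked in software. The sketch below uses `random.sample` from Python's standard library, which draws without replacement; the population of bib numbers and the sample size are hypothetical.

```python
import random

# Hypothetical population: bib numbers 1..500 (invented for illustration).
population = list(range(1, 501))

rng = random.Random(2017)            # fixed seed so the draw is repeatable
n = 10
sample = rng.sample(population, n)   # n distinct units, drawn without replacement

print("selected bib numbers:", sorted(sample))
```

Dropping the fixed seed (using `random.Random()` with no argument) gives a different draw each run, which is what one would want for an actual selection.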

Understanding Experimental Designs

Experimental vs. Observational Studies

Controlled Experiments: The researcher actively assigns or controls the explanatory variable.
Observational Studies: The researcher observes values as they naturally occur, without manipulating them.

Association vs. Causation

Definitions

Association: The values of one variable tend to be related to the values of another.
Causation: Actively changing the value of one variable changes the value of the other.

Investigating Confounding Variables

Definition

A confounding variable is a third variable that influences both the explanatory and response variables, so an association between them cannot be read as causation.

Designing Experiments Responsibly

Example Questions

Designing sleep impact experiments while considering ethics.

Placebos and Blinding in Experiments

Definitions

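Placebo Effect: An improvement that arises because a participant believes they are receiving an effective treatment, not from the treatment itself.
Blinding: Keeping participants (single-blind), or both participants and the researchers who interact with them (double-blind), unaware of who receives which treatment.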

Types of Randomized Experiments

Randomized Comparative Experiment

Cases are randomly assigned to treatment groups, and the groups' results are then compared.
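Random assignment can be sketched by shuffling the list of cases and splitting it in half. The participant IDs and the two-group split below are assumptions made for illustration.

```python
import random

# Hypothetical participant IDs (invented for illustration).
participants = [f"P{i:02d}" for i in range(1, 21)]

rng = random.Random(42)          # fixed seed so the assignment is repeatable
shuffled = participants[:]       # copy so the original order is preserved
rng.shuffle(shuffled)

half = len(shuffled) // 2
treatment_group = shuffled[:half]
control_group = shuffled[half:]

print("treatment:", sorted(treatment_group))
print("control:  ", sorted(control_group))
```

Because chance alone decides who lands in each group, the groups should be similar on average, including on variables the researcher did not measure.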

Matched Pairs Experiment

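Each case receives both treatments in random order (or cases are paired by similarity and each member of a pair gets one treatment), and the differences within each pair are analyzed.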

Example Problem

Design experiments for comparing poison ivy lotions.
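One possible matched pairs design for the lotion problem is sketched below: each subject gets lotion A on one arm and lotion B on the other, with the arm chosen at random, and the analysis looks at the within-subject differences. The subjects, the "days until the rash clears" measurements, and the lotion labels are all invented for illustration.

```python
import random
from statistics import mean

rng = random.Random(7)                            # fixed seed for repeatability
subjects = ["S1", "S2", "S3", "S4", "S5", "S6"]   # hypothetical subjects

# Randomly decide, per subject, which arm gets lotion A; the other arm gets B.
arm_for_a = {s: rng.choice(["left", "right"]) for s in subjects}

# Invented outcomes: days until the rash clears under each lotion.
days_to_clear = {
    "S1": {"A": 6, "B": 8},
    "S2": {"A": 7, "B": 7},
    "S3": {"A": 5, "B": 8},
    "S4": {"A": 9, "B": 8},
    "S5": {"A": 6, "B": 8},
    "S6": {"A": 5, "B": 8},
}

# Matched pairs analysis: one difference (A minus B) per subject.
differences = [days_to_clear[s]["A"] - days_to_clear[s]["B"] for s in subjects]
mean_difference = mean(differences)

print("arm receiving lotion A:", arm_for_a)
print("differences (A - B):", differences)
print("mean difference:", mean_difference)
```

A negative mean difference here only says that lotion A cleared the rash faster in this made-up data; it is not a real result. Comparing each subject with themselves removes person-to-person variation that a two-group design would leave in.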