Variables have distinct types.
One variable is often used to explain another.
Data helps answer specific questions.
Cases are individual subjects or entities in a dataset (e.g., participants in a marathon).
Variables are characteristics measured for each case, usually represented as columns in a data table.
Explanatory Variables: Variables used to explain variation in another variable (the response variable).
Response Variables: Variables that are explained or predicted by explanatory variables.
Information to be collected:
Finish Time
Event (Marathon/Half-Marathon)
Hometown
Age
Gender
Participant's Name
Data can be organized on index cards or structured in a spreadsheet.
Analyzing TruckeeMarathons2017.csv:
Rows represent cases (participants).
Columns represent variables.
Explore potential questions from the dataset.
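The rows-as-cases, columns-as-variables layout can be sketched in Python with the standard `csv` module. The column names and rows below are assumptions made up for illustration; the real TruckeeMarathons2017.csv may be laid out differently.

```python
import csv
import io

# Hypothetical excerpt: the real TruckeeMarathons2017.csv may use different
# column names and values. These rows are made up for illustration only.
SAMPLE_CSV = """Name,Gender,Age,Hometown,Event,FinishMinutes
Runner A,F,34,Reno,Marathon,245.5
Runner B,M,41,Truckee,Half-Marathon,118.0
Runner C,F,29,Sacramento,Half-Marathon,131.2
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
cases = list(reader)           # each row is one case (participant)
variables = reader.fieldnames  # each column header names one variable

print(len(cases), "cases")
print(variables)
```

To read an actual file instead of the inline text, `open("TruckeeMarathons2017.csv", newline="")` could replace the `io.StringIO` call.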
Categorical Variable: Divides cases into distinct groups; each case belongs to exactly one category (e.g., M/F).
Quantitative Variable: Measures a numerical quantity; operations like adding and averaging make sense.
Ordinal Variable: A categorical variable whose categories have a natural order, but the distances between categories are not necessarily equal (e.g., star ratings).
Categorical but not ordinal: Gender
Ordinal: Rating scales (1-5 stars)
Averaging ordinal data, like Amazon ratings, may not be appropriate due to lack of meaningful distance between categories.
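A small sketch of why the average of ordinal data can mislead, using made-up star ratings (not real Amazon data). Counts and the median only rely on the ordering of the categories, while the mean treats the gaps between stars as equal.

```python
from collections import Counter
from statistics import mean, median

# Made-up star ratings for illustration; not real Amazon data.
ratings = [1, 1, 1, 5, 5, 5, 5]

counts = Counter(ratings)       # how many cases fall in each category
print(sorted(counts.items()))   # [(1, 3), (5, 4)]
print(median(ratings))          # 5
print(round(mean(ratings), 2))  # 3.29
```

The mean of about 3.29 suggests a middling product, yet no one gave a rating near 3 stars. The counts show the split opinion directly.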
Insurance Company: Categorical
Weight: Quantitative
Height: Quantitative
Temperature: Quantitative
Pain Level: Ordinal
Age: Quantitative
Observation: More sleep often correlates with better grades.
Explanatory Variable: Sleep (independent variable)
Response Variable: Grades (dependent variable)
Predict party affiliation from religious affiliation.
Compare depreciation rates between domestic and foreign cars.
Assess the effect of Tylenol on fever.
How heart rate affects systolic blood pressure.
Samples provide insights into populations.
Bias can occur during sample collection, distorting results.
Random sampling guards against sampling bias by giving every unit an equal chance of selection; it does not remove other sources of bias, such as question wording.
A population includes all individuals or objects relevant to a study.
A sample is a subset selected to represent the entire population.
Statistical Inference: Using sample data to draw conclusions about a population.
Type | Pros | Cons |
---|---|---|
Census | Complete Information | Difficulties in collection |
Sample | Easier to collect | Validity of population inference may vary |
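The census versus sample trade-off can be sketched with a simulated population. All numbers here are invented, and the seed is fixed only so the run is repeatable.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simulated population of 10,000 finish times in minutes (made-up numbers).
population = [rng.gauss(240, 30) for _ in range(10_000)]

census_mean = mean(population)        # requires measuring every unit
sample = rng.sample(population, 100)  # only 100 units are measured
sample_mean = mean(sample)            # estimate of the population mean

print(round(census_mean, 1), round(sample_mean, 1))
```

The sample mean is cheaper to obtain but only approximates the census value, and how close it lands varies from sample to sample.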
Sampling Bias: Occurs when the method of selecting a sample causes it to misrepresent the population.
It is introduced when the sample is not chosen at random, leading to a non-representative sample.
Survivorship Bias: Focusing on visible subjects overlooks those that did not survive a selection process.
Dinosaur size estimates.
Cavemen lifestyle.
Armor placement on planes.
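A minimal simulation of the same effect, assuming a made-up rule that only units scoring above 60 survive to be observed. The scores and the cutoff are hypothetical.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Made-up scores for 1,000 units; only units scoring above 60 "survive"
# and remain visible to the researcher.
everyone = [rng.uniform(0, 100) for _ in range(1_000)]
survivors = [x for x in everyone if x > 60]

print(round(mean(everyone), 1))   # average over all units
print(round(mean(survivors), 1))  # average over visible units only
```

Because the low scorers never appear in the data, the average over survivors overstates the average for the whole group.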
Bias occurs if study participants disproportionately possess specific traits affecting outcomes.
Examples: Test preparation effects, mail surveys.
Question Wording Bias: Questions framed in a way that skews responses.
Views on school board funding.
Pet vaccination stance.
Simple Random Sample: Every group of size n has the same chance of being chosen.
Strategies for random selection (e.g., drawing slips from a hat).
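Drawing slips from a hat can be sketched with `random.sample` from the Python standard library, which picks distinct units without replacement. The roster names are placeholders.

```python
import random

# Hypothetical roster; the names are placeholders.
roster = [f"participant_{i}" for i in range(1, 21)]

rng = random.Random(42)         # seed only to make the sketch repeatable
chosen = rng.sample(roster, 5)  # draws 5 distinct units without replacement

print(chosen)
```

In practice the seed would be omitted, or recorded, so that the draw is not predictable in advance.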
Controlled Experiments: Researcher controls variables.
Observational Studies: Researcher observes without manipulation.
Association: Relationship between two variable values.
Causation: Actively changing one variable affects another.
Confounding Variable: A third variable that influences both the explanatory and response variables.
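A rough simulation of association without causation, assuming a third variable `z` that drives both `x` and `y`. All values are simulated, and `pearson` is a small helper written for this sketch rather than a library function.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(2)  # fixed seed so the sketch is repeatable
z = [rng.gauss(0, 1) for _ in range(2_000)]  # the third variable
x = [zi + rng.gauss(0, 1) for zi in z]       # depends on z, never on y
y = [zi + rng.gauss(0, 1) for zi in z]       # depends on z, never on x

print(round(pearson(x, y), 2))  # positive, although x never affects y
```

Changing `x` directly would leave `y` untouched here, so the association alone does not establish causation.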
Designing sleep impact experiments while considering ethics.
Placebo Effect: Perceived benefit from believing in treatment efficacy.
Blinding: Keeping participants, researchers, or both unaware of which treatment each participant receives.
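Randomized Comparative Experiment: Cases are randomly assigned to treatment groups, and the groups are compared.
Matched Pairs Experiment: The same case receives both treatments, and the difference between them is analyzed.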
Design experiments for comparing poison ivy lotions.
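Both designs could be set up for the lotion comparison roughly as follows. The subject IDs, the group names, and the "lotion A" label are hypothetical.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable
subjects = [f"subject_{i}" for i in range(1, 11)]  # placeholder IDs

# Random assignment to two groups: shuffle, then split down the middle.
shuffled = subjects[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
group_a, group_b = shuffled[:half], shuffled[half:]

# Same case receives both treatments: a coin flip decides which arm gets
# lotion A, and the other arm gets lotion B.
arm_for_a = {s: rng.choice(["left", "right"]) for s in subjects}

print(group_a)
print(group_b)
print(arm_for_a)
```

Randomizing the arm in the second design keeps any left versus right difference from lining up with one lotion.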