Flashcards covering key vocabulary from Chapter 8 (Sample Surveys & Experiments), Chapter 2 (Displaying Categorical Data), Chapter 3 (Displaying Quantitative Data & Describing Distributions Numerically), and Chapter 4 (Regression Scatterplots).
Population
The entire group of individuals that we want information about.
Sample
A subset of the population that we actually examine to gather information about the population.
Voluntary Response Sampling
A sampling method where people choose themselves to be included (e.g., web polls, call-in polls).
Convenience Sampling
A sampling method where individuals are chosen because they are the easiest to reach.
Simple Random Sampling (SRS)
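A sampling method where every possible group of n individuals from the population has an equal chance of being the sample selected (so, in particular, each member has an equal chance of being included).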
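A minimal Python sketch of drawing an SRS, assuming the population fits in a list. The names `simple_random_sample` and `students` are hypothetical. Python's `random.Random.sample` draws without replacement, so every group of n individuals is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n: every subset of size n is equally likely."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(population, n)  # sampling without replacement

# Hypothetical population of 30 students.
students = [f"student_{i}" for i in range(1, 31)]
sample = simple_random_sample(students, 5, seed=1)
print(sample)
```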
Systematic Sampling
A sampling method where, after a randomly chosen starting point among the first n items, every nth item from an ordered list of the population is chosen.
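A short Python sketch of systematic sampling, assuming the population is an ordered list and writing the interval as `k` (the "n" of the definition). The random start is the usual convention; the function and variable names are hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Choose a random start among the first k items, then take every kth item."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random starting position 0 .. k-1
    return population[start::k]  # slice with step k

# Hypothetical population: items numbered 1-100, sampled with interval k = 10.
items = list(range(1, 101))
sample = systematic_sample(items, 10, seed=42)
print(sample)
```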
Cluster Sampling
A sampling method where the population is divided into groups (clusters), a random sample of clusters is selected, and every individual within the selected clusters is measured.
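A Python sketch of cluster sampling, assuming the clusters are already known and stored as a dictionary from cluster label to members. The homeroom data and function name are hypothetical. Whole clusters are chosen at random, and everyone in a chosen cluster is kept.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters, then include every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # SRS of cluster labels
    return {label: clusters[label] for label in chosen}

# Hypothetical population: students grouped by homeroom.
homerooms = {
    "A": ["Ana", "Ben", "Cal"],
    "B": ["Dee", "Eli"],
    "C": ["Fay", "Gus", "Hal", "Ivy"],
    "D": ["Jo", "Kim"],
}
selected = cluster_sample(homerooms, 2, seed=3)
print(selected)
```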
Stratified Sampling
A sampling method where the population is first divided into groups of similar individuals (strata), and then a Simple Random Sample (SRS) is taken from each group.
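A Python sketch of stratified sampling, assuming the strata are given as a dictionary and the same number is drawn from each one (equal allocation is an assumption; proportional allocation is also common). The school data and names are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate SRS of size n_per_stratum from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical school population, divided into strata by grade level.
school = {
    "grade 9": [f"G9-{i}" for i in range(1, 41)],
    "grade 10": [f"G10-{i}" for i in range(1, 36)],
    "grade 11": [f"G11-{i}" for i in range(1, 31)],
}
sample = stratified_sample(school, 3, seed=5)
print(sample)
```

Unlike cluster sampling, every stratum is represented in the final sample.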
Multistage Random Sampling
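A sampling method that selects the sample in stages, combining several of the other sampling methods (for example, choosing clusters at random and then taking an SRS within each chosen cluster).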
Biased Sample
A sample produced by a method that systematically favors some members of the population over others, so that not every member has an equal chance of being selected.
Undercoverage
A problem in sampling where some groups in the target population are left out of the process of choosing the sample.
Non-response
A problem in sampling where an individual selected cannot be contacted or refuses to cooperate.
Response Bias
A problem in sampling where responses are systematically distorted by something in the survey process, such as the behavior of the interviewer.
Retrospective Study (Observational)
An observational study that looks backward in time.
Prospective Study (Observational)
An observational study that looks forward in time.
Control (Experimental Design)
A principle of experimental design involving managing experimental conditions for all treatment groups to prevent lurking variables from biasing results.
Random Assignment (Experimental Design)
A principle of experimental design stating that experimental units must be randomly assigned to treatments.
Replication (Experimental Design)
A principle of experimental design involving applying each treatment to enough experimental units (and repeating the study) so that chance variation in the results is reduced.
Placebo or Control (Experimental Design)
A principle of experimental design requiring the use of a dummy treatment or a standard comparison group as one of the treatments.
Double-blind (Medical Experiments)
A principle of experimental design for medical experiments where neither the participant nor the researcher taking measurements knows who received which treatment.
Experimental Units/Subjects
The individuals being studied in an experiment.
Treatment (Experiment)
A specific condition applied to the subjects in an experiment.
Factors (Experiment)
The explanatory (independent) variables that are thought to influence the response (outcome/dependent) variable studied, often combined at specific values (levels) to form a treatment.
Lurking Variable
A variable not among the explanatory or response variables, but which influences the interpretation of their relationship.
Confounding Variable
A variable, not considered when exploring the explanatory/response relationship, whose effect on the response cannot be separated from the effect of the explanatory variable.
Placebo
A dummy treatment used in experiments.
Double-blind Experiment
An experiment where neither the participant nor the researcher taking measurements knows who had which treatment.
Single-blind Experiment
An experiment where the participants do not know which treatment they have been assigned.
Statistically Significant
An observed effect so large that it would rarely occur by chance alone.
Completely Randomized Design
An experimental design where subjects are randomly assigned to different treatment groups.
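A Python sketch of random assignment in a completely randomized design, assuming equal-sized groups are wanted. All subjects are shuffled together and then dealt out to the treatments in turn; the subject labels and treatment names are hypothetical.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Shuffle all subjects, then deal them into treatment groups in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # one randomization over all subjects
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

subjects = [f"S{i}" for i in range(1, 13)]  # 12 hypothetical subjects
groups = completely_randomized(subjects, ["new drug", "placebo"], seed=7)
print(groups)
```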
Matched Pairs Design
An experimental design where subjects are paired according to variables that affect the response and then randomly assigned to treatments within pairs.
Block Design
An experimental design where subjects are first grouped into blocks of similar individuals, and then the subjects within each block are randomly assigned to treatments.
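A Python sketch of randomizing within blocks, assuming the blocks are already formed and each block can be split evenly between the treatments. The block labels, subjects, and treatments "A" and "B" are hypothetical. The key difference from a completely randomized design is that the shuffle happens separately inside each block.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Within each block separately, randomly assign subjects to treatments."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)  # randomization stays inside this block
        for i, subject in enumerate(shuffled):
            assignment[subject] = (block_name, treatments[i % len(treatments)])
    return assignment

blocks = {"men": ["M1", "M2", "M3", "M4"], "women": ["W1", "W2", "W3", "W4"]}
assignment = randomized_block(blocks, ["A", "B"], seed=11)
print(assignment)
```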
Bar Graphs
Graphs used to display one categorical variable.
Pie Charts
Graphs used to display one categorical variable, showing proportions of a whole.
Contingency Table
A table used to display the relationship between two categorical variables.
Joint Proportions
Values found by dividing each cell frequency by the overall total in a contingency table.
Conditional Proportion
Values found by first conditioning on one category (one row or column of the contingency table), whose total becomes the denominator, and then dividing each cell frequency in that row or column by this total.
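A Python sketch contrasting joint and conditional proportions on a hypothetical 2x2 contingency table (class year by whether the student plays a sport). The counts and function names are made up for illustration. Joint proportions divide by the grand total; conditional proportions divide by the total of the row being conditioned on.

```python
# Hypothetical contingency table of counts: rows = class, columns = plays a sport.
table = {
    "juniors": {"yes": 30, "no": 20},
    "seniors": {"yes": 40, "no": 10},
}

def joint_proportions(table):
    """Divide every cell count by the grand total of the table."""
    total = sum(sum(row.values()) for row in table.values())
    return {r: {c: n / total for c, n in row.items()} for r, row in table.items()}

def conditional_proportions(table, row):
    """Condition on one row: divide each cell in that row by the row total."""
    row_total = sum(table[row].values())
    return {c: n / row_total for c, n in table[row].items()}

joint = joint_proportions(table)
cond_seniors = conditional_proportions(table, "seniors")
print(joint["juniors"]["yes"])  # 30 / 100 -> 0.3
print(cond_seniors["yes"])      # 40 / 50 -> 0.8
```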
Histograms
Graphs used to display the distribution of quantitative variables.
Stem-plots (and split-stem plots)
Graphs used to display quantitative variables, showing the shape and individual data points.
Time Plots
Graphs used to display quantitative variables over time, showing trends.
Interpreting Quantitative Graphs
Analyzing graphs by evaluating their Shape, Center, Spread, and Outliers.
Graph Shapes (Quantitative)
Descriptions of the shape of a distribution, such as symmetric, skewed (left or right), unimodal, bimodal, or bell-shaped.
Mean
The arithmetic average of a dataset: the sum of the observations divided by the number of observations.
Median
The middle value of a dataset when observations are ordered from smallest to largest; with an even number of observations, it is the average of the two middle values.
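A small Python example computing the mean and median of a hypothetical dataset with one large value. It uses the standard-library `statistics.median`, which averages the two middle values when the count is even. The gap between the two results previews why the median is called resistant.

```python
import statistics

data = [2, 3, 5, 7, 8, 40]  # hypothetical data with one unusually large value

mean = sum(data) / len(data)      # 65 / 6
median = statistics.median(data)  # even count: average of 5 and 7

print(round(mean, 2), median)  # 10.83 6.0
```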
Variance
A measure of the spread or variability of the data: the average of the squared deviations from the mean. For sample data, the sum of squared deviations is divided by n − 1 rather than n.
Standard Deviation
A measure of the spread or variability of the data, calculated as the square root of the variance.
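A Python sketch computing the sample variance and standard deviation step by step for a hypothetical sample, then checking against the standard-library `statistics.variance` and `statistics.stdev`, which both use the n − 1 divisor.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7]  # hypothetical sample

n = len(data)
mean = sum(data) / n
squared_deviations = [(x - mean) ** 2 for x in data]
variance = sum(squared_deviations) / (n - 1)  # sample variance: divide by n - 1
sd = math.sqrt(variance)                      # standard deviation

assert math.isclose(variance, statistics.variance(data))
assert math.isclose(sd, statistics.stdev(data))
print(variance, round(sd, 3))  # 3.5 1.871
```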
First Quartile (Q1)
The median of the lower half of the ordered data.
Third Quartile (Q3)
The median of the upper half of the ordered data.
Five Number Summary
A set of five values that describe the distribution of data: Minimum, Q1, Median, Q3, and Maximum.
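A Python sketch of the five-number summary, assuming the common textbook rule that Q1 and Q3 are the medians of the lower and upper halves, with the overall median left out of both halves when n is odd. Software (for example `statistics.quantiles`) may use other quartile conventions and give slightly different values. The data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the halves method for quartiles."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # values before the middle position
    upper = xs[half + n % 2:]    # skip the middle value when n is odd
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

summary = five_number_summary([1, 3, 4, 6, 7, 9, 12, 15, 20])
print(summary)  # (1, 3.5, 7, 13.5, 20)
```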
Boxplot
A graphical display created using the five number summary to show the distribution and potential outliers of quantitative data.
Modified Boxplot
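A boxplot that plots outliers as individual points. A common rule flags any value more than 1.5 × IQR below Q1 or above Q3, where IQR = Q3 − Q1; values beyond 3 × IQR are sometimes called extreme outliers.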
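A Python sketch of the 1.5 × IQR outlier rule, assuming quartiles are found as medians of the lower and upper halves (one textbook convention; other quartile rules can move the fences slightly). The data and helper names are hypothetical; passing `k=3` applies the stricter 3 × IQR rule.

```python
import statistics

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded if n is odd)."""
    xs = sorted(data)
    half = len(xs) // 2
    return statistics.median(xs[:half]), statistics.median(xs[half + len(xs) % 2:])

def iqr_outliers(data, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low_fence or x > high_fence]

data = [1, 3, 4, 6, 7, 9, 12, 15, 20, 45]
print(iqr_outliers(data))       # Q1 = 4, Q3 = 15, fences -12.5 and 31.5 -> [45]
print(iqr_outliers(data, k=3))  # upper fence 48 -> []
```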
Resistant Measures
Statistical measures (like the median and quartiles) that are not strongly affected by outliers or skewness in the data.
Non-resistant Measures
Statistical measures (like the mean and standard deviation) that are strongly affected by outliers or skewness in the data.
Z-score
A standardized score, z = (x − μ) / σ, indicating how many standard deviations a value lies above or below the mean; it is used to compare values from two different distributions (for example, two different normal distributions).
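A Python example of using z-scores to compare values from two different distributions. The test scores, means, and standard deviations are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical results on two different tests.
z_math = z_score(85, mean=70, sd=10)    # (85 - 70) / 10
z_english = z_score(80, mean=72, sd=4)  # (80 - 72) / 4
print(z_math, z_english)  # 1.5 2.0
```

Although 85 is the higher raw score, the English score is further above its own mean in standard-deviation units.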
Explanatory (x) Variable
The independent variable in a scatterplot, thought to influence the response variable.
Response (y) Variable
The dependent variable in a scatterplot, thought to be influenced by the explanatory variable.
Scatterplot
A graph that displays the relationship between two quantitative variables.
Form (of Scatterplot)
Describes the overall pattern of the relationship in a scatterplot, such as Linear, Curved, or Clusters.
Direction (of Scatterplot)
Describes whether the relationship between variables is a positive association (as one variable increases, the other tends to increase) or a negative association (as one increases, the other tends to decrease).
Strength (of Scatterplot)
Describes how closely the points in a scatterplot lie to a simple form, such as a line.
Outliers (in Scatterplot)
Extreme observations in a scatterplot that deviate from the overall pattern.
Correlation Coefficient (r)
A numerical measure, ranging from −1 to +1, that quantifies the strength and direction of a linear relationship between two quantitative variables.
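A Python sketch of the correlation coefficient computed as the sum of products of z-scores divided by n − 1, using sample standard deviations. The study-hours and score data are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    products = (((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

hours = [1, 2, 3, 4, 5]        # hypothetical explanatory variable
scores = [52, 60, 61, 70, 75]  # hypothetical response variable
r = correlation(hours, scores)
print(round(r, 3))  # about 0.982: strong, positive, linear
```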
Regression Line
A line that describes how a response variable y changes as an explanatory variable x changes. It is used to interpret the slope, make predictions, and calculate residuals.
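A Python sketch of fitting the least-squares regression line, assuming the standard formulas slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² (equivalently r · s_y / s_x) and intercept = ȳ − slope · x̄. The data and function name are hypothetical. The prediction uses an x-value inside the observed range to avoid extrapolation.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                      # same value as r * s_y / s_x
    intercept = mean_y - slope * mean_x    # line passes through (x-bar, y-bar)
    return intercept, slope

hours = [1, 2, 3, 4, 5]        # hypothetical explanatory variable
scores = [52, 60, 61, 70, 75]  # hypothetical response variable
a, b = least_squares_line(hours, scores)
prediction = a + b * 3.5       # x = 3.5 lies inside the observed range 1-5
print(f"y-hat = {a:.1f} + {b:.1f}x; prediction at x = 3.5: {prediction:.1f}")
# prints: y-hat = 46.8 + 5.6x; prediction at x = 3.5: 66.4
```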
R² (Coefficient of Determination)
The square of the correlation coefficient (for a regression line with one explanatory variable), which measures how well the regression line fits the data. It is the fraction, usually stated as a percentage, of the variability in y that is explained by the regression line.
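A Python sketch showing that R² computed as 1 − (sum of squared residuals) / (total sum of squares) matches r² for a least-squares line with one explanatory variable. The data are hypothetical.

```python
import math

xs = [1, 2, 3, 4, 5]         # hypothetical explanatory variable
ys = [52, 60, 61, 70, 75]    # hypothetical response variable
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)   # total variability in y
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in xs]

sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # variability left unexplained
r_squared = 1 - sse / syy                                # fraction explained by the line
r = sxy / math.sqrt(sxx * syy)

print(round(r_squared, 3), round(r ** 2, 3))  # 0.964 0.964
```

Here about 96.4% of the variability in y is explained by the regression line.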
Residual
The error in prediction: residual = observed y − predicted y.
Negative Residual
Indicates that the prediction made by the regression line was too high compared to the observed value.
Positive Residual
Indicates that the prediction made by the regression line was too low compared to the observed value.
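A Python sketch computing residuals and interpreting their signs. It assumes the hypothetical data below, whose least-squares line (slope 56/10 = 5.6, intercept 63.6 − 5.6 · 3 = 46.8) is hard-coded. For a least-squares line the residuals sum to zero.

```python
xs = [1, 2, 3, 4, 5]           # hypothetical explanatory variable
ys = [52, 60, 61, 70, 75]      # hypothetical response variable
intercept, slope = 46.8, 5.6   # least-squares line for these data

residuals = []
for x, y in zip(xs, ys):
    predicted = intercept + slope * x
    residual = y - predicted   # observed y minus predicted y
    residuals.append(residual)
    if residual > 0:
        note = "positive: prediction too low"
    elif residual < 0:
        note = "negative: prediction too high"
    else:
        note = "zero: prediction exact"
    print(f"x={x}: observed {y}, predicted {predicted:.1f}, residual {residual:+.1f} ({note})")
```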
Residual Plot
A graph plotting residuals against the explanatory variable (x). A clear pattern in this plot (e.g., fanning or curvature) suggests that a linear regression line is not a good fit, while random scatter around zero supports using the line.
Extrapolation
Making predictions outside of the range for which there is available data, which can be unreliable.
Correlation does not imply Causation
A caution in regression analysis, warning that simply because two variables are correlated, it does not mean that one causes the other, as lurking variables may be involved.