Observational Study
Study in which the person conducting the study observes characteristics of a sample selected from one or more existing populations.
Experiment
Intentional effort to influence individuals in a study (used to assess cause and effect)
Parameter
Number that describes an entire population ("all" individuals)
Statistic
Number that describes a sample (usually a number or percentage)
Simple Random Sampling
Every individual has an equal chance of being selected (lottery system)
ex.. place all student ID numbers in a bin and randomly select 600 blugolds for your sample
Stratified Sampling
Split the population into groups (strata), then randomly select from each group
ex.. put all students whose ID numbers end in the same digit into ten different bins (one for each digit, 0-9). Then randomly select 50 students from each bin for your sample.
Systematic Sampling
Select every kth person
ex.. start with the 4th ID number, then use every 15th student for your sample
Cluster Sampling
Split the population into groups (clusters), then randomly select entire groups
ex.. put all students whose ID numbers end in the same digit into ten different bins (one for each digit, 0-9). Then randomly select a single bin and use all students in that bin for your sample.
Convenience Sampling
Select whoever is easiest to reach (nearby)
ex.. have a table at Blus org bash and log the opinions of the first 600 blugolds that stop by
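The four random sampling methods above (simple random, stratified, systematic, cluster) can be compared side by side in code. This is a minimal sketch using Python's `random` module; the ID range, function names, and fixed seed are assumptions for illustration, and convenience sampling is left out because it involves no random selection.

```python
import random

def bins_by_last_digit(ids):
    """Group IDs into bins keyed by their last digit (0-9)."""
    bins = {}
    for student_id in ids:
        bins.setdefault(student_id % 10, []).append(student_id)
    return bins

def simple_random_sample(ids, n, rng):
    # Every ID has an equal chance of being selected.
    return rng.sample(ids, n)

def stratified_sample(ids, per_bin, rng):
    # Randomly select some IDs from every bin.
    bins = bins_by_last_digit(ids)
    sample = []
    for digit in sorted(bins):
        sample.extend(rng.sample(bins[digit], per_bin))
    return sample

def systematic_sample(ids, start, k):
    # Start at the start-th ID (1-based), then take every kth ID.
    return ids[start - 1::k]

def cluster_sample(ids, rng):
    # Randomly pick one bin and keep everyone in it.
    bins = bins_by_last_digit(ids)
    chosen_digit = rng.choice(sorted(bins))
    return bins[chosen_digit]

rng = random.Random(0)                      # fixed seed so runs repeat
student_ids = list(range(100000, 110000))   # hypothetical 10,000 ID numbers

srs = simple_random_sample(student_ids, 600, rng)
strat = stratified_sample(student_ids, 50, rng)
syst = systematic_sample(student_ids, 4, 15)
clus = cluster_sample(student_ids, rng)

print(len(srs), len(strat), len(syst), len(clus))
```

Note that in systematic sampling the sample size is determined by k and the population size rather than chosen directly.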
Qualitative (categorical) - Graphical Methods for Describing Data
Classifies individuals into categories; values are usually words, or numbers used only as labels (e.g., SSN)
Quantitative (Numerical) - Graphical Methods for Describing Data
Numbers for which operations like addition & subtraction make sense
Discrete (numerical)
Countable # of outcomes (e.g., the number of people in a class, test questions answered correctly, home runs hit)
Continuous (numerical)
Uncountably many possible outcomes; any value in an interval (e.g., height, weight, temperature, length)
Measures of Center
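Mean: add all the values, then divide by how many values there are
Median: middle value when the data are ordered from lowest to highest
Mode: the value that occurs most often (a data set can have no mode)
ex.. if the highest salary on a team payroll increases: mean increases, median does not change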
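The payroll example can be checked numerically. This is a minimal sketch using Python's standard `statistics` module; the salary figures are made up for illustration and are not from the cards.

```python
from statistics import mean, median, multimode

# Hypothetical team payroll in $1000s (illustrative values only).
payroll = [40, 45, 50, 55, 60]
# Same team after only the highest salary increases.
raised = [40, 45, 50, 55, 160]

mean_before, mean_after = mean(payroll), mean(raised)
median_before, median_after = median(payroll), median(raised)

print(mean_before, mean_after)      # the mean moves with the extreme value
print(median_before, median_after)  # the median stays the same

# multimode returns every value tied for most frequent.
modes_repeated = multimode([1, 2, 2, 3])   # one clear mode
modes_payroll = multimode(payroll)         # every value appears once
print(modes_repeated, modes_payroll)
```

When every value appears exactly once, `multimode` returns all of them, which corresponds to the "no mode" case.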
Resistant
Resistant describes a measure or method that is not strongly affected by extreme values (outliers)
The range is not resistant
The median is resistant; the mean is not
Range
is a measure of dispersion: measures how spread out the data are.
maximum value − minimum value
Variance (Measure of Dispersion/Variability/Spread)
The average of the squared distances from the mean; equal to the standard deviation squared
Standard Deviation (Measure of Dispersion/Variability/Spread)
the square root of the variance, showing the typical distance from the mean
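A small worked example of range, variance, and standard deviation, assuming a made-up data set. The "average of squared distances" wording on the card matches the population formulas (`pvariance`, `pstdev`), which divide by n; the sample versions (`variance`, `stdev`) divide by n − 1 instead, and which one a course expects is an assumption to check.

```python
import math
from statistics import pvariance, pstdev, variance

# Illustrative data (not from the cards); the mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)   # maximum value - minimum value
pop_var = pvariance(data)            # average squared distance from the mean
pop_sd = pstdev(data)                # square root of the variance
sample_var = variance(data)          # divides by n - 1 instead of n

print(data_range, pop_var, pop_sd)
print(math.isclose(pop_sd ** 2, pop_var))  # variance = standard deviation squared
```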
Z-scores
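The # of standard deviations a data value is from the mean (no units): z = (value − mean) / standard deviation
The z-scores of a data set (sample or population) have mean = 0
The z-scores of a data set (sample or population) have standard deviation = 1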
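To see why z-scores always have mean 0 and standard deviation 1, here is a minimal sketch on a made-up data set, using the population standard deviation (an assumption; the same property holds if the sample standard deviation is used consistently).

```python
import math
from statistics import mean, pstdev

# Illustrative data (not from the cards): mean 5, population SD 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)
sigma = pstdev(data)

# z = (value - mean) / standard deviation
z_scores = [(x - mu) / sigma for x in data]

print(z_scores)
print(mean(z_scores), pstdev(z_scores))
```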
Percentiles
If a value in a data set represents the kth percentile, then k% of the values in the data set are located at or below that value
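The "at or below" definition can be turned directly into a count. This is a sketch with a hypothetical helper name (`percentile_rank`) and made-up scores; textbooks and software use several slightly different percentile conventions, so treat this as one reading of the card.

```python
def percentile_rank(data, value):
    """Percent of values in data that are at or below value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

# Illustrative test scores (not from the cards).
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

rank_80 = percentile_rank(scores, 80)
print(rank_80)  # 6 of the 10 scores are at or below 80
```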
Quartiles (Q1, Q2, Q3), Interquartile Range (IQR), and Outliers
The three quartiles
Q1 (First Quartile)
The median of the lower half of the data
25% of the data fall below Q1
Q2 (Second Quartile)
The median of the data
50% of the data fall below it
Q3 (Third Quartile)
The median of the upper half of the data
75% of the data fall below Q3
Interquartile Range (IQR) = measure of variability that is resistant to the effects of outliers
IQR = Q3 − Q1
Measures the spread of the middle 50% of the data
Resistant to outliers
Outliers: A value is an outlier if it is:
Less than:
Q1 − 1.5 × IQR
Greater than:
Q3 + 1.5 × IQR
5 Number summary
Min, lower quartile, median, upper quartile, maximum
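The quartile, IQR, outlier, and five-number-summary definitions can be combined in one short sketch. It follows the "median of the lower/upper half" rule from the cards; leaving the middle value out of both halves when n is odd is an assumption (a common textbook convention), and the data and function names are made up for illustration.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

def find_outliers(data):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

def five_number_summary(data):
    q1, q2, q3 = quartiles(data)
    return min(data), q1, q2, q3, max(data)

# Illustrative data (not from the cards).
data = [1, 3, 4, 5, 5, 6, 7, 8, 9, 30]

q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)            # Q1 = 4, Q2 = 5.5, Q3 = 8, IQR = 4
print(find_outliers(data))        # fences are -2 and 14, so only 30 is flagged
print(five_number_summary(data))
```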
Residual
vertical distance from point to line
(actual response - predicted response)
Coefficient of Determination
Measures the percentage of total variation in the response variable that is explained by the least squares regression line (r²)
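Residuals and r² can both be computed by hand from a fitted line. The sketch below fits a least squares line with the standard slope and intercept formulas, then computes residuals as actual minus predicted and r² as the share of total variation explained. The data and the helper name `least_squares` are illustrative assumptions.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Illustrative data (not from the cards).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares(xs, ys)
predicted = [intercept + slope * x for x in xs]

# Residual = actual response - predicted response
residuals = [y - p for y, p in zip(ys, predicted)]

y_bar = sum(ys) / len(ys)
unexplained = sum(r ** 2 for r in residuals)     # variation left over
total = sum((y - y_bar) ** 2 for y in ys)        # total variation in y
r_squared = 1 - unexplained / total

print(slope, intercept)
print(residuals)
print(r_squared)
```

For this data the line is roughly y = 2.2 + 0.6x and r² is about 0.6, meaning about 60% of the variation in the response is explained by the line.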
Influential Observation
a data point that has a large impact on the results of an analysis, especially on a regression line or correlation.
An observation is influential if removing it would noticeably change:
the slope of the regression line
the intercept
the correlation (r)
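One way to test whether a point is influential is to fit the line with and without it and compare. This sketch does that on made-up data where one extra point sits far from the rest in the x direction; the helper `fit` and all values are illustrative assumptions.

```python
import math

def fit(xs, ys):
    """Return (slope, intercept, r) for a least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Illustrative data (not from the cards); the last point is the suspect.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 5, 4, 5, 0]

with_point = fit(xs, ys)
without_point = fit(xs[:-1], ys[:-1])

print("with point:   ", with_point)
print("without point:", without_point)
```

Here removing the single point flips the slope and the correlation from negative to positive, so the point is influential.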
Residual Plots
A residual plot is a graph that shows the residuals on the vertical axis and the explanatory (x) variable on the horizontal axis.
| Pattern in residual plot | What it means |
|---|---|
| Random scatter around 0 | Linear model is appropriate |
| Curve or systematic pattern | Nonlinear relationship, linear model not appropriate |
| Increasing/decreasing spread | Heteroscedasticity (variance not constant) |
| Points far from zero | Possible outliers |
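The "curve or systematic pattern" row can be illustrated without drawing a plot. The sketch below fits a straight line to made-up data that actually follow y = x² and lists the residuals in x order; the helper name and data are assumptions for illustration.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Illustrative curved data: y = x squared.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

slope, intercept = least_squares(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# These (x, residual) pairs are what a residual plot would display.
for x, r in zip(xs, residuals):
    print(x, r)
```

The residuals are positive at both ends and negative in the middle, a U-shaped pattern rather than random scatter around 0, which signals that a linear model is not appropriate for this data.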