Section A: Sampling, Hypothesis, Probabilities, Presentation of Data
Measures of Central Tendency
Mean
Sum of all values divided by the number of values
Formula: \text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}
Median
Middle value when data is ordered
If even number of values, median is the average of the two central values
Mode
Most frequently occurring value in a dataset
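As a quick check of the three definitions above, here is a minimal sketch using Python's standard `statistics` module. The data set `scores` is made up purely for illustration.

```python
from statistics import mean, median, mode

# Hypothetical data set, chosen only for illustration
scores = [4, 7, 7, 8, 10, 12]

mean_score = mean(scores)      # (4 + 7 + 7 + 8 + 10 + 12) / 6 = 48 / 6 = 8
median_score = median(scores)  # even count, so average the two central values 7 and 8
mode_score = mode(scores)      # 7 appears twice, more than any other value

print("mean:", mean_score)
print("median:", median_score)
print("mode:", mode_score)
```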
Measures of Spread
Range
Difference between the highest and lowest values
Formula: \text{Range} = \text{Max} - \text{Min}
Variance
Average of the squared differences from the mean
Standard Deviation
Square root of the variance
Formula: \sigma = \sqrt{\frac{\sum{(x - \bar{x})^2}}{n}}
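The measures of spread above can be sketched in Python as follows. The data set is hypothetical, and `pvariance`/`pstdev` are used because they divide by n, matching the formula above (the sample versions, `variance`/`stdev`, divide by n - 1 instead).

```python
from statistics import pvariance, pstdev

# Hypothetical data set with mean 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)  # 9 - 2 = 7

# Squared differences from the mean: 9, 1, 1, 1, 0, 0, 4, 16 (sum 32)
variance = pvariance(data)          # 32 / 8 = 4
std_dev = pstdev(data)              # square root of the variance = 2

print("range:", data_range)
print("variance:", variance)
print("standard deviation:", std_dev)
```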
Probability of an Event
Formula: P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}}
Complementary Events
Formula: P(\text{not } E) = 1 - P(E)
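A small worked example of both formulas, assuming a standard 52-card deck with 4 aces (the card scenario is an illustration, not part of the notes). `Fraction` keeps the probabilities exact.

```python
from fractions import Fraction

total_outcomes = 52      # cards in a standard deck (assumed scenario)
favorable_outcomes = 4   # aces in the deck

# P(E) = favorable outcomes / total outcomes
p_ace = Fraction(favorable_outcomes, total_outcomes)  # simplifies to 1/13

# P(not E) = 1 - P(E)
p_not_ace = 1 - p_ace                                 # 12/13

print("P(ace) =", p_ace)
print("P(not ace) =", p_not_ace)
```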
Discrete Probability Distributions
Probability mass function (pmf)
Example: Rolling a fair die
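The fair-die example can be written out as a pmf, sketched here as a Python dictionary mapping each face to its probability. The event "roll at most 2" is an arbitrary choice for illustration.

```python
from fractions import Fraction

# pmf of a fair six-sided die: each face has probability 1/6
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities in a pmf must add up to 1
total_probability = sum(pmf.values())

# Probability of an event = sum of the pmf over the outcomes in the event
p_at_most_2 = sum(p for face, p in pmf.items() if face <= 2)  # 1/6 + 1/6 = 1/3

print("total:", total_probability)
print("P(X <= 2) =", p_at_most_2)
```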
Continuous Probability Distributions
Probability density function (pdf)
Example: Heights of people
Bar Chart
Used for categorical data
Histogram
Used for continuous data divided into bins
Pie Chart
Shows proportions of a whole
Box Plot
Displays distribution based on quartiles
Can also reveal skewness
Can include/exclude outliers
Cumulative Frequency Graph
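Plots the running total of frequencies from a frequency table against the upper boundary of each class
Used to estimate the values in the five-number summary (minimum, lower quartile, median, upper quartile, maximum), which are the same values shown on a box plot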
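To show how a cumulative frequency graph is built and read, here is a rough sketch. The grouped table is invented, the first class is assumed to start at 0, and `read_off` is a hypothetical helper that mimics reading across from the vertical axis by joining the plotted points with straight lines.

```python
from itertools import accumulate

# Hypothetical grouped frequency table: classes 0-10, 10-20, ..., 40-50
upper_bounds = [10, 20, 30, 40, 50]
frequencies = [3, 7, 12, 6, 2]

# Cumulative frequency = running total of the frequencies
cumulative = list(accumulate(frequencies))   # [3, 10, 22, 28, 30]
points = list(zip(upper_bounds, cumulative))  # points plotted on the graph
total = cumulative[-1]

def read_off(position, points):
    """Estimate the data value at a cumulative-frequency position by linear interpolation."""
    prev_x, prev_cf = 0, 0  # assumption: the first class starts at 0
    for x, cf in points:
        if position <= cf:
            return prev_x + (position - prev_cf) / (cf - prev_cf) * (x - prev_x)
        prev_x, prev_cf = x, cf
    return points[-1][0]

# Quartiles sit at 1/4, 1/2 and 3/4 of the total frequency
lower_quartile = read_off(total / 4, points)
median_estimate = read_off(total / 2, points)
upper_quartile = read_off(3 * total / 4, points)

print(points)
print(lower_quartile, median_estimate, upper_quartile)
```

These are estimates only, since grouping hides the individual data values.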
Scatter Plots
Displays relationship (strength & direction) between two variables as graphed data points.
Key words for describing scatter plot data include positive, negative, no correlation, strong, moderate, weak.
Also note whether any possible outliers are present.
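One common way to put a number on strength and direction is Pearson's correlation coefficient r, which the notes do not define. The sketch below computes it by hand. The data and the cut-offs in `describe` (0.7 and 0.4) are assumptions for illustration, since different courses use different thresholds.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe(r):
    """Turn r into the key words used for scatter plots (cut-offs are assumed)."""
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    strength = "strong" if size >= 0.7 else "moderate" if size >= 0.4 else "weak"
    return f"{strength} {direction}"

# Hypothetical paired data: hours studied and test mark
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 70]

r = pearson_r(hours, marks)
print(round(r, 3), describe(r))
```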
Frequency Tables
Organize data into categories with frequency counts
Grouped Frequency Tables
Data divided into ranges or intervals
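Both kinds of table can be sketched with `collections.Counter`. The data, the class width of 10, and the convention that each interval includes its lower bound but not its upper bound are all assumptions for illustration.

```python
from collections import Counter

# Frequency table: count how often each category occurs (hypothetical data)
colours = ["red", "blue", "red", "green", "blue", "red"]
frequency_table = Counter(colours)

# Grouped frequency table: assign each value to an interval of width 10
ages = [12, 15, 21, 24, 24, 30, 33, 37, 41, 48]
width = 10
grouped_table = Counter((age // width) * width for age in ages)

for category, count in frequency_table.items():
    print(category, count)

for lower in sorted(grouped_table):
    print(f"{lower} <= x < {lower + width}: {grouped_table[lower]}")
```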
Random Sampling
Every member of the population has an equal chance of being selected
Systematic Sampling
Selecting every nth member from a list
Stratified Sampling
Population divided into subgroups, sample taken from each subgroup
Cluster Sampling
Population divided into groups (clusters), often naturally occurring ones such as classes or towns; whole clusters are then randomly selected and data is collected from the selected clusters
Quota Sampling
Non-probability method in which the population is divided into groups based on categories (e.g., age, gender) and a set number of members (a quota) is sampled from each group, without random selection.
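Three of the methods above can be sketched with Python's `random` module. The population of 100 numbered members, the sample size of 10, and the two strata are hypothetical, and the stratified sample is taken in proportion to stratum size, which is one common choice rather than the only one.

```python
import random

random.seed(0)  # fixed seed so the sketch gives the same result each run

population = list(range(1, 101))  # hypothetical members numbered 1 to 100
sample_size = 10

# Random sampling: every member has an equal chance of selection
simple_sample = random.sample(population, sample_size)

# Systematic sampling: random starting point, then every nth member
step = len(population) // sample_size
start = random.randrange(step)
systematic_sample = population[start::step]

# Stratified sampling: sample from each subgroup in proportion to its size
strata = {"juniors": population[:60], "seniors": population[60:]}
stratified_sample = []
for name, members in strata.items():
    share = round(sample_size * len(members) / len(population))
    stratified_sample.extend(random.sample(members, share))

print(simple_sample)
print(systematic_sample)
print(stratified_sample)
```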
Sample
A subset of the population used to estimate characteristics of the whole
Population
The entire group being studied
State Hypotheses
Null hypothesis
H_0: \text{No effect or difference}
Alternative hypothesis
H_1: \text{There is an effect or difference}
Choose Significance Level
Commonly \alpha = 0.05\ (5\%)
Calculate test statistic and compare with critical value
Reject or fail to reject the null hypothesis
P-Value
The probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true
Low p-value (< 0.05): strong evidence against the null hypothesis, indicating a statistically significant result
High p-value (≥ 0.05): weak or insufficient evidence against the null hypothesis; the observations are consistent with random chance
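The decision rule for p-values can be sketched as a small function. `decide` is a hypothetical helper, and the default of 0.05 is simply the common significance level mentioned above.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with the significance level and word the conclusion carefully."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"  # never "accept H0"

print(decide(0.012))  # below 0.05
print(decide(0.30))   # above 0.05
```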
Type I Error
A false positive, meaning that you falsely reject a true null hypothesis.
Type II Error
A false negative, meaning that you fail to reject a false null hypothesis.
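In some real-life scenarios (for example, medical screening, where a false negative misses a real condition), a Type II error can be more serious than a Type I error; which one is worse depends on the context.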
On the other hand, the two non-error/correct decisions are:
Rejecting a false null hypothesis
Not rejecting a true null hypothesis
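The four possible outcomes (two errors, two correct decisions) can be laid out as a small function. `classify_decision` is a hypothetical helper. In practice we never know whether the null hypothesis is true, so this is only a way to keep the definitions straight.

```python
def classify_decision(null_is_true: bool, rejected_null: bool) -> str:
    """Label the outcome of a hypothesis test given the (unknown in practice) truth."""
    if null_is_true and rejected_null:
        return "Type I error (false positive)"
    if not null_is_true and not rejected_null:
        return "Type II error (false negative)"
    return "correct decision"

for null_is_true in (True, False):
    for rejected_null in (True, False):
        print(null_is_true, rejected_null, classify_decision(null_is_true, rejected_null))
```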
Notes!
Never be too affirmative: we never accept a null hypothesis; we only reject it or fail to reject it.
The language of statistics is very precise in the real world as well as on the exam.
t-Test
t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}}
\bar{x} = sample mean
\mu = population mean (the value assumed under H_0)
s = sample standard deviation
n = sample size
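A worked example of the t formula, as a sketch only. The sample and the hypothesised mean `mu_0` are invented, and the critical value 2.262 is taken from a standard t-table for a two-tailed test at the 5% level with n - 1 = 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of 10 measurements
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.5, 5.3, 5.7]
mu_0 = 5.0  # population mean assumed under H0

n = len(sample)
x_bar = mean(sample)  # sample mean
s = stdev(sample)     # sample standard deviation (divides by n - 1)

# t = (x_bar - mu) / (s / sqrt(n))
t = (x_bar - mu_0) / (s / sqrt(n))

# Critical value from a t-table: two-tailed, alpha = 0.05, 9 degrees of freedom
t_critical = 2.262
decision = "reject H0" if abs(t) > t_critical else "fail to reject H0"

print(round(t, 3), decision)
```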
Chi-Square Test
\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}
O_i = observed frequency in category i
E_i = expected frequency in category i
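A worked example of the chi-square formula, testing whether a die looks fair. The observed counts are invented, and the critical value 11.070 comes from a standard chi-square table for 5 degrees of freedom at the 5% level.

```python
# Hypothetical observed counts for faces 1 to 6 over 60 rolls
observed = [8, 12, 9, 11, 6, 14]
total_rolls = sum(observed)

# Under H0 (fair die), each face is expected 60 / 6 = 10 times
expected = [total_rolls / len(observed)] * len(observed)

# chi^2 = sum of (O_i - E_i)^2 / E_i over all categories
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# (4 + 4 + 1 + 1 + 16 + 16) / 10 = 4.2

# Critical value from a chi-square table: 5 degrees of freedom, alpha = 0.05
chi_critical = 11.070
decision = "reject H0" if chi_squared > chi_critical else "fail to reject H0"

print(round(chi_squared, 2), decision)
```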
Check Formula Sheet
Mean, median, and mode: measures of central tendency.
Standard deviation: provides insight into data variability.
Probability rules: help with understanding likelihoods.
Graphs and tables: useful for presenting and interpreting data clearly.
Sampling methods: affect how reliable and representative the data is.