Section A 📊 → Sampling, Hypothesis, Probabilities, Presentation of Data

Measures of Central Tendency

Mean
- Sum of all values divided by the number of values
- Formula: $\bar{x} = \frac{\sum x}{n}$
Median
- Middle value when data is ordered
- If there is an even number of values, the median is the average of the two middle values
Mode
- Most frequently occurring value in a dataset
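Hand calculations can be checked with Python's built-in `statistics` module. The data set below is made up for illustration. `multimode` is used instead of `mode` because a data set can have more than one mode.

```python
from statistics import mean, median, multimode

# Hypothetical data set, made up for illustration
data = [2, 3, 3, 5, 7, 10]

avg = mean(data)          # 30 / 6 = 5
mid = median(data)        # even count: (3 + 5) / 2 = 4.0
modes = multimode(data)   # [3]; multimode returns every value tied for most frequent

print(avg, mid, modes)
```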

Measures of Spread

Range
- Difference between the highest and lowest values
- Formula: $\text{Range} = \text{Max} - \text{Min}$
Variance
- Average of the squared differences from the mean
Standard Deviation
- Square root of the variance
- Formula: $\sigma = \sqrt{\frac{\sum{(x - \bar{x})^2}}{n}}$ (divides by $n$; the sample standard deviation $s$ divides by $n - 1$ instead)
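A minimal sketch checking range, variance and standard deviation on a made-up data set. `pvariance` and `pstdev` divide by $n$, while `stdev` divides by $n - 1$ (the sample standard deviation $s$).

```python
from statistics import pstdev, pvariance, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

data_range = max(data) - min(data)  # 9 - 2 = 7

# Variance straight from the definition: mean of squared deviations from the mean
n = len(data)
x_bar = sum(data) / n
variance = sum((x - x_bar) ** 2 for x in data) / n  # 32 / 8 = 4.0

sigma = pstdev(data)  # divide by n: sqrt(4) = 2.0
s = stdev(data)       # divide by n - 1: sqrt(32 / 7) ≈ 2.14

print(data_range, variance, pvariance(data), sigma, s)
```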

Probability of an Event
- Formula (when all outcomes are equally likely): $P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}}$
Complementary Events
- Formula: $P(\text{not } E) = 1 - P(E)$
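A worked example with exact fractions, assuming a fair six-sided die and the event $E$ = "roll a 5 or 6" (chosen for illustration):

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]                  # sample space of a fair die
favorable = [x for x in outcomes if x >= 5]    # event E: roll a 5 or 6

p_e = Fraction(len(favorable), len(outcomes))  # 2/6 = 1/3
p_not_e = 1 - p_e                              # complement: 1 - 1/3 = 2/3

print(p_e, p_not_e)  # 1/3 2/3
```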

Discrete Probability Distributions
- Probability mass function (pmf)
- Example: Rolling a fair die
Continuous Probability Distributions
- Probability density function (pdf)
- Example: Heights of people
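A sketch contrasting the two: the die's pmf gives actual probabilities that sum to 1, while for a continuous variable probabilities are areas under the pdf. Modelling heights as normal with mean 170 cm and standard deviation 10 cm is an assumption made up for this example; `statistics.NormalDist` needs Python 3.8+.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete: pmf of a fair die, P(X = x) = 1/6 for x = 1, ..., 6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
total = sum(pmf.values())                      # probabilities sum to 1
expected = sum(x * p for x, p in pmf.items())  # E(X) = 21/6 = 7/2

# Continuous: heights ~ Normal(170, 10) in cm (hypothetical model)
heights = NormalDist(mu=170, sigma=10)
density = heights.pdf(170)  # a density value, NOT a probability
p_between = heights.cdf(180) - heights.cdf(160)  # P(160 < X < 180) ≈ 0.683

print(total, expected, round(p_between, 3))
```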

Bar Chart
- Used for categorical data
Histogram
- Used for continuous data divided into bins
Pie Chart
- Shows proportions of a whole
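Drawing a pie chart by hand needs each sector's angle: (frequency ÷ total) × 360°. The survey counts below are hypothetical.

```python
# Hypothetical survey: favorite sport of 60 people
counts = {"Football": 30, "Tennis": 15, "Swimming": 10, "Other": 5}
total = sum(counts.values())

# Sector angle = (frequency / total) x 360 degrees
angles = {sport: freq * 360 / total for sport, freq in counts.items()}

print(angles)  # {'Football': 180.0, 'Tennis': 90.0, 'Swimming': 60.0, 'Other': 30.0}
```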
Box Plot
- Displays distribution based on quartiles
- Can also reveal skewness
- Can be drawn with outliers shown as separate points or included within the whiskers
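A sketch of the five-number summary behind a box plot, on made-up data. Quartile conventions differ between textbooks and software; `statistics.quantiles` with its default `"exclusive"` method gives the same result here as taking the medians of the lower and upper halves. The 1.5 × IQR outlier rule is a common convention, assumed here.

```python
from statistics import quantiles

data = sorted([1, 3, 4, 5, 6, 8, 9, 10, 30])  # hypothetical data

q1, q2, q3 = quantiles(data, n=4)  # default method="exclusive"
five_number = (data[0], q1, q2, q3, data[-1])  # (min, Q1, median, Q3, max)

# 1.5 x IQR rule (assumed convention) for flagging outliers
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# If outliers are plotted separately, whiskers stop at the most extreme non-outliers
whiskers = (min(x for x in data if x >= lower_fence), max(x for x in data if x <= upper_fence))

print(five_number, outliers, whiskers)  # (1, 3.5, 6.0, 9.5, 30) [30] (1, 10)
```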
Cumulative Frequency Graph
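- Plots cumulative frequency against the upper class boundary of each class in a grouped frequency table
- Used to estimate the median and quartiles, giving the five-number summary (as used in box plots)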
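Reading values off a cumulative frequency graph amounts to interpolating between plotted points. The sketch below assumes the points are joined with straight lines (a smooth curve gives slightly different readings) and uses a hypothetical grouped table; the median is read at $n/2$ and the quartiles at $n/4$ and $3n/4$. `read_value` is a helper written for these notes.

```python
from itertools import accumulate

# Hypothetical grouped frequency table: classes 0-10, 10-20, ..., 40-50
boundaries = [0, 10, 20, 30, 40, 50]
freqs = [4, 6, 10, 6, 4]

# Plotted points: (upper class boundary, cumulative frequency), starting at (0, 0)
cum = [0] + list(accumulate(freqs))  # [0, 4, 10, 20, 26, 30]
n = cum[-1]

def read_value(target_cf):
    """Read the x-value at a cumulative frequency by straight-line interpolation."""
    for i in range(1, len(cum)):
        if cum[i] >= target_cf and cum[i] > cum[i - 1]:
            x0, x1 = boundaries[i - 1], boundaries[i]
            c0, c1 = cum[i - 1], cum[i]
            return x0 + (target_cf - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target exceeds total frequency")

median = read_value(n / 2)   # 20 + (15 - 10) / 10 * 10 = 25.0
q1 = read_value(n / 4)       # ≈ 15.83
q3 = read_value(3 * n / 4)   # ≈ 34.17
print(median, q1, q3)
```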
Scatter Plots
- Displays relationship (strength & direction) between two variables as graphed data points.
- Key words for describing scatter plot data include positive, negative, no correlation, strong, moderate, weak.
  - Also note whether any possible outliers are present.
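The words "strong/weak" and "positive/negative" can be quantified with Pearson's correlation coefficient $r$ (between $-1$ and $1$); these notes do not cover it, so treat this as an extra. The study-hours data is hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied vs. test score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

r = pearson_r(hours, scores)  # ≈ 0.99: strong positive correlation
print(round(r, 3))
```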

Random Sampling
- Every member of the population has an equal chance of being selected
Systematic Sampling
- Selecting every nth member from a list
Stratified Sampling
- Population divided into subgroups, sample taken from each subgroup
Cluster Sampling
- Population divided into groups (clusters), often naturally occurring ones such as schools or towns; some clusters are then randomly selected and data is collected from them
Quota Sampling
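- Non-probability method: the population is divided into groups based on categories (e.g., age, gender), and members are selected, not at random, until a set number (the quota) from each group is reached. Quotas are often set so the sample reflects the population's proportions.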
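A sketch of the probability-based methods on a hypothetical population of 100 numbered members. The strata, clusters and sample sizes are made up; stratified sampling here uses proportional allocation and cluster sampling is one-stage (every member of each chosen cluster), both assumptions. Quota sampling is left out because it is not random.

```python
import random

random.seed(1)  # fixed seed so the example is reproducible

population = list(range(1, 101))  # hypothetical member IDs 1..100

# Simple random sample: every member equally likely to be chosen
simple = random.sample(population, 10)

# Systematic: every k-th member after a random start
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified with proportional allocation (hypothetical strata of 60 and 40)
strata = {"Year 1": population[:60], "Year 2": population[60:]}
stratified = []
for members in strata.values():
    share = round(10 * len(members) / len(population))  # 6 and 4
    stratified += random.sample(members, share)

# One-stage cluster: pick whole groups at random, keep every member of them
clusters = [population[i:i + 20] for i in range(0, 100, 20)]  # 5 hypothetical clusters
chosen = random.sample(clusters, 2)
cluster_sample = [x for c in chosen for x in c]
```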

Sample
- A subset of the population used to estimate characteristics of the whole
Population
- The entire group being studied

State Hypotheses
- Null hypothesis $H_0$: no effect or difference
- Alternative hypothesis $H_1$: there is an effect or difference
Choose Significance Level
- Commonly $\alpha = 0.05\ (5\%)$
Calculate Test Statistic
- Compare it with the critical value
- Reject or fail to reject the null hypothesis
P-Value
- The probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true
  - Low p-value (< $\alpha$, e.g. 0.05) → strong evidence against the null hypothesis, indicating a statistically significant effect
  - High p-value (≥ $\alpha$) → weak/insufficient evidence against the null hypothesis; the observations are consistent with random chance
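A worked example of the definition, assuming a hypothetical coin flipped 10 times that lands heads 9 times, with $H_0: p = 0.5$ and $H_1: p > 0.5$. The one-tailed p-value is $P(X \ge 9)$ under $H_0$, where $X \sim B(10, 0.5)$. Both helper functions are written for these notes.

```python
from math import comb

def binomial_upper_p_value(k, n, p=0.5):
    """One-tailed p-value P(X >= k) for X ~ B(n, p), i.e. assuming H0 is true."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def decide(p_value, alpha=0.05):
    """Compare the p-value with the significance level alpha."""
    return "Reject H0" if p_value < alpha else "Do not reject H0"

p_value = binomial_upper_p_value(9, 10)  # (10 + 1) / 1024 ≈ 0.0107
print(p_value, decide(p_value), decide(p_value, alpha=0.01))
```

Here $p = 11/1024 \approx 0.0107 < 0.05$, so $H_0$ is rejected at the 5% level, but it would not be rejected at the 1% level.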

Type I Error
- A false positive, meaning that you falsely reject a true null hypothesis.
Type II Error
- A false negative, meaning that you fail to reject a false null hypothesis.
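- In many real-life scenarios (e.g., medical screening, where a Type II error means missing a real disease), Type II errors have more serious consequences than Type I errors; which is worse always depends on the context.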
The two correct decisions are:
- Rejecting a false null hypothesis
- Not rejecting a true null hypothesis
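The four possible outcomes can be summarized as a small lookup; `outcome` is a hypothetical helper written for these notes. The probability of a Type I error when $H_0$ is true is controlled by the significance level $\alpha$.

```python
def outcome(h0_true: bool, rejected: bool) -> str:
    """Classify a test decision against the (normally unknown) truth about H0."""
    if h0_true and rejected:
        return "Type I error (false positive)"
    if not h0_true and not rejected:
        return "Type II error (false negative)"
    return "Correct decision"

for h0_true in (True, False):
    for rejected in (True, False):
        print(f"H0 true={h0_true}, rejected={rejected}: {outcome(h0_true, rejected)}")
```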
Notes!
- Never be too affirmative → we never "accept" the null hypothesis; we only reject or fail to reject it.
- Statistical language must be precise, both in the real world and on the exam.

t-Test
- $t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}}$
  - $\bar{x}$ = sample mean
  - $\mu$ = population mean stated in the null hypothesis
  - ${s}$ = sample std. deviation
  - ${n}$ = sample size
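A minimal sketch of the formula on a made-up sample, testing $H_0: \mu = 12$. `statistics.stdev` divides by $n - 1$, as the sample standard deviation $s$ should; `t_statistic` is a helper written for these notes.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """One-sample t statistic: (x_bar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

sample = [12, 14, 11, 13, 15]    # hypothetical measurements, x_bar = 13
t = t_statistic(sample, mu0=12)  # 1 / sqrt(2.5 / 5) = sqrt(2) ≈ 1.414
df = len(sample) - 1             # degrees of freedom = 4
print(round(t, 3), df)
```

Here $t = \sqrt{2} \approx 1.41$. A one-sample t-test uses $n - 1 = 4$ degrees of freedom; the two-tailed 5% critical value for 4 degrees of freedom is about 2.776, so $H_0$ would not be rejected.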
Chi-Square Test
- $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$
  - ${O_i}$ = observed frequency in category ${i}$
  - ${E_i}$ = expected frequency in category ${i}$
    - Check Formula Sheet
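A sketch of a goodness-of-fit calculation, assuming 60 hypothetical rolls of a die tested against $H_0$: the die is fair (so each $E_i = 10$). `chi_square` is a helper written for these notes.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [8, 12, 9, 11, 6, 14]    # hypothetical counts for faces 1-6 (60 rolls)
expected = [sum(observed) / 6] * 6  # fair die: 60 / 6 = 10 per face

chi2 = chi_square(observed, expected)  # 42 / 10 = 4.2
df = len(observed) - 1                 # 5
print(chi2, df)
```

Here $\chi^2 = 4.2$. With $6 - 1 = 5$ degrees of freedom the 5% critical value is about 11.07, so $H_0$ is not rejected.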