Descriptive Statistics: Methods of presenting data visually and numerically.
Charts, frequency distributions, and histograms organize and present data.
Measures of Central Tendency: Mean, median, mode.
Measures of Dispersion: Variance, standard deviation, range.
Statistical Inference: Drawing conclusions about the unknown characteristics of a population based on sample data.
Predictive Statistics: Developing predictions of future values from historical data.
Population vs Sample:
Population: Complete set of objects of interest.
Sample: Subset of objects taken from the population.
Measures of Location:
Mean: Population mean ( \mu = \frac{\sum_{i=1}^{N} x_i}{N} ); sample mean ( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} ).
Median: The middle value in sorted data.
Mode: The most frequently occurring observation.
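As an illustration, a minimal Python sketch of these three measures using only the standard library (the data values are made up):

```python
from statistics import mean, median, mode

data = [4, 8, 6, 5, 3, 8, 9, 5, 8]  # hypothetical sample

print("Mean:  ", mean(data))    # arithmetic average
print("Median:", median(data))  # middle value of the sorted data
print("Mode:  ", mode(data))    # most frequent observation
```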
Measures of Dispersion:
Range: ( \text{Range} = \text{maximum} - \text{minimum} ); in Excel, =MAX(range) - MIN(range).
Variance:
Population: ( \sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N} ).
Sample: ( s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} ).
Standard Deviation:
Population: ( \sigma = \sqrt{\sigma^2} ).
Sample: ( s = \sqrt{s^2} ).
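A short Python sketch of the dispersion measures above, again with standard-library functions and made-up data; pvariance/pstdev treat the data as a full population, while variance/stdev apply the n − 1 sample correction:

```python
from statistics import pvariance, variance, pstdev, stdev

data = [4, 8, 6, 5, 3, 8, 9, 5, 8]  # hypothetical data set

print("Range:              ", max(data) - min(data))
print("Population variance:", pvariance(data))  # divides by N
print("Sample variance:    ", variance(data))   # divides by n - 1
print("Population std dev: ", pstdev(data))
print("Sample std dev:     ", stdev(data))
```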
Proportions:
Proportion (P): the fraction of observations in a population that have a certain characteristic.
Sample proportion ( \hat{p} ): the fraction of a sample with the characteristic; commonly used in categorical data analysis.
Example in Excel: COUNTIF(range, criteria) counts the observations meeting the criteria; dividing by the total number of observations gives ( \hat{p} ).
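The same idea in Python, with a hypothetical list of categorical responses and "Yes" as the example criterion:

```python
responses = ["Yes", "No", "Yes", "Yes", "No", "Yes", "No", "No", "Yes", "Yes"]

# Sample proportion: observations meeting the criterion divided by the total count
p_hat = responses.count("Yes") / len(responses)
print("Sample proportion:", p_hat)  # 0.6
```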
Measures of Shape:
Skewness: Asymmetry of the data.
Coefficient of Skewness (CS):
CS > 1 or CS < -1: high degree of skewness.
CS between 0.5 and 1 or between -1 and -0.5: moderate skewness.
CS between -0.5 and 0.5: relatively symmetric.
Kurtosis: Peakedness or flatness of a histogram.
Coefficient of Kurtosis (CK) measures the degree of kurtosis; a normal distribution has CK = 3.
CK < 3: flatter than a normal distribution.
CK > 3: more peaked than a normal distribution.
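A sketch of both shape coefficients using SciPy (assumed to be installed); note that scipy.stats.kurtosis defaults to the excess definition, so fisher=False is passed to get a CK that is compared against 3:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=1000)  # right-skewed synthetic data

cs = skew(data)                    # coefficient of skewness
ck = kurtosis(data, fisher=False)  # coefficient of kurtosis (normal ≈ 3)

print(f"CS = {cs:.2f}  (positive: skewed to the right)")
print(f"CK = {ck:.2f}  (> 3: more peaked than a normal distribution)")
```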
Sampling Distribution: Distribution of a statistic for all possible samples of a fixed size.
Standard Error of the Mean:
Infinite populations: ( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} ).
Finite populations: ( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} ).
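A minimal sketch of both standard-error formulas; the population values below are assumptions chosen for illustration:

```python
import math

sigma = 10.0  # population standard deviation (assumed known)
n = 50        # sample size
N = 400       # population size, used only in the finite-population case

se_infinite = sigma / math.sqrt(n)
se_finite = se_infinite * math.sqrt((N - n) / (N - 1))  # finite population correction

print("SE (infinite population):", round(se_infinite, 4))  # ≈ 1.4142
print("SE (finite population):  ", round(se_finite, 4))    # ≈ 1.3245
```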
Central Limit Theorem:
The sampling distribution of the mean approaches a normal distribution as the sample size increases, regardless of the population distribution.
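The theorem can be illustrated with a short simulation: sample means drawn from a strongly skewed (exponential) population are still roughly bell-shaped around the population mean. A NumPy sketch with arbitrary parameters:

```python
import numpy as np

rng = np.random.default_rng(42)
population = rng.exponential(scale=5.0, size=100_000)  # heavily right-skewed

# Draw many samples of size n and record each sample mean
n = 30
sample_means = np.array([rng.choice(population, size=n).mean() for _ in range(2000)])

print("Population mean:        ", population.mean())
print("Mean of sample means:   ", sample_means.mean())       # close to the population mean
print("Std dev of sample means:", sample_means.std(ddof=1))  # close to sigma / sqrt(n)
print("Theoretical std error:  ", population.std() / np.sqrt(n))
```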
Confidence Intervals:
CI: an interval estimate of a population parameter, constructed so that it contains the true parameter with a specified level of confidence (e.g., 95%).
General formula (mean, ( \sigma ) known): ( \bar{x} \pm z_{\alpha/2}\left( \frac{\sigma}{\sqrt{n}} \right) ).
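A sketch of the z-based interval in Python, using scipy.stats.norm for the critical value; the sample statistics below are assumed values:

```python
import math
from scipy.stats import norm

x_bar = 25.4       # sample mean (assumed)
sigma = 3.2        # population standard deviation (assumed known)
n = 64             # sample size
confidence = 0.95

z = norm.ppf(1 - (1 - confidence) / 2)  # z critical value, ≈ 1.96 for 95%
margin = z * sigma / math.sqrt(n)

print(f"{confidence:.0%} CI: ({x_bar - margin:.3f}, {x_bar + margin:.3f})")
```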
Hypothesis Testing:
Concept: Involves two contrasting propositions about a population parameter: the null hypothesis versus the alternative hypothesis.
Steps in Hypothesis Testing:
Formulate the hypotheses.
Select level of significance (α).
Determine a decision rule.
Collect data and compute test statistic.
Apply decision rule and draw conclusion.
Critical Value: Divides the sampling distribution into rejection and non-rejection regions.
P-value: Probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true.
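These steps can be sketched with a one-sample t test in SciPy; the data and the hypothesized mean of 50 are made up for illustration:

```python
from scipy.stats import ttest_1samp

data = [51.2, 49.8, 50.5, 52.1, 48.9, 50.7, 51.5, 49.4, 50.9, 51.8]
alpha = 0.05  # step 2: level of significance

# Steps 1 and 4: H0: mu = 50 vs. H1: mu != 50; compute the test statistic
t_stat, p_value = ttest_1samp(data, popmean=50)

# Steps 3 and 5: decision rule based on the p-value
if p_value < alpha:
    print(f"t = {t_stat:.3f}, p = {p_value:.4f} -> reject H0")
else:
    print(f"t = {t_stat:.3f}, p = {p_value:.4f} -> fail to reject H0")
```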
Analysis of Variance (ANOVA):
Purpose: To test whether the means of multiple populations are equal.
One-way ANOVA: Compares means for different levels of one factor.
F Statistic: Ratio of the between-group variance estimate to the within-group variance estimate.
Decision Rule: If the F statistic exceeds the critical value (or the p-value is below α), reject the hypothesis that all means are equal; at least one mean differs.
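A one-way ANOVA sketch with scipy.stats.f_oneway, using three hypothetical groups of response values:

```python
from scipy.stats import f_oneway

# Hypothetical responses for three levels of a single factor
group_a = [20.1, 21.3, 19.8, 20.7, 21.0]
group_b = [22.4, 23.0, 21.9, 22.7, 23.3]
group_c = [20.5, 20.9, 21.2, 20.0, 20.8]

f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
# A small p-value (e.g., < 0.05) suggests at least one group mean differs
```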
Regression Analysis: Models relationships between dependent and independent variables.
Simple Regression: One independent variable.
Multiple Regression: Multiple independent variables.
Correlation: Measures linear relationship:
Ranges from -1 (perfect negative) to +1 (perfect positive), with 0 indicating no linear relationship.
Coefficient of Determination (R²): Proportion of variance in dependent variable explained by independent variable(s).
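A simple-regression sketch with scipy.stats.linregress, which returns the slope, intercept, and correlation coefficient in one call (the x and y values are invented):

```python
from scipy.stats import linregress

x = [1, 2, 3, 4, 5, 6, 7, 8]                      # independent variable
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.3, 13.9, 16.2]   # dependent variable

result = linregress(x, y)
print("Slope:      ", round(result.slope, 3))
print("Intercept:  ", round(result.intercept, 3))
print("Correlation:", round(result.rvalue, 3))       # r
print("R-squared:  ", round(result.rvalue ** 2, 3))  # variance explained
```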
Design of Experiments (DOE):
Purpose: To test and compare methods or optimize process outputs by systematically varying factors and examining their effects.
Factorial Experiment: All combinations of factor levels included.
Main Effect: The change in the response produced by changing the level of a single factor.
Interaction Effect: Occurs when the effect of one factor on the response depends on the level of another factor.
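A small sketch of how main effects and the interaction effect can be computed from a hypothetical 2x2 factorial experiment (one response value per factor-level combination, purely illustrative numbers):

```python
# Responses y[(A level, B level)] for a 2x2 factorial experiment (hypothetical data)
y = {
    ("low", "low"): 20.0,  ("low", "high"): 30.0,
    ("high", "low"): 40.0, ("high", "high"): 52.0,
}

# Main effect of A: average response at A high minus average response at A low
main_a = (y[("high", "low")] + y[("high", "high")]) / 2 - (y[("low", "low")] + y[("low", "high")]) / 2

# Main effect of B: average response at B high minus average response at B low
main_b = (y[("low", "high")] + y[("high", "high")]) / 2 - (y[("low", "low")] + y[("high", "low")]) / 2

# Interaction: half the difference between the effect of A at B high and at B low
interaction = ((y[("high", "high")] - y[("low", "high")]) - (y[("high", "low")] - y[("low", "low")])) / 2

print("Main effect of A: ", main_a)       # 21.0
print("Main effect of B: ", main_b)       # 11.0
print("A x B interaction:", interaction)  # 1.0
```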