Topic Nine: Hypothesis Testing & One-Sample T-Tests (In-Class Tutorial Notes)

Assessment Overview and Group Coordination

  • Group Structure and Engagement:
    - The maximum group size is five students.
    - Group members must engage with each other between classes and are encouraged to exchange contact information.
    - Students must actively participate in selecting variables and defining the specific problem they intend to investigate.

  • Assessment Objectives:
    - Students assume the role of a practitioner.
    - The core aim is to identify, compare, and critique secondary sources.
    - Students must evaluate these sources, synthesize multiple ideas, and argue their own perspective.

  • Assessment Deliverables:
    - A written component of 1,000 words.
    - The use of infographics to help visualize and support the argument.
    - An optional dashboard upload may be available.

  • Final Exam Context:
    - The final exam is closed-book.
    - It focuses heavily on the interpretation of Excel outputs rather than just raw calculation.
    - Statistical formulas will be provided in the exam.

Data Selection and Topic Development

  • Source Material:
    - Students must use secondary data.
    - Recommended sources include the Australian Bureau of Statistics (ABS), World Bank, and local LGA (Local Government Area) datasets provided by the instructor (e.g., economy.id.com.au).
    - Data can be national or international in scope.

  • Topic Examples and Variable Pairing:
    - Housing Market: Relationship between interest rates (the Cash Rate) and the House Price Index. Note that using the Consumer Price Index (CPI) to measure cost of living against house prices is problematic, because housing expenditures are already a significant component of the CPI.
    - Social Metrics: Relationship between crime rates and the level of education or median income within an LGA.
    - Macroeconomics: Relationship between exchange rates and the level of exports; relationship between the GDP growth rate and the unemployment rate.
    - COVID-19 Analysis: Comparing "pre-COVID" and "post-COVID" data for variables such as literacy rates or employment participation among youth and adults.

  • Case Study: NBA Three-Pointers and Wins:
    - Hypothesis: The more three-pointers a team makes, the more likely it is to win games.
    - Variables:
        - Dependent ($y$): Number of games won (out of 82 per season).
        - Independent ($x_1$): Number of three-pointers made.
    - Data Structure: Cross-sectional data on all 30 NBA teams for a single season provide sufficient observations ($n = 30$).
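
One way to formalize this hypothesis, assuming a simple linear regression of games won on three-pointers made (the notes do not specify the model), is as a one-tailed test on the slope. The symbols $\beta_0$, $\beta_1$, and $\varepsilon_i$ are notation introduced here for illustration:

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_{1,i} + \varepsilon_i, \qquad i = 1, \dots, 30 \\
H_0 &: \beta_1 = 0 \qquad \text{vs.} \qquad H_1 : \beta_1 > 0
\end{aligned}
```

Because the claim is directional ("more three-pointers, more wins"), $H_1$ uses $>$, and $H_0$ keeps the equal sign.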

Macroeconomic Theory and Structural Breaks

  • The Relationship Between Growth and Unemployment:
    - Macroeconomic theory suggests that as the GDP growth rate increases, the unemployment rate falls (Okun's Law context).
    - Unemployment is influenced by changes in Aggregate Demand ($AD$).

  • Aggregate Demand Formula:
    - $AD = C + I + G + (X - M)$
    - Where:
        - $C$ = Consumption Expenditures
        - $I$ = Investment
        - $G$ = Government Expenditure
        - $X - M$ = Net Exports (Exports minus Imports)

  • Interest Rates and Investment:
    - Falling interest rates reduce the cost of borrowing, which increases investment ($I$) and consumption ($C$), thereby increasing the demand for goods and services and reducing unemployment.

  • Structural Breaks:
    - Significant events like COVID-19 create a "structural break" in datasets (e.g., when comparing 2019 data to later periods).
    - Analyzing such periods often requires a "dummy variable" (coded 0 before the break and 1 after it) to distinguish between the before and after scenarios.
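
A minimal Python sketch of the two calculations in this section, assuming annual data and treating 2020 as the first post-COVID year (the notes only say 2019 data are compared with later periods). The function names `aggregate_demand` and `covid_dummy`, and all figures, are hypothetical:

```python
def aggregate_demand(c: float, i: float, g: float, x: float, m: float) -> float:
    """AD = C + I + G + (X - M); all components in the same units and period."""
    return c + i + g + (x - m)


def covid_dummy(year: int, break_year: int = 2020) -> int:
    """Dummy variable: 0 for pre-break (pre-COVID) years, 1 from the break year onward.

    break_year = 2020 is an assumption for illustration.
    """
    return 1 if year >= break_year else 0


# Hypothetical components (e.g., $ billions), for illustration only
ad = aggregate_demand(c=100.0, i=30.0, g=40.0, x=25.0, m=20.0)

# Hypothetical annual observations spanning the structural break
years = [2017, 2018, 2019, 2020, 2021, 2022]
post_covid = [covid_dummy(y) for y in years]

print(f"AD = {ad}")
for year, d in zip(years, post_covid):
    print(year, d)
```

In a regression, the dummy is typically added as an extra explanatory variable, so its coefficient captures the average shift after the break.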

Core Statistical Concepts for Excel Analysis

  • Measures of Central Tendency:
    - Mean (Average): The total value of all observations divided by the number of observations ($n$).
    - Median: The middle observation in a sorted dataset.
    - Mode: The most frequent observation in a dataset.
    - In a normal distribution (symmetric with a single peak), the mean, median, and mode are equal.

  • Measures of Dispersion:
    - Standard Deviation: Measures how widely the observations are spread around the mean. A higher standard deviation indicates greater variability.
    - Variance: The square of the standard deviation. It also measures how far the values in a dataset spread from the mean, but in squared units.

  • Standard Error:
    - Formula: $SE = \frac{\sigma}{\sqrt{n}}$
    - Where $\sigma$ is the standard deviation and $n$ is the number of observations. In a t-test, $\sigma$ is unknown and is estimated by the sample standard deviation $s$.

  • Degrees of Freedom:
    - Calculated as $n - 1$ for a one-sample t-test.
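
The measures above can be reproduced with Python's standard-library `statistics` module. This is a sketch on a small hypothetical dataset (not course data); note that `statistics.stdev` and `statistics.variance` use the sample ($n - 1$) formulas, so the sample standard deviation $s$ stands in for $\sigma$ in the standard error:

```python
import math
import statistics

# Hypothetical daily coffee sales, for illustration only
sales = [48, 52, 50, 47, 55, 50, 53, 49, 51, 50]

mean = statistics.mean(sales)      # sum of observations / n
median = statistics.median(sales)  # middle value of the sorted data
mode = statistics.mode(sales)      # most frequent value

sd = statistics.stdev(sales)       # sample standard deviation (n - 1 denominator)
var = statistics.variance(sales)   # sample variance = sd squared

n = len(sales)
se = sd / math.sqrt(n)  # standard error of the mean, using s in place of sigma
df = n - 1              # degrees of freedom for a one-sample t-test

print(f"mean={mean}, median={median}, mode={mode}")
print(f"sd={sd:.3f}, variance={var:.3f}, SE={se:.3f}, df={df}")
```

For the same data, these values should match Excel's `AVERAGE`, `MEDIAN`, `MODE.SNGL`, `STDEV.S`, and `VAR.S`.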

Hypothesis Testing Framework

  • Hypotheses Formulation:
    - Null Hypothesis ($H_0$): Always contains the equal sign (e.g., $\mu = 50$). It represents the status quo or the claim being tested.
    - Alternative Hypothesis ($H_1$ or $H_a$): The claim you are seeking evidence for. It can be not equal to ($\neq$), greater than ($>$), or less than ($<$).

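  • Types of Tests:
    - Two-Tailed Test: Uses $\neq$ in the alternative hypothesis. It has two rejection regions (the positive and negative tails).
    - One-Tailed Test: Uses $>$ or $<$ in the alternative hypothesis, testing whether the population mean is strictly greater than, or strictly less than, the hypothesized value. It has a single rejection region in one tail.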

  • T-Statistic Formula:
    - $t = \frac{\bar{x} - \mu}{SE}$
    - Where $\bar{x}$ is the sample mean, $\mu$ is the hypothesized population mean, and $SE$ is the standard error.

  • Error Types in Hypothesis Testing:
    - Type I Error ($\alpha$): Rejecting the null hypothesis when it is actually true. The analyst chooses the significance level $\alpha$ (commonly $0.05$), which is the probability of making this error; $\alpha = 0.05$ means a 5% chance of rejecting a true $H_0$.
    - Type II Error ($\beta$): Failing to reject (accepting) the null hypothesis when it is actually false. To reduce the probability of a Type II error, increase the sample size (e.g., $n \ge 30$).
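
A small simulation can make the t-statistic and both error types concrete. This sketch, using only the Python standard library, draws hypothetical normally distributed samples, computes $t = (\bar{x} - \mu)/SE$, and rejects $H_0$ when $|t| > 1.96$ (the normal-approximation cut-off used in these notes). The population mean and standard deviation, sample sizes, seed, and function names are all assumptions chosen for illustration:

```python
import math
import random
from statistics import NormalDist


def t_statistic(sample, mu0):
    """One-sample t-statistic: (x_bar - mu0) / SE, with SE = s / sqrt(n)."""
    n = len(sample)
    x_bar = sum(sample) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))  # sample SD
    return (x_bar - mu0) / (s / math.sqrt(n))


# Two-tailed cut-off at alpha = 0.05 from the normal distribution (about 1.96)
ALPHA = 0.05
CRITICAL = NormalDist().inv_cdf(1 - ALPHA / 2)


def rejection_rate(true_mean, mu0, sd, n, trials, rng):
    """Share of simulated samples in which H0: mu = mu0 is rejected."""
    rejected = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        if abs(t_statistic(sample, mu0)) > CRITICAL:
            rejected += 1
    return rejected / trials


rng = random.Random(2024)  # fixed seed so the run is reproducible

# H0 true (population mean really is 50): every rejection is a Type I error
type_i_rate = rejection_rate(50, 50, sd=10, n=60, trials=2000, rng=rng)

# H0 false (true mean is 52, H0 says 50): every non-rejection is a Type II error
type_ii_n30 = 1 - rejection_rate(52, 50, sd=10, n=30, trials=2000, rng=rng)
type_ii_n100 = 1 - rejection_rate(52, 50, sd=10, n=100, trials=2000, rng=rng)

print(f"Type I rate  (n=60):  {type_i_rate:.3f}")
print(f"Type II rate (n=30):  {type_ii_n30:.3f}")
print(f"Type II rate (n=100): {type_ii_n100:.3f}")
```

With $H_0$ true, the rejection rate should sit near 5% (slightly above, because 1.96 is the normal rather than the exact t cut-off). With $H_0$ false, the estimated Type II rate should fall as $n$ grows from 30 to 100.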

Decision Rules and Results Interpretation

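  • Critical-Value Decision Rule:
    - If the calculated T-statistic falls within the Acceptance Region (between the critical values, e.g., $-1.96$ and $+1.96$ for a two-tailed test at $\alpha = 0.05$), you accept (fail to reject) $H_0$. Note that $\pm 1.96$ is the large-sample approximation; the exact t critical value is slightly larger for finite samples (about 2.00 with 59 degrees of freedom).
    - If it falls in the Rejection Region (beyond either critical value), you reject $H_0$.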

  • P-Value Decision Rule:
    - If P-value $\le \alpha$ ($0.05$): Reject $H_0$. The observed result would be unlikely if the null hypothesis were true.
    - If P-value $> \alpha$ ($0.05$): Accept (fail to reject) $H_0$. There is not enough evidence to conclude that the null hypothesis is false.

  • Practical Exercise 1: Coffee Shop Sales:
    - Scenario: A manager believes the shop sells 50 coffees a day ($H_0: \mu = 50$, $H_1: \mu \neq 50$). Dataset: $n = 60$.
    - Result: The T-stat was $0.86$ (within the range of $\pm 1.96$). The P-value was $0.3954$.
    - Conclusion: Since $0.3954 > 0.05$, we accept (fail to reject) $H_0$. The data are consistent with the claim of 50 coffees per day; there is not enough evidence to say the mean differs from 50.

  • Practical Exercise 2: Consumer Confidence Index:
    - Scenario: The pre-COVID mean was 78 ($H_0: \mu = 78$, $H_1: \mu \neq 78$). A current survey of $n = 100$ was collected.
    - Result: The T-stat was $-2.97$ (outside the range of $\pm 1.96$). The P-value was $0.004$.
    - Conclusion: Since $0.004 < 0.05$, we reject $H_0$. Mean consumer confidence has changed significantly since the pre-COVID period; the negative T-stat indicates the current sample mean is below 78.
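
Excel reports these P-values directly, but they can be checked by hand. The sketch below computes exact two-tailed P-values and critical values for Student's t with integer degrees of freedom, using a standard closed-form series, then applies the P-value decision rule to both exercises. The function names are hypothetical, and the inputs are the rounded T-stats from the notes:

```python
import math


def t_two_tailed_p(t: float, df: int) -> float:
    """Two-tailed P-value, P(|T| >= |t|), for Student's t with integer df >= 1.

    Closed-form series for A = P(|T| <= |t|), with theta = atan(|t| / sqrt(df)),
    s = sin(theta), c = cos(theta):
      odd df:  A = (2/pi) * (theta + s*c * [1 + (2/3)c^2 + (2*4)/(3*5)c^4 + ...])
      even df: A = s * [1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ...]
    For df = 1 the bracketed term is dropped, giving A = 2*theta/pi.
    """
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    term = total = 1.0
    if df % 2 == 1:
        for k in range(1, (df - 1) // 2):
            term *= (2 * k) / (2 * k + 1) * c * c
            total += term
        a = (2 / math.pi) * (theta + (s * c * total if df > 1 else 0.0))
    else:
        for k in range(1, df // 2):
            term *= (2 * k - 1) / (2 * k) * c * c
            total += term
        a = s * total
    return 1.0 - a


def t_critical_two_tailed(alpha: float, df: int) -> float:
    """Critical value t* with P(|T| >= t*) = alpha, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if t_two_tailed_p(mid, df) > alpha:
            lo = mid  # P-value still too large: the critical value lies further out
        else:
            hi = mid
    return (lo + hi) / 2


def decide(p_value: float, alpha: float = 0.05) -> str:
    """P-value decision rule from the notes."""
    return "Reject H0" if p_value <= alpha else "Fail to reject (accept) H0"


# Exercise 1: coffee sales, t = 0.86, n = 60 -> df = 59
coffee_p = t_two_tailed_p(0.86, df=60 - 1)
# Exercise 2: consumer confidence, t = -2.97, n = 100 -> df = 99
confidence_p = t_two_tailed_p(-2.97, df=100 - 1)

crit_59 = t_critical_two_tailed(0.05, 59)
crit_99 = t_critical_two_tailed(0.05, 99)

print(f"Coffee:     p = {coffee_p:.4f} -> {decide(coffee_p)}")
print(f"Confidence: p = {confidence_p:.4f} -> {decide(confidence_p)}")
print(f"Critical t (alpha = 0.05, two-tailed): df=59 {crit_59:.3f}, df=99 {crit_99:.3f}")
```

Because the T-stats are rounded to two decimals, the computed P-values should be close to, but not exactly, the 0.3954 and 0.004 reported above. The critical values come out slightly above 1.96, which is why $\pm 1.96$ is best read as a large-sample approximation.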