BM 6
ASSOCIATION BETWEEN VARIABLES
Outline
- Introduction
- Correlation analysis
- Scatter plots
- Correlation Coefficient
- Coefficient of determination, r²
- Regression analysis
Objectives
- To be able to draw and interpret scatter diagrams.
- To be able to calculate the correlation coefficient and coefficient of determination.
- To understand and be able to use the least squares method to estimate the regression line.
Introduction
- The lecture introduces two fundamental techniques in statistics:
- Correlation: A method to measure the association between two variables.
- Regression: A technique to derive the relationship between two variables, so that a dependent variable can be predicted from an independent variable.
- Example scenarios include:
- The potential dependence of production costs on the quantity produced.
- The relationship between product sales and pricing.
Correlation Analysis
- Correlation measures the strength of the association between two variables.
- Two variables are said to be associated when a change in one corresponds to a change in the other.
- Illustrative examples of potential correlations include:
- Relationship between production cost and price.
- Association between advertising efforts and sales revenue.
- Correlation between the number of deliveries and the time taken for those deliveries.
Class Activity: Determining Dependent and Independent Variables
- Examples include:
- Study Hours and Exam Grades: Study hours (independent) → Exam grades (dependent)
- Temperature and Ice Cream Sales: Temperature (independent) → Ice cream sales (dependent)
- Advertising Budget and Sales Revenue: Budget (independent) → Revenue (dependent)
- Distance Traveled and Fuel Consumption: Distance (independent) → Fuel (dependent)
- Employee Training Hours and Productivity: Training hours (independent) → Productivity (dependent)
- Rainfall Amount and Crop Yield: Rainfall (independent) → Crop yield (dependent)
- Social Media Engagement and Website Traffic: Engagement (independent) → Traffic (dependent)
- Education Level and Income: Education (independent) → Income (dependent)
Scatter Diagrams
- A scatter plot visually represents bivariate data, showing potential correlations.
- The independent variable is plotted on the x-axis, while the dependent variable is plotted on the y-axis.
- Analysis of scatter plots can reveal:
- Patterns indicating the strength and direction of the association.
- Whether the data are suitable for quantitative analysis (for example, whether a linear relationship is plausible).
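As a rough illustration of the axis convention above, the following sketch draws a text-mode scatter diagram using only the Python standard library. The helper name `text_scatter` and the grid size are assumptions made for this example; the data are the units-produced and production-cost figures used in Example 1 of these notes. In practice a plotting library would normally be used instead.

```python
# Hypothetical helper: a rough text-mode scatter diagram (no plotting library assumed).
def text_scatter(xs, ys, width=30, height=10):
    """Return text rows with '*' at each (x, y) point.

    x increases to the right, y increases upwards.
    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate to a column/row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return ["|" + "".join(line) for line in grid] + ["+" + "-" * width]


# Independent variable on the x-axis, dependent variable on the y-axis.
units_produced = [1, 2, 3, 4, 5, 6]
production_cost = [5.0, 10.5, 15.5, 25.0, 16.0, 22.5]

rows = text_scatter(units_produced, production_cost)
print("\n".join(rows))
```

The points rise from bottom-left towards top-right, which is the pattern expected for a positive association.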
Degrees of Association/Correlation
- When examining scatter plots, consider:
- Evidence of a pattern in the plotted points.
- Types of correlation:
- Perfect Correlation: All data points lie on a straight line, indicating a precise linear relationship (either positive or negative).
- Partial Correlation: The points show a pattern, but they do not all lie on a straight line.
- Uncorrelated: No discernible relationship between the variables.
Pearson’s Product Moment Correlation Coefficient
- Used to measure the strength and direction of the linear association between two variables.
- The correlation coefficient (denoted as r) ranges from -1 to +1:
- A value close to +1 (e.g., 0.9) indicates a strong positive linear relationship.
- A value close to –1 (e.g., –0.9) indicates a strong negative linear relationship.
- A value of 0 indicates no linear relationship, though other types of relationships may exist.
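- The formula for calculating the Pearson product moment correlation coefficient (r), where n is the number of (x, y) data pairs, is:
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$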
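The formula can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `pearson_r` is an assumption, and the two checks use made-up points that lie exactly on a straight line, so r should come out as +1 and -1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient via the sums formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    # Division fails if either variable is constant (denominator is zero).
    return numerator / denominator


# Made-up illustrative points lying exactly on a line
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfect positive correlation
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfect negative correlation
```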
Example 1: Calculating r
- Calculate Pearson's correlation coefficient for the data:
- Units produced (x): 1, 2, 3, 4, 5, 6
- Production cost (y): 5.0, 10.5, 15.5, 25.0, 16.0, 22.5
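One way to work through the calculation by hand is shown below. The sums are computed from the six data pairs above, and the final values are rounded.

```latex
\begin{align*}
n &= 6, \quad \sum x = 21, \quad \sum y = 94.5, \\
\sum xy &= 387.5, \quad \sum x^2 = 91, \quad \sum y^2 = 1762.75 \\[4pt]
r &= \frac{6(387.5) - (21)(94.5)}{\sqrt{\left[6(91) - 21^2\right]\left[6(1762.75) - 94.5^2\right]}} \\
  &= \frac{2325 - 1984.5}{\sqrt{(105)(1646.25)}}
   = \frac{340.5}{\sqrt{172856.25}}
   \approx \frac{340.5}{415.76} \approx 0.819
\end{align*}
```

A value of about 0.82 points to a fairly strong positive linear association between units produced and production cost.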
Class Exercise
- Task: Plot a scatter diagram and calculate the Pearson product moment correlation coefficient for the following data:
- Policy (X) and Overtime hours (Y):
- 150 → 10
- 300 → 20
- 100 → 10
- 400 → 40
- 350 → 30
- 500 → 35
Coefficient of Determination r²
- Before using regression for prediction, evaluate its fit to the data.
- The coefficient of determination (r²) measures the proportion of the variation in the dependent variable that is explained by the independent variable.
- Given by the formula:
$$ r^2 = (r)^2 $$
- Calculate r² for the previous examples.
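For the Example 1 data (units produced and production cost), the calculation might look like the sketch below. The variable names are chosen for this illustration only.

```python
from math import sqrt

# Data from Example 1
x = [1, 2, 3, 4, 5, 6]                    # units produced
y = [5.0, 10.5, 15.5, 25.0, 16.0, 22.5]   # production cost

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)

# Pearson's r from the sums formula, then square it.
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r_squared = r ** 2

print(f"r   = {r:.3f}")          # r   = 0.819
print(f"r^2 = {r_squared:.3f}")  # r^2 = 0.671
```

Read this way, roughly 67% of the variation in production cost is explained by the number of units produced; the remainder is due to other factors or random variation.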
Linear Regression Analysis
- Linear regression models the relationship between a dependent and an independent variable using a linear equation.
- The focus is on estimating the line of best fit between two variables using:
- Least Squares Method: A calculation to minimize the sum of squared differences (errors) between observed and predicted values.
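To make the idea of "sum of squared errors" concrete, the sketch below evaluates it for two candidate lines of the form y = a + bx on the units-produced and production-cost data from Example 1. The two candidate lines are hand-picked guesses for illustration and are not from the lecture; the least squares method looks for the a and b that make this quantity as small as possible.

```python
# Data from Example 1 in the correlation section
x = [1, 2, 3, 4, 5, 6]                    # units produced
y = [5.0, 10.5, 15.5, 25.0, 16.0, 22.5]   # production cost


def sum_squared_errors(a, b):
    """Sum of squared differences between observed y and predicted a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Two hand-picked candidate lines (illustrative guesses only)
candidates = {"y = 5 + 3x": (5.0, 3.0), "y = 2 + 4x": (2.0, 4.0)}
for label, (a, b) in candidates.items():
    print(label, "-> SSE =", round(sum_squared_errors(a, b), 2))
# y = 5 + 3x -> SSE = 91.75
# y = 2 + 4x -> SSE = 100.75
```

Of these two guesses, the first has the smaller sum of squared errors and so fits the data better in the least squares sense.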
The Linear Regression Model
- The linear regression model is represented by:
$$ y = a + bx $$
- Where:
- y = dependent variable
- x = independent variable
- a = y-intercept (constant)
- b = slope (gradient)
The Values of a and b
- The formulas to determine the optimal values of a and b that minimize squared errors are:
$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} $$
$$ a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} $$
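A minimal Python sketch of these two formulas is given below. The function name `least_squares_line` is an assumption for this example, and the check uses made-up points that lie exactly on the line y = 1 + 2x, so the function should recover a = 1 and b = 2.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x using the sums formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope first, because the intercept formula depends on b.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


# Made-up illustrative points lying exactly on y = 1 + 2x
a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```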
Example: Calculating the Regression Line
- Given data:
- Units produced (x): 1, 2, 3, 4, 5, 6
- Production cost (y): 5.0, 10.5, 15.5, 25.0, 16.0, 22.5
- Use the formulas for a and b to calculate the linear regression line.
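A worked version of this calculation is sketched below. It uses n = 6 and the sums ∑x = 21, ∑y = 94.5, ∑xy = 387.5 and ∑x² = 91 computed from the data above; the results are rounded.

```latex
\begin{align*}
b &= \frac{6(387.5) - (21)(94.5)}{6(91) - 21^2}
   = \frac{2325 - 1984.5}{546 - 441}
   = \frac{340.5}{105} \approx 3.243 \\[4pt]
a &= \frac{94.5}{6} - 3.243 \times \frac{21}{6}
   \approx 15.75 - 11.35 = 4.40 \\[4pt]
y &= 4.40 + 3.243x
\end{align*}
```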
Class Exercise 2
- Given data regarding output and cost:
- Output (x) in '000 units and Costs (y) in P'000:
- 20 → 82
- 16 → 70
- 24 → 90
- 22 → 85
- 18 → 73
- Task: Calculate the regression line for the data.
Interpretation of a and b
- After calculating a and b, interpret the meaning of these coefficients in the context of the problem.
- Write the regression equation and plot the regression line on a scatter diagram to visualize the fit in relation to the data points.
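As an illustration of this interpretation step, the sketch below fits the line to the units-produced and production-cost data from the regression example and compares fitted with observed values. The variable names are assumptions for this example, and the wording of the interpretation is one reasonable reading rather than the only one.

```python
# Data from the regression example
x = [1, 2, 3, 4, 5, 6]                    # units produced
y = [5.0, 10.5, 15.5, 25.0, 16.0, 22.5]   # production cost

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
a = sum_y / n - b * sum_x / n                                 # intercept

print(f"Regression line: y = {a:.2f} + {b:.3f}x")
# a: estimated cost when no units are produced (a fixed-cost estimate)
# b: estimated increase in cost for each additional unit produced

# Compare each observed cost with the value the line predicts.
for xi, yi in zip(x, y):
    fitted = a + b * xi
    print(f"x = {xi}: observed {yi:5.1f}, fitted {fitted:5.2f}")
```

Here a ≈ 4.40 is the estimated cost at zero output and b ≈ 3.243 is the estimated extra cost per additional unit. The gap between observed and fitted values at each x shows how far each point sits from the line on the scatter diagram.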