Cost Behavior and Estimation: Fixed vs Variable; High-Low and Regression Methods

Fixed costs vs. variable costs

Management accounts convert data from the accounting system into a format that is more useful for cost analysis
Fixed costs: same total amount regardless of activity level, within the period
- Examples: rent paid each month; salary for a salaried employee
Variable costs: change with activity level (costs vary with output or hours worked)
- Example: wages for hourly employees (more hours → higher pay, fewer hours → lower pay)
Other examples:
- Subscriptions (e.g., Netflix) are typically fixed costs because the charge is the same each period
In practice, many costs are mixed (contain both fixed and variable components)
- Example: renting a truck with a fixed fee plus a variable cost per mile (or per hour): C = F + v \times x
- Intercept (F) is the fixed portion; slope (v) is the variable portion
Why categorize costs this way?
- Helps with forecasting and budgeting
- Lets us predict how total cost will change with activity level based on fixed and variable portions
Note from lecture: most costs in reality are mixed, not purely fixed or purely variable
- This is why we estimate both fixed and variable components rather than labeling every cost categorically

Mixed costs and a practical example

Truck rental example mentioned in class:
- Fixed portion: $400 (intercept)
- Variable portion: $0.20 per unit (e.g., per mile or per hour)
- Graphically: the intercept represents the fixed cost, and the slope represents the variable cost per unit
General form: C = F + v \times x where
- C = total cost (y)
- F = fixed cost (intercept)
- v = variable cost per unit (slope)
- x = activity level (e.g., miles, hours)
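As a minimal sketch of the general form, the snippet below plugs in the truck rental figures above ($400 fixed, $0.20 per unit). The function name `total_cost` and the 150-mile trip are illustrative assumptions, not from the lecture.

```python
def total_cost(fixed: float, variable_per_unit: float, activity: float) -> float:
    """Mixed-cost function C = F + v * x."""
    return fixed + variable_per_unit * activity


# Truck rental figures from the notes: F = 400, v = 0.20 per mile.
# The 150-mile trip is a made-up activity level for illustration.
print(f"Cost at 150 miles: ${total_cost(400.0, 0.20, 150):.2f}")

# At zero activity only the fixed portion (the intercept) remains.
print(f"Cost at 0 miles:   ${total_cost(400.0, 0.20, 0):.2f}")
```

At zero miles the result is just the intercept, which is the sense in which F is the fixed portion.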

Why we estimate costs

Primary purpose: look at total cost and determine what portion is fixed vs. variable
Use past data to predict future costs under different activity levels
The two key implications:
- If we know the fixed and variable portions, we can forecast costs for different levels of activity
- Helps with budgeting, pricing, and profitability analysis

Three main methods to analyze/estimate costs

Scattergram (scatter plot)
- Plot observed data points (cost vs. activity)
- A visual check: does the data roughly fall on a straight line?
- If the data does not look linear, a simple linear method may not be appropriate
- Outliers can distort the line, so visually inspect points that lie far from the line
Regression (line of best fit)
- A statistical method that uses all data points to fit a straight line
- The slope gives the variable cost per unit; the intercept gives the fixed cost
- Provides goodness-of-fit measures to assess prediction quality
- Output typically includes: intercept (fixed cost F), slope (variable cost v), correlation metrics, t-stats, p-values
High-Low method (a simpler, quick estimation approach)
- Uses only the highest and lowest activity levels
- Steps:
  1) Identify the observations with the highest and lowest activity levels and their costs: (X_\text{high}, Y_\text{high}) and (X_\text{low}, Y_\text{low})
  2) Compute the variable cost per unit (slope):
     v = \frac{Y_\text{high} - Y_\text{low}}{X_\text{high} - X_\text{low}}
  3) Compute the fixed cost (intercept) using either point:
     F = Y_i - v \times X_i \text{ for } i = \text{high or low}
  4) Construct the cost function: C = F + v \times x
- Pros: simple and quick; Cons: uses only two data points and is sensitive to outliers
- Regression is typically more accurate because it uses all data points
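The high-low steps can be sketched in a few lines of Python. The function name `high_low` and the monthly figures below are hypothetical and chosen only for illustration; they are not the lecture's data.

```python
def high_low(activity, cost):
    """Estimate (fixed, variable_per_unit) from the highest- and
    lowest-activity observations."""
    if len(activity) != len(cost) or len(activity) < 2:
        raise ValueError("need two or more paired observations")
    # Pick the points by ACTIVITY level (x), not by cost (y).
    i_high = max(range(len(activity)), key=lambda i: activity[i])
    i_low = min(range(len(activity)), key=lambda i: activity[i])
    if activity[i_high] == activity[i_low]:
        raise ValueError("all activity levels are equal; slope is undefined")
    v = (cost[i_high] - cost[i_low]) / (activity[i_high] - activity[i_low])
    f = cost[i_high] - v * activity[i_high]  # the low point gives the same F
    return f, v


# Hypothetical monthly data (not from the lecture).
hours = [100, 250, 400, 300, 150]
overhead = [2000, 3300, 5000, 4100, 2700]

fixed, variable = high_low(hours, overhead)
print(f"C = {fixed:.2f} + {variable:.2f} * x")
```

Only the 100-hour and 400-hour months influence the result; the other three observations are ignored, which is the method's main weakness.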

Example data from the lecture (computer repair shop)

Setup:
- Y = total overhead cost (in dollars), measured monthly
- X = number of repair hours (nonfinancial unit; used as the activity level)
Data description (as presented):
- Two noted points used for high-low estimation: highest ~568 hours, lowest ~200 hours
- Costs: at 568 hours, Y ≈ 12,083; at 200 hours, Y ≈ 9,054
High-Low calculation (as described in the lecture):
- Variable cost per hour (slope) calculation:
  v = \frac{Y_\text{high} - Y_\text{low}}{X_\text{high} - X_\text{low}} = \frac{12{,}083 - 9{,}054}{568 - 200} = \frac{3{,}029}{368} \approx 8.23
- The speaker stated the variable cost as 10.40 per hour, which does not match the two points above; the quotient works out to about 8.23 per hour, so the stated figure appears to be an error or a mismatch in the transcript
- Intercept (fixed cost) calculation (using one point, with v \approx 8.23):
  - If using the high point: F = Y_\text{high} - v \times X_\text{high} = 12{,}083 - 8.23 \times 568 \approx 7{,}408
  - If using the low point: F = Y_\text{low} - v \times X_\text{low} = 9{,}054 - 8.23 \times 200 = 7{,}408
- Both points give the same fixed-cost estimate (about $7,408) because the slope was computed from those same two points; any gap between the two results signals an arithmetic or rounding slip
- With the stated 10.40 instead, the high point gives F \approx 6{,}176 but the low point gives F \approx 6{,}974, and the two no longer agree
The resulting high-low cost function is approximately C = 7{,}408 + 8.23 \times x
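The two quoted points can be plugged into the high-low formulas directly. The sketch below uses only the figures given above (568 hours at 12,083 and 200 hours at 9,054); the variable names are illustrative. Because the slope is derived from these same two points, the intercept should come out the same from either one.

```python
# The two observations quoted in the notes.
x_high, y_high = 568, 12_083
x_low, y_low = 200, 9_054

# Step 2: variable cost per repair hour (slope).
v = (y_high - y_low) / (x_high - x_low)

# Step 3: fixed cost (intercept), computed from each point as a cross-check.
f_from_high = y_high - v * x_high
f_from_low = y_low - v * x_low

print(f"v = {v:.2f} per hour")
print(f"F from high point = {f_from_high:.2f}")
print(f"F from low point  = {f_from_low:.2f}")
```

If the two intercepts ever disagree, the slope being used was not the one implied by the two points.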

Regression approach (detailed interpretation from the lecture)

Regression output (illustrative example from the lecture):
- Cost function (predicted overhead): C = 64.72 + 12.52 \times x where
- F = 64.72 (intercept, fixed cost)
- v = 12.52 (slope, variable cost per repair hour)
Interpretation of regression outputs:
- Intercept (F) represents the fixed cost component per period
- Slope (v) represents the variable cost per additional unit of activity (here, per repair hour)
- Multiple R (correlation) ≈ 0.91, indicating a strong linear relationship between hours and cost
- R^2 ≈ 0.83, meaning about 83% of the variation in cost is explained by the number of repair hours
- t-statistics and p-values assess the statistical significance of the estimated coefficients; the lecture notes mention a very small p-value (on the order of 10^{-4} or smaller), indicating high significance of the estimates
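To show where the intercept, slope, R, and R^2 come from, here is a minimal least-squares sketch in plain Python. The function name `fit_line` and the six data points are hypothetical (the lecture's full dataset is not reproduced in these notes), and the sketch does not compute t-statistics or p-values, which a spreadsheet or statistics package would report.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for y = F + v * x.

    Returns (F, v, r, r_squared), where r is the Pearson correlation.
    With one predictor, the "Multiple R" in regression output equals |r|.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two or more paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    v = sxy / sxx            # slope: variable cost per unit
    f = y_bar - v * x_bar    # intercept: fixed cost
    r = sxy / sqrt(sxx * syy)
    return f, v, r, r * r


# Hypothetical monthly observations (repair hours, overhead in dollars).
hours = [200, 260, 310, 400, 480, 568]
overhead = [9100, 9500, 10400, 11000, 11900, 12100]

F, v, r, r2 = fit_line(hours, overhead)
print(f"C = {F:.2f} + {v:.2f} * x   (R = {abs(r):.2f}, R^2 = {r2:.2f})")
```

Unlike high-low, every observation contributes to the slope and intercept here.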
Predictions using the regression model:
- Example: for x = 300 repair hours,
  C(300) = 64.72 + 12.52 \times 300 = 64.72 + 3{,}756 = 3{,}820.72
- The lecturer also stated a predicted value of $10,228 for 300 hours, which is inconsistent with the regression equation shown above. Note that 6{,}472 + 12.52 \times 300 = 10{,}228 exactly, so the intercept was most likely 6,472 and the 64.72 above a transcription slip
Important caveat about extrapolation:
- The data used for the regression covered up to 568 hours
- Predicting costs for 600 hours would be outside the observed range, so the estimate may be unreliable
- Always check the range of the data before extrapolating with the model
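One way to make the range check routine is to have the prediction helper report whether it is extrapolating. This is a sketch only: the function name `predict_cost` is hypothetical, the coefficients are the ones printed in these notes, and the observed range of 200 to 568 hours is taken from the two extreme points quoted earlier.

```python
def predict_cost(fixed, variable_per_unit, activity, observed_min, observed_max):
    """Return (predicted_cost, in_range).

    in_range is False when `activity` lies outside the data the model
    was fitted on, i.e. when the prediction is an extrapolation.
    """
    in_range = observed_min <= activity <= observed_max
    return fixed + variable_per_unit * activity, in_range


# Coefficients as printed in the notes; range assumed to be 200-568 hours.
for hours in (300, 600):
    cost, ok = predict_cost(64.72, 12.52, hours, 200, 568)
    note = "" if ok else "  (outside observed range; treat with caution)"
    print(f"{hours} hours -> {cost:,.2f}{note}")
```

The helper still returns a number for 600 hours; the flag simply forces the caller to acknowledge that the estimate rests on no observed data.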

Linear vs. non-linear data and model choice

Scatter plot helps visually assess whether a linear model is appropriate
If the scatter plot does not resemble a straight line, a non-linear model or a different approach may be necessary
The lecturer notes that, in some cases, non-linear forms can be used (and Excel can handle such models) if the data support it

Outliers and data quality

Outliers: observations that lie far from the line of best fit
- Example described: a data point at 400 repair hours with a very different cost than other 400-hour observations
- Outliers may indicate errors in data entry or unusual events in that month
- Decision point: determine whether the data point is representative; if not, decide whether to exclude it because it can distort the estimation method
When focusing on high-low estimation, the method is particularly sensitive to extremes/outliers
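The 400-hour example suggests a simple screening step: compare each observation with the typical cost at the same activity level. The sketch below does exactly that and nothing more. It is a same-activity comparison, not a residual-based statistical test, and the function name `flag_outliers`, the 20% tolerance, and the data are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def flag_outliers(observations, tolerance=0.20):
    """Flag (activity, cost) pairs whose cost differs from the median cost
    at the same activity level by more than `tolerance` (a fraction)."""
    by_activity = defaultdict(list)
    for activity, cost in observations:
        by_activity[activity].append(cost)

    flagged = []
    for activity, cost in observations:
        typical = median(by_activity[activity])
        # Skip the ratio when the typical cost is zero to avoid dividing by zero.
        if typical and abs(cost - typical) / abs(typical) > tolerance:
            flagged.append((activity, cost))
    return flagged


# Hypothetical months: one 400-hour month has an unusually high cost.
months = [
    (300, 10100), (300, 9950),
    (400, 10900), (400, 11050), (400, 15500),
    (500, 12300),
]
print(flag_outliers(months))
```

A flagged month is a prompt to investigate (data-entry error, one-off event), not an automatic exclusion. Activity levels that appear only once can never be flagged by this check, so it complements rather than replaces a look at the scatter plot.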

Practical implications and takeaways

Use cost behavior to forecast costs under different activity levels
Regression provides a more reliable estimate than high-low because it uses all data points and provides measures of fit
High-low offers a quick check or starting point for estimation but should be used with caution
Always examine data for linearity and outliers before choosing a method
Be mindful of extrapolation risks; predictions beyond the observed data range are less reliable
Treat every estimate as a best guess based on historical data and the chosen model; real-world factors can cause actual costs to deviate

Key takeaways summarized

Costs can be categorized as fixed, variable, or mixed; mixed costs combine a fixed base with a variable component
The general cost equation is C = F + v \times x where
- F is the fixed cost
- v is the variable cost per unit of activity
- x is the level of activity
High-Low method (two-point estimate) steps: identify high/low activity, compute v = \frac{Y_\text{high} - Y_\text{low}}{X_\text{high} - X_\text{low}}, compute F = Y_i - v \times X_i for either point, and form C = F + v \times x
Regression (best-fit line) uses all data points to estimate the cost function; interpret intercept and slope, assess fit with R, R^2, t-stats, and p-values; beware extrapolation beyond the data range
Real-world examples include rent (fixed), hourly wages (variable), and subscriptions (fixed)
Outliers require careful handling, as they can distort both high-low and regression results