Excel for Z-Scores and Normal Distribution Percentages
Histograms and Normal Distribution
Creating a Histogram in Excel
- Select the entire data column (e.g., 'RIC rating data').
- Go to the Insert tab, select Statistical Charts, and choose Histogram.
- Formatting Bins: The number of bins can be adjusted for better visualization. For example, changing the default number of bins can give a clearer sense of the data's distribution.
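To see how the bin count changes the picture, the following Python sketch counts values into equal-width bins. It is only an illustration: the ratings are made up (they are not the RIC data), the bin counts of 4 and 8 are arbitrary, and `bin_counts` is a hypothetical helper rather than a reproduction of Excel's binning rules.

```python
def bin_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins.

    Assumes the values are not all identical (otherwise the bin width is 0).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value belongs to the last bin, not to a bin past the end.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical ratings, used only for illustration.
ratings = [50, 58, 62, 65, 66, 68, 69, 70, 70, 71, 72, 73, 75, 78, 82, 89]

for n in (4, 8):
    print(n, "bins:", bin_counts(ratings, n))
```

With few bins the single peak is easy to see; with more bins the shape shows more detail but also more noise.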
Interpreting the Histogram
- When examining the histogram for the 'RIC rating data', it appears unimodal (having one peak) and roughly symmetric.
- Though not perfectly normal, it is relatively close to a normal distribution.
- Significance: Identifying a distribution as roughly normal is crucial because it allows for the application of statistical concepts like Z-scores.
Calculating Z-Scores
Definition of a Z-Score: A Z-score (also known as a standard score) quantifies how many standard deviations an observation or data point is above or below the mean of a distribution.
- A positive Z-score indicates the value is above the mean.
- A negative Z-score indicates the value is below the mean.
- A Z-score of 0 means the value is equal to the mean.
Prerequisites for Z-Score Calculation: To calculate Z-scores, two key descriptive statistics are needed from the distribution:
- Mean ($\mu$): Calculated using the Excel function `=AVERAGE(column_range)`. For the 'RIC rating data' in column G, this would be `=AVERAGE(G:G)`. The calculated mean was approximately 70.22.
- Standard Deviation ($\sigma$): Calculated using the Excel function `=STDEV.P(column_range)`, which returns the population standard deviation. For the 'RIC rating data', it would be `=STDEV.P(G:G)`.
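The choice of the population standard deviation matters, so here is a small Python sketch comparing it with the sample version. The ratings are made up for illustration, and the mapping to `STDEV.S` (Excel's sample standard deviation, which these notes do not use) is included only for contrast.

```python
import statistics

# Hypothetical ratings, used only for illustration.
ratings = [50, 62, 68, 70, 71, 73, 78, 89]

mean = statistics.mean(ratings)              # comparable to =AVERAGE(G:G)
sd_population = statistics.pstdev(ratings)   # comparable to =STDEV.P(G:G), divides by n
sd_sample = statistics.stdev(ratings)        # comparable to =STDEV.S(G:G), divides by n - 1

print("mean:", mean)
print("population SD:", round(sd_population, 3))
print("sample SD:", round(sd_sample, 3))
```

The sample version is always slightly larger for the same data, so mixing the two would shift every Z-score.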
Z-Score Formula: The mathematical formula for a Z-score is:
$$Z = \frac{x - \mu}{\sigma}$$
where:
- $Z$ is the Z-score.
- $x$ is the individual data point (raw score).
- $\mu$ (mu) is the population mean.
- $\sigma$ (sigma) is the population standard deviation.
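The formula can be sketched in Python as follows. The ratings are made up for illustration and `z_scores` is a hypothetical helper name; it uses the population standard deviation, matching the definition of $\sigma$ above.

```python
import statistics


def z_scores(values):
    """Return the Z-score of every value, using the population mean and SD."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]


# Hypothetical ratings, used only for illustration.
ratings = [50, 62, 68, 70, 71, 73, 78, 89]
zs = z_scores(ratings)

for x, z in zip(ratings, zs):
    print(x, round(z, 3))
```

A useful sanity check is that the Z-scores of a whole dataset have a mean of 0 and a population standard deviation of 1.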
Implementing Z-Score Calculation in Excel
- For a raw score in cell `G2`, with the mean in `K1` and the standard deviation in `K2`, the initial formula would be `=(G2 - K1) / K2`.
- Anticipating the Z-score: Before pressing Enter, mentally estimate the Z-score from the raw score, the mean, and the standard deviation. In the worked example:
- The raw score is above the mean, but not by much.
- The difference is significantly less than one standard deviation.
- Therefore, the Z-score is expected to be positive but small. Compare the calculated Z-score against this estimate.
Excel Formula Dragging and Absolute References
- Issue: When an Excel formula like `=(G2 - K1) / K2` is dragged down, Excel's default behavior is to update every reference relative to the new row. In the next row, `K1` becomes `K2` and `K2` becomes `K3`, so the formula no longer points at the mean and standard deviation. If the shifted cells are empty, the division produces a `#DIV/0!` (divide by zero) error; if they contain text, the result is a `#VALUE!` error.
- Solution: Absolute References: To prevent specific cell references from changing when a formula is dragged, use dollar signs (`$`) to make them absolute references.
- The formula becomes `=(G2 - $K$1) / $K$2`.
- `G2` (the raw score) remains a relative reference so it updates to `G3`, `G4`, etc., as the formula is dragged down the column.
- `$K$1` (mean) and `$K$2` (standard deviation) are absolute references, ensuring they always point to the cells containing the mean and standard deviation, respectively.
- Automatic Fill: Once the formula with absolute references is correct, it can be dragged down manually, or filled automatically for the entire data range by double-clicking the fill handle (the small green square at the bottom-right corner of the cell).
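The difference between relative and absolute references can be illustrated by generating the formula text that each row would contain after a fill-down. This is a simplified Python simulation for this one formula, not a general model of Excel; `fill_down` is a hypothetical helper.

```python
def fill_down(first_row, last_row, absolute):
    """Return the formula text each row would hold after filling down."""
    formulas = []
    for row in range(first_row, last_row + 1):
        offset = row - first_row
        if absolute:
            # $K$1 and $K$2 stay fixed; only the raw-score reference moves.
            formulas.append(f"=(G{row} - $K$1) / $K$2")
        else:
            # Relative references all shift by the same number of rows.
            formulas.append(f"=(G{row} - K{1 + offset}) / K{2 + offset}")
    return formulas


print("Relative references:")
for formula in fill_down(2, 4, absolute=False):
    print(" ", formula)

print("Absolute references:")
for formula in fill_down(2, 4, absolute=True):
    print(" ", formula)
```

In the relative version, the third row already divides by `K4`, which is why the errors appear as soon as the formula leaves the first row.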
Validating Z-Scores by Sorting
- To verify the calculated Z-scores, the entire dataset (including raw scores and Z-scores) can be sorted.
- Sort Process: Select the data, go to Sort & Filter, choose Custom Sort, and sort by the raw score (e.g., 'RIC rate value') from smallest to largest.
- Observations:
- Scores significantly below the mean (e.g., 50) will have large negative Z-scores (about -2.677).
- Scores near the mean (e.g., 70.22) will have Z-scores very close to zero.
- Scores significantly above the mean (e.g., 89) will have large positive Z-scores (about 2.475).
- This validation confirms the logical consistency of the calculated Z-scores with the raw data.
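The same check can be sketched in Python: sort the rows by raw score and confirm that the Z-scores rise with them. The ratings below are made up for illustration and are deliberately unsorted.

```python
import statistics

# Hypothetical, unsorted ratings used only for illustration.
ratings = [71, 50, 89, 70, 62, 78, 68, 73]

mu = statistics.mean(ratings)
sigma = statistics.pstdev(ratings)

# Pair each raw score with its Z-score, then sort by raw score.
rows = sorted((x, (x - mu) / sigma) for x in ratings)
z_sorted = [z for _, z in rows]

# Z is a linear function of the raw score with a positive slope,
# so sorting by raw score must also put the Z-scores in ascending order.
is_monotonic = all(a <= b for a, b in zip(z_sorted, z_sorted[1:]))

print("lowest:", rows[0])
print("highest:", rows[-1])
print("Z-scores ascend with raw scores:", is_monotonic)
```

If the Z-scores were not in ascending order after sorting, that would point to a broken reference somewhere in the column.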
Calculating Percentages in a Normal Distribution Using Excel
Introduction: Excel can precisely calculate the percentage of a normal distribution falling above or below a specific Z-score, replacing manual estimation methods.
Percentage Below a Z-Score
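- Excel Function: The function used is `NORM.S.DIST(Z, cumulative)`.
- `Z`: The Z-score for which you want to find the percentage below (e.g., `H2` for the Z-score in cell H2).
- `cumulative`: This parameter should always be set to `TRUE` for calculating percentages below or above. `TRUE` refers to the cumulative distribution function, while `FALSE` refers to the probability density function (which is not what's needed for these percentage calculations).
- Example: For a Z-score of -2.677, the formula would be `=NORM.S.DIST(H2, TRUE)`.
- Estimation: A Z-score of -2.677 lies far in the left tail, so the percentage below should be very small; Excel returns about 0.0037, or 0.37%. For a Z-score of about -1, the 34%/14%/2% breakdown of the normal curve suggests roughly 16% below (14% + 2%); Excel returns about 0.1586, or 15.86%.
Percentage Above a Z-Score
- The total area under the normal curve is 100% (or 1 as a proportion), so the percentage above a Z-score is the total minus the percentage below it.
- Excel Formula: `=1 - NORM.S.DIST(Z, TRUE)`.
- Example: To find the percentage above the Z-score in `H2`, the formula would be `=1 - NORM.S.DIST(H2, TRUE)`.
- Validation: For a very low Z-score (e.g., -2.677, with about 0.37% below), the result is 1 - 0.0037 = 0.9963, or 99.63% above. For a Z-score of 0, 50% lies above. For a Z-score of about -0.03, roughly 48.5% lies below and 51.5% above, fitting expectations for a symmetric normal distribution.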
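For readers who want to cross-check the worksheet values outside Excel, the two calculations can be sketched with Python's standard library. `percent_below` and `percent_above` are hypothetical helper names; `NormalDist().cdf` plays the role of `NORM.S.DIST(Z, TRUE)` here.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def percent_below(z):
    """Proportion of the distribution below z, like =NORM.S.DIST(z, TRUE)."""
    return standard_normal.cdf(z)


def percent_above(z):
    """Proportion of the distribution above z, like =1 - NORM.S.DIST(z, TRUE)."""
    return 1 - standard_normal.cdf(z)


for z in (-2.677, -1, 0):
    print(z, f"below={percent_below(z):.4f}", f"above={percent_above(z):.4f}")
```

The two proportions always add up to 1, which is the same relationship the `=1 - NORM.S.DIST(Z, TRUE)` formula relies on.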
Formatting Percentages in Excel
- After calculating the percentages, select the cells, right-click (or use the 'Number' format section on the Home tab), and choose the `Percentage` format.
- Adjust the number of decimal places for clarity if desired.
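Percentage formatting changes only how a proportion is displayed, not the stored value. A rough Python analogue is shown below; `as_percent` is a hypothetical helper and the proportions are example values.

```python
def as_percent(proportion, decimals=2):
    """Format a proportion (0 to 1) as a percentage string."""
    return f"{proportion:.{decimals}%}"


proportion = 0.1586  # example proportion; the stored value is unchanged

print(as_percent(proportion))        # two decimal places
print(as_percent(proportion, 0))     # no decimal places
print(proportion)                    # the underlying number is still a proportion
```

Fewer decimal places are easier to read, but very small tail percentages can round to 0% if too few are shown.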