Excel for Z-Scores and Normal Distribution Percentages
Histograms and Normal Distribution
Creating a Histogram in Excel
- Select the entire data column (e.g., 'RIC rating data').
- Go to the Insert tab, select Statistical Charts, and choose Histogram.
- Formatting Bins: The number of bins can be adjusted for better visualization. For example, changing the default number of bins can provide a clearer sense of the data's distribution.
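To see why the bin count changes the picture, the following minimal Python sketch counts the same values into different numbers of equal-width bins. The `ratings` list and the `bin_counts` helper are hypothetical stand-ins, not the actual data set, and Excel's own rule for bin edges may differ slightly from the one assumed here.

```python
def bin_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        if width == 0:
            index = 0  # all values identical: everything lands in the first bin
        else:
            # The maximum value would compute to index num_bins, so clamp it
            # into the last bin.
            index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical ratings, used only for illustration.
ratings = [50, 62, 68, 70, 71, 73, 74, 75, 76, 78, 80, 85, 92]

for bins in (3, 6):
    print(f"{bins} bins: {bin_counts(ratings, bins)}")
```

With few bins the single peak is visible but coarse; with more bins the shape of the tails becomes easier to judge.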
Interpreting the Histogram
- When examining the histogram for the 'RIC rating data', it appears unimodal (having one peak) and roughly symmetric.
- Though not perfectly normal, it is relatively close to a normal distribution.
- Significance: Identifying a distribution as roughly normal is crucial because it allows for the application of statistical concepts like Z-scores.
Calculating Z-Scores
Definition of a Z-Score: A Z-score (also known as a standard score) quantifies how many standard deviations an observation or data point is above or below the mean of a distribution.
- A positive Z-score indicates the value is above the mean.
- A negative Z-score indicates the value is below the mean.
- A Z-score of 0 means the value is equal to the mean.
Prerequisites for Z-Score Calculation: To calculate Z-scores, two key descriptive statistics are needed from the distribution:
- Mean ($\mu$): Calculated using the Excel function `=AVERAGE(column_range)`. For the 'RIC rating data' in column G, this would be `=AVERAGE(G:G)`. The calculated mean was approximately .
- Standard Deviation ($\sigma$): Calculated using the Excel function `=STDEV.P(column_range)`. This is specifically for the population standard deviation. For the 'RIC rating data', it would be `=STDEV.P(G:G)`. The calculated standard deviation was approximately .
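The two descriptive statistics can be cross-checked outside Excel. The sketch below uses Python's standard `statistics` module on a small hypothetical list of scores (not the actual data): `statistics.mean` plays the role of `=AVERAGE(...)`, `statistics.pstdev` the role of `=STDEV.P(...)`, and `statistics.stdev` is the sample version, comparable to Excel's `=STDEV.S(...)`.

```python
import statistics

# Hypothetical scores standing in for column G.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)            # counterpart of =AVERAGE(G:G)
population_sd = statistics.pstdev(scores)  # counterpart of =STDEV.P(G:G), divides by n
sample_sd = statistics.stdev(scores)       # counterpart of =STDEV.S(G:G), divides by n - 1

print(f"mean = {mean}")
print(f"population SD = {population_sd}")
print(f"sample SD = {sample_sd:.4f}")
```

The sample standard deviation is always slightly larger than the population one for the same data, which is why the choice of function matters when Z-scores are computed.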
Z-Score Formula: The mathematical formula for a Z-score is:
$$z = \frac{x - \mu}{\sigma}$$
where:
- $z$ is the Z-score.
- $x$ is the individual data point (raw score).
- $\mu$ (mu) is the population mean.
- $\sigma$ (sigma) is the population standard deviation.
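As a minimal sketch of the definition, the function below subtracts the mean from a raw score and divides by the population standard deviation. The function name `z_score` and the sample `scores` are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import statistics


def z_score(x, mu, sigma):
    """Return (x - mu) / sigma: how many standard deviations x lies from mu."""
    if sigma == 0:
        raise ValueError("standard deviation must be non-zero")
    return (x - mu) / sigma


# Hypothetical scores with mean 5 and population standard deviation 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
mu = statistics.mean(scores)
sigma = statistics.pstdev(scores)

z_scores = [z_score(x, mu, sigma) for x in scores]
print(z_scores)
```

Scores below the mean come out negative, scores equal to the mean come out as zero, and scores above the mean come out positive.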
Implementing Z-Score Calculation in Excel
- For a raw score in cell `G2`, with the mean in `K1` and standard deviation in `K2`, the initial formula would be `=(G2 - K1) / K2`.
- Anticipating the Z-score: Before hitting enter, mentally estimate the Z-score from the raw score $x$, the mean $\mu$, and the standard deviation $\sigma$:
- $x$ is above the mean, but not by much ($x - \mu$ is small).
- That difference is significantly less than one standard deviation ($x - \mu < \sigma$).
- Therefore, the Z-score is expected to be positive but small, well below 1. The actual calculated Z-score for $x$ was .
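The same mental estimate can be written out as a short worked example. The numbers here are hypothetical (a raw score of 75, a mean of 73, and a standard deviation of 10) and are not the values from the data set:

$$x - \mu = 75 - 73 = 2, \qquad z = \frac{x - \mu}{\sigma} = \frac{2}{10} = 0.2$$

Because the difference of 2 is much smaller than one standard deviation of 10, a small positive Z-score is exactly what the estimate predicts.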
Excel Formula Dragging and Absolute References
- Issue: When an Excel formula like `=(G2 - K1) / K2` is dragged down, Excel's default behavior is to update row numbers relatively. This means `K1` would become `K2`, and `K2` would become `K3`, leading to a `#DIV/0!` (divide by zero) error as soon as the divisor points at an empty cell (a cell containing text would produce `#VALUE!` instead).
- Solution: Absolute References: To prevent specific cell references from changing when a formula is dragged, use dollar signs (`$`) to make them absolute references.
- The formula becomes `=(G2 - $K$1) / $K$2`.
- `G2` (the raw score) remains a relative reference so it updates to `G3`, `G4`, etc., as the formula is dragged down the column.
- `$K$1` (mean) and `$K$2` (standard deviation) are absolute references, ensuring they always point to the cells containing the mean and standard deviation, respectively.
- Automatic Fill: Once the formula with absolute references is correct, it can be dragged down manually, or filled automatically for the entire data range by double-clicking the small green square (the fill handle) at the bottom right corner of the cell.
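The effect of relative versus absolute references can be imitated with a small simulation. This is only a sketch of the idea, not how Excel works internally: the `cells` dictionary is a hypothetical sheet, and `fill_down` is a hypothetical helper that shifts the `K` references by the same number of rows as the formula unless they are treated as absolute.

```python
# Hypothetical sheet: raw scores in G2..G4, mean in K1, standard deviation in K2.
cells = {"G2": 7, "G3": 9, "G4": 2, "K1": 5, "K2": 2}


def value(ref):
    """Read a cell, treating an empty cell as 0, as Excel arithmetic does."""
    return cells.get(ref, 0)


def fill_down(row, absolute):
    """Evaluate the row-2 formula as it would read after being dragged to `row`."""
    offset = row - 2
    mean_ref = "K1" if absolute else f"K{1 + offset}"
    sd_ref = "K2" if absolute else f"K{2 + offset}"
    try:
        return (value(f"G{row}") - value(mean_ref)) / value(sd_ref)
    except ZeroDivisionError:
        return "#DIV/0!"


for row in (2, 3, 4):
    print(row, fill_down(row, absolute=False), fill_down(row, absolute=True))
```

In row 2 both versions agree. From row 3 onward the relative version divides by an empty cell, while the absolute version keeps using the mean and standard deviation from `K1` and `K2`.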
Validating Z-Scores by Sorting
- To verify the calculated Z-scores, the entire dataset (including raw scores and Z-scores) can be sorted.
- Sort Process: Select the data, go to Sort & Filter, choose Custom Sort, and sort by the raw score (e.g., 'RIC rate value') from smallest to largest.
- Observations:
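- Scores significantly below the mean (e.g., a raw score of 50) have strongly negative Z-scores (here $-2.677$).
- Percentage below: for a Z-score of $-2.677$, the proportion of a normal distribution falling below it is about $0.0037$, or $0.37\%$.
- For a Z-score of $-1$, about $16\%$ of the distribution lies below (the $14\%$ and $2\%$ bands of the empirical rule, beneath the $34\%$ band between $-1$ and $0$); the computed proportion is $0.1586$, or $15.86\%$.
- Percentage above: subtract the percentage below from $100\%$ (a proportion of $1$). For a Z-score of $-2.677$, $1 - 0.0037 = 0.9963$, or $99.63\%$.
- A Z-score of $0$ has $50\%$ of the distribution below it. A Z-score of about $-0.03$ has roughly $48.5\%$ below and $51.5\%$ above, fitting expectations for a symmetric normal distribution.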
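The percentages quoted for Z-scores such as $-2.677$, $-1$, and $0$ can be checked with the standard normal cumulative distribution. The sketch below uses `statistics.NormalDist` from Python's standard library; in Excel, `NORM.S.DIST(z, TRUE)` is the comparable function, though it is an assumption that this is the function used to produce the figures in these notes. The helper names `proportion_below` and `proportion_above` are hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def proportion_below(z):
    """Proportion of a normal distribution lying below the given Z-score."""
    return standard_normal.cdf(z)


def proportion_above(z):
    """Proportion lying above: the complement of the proportion below."""
    return 1 - proportion_below(z)


for z in (-2.677, -1, 0):
    below = proportion_below(z)
    above = proportion_above(z)
    print(f"z = {z}: below {below:.2%}, above {above:.2%}")
```

The below and above proportions always sum to 1, which is why the percentage above can be obtained by subtracting the percentage below from 100%.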
Formatting Percentages in Excel
- After calculating the percentages, select the cells, right-click (or use the 'Number' format section on the Home tab), and choose the `Percentage` format.
- Adjust the number of decimal places for clarity if desired.
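Percentage formatting only changes how a proportion is displayed, not the stored value. As a rough analogy, Python's `%` format specifier does the same thing: it multiplies by 100, appends a percent sign, and rounds to the requested number of decimal places. The proportions below are the ones quoted in these notes.

```python
# Proportions quoted in these notes, stored as plain decimals.
proportions = [0.0037, 0.1586, 0.9963]

# Two decimal places, comparable to a Percentage format with 2 decimals.
two_decimals = [f"{p:.2%}" for p in proportions]

# Zero decimal places: the display is coarser, the stored values are unchanged.
no_decimals = [f"{p:.0%}" for p in proportions]

print(two_decimals)
print(no_decimals)
```

With no decimal places a small proportion such as 0.0037 displays as 0%, which is why adjusting the decimal places can matter for values in the tails.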