Excel for Z-Scores and Normal Distribution Percentages

Histograms and Normal Distribution

  • Creating a Histogram in Excel

    • Select the entire data column (e.g., 'RIC rating data').
    • Go to Insert tab, then select Statistical Charts, and choose Histogram.
    • Formatting Bins: The number of bins can be adjusted for better visualization. For example, changing the default to 1515 bins can provide a clearer sense of the data's distribution.
  • Interpreting the Histogram

    • When examining the histogram for the 'RIC rating data', it appears unimodal (having one peak) and roughly symmetric.
    • Though not perfectly normal, it is relatively close to a normal distribution.
    • Significance: Identifying a distribution as roughly normal is crucial because it allows for the application of statistical concepts like Z-scores.

Calculating Z-Scores

  • Definition of a Z-Score: A Z-score (also known as a standard score) quantifies how many standard deviations an observation or data point is above or below the mean of a distribution.

    • A positive Z-score indicates the value is above the mean.
    • A negative Z-score indicates the value is below the mean.
    • A Z-score of 00 means the value is equal to the mean.
  • Prerequisites for Z-Score Calculation: To calculate Z-scores, two key descriptive statistics are needed from the distribution:

    • Mean (μ\mu): Calculated using the Excel function =AVERAGE(column_range). For the 'Rick rate data' in column G, this would be =AVERAGE(G:G). The calculated mean was approximately 70.2270.22.
    • Standard Deviation (σ\sigma): Calculated using the Excel function =STDEV.P(column_range). This is specifically for the population standard deviation. For the 'Rick rate data', it would be =STDEV.P(G:G). The calculated standard deviation was approximately 7.57.5.
  • Z-Score Formula: The mathematical formula for a Z-score is:
    Z=XμσZ = \frac{X - \mu}{\sigma}
    where:

    • ZZ is the Z-score.
    • XX is the individual data point (raw score).
    • μ\mu (mu) is the population mean.
    • σ\sigma (sigma) is the population standard deviation.
  • Implementing Z-Score Calculation in Excel

    • For a raw score of 7272 in cell G2, with the mean in K1 and standard deviation in K2, the initial formula would be =(G2 - K1) / K2.
    • Anticipating the Z-score: Before hitting enter, mentally estimate the Z-score. For X=72X = 72, μ=70.22\mu = 70.22, and σ=7.5\sigma = 7.5:
      • 7272 is above the mean, but not by much (7270.22=1.7872 - 70.22 = 1.78).
      • 1.781.78 is significantly less than one standard deviation (7.57.5).
      • Therefore, the Z-score is expected to be positive but small, likely around 0.20.2 to 0.30.3. The actual calculated Z-score for 7272 was 0.2290.229.
  • Excel Formula Dragging and Absolute References

    • Issue: When an Excel formula like =(G2 - K1) / K2 is dragged down, Excel's default behavior is to update row values relatively. This means K1 would become K2, and K2 would become K3, leading to a DIV/0 (divide by zero) error if these cells become empty or non-numeric.
    • Solution: Absolute References: To prevent specific cell references from changing when a formula is dragged, use dollar signs () to make them absolute references.
      • The formula becomes =(G2 - $K$1) / $K$2.
      • G2 (the raw score) remains a relative reference so it updates to G3, G4, etc., as the formula is dragged down the column.
      • $K$1 (mean) and $K$2 (standard deviation) are absolute references, ensuring they always point to the correct cells containing the mean and standard deviation, respectively.
    • Automatic Fill: Once the formula with absolute references is correct, it can be dragged down manually or by double-clicking the small green square at the bottom right corner of the cell to automatically fill the formula for the entire data range.
  • Validating Z-Scores by Sorting

    • To verify the calculated Z-scores, the entire dataset (including raw scores and Z-scores) can be sorted.
    • Sort Process: Select the data, go to Sort & Filter, choose Custom Sort, and sort by the raw score (e.g., 'RIC rate value') from smallest to largest.
    • Observations:
      • Scores significantly below the mean (e.g., 50)willhavelargenegativeZscores(e.g.,) will have large negative Z-scores (e.g.,-2.67belowthemeanofbelow the mean of70.22).
      • Scores near the mean (e.g., 70.22) will have Z-scores very close to zero.
      • Scores significantly above the mean (e.g., 89)willhavelargepositiveZscores(e.g.,) will have large positive Z-scores (e.g.,2.475).
    • This validation confirms the logical consistency of the calculated Z-scores with the raw data.

Calculating Percentages in a Normal Distribution Using Excel

  • Introduction: Excel can precisely calculate the percentage of a normal distribution falling above or below a specific Z-score, replacing manual estimation methods.

  • Percentage Below a Z-Score

    • Excel Function: The function used is NORM.S.DIST(Z, cumulative).
      • Z: The Z-score for which you want to find the percentage below (e.g., H2 for the Z-score in cell H2).
      • cumulative: This parameter should always be set to TRUE for calculating percentages below or above. TRUE refers to the cumulative distribution function, while FALSE refers to the probability density function (which is not what's needed for these percentage calculations).
    • Example: For a Z-score of -2.677, the formula would be =NORM.S.DIST(H2, TRUE).
    • Estimation: For a Z-score of -2.677,whichisfarinthelowertailofthedistribution,thepercentagebelowitisexpectedtobeverysmall,perhapsaround, which is far in the lower tail of the distribution, the percentage below it is expected to be very small, perhaps around0.5% to1%.</li>\n<li><strong>Result</strong>: The function returns a value like0.0037,whichmeans, which means0.37% (a very small percentage), confirming the estimation.</li>\n<li><strong>Another Example</strong>: For a Z-score of-1,estimationsuggestsapproximately, estimation suggests approximately16% (derived from the34\%,,14\%,,2\%rulefornormaldistributions).<code>NORM.S.DIST(1,TRUE)</code>indeedreturnsrule for normal distributions). <code>NORM.S.DIST(-1, TRUE)</code> indeed returns0.1586,or, or15.86%.</li>\n<li><strong>Note</strong>: This <code>NORM.S.DIST</code> function <em>always</em> returns the percentage of the distribution <em>below</em> the given Z-score.</li></ul></li>\n<li><p><strong>Percentage Above a Z-Score</strong></p>\n<ul>\n<li><strong>Principle</strong>: If the percentage below a Z-score is known, the percentage above can be easily calculated, as the sum of the percentage below and the percentage above must equal100\%(or(or1 as a proportion).
    • Excel Formula: =1 - NORM.S.DIST(Z, TRUE).
    • Example: To find the percentage above the Z-score in H2, the formula would be =1 - NORM.S.DIST(H2, TRUE).
    • Validation: For a very low Z-score (e.g., -2.677),theresaverysmallpercentagebelowit(), there's a very small percentage below it (0.37%), meaning a very large percentage above it (1 - 0.0037 = 0.9963oror99.63%). This makes logical sense.</li>\n<li>For Z-scores near the mean (around0),boththepercentbelowandpercentaboveshouldbecloseto), both the 'percent below' and 'percent above' should be close to50\%.Forexample,aZscoreof. For example, a Z-score of-0.03hashas48.5\%belowandbelow and51.5\%$$ above, fitting expectations for a symmetric normal distribution.
  • Formatting Percentages in Excel

    • After calculating the percentages, select the cells, right-click (or use the 'Number' format section on the Home tab), and choose Percentage format.
    • Adjust the number of decimal places for clarity if desired.