Excel for Z-Scores and Normal Distribution Percentages

Histograms and Normal Distribution

  • Creating a Histogram in Excel

    • Select the entire data column (e.g., 'RIC rating data').
    • Go to Insert tab, then select Statistical Charts, and choose Histogram.
    • Formatting Bins: The number of bins can be adjusted for better visualization. For example, changing the default to 1515 bins can provide a clearer sense of the data's distribution.
  • Interpreting the Histogram

    • When examining the histogram for the 'RIC rating data', it appears unimodal (having one peak) and roughly symmetric.
    • Though not perfectly normal, it is relatively close to a normal distribution.
    • Significance: Identifying a distribution as roughly normal is crucial because it allows for the application of statistical concepts like Z-scores.

Calculating Z-Scores

  • Definition of a Z-Score: A Z-score (also known as a standard score) quantifies how many standard deviations an observation or data point is above or below the mean of a distribution.

    • A positive Z-score indicates the value is above the mean.
    • A negative Z-score indicates the value is below the mean.
    • A Z-score of 00 means the value is equal to the mean.
  • Prerequisites for Z-Score Calculation: To calculate Z-scores, two key descriptive statistics are needed from the distribution:

    • Mean (μ\mu): Calculated using the Excel function =AVERAGE(column_range). For the 'Rick rate data' in column G, this would be =AVERAGE(G:G). The calculated mean was approximately 70.2270.22.
    • Standard Deviation (σ\sigma): Calculated using the Excel function =STDEV.P(column_range). This is specifically for the population standard deviation. For the 'Rick rate data', it would be =STDEV.P(G:G). The calculated standard deviation was approximately 7.57.5.
  • Z-Score Formula: The mathematical formula for a Z-score is:
    Z=XμσZ = \frac{X - \mu}{\sigma}
    where:

    • ZZ is the Z-score.
    • XX is the individual data point (raw score).
    • μ\mu (mu) is the population mean.
    • σ\sigma (sigma) is the population standard deviation.
  • Implementing Z-Score Calculation in Excel

    • For a raw score of 7272 in cell G2, with the mean in K1 and standard deviation in K2, the initial formula would be =(G2 - K1) / K2.
    • Anticipating the Z-score: Before hitting enter, mentally estimate the Z-score. For X=72X = 72, μ=70.22\mu = 70.22, and σ=7.5\sigma = 7.5:
      • 7272 is above the mean, but not by much (7270.22=1.7872 - 70.22 = 1.78).
      • 1.781.78 is significantly less than one standard deviation (7.57.5).
      • Therefore, the Z-score is expected to be positive but small, likely around 0.20.2 to 0.30.3. The actual calculated Z-score for 7272 was 0.2290.229.
  • Excel Formula Dragging and Absolute References

    • Issue: When an Excel formula like =(G2 - K1) / K2 is dragged down, Excel's default behavior is to update row values relatively. This means K1 would become K2, and K2 would become K3, leading to a DIV/0 (divide by zero) error if these cells become empty or non-numeric.
    • Solution: Absolute References: To prevent specific cell references from changing when a formula is dragged, use dollar signs () to make them absolute references.
      • The formula becomes =(G2 - $K$1) / $K$2.
      • G2 (the raw score) remains a relative reference so it updates to G3, G4, etc., as the formula is dragged down the column.
      • $K$1 (mean) and $K$2 (standard deviation) are absolute references, ensuring they always point to the correct cells containing the mean and standard deviation, respectively.
    • Automatic Fill: Once the formula with absolute references is correct, it can be dragged down manually or by double-clicking the small green square at the bottom right corner of the cell to automatically fill the formula for the entire data range.
  • Validating Z-Scores by Sorting

    • To verify the calculated Z-scores, the entire dataset (including raw scores and Z-scores) can be sorted.
    • Sort Process: Select the data, go to Sort & Filter, choose Custom Sort, and sort by the raw score (e.g., 'RIC rate value') from smallest to largest.
    • Observations:
      • Scores significantly below the mean (e.g., 50)willhavelargenegativeZscores(e.g.,) will have large negative Z-scores (e.g.,-2.67belowthemeanofbelow the mean of70.22).</li><li>Scoresnearthemean(e.g.,).</li> <li>Scores near the mean (e.g.,70.22)willhaveZscoresveryclosetozero.</li><li>Scoressignificantlyabovethemean(e.g.,) will have Z-scores very close to zero.</li> <li>Scores significantly above the mean (e.g.,89)willhavelargepositiveZscores(e.g.,) will have large positive Z-scores (e.g.,2.475).</li></ul></li><li>ThisvalidationconfirmsthelogicalconsistencyofthecalculatedZscoreswiththerawdata.</li></ul></li></ul><h4id="calculatingpercentagesinanormaldistributionusingexcel">CalculatingPercentagesinaNormalDistributionUsingExcel</h4><ul><li><p><strong>Introduction</strong>:ExcelcanpreciselycalculatethepercentageofanormaldistributionfallingaboveorbelowaspecificZscore,replacingmanualestimationmethods.</p></li><li><p><strong>PercentageBelowaZScore</strong></p><ul><li><strong>ExcelFunction</strong>:Thefunctionusedis<code>NORM.S.DIST(Z,cumulative)</code>.<ul><li><code>Z</code>:TheZscoreforwhichyouwanttofindthepercentagebelow(e.g.,<code>H2</code>fortheZscoreincellH2).</li><li><code>cumulative</code>:Thisparametershouldalwaysbesetto<code>TRUE</code>forcalculatingpercentagesbeloworabove.<code>TRUE</code>referstothecumulativedistributionfunction,while<code>FALSE</code>referstotheprobabilitydensityfunction(whichisnotwhatsneededforthesepercentagecalculations).</li></ul></li><li><strong>Example</strong>:ForaZscoreof).</li></ul></li> <li>This validation confirms the logical consistency of the calculated Z-scores with the raw data.</li></ul></li> </ul> <h4 id="calculatingpercentagesinanormaldistributionusingexcel">Calculating Percentages in a Normal Distribution Using Excel</h4> <ul> <li><p><strong>Introduction</strong>: Excel can precisely calculate the percentage of a normal distribution falling above or below a specific Z-score, replacing manual estimation methods.</p></li> <li><p><strong>Percentage Below a Z-Score</strong></p> <ul> <li><strong>Excel Function</strong>: The function used is <code>NORM.S.DIST(Z, cumulative)</code>.<ul> <li><code>Z</code>: The Z-score for which you want to find the percentage below (e.g., <code>H2</code> for the Z-score in cell H2).</li> <li><code>cumulative</code>: This parameter should always be set to <code>TRUE</code> for calculating percentages below or above. <code>TRUE</code> refers to the cumulative distribution function, while <code>FALSE</code> refers to the probability density function (which is not what's needed for these percentage calculations).</li></ul></li> <li><strong>Example</strong>: For a Z-score of-2.677,theformulawouldbe<code>=NORM.S.DIST(H2,TRUE)</code>.</li><li><strong>Estimation</strong>:ForaZscoreof, the formula would be <code>=NORM.S.DIST(H2, TRUE)</code>.</li> <li><strong>Estimation</strong>: For a Z-score of-2.677,whichisfarinthelowertailofthedistribution,thepercentagebelowitisexpectedtobeverysmall,perhapsaround, which is far in the lower tail of the distribution, the percentage below it is expected to be very small, perhaps around0.5% to1<li><strong>Result</strong>:Thefunctionreturnsavaluelike%.</li> <li><strong>Result</strong>: The function returns a value like0.0037,whichmeans, which means0.37<li><strong>AnotherExample</strong>:ForaZscoreof% (a very small percentage), confirming the estimation.</li> <li><strong>Another Example</strong>: For a Z-score of-1,estimationsuggestsapproximately, estimation suggests approximately16% (derived from the34\%,,14\%,,2\%rulefornormaldistributions).<code>NORM.S.DIST(1,TRUE)</code>indeedreturnsrule for normal distributions). <code>NORM.S.DIST(-1, TRUE)</code> indeed returns0.1586,or, or15.86<li><strong>Note</strong>:This<code>NORM.S.DIST</code>function<em>always</em>returnsthepercentageofthedistribution<em>below</em>thegivenZscore.</li></ul></li><li><p><strong>PercentageAboveaZScore</strong></p><ul><li><strong>Principle</strong>:IfthepercentagebelowaZscoreisknown,thepercentageabovecanbeeasilycalculated,asthesumofthepercentagebelowandthepercentageabovemustequal%.</li> <li><strong>Note</strong>: This <code>NORM.S.DIST</code> function <em>always</em> returns the percentage of the distribution <em>below</em> the given Z-score.</li></ul></li> <li><p><strong>Percentage Above a Z-Score</strong></p> <ul> <li><strong>Principle</strong>: If the percentage below a Z-score is known, the percentage above can be easily calculated, as the sum of the percentage below and the percentage above must equal100\%(or(or1asaproportion).</li><li><strong>ExcelFormula</strong>:<code>=1NORM.S.DIST(Z,TRUE)</code>.</li><li><strong>Example</strong>:TofindthepercentageabovetheZscorein<code>H2</code>,theformulawouldbe<code>=1NORM.S.DIST(H2,TRUE)</code>.</li><li><strong>Validation</strong>:ForaverylowZscore(e.g.,as a proportion).</li> <li><strong>Excel Formula</strong>: <code>=1 - NORM.S.DIST(Z, TRUE)</code>.</li> <li><strong>Example</strong>: To find the percentage above the Z-score in <code>H2</code>, the formula would be <code>=1 - NORM.S.DIST(H2, TRUE)</code>.</li> <li><strong>Validation</strong>: For a very low Z-score (e.g.,-2.677),theresaverysmallpercentagebelowit(), there's a very small percentage below it (0.37%), meaning a very large percentage above it (1 - 0.0037 = 0.9963oror99.63<li>ForZscoresnearthemean(around%). This makes logical sense.</li> <li>For Z-scores near the mean (around0),boththepercentbelowandpercentaboveshouldbecloseto), both the 'percent below' and 'percent above' should be close to50\%.Forexample,aZscoreof. For example, a Z-score of-0.03hashas48.5\%belowandbelow and51.5\%$$ above, fitting expectations for a symmetric normal distribution.
    • Formatting Percentages in Excel

      • After calculating the percentages, select the cells, right-click (or use the 'Number' format section on the Home tab), and choose Percentage format.
      • Adjust the number of decimal places for clarity if desired.