Statistics Exam Notes

Introduction to Statistics

  • Overview:

    • Introduction to Statistics

    • Population and Sample

    • Data and Variables

    • Levels of Measurement

    • Introduction to the Data Analysis ToolPak add-in for Excel

    • Reading: Horvath (p 1-15)

  • Statistics: What’s the Point?

    • H. G. Wells (1903) hypothesized: “Statistical thinking would one day be as necessary for good citizenship as the ability to read and write”

    • Consequences of mathematical innumeracy aren't as obvious as illiteracy.

    • Lack of numerical perspective.

    • Misunderstanding of probability.

  • Why Study Statistics?

    • Understanding scientific evidence requires knowledge of statistical procedures.

    • Statistics helps you responsibly evaluate claims made by those who use statistical arguments to influence behavior.

  • Science News Headline: Diet rich in animal protein is associated with a greater risk of early death

    • Journal Reference: Virtanen, et al. Dietary proteins and protein sources and risk of death: the Kuopio Ischaemic Heart Disease Risk Factor Study. The American Journal of Clinical Nutrition, 2019. DOI: 10.1093/ajcn/nqz025

    • Results:

      • Average follow-up of 22.3 years showed 1225 deaths due to disease.

      • Higher total and animal protein intakes had borderline statistically significant associations with increased mortality risk.

        • Multivariable-adjusted HR (95% CI) in the highest compared with the lowest quartile:

          • Total protein intake: 1.17 (0.99, 1.39; P across quartiles = 0.07).

          • Animal protein intake: 1.13 (0.95, 1.35; P = 0.04).

      • Higher animal-to-plant protein ratio (extreme-quartile HR = 1.23; 95% CI: 1.02, 1.49; P-trend = 0.01) and higher meat intake (extreme-quartile HR = 1.23; 95% CI: 1.04, 1.47; P = 0.01) were associated with increased mortality.

      • Association of total protein with mortality was more evident among those with a history of type 2 diabetes, cardiovascular disease, or cancer.

      • Intakes of fish, eggs, dairy, or plant protein sources were not associated with mortality.

  • What is Statistics?

    • The science of the collection, organization, analysis, and interpretation of data.

    • A set of procedures and principles for collecting and organizing data and analyzing information to help people make decisions when faced with uncertainty.

  • Big Picture Goal:

    • To take data from a sample and make conclusions about the population.

  • Purpose of Data:

    • To get necessary information and knowledge.

    • Data + Interpretation = Information + Analysis, Discussion, Inferences = Knowledge

    • "Data" is not "information" unless it is "interpreted"

  • Case Study: How Nimble Are Your Fingers?

    • Manual dexterity test: How many small pieces can you assemble in one minute?

    • Data: 200 students from a large statistics class.

    • Question: Which gender (male = 1) has better manual dexterity?

    • Summarizing the data involves measures of central tendency, variance measures, minimum and maximum scores, and the number of participants.

    • Simple summaries of data tell an interesting story and are easier to digest than large quantities of information.

    • Data are used to make a judgment or decision about a situation. This is what statistics is all about.

  • The Discovery of Knowledge:

    • Asking the right question(s).

    • Collecting useful data, including deciding how much is needed.

    • Summarizing and analyzing data, with the goal of answering the question(s).

    • Making decisions and generalizations based on the observed data.

    • Turning the data and subsequent decisions into new knowledge.

Two Types of Statistics

  • Descriptive Statistics:

    • Organize, describe, and summarize a small dataset.

    • Results obtained represent the entire dataset.

    • Can constitute the first stage of analysis.

    • Often used when researchers begin a new area of investigation.

    • Example: Deaths by Social Class (N=1316).

      • 1st class: SES (67% Men, 3% Women, 38% Total)

      • 2nd class: SES (92% Men, 14% Women, 59% Total)

      • 3rd class: SES (84% Men, 54% Women, 66% Children, 62% Total)

      • Total: SES (82% Men, 26% Women, 48% Children, 62% Total)

  • Inferential Statistics:

    • Conclusions about populations are derived from small (random) samples.

    • Uses a smaller dataset to make estimates and draw conclusions about the greater population (that the sample is drawn from).

    • Can be used to determine cause and effect relationships, test hypotheses, and make predictions.

Population and Sample

  • Population:

    • All of the objects that researchers want to describe or make inferences about.

    • Characteristic of population = “parameter”.

  • Sample:

    • Sub-group of population that researcher believes represents the population.

    • A group of specific size (n=) is selected and measured.

    • Characteristic of sample = “statistic”.

    • Needs to be a good estimate of the population's parameters. The population itself usually cannot be measured (too big).

    • Take a sample - a smaller group - easier to measure (e.g. 72 y10 = sample =all ages).

  • Choosing a Sample from Population:

    • DO NOT DO:

      • Overrepresentation of population (e.g., 20% homeless/40% healthy/40% jobless X - sample).

      • Not a good representation of population = should include all percentages!

      • Experimental designs: manipulation.

      • Non-experimental designs: observation.

    • The best samples for experiments are those which are selected randomly!

      • Random sample means that the sample is unbiased.

      • To achieve this - every element of the population must be equally likely to be selected to the sample group.

      • Selection of one element does not affect the possibility of other elements being selected.
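The random-selection idea above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical population of 1,000 student ID numbers and sampling without replacement; the function name and the seed are illustrative choices, not part of the course material.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; each element is equally likely to be chosen."""
    rng = random.Random(seed)  # the seed only makes this sketch repeatable
    return rng.sample(population, n)

# Hypothetical population: ID numbers for 1,000 students.
population = list(range(1, 1001))
sample = simple_random_sample(population, n=50, seed=1)
print(len(sample), "students selected")
```

Because the draw does not depend on any attribute of the students, no subgroup is favoured by the selection procedure itself.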

  • Structure of Data

    • Observations (= individuals or cases).

    • Variables = observations’ attributes.

    • Data are any recorded observations and are usually numeric.

    • Experimental designs = manipulation = IV to DV.

Variables

  • A characteristic of a person, object, or phenomenon that can change and can be measured.

  • Any observable/measurable property of organisms, objects, or events.

  • Types of Variables:

    • Quantitative (Numerical).

    • Qualitative (Categorical).

  • Quantitative Variables:

    • Numerical data that you can add, subtract, multiply, and divide.

    • Examples: Age (years), Blood pressure (mm Hg), BMI, Pulse (beats per minute), Exercise (hours per week), Coffee drinking (ounces per day).

  • Quantitative Variables: Continuous vs. Discrete

    • Continuous: can theoretically take on any value within a given range (e.g., height=188.99955… cm).

    • Discrete: can only take on certain values (e.g., no. of children in a family, No. of cities).

    • Continuous example: the colour spectrum of a rainbow.

    • Discrete example: travel class (first, business, economy).

Qualitative Variables

  • Binary: Two categories.

    • Examples: Dead/alive, Treatment/placebo, Disease/no disease, Exposed/Unexposed, Heads/Tails, Did you have breakfast in the morning? (Yes/No).

  • More than Two categories

    • Example: Hair color – Blonde, Red-haired, Brown, and so forth.

Classification of Data (Levels of Measurement)

  • Four levels of measurement:

    • Nominal

    • Ordinal

    • Interval

    • Ratio

  • More information is conveyed as one moves from nominal to ratio.
    Scales of measurement can be either qualitative or quantitative.

  • Nominal:

    • Data placed in categories (no ordering).

    • Cannot be quantified.

    • Mutually exclusive (e.g. TYPES).

    • Examples: blood type, type of car owned, gender, colour of paint.

    • Comparisons use counts (numbers) within named categories.

      • KIN (400 students), PSYC (200 students), BIOL (10 students).

        • e.g., which major has the most students? There is no "average" major.

      • Blood types can't be combined; you can only compare categories.

  • Ordinal:

    • Data is ranked.

    • Examples: “Idol” contest, preference (first, second, third), mineral hardness, cancer stages, University ranking, Letter-grades.

    • Used to put data in order; unlike nominal data, the categories can be ranked as well as compared.

    • Example orderings: small to big, high to low, close to far, poor to rich, tall to short, heavy to light.

  • Interval:

    • Equal units of measurement assigned to the attribute.

    • Zero point is arbitrary!

    • Therefore not proportional (or multiplicable).

    • e.g. temperature (F, C): temperature can be below 0 degree Celsius (-10 or -20).

    • A value of zero does not mean "none"; zero is simply another point on the scale.

    • Only addition and subtraction are meaningful.

  • Ratio:

    • Same as interval but zero is absolute or true.

    • Zero indicates an absence of the variable.

    • Therefore, direct comparison can be made.

    • Examples: Age, distance, weight, time, money etc.

    • Zero value means "zero"- nothing.

    • Numerical, and zero means the attribute is absent, so division, multiplication, addition, and subtraction are all meaningful.

Dependent and Independent Variables

  • Dependent Variable (DV)

    • The variable of primary interest (i.e. it is measured).

    • A variable whose changes we wish to study (a response variable).

    • The variable designed to measure the effect of the variation of the independent variable (outcomes).

  • Independent Variable (IV)

    • A variable we believe affects the measurements obtained on the dependent variable; i.e., it is manipulated.

    • A variable whose effects on the dependent variable we wish to study.

    • The variable that the researcher changes within a defined range, to study the effect on the dependent variable (predictors).

  • Variables Example: Vitamin C study

    • Independent variable of daily vitamin C intake can determine the dependent variable of life span.

    • Scientists will manipulate the vitamin C intake in a group of 100 people: 50 people will be given a daily high dose of vitamin C and 50 people will be given a placebo pill over a period of 25 years.

    • The goal is to see if the independent variable of high vitamin C dosage affects the people's life span.

Experimental Control

  • “Ideal” Scenario:

    • To imply causation, the experimenter eliminates the influence of all variables that could affect the DV except the one(s) directly manipulated.

    • All conditions are kept the same for all participants except the effect of the IV.

  • “Reality” Scenario:

    • Impossible to control all variables that could affect the DV.

    • Researchers control the variables they can.

    • Other influences that are not controlled are assumed to be randomized (i.e., we assume the effects are “washed out” if they are “spread out” over the groups).

  • To assist in describing data.

  • In making inferences or generalizations from experimental data (sample) to larger groups (population).

  • In studying causal relationships.

Introduction to Excel

  • Microsoft Excel is a useful spreadsheet software.

  • Use it to enter all sorts of data and perform financial, mathematical or statistical calculations.

Open an Existing Excel Workbook

  • On the File tab, click Open.

Create a New Excel Workbook

  • On the File tab, click New.

  • Click Blank workbook.

  • Excel worksheet.

Analysis ToolPak

  • An Excel add-in program that provides data analysis tools for financial, statistical and engineering data analysis.

Analysis ToolPak add-in

  • On the File tab, click Options.

  • Under Add-ins, select Analysis ToolPak and click on the Go button.

  • Check Analysis ToolPak and click on OK.

  • On the Data tab, in the Analysis group, click Data Analysis.

  • Dialog box appears

    • Select Histogram and click OK to create a Histogram in Excel

Organizing and Displaying Data

  • How to make sense of our data?

    • Good research is based on collecting large amounts of data, which needs to be simplified.

    • Frequency distribution – lists all possible data values or type, and the frequency of occurrence of each one.

    • Meant to organize and describe the data in table form.

    • Use frequency table to construct a frequency histogram (graph).

    • Reveal the pattern of the scores/observations.

  • Types of Frequency Distributions:

    • Ungrouped:

      • Frequency of all the possible data values or items in your dataset.

      • Can be nominal/ordinal categories OR quantitative but small number of single values.

    • Grouped (class intervals):

      • Applies when all “possible data values” would be too many, so data are arranged and separated into groups called class intervals.

      • Each class interval includes a range of data values.

  • Examples of Each Type:

    • Ungrouped: categorical data (blood type, majors, teams) or quantitative data with few possible values (number of kids in a household, number of towns/cities you have lived in, etc.).

    • Grouped (class intervals): annual salary, reaction times for motor tasks, weight, commuting time to York. Usually continuous values (which need a range), but can be discrete with many possible values (e.g., age).

  • Example:

    • Ungrouped Frequency Distribution:

      • Chin-up scores: 7, 15, 14, 9, 8, 13, 12, 15, 8, 12, 9, 9, 10, 13, 11, 10, 12 (N=17).

      • X = 15 Tally marks = II.

      • X = 14 Tally marks = I.

      • X = 13 Tally marks = II.

      • X = 12 Tally marks = III.

      • X = 11 Tally marks = I.

      • X = 10 Tally marks = II.

      • X = 9 Tally marks = III.

      • X = 8 Tally marks = II.

      • X = 7 Tally marks = I.

  • Example:

    • Ungrouped Frequency Distribution w Frequency and Cumulative f.

      • X Frequency Cumulative f:

        • 15 2 17

        • 14 1 15

        • 13 2 14

        • 12 3 12

        • 11 1 9

        • 10 2 8

        • 9 3 6

        • 8 2 3

        • 7 1 1
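The frequency and cumulative-frequency columns above can be reproduced with a short script. This is a sketch using the chin-up scores from the notes; the function name is an illustrative choice.

```python
from collections import Counter

# Chin-up scores from the example (N = 17).
scores = [7, 15, 14, 9, 8, 13, 12, 15, 8, 12, 9, 9, 10, 13, 11, 10, 12]

def ungrouped_distribution(values):
    """Return rows of (value, frequency, cumulative frequency), highest value first."""
    counts = Counter(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):      # accumulate from the lowest value upward
        cumulative += counts[value]
        rows.append((value, counts[value], cumulative))
    return rows[::-1]                 # list the highest value first, as in the notes

table = ungrouped_distribution(scores)
for value, f, cum_f in table:
    print(value, f, cum_f)
```

The cumulative frequency of the highest value always equals N, which is a quick check that no score was missed.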

  • Example:

    • Grouped Frequency Distribution:

      • Chin-up scores (same scores as previous ex.):

        • CLASS INTERVAL FREQUENCY (f):

          • 14-15 : 3.

          • 12-13 : 5.

          • 10-11 : 3.

          • 8-9 : 5.

          • 6-7 : 1.
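The grouped counts above can be checked the same way. The sketch below assumes integer scores and a class interval of width 2 starting at 6, as in the example; the function name is hypothetical.

```python
scores = [7, 15, 14, 9, 8, 13, 12, 15, 8, 12, 9, 9, 10, 13, 11, 10, 12]

def grouped_distribution(values, start, width):
    """Count integer scores per class interval, returned highest interval first."""
    rows = []
    lower = start
    while lower <= max(values):
        upper = lower + width - 1     # inclusive upper limit for integer scores
        f = sum(lower <= v <= upper for v in values)
        rows.append(((lower, upper), f))
        lower += width
    return rows[::-1]

grouped = grouped_distribution(scores, start=6, width=2)
for (lower, upper), f in grouped:
    print(f"{lower}-{upper}: {f}")
```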

Steps in Constructing a Frequency Distribution

  • Step 1: Count the number of scores (N = 50)

  • Step 2: Identify highest and lowest score

    • MAX = 368 and MIN = 252. Range is 116.

  • Step 3: Identify smallest unit of measurement

    • Smallest unit = 1.

  • Step 4: Decide on an appropriate number of class intervals (e.g., 7 intervals, i.e., 7 rows in the frequency distribution).

  • Step 5: Decide on the score range of each class interval (i):

    • $i = \frac{highest\ score - lowest\ score}{number\ of\ intervals}$

    • With 7 intervals: $i = \frac{116}{7} = 16.57$

    • With 8 intervals: $i = \frac{116}{8} = 14.5$

  • Step 6: Round this class interval to make this range PRETTY

    • Choose 20, 15, 12 or 10 instead of 16.57 or 14.5.

  • Step 7: List class intervals of scores in order

    • Try a class interval of 15 (pretty range).

    • Need a starting value for first class interval that is also pretty.

    • Min is 252. Thus, round down to 250!

    • Class intervals (of 15): 355-369, 340-354, 325-339, 310-324, 295-309, 280-294, 265-279, 250-264.

    • Begin with the smallest values for the smallest class interval bin. Then add “i” (i.e., 15) to the next bin up until almost max.

    • Make sure that intervals have:

      • Same width (range of numbers)

      • No overlap across intervals

      • No gaps
        *A narrower class interval needs more intervals, and thus more rows.
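Steps 5 to 7 can be sketched as a small helper. This assumes integer scores (smallest unit = 1) and uses the values from the worked example (MIN = 252, MAX = 368, pretty width 15, pretty start 250); the function name is an illustrative choice.

```python
def class_intervals(highest, width, start):
    """List (lower, upper) class intervals of equal width until the max is covered."""
    intervals = []
    lower = start
    while lower <= highest:
        intervals.append((lower, lower + width - 1))  # inclusive limits, unit = 1
        lower += width
    return intervals

# Step 5: raw interval widths for 7 and 8 intervals.
raw_width_7 = (368 - 252) / 7
raw_width_8 = (368 - 252) / 8

# Steps 6-7: round to a "pretty" width of 15 and start at 250.
intervals = class_intervals(highest=368, width=15, start=250)
for lower, upper in reversed(intervals):
    print(f"{lower}-{upper}")
```

Each interval starts one unit above the previous upper limit, so the intervals have the same width, no overlap, and no gaps.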

Frequency Distributions

  • Ungrouped distributions: use UNGROUPED

    • When data are items rather than numbers, i.e., nominal or ordinal (qualitative) values.

    • When all possible data values can be listed without being too many (< 15), e.g., a small number of possible discrete scores (such as how many courses each student in this class is enrolled in this term).

    • Just list the items or values and start tallying once the number of rows needed is clear.

  • Grouped distributions: use GROUPED

    • When data values are continuous (e.g., weight, time, blood pressure) or there are too many possible data values (e.g., age or salary).

    • Calculate step 5, a range of values known as the class interval (i) or bin.

    • A good way to start is to estimate a suitable number of bins (step 4), but you may need to redo steps 4 and 5 to get a “PRETTY” class interval or bin, e.g., 2, 5, 10, 12, 15, 20, etc. (or 0.01, 0.2, 0.5, etc.).
      *Add this i to the start of each bin, starting with the smallest score or value in the dataset. Cover all the data with bins that have: (1) the same width/range, (2) no overlap across bins, (3) no gaps.

Example for YOU TO DO: Construct a frequency distribution.

  • Data: RTs for participants .31, .27, .28, .29, .30, .25, .26, .27, .31, .34, .27, .28, .28, .29, .32

  • Steps

    • 1. N = 15 (# of scores)

    • 2. 0.34 – 0.25 = 0.09 (highest – lowest)

    • 3. 0.01 (smallest unit of measurement)

    • 4. 5 (number of categories) ← may change this afterwards

    • 5. i = 0.09/5 = 0.018 (step 2/step 4)

    • 6. 0.02 (round to “pretty number”).

  • Class interval, tally, frequency (f), cumulative frequency:

    • 0.34-0.35: I, f = 1, cumulative f = 15

    • 0.32-0.33: I, f = 1, cumulative f = 14

    • 0.30-0.31: III, f = 3, cumulative f = 13

    • 0.28-0.29: IIIII, f = 5, cumulative f = 10

    • 0.26-0.27: IIII, f = 4, cumulative f = 5

    • 0.24-0.25: I, f = 1, cumulative f = 1

  • NOTE: Sometimes you will want to exclude extreme scores from your data when doing steps 2-6 to determine class intervals. But after determining the class intervals, you need to add these extreme scores to the tally. All scores must be included in your distribution.

Graphs:

  • A pictorial representation of a frequency distribution or other data.

  • Helpful in understanding concepts, e.g. frequencies, and other summary data.

  • Bar graphs for Ungrouped data.

  • Histograms for Grouped data.

  • Graphs for Number of Students Enrolled showing values related to faculties and school

  • Graphs for Total yards offence in a session (Simple and Complex).

Histogram:

  • A histogram uses vertical bars to depict frequencies of an interval/ratio variable with no spaces between the bars.

  • Ungrouped FD is depicted by a bar graph while Grouped FD is depicted by a histogram.

  • Histogram Examples: Age groups and frequencies

  • Frequency Histogram vs Polygon (line graph)

Excel Exercise Textbook Horvath page (53-75)

  • Constructing Frequency Histogram using Data Analysis Tool Example: ages of students.

  • Enter the data in the excel sheet.

  • Select Tools/Data Analysis on the Standard Toolbar.

  • Select Histogram from the Analysis Tools window.

  • Input data to construct histogram.

  • Format the histogram to add title and labels.

Measure of Central Tendency

  • Center of data set.

  • A single summary number which indicates where many of the scores lie.

Measure of Central Tendency:

  • Mean: Arithmetic average. $Mean = \frac{\Sigma x}{n}$. For example: 1.42 + 1.97 + 1.42 + 1.50 + 1.67 = 7.98, so $Mean = \frac{7.98}{5} = 1.60$.

  • Median: Middle value when the data are ordered. To calculate the location of the median: $L = \frac{n+1}{2}$

  • Mode: The value that occurs most often (the most common value in the dataset).
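The three measures can be checked with Python's standard `statistics` module. This sketch uses the five values from the mean example above; the variable names are illustrative.

```python
import statistics

data = [1.42, 1.97, 1.42, 1.50, 1.67]

mean = statistics.mean(data)            # sum of values divided by n
median = statistics.median(data)        # middle value of the ordered data
mode = statistics.mode(data)            # most frequent value
median_location = (len(data) + 1) / 2   # L = (n + 1) / 2, a position, not a value

print(round(mean, 2), median, mode, median_location)
```

For an even n, the location L falls halfway between two positions, and the median is the average of the two middle values.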

Which Measure to Use?

  • Nominal – Mode (Can’t calculate mean or median): e.g. Law = 64; Kine = 59; Eng. = 37.

  • Ordinal – Median e.g. 1st, 2nd, 3rd, 4th, 5th.

  • Interval or Ratio – Mean and/or Median

    • Use the median instead of the mean if the distribution is highly skewed or has outliers (the median is not as affected).
      *Excel: calculate central tendency measures with these functions: Mean = AVERAGE(dataset), Median = MEDIAN(dataset), Mode = MODE(dataset).
      *Excel output often has too many decimal places:
      For labs and most places, report only 1-2 more decimal places than the data provided for non-integer values. For non-integer numerical answers on midterms, just to be safe, use all decimal places (paste the entire cell), OR for values < 10 use 2 decimal places, for values < 1 use 4, and for values < 0.01 use 5.

Graphs for Summary Data

  • The distribution of data, including central tendencies like means and medians, can also be depicted graphically.

  • Bar Graphs & Histograms
    Bar graphs: the x-axis represents different groups or categories (nominal), and there is space between the bars.
    Histograms: the x-axis plots the independent variable, which is interval/ratio.
    Graphs are created to compare relationships. Other graphs include the dot plot and the line plot. Key parts of a distribution come back to:

  • Peaks: identify the peaks.
    They represent the most common values (the bulk of the data): the mode.

  • Spread: how much do the data vary? Related to our next topic.

Kurtosis

  • The relative peaked-ness or flatness of the distribution.

  • It reflects whether the scores are more or less evenly distributed throughout the measurement range

  • Leptokurtic: The scores are bunched together with steeply sloping sides.

  • Platykurtic: The scores are more evenly spread out, so a greater proportion of the scores fall toward the ends, or tails.

Symmetry

  • Symmetrical: a distribution is termed symmetrical when the data frequencies decrease at equal rates above and below a central point, as in the normal distribution.

  • Skewed: bunching of the observations at one or the other end of the measurement range.

    • Positively Skewed: observations are bunched at the lower score values.

    • Negatively Skewed: observations are bunched at the higher score values.

  • Distributions can also be Bimodal in nature.

  • The Mean is more affected than the Median when a distribution is skewed or has outliers.
    Distributions with outliers should cause concern.
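A small numerical check shows why the mean is more affected than the median. The two datasets below are made up purely for illustration; they differ only in one extreme score.

```python
import statistics

# Illustrative (made-up) scores: identical except for one extreme value.
without_outlier = [2, 3, 3, 4, 5]
with_outlier = [2, 3, 3, 4, 50]

mean_before = statistics.mean(without_outlier)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(without_outlier)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)      # pulled toward the outlier
print("median:", median_before, "->", median_after)  # unchanged
```

The single high score drags the mean upward, as happens in a positively skewed distribution, while the middle value stays where it was.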

Normal Distribution

  • Many variables* closely follow a normal distribution, e.g. height, birth weight, errors in measurements or distance from bull’s eye, blood pressure, RT, MT, marks on a test (if not too easy or hard), IQ, GPA, shoe size, hours slept.
    *For large N, e.g., N > 100.

  • Positively Skewed (for large N): income, scores on a difficult test, numbers of kids or cars or broken bones etc., points scored in a game, variables with a lower limit like weight.

  • Negatively Skewed (for large N): age at death/lifespan, scores on an easy test, income as a function of age, variables with an upper limit (100%).

  • Bimodal means two peaks/modes, but the peaks/modes don’t need to be equal.

  • Graphs and distributions can take on all sorts of shapes.

Measures of Variability

  • Measure of variability is a single number which describes the spread in a set of data.

  • Example: {7, 12, 10, 8, 13} and {0, 5, 10, 15, 20} (Mean = 10 for both sets). However, the scores are more spread out in the second set.

  • Most common measures:

    • Range

    • Standard Deviation (SD)

Range:

  • Total spread in data: $Range = highest\ score - lowest\ score$

    • e.g. 4, 5, 7, 9 (Range = 9 – 4 = 5).

  • Interpretation: the score range over which 100% of scores fall.

  • Advantage: very quick.

  • Can be used for all levels of measurement.

  • Disadvantage: influenced by single extreme scores.

With Examples Highlighting How Ranges Can Be Similar, Even if the Mean isn't.

Average Deviation

  • How much does each score deviate (vary) from the mean?

    • e.g. 14, 12, 9, 17, 8. N = 5 and $Mean = \frac{60}{5} = 12$
      *Find how far each score is from the mean (its deviation).
      *NOTE: The deviations add up to 0!
      *Use absolute values (i.e. ignore the negative sign), then divide by N, e.g. $\frac{14}{5} = 2.8$
      The average deviation is rarely used. Is there another method to remove the negative sign?
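The average-deviation arithmetic above can be traced step by step. This sketch uses the five scores from the example; the variable names are illustrative.

```python
scores = [14, 12, 9, 17, 8]

mean = sum(scores) / len(scores)              # 60 / 5
deviations = [x - mean for x in scores]       # signed distance of each score from the mean
sum_of_deviations = sum(deviations)           # positives and negatives cancel out
average_deviation = sum(abs(d) for d in deviations) / len(scores)

print(deviations, sum_of_deviations, average_deviation)
```

The signed deviations cancel, which is why the sign has to be removed (by absolute value here, or by squaring in the variance) before averaging.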

Variance

  • Sum of Squares:

    • The sum of the squared deviations from the mean, $\Sigma(X - \mu)^2$

    • Always a positive value

    • Variance = Sum of Squares divided by n
      *Formula: $\sigma^2 = \frac{\Sigma (X - \mu)^2}{n}$
      Suppose these data represent age to the nearest year of eight persons. How would it be accounted for?
      The Spread Depends on the Deviation!
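The sum of squares and the variance can be computed directly from their definitions. As an assumption for illustration, this sketch reuses the five scores from the Average Deviation example (14, 12, 9, 17, 8) and treats them as a whole population, so it divides by n.

```python
import statistics

scores = [14, 12, 9, 17, 8]   # assumed here to be the full population

mu = sum(scores) / len(scores)
sum_of_squares = sum((x - mu) ** 2 for x in scores)   # squaring removes the sign
variance = sum_of_squares / len(scores)               # population variance: divide by n

# Cross-check against the standard library's population variance.
library_variance = statistics.pvariance(scores)
print(sum_of_squares, variance, library_variance)
```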

Standard Deviation

  • Measure of variability for scores about the mean.

  • Measure of deviations of all the scores from the mean, expressed as a single number.
    *Sample statistic, SD or s: $SD = \sqrt{\frac{\Sigma(X - \overline{X})^2}{N-1}}$

  • SD: a method of specifying the % of scores falling within certain score limits around the mean

  • Quick but crude estimate: $\frac{Range}{4}$, if there are no extreme scores

Formulas

  • Calculate with the Following Information:

    • Dataset A: {0.5, 0.4, 0.4, 0.6, 0.5, 0.5, 0.3, 0.7, 0.6}

    • $\overline{X} = 0.50$, $\Sigma X^2 = 2.37$

    • $SD = \sqrt{\frac{\Sigma X^2 - N\overline{X}^2}{N-1}} = \sqrt{\frac{2.37 - 9 \cdot 0.50^2}{9 - 1}} = …$

    • Dataset B: {0.3, 0.4, 0.7, 0.5, 0.9, 0.8, 0.2, 0.5, 0.6}

    • $\overline{X} = 0.544$, $\Sigma X^2 = 3.09$

    • $SD = \sqrt{\frac{\Sigma X^2 - N\overline{X}^2}{N-1}} = \sqrt{\frac{3.09 - 9 \cdot 0.544^2}{9 - 1}} = …$

  • What can you say about the mean velocity and the variability (SD) in the 2 groups?

    • Dataset A: $\overline{X} \pm SD = 0.50 \pm 0.1225$

    • Dataset B: $\overline{X} \pm SD = 0.5444 \pm 0.2297$
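The two standard deviations can be verified with a short script. This sketch implements the computational (sum of squared scores) form of the sample SD and cross-checks it against `statistics.stdev`; the function name is an illustrative choice.

```python
import math
import statistics

dataset_a = [0.5, 0.4, 0.4, 0.6, 0.5, 0.5, 0.3, 0.7, 0.6]
dataset_b = [0.3, 0.4, 0.7, 0.5, 0.9, 0.8, 0.2, 0.5, 0.6]

def sample_sd(values):
    """Sample SD via sqrt((sum of X^2 - N * mean^2) / (N - 1))."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum(x * x for x in values)       # sum of squared scores
    return math.sqrt((sum_sq - n * mean ** 2) / (n - 1))

sd_a = sample_sd(dataset_a)
sd_b = sample_sd(dataset_b)

print(round(sd_a, 4), round(statistics.stdev(dataset_a), 4))
print(round(sd_b, 4), round(statistics.stdev(dataset_b), 4))
```

Using the unrounded mean inside the formula matters: rounding the mean of Dataset B to three decimals before squaring shifts the result in the third decimal place.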

Interpretation of Mean and SD

Quick Reference for Means, Medians, and Distributions

  • With extreme scores the mean is not a good measure of central tendency.

  • Rule of Thumb: if the mean and median differ by 1 SD or more, use the median.

Measures of Central Tendency and Variability by Level of Measurement

  • Nominal: Mode; Range

  • Ordinal: Median; Range

  • Interval: Mean*; SD

  • Ratio: Mean*; SD
    *Although the median is better if the distribution is skewed.

Percentiles and Z-Scores

  • Inter-Quartile Range (IQR):

    • IQR is a measure of variability, based on dividing a dataset into quartiles.

    • Quartiles divide a rank-ordered dataset into four equal parts.

    • Values that divide each part are called the first, second, and third quartiles and denoted by Q1, Q2, and Q3.
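Quartiles and the IQR can be computed with the standard library. This is a sketch under two assumptions: the data are a made-up set of seven values, and quartiles follow `statistics.quantiles` with its default "exclusive" method. Other software (including Excel) may use a different quartile convention and give slightly different cut points.

```python
import statistics

# Illustrative (made-up) rank-ordered data.
data = [1, 2, 3, 4, 5, 6, 7]

# n=4 splits the data into four equal parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1   # spread of the middle 50% of the data

print(q1, q2, q3, iqr)
```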

Relative Scores: Percentiles and Z-Scores

  • A method of describing an individual’s standing in relation to a group, common use with norms: height/weight tables, age, fitness level, exams.

  • Achieved by translating an individual’s raw score into either percentile or z-score (transformation)

  • Raw scores are the original measures, e.g. height or % on an exam, which can then be transformed (e.g., to a letter grade).

  • Percentiles: Percentile score = the PERCENTAGE of people in the group who have the same raw score or a lower raw score than the one in question.
    *If this has not been calculated, order the values and find the location (ordinal rank) of the score.
    Equation for percentile rank: $\frac{ordinal\ rank\ of\ a\ given\ value}{total\ number\ of\ values} \times 100$
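The percentile definition above translates directly into code. As an assumption for illustration, this sketch reuses the chin-up scores from the frequency distribution examples and takes the ordinal rank of a score to be the number of scores at or below it; the function name is hypothetical.

```python
scores = [7, 15, 14, 9, 8, 13, 12, 15, 8, 12, 9, 9, 10, 13, 11, 10, 12]

def percentile_rank(values, x):
    """Percentage of values that are equal to or lower than x."""
    at_or_below = sum(v <= x for v in values)   # ordinal rank of x in the group
    return at_or_below / len(values) * 100

rank_of_12 = percentile_rank(scores, 12)
rank_of_15 = percentile_rank(scores, 15)
print(round(rank_of_12, 1), rank_of_15)
```

The count at or below a score is the same number as its cumulative frequency in an ungrouped frequency table.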

Formula for Percentiles from Grouped Data:

  • $Percentile = \frac{\frac{x - LL}{i} \cdot fw + \Sigma fb}{N} \times 100$
    Where:
    x = the score you are converting to percentile
    LL = lower limit of the class interval that contains the score
    i = the size of each class interval
    fw = the frequency of scores in the interval that contains the score
    ∑fb = sum of the frequencies below the interval
    N = number of scores in the data set
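The grouped-data formula can be sketched as a function of the six symbols defined above. As an assumption for illustration, the example plugs in the grouped chin-up distribution from earlier (interval 12-13 with f = 5, nine scores below it, N = 17) and uses the listed lower limit of 12 for LL; some textbooks use the real limit (11.5) instead, which changes the result.

```python
def grouped_percentile(x, ll, i, fw, sum_fb, n):
    """Percentile of score x from grouped data.

    ll: lower limit of the interval containing x, i: interval size,
    fw: frequency within that interval, sum_fb: frequencies below it,
    n: total number of scores.
    """
    within = (x - ll) / i * fw        # estimated scores below x inside its interval
    return (within + sum_fb) / n * 100

# Chin-up example: score 13 lies in interval 12-13 (fw = 5); 1 + 5 + 3 = 9 scores lie below.
p = grouped_percentile(x=13, ll=12, i=2, fw=5, sum_fb=9, n=17)
print(round(p, 1))
```

The formula assumes scores are spread evenly within the interval, so the result is an estimate rather than an exact rank.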

Finding the Raw Score of a Given Percentile:

  • Also possible to calculate in reverse: calculate what raw score a certain percentile represents
    Formula: $x = LL + \frac{P \cdot N - \Sigma fb}{fw} \cdot i$
    All symbols as before with the addition of