PSYC*1010 Week 2

Frequency Distributions

Types of Statistics

  • Descriptive Statistics

    • Characterize attributes of samples and populations.

  • Inferential Statistics

    • Generalize from a sample to an unknown population.

Importance of Frequency Distributions

  • Goal: Organize data to communicate the number of observations in each category on the measurement scale.

  • Data can be presented as either a table or a graph.

Types of Frequency Distributions

  • Three types:

    1. Simple

    2. Relative

    3. Cumulative

  • Applications may vary based on the type of data:

    • Numerical data

    • Categorical data

Measurements in Frequency Distributions

  • Categorical Measurements:

    • Nominal, Ordinal

  • Quantitative Measurements:

    • Interval, Ratio

Simple Frequency Distribution

  • Example:

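    | Quiz Score (X) | Frequency (f) |
    |----------------|---------------|
    | 10 | 1 |
    | 9 | 2 |
    | 8 | 3 |
    | 7 | 4 |
    | 6 | 5 |
    | 5 | 5 |
    | 4 | 4 |
    | 3 | 3 |
    | 2 | 2 |
    | 1 | 1 |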

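  • The scores should be arranged in order, conventionally from highest to lowest as in the example above (software may list them in ascending order).

  • Include every score in the range, even if its frequency is 0.

  • The total of the frequencies must equal the sample size ($\sum f = n = 30$).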
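The construction rules above can be sketched in Python. This is a minimal illustration, not part of the course materials: the raw scores are rebuilt from the example's frequencies, and the function name `simple_frequency_table` is hypothetical. It lists scores from highest to lowest, as the example does.

```python
from collections import Counter

def simple_frequency_table(scores, low, high):
    """Return (X, f) rows from high down to low, keeping values with f = 0."""
    counts = Counter(scores)
    # A Counter returns 0 for scores that never occur, so empty rows are kept.
    return [(x, counts[x]) for x in range(high, low - 1, -1)]

# Raw scores rebuilt from the quiz example: score 10 once, 9 twice, and so on.
example_f = {10: 1, 9: 2, 8: 3, 7: 4, 6: 5, 5: 5, 4: 4, 3: 3, 2: 2, 1: 1}
raw_scores = [x for x, f in example_f.items() for _ in range(f)]

table = simple_frequency_table(raw_scores, low=1, high=10)
n = sum(f for _, f in table)  # the f column totals the sample size

for x, f in table:
    print(f"{x:>2}  {f}")
print("n =", n)
```

Passing `low` and `high` explicitly is a design choice: it forces every score in the range to appear, even one that nobody obtained.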

Learning Check

  • Question: How many people are in this sample?

    • Answer Options: a) 10 b) 15 c) 25 d) 32 e) Not enough info

    • Total frequencies = 10

  • Question: Over 50% of individuals scored above 3. True/False

Relative Frequency Distribution

  • Each score is expressed as a proportion or percentage of the total sample.

  • New Column for Proportion (p):

    • Formula: $p = \frac{f}{N}$

    • All proportions should sum to 1.0.

  • New Column for Percent (%):

    • Formula: $\% = p \times 100$

Example Table

    | Quiz Score (X) | Frequency (f) | Proportion (p) | Percent (%) |
    |----------------|---------------|----------------|-------------|
    | 10 | 1 | $\frac{1}{30} = 0.03$ | $0.03 \times 100 = 3$ |
    | 9 | 2 | $\frac{2}{30} = 0.07$ | $0.07 \times 100 = 7$ |
    | 8 | 3 | $\frac{3}{30} = 0.10$ | $0.10 \times 100 = 10$ |
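The proportion and percent columns can be computed for the whole quiz example (N = 30) with a short Python sketch. The function name `relative_frequencies` and the dictionary layout are assumptions made for illustration only.

```python
def relative_frequencies(freqs):
    """Map each score X to (f, p, percent), with p = f / N and percent = p * 100."""
    n = sum(freqs.values())
    return {x: (f, f / n, f / n * 100) for x, f in freqs.items()}

# Frequencies from the quiz-score example (N = 30).
quiz_f = {10: 1, 9: 2, 8: 3, 7: 4, 6: 5, 5: 5, 4: 4, 3: 3, 2: 2, 1: 1}
rows = relative_frequencies(quiz_f)

for x, (f, p, pct) in rows.items():
    # Proportions are rounded to two decimals for display only.
    print(f"{x:>2}  {f}  {p:.2f}  {pct:.0f}")

# Sanity check: the unrounded proportions sum to 1.0 (up to float error).
total_p = sum(p for _, p, _ in rows.values())
```

Rounded proportions may not add to exactly 1.0, so the check is best done on the unrounded values.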


Cumulative Frequency Distribution

  • Shows the total frequency (or proportion or percentage) of observations at each value and all lower-ranked values.

  • Starting from the bottom, frequencies are added upwards to find cumulative frequencies (cf).

  • Cumulative Percentages (c%):

    • Formula: $c\% = \left(\frac{cf}{N}\right) \times 100$
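The bottom-up accumulation can be sketched in Python as follows. The sketch reuses the quiz-score frequencies (N = 30); the function name `cumulative_table` is hypothetical.

```python
from itertools import accumulate

def cumulative_table(freqs):
    """Return rows (X, f, cf, c%) listed from the highest score to the lowest."""
    n = sum(freqs.values())
    scores = sorted(freqs)                       # lowest score first
    cfs = accumulate(freqs[x] for x in scores)   # running total from the bottom up
    rows = [(x, freqs[x], cf, cf * 100 / n) for x, cf in zip(scores, cfs)]
    return rows[::-1]                            # display the highest score on top

quiz_f = {10: 1, 9: 2, 8: 3, 7: 4, 6: 5, 5: 5, 4: 4, 3: 3, 2: 2, 1: 1}
cum_rows = cumulative_table(quiz_f)

for x, f, cf, cpct in cum_rows:
    print(f"{x:>2}  {f}  {cf:>2}  {cpct:.0f}")
```

With these frequencies, the top row (X = 10) has cf = 30 and c% = 100, since every observation falls at or below the highest score.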

    Example Table

      Quiz Score (X)

      Frequency (f)

      Proportion (p)

      Percent (%)

      Cumulative Frequency (cf)

      Cumulative Percent (c%)


      10

      1

      1


      9

      2

      3


Grouped Frequency Distribution

  • When data spans a wide range, grouping into intervals can simplify presentation.

  • Rules:

    • Use a consistent interval width (e.g., 5, 10, 15).

    • The starting point of each class should be a multiple of the interval width.

Example for Weight Distribution

  • Grouping weights of 194 individuals:

    • Class intervals: 15 lbs wide

    | Weight (X) | Frequency (f) |
    |----------------|---------------|
    | 255 − 269 | 1 |
    | 240 − 254 | 4 |
    | 225 − 239 | 2 |
    | 210 − 224 | 6 |
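The grouping rules can be illustrated with a small Python sketch. The weights below are invented for illustration (they are not the 194 observations from the example), the function name `grouped_frequencies` is hypothetical, and the sketch assumes whole-number measurements.

```python
from collections import Counter

def grouped_frequencies(values, width):
    """Count values in equal-width intervals whose lower limits are multiples of width."""
    # Floor each value to the nearest lower multiple of the interval width.
    counts = Counter((v // width) * width for v in values)
    lowest, highest = min(counts), max(counts)
    rows = []
    # List every interval from highest to lowest, keeping empty ones (f = 0).
    for lower in range(highest, lowest - 1, -width):
        rows.append((lower, lower + width - 1, counts[lower]))
    return rows

# Hypothetical weights in lbs, invented for illustration.
weights = [212, 218, 223, 230, 241, 244, 250, 253, 262]
grouped = grouped_frequencies(weights, width=15)

for lower, upper, f in grouped:
    print(f"{lower} - {upper}: {f}")
```

Flooring to a multiple of the width guarantees that each lower limit is a multiple of 15 and that the intervals neither overlap nor leave gaps.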

Categorical Frequency Distribution

  • Arranges categories meaningfully and records frequencies.

  • Types include simple frequency, relative frequency, cumulative frequency, and percentile ranks (if ordinal).

  • Example of Primary Languages Spoken at Home:

    | Language | Frequency (f) | Percent (%) |
    |---------------|----------------|--------------|
    | English | 81 | 45 |
    | French | 34 | 19 |
    | Mandarin | 22 | 12 |

Visualizing Distributions

  • Choose visualization method based on scale of measurement and data type (discrete/continuous).

  • Common types include:

    • Histograms: X-values on the x-axis, with bars representing frequencies.

    • Frequency Polygons: Data points are connected by lines.

Common Distribution Shapes

  • Normal Distribution: Bell-shaped curve, one peak (unimodal), symmetrical.

  • Bimodal Distribution: Two distinct peaks.

  • Positively Skewed: Few high scores, most low scores. Common in variables like clinical depression.

  • Negatively Skewed: Few low scores, most high scores, often in variables like life satisfaction.

Considerations for Data Visualizations

  • Use accurate scales and avoid misleading representations.

  • Key Tips:

    1. Know your audience.

    2. Identify the main message.

    3. Avoid "chartjunk" (unnecessary visual features).

    4. Make sure to label axes clearly and include legends where necessary.

    5. Ensure that color choices are accessible and informative.

Misleading Visualizations

  • Watch for: Misleading scales on axes, contradictory presentations of information, and geometry misrepresentations.

Conclusion and Further Reading

  • Strong data visualizations are essential for clarity and effectiveness in communicating research findings.

  • Always critically evaluate visual data presentations for integrity and clarity.

Frequency Distributions

Importance of Frequency Distributions
  • Definition: An organized tabulation showing the number of individuals ($f$) in each category on the measurement scale.

  • Goal: Organize raw scores into patterns (high/low, clustered/spread) to simplify communication.

  • Purpose: Allows researchers to see data "at a glance" (e.g., identifying that most students scored 8 or 9 on a quiz despite few perfect scores).

  • Psychology Application: Organizing study scores (e.g., anxiety ratings) to spot trends before conducting inferential statistics.

Types of Frequency Distributions
  • Three main types:

    1. Simple: Raw counts ($f$) per score ($X$).

    2. Relative: Proportions or percentages ($p = \frac{f}{N}$, $\% = p \times 100$).

    3. Cumulative: Running totals ($cf$ or $c\%$).

  • Applications vary based on data type:

    • Numerical data: Ordinal, Interval, Ratio.

    • Categorical data: Nominal, Ordinal.

Measurements in Frequency Distributions
  • Categorical Measurements: Nominal, Ordinal

  • Quantitative Measurements: Interval, Ratio

Simple Frequency Distribution
  • Rules for Construction:

    • $X$ Column: Highest to lowest (though software may use ascending).

    • Include all values: All scores in the range must be listed, even if $f = 0$.

    • Total Frequencies: Sum of frequencies must equal the sample size ($\sum f = N$).

  • Calculations from Tables:

    • Sum of Scores ($\sum X$): Calculated as $\sum (X \times f)$.

    • Example: If $X = 5, f = 1$ and $X = 4, f = 2$, then $\sum X = (5 \times 1) + (4 \times 2) = 13$.

  • Example (Quiz Scores, $N = 20$):

    | $X$ | $f$ |
    |-------|-------|
    | 10 | 2 |
    | 9 | 5 |
    | 8 | 7 |
    | 7 | 3 |
    | 6 | 2 |
    | 5 | 0 |
    | 4 | 1 |
    | Total | $N = 20$ |
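As a check on the two table calculations (the sample size as the sum of f, and the sum of scores as the sum of X times f), here is a minimal Python sketch using the quiz-score example. The variable names are illustrative.

```python
# (X, f) pairs from the quiz-score example.
table = [(10, 2), (9, 5), (8, 7), (7, 3), (6, 2), (5, 0), (4, 1)]

n = sum(f for _, f in table)          # N = sum of the f column
sum_x = sum(x * f for x, f in table)  # sum of X = sum of (X * f)

print("N =", n)
print("sum of X =", sum_x)
```

Adding up the X column directly would give 49, which ignores how many individuals obtained each score; weighting each score by its frequency gives 158.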

Learning Check
  • Question: How many people are in this sample?

    • Answer: Sum the frequencies ($\sum f$).

  • Question: Raw scores: $10, 10, 9, 9, 9, 9, 9, 8, 8$. What is $f$ for $X = 9$?

    • Answer: 5.

  • Question: Over 50% of individuals scored above 3. True/False

Relative Frequency Distribution
  • Expresses each score as a proportion ($p$) or percentage ($\%$) of the total sample.

  • Psychology Use: Stating "30% of the sample is clinically depressed."

  • Proportion ($p$):

    • Formula: $p = \frac{f}{N}$

    • All proportions must sum to 1.0.

  • Percentage ($\%$):

    • Formula: $\% = p \times 100$

Example Table ($N = 10$)

| $X$ | $f$ | $p$ | $\%$ |
|-----|-----|------|------|
| 5 | 1 | 0.10 | 10 |
| 4 | 2 | 0.20 | 20 |
| 3 | 3 | 0.30 | 30 |
| 2 | 3 | 0.30 | 30 |
| 1 | 1 | 0.10 | 10 |

Cumulative Frequency Distribution
  • Cumulative Frequency ($cf$): Shows the number of observations at or below a specific score. Frequencies are added starting from the bottom ($f_{low}$ to $f_{high}$).

  • Cumulative Percentages ($c\%$): Also known as Percentile Rank.

    • Formula: $c\% = \left(\frac{cf}{N}\right) \times 100$

  • Percentiles: To find the 95th percentile, scan the $c\%$ column for the first value $\geq 95$.
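The percentile lookup can be sketched in Python, using the N = 10 example frequencies from the relative frequency section. The function name `percentile_score` is hypothetical, and the scan runs from the lowest score upward, matching the bottom-up accumulation.

```python
from itertools import accumulate

def percentile_score(freqs, pct):
    """Return the lowest score whose cumulative percent is at least pct."""
    n = sum(freqs.values())
    scores = sorted(freqs)                      # lowest score first
    cfs = accumulate(freqs[x] for x in scores)  # cumulative frequencies
    for x, cf in zip(scores, cfs):
        if cf * 100 / n >= pct:                 # c% = (cf / N) * 100
            return x
    return None  # only reached if pct is above 100

# Frequencies from the N = 10 example table.
freqs = {5: 1, 4: 2, 3: 3, 2: 3, 1: 1}
p95 = percentile_score(freqs, 95)
p50 = percentile_score(freqs, 50)
print(p95, p50)
```

For these data the cumulative percents are 10, 40, 70, 90, and 100 for scores 1 through 5, so the first value at or above 95 belongs to X = 5.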

Grouped Frequency Distribution
  • When to Use: When data spans a wide range (typically more than 20 rows).

  • General Guidelines:

    • Use approximately 10 intervals.

    • Choose a simple interval width (2, 5, 10, 15).

    • The bottom score of each interval should be a multiple of the width.

    • Intervals must be equal in width with no gaps or overlaps.

  • Real Limits: For continuous variables, intervals have real limits (e.g., an apparent interval of 90-94 has real limits of 89.5-94.5).

  • Trade-off: Grouping results in information loss because exact scores are no longer visible.
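The real-limits idea can be expressed as a one-line calculation. The sketch below assumes scores are recorded to the nearest whole unit, so each apparent limit extends by half a unit; the function name `real_limits` and the `unit` parameter are hypothetical.

```python
def real_limits(lower, upper, unit=1):
    """Return the real limits of an apparent interval measured to the nearest unit."""
    half = unit / 2
    return (lower - half, upper + half)

print(real_limits(90, 94))  # (89.5, 94.5)

# Adjacent intervals share a boundary, so the continuous scale has no gaps.
shared = real_limits(90, 94)[1] == real_limits(95, 99)[0]
```

The shared boundary at 94.5 is what allows histogram bars for neighbouring intervals to touch.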

Categorical Frequency Distribution
  • Arranges non-numerical categories (Nominal/Ordinal) and records frequencies.

  • Order: Can be arbitrary for nominal data, but should be meaningful for ordinal data (e.g., Gold, Silver, Bronze).

  • Example (Primary Languages):

    • English: $f = 81$ (45%)

    • French: $f = 34$ (19%)

    • Mandarin: $f = 22$ (12%)

Visualizing Distributions
  • Histograms: Used for numerical/continuous data. Bars represent frequency/proportion and should touch (no gaps).

  • Bar Graphs: Used for categorical data. Gaps are placed between bars to show distinct categories.

  • Frequency Polygons: Data points are placed above midpoints and connected by lines; the ends are "anchored" to the x-axis at zero frequency.
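To make the frequency polygon construction concrete, the sketch below computes the points that would be plotted. The intervals and frequencies are invented for illustration, the function name `polygon_points` is hypothetical, and the sketch assumes equal-width intervals listed in ascending order. It places each anchor one interval width beyond the first and last midpoints, which is one common convention.

```python
def polygon_points(intervals, freqs):
    """Return (midpoint, frequency) points, anchored at zero frequency on both ends."""
    width = intervals[0][1] - intervals[0][0] + 1
    mids = [(lo + hi) / 2 for lo, hi in intervals]
    points = list(zip(mids, freqs))
    # Anchor points sit one interval width beyond the first and last midpoints.
    return [(mids[0] - width, 0)] + points + [(mids[-1] + width, 0)]

# Hypothetical grouped data, invented for illustration.
intervals = [(10, 14), (15, 19), (20, 24)]
freqs = [2, 5, 3]

points = polygon_points(intervals, freqs)
print(points)
```

Connecting these points in order with straight lines gives the polygon; the two zero-frequency anchors close the shape against the x-axis.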

Common Distribution Shapes and Psychology Examples

  • Normal Distribution: Bell-shaped, unimodal, and symmetrical.

    • Example: IQ scores.

  • Bimodal Distribution: Two distinct peaks.

    • Example: Heights of a group containing both men and women.

  • Positively Skewed: Tail points to the right (few high scores, many low scores).

    • Example: Clinical depression scores in a general population.

  • Negatively Skewed: Tail points to the left (few low scores, many high scores).

    • Example: Life satisfaction scores in stable environments.

Data Visualization Best Practices
  • 10 Rules for Effectiveness:

    1. Know your audience.

    2. Identify the main message.

    3. Avoid "chartjunk" (unnecessary decorative features).

    4. Optimize the data-ink ratio (focus on the data).

    5. Label axes clearly and include legends.

    6. Ensure color choices are accessible (e.g., colorblind-friendly).

    7. Use error bars where appropriate.

    8. Choose the right graph type (Histogram for distributions, Scatter for patterns, Line for trends).

    9. Message should take priority over beauty.

    10. Critically evaluate for integrity.

Misleading Visualizations
  • Watch for:

    • Misleading Scales: Starting the y-axis at a value other than zero to exaggerate differences.

    • Dual Axes: Can imply correlations that don't exist.

    • Geometry Misrepresentations: Using 3D effects or distorted areas (e.g., in pie charts) that make segments look larger or smaller than they are.