Frequency Distribution Graphs and Distribution Shapes
Fundamentals of Frequency Distribution Graphs
- A frequency distribution graph is a visual representation of data that conveys the same essential information as a frequency distribution table: the list of possible scores (X) and the frequency (f) associated with each of those scores.
- All graphs in this format share a common structure using two axes:
- The X axis (horizontal axis): Represents the possible scores or intervals of scores. Scores increase from left to right.
- The Y axis (vertical axis): Represents the frequency of those scores. Frequencies increase from bottom to top.
- On these graphs, the height of a bar or a dot corresponds to the frequency of the score indicated on the horizontal axis.
- It is critical to include all possible scores, even if their frequency is zero. For example, if a quiz score of 6 is possible but no one achieved it, the graph should still show the score of 6 with a height of zero.
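The zero-frequency rule above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `frequency_table` and the quiz data are hypothetical, and scores are assumed to be whole numbers.

```python
from collections import Counter

def frequency_table(scores, lowest, highest):
    """Return (X, f) pairs for every possible score from lowest to highest."""
    counts = Counter(scores)
    # A Counter returns 0 for a score nobody obtained, so every possible
    # score stays in the table even when its frequency is zero.
    return [(x, counts[x]) for x in range(lowest, highest + 1)]

# Hypothetical quiz data: a score of 6 is possible, but nobody earned it.
quiz_scores = [8, 9, 7, 10, 8, 5, 9, 8, 7, 10]
table = frequency_table(quiz_scores, lowest=5, highest=10)
for x, f in table:
    print(x, f)
```

The row for 6 appears with a frequency of 0, which is the row a graph would draw at a height of zero.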
Measurement Scales and Variable Types
- The appropriate type of graph for a dataset depends on the variable's scale of measurement. There are four scales: nominal, ordinal, interval, and ratio (NOIR).
- Nominal and Ordinal Scales:
- These are typically discrete variables, meaning they consist of separate, indivisible categories with no intermediate values.
- Examples include college major (nominal), where a student belongs to one major or another, and military rank (ordinal), where a promotion moves a person directly from one rank to the next.
- Discrete variables do not "bleed" into one another; they jump.
- Interval and Ratio Scales:
- These are generally treated as continuous variables, even if the data is reported in whole numbers.
- Examples include temperature in Celsius or Fahrenheit (interval) and height, weight, or distance (ratio).
- Continuous variables have "real limits," which are the boundaries that separate one score from the next on a continuous scale.
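As a rough sketch of real limits, the code below assumes the common textbook convention that they lie half a measurement unit below and above a score. The notes do not state that convention, so treat it as an assumption. The function names are hypothetical.

```python
def real_limits(score, unit=1):
    """Return (lower, upper) real limits for a score measured to the nearest unit.

    Assumption: the limits sit half a unit on either side of the score.
    """
    half = unit / 2
    return (score - half, score + half)

def interval_real_limits(low, high, unit=1):
    """Return the real limits of a grouped interval such as 50-59."""
    half = unit / 2
    return (low - half, high + half)

print(real_limits(150))              # (149.5, 150.5)
print(interval_real_limits(50, 59))  # (49.5, 59.5)
```

Under this convention the upper real limit of one score equals the lower real limit of the next, so neighbouring scores share a boundary.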
Graphs for Interval and Ratio Data: Histograms and Polygons
- If the variable measured is on an interval or ratio scale, the two primary options for graphing are the histogram and the polygon.
- The Histogram:
- In a histogram, a bar is centered above each score. If the data is grouped, the bar is centered above each interval (e.g., 50−59).
- The Touch Rule: The bars in a histogram must touch each other. Adjacent bars share boundaries.
- The practical reason for this is to represent the continuous nature of the variable; the width of the bars extends to the real limits of the scores.
- The Polygon:
- A dot is centered above each score at a height representing the frequency.
- These dots are then connected by straight lines.
- The Closing Rule: A polygon must be a closed shape, never a "floating line." To close the figure, an additional line must be drawn at each end to bring the frequency back to zero.
- To properly close a polygon, the researcher must label the X axis one unit higher than the highest observed score and one unit lower than the lowest observed score, then anchor the line to those points at a frequency of zero.
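The Closing Rule can be illustrated by computing the polygon's vertices before any drawing takes place. The following is a sketch under stated assumptions: scores are whole numbers, the function name `polygon_points` is hypothetical, and the data are made up.

```python
def polygon_points(freq_by_score, step=1):
    """Return the (X, f) vertices of a closed frequency polygon.

    freq_by_score maps each observed score to its frequency.
    """
    lowest, highest = min(freq_by_score), max(freq_by_score)
    # Anchor the left end one unit below the lowest score at f = 0.
    points = [(lowest - step, 0)]
    for x in range(lowest, highest + 1, step):
        # Scores inside the range that nobody obtained get a dot at height 0.
        points.append((x, freq_by_score.get(x, 0)))
    # Anchor the right end one unit above the highest score at f = 0.
    points.append((highest + step, 0))
    return points

freqs = {1: 2, 2: 4, 3: 3, 5: 1}  # hypothetical data; no one scored 4
points = polygon_points(freqs)
print(points)
```

Connecting these vertices in order with straight lines gives a figure that starts and ends on the X axis rather than a floating line.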
Graphs for Nominal and Ordinal Data: Bar Graphs
- When categories come from a nominal or ordinal scale, the appropriate format is a bar graph (or bar chart).
- The Gap Rule: In a bar graph, the bars must not touch. Spaces are placed between the bars to emphasize that the categories are discrete and separate.
- Personality type (Type A, B, or C) is an example of a nominal variable requiring a bar graph. There is no numerical distance or order that suggests these types should run into each other.
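The link between scale of measurement and graph type described in these notes can be summarized as a small lookup. This is only an illustrative sketch; the names `GRAPH_BY_SCALE`, `recommended_graph`, and `bars_touch` are hypothetical.

```python
# Mapping taken from the notes: nominal/ordinal -> bar graph,
# interval/ratio -> histogram or polygon.
GRAPH_BY_SCALE = {
    "nominal": "bar graph",
    "ordinal": "bar graph",
    "interval": "histogram or polygon",
    "ratio": "histogram or polygon",
}

def recommended_graph(scale):
    """Return the graph format suggested for a scale of measurement."""
    key = scale.strip().lower()
    if key not in GRAPH_BY_SCALE:
        raise ValueError(f"unknown scale of measurement: {scale!r}")
    return GRAPH_BY_SCALE[key]

def bars_touch(scale):
    """Return True when bars drawn for this scale should touch (histogram),
    and False when gaps are required (bar graph)."""
    return recommended_graph(scale) != "bar graph"

print(recommended_graph("Nominal"))  # bar graph
print(bars_touch("ratio"))           # True
```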
Principles of Data Visualization and Axis Labeling
- A primary goal of data visualization is to present data so that the reader cannot easily miss details that would change the meaning of the findings.
- Hash Marks (Break Marks): If the X axis starts at a number other than zero (for example, starting at score 30 because no children were shorter than 30 inches), the researcher must use hash marks (// or a jagged line) at the beginning of the axis to alert the reader that values have been skipped.
- Skipping numbers without a hash mark is misleading, as the reader might assume the first labeled point is score 1.
- Grouped Data Labeling Preference: For grouped frequency distributions, some people label the interval boundaries, but it is often simpler to place a tick mark at the center of each bar or dot and label it with the interval exactly as it appears in the table (e.g., 30−31, 32−33).
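One way to produce interval labels that match the table exactly is to build the labels and the frequencies together. The sketch below assumes whole-number scores and equal-width intervals. The function name `grouped_frequencies` and the height data are hypothetical.

```python
def grouped_frequencies(scores, low, width, n_intervals):
    """Return (label, f) pairs for consecutive intervals starting at low."""
    table = []
    for k in range(n_intervals):
        start = low + k * width
        end = start + width - 1  # each interval covers `width` whole scores
        f = sum(1 for s in scores if start <= s <= end)
        # The label is the interval as it would appear in the table.
        table.append((f"{start}-{end}", f))
    return table

heights = [30, 31, 31, 33, 34, 34, 35, 35, 35]  # hypothetical heights in inches
table = grouped_frequencies(heights, low=30, width=2, n_intervals=3)
for label, f in table:
    print(label, f)
```

Each label can then be placed under the center of its bar or dot.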
Relative Frequency and Smooth Curves
- When populations are extremely large, it is often impossible to know the exact frequency for any category.
- Relative Frequency: Instead of exact numbers on the Y axis, the graph uses relative heights to show ratios. For instance, if a lake has bluegill and bass, a researcher may not know the total count but can show that there are approximately twice as many bluegill as bass by drawing the bluegill bar twice as tall as the bass bar.
- Smooth Curves: If the variable is continuous and the exact frequencies are unknown, a smooth curve is used instead of a jagged polygon or histogram.
- The smooth curve indicates that the distribution is an estimate based on relative frequency rather than absolute counts.
- A classic example is the Normal Distribution (Bell Curve), such as that seen in IQ scores. The highest frequency is at the average (100), and the frequencies drop off predictably as you move toward higher or lower scores.
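The relative-height idea can be sketched numerically. The notes do not say how the ratio is estimated, so this example assumes a sample catch stands in for the unknown population. The counts and the function name `relative_heights` are hypothetical.

```python
def relative_heights(sample_counts, reference):
    """Express each category's bar height relative to a reference category."""
    base = sample_counts[reference]
    if base == 0:
        raise ValueError("reference category must have a nonzero count")
    return {name: count / base for name, count in sample_counts.items()}

# Hypothetical sample from the lake; the true population counts are unknown.
catch = {"bluegill": 40, "bass": 20}
heights = relative_heights(catch, reference="bass")
print(heights)  # {'bluegill': 2.0, 'bass': 1.0}
```

Only the ratio between the bars is meaningful here, so the Y axis would carry no absolute frequencies.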
Distribution Shapes and Skewness
- Symmetrical Distributions: These are characterized by a mirror-image relationship between the left and right sides. The normal distribution is the most common example.
- Skewed Distributions: These occur when a distribution is not symmetrical and possesses a "tail" containing outliers (extreme values far from the average).
- Positively Skewed (Right-Skewed):
- In this distribution, the tail of outliers points toward the right (the positive end of the X axis).
- While most scores are concentrated on the left, the few extreme high values pull the tail to the right.
- Example: Income and home prices. Most incomes cluster in a relatively low range, but a few millionaires create a long right-pointing tail.
- Example: A very difficult test where most students score poorly, but a few excel.
- Negatively Skewed (Left-Skewed):
- The tail points toward the left (the negative/lower end of the X axis).
- Most scores are concentrated on the right, but a few extremely low values pull the tail to the left.
- Example: An easy test where almost everyone passes, but a few individuals who did not attend class fail.
- Naming Convention: Skewness is always named after the direction of the tail (the outliers), not where the majority of the population is located.
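The notes identify skew by looking at the graph. As a numeric cross-check, one option is the moment-based skewness coefficient, a technique not covered in these notes, whose sign matches the direction of the tail. In the sketch below the `tolerance` cutoff is an arbitrary illustrative choice, and the test scores are hypothetical.

```python
def skewness(scores):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    if m2 == 0:
        raise ValueError("all scores are identical; skewness is undefined")
    return m3 / m2 ** 1.5

def skew_direction(scores, tolerance=0.1):
    """Name the distribution after the direction of its tail."""
    g = skewness(scores)
    if g > tolerance:
        return "positively skewed"   # tail points right
    if g < -tolerance:
        return "negatively skewed"   # tail points left
    return "roughly symmetrical"

hard_test = [20, 25, 25, 30, 30, 30, 35, 40, 90, 95]  # most low, a few high
easy_test = [100 - s for s in hard_test]              # mirror image: most high, a few low
print(skew_direction(hard_test))  # positively skewed
print(skew_direction(easy_test))  # negatively skewed
```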
Mathematical Procedures and Calculations
- Determining sample size (n): To find the total number of individuals in a study from a graph, add the frequencies (f) of every category.
- Example: If a personality study shows 10 people for Type A, 5 for Type B, and 20 for Type C, then n=10+5+20=35.
- Calculating Interval Width (i): When using grouped data intervals with whole-number scores, the width is calculated as i=(High−Low)+1, where High and Low are the highest and lowest scores in the interval.
- Example: For an interval of 30−31, the width is 31−30+1=2.
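Both calculations above are simple enough to check in code. This minimal sketch reuses the figures from the examples. The function names `sample_size` and `interval_width` are hypothetical, and scores are assumed to be whole numbers.

```python
def sample_size(frequencies):
    """n is the sum of the frequencies across all categories."""
    return sum(frequencies.values())

def interval_width(low, high):
    """i = (High - Low) + 1 for intervals of whole-number scores."""
    return (high - low) + 1

n = sample_size({"Type A": 10, "Type B": 5, "Type C": 20})
i = interval_width(30, 31)
print(n)  # 35
print(i)  # 2
```

The "+1" is needed because both endpoints belong to the interval: 30−31 contains two scores, not one.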