Qualitative variable
Introduction to iMac Study
Apple introduced the iMac computer in August 1998 and wanted to understand its impact on the company's market share.
The analysis sought to determine whether the iMac attracted new customers or was bought primarily by previous Macintosh owners.
A sample of 500 iMac customers was interviewed and categorized into three groups:
Previous Macintosh owners
Previous Windows users
New computer purchasers
Graphical Methods for Displaying Data
The section explores various graphical methods to effectively represent the qualitative data collected from interviews.
Qualitative Data
Definition: Data that fall into non-numerical categories with no pre-established order.
Example: The category of former Windows users has no natural position before or after the category of previous Macintosh owners.
Pie Charts
A pie chart represents different categories as slices of a pie, with the area of each slice proportional to the percentage of responses.
In the iMac study:
The majority of purchasers (71%) were previous Macintosh owners.
12% were former Windows users.
17% were buying a computer for the first time.
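As a rough illustration of how the percentages map to slices, the sketch below converts category counts into relative frequencies and slice angles. The counts are not quoted in these notes; they are derived from the sample of 500 and the stated 12% and 17%, with the remainder assigned to previous Macintosh owners. The function name pie_slices is a hypothetical helper.

```python
# Counts derived from the notes (assumption): 500 customers, 12% former
# Windows users, 17% first-time buyers, remainder previous Macintosh owners.
counts = {
    "Previous Macintosh owners": 355,
    "Previous Windows users": 60,
    "New computer purchasers": 85,
}


def pie_slices(counts):
    """Return {category: (relative_frequency, slice_angle_in_degrees)}."""
    total = sum(counts.values())
    return {name: (n / total, 360 * n / total) for name, n in counts.items()}


slices = pie_slices(counts)
for name, (share, angle) in slices.items():
    print(f"{name}: {share:.0%} of responses, {angle:.1f} degree slice")
```

Because each slice's angle is a fixed multiple of its relative frequency, the slice areas stay proportional to the percentages.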
Effectiveness: Pie charts are suitable for presenting relative frequencies of a small number of categories.
Limitations:
Not recommended for large numbers of categories.
Can be confusing when comparing different surveys or experiments.
Edward Tufte's Critique:
Tufte argued that the only design worse than a pie chart is several of them, because readers must compare slices across separate charts.
Caution on Small Samples:
If a pie chart is based on a small number of observations, labeling slices with percentages can be misleading.
Example: If only five individuals were surveyed and three were Windows users, a 60% label could suggest more precision than five observations support.
Recommendation: Instead of percentages, display actual frequencies (e.g., three).
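One way to apply this recommendation is to choose the label format from the sample size. The sketch below is only an illustration: the function name slice_label and the cutoff min_total=30 are assumptions, since the notes give no specific threshold.

```python
def slice_label(count, total, min_total=30):
    """Label a slice with a percentage only when the sample is large enough.

    min_total is a hypothetical cutoff chosen for illustration; the notes
    do not specify how small is "too small".
    """
    if total < min_total:
        # Small sample: show the actual frequency instead of a percentage.
        return f"{count} of {total}"
    return f"{count / total:.0%}"


small = slice_label(3, 5)      # three Windows users out of five surveyed
large = slice_label(60, 500)   # 60 of the 500 iMac customers
print(small)
print(large)
```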
Bar Charts
Alternative method for representing frequencies of categories.
The bar chart shows:
Frequencies on the y-axis.
Type of computer previously owned on the x-axis.
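The same idea can be sketched without a plotting library. The example below draws a text bar chart, rotated so that categories run down the page and bar length shows frequency. The counts are derived from the sample of 500 and the stated percentages (an assumption), and text_bar_chart and its scale parameter are hypothetical.

```python
# Counts derived from the notes (assumption): 71%, 12%, and 17% of 500.
counts = {
    "Previous Macintosh owners": 355,
    "Previous Windows users": 60,
    "New computer purchasers": 85,
}


def text_bar_chart(counts, scale=5):
    """Return one line per category: label, bar, and frequency.

    Each '#' stands for `scale` responses (an arbitrary choice); every bar
    starts at zero, so bar length is proportional to frequency.
    """
    width = max(len(name) for name in counts)
    lines = []
    for name, n in counts.items():
        bar = "#" * (n // scale)
        lines.append(f"{name:<{width}} | {bar} {n}")
    return lines


chart = text_bar_chart(counts)
print("\n".join(chart))
```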
Comparison Across Surveys:
Bar charts excel at illustrating differences between two distributions, unlike pie charts.
Example: An illustration of users playing card games at Yahoo on two different days:
More players on Wednesday than Sunday.
Consistent number of Pinochle players on both days, but significantly more Hearts players on Wednesday.
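To make the comparison concrete, the sketch below lines up two distributions category by category, which is what a grouped bar chart shows visually. The player counts are HYPOTHETICAL placeholders, invented only to mirror the pattern described above (equal Pinochle counts, more Hearts players on Wednesday); they are not the Yahoo figures.

```python
# HYPOTHETICAL counts for illustration only; not the actual Yahoo data.
sunday = {"Hearts": 3000, "Pinochle": 1000}
wednesday = {"Hearts": 5000, "Pinochle": 1000}


def compare(first, second):
    """Return {category: (first_count, second_count, difference)}."""
    return {k: (first[k], second[k], second[k] - first[k]) for k in first}


table = compare(sunday, wednesday)
for game, (sun, wed, diff) in table.items():
    print(f"{game:<9} Sunday {sun:>5}  Wednesday {wed:>5}  change {diff:+}")
```

Placing the two counts for each game next to each other makes the per-category difference easy to read, which is hard to do across two separate pie charts.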
Design Tip: Avoid excessive embellishments in graphs that may obscure important information.
Example: 3D bar charts can hinder clarity relative to 2D bar charts.
Example: Using images instead of plain bars can exaggerate differences.
Misleading Graphs
Distortion in Graphs:
Edward Tufte's lie factor is the ratio of the size of the effect shown in a graph to the size of the effect in the data.
A lie factor greater than 1.05 or less than 0.95 indicates substantial distortion and is considered unacceptable.
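The calculation can be sketched as follows. The notes do not spell out how to measure the size of an effect; this sketch assumes the relative change, (second - first) / first, which is the usual reading of Tufte's definition. The function names and the example numbers are hypothetical.

```python
def effect_size(first, second):
    """Relative change from first to second (assumed definition of effect)."""
    return (second - first) / first


def lie_factor(shown_first, shown_second, data_first, data_second):
    """Size of effect shown in the graphic divided by size of effect in data."""
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)


def is_distorted(factor, low=0.95, high=1.05):
    """Apply the thresholds from the notes: outside [0.95, 1.05] is unacceptable."""
    return factor < low or factor > high


# Hypothetical example: the data rise from 100 to 150 (a 50% increase),
# but the drawn bars grow from 2 cm to 5 cm (a 150% increase).
factor = lie_factor(2, 5, 100, 150)
print(factor, is_distorted(factor))
```

A graph that draws sizes exactly in proportion to the data has a lie factor of 1.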
Bar Charts with Non-Zero Baselines:
The baseline is the bottom of the y-axis and represents the smallest number of cases that could fall in a category; it should normally be set to zero.
Adjusting the baseline (e.g., setting it to 50) can distort perception of differences.
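A short calculation shows the size of the distortion. The sketch below compares the ratio of two drawn bar heights with a baseline of 0 and with a baseline of 50. The counts 60 and 85 are derived from the stated 12% and 17% of 500 customers, and apparent_ratio is a hypothetical helper.

```python
def apparent_ratio(a, b, baseline=0):
    """Ratio of drawn bar heights (b relative to a) when the axis starts at baseline."""
    return (b - baseline) / (a - baseline)


# Derived counts (assumption): 12% and 17% of the 500 iMac customers.
windows_users, new_buyers = 60, 85

true_ratio = apparent_ratio(windows_users, new_buyers)                   # baseline 0
shifted_ratio = apparent_ratio(windows_users, new_buyers, baseline=50)   # baseline 50
print(f"baseline 0:  bars differ by a factor of {true_ratio:.2f}")
print(f"baseline 50: bars differ by a factor of {shifted_ratio:.2f}")
```

With the baseline at 50, the second bar is drawn 3.5 times as tall as the first even though the underlying count is only about 1.4 times as large.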
Line Graphs
Guideline: Avoid line graphs when dealing with qualitative variables on the x-axis.
A line graph is essentially a bar chart in which the tops of the bars are replaced by points joined by lines; the connecting lines can suggest a numerical order that doesn't exist.
Example: The misleading representation of card game data using a line graph implies a natural order among categories.
Conclusion
Both pie charts and bar charts can effectively display qualitative data.
Bar charts are favored for larger category sets, while pie charts work better for fewer categories.
Avoiding distortions such as embellishments, non-zero baselines, and line graphs for unordered categories keeps the representation of the data accurate.