Lesson 6.0 More About Data - Mathematics NCEA Level 2 Study Notes

Introduction to Data Cleaning

  • Definition of Data Cleaning: The process of checking a data set for incorrect or missing data values.
  • Methodology for Identification: Data cleaning can begin by examining graphs of the data, such as a box plot, and looking for values that stand out.
  • Achievement Standard Context: For the Mathematics NCEA Level 2 Achievement Standard, students are typically provided with an appropriate data set. It is highly unlikely that students will be required to delete or remove any data values during the assessment.
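
The check described above can be sketched in a few lines of Python. This is only an illustration: the heights, the `None` marker for a missing value, the 50 to 250 cm limits, and the function name `find_suspect_values` are all made up for the example and do not come from the lesson.

```python
# Hypothetical sample: student heights in cm. None marks a missing value.
heights_cm = [162, 171, None, 158, 1650, 175, -4, 169]


def find_suspect_values(values, lowest_possible, highest_possible):
    """Return (row, value, reason) for each missing or impossible entry."""
    suspects = []
    for row, value in enumerate(values):
        if value is None:
            suspects.append((row, value, "missing"))
        elif value < lowest_possible or value > highest_possible:
            suspects.append((row, value, "impossible"))
    return suspects


# Assumed limits: no student is shorter than 50 cm or taller than 250 cm.
suspects = find_suspect_values(heights_cm, lowest_possible=50, highest_possible=250)
for row, value, reason in suspects:
    print(f"row {row}: {value!r} looks {reason}")
```

The sketch only reports the suspect values. Deciding what to do with them is a separate step that needs a justification.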

Managing Unusual, Incorrect, or Missing Data

  • Handling 'Odd' Data Points: When a data point appears unusual, the initial step is to "drill down" into the data to identify why it differs from the rest of the sample.
  • Factors for Differentiation: Variations in data points often stem from specific attributes such as:
     * Color
     * Brand
     * Breed
     * Species
  • Visual Examples: The lesson cites three specific instances (referred to as Example 1, Example 2, and Example 3) where box plots are used to identify points that appear unique or different.
  • Guidelines for Removing Data (Cautionary Rules):
     * A point should only be removed if it is significantly distant from the bulk of the data and is skewing the investigation.
     * Removal is permitted if the value is categorized as "impossible" and is accompanied by a logical justification.
     * General Rule: Within the scope of this Achievement Standard, the removal of data points is very unlikely to be necessary.
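
One possible way to "drill down" is to flag points that sit far from the bulk of the data and then read off their other attributes. The sketch below uses hypothetical dog weights and, as an assumption, the common "1.5 × IQR" convention for deciding what counts as far away. The lesson itself does not prescribe this rule.

```python
import statistics

# Hypothetical sample: (breed, weight in kg) for eight dogs.
dogs = [
    ("Labrador", 29.0), ("Labrador", 31.5), ("Labrador", 30.2),
    ("Labrador", 28.4), ("Labrador", 32.1), ("Labrador", 30.8),
    ("Labrador", 29.7), ("Chihuahua", 2.4),
]

weights = [weight for _, weight in dogs]

# Quartiles of the whole sample; the middle value (the median) is not needed here.
lower_quartile, _, upper_quartile = statistics.quantiles(weights, n=4)
iqr = upper_quartile - lower_quartile

# Assumed convention: a point is "unusual" if it lies more than
# 1.5 x IQR outside the box.
low_fence = lower_quartile - 1.5 * iqr
high_fence = upper_quartile + 1.5 * iqr

unusual = [(breed, weight) for breed, weight in dogs
           if weight < low_fence or weight > high_fence]

# Drill down: look at the attributes of each flagged point.
for breed, weight in unusual:
    print(f"{weight} kg is unusual; breed recorded as {breed}")
```

Here the flagged weight belongs to a different breed, which explains why it differs from the rest. It is a real value, not an impossible one, so this is a reason to comment on the point, not to delete it.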

Summary Statistics and Box Plot Fundamentals

  • Summary Statistics Definition: These are calculations derived from a sample that describe its characteristics. Key summary statistics include:
     * Mean
     * Median
     * Quartiles
     * Range
  • The Box Plot: A specific type of graph designed to display summary statistics visually.
  • Resources for Further Learning: Students needing additional clarification are encouraged to:
     * Watch instructional videos titled "UNDERSTANDING SUMMARY STATISTICS" and "UNDERSTANDING BOX PLOTS."
     * Refer to the Glossary provided in the course materials.
     * Contact their teacher regarding Level 1 module revision if necessary.
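
As a quick illustration, the summary statistics listed above can be calculated with Python's built-in `statistics` module. The sample values are invented for the example. Note that textbooks and software use slightly different quartile methods, so quartiles from another tool may differ a little from these.

```python
import statistics

# Hypothetical sample of nine values, already in order.
sample = [3, 5, 7, 8, 9, 11, 13, 15, 40]

mean = statistics.mean(sample)        # sum of the values divided by how many there are
median = statistics.median(sample)    # middle value of the ordered sample

# quantiles(..., n=4) returns the three cut points that split the data
# into quarters: lower quartile, median, upper quartile.
lower_quartile, _, upper_quartile = statistics.quantiles(sample, n=4)

data_range = max(sample) - min(sample)  # largest value minus smallest value

print(f"mean = {mean:.2f}")
print(f"median = {median}")
print(f"lower quartile = {lower_quartile}, upper quartile = {upper_quartile}")
print(f"range = {data_range}")
```

For this sample the median is 9 and the range is 37. The single large value (40) pulls the mean up to about 12.33, well above the median.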

Analysis of Box Plot Principles

Based on the "Statement/True/False" evaluation section, the following principles define the use and interpretation of box plots:

  • The Median Indicator: In a box plot, the "line" located inside the box represents the median, not the mean.
  • Quartile Division: A box plot divides the data into four groups, each containing about 25% of the data points. The groups hold equal shares of the data, but they can cover very different widths on the plot.
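  • Sample Size Sensitivity: A box plot is built from the summary statistics of the sample, so its shape can change with the sample size; plots drawn from small samples tend to be less stable. The plot itself does not display the sample size.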
  • Comparative Utility: Box plots are highly effective tools for comparing different categories within a dataset, such as comparing data for "males" versus "females."
  • Data Distribution Misconceptions: While the median represents the midpoint of the data distribution (half the values lie above and half below), this does not automatically apply to the mean unless the distribution is perfectly symmetrical.
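
The principles above can be checked on a small example. The sketch below works out the five values a box plot displays for two hypothetical groups (the numbers and the names `group_a` and `group_b` are invented), then counts how many values fall below the median and below the mean in the skewed group.

```python
import statistics


def five_number_summary(values):
    """Return the five values a box plot displays for one group."""
    ordered = sorted(values)
    lower_quartile, _, upper_quartile = statistics.quantiles(ordered, n=4)
    return {
        "min": ordered[0],
        "lower_quartile": lower_quartile,
        "median": statistics.median(ordered),
        "upper_quartile": upper_quartile,
        "max": ordered[-1],
    }


# Hypothetical data for two categories; group_a has one very large value.
group_a = [12, 15, 15, 18, 20, 22, 60]
group_b = [14, 16, 17, 19, 21, 23, 24]

summary_a = five_number_summary(group_a)
summary_b = five_number_summary(group_b)
print("group A:", summary_a)
print("group B:", summary_b)

# Median versus mean in the skewed group.
mean_a = statistics.mean(group_a)
below_median = sum(1 for value in group_a if value < summary_a["median"])
above_median = sum(1 for value in group_a if value > summary_a["median"])
below_mean = sum(1 for value in group_a if value < mean_a)

print(f"below median: {below_median}, above median: {above_median}")
print(f"below mean: {below_mean} of {len(group_a)}")
```

In group A, three values lie below the median (18) and three lie above it. The mean is about 23.1, and six of the seven values lie below it, so the mean does not split the data in half. Placing the two summaries side by side is the same comparison a pair of box plots makes visually.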