COMM 1503 Types of Data
Introduction to Data and Context
Data: Refers to the information collected for analysis.
Population Data: Data that represents an entire population.
Sample Data: Data drawn from a subset of the population.
Types of Data:
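Survey, Census, Parameter: concepts related to data gathering and analysis. A census collects data from every member of the population, a sample survey collects data from only part of it, and a parameter is a numerical value that describes a population.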
Categorical Data: Data representing categories or distinct groups (e.g., types of drinks).
Quantitative Data: Numerical data that can be measured (e.g., heights, ages).
Objective of the Class:
Discuss how to summarize the data provided and why such summaries are useful.
Explore frequency distributions and their importance in analyzing data.
Frequency Distribution
Definition of Frequency Distribution:
A summary that shows all possible values in a dataset and their respective counts.
Helps to identify how often each value appears.
Goals of a Frequency Distribution:
Present every value that occurs in the dataset.
Count how many times each value appears in the data.
Types of Frequency Distribution:
Frequency distributions apply to both categorical and numerical data and are typically the first step in summarizing either kind.
Example of Frequency Distribution:
If analyzing drinks sold, the distribution would show:
What types of drinks were sold
How many times each kind was sold.
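A minimal Python sketch of the drinks example, assuming a short hypothetical list of sales (not the class dataset). `collections.Counter` lists every drink that occurs and how many times it was sold, which is exactly the two goals of a frequency distribution.

```python
from collections import Counter

# Hypothetical sales records: one entry per drink sold (illustrative data only).
sales = ["Coffee", "Tea", "Coffee", "Juice", "Coffee", "Tea", "Water", "Coffee"]

# Counter maps each distinct value to the number of times it occurs.
freq = Counter(sales)

# Print the frequency distribution, most common drink first.
for drink, count in freq.most_common():
    print(f"{drink:<8}{count}")
```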
Calculating Frequency Distributions
Methods for Calculation:
Calculations can be performed by hand or using Excel. Basic counting suffices for manual calculations.
Using Excel for Frequency Distribution:
Function:
COUNTIF: Counts the number of cells in a range that meet a specific criterion.
=COUNTIF(A2:B26, criteria)
Use dollar signs ($) to make the range an absolute reference (e.g., $A$2:$B$26) so it stays fixed when the formula is dragged down rows or across columns; without them, Excel shifts the range with each copy and the counts become inconsistent.
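For readers working outside Excel, the same per-value count can be mirrored in Python. This is only a sketch: `countif` is a hypothetical helper that imitates COUNTIF for a single plain-text criterion (matching case-insensitively, as COUNTIF does for text), and it does not reproduce Excel's operator or wildcard criteria. The nested list stands in for a two-column range like A2:B26, and the data are illustrative.

```python
def countif(cell_range, criterion):
    """Count cells in a 2-D range (a list of rows) whose text equals criterion.

    Hypothetical stand-in for Excel's COUNTIF with a plain-text criterion;
    text is compared case-insensitively via casefold().
    """
    target = str(criterion).casefold()
    return sum(
        1
        for row in cell_range
        for cell in row
        if str(cell).casefold() == target
    )


# Hypothetical two-column grid of drinks (like a small A2:B4 range).
grid = [
    ["Coffee", "Tea"],
    ["Juice", "Coffee"],
    ["coffee", "Water"],
]

# The range stays fixed while the criterion changes, like an absolute $A$2:$B$4 range.
for drink in ["Coffee", "Tea", "Juice", "Water"]:
    print(drink, countif(grid, drink))
```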
Total Count and Frequency Analysis:
The total frequency count should match the number of observations in the dataset, e.g., total drinks sold = 50.
Importance of Accuracy:
Verify that the number of counts aligns with the actual data collected to maintain data integrity.
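The accuracy check can be automated. A minimal sketch, assuming the analyst types the list of categories by hand (the data and the `reconcile` helper are hypothetical): if a category is left off the list, the counts no longer add up to the number of observations, which flags the problem.

```python
from collections import Counter


def reconcile(observations, categories):
    """Return (counts, ok): the count for each listed category, and whether
    those counts add up to the total number of observations."""
    tally = Counter(observations)
    counts = {c: tally[c] for c in categories}  # missing categories count as 0
    ok = sum(counts.values()) == len(observations)
    return counts, ok


# Hypothetical sales data (illustrative only).
sales = ["Coffee", "Tea", "Juice", "Coffee", "Water", "Tea"]

print(reconcile(sales, ["Coffee", "Tea", "Water"]))           # "Juice" left out
print(reconcile(sales, ["Coffee", "Tea", "Water", "Juice"]))  # complete list
```

The first call reports a total of 5 against 6 observations, so it returns `False`; the second reconciles.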
Relative Frequency and Percent Frequency
Relative Frequency:
The proportion of all observations that fall in a particular class (a value or a bin).
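Calculated as: $\text{relative frequency} = \dfrac{\text{frequency of the class}}{n}$, where $n$ is the total number of observations.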
Percent Frequency:
Similar to relative frequency but expressed as a percentage.
Calculated as: $\text{percent frequency} = \text{relative frequency} \times 100$.
Summation of Frequencies:
Absolute frequencies should sum to the total number of observations, relative frequencies to 1, and percent frequencies to 100 (allowing for rounding).
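A short sketch of relative and percent frequency, assuming hypothetical sales of 10 drinks. It applies relative frequency = frequency / n and percent frequency = relative frequency × 100, then checks the three sums.

```python
from collections import Counter

# Hypothetical drink sales (illustrative data only).
sales = ["Coffee"] * 5 + ["Tea"] * 3 + ["Juice"] * 2
n = len(sales)  # total number of observations

freq = Counter(sales)                          # absolute frequency
rel = {k: v / n for k, v in freq.items()}      # relative frequency = f / n
pct = {k: 100 * r for k, r in rel.items()}     # percent frequency = relative x 100

for k in freq:
    print(f"{k:<7}{freq[k]:>3}{rel[k]:>6.2f}{pct[k]:>7.1f}%")

# Frequencies sum to n, relative frequencies to 1, percents to 100 (up to float rounding).
assert sum(freq.values()) == n
assert abs(sum(rel.values()) - 1) < 1e-9
assert abs(sum(pct.values()) - 100) < 1e-9
```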
Analyzing Quantitative Data
Defining Ranges for Bins:
Rather than focusing on exact quantities (e.g., exact heights), the objective is often to classify data into ranges (bins).
Choosing the Number of Bins:
Choosing the number of bins (typically between 5 and 20) is a stylistic decision that depends on the analyst's judgment.
Calculating Bin Width:
Formula: $\text{bin width} \approx \dfrac{\text{largest value} - \text{smallest value}}{\text{number of bins}}$
Example: For heights ranging from 58 inches (smallest) to 73 inches (largest), the width for 5 bins would be $(73 - 58)/5 = 15/5 = 3$ inches.
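The bin-width formula and the resulting bin limits can be sketched in Python using the 58 to 73 inch example. `bin_width` and `bin_limits` are hypothetical helper names, and the bins here simply start at the smallest value.

```python
def bin_width(smallest, largest, num_bins):
    """Approximate bin width: (largest value - smallest value) / number of bins."""
    return (largest - smallest) / num_bins


def bin_limits(smallest, width, num_bins):
    """(lower, upper) limits of each bin, starting at the smallest value."""
    return [(smallest + i * width, smallest + (i + 1) * width) for i in range(num_bins)]


width = bin_width(58, 73, 5)
print(width)                      # 3.0
print(bin_limits(58, width, 5))   # [(58.0, 61.0), (61.0, 64.0), (64.0, 67.0), (67.0, 70.0), (70.0, 73.0)]
```

With this width, the largest height (73 inches) falls exactly on the upper limit of the last bin. This is the kind of edge case that the boundary rules for binning address. When the computed width is not a whole number, analysts often round it up to a convenient value. That is a common convention, not something these notes specify.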
Understanding Edge Cases in Binning:
Decide consistently whether each boundary value belongs to the lower or the upper bin, so that bins do not overlap and every observation falls into exactly one bin.
Use square brackets [ ] for an included limit and round brackets ( ) for an excluded one; for example, [58, 61) contains 58 but not 61, so 61 belongs to the next bin, [61, 64).
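The boundary rule can be sketched in code. This sketch assumes one common choice, not necessarily the one used in class: every bin is [lower, upper) except the last, which is closed so that the largest value is still counted. The bins are the five 3-inch bins starting at 58 inches, and the heights are hypothetical.

```python
def bin_counts(values, smallest, width, num_bins):
    """Count values per bin using [lower, upper) limits; the last bin is
    closed, [lower, upper], so a value equal to the top limit is kept."""
    counts = [0] * num_bins
    upper_end = smallest + num_bins * width
    for v in values:
        if not smallest <= v <= upper_end:
            raise ValueError(f"{v} is outside [{smallest}, {upper_end}]")
        i = int((v - smallest) // width)    # bin whose [lower, upper) contains v
        counts[min(i, num_bins - 1)] += 1   # v == upper_end goes into the last bin
    return counts


# Hypothetical heights in inches (illustrative only).
heights = [58, 59, 60, 61, 64, 65, 66, 67, 70, 73]
print(bin_counts(heights, 58, 3, 5))
```

Here 61 lands in the second bin rather than the first, and 73 is counted in the last bin instead of being dropped.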
Histograms and Their Interpretation
Definition of Histogram:
A graph of a frequency distribution in which the horizontal axis (x-axis) shows the bins and the vertical axis (y-axis) shows frequency, relative frequency, or percent frequency.
Unlike bar charts, which are used for categorical data, histograms display quantitative data and are drawn with no gaps between adjacent bars, because the bins cover a continuous range of values.
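Without plotting software, a quick text histogram can give a rough picture of the shape. In this sketch, the histogram is rotated 90 degrees: each row is one bin, and the number of `#` marks is that bin's frequency. The bin limits and counts are hypothetical, and `text_histogram` is an illustrative helper, not a library function.

```python
def text_histogram(limits, counts, mark="#"):
    """Return one text row per bin: its interval label followed by one mark per observation."""
    rows = []
    for i, ((lower, upper), count) in enumerate(zip(limits, counts)):
        # Bins are [lower, upper); the last bin is closed so the largest value is included.
        close = "]" if i == len(limits) - 1 else ")"
        rows.append(f"[{lower}, {upper}{close} {mark * count}")
    return "\n".join(rows)


# Hypothetical bin limits and frequencies for five height bins of width 3.
limits = [(58, 61), (61, 64), (64, 67), (67, 70), (70, 73)]
counts = [3, 1, 3, 1, 2]
print(text_histogram(limits, counts))
```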
Shape of Histograms:
Analysis of shape is crucial; look for symmetry or skewness.
Definitions:
Symmetrical Distribution: Both sides of the histogram mirror each other.
Skewed Distribution: One tail is longer than the other:
Right-Skew: Most data is clustered on the left, tail to the right.
Left-Skew: Most data is clustered on the right, tail to the left.
Cumulative Frequency Distribution
Cumulative Distribution Explained:
Shows the running total of frequencies: for each bin, the number of observations up to that bin's upper limit.
Cumulative frequency is calculated as:
For each bin, add the frequency of that bin to the cumulative total of the previous bins.
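The running-total rule can be sketched directly, assuming hypothetical frequencies for five ordered bins. The loop adds each bin's frequency to the total of the earlier bins, and `itertools.accumulate` from the standard library gives the same result.

```python
from itertools import accumulate


def cumulative_frequency(counts):
    """Running totals: each bin's frequency added to the total of all earlier bins."""
    running_total = 0
    cumulative = []
    for count in counts:
        running_total += count
        cumulative.append(running_total)
    return cumulative


counts = [3, 1, 3, 1, 2]             # hypothetical frequencies for five ordered bins
print(cumulative_frequency(counts))  # [3, 4, 7, 8, 10]

# itertools.accumulate computes the same running sum in one call.
assert cumulative_frequency(counts) == list(accumulate(counts))
```

The last cumulative value always equals the total number of observations (here 10). This gives another quick check on accuracy.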
Applications of Cumulative Frequency:
Particularly useful for determining how many observations fall below a given bin limit (for example, how many heights are under 64 inches), regardless of how those observations are spread among the lower bins.
Limitations:
Cumulative distributions are not applicable to categorical data, because accumulating counts assumes the values have a natural order, which categories such as drink types lack.
Conclusion
Recap of Frequency Distributions:
Apply concepts of absolute frequency, relative frequency, and percent frequency to analyze data.
Histograms visualize the distribution of quantitative data and reveal its shape, including symmetry or skewness.
Use cumulative frequency to see how many observations fall up to each bin limit.