Module 1 – Graphing Data (Biostatistics for Evidence Based Practice) Vocabulary
Bar Chart
- Intro to Graphing: the bar chart is one of the graphing methods used to describe data (listed under Graphing); it is suited to comparing categories.
Histogram
- Purpose: shows the distribution of a quantitative variable by grouping data into bins.
- Example from slides: bins labeled as 1-5, 6-10, 11-15, 16-20, 21-25 on the x-axis; y-axis represents frequency or count.
- Visual cues: The slide displays a histogram with a vertical axis showing frequencies (e.g., values like 0, 10, 20, 30, 40, 50) and multiple bars corresponding to the bins.
- Context note: The histogram example is tied to dataset visuals used in the module (the slide shows institution labels and axis values).
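The binning step behind a histogram can be sketched in a few lines. This is a minimal illustration, assuming integer data and the slide's bin layout (width 5, starting at 1); the `sample` values and the `bin_counts` helper are hypothetical and not taken from the module.

```python
from collections import Counter

def bin_counts(values, width=5, start=1):
    """Count how many integer values fall into each bin (1-5, 6-10, ...)."""
    counts = Counter()
    for v in values:
        # Lower edge of the bin that contains v
        lower = start + ((v - start) // width) * width
        counts[(lower, lower + width - 1)] += 1
    return dict(sorted(counts.items()))

# Hypothetical data, not the dataset shown on the slide
sample = [2, 3, 7, 8, 8, 12, 14, 15, 19, 21, 25]
hist = bin_counts(sample)

# Text version of the histogram: one row per bin, bar length = frequency
for (lo, hi), n in hist.items():
    print(f"{lo:>2}-{hi:<2} | {'#' * n} ({n})")
```

The bar heights on the y-axis are the counts stored in `hist`; changing `width` changes how coarse or fine the picture of the distribution looks.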
Stem-and-Leaf Plot
- Purpose: a quick view of the data that shows the distribution while preserving the original values.
- Structure shown: each data value is split into a stem (leading digits) and a leaf (trailing digits); the slide presents the layout as multiple rows of leaves, one row per stem.
- Use: useful for small data sets to quickly assess shape, center, and spread while retaining actual data values.
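As a rough sketch of how the stem/leaf split works, the snippet below assumes non-negative integers, with the tens as the stem and the ones digit as the leaf. The `scores` list and the `stem_and_leaf` helper are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (tens) with sorted leaves (ones)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 58 -> stem 5, leaf 8
        rows[stem].append(leaf)
    return dict(rows)

# Hypothetical scores, not the data on the slide
scores = [35, 42, 47, 51, 58, 58, 63, 70, 74, 79]
plot = stem_and_leaf(scores)

for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

Each row can be read back into the original values (stem 5 with leaves 1, 8, 8 is 51, 58, 58), which is what distinguishes this plot from a histogram.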
Frequency Table
- Purpose: summarizes data by exact values and their frequencies.
- Columns shown:
- Final Exam Score (value categories)
- Frequency (how many observations fall into each score category)
- Percent (percent of total observations in each category)
- Valid Percent (percent of non-missing observations in each category; missing data are excluded from the denominator)
- Cumulative Percent (running total of Percent or Valid Percent across categories)
- Example data outline: scores are recorded in increments (e.g., 35, 40, 45, …, 100) for a total of 100 observations.
- Total row: shows overall totals, e.g., Frequency = 100, Percent = 100.0, Valid Percent = 100.0, Cumulative Percent = 100.0.
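The columns above can be reproduced with a short calculation. The sketch below makes two assumptions: missing observations are marked with `None`, and Cumulative Percent accumulates Valid Percent. The `scores` list and `frequency_table` helper are hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (value, frequency, percent, valid percent, cumulative percent)."""
    total = len(values)                          # all cases, including missing
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        freq = counts[value]
        percent = 100 * freq / total             # denominator includes missing
        valid_percent = 100 * freq / len(valid)  # denominator excludes missing
        cumulative += valid_percent              # assumption: runs on valid percent
        rows.append((value, freq, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows

# Hypothetical scores with two missing observations
scores = [35, 40, 40, 45, 45, 45, 50, 50, None, None]
rows = frequency_table(scores)
for row in rows:
    print(row)
```

With no missing data, Percent and Valid Percent are identical, which matches the Total row described above where both equal 100.0.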
Frequency Distribution
- Purpose: describes how frequently data points occur in a dataset and how the data are distributed overall.
- Shape descriptors presented:
- Leptokurtic (thin): a distribution with a sharp peak (high kurtosis).
- Mesokurtic: a normal, moderate peak (normal-like shape).
- Platykurtic (flat): a flatter, broader peak (lower kurtosis).
- Normal curve overlay: a normal distribution curve is shown to compare the observed distribution against the theoretical normal distribution.
- Skewness concepts:
- Positive skew: the long tail extends to the right; most values cluster at the low end and high values are less frequent.
- Negative skew: the long tail extends to the left; most values cluster at the high end and low values are less frequent.
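The shape descriptors above can be put into numbers. The following is a minimal sketch using moment-based skewness and excess kurtosis (kurtosis minus 3, so a mesokurtic shape is near 0). That convention, the `shape_stats` helper, and the sample lists are assumptions for illustration; statistical packages often apply small-sample corrections that give slightly different values.

```python
def shape_stats(values):
    """Return (skewness, excess kurtosis) from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance (population form)
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skew, excess_kurtosis

# Hypothetical samples
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]
left_tailed = [-x for x in right_tailed]

sym_skew, sym_kurt = shape_stats(symmetric)
pos_skew, _ = shape_stats(right_tailed)
neg_skew, _ = shape_stats(left_tailed)
print(sym_skew, sym_kurt, pos_skew, neg_skew)
```

A skewness above 0 corresponds to positive skew and below 0 to negative skew; excess kurtosis above 0 suggests a leptokurtic shape and below 0 a platykurtic one.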
Graphing - Normal Curve, Skewness, and Kurtosis Labels
- Normal Curve: the classic bell-shaped distribution used for comparison.
- Positive Skew vs Negative Skew: descriptors for asymmetry of the distribution.
- Kurtosis terms included: Leptokurtic, Mesokurtic, Platykurtic to describe the peak sharpness and tail heaviness.
Boxplot
- Components shown on the slide:
- Outlier: a data point beyond the whiskers, commonly more than 1.5 × IQR below Q1 or above Q3, plotted individually.
- Whiskers: lines extending from Q1 and Q3 to the smallest and largest values that lie within 1.5 × IQR of the box (or a similar criterion).
- 25th percentile (Q1): the lower quartile.
- 75th percentile (Q3): the upper quartile.
- Median: middle value of the data set.
- Mean: average value (sometimes shown in the boxplot as a dot or special symbol in some diagrams).
- Boxplots by day of the week: the slide appears to show one boxplot per category (Friday, Monday, Saturday, Sunday, Thursday, Tuesday, Wednesday) to compare distributions across categories.
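The boxplot components listed above can be computed directly. This sketch assumes the 1.5 × IQR rule and the "inclusive" quartile method from Python's `statistics` module; quartile conventions differ between software packages, so other tools may report slightly different Q1 and Q3. The data and the `boxplot_summary` helper are hypothetical.

```python
import statistics

def boxplot_summary(values):
    """Quartiles, mean, whisker ends and outliers under the 1.5 * IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.mean(data),
        "iqr": iqr,
        "whiskers": (min(inside), max(inside)),  # most extreme non-outliers
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

# Hypothetical values for one category (e.g. one day of the week)
summary = boxplot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

In this sample the value 30 falls above the upper fence, so it would be drawn as an individual outlier point while the upper whisker stops at 9.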
Central Tendency and Dispersion (key themes across graphs)
- Central Tendency: measures that describe a data set by a single value representing the center.
- Dispersion: measures that describe the spread or variability of the data.
- How graphs support interpretation: choice of graph affects understanding of center and spread (e.g., mean vs median, presence of outliers, and tail behavior).
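A small numerical comparison shows why mean vs median and the presence of outliers matter. The values and the `summarize` helper below are hypothetical; the sketch only uses Python's standard `statistics` module.

```python
import statistics

def summarize(data):
    """Central tendency (mean, median) and dispersion (range, sample SD)."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "stdev": statistics.stdev(data),
    }

# Hypothetical values; the second list adds a single extreme observation
typical = [4, 5, 5, 6, 7]
with_outlier = typical + [40]

base = summarize(typical)
shifted = summarize(with_outlier)
print(base)
print(shifted)
```

One extreme value pulls the mean and the standard deviation upward sharply while the median barely moves, which is why the median is often preferred for skewed data.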
Normality, Skewness, and Kurtosis – Practical implications
- Normal Curve reference helps assess whether data are approximately normally distributed.
- Skewness affects the choice of statistical tests (e.g., parametric tests assume normality; nonparametric tests may be more appropriate for skewed data).
- Kurtosis describes tail heaviness and peakedness; heavy tails make extreme values more likely, which can affect standard errors and confidence intervals.
Connections to foundational principles and real-world relevance
- Graphing data is foundational for Evidence Based Practice (EBP): visualization guides interpretation and decision-making.
- Understanding distribution shapes informs test selection and data transformation needs in research and clinical settings.
- Recognizing outliers (via boxplots) prompts consideration of data quality, measurement error, or real but rare phenomena.
Ethical, philosophical, and practical implications
- Accurate representation of data through graphs reduces misinterpretation and supports transparent reporting.
- Acknowledge and handle missing data appropriately (Valid Percent vs Percent of the total) to avoid biased conclusions.
Quick reference formulas (LaTeX)
- Mean: \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
- Median: If n is odd, median is the middle value; if n is even, median is the average of the two middle values: \text{Median} = \begin{cases} x_{\frac{n+1}{2}} & \text{if } n \text{ is odd} \\ \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} & \text{if } n \text{ is even} \end{cases}
- Quartiles and IQR: Q1 = \text{25th percentile},\; Q3 = \text{75th percentile},\; \text{IQR} = Q3 - Q1
- Normal distribution (example): X \sim N(\mu,\sigma^2)
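The mean and median formulas translate directly into code. This is a minimal sketch, assuming the data are sorted before the positional median rule is applied; the function names and sample lists are hypothetical. Note that the formulas use 1-based positions while Python lists are 0-based.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by n."""
    return sum(values) / len(values)

def median(values):
    """Middle value if n is odd, average of the two middle values if n is even."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                  # position (n + 1) / 2 in 1-based terms
    return (xs[mid - 1] + xs[mid]) / 2  # positions n / 2 and n / 2 + 1

# Hypothetical samples with odd and even n
odd_sample = [7, 1, 3]
even_sample = [7, 1, 3, 5]
print(mean(odd_sample), median(odd_sample))
print(mean(even_sample), median(even_sample))
```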
Summary takeaways
- Use bar charts for categorical comparisons, histograms for distribution of a continuous variable, stem-and-leaf for quick data inspection and retention of data values, frequency tables for exact counts, frequency distributions to assess shape, and boxplots for a concise view of center, dispersion, and outliers.
- Interpret normality, skewness, and kurtosis to inform analysis choices and data preparation in evidence-based practice.