Notes on Descriptive Statistics, Graphing, and Environmental Case Study
Descriptive statistics: central tendency, dispersion, outliers, and graphing
- Context: Comparing two teaching methods using exam data where each group has about 200 data points; instead of listing individual scores, provide descriptive statistics to summarize the datasets.
- Key goal: describe the data sets rather than listing each student's score. Use descriptive statistics to answer which teaching method works better for exam 1.
Central tendency: mean vs median
Central tendency is what the data tend to center around; the typical score.
Mean (average):
- \text{mean} = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
- Compute by summing all scores and dividing by the number of data points (e.g., for 200 data points, \bar{x} = (sum of the 200 scores) / 200).
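As a minimal sketch of the formula above, the mean can be computed in a few lines of Python. The function name `mean` and the five scores are illustrative assumptions, not course data.

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by how many there are."""
    if not scores:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(scores) / len(scores)


# Hypothetical exam scores, made up for illustration only.
example_scores = [72, 85, 90, 64, 79]
example_mean = mean(example_scores)  # (72 + 85 + 90 + 64 + 79) / 5 = 78.0
```

The same function works unchanged for a group of about 200 scores; only the length of the list changes.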
Outliers determine which measure to use:
- If there are data points that are significantly far away from the next closest point (outliers), you should use the median instead of the mean.
- Example: most scores cluster between 60 and 100; an 8 or 12 out of 100 would be far from the rest and considered an outlier.
- Rule stated: if outliers are present in one group, use the median for that group; if outliers are present in both groups, use the median for both; if both groups are free of outliers, use the mean.
Median:
- Calculated by ranking scores from lowest to highest and choosing the middle value.
- Median is robust to outliers and gives a better center when extreme values exist.
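To see why the median is called robust, the sketch below compares how far the mean and the median move when a single low outlier is added. The scores are hypothetical and chosen only to resemble the example above; the `statistics` module is part of the Python standard library.

```python
import statistics

# Hypothetical scores; the 8 plays the role of the low outlier described above.
clean = [62, 70, 75, 81, 88, 94]
with_outlier = clean + [8]

mean_clean = statistics.mean(clean)              # about 78.3
median_clean = statistics.median(clean)          # (75 + 81) / 2 = 78.0
mean_outlier = statistics.mean(with_outlier)     # about 68.3
median_outlier = statistics.median(with_outlier) # middle of 7 sorted values = 75

# How far each measure of center moved after adding the outlier.
mean_shift = abs(mean_clean - mean_outlier)        # about 10 points
median_shift = abs(median_clean - median_outlier)  # 3 points
```

In this made-up case one extreme score drags the mean down by roughly ten points while the median moves by three, which is the behavior the rule above relies on.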
Dispersion: how spread out the data are
- Measures of dispersion describe how far data are from the center.
- Range (max − min): mentioned but dismissed as not very informative for comparing groups.
- Standard deviation (SD): core measure of spread around the mean.
- Intuition: for each data point, compute its distance from the mean; square these distances, sum them, divide by n − 1 (sample) or n (population), and take the square root.
- For a sample (n data points):
- s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
- For a population (if you treat the data as the entire population):
- \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}
- Practical interpretation: about 68% of the data lie within \bar{x} \pm s; about 95% lie within \bar{x} \pm 2s; about 99.7% lie within \bar{x} \pm 3s (the empirical rule, which assumes roughly normal, bell-shaped data).
- Reporting: after the mean, it is common to note the SD in parentheses, e.g., "mean = 80 (SD = 10)".
- Example interpretation: if mean = 80 and SD = 10, about 68% of students scored between 70 and 90.
- Practical takeaway: describe data with central tendency (mean or median) plus dispersion (SD); this combination characterizes the typical score and its variability.
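The two standard deviation formulas and the "mean plus or minus k SDs" interpretation can be sketched as follows. The function names and the five sample values are assumptions made for illustration; the interval helper simply does the arithmetic and does not check that the data are actually bell-shaped.

```python
import math


def sample_sd(values):
    """Sample standard deviation: divide the summed squares by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points for a sample SD")
    center = sum(values) / n
    sum_sq = sum((x - center) ** 2 for x in values)
    return math.sqrt(sum_sq / (n - 1))


def population_sd(values):
    """Population standard deviation: divide the summed squares by n."""
    n = len(values)
    if n < 1:
        raise ValueError("need at least one data point")
    center = sum(values) / n
    sum_sq = sum((x - center) ** 2 for x in values)
    return math.sqrt(sum_sq / n)


def interval_within(mean, sd, k):
    """Range covered by the mean plus or minus k standard deviations."""
    return (mean - k * sd, mean + k * sd)


# Hypothetical scores with mean 80; squared deviations sum to 250.
scores = [70, 75, 80, 85, 90]
s = sample_sd(scores)          # sqrt(250 / 4), about 7.91
sigma = population_sd(scores)  # sqrt(250 / 5), about 7.07

# The example from the notes: mean = 80, SD = 10.
one_sd = interval_within(80, 10, 1)  # (70, 90), about 68% of scores
two_sd = interval_within(80, 10, 2)  # (60, 100), about 95% of scores
```

The sample SD is always a little larger than the population SD for the same data, because it divides by n − 1 instead of n.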
Graphical data presentation: moving from numbers to pictures
- The lab emphasizes graphing data clearly and accurately; many students struggle with graphing.
- Variables and graph types:
- Independent variable (x-axis): the variable you control or categorize (e.g., treatment type, time, pH, etc.).
- Dependent variable (y-axis): the variable you measure (e.g., score, growth rate).
- Discrete groups (categorical data): you have separate groups (e.g., ponds irrigated with wastewater vs. ponds not irrigated).
- Use a bar chart where each bar represents the central tendency (mean or median) for that group.
- Do not plot 100 separate bars for 100 students; summarize with a single bar per group.
- Continuous data: data that flow or vary continuously (e.g., time, pH, velocity).
- Use a line graph if you have measurements that progress in order (e.g., time series, a measured variable across a gradient); plot individual points and connect them to show the trend.
- Scatter plots: two continuous variables plotted against each other to look for relationships.
- Each point represents a pair (x, y).
- Interpret slope:
- Positive correlation: upward-sloping pattern.
- Negative correlation: downward-sloping pattern.
- No apparent correlation: scattered without trend.
- Classic example: ice cream sales vs outdoor temperature – warmer temperatures tend to increase ice cream sales.
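The direction of a scatter-plot pattern can also be summarized with a single number. The sketch below uses the Pearson correlation coefficient, which is an assumption here: the notes describe the patterns only visually and do not name a statistic. The temperature and sales figures are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: +1 is a perfect upward line, -1 a perfect downward line."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return co_variation / math.sqrt(spread_x * spread_y)


# Hypothetical data: outdoor temperature (x) and ice cream sales (y).
temperature = [15, 20, 25, 30, 35]
sales = [20, 35, 50, 70, 85]
r_sales = pearson_r(temperature, sales)  # close to +1: upward-sloping pattern
```

A value near zero would correspond to the "scattered without trend" case, and a negative value to a downward-sloping pattern.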
- Common mistakes to avoid:
- Starting axes at zero is not always appropriate; instead, scale should cover the data range, with a little margin below the smallest point and above the largest point (e.g., if min ≈ 6.34, start slightly below, say 6.28; if max ≈ 6.46, go slightly above).
- Do not over- or under-scale; improper scaling can create misleading trends.
- For grouped data, do not show many small bars; show a single central tendency per group and use error bars to display variability.
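The axis-scaling advice above can be turned into a small helper that pads the data range on both ends. This is a sketch under assumptions: the function name, the `margin_fraction` parameter, and its default are all hypothetical, and the middle data value is made up.

```python
def axis_limits(values, margin_fraction=0.1):
    """Return (low, high) axis limits: the data range plus a margin on each side."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # All values identical: pad by a fraction of the value itself, or by 1.0 at zero.
        pad = abs(lowest) * margin_fraction or 1.0
    else:
        pad = span * margin_fraction
    return (lowest - pad, highest + pad)


# Minimum and maximum from the example in the notes; 6.40 is a made-up middle point.
readings = [6.34, 6.40, 6.46]

# A margin of half the data range gives limits of roughly 6.28 and 6.52,
# matching the "start slightly below, say 6.28" suggestion.
low, high = axis_limits(readings, margin_fraction=0.5)
```

A smaller `margin_fraction` gives a tighter axis; the point is that the limits follow the data rather than defaulting to zero.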
- How to display variability on bar charts:
- Add error bars representing standard deviation (SD) to show the spread around the central tendency.
- Example: two groups with means and SDs; bars of height equal to the means, with SD bars indicating variability.
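The numbers behind such a bar chart can be prepared as below: one bar height (the mean) and one error-bar length (the sample SD) per group. The group labels and scores are hypothetical, and `bar_summary` is an illustrative name rather than anything from the lab handout.

```python
import statistics


def bar_summary(groups):
    """Return one (label, bar height, error-bar length) tuple per group."""
    summary = []
    for label, values in groups.items():
        height = statistics.mean(values)      # central tendency sets the bar height
        error_bar = statistics.stdev(values)  # sample SD sets the error-bar length
        summary.append((label, height, error_bar))
    return summary


# Hypothetical scores: both groups average 80, but Method B is far more spread out.
groups = {
    "Method A": [78, 80, 82],
    "Method B": [60, 80, 100],
}
bars = bar_summary(groups)  # Method A: mean 80, SD 2; Method B: mean 80, SD 20
```

These values could then be handed to a plotting library; with matplotlib, for instance, the heights and SDs can be passed to `bar` as the height and `yerr` arguments. If a group contains outliers, `statistics.median` could replace the mean as the bar height.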
- Visual storytelling tips:
- A bar chart with identical central tendencies but different SDs communicates different stability/variability between groups.
- A line or scatter plot reveals trends or relationships not apparent in a bar chart.
The lab and assessment structure
- Four recommended problems: numbers 2, 6, 9, 12 (a cross-section of problems).
- Group work: collaborate in study groups; labs and problem sets reinforce concepts.
- The lab handout and problem sets: practice problems are available in the packet; there is a page reference for sample statistics problems (page 16 in the packet).
- Quizzes and exams:
- Quizzes are in person, like exams.
- The quiz held after the test covers lab topics (labs 1, 2, and 3).
- What quiz questions focus on:
- Background information: when to use mean vs median.
- Procedures: describing what you did in a lab or activity (e.g., a home energy audit, water calculators).
- Results: what results you obtained.
- Homework access: textbook link in the B2L (course platform); look for Unit One Active.
- Practical advice: bring the PDF packet or have it digitally; use the packet as a daily reference and supplement your notes.
Scientific method model and the classroom story
- The traditional flowchart of the scientific method:
- Observations -> generalizations -> explanatory model -> hypotheses -> experimental tests -> iteration if hypotheses fail or hold.
- This model is especially useful for complex ideas with multiple hypotheses.
- A second model is introduced via a real-world story about wastewater and forest ponds to illustrate how science is done in practice.
- The example uses a hypothetical or partly-real scenario to show how models guide hypothesis generation and testing, and how adjustments are made when observations contradict expectations.
Case study: wastewater irrigation, ponds, and eutrophication (Penn State example)
- Background: wastewater goes to a treatment plant that removes organics and sterilizes the water; treated wastewater (TWV) is typically discharged back into the environment, but Penn State, which relies on groundwater, instead spray-irrigates it onto land near forests.
- Nutrient problem: treated wastewater contains nutrients like nitrogen (N) and phosphorus (P); phosphorus and nitrogen are plant nutrients and fertilizers.
- Irrigation effect: an artificial rain of about 1 inch per week is added to a forested area; this, combined with nutrients, alters habitat conditions.
- Five observed changes in the irrigated forest ponds (summarized from the talk):
1) Extra water creates ponds and depressions, especially in spring.
2) In normal, non-irrigated forests, weeds are limited by light and nutrients; irrigated sites receive more light because of canopy disturbance (ice damage) and more nutrients from the wastewater, enabling dense weed growth and unusual understory patterns.
3) Fallen leaves normally form a leaf litter layer that creates a rich microhabitat underneath; in irrigated sites, the leaf litter layer persists less because wet conditions promote decay, so the classic leaf-litter ecosystem may not form as in non-irrigated forests.
4) The ponds show increased plant life on the water surface, notably duckweed forming a continuous mat across ponds (tiny floating plants that absorb nutrients).
5) The presence of duckweed and nutrient-rich water creates conditions different from typical forest ponds (a shift toward eutrophication-like conditions).
- Duckweed and pond ecology:
- Duckweed is a tiny floating plant that spreads across the pond surface, forming a mat; it has tendrils that absorb nutrients.
- Duckweed mats indicate high nutrient input; not typical for forest ponds.
- Eutrophication definition and mechanism:
- Eutrophication is the process by which excessive nutrients (mainly nitrates and phosphates) enter a water body, stimulating algal blooms.
- Algal blooms reduce light penetration, so photosynthetic plants growing on the bottom die.
- Bacteria decompose dead plants and algae, releasing more nutrients and consuming oxygen, which reduces dissolved oxygen levels.
- Low oxygen can lead to anoxic conditions (no oxygen), causing death of fish and other aquatic life.
- Practical implication: excessive fertilizer use in agriculture can trigger eutrophication in nearby water bodies.
- Important biology reminders embedded in the talk:
- Oxygen basics: air has oxygen; aquatic organisms use dissolved oxygen via gills; land animals use lungs; plants in water rely on dissolved oxygen for respiration.
- When bacteria decompose organic matter, they consume oxygen, contributing to hypoxic or anoxic conditions.
- The classroom model build-out for irrigated ponds:
- Winter: a large duckweed mat forms and freezes, killing plant cells with ice damage; dead duckweed accumulates in the pond bottom sediments after thaw.
- Spring/summer: warming temperatures wake bottom-dwelling bacteria that feed on dead duckweed, rapidly consuming oxygen and depleting the dissolved oxygen in the water.
- Hypothesis-driven predictions:
- Hypothesis 1: The chemistry of irrigated ponds will differ (more nutrients).
- Hypothesis 2: Dissolved oxygen will be lower in warm conditions in irrigated ponds.
- Hypothesis 3: Amphibian populations (frogs and tadpoles) will be lower in irrigated ponds due to hypoxia.
- Hypothesis 4: Amphibian eggs and tadpoles will experience higher mortality in irrigated ponds.
- Real-world findings after 12 years of study at Penn State:
- Easy creeks (focal wildlife groups like birds and small mammals) showed higher species diversity where irrigation occurred, but the communities were less evenly distributed (lower evenness).
- Some species thrived in irrigated areas (e.g., those that utilize weed-rich environments), while leaf-litter–associated species declined (weed-preferring vs leaf-litter–preferring species shift).
- Amphibians were not extensively studied in this context, making it unclear how aquatic and semi-aquatic species respond; the narrative emphasizes the potential for cascading effects since amphibians sit in the middle of the food chain.
- The broader ecological implication: changes to amphibians can cascade to insects, birds, and small mammals, altering ecosystem balance and pest dynamics.
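The contrast between "more species" and "less evenly distributed" can be made concrete with a short calculation. The notes do not say which measures the study used; the sketch below assumes species richness (a count of species) for diversity and Pielou's evenness (the Shannon index divided by its maximum) for evenness. All counts are invented for illustration and are not Penn State data.

```python
import math


def richness(counts):
    """Number of species with at least one individual observed."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon index H = -sum(p * ln p) over the species that are present."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def pielou_evenness(counts):
    """Evenness J = H / ln(S); 1.0 means every species is equally common."""
    species = richness(counts)
    if species < 2:
        raise ValueError("evenness needs at least two species")
    return shannon_index(counts) / math.log(species)


# Hypothetical individuals counted per species at two sites.
non_irrigated = [10, 10, 10, 10]    # 4 species, all equally common
irrigated = [40, 3, 2, 2, 2, 1]     # 6 species, one strongly dominant

richness_non, richness_irr = richness(non_irrigated), richness(irrigated)
evenness_non = pielou_evenness(non_irrigated)  # 1.0 up to rounding
evenness_irr = pielou_evenness(irrigated)      # well below 1.0
```

In this made-up case the irrigated site has more species but a much lower evenness, because a single species accounts for most of the individuals.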
- Final takeaways from the case study:
- Descriptive statistics and graphical representations are essential for summarizing and communicating ecological data.
- A mechanistic model (winter duckweed die-off, spring bacterial bloom, oxygen depletion) helps generate testable hypotheses about ecological responses to nutrient inputs.
- Long-term monitoring revealed shifts in species composition and evenness, underscoring the complexity of ecosystem responses to nutrient enrichment and hydrological changes.
- The example illustrates how science integrates observations, models, hypotheses, experiments, and revisions to explain real-world phenomena.
Practical exam prep: takeaways and study strategy
- Focus on the four recommended problems (2, 6, 9, 12) to cover core ideas across central tendency, dispersion, and graphing.
- Be able to:
- Decide when to use mean vs median based on presence of outliers.
- Compute or interpret standard deviation and describe what the 68%, 95%, and 99.7% ranges mean in context.
- Choose appropriate graph type depending on whether data are discrete or continuous and how many groups there are.
- Read and interpret bar charts with error bars; understand how SD influences the interpretation of differences between groups.
- Comment on axis scaling and why starting at zero is sometimes inappropriate.
- Explain the basic steps of the scientific method as a cycle of observations, generalizations, explanations, hypotheses, and testing, including how models adapt when predictions fail.
- Remember the lab and homework logistics:
- Lab quizzes are in person, in the same room as exams.
- Homework access: find Unit One Active via the textbook link in the course platform (B2L).
- Bring the PDF packet to class or use it digitally; incorporate it into daily study and note-taking.
- Final note: the content blends statistics, graphing best practices, and an ecological case study to illustrate how data are collected, interpreted, and used to inform real-world environmental decisions.