Notes on Descriptive Statistics, Graphing, and Environmental Case Study

Descriptive statistics: central tendency, dispersion, outliers, and graphing

  • Context: Comparing two teaching methods using exam data where each group has about 200 data points; instead of listing individual scores, provide descriptive statistics to summarize the datasets.
  • Key goal: describe the data sets rather than list each student's score, and use descriptive statistics to judge which teaching method works better for exam 1.

Central tendency: mean vs median

  • Central tendency is what the data tend to center around; the typical score.

  • Mean (average):

    • \text{mean} = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

    • Compute by summing all scores and dividing by the number of data points (e.g., for 200 data points,
      \bar{x} = (sum of the 200 scores) / 200).
  • Outliers determine which measure to use:

    • If there are data points that are significantly far away from the next closest point (outliers), you should use the median instead of the mean.
    • Example: most scores cluster between 60 and 100; an 8 or 12 out of 100 would be far from the rest and considered an outlier.
    • Rule stated: if outliers are present in one group, use the median for that group; if outliers are present in both groups, use the median for both; if neither group has outliers, use the mean.
  • Median:

    • Calculated by ranking scores from lowest to highest and choosing the middle value.
    • Median is robust to outliers and gives a better center when extreme values exist.
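The mean-versus-median rule above can be sketched in Python. This is a minimal illustration, not a standard outlier test: the helper names, the sample scores, and the 20-point `gap` threshold are all assumptions chosen to mirror the "far away from the next closest point" wording.

```python
from statistics import mean, median


def has_outlier(scores, gap=20):
    """Return True if the lowest or highest score sits more than `gap`
    points from its nearest neighbour (the threshold is an assumption)."""
    ordered = sorted(scores)
    if len(ordered) < 2:
        return False
    low_gap = ordered[1] - ordered[0]
    high_gap = ordered[-1] - ordered[-2]
    return low_gap > gap or high_gap > gap


def central_tendency(scores, gap=20):
    """Pick the median when an outlier is present, otherwise the mean."""
    if has_outlier(scores, gap):
        return "median", median(scores)
    return "mean", mean(scores)


# Hypothetical exam scores, not data from the course.
clean = [72, 78, 80, 85, 90]
with_outlier = [8, 72, 78, 80, 85, 90]

print(central_tendency(clean))         # mean of the clean group
print(central_tendency(with_outlier))  # median, because 8 is far from 72
print(mean(with_outlier))              # the mean is dragged down by the 8
```

In the second group the single score of 8 pulls the mean well below the cluster of scores, while the median stays near the middle of the cluster.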

Dispersion: how spread out the data are

  • Measures of dispersion describe how far data are from the center.
  • Range (max − min): mentioned but dismissed as not very informative for comparing groups.
  • Standard deviation (SD): core measure of spread around the mean.
    • Intuition: for each data point, compute its distance from the mean; square these distances, sum them, divide by the number of points (n − 1 for a sample), and take the square root.
    • For a sample (n data points):
      • s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
    • For a population (if you treat the data as the entire population):
      • \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}
    • Practical interpretation: about 68% of the data lie within \bar{x} \pm s; about 95% lie within \bar{x} \pm 2s; about 99.7% lie within \bar{x} \pm 3s (the empirical rule, which assumes roughly normal, symmetric data).
  • Reporting: after the mean, it is common to note the SD in parentheses, e.g., "mean = 80 (SD = 10)".
    • Example interpretation: if mean = 80 and SD = 10, about 68% of students scored between 70 and 90.
  • Practical takeaway: describe data with central tendency (mean or median) plus dispersion (SD); this combination characterizes the typical score and its variability.
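As a rough check on the formulas above, the sketch below computes the sample and population SD with Python's standard `statistics` module and counts how many scores fall within one and two SDs of the mean. The eight scores are made up so that the mean is 80 and the sample SD is 10; `fraction_within` is a hypothetical helper, not something from the lab packet.

```python
from statistics import mean, pstdev, stdev

# Hypothetical scores chosen so that mean = 80 and sample SD = 10.
scores = [65, 70, 75, 80, 80, 85, 90, 95]

x_bar = mean(scores)    # sum of scores divided by n
s = stdev(scores)       # sample SD: divides by n - 1
sigma = pstdev(scores)  # population SD: divides by n


def fraction_within(data, center, spread, k):
    """Fraction of data points inside center +/- k * spread."""
    lo, hi = center - k * spread, center + k * spread
    return sum(lo <= x <= hi for x in data) / len(data)


print(f"mean = {x_bar:.1f} (SD = {s:.1f})")
print(f"population SD = {sigma:.2f}")
print("within 1 SD:", fraction_within(scores, x_bar, s, 1))
print("within 2 SD:", fraction_within(scores, x_bar, s, 2))
```

With only eight points the fractions will not match 68% and 95% exactly; the empirical rule is an approximation for large, roughly normal data sets.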

Graphical data presentation: moving from numbers to pictures

  • The lab emphasizes graphing data clearly and accurately; many students struggle with graphing.
  • Variables and graph types:
    • Independent variable (x-axis): the variable you control or categorize (e.g., treatment type, time, pH, etc.).
    • Dependent variable (y-axis): the variable you measure (e.g., score, growth rate).
  • Discrete groups (categorical data): you have separate groups (e.g., ponds irrigated with wastewater vs. ponds not irrigated).
    • Use a bar chart where each bar represents the central tendency (mean or median) for that group.
    • Do not plot 100 separate bars for 100 students; summarize with a single bar per group.
  • Continuous data: data that flow or vary continuously (e.g., time, pH, velocity).
    • Use a line graph if you have measurements that progress in order (e.g., time series, a measured variable across a gradient); plot individual points and connect them to show the trend.
  • Scatter plots: two continuous variables plotted against each other to look for relationships.
    • Each point represents a pair (x, y).
    • Interpret slope:
      • Positive correlation: upward-sloping pattern.
      • Negative correlation: downward-sloping pattern.
      • No apparent correlation: scattered without trend.
    • Classic example: ice cream sales vs outdoor temperature – warmer temperatures tend to increase ice cream sales.
  • Common mistakes to avoid:
    • Starting axes at zero is not always appropriate; instead, scale should cover the data range, with a little margin below the smallest point and above the largest point (e.g., if min ≈ 6.34, start slightly below, say 6.28; if max ≈ 6.46, go slightly above).
    • Do not over- or under-scale; improper scaling can create misleading trends.
    • For grouped data, do not show many small bars; show a single central tendency per group and use error bars to display variability.
  • How to display variability on bar charts:
    • Add error bars representing standard deviation (SD) to show the spread around the central tendency.
    • Example: two groups with means and SDs; bars of height equal to the means, with SD bars indicating variability.
  • Visual storytelling tips:
    • A bar chart with identical central tendencies but different SDs communicates different stability/variability between groups.
    • A line or scatter plot reveals trends or relationships not apparent in a bar chart.
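The graphing advice above can be turned into numbers before any plotting is done. The sketch below, using only the standard library, reduces each group to one bar height (mean) and one error-bar length (SD), and computes axis limits that cover the data with a margin. The function names, the group data, and the 50% margin are assumptions; the margin was picked only because it reproduces the 6.34 to 6.28 example.

```python
from statistics import mean, stdev


def summarize_groups(groups):
    """Return {name: (mean, sd)}: one bar height and one
    error-bar length per group instead of one bar per student."""
    return {name: (mean(vals), stdev(vals)) for name, vals in groups.items()}


def padded_limits(values, margin=0.5):
    """Axis limits that cover the data range plus a margin on each side,
    given as a fraction of the range. Assumes at least two distinct values."""
    lo, hi = min(values), max(values)
    pad = (hi - lo) * margin
    return lo - pad, hi + pad


# Hypothetical groups with identical means but different spread.
groups = {
    "method_a": [70, 80, 90],
    "method_b": [78, 80, 82],
}
summary = summarize_groups(groups)
for name, (bar_height, error_bar) in summary.items():
    print(f"{name}: bar = {bar_height:.1f}, error bar = +/- {error_bar:.1f}")

# Hypothetical measurements spanning 6.34 to 6.46.
axis_lo, axis_hi = padded_limits([6.34, 6.40, 6.46])
print(f"y-axis from {axis_lo:.2f} to {axis_hi:.2f}")
```

If matplotlib is available, these values could be passed to `plt.bar(..., yerr=...)` and `plt.ylim(...)`; that step is left out here to keep the sketch dependency-free.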

The lab and assessment structure

  • Four recommended problems: numbers 2, 6, 9, 12 (a cross-section of problems).
  • Group work: collaborate in study groups; labs and problem sets reinforce concepts.
  • The lab handout and problem sets: practice problems are available in the packet; there is a page reference for sample statistics problems (page 16 in the packet).
  • Quizzes and exams:
    • Quizzes are in person, like exams.
    • The quiz held the day after the test covers lab topics (labs 1, 2, and 3).
  • What quiz questions focus on:
    • Background information: when to use mean vs median.
    • Procedures: describing what you did in a lab or activity (e.g., a home energy audit, water calculators).
    • Results: what results you obtained.
  • Homework access: textbook link in the B2L (course platform); look for Unit One Active.
  • Practical advice: bring the PDF packet or have it digitally; use the packet as a daily reference and supplement your notes.

Scientific method model and the classroom story

  • The traditional flowchart of the scientific method:
    • Observations -> generalizations -> explanatory model -> hypotheses -> experimental tests -> iteration if hypotheses fail or hold.
    • This model is especially useful for complex ideas with multiple hypotheses.
  • A second model is introduced via a real-world story about wastewater and forest ponds to illustrate how science is done in practice.
  • The example uses a hypothetical or partly-real scenario to show how models guide hypothesis generation and testing, and how adjustments are made when observations contradict expectations.

Case study: wastewater irrigation, ponds, and eutrophication (Penn State example)

  • Background: wastewater goes to a treatment plant that removes organics and sterilizes water; treated wastewater (TWV) is typically discharged back, but Penn State uses groundwater and spray irrigation on land near forests.
  • Nutrient problem: treated wastewater contains nutrients like nitrogen (N) and phosphorus (P); phosphorus and nitrogen are plant nutrients and fertilizers.
  • Irrigation effect: an artificial rain of about 1 inch per week is added to a forested area; this, combined with nutrients, alters habitat conditions.
  • Five observed changes in the irrigated forest ponds (summarized from the talk):
    1) Extra water creates ponds and depressions, especially in spring.
    2) In non-irrigated forests, weeds are limited by light and nutrients; irrigated sites receive more nutrients from the wastewater and more light because of canopy disturbance (ice damage), enabling dense weed growth and unusual understory patterns.
    3) Fallen leaves normally form a leaf-litter layer with a rich microhabitat underneath; in irrigated sites the litter persists for less time because wet conditions promote decay, so the classic leaf-litter ecosystem may not form as it does in non-irrigated forests.
    4) The ponds show increased plant life on the water surface, notably duckweed forming a continuous mat across ponds (tiny floating plants that absorb nutrients).
    5) The presence of duckweed and nutrient-rich water creates conditions different from typical forest ponds (a shift toward eutrophication-like conditions).
  • Duckweed and pond ecology:
    • Duckweed is a tiny floating plant that spreads across the pond surface, forming a mat; it has tendrils that absorb nutrients.
    • Duckweed mats indicate high nutrient input; not typical for forest ponds.
  • Eutrophication definition and mechanism:
    • Eutrophication is the process by which excessive nutrients (mainly nitrates and phosphates) enter a water body, stimulating algal blooms.
    • Algal blooms reduce light penetration, so photosynthetic plants growing on the bottom die.
    • Bacteria decompose dead plants and algae, releasing more nutrients and consuming oxygen, which reduces dissolved oxygen levels.
    • Low oxygen can lead to anoxic conditions (no oxygen), causing death of fish and other aquatic life.
    • Practical implication: excessive fertilizer use in agriculture can trigger eutrophication in nearby water bodies.
  • Important biology reminders embedded in the talk:
    • Oxygen basics: air has oxygen; aquatic organisms use dissolved oxygen via gills; land animals use lungs; plants in water rely on dissolved oxygen for respiration.
    • When bacteria decompose organic matter, they consume oxygen, contributing to hypoxic or anoxic conditions.
  • The classroom model build-out for irrigated ponds:
    • Winter: a large duckweed mat forms and freezes, killing plant cells with ice damage; dead duckweed accumulates in the pond bottom sediments after thaw.
    • Spring/summer: warming temperatures wake bottom-dwelling bacteria that feed on dead duckweed, rapidly consuming oxygen and depleting the dissolved oxygen in the water.
    • Hypothesis-driven predictions:
      • Hypothesis 1: The chemistry of irrigated ponds will differ (more nutrients).
      • Hypothesis 2: Dissolved oxygen will be lower in warm conditions in irrigated ponds.
      • Hypothesis 3: Amphibian populations (frogs and tadpoles) will be lower in irrigated ponds due to hypoxia.
      • Hypothesis 4: Amphibian eggs and tadpoles will experience higher mortality in irrigated ponds.
  • Real-world findings after 12 years of study at Penn State:
    • Focal wildlife groups (such as birds and small mammals) showed higher species diversity where irrigation occurred, but individuals were less evenly distributed among species (lower evenness).
    • Some species thrived in irrigated areas (e.g., those that use weed-rich environments), while leaf-litter–associated species declined: a shift from leaf-litter–preferring toward weed-preferring species.
    • Amphibians were not extensively studied in this context, making it unclear how aquatic and semi-aquatic species respond; the narrative emphasizes the potential for cascading effects since amphibians sit in the middle of the food chain.
    • The broader ecological implication: changes to amphibians can cascade to insects, birds, and small mammals, altering ecosystem balance and pest dynamics.
  • Final takeaways from the case study:
    • Descriptive statistics and graphical representations are essential for summarizing and communicating ecological data.
    • A mechanistic model (winter duckweed die-off, spring bacterial bloom, oxygen depletion) helps generate testable hypotheses about ecological responses to nutrient inputs.
    • Long-term monitoring revealed shifts in species composition and evenness, underscoring the complexity of ecosystem responses to nutrient enrichment and hydrological changes.
    • The example illustrates how science integrates observations, models, hypotheses, experiments, and revisions to explain real-world phenomena.
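To make "higher diversity but lower evenness" concrete, the sketch below computes species richness, the Shannon diversity index, and Pielou's evenness for two communities. The notes do not say which indices the study used, so the choice of Shannon and Pielou is an assumption, and the counts are invented for illustration only.

```python
from math import log


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over species with nonzero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * log(p) for p in proportions)


def pielou_evenness(counts):
    """Pielou's J = H / ln(richness); 1.0 means all species equally common."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        raise ValueError("evenness needs at least two species")
    return shannon_diversity(counts) / log(richness)


# Hypothetical individuals counted per species (not the study's data).
non_irrigated = [10, 10, 10, 10]            # 4 species, equally common
irrigated = [20, 10, 5, 5, 4, 3, 2, 1]      # 8 species, one dominant

for name, counts in [("non-irrigated", non_irrigated), ("irrigated", irrigated)]:
    print(
        f"{name}: richness = {len(counts)}, "
        f"H = {shannon_diversity(counts):.2f}, "
        f"J = {pielou_evenness(counts):.2f}"
    )
```

In this made-up example the irrigated community has more species and a higher Shannon index, yet a lower evenness because a few species account for most individuals.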

Practical exam prep: takeaways and study strategy

  • Focus on the four recommended problems (2, 6, 9, 12) to cover core ideas across central tendency, dispersion, and graphing.
  • Be able to:
    • Decide when to use mean vs median based on presence of outliers.
    • Compute or interpret standard deviation and describe what the 68%, 95%, and 99.7% ranges mean in context.
    • Choose appropriate graph type depending on whether data are discrete or continuous and how many groups there are.
    • Read and interpret bar charts with error bars; understand how SD influences the interpretation of differences between groups.
    • Comment on axis scaling and why starting at zero is sometimes inappropriate.
    • Explain the basic steps of the scientific method as a cycle of observations, generalizations, explanations, hypotheses, and testing, including how models adapt when predictions fail.
  • Remember the lab and homework logistics:
    • Lab quizzes are in person, in the same room as exams.
    • Homework access: find Unit One Active via the textbook link in the course platform (B2L).
    • Bring the PDF packet to class or use it digitally; incorporate it into daily study and note-taking.
  • Final note: the content blends statistics, graphing best practices, and an ecological case study to illustrate how data are collected, interpreted, and used to inform real-world environmental decisions.