University Seminar

UNIV1014: POPULATIONS, PARAMETERS, SAMPLES, and ESTIMATES

Key Concepts

  • Sampling Error: Undirected (random) deviation of estimates away from parameters; lower sampling error results in higher precision.

  • Sampling Bias: Systematic difference between estimates and parameters, leading to inaccurate estimates.

  • Sample Precision and Accuracy:

    • Low Sampling Precision: Estimates vary widely from one sample to the next, typically because of small sample size, high variability in the population, poor sampling technique, or random measurement error.

    • Low Sampling Accuracy: Estimates deviate systematically from the true population values (caused by sampling bias, non-random selection, systematic measurement error, or non-response bias).
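
The contrast between precision and accuracy can be sketched with a small simulation. The population of tree heights, the "tallest half" sampling scheme, and all function names below are invented for illustration; this is a minimal sketch, not a prescribed method.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: 10,000 tree heights in metres (invented values).
population = [random.gauss(20, 5) for _ in range(10_000)]
mu = statistics.mean(population)  # the parameter we try to estimate

# A biased sampling frame: only the tallest half of the population.
tallest_half = sorted(population)[len(population) // 2:]

def random_sample_mean(n):
    """Mean of a simple random sample: unbiased, but with sampling error."""
    return statistics.mean(random.sample(population, n))

def biased_sample_mean(n):
    """Mean of a sample taken only from the tallest trees: sampling bias."""
    return statistics.mean(random.sample(tallest_half, n))

def summarize(sampler, n, reps=500):
    """Return (average deviation from mu, spread of the estimates)."""
    means = [sampler(n) for _ in range(reps)]
    return statistics.mean(means) - mu, statistics.stdev(means)

bias_small, spread_small = summarize(random_sample_mean, 10)
bias_large, spread_large = summarize(random_sample_mean, 100)
bias_biased, spread_biased = summarize(biased_sample_mean, 100)

print(f"random, n=10:  bias={bias_small:+.2f}, spread={spread_small:.2f}")
print(f"random, n=100: bias={bias_large:+.2f}, spread={spread_large:.2f}")
print(f"biased, n=100: bias={bias_biased:+.2f}, spread={spread_biased:.2f}")
```

Under these assumptions, a larger random sample shrinks the spread (higher precision) while staying centred on the parameter, whereas the biased scheme stays off target no matter how many trees are sampled (low accuracy).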

Definitions

  • Population: All individuals or units of interest in a study; parameters describe populations.

  • Sample: A subset of a population collected for analysis.

  • Estimates: Statistics that approximate parameters based on sample data.

    • Parameters: Constants that do not change for a given population; usually unknown.

    • Estimates: Variables that may change from one sample to another.

Sampling Techniques

  • Random Selection: Each individual in the population has an equal and independent chance of being selected for the sample.

TERMINOLOGY OF DIFFERENT TYPES OF VARIABLES

Numeric Variables

  • Discrete: Can only take specific values (e.g., number of trees in a plot).

  • Continuous: Can take any value within a range, so the number of possible values is infinite.

Categorical Variables

  • Ordinal: Can be ranked (e.g., sizes).

  • Nominal: Cannot be ranked (e.g., species).

Predictive Variables

  • Response variables are predicted from explanatory variables; in practice, variables are often interrelated in complex ways.

COMMON DESCRIPTIONS OF DATA

  • Location (Central Tendency): Describes the central values of data.

  • Width (Spread/Variability): Measures the dispersion of data points.

  • Association (Correlation): Explores relationships between variables.

Central Tendency Measures

  1. Mean: Average of a dataset (center of gravity).

  2. Median: Middle value when arranged; less influenced by extremes; preferable for income data.

  3. Mode: Most frequent data point; easily determined via histograms.

    • In a perfectly symmetric distribution, the mean and median are equal.
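
As a minimal sketch of how the three measures behave, the example below uses Python's standard `statistics` module on a small income dataset. The values are hypothetical and include one extreme income on purpose.

```python
import statistics

# Hypothetical incomes in thousands of dollars; 250 is a deliberate extreme.
incomes = [32, 35, 35, 38, 41, 44, 250]

mean_income = statistics.mean(incomes)      # pulled upward by the extreme value
median_income = statistics.median(incomes)  # middle value of the sorted data
mode_income = statistics.mode(incomes)      # most frequent value

print(f"mean={mean_income:.1f}, median={median_income}, mode={mode_income}")

# For symmetric data the mean and median coincide.
symmetric = [1, 2, 3, 4, 5]
print(statistics.mean(symmetric), statistics.median(symmetric))
```

The single extreme income drags the mean far above the median, which is why the median is usually preferred for skewed data such as income.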

Variability Measures

  1. Range: Difference between maximum and minimum values.

  2. Coefficient of Variation (CV): Standard deviation relative to the mean, expressed as a percentage (CV = (s / x̄) × 100%).

  3. Variance and Standard Deviation: Measure how far a sample’s values lie from the mean; the variance is based on the squared deviations from the mean.

  4. Interquartile Range (IQR): Difference between the 75th and 25th percentiles, reducing outlier impact.

  5. Standard Deviation: Square root of the variance; roughly the typical distance of values from the mean, in the original units.
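
The measures above can be computed with the standard `statistics` module, as in this minimal sketch. The height values are hypothetical, and the quartiles use the module's default ("exclusive") method; other software may use a slightly different quartile convention.

```python
import statistics

# Hypothetical tree heights in metres; 40.0 is a deliberate outlier.
heights = [12.0, 15.0, 11.0, 18.0, 14.0, 16.0, 13.0, 40.0]

data_range = max(heights) - min(heights)
variance = statistics.variance(heights)  # sample variance, divides by n-1
sd = statistics.stdev(heights)           # square root of the sample variance
cv = sd / statistics.mean(heights) * 100 # coefficient of variation, in percent

q1, q2, q3 = statistics.quantiles(heights, n=4)  # 25th, 50th, 75th percentiles
iqr = q3 - q1

print(f"range={data_range}, variance={variance:.2f}, sd={sd:.2f}")
print(f"CV={cv:.1f}%, IQR={iqr}")
```

Because the IQR ignores the lowest and highest quarters of the data, the outlier inflates the range and standard deviation far more than it affects the IQR.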

Gaussian Distribution

  • Known as the normal distribution; often used in statistical analysis.

SAMPLE SIZE INFLUENCE

  • Good estimators are unbiased: their expected value equals the population parameter. Sample variance and standard deviation are calculated with (n-1) in the denominator to reduce bias. The sample range is biased: it tends to underestimate the population range and increases with sample size.
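
A small simulation can illustrate both claims. The sketch below assumes a Normal population with standard deviation 2 (so the true variance is 4); the population, sample sizes, and helper names are choices made for illustration only.

```python
import random
import statistics

random.seed(2)

TRUE_VARIANCE = 4.0  # assumed population: Normal with mean 0 and sd 2

def draw(n):
    """Draw a random sample of size n from the assumed population."""
    return [random.gauss(0, 2) for _ in range(n)]

def sample_range(values):
    return max(values) - min(values)

def average_over_samples(stat, n, reps=5000):
    """Average a statistic over many repeated samples of size n."""
    return statistics.mean([stat(draw(n)) for _ in range(reps)])

avg_var_n1 = average_over_samples(statistics.variance, 5)   # divides by n-1
avg_var_n = average_over_samples(statistics.pvariance, 5)   # divides by n
avg_range_5 = average_over_samples(sample_range, 5)
avg_range_50 = average_over_samples(sample_range, 50)

print(f"true variance: {TRUE_VARIANCE}")
print(f"average with n-1: {avg_var_n1:.2f}, average with n: {avg_var_n:.2f}")
print(f"average range, n=5: {avg_range_5:.2f}; n=50: {avg_range_50:.2f}")
```

In this setup the (n-1) version averages close to the true variance, the divide-by-n version falls short, and the average sample range keeps growing as the sample size increases.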

OBSERVATIONAL STUDIES & EXPERIMENTATION

Observational Studies

  • Nature: Researchers observe subjects without assigning treatments or choices to them.

    • Retrospective Study: Subjects are identified first, and data on their past conditions are then collected.

    • Prospective Study: Data collection happens as conditions unfold for pre-identified subjects.

Experimental Design is Necessary for

  • Establishing cause-and-effect relationships through controlled studies.

  • Components:

    • Factor: Explanatory variable manipulated.

    • Experimental Units: Subjects of the experiment, such as plots or individuals.

FOUR PRINCIPLES OF EXPERIMENTAL DESIGN

  1. Control: Control for variation by keeping conditions constant across groups.

  2. Randomization: Assign experimental units to treatments at random so that unknown sources of variation are spread evenly, on average, across treatment levels.

  3. Replication: Apply each treatment to multiple experimental units so that natural variability can be estimated.

  4. Blocking: Group similar experimental units to isolate treatment effects.
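
Randomization, replication, and blocking can be combined in a randomized block assignment. The sketch below is one simple way to do it; the plot names, soil-type blocks, and treatment labels are hypothetical.

```python
import random

random.seed(3)

# Hypothetical experimental units, grouped into blocks by soil type.
blocks = {
    "sandy": ["plot1", "plot2", "plot3", "plot4"],
    "clay": ["plot5", "plot6", "plot7", "plot8"],
}
treatments = ["control", "fertilizer"]

def randomized_block_assignment(blocks, treatments):
    """Randomly assign treatments to units separately within each block."""
    assignment = {}
    for units in blocks.values():
        shuffled = units[:]       # copy so the input list is not modified
        random.shuffle(shuffled)  # randomization within the block
        for i, unit in enumerate(shuffled):
            # Cycling through the treatments replicates each one equally
            # often within the block when block size is a multiple of the
            # number of treatments.
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

assignment = randomized_block_assignment(blocks, treatments)
for unit, treatment in sorted(assignment.items()):
    print(unit, treatment)
```

Each soil type receives every treatment the same number of times, so differences between soil types cannot be mistaken for treatment effects.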

SIGNIFICANCE OF RESULTS

  • Statistically Significant: Differences larger than expected by random chance.

  • Power Analysis: Helps determine the sample size needed to detect an effect of a given size at a chosen significance level and power.

Issues of Confounding and Lurking Variables

  • Confounding: Occurs when the effect of one factor cannot be separated from the effect of another (e.g., drinking coffee and heart disease).

  • Lurking Variables: Unmeasured variables that create a misleading association between two other variables (e.g., ice cream sales and crime rates, which both rise in hot weather).

ENVIRONMENTAL IMPACTS OF PULP AND PAPER MILLS

Benefits

  • Integral to Canada’s forestry sector: mills connect raw materials with infrastructure and contribute green energy production.

Challenges

  • Environmental concerns such as chemical discharge and high energy and water consumption.

  • Modern advances like closed-loop systems for water recycling aim to enhance sustainability.

DATA PRESENTATION

Importance of Context in Data

  • Data needs context for meaningful interpretation; examples of relevant "who" and "what" in forestry include forest plots, animals, and plants.

Graphical Representation Techniques

  • Frequency Tables: Count occurrences of each value.

  • Bar Charts and Pie Charts: Display categorical data.

  • Histograms: Show distributions of quantitative variables.

  • Boxplots: Represent five-number summaries for comparisons between groups.

  • Scatterplots: Illustrate relationships between two quantitative variables.
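
Two of the summaries above can be produced without any plotting library, as in this minimal sketch. The species labels and stem diameters are hypothetical, and the quartiles follow the default ("exclusive") method of the standard `statistics` module.

```python
from collections import Counter
import statistics

# Frequency table for a categorical variable (hypothetical species labels).
species = ["spruce", "pine", "spruce", "fir", "pine", "spruce"]
frequency_table = Counter(species)
for name, count in frequency_table.most_common():
    print(f"{name:8s}{count}")

def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

# Five-number summary for a quantitative variable (hypothetical diameters, cm).
diameters = [18, 21, 22, 25, 27, 30, 45]
summary = five_number_summary(diameters)
print(summary)
```

The five values returned here are exactly what a boxplot draws, so computing them per group is a quick numerical stand-in for side-by-side boxplots.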

PROBABILITY

Basic Principles

  • Statistical Independence: The outcome of one event does not affect the probability of another; disjoint events with nonzero probability cannot be independent.

  • Probability Rules: Addition rule for disjoint events, P(A or B) = P(A) + P(B); multiplication rule for independent events, P(A and B) = P(A) × P(B).
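
Both rules can be checked by enumerating a small sample space. The sketch below uses two fair six-sided dice, an example chosen here for illustration, with exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair six-sided dice: 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a function outcome -> bool."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

def first_is_six(o):
    return o[0] == 6

def second_is_six(o):
    return o[1] == 6

def sum_is_2(o):
    return sum(o) == 2

def sum_is_12(o):
    return sum(o) == 12

# Multiplication rule: the two dice are independent.
p_both_six = prob(lambda o: first_is_six(o) and second_is_six(o))
print(p_both_six, prob(first_is_six) * prob(second_is_six))

# Addition rule: a sum of 2 and a sum of 12 are disjoint.
p_2_or_12 = prob(lambda o: sum_is_2(o) or sum_is_12(o))
print(p_2_or_12, prob(sum_is_2) + prob(sum_is_12))

# Disjoint events are not independent: P(A and B) is 0, not P(A) * P(B).
p_2_and_12 = prob(lambda o: sum_is_2(o) and sum_is_12(o))
print(p_2_and_12, prob(sum_is_2) * prob(sum_is_12))
```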

Hypotheses and Testing

Null Hypothesis (H0)

  • A statement of no difference or no effect; it is the baseline assumption against which test results and P-values are interpreted.

P-values and Statistical Significance

  • A low P-value indicates strong evidence against H0, but it does not prove H0 false, and a high P-value does not prove H0 true; examine practical significance alongside statistical findings.
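
One way to make the idea of a P-value concrete is a permutation test. This technique is not named in these notes and is used here only as an illustration; the measurements and the function name are hypothetical. Under H0 (no difference between groups) the group labels are interchangeable, so the P-value is estimated as the share of random relabelings that give a difference at least as large as the observed one.

```python
import random

random.seed(4)

# Hypothetical measurements for two groups (invented values).
control = [10.1, 9.8, 10.4, 10.0, 9.7, 10.2]
treated = [11.9, 12.3, 11.6, 12.1, 12.4, 11.8]

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(a, b, reps=5000):
    """Estimate a two-sided P-value for a difference in means by shuffling."""
    observed = abs(mean(a) - mean(b))
    pooled = a + b  # new list; the inputs are not modified
    extreme = 0
    for _ in range(reps):
        random.shuffle(pooled)  # relabel the observations at random
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / reps

p_value = permutation_p_value(control, treated)
print(f"estimated P-value: {p_value:.4f}")
```

A small estimate says that a difference this large would rarely arise from random relabeling alone. It says nothing about whether a difference of this size matters in practice.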

CENTRAL LIMIT THEOREM

Definition

  • The sampling distribution of the sample mean approaches a Normal distribution as sample size increases, provided the observations are random and independent.

Applications

  • Used to understand the behavior of sample statistics under repeated sampling.
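
Repeated sampling is easy to simulate. The sketch below assumes a strongly right-skewed population (exponential with mean 1, chosen for illustration) and uses skewness, a measure of asymmetry that is zero for a Normal distribution, to track how Normal the sample means look.

```python
import math
import random
import statistics

random.seed(5)

def sample_mean(n):
    """Mean of n independent draws from an exponential population (mean 1)."""
    return statistics.mean([random.expovariate(1.0) for _ in range(n)])

def skewness(values):
    """Average cubed z-score: 0 for symmetric data, positive for a right tail."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return statistics.mean([((v - m) / s) ** 3 for v in values])

means_n2 = [sample_mean(2) for _ in range(4000)]
means_n50 = [sample_mean(50) for _ in range(4000)]

skew_n2 = skewness(means_n2)
skew_n50 = skewness(means_n50)

print(f"skewness of sample means, n=2:  {skew_n2:.2f}")
print(f"skewness of sample means, n=50: {skew_n50:.2f}")
print(f"sd of sample means, n=50: {statistics.stdev(means_n50):.3f} "
      f"(theory: {1 / math.sqrt(50):.3f})")
```

With larger samples the means become more symmetric around the population mean, and their spread shrinks toward the population standard deviation divided by the square root of n.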

WILLIAM S. GOSSET AND STUDENT’S T DISTRIBUTION

  • Gosset, publishing under the pseudonym "Student", formulated the Student’s t distribution, useful for comparing sample means and conducting hypothesis tests, especially with small samples.
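
As a minimal sketch of where the t distribution enters, the one-sample t statistic measures how many standard errors the sample mean lies from a hypothesized mean: t = (x̄ - μ0) / (s / √n), with n - 1 degrees of freedom. The seedling heights and the hypothesized mean of 4 below are invented for illustration, and the function name is hypothetical.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    s = statistics.stdev(sample)        # sample sd, uses n-1
    standard_error = s / math.sqrt(n)
    t = (sample_mean - mu0) / standard_error
    return t, n - 1

# Hypothetical seedling heights in cm, tested against a hypothesized mean of 4.
heights = [4, 6, 5, 7, 3]
t_stat, df = one_sample_t(heights, mu0=4)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The statistic is then compared with the t distribution for the given degrees of freedom, from a table or statistical software, to obtain a P-value. The Python standard library does not include that distribution, so the lookup is left out of this sketch.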