University Seminar

UNIV1014: POPULATIONS, PARAMETERS, SAMPLES, and ESTIMATES

Key Concepts

Sampling Error: Undirected (random) deviation of estimates away from parameters; lower sampling error results in higher precision.
Sampling Bias: Systematic difference between estimates and parameters, leading to inaccurate estimates.
Sample Precision and Accuracy:
- Low Sampling Precision: Estimates vary widely from sample to sample because of small sample size, high variability in the population, poor techniques, or random measurement errors.
- Low Sampling Accuracy: Estimates deviate systematically from the true population characteristics (caused by sampling bias, non-random selection, systematic measurement errors, and non-response bias).
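
The difference between precision and accuracy can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population of tree heights, the "tallest half" sampling frame, and the helper names sample_means and summarize are all hypothetical. It compares random samples of two sizes with samples drawn from a biased frame.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 tree heights (m); its mean is the parameter.
population = [random.gauss(20, 5) for _ in range(10_000)]
parameter = statistics.fmean(population)

# A biased sampling frame: only the tallest half of the population can be chosen.
tallest_half = sorted(population)[len(population) // 2:]

def sample_means(frame, n, repeats=500):
    """Means of `repeats` simple random samples of size n drawn from `frame`."""
    return [statistics.fmean(random.sample(frame, n)) for _ in range(repeats)]

def summarize(estimates):
    """Return (average offset from the parameter, spread of the estimates)."""
    offset = statistics.fmean(estimates) - parameter
    spread = statistics.stdev(estimates)
    return offset, spread

results = {
    "random, n=10": summarize(sample_means(population, 10)),
    "random, n=100": summarize(sample_means(population, 100)),
    "biased, n=100": summarize(sample_means(tallest_half, 100)),
}
for label, (offset, spread) in results.items():
    print(f"{label:14s} offset {offset:+.2f}  spread {spread:.2f}")
```

A larger random sample should show a smaller spread (higher precision) with an offset near zero, while the biased frame should show a clear offset (lower accuracy) no matter how large the sample is.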

Definitions

Population: All individuals or units of interest for a study (not necessarily all individuals in the world); parameters describe populations.
Sample: A subset of a population collected for analysis.
Estimates: Statistics that approximate parameters based on sample data.
- Parameters: Constants that do not change.
- Estimates: Variables that may change from one sample to another.

Sampling Techniques

Random Selection: Ensures each individual has an equal and independent chance of being selected to create samples.
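
As a minimal sketch of random selection, assuming a hypothetical sampling frame of 100 numbered forest plots, Python's standard random.sample draws units without replacement so that every plot has the same chance of being chosen:

```python
import random

random.seed(42)  # fixed seed only so the example is repeatable

# Hypothetical sampling frame: plots numbered 1 to 100.
plot_ids = list(range(1, 101))

# Simple random sample of 10 plots, drawn without replacement.
selected_plots = random.sample(plot_ids, 10)
print(sorted(selected_plots))
```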

TERMINOLOGY OF DIFFERENT TYPES OF VARIABLES

Numeric Variables

Discrete: Can only take specific values (e.g., number of trees in a plot).
Continuous: Can take any value within a range, with an infinite number of possibilities (e.g., tree height).

Categorical Variables

Ordinal: Can be ranked (e.g., sizes).
Nominal: Cannot be ranked (e.g., species).

Predictive Variables

Response variables are predicted from explanatory variables; in practice, variables are often interrelated in complex ways.

COMMON DESCRIPTIONS OF DATA

Location (Central Tendency): Describes the central values of data.
Width (Spread/Variability): Measures the dispersion of data points.
Association (Correlation): Explores relationships between variables.

Central Tendency Measures

Mean: Average of a dataset (center of gravity).
Median: Middle value when arranged; less influenced by extremes; preferable for income data.
Mode: Most frequent data point; easily determined via histograms.
- In perfectly symmetric data, the mean and median are identical; in skewed data, the mean is pulled toward the longer tail.
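
The three measures can be compared on a small dataset with one extreme value. The income figures below are made up for illustration; the functions come from Python's standard statistics module.

```python
import statistics

# Hypothetical annual incomes in thousands of dollars; 250 is an extreme value.
incomes = [32, 35, 35, 38, 41, 44, 250]

mean_income = statistics.mean(incomes)      # pulled upward by the extreme value
median_income = statistics.median(incomes)  # middle value of the sorted data
mode_income = statistics.mode(incomes)      # most frequent value

print(f"mean {mean_income:.1f}, median {median_income}, mode {mode_income}")
```

Here the mean is far above the median, which is why the median is usually preferred for skewed data such as incomes.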

Variability Measures

Range: Difference between maximum and minimum values.
Coefficient of Variation (CV): Standard deviation relative to the mean (CV = s / x̄ × 100%).
Variance and Standard Deviation: Measure how far a sample’s values differ from the mean; the variance averages the squared deviations from the mean.
Interquartile Range (IQR): Difference between the 75th and 25th percentiles, reducing outlier impact.
Standard Deviation: Square root of the variance; roughly the typical distance of a value from the mean, in the original units.
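
A short sketch of these measures, using made-up tree diameters and Python's standard statistics module. Quartiles can be computed under several conventions, so the IQR from other software may differ slightly from the value produced here.

```python
import statistics

# Hypothetical tree diameters (cm).
diameters = [12, 15, 15, 18, 20, 22, 24, 30]

data_range = max(diameters) - min(diameters)
variance = statistics.variance(diameters)   # sample variance, divides by n - 1
std_dev = statistics.stdev(diameters)       # square root of the sample variance
cv_percent = std_dev / statistics.mean(diameters) * 100

# quantiles(..., n=4) returns the three quartile cut points (default method).
q1, q2, q3 = statistics.quantiles(diameters, n=4)
iqr = q3 - q1

print(f"range {data_range}, variance {variance:.2f}, sd {std_dev:.2f}")
print(f"CV {cv_percent:.1f}%, IQR {iqr:.2f}")
```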

Gaussian Distribution

Known as the normal distribution; often used in statistical analysis.

SAMPLE SIZE INFLUENCE

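Good estimators should be unbiased: their expected values equal the population parameters. Sample variance is calculated with (n-1) in the denominator to remove the bias that dividing by n would introduce, and the sample standard deviation uses the same denominator. The sample range is a biased estimator: it tends to underestimate the population range and increases with sample size.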
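
These claims can be checked with a simulation. The sketch below assumes a hypothetical Normal population with a variance of 4; the helper names draw and mean_range are illustrative. It averages many sample variances computed with n and with (n-1), then compares the average sample range at two sample sizes.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

TRUE_VARIANCE = 4.0  # assumed population: Normal with mean 0 and sd 2

def draw(n):
    """Draw a random sample of size n from the assumed population."""
    return [random.gauss(0, 2) for _ in range(n)]

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(5000):
    sample = draw(5)
    divide_by_n.append(statistics.pvariance(sample))         # denominator n
    divide_by_n_minus_1.append(statistics.variance(sample))  # denominator n - 1

mean_biased = statistics.fmean(divide_by_n)
mean_unbiased = statistics.fmean(divide_by_n_minus_1)

def mean_range(n, repeats=2000):
    """Average sample range (max - min) over many samples of size n."""
    total = 0.0
    for _ in range(repeats):
        sample = draw(n)
        total += max(sample) - min(sample)
    return total / repeats

range_small = mean_range(5)
range_large = mean_range(50)

print(f"true variance {TRUE_VARIANCE}")
print(f"average with n:     {mean_biased:.2f}")
print(f"average with n - 1: {mean_unbiased:.2f}")
print(f"average range, n=5: {range_small:.2f}; n=50: {range_large:.2f}")
```

Dividing by n should give averages below the true variance, dividing by (n-1) should land close to it, and the average range should grow with sample size.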

OBSERVATIONAL STUDIES & EXPERIMENTATION

Observational Studies

Nature: Researchers observe subjects without assigning treatments.
- Retrospective Study: Past data collection on previously identified subjects.
- Prospective Study: Data collection happens as conditions unfold for pre-identified subjects.

Experimental Design is Necessary for

Establishing cause-and-effect relationships through controlled studies.
Components:
- Factor: Explanatory variable manipulated.
- Experimental Units: Subjects of the experiment, such as plots or individuals.

FOUR PRINCIPLES OF EXPERIMENTAL DESIGN

Control: Control for variation by keeping conditions constant across groups.
Randomization: Assign experimental units to treatment levels at random so that unknown sources of variation are spread evenly, on average, across treatments.
Replication: Apply each treatment to multiple experimental units.
Blocking: Group similar experimental units to isolate treatment effects.
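
Randomization, replication, and blocking can be combined in a randomized block assignment. The sketch below is one possible layout, not a prescribed procedure: the block names, plot labels, and treatments are hypothetical. Plots are shuffled within each block and treatments are then dealt out in turn, so each treatment is replicated equally inside every block.

```python
import random
from collections import Counter

random.seed(7)  # fixed seed so the sketch is repeatable

treatments = ["control", "fertilizer"]

# Hypothetical blocks: plots grouped by a shared condition (slope aspect).
blocks = {
    "north_slope": ["N1", "N2", "N3", "N4"],
    "south_slope": ["S1", "S2", "S3", "S4"],
}

assignment = {}
for block_name, plots in blocks.items():
    shuffled = plots[:]       # copy so the original order is untouched
    random.shuffle(shuffled)  # randomization within the block
    for position, plot in enumerate(shuffled):
        # Deal treatments out in turn: equal replication inside each block.
        assignment[plot] = treatments[position % len(treatments)]

for block_name, plots in blocks.items():
    counts = Counter(assignment[plot] for plot in plots)
    print(block_name, dict(counts))
```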

SIGNIFICANCE OF RESULTS

Statistically Significant: Differences larger than expected by random chance.
Power Analysis: Helps determine the sample size needed to detect an effect of a given size at a chosen significance level.

Issues of Confounding and Lurking Variables

Confounding: Where the effect of one factor cannot be separated from the effect of another (e.g., drinking coffee and heart disease).
Lurking Variables: Unmeasured variables that influence both variables of interest and create a misleading association between them (e.g., ice cream sales and crime rates).

ENVIRONMENTAL IMPACTS OF PULP AND PAPER MILLS

Benefits

Integral to Canada’s forestry sector: mills connect raw materials with existing infrastructure and produce green energy.

Challenges

Environmental concerns such as chemical discharge and high energy and water consumption.
Modern advances like closed-loop systems for water recycling aim to enhance sustainability.

DATA PRESENTATION

Importance of Context in Data

Data needs context for meaningful interpretation; examples of relevant "who" and "what" in forestry include forest plots, animals, and plants.

Graphical Representation Techniques

Frequency Tables: Count occurrences of each value.
Bar Charts and Pie Charts: Display categorical data.
Histograms: Show distributions of quantitative variables.
Boxplots: Represent five-number summaries for comparisons between groups.
Scatterplots: Illustrate relationships between two quantitative variables.
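
Two of these summaries can be computed without any plotting library. In the sketch below, the species list and tree heights are made-up data: it builds a frequency table for a categorical variable and the five-number summary that a boxplot would display. Quartile conventions vary between tools, so the quartiles may differ slightly elsewhere.

```python
from collections import Counter
import statistics

# Hypothetical categorical data: species recorded in one plot.
species = ["spruce", "pine", "spruce", "fir", "pine", "spruce"]
frequency = Counter(species)  # frequency table: value -> count
for name, count in frequency.most_common():
    print(f"{name:8s}{count}")

# Hypothetical quantitative data: tree heights (m).
heights = [12.1, 14.3, 15.0, 15.8, 16.2, 17.5, 18.9, 21.4]
q1, median, q3 = statistics.quantiles(heights, n=4)
five_number = (min(heights), q1, median, q3, max(heights))
print("min, Q1, median, Q3, max:", five_number)
```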

PROBABILITY

Basic Principles

Statistical Independence: The outcome of one event does not affect the probability of another; disjoint events with nonzero probability cannot be independent.
Probability Rules: Addition rule for disjoint events; multiplication rule for independent events.
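
Both rules can be verified by counting outcomes of fair dice. This is a small illustrative sketch; the helper prob and the event names are made up for the example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """Probability of `event` when every outcome in `space` is equally likely."""
    favourable = sum(1 for outcome in space if event(outcome))
    return Fraction(favourable, len(space))

one_die = list(range(1, 7))

def is_low(x):   # roll of 1 or 2
    return x <= 2

def is_high(x):  # roll of 5 or 6
    return x >= 5

# Addition rule: low and high rolls are disjoint, so P(A or B) = P(A) + P(B).
p_low = prob(is_low, one_die)
p_high = prob(is_high, one_die)
p_low_or_high = prob(lambda x: is_low(x) or is_high(x), one_die)

# Disjoint events are not independent: P(A and B) = 0, but P(A) * P(B) > 0.
p_low_and_high = prob(lambda x: is_low(x) and is_high(x), one_die)

# Multiplication rule: two separate dice are independent.
two_dice = list(product(one_die, repeat=2))

def first_is_six(roll):
    return roll[0] == 6

def second_is_even(roll):
    return roll[1] % 2 == 0

p_six = prob(first_is_six, two_dice)
p_even = prob(second_is_even, two_dice)
p_six_and_even = prob(lambda r: first_is_six(r) and second_is_even(r), two_dice)

print(p_low_or_high, p_low + p_high)
print(p_low_and_high, p_low * p_high)
print(p_six_and_even, p_six * p_even)
```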

Hypotheses and Testing

Null Hypothesis (H0)

A statement of no difference or no effect; hypothesis tests assume H0 is true and ask how surprising the observed data would be under it.

P-values and Statistical Significance

A low P-value indicates strong evidence against H0, but it does not prove that H0 is false, and a high P-value does not prove that H0 is true; examine practical significance alongside statistical findings.
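
One way to see what a P-value measures is an exact permutation test, which is used here purely as an illustration and is not necessarily the test used in the course. With made-up growth measurements for two groups, the sketch asks: if H0 were true and the group labels were arbitrary, what fraction of all possible relabellings would give a mean difference at least as large as the one observed?

```python
from itertools import combinations

# Hypothetical measurements for two groups of six units each.
control = [20.1, 21.4, 19.8, 22.0, 20.7, 21.1]
treated = [23.5, 24.1, 22.8, 25.0, 23.9, 24.4]

values = control + treated
n_control = len(control)

def mean_difference(control_idx):
    """Mean of the remaining values minus the mean of those at control_idx."""
    chosen = set(control_idx)
    group_a = [values[i] for i in range(len(values)) if i in chosen]
    group_b = [values[i] for i in range(len(values)) if i not in chosen]
    return sum(group_b) / len(group_b) - sum(group_a) / len(group_a)

observed = mean_difference(range(n_control))

# Under H0 every way of splitting the 12 values into two groups is equally likely.
splits = list(combinations(range(len(values)), n_control))
as_extreme = sum(
    1 for idx in splits
    if abs(mean_difference(idx)) >= abs(observed) - 1e-12  # tolerance for rounding
)
p_value = as_extreme / len(splits)  # two-sided P-value

print(f"observed difference {observed:.2f}, P-value {p_value:.4f}")
```

A small P-value here says only that such a large difference would be rare if H0 were true; whether a difference of this size matters in practice is a separate question.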

CENTRAL LIMIT THEOREM

Definition

The sampling distribution of the sample mean approaches a Normal distribution as sample size increases; this depends on random and independent observations.
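
The theorem can be illustrated by simulation. The sketch below assumes a strongly skewed population (an exponential distribution with mean 1, chosen only for illustration) and compares the skewness of sample means for small and large samples; the helper names are hypothetical.

```python
import random
import statistics

random.seed(5)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of n independent draws from a skewed (exponential, mean 1) population."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n

def skewness(xs):
    """Average cubed standardized deviation; close to 0 for a symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

means_small = [sample_mean(2) for _ in range(3000)]
means_large = [sample_mean(50) for _ in range(3000)]

print(f"n=2:  skewness of sample means {skewness(means_small):.2f}")
print(f"n=50: skewness of sample means {skewness(means_large):.2f}")
print(f"n=50: sd of sample means {statistics.stdev(means_large):.3f}")
```

With larger samples the skewness of the sample means should move toward zero and their standard deviation should shrink, which is the behaviour the theorem describes.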

Applications

Used to understand the behavior of sample statistics under repeated sampling.

WILLIAM S. GOSSET AND STUDENT’S T DISTRIBUTION

Gosset formulated the Student’s t distribution, which is useful for comparing sample means and conducting hypothesis tests, particularly with small samples.
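
As a minimal sketch of how the t distribution is used, the code below computes a one-sample t statistic, t = (x̄ - μ0) / (s / √n), for made-up measurements and a hypothetical null value μ0 = 5.0. Turning the statistic into a P-value requires the t distribution with n - 1 degrees of freedom, which Python's standard library does not provide, so that step is left out.

```python
import math
import statistics

# Hypothetical sample of six measurements and a hypothetical null value.
sample = [5.1, 4.9, 5.6, 5.2, 5.0, 5.4]
mu_0 = 5.0

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)        # uses n - 1 in the denominator
standard_error = sample_sd / math.sqrt(n)

t_statistic = (sample_mean - mu_0) / standard_error
degrees_of_freedom = n - 1

print(f"t = {t_statistic:.3f} with {degrees_of_freedom} degrees of freedom")
```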