University Seminar
UNIV1014: POPULATIONS, PARAMETERS, SAMPLES, AND ESTIMATES
Key Concepts
Sampling Error: Undirected (random) deviation of estimates away from parameters; lower sampling error results in higher precision.
Sampling Bias: Systematic difference between estimates and parameters, leading to inaccurate estimates.
Sample Precision and Accuracy:
Low Sampling Precision: Estimates vary widely from one sample to the next, due to small sample size, high variability in the population, poor techniques, or random measurement errors.
Low Sampling Accuracy: Systematic deviation from true population characteristics, affecting results (caused by sampling bias, non-random selection, measurement errors, and non-response bias).
Definitions
Population: All individuals or units of interest (e.g., every tree in a forest), not necessarily every individual in the world; parameters describe populations.
Sample: A subset of a population collected for analysis.
Estimates: Statistics that approximate parameters based on sample data.
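Parameters: Fixed (usually unknown) constants that describe a population; they do not change from sample to sample.
Estimates are variables: their values may change from one sample to another, because each sample contains different individuals.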
Sampling Techniques
Random Selection: Ensures each individual has an equal and independent chance of being selected to create samples.
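Random selection can be sketched in a few lines of Python. This is a minimal illustration only: the population of 100 numbered trees, the sample size of 10, and the seed are all assumptions made for the example.

```python
import random

# Hypothetical sampling frame: ID numbers for 100 trees in a stand.
population = list(range(1, 101))

rng = random.Random(42)  # fixed seed so the draw can be reproduced

# Simple random sample without replacement: every tree has the same
# chance of being selected, and no tree can be picked twice.
sample = rng.sample(population, k=10)

print(sorted(sample))
```

Changing the seed gives a different sample, which is why estimates computed from samples vary while the population parameter stays fixed.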
TERMINOLOGY OF DIFFERENT TYPES OF VARIABLES
Numeric Variables
Discrete: Can only take specific values (e.g., number of trees in a plot).
Continuous: Can take any value within a range, so the number of possible values is infinite (e.g., tree height).
Categorical Variables
Ordinal: Can be ranked (e.g., sizes).
Nominal: Cannot be ranked (e.g., species).
Predictive Variables
Response variables are predicted from explanatory variables; in practice, variables can be interrelated in complex ways.
COMMON DESCRIPTIONS OF DATA
Location (Central Tendency): Describes the central values of data.
Width (Spread/Variability): Measures the dispersion of data points.
Association (Correlation): Explores relationships between variables.
Central Tendency Measures
Mean: Average of a dataset (center of gravity).
Median: Middle value when arranged; less influenced by extremes; preferable for income data.
Mode: Most frequent data point; easily determined via histograms.
In perfectly symmetric data, the mean and median are identical; in skewed data, the mean is pulled toward the longer tail.
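A short sketch of the three measures, using Python's standard statistics module. The income values are invented for illustration; one extreme value is included to show why the median is often preferred for income data.

```python
import statistics

# Hypothetical annual incomes (thousands of dollars), with one extreme value.
incomes = [32, 35, 35, 38, 41, 44, 250]

mean = statistics.mean(incomes)      # pulled upward by the extreme value
median = statistics.median(incomes)  # middle value of the sorted data
mode = statistics.mode(incomes)      # most frequent value

print(f"mean={mean:.1f}, median={median}, mode={mode}")
```

Here the single large income raises the mean well above the median, while the median and mode stay close to the bulk of the data.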
Variability Measures
Range: Difference between maximum and minimum values.
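Coefficient of Variation (CV): Standard deviation relative to the mean (CV = s / x̄ × 100%, where s is the sample standard deviation and x̄ is the sample mean).
Variance and Standard Deviation: Measure how far a sample’s values lie from the mean; the variance is based on squared deviations from the mean, and the standard deviation is its square root.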
Interquartile Range (IQR): Difference between the 75th and 25th percentiles, reducing outlier impact.
Standard Deviation: Roughly the typical distance of values from the mean, expressed in the same units as the data.
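The variability measures above can be computed as follows. This is a sketch with made-up data; note that software packages use slightly different conventions for quartiles, so the IQR from statistics.quantiles (default "exclusive" method) may differ a little from a hand calculation or from other tools.

```python
import statistics

# Hypothetical data (e.g., seedlings counted in eight plots).
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)
mean = statistics.mean(data)
variance = statistics.variance(data)  # sample variance, divides by n - 1
sd = statistics.stdev(data)           # square root of the sample variance
cv = sd / mean * 100                  # coefficient of variation, in percent

# Quartiles: the three cut points that split the sorted data into four parts.
q1, _median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"range={data_range}, variance={variance:.2f}, sd={sd:.2f}")
print(f"CV={cv:.1f}%, IQR={iqr}")
```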
Gaussian Distribution
Known as the normal distribution; often used in statistical analysis.
SAMPLE SIZE INFLUENCE
Good estimators should be unbiased: their expected value equals the population parameter. Sample variance and standard deviation are calculated with (n-1) in the denominator to reduce bias. The sample range is biased: it tends to underestimate the population range and increases with sample size.
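A small simulation can make the (n-1) correction and the behaviour of the range concrete. This is a sketch under stated assumptions: the population is taken to be Normal with mean 10 and standard deviation 2, and the sample sizes and repetition counts are arbitrary choices.

```python
import random

rng = random.Random(1)
POP_SD = 2.0                  # assumed population standard deviation
TRUE_VARIANCE = POP_SD ** 2   # population variance = 4.0

def draw_sample(n):
    """Draw n independent values from the assumed Normal population."""
    return [rng.gauss(10.0, POP_SD) for _ in range(n)]

def sum_sq_dev(values):
    """Sum of squared deviations from the sample mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

# Average the two variance formulas over many small samples.
n, reps = 5, 20000
total_div_n = 0.0
total_div_n_minus_1 = 0.0
for _ in range(reps):
    ss = sum_sq_dev(draw_sample(n))
    total_div_n += ss / n
    total_div_n_minus_1 += ss / (n - 1)
avg_div_n = total_div_n / reps
avg_div_n_minus_1 = total_div_n_minus_1 / reps

def mean_range(n, reps=2000):
    """Average sample range (max - min) over repeated samples of size n."""
    total = 0.0
    for _ in range(reps):
        s = draw_sample(n)
        total += max(s) - min(s)
    return total / reps

range_small = mean_range(5)
range_large = mean_range(50)

print(f"true variance {TRUE_VARIANCE}: /n gives {avg_div_n:.2f}, "
      f"/(n-1) gives {avg_div_n_minus_1:.2f}")
print(f"mean range: n=5 -> {range_small:.2f}, n=50 -> {range_large:.2f}")
```

Dividing by n underestimates the population variance on average, dividing by (n-1) lands close to it, and the average range grows as the sample size increases.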
OBSERVATIONAL STUDIES & EXPERIMENTATION
Observational Studies
Nature: Researchers observe subjects without assigning treatments or influencing their choices.
Retrospective Study: Past data collection on previously identified subjects.
Prospective Study: Data collection happens as conditions unfold for pre-identified subjects.
Experimental Design is Necessary for
Establishing cause-and-effect relationships through controlled studies.
Components:
Factor: Explanatory variable manipulated.
Experimental Units: Subjects of the experiment, such as plots or individuals.
FOUR PRINCIPLES OF EXPERIMENTAL DESIGN
Control: Control for variation by keeping conditions constant across groups.
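Randomization: Assigns experimental units to treatment levels at random, so that unknown sources of variation are spread roughly evenly across treatments.
Replication: Apply each treatment to multiple experimental units, so that natural variation can be estimated.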
Blocking: Group similar experimental units to isolate treatment effects.
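Randomization, replication, and blocking can be combined in a randomized block assignment. The sketch below is illustrative only: the treatment names, block names, and plot labels are hypothetical.

```python
import random

rng = random.Random(7)
treatments = ["control", "fertilizer"]  # hypothetical treatment levels

# Blocking: plots grouped by a shared condition (here, slope position).
blocks = {
    "upper slope": ["U1", "U2", "U3", "U4"],
    "lower slope": ["L1", "L2", "L3", "L4"],
}

assignment = {}
for block_name, plots in blocks.items():
    shuffled = plots[:]
    rng.shuffle(shuffled)  # randomization within each block
    # Replication: each treatment receives the same number of plots per block.
    for i, plot in enumerate(shuffled):
        assignment[plot] = treatments[i % len(treatments)]

for plot in sorted(assignment):
    print(plot, assignment[plot])
```

Because every block contains both treatments, differences between upper and lower slopes cannot be mistaken for a treatment effect.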
SIGNIFICANCE OF RESULTS
Statistically Significant: Differences larger than expected by random chance.
Power Analysis: Helps determine the sample size needed to detect an effect of a given size, for a chosen significance level and desired power.
Issues of Confounding and Lurking Variables
Confounding: Occurs when the effect of one factor cannot be separated from the effect of another (e.g., drinking coffee and heart disease).
Lurking Variables: Unmeasured variables that create a misleading association between two other variables (e.g., ice cream sales and crime rates, which can both rise in hot weather).
ENVIRONMENTAL IMPACTS OF PULP AND PAPER MILLS
Benefits
Integral to Canada’s forestry sector; mills connect raw material supply with existing infrastructure and can produce green energy.
Challenges
Environmental concerns such as chemical discharge and high energy and water consumption.
Modern advances like closed-loop systems for water recycling aim to enhance sustainability.
DATA PRESENTATION
Importance of Context in Data
Data needs context for meaningful interpretation; examples of relevant "who" and "what" in forestry include forest plots, animals, and plants.
Graphical Representation Techniques
Frequency Tables: Count occurrences of each value.
Bar Charts and Pie Charts: Display categorical data.
Histograms: Show distributions of quantitative variables.
Boxplots: Represent five-number summaries for comparisons between groups.
Scatterplots: Illustrate relationships between two quantitative variables.
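The numbers behind two of these displays can be produced without any plotting library: a frequency table (the basis of a bar chart) and a five-number summary (the basis of a boxplot). The species list and diameters are invented for illustration, and quartile conventions vary between software packages.

```python
from collections import Counter
import statistics

# Categorical variable: hypothetical species recorded in a plot.
species = ["spruce", "pine", "spruce", "fir", "pine", "spruce"]
frequency_table = Counter(species)

for name, count in frequency_table.most_common():
    print(f"{name:8s}{count}")

# Quantitative variable: hypothetical stem diameters (cm).
diameters = [18, 22, 25, 27, 30, 31, 35, 40, 44]
q1, median, q3 = statistics.quantiles(diameters, n=4)
five_number_summary = (min(diameters), q1, median, q3, max(diameters))

print("min, Q1, median, Q3, max:", five_number_summary)
```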
PROBABILITY
Basic Principles
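Statistical Independence: The outcome of one event does not affect the probability of another; disjoint events with nonzero probability cannot be independent, because if one occurs the other cannot.
Probability Rules: Addition rule for disjoint events, P(A or B) = P(A) + P(B); multiplication rule for independent events, P(A and B) = P(A) × P(B).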
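The addition and multiplication rules can be checked by enumerating the 36 equally likely outcomes of rolling two fair dice. The dice example is an assumption chosen for illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a function of one outcome."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

# Disjoint events: the total cannot be 2 and 12 at the same time.
p_sum2 = prob(lambda o: sum(o) == 2)
p_sum12 = prob(lambda o: sum(o) == 12)
p_sum2_or_12 = prob(lambda o: sum(o) in (2, 12))
print(p_sum2_or_12 == p_sum2 + p_sum12)      # addition rule holds

# Independent events: the first die does not affect the second.
p_first6 = prob(lambda o: o[0] == 6)
p_second6 = prob(lambda o: o[1] == 6)
p_both6 = prob(lambda o: o[0] == 6 and o[1] == 6)
print(p_both6 == p_first6 * p_second6)       # multiplication rule holds

# Disjoint events are not independent: P(A and B) = 0, but P(A) * P(B) > 0.
p_sum2_and_12 = prob(lambda o: sum(o) == 2 and sum(o) == 12)
print(p_sum2_and_12, p_sum2 * p_sum12)
```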
Hypotheses and Testing
Null Hypothesis (H0)
A statement of no difference or no effect; it is the default assumption that the data are tested against, and P-values are calculated assuming it is true.
P-values and Statistical Significance
A low P-value indicates strong evidence against H0, but it does not prove that H0 is false, and a high P-value does not prove that H0 is true; examine practical significance alongside statistical findings.
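One way to see what a P-value measures is an exact permutation test, used here only as an illustration and not necessarily the method taught in the course. Under H0 the group labels are arbitrary, so the P-value is the share of all possible relabellings that give a difference at least as extreme as the one observed. The seedling heights are hypothetical.

```python
from itertools import combinations

# Hypothetical seedling heights (cm) under two treatments.
control = [12.1, 13.4, 11.8, 14.2, 13.0, 12.7]
fertilized = [14.9, 15.3, 13.8, 16.1, 15.0, 14.4]

def mean(values):
    return sum(values) / len(values)

observed_diff = mean(fertilized) - mean(control)

# Under H0 (no difference), every split of the 12 values into two
# groups of 6 is equally likely.
pooled = control + fertilized
total = sum(pooled)
n_group = len(fertilized)
n_other = len(pooled) - n_group

extreme = 0
n_splits = 0
for idx in combinations(range(len(pooled)), n_group):
    group_sum = sum(pooled[i] for i in idx)
    diff = group_sum / n_group - (total - group_sum) / n_other
    n_splits += 1
    # Two-sided test: count splits at least as extreme as the observed one
    # (small tolerance guards against floating-point rounding).
    if abs(diff) >= abs(observed_diff) - 1e-9:
        extreme += 1

p_value = extreme / n_splits
print(f"observed difference = {observed_diff:.2f} cm, P = {p_value:.4f}")
```

A small P-value says the observed difference would be unusual if H0 were true; whether a difference of about 2 cm matters in practice is a separate question.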
CENTRAL LIMIT THEOREM
Definition
The sampling distribution of the sample mean approaches a Normal distribution as sample size increases, even when the population itself is not Normal; this requires random and independent observations.
Applications
Used to understand the behavior of sample statistics under repeated sampling.
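Repeated sampling can be simulated to watch the Central Limit Theorem at work. In this sketch the population is assumed to be exponential (strongly right-skewed, with mean 1 and standard deviation 1); the sample sizes and number of repetitions are arbitrary.

```python
import random
import statistics

rng = random.Random(3)

def sample_mean(n):
    """Mean of n independent draws from a skewed (exponential) population."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

def skewness(values):
    """Simple skewness measure: near 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

reps = 5000
means_n2 = [sample_mean(2) for _ in range(reps)]
means_n30 = [sample_mean(30) for _ in range(reps)]

for label, means in (("n=2", means_n2), ("n=30", means_n30)):
    print(label,
          "mean of means:", round(statistics.fmean(means), 3),
          "SD of means:", round(statistics.stdev(means), 3),
          "skewness:", round(skewness(means), 3))
```

With the larger sample size, the sample means cluster more tightly around the population mean and their distribution is much less skewed, which is the pattern the theorem describes.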
WILLIAM S. GOSSET AND STUDENT’S T DISTRIBUTION
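Gosset formulated the Student’s t distribution (published under the pseudonym "Student"), useful for comparing sample means and conducting hypothesis tests, especially when samples are small and the population standard deviation is unknown.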
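As an illustration, a one-sample t statistic can be computed by hand as t = (x̄ - μ0) / (s / √n), where x̄ is the sample mean, s the sample standard deviation, n the sample size, and μ0 the mean stated by H0. The soil pH values and μ0 = 5.0 below are assumptions made for the example.

```python
import math
import statistics

# Hypothetical soil pH measurements from 8 plots.
ph = [5.1, 4.9, 5.6, 5.2, 4.8, 5.4, 5.0, 5.3]
mu0 = 5.0  # hypothesized population mean under H0 (assumed value)

n = len(ph)
sample_mean = statistics.mean(ph)
sample_sd = statistics.stdev(ph)           # divides by n - 1
standard_error = sample_sd / math.sqrt(n)  # SD of the sample mean

t_stat = (sample_mean - mu0) / standard_error
df = n - 1  # degrees of freedom for the t distribution

print(f"mean={sample_mean:.4f}, t={t_stat:.3f}, df={df}")
```

The t statistic is then compared with the t distribution on n - 1 degrees of freedom (from a table or statistical software) to obtain a P-value; that lookup is not shown here.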