Introduction to Statistics Study Guide
Fundamental Concepts: Hypotheses and P-Values
Understanding Hypotheses:
- Hypotheses are the primary method used to question the environment and existing theories.
- Alternative Hypothesis ($H_1$): The hypothesis that there is an effect or a difference.
- Null Hypothesis ($H_0$): The hypothesis that there is no effect or no difference.
Understanding P-values:
- P-values are typically reported alongside the results of statistical tests (e.g., P < 0.05, P < 0.01).
- Definition: The probability of observing data as extreme as, or more extreme than, the actual results, assuming that the null hypothesis ($H_0$) is true.
- Range: P-values range from 0 to 1.
- Example Interpretation: A p-value of 0.05 indicates a 5% chance of seeing data at least as extreme as those collected if the null hypothesis is true.
- Presets: In hypothesis testing, the significance threshold is usually preset to 0.05 or 0.01.
- Significance Thresholds: A preset of 0.05 means we reject the null only if data this extreme would occur less than 1 time in 20 when the null is true. The lower the p-value, the stronger the evidence against the null hypothesis. Values lower than 0.001 indicate very strong evidence.
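The definition above ("as extreme as, or more extreme than, ... assuming $H_0$ is true") can be made concrete with an exact calculation. This is a minimal sketch, assuming a hypothetical coin-toss experiment (60 heads in 100 tosses) tested against the null hypothesis of a fair coin; the function name `binomial_two_sided_p` is illustrative, not from the notes.

```python
from math import comb


def binomial_two_sided_p(heads: int, n: int) -> float:
    """Exact two-sided p-value for `heads` out of `n` tosses under H0: fair coin (p = 0.5).

    Adds up the probability of every possible outcome that is at least as far
    from the expected count (n / 2) as the observed count, i.e. every result
    "as extreme as, or more extreme than" the data.
    """
    expected = n / 2
    observed_distance = abs(heads - expected)
    extreme_outcomes = 0
    for k in range(n + 1):
        if abs(k - expected) >= observed_distance:
            extreme_outcomes += comb(n, k)  # number of ways to get k heads
    return extreme_outcomes / 2 ** n        # each sequence has probability 1 / 2**n


p = binomial_two_sided_p(60, 100)
print(f"P = {p:.4f}")
```

Here P comes out just above 0.05, so with a preset threshold of 0.05 we would fail to reject the null hypothesis of a fair coin, even though 60 heads looks lopsided.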
Sampling Units and Variable Types
Sampling Unit:
- This is the specific unit of study defined by the experimental design or hypotheses (e.g., an experimental fish tank or a field sampling quadrat).
- Sample Size ($n$): Denotes the size of a sample.
- Population Size ($N$): Denotes the total size of a population.
- Importance of Correct Sample Size: Necessary to ensure the hypothesis can be tested, to minimize harm to living things (especially in medical trials), and to avoid pseudoreplication.
Types of Variables:
- Continuous Variables: Measured on a continuous scale where any numerical value is possible (e.g., salinity or temperature).
- Discrete Variables: A limited number of prefixed values; the slide specifies these values cannot be ordered by magnitude (e.g., certain questionnaire responses).
- Ranked Variables: Similar to discrete variables, but the values can be ordered by magnitude; the differences between consecutive ranks are not necessarily equal.
- Derived or Computed Variables: Variables calculated using two or more other variables, such as percentages or ratios.
- Independent Variable: The underlying cause or the factor being manipulated.
- Dependent Variable: The effect or response being measured.
Categorization of Statistics
Descriptive Statistics: Provide basic information about the collected data, specifically regarding position (mean/median) and dispersion (variability).
Parametric Statistics: These make specific assumptions about the shape of the data, questioning whether they follow known mathematical distributions.
Non-Parametric Statistics: These make little to no assumptions about the data. They are generally less powerful than parametric tests and are unsuitable for answering highly complex questions.
Measures of Location (Central Tendency)
Mode: The value that is most frequently repeated in the dataset.
Median: The quantity or value lying at the midpoint of a frequency distribution of observed values.
Mean: The arithmetic average of all numbers in the set, calculated as: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
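The three measures of location can be computed with Python's standard `statistics` module. A minimal sketch, assuming a small hypothetical set of fish lengths (not data from the notes):

```python
import statistics

# Hypothetical fish lengths (cm), chosen only for illustration.
lengths = [12, 15, 15, 18, 20, 22, 15, 19]

mean_value = statistics.mean(lengths)      # sum of all values divided by n
median_value = statistics.median(lengths)  # midpoint of the sorted values
mode_value = statistics.mode(lengths)      # most frequently repeated value

print(mean_value, median_value, mode_value)
```

For this data the mean (17) is larger than the median (16.5), which in turn is larger than the mode (15), a pattern typical of a slightly right-skewed sample.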
The Normal Distribution and Data Shape
The Normal Distribution:
- Also referred to as the Gauss curve or the bell-shaped curve.
- In a perfectly normal distribution: mean = median = mode.
- A standard normal distribution has a mean ($\mu$) of 0 and a variance ($\sigma^2$) of 1.
Asymmetry and Concentration:
- Skewness: Measures the asymmetry of a distribution. Data can be skewed either to the left or the right. Skewness is visualized by observing differences between the mean, median, and mode.
- Kurtosis: Measures the "flatness" or peakedness of a distribution.
  - Leptokurtic: A distribution with more observations concentrated close to the mean.
  - Platykurtic: A distribution with more observations spread toward the extremes.
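Skewness and kurtosis can be quantified from the central moments of the data. This is a sketch using the simple moment formulas with no small-sample correction (an assumption on our part; statistics packages often apply corrections, so their values may differ slightly). Excess kurtosis subtracts 3 so that a normal distribution scores 0; the helper names are illustrative.

```python
import statistics


def _central_moment(values, order):
    """Average of (x - mean) ** order over all values (population-style moment)."""
    mean = statistics.fmean(values)
    return sum((x - mean) ** order for x in values) / len(values)


def skewness(values):
    """Moment coefficient of skewness: > 0 means a longer right tail, < 0 a longer left tail."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Kurtosis minus 3, so a normal distribution scores 0.

    > 0 suggests a leptokurtic (peaked) shape, < 0 a platykurtic (flatter) shape.
    """
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]          # hypothetical, perfectly symmetric
right_skewed = [1, 1, 1, 2, 10]      # hypothetical, long right tail
print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric))
```

The evenly spaced values give zero skewness and a negative excess kurtosis (a flat, platykurtic shape), while the single large value produces positive skewness.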
Modality:
- Distributions can have multiple modes (e.g., Bimodal with two peaks, Multimodal with three or more).
- Multimodal distributions are characteristic of wild fish populations. Understanding these distributions is vital for sustainable fish management.
Measures of Spread and Variability
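Variance:
- True Variance / Population Variance ($\sigma^2$): $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
- Sample Variance ($s^2$): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Standard Deviation ($\sigma$ or $s$):
- Calculated as the square root of variance: $s = \sqrt{s^2}$ (or $\sigma = \sqrt{\sigma^2}$ for a population)
Standard Error ($SE$):
- Measures the precision of the mean estimate: $SE = \frac{s}{\sqrt{n}}$
Confidence Intervals ($CI$):
- Used to define a range around the sample mean ($\bar{x}$) based on a confidence level value ($z$), such as 1.96 for 95% or 2.58 for 99% confidence.
- Formula provided: $CI = \bar{x} \pm z \frac{s}{\sqrt{n}}$ (where $s$ is the sample standard deviation and $n$ is sample size).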
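The variance, standard deviation, standard error and confidence interval formulas can be evaluated with the standard `statistics` module. This is a sketch assuming a hypothetical sample and the $z$-based interval used in these notes; for small samples many texts replace $z$ with a $t$-value, which widens the interval slightly.

```python
import math
import statistics

# Hypothetical sample of fish lengths (cm); illustrative only.
sample = [21.0, 23.5, 19.8, 22.1, 24.0, 20.6, 22.9, 21.7]

n = len(sample)
mean = statistics.mean(sample)
pop_var = statistics.pvariance(sample)   # population formula: divides by n
samp_var = statistics.variance(sample)   # sample formula: divides by n - 1
sd = statistics.stdev(sample)            # sample SD = square root of samp_var
se = sd / math.sqrt(n)                   # standard error of the mean
z = 1.96                                 # z-value for a 95% CI (2.58 for 99%)
lower, upper = mean - z * se, mean + z * se

print(f"mean = {mean:.2f} ± {sd:.2f} SD (n = {n}); 95% CI {lower:.2f} to {upper:.2f}")
```

Reporting the result as "mean ± SD (n)" follows the writing tip below: a measure of location is always paired with a measure of spread.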
Practical Scientific Writing Tip: A measure of location must always be paired with a measure of spread (e.g., report the average length of fish at Portsmouth Harbour as the mean ± SD, with units).
Data Visualization Principles and Plot Types
General Visualization Tips: - Prioritize accessibility for the reader without sacrificing accuracy. - Simplicity is vital. - Select plot types based on the specific data type and analysis requirements.
Common Plot Types:
- Box and Whisker Plots:
  - Upper box limit = $Q_3$ (3rd quartile).
  - Horizontal bar = Median.
  - Lower box limit = $Q_1$ (1st quartile).
  - A separate marker shows the Mean.
  - Individual points plotted beyond the whiskers = Outliers.
- Error bars can represent different measures of spread (e.g., SD or SE); always state which one is shown.
- Error Bars and Confidence Interval Plots: Features a dot for the arithmetic mean and vertical bars for confidence intervals.
- Frequency Distribution Plots: Note the separation between columns for discrete data versus adjacent columns for continuous data (histograms). Always include x-axis labels and units.
- Pie Charts: Sections represent variables as proportions (%). Drawbacks include the difficulty humans have in estimating small differences between sections.
- Timeline / Time Series: Plots observations (dots) against time with a trend line (dashed). Often includes a regression line equation and an $R^2$ value.
- Scatter Plots: Used to show the relationship between variables. Can include multiple data series (e.g., black diamonds for series 1, white circles for series 2), legends, regression lines (solid line), and confidence intervals of the regression line (dotted lines).
Inferential Statistics Framework, Hypothesis Testing, and Probability
Statistics is defined as a branch of mathematics concerned with the collection, organization, analysis, interpretation, and presentation of numerical data. It is divided into two primary categories:
Descriptive Statistics: This involves collecting, organizing, and describing data from a specific target population.
Inferential Statistics: This practice uses data from a sample to infer characteristics about a larger population. Because it is usually impossible to measure every individual in a population, inferential statistics provides the tools to draw conclusions or make predictions about the whole based on a subset. This process inherently involves hypothesis testing.
The Inferential Framework (Underwood 1990, 1991)
The inferential framework is a structured series of logical components used in research programmes to build scientific knowledge. The components include:
Observations: Identifying patterns in space and time (e.g., "Elephant species X seems to have larger ears in the warmer North than in the cooler South").
Models: Developing possible explanations or theories for the observations. These might consider factors like predation, temperature, or vegetation. Researchers often start with the most logical model based on existing knowledge.
Hypotheses: Predictions based on the models. A hypothesis is an idea or explanation tested through study, experimentation, and data analysis.
Null Hypotheses ($H_0$): The specific statement put to the test, asserting that no difference or effect exists.
Experimental Design: The sampling scheme and statistical methods used to gather data.
Interpretation: Determining whether to falsify the null hypothesis or fail to reject it.
Conceptualizing Hypotheses with Examples
The Elephant Ear Example
Observation: Elephant populations of Species X have bigger ears in the warmer North than the cooler South. (Note: Larger surface area allows for more heat loss/cooling in warmer climates).
Research Question: Does average temperature (climate) affect the evolution of elephant ear size in different habitat zones?
Hypothesis: IF temperature affects ear size, THEN populations moving to and evolving in the same climate zone as other populations will evolve to have similar-sized ears.
Null Hypothesis ($H_0$): IF average temperature DOES NOT affect ear size, THEN populations moving to a different climate zone will not evolve different ear sizes. ($H_0$ assumes No Difference).
Alternative Hypothesis ($H_1$): IF average temperature DOES affect ear size, THEN populations moving from a warmer to a cooler climate will evolve smaller ears (or vice-versa).
The Hedgehog Example (Village A vs. Village B)
Observation: Hedgehogs in Village A appear smaller than those in Village B.
Potential Bias/Models: Could differences be due to trap type (only catching small ones) or trap placement timing (catching young in Spring)?
Research Question: Are hedgehogs from Village A smaller than those from neighboring Village B?
Improved Hypothesis: IF we survey six sites in each village using the same traps and repeat this in each season for one year, THEN the measured size of hedgehogs will not differ between villages.
Statistical Hypotheses:
* $H_0$ (Null): Size Village A $=$ Size Village B.
* $H_1$ (Alternative): Size Village A $<$ Size Village B.
Specificity in Scientific Questions
A crucial part of the inferential framework is transforming poor, vague questions into better, specific questions that are practical and realistic to measure:
Poor: Are male and female mosquitoes different?
Better: Does the body size in mosquitoes differ between sexes?
Poor: Do enzymes work better at higher temperatures?
Better: Does the activity of DNA polymerase enzyme increase when temperature increases?
Poor: Do seagrasses promote biodiversity?
Better: Is fish species diversity higher in denser seagrasses?
Falsifiability and Hypothesis Testing Principles
The Law-Justice Analogy
Testing the Null Hypothesis ($H_0$) first follows the principle of "innocent until proven guilty."
Scenario: Testing a new cancer drug.
Assumption: Start with the assumption ($H_0$) that there is no difference between the drug and the control group.
The Trial: The statistical test acts as the jury. If evidence in favor of the drug is "convincing beyond a reasonable doubt," we reject the null hypothesis and accept the alternative ($H_1$).
Threshold: "Beyond a reasonable doubt" is statistically represented by a significance level (alpha, $\alpha$) of 0.05 or less: the P-value must fall below this level. There is still a small chance that a result this extreme arose by chance alone, but if the odds are small enough, we reject the null.
Key Definitions
Null Hypothesis ($H_0$): States that a difference (statistical significance) does not exist between two or more populations.
Alternative Hypothesis ($H_1$ or $H_a$): States that a phenomenon is occurring due to non-random causes.
Statistical Errors
Type I Error (False Positive): The null hypothesis is actually true, but the statistical test incorrectly indicates a difference exists. This is often considered more problematic (e.g., concluding a drug is effective when it is not, leading to deleterious effects).
Type II Error (False Negative): The null hypothesis is actually false, but the test fails to determine the difference as significant. This often occurs with small sample sizes.
Probability in Statistics
Foundations of Probability
Probability is the mathematical machinery used to analyze chance and quantify uncertainty. It provides the bridge between descriptive data and inferential interpretation.
Probability Range: A number between 0 (impossibility) and 1 (certainty).
Random Variable: A mathematical object representing the outcome of a random event (e.g., assigning 1 for heads and 0 for tails in a coin toss).
Probabilistic vs. Statistical Reasoning
Probabilistic Reasoning: Knowing the population and predicting a sample (e.g., knowing the proportion of shark species on a reef and calculating the probability that the first shark seen is specifically a White Tip).
Statistical Reasoning: Observing a random sample to estimate the proportions of the whole unknown population.
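The two directions of reasoning can be contrasted in a short simulation. This is a sketch under stated assumptions: the reef's species proportions are hypothetical values we assume to be known, and sightings are simulated as a random sample using `random.Random.choices`.

```python
import random
from collections import Counter

# Hypothetical reef: true species proportions, assumed known for this sketch.
reef_proportions = {"White Tip": 0.5, "Black Tip": 0.3, "Grey Reef": 0.2}

# Probabilistic reasoning: population known -> predict the next observation.
p_first_white_tip = reef_proportions["White Tip"]

# Statistical reasoning: observe a random sample -> estimate the population.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
species = list(reef_proportions)
weights = list(reef_proportions.values())
sightings = rng.choices(species, weights=weights, k=200)
counts = Counter(sightings)
estimated = {s: counts[s] / len(sightings) for s in species}

print(p_first_white_tip)
print(estimated)
```

In real fieldwork only the second half is available: the sample proportions are estimates that vary from sample to sample, which is why inferential statistics is needed.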
P-Values
A statistical test produces a P-value: the probability of obtaining results at least as extreme as those observed if the null hypothesis were true.
If the probability (P) is low, $H_0$ is rejected.
Standard cut-off: P < 0.05 (less than a 5% chance of obtaining results this extreme if the null is true).
Lower values (e.g., P < 0.01 or P < 0.001) indicate even stronger evidence against the null than higher values (e.g., P < 0.05).
Variance and Expectation
Expectation
The expectation of a random variable captures the center of its distribution. It is the value that the average of many independent samples approaches, and it is defined as the probability-weighted sum of all possible values: $E[X] = \sum_x x \, P(X = x)$.
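The probability-weighted sum can be computed exactly with fractions, and a simulation shows the long-run average settling near it. A minimal sketch, assuming a fair coin coded 1/0 and a fair six-sided die as illustrative random variables:

```python
import random
from fractions import Fraction


def expectation(distribution):
    """E[X]: the probability-weighted sum of all possible values x * P(X = x)."""
    return sum(x * p for x, p in distribution.items())


def variance(distribution):
    """Var[X]: the probability-weighted sum of squared deviations from E[X]."""
    mu = expectation(distribution)
    return sum((x - mu) ** 2 * p for x, p in distribution.items())


coin = {1: Fraction(1, 2), 0: Fraction(1, 2)}         # heads = 1, tails = 0
die = {face: Fraction(1, 6) for face in range(1, 7)}  # fair six-sided die

print(expectation(coin), expectation(die), variance(die))  # 1/2 7/2 35/12

# The average of many independent rolls should settle near E[X] = 3.5.
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(100_000)]
print(sum(rolls) / len(rolls))
```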
Variance
Variance is the measurement of the spread or dispersion of data around the sample's mean.
Significance: Smaller variance indicates consistent data and stronger evidence for differences. Larger variance makes it harder to identify meaningful differences between groups.
Visualizing Significance: If the mean of Population B sits outside of the observations of Population A, there is less than a 5% probability (P < 0.05) that the difference occurred by chance.
Causes of Variance
Randomness: Chance events.
Biological Variation: Genetics, environment, etc.
Measurement Error: Inaccuracy from pipetting, imaging, or instruments.
Systematic Error (Bias): A consistent difference between the recorded value and the true value.
Sampling and Quality Control
Sample Size: More units increase accuracy, though this must be balanced with practicality (time, money).
Quality over Quantity: Poor-quality data increases error and reduces statistical power.
Representativeness: Samples must be random and representative of the total population to avoid bias. Tools include quadrats, random positioning, or systematic line sampling.
Timing: Sampling must account for season, time of day, and weather.
Control Groups in Research
A control group is a group examined in parallel to the treatment group to "remove" the effect of all factors except the one being investigated.
Experimental Controls: Controlling the environment (e.g., keeping light, temperature, and humidity constant while varying only the levels of the factor being investigated).
Procedural Controls: Ensuring the control group undergoes the exact same procedures as the treatment group, minus the active treatment itself. * Example: Group 1 (No drug), Group 2 (Drug), Group 3 (Placebo/dummy drug to control for the effect of the injection or tablet itself).
Statistical Controls: Instead of fixing factors, the alternate factors are measured and accounted for during analysis (e.g., measuring environmental temperature and using it in calculations to isolate the treatment effect).
Historical Controls: Used when only one physical group exists; the current treatment group is compared with previously compiled (historical) data.
Mathematical Calculation of Variance
To calculate the variance of a sample as an estimate for a population, follow these steps:
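Calculate the Mean ($\bar{x}$).
Find the Difference between each data point ($x_i$) and the mean ($x_i - \bar{x}$).
Square each difference (this ensures all values are positive and emphasizes larger deviations).
Sum the squared differences ($SS = \sum (x_i - \bar{x})^2$).
Divide by the number of data points minus one ($n - 1$).
Formula for Sample Variance ($s^2$): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Visual Example Calculation: the worked example applies these steps in order, computing the Mean ($\bar{x}$), the Differences ($x_i - \bar{x}$), the Squared Differences ($(x_i - \bar{x})^2$), the Sum of Squares ($SS$), and finally the Variance ($s^2 = SS / (n - 1)$).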
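The five calculation steps can be followed one line at a time in code. This sketch uses hypothetical data chosen so the intermediate numbers stay simple; it is not the data set from the original worked example.

```python
# Hypothetical data chosen for easy arithmetic.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n                          # Step 1: the mean
differences = [x - mean for x in data]        # Step 2: x_i - mean for each value
squared = [d ** 2 for d in differences]       # Step 3: square each difference
sum_of_squares = sum(squared)                 # Step 4: sum of squares (SS)
sample_variance = sum_of_squares / (n - 1)    # Step 5: divide by n - 1

print(mean, sum_of_squares, sample_variance)
```

Here the mean is 5, the differences are -3, -1, -1, -1, 0, 0, 2, 4, the sum of squares is 32, and the sample variance is 32 / 7 (about 4.57). Dividing by $n$ instead would give the population variance, 4.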
Introduction to Inferential Statistics
Hypotheses: Null ($H_0$) vs. Alternative ($H_1$)
Hypothesis Definition: Statements believed to be true regarding a phenomenon that have not yet been proven but are testable.
Null Hypothesis ($H_0$):
* This is what we explicitly aim to test.
* Definition: States that no statistical significance or difference exists between two or more populations or groups.
* Example: IF average temperature DOES NOT affect the size of elephant ears, THEN elephant populations of species X moving to a different climate zone will not evolve different ear sizes.
Alternative Hypothesis ($H_1$ or $H_a$):
* Definition: States that a phenomenon is likely occurring due to non-random causes rather than just by chance.
* Example: IF average temperature DOES affect the size of elephant ears, THEN elephant populations of species X moving from a warmer to a cooler climate zone will evolve smaller ears (or vice versa).
Decision Logic: If the Null hypothesis is rejected, the Alternative hypothesis can be accepted. Statistical methods and tests are used specifically to determine the validity of the Null hypothesis.
Standard Deviation ($\sigma$) and Data Spread
Definition: Standard deviation quantifies the dispersion or variability of data points relative to their mean. It reflects how far, on average, each individual value deviates from the mean.
Sigma ($\sigma$) Values: In a normal distribution, the spread is divided into standard deviations from the mean, denoted as sigma values ($1\sigma$, $2\sigma$, $3\sigma$, etc.).
Mathematical Calculation components:
* $N$ = Number of values.
* $\mu$ = The mean.
* $x_i$ = Individual values.
* Verbatim Definition: "$\sigma$ is the square root, of the sum of the squared deviations, divided by the number of values." That is, $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$.
Calculating Difference: To find $\sigma$, calculate the difference from the mean for every individual value in a sample (e.g., heights of people), square those differences, average the squares, and take the square root. Averaging the raw differences would not work, because the positive and negative differences cancel out to zero.
Visualizing Spread:
* Normal distributions can have different means ($\mu$) but the same spread.
* Normal distributions can have the same mean but different standard deviations ($\sigma$), resulting in thinner or wider "bells."
Probability and Sigma: For a random sample from a normally distributed population:
* There is a 68% chance the sample sits within $1\sigma$ of the mean (about 34% on each side, above or below the mean).
* There is a 95% chance it sits within $2\sigma$ of the mean.
* There is a 99.7% chance it sits within $3\sigma$ of the mean.
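The sigma percentages for a normal distribution can be checked numerically with `statistics.NormalDist` from Python's standard library: the share within $k\sigma$ is the cumulative probability at $+k$ minus that at $-k$ on the standard normal curve. A minimal sketch:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Probability that a value from a normal distribution lies within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sigma: {share_within(k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973

# 1.96 SDs captures 95%, which is why z = 1.96 appears in 95% confidence intervals.
print(f"within 1.96 sigma: {share_within(1.96):.4f}")
```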
Confidence Intervals ($CI$)
Definition: The range in which the true mean of a population lies with a certain probability, derived from a sample set.
Logic: Taking different sample sets from the same population will likely result in different mean values due to chance. The $CI$ defines the probability limits for the actual population mean.
Common Levels: 95% and 99% are standard.
Calculation for Normally Distributed Data:
* $z$-values: Standard numbers derived from a look-up table (e.g., 1.96 for 95%, 2.58 for 99%).
* Lower Limit Calculation: $\bar{x} - z \frac{s}{\sqrt{n}}$
* Upper Limit Calculation: $\bar{x} + z \frac{s}{\sqrt{n}}$
Interpretative Note: Technically, if you sample the same population 100 times and calculate a 95% $CI$ each time, the true population mean should fall within the two limits in about 95 of those instances.
Application: A $CI$ is frequently added to correlation or regression plots to show the reliability of the trend line.
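The repeated-sampling interpretation of a 95% CI can be checked by simulation. This sketch assumes a hypothetical normal population (true mean 50, SD 10), draws many samples of size 30, builds $\bar{x} \pm 1.96 \cdot SE$ for each, and counts how often the interval contains the true mean; the function name and parameter values are illustrative.

```python
import math
import random
import statistics


def ci_coverage(true_mean=50.0, true_sd=10.0, n=30, repeats=1000, z=1.96, seed=1):
    """Share of repeated samples whose interval mean ± z * SE contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        m = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if m - z * se <= true_mean <= m + z * se:
            hits += 1
    return hits / repeats


print(ci_coverage())
```

The coverage typically lands close to 0.95, usually a little below it, because a $z$-value is used with an SD estimated from each sample; a $t$-value would correct this for small samples.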
Variable Types and Foundations of Testing
Continuous Variables: Most common in biology/biochemistry. Examples: Length, Height, Weight, pH, moles, Reaction speed.
Independent Variable: The "cause" (e.g., diet, treatments).
Dependent Variable: The "effect" (e.g., weight, health). Health is dependent on diet.
Parametric versus Non-Parametric Statistics
Parametric Tests:
* Definition: Tests that have complete information about population parameters or assume the population distribution (e.g., normal distribution) is known.
* Measurement Level: Applied to metric scales (Interval or Ratio).
* Central Tendency: Uses the Mean.
* Examples: t-test, Analysis of Variance (ANOVA), Pearson correlation.
Non-Parametric Tests:
* Definition: Used when the researcher has no knowledge of population parameters or cannot make specific assumptions about the distribution (non-normal or arbitrary).
* Measurement Level: Applied to Nominal or Ordinal scales.
* Central Tendency: Uses the Median.
* Applicability: Variables and attributes.
* Examples: Mann-Whitney U test, Spearman correlation, Wilcoxon signed-rank test, Kruskal-Wallis test.
Determining Data Normality
Determine normality before choosing a statistical test.
Statistical Tests for Normality:
* Shapiro-Wilk Test: Recommended for small sample sizes (n < 50).
* Kolmogorov-Smirnov Test: Recommended for larger sample sizes (n ≥ 50).
* Anderson-Darling Test: Another analytical alternative.
Hypothesis Logic in Normality Testing:
* Null Hypothesis ($H_0$): Assumes data is normally distributed.
* Alternative Hypothesis ($H_1$): Data is non-normally distributed.
* If P < 0.05: Reject the null; data is non-normally distributed.
* If P > 0.05: Fail to reject the null; there is no evidence of non-normality, so the data can be treated as normally distributed.
Sample Size Influence: Lower sample sizes often result in higher P-values for normality (failing to prove non-normality), while very large samples might result in P < 0.05 even if the distribution is nearly normal.
Visual and Non-Analytical Normality Assessment
Plotting Histograms: A basic visual inspection of frequency distribution.
Quantile-Quantile (Q-Q) Plots: A more scientific visual approach.
* Process: Sort the data from smallest to largest. Divide a theoretical normal curve into as many equal areas as there are values ($n$).
* Theoretical Quantiles: Mathematical calculations determine where the dividing lines intersect the x-axis of a normal distribution.
* Interpreting Q-Q Plots: Plot theoretical values against observed values. If the data corresponds to a normal distribution, the points will form a straight line. Deviations at the edges or curves in the middle indicate non-normality.
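The Q-Q procedure can be sketched without a plotting library: sort the data, compute one theoretical normal quantile per value, and pair them up. This sketch assumes the common plotting positions $(i - 0.5)/n$ (other software may use slightly different ones) and summarizes "straightness" with the Pearson correlation of the points, an extra numeric check we add for illustration; both samples are simulated, hypothetical data.

```python
import random
from statistics import NormalDist, fmean


def qq_points(values):
    """Pair each sorted observation with its theoretical standard-normal quantile."""
    ordered = sorted(values)                       # smallest to largest
    n = len(ordered)
    standard_normal = NormalDist()                 # mean 0, SD 1
    # Plotting positions (i - 0.5) / n split the curve into n equal areas.
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def straightness(points):
    """Pearson correlation of the Q-Q points; values close to 1 mean a near-straight line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(7)
normal_sample = [rng.gauss(20, 3) for _ in range(200)]       # hypothetical, roughly normal
skewed_sample = [rng.expovariate(0.5) for _ in range(200)]   # hypothetical, right-skewed
print(straightness(qq_points(normal_sample)), straightness(qq_points(skewed_sample)))
```

Plotting the pairs from `qq_points` gives the Q-Q plot itself; the skewed sample bends away from the line at the upper edge, which lowers its correlation.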
Interpreting P-Values
Definition: The lower the P-value, the more confidence one has in rejecting the Null hypothesis, suggesting a strong effect or difference.
Strength Descriptive Wording:
* P > 0.1: No compelling evidence to reject the Null hypothesis.
* 0.05 < P < 0.10: Mild or weak evidence to reject the Null hypothesis.
* P < 0.05: Moderate evidence to reject the Null hypothesis.
* P < 0.01: Strong evidence to reject the Null hypothesis.
* P < 0.001: Very strong evidence to reject the Null hypothesis.
The "Holy Grail" (0.05): Commonly used cut-off chosen by a classic statistician as a balance between strictness and leniency. Modern scientists debate if arbitrary cut-offs should be used or if significance should be described as "statistical power."
Scientific Variation:
* Higgs-Boson Research: Required a threshold of $5\sigma$ (roughly a 1 in 3.5 million chance).
* P-hacking: Altering the tests used just to reach a significant result (P < 0.05); this is considered scientific misconduct.
Visual Display in Reports:
* $*$ = Moderate significance (P < 0.05).
* $**$ = Strong significance (P < 0.01).
* $***$ = Very strong significance (P < 0.001).
* n.s. = Non-significant.
Box Plot Interpretation
Interquartile Range (IQR): Also known as the midspread or middle 50%.
* The data set is divided into four rank-ordered parts (quartiles).
* $IQR = Q_3 - Q_1$.
* Note: $IQR$ is not the same as Standard Deviation ($SD$).
Components of a Box Plot:
* Median: The central line in the box.
* Whiskers: Typically extend to the most extreme data points within $1.5 \times IQR$ of the box.
* Skewness: Negatively skewed distributions have the median closer to the top ($Q_3$); positively skewed distributions have the median closer to the bottom ($Q_1$).
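The numbers behind a box plot can be computed directly. This sketch assumes the common $1.5 \times IQR$ whisker rule and uses `statistics.quantiles` with its default "exclusive" method; other software may compute quartiles slightly differently. The data are hypothetical, with one extreme value added to show an outlier.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, IQR, 1.5 * IQR fences, whisker ends and outliers for a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical data with one extreme value
print(box_plot_summary(data))
```

Here $Q_1 = 2.75$, the median is 5.5, $Q_3 = 8.25$ and the IQR is 5.5, so the upper fence is 16.5: the whiskers stop at 1 and 9, and 100 is flagged as an outlier. A bar plot of the mean (14.5) would hide this.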
Data Visualization Comparison: Simple bar plots can hide data features. Box plots and Violin plots (which overlay data points) reveal skewness and outliers that can have significant biological implications.
The Importance of Manual Data Visualization
Philosophical Meandering: "A hypothesis is a liability."
Discovery Limitation: Focusing too strictly on a single hypothesis may prevent researchers from making broader discoveries in large datasets.
The Gorilla Experiment Study: * Students were given a dataset of BMI and daily steps for people. * Group 1 was asked to test a statistical difference between men and women. * Group 2 was asked a "hypothesis-free" question: "What do you conclude from the dataset?" * Result: If the data was manually plotted, a literal image of a gorilla appeared in the data points. This illustrates that automated "number crunching" can miss hidden patterns that manual visualization reveals.
Introduction to Inferential Statistics and Data Distribution
Fundamentals of Hypothesis Testing
Hypotheses as Statements of Truth: Hypotheses are defined as statements believed to be true regarding a specific phenomenon that have not yet been proven. They are structured to be testable.
The Null Hypothesis ($H_0$): This hypothesis states that no statistical significance or difference exists between two or more populations or groups. It represents the default position that any observed effect is due to chance. * Specific Example (Elephant Ears): If average temperature does not affect the size of elephant ears, then species $X$ moving to a different climate zone will not evolve different ear sizes.
The Alternative Hypothesis ($H_1$ or $H_a$): This states that a phenomenon is occurring due to non-random causes and is not just a matter of chance. It is the statement we accept if the Null Hypothesis is rejected. * Specific Example (Elephant Ears): If average temperature does affect ear size, then populations moving from warmer to cooler climates will evolve smaller ears.
Statistical Methodology: Scientists use specific statistical tests to determine whether to reject the Null Hypothesis. Rejecting $H_0$ allows for the acceptance of $H_1$.
Standard Deviation and Data Spread
Definition of Standard Deviation ($SD$): A measurement used to quantify the dispersion or variability of data points relative to their mean. It reflects how far, on average, individual values deviate from the mean.
Mathematical Representation ($\sigma$): The standard deviation is denoted by the Greek letter sigma ($\sigma$).
Calculating $SD$: The lecture defines $SD$ as "the square root of the sum of the squared deviations, divided by the number of values."
* Variables involved: $N$ (number of values), $\mu$ (the mean), $x_i$ (individual values).
Normal Distribution Dispersion (The Empirical Rule): In a normal, symmetrical distribution:
* There is a 68% chance a random sample sits within $1\sigma$ of the mean (34% above and 34% below).
* There is a 95% chance a random sample sits within $2\sigma$ of the mean.
* There is a 99.7% chance a random sample sits within $3\sigma$ of the mean.
Interpretation of Spreads:
* Normal distributions can have different means ($\mu$) while maintaining the same spread.
* Normal distributions can have different standard deviations ($\sigma$), resulting in thinner or wider curves regardless of the mean.
Confidence Intervals ($CI$)
Core Concept: The $CI$ is the range in which the true mean of a population lies with a specific probability, derived from a sample set.
Sampling Variability: Different sample sets from the same population will likely yield different means by chance. The $CI$ accounts for this.
Common Thresholds: The most used probabilities are 95% and 99%. In a 95% $CI$, if a population is sampled 100 times, the true mean should fall within the limits about 95 times.
Calculating Limits:
* Lower Limit Calculation: $\bar{x} - z \frac{s}{\sqrt{n}}$
* Upper Limit Calculation: $\bar{x} + z \frac{s}{\sqrt{n}}$
* Components: $z$-values are standard numbers from look-up tables; other values come from the specific sample set.
Visualization: $CIs$ are frequently added to correlation or regression plots as bands around the trend line to show its reliability.
Parametric vs. Non-Parametric Statistics
Parametric Tests: * Definition: These tests have complete information about population parameters or make assumptions about the distribution (usually normal). * Measurement Level: Metric scales, specifically Interval or Ratio scales. * Central Tendency: Focuses on the Mean. * Examples: t-test, ANOVA, Pearson correlation.
Non-Parametric Tests:
* Definition: Applied when the researcher has no knowledge of population parameters or cannot make specific assumptions about the distribution (arbitrary or non-normal).
* Measurement Level: Nominal or Ordinal scales.
* Central Tendency: Focuses on the Median.
* Examples: Mann-Whitney U test, Spearman correlation, Wilcoxon signed-rank test, Kruskal-Wallis test.
Variable Types in Biology: Most biological and biochemical data involve continuous variables like length, weight, pH, moles, and reaction speed. * Independent Variable: The cause (e.g., diet). * Dependent Variable: The effect/measurement (e.g., weight gain) that is "dependent" on the cause.
Determining Data Normality
Importance: Normality must be determined before selecting a statistical test; it is the "key" prerequisite.
Analytical/Statistical Tests for Normality: These tests evaluate the hypothesis that the data follows a normal distribution.
* Shapiro-Wilk Test: Most appropriate for small sample sizes (n < 50).
* Kolmogorov-Smirnov Test: Used for larger sample sizes (n ≥ 50).
* Anderson-Darling Test: An alternative normality test.
* P-value Interpretation for Normality:
  * If p < 0.05: Reject the Null Hypothesis (Data is non-normally distributed).
  * If p > 0.05: Fail to reject the Null Hypothesis (no evidence of non-normality; data can be treated as normally distributed).
Sample Size Influence: Normality tests are highly sensitive to sample size. A very large sample may return a p < 0.05 (non-normal) even if it is representative of a normal population, while a small sample may return a high p-value even if it is skewed.
Visual Inspection Methods: * Histograms: Plotting data density. * Quantile-Quantile (Q-Q) Plots: A stronger method comparing theoretical quantiles of a normal distribution against observed values. * Procedure: Sort values, divide a theoretical curve into areas, calculate theoretical quantile values, and plot against actual measurements. * Interpretation: A straight line indicates the data is normal. Deviations at the edges suggest non-normality.
The Meaning and Strength of P-values
Definition: The lower the P-value, the more confidence one has in rejecting the Null Hypothesis. It quantifies the strength of evidence for an experimental effect.
Categorization of Significance:
* p > 0.1: No compelling evidence to reject the Null Hypothesis.
* 0.05 < p < 0.10: Mild or weak evidence to reject the Null Hypothesis.
* p < 0.05: Moderate evidence to reject the Null Hypothesis (marked by $*$).
* p < 0.01: Strong evidence to reject the Null Hypothesis (marked by $**$).
* p < 0.001: Very strong evidence to reject the Null Hypothesis (marked by $***$).
* n.s.: Over $0.05$ (non-significant).
The 0.05 Threshold: Known as the "holy grail" of cut-offs, established by classic statisticians as a balance between strictness and leniency. Modern scientists debate using binary cut-offs and prefer describing results as statistical "power."
Extreme Cases: Research like the Higgs-Boson particle requires ultra-low thresholds, such as $5\sigma$ (1 in 3.5 million).
P-hacking: The unethical practice of altering tests or data to force a result below the significance threshold (e.g., turning a non-significant result into a significant one). This is considered scientific misconduct.
Box Plots and Data Visualisation
Interquartile Range (IQR): Also called the midspread or middle 50%.
* Data is divided into four rank-ordered parts (quartiles).
* $IQR$ is the difference between the 75th and 25th percentiles ($Q_3 - Q_1$).
Whiskers: Typically extend to the most extreme data points within $1.5 \times IQR$ of the box.
Skewness in Box Plots:
* Positively Skewed: Distribution is skewed "up" or to the right; the median is closer to the bottom ($Q_1$).
* Negatively Skewed: Distribution is skewed "down" or to the left; the median is closer to the top ($Q_3$).
Comparison of Visuals: * Bar Plots: Can hide data distributions and skewness. * Violin Plots: Overlay box plots with data points to reveal the true density and outliers of the population.
Philosophical Implications in Data Analysis
Hypothesis as a Liability: Excessive focus on a specific hypothesis may prevent researchers from making unexpected discoveries, especially in large datasets.
The Gorilla Experiment: An exercise involving a dataset of $BMI$ and step counts for people. * Group 1: Asked to test a specific significant difference between men and women. * Group 2: Asked to be "hypothesis-free" and draw conclusions generally. * Outcome: If the data is manually plotted, the image of a gorilla appears in the distribution. This serves as a metaphor for "hidden" things in data that automated number-crunching might miss.
Conclusion: Researchers must explore data manually and visually to capture biological implications that simple stat tests might obscure.
ANOVA and Variable Associations
Data Classification for Statistical Testing
Sample Data Types: * Independent (Unpaired): Separate populations or items measured in parallel. There is no direct link between the items in different groups. * Dependent (Paired): The same populations or items measured repeatedly over time (e.g., a baseline measurement followed by a later measurement).
Distribution Types: * Parametric: Assumes a Normal Distribution (symmetrical bell curve). * Non-Parametric: Used for non-normal distributions. Deviations include: * Positive Kurtosis: Higher peak with heavier tails. * Negative Kurtosis: Flatter peak with lighter tails. * Negative Skew: Tail extends to the left. * Positive Skew: Tail extends to the right.
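The deviations from normality listed above can be quantified. This short sketch uses the simple moment-based estimators of skewness ($g_1$) and excess kurtosis ($g_2$). Statistics packages often apply small-sample corrections, so their values can differ slightly. The function name and example data are hypothetical.

```python
def skewness_and_excess_kurtosis(data):
    """Moment-based skewness (g1) and excess kurtosis (g2).

    g1 > 0: tail extends to the right; g1 < 0: tail extends to the left.
    g2 > 0: sharper peak, heavier tails; g2 < 0: flatter peak, lighter tails.
    A normal distribution has g1 = 0 and g2 = 0.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # 2nd central moment (variance)
    m3 = sum((x - mean) ** 3 for x in data) / n  # 3rd central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # 4th central moment
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    return g1, g2


right_tailed = [1, 1, 2, 2, 2, 3, 3, 4, 9]
even_spread = [1, 2, 3, 4, 5]
print(skewness_and_excess_kurtosis(right_tailed))  # positive skew, positive kurtosis
print(skewness_and_excess_kurtosis(even_spread))   # no skew, negative kurtosis
```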
Analysis of Variance (ANOVA)
Definition: A statistical method used to compare means across three or more groups. It is also occasionally referred to as the F-test (after Sir Ronald Fisher).
Mechanism: ANOVA calculates the variance within each group and the variance between groups to determine if at least one group mean is significantly different from the others.
The Problem with Multiple T-Tests: * Type I Error (False Positive): Occurs when the null hypothesis is rejected even though it is true (finding a difference that does not exist). * Cumulative Risk: If a significance level of $\alpha = 0.05$ (95% confidence) is used, there is a 5% chance of a false positive for a single test. Performing multiple consecutive T-tests on the same groups makes the overall chance of at least one false positive increase dramatically with each additional test. * ANOVA Solution: ANOVA compares all groups in a single test, keeping the overall Type I error rate at 5%.
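The growth of this risk can be sketched with the formula $1 - (1 - \alpha)^m$ for $m$ tests. This formula rests on the simplifying assumption that the tests are independent. Pairwise t-tests on shared groups are not fully independent, so the figures are an approximation. The function names are hypothetical.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across num_tests independent
    tests, each at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** num_tests


def pairwise_comparisons(num_groups):
    """Number of two-group t-tests needed to compare every pair of groups."""
    return num_groups * (num_groups - 1) // 2


for groups in (2, 3, 4, 5):
    m = pairwise_comparisons(groups)
    print(f"{groups} groups -> {m} t-tests -> "
          f"{familywise_error_rate(0.05, m):.1%} chance of a false positive")
```

With five groups, ten separate t-tests give roughly a 40% chance of at least one false positive, whereas a single ANOVA keeps it at 5%.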
One-Way ANOVA: Features and Assumptions
Definition: Checks if exactly ONE independent variable influences a metric dependent variable (e.g., examining the effect of different light types on plant growth).
Data Requirements: * Homogeneity of Variance: The variances within each group should be roughly equal. This is formally checked using the Levene test. * Normal Distribution: Data must be normally distributed (Parametric). * Scale of Measurement: The dependent variable must be metric-scaled (ratio or interval numbers like height or weight), while the independent variable (the "factor") is nominally scaled with more than two levels (categories like diet types or UV light types). * Non-Parametric Alternative: If data is not normal, the Kruskal-Wallis test (an extension of the Mann-Whitney test) should be used.
Hypotheses: * Null Hypothesis ($H_0$): The mean values of all groups are identical ($\mu_1 = \mu_2 = \mu_3$). * Alternative Hypothesis ($H_1$): There is at least one difference among the mean values ($\mu_1 \neq \mu_2$ or $\mu_1 \neq \mu_3$ or $\mu_2 \neq \mu_3$).
Limitation: ANOVA identifies that a difference exists but does not specify which groups are different. This requires post-hoc testing.
Mathematical Calculation of One-Way ANOVA
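Grand Mean ($\bar{x}$): The mean value of all data points combined. $\bar{x} = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}$, where $x_{ij}$ is the $j$-th value in group $i$ and $n_i$ is the size of group $i$.
Group Mean ($\bar{x}_i$): The mean for a specific group $i$. $\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$.
Sums of Squares (Variability): * Total Sum of Squares ($SS_{total}$): $SS_{total} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$. * Sum of Squares Between Groups ($SS_{between}$): $SS_{between} = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$. (Measures variability between group means and the grand mean). * Sum of Squares Within Groups ($SS_{within}$): $SS_{within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$. (Measures variability within the individual groups; also called Error). Together, $SS_{total} = SS_{between} + SS_{within}$.
Mean Squares (Variance): * Between ($MS_{between}$): $MS_{between} = \frac{SS_{between}}{df_{between}}$ where $df_{between} = k - 1$ ($k$ is the number of groups). * Within ($MS_{within}$): $MS_{within} = \frac{SS_{within}}{df_{within}}$ where $df_{within} = N - k$ ($N$ is the total sample size).
F-Value: $F = \frac{MS_{between}}{MS_{within}}$. This is the ratio of variability among groups vs. variability within groups. * If $H_0$ is true, $F$ should be close to 1. * If $H_0$ is false, between-group variance is larger, resulting in $F > 1$.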
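The calculation steps above translate directly into code. This is a minimal pure-Python sketch and does not compute a p-value, because converting $F$ into a p-value needs an F-distribution table or a statistics library. The function name and the plant-growth numbers are hypothetical.

```python
def one_way_anova(*groups):
    """One-way ANOVA by hand: returns sums of squares, degrees of freedom and F."""
    all_values = [x for g in groups for x in g]
    N = len(all_values)           # total sample size
    k = len(groups)               # number of groups
    grand_mean = sum(all_values) / N
    group_means = [sum(g) / len(g) for g in groups]

    # Between groups: group means vs. the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: each value vs. its own group mean (the "error" term)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    df_between, df_within = k - 1, N - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "df_between": df_between, "df_within": df_within,
        "F": ms_between / ms_within,
    }


# Hypothetical plant growth (cm) under three light types
result = one_way_anova([4, 5, 6], [6, 7, 8], [8, 9, 10])
print(result)
```

Here $SS_{between} = 24$ and $SS_{within} = 6$ add up to $SS_{total} = 30$. The F-value is $F = (24/2)/(6/6) = 12$, well above the tabulated 5% critical value of about 5.14 for 2 and 6 degrees of freedom.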
Two-Way ANOVA and Interactions
Definition: Used when there are two independent variables (factors) being considered (e.g., light type and fertilizer type on plant growth).
Core Tests: Two-way ANOVA assesses: 1. The effect of Factor 1 on the dependent variable. 2. The effect of Factor 2 on the dependent variable. 3. The Interaction Effect: Whether the effect of Factor 1 depends on the level of Factor 2.
Hypotheses: There are three sets of Null/Alternative hypotheses representing both factors and the interaction between them.
Post-Hoc Tests (Multiple Comparisons)
Purpose: Used after an ANOVA has established a significant difference to determine exactly which specific groups differ from each other.
Design: These tests build on the ANOVA (for example, Tukey's test uses the ANOVA's within-group mean square) but apply an "experiment-wide error rate" to control for Type I errors.
Adjusted P-Values: The tests output adjusted p-values for every group pairing. Differences are only considered significant if the adjusted p-value is below the cut-off (e.g., p < 0.05).
Trade-offs: While they prevent false positives (Type I), they increase the risk of Type II errors (false negatives or missing real differences) due to their strictness.
Common Tests: * Tukey: Widely used, balances power and robustness. * Scheffe: Flexible but conservative. * Bonferroni: Simple and conservative calculation based on the number of comparisons.
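Of these, the Bonferroni correction is simple enough to sketch directly. Each raw p-value is multiplied by the number of comparisons $m$, with the result capped at 1. The p-values below are hypothetical. They show how a borderline result can lose significance after adjustment, which is the Type II trade-off noted above.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: raw p * number of comparisons, capped at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


# Hypothetical raw p-values from three pairwise comparisons after an ANOVA
raw = {("A", "B"): 0.010, ("A", "C"): 0.030, ("B", "C"): 0.400}
adjusted = bonferroni_adjust(raw)
for pair, p_adj in adjusted.items():
    verdict = "significant" if p_adj < 0.05 else "not significant"
    print(pair, round(p_adj, 3), verdict)
```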
Correlation: Direction and Strength
Definition: Evaluates the association between two numeric variables ($x$ and $y$).
Coefficient (r): Values range from $+1$ (perfect positive) to $-1$ (perfect negative), with $0$ indicating no linear relationship.
Assumptions: * Variables are continuous (Pearson correlation). * Variables are ordinal (Spearman correlation - non-parametric). * The relationship is linear, not curved (Pearson); Spearman only requires it to be monotonic (consistently increasing or decreasing).
Significance: A t-test is used to determine if $r$ is significantly different from zero (Null: no correlation).
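A minimal sketch of Pearson's $r$ and the t-statistic used to test it, $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with $n - 2$ degrees of freedom. The function names and data are hypothetical. For Spearman's correlation, the same $r$ formula is applied to the ranks of the data instead of the raw values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t-statistic for H0: no correlation (compare with a t table, df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t = correlation_t(r, len(x))
print(round(r, 3), round(t, 3))
```

Here $r \approx 0.77$ is fairly strong. With only five points, however, $t \approx 2.12$ is below the two-tailed 5% critical value for 3 degrees of freedom (about 3.18), so the correlation would not be declared significant.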
Causation Warning: Correlation does NOT imply causation. Associations may be due to a "confounding factor." * Example: Chocolate consumption and Nobel Prizes are positively correlated. Chocolate is a proxy for wealth; wealth provides educational funding, which leads to Nobel Prizes.
Linear Regression Analysis
Definition: A method to model relationships and predict values of a dependent variable based on one or more independent variables (predictors).
Regression Equation: $y = a + bx$ * $y$: Dependent variable (what we want to find). * $a$: Y-intercept (the value of $y$ when $x = 0$). * $b$: Slope/rate of change. * $x$: Independent variable.
Coefficient of Determination ($R^2$): Measures the proportion of the variability in the dependent variable that the model explains. Values range from 0 to 1. A high $R^2$ (close to 1) indicates much higher confidence in predictions than a low $R^2$ (close to 0).
Residuals: The vertical distance between an observed data point and the regression line (error).
Standard Error of the Estimate: Effectively the standard deviation of the residuals. Plotting 2 standard deviations ($\pm 2s$) either side of the regression line marks an approximately 95% band (a shaded area or dotted boundary) where future measurements are likely to fall.
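The regression quantities above can be sketched with ordinary least squares for $y = a + bx$. This is a minimal pure-Python version, and the function name and data are hypothetical. The $\pm 2$ band is the rough approximation described above. Formal prediction intervals also account for uncertainty in the fitted line and widen away from the mean of $x$.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = a + b*x.

    Returns intercept a, slope b, R^2, the residuals and the standard error
    of the estimate (standard deviation of the residuals, n - 2 df).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = my - b * mx          # intercept (value of y when x = 0)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    se_estimate = math.sqrt(ss_res / (n - 2))
    return a, b, r_squared, residuals, se_estimate


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, r2, residuals, se = fit_line(x, y)
x_new = 6
y_hat = a + b * x_new
print(f"y = {a:.2f} + {b:.2f}x, R^2 = {r2:.2f}")
print(f"prediction at x = {x_new}: {y_hat:.2f} +/- {2 * se:.2f} (approx. 95% band)")
```

With a single predictor, $R^2$ equals the square of Pearson's $r$ for the same data. The residuals of a least-squares line always sum to zero.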
Statistical Differences Between Two Groups Study Guide
Classifying Groups of Data
Groups of data can be categorized into three primary types:
- Unpaired (Independent): Data sets involving separate populations or entities measured in parallel.
- Paired (Dependent): Data where the same populations or items are measured repeatedly (e.g., at time point 1 and later at time point 2).
- Repeated: A specific type of paired data.
The Sample/Measurement Unit: It is vital to think carefully about the unit being measured (as introduced in statistics session 1).
Parametric vs. Non-Parametric Distributions
Parametric Data ("Normally" distributed):
- Has a symmetrical distribution around the mean value.
- Requires quantitative data that follows a specific bell-curve shape.
Non-Parametric Data ("Non-normally" distributed):
- Is not symmetrically distributed around the mean (skewed distribution).
- Often occurs when there is a lack of sufficient data to establish a normal distribution.
Establishing Paired Data Structures
Definition: Paired data occurs when an individual, a sampling station, or a specific test tube is tested twice or across two different conditions.
Common Scenario: "Before chemical exposure" versus "After chemical exposure" within the same subject.
The T-Test (Parametric Analysis)
Core Definition: A parametric test used to compare the means of two groups.
Mandatory Assumptions for Dependent Variables:
1. The data must be continuous.
2. The data must be approximately normally distributed.
3. The variances of the data must be homogeneously distributed (homoscedasticity); one group should not be significantly more variable than the other.
Mandatory Assumptions for Independent Variables:
- The "cause" variable must consist of two categorical groups: independent groups for the unpaired t-test, or related groups/matched pairs for the paired t-test.
- Note: If there are more than 2 groups, ANOVAs are utilized instead.
The Null Hypothesis ($H_0$): States there is NO difference between the means of the two data series (e.g., the mean heights of two populations are equal).
Logic of the Test: A t-test evaluates whether the observed difference between two groups is larger than what sampling variation would account for, assuming no true population difference exists.
Internal Mechanics:
- The test compares the "Signal" (the size of the difference between the means) against the "Noise" (the variation of values within the groups).
- If the "Signal" is large relative to the "Noise," the groups likely differ in reality.
Variations of the T-Test
Independent Two-Sample T-test (Unpaired): Compares means of two entirely separate groups (e.g., Treatment Group vs. Control Group).
Paired T-test (Paired): Compares means from the same group at two different times (e.g., Before intervention vs. After intervention).
One-Sample T-test: Compares a single group's mean against a known standard or constant value (less common).
Calculation and Interpretation of T-Values
The T-value: This is derived as the ratio between the difference of means and the variation in the groups.
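Standard Error ($SE$): Calculated differently depending on whether the test is paired or unpaired. For a single mean, $SE = \frac{s}{\sqrt{n}}$, where $s$ is the standard deviation and $n$ the sample size.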
Relationship of Difference to T-value:
- As the difference between the two means increases, the T-value increases.
- A high T-value suggests a substantial difference compared to variation (noise).
- A high level of variation in the data decreases the T-value, making it more likely the difference is just random fluctuation.
Degrees of Freedom and P-value: After the T-value is obtained, a p-value is derived using the degrees of freedom.
- If p < 0.05, the Null hypothesis is rejected, indicating a likely real difference.
- If boxes on box plots overlap substantially, a low T-value and a high p-value (p > 0.05) are expected.
- Small variance/non-overlapping boxes indicate a statistically significant difference.
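A minimal sketch of how the $t$-value is assembled for the paired and unpaired cases (signal divided by noise). The unpaired version assumes Student's t-test with a pooled variance, which matches the equal-variance (homoscedasticity) requirement above; Welch's variant would be used if the variances differ. Turning $t$ into a p-value needs a t table or a statistics library, so only $t$ and the degrees of freedom are returned. The function names and data are hypothetical.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t: mean difference / standard error of the differences (df = n - 1)."""
    diffs = [pre - post for pre, post in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


def unpaired_t(group1, group2):
    """Student's two-sample t with pooled variance (df = n1 + n2 - 2)."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))   # noise
    signal = statistics.mean(group1) - statistics.mean(group2)
    return signal / se, n1 + n2 - 2


# Hypothetical data
t_pair, df_pair = paired_t(before=[10, 12, 14, 16], after=[8, 11, 11, 15])
t_unpair, df_unpair = unpaired_t([5, 6, 7], [1, 2, 3])
print(round(t_pair, 2), df_pair, round(t_unpair, 2), df_unpair)
```

Both t-values exceed the two-tailed 5% critical values (about 3.18 for 3 df and 2.78 for 4 df), so both differences would be declared significant.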
Mann-Whitney U Test (Non-Parametric Unpaired)
Definition: The non-parametric equivalent to the unpaired t-test, used only for two test groups.
Assumptions (or lack thereof):
- Does NOT assume normal distribution.
- Does NOT require homogeneity of variance.
Data Processing: Data is converted into ranks prior to analysis. This makes the test ideal for datasets containing very extreme values or outliers.
Comparison Metric: It does not compare means; it compares "rank sums."
Requirement for Variables:
- Two independent groups (samples), measured on at least an ordinal scale.
- Dependent variables must be ordinal, metric, or continuous (anything that can be ranked).
Hypotheses:
- $H_0$: There is no difference in terms of central tendency (medians) between the two groups.
- $H_1$: There is a difference in terms of central tendency.
Calculations:
- Ranks are summed for each group and average rank sums are calculated.
- $U$-values are derived from the rank sums ($R_1$ and $R_2$).
- A $z$-value is calculated using the standard deviation of $U$ and the number of samples.
- A $z$ close to zero is consistent with $H_0$; a $z$ far from zero indicates a significant difference.
- If p < 0.05, $H_0$ is rejected.
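The calculation steps above can be sketched in Python under a few assumptions. Tied values receive the average of their ranks. $U$ is computed as $R - n(n+1)/2$ for each group. The $z$-value uses the large-sample normal approximation without a tie correction; exact tables are normally used for very small groups. The function names and data are hypothetical.

```python
import math
from statistics import NormalDist


def rank_with_ties(values):
    """Ranks 1..n; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based positions i..j
        i = j + 1
    return ranks


def mann_whitney_u(group1, group2):
    n1, n2 = len(group1), len(group2)
    ranks = rank_with_ties(list(group1) + list(group2))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])      # rank sums R1 and R2
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = r2 - n2 * (n2 + 1) / 2
    u = min(u1, u2)
    mean_u = n1 * n2 / 2                            # expected U under H0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    p = 2 * NormalDist().cdf(-abs(z))               # two-sided p-value
    return u, z, p


u, z, p = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(u, round(z, 2), round(p, 4))
```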
Wilcoxon Signed Ranks Test (Non-Parametric Paired)
Definition: The non-parametric version of the paired t-test.
Comparison Metric: Based on the ranks of the paired differences rather than means.
Requirements:
- Dependent variable must be ordinal or continuous (must be a number to calculate differences).
- Independent variable must consist of matched pairs.
- Minimum of six pairs of data required.
Step-by-Step Analysis:
1. Calculate the difference for each pair (e.g., Before - After). This includes negative values.
2. Rank these absolute differences from smallest (Rank 1) to largest.
3. Re-assign the original positive or negative signs to the ranks.
4. Sum the Positive Ranks and sum the Negative Ranks.
5. Obtain the Wilcoxon Rank Statistic ($W$), which is the smaller of the positive and negative rank sums.
6. Calculate the expected value of $W$ (what is expected if there were no difference).
7. Obtain a $z$-value using $W$, the expected $W$, and the standard deviation of $W$.
8. Derive a $p$-value. If p < 0.05, reject $H_0$.
Case Example (River Flow): If river flow is measured at specific stations and the reported result has p < 0.05, $H_0$ is rejected.
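The eight steps above can be sketched in Python under several assumptions. Pairs with a zero difference are dropped, which is a common convention. Tied absolute differences share an average rank. The $z$-value uses the normal approximation with standard deviation $\sqrt{n(n+1)(2n+1)/24}$, which is rough for small samples. The reaction-time data (morning vs. afternoon) and the function name are hypothetical.

```python
import math
from statistics import NormalDist


def wilcoxon_signed_rank(first, second):
    """Wilcoxon signed ranks test for paired data (normal approximation)."""
    # Step 1: difference for each pair; zero differences are dropped
    diffs = [a - b for a, b in zip(first, second) if a != b]
    n = len(diffs)
    # Step 2: rank absolute differences (ties share their average rank)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    # Steps 3-4: re-attach the signs and sum positive and negative ranks
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Step 5: W is the smaller of the two rank sums
    w = min(w_plus, w_minus)
    # Step 6: expected W under H0 (no difference) and its standard deviation
    expected_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    # Steps 7-8: z-value and two-sided p-value
    z = (w - expected_w) / sd_w
    p = 2 * NormalDist().cdf(-abs(z))
    return w_plus, w_minus, w, z, p


# Hypothetical reaction times (ms) for eight people
morning = [310, 295, 320, 305, 330, 300, 315, 325]
afternoon = [300, 290, 305, 306, 310, 292, 301, 312]
w_plus, w_minus, w, z, p = wilcoxon_signed_rank(morning, afternoon)
print(w_plus, w_minus, w, round(z, 2), round(p, 4))
```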
Key Rule for Test Selection
Strength of Tests: Parametric tests (like t-tests) are "stronger" than non-parametric tests.
Priority: Always use a parametric test if the data fits the requirements (normal distribution, etc.). Do not default to Mann-Whitney or Wilcoxon if the data is parametric.
Chi-Square Test ($\chi^2$)
Definition: A very common non-parametric statistical test used to determine if there is a relationship between two categorical variables.
Function: It checks if frequencies occurring in a sample differ significantly from expected frequencies (what would happen if there was no association).
Hypotheses:
- $H_0$: Observed and expected frequencies (from a known random distribution) are NOT different.
- $H_1$: Observed and expected frequencies are different.
Common Applications:
- Relationship between age group and social media preference.
- Medical treatment success rates relative to patient gender.
- Association between political party and occupation.
- Student performance influenced by teaching methods.
Calculation of Degrees of Freedom ($df$):
- $df = (r - 1)(c - 1)$, where $r$ and $c$ are the numbers of rows and columns in the contingency table.
Critical Chi Value Logic:
- Use the calculated $df$ and the desired significance level (typically 0.05) with a Chi-square lookup table.
- If the Critical Chi Value is larger than the Calculated Chi Value, the Null hypothesis is retained.
- If the Critical Chi Value is smaller than the Calculated Chi Value, the Null hypothesis is rejected.
Case Study (Fish Lice): Testing whether fish lice attach randomly ($H_0$) or show a preference. The result was reported with p > 0.05, so $H_0$ is retained: the data are consistent with the lice attaching randomly.
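For a single categorical variable such as attachment site, the test compares observed counts with the counts expected under random (uniform) attachment. In this goodness-of-fit form, the degrees of freedom are the number of categories minus 1. The following is a minimal sketch using hypothetical counts on four body regions. The critical values are the standard 5% table values for 1 to 4 degrees of freedom.

```python
def chi_square_goodness_of_fit(observed, expected=None):
    """Chi-square statistic comparing observed counts with expected counts.

    If no expected counts are given, a uniform (random) distribution is assumed.
    Degrees of freedom for a goodness-of-fit test are (categories - 1).
    """
    total = sum(observed)
    k = len(observed)
    if expected is None:
        expected = [total / k] * k
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, k - 1


# Critical chi-square values at alpha = 0.05 (standard table values)
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

lice_counts = [30, 22, 25, 23]   # hypothetical counts on four body regions
chi2, df = chi_square_goodness_of_fit(lice_counts)
decision = "reject H0" if chi2 > CRITICAL_05[df] else "retain H0"
print(round(chi2, 3), df, decision)
```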
Statistical Measures of Spread
Standard Deviation ($SD$):
- The square root of the variance.
- A measure of dispersion/variability.
- Shows how far apart values in a dataset are from the mean.
- Used to identify outliers and summarize descriptive data.
Standard Error ($SE$):
- Specifically the Standard Error of the Mean ($SEM$).
- Measures sampling error; how accurately the sample mean represents the true population mean.
- Shows the variation likely between different samples of a population and the population itself.
- It is an estimation of the random sampling process rather than a definite descriptive value.
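The two measures are linked by $SEM = SD / \sqrt{n}$. The SD describes the data, while the SEM shrinks as the sample grows. Below is a short sketch with hypothetical measurements, using the sample standard deviation ($n - 1$ in the denominator):

```python
import math
import statistics

# Hypothetical repeated measurements (e.g., fish length in cm)
sample = [12.1, 11.8, 12.6, 12.0, 11.5, 12.4, 12.2, 11.9]

sd = statistics.stdev(sample)        # spread of individual values around the mean
sem = sd / math.sqrt(len(sample))    # precision of the sample mean as an estimate

print(f"mean = {statistics.mean(sample):.2f}, SD = {sd:.3f}, SEM = {sem:.3f}")
```

Quadrupling the sample size would roughly halve the SEM while leaving the SD about the same.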
Statistical Test Selection Matrix (Reference Table)
Tests of Differences (Two Groups):
- Paired Continuous Data: Paired t-test.
- Unpaired Continuous Data: T-test or One-way ANOVA.
- Paired Discrete/Ordinal Data: Wilcoxon's signed ranks test.
- Unpaired Discrete/Ordinal Data: Mann-Whitney U test.
- Paired Categorical Data: Chi-square test (strictly, McNemar's test is the chi-square variant for paired categorical data).
- Unpaired Categorical Data: Chi-square test.
Tests of Relationships:
- Correlation (Continuous): Pearson product-moment correlation.
- Correlation (Discrete/Ordinal): Kendall's rank correlation or Spearman's rank correlation.
- Regression (Continuous): Linear regression, quadratic, or polynomial regression.
- Regression (Discrete/Ordinal): Logistic regression, Model II regression, or Kendall's robust line fit.
Complex Analysis:
- 1+ factor with Covariates: ANCOVA.
- Many variables/groups to discriminate: CVA, Discriminant function analysis, MANOVA, DCA.
- Exploring Variables: PCA (Principal Component Analysis).
Statistical Differences Between Two Groups and Comprehensive Study Groups
Fundamental Concepts of Data Groups
Unpaired (Independent) Data: * Involves separate populations or items measured in parallel. * Example: Comparing a treatment group against a completely separate control group.
Paired (Dependent) Data: * Involves the same populations or items measured repeatedly or in matched pairs. * Example: Measuring an individual, a test tube, or a sampling station twice (e.g., "Before" and "After" chemical exposure).
Measurement Unit: It is critical to carefully define the sample/measurement unit when determining group types.
Parametric vs. Non-Parametric Data
Parametric Data: * Refers to data that is "Normally" distributed. * Characterized by a symmetrical distribution around the mean (bell curve).
Non-Parametric Data: * Refers to data that is "Non-normally" distributed. * Characterized by skewness (not symmetrical around the mean). * This category is also used when there is insufficient data to determine a distribution.
The T-Test (Parametric Analysis)
Definition: A very common parametric test used to compare the means of two groups.
Assumptions for Use: * The Dependent Variable must be continuous. * Data must be approximately normally distributed. * Variances must be homogeneously distributed (homoscedasticity); one group should not be significantly more variable than the other. * The Independent Variable must consist of two categorical groups: independent groups (unpaired t-test) or related groups/matched pairs (paired t-test).
Logic of the T-Test: * Evaluates if the difference between two groups is larger than expected from sampling variation alone. * Compares the Signal (size of the difference between means) against the Noise (variation within the groups). * If the mean difference is large compared to natural variation, the groups likely differ in reality.
Primary Versions of the T-Test: * Independent Two-Sample T-test (Unpaired): Compares means of two separate groups (e.g., treatment vs. control). * Paired T-test: Compares means from the same group at different times (e.g., heart rate before and after exercise). * One-Sample T-test: Compares a single group mean against a known standard or value (less common).
Hypotheses: * Null Hypothesis ($H_0$): There is NO difference between the two data series (e.g., heights of two populations are equal).
Calculation and Interpretation: * $t$-value: The ratio between the difference of means and the variation in groups. * Standard Error calculation: $SE = \frac{s}{\sqrt{n}}$ for a single mean (the unpaired test combines the standard errors of both groups). * Relationship: As the difference between means increases, the $t$-value increases. A larger $t$-value suggests a substantial difference compared to noise. * Variability: High variation in data decreases the $t$-value, making it more likely that mean differences are random fluctuations. * $p$-value: Derived from the $t$-value and degrees of freedom. If p < 0.05, the Null Hypothesis is rejected. * Visualizing Significance: In box plots, large overlapping variations usually indicate no statistical difference. If boxes do not overlap, a significant difference is likely.
Mann-Whitney U Test (Non-Parametric Unpaired)
Definition: The non-parametric equivalent to the unpaired t-test, used only for two independent test groups.
Key Characteristics: * Does not assume normal distribution or homogeneity of variance. * Requires data to be converted into Ranks before analysis, which reduces the influence of extreme outliers. * Compares "Rank Sums" or Medians (central tendency) rather than Means.
Assumptions: * Two independent groups (samples) measured on at least an ordinal scale. * Dependent variable must be ordinal, metric, or continuous (numerical values that can be ranked).
Hypotheses: * Null Hypothesis ($H_0$): No difference in central tendency between groups in the population. * Alternative Hypothesis ($H_1$): There is a difference in central tendency.
Procedure: 1. Convert data to ranks. 2. Sum the ranks ($R_1$ and $R_2$). 3. Calculate the average rank sum for each group. 4. Derive the $U$-value from the rank sums. 5. Calculate a $z$-value (utilizes the standard deviation and number of samples). A $z$-value far from zero suggests significance. 6. Derive the $p$-value. If p < 0.05, reject $H_0$.
Wilcoxon Signed Ranks Test (Non-Parametric Paired)
Definition: The non-parametric version of the paired t-test.
Requirements: * Dependent variable must be on an ordinal or continuous scale. * Independent variable must consist of matched pairs. * Requires a minimum of six pairs of data.
Logic: Based on the ranks of the differences between paired values rather than the means of the values themselves.
Procedure (Example: Reaction time morning vs. afternoon): 1. Calculate the difference for each pair (e.g., morning - afternoon). Differences can be positive or negative. 2. Rank the absolute differences from smallest (Rank 1) to largest. 3. Re-assign the positive or negative signs to the ranks. 4. Sum the positive ranks and sum the negative ranks separately. 5. The greater the difference between these sums, the more likely the difference is significant.
Statistical Metrics: * Wilcoxon Rank Statistic ($W$): The smaller of the positive and negative rank sums. * Expected $W$: The value expected if no true difference exists. * $z$-value: Obtained using $W$, the expected $W$, and the standard deviation of $W$. * Report Example: A river flow study at several stations reported p < 0.05, so $H_0$ is rejected.
Chi-Square Test ($\chi^2$)
Definition: A non-parametric test used to determine if there is a relationship between two categorical variables.
Function: Compares observed frequency ($O$) to expected frequency ($E$) based on a random distribution.
Hypotheses: * Null Hypothesis ($H_0$): Observed and expected frequencies are NOT different (no association). * Alternative Hypothesis ($H_1$): Observed and expected frequencies are different.
Applications: * Relationship between age group and social media preference. * Medical treatment success rate by gender. * Association between political affiliation and occupation.
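The Contingency Table: A table used to organize categorical data (e.g., Gender vs. Education Level). From it, an "Expected Frequency Table" is calculated (row total × column total ÷ grand total for each cell), showing the counts expected if the two variables were unrelated.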
Critical Values and Calculation: * Degrees of Freedom ($df$): Calculated as $df = (r - 1)(c - 1)$, where $r$ and $c$ are the numbers of rows and columns. * Comparison: Calculate the Chi-square value ($\chi^2$) and compare it to a "Critical Chi Value" from a lookup table (using $df$ and $\alpha$). * If Calculated Value > Critical Value, reject $H_0$. * If Calculated Value < Critical Value, retain $H_0$.
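The contingency-table procedure can be sketched as follows. Expected counts are computed as row total × column total ÷ grand total. No continuity correction is applied; some software applies Yates' correction to 2 × 2 tables by default. The gender-by-education counts are hypothetical, and the critical values are standard 5% table values.

```python
def chi_square_independence(table):
    """Chi-square test of independence for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count if the two variables are unrelated
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


# Critical chi-square values at alpha = 0.05 (standard table values)
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Hypothetical counts: rows = gender, columns = three education levels
observed = [[10, 20, 30],
            [20, 20, 20]]
chi2, df, expected = chi_square_independence(observed)
decision = "reject H0" if chi2 > CRITICAL_05[df] else "retain H0"
print(round(chi2, 2), df, decision)
```

Here $\chi^2 \approx 5.33$ with $df = 2$ falls just below the critical value of 5.991, so $H_0$ (no association) is retained.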
Case Study (Fish Lice): * Research Question: Do fish lice attach randomly or have a preference? * Result: Not significant (p > 0.05). * Interpretation: Because p > 0.05, $H_0$ is retained; the data are consistent with lice attaching to fish randomly.
Summary of Measures of Spread
Standard Deviation ($SD$): * The square root of the variance. * A measure of dispersion or variability. * Shows how far apart individual data points are from the mean. * Used for descriptive statistics to summarize data and identify outliers.
Standard Error ($SE$): * Most commonly the Standard Error of the Mean ($SEM$). * Measures sampling error. * Indicates how accurately a sample mean represents the true population mean. * It is an estimation of variation between different samples and the population.
Summary of Statistical Selection Rules
Priority Rule: Always use a parametric test (like a t-test) if the data fits the normal distribution requirements. Parametric tests are considered "stronger" than non-parametric alternatives.
Test Selection Table (Biologist's Guide):
- 1 Categorical Variable: Chi-square test (fit to uniform); G-test.
- 2 Categorical Variables: Chi-square for association.
- Comparison of 2 Unpaired Groups: Continuous/Normal: Unpaired t-test or One-way ANOVA; Ordinal/Discrete: Mann-Whitney U test.
- Comparison of 2 Paired Groups: Continuous/Normal: Paired t-test; Ordinal/Discrete: Wilcoxon signed ranks test.
- Comparison of >2 Unpaired Groups: Continuous/Normal: One-way ANOVA; Ordinal/Discrete: Kruskal-Wallis test.
- Comparison of >2 Paired Groups: Continuous/Normal: Repeated-measures ANOVA; Ordinal/Discrete: Friedman test.