Fundamental_skills_Statistics_for_biologists_Part3

Introduction to Statistics for Life Sciences

  • Instructor: Donald Reid (glasgow.ac.uk)

Overview of Topics Covered

  • Structure of a General Linear Model

    • Consideration of factor and covariate explanatory variables

    • Model output predictions

    • Reporting results including statistical metrics like degrees of freedom, Mean Sums of Squares, and F-ratio

Class Example: Analysis of Heights

Data Summary

  • Various attributes recorded for each observation (Height, Age, Gender, Eye colour)

  • Sample Size: 69

  • The example heights dataset lists gender and eye colour for each individual.

Variation in Class Heights

Total Sums of Squares (TSS)

  • TSS = 1380.64

What is Being Analyzed?

  • Research Question: Can variation in class height be explained by biological sex?

    • Response Variable: Class height

    • Explanatory Variable: Sex (factor)

    • Hypotheses:

      • Null Hypothesis (H0): Class height not affected by sex

      • Alternative Hypothesis (Ha): Class height affected by sex

    • Model Structure: Model fit formula - Height ~ Sex

Categorical Model Analysis

Boxplot Representation

  • Boxplot of Height (response) against Sex (factor).

Analysis of Variance (ANOVA) Results

Table Overview

  • Response Variable: Height

  • ANOVA table (Df = degrees of freedom, Sum Sq = sum of squares, Mean Sq = mean sum of squares):

    • Factor (Sex): Df = 1, Sum Sq = 146.48, Mean Sq = 146.48, F-value = 7.9519, P-value = 0.006312 (significant)

    • Residuals: Df = 67, Sum Sq = 1234.16

  • Coefficients for the model analyzed:

    • Intercept: 65.0857, Std. Error: 0.6623

    • Factor (Sex) MALE: 2.9854, Std. Error: 1.0587

Fitting a Categorical Model

Interpretation

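  • The coefficients give the fitted height for each sex. Assuming the default coding, in which one level acts as the reference, the intercept is the fitted height for females and the MALE coefficient is the difference between the male and female fitted heights.

  • Model equation: fheight = c + aM · (male) + aF · (female), where fheight is the fitted height, (male) and (female) are 1 for individuals of that sex and 0 otherwise, c = 65.0857, aM = 2.9854, and aF = 0 because females are the reference level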
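
The fitted values for each sex can be worked out directly from the two reported coefficients. The snippet below is a minimal sketch, not the lecture's own code: the function name `fitted_height` and the labels "F" and "M" are assumptions made for illustration, and females are taken to be the reference level because the only listed coefficient is labelled MALE.

```python
# Coefficients as reported in the model output for Height ~ Sex.
INTERCEPT = 65.0857  # fitted height for the reference level (assumed: females)
COEF_MALE = 2.9854   # difference between male and female fitted heights


def fitted_height(sex: str) -> float:
    """Return the fitted height for 'F' or 'M' (labels are hypothetical)."""
    if sex not in ("F", "M"):
        raise ValueError(f"unknown sex label: {sex!r}")
    is_male = 1 if sex == "M" else 0  # indicator variable for the MALE level
    return INTERCEPT + COEF_MALE * is_male


for label in ("F", "M"):
    print(label, round(fitted_height(label), 4))
# F 65.0857
# M 68.0711
```

With a single two-level factor, the fitted values are simply the two group means, which is why the model only needs an intercept and one difference.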

Modeling Approach with Covariates

Total and Explained Variation

  • Introducing a covariate (a continuous explanatory variable) into the model

Covariate Model Structure

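  • Response variable (Height) modelled against a covariate (Age)

    • Main equation: fheight = m · (Age) + c, where m is the slope and c is the intercept

    • Fitted values at different ages are calculated by substituting each age into the equation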

Prediction in Linear Models

Example Calculation

  • Predicting weight from height using a fitted slope.

    • Example: if a dragon's weight increases by 0.3 tons for every 1-foot increase in height, the slope is 0.3 tons per foot.
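
A prediction from a covariate model is just the straight-line equation evaluated at a chosen value. The sketch below uses the slope from the dragon example (0.3 tons per foot); the intercept of 1.0 tons and the heights of 10 and 11 feet are hypothetical values chosen only to make the arithmetic concrete.

```python
def predict(x: float, slope: float, intercept: float) -> float:
    """Fitted value of the response for covariate value x."""
    return slope * x + intercept


SLOPE_TONS_PER_FOOT = 0.3  # from the dragon example
INTERCEPT_TONS = 1.0       # hypothetical: not given in the notes

weight_at_10 = predict(10, SLOPE_TONS_PER_FOOT, INTERCEPT_TONS)
weight_at_11 = predict(11, SLOPE_TONS_PER_FOOT, INTERCEPT_TONS)

# A 1-foot increase in height changes the prediction by exactly the slope.
print(round(weight_at_11 - weight_at_10, 3))  # 0.3
```

Whatever intercept is used, the difference between predictions one unit apart equals the slope, which is how the slope should be read when interpreting model output.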

Reporting Results

  • Importance of conveying relevant statistics: F-statistics, significance levels, p-values

  • Example reporting: state the effect together with its test statistic, degrees of freedom, and p-value, e.g. "Sex had a significant effect on class height (F = 7.95, df = 1, 67, P = 0.006)"
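
A small helper can keep reported statistics consistent across a lab report. The sketch below is one possible convention, not a prescribed format: the function name `report_f_test`, the two-decimal rounding of F, and the "P < 0.001" cut-off are all assumptions, so check them against the reporting guidelines you are given.

```python
def report_f_test(effect: str, f_value: float, df_effect: int,
                  df_resid: int, p_value: float) -> str:
    """Format an F-test result as a short, report-ready string."""
    # Very small p-values are conventionally reported as an inequality.
    p_text = "P < 0.001" if p_value < 0.001 else f"P = {p_value:.3f}"
    return f"{effect}: F({df_effect}, {df_resid}) = {f_value:.2f}, {p_text}"


# Values taken from the ANOVA table for Height ~ Sex.
print(report_f_test("Sex", 7.9519, 1, 67, 0.006312))
# Sex: F(1, 67) = 7.95, P = 0.006
```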

Degrees of Freedom (df)

Explanation

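  • Degrees of freedom are the number of independent pieces of information used to quantify variation

  • Total and explained variation each have their own df, which are needed to convert sums of squares into mean squares

    • For class height, TSS = 1380.64 and total df = 68 (sample size 69 minus 1)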
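
The way degrees of freedom and sums of squares are split between the model and the residuals can be checked with a few lines of arithmetic. This is a sketch using the figures from the class heights example (sample size 69, TSS 1380.64, and the explained sum of squares of 146.48 from the ANOVA table); the variable names are illustrative.

```python
n = 69            # sample size
levels = 2        # levels of the factor Sex
tss = 1380.64     # total sum of squares
ess = 146.48      # explained sum of squares (Sex row of the ANOVA table)

total_df = n - 1                       # 68
explained_df = levels - 1              # 1
residual_df = total_df - explained_df  # 67

rss = tss - ess   # residual sum of squares

print(total_df, explained_df, residual_df)  # 68 1 67
print(round(rss, 2))                        # 1234.16
```

Both the degrees of freedom and the sums of squares add up: explained plus residual equals total, which matches the Residuals row of the table (Df = 67, Sum Sq = 1234.16).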

F-Ratio Calculation

Meaning and Importance

  • F-ratio derived from comparing explained mean squares to residual mean squares

    • Formula Definition: F = Mean ESS / Mean RSS, where each mean square is the corresponding sum of squares divided by its degrees of freedom
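
The F-ratio for the class heights model can be reproduced from the sums of squares and degrees of freedom in the ANOVA table. The function name `f_ratio` is an assumption for illustration; the inputs are the reported values.

```python
def f_ratio(ess: float, df_explained: int, rss: float, df_residual: int) -> float:
    """F = (ESS / explained df) / (RSS / residual df)."""
    mean_ess = ess / df_explained  # explained mean square
    mean_rss = rss / df_residual   # residual mean square
    return mean_ess / mean_rss


f_value = f_ratio(ess=146.48, df_explained=1, rss=1234.16, df_residual=67)
print(round(f_value, 2))  # 7.95
```

The result agrees with the reported F-value of 7.9519 to two decimal places; the small difference further out arises because the sums of squares in the table are themselves rounded.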

Understanding P-values

Significance Contextualization

  • The p-value is the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true.
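
The idea of "at least as extreme under the null hypothesis" can be made concrete with an exact permutation test. Note that this is a different technique from the F-test used in the lecture, swapped in here because it needs no distribution tables. The eight heights below are hypothetical and are not the class data.

```python
from itertools import combinations
from statistics import mean

# Hypothetical heights for two small groups (NOT the class data).
group_a = [63.0, 64.0, 65.0, 66.0]
group_b = [67.0, 68.0, 69.0, 70.0]


def permutation_p_value(a: list[float], b: list[float]) -> float:
    """Share of all regroupings whose mean difference is at least as
    large (in absolute value) as the observed one."""
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # If the null hypothesis holds, group labels are arbitrary, so every
    # way of choosing len(a) values for the first group is equally likely.
    for chosen in combinations(indices, len(a)):
        chosen_set = set(chosen)
        new_a = [pooled[i] for i in chosen]
        new_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if abs(mean(new_a) - mean(new_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


p_value = permutation_p_value(group_a, group_b)
print(round(p_value, 4))  # 0.0286
```

Only 2 of the 70 possible regroupings separate the groups as strongly as the observed split, so a difference this large would rarely arise if the labels really made no difference.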

Lab Report and Q&A Session

Future Tasks

  • Expectation to analyze PTC data using R

Research Questions Defined

  • Investigating links between genotype and other variables like Sex, Smoking preference, etc.

Conclusion

Skills Developed

  • Choosing research questions, forming hypotheses, testing, reporting results, and predictive analytics for response variables.

Revision Notes: Introduction to Statistics for Life Sciences

Intended Learning Outcomes

  1. Understand the statistical output of a General Linear Model (Reinforced in Data Analysis 2 lab)

  2. Calculate values of your response variable for a given value of explanatory variable

  3. Know how to report statistical results (Reinforced in Data Analysis 2 lab)


Structure of a General Linear Model

  • Statistical Outputs:

    • Model output includes key statistical metrics:

      • Degrees of freedom

      • Mean Sums of Squares

      • F-ratio

Example Analysis: Class Heights

  • Research Question: Can variation in class height be explained by biological sex?

  • Response Variable: Class height

  • Explanatory Variable: Sex (factor)

  • Hypotheses:

    • Null Hypothesis (H0): Class height not affected by sex

    • Alternative Hypothesis (Ha): Class height affected by sex

    • Model Structure: Height ~ Sex

Data Summary

  • Sample Size: 69

  • Height Data: Height recorded for each individual alongside Age, Gender, and Eye colour.

Analysis of Variance (ANOVA) Results

  Factor      Df   Sum Sq    Mean Sq   F-value   P-value
  Sex          1   146.48    146.48    7.9519    0.006312
  Residuals   67   1234.16

  • Coefficients:

    • Intercept: 65.0857, Std. Error: 0.6623

    • Factor (Sex) MALE: 2.9854, Std. Error: 1.0587

Fitting a Categorical Model

  • Interpretation of Coefficients:

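    • The coefficients give the fitted height for each sex: the intercept is the fitted height for the reference level (females), and the MALE coefficient is the difference between the male and female fitted heights.

    • Model equation: fheight = c + aM · (male) + aF · (female), where (male) and (female) are 1 for individuals of that sex and 0 otherwise, and aF = 0 because females are the reference level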

Predicting Response Variable Values

  • Modeling with Covariates:

    • Covariate Model Structure: Response variable vs. other explanatory variables.

    • Main equation for prediction: fheight = m · (Age) + c

Example Prediction Calculation

  • If weight increases by 0.3 tons for every 1-foot increase in height, the slope of the model is 0.3 tons per foot.

Reporting Statistical Results

  • Importance of Reporting:

    • Clearly convey relevant statistics:

      • F-statistics

      • Significance levels

      • P-values

Example Reporting Structure

  • Discussing effects on test statistics and their significance thresholds.

  • Understanding Degrees of Freedom (df) for quantifying variation.

Conclusion

  • Competence in forming hypotheses, reporting results, and predictive analytics for response variables is developed through these concepts and tasks.

