Multivariate Analysis Study Notes

Multivariate Data Analysis: Eurovision Example

  • Eurovision Song Contest as a case study for multivariate data analysis.
  • Objective: Determine whether Eurovision judging is fair or whether voting shows underlying patterns related to geopolitical relationships.
  • Data: Matrix of countries scoring other countries' acts.
  • Technique: Cluster analysis used to generate a dendrogram, revealing non-random patterns.
  • Findings:
    • Nordic/Scandinavian block.
    • Greece and Cyprus cluster together.
    • UK and Ireland cluster together.
    • Bosnia and Turkey cluster together.
  • Conclusion: Voting is biased towards friendly nations rather than based purely on the quality of the acts.
  • Representation: The results of a multivariate analysis can be shown as a 2D or 3D map, revealing clusters (e.g., Eastern, Nordic/Baltic, and Western European blocks).
  • Australia's chances: Australia is unlikely to win due to these geopolitical biases.
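The dendrogram idea can be sketched in a few lines of Python. This is a minimal sketch, assuming a made-up points matrix for five countries (the numbers are hypothetical, not real contest results) and average-linkage agglomerative clustering; the lecture does not say which dissimilarity measure or linkage rule was used. Mutual points are converted into a dissimilarity so that countries that score each other highly end up close together.

```python
from itertools import combinations

# Hypothetical points (invented for illustration, not real contest data):
# votes[a][b] = points country a awarded to country b.
votes = {
    "Sweden": {"Norway": 12, "Greece": 2, "Cyprus": 1, "UK": 5},
    "Norway": {"Sweden": 10, "Greece": 3, "Cyprus": 2, "UK": 6},
    "Greece": {"Sweden": 4, "Norway": 1, "Cyprus": 12, "UK": 3},
    "Cyprus": {"Sweden": 3, "Norway": 2, "Greece": 12, "UK": 4},
    "UK": {"Sweden": 6, "Norway": 5, "Greece": 2, "Cyprus": 1},
}


def dissimilarity(a, b):
    """Turn mutual points into a distance: 12 (max points) minus the mean points exchanged."""
    mutual = (votes[a][b] + votes[b][a]) / 2
    return 12 - mutual


def average_linkage(items, dist):
    """Agglomerative clustering; returns merges as (cluster, cluster, height) in merge order."""

    def link(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [frozenset([x]) for x in items]
    merges = []
    while len(clusters) > 1:
        # Join the two closest clusters, then repeat until one cluster remains.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        merges.append((sorted(c1), sorted(c2), link(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


merges = average_linkage(list(votes), dissimilarity)
for left, right, height in merges:
    print(left, "+", right, "at height", round(height, 2))
```

Each entry in `merges` is one join in the dendrogram, from the closest pair up to the final merge; the heights are what a dendrogram plot puts on its vertical axis.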

Course Overview: Techniques and Data

  • First lecture: Focuses on the definition, collection, analysis, and exploration of multivariate data.
  • Second lecture: Covers Principal Components Analysis and Factor Analysis.
  • Other techniques:
    • Cluster Analysis.
    • Non-metric multi-dimensional scaling.
    • MANOVA (multivariate analysis of variance): similar to ANOVA but with multiple response variables.
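For reference, one widely used MANOVA test statistic is Wilks' lambda, which compares within-group variation with total variation across all response variables at once. Other statistics (Pillai's trace, the Hotelling-Lawley trace, Roy's largest root) are also common, and the course may emphasize a different one.

```latex
\Lambda = \frac{\lvert \mathbf{W} \rvert}{\lvert \mathbf{B} + \mathbf{W} \rvert}
```

Here W is the within-groups sums-of-squares-and-cross-products (SSCP) matrix and B is the between-groups SSCP matrix. Values near 0 indicate that group differences are large relative to within-group variation; values near 1 indicate little difference among groups.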

References

  • Quinn and Keough, Experimental Design and Data Analysis for Biologists (old or new edition).
    • Chapters accessible online through the library.
    • Examples using soil samples and biodiversity.
    • Data available for download.
  • The chapter on data is available free of charge.

Course Goals

  • Understand major concepts such as ordination, principal components, factor analysis, non-metric multidimensional scaling, cluster analysis, MANOVA, and PerMANOVA.
  • Learn to make decisions about data and its interpretation.
  • Understand and interpret scientific papers that use these techniques.

Multivariate Techniques: Key Themes

  • Linear combinations of variables: A recurring concept.
  • Distance/dissimilarity/similarity measures: Used repeatedly.
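  • Transformation of data: Its purpose differs from univariate transformation (in multivariate analysis it is often used to reduce the influence of very large values, such as dominant species, rather than to meet normality assumptions).
  • Standardization: Rescaling variables measured in different units so that each contributes comparably.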
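Standardization and linear combinations can be sketched together. This is a minimal sketch, assuming hypothetical measurements for four sites and arbitrary equal weights; in PCA the weights would instead be estimated from the data.

```python
from statistics import mean, stdev


def standardize(values):
    """Z-scores: subtract the mean, divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]


# Hypothetical measurements for four sites, recorded in very different units.
canopy_pct = [80.0, 65.0, 40.0, 15.0]  # canopy cover (%)
log_count = [12.0, 9.0, 4.0, 1.0]  # number of logs per plot

z_canopy = standardize(canopy_pct)
z_logs = standardize(log_count)

# A linear combination of the standardized variables: score = w1 * z1 + w2 * z2.
# The equal weights are arbitrary placeholders; PCA would estimate weights from the data.
weights = (0.5, 0.5)
scores = [weights[0] * a + weights[1] * b for a, b in zip(z_canopy, z_logs)]
print([round(s, 2) for s in scores])
```

Without standardization, canopy cover (tens of percent) would dominate any combination simply because its numbers are larger than the log counts.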

Types of Multivariate Techniques

  • Focus on ordination and clustering.
  • Brief coverage of regression (multiple regression).
  • Classification (some overlap with clustering).

Multivariate Data: Definition

  • Multiple response variables.
  • Variables are not necessarily independent; they interrelate.
  • Examples:
    • Biodiversity of a park.
    • Physical properties of an environment (soil/water chemistry).
  • Specialized approaches are needed because data often do not conform to traditional statistical assumptions.

Why Use Multivariate Analysis

  • Avoid conducting multiple ANOVAs on interrelated variables.
    • Problem: The probability of at least one Type I error (false positive) increases with the number of tests.
  • Multivariate statistics analyze all variables together, taking their interrelationships into account.
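The inflation can be quantified: if k independent tests are each run at significance level alpha, the chance of at least one false positive is 1 - (1 - alpha)^k. The sketch assumes independent tests; tests on interrelated response variables are not independent, so the exact rate differs, but it still rises with the number of tests.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Probability of at least one false positive among k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 10, 20):
    print(k, "tests:", round(familywise_error(0.05, k), 3))
```

With alpha = 0.05, ten separate ANOVAs carry roughly a 40% chance of at least one false positive.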

Types of Questions Addressed

  • Change in community composition.
  • Differences in water quality.
  • Changes in habitat characteristics.
  • Organismal traits (phenotypic traits, diet effects).

Examples of Outputs

  • Ordinations from principal components.
  • Clustering diagrams.
  • Non-metric multidimensional scaling plots.

Habitat Complexity: An Example

  • Habitat complexity is a multivariate term, comprising multiple interrelated variables.
  • Measurement:
    • Canopy cover.
    • Shrub cover.
    • Number of stems.
    • Diversity of trees/shrubs.
    • Number of logs.
    • Percentage of dirt cover.
    • Soil nutrients.
  • Multivariate analysis can compare sites based on these variables.
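One way to compare sites on variables measured in different units is Gower's dissimilarity, which rescales each variable by its range before averaging the absolute differences. This is a minimal sketch for quantitative variables only, with hypothetical measurements for three sites; the course may use a different distance measure (e.g., Euclidean distance on standardized variables).

```python
from itertools import combinations

# Hypothetical habitat measurements for three sites (values invented for illustration).
sites = {
    "Site A": {"canopy_pct": 70, "shrub_pct": 30, "stems": 120, "logs": 8, "bare_pct": 10},
    "Site B": {"canopy_pct": 65, "shrub_pct": 35, "stems": 110, "logs": 6, "bare_pct": 15},
    "Site C": {"canopy_pct": 10, "shrub_pct": 5, "stems": 20, "logs": 0, "bare_pct": 60},
}
variables = list(next(iter(sites.values())))

# Range of each variable across all sites, used to put every variable on a 0-1 scale.
ranges = {
    v: max(s[v] for s in sites.values()) - min(s[v] for s in sites.values())
    for v in variables
}


def gower(a, b):
    """Gower dissimilarity for quantitative variables: mean of range-scaled absolute differences."""
    usable = [v for v in variables if ranges[v] > 0]  # skip variables with no variation
    return sum(abs(sites[a][v] - sites[b][v]) / ranges[v] for v in usable) / len(usable)


dist = {(a, b): gower(a, b) for a, b in combinations(sites, 2)}
for pair, d in dist.items():
    print(pair, round(d, 3))
```

Values range from 0 (identical sites) to 1 (maximally different on every variable), so sites A and B come out similar and site C stands apart.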

Data Analysis

  • Multivariate analysis summarizes data sets in which many variables are measured on each sample.
  • Simplifies data into fewer derived variables (e.g., principal components).
  • Avoids the Type I error inflation of many separate tests and reveals patterns.
  • Applicable to soil characteristics, habitat, organismal traits, water quality, etc.
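The reduction to fewer derived variables can be illustrated with PCA on just two variables, where the eigen-decomposition of the 2x2 covariance matrix has a closed form. This is a minimal sketch, assuming two hypothetical, strongly correlated variables; a real analysis would use more variables, a library eigen-solver, and often standardized data.

```python
from math import sqrt
from statistics import mean, stdev


def covariance(x, y):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pca_2d(x, y):
    """PCA of two variables from the closed-form eigen-decomposition of their 2x2 covariance matrix."""
    a, d, b = covariance(x, x), covariance(y, y), covariance(x, y)
    half_trace = (a + d) / 2
    gap = sqrt(((a - d) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + gap, half_trace - gap  # variances along PC1 and PC2
    if b == 0:
        # Uncorrelated variables: the components are just the original axes.
        loadings = (1.0, 0.0) if a >= d else (0.0, 1.0)
    else:
        vx, vy = b, lam1 - a  # eigenvector of [[a, b], [b, d]] for lam1
        norm = sqrt(vx ** 2 + vy ** 2)
        loadings = (vx / norm, vy / norm)
    mx, my = mean(x), mean(y)
    # PC1 score = linear combination of the centred variables, weighted by the loadings.
    pc1 = [loadings[0] * (xi - mx) + loadings[1] * (yi - my) for xi, yi in zip(x, y)]
    return (lam1, lam2), loadings, pc1


# Two hypothetical, strongly correlated variables measured on five samples.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
(lam1, lam2), loadings, pc1 = pca_2d(x, y)
print("share of variance on PC1:", round(lam1 / (lam1 + lam2), 3))
```

Because the two variables move together, almost all of their variance lies along PC1, so the single PC1 score can stand in for both.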

Good Practices

  • Just because you can measure something does not mean you should.

Brazil Football Example

  • Skills measured:
    • Passing accuracy.
    • Kicking strength (ball speed).
    • Speed on angle tracks.
    • Overall strength.
  • Principal components analysis results:
    • Skill level: accuracy, technique, ball control, etc.
    • Athleticism level: physical strength and speed.
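How PCA groups the measured skills can be sketched with power iteration and deflation on a correlation matrix. This is a minimal sketch, assuming a hypothetical correlation matrix (invented for illustration, not the study's data) for the four measured skills; power iteration is used only to keep the example dependency-free, whereas a real analysis would compute the matrix from player data and use a library eigen-solver.

```python
from math import sqrt

# Hypothetical correlation matrix (invented for illustration) in the order:
# passing accuracy, kicking strength, speed, overall strength.
names = ["passing", "kicking", "speed", "strength"]
R = [
    [1.0, 0.2, 0.1, 0.1],
    [0.2, 1.0, 0.6, 0.7],
    [0.1, 0.6, 1.0, 0.6],
    [0.1, 0.7, 0.6, 1.0],
]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def power_iteration(M, iters=1000):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix by repeated multiplication."""
    v = [1.0] * len(M)
    for _ in range(iters):
        w = matvec(M, v)
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(M, v)))  # Rayleigh quotient, v is unit length
    return eigenvalue, v


def principal_components(M, k):
    """First k eigenpairs via power iteration, deflating with M - lambda * v v^T after each one."""
    M = [row[:] for row in M]
    components = []
    for _ in range(k):
        lam, v = power_iteration(M)
        components.append((lam, v))
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
    return components


components = principal_components(R, 2)
for idx, (lam, v) in enumerate(components, start=1):
    print(f"PC{idx} (eigenvalue {lam:.2f}):", {n: round(x, 2) for n, x in zip(names, v)})
```

With this made-up matrix, the first component loads mainly on kicking strength, speed, and strength, and the second mainly on passing accuracy. Naming components "athleticism" or "skill" is an interpretation the analyst makes from such loadings.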