Data Analysis in Ecology

100 Terms

New cards

Correlation

One variable changes when the other variable changes

New cards

Regression

One variable changes because of the other variable changing

New cards

Assumptions of Correlation

Both variables are continuous

Both variables are normally distributed

New cards

Drawbacks of Correlation

Does not imply causation

May miss non-linear relationships

New cards

Coefficient of Determination

R²
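
As a rough illustration of how the correlation coefficient relates to R² in a simple linear regression, the sketch below computes both by hand. The function name and the data values are made up for the example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical measurements, not from any real study
body_length = [1.0, 2.0, 3.0, 4.0, 5.0]
body_mass = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(body_length, body_mass)
# With one predictor, the coefficient of determination is r squared
r_squared = r ** 2
print(round(r, 3), round(r_squared, 3))
```

With a single predictor, R² is the share of variance in Y explained by the fitted line.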

New cards

Cook’s distance

A measure of how strongly a single data point affects a regression
High values flag points with unusual X and Y values
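
A minimal sketch of Cook's distance for a simple (one-predictor) linear regression, assuming ordinary least squares. The function name and the data are hypothetical; the last point is deliberately placed far from the others in X and off the trend in Y.

```python
def cooks_distances(x, y):
    """Cook's distance of every point in a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    p = 2  # fitted parameters: intercept and slope
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage grows with distance from the mean of x
    leverages = [1 / n + (a - mean_x) ** 2 / sxx for a in x]
    return [e ** 2 * h / (p * mse * (1 - h) ** 2)
            for e, h in zip(residuals, leverages)]

# Hypothetical data with one unusual point at x = 12
x = [1, 2, 3, 4, 5, 12]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 4.0]
distances = cooks_distances(x, y)
most_influential = distances.index(max(distances))
print(most_influential)
```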

New cards

Multicollinearity

An independent variable highly correlated with another independent variable

New cards

What are the assumptions of Linear Regression?

Linear Relationship between X and Y

Normal distribution of Y at each value of X

Variance of Y is the same at each value of X

No correlation of errors

New cards

Covariate

Any continuous variable that is not of direct interest

New cards

Model I Regression

Assumes X values are fixed by design

New cards

Model II Regression

Does not assume X values are fixed by design

New cards

When to use ANOVA?

2 independent categorical variables

New cards

When to use ANCOVA?

1 independent continuous variable, 1 independent categorical variable

New cards

When to use Multiple Regression?

2 independent continuous variables

New cards

Random effect

Any categorical variable with more than 5 levels that we are not directly interested in

New cards

Blocking variable

Any categorical variable with 5 or fewer levels that we are not directly interested in

New cards

Conditional R²

Explained variance in a whole mixed model

New cards

Marginal R²

Explained variance by fixed effects in a mixed model

New cards

General Linear Models

Linear Regression

ANOVA

ANCOVA

New cards

Generalized Linear Models

Logistic regression

Poisson regression

ANOVA

New cards

Components to a GLM

Random component

Systematic component

Link function

New cards

Random component

Probability distribution of a response variable

New cards

Systematic component

Explanatory variables combined into a linear predictor

New cards

Link function

Function that relates the expected value of the response variable to the linear predictor built from the explanatory variables

New cards

Fixed effects

Variables which are of direct interest

New cards

Logistic Regression

When you have a continuous predictor and a categorical response

New cards

Logit function

The link function in a Logistic Regression
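
A short sketch of the logit and its inverse, to show how a probability maps to the unbounded scale of the linear predictor and back. The function names are chosen for the example.

```python
from math import exp, log

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return log(p / (1 - p))

def inverse_logit(eta):
    """Map a linear-predictor value back to a probability."""
    return 1 / (1 + exp(-eta))

print(logit(0.5))                 # a probability of 0.5 has log-odds 0
print(inverse_logit(logit(0.8)))  # round trip returns approximately 0.8
```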

New cards

Null-Hypothesis Testing

Decision based on rejecting or failing to reject a null hypothesis

New cards

Information Theoretic Approach

Develops a likelihood of a model being correct

New cards

Bayesian Inference

Update beliefs about a parameter’s distribution based on a prior probability and a likelihood function.

New cards

Assumptions of Logistic Regression

Independent Error terms

Little to no multicollinearity

New cards

Non-assumptions of Logistic Regression

A linear relationship between X and Y is not required

Independent variables do not need to be normally distributed

Homoscedasticity is not required

Independent variables do not need to be continuous

New cards

Stepwise Regression

Building the best model by examining the impact of each variable to a model

New cards

Forward Selection

Build a model from scratch, adding variables if they significantly increase the model fit

New cards

Backward Elimination

Deconstruct a global model, removing variables until the model fits the data the best it can

New cards

Akaike’s Information Criterion

Selects the best model from a combination of model fit and parsimony

New cards

What information is needed to calculate AIC?

SSE or Log likelihood

Sample size

Number of parameters in the model

New cards

ΔAIC

AIC of the current model - lowest AIC among the candidate models

New cards

w_i

Akaike weight (AIC model probability)
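
The AIC, ΔAIC and w_i cards fit together as in the sketch below, which assumes the log-likelihood form AIC = 2k - 2 ln(L). The model names, log-likelihoods and parameter counts are invented for illustration. Sample size does not appear in this form; it enters when AIC is computed from SSE or when a small-sample correction is applied.

```python
from math import exp

def aic(log_likelihood, n_params):
    """AIC from a maximized log-likelihood and the number of parameters."""
    return 2 * n_params - 2 * log_likelihood

def akaike_table(models):
    """models maps a name to (log_likelihood, n_params).

    Returns a dict mapping each name to (AIC, delta AIC, Akaike weight).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores.values())
    deltas = {name: score - best for name, score in scores.items()}
    relative = {name: exp(-d / 2) for name, d in deltas.items()}
    total = sum(relative.values())
    return {name: (scores[name], deltas[name], relative[name] / total)
            for name in models}

# Hypothetical candidate models: (log-likelihood, number of parameters)
candidates = {
    "null": (-120.0, 1),
    "temperature": (-112.5, 2),
    "temperature + depth": (-112.0, 3),
}
table = akaike_table(candidates)
for name, (score, delta, weight) in table.items():
    print(name, score, delta, round(weight, 3))
```

The model with ΔAIC = 0 is the best supported of the set, and the weights sum to 1 across the candidates.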

New cards

Effect Size

The magnitude of an effect

New cards

Types of effect statistics

d-stats

r-stats

odds ratios

New cards

Statistical Power

Probability of correctly finding a real pattern

New cards

What is the equation for statistical power?

1-β

New cards

Power analysis

Checking whether a statistical test has enough power to support a reasonable conclusion

New cards

What 3 factors affect statistical power?

Sample Size

alpha

Effect size
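
To see the three factors at work, the sketch below approximates the power (1-β) of a two-sided, one-sample z-test. The choice of a z-test and the function name are assumptions made for the example; other designs need other formulas or dedicated tools.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect_size is the standardized mean difference (a d-type statistic).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)  # mean of the test statistic if the effect is real
    # Probability that the statistic falls in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

print(z_test_power(0.5, 20))              # baseline
print(z_test_power(0.5, 40))              # larger sample, more power
print(z_test_power(0.8, 20))              # larger effect, more power
print(z_test_power(0.5, 20, alpha=0.01))  # stricter alpha, less power
```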

New cards

A priori Power Analysis

Power analysis done before an experiment to test if the sample size is large enough to detect a significant effect

New cards

Post hoc power analysis

Power analysis done after an experiment to test if the sample size was large enough to detect a significant effect

New cards

Steps to perform power analysis

Choose type

Select expected study design

Select tool which supports design

Provide 3 of 4 parameters

New cards

Overfitting

Creation of a model that fits one particular data set so closely that it generalizes poorly to new data

New cards

Multivariate Data

Data with many dependent/response variables

Variables have interactions

Covariates

New cards

Non-parametric data

Independence may be violated

Variances are unequal

Not normally distributed

New cards

What is a decision tree?

A non-parametric algorithm that classifies or makes predictions based on inputs

New cards

What is a random forest?

A series of multiple, randomly created decision trees

New cards

How many decision trees usually compose a random forest?

1000

New cards

How is a random forest made?

1. Start from a training dataset
2. Bootstrap the training dataset
3. Create individual decision trees from the bootstrapped samples
4. Collect the answers from the decision trees and choose the majority decision, in a process called Bagging
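
The steps can be sketched in plain Python as below. This is a deliberately tiny illustration: each "tree" is a one-split stump on a single made-up feature, so it shows bootstrapping and majority voting only. Real random forests also pick a random subset of features at each split and grow deeper trees. All names and data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a one-split decision tree (stump) on (x, label) pairs."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        left = [lab for x, lab in sample if x <= threshold]
        right = [lab for x, lab in sample if x > threshold]
        if not right:
            continue
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # every x identical: always predict the majority label
        label = Counter(lab for _, lab in sample).most_common(1)[0][0]
        return (float("inf"), label, label)
    return best[1:]

def fit_forest(data, n_trees=25, seed=0):
    """Train one stump per bootstrap sample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(data) for _ in data]  # sample with replacement
        forest.append(fit_stump(bootstrap))
    return forest

def predict(forest, x):
    """Majority vote across all trees in the forest."""
    votes = [left if x <= threshold else right
             for threshold, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical training data: (feature value, class label)
training = [(1.0, "A"), (1.5, "A"), (2.0, "A"), (2.5, "A"),
            (6.0, "B"), (6.5, "B"), (7.0, "B"), (7.5, "B")]
forest = fit_forest(training)
print(predict(forest, 1.2), predict(forest, 7.2))
```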

New cards

What is cluster analysis?

The grouping of data points into clusters based upon similar traits

New cards

Why should you use cluster analysis?

Reveal hidden patterns

New cards

Hard clustering

Each data point in a cluster analysis belongs only to one cluster

New cards

Soft clustering

Each data point in a cluster analysis is given a probability it would be found in one cluster or another

New cards

Hierarchical Clustering

Clustering based on relationship between data points

New cards

How is hierarchical clustering performed?

Build a dendrogram, then cut it across the greatest vertical distance over which the number of branches stays the same

New cards

K-Means Clustering

Choosing a predefined number of centroids (K) around which the data will be clustered

New cards

What is a Centroid?

The mean of the data points in a cluster
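
A minimal K-means sketch on one-dimensional data, to show how points are assigned to the nearest centroid and how each centroid then moves to the mean of its points. The starting centroids and data are invented, and a fixed number of iterations stands in for a proper convergence check.

```python
def k_means(points, centroids, n_iter=10):
    """K-means on 1-D data, starting from the given centroids."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            # Hard clustering: each point joins exactly one cluster
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Each centroid moves to the mean of its points; empty clusters stay put
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical data with two obvious groups, K = 2
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = k_means(points, centroids=[0.0, 6.0])
print(centroids, clusters)
```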

New cards

How to choose K?

Elbow method

Silhouette method

New cards

What are HBIs?

Highly branched isoprenoids: long-chained alkenes produced by marine diatoms

New cards

How to use HBIs in data analysis?

They are produced by different forms of algae, and are thus biomarkers of what algae are primarily being consumed in the food web

New cards

What is H-Print?

A singular index for multiple biomarkers

Lower values mean it’s more sympagic, higher means more pelagic

New cards

iPOC

Index indicating the proportion of organic carbon derived from sea ice algae

New cards

Sea Ice Algae

Sympagic Diatoms which produce HBI I

New cards

Phytoplanktonic Algae

Pelagic algae which produce HBI III

New cards

Kernel Density Estimation

A visual display of a probability distribution using density curves

New cards

Bandwidth

A scalar for the width of a kernel
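
The effect of bandwidth is easiest to see in code. The sketch below assumes a Gaussian kernel, which is one common choice among several; the function name and data are made up.

```python
from math import exp, pi, sqrt

def gaussian_kde(data, bandwidth):
    """Return a density function built from Gaussian kernels of the given bandwidth."""
    n = len(data)

    def density(x):
        total = sum(exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
        return total / (n * bandwidth * sqrt(2 * pi))

    return density

# Hypothetical observations
observations = [2.0, 2.5, 3.0, 7.0]
narrow = gaussian_kde(observations, bandwidth=0.3)  # spiky curve, follows each point
wide = gaussian_kde(observations, bandwidth=2.0)    # smooth curve, blurs detail
print(narrow(2.5), wide(2.5))
```

A small bandwidth gives tall, narrow bumps around each observation; a large one spreads the same probability mass into a flatter curve.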

New cards

Ecological Spatial Analysis

Relationship between the observed spatial distribution of a species and the mechanisms behind that distribution

New cards

Minimum Convex Polygon

Draws the smallest polygon around a series of points with all interior angles being less than 180 degrees
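
One way to compute a minimum convex polygon is a convex hull algorithm. The sketch below uses Andrew's monotone chain, chosen here for brevity; the location fixes are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def minimum_convex_polygon(points):
    """Vertices of the convex hull, in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or straight turn
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]

# Hypothetical animal locations (x, y)
fixes = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2), (3, 2)]
hull = minimum_convex_polygon(fixes)
print(hull)  # [(0, 0), (4, 0), (4, 3), (0, 3)]
```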

New cards

Utilization Distribution

A method for determining an organism’s home range based on the density of its location points

Can use Kernel Density Estimation to get this

New cards

How do you collect shape data?

Take standardized photographs (include a scale reference)

Digitize landmarks for shape and ensure they’re consistent and repeatable

New cards

What is Generalized Procrustes Analysis?

An analysis which outputs centroid sizes and coordinates which represent the shape

It preserves Euclidean distance, and scales/translates/rotates so the shapes have a common frame of reference

New cards

Procrustes ANOVA

Determines the variation in shape caused by one or more factors

New cards

Residual Randomization in Permutation Procedures

Sums of squares are calculated across many permutations to determine effect probabilities

New cards

Assumptions of PCA?

Correlation in data

Most data points being non-zeros

New cards

Steps to a PCA

Centering & scaling the data

Calculating covariance matrix

Calculating eigenvalues/vectors

Finding principal components

New cards

Covariance matrix

Matrix with each variable appearing in the rows and columns, where the diagonal shows the variance of each variable and the off-diagonal entries show the covariance between pairs of variables

New cards

Calculate Eigenvalues of a covariance matrix

Set the determinant of (covariance matrix - λI) equal to zero and solve for λ to find the variances along the new axes
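
For two variables the calculation can be done by hand, which the sketch below mirrors. Setting det(C - λI) = 0 for a 2x2 covariance matrix C = [[a, b], [b, c]] gives the quadratic λ² - (a + c)λ + (ac - b²) = 0. The function names and data are made up, and the data are centered but not scaled.

```python
from math import sqrt

def covariance_matrix_2d(x, y):
    """Sample covariance matrix of two variables."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    return [[var_x, cov_xy], [cov_xy, var_y]]

def eigenvalues_2x2(m):
    """Roots of det(m - lambda * I) = 0 for a symmetric 2x2 matrix, largest first."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    trace = a + c
    root = sqrt((a - c) ** 2 + 4 * b ** 2)
    return (trace + root) / 2, (trace - root) / 2

# Hypothetical measurements of two traits
trait_1 = [2.0, 4.0, 6.0, 8.0]
trait_2 = [1.0, 3.0, 2.0, 5.0]
cov = covariance_matrix_2d(trait_1, trait_2)
lambda_1, lambda_2 = eigenvalues_2x2(cov)
print(lambda_1, lambda_2)
```

The eigenvalues sum to the total variance (the trace of the covariance matrix), so each one divided by that total is the share of variance along its principal component.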

New cards

Redundancy analysis

Allows you to find relationships between predictor and response variables and graph them visually in a triplot

New cards

Survival Analysis

Statistical method to analyze “time to event” data

New cards

Survival Function

Probability an event hasn’t occurred by a given time point
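
A minimal sketch of an empirical survival function, assuming every individual's event time was observed (no censoring). With censored data an estimator such as Kaplan-Meier would be needed instead. The event times are hypothetical.

```python
def survival_function(event_times):
    """Empirical survival function for fully observed event times."""
    n = len(event_times)

    def survival(t):
        # Share of individuals whose event happens after time t
        return sum(1 for time in event_times if time > t) / n

    return survival

# Hypothetical days until an event for five individuals
days_to_event = [2, 3, 3, 5, 8]
S = survival_function(days_to_event)
print(S(0), S(3), S(8))  # 1.0 0.4 0.0
```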