Data Analysis in Ecology

100 Terms

1
New cards

Correlation

Two variables tend to change together; measures the strength and direction of their association

2
New cards

Regression

Models how a response variable changes as a function of a predictor variable (a fitted relationship does not by itself prove causation)

3
New cards

Assumptions of Correlation

Both variables are continuous

Both variables are normally distributed

4
New cards

Drawbacks of Correlation

Does not imply causation

May miss non-linear relationships

5
New cards

Coefficient of Determination

R²: the proportion of variance in the response variable explained by the model
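A minimal sketch of how Pearson's r and R² could be computed by hand. For a simple linear regression with one predictor, R² equals the square of r. The data values are hypothetical and the function names are illustrative only.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r_squared(x, y):
    """R² of the least-squares line y = a + b*x, computed as 1 - SSE/SST."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst

# Hypothetical data: body length (x) and mass (y) for five animals
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
r2 = r_squared(x, y)
```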

6
New cards

Cook’s distance

A measure of how strongly a single observation affects a regression fit
High values flag points with unusual X and Y values

7
New cards

Multicollinearity

An independent variable highly correlated with another independent variable

8
New cards

What are the assumptions of Linear Regression?

Linear Relationship between X and Y

Normal distribution of Y at each value of X

Variance of Y is the same at each value of X

No correlation of errors

9
New cards

Covariate

Any continuous variable that is not of direct interest

10
New cards

Model I Regression

Assumes X values are fixed by design

11
New cards

Model II Regression

Does not assume X values are fixed by design

12
New cards

When to use ANOVA?

2 independent categorical variables

13
New cards

When to use ANCOVA?

1 independent continuous variable, 1 independent categorical variable

14
New cards

When to use Multiple Regression?

2 independent continuous variables

15
New cards

Random effect

Any categorical variable with more than 5 levels that we are not directly interested in

16
New cards

Blocking variable

Any categorical variable with 5 or fewer levels that we are not directly interested in

17
New cards

Conditional R²

Variance explained by the whole mixed model (fixed and random effects together)

18
New cards

Marginal R²

Explained variance by fixed effects in a mixed model

19
New cards

General Linear Models

Linear Regression

ANOVA

ANCOVA

20
New cards

Generalized Linear Models

Logistic regression

Poisson regression

ANOVA

21
New cards

Components of a GLM

Random component

Systematic component

Link function

22
New cards

Random component

Probability distribution of a response variable

23
New cards

Systematic component

A linear combination of the explanatory variables (the linear predictor)

24
New cards

Link function

Function that relates the expected value of the response variable to the linear predictor

25
New cards

Fixed effects

Variables which are of direct interest

26
New cards

Logistic Regression

When you have a categorical (typically binary) response; predictors are often continuous but do not have to be

27
New cards

Logit function

The link function in a Logistic Regression
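A short sketch of the logit link and its inverse. The logit maps a probability onto the whole real line (the log-odds), and the inverse maps a linear predictor back to a probability. The coefficients `b0` and `b1` are hypothetical values chosen for illustration, not fitted estimates.

```python
from math import exp, log

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return log(p / (1 - p))

def inv_logit(eta):
    """Map a linear predictor eta back to a probability."""
    return 1 / (1 + exp(-eta))

# Hypothetical coefficients: intercept b0 and slope b1 for one predictor x
b0, b1 = -2.0, 0.5
x = 4.0
p_event = inv_logit(b0 + b1 * x)  # predicted probability of the event at x
```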

28
New cards

Null-Hypothesis Testing

Decision based on rejecting or failing to reject a null hypothesis

29
New cards

Information Theoretic Approach

Weighs the relative support the data give to each candidate model (e.g., with AIC) instead of making a reject/fail-to-reject decision

30
New cards

Bayesian Inference

Update beliefs about a parameter’s distribution based on a prior probability and a likelihood function.

31
New cards

Assumptions of Logistic Regression

Independent Error terms

Little to no multicollinearity

32
New cards

Non-assumptions of Logistic Regression

No linear relationship between the predictors and the response is necessary

Independent variables do not need to be normal

No homoscedasticity required

Independent variables do not need to be continuous

33
New cards

Stepwise Regression

Building the best model by examining the impact of each variable to a model

34
New cards

Forward Selection

Build a model from scratch, adding variables if they significantly increase the model fit

35
New cards

Backward Elimination

Deconstruct a global model, removing variables until the model fits the data the best it can

36
New cards

Akaike’s Information Criterion (AIC)

Selects the best model from a combination of model fit and parsimony

37
New cards

What information is needed to calculate AIC?

SSE or Log likelihood

Sample size

Number of parameters in the model

38
New cards

ΔAIC

AIC for the current model - the smallest AIC among the candidate models

39
New cards

w_i

AIC model probability (Akaike weight): the relative probability that model i is the best of the candidate models
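A minimal sketch of how AIC, ΔAIC, and the model probabilities w_i fit together, using the log-likelihood form AIC = 2k - 2 ln(L). Sample size enters when AIC is computed from SSE or when a small-sample correction is applied, neither of which is shown here. The log-likelihoods and parameter counts are hypothetical.

```python
from math import exp

def aic(log_likelihood, k):
    """AIC from a model's log-likelihood and its number of parameters k."""
    return 2 * k - 2 * log_likelihood

def akaike_weights(aic_values):
    """Return (delta AIC, weight w_i) lists for a set of candidate models."""
    best = min(aic_values)
    deltas = [a - best for a in aic_values]
    relative = [exp(-d / 2) for d in deltas]   # relative likelihood of each model
    total = sum(relative)
    return deltas, [r / total for r in relative]

# Hypothetical candidate models: (log-likelihood, number of parameters)
models = [(-120.5, 2), (-118.2, 3), (-118.0, 5)]
aic_values = [aic(ll, k) for ll, k in models]
deltas, weights = akaike_weights(aic_values)
```

The model with ΔAIC = 0 has the largest weight; the extra parameters in the third model cost more than its slightly better fit gains.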

40
New cards

Effect Size

The magnitude of an effect

41
New cards

Types of effect statistics

d-stats

r-stats

odds ratios

42
New cards

Statistical Power

Probability of correctly detecting a real effect (rejecting a null hypothesis that is false)

43
New cards

What is the equation for statistical power?

1 - β, where β is the probability of a Type II error

44
New cards

Power analysis

Checking that a statistical test has enough power to support a reasonable conclusion

45
New cards

What 3 factors affect statistical power?

Sample Size

alpha

Effect size

46
New cards

A priori Power Analysis

Power analysis done before an experiment to test if the sample size is large enough to detect a significant effect

47
New cards

Post hoc power analysis

Power analysis done after an experiment to test if the sample size was large enough to detect a significant effect

48
New cards

Steps to perform power analysis

Choose type

Select expected study design

Select tool which supports design

Provide 3 of 4 parameters (sample size, alpha, effect size, power) to solve for the fourth
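A rough sketch of the "3 of 4 parameters" idea for one specific design: a two-sided comparison of two group means, using a normal approximation to the test statistic. Dedicated power tools use the exact t distribution and will give slightly different numbers. The function names and the effect size of 0.5 are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power given effect size (Cohen's d), sample size, and alpha."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)   # expected value of the z statistic
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def n_for_power(effect_size, power=0.8, alpha=0.05):
    """Smallest per-group sample size that reaches the target power."""
    n = 2
    while power_two_sample(effect_size, n, alpha) < power:
        n += 1
    return n

# A priori analysis: supply effect size, alpha, and power; solve for sample size
n_needed = n_for_power(0.5, power=0.8, alpha=0.05)
```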

49
New cards

Overfitting

Creation of a model fitted so closely to one set of data (including its noise) that it performs poorly on new data

50
New cards

Multivariate Data

Data with many dependent/response variables

Variables have interactions

Covariates

51
New cards

Non-parametric data

Independence may be violated

Variances are unequal

Not normally distributed

52
New cards

What is a decision tree?

A non-parametric algorithm that classifies and makes predictions by splitting the data on a sequence of rules based on the inputs

53
New cards

What is a random forest?

A series of multiple, randomly created decision trees

54
New cards

How many decision trees usually compose a random forest?

1000

55
New cards

How is a random forest made?

  1. Start with a training dataset

  2. Bootstrap it: draw random samples with replacement

  3. Grow an individual decision tree from each bootstrap sample

  4. Collect the answers from all the trees and choose the majority decision; bootstrapping plus this aggregation is called bagging
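The steps above can be sketched with a deliberately simplified stand-in for a random forest: each "tree" is a one-split decision stump on a single feature, grown from a bootstrap sample, and the forest predicts by majority vote. Real random forests grow full trees and also sample a random subset of features at each split, which this sketch omits. The data and all function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows):
    """Fit a one-split tree to rows of (x, label); returns (threshold, left, right)."""
    best = None
    for threshold, _ in rows:
        left = [lab for x, lab in rows if x <= threshold]
        right = [lab for x, lab in rows if x > threshold]
        if not right:          # this threshold does not split the sample
            continue
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0]
        errors = (sum(lab != l_lab for lab in left)
                  + sum(lab != r_lab for lab in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, l_lab, r_lab)
    if best is None:           # every x in the sample was identical
        lab = Counter(lab for _, lab in rows).most_common(1)[0][0]
        return (rows[0][0], lab, lab)
    return best[1:]

def predict_stump(stump, x):
    threshold, l_lab, r_lab = stump
    return l_lab if x <= threshold else r_lab

def fit_forest(rows, n_trees=25, seed=0):
    """Steps 2 and 3: bootstrap the training data and grow one stump per sample."""
    rng = random.Random(seed)
    return [fit_stump(rng.choices(rows, k=len(rows))) for _ in range(n_trees)]

def predict_forest(forest, x):
    """Step 4: majority vote across all trees."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]

# Step 1: hypothetical training data (one measurement, two classes)
training = [(1, "a"), (2, "a"), (3, "a"), (4, "a"),
            (6, "b"), (7, "b"), (8, "b"), (9, "b")]
forest = fit_forest(training)
```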

56
New cards

What is cluster analysis?

The grouping of data points into clusters based upon similar traits

57
New cards

Why should you use cluster analysis?

Reveal hidden patterns

58
New cards

Hard clustering

Each data point in a cluster analysis belongs only to one cluster

59
New cards

Soft clustering

Each data point in a cluster analysis is given a probability it would be found in one cluster or another

60
New cards

Hierarchical Clustering

Clustering based on relationship between data points

61
New cards

How is hierarchical clustering performed?

Build a dendrogram of the data points, then choose the clusters by cutting it across the greatest vertical distance that spans the same number of splits

62
New cards

K-Means Clustering

Choosing a predefined number of centroids (K) around which the data will be clustered

63
New cards

What is a Centroid?

The mean position of all the points in a cluster
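A minimal sketch of K-means as hard clustering: assign every point to its nearest centroid, recompute each centroid as the mean of its cluster, and repeat until nothing changes. The starting centroids are passed in explicitly here to keep the example deterministic; in practice they are usually chosen at random and the run is repeated, because K-means can settle on a poor local solution. The points are hypothetical.

```python
from math import dist

def centroid(points):
    """Mean position of the points in one cluster."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def k_means(points, initial_centroids, max_iter=100):
    """Cluster points around K = len(initial_centroids) centroids."""
    centroids = list(initial_centroids)
    k = len(centroids)
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid
        updated = [centroid(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:   # assignments are stable, so stop
            break
        centroids = updated
    return centroids, clusters

# Hypothetical 2-D data with two obvious groups, K = 2
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, initial_centroids=[(1, 1), (8, 8)])
```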

64
New cards

How to choose K?

Elbow method

Silhouette method

65
New cards

What are HBIs?

Highly branched isoprenoids: long-chain alkenes produced by marine diatoms

66
New cards

How to use HBIs in data analysis?

They are produced by different forms of algae, and are thus biomarkers of what algae are primarily being consumed in the food web

67
New cards

What is H-Print?

A single index that combines multiple HBI biomarkers

Lower values mean it’s more sympagic, higher means more pelagic

68
New cards

iPOC

Index indicating the proportion of organic carbon derived from sea ice algae

69
New cards

Sea Ice Algae

Sympagic Diatoms which produce HBI I

70
New cards

Phytoplanktonic Algae

Pelagic algae which produce HBI III

71
New cards

Kernel Density Estimation

A non-parametric estimate of a probability distribution, displayed as a smooth density curve built from a kernel placed on each data point

72
New cards

Bandwidth

A smoothing parameter that scales the width of each kernel; larger values give a smoother curve
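A minimal sketch of a one-dimensional kernel density estimate with a Gaussian kernel, to show where the bandwidth enters. The Gaussian kernel is an assumption (other kernels are possible), and the data values are hypothetical.

```python
from math import exp, pi, sqrt

def gaussian_kde(data, bandwidth):
    """Return a function giving the estimated density at any x."""
    n = len(data)

    def density(x):
        # Sum one Gaussian kernel per data point, each scaled by the bandwidth
        total = sum(exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
        return total / (n * bandwidth * sqrt(2 * pi))

    return density

# Hypothetical observations
data = [1.0, 2.0, 2.5, 4.0]
narrow = gaussian_kde(data, bandwidth=0.2)   # bumpy curve, follows each point
wide = gaussian_kde(data, bandwidth=1.0)     # smooth curve, less detail
```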

73
New cards

Ecological Spatial Analysis

Relationship between the observed spatial distribution of a species and the mechanisms behind that distribution

74
New cards

Minimum Convex Polygon

Draws the smallest polygon that encloses a series of points with all interior angles being less than 180 degrees
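A sketch of one way to build a minimum convex polygon: the monotone chain convex hull algorithm, followed by the shoelace formula for its area. This is one standard algorithm for the job, not necessarily the one a given home-range package uses. The location fixes are hypothetical.

```python
def cross(o, a, b):
    """Cross product of OA and OB; positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def minimum_convex_polygon(points):
    """Vertices of the convex hull, in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                       # build the lower edge left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):             # build the upper edge right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]      # drop the duplicated end points

def polygon_area(vertices):
    """Area of a simple polygon from the shoelace formula."""
    n = len(vertices)
    twice_area = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                     - vertices[(i + 1) % n][0] * vertices[i][1]
                     for i in range(n))
    return abs(twice_area) / 2

# Hypothetical location fixes for one animal
fixes = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
hull = minimum_convex_polygon(fixes)
home_range_area = polygon_area(hull)
```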

75
New cards

Utilization Distribution

A probability distribution of where an organism spends its time, used to determine the organism's home range from the density of its location points

Can use Kernel Density Estimation to get this

76
New cards

How do you collect shape data?

Take standardized photographs (include a scale reference)

Digitize landmarks for shape and ensure they’re consistent and repeatable

77
New cards

What is Generalized Procrustes Analysis?

An analysis which outputs centroid sizes and coordinates which represent the shape

It preserves Euclidean distance, and scales/translates/rotates the landmark configurations so the images have a common frame of reference

78
New cards

Procrustes ANOVA

Determines the variation in shape caused by one or more factors

79
New cards

Residual Randomization in Permutation Procedures

Sums of squares are calculated across many permutations to determine effect probabilities

80
New cards

Assumptions of PCA?

Correlation in data

Most data points being non-zeros

81
New cards

Steps to a PCA

Centering & scaling the data

Calculating covariance matrix

Calculating eigenvalues/vectors

Finding principal components

82
New cards

Covariance matrix

Matrix with each variable appearing in the rows and columns, where the diagonal shows the variance of each variable and the off-diagonal entries show the covariance between each pair of variables

83
New cards

Calculate Eigenvalues of a covariance matrix

Set the determinant of (covariance matrix - λI) equal to zero and solve for lambda; the eigenvalues are the variances along the new axes
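A small worked sketch for two variables. For a 2 x 2 covariance matrix, det(C - λI) = 0 is a quadratic in λ, so the eigenvalues follow from the quadratic formula. The measurements are hypothetical, and larger problems would normally go to a linear algebra library instead.

```python
from math import sqrt

def covariance_matrix(x, y):
    """Sample covariance matrix [[var(x), cov], [cov, var(y)]]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / (n - 1)
    var_y = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return [[var_x, cov], [cov, var_y]]

def eigenvalues_2x2(m):
    """Solve det(m - lambda*I) = 0 for a 2 x 2 matrix with real eigenvalues."""
    a, b = m[0]
    c, d = m[1]
    # lambda^2 - (a + d)*lambda + (a*d - b*c) = 0
    root = sqrt((a - d) ** 2 + 4 * b * c)
    return (a + d + root) / 2, (a + d - root) / 2

# Hypothetical measurements of two correlated traits
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
cov = covariance_matrix(x, y)
pc1_variance, pc2_variance = eigenvalues_2x2(cov)
```

The two eigenvalues add up to the sum of the diagonal of the covariance matrix, so the total variance is preserved and only redistributed across the new axes.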

84
New cards

Redundancy analysis

Allows you to find correlation between a predictor and a response and visually graph them in a tri-plot

85
New cards

Survival Analysis

Statistical method to analyze “time to event” data

86
New cards

Survival Function

Probability an event hasn’t occurred by a given time point

87
New cards

Survival Curves

Graphical representation of event occurrence over time

88
New cards

What are some characteristics of Time to Event Data?

Non-negative values

Non-normal distribution (right-skewed)

89
New cards

Right censoring

Event isn’t observed within a study period

90
New cards

Left censoring

Event has already occurred before the study period begins, so its exact time is unknown

91
New cards

Random censoring

Censoring occurs independently of the time to event

92
New cards

Interval censoring

Specific time of event is unknown, but it is known to fall within a given interval

93
New cards

Kaplan-Meier Survival Curve

Non-parametric method used to estimate survival function
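A minimal sketch of the Kaplan-Meier estimator with right-censored data. At each observed event time the running survival probability is multiplied by (1 - events / number still at risk); censored subjects count as at risk up to their last observation. The durations are hypothetical.

```python
def kaplan_meier(times, observed):
    """Return [(event time, estimated survival probability), ...].

    times: follow-up duration for each subject
    observed: True if the event was seen, False if the subject was right-censored
    """
    event_times = sorted({t for t, seen in zip(times, observed) if seen})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        events = sum(1 for ti, seen in zip(times, observed) if ti == t and seen)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical study: days until death for five tagged animals,
# two of which were still alive when last seen (right-censored)
times = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
curve = kaplan_meier(times, observed)
```

Here the estimated survival steps down to about 0.8 at day 2, 0.6 at day 3, and 0.3 at day 5.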

94
New cards

Log-rank test

Non-parametric test used to compare survival curves between two groups

95
New cards

Cox Proportional Hazards Model

Semi-parametric method used to assess impact of covariates on Hazard rate

96
New cards

Hazard Rate

Instantaneous rate at which subjects that have survived so far experience the event

97
New cards

Prior distribution

Framework for parameters in Bayesian analysis based on what we already know

98
New cards

Posterior distribution

Updated distribution of a parameter after the prior distribution is combined with the likelihood of the observed data
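A small sketch of how a prior becomes a posterior in the simplest conjugate case: a Beta prior on a proportion, updated with binomial data. The Beta-binomial setup, the prior Beta(2, 2), and the survey counts are all assumptions chosen for illustration; most real models need numerical sampling instead of a closed-form update.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical example: proportion of survey sites occupied by a species
prior = (2, 2)                                            # weak belief centered on 0.5
posterior = update_beta(*prior, successes=7, failures=3)  # 7 of 10 sites occupied

prior_mean = beta_mean(*prior)
posterior_mean = beta_mean(*posterior)
```

The posterior mean sits between the prior mean and the observed proportion of 0.7, and moves closer to the data as more sites are surveyed.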

99
New cards

How to interpret Bayesian analysis?

Credible intervals (the Bayesian counterpart of confidence intervals) and data visualization

100
New cards

Why use Bayesian methods?

Flexibility

Robustness

Nuance