Data Analysis in Ecology

100 Terms

1

Correlation

One variable changes when the other variable changes

2

Regression

Models how a response variable changes as a function of one or more predictor variables; a fitted relationship alone does not prove causation

3

Assumptions of Correlation

Both variables are continuous

Both variables are normally distributed

4

Drawbacks of Correlation

Does not imply causation

May miss non-linear relationships
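
The second drawback can be shown with a small sketch. The data below are made up for illustration: one response is a straight-line function of x, the other is a parabola. Both are completely determined by x, yet Pearson's r only detects the linear one.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, symmetric around zero
xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # straight-line relationship
curved = [x ** 2 for x in xs]      # strong but non-linear relationship

r_linear = pearson_r(xs, linear)   # close to 1
r_curved = pearson_r(xs, curved)   # close to 0 despite the clear pattern
```

Plotting the data before computing a correlation is the usual safeguard against this.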

5

Coefficient of Determination

R², the proportion of variance in the response variable that is explained by the model
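
As a rough sketch, the coefficient of determination for a simple linear regression can be computed as R² = 1 - SS_residual / SS_total. The function names and the five data points below are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical measurements
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
```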

6

Cook’s distance

A measure of how strongly a single observation affects a regression fit
Large values flag points with unusual X and Y values

7

Multicollinearity

An independent variable highly correlated with another independent variable

8

What are the assumptions of Linear Regression?

Linear Relationship between X and Y

Normal distribution of Y at each value of X

Variance of Y is the same at each value of X

No correlation of errors

9

Covariate

Any continuous variable that is not of direct interest

10

Model I Regression

Assumes X values are fixed by design

11

Model II Regression

Does not assume X values are fixed by design

12

When to use ANOVA?

2 independent categorical variables

13

When to use ANCOVA?

1 independent continuous variable, 1 independent categorical variable

14

When to use Multiple Regression?

2 independent continuous variables

15

Random effect

Any categorical variable with more than 5 levels that we are not directly interested in

16

Blocking variable

Any categorical variable with 5 or fewer levels that we are not directly interested in

17

Conditional R²

Variance explained by the whole mixed model (fixed and random effects together)

18

Marginal R²

Explained variance by fixed effects in a mixed model

19

General Linear Models

Linear Regression

ANOVA

ANCOVA

20

Generalized Linear Models

Logistic regression

Poisson regression

ANOVA

21

Components of a GLM

Random component

Systematic component

Link function

22

Random component

Probability distribution of a response variable

23

Systematic component

A linear combination of the explanatory variables (the linear predictor)

24

Link function

The function that relates the expected value of the response variable to the linear predictor

25

Fixed effects

Variables which are of direct interest

26

Logistic Regression

Used when the response is categorical (typically binary); predictors can be continuous or categorical

27

Logit function

The link function in a Logistic Regression
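
A minimal sketch of the logit link and its inverse follows. The logit turns a probability into log-odds, and the inverse logit turns a linear predictor back into a probability. The intercept and slope are hypothetical values chosen for illustration, not fitted estimates.

```python
import math

def logit(p):
    """Log-odds of a probability p in the open interval (0, 1)."""
    return math.log(p / (1 - p))

def inverse_logit(eta):
    """Map a linear predictor (log-odds) back to a probability."""
    return 1 / (1 + math.exp(-eta))

# Hypothetical coefficients, for illustration only
intercept, slope = -4.0, 0.5

def predicted_probability(x):
    """Probability of the event at predictor value x under the assumed model."""
    return inverse_logit(intercept + slope * x)

p_at_8 = predicted_probability(8)   # linear predictor is 0, so probability is 0.5
```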

28

Null-Hypothesis Testing

Decision based on rejecting or failing to reject a null hypothesis

29

Information Theoretic Approach

Ranks a set of candidate models by how strongly the data support each one

30

Bayesian Inference

Update beliefs about a parameter’s distribution based on a prior probability and a likelihood function.

31

Assumptions of Logistic Regression

Independent Error terms

Little to no multicollinearity

32

Non-assumptions of Logistic Regression

No linear relationship between the predictors and the response is necessary

Independent variables do not need to be normal

Homoscedasticity is not required

Independent variables do not need to be continuous

33

Stepwise Regression

Building the best model by examining the impact of each variable to a model

34

Forward Selection

Build a model from scratch, adding variables if they significantly increase the model fit

35

Backward Elimination

Deconstruct a global model, removing variables until the model fits the data the best it can

36

Akaike’s Information Criterion (AIC)

Selects the best model from a combination of model fit and parsimony

37

What information is needed to calculate AIC?

SSE or Log likelihood

Sample size

Number of parameters in the model
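
For reference, the three inputs above enter the standard formulas as follows, where \hat{L} is the maximized likelihood, k the number of estimated parameters, and n the sample size. The least-squares form holds up to an additive constant, and the last expression is the small-sample correction AICc.

```latex
\mathrm{AIC} = -2\ln\hat{L} + 2k
\qquad
\mathrm{AIC}_{\mathrm{LS}} = n\ln\!\left(\frac{\mathrm{SSE}}{n}\right) + 2k
\qquad
\mathrm{AIC}_c = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}
```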

38

ΔAIC

AIC for current model - lowest AIC among the candidate models

39

w_i

Akaike weight: the relative probability that model i is the best model in the candidate set
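
A short sketch of how ΔAIC and the Akaike weights w_i are usually computed from a set of AIC values: each weight is exp(-ΔAIC_i / 2) divided by the sum of that quantity over all candidate models. The three AIC values are hypothetical.

```python
import math

def aic_from_sse(sse, n, k):
    """Least-squares AIC (up to an additive constant): n*ln(SSE/n) + 2k."""
    return n * math.log(sse / n) + 2 * k

def akaike_weights(aics):
    """Return (delta_aic, weights) for a list of AIC values."""
    best = min(aics)
    deltas = [a - best for a in aics]                 # delta AIC per model
    rel_lik = [math.exp(-0.5 * d) for d in deltas]    # relative likelihoods
    total = sum(rel_lik)
    return deltas, [r / total for r in rel_lik]

# Hypothetical AIC values for three candidate models
aics = [100.0, 102.0, 110.0]
deltas, weights = akaike_weights(aics)
```

With these numbers the first model carries roughly 73% of the weight, the second about 27%, and the third less than 1%.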

40

Effect Size

The magnitude of an effect

41

Types of effect statistics

d-stats

r-stats

odds ratios
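
As one example of a d-type statistic, Cohen's d expresses the difference between two group means in units of their pooled standard deviation. The sketch below uses two made-up groups; the function name is an illustrative choice.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)   # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical measurements from two groups
treatment = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]

d = cohens_d(treatment, control)
```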

42

Statistical Power

Probability of correctly finding a real pattern

43

What is the equation for statistical power?

1 - β, where β is the probability of a Type II error (failing to detect a real effect)

44

Power analysis

Checking that a statistical test has enough power to support a reasonable conclusion

45

What 3 factors affect statistical power?

Sample Size

alpha

Effect size
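
To see how the three factors interact, here is a minimal sketch for the simplest case: a one-sided, one-sample z-test with a standardized effect size. That test is an assumption made for illustration, since other designs need other formulas. Power rises with sample size, with alpha, and with effect size.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power (1 - beta) of a one-sided, one-sample z-test.

    effect_size is the true mean shift in standard-deviation units.
    """
    z = NormalDist()                       # standard normal distribution
    critical = z.inv_cdf(1 - alpha)        # rejection threshold under the null
    return z.cdf(effect_size * n ** 0.5 - critical)

baseline = z_test_power(effect_size=0.5, n=20)
more_samples = z_test_power(effect_size=0.5, n=40)
looser_alpha = z_test_power(effect_size=0.5, n=20, alpha=0.10)
bigger_effect = z_test_power(effect_size=0.8, n=20)
```

When the true effect is zero, the function returns alpha itself, which is the false-positive rate.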

46

A priori Power Analysis

Power analysis done before an experiment to test if the sample size is large enough to detect a significant effect

47

Post hoc power analysis

Power analysis done after an experiment to test if the sample size was large enough to detect a significant effect

48

Steps to perform power analysis

Choose type

Select expected study design

Select tool which supports design

Provide 3 of the 4 parameters (sample size, alpha, effect size, power) to solve for the fourth
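
The last step can be sketched for an a priori analysis: supply alpha, the expected effect size, and the target power, then solve for sample size. The example assumes a one-sided, one-sample z-test, where n = ((z_(1-alpha) + z_power) / d)². Dedicated power-analysis tools cover other designs.

```python
import math
from statistics import NormalDist

def required_sample_size(effect_size, power=0.80, alpha=0.05):
    """Smallest n giving at least the target power in a one-sided z-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)   # critical value for the chosen alpha
    z_power = z.inv_cdf(power)       # quantile matching the target power
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)

n_medium = required_sample_size(effect_size=0.5)   # 25
n_large = required_sample_size(effect_size=0.8)    # 10
```

Larger expected effects need fewer samples, which is why the effect-size guess matters so much at the planning stage.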

49

Overfitting

Creation of a model which is fitted too closely to one particular set of data, so it generalizes poorly to new data

50

Multivariate Data

Data with many dependent/response variables

Variables have interactions

Covariates

51

Non-parametric data

Independence may be violated

Variances are unequal

Not normally distributed

52

What is a decision tree?

A non-parametric algorithm to classify and make predictions based on inputs

53

What is a random forest?

A series of multiple, randomly created decision trees

54

How many decision trees usually compose a random forest?

1000

55

How is a random forest made?

  1. Start from a training dataset

  2. Draw bootstrap samples (random samples with replacement) from it

  3. Grow an individual decision tree on each bootstrap sample

  4. Collect the predictions of all trees and take the majority decision; bootstrapping followed by this aggregation is called bagging
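
The steps above can be sketched in plain Python. This is a simplified illustration, not a full random forest: each "tree" is a one-split decision stump, no random feature subsetting is done, and the data, labels, and function names are invented for the example.

```python
import random
from collections import Counter

def fit_stump(rows):
    """Fit a one-split tree: the (feature, threshold) pair with the fewest errors.

    rows is a list of (features, label) pairs.
    """
    best = None
    for f in range(len(rows[0][0])):
        for features, _ in rows:
            t = features[f]                       # candidate threshold
            left = [label for x, label in rows if x[f] <= t]
            right = [label for x, label in rows if x[f] > t]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(l != left_label for l in left)
                      + sum(r != right_label for r in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:   # no valid split: always predict the majority label
        majority = Counter(label for _, label in rows).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]

def predict_stump(stump, x):
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label

def fit_forest(rows, n_trees=25, seed=0):
    """Train one stump per bootstrap sample (sampling rows with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]

def predict_forest(forest, x):
    """Aggregate the trees by majority vote (the aggregation half of bagging)."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: two features, two classes
training = [((1.0, 7.0), "A"), ((2.0, 3.0), "A"), ((3.0, 8.0), "A"), ((2.5, 1.0), "A"),
            ((7.0, 2.0), "B"), ((8.0, 9.0), "B"), ((9.0, 4.0), "B"), ((7.5, 6.0), "B")]
forest = fit_forest(training)
```

Calling `predict_forest(forest, (1.0, 6.0))` asks every stump for its answer and returns the most common label.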

56

What is cluster analysis?

The grouping of data points into clusters based upon similar traits

57

Why should you use cluster analysis?

Reveal hidden patterns

58

Hard clustering

Each data point in a cluster analysis belongs only to one cluster

59

Soft clustering

Each data point in a cluster analysis is given a probability it would be found in one cluster or another

60

Hierarchical Clustering

Clustering based on relationship between data points

61

How is hierarchical clustering performed?

Build a dendrogram by repeatedly joining the most similar points or clusters, then choose the clusters by cutting it where the vertical distance between successive splits is greatest

62

K-Means Clustering

Choosing a predefined number of centroids (K) which the data will be clustered to

63

What is a Centroid?

The mean position of all the points in a cluster
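
A bare-bones sketch of K-means shows where centroids come in: points are assigned to the nearest centroid, each centroid is recomputed as the mean of its points, and the two steps repeat until nothing changes. Starting from the first K points is a naive choice made here for reproducibility, and the coordinates are made up.

```python
def kmeans(points, k, n_iter=100):
    """Cluster points (tuples of coordinates) into k groups."""
    centroids = [tuple(p) for p in points[:k]]    # naive start: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: each centroid becomes the mean of its cluster
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:            # converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D data with two obvious groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
```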

64

How to choose K?

Elbow method

Silhouette method

65

What are HBIs?

Highly branched isoprenoids: long-chain alkenes produced by marine diatoms

66

How to use HBIs in data analysis?

They are produced by different forms of algae, and are thus biomarkers of what algae are primarily being consumed in the food web

67

What is H-Print?

A singular index for multiple biomarkers

Lower values mean it’s more sympagic, higher means more pelagic

68

iPOC

Index indicating the proportion of organic carbon derived from sea ice algae

69

Sea Ice Algae

Sympagic Diatoms which produce HBI I

70

Phytoplanktonic Algae

Pelagic algae which produce HBI III

71

Kernel Density Estimation

A non-parametric estimate of a probability distribution, built by summing a smooth kernel centered on each data point and usually displayed as a density curve

72

Bandwidth

A smoothing parameter that sets the width of each kernel
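
The effect of bandwidth is easiest to see in a small sketch. The code below builds a Gaussian kernel density estimate by hand, with an invented data set and two arbitrary bandwidths: a narrow one gives a spiky curve and a wide one gives a smooth, flatter curve.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function estimating the density at x from the data points."""
    n = len(data)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # One Gaussian bump per data point, each scaled by the bandwidth
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data) / norm

    return density

# Hypothetical observations
data = [2.0, 2.5, 3.0, 7.0, 7.5]

narrow = gaussian_kde(data, bandwidth=0.3)   # follows individual points closely
wide = gaussian_kde(data, bandwidth=2.0)     # smooths the two groups together

peak_narrow = narrow(2.5)
peak_wide = wide(2.5)
```

Both curves still integrate to 1, so a wider bandwidth spreads the same total probability over a broader range.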

73

Ecological Spatial Analysis

Relationship between the observed spatial distribution of a species and the mechanisms behind that distribution

74

Minimum Convex Polygon

Draws the smallest polygon around a series of points with all interior angles being less than 180 degrees
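
A minimum convex polygon is the convex hull of the location points. One common way to compute it is Andrew's monotone chain algorithm, sketched below on invented coordinates. The function names are illustrative and not taken from any home-range package.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def minimum_convex_polygon(points):
    """Convex hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                      # build the lower boundary left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build the upper boundary right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]     # drop the duplicated end points

def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

# Hypothetical animal locations: four corners plus two interior points
locations = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
hull = minimum_convex_polygon(locations)
home_range_area = polygon_area(hull)
```

The interior points do not appear in the hull, which is why a single outlying location can inflate an MCP home range.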

75

Utilization Distribution

A probability distribution describing how intensively an organism uses each part of its home range, based on the density of its location points

Can use Kernel Density Estimation to get this

76

How do you collect shape data?

Take standardized photographs (include a scale reference)

Digitize landmarks for shape and ensure they’re consistent and repeatable

77

What is Generalized Procrustes Analysis?

An analysis which outputs centroid sizes and coordinates which represent the shape

It translates, scales, and rotates the landmark configurations so they share a common frame of reference, preserving the relative Euclidean distances among landmarks within each shape

78

Procrustes ANOVA

Determines the variation in shape caused by one or more factors

79

Residual Randomization in Permutation Procedures

Sums of squares are calculated across many permutations to determine effect probabilities

80

Assumptions of PCA?

Correlation in data

Most data points being non-zeros

81

Steps to a PCA

Centering & scaling the data

Calculating covariance matrix

Calculating eigenvalues/vectors

Finding principal components

82

Covariance matrix

Square matrix with each variable appearing in the rows and columns, where the diagonal holds the variance of each variable and the off-diagonal entries hold the covariance between each pair of variables

83

Calculate Eigenvalues of a covariance matrix

Solve det(C - λI) = 0 for λ, where C is the covariance matrix and I is the identity matrix; the eigenvalues λ are the variances along the new axes
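
For two variables the calculation is short enough to do by hand. With covariance matrix [[a, b], [b, c]], the characteristic equation is λ² - (a + c)λ + (ac - b²) = 0, which the sketch below solves with the quadratic formula. The five data points are made up.

```python
import math

def covariance_matrix_2d(xs, ys):
    """Sample covariance matrix (n - 1 denominator) of two variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return [[var_x, cov_xy], [cov_xy, var_y]]

def eigenvalues_2x2_symmetric(m):
    """Roots of det(M - lambda*I) = 0 for a symmetric 2x2 matrix, largest first."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    trace = a + c
    root = math.sqrt((a - c) ** 2 + 4 * b ** 2)   # square root of the discriminant
    return (trace + root) / 2, (trace - root) / 2

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

cov = covariance_matrix_2d(xs, ys)
lambda_1, lambda_2 = eigenvalues_2x2_symmetric(cov)
share_pc1 = lambda_1 / (lambda_1 + lambda_2)   # variance explained by the first axis
```

The eigenvalues sum to the total variance (the trace of the matrix), so each one divided by that sum gives the share of variance on its axis.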

84

Redundancy analysis

Models a set of response variables as a function of a set of predictors and lets you graph the result visually in a triplot

85

Survival Analysis

Statistical method to analyze “time to event” data

86

Survival Function

Probability an event hasn’t occurred by a given time point

87

Survival Curves

Graphical representation of event occurrence over time

88

What are some characteristics of Time to Event Data?

Non-negative values

Non-normal distribution (right-skewed)

89

Right censoring

Event isn’t observed within a study period

90

Left censoring

Event has already occurred before the study period begins, so its exact time is unknown

91

Random censoring

Censoring occurs independently of the time to event

92

Interval censoring

Specific time of event is unknown, but it is known to fall within a given interval

93

Kaplan-Meier Survival Curve

Non-parametric method used to estimate survival function
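
A compact sketch of the Kaplan-Meier estimator: at each observed event time, the running survival probability is multiplied by (1 - events / number at risk). Right-censored subjects count as at risk until they drop out but never count as events. The six follow-up times are invented.

```python
def kaplan_meier(times, observed):
    """Return [(time, survival)] at each distinct event time.

    observed[i] is True if the event happened at times[i],
    False if the subject was right-censored at that time.
    """
    event_times = sorted({t for t, e in zip(times, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)                       # still under observation
        events = sum(1 for u, e in zip(times, observed) if u == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up times (days) and whether the event was observed
times = [2, 3, 3, 5, 8, 9]
observed = [True, True, False, True, False, True]

curve = kaplan_meier(times, observed)
```

Here the estimate steps down to 5/6, 2/3, 4/9, and finally 0 at days 2, 3, 5, and 9.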

94

Log-rank test

Non-parametric test used to compare survival curves between two or more groups

95

Cox Proportional Hazards Model

Semi-parametric method used to assess impact of covariates on Hazard rate

96

Hazard Rate

Instantaneous rate at which subjects experience the event, given that they have survived up to that time
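
In symbols, with T the time to event, S(t) the survival function, and f(t) the density of T, the hazard rate and the Cox proportional hazards model are usually written as below. Here h_0(t) is an unspecified baseline hazard and x_1, ..., x_p are the covariates.

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
     = \frac{f(t)}{S(t)}
\qquad
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p)
```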

97

Prior distribution

Probability distribution for a parameter in Bayesian analysis, based on what we already know before seeing the data

98

Posterior distribution

Updated distribution of a parameter, obtained by combining the prior distribution with the likelihood of the observed data
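
The simplest worked case of a prior becoming a posterior is the Beta-binomial model. With a Beta(a, b) prior on a proportion and binomial data, the posterior is Beta(a + successes, b + failures). The prior and the survey counts below are hypothetical.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical example: proportion of surveyed sites that are occupied
prior_a, prior_b = 2, 2            # weak prior centered on 0.5
occupied, unoccupied = 7, 3        # made-up survey result

post_a, post_b = update_beta(prior_a, prior_b, occupied, unoccupied)
prior_mean = beta_mean(prior_a, prior_b)
posterior_mean = beta_mean(post_a, post_b)
```

The posterior mean (9/14, about 0.64) sits between the prior mean (0.5) and the observed proportion (0.7), and it moves closer to the data as the sample grows.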

99

How to interpret Bayesian analysis?

Credible intervals and data visualization

100

Why use Bayesian methods?

Flexibility

Robustness

Nuance
