BIOL 330


160 Terms

1
New cards

Identifier AKA key

Number or text that is used to identify and track unique rows of data in a data table

2
New cards

Sampling units

Any elements selected from a statistical population that make up your sample

3
New cards

continuous data

Data that can take any value

4
New cards

discrete data

Numerical data values that can be COUNTED, whole numbers

5
New cards

interval data

Continuous data that can have negative values

6
New cards

circular data

Data where highest value loops back around to lowest (e.g. clocks)

7
New cards

ordinal data

a type of data that refers solely to a ranking of some kind

8
New cards

nominal data

data of categories only. Data cannot be arranged in an ordering scheme. (Gender, Race, Religion)

9
New cards

Outcome

= Intercept + Slope * Predictor + Error

10
New cards

A model

is a mathematical equation that describes patterns in the data

11
New cards

resolution

Smallest difference in an object you can measure

12
New cards

30 to 300 rule

Find the maximum and the minimum value in a dataset. The number of unit steps between the two should be between 30 and 300
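As an illustration (not from the course materials), the rule can be checked with a short Python snippet; the lengths and resolution below are hypothetical.

```python
def unit_steps(data, resolution):
    """Number of measurement-unit steps between the min and max of a dataset."""
    return (max(data) - min(data)) / resolution

# Hypothetical lengths measured to the nearest 0.1 cm
lengths = [2.3, 4.7, 9.8, 15.2]
steps = unit_steps(lengths, 0.1)
meets_rule = 30 <= steps <= 300
```

Here the data span about 129 unit steps, so the chosen resolution satisfies the 30-to-300 rule.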

13
New cards

The fundamental difference between ordinal and nominal data is

there is NO order to nominal data.

14
New cards

wide-form flat file

A table where each row represents all the information collected for/about a sampling unit

All the data collected for that sampling unit are contained in a row

15
New cards

5 types of derived variables

1. Ratio: Expresses as a single value the relationship two variables have to one another

2. Proportion: A ratio with 1 understood as the denominator

3. Percentage: A ratio with 100 understood as the denominator

4. Rate: Amount of something that happens per unit time or space

5. Index: A general description of derived variables

16
New cards

Long-form flat file

Data stored in a single table (i.e. one worksheet).

Use far fewer columns & many more rows.

In our example, data is entered as two variables (in columns), one for species identity and one for species abundance.

17
New cards

metadata

Metadata is a file and/or variable comment that explains in detail what each variable is, how the data were collected, and any key assumptions of the data collection process.

Two good ways to provide metadata within your spreadsheet.

1) Create a worksheet at the start of the workbook called Metadata that outlines what the fields are

2) Create a Metadata sheet and use Comments in the label for each column

18
New cards

SUMMARIZING PATTERNS IN DATA VIA DESCRIPTIVE STATISTICS

1. Measures of location - statistics that demonstrate where majority of data lie

2. Measures of spread - statistics that demonstrate how variable/ different your data are

3. Measures of certainty - statistics that tell you how well your sample represents your population

4. Measures of shape - statistics that describe the distribution of your data

19
New cards

inferential statistics

They use sample data to make generalizations or predictions about a population.

20
New cards

What are measures of location?

Statistics that describe the center or typical value of a dataset (mean, median, mode).

21
New cards

What is the arithmetic mean?

The sum of all values divided by the number of observations.

22
New cards

What is the geometric mean used for?

Growth rates or percentages.

23
New cards

What is the harmonic mean used for?

Rates and ratios (like speed).
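As a sketch (not from the course materials), all three means from the cards above are available in Python's standard `statistics` module:

```python
import statistics

data = [2, 4, 8]

am = statistics.mean(data)            # (2 + 4 + 8) / 3
gm = statistics.geometric_mean(data)  # (2 * 4 * 8) ** (1/3)
hm = statistics.harmonic_mean(data)   # 3 / (1/2 + 1/4 + 1/8)
# For positive data: harmonic <= geometric <= arithmetic
```

For this example the geometric mean is exactly 4, below the arithmetic mean and above the harmonic mean.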

24
New cards

What does variance measure?

The average squared deviation from the mean.

25
New cards

What's the difference between sample and population variance?

Sample variance divides by (n - 1); population variance divides by N.
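A minimal sketch of the (n - 1) versus N distinction, using Python's standard library:

```python
import statistics

data = [2, 4, 6, 8]  # mean = 5; squared deviations sum to 20

sample_var = statistics.variance(data)  # divides by n - 1 = 3  -> 20/3
pop_var = statistics.pvariance(data)    # divides by N = 4      -> 5
```

Dividing by n - 1 (Bessel's correction) makes the sample variance an unbiased estimate of the population variance.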

26
New cards

What is MAD?

The average of absolute differences from the mean.

27
New cards

What does the 68-95-99.7 rule tell us?

About 68% of data lie within 1 SD, 95% within 2 SDs, 99.7% within 3 SDs.
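The rule can be checked by simulation; this sketch (not from the course materials) draws from a standard normal with a fixed seed so the result is repeatable:

```python
import random

rng = random.Random(42)  # fixed seed for a repeatable check
draws = [rng.gauss(0, 1) for _ in range(10_000)]

def frac_within(k):
    """Fraction of draws within k standard deviations of the mean (0)."""
    return sum(abs(x) <= k for x in draws) / len(draws)

frac_1sd, frac_2sd, frac_3sd = frac_within(1), frac_within(2), frac_within(3)
```

With 10,000 draws the three fractions land close to 0.68, 0.95, and 0.997.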

28
New cards

What are measures of certainty?

Statistics that describe how confident we are in our sample estimates.

29
New cards

What is margin of error?

Half the width of a confidence interval
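As an illustrative sketch (the data and t value are hypothetical, with the critical value taken from a t table for 5 degrees of freedom):

```python
import statistics

# Hypothetical measurements
data = [4.8, 5.1, 5.4, 4.9, 5.3, 5.0]
n = len(data)

mean = statistics.mean(data)
se = statistics.stdev(data) / n ** 0.5  # standard error of the mean
t_crit = 2.571                          # two-tailed 95% t value, n - 1 = 5 df
margin = t_crit * se                    # margin of error = half the CI width
ci = (mean - margin, mean + margin)
```

The interval is the mean plus or minus the margin of error, so the margin is exactly half the interval's width.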

30
New cards

What does a PDF represent?

The continuous probability curve — total area under curve = 1.

31
New cards

What is a CDF?

It shows cumulative probability up to each value.
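A sketch of the standard normal PDF and CDF using only the standard library (`math.erf`), tying these two cards to the 68-95-99.7 rule:

```python
import math

def norm_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal probability density curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def norm_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative probability up to x (area under the PDF left of x)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

half = norm_cdf(0)                       # symmetry: half the area lies below the mean
within_1sd = norm_cdf(1) - norm_cdf(-1)  # ~0.68, matching the empirical rule
```

Because the total area under the PDF is 1, the CDF rises from 0 to 1 and equals 0.5 at the mean.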

32
New cards

What's the main difference between descriptive and inferential stats?

Descriptive = describe data; Inferential = predict or generalize from data.

33
New cards

What determines good data quality?

Representativeness, randomness, and lack of bias in data collection.

34
New cards

Define stratified random sample.

A sample where the population is divided into natural groups called strata, and random samples are taken from each.

35
New cards

What are strata?

Natural groupings within a population that do not overlap and together make up the whole population.

36
New cards

What is a stratified estimate of the mean also called?

A weighted mean.

37
New cards

List benefits of a stratified design.

Increases precision, ensures representation of all groups, and reduces sampling error.

38
New cards

Steps to get a stratified sample.

1. Divide population into non-overlapping strata. 2. Take random samples within each stratum. 3. Use weighted formulas to estimate overall mean and variance.
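The weighted (stratified) mean from the cards above can be sketched as follows; the strata, sizes, and sample means are hypothetical:

```python
# Hypothetical strata: population size N and sample mean per stratum
strata = {
    "forest": {"N": 600, "mean": 12.0},
    "meadow": {"N": 300, "mean": 8.0},
    "marsh":  {"N": 100, "mean": 20.0},
}

N_total = sum(s["N"] for s in strata.values())
# Stratified estimate of the mean: each stratum's mean weighted by its share
weighted_mean = sum(s["N"] / N_total * s["mean"] for s in strata.values())
```

Each stratum contributes in proportion to its share of the population, so the rare marsh stratum cannot dominate the estimate.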

39
New cards

Define proportional allocation.

Sampling where the number of elements from each stratum is proportional to the stratum's size in the population.

40
New cards

Rules of proportional allocation.

1. Must round to nearest whole number. 2. Need at least two samples per stratum.
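Both rules can be sketched in a few lines; the stratum sizes and total sample size here are hypothetical:

```python
def proportional_allocation(stratum_sizes, n_total):
    """Allocate n_total samples across strata in proportion to stratum size."""
    N = sum(stratum_sizes.values())
    alloc = {}
    for name, Nh in stratum_sizes.items():
        nh = round(n_total * Nh / N)  # rule 1: round to nearest whole number
        alloc[name] = max(nh, 2)      # rule 2: at least two samples per stratum
    return alloc

# Hypothetical population of 1000 split into three strata
alloc = proportional_allocation({"A": 500, "B": 450, "C": 50}, n_total=40)
```

The small stratum C still gets its minimum of two samples, which is what makes a variance estimate within each stratum possible.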

41
New cards

What is optimal allocation?

A design using prior population information and cost functions to increase sampling efficiency.

42
New cards

Three rules of thumb for optimal allocation.

1. Larger strata → more samples. 2. More variable strata → more samples. 3. Cheaper strata → more samples.

43
New cards

Define systematic sampling.

Sampling every k-th unit from an ordered list after a random start.
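A minimal sketch of the definition above (random start, then every k-th unit):

```python
import random

def systematic_sample(units, k, seed=None):
    """Take every k-th unit from an ordered list after a random start."""
    start = random.Random(seed).randrange(k)  # random start in the first k units
    return units[start::k]

sample = systematic_sample(list(range(100)), k=10, seed=1)
```

With 100 units and k = 10, the sample always has 10 units spaced exactly 10 apart, whatever the random start.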

44
New cards

What is a complex sample?

A combination of sampling plans, like door-to-door surveys.

45
New cards

What do inferential statistics do?

Draw conclusions about a population from sample data.

46
New cards

What does the frequentist paradigm assume?

There is a single truth we estimate using samples.

47
New cards

Define frequentist statistics.

A statistical approach using event frequencies to make conclusions.

48
New cards

Define type I error.

Rejecting the null hypothesis when it's true.

49
New cards

Define type II error.

Failing to reject the null hypothesis when it's false.

50
New cards

What is a model?

A simplification of reality using predictor variables to explain variation in a response variable.

51
New cards

Write the general model equation.

Outcome = Intercept + Slope × Predictor + Error.

52
New cards

Define response variable.

The measured outcome you want to understand or explain.

53
New cards

Define predictor variable.

A variable that explains or predicts changes in the response variable.

54
New cards

Define slope.

Rate at which the response changes for a one-unit change in predictor.

55
New cards

Define intercept.

Mean response value when all predictors equal zero.

56
New cards

Define error (residual).

Difference between observed and predicted response values.

57
New cards

Define categorical variable.

A qualitative variable with a limited set of fixed values.

58
New cards

Define dummy variables.

Numeric representations of qualitative categories for modeling.
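As an illustration (the species names are made up), treatment-style dummy coding turns one categorical column into 0/1 columns, one per non-reference category:

```python
def dummy_code(values, reference):
    """Treatment coding: one 0/1 column per non-reference category."""
    levels = sorted(set(values) - {reference})
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}

species = ["oak", "pine", "oak", "birch"]
dummies = dummy_code(species, reference="oak")  # "oak" becomes the baseline
```

The reference category gets no column of its own; its effect is absorbed into the intercept.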

59
New cards

What are estimates in a model?

Values of slopes (betas) and intercepts describing predictor-response relationships.

60
New cards

How are p-values useful?

They test whether variation is explained by the model or if predictors are significant.

61
New cards

What do residual plots show?

Whether model errors are random and assumptions are met.

62
New cards

List goals in building a useful model.

1. Include enough variables for unbiased estimates. 2. Avoid too many variables for precision. 3. Balance bias and precision.

63
New cards

What is AIC?

Akaike Information Criterion, used to compare nested models for best fit with fewest variables.
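A sketch of how AIC trades fit against parameter count, using the least-squares form of the criterion (the sample size, residual sums of squares, and parameter counts below are hypothetical):

```python
import math

def aic_ls(n, rss, k):
    """AIC for a least-squares fit: n * ln(RSS/n) + 2k (constants dropped)."""
    return n * math.log(rss / n) + 2 * k

# Hypothetical fits to n = 50 points: the extra predictor barely reduces RSS
aic_simple = aic_ls(50, rss=120.0, k=2)
aic_complex = aic_ls(50, rss=118.0, k=3)
best = "simple" if aic_simple < aic_complex else "complex"
```

Lower AIC is better; here the small drop in RSS does not pay for the extra parameter, so the simpler model wins.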

64
New cards

Define Gaussian (normal) variable.

A variable with a symmetric distribution centered around the mean.

65
New cards

Properties of a normal distribution.

1. Most observations near mean. 2. Symmetrical shape. 3. Infinite tails.

66
New cards

Define correlation.

Measures strength and direction of a linear relationship between two variables.

67
New cards

Define partial correlation.

Relationship between two variables while controlling for others.

68
New cards

What are predicted values?

The arithmetic mean response predicted by the model.

69
New cards

What does the residual value show?

How far observed data deviate from predicted values.

70
New cards

Define 95% confidence interval.

A range that would contain the true slope in 95 out of 100 repeated samples.

71
New cards

What does a 95% confidence band represent?

The range of uncertainty for predicted values over predictor values.

72
New cards

Define linear model.

A statistical model describing response-predictor relationships using a linear equation and least squares regression.

73
New cards

Steps to build a model.

1. Plot response histogram. 2. Plot predictors. 3. Check scatterplots. 4. Calculate slope/intercept. 5. Assess fit. 6. Check residuals.
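Step 4 of the list above (calculating slope and intercept) can be sketched with ordinary least squares; the x and y values are hypothetical:

```python
def least_squares(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return intercept, slope

x = [1, 2, 3, 4]
y = [2.1, 3.9, 6.2, 7.8]
intercept, slope = least_squares(x, y)

# Step 6: residuals = observed minus predicted values
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
```

With an intercept in the model, the least-squares residuals always sum to zero, which is a quick sanity check on the fit.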

74
New cards

How does a GLM estimate slopes and intercepts?

By maximizing the likelihood function through repeated guesses until best fit is found.

75
New cards

WHAT DO I DO IF THE ERROR FAMILY IS NOT NORMAL?

OPTION 1: Transform it in order to normalize it and use linear model (LM)

OPTION 2: Figure out what distribution fits the data the best and model accordingly (GLM)

OPTION 3: Convert it to Ordinal data and ignore distribution (Non-parametric)

76
New cards

ALLOMETRY EXAMPLE OF WHERE TRANSFORMATION IS COMMON

How the characteristics of living creatures change with size

Isometric scaling happens when proportional relationships are preserved as size changes (1 to 1)

Log-log transformation of the data and an LM are the standard by which we measure allometry and test whether a relationship deviates from isometry

A log-log model measures how the geometric mean changes

77
New cards

allometry definitions

Evolutionary allometry: Relationship observed between species

Ontogenetic allometry: Relationship of trait Y against trait X during development (how fast does heart grow relative to body?)

Static allometry: Relationship between individuals in a biological population at same stage of development

78
New cards

GENERALIZED LINEAR MODEL

Is a linear model with a different error distribution (probability density function) that better captures data generating process

If the range of values is constrained (> 0, or truncated to some max), the variance is not constant, or the residuals are not normally distributed, you might use a GLM

79
New cards

How can both LM and GLM be viewed conceptually?

Response = deterministic part + stochastic part

80
New cards

What is the deterministic part of the model?

It predicts the mean — also called the systematic part of the equation.

81
New cards

What is the stochastic part of the model?

The error or random part that models variability and uncertainty.

82
New cards

What is the key difference between a data transformation and a GLM link function?

Data transformation changes both mean and variance.

Link function only transforms the mean (systematic part).

83
New cards

When are LM and GLM the same?

When the error is normal and the identity link is used.

84
New cards

The log-normal is a probability distribution:

1) Lets us model multiplicative relationships. The linear model with normal distribution is additive

2) Ensures negative / zero values can't be predicted (normal will make such predictions)

3) Is one way of dealing with positive skew and heterogeneous variance

4) Is typically used with continuous and interval data AND sometimes discrete

5) Has longer right tail with larger SD

6) Has a mode and median shifted to the left relative to the arithmetic mean

85
New cards

What are the two parts of a model?

Response = deterministic (systematic) part + stochastic (random) part

86
New cards

What does the deterministic part represent?

It predicts the mean (systematic component) of the response

87
New cards

What does the stochastic part represent?

The error or variability in the model

88
New cards

What makes a GLM different from a linear model?

GLM allows non-normal error distributions and uses a link function

89
New cards

What are the three components of a GLM?

Systematic component, random component, and link function

90
New cards

What is the role of the link function?

It connects the mean of the response to the linear predictor

91
New cards

What does a link function do?

It transforms the mean of the response and constrains predicted values within a valid range

92
New cards

What happens when a link is not the identity function?

The model becomes nonlinear on the original scale

93
New cards

What is the purpose of transformations in modelling?

To stabilize variance and linearize relationships

94
New cards

What is the log-normal model based on?

Log-transformation of the response variable (ln(Y))

95
New cards

What type of error does a log-normal model assume?

Multiplicative error (constant variance on log scale)

96
New cards

What is a log-link Gaussian GLM?

A model where ln(E[Y]) is linear in predictors

97
New cards

How is a log-link GLM interpreted?

A unit increase in X multiplies the expected Y by e^beta
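The multiplicative interpretation can be verified directly; the fitted coefficients below are hypothetical:

```python
import math

# Hypothetical coefficients from a fitted log-link Gaussian GLM
intercept, beta = 1.2, 0.30

def expected_y(x):
    """Inverse of the log link: E[Y] = exp(intercept + beta * x)."""
    return math.exp(intercept + beta * x)

ratio = expected_y(3) / expected_y(2)  # effect of a one-unit increase in X
```

The ratio equals e^beta for any starting value of X: each unit increase multiplies the expected response by the same factor rather than adding a fixed amount.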

98
New cards

What is the main difference between log-normal and log-link models?

Log-normal transforms data; log-link transforms the mean

99
New cards

What does additive vs multiplicative response mean?

Additive = equal differences; multiplicative = equal proportions

100
New cards

When is multiplicative modelling better?

When data vary proportionally or span several magnitudes