BILD 5 MIDTERM



Last updated 8:09 PM on 6/8/26

115 Terms

1

PPDAC

problem, plan, data, analysis, conclusion

2

Experimental Journey

Observation, question, background research, identify variables, hypothesis, experimental design, predictions

3

Steps to test hypothesis

1. Import the data, 2. Tidy the data, 3. Look at the data, 4. Test hypotheses

4

Fundamental parts of coding

object, function, new object (input, process, output)

5

Objects in R

an object is anything that stores data in R and that you can assign a name to

6

Object Rules

allowed characters: letters, numbers, . and _

Must start with a letter

No spaces

case sensitive
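
A small sketch of these naming rules in R. The object names and values are made up for illustration.

```r
# Allowed characters: letters, numbers, . and _ ; starts with a letter; no spaces
plant_height.cm <- 12.5
sample2 <- "leaf"

# Names are case sensitive: Weight and weight are two separate objects
Weight <- 70
weight <- 65

different_objects <- Weight != weight  # TRUE, they hold different values
```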

7

Function

code that commands an operation and gives an output

8

Function Rules

uses parentheses

data goes inside parentheses
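
A minimal sketch of the object, function, new object pattern. The heights are made-up numbers.

```r
# Object (input): a numeric vector of heights in cm
heights <- c(150, 160, 170)

# Function (process): mean() takes the data inside its parentheses
# New object (output): the result is stored under a new name
mean_height <- mean(heights)
```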

9

Continuous data

can take any value within a range, infinite possible values (height, weight, time)

10

Count

whole numbers only, represents how many (number of students)

11

Categorical

groups or labels with no inherent order (eye color, species, blood type)

12

Binomial

two possible outcomes (yes/no)

13

ordinal

ordered categories where the distance between levels is not equal or not known, so the gaps are not numerical distances (rankings, class level, pain scale)

14

tidy data

each variable in its own column, each observation in its own row, each value is in one cell
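
A sketch of the same made-up plant measurements stored untidy and tidy, using only base R. The column names are assumptions for illustration.

```r
# Untidy: one row per plant, one column per week, so "week" is hidden in the headers
untidy <- data.frame(
  plant = c("A", "B"),
  week1 = c(2.0, 2.5),
  week2 = c(3.1, 3.4)
)

# Tidy: one column per variable (plant, week, height), one row per observation
tidy <- data.frame(
  plant  = rep(untidy$plant, times = 2),
  week   = rep(c(1, 2), each = nrow(untidy)),
  height = c(untidy$week1, untidy$week2)
)
```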

15

Bar Plot

relationship between 1 continuous variable and 1 or more categorical variables

16

Scatterplot

relationship between 2 continuous variables

17

line graph

relationship between 2 continuous variables, where 1 variable is ordered (usually time)

18

Histogram

distribution of 1 continuous variable

19

Box and violin plots

visualize the distribution of a continuous variable for one or more categories

20

descriptive statistics

a set of summary measurements that simply communicate important information like centrality and variation

21

data type: non numerical

proportions, percentages, ratios

22

data type: numerical

mean, median, mode

23

Mean

the average

24

Median

the middle of all ranked values (most robust)

25

Mode

the most common value

26

Robust

describes a summary measure that is resilient to single extreme values
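
A sketch comparing mean, median, and mode on made-up numbers, with and without one extreme value. Base R's mode() reports the storage type rather than the most common value, so stat_mode() below is a hypothetical helper written for this example.

```r
values <- c(2, 3, 3, 4, 5)
with_outlier <- c(values, 100)

mean(values)          # 3.4
mean(with_outlier)    # 19.5, pulled far up by the single extreme value
median(values)        # 3
median(with_outlier)  # 3.5, barely moves: the median is robust

# Hypothetical helper: most common value of a numeric vector
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}
stat_mode(values)     # 3
```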

27

residuals

observation minus the mean; used to predict future yields

28

range

the difference between the largest and the smallest value; in R, range() returns the smallest and largest values, and diff(range(x)) gives their difference

29

interquartile range

the range of the middle 50% of data, shown by the box in a box plot

30

variance

measures how far data points are from the mean on average (squared distance), var()

31

standard deviation

the square root of variance, average distance from the mean, sd()
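
A sketch of the spread measures on made-up numbers. Note that var() and sd() in R divide by n - 1 (the sample versions), and range() returns two values rather than one difference.

```r
x <- c(2, 4, 4, 6, 9)

range(x)        # smallest and largest value: 2 9
diff(range(x))  # largest minus smallest: 7
IQR(x)          # range of the middle 50%: 2
var(x)          # sum of squared distances from the mean / (n - 1): 7
sd(x)           # square root of the variance
```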

32

Sample Mean

calculated from data, to estimate true mean

33

true mean

actual average of the entire population, usually unknown

34

Sample Standard Deviation

how spread out sample data is

35

True standard Dev

uses N (full population), usually unknown

36

Uncertainty about whether a sample is accurate

standard error, confidence intervals, pvalues, statistical power

37

normal distribution

bell curve; normal distributions are everywhere

38

Central Limit Theorem

assume you sample a population many times independently and calculate a sample mean each time; the distribution of those means will be approximately normally distributed
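
A simulation sketch of this idea. The exponential population, the sample size of 50, the 2000 repeats, and the seed are all arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed so the run is repeatable

# Population is clearly not normal: exponential with true mean 1 and true sd 1
sample_means <- replicate(2000, mean(rexp(n = 50, rate = 1)))

mean(sample_means)  # close to the true mean, 1
sd(sample_means)    # close to 1 / sqrt(50), the standard error

# hist(sample_means) would show a roughly bell-shaped distribution
```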

39

standard error

how much the sample mean is expected to vary from the true population mean (how accurate sample mean is)

40

Confidence intervals

a range of values used to estimate the true population parameter (where the true mean is likely to be)

A 95% confidence level means that, across repeated samples, about 95% of the intervals would contain the true mean

as sample size increases, the CI gets more precise (narrower)

as variability increases, the CI becomes more uncertain (wider)
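
A sketch of the standard error and a 95% confidence interval for a mean, computed by hand on made-up measurements. It assumes the t distribution is the appropriate reference, which is what t.test() also uses.

```r
x <- c(4.1, 5.0, 5.6, 4.8, 5.3, 4.7, 5.2, 4.9)  # made-up measurements

n  <- length(x)
se <- sd(x) / sqrt(n)  # standard error: shrinks as n grows, grows with variability

# 95% CI: mean plus or minus the critical t value (n - 1 degrees of freedom) times SE
t_crit <- qt(0.975, df = n - 1)
ci <- c(lower = mean(x) - t_crit * se, upper = mean(x) + t_crit * se)
```

The same interval is reported by t.test(x)$conf.int.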

41

independent variables

scientists change this factor on purpose to see what happens

x axis

42

dependent variable

measured to see if the IV made a difference

y axis

what changes

43

hypothesis

testable and falsifiable statement that explains a possible relationship between 2 or more variables based on existing knowledge

44

null hypothesis

the assumption that there is no effect, no differences, or no relationship

45

Parametric tests

t-test, ANOVA, chi-squared

46

Homoscedasticity

equal variance

47

heteroscedasticity

unequal variance

48

transforming data

change all values of a variable in an identical way mathematically, not changing the relationship between values

ex) square root, natural log

49

back transformation

transforming data and then undoing it using the reverse transformation
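
A sketch of a natural log transformation and its back transformation with exp(), on made-up right-skewed values.

```r
x <- c(1, 10, 100, 1000)  # made-up right-skewed values

log_x  <- log(x)       # transform: natural log applied to every value the same way
back_x <- exp(log_x)   # back transform: exp() undoes log()

# The order of the values is unchanged by the transformation
same_order <- identical(order(x), order(log_x))
```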

50

normality test

statistical method used to determine whether a dataset follows a normal distribution

51

Kolmogorov Smirnov test

generates an ideal distribution using parameters drawn from our data, and we then compare this to our data
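
A sketch of this test with base R's ks.test(), on simulated data. The seed and the simulated sample are assumptions for illustration; because the mean and sd are estimated from the same data, the p-value should be read as approximate.

```r
set.seed(42)  # arbitrary seed
x <- rnorm(100, mean = 10, sd = 2)  # simulated data that really is normal

# Compare the data to an ideal normal distribution built from the data's own mean and sd
result <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

result$statistic  # largest gap between the observed and ideal cumulative distributions
result$p.value    # a large p-value gives no evidence against normality
```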

52

outliers

data that exists outside of the typical distribution of data

impossible values, or values more than 1.5 × the interquartile range below the first quartile or above the third quartile
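
A sketch of flagging outliers with the common 1.5 × IQR rule, on made-up numbers. The fence names are just labels chosen for this example.

```r
x <- c(10, 12, 11, 13, 12, 11, 40)  # made-up data with one suspicious value

q <- quantile(x, c(0.25, 0.75))      # first and third quartiles
fence_low  <- q[1] - 1.5 * IQR(x)
fence_high <- q[2] + 1.5 * IQR(x)

outliers <- x[x < fence_low | x > fence_high]  # 40
```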

53

trimming

removes outliers

  • when outliers are extreme or impossible

54

winsorization

replaces outliers with less extreme values, such as the 5th and 95th percentiles

  • when extreme outliers still hold biological significance
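
A sketch contrasting trimming and winsorization on made-up numbers, using the 5th and 95th percentiles as limits (an assumption; other cutoffs are possible).

```r
x <- c(1, 20, 21, 22, 23, 24, 25, 26, 27, 90)  # made-up data with extreme ends

limits <- quantile(x, c(0.05, 0.95))

# Trimming: drop values outside the limits (fewer data points remain)
trimmed <- x[x >= limits[1] & x <= limits[2]]

# Winsorization: pull values outside the limits back to the limits (same number of points)
winsorized <- pmin(pmax(x, limits[1]), limits[2])
```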

55

Confusion matrix

a table that compares reality to what your data set or test concludes

56

True positive

effect is real and the effect is detected

57

false positive

data shows effect, no effect in reality

58

false negative

no effect in data but effect is real

59

true negative

no effect in reality and no effect in data
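
A sketch of a confusion matrix built with table() from six made-up cases. The labels "effect" and "no effect" are assumptions chosen to match the four outcomes above.

```r
reality   <- c("effect", "effect", "no effect", "no effect", "effect", "no effect")
test_says <- c("effect", "no effect", "no effect", "effect", "effect", "no effect")

confusion <- table(reality = reality, test = test_says)

true_positive  <- confusion["effect", "effect"]        # 2
false_negative <- confusion["effect", "no effect"]     # 1
false_positive <- confusion["no effect", "effect"]     # 1
true_negative  <- confusion["no effect", "no effect"]  # 2
```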

60

test statistics

any calculated value that measures the difference between experimental groups (control vs treatment)

larger=more likely the null is false

difference between groups over amount of variation

61

z-score

are two population means significantly different?

62

t-score

are two sample means significantly different?

63

F-score

are any of 3 or more sample means different?

64

chi-squared

does the observed data match an expected distribution

65

p-value

assuming the null hypothesis is true, the p-value is the probability of getting your measured test statistic, or one more extreme, by random chance
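
A sketch of a test statistic and its p-value from a two-sample t-test on made-up control and treatment values. The last line recomputes the two-sided p-value by hand from the t distribution to show where it comes from.

```r
control   <- c(5.1, 4.9, 5.3, 5.0, 5.2)  # made-up measurements
treatment <- c(5.8, 6.1, 5.9, 6.3, 6.0)

result <- t.test(treatment, control)

t_value <- unname(result$statistic)  # difference between groups over variation
df      <- unname(result$parameter)  # degrees of freedom used by the test
p_value <- result$p.value

# Probability of a t at least this extreme (in either direction) if the null is true
p_by_hand <- 2 * pt(-abs(t_value), df)
```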

66

false negative

type 2 error

67

false positive

type 1 error

68

Scatterplot

use when both variables are continuous

  • can see clustering

  • direction of the relationship

  • ex: petal length and petal width

69

Covariance

shows how two variables vary together

  • if both increase together, the covariance is positive

  • if one increases and the other decreases, the covariance is negative

  • covariance of zero means no consistent relationship

70

Variance vs Covariance

Variance only measures the spread of a single variable around its mean, while covariance measures how two variables vary together

71

Correlation

measures the direction and strength of a linear relationship between two variables

  • r value near 1 is a strong positive relationship

  • r value near -1 is a strong negative relationship

  • 0 means no linear relationship
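
A sketch of covariance and correlation using petal length and petal width from R's built-in iris dataset (choosing that dataset is an assumption made for this example).

```r
x <- iris$Petal.Length
y <- iris$Petal.Width

covariance  <- cov(x, y)  # positive: the two variables increase together
correlation <- cor(x, y)  # between -1 and 1, here strongly positive

# Correlation is covariance rescaled by both standard deviations
rescaled <- covariance / (sd(x) * sd(y))
```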

72

Correlation vs Regression

  • if you want to know if two variables are related, use correlation

  • if you want to predict one variable from another, use regression

73

Linear Regression

model that shows how a dependent variable changes as an independent variable changes

y = mx + b

  • y is dependent variable

  • x is independent

  • m is slope: how much the dependent variable changes for every one unit increase of the independent

  • b is intercept
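
A sketch of fitting y = mx + b with lm() on made-up points that lie exactly on y = 2x + 1, so the recovered slope and intercept are easy to check.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(3, 5, 7, 9, 11)  # made up so that y = 2x + 1 exactly

fit <- lm(y ~ x)        # dependent variable ~ independent variable

b <- unname(coef(fit)["(Intercept)"])  # intercept: 1
m <- unname(coef(fit)["x"])            # slope: y changes by 2 per one unit of x
```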

74

residuals

measure the difference between the observed value and the predicted value

Residual = observed - predicted

linear regression attempts to minimize the sum of squared residuals using ordinary least squares

  • squaring prevents positive and negative errors from canceling each other out

75

R

correlation coefficient

76

R squared

measures explanatory power

  • proportion of variation in the dependent variable that can be explained by the independent variable

  • the larger the R squared, the better the model is predicting
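
A sketch that computes residuals and R squared by hand for a simple regression on made-up data, then notes where R reports the same numbers.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)  # made-up, roughly linear

fit <- lm(y ~ x)

res <- y - fitted(fit)            # residual = observed - predicted
ss_res <- sum(res^2)              # sum of squared residuals (what least squares minimizes)
ss_tot <- sum((y - mean(y))^2)    # total variation in the dependent variable

r_squared <- 1 - ss_res / ss_tot  # proportion of variation explained
```

For a regression with one independent variable this equals cor(x, y)^2, and it matches summary(fit)$r.squared.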

77

Regression Assumptions

  • model must be linear

  • residuals normally distributed

  • homoscedasticity, spread should be consistent

  • eliminate outliers, etc.

78

ANOVA

independent variable is categorical with more than two groups

dependent variable is continuous

  • researchers may test whether different chicken feed types produce different average chicken weights

79

Factor and levels

a factor is a categorical independent variable

  • levels would be the categories within the factor

80

ANOVA assumptions

independence, normality (Kolmogorov Smirnov test), equal variance

81

F statistic

between group variance over within group variance

  • the F value becomes large if between group variance is larger than within group variance, and the p-value decreases
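
A sketch of a one-way ANOVA on R's built-in chickwts dataset (chick weight by feed type), chosen here as an assumption because it mirrors the chicken feed example. The F value is also recomputed as between group over within group mean squares.

```r
fit <- aov(weight ~ feed, data = chickwts)  # continuous DV ~ categorical factor
anova_table <- summary(fit)[[1]]

f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

# F = between group variance / within group variance
mean_sq <- anova_table[["Mean Sq"]]
f_by_hand <- mean_sq[1] / mean_sq[2]
```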

82

Positive control

expected to produce an effect

83

negative control

expected to produce no effect

84

correlation study

observes variables without directly changing them

ex: studying whether noise levels are associated with poor sleep in ICU

  • do not assign noise levels

  • observe existing conditions

  • correlation does not prove causation; there may be confounding variables

85

Manipulative study

researchers directly manipulate the independent variable

ex: assign one group to a new exercise regimen and another group is the control

  • stronger evidence for causation

86

retrospective studies

looks backward and uses existing records

  • less control over variables

87

prospective

follow subjects forward in time

88

Field experiments

occur in natural environments

  • more realistic

  • less control

  • more confounding variables

89

Laboratory experiments

occur in controlled settings

  • easier to isolate variables

  • may not reflect real world conditions

90

In Vivo

inside a living organism

91

In vitro

means outside a living organism, lab dish or test tube

92

randomized single factor

experiment with one independent variable (factor) where subjects are randomly assigned to treatment groups.

93

case control study

observational study that works backwards

  • start with people who already have an outcome and compare them to people who don't, then look back at what differed

94

repeated measures design

same subjects measured multiple times across different conditions or time points

  • each person serves as their own control

95

cross over design

  • type of repeated measures design where subjects switch between treatments in a sequence, with a washout period between them, so each subject experiences each treatment

96

quasi experiment

resembles an experiment but lacks full random assignment

  • the researcher doesn't get full control of who gets which treatment

  • when randomization is unethical or impractical

97

factorial design

two or more independent variables tested simultaneously, allowing you to examine main effects and interactions between factors

98

bootstrapping method

A resampling method that repeatedly draws samples (with replacement) from your existing data to estimate a statistic's distribution, without assuming normality
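
A sketch of bootstrapping the mean of a small made-up sample. The 5000 resamples, the seed, and the percentile interval are assumptions for illustration; other interval methods exist.

```r
set.seed(123)  # arbitrary seed so the run is repeatable
x <- c(12, 15, 9, 20, 14, 11, 18, 13, 16, 10)  # made-up sample

# Resample the data with replacement many times, recording the mean each time
boot_means <- replicate(5000, mean(sample(x, replace = TRUE)))

# Percentile interval: the middle 95% of the bootstrapped means
boot_ci <- quantile(boot_means, c(0.025, 0.975))
```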

99

factorial design

A factorial design is an experimental design where researchers study the effects of two or more independent variables simultaneously.

Example:

A researcher wants to see how sleep and caffeine affect test scores.

  • Factor 1: Sleep

    • 4 hours

    • 8 hours

  • Factor 2: Caffeine

    • No caffeine

    • Coffee
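
A sketch listing every treatment combination for this sleep and caffeine example with expand.grid(). The column names are assumptions for illustration.

```r
design <- expand.grid(
  sleep    = c("4 hours", "8 hours"),
  caffeine = c("No caffeine", "Coffee")
)

# 2 levels x 2 levels = 4 treatment combinations, one per row
n_combinations <- nrow(design)
```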

100

Blocking

Blocking = grouping experimental subjects based on a characteristic that could affect the response variable, then randomly assigning treatments within each group.
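
A sketch of blocking with eight made-up subjects, using age group as the blocking characteristic and two hypothetical treatments, "control" and "drug". Treatments are shuffled separately inside each block.

```r
set.seed(7)  # arbitrary seed so the assignment is repeatable

subjects <- data.frame(
  id    = 1:8,
  block = rep(c("young", "old"), each = 4)
)

# Randomly assign treatments within each block, balanced 2 and 2
subjects$treatment <- NA_character_
for (b in unique(subjects$block)) {
  rows <- which(subjects$block == b)
  subjects$treatment[rows] <- sample(rep(c("control", "drug"), length.out = length(rows)))
}

assignment_counts <- table(subjects$block, subjects$treatment)
```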