BILD 5 MIDTERM


Last updated 8:09 PM on 6/8/26


115 Terms

New cards

PPDAC

problem, plan, data, analysis, conclusion

New cards

Experimental Journey

Observation, question, background research, identify variables, hypothesis, experimental design, predictions

New cards

Steps to test hypothesis

1. Import the data, 2. Tidy the data, 3. Look at the data, 4. Test hypotheses

New cards

Fundamental parts of coding

object, function, new object (input, process, output)

New cards

Objects in R

an object is anything that stores data in R, which you can assign a name to

New cards

Object Rules

allowed characters: letters, numbers, . and _

Must start with a letter

No spaces

case sensitive

assign a value to a name with <-

New cards

Function

code that commands an operation and gives an output

New cards

Function Rules

uses parentheses

data goes inside parentheses
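
The object and function cards above can be tied together with a minimal R sketch. The object names and the values are hypothetical, chosen only to follow the naming rules.

```r
# Assign a numeric vector to an object (letters and _ only, starts with a letter).
plant_heights <- c(12.5, 14.1, 13.3)   # hypothetical values

# A function takes data inside parentheses and returns an output,
# which can be stored in a new object (input, process, output).
mean_height <- mean(plant_heights)

# Names are case sensitive: Mean_Height would be a different object.
print(mean_height)
```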

New cards

Continuous data

can take any value within a range, infinite possible values (height, weight, time)

New cards

Count

whole numbers only, represents how many (number of students)

New cards

Categorical

groups or labels with no inherent order (eye color, species, blood type)

New cards

Binomial

two possible outcomes (yes/no)

New cards

ordinal

ordered categories where the distance between levels is not equal or not known, so the levels are not numerical distances (rankings, class level, pain scale)

New cards

tidy data

each variable in its own column, each observation in its own row, each value is in one cell
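
As an illustration of the three tidy data rules, here is a small base R sketch that reshapes an untidy table by hand. The column names and measurements are invented for the example.

```r
# Untidy: one column per treatment (hypothetical measurements).
wide <- data.frame(plant = c("a", "b"),
                   control = c(10.1, 9.8),
                   fertilizer = c(12.4, 13.0))

# Tidy: one column per variable (plant, treatment, height),
# one row per observation, one value per cell.
tidy <- data.frame(
  plant = rep(wide$plant, times = 2),
  treatment = rep(c("control", "fertilizer"), each = nrow(wide)),
  height = c(wide$control, wide$fertilizer)
)
print(tidy)
```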

New cards

Bar Plot

relationship between 1 continuous variable and 1 or more categorical variables

New cards

Scatterplot

relationship between 2 continuous variables

New cards

line graph

relationship between 2 continuous variables, where 1 variable is ordered (usually time)

New cards

Histogram

distribution of 1 continuous variable

New cards

Box and violin plots

visualize the distribution of a continuous variable for one or more categories

New cards

descriptive statistics

a set of summary measurements that simply communicate important information like centrality and variation

New cards

data type: non numerical

proportions, percentages, ratios

New cards

data type: numerical

mean, median, mode

New cards

Mean

the average

New cards

Median

the middle of all ranked values (most robust)

New cards

Mode

the most common value

New cards

Robust

a summary measure is robust if it is resilient to single extreme values
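
To see why the median is called robust, the sketch below compares the mean and median before and after adding one extreme value. The data are made up. Base R has no built-in function for the statistical mode, so the last line counts values with table() as one possible approach.

```r
x <- c(2, 3, 3, 4, 5)        # hypothetical data
x_out <- c(x, 100)           # same data plus one extreme value

mean(x)                      # 3.4
median(x)                    # 3
mean(x_out)                  # 19.5, pulled strongly toward the extreme value
median(x_out)                # 3.5, barely moves

# Most common value, found by counting occurrences.
mode_value <- as.numeric(names(which.max(table(x))))
mode_value                   # 3
```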

New cards

residuals

observation minus the mean, used to predict future yields

New cards

range

the difference between the largest and the smallest value; in R, range() returns the smallest and largest values, and diff(range()) gives the difference

New cards

interquartile range

the range of the middle 50% of data, shown by the box in a box plot, IQR()

New cards

variance

measures how far data points are from the mean on average (squared distance), var()

New cards

standard deviation

the square root of variance, average distance from the mean, sd()
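
The measures of spread on the cards above map onto base R functions as follows. This is a small sketch on invented numbers. Note that var() and sd() compute the sample versions, dividing by n - 1.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical data

range(x)          # smallest and largest value: 2 9
diff(range(x))    # largest minus smallest: 7
IQR(x)            # spread of the middle 50% of the data
var(x)            # average squared distance from the mean (n - 1 denominator)
sd(x)             # square root of the variance
```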

New cards

Sample Mean

calculated from data, to estimate true mean

New cards

true mean

actual average of the entire population, usually unknown

New cards

Sample Standard Deviation

how spread out sample data is

New cards

True standard Dev

uses N (full population), usually unknown

New cards

Uncertainty about whether a sample is accurate

standard error, confidence intervals, p-values, statistical power

New cards

normal distribution

bell curve; normal distributions are everywhere

New cards

Central Limit Theorem

assume you sample a population many times independently and each time you calculate a sample mean; the distribution of those means will be approximately normally distributed, even if the population itself is not
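
A quick simulation can make the Central Limit Theorem concrete. The sketch below assumes a skewed exponential population with a true mean of 1, a sample size of 30, and 2000 repeated samples. All three are arbitrary choices for illustration.

```r
set.seed(1)

# Draw 2000 independent samples of size 30 from a skewed population
# and keep the mean of each sample.
sample_means <- replicate(2000, mean(rexp(30, rate = 1)))

mean(sample_means)   # close to the true mean of 1
sd(sample_means)     # close to 1 / sqrt(30), the standard error

# hist(sample_means) would show a roughly bell-shaped distribution.
```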

New cards

standard error

how much the sample mean is expected to vary from the true population mean (how accurate sample mean is)

New cards

Confidence intervals

a range of values used to estimate the true population parameter (where the true mean is likely to be)

A 95% confidence level means that, over many repeated samples, about 95% of the intervals would contain the true mean

as sample size increases, the CI gets more precise (narrower)

as variability increases, the CI becomes more uncertain (wider)
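
As a worked example, the standard error and a 95% confidence interval for a mean can be computed by hand in R. The sample values are hypothetical, and the interval uses the t distribution, which is one common choice for small samples.

```r
x <- c(5.1, 4.9, 5.6, 5.8, 5.0, 5.3, 5.4, 4.7)   # hypothetical sample
n <- length(x)

se <- sd(x) / sqrt(n)               # standard error of the mean
t_crit <- qt(0.975, df = n - 1)     # critical t value for a 95% interval
ci <- mean(x) + c(-1, 1) * t_crit * se
ci

# t.test(x)$conf.int reports the same interval.
```

A larger n shrinks se and narrows the interval, while a larger sd(x) widens it.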

New cards

independent variables

the factor the scientist changes on purpose to see what happens

x axis

New cards

dependent variable

measured to see if the IV made a difference

y axis

what changes

New cards

hypothesis

testable and falsifiable statement that explains a possible relationship between 2 or more variables based on existing knowledge

New cards

null hypothesis

the assumption that there is no effect, no differences, or no relationship

New cards

Parametric tests

t-test, ANOVA, chi-squared

New cards

Homoscedasticity

equal variance

New cards

heteroscedasticity

unequal variance

New cards

transforming data

change all values of a variable in an identical way mathematically, not changing the relationship between values

ex) square root, natural log

New cards

back transformation

transforming data and then undoing it using the reverse transformation
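
A short sketch of a transformation and its back transformation, using the natural log on invented right-skewed values:

```r
x <- c(1, 10, 100, 1000)     # hypothetical right-skewed values

log_x <- log(x)              # natural log applied identically to every value
back <- exp(log_x)           # exp() is the reverse of log()

all.equal(back, x)           # TRUE: the original values are recovered
all(order(log_x) == order(x))  # TRUE: the ordering of values is unchanged
```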

New cards

normality test

statistical method used to determine whether a dataset follows a normal distribution

New cards

Kolmogorov Smirnov test

generates an ideal distribution using parameters drawn from our data, and we then compare this to our data
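
In R this comparison can be sketched with ks.test(), passing the mean and standard deviation estimated from the data. The data here are simulated for illustration. Because the parameters come from the same data, the reported p-value is only approximate.

```r
set.seed(42)
x <- rnorm(50, mean = 10, sd = 2)   # simulated data for illustration

# Compare the data to a normal distribution with the data's own mean and sd.
result <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
result$statistic   # largest gap between the observed and ideal distributions
result$p.value     # a small p-value suggests the data are not normal
```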

New cards

outliers

data that exists outside of the typical distribution of data

impossible values, or values more than 1.5 times the interquartile range below the first quartile or above the third quartile

New cards

trimming

removes outliers

when outliers are extreme or impossible

New cards

winsorization

replaces outliers with less extreme values, such as the 5th and 95th percentiles

when extreme outliers still hold biological significance
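
The outlier, trimming, and winsorization cards can be compared side by side in a base R sketch. The data are invented, the 1.5 x IQR fences are one common convention for flagging outliers, and the 5th and 95th percentile limits follow the card above.

```r
x <- c(10, 12, 11, 13, 12, 11, 40)   # hypothetical data; 40 is extreme

# Flag values beyond 1.5 * IQR from the quartiles.
q <- quantile(x, c(0.25, 0.75), names = FALSE)
fence_low  <- q[1] - 1.5 * IQR(x)
fence_high <- q[2] + 1.5 * IQR(x)
is_outlier <- x < fence_low | x > fence_high

# Trimming: drop the flagged values.
trimmed <- x[!is_outlier]

# Winsorization: pull values in to the 5th and 95th percentiles.
limits <- quantile(x, c(0.05, 0.95), names = FALSE)
winsorized <- pmin(pmax(x, limits[1]), limits[2])
```

Trimming shortens the data set, while winsorization keeps every observation but caps its value.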

New cards

Confusion matrix

a table that compares reality to what your data set or test concludes

New cards

True positive

effect is real and the effect is detected

New cards

false positive

data shows effect, no effect in reality

New cards

false negative

no effect in data but effect is real

New cards

true negative

no effect in reality and no effect in data
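
A confusion matrix can be built in R with table(). The two vectors below are hypothetical labels for six experiments: what is really true, and what the test concluded.

```r
reality   <- c("effect", "effect", "none", "none",   "effect", "none")
concluded <- c("effect", "none",   "none", "effect", "effect", "none")

confusion <- table(reality = reality, concluded = concluded)
confusion

confusion["effect", "effect"]   # true positives
confusion["none",   "effect"]   # false positives
confusion["effect", "none"]     # false negatives
confusion["none",   "none"]     # true negatives
```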

New cards

test statistics

any calculated value that measures the difference between experimental groups (control vs treatment)

larger=more likely the null is false

difference between groups over amount of variation

New cards

z-score

are two population means significantly different?

New cards

t-score

are two sample means significantly different?

New cards

F-score

are any of 3 or more sample means different?

New cards

chi-squared

does the observed data match an expected distribution?

New cards

p-value

assuming the null hypothesis is true, the p-value represents the probability you would have gotten your measured test statistic, or something more extreme, by random chance
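
For two sample means, the test statistic and its p-value can be obtained with t.test(). The control and treatment measurements below are invented for the example.

```r
control   <- c(4.8, 5.1, 5.0, 4.9, 5.2, 5.0)   # hypothetical measurements
treatment <- c(5.6, 5.9, 5.4, 5.8, 5.7, 5.5)

result <- t.test(treatment, control)
result$statistic   # t-score: difference between groups over the variation
result$p.value     # probability of a t-score this extreme if the null is true
```

A large t-score goes with a small p-value, which is evidence against the null hypothesis.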

New cards

false negative

type 2 error

New cards

false positive

type 1 error

New cards

Scatterplot

use when both variables are continuous

can see clustering
direction of the relationship
ex: petal length and petal width

New cards

Covariance

shows how two variables vary together

if both increase together, the covariance is positive
if one increases and one decreases, the covariance is negative
covariance of zero means no consistent relationship

New cards

Variance vs Covariance

Variance only measures the spread of a single variable around its mean, while covariance measures how two variables vary together

New cards

Correlation

measures the direction and strength of a linear relationship between two variables

r near 1 is a strong positive relationship
r near -1 is a strong negative relationship
r near 0 means no linear relationship
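
Covariance and correlation are both available in base R. The sketch below uses made-up paired values and shows that correlation is the covariance rescaled by the two standard deviations, which keeps r between -1 and 1.

```r
x <- c(1, 2, 3, 4, 5)   # hypothetical paired measurements
y <- c(2, 4, 5, 4, 5)

cov(x, y)                        # 1.5, positive: the variables increase together
cor(x, y)                        # about 0.77
cov(x, y) / (sd(x) * sd(y))      # same value as cor(x, y)
```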

New cards

Correlation vs Regression

if you want to know if two variables are related, use correlation
if you want to predict one variable from another, use regression

New cards

Linear Regression

model that shows how a dependent variable changes as an independent variable changes

y = mx + b

y is dependent variable
x is independent
m is slope: how much the dependent variable changes for every one unit increase of the independent
b is intercept

New cards

residuals

measure the difference between the observed value and the predicted value

Residual = observed - predicted

linear regression attempts to minimize the sum of squared residuals using ordinary least squares

squaring prevents positive and negative errors from canceling each other out
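
A minimal regression sketch with lm(), which fits by ordinary least squares. The data frame is hypothetical. The fitted coefficients correspond to b (intercept) and m (slope) in y = mx + b.

```r
dat <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(2, 4, 5, 4, 5))   # hypothetical data

fit <- lm(y ~ x, data = dat)
coef(fit)              # intercept b = 2.2, slope m = 0.6

predicted <- fitted(fit)
residuals_by_hand <- dat$y - predicted    # observed - predicted
sum(residuals_by_hand^2)                  # the quantity least squares minimizes
```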

New cards

correlation coefficient

New cards

R squared

measures explanatory power

proportion of variation in the dependent variable that can be explained by the independent variable
the larger the R squared, the better the model is predicting
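
R squared can be read from a fitted model's summary. In this sketch on hypothetical data with a single predictor, it equals the squared correlation coefficient.

```r
x <- c(1, 2, 3, 4, 5)   # hypothetical data
y <- c(2, 4, 5, 4, 5)

fit <- lm(y ~ x)
r_squared <- summary(fit)$r.squared
r_squared        # 0.6: 60% of the variation in y is explained by x
cor(x, y)^2      # same value for a one-predictor model
```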

New cards

Regression Assumptions

relationship must be linear
residuals are normally distributed
homoscedasticity: spread should be consistent
eliminate outliers, etc.

New cards

ANOVA

independent variable is categorical with more than two groups

dependent variable is continuous

researchers may test whether different chicken feed types produce different average chicken weights

New cards

Factor and levels

a factor is a categorical independent variable

levels are the categories within the factor

New cards

ANOVA assumptions

independence, normality (Kolmogorov Smirnov test), equal variance

New cards

F statistic

between group variance over within group variance

the F value becomes large if between group variance is larger than within group variance, and the p-value decreases
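
A one-way ANOVA can be sketched with aov(). This example uses chickwts, a data set that ships with R (chick weights by feed type). Whether it is the same data the card's chicken feed example refers to is an assumption.

```r
# weight is the continuous dependent variable; feed is a factor with 6 levels.
fit <- aov(weight ~ feed, data = chickwts)
tab <- summary(fit)[[1]]
tab

f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]

# F is the between group mean square over the within group mean square.
f_by_hand <- tab[["Mean Sq"]][1] / tab[["Mean Sq"]][2]
```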

New cards

Positive control

expected to produce an effect

New cards

negative control

expected to produce no effect

New cards

correlation study

observes variables without directly changing them

ex: studying whether noise levels are associated with poor sleep in ICU

do not assign noise levels
observe existing conditions
correlation does not prove causation, may be confounding variables

New cards

Manipulative study

researchers directly manipulate the independent variable

ex: assign one group to a new exercise regimen and another group is the control

stronger evidence for causation

New cards

retrospective studies

looks backward and uses existing records

less control over variables

New cards

prospective

follow subjects forward in time

New cards

Field experiments

occur in natural environments

more realistic
less control
more confounding variables

New cards

Laboratory experiments

occur in controlled settings

easier to isolate variables
may not reflect real world conditions

New cards

In Vivo

means within a living organism

New cards

In vitro

means outside a living organism, lab dish or test tube

New cards

randomized single factor

experiment with one independent variable (factor) where subjects are randomly assigned to treatment groups.

New cards

case control study

observational study that works backwards

start with people who already have an outcome and compare them to people who don't, then look back at what differed

New cards

repeated measures design

same subjects measured multiple times across different conditions or time points

each person serves as their own control

New cards

cross over design

type of repeated measures design where subjects switch between treatments in a sequence, with a wash out period, each subject experiences each treatment

New cards

quasi experiment

resembles an experiment but lacks full random assignment

researchers don't get full control of who gets which treatment
used when randomization is unethical or impractical

New cards

factorial design

two or more independent variables tested simultaneously, allowing researchers to examine main effects and interactions between factors

New cards

bootstrapping method

A resampling method that repeatedly draws samples (with replacement) from your existing data to estimate a statistic's distribution — without assuming normality
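
A bootstrap of the sample mean can be sketched in a few lines of base R. The data, the 5000 resamples, and the percentile interval are all illustrative choices.

```r
set.seed(7)
x <- c(3.1, 4.7, 2.9, 5.5, 4.0, 3.8, 6.2, 4.4)   # hypothetical sample

# Resample the data with replacement many times and record each mean.
boot_means <- replicate(5000, mean(sample(x, replace = TRUE)))

# The middle 95% of the bootstrap means gives a percentile interval.
boot_ci <- quantile(boot_means, c(0.025, 0.975), names = FALSE)
boot_ci
```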

New cards

factorial design

A factorial design is an experimental design where researchers study the effects of two or more independent variables simultaneously.

Example:

A researcher wants to see how sleep and caffeine affect test scores.

Factor 1: Sleep
- 4 hours
- 8 hours
Factor 2: Caffeine
- No caffeine
- Coffee
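
The treatment combinations in this example can be listed with expand.grid(), which crosses every level of one factor with every level of the other. The column names are chosen for illustration.

```r
design <- expand.grid(sleep = c("4 hours", "8 hours"),
                      caffeine = c("No caffeine", "Coffee"))
design          # 2 x 2 = 4 treatment combinations
nrow(design)    # 4

# With measured test scores, a model such as score ~ sleep * caffeine
# would estimate both main effects and their interaction.
```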


New cards

Blocking

Blocking = grouping experimental subjects based on a characteristic that could affect the response variable, then randomly assigning treatments within each group.
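
A minimal sketch of blocked random assignment in base R. The subjects, the age blocks, and the treatment labels are hypothetical.

```r
set.seed(3)
subjects <- data.frame(id = 1:8,
                       block = rep(c("young", "old"), each = 4))

# Within each block, shuffle an equal number of each treatment label.
subjects$treatment <- NA_character_
for (b in unique(subjects$block)) {
  rows <- which(subjects$block == b)
  labels <- rep(c("control", "drug"), length.out = length(rows))
  subjects$treatment[rows] <- sample(labels)
}

table(subjects$block, subjects$treatment)   # 2 of each treatment per block
```

Because assignment is randomized separately inside each block, the blocking characteristic is balanced across treatments.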