biostats - unit 1

43 Terms

New cards

console

stat output and successful code show up here

New cards

script (.rmd)

type code here

New cards

viewer

graphs and help files show up here

New cards

environment

what I have told R to keep for this session (objects, variables, models, etc. are stored here)

New cards

vector

ordered set of values of the same type (here, numbers)

creating a vector with 6 numbers: c(96, 95, 99, 97, 98, 96)

assign to a variable by: tempF <- c(96, 95, 99, 97, 98, 96)
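
A minimal sketch of working with the vector above. The comment about degrees Fahrenheit is an assumption based only on the name tempF; the helper names n_values and first_value are made up for illustration.

```r
# Values from the card, stored as a numeric vector
# (assumed to be temperatures in degrees F, judging by the name)
tempF <- c(96, 95, 99, 97, 98, 96)

n_values <- length(tempF)  # how many values the vector holds
first_value <- tempF[1]    # R indexes from 1, so this is the first element

print(n_values)
print(first_value)
```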

New cards

goal of biostatistics

focuses on collecting, describing, and drawing conclusions from data to learn more about the world

uses estimation to infer an unknown quantity of a population from sample data

New cards

data

cumulative measurements of individual biological entities

New cards

population

all individuals or observations in the world (group we are trying to gain info about)

New cards

sample

subset of measurements we collect and analyze to learn about the population (what we measure to draw inferences)

New cards

parameters

describe populations (true value)

fixed and constant
- represented by Greek letters (e.g., mu for the population mean)

New cards

estimate/statistic

taken from samples

random, change with each sample
- represented by Latin letters (e.g., capital Y with a bar on top for the sample mean)

New cards

descriptive statistics

implies that a statistic is an estimate of a parameter

New cards

weighted mean

assigns a different importance (weight) to each data point so the mean reflects each point's relative importance in the overall calculation
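
A short sketch of a weighted mean, computed by hand and with base R's weighted.mean(). The scores and weights are hypothetical values chosen for illustration, not data from these notes.

```r
# Hypothetical scores and weights (not from the notes)
scores  <- c(80, 90, 70)
weights <- c(0.2, 0.5, 0.3)

# Weighted mean = sum(weight * value) / sum(weights)
by_hand  <- sum(weights * scores) / sum(weights)

# Base R does the same calculation
built_in <- weighted.mean(scores, weights)

print(by_hand)
print(built_in)
```

The ordinary mean(scores) treats all three values equally; here the middle score counts the most because it carries the largest weight.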

New cards

statistics of location/central tendency

mean, median, mode

New cards

mean vs median

mean = mathematical center of gravity, median = average individual (middle measurement)
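
A small sketch of how the two differ when one extreme value is present. The numbers are hypothetical.

```r
# Hypothetical measurements, then the same data with one extreme value
y         <- c(2, 3, 3, 4, 5)
y_outlier <- c(2, 3, 3, 4, 50)

mean_y   <- mean(y)
median_y <- median(y)

mean_out   <- mean(y_outlier)    # pulled toward the extreme value
median_out <- median(y_outlier)  # middle measurement is unchanged

print(c(mean_y, median_y))
print(c(mean_out, median_out))
```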

New cards

statistics of dispersion

range, IQR, variance, standard deviation, coefficient of variation

New cards

variance

expected squared difference between observations and the mean

estimate = s²
- denominator is n-1 for degrees of freedom
parameter = lowercase sigma²
- denominator is just N
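
A sketch of the two denominators, using the tempF values from the vector card. Treating those six values as a whole population in the second calculation is only for illustration.

```r
tempF <- c(96, 95, 99, 97, 98, 96)
n <- length(tempF)

# Sum of squared differences between each observation and the mean
ss <- sum((tempF - mean(tempF))^2)

s2_estimate  <- ss / (n - 1)  # sample estimate: denominator n - 1
s2_builtin   <- var(tempF)    # base R var() also divides by n - 1
sigma2_style <- ss / n        # parameter formula: denominator N

print(c(s2_estimate, s2_builtin, sigma2_style))
```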

New cards

standard deviation

measure of dispersion that weighs each item by distance from mean

estimate = s (sqrt of sample variance)
parameter = lowercase sigma (sqrt of population variance)

New cards

coefficient of variation

standard deviation expressed as percent of mean

CV = (s / Ȳ) x 100, where s = sample standard deviation and Ȳ = sample mean
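
A sketch of the CV formula as a small function. The function name cv and the height values are hypothetical; the example shows that changing units changes the standard deviation but not the CV.

```r
# CV = standard deviation as a percent of the mean
cv <- function(y) sd(y) / mean(y) * 100

# Hypothetical heights, recorded in two different units
heights_cm <- c(150, 160, 170, 180)
heights_m  <- heights_cm / 100

sd_cm <- sd(heights_cm)
sd_m  <- sd(heights_m)   # 100 times smaller than sd_cm

cv_cm <- cv(heights_cm)
cv_m  <- cv(heights_m)   # same as cv_cm

print(c(sd_cm, sd_m))
print(c(cv_cm, cv_m))
```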

New cards

when to use variance

understand how much values in a dataset vary from each other

New cards

when to use standard deviation

understand how data points typically deviate from the mean

New cards

when to use coefficient of variation

used to compare variation between datasets that have different means or units

New cards

inference

requires independent and random sampling

New cards

sampling bias

systematic difference between estimates and parameters

samples aren’t representative of population

New cards

sampling error

undirected deviation of estimates away from parameters

influenced by chance and differs among samples from the same population
- decreases with increasing sample size

New cards

sampling distribution

probability distribution of all possible values of an estimate (e.g., all possible sample means) for samples of a given size

sampling is a random process
each sample from a population provides one estimate of the parameter

New cards

sampling error

difference between parameter and estimate

New cards

key features of sampling distribution

approximately normally distributed (for sample means, when the sample size is reasonably large)
mean of sampling distribution = true mean of population
spread depends on sample size used for estimate
- as sample size increases, parameter estimates become more precise and spread of sampling distribution decreases
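
A simulation sketch of these features. The population here is hypothetical (simulated normal values), and the helper sample_means and the choices of 2000 repeats and sample sizes 10 and 100 are arbitrary assumptions.

```r
set.seed(1)  # make the simulation reproducible

# Hypothetical population: 10,000 values with mean near 50 and SD near 10
population <- rnorm(10000, mean = 50, sd = 10)

# Draw many samples of size n and record the mean of each one
sample_means <- function(n, reps = 2000) {
  replicate(reps, mean(sample(population, n)))
}

means_n10  <- sample_means(10)
means_n100 <- sample_means(100)

# Center of each sampling distribution next to the population mean
print(c(mean(population), mean(means_n10), mean(means_n100)))

# Spread of the sampling distribution shrinks as sample size grows
print(c(sd(means_n10), sd(means_n100)))
```

Plotting hist(means_n10) and hist(means_n100) shows both centered near the population mean, with the n = 100 histogram much narrower.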

New cards

standard error

standard deviation of an estimate’s sampling distribution

New cards

key features of standard error

decreases with increasing sample size
tells us how close the sample mean is likely to be to the true mean (population)
tells us how unusual a sample is
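
A sketch of the usual estimate of the standard error of the mean, s / sqrt(n). That formula is not written on the card, and the function name standard_error and the data are hypothetical.

```r
# Standard error of the mean, estimated from one sample: s / sqrt(n)
standard_error <- function(y) sd(y) / sqrt(length(y))

# Hypothetical sample
y <- c(2, 4, 4, 6)

se_y <- standard_error(y)
print(se_y)

# Same values repeated four times: similar spread, but n is 4x larger,
# so the standard error is smaller
se_bigger_n <- standard_error(rep(y, 4))
print(se_bigger_n)
```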

New cards

standard error vs standard deviation

SD describes how individuals in a sample differ from sample mean

SE describes how far the sample mean is expected to be from the true mean

New cards

95% CI

range of values around the sample mean that we are 95% confident contains the true population parameter (roughly sample mean ± 2 x SE)
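
A sketch of the 2 x SE rule of thumb, using the tempF values from the vector card. The multiplier 2 is an approximation, and the standard error is estimated as s / sqrt(n), which is an assumption not written on the card.

```r
tempF <- c(96, 95, 99, 97, 98, 96)

y_bar <- mean(tempF)
se    <- sd(tempF) / sqrt(length(tempF))  # estimated standard error of the mean

# Approximate 95% CI: sample mean plus or minus 2 standard errors
ci_lower <- y_bar - 2 * se
ci_upper <- y_bar + 2 * se

print(c(ci_lower, y_bar, ci_upper))
```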

New cards

addition rule for probability

if two events are mutually exclusive, then Pr[A or B] = Pr[A] + Pr[B]
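
A sketch of the addition rule with one roll of a fair six-sided die (a hypothetical example). Rolling a 1 and rolling a 2 cannot both happen on the same roll, so they are mutually exclusive.

```r
# All equally likely outcomes of one roll of a fair die
die <- 1:6

p_one    <- mean(die == 1)             # Pr[A]
p_two    <- mean(die == 2)             # Pr[B]
p_either <- mean(die == 1 | die == 2)  # Pr[A or B], counted directly

print(p_either)
print(p_one + p_two)  # addition rule gives the same value
```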

New cards

multiplication rule for probability

if two events are independent, then Pr[A and B] = Pr[A] x Pr[B]
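
A sketch of the multiplication rule with two rolls of a fair die (a hypothetical example). The first roll does not affect the second, so the two events are independent.

```r
# All 36 equally likely outcomes of two rolls of a fair die
rolls <- expand.grid(first = 1:6, second = 1:6)

p_first_six  <- mean(rolls$first == 6)                       # Pr[A]
p_second_six <- mean(rolls$second == 6)                      # Pr[B]
p_both_six   <- mean(rolls$first == 6 & rolls$second == 6)   # Pr[A and B], counted directly

print(p_both_six)
print(p_first_six * p_second_six)  # multiplication rule gives the same value
```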

New cards

Bayes' theorem

used to compute a conditional probability: Pr[A | B] = Pr[B | A] x Pr[A] / Pr[B]
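
A worked sketch of Bayes' theorem for a diagnostic test. All three input probabilities are hypothetical numbers made up for illustration.

```r
# Hypothetical inputs (not from the notes)
p_disease           <- 0.01  # Pr[disease]
p_pos_given_disease <- 0.90  # Pr[positive | disease]
p_pos_given_healthy <- 0.05  # Pr[positive | no disease]

# Overall probability of a positive test, Pr[positive]
p_pos <- p_pos_given_disease * p_disease +
  p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: Pr[disease | positive] = Pr[positive | disease] * Pr[disease] / Pr[positive]
p_disease_given_pos <- p_pos_given_disease * p_disease / p_pos

print(p_disease_given_pos)
```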

New cards

variable

any characteristic that varies from one biological entity to another

New cards

categorical variables

qualitative characteristics of individuals that do not have a numerical magnitude

nominal = unordered descriptions
ordinal = ordered descriptions (e.g., rating how you are feeling on a 1-5 scale)
binary = 2 mutually exclusive outcomes (yes or no)

New cards

numerical variables

quantitative measurements of individuals that have a numerical magnitude

continuous = measured data, can have infinite values within possible range (decimals, fractions, etc.)
discrete = observations can only exist at limited values, often counts (whole numbers)

New cards

tidy data

standard way of mapping the meaning of a dataset to its structure

variable = column
observation = row
cell = single measurement
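
A sketch of a tidy table as an R data frame. The column names and values are hypothetical.

```r
# Each column is a variable, each row is one observation (one individual),
# and each cell holds a single measurement
frogs <- data.frame(
  id      = c(1, 2, 3),
  species = c("A", "A", "B"),
  mass_g  = c(21.5, 19.8, 30.2)
)

print(frogs)

n_observations <- nrow(frogs)  # rows
n_variables    <- ncol(frogs)  # columns
```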

New cards

visualizing one-variable categorical data

frequency table = counts of the number of occurrences of each category
bar graph = displays the frequency distribution of categorical data
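
A sketch using base R's table() and barplot(). The blood type values are hypothetical.

```r
# Hypothetical categorical data
blood_type <- c("A", "B", "A", "O", "A", "B")

# Frequency table: number of occurrences of each category
counts <- table(blood_type)
print(counts)

# Bar graph of the frequency table (appears in the viewer/plots pane)
barplot(counts, xlab = "blood type", ylab = "frequency")
```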

New cards

visualizing one-variable numerical data

goal = measure center, spread, and shape of data

frequency table = counts the number of occurrences within set bins
histogram = displays the frequency distributions of numerical data
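
A sketch using base R's cut(), table(), and hist() on the tempF values from the vector card. The bin edges (94, 96, 98, 100) are an arbitrary choice for illustration.

```r
tempF <- c(96, 95, 99, 97, 98, 96)

# Bin edges chosen for illustration
bin_edges <- seq(94, 100, by = 2)

# Frequency table: number of values falling in each bin
bin_counts <- table(cut(tempF, breaks = bin_edges))
print(bin_counts)

# Histogram with the same bins; the returned object stores the counts
h <- hist(tempF, breaks = bin_edges, xlab = "tempF", main = "")
print(h$counts)
```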

New cards

frequency distribution vs probability distribution

frequency distribution = describes the number of times a value occurs in a sample

probability distribution = describes the proportion of the population with a value

New cards

when is an estimate considered to be unbiased

when, across infinitely many repeated samples, the average of the estimates equals the true parameter value
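
A simulation sketch of this idea, comparing the variance estimate with denominator n - 1 against the same calculation with denominator n. The population (normal, true variance 4), the sample size of 5, and the 5000 repeats are all assumptions chosen for illustration, and 5000 repeats only approximates "infinitely many".

```r
set.seed(42)  # make the simulation reproducible

n    <- 5
reps <- 5000

# Each column is one sample of size n from a population with true variance 2^2 = 4
samples <- replicate(reps, rnorm(n, mean = 0, sd = 2))

s2_n_minus_1 <- apply(samples, 2, var)       # denominator n - 1
s2_n         <- s2_n_minus_1 * (n - 1) / n   # same sums of squares, denominator n

# The average of the n - 1 estimates lands near the true value 4;
# the average of the n versions falls below it
print(mean(s2_n_minus_1))
print(mean(s2_n))
```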