biostats - unit 1

43 Terms

1

console

stat output and successful code show up here

2

script (.rmd)

type code here

3

viewer

graphs and help files show up here

4

environment

what I have told R to keep for this session (objects, variables, models, etc. are stored here)

5

vector

set of numbers

creating a vector with 6 numbers: c(96, 95, 99, 97, 98, 96)

assign to a variable by: tempF <- c(96, 95, 99, 97, 98, 96)
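
A minimal sketch of this card in R. The follow-up calls to length() and mean() are base R functions added here for illustration and are not part of the original card.

```r
# Build a numeric vector with c() and store it under the name tempF
tempF <- c(96, 95, 99, 97, 98, 96)

# length() counts the elements; mean() averages them
n_obs    <- length(tempF)
avg_temp <- mean(tempF)
```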

6

goal of biostatistics

focuses on collecting, describing, and drawing conclusions from data to learn more about the world

uses estimation to infer an unknown of a population using sample data

7

data

cumulative measurements of individual biological entities

8

population

all individuals or observations in the world (group we are trying to gain info about)

9

sample

subset of measurements we collect and analyze to learn about the population (what we measure to draw inferences)

10

parameters

describe populations (true value)

  • fixed and constant

    • usually represented by Greek letters (e.g., μ for the population mean)

11

estimate/statistic

taken from samples

  • random, change with each sample

    • usually represented by Latin letters (e.g., Ȳ, capital Y with a bar on top, for the sample mean)

12

descriptive statistics

implies that a statistic is an estimate of a parameter

13

weighted mean

assigns a different importance (weight) to each data point so the mean reflects each point's relative importance in the overall calculation
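
A small sketch of a weighted mean in R, assuming made-up site means weighted by sample size. The variable names and values are hypothetical; weighted.mean() is a base R function.

```r
# Hypothetical example: mean body mass at three sites with unequal sample sizes
site_means <- c(10, 20, 30)   # mean per site (made-up values)
site_n     <- c(5, 3, 2)      # weights: number of individuals per site

# Weighted mean by hand: sum(w * x) / sum(w)
wm_manual <- sum(site_n * site_means) / sum(site_n)

# Same calculation with base R's weighted.mean()
wm_builtin <- weighted.mean(site_means, w = site_n)
```

The site with the most individuals pulls the result toward its own mean, so the weighted mean here is lower than the unweighted mean of the three site values.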

14

statistics of location/central tendency

mean, median, mode

15

mean vs median

mean = mathematical center of gravity, median = typical individual (middle measurement)
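
One way to see the difference, sketched in R with made-up numbers: a single extreme value drags the mean but leaves the median where it was.

```r
# Made-up measurements with one extreme value
y <- c(2, 3, 3, 4, 100)

mean_y   <- mean(y)     # pulled toward the extreme value
median_y <- median(y)   # middle measurement of the sorted data
```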

16

statistics of dispersion

range, IQR, variance, standard deviation, coefficient of variation

17

variance

expected squared difference between observations and the mean

  • estimate = s²

    • denominator is n-1 for degrees of freedom

  • parameter = σ² (lowercase sigma squared)

    • denominator is just N
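
The two denominators can be compared directly in R. This is a sketch that reuses the temperature values from the vector card; treating them as a whole population in the last step is an assumption made only for illustration.

```r
y <- c(96, 95, 99, 97, 98, 96)
n <- length(y)

# Sample variance s^2: sum of squared deviations divided by n - 1
s2_manual <- sum((y - mean(y))^2) / (n - 1)

# var() in base R uses the same n - 1 denominator
s2_builtin <- var(y)

# Dividing by N instead treats the data as the entire population
sigma2 <- sum((y - mean(y))^2) / n
```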

18

standard deviation

measure of dispersion that weighs each item by distance from mean

  • estimate = s (sqrt of sample variance)

  • parameter = lowercase sigma (sqrt of population variance)

19

coefficient of variation

standard deviation expressed as percent of mean

  • CV = (s / Ȳ) x 100, where s = sample standard deviation and Ȳ = sample mean
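
A short R sketch of the CV formula. The helper cv() is defined here and is not a base R function; the masses are made-up values.

```r
# Hypothetical helper: standard deviation as a percent of the mean
cv <- function(y) sd(y) / mean(y) * 100

mass_kg <- c(2.1, 2.5, 1.9, 2.3)   # made-up measurements in kilograms
mass_g  <- mass_kg * 1000          # same measurements in grams

cv_kg <- cv(mass_kg)
cv_g  <- cv(mass_g)   # matches cv_kg: the units cancel out
```

Because the units cancel, the CV stays the same when the measurement scale changes, which is why it is handy for comparing variation across datasets.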

20

when to use variance

understand how much values in a dataset vary from each other

21

when to use standard deviation

understand how data points typically deviate from the mean

22

when to use coefficient of variation

used to compare relative variation between datasets, especially when their means or units differ

23

inference

requires independent and random sampling

24

sampling bias

systematic difference between estimates and parameters

  • samples aren’t representative of population

25

sampling error

undirected deviation of estimates away from parameters

  • influenced by chance and differs among samples from the same population

    • decreases with increasing sample size

26

sampling distribution

probability distribution of all possible sample means

  • random process

  • sample from a population provides an estimate of the parameter

27

sampling error

difference between parameter and estimate

28

key features of sampling distribution

  • approximately normally distributed (for sample means, when the sample size is large enough)

  • mean of sampling distribution = true mean of population

  • spread depends on sample size used for estimate

    • as sample size increases, parameter estimates become more precise and spread of sampling distribution decreases
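
These features can be checked with a small simulation. The sketch below assumes a hypothetical normal population with mean 50 and SD 10; the function name draw_means and all numbers are made up for illustration.

```r
set.seed(1)   # fixed seed so the simulation is repeatable

# Draw `reps` samples of size n from the hypothetical population
# and keep the mean of each sample
draw_means <- function(n, reps = 2000) {
  replicate(reps, mean(rnorm(n, mean = 50, sd = 10)))
}

means_n5  <- draw_means(5)
means_n50 <- draw_means(50)

center_n50 <- mean(means_n50)   # should land close to the true mean of 50
spread_n5  <- sd(means_n5)      # spread of sample means when n = 5
spread_n50 <- sd(means_n50)     # spread of sample means when n = 50
```

Calling hist(means_n5) and hist(means_n50) in the console would show both distributions centered near 50, with the n = 50 histogram noticeably narrower.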

29

standard error

standard deviation of an estimate’s sampling distribution

30

key features of standard error

  • decreases with increasing sample size

  • tells us how close the sample mean is likely to be to the true mean (population)

  • tells us how unusual a sample is

31

standard error vs standard deviation

SD describes how individuals in a sample differ from sample mean

SE describes how far the sample mean is likely to be from the true mean
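
A sketch of both quantities in R, reusing the temperature values from the vector card. The formula SE = s / sqrt(n) for the standard error of the mean is the standard one but is not stated on the cards.

```r
y <- c(96, 95, 99, 97, 98, 96)

sd_y <- sd(y)                    # spread of individuals around the sample mean
se_y <- sd_y / sqrt(length(y))   # estimated standard error of the mean
```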

32

95% CI

range of values around the sample mean that we are 95% confident contains the true population parameter (roughly sample mean ± 2 x SE)
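
A rough sketch of the 2 x SE rule in R, using the temperature values from the vector card. It assumes SE = s / sqrt(n); with a sample this small, an interval based on the t distribution would be somewhat wider.

```r
y <- c(96, 95, 99, 97, 98, 96)

y_bar <- mean(y)
se    <- sd(y) / sqrt(length(y))   # estimated standard error of the mean

# Approximate 95% confidence interval: sample mean plus or minus 2 SE
ci_lower <- y_bar - 2 * se
ci_upper <- y_bar + 2 * se
```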

33

addition rule for probability

if two events are mutually exclusive, then Pr[A or B] = Pr[A] + Pr[B]
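
A quick check of the addition rule in R, assuming a fair six-sided die (an illustrative example, not from the cards). Rolling a 1 and rolling a 2 cannot both happen on one roll, so the events are mutually exclusive.

```r
die <- 1:6   # equally likely outcomes of one roll

p_one <- mean(die == 1)   # Pr[roll a 1]
p_two <- mean(die == 2)   # Pr[roll a 2]

p_sum    <- p_one + p_two               # addition rule
p_direct <- mean(die == 1 | die == 2)   # counted directly from the outcomes
```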

34

multiplication rule for probability

if two events are independent, then Pr[A and B] = Pr[A] x Pr[B]
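
A quick check of the multiplication rule in R, assuming two fair dice rolled independently (an illustrative example, not from the cards).

```r
# All 36 equally likely (first, second) pairs
rolls <- expand.grid(first = 1:6, second = 1:6)

p_first_six  <- mean(rolls$first == 6)    # Pr[first die is 6]
p_second_six <- mean(rolls$second == 6)   # Pr[second die is 6]

p_product <- p_first_six * p_second_six                 # multiplication rule
p_both    <- mean(rolls$first == 6 & rolls$second == 6) # counted directly
```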

35

Bayes Theorem

use for conditional probability
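
The card does not give the formula; the standard statement is Pr[A | B] = Pr[B | A] x Pr[A] / Pr[B]. Below is a sketch in R with made-up numbers for a hypothetical diagnostic test.

```r
# All values below are hypothetical
p_disease         <- 0.01   # Pr[disease]
p_pos_if_disease  <- 0.90   # Pr[positive | disease]
p_pos_if_healthy  <- 0.05   # Pr[positive | no disease]

# Total probability of a positive test
p_pos <- p_pos_if_disease * p_disease +
         p_pos_if_healthy * (1 - p_disease)

# Bayes' theorem: Pr[disease | positive]
p_disease_if_pos <- p_pos_if_disease * p_disease / p_pos
```

With these numbers the probability of disease given a positive test stays well below the test's 90% detection rate, because the disease is rare.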

36

variable

any characteristic that varies from one biological entity to another

37

categorical variables

qualitative characteristics of individuals that do not have a numerical magnitude

  • nominal = unordered descriptions

  • ordinal = ordered descriptions (1-5 how are you feeling, etc.)

  • binary = 2 mutually exclusive outcomes (yes or no)

38

numerical variables

quantitative measurements of individuals that have a numerical magnitude

  • continuous = measured data, can take infinitely many values within the possible range (decimals, fractions, etc.)

  • discrete = observations can only exist at limited values, often counts (whole numbers)

39

tidy data

standard way of mapping the meaning of a dataset to its structure

  • variable = column

  • observation = row

  • cell = single measurement
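
A tidy layout sketched as an R data frame. The data set and its column names are made up for illustration.

```r
# One row per individual (observation), one column per variable
birds <- data.frame(
  id      = c(1, 2, 3),
  species = c("finch", "finch", "sparrow"),
  mass_g  = c(14.2, 15.1, 24.8)
)

n_obs    <- nrow(birds)       # number of observations (rows)
n_vars   <- ncol(birds)       # number of variables (columns)
one_cell <- birds$mass_g[2]   # a single measurement (one cell)
```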

40

visualizing one variable categorical data

  • frequency table = counts of the number of occurrences of each category

  • bar graph = frequency distributions of categorical data
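
Both displays can be sketched in R with made-up colour data. table() and barplot() are base R functions; the barplot() call is left as a comment so the example runs without opening a plot window.

```r
colour <- c("red", "blue", "red", "green", "red", "blue")   # made-up categories

# Frequency table: number of occurrences of each category
freq <- table(colour)

# Bar graph of the same counts (uncomment to draw it in the viewer)
# barplot(freq)
```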

41

one variable numerical data

goal = measure center, spread, and shape of data

  • frequency table = counts the number of occurrences within set bins

  • histogram = displays the frequency distributions of numerical data
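
A sketch of binning numerical data in R, reusing the temperature values from the vector card. The bin edges are an arbitrary choice made for illustration; plot = FALSE returns the histogram counts without drawing.

```r
y     <- c(96, 95, 99, 97, 98, 96)
edges <- c(94, 96, 98, 100)   # bins: (94,96], (96,98], (98,100]

# Frequency table: count the observations falling in each bin
freq <- table(cut(y, breaks = edges))

# hist() does the same binning; drop plot = FALSE to draw it in the viewer
h <- hist(y, breaks = edges, plot = FALSE)
```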

42

frequency distribution vs probability distribution

frequency distribution = describes the number of times a value occurs in a sample

probability distribution = describes the proportion of the population with a value

43

when is an estimate considered to be unbiased

when, across infinitely many repeated samples, the average of the estimates equals the true parameter value