general null hypothesis definition
a statement that the value of a population parameter is equal to some value
general alternative hypothesis definition
a statement that the parameter has a value that somehow differs from the null hypothesis
general null hypothesis symbolic notation
H0: mean = expected value
general alternative hypothesis symbolic notation
HA: mean ≠ expected value
steps of a hypothesis test
1. establish null and alt. hypotheses
2. select a significance level
3. select an appropriate statistical test
4. collect sample and summarize the data into a test statistic
5. decide whether the result is statistically significant based on critical and/or p-value
alpha level definition
-aka significance level
-the probability value used as the cutoff for determining when the evidence is significant against the null hypothesis
-the probability of mistakenly rejecting the null hypothesis when it is true (type 1 error)
-alpha = 1 - confidence level
-commonly 0.01, 0.05, and 0.10
test statistic definition
-a value used in making a decision about the null hypothesis
-found by converting a sample statistic to a score with the assumption that the null hypothesis is true
-different for each statistical test used
-calculated by technology when running the particular statistical test
p-value definition and usage
-tells you how likely it is that the test statistic could have occurred under the null hypothesis
-the probability of getting a value of the test statistic that is at least as extreme as the actual calculated test statistic, assuming that the null hypothesis is true
-used to decide if a test is statistically significant
-p-value > alpha: fail to reject null hypothesis
-p-value < alpha: reject the null hypothesis
one-sample t-test in R
-t.test(x, mu, alternative)
-x = sample values
-mu = known population mean or value of comparison
-alternative = "less", "greater", or "two.sided"
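A minimal sketch of the call, using made-up sample values and a hypothetical comparison value of 50:
scores <- c(48, 52, 51, 49, 53, 47, 50, 54)          # hypothetical sample values
t.test(scores, mu = 50, alternative = "two.sided")   # returns t, df, and the p-value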
independent t-test assumptions, usage, and distinguishing features
-compares the mean scores of 2 different GROUPS of people or conditions
-independent groups- the measurement of an individual in one group is unrelated to measurements in the second group
-observations are independent and randomly selected
-populations are normally distributed or n > 30
-test for normality using quantile-quantile plot of dependent variable
independent t-test in R
-t.test(y ~ x, alternative) or t.test(group1, group2, alternative)
-y = dependent variable (the measured values)
-x = group (category with 2 levels)
-alternative = "less", "greater", or "two.sided"
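A minimal sketch using the tilde (formula) form, with a hypothetical data frame dat (score = measured values, group = 2 categories):
dat <- data.frame(score = c(5.1, 6.2, 5.8, 4.0, 4.5, 4.2),
                  group = c("A", "A", "A", "B", "B", "B"))   # hypothetical data
t.test(dat$score ~ dat$group, alternative = "two.sided")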
paired t-test assumptions, usage, and distinguishing features
-compares the mean scores for the same group of people on 2 different OCCASIONS
-simple random sampling is used
-the 2 groups of data are dependent
-the differences of individuals between 2 observations approximately follow a normal distribution or n > 30
-test for normality using quantile-quantile plot of difference between groups
paired t-test in R
-t.test(after, before, alternative, paired)
-after = dataset$after
-before = dataset$before
-alternative = "less", "greater", or "two.sided"
-paired = TRUE
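A minimal sketch, assuming a hypothetical data frame dataset with before and after measurements on the same individuals:
dataset <- data.frame(before = c(70, 68, 75, 80), after = c(74, 70, 78, 85))   # hypothetical paired data
t.test(dataset$after, dataset$before, alternative = "greater", paired = TRUE)
# the difference tested is after - before, in the order the samples are entered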
generic independent t-test null and alternative hypotheses
-H0: mean(difference) = 0
-HA: mean(difference) < or > 0
generic paired t-test null and alternative hypotheses
-H0: mean(difference) = 0
-HA: mean(difference) < or > 0
what is the difference in using a comma or a tilde to separate the data sample inputs in R
-comma: the difference is computed as x - y, in the order the samples are entered
-tilde: groups are taken in alphabetical (factor level) order, so the difference is first level minus second level
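A small sketch of both forms on hypothetical vectors (treatment and control are made-up names):
treatment <- c(5.1, 6.2, 5.8); control <- c(4.0, 4.5, 4.2)   # hypothetical values
t.test(treatment, control)        # comma: difference is treatment - control (order entered)
scores <- c(treatment, control)
group  <- c(rep("treatment", 3), rep("control", 3))
t.test(scores ~ group)            # tilde: groups taken in alphabetical order, so control - treatment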
chi-squared goodness of fit test variables, purpose, and requirements
-counts of categorical variables (frequencies)
-frequency tables (2 columns)
-used to test the hypothesis that an observed frequency distribution fits some expected distribution
-the sample data consists of frequency counts for each of the different categories
-the data are randomly selected
-for each category, the expected frequency is at least 5
chi-squared tests of independence variables, purpose, and requirements
-counts of categorical variables (frequencies)
-contingency tables (3+ columns) which consist of frequency counts of categorical data corresponding to 2 different variables
-test whether or not the row and column variables are independent
-the sample data are randomly selected
-the sample data are represented as frequency counts in a two-way table
-for every cell in the contingency table, the expected frequency (E) is at least 5
chi-squared distribution basic features
-continuous distribution
-minimum value of 0
-right-skewed
-shape determined by degrees of freedom (# of categories - 1)
-mean is the same as the degrees of freedom
generic chi-squared goodness of fit null and alternative hypotheses
-H0: the frequency counts agree with the expected distribution
-HA: the frequency counts do not agree with the expected distribution
generic chi-squared tests of independence null and alternative hypotheses
-H0: the row and column variables are independent (e.g., success or failure is independent of treatment type)
-HA: the row and column variables are dependent (e.g., success or failure depends on treatment type)
chi-squared goodness of fit test in R
-chisq.test(x, p)
-x = a vector with the observed values
-p (optional) = a vector of probabilities, if the expectation is NOT that all probabilities are equal
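A minimal sketch with hypothetical observed counts for four categories:
observed <- c(20, 30, 25, 25)                      # hypothetical frequency counts
chisq.test(observed)                               # all categories expected to be equally likely
chisq.test(observed, p = c(0.4, 0.3, 0.2, 0.1))    # expected proportions supplied explicitly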
making a contingency table in R
-matrix(x, ncol)
-x = a vector with the observed values, entered one column at a time, starting with the left-most column
-ncol = number of columns
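A small sketch building a 2 x 2 table from hypothetical counts, entered one column at a time:
counts <- matrix(c(30, 10, 20, 20), ncol = 2)      # column 1 = treatment A, column 2 = treatment B
rownames(counts) <- c("success", "failure")        # labels are optional, added for readability
colnames(counts) <- c("treatmentA", "treatmentB")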
chi-squared tests of independence in R
-chisq.test(x)
-x = a matrix with the observed values
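Continuing the sketch above, the matrix of observed counts is passed straight to chisq.test:
result <- chisq.test(counts)   # counts is the matrix built above
result$expected                # expected frequencies; check that each is at least 5
result$p.value                 # compare to alpha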
calculating expected values for a chi-squared test of independence
E = (row total * column total) / (grand total)
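-for example (made-up numbers): row total = 40, column total = 50, grand total = 200, so E = (40 * 50) / 200 = 10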
what are the main 2 problems of performing multiple t-tests?
-time consuming
-increase false positive probability
generic ANOVA null and alternative hypotheses
-H0: the means of all groups are equal
-HA: at least one group's mean is different
changes in variance effect on the F-statistic
-F = (variance between groups) / (variance within the groups)
-F = (n * s^2 of the group means) / (s^2 pooled)
-greater variance between groups increases F
-greater variance within the groups decreases F
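A rough sketch of the idea with hypothetical equal-sized groups, computing F by hand from the two variance estimates:
g1 <- c(4, 5, 6); g2 <- c(7, 8, 9); g3 <- c(5, 6, 7)   # hypothetical groups, n = 3 each
n <- 3
between <- n * var(c(mean(g1), mean(g2), mean(g3)))    # n * variance of the group means
within  <- mean(c(var(g1), var(g2), var(g3)))          # pooled within-group variance (equal n)
between / within                                       # the F statistic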
ANOVA assumptions
-random and independent observations
-assumption of normality or n > 30
-test normality using quantile-quantile plots
-assumption of homogeneity of variance
-test homogeneity of variance using Bartlett's test
Bartlett's test in R
-bartlett.test(y ~ x)
-y = the dependent variable (numbers)
-x = the independent variable (category names)
-P < 0.05 indicates that the groups do NOT have the same variance
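A minimal sketch, assuming a hypothetical data frame dat with a numeric score column and a categorical group column:
dat <- data.frame(score = c(4, 5, 6, 7, 8, 9, 5, 6, 7),
                  group = factor(rep(c("A", "B", "C"), each = 3)))   # hypothetical data
bartlett.test(dat$score ~ dat$group)   # p < 0.05 suggests the group variances differ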
ANOVA in R
-Bartlett's test
-aov(y ~ x)
-y = dependent variable (numbers)
-x = independent variable (category names)
-save ANOVA as an object
-summary(ANOVA object)
-Tukey's Honest Significant Difference test
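A minimal sketch of that workflow, continuing with the hypothetical dat data frame from the Bartlett's test sketch above:
fit <- aov(score ~ group, data = dat)   # y ~ x form: score = dependent, group = independent
summary(fit)                            # overall F statistic and p-value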
Tukey's Honest Significant Difference Test in R (post hoc)
-TukeyHSD(ANOVA object)
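Continuing with the ANOVA object saved in the sketch above:
TukeyHSD(fit)   # pairwise comparisons between group means, with adjusted p-values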
Pearson correlation distinguishing features and assumptions
-provides information on the strength and direction (positive or negative) of a linear relationship between 2 variables
-no causation inferred
-effects on variables are symmetric
-data are independent and random
-variables are quantitative
-both variables are normally distributed
-correlation coefficient alone cannot tell us if results are significant
Linear regression distinguishing features and assumptions
-provides an explicit formula for a line that can be used to predict values of one variable based on the other
-provides a robust measure of "fit"
-only accounts for uncertainty in the dependent variable
-often interpreted as one variable causing an effect on the other
-data are random and independent
-residuals are normally distributed
-homogeneity of variance
-check assumptions at the end of test
Pearson correlation coefficients
-a correlation exists between 2 variables when the values of one variable are somehow associated with the values of the other variable
-a linear correlation exists between 2 variables when there is a correlation and the plotted points of paired data result in a pattern that can be approximated by a straight line
-r is always between -1 and 1
-r measures the strength of a linear relationship
-r is sensitive to outliers
generic Pearson correlation test null and alternative hypotheses
-H0: rho = 0
-HA: rho ≠ 0
Pearson correlation test in R
-cor.test(x, y)
--both variables need to be numeric vectors of equal length
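A minimal sketch with two hypothetical numeric vectors of equal length:
x <- c(1.2, 2.3, 3.1, 4.8, 5.0)   # hypothetical measurements
y <- c(2.0, 2.9, 3.8, 5.1, 5.5)
cor.test(x, y)                    # reports r and the p-value for H0: rho = 0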
parameters of a linear regression model
-y = b0 + b1 * x
-b0 = the intercept
-b1 = the slope
residuals are the vertical distances of the data to the value predicted by the line
-linear regression fits a line to the data that minimizes the squares of the residuals
interpret R2 and recognize residuals
-R2 = 1 - [sum of (yi - y^)^2 / sum of (yi - ybar)^2]
-y^ = predicted value
-ybar = mean value
-numerator = sum of squared residuals
-denominator = sum of squared distances from mean
-R2 = (variance explained by line) / (total variance)
-R2 states what percent of variance is explained by a certain variable
generic linear regression null and alternative hypotheses
-H0: b1 = 0 (slope of best fit line is 0)
-HA: b1 ≠ 0 (slope of best fit line is not 0)
linear regression in R
-lm(y ~ x)
-y = dependent variable
-x = independent variable
-save as model
-summary(model) to see p-value
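A minimal sketch, reusing the hypothetical x and y vectors from the correlation sketch above:
model <- lm(y ~ x)   # y = dependent, x = independent
summary(model)       # slope, intercept, R2, and the p-value for the slope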
checking linear regression test assumptions
-check after testing
-plot(model)
-Residual graph: want to see an even cloud across 0
-QQ graph: follow the QQline
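A short sketch of the diagnostic plots for the model fitted above:
plot(model, which = 1)   # residuals vs fitted: want an even cloud around 0
plot(model, which = 2)   # normal Q-Q plot: points should follow the line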
what kind of variables do chi-squared tests use
frequency counts of categories
what kind of variables do linear regression and correlation use
numerical variables compared to each other
what kind of data do paired t-tests, independent t-tests, and ANOVA use
population means compared to each other
what data do the z-test and one-sample t-test use
unknown population mean compared to known population mean or set value