Two population hypothesis testing for independent samples
Testing if population means/proportions are equal by comparing two independent samples
-Assume the difference = 0 (null hypothesis)
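A minimal sketch of this test in Python (not part of the deck), assuming the population standard deviations are known and using made-up sample figures:

```python
# Two-sample z-test for independent means with known population std devs.
# All numbers below are invented for illustration.
from math import sqrt
from scipy.stats import norm

xbar1, xbar2 = 24.5, 22.8      # sample means
sigma1, sigma2 = 3.0, 2.5      # known population standard deviations
n1, n2 = 40, 50                # sample sizes

# Standard error of the difference between two independent means
se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Test statistic under Ho: mu1 - mu2 = 0
z = (xbar1 - xbar2) / se

# Two-tailed p-value
p = 2 * norm.sf(abs(z))
print(f"z = {z:.3f}, p = {p:.4f}")
```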
Two sample hypothesis tests for dependent samples
-There is only one sample, made up of paired observations from two related populations
- We test whether the populations are equal by comparing the paired differences within the sample
- We assume the mean of the differences = 0 (null hypothesis)
- Aka paired samples, repeated measures, related samples
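A minimal sketch of a paired test using SciPy's ttest_rel; the before/after scores are invented:

```python
# Dependent (paired) samples t-test: one sample of paired differences.
from scipy.stats import ttest_rel

before = [72, 68, 75, 80, 66, 71]
after  = [78, 70, 74, 85, 70, 76]

# Ho: the mean of the paired differences = 0
t_stat, p_value = ttest_rel(after, before)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```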
Dependent samples vs independent samples
dependent samples are preferred over independent samples because they reduce the variation in the sampling distribution (aka remove extraneous variation)
Quantitative data (means)
Data that measures 'how much' of an attribute a variable has
Qualitative data (proportions)
Data that answers 'does the variable have the attribute?' with yes or no
3 basic assumptions for 2-sample hypothesis testing
1. The two populations are normally distributed, 2. The two populations are independent, 3. The two populations' standard deviations are known
2 extra assumptions made when test is about proportions (qualitative data)
1. Binomial conditions: Mutually exclusive results, independent trials
2. Large sample: np ≥ 5 and n(1-p) ≥ 5 for each sample
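A minimal sketch of a two-sample z-test for proportions with a pooled estimate; the counts are made up, and the large-sample condition above is checked explicitly:

```python
from math import sqrt
from scipy.stats import norm

x1, n1 = 30, 100   # successes, trials in sample 1
x2, n2 = 45, 120   # successes, trials in sample 2

p1, p2 = x1 / n1, x2 / n2
pc = (x1 + x2) / (n1 + n2)   # pooled proportion under Ho: p1 = p2

# Large-sample check: n*p and n*(1-p) must be >= 5 for each sample
assert min(n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)) >= 5

se = sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * norm.sf(abs(z))
print(f"z = {z:.3f}, p = {p_value:.4f}")
```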
Standard error
Standard deviation of the sampling distribution
Center of the sampling distribution
The population parameter(s)
ANOVA
Analysis of variance: technique to test whether three or more population means are equal (vs. at least one differing)
-The populations must follow a normal distribution
- The populations must have equal standard deviations
- The populations must be independent
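A minimal sketch of a one-way ANOVA with scipy.stats.f_oneway on three made-up treatment groups:

```python
from scipy.stats import f_oneway

group_a = [5, 7, 6, 8, 7]
group_b = [9, 8, 10, 9, 11]
group_c = [4, 6, 5, 5, 6]

# Ho: all three population means are equal
f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```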
One-way ANOVA
ANOVA with only one factor (treatments)
-completely randomized design
Two-way ANOVA
ANOVA with two factors (treatments and blocks)
-randomized block design
-removes extraneous variation by decreasing MSE
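A minimal sketch of a randomized block (two-way) ANOVA using statsmodels' formula API; the data frame values and factor names are invented:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Three treatments observed once in each of four blocks (made-up values)
df = pd.DataFrame({
    "y":         [5, 7, 6, 8, 9, 8, 10, 9, 4, 6, 5, 5],
    "treatment": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "block":     ["b1", "b2", "b3", "b4"] * 3,
})

# Adding the block factor pulls block-to-block variation out of the error term
model = ols("y ~ C(treatment) + C(block)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```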
Treatments/Blocks
The independent populations being examined
SOV
Source of variation
-one-way: treatments, error, total
-two-way: treatments, blocks, error, total
SOS
Sum of squares aka variation
one-way: SST, SSE, SS total
two-way: SST, SSB, SSE, SS total
DF
Degrees of freedom
-one-way:
treatments: k-1
error: n-k
total: n-1
-two-way:
treatments: k-1
blocks: b-1
error: (k-1)(b-1)
total: n-1
*k=total # treatments
*b=total # blocks
*n= total # of observations
SST (treatment variation)
the sum of the squared differences between each treatment mean and the overall mean
-Aka "explained variation" because the variation is explained by the factor
-variation BETWEEN the treatment means
SSE (random variation)
the sum of the squared differences between each observation and its treatment mean
-Aka "unexplained variation" because the variation is due to random error/chance
-variation WITHIN the treatment means
SS total (total variation)
sum of the squared differences between each observation and the overall mean
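A minimal sketch verifying the partition SS total = SST + SSE on the same made-up groups as the one-way example above:

```python
import numpy as np

groups = [np.array([5, 7, 6, 8, 7]),
          np.array([9, 8, 10, 9, 11]),
          np.array([4, 6, 5, 5, 6])]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()

# Treatment variation: squared gaps BETWEEN treatment means and the grand mean
sst = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Random variation: squared gaps WITHIN each treatment
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
# Total variation: squared gaps between every observation and the grand mean
ss_total = ((all_obs - grand_mean) ** 2).sum()

print(sst + sse, ss_total)   # the two quantities agree
```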
Mean Squares
Estimate of variance: a sum of squares divided by its degrees of freedom
F-distribution
used to test whether samples are from populations with equal variances
- Has a family
- Continuous
- Asymptotic
- Positively skewed (unlike z and t, which are symmetric)
-Cannot be negative (unlike z and t)
6 Step Hypothesis Test for ANOVA
1. Null hypothesis
2. Alternative hypothesis
3. Test statistic
4. P-value
5. Decision
6. Answer
F-value (test statistic)
The ratio MST/MSE; indicates whether it is too much greater than 1 to have happened by chance
-If F ≈ 1, fail to reject Ho (the treatment and error mean squares estimate the same variance)
-If F is much greater than 1, reject Ho
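A minimal sketch of turning mean squares into the F statistic; the sums of squares are the (rounded) values from the partition sketch above, with k = 3 treatments and n = 15 observations:

```python
from scipy.stats import f

k, n = 3, 15                 # number of treatments, total observations
sst, sse = 45.73, 13.20      # rounded values from the partition sketch above

df_treat, df_error = k - 1, n - k
mst = sst / df_treat         # mean square for treatments
mse = sse / df_error         # mean square for error

f_stat = mst / mse           # much greater than 1 -> reject Ho
p_value = f.sf(f_stat, df_treat, df_error)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```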
Interaction
The effect of one factor on a response variable differs depending on the value of another factor
Scatter plot
Visual representation of the relationship between two variables (bivariate)
Correlation coefficient (r)
Measure of the strength of the linear relationship between two variables
Elements of correlation coefficient (r)
- Shows direction (positive/negative) and strength (weak/moderate/strong)
- 0 = no linear relationship
- +1 = perfect direct/positive relationship
- -1 = perfect inverse/negative relationship
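A minimal sketch computing r on made-up data with scipy.stats.pearsonr:

```python
from scipy.stats import pearsonr

x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 7]

r, p_value = pearsonr(x, y)
print(f"r = {r:.3f}")   # sign gives direction, magnitude gives strength
```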
Spurious correlations
Relationship between two variables seems to be cause and effect but is actually not
Regression analysis
Technique used to find the equation of the line that best fits the data
Regression equation
Equation that expresses the linear relationship between 2 variables
Least squares method
Uses data to position the line of best fit that minimizes the sum of squares of the vertical distances between the points (actual y values) and the line (predicted y values)
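A minimal sketch of the least squares slope and intercept computed straight from the formulas, on made-up (x, y) pairs:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2, 4, 5, 4, 6, 7], dtype=float)

# Slope b minimizes the sum of squared vertical distances to the line
b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()   # intercept: the line passes through (x-bar, y-bar)

y_hat = a + b * x             # predicted y values on the line
print(f"y-hat = {a:.3f} + {b:.3f} x")
```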
Standard error of the estimate
Measure of the dispersion around the regression line (average distance of the points from the line)
-smaller SE, better prediction
Coefficient of determination (r^2)
Proportion of variation in the dependent variable explained by the variation in the independent variable
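A minimal sketch computing the standard error of the estimate and r^2 from the residuals of the fit above (recomputed here so the snippet stands alone):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2, 4, 5, 4, 6, 7], dtype=float)
b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()
y_hat = a + b * x

n = len(x)
residuals = y - y_hat
see = np.sqrt((residuals ** 2).sum() / (n - 2))   # dispersion around the line
r2 = 1 - (residuals ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(f"SEE = {see:.3f}, r^2 = {r2:.3f}")
```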
Global test
*for multiple regression
Hypothesis test to test the ability of the independent variables to predict the dependent variable
Homoscedasticity
Variation in the residuals (differences between the actual Y and predicted ŷ) is the same for all values of the x variables
Multicollinearity
Exists when independent variables are strongly correlated (close to +1 or -1)
-Effect: inaccurate estimates of the population slopes
-To detect it: check the VIF (variance inflation factor)
-If VIF > 10, that independent variable should be removed from the analysis
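A minimal sketch of a VIF check with statsmodels; x2 is built to be strongly correlated with x1, so its VIF (and x1's) should come out large:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=50)   # nearly a copy of x1
x3 = rng.normal(size=50)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i, name in enumerate(["const", "x1", "x2", "x3"]):
    print(name, variance_inflation_factor(X, i))   # VIF > 10 flags trouble
```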
Adjusted coefficient of determination (adjusted r^2)
Adjusts r^2 for the number of independent variables, since plain r^2 increases whenever more are added
Dummy variables
Variables with only 2 possible outcomes used to represent a qualitative variable in regression analysis
-coded 0 or 1
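A minimal sketch of 0/1 dummy coding with pandas.get_dummies on an invented column:

```python
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "south", "north"]})

# drop_first=True keeps one dummy for the extra category (0 = north, 1 = south)
dummies = pd.get_dummies(df["region"], drop_first=True, dtype=int)
print(dummies)
```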
Interaction
The combined effect of 2+ independent variables on the dependent variable differs from the sum of their separate effects
Stepwise regression
Method used to decide which independent variables to use in a multiple regression equation
-Only independent variables with nonzero regression coefficients are kept (i.e., they must have a relationship with the dependent variable; otherwise they are useless)
forward selection method (stepwise)
Start with zero independent variables and add one at a time to the regression equation
backward elimination method (stepwise)
Start with the entire set of independent variables and eliminate one at a time
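A minimal sketch of the forward selection method: start from zero predictors and add, one at a time, whichever candidate most improves adjusted r^2 (the function and variable names here are made up):

```python
import numpy as np
import statsmodels.api as sm

def forward_select(y, candidates):
    """candidates: dict mapping name -> 1-D array of a potential predictor."""
    chosen, remaining = [], set(candidates)
    best_adj_r2 = -np.inf
    while remaining:
        # Score each remaining candidate by the adjusted r^2 it would yield
        scores = {}
        for name in remaining:
            cols = np.column_stack([candidates[c] for c in chosen + [name]])
            scores[name] = sm.OLS(y, sm.add_constant(cols)).fit().rsquared_adj
        best = max(scores, key=scores.get)
        if scores[best] <= best_adj_r2:   # no candidate helps any more; stop
            break
        best_adj_r2 = scores[best]
        chosen.append(best)
        remaining.remove(best)
    return chosen
```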