BSTAT Conceptual Info CH 11-14


43 Terms

1

Two population hypothesis testing for independent samples

Testing if population means/proportions are equal by comparing two independent samples

-Null hypothesis: the difference between the two population means (or proportions) = 0
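
A minimal sketch of this test for two means, assuming both population standard deviations are known (so a z statistic applies). The function name `two_sample_z` and all numbers are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """Return (z, two-tailed p-value) for H0: mu1 - mu2 = 0."""
    # Standard error of the difference between two independent sample means
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    # The null hypothesis sets the true difference to 0
    z = (mean1 - mean2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical sample results, for illustration only
z, p = two_sample_z(mean1=5.5, mean2=5.3, sigma1=0.4, sigma2=0.3, n1=50, n2=100)
print(round(z, 2), round(p, 4))
```

A small p-value here would lead to rejecting the null hypothesis that the two population means are equal.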

2

Two sample hypothesis tests for dependent samples

- There is effectively one sample of paired observations: each subject (or matched pair) is measured twice, so the two sets of measurements are related rather than independent

- We test whether the two population means are equal by analyzing the difference within each pair

- We assume that the mean of the distribution of differences = 0 (null hypothesis)

- Aka paired samples, repeated measures, related samples
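
As a rough sketch of the paired approach, the test statistic can be computed from the within-pair differences. The function name `paired_t` and the before/after values are hypothetical; the resulting t would be compared with a critical value from a t table with n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for H0: mean difference = 0."""
    # One difference per pair: this is the single sample being analyzed
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = mean(diffs)      # mean of the differences
    s_d = stdev(diffs)       # sample standard deviation of the differences
    t = d_bar / (s_d / sqrt(n))
    return t, n - 1

# Hypothetical before/after measurements on the same four subjects
before = [10, 12, 9, 11]
after = [12, 13, 9, 14]
t_stat, df = paired_t(before, after)
print(round(t_stat, 3), df)
```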

3

Dependent samples vs independent samples

When pairing is possible, dependent samples are generally preferred over independent samples because pairing removes variation between subjects (extraneous variation), which reduces the variation in the sampling distribution

4

Quantitative data (means)

Data that measures 'how much' of an attribute a variable has

5

Qualitative data (proportions)

Data that answers 'does the variable have the attribute?' with yes or no

6

3 basic assumptions for 2-sample hypothesis testing

1. The two populations are normally distributed, 2. The two populations are independent, 3. Both population standard deviations are known

7

2 extra assumptions made when test is about proportions (qualitative data)

1. Binomial conditions: Mutually exclusive results, independent trials

2. Large sample: n(p) and n(1-p) are both >= 5
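
The large-sample condition can be checked with a few lines of code. This is only a sketch; the function name `large_sample_ok` and the sample values are hypothetical.

```python
def large_sample_ok(n, p, threshold=5):
    """Check that both n*p and n*(1-p) reach the threshold (5 on this card)."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Hypothetical samples: n = 100 with p = 0.04 gives n*p = 4, which is too small
print(large_sample_ok(100, 0.04))
print(large_sample_ok(100, 0.10))
```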

8

Standard error

Standard deviation of the sampling distribution

9

Center of the sampling distribution

The population parameter/s

10

ANOVA

Analysis of variance technique to test whether 3 or more population means are equal or different

- The populations must follow a normal distribution

- The populations must have equal standard deviations

- The populations must be independent

11

One-way ANOVA

ANOVA with only one factor (treatments)

-completely randomized design

12

Two-way ANOVA

ANOVA with two factors (treatments and blocks)

-randomized block design

-removes extraneous variation: variation due to blocks is separated from the error term, which decreases SSE (and typically MSE)

13

Treatments/Blocks

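Treatments are the levels of the factor being compared (each treatment represents an independent population); blocks are a second grouping variable whose variation is separated from the error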

14

SOV

Source of variation

-one-way: treatments, error, total

-two-way: treatments, blocks, error, total

15

SOS

Sum of squares aka variation

-one-way: SST, SSE, SS total

-two-way: SST, SSB, SSE, SS total

16

DF

Degrees of freedom

-one-way:

treatments: k-1

error: n-k

total: n-1

-two-way:

treatments: k-1

blocks: b-1

error: (k-1)(b-1)

total: n-1

*k=total # treatments

*b=total # blocks

*n= total # of observations
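
The degrees-of-freedom rules above can be sketched as a small helper. The function name `anova_df` is hypothetical, and the two-way case assumes one observation per treatment/block combination (n = k*b), which is what makes the parts add up to n - 1.

```python
def anova_df(n, k, b=None):
    """Degrees of freedom for one-way ANOVA (b=None) or two-way ANOVA with blocks."""
    if b is None:
        return {"treatments": k - 1, "error": n - k, "total": n - 1}
    return {
        "treatments": k - 1,
        "blocks": b - 1,
        "error": (k - 1) * (b - 1),
        "total": n - 1,
    }

# Hypothetical designs: 3 treatments with 15 observations; 3 treatments x 4 blocks
print(anova_df(n=15, k=3))
print(anova_df(n=12, k=3, b=4))
```

In both cases the treatment, block (if any), and error degrees of freedom sum to the total.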

17

F-distribution

Used to test whether two samples are from populations with equal variances; also used in ANOVA to compare several population means

18

SST (treatment variation)

the sum of the squared differences between each treatment mean and the overall mean, with each squared difference weighted by the number of observations in that treatment

-Aka "explained variation" because the variation is explained by the factor

-variation BETWEEN the treatment means

19

SSE (random variation)

the sum of the squared differences between each observation and its treatment mean

-Aka "unexplained variation" because the variation is attributed to error/chance rather than to the factor

-variation WITHIN the treatments

20

SS total (total variation)

sum of the squared differences between each observation and the overall mean

-in one-way ANOVA, SS total = SST + SSE

21

Mean Squares

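Estimate of variance: a sum of squares divided by its degrees of freedom (in one-way ANOVA, MST = SST/(k-1) and MSE = SSE/(n-k))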
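
To tie the sums of squares and mean squares together, here is a rough one-way ANOVA sketch. The function name `one_way_anova` and the data are hypothetical; the mean squares are each sum of squares divided by its degrees of freedom, and F is taken as MST/MSE.

```python
from statistics import mean

def one_way_anova(groups):
    """Return sums of squares, mean squares, and F for a list of treatment samples."""
    all_obs = [x for g in groups for x in g]
    n, k = len(all_obs), len(groups)
    grand_mean = mean(all_obs)
    # Treatment variation: squared distance of each treatment mean from the
    # overall mean, weighted by the treatment's sample size
    sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Random variation: each observation against its own treatment mean
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Total variation: each observation against the overall mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
    mst = sst / (k - 1)
    mse = sse / (n - k)
    return {"SST": sst, "SSE": sse, "SS total": ss_total,
            "MST": mst, "MSE": mse, "F": mst / mse}

# Hypothetical data: three treatments, three observations each
result = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(result)
```

The computed F would then be compared with a critical value from an F table with k - 1 and n - k degrees of freedom.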

22

F-distribution

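used to test whether two samples are from populations with equal variances; also used in ANOVA

- Is a family of distributions: each member is determined by two degrees of freedom (numerator and denominator)

- Continuous

- Asymptotic (approaches the horizontal axis but never touches it)

- Positively skewed (unlike z and t, which are symmetric)

- Cannot be negative (unlike z and t)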

23

6 Step Hypothesis Test for ANOVA

1. Null hypothesis

2. Alternative hypothesis

3. Test statistic

4. P-value

5. Decision

6. Answer

24

F-value (test statistic)

Ratio of two variance estimates (MST/MSE); indicates whether the ratio is too much greater than 1 to have happened by chance

-If F ≈ 1, do not reject Ho (the treatment means vary no more than chance would produce)

-If F is greater than the critical F value (p-value < significance level), reject Ho

25

Interaction

The effect of one factor on a response variable differs depending on the value of another factor

26

Scatter plot

Graphical representation of the relationship between two variables (bivariate data)

27

Correlation coefficient (r)

Measure of the strength of the linear relationship between two variables

28

Elements of correlation coefficient (r)

- Shows direction (positive/negative) and strength (weak/moderate/strong)

- Ranges from -1 to +1

- 0 = no linear relationship

- +1 = perfect direct/positive linear relationship

- -1 = perfect inverse/negative linear relationship
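
A minimal sketch of how r can be computed from paired data. The function name `pearson_r` and the data are hypothetical, picked so that the extreme values of r are easy to see.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Correlation coefficient between two equal-length lists of numbers."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: y rises exactly with x, then falls exactly with x
x = [1.0, 2.0, 3.0, 4.0]
r_up = pearson_r(x, [2.0, 4.0, 6.0, 8.0])
r_down = pearson_r(x, [8.0, 6.0, 4.0, 2.0])
print(r_up, r_down)
```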

29

Spurious correlations

Two variables are correlated, so the relationship seems to be cause and effect, but one does not actually cause the other

30

Regression analysis

Technique used to find the equation of the line that best fits the data

31

Regression equation

Equation that expresses the linear relationship between 2 variables

32

Least squares method

Uses data to position the line of best fit that minimizes the sum of squares of the vertical distances between the points (actual y values) and the line (predicted y values)
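
For a single independent variable, the least squares line has a closed-form slope and intercept. The sketch below assumes the usual simple regression form y hat = a + b*x; the function name `least_squares` and the data are hypothetical.

```python
from statistics import mean

def least_squares(x, y):
    """Return (intercept a, slope b) of the line minimizing squared vertical distances."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    # The least squares line always passes through the point (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data, for illustration only
a, b = least_squares([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
print(round(a, 2), round(b, 2))
```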

33

Standard error of the estimate

Measure of the dispersion of the observed points around the regression line (roughly the typical vertical distance of the points from the line)

-smaller SE, better prediction

34

Coefficient of determination (r^2)

Proportion of variation in the dependent variable explained by the variation in the independent variable
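
Both r^2 and the standard error of the estimate can be obtained from the residuals of a fitted line. This sketch assumes simple regression with one independent variable (so the error degrees of freedom are n - 2); the function name `fit_summary` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean

def fit_summary(x, y):
    """Return (r squared, standard error of the estimate) for a simple linear fit."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    predicted = [intercept + slope * xi for xi in x]
    # Unexplained variation: actual y versus predicted y
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    # Total variation: actual y versus the mean of y
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - sse / ss_total
    std_error = sqrt(sse / (n - 2))
    return r_squared, std_error

# Hypothetical data, for illustration only
r2, se = fit_summary([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
print(round(r2, 2), round(se, 4))
```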

35

Global test

*for multiple regression

Hypothesis test of whether the independent variables, taken together, can predict the dependent variable; the null hypothesis is that all regression coefficients equal 0

36

Homoscedasticity

Variation in the residuals (difference between the actual Y and the predicted Y, y hat) is the same for all values of the x variables

37

Multicollinearity

Exists when independent variables are strongly correlated (close to +1 or -1)

-Causes: inaccurate estimates of population slopes

-to detect this: check the VIF (variance inflation factor)

-If VIF > 10 (a common rule of thumb), consider removing the independent variable from the analysis
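
The VIF for an independent variable is commonly computed as 1 / (1 - R^2), where R^2 comes from regressing that variable on the other independent variables. The sketch below assumes that R^2 value is already available; the function name `vif` and the input values are hypothetical.

```python
def vif(r_squared_j):
    """Variance inflation factor from the R^2 of one predictor regressed on the others."""
    return 1 / (1 - r_squared_j)

# Hypothetical R^2 values: an unrelated predictor versus a highly related one
print(vif(0.0))
print(round(vif(0.95), 1))
```

With only two independent variables, that R^2 is simply the square of the correlation between them.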

38

Adjusted coefficient of determination (adjusted r^2)

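Adjusts the coefficient of determination for the number of independent variables

-r^2 never decreases as more variables are added; adjusted r^2 increases only when an added variable improves the fit enough, and can decrease otherwise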
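
A small sketch of the usual adjusted r^2 formula, 1 - (1 - r^2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The function name `adjusted_r2` and the numbers are hypothetical; they show a case where r^2 rises slightly after adding a variable while adjusted r^2 falls.

```python
def adjusted_r2(r_squared, n, k):
    """Adjusted coefficient of determination for n observations and k predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical fits on 20 observations: a second variable lifts r^2 from 0.60 to 0.61
one_var = adjusted_r2(0.60, n=20, k=1)
two_var = adjusted_r2(0.61, n=20, k=2)
print(round(one_var, 4), round(two_var, 4))
```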

39

Dummy variables

Variables with only 2 possible outcomes used to represent a qualitative variable in regression analysis

-coded 0 or 1
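
Coding a two-outcome qualitative variable as 0/1 might look like the following sketch. The function name `to_dummy` and the "garage" variable are hypothetical examples.

```python
def to_dummy(values, category_coded_as_1):
    """Code a two-outcome qualitative variable as 1 (has the attribute) or 0."""
    return [1 if v == category_coded_as_1 else 0 for v in values]

# Hypothetical qualitative variable: does a house have a garage?
garage = ["yes", "no", "no", "yes"]
garage_dummy = to_dummy(garage, "yes")
print(garage_dummy)
```

The 0/1 column can then be used in a regression equation like any other independent variable.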

40

Interaction

The effect of one independent variable on the dependent variable depends on the value of another independent variable, so the combined effect of 2+ independent variables differs from the sum of their separate effects

41

Stepwise regression

Method used to decide which independent variables to use in a multiple regression equation

-Only independent variables with regression coefficients significantly different from zero are kept (aka they must have a relationship with the dependent variable otherwise they are useless)

42

forward selection method (stepwise)

start with 0 independent variables and add one at a time to the regression equation

43

backward elimination method (stepwise)

start with entire set of variables and eliminate one at a time