STAT 430 EXAM

1
Statistical Inference
Provides methods for drawing conclusions about a population from sample data
2
Two most common types of statistical inference
Confidence Intervals and Tests of Significance
3
Conditions for Inference about the Mean
1) We must have a simple random sample
2) The population must be normally distributed (or the sample must be large enough for the central limit theorem to apply)
3) We must know the population standard deviation sigma
4
Confidence Interval Estimate
An interval of likely values for the population parameter determined from sample information
5
Confidence Level
The long-run percentage of intervals, built by the same method from repeated samples, that would contain the true parameter value (for example, 95%)
6
Margin of Error
z_alpha/2 * (sigma/SQRT(n))
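As a rough worked example of the margin-of-error formula, the sketch below uses only the Python standard library. The function name and the inputs (sigma = 15, n = 36, sample mean = 100) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """Margin of error z_(alpha/2) * sigma / sqrt(n) when sigma is known."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
    return z_crit * sigma / sqrt(n)

# Hypothetical inputs, not taken from the course
moe = margin_of_error(sigma=15, n=36)      # about 4.90
interval = (100 - moe, 100 + moe)          # interval around a sample mean of 100
```

Calling the function with a larger n or a lower confidence level returns a smaller margin, which matches the card on making the interval narrower.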
7
How to make the confidence interval narrower?
Increase the sample size or decrease the level of confidence
8
How to interpret a confidence interval?
"We estimate that the __parameter__ is between __lowerbound__ and __upperbound__ __units__ with a p% confidence interval"
9
Hypothesis Testing
The process of using sample data to test a claim about the value of a population parameter
10
Null Hypothesis
Statement of equality and statement of no difference
11
Alternative Hypothesis
Statement of inequality, represents the claim that we seek evidence for
12
Right-Tailed Test
Tests whether the parameter is equal to some value (Ho) versus greater than that value (Ha)
13
Left-Tailed Test
Tests whether the parameter is equal to some value (Ho) versus less than that value (Ha)
14
Two-Tailed Test
Tests whether the parameter is equal to some value (Ho) versus not equal to that value (Ha)
15
Type I Error
We reject the null when the null is true
16
Type II Error
We fail to reject the null when the null is false
17
Z-Statistic
z = (xbar - mu)/(sigma/SQRT(n))
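A minimal sketch of the z-statistic in Python, assuming a known sigma. The numbers (xbar = 52, mu = 50, sigma = 6, n = 36) are invented for illustration.

```python
from math import sqrt

def z_statistic(xbar, mu0, sigma, n):
    """z = (xbar - mu0) / (sigma / sqrt(n)), with mu0 the value claimed by Ho."""
    return (xbar - mu0) / (sigma / sqrt(n))

# Hypothetical inputs: the standard error is 6 / sqrt(36) = 1
z = z_statistic(xbar=52, mu0=50, sigma=6, n=36)  # 2.0
```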
18
When to reject the null?
If p-value < alpha, reject the null hypothesis
19
When to fail to reject the null?
If p-value >= alpha, fail to reject the null hypothesis
20
What does the p-value measure?
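The probability, computed assuming the null hypothesis is true, of observing a sample result at least as extreme as the one obtained. A small p-value means the observed sample mean would be unlikely under the null. It is not the probability of committing a Type I error, which is set by alpha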
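To make the link between a z-statistic, its p-value, and the reject/fail-to-reject decision concrete, here is a small standard-library sketch. The function name, the `tail` argument, and the example z of 2.0 are illustrative choices, not course material.

```python
from statistics import NormalDist

def p_value(z, tail="two"):
    """p-value of a z-statistic for a right-, left-, or two-tailed test."""
    cdf = NormalDist().cdf
    if tail == "right":
        return 1 - cdf(z)            # area to the right of z
    if tail == "left":
        return cdf(z)                # area to the left of z
    return 2 * (1 - cdf(abs(z)))     # both tails

alpha = 0.05
p = p_value(2.0, tail="two")   # about 0.0455
reject_null = p < alpha        # reject Ho when the p-value is below alpha
```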
21
Null Hypothesis of a T-Test
Ho: mu_a = mu_b
22
Alternative Hypothesis of a T-Test
Ha: mu_a > mu_b
OR
Ha: mu_a < mu_b
OR
Ha: mu_a != mu_b
23
Conditions for a T-Test
1) The two groups being compared must be independent
2) The sampling distribution of each sample mean should be approximately normal
3) The variances of the two groups should be approximately equal
24
What procedure do we use in SAS to get a T-Test?
PROC TTEST
25
How can we know if variances are equal?
Check the Equality of Variances section, which runs a hypothesis test with the null hypothesis of the variances being equal
26
If variances are equal, what method of t-test is used?
Pooled
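For reference, the pooled two-sample t statistic can be sketched by hand in Python. This is the textbook equal-variance formula and is not claimed to be PROC TTEST's internal code; the two small samples are invented.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic using a pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, df

# Hypothetical data: both samples have variance 4
group_a = [5, 7, 9]
group_b = [1, 3, 5]
t_stat, df = pooled_t(group_a, group_b)  # t is about 2.449 with 4 degrees of freedom
```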
27
How can we randomly assign treatments to individuals in SAS?
1) Use a random number generator to assign a value to each of the individuals
2) Use PROC RANK to split the data into the desired number of groups
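The same two-step idea (random number per individual, then rank and cut into groups) can be sketched in Python. The function name, seed, and subject labels are hypothetical, and the even-split rule is a simple stand-in that is not claimed to reproduce PROC RANK's exact grouping formula.

```python
import random

def assign_groups(ids, n_groups, seed=430):
    """Randomly assign each id to one of n_groups groups of near-equal size."""
    rng = random.Random(seed)                      # fixed seed so the split is repeatable
    draws = {i: rng.random() for i in ids}         # step 1: one random number per individual
    ordered = sorted(ids, key=draws.get)           # step 2: rank individuals by their draw
    n = len(ids)
    # Cut the ranking into n_groups consecutive blocks (group labels 0 .. n_groups-1)
    return {subject: rank * n_groups // n for rank, subject in enumerate(ordered)}

# Hypothetical subject labels
subjects = [f"S{i:02d}" for i in range(1, 13)]
groups = assign_groups(subjects, n_groups=3)
```

Because the split follows the rank of a random draw rather than the order of the data, each subject is equally likely to land in any group.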
28
What is the format required for PROC RANK?
PROC RANK DATA=NAME GROUPS=n OUT=Output_name;
VAR variable_to_rank;
RUN;
29
Wilcoxon Rank-Sum Test
Nonparametric test
Ho: The two distributions are identical
Ha: The distributions are not identical (Dist A is SHIFTED from Dist B)
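The rank-sum statistic behind this test can be illustrated with a short Python sketch: pool both samples, rank them (ties get the average rank), and add up the ranks of one sample. The helper names and data are invented, and the sketch stops at the statistic rather than computing a p-value.

```python
def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks

def rank_sum(sample_a, sample_b):
    """Sum of the ranks of sample_a within the pooled data."""
    ranks = rank_with_ties(list(sample_a) + list(sample_b))
    return sum(ranks[:len(sample_a)])

# Hypothetical data: sample A holds ranks 1, 2 and 4 of the pooled values
w = rank_sum([1.1, 2.3, 3.0], [2.9, 4.5, 5.2, 6.8])  # 7.0
```

If the two distributions were identical, the expected rank sum for sample A here would be 3 * (7 + 1) / 2 = 12, so a value of 7 points toward A being shifted lower.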
30
How do we compute a Wilcoxon Rank-Sum test in SAS?
PROC NPAR1WAY
31
Paired T-Test
Testing differences between two dependent sample means
32
How to run a paired T-Test in SAS?
PROC MEANS on the difference with options T and PRT
OR
PROC TTEST on the difference
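What "a t-test on the difference" computes can be sketched in Python as a one-sample t statistic on the paired differences. The before/after values are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """t statistic for Ho: mean difference = 0, computed on paired differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1   # statistic and degrees of freedom

# Hypothetical paired measurements on the same four subjects
before = [10, 12, 9, 11]
after = [12, 15, 10, 14]
t_stat, df = paired_t(before, after)   # differences are 2, 3, 1, 3
```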
33
Linear Correlation Coefficient
Provides a descriptive measure of the strength of the linear relationship between variables x and y, has a range of values between -1 and 1
34
Direct Correlation
r > 0 (r = 1 is a perfect direct linear relationship)
35
No Correlation
r = 0 (no linear relationship)
36
Inverse Correlation
r < 0 (r = -1 is a perfect inverse linear relationship)
37
How to manually calculate R?
Sxy / SQRT(Sxx * Syy)
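A small Python sketch of the manual calculation, with Sxx, Syy, and Sxy taken as sums of squared deviations and cross-products about the means. The five data points are made up.

```python
from math import sqrt

def sums_of_squares(x, y):
    """Return Sxx, Syy, Sxy computed about the sample means."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxx, syy, sxy

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sxx, syy, sxy = sums_of_squares(x, y)   # 10, 6, 6
r = sxy / sqrt(sxx * syy)               # about 0.775
r_squared = r ** 2                      # about 0.6
```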
38
Coefficient of Determination
r^2, can be interpreted as the proportion of variance in one of the variables that can be explained by variation in the other variable
39
Important assumption about the correlation coefficient and the coefficient of determination
Each pair of x, y data points is independent of any other pair
40
What procedure do we use to test for correlation?
PROC CORR
41
What type of correlation is run by default in a PROC CORR?
Pearson Correlation
42
How to get a partial correlation?
Use a PARTIAL var; statement in PROC CORR, where var is the variable to control for
43
Why use a partial correlation?
It controls for the effects of a certain variable, i.e. it finds the relationship between two variables when the effect of the other variables has been removed
44
What is a Spearman Correlation?
A non-parametric correlation coefficient that looks at the correlation between the ranks of the observations of each variable instead of the individual values
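A rough Python illustration of the idea: rank each variable, then compute the ordinary Pearson correlation on the ranks. To stay short, the sketch assumes there are no tied values; the helper names and data are invented.

```python
from math import sqrt

def ranks(values):
    """1-based ranks; assumes no tied values to keep the sketch short."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def pearson(x, y):
    """Pearson r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y always increases with x, but not along a straight line
x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 100]
rho = spearman(x, y)   # 1.0, because the rank orders agree exactly
r = pearson(x, y)      # smaller than rho, because the relationship is not linear
```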
45
Least-Squares Criterion
The straight line that best fits a set of data points is the one having the smallest possible sum of squared errors
46
Residuals
The vertical distance from the actual observed point to the predicted value on the line
47
Regression Line
The straight line that best fits a set of data points according to the least squares criterion
48
Regression Equation
The equation of the regression line
49
What is the difference between correlation coefficient and regression?
The correlation coefficient is an index, while regression is a mathematical function
50
Format of the Regression Equation
y = B0 + B1*x
where y = dependent variable
B0 = intercept
B1 = coefficient of independent variable
x = independent variable
51
How to find B1_hat manually?
Sxy/Sxx
52
How to find B0_hat manually?
ybar - B1_hat*xbar
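The manual slope and intercept formulas can be sketched in Python as follows, with the intercept computed from the sample means of x and y. The function name and data are invented for illustration.

```python
def fit_line(x, y):
    """Least-squares estimates: B1_hat = Sxy / Sxx, B0_hat = ybar - B1_hat * xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Hypothetical data: Sxx = 10, Sxy = 6, xbar = 3, ybar = 4
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)   # b1 is about 0.6 and b0 about 2.2
```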
53
Extrapolation
Trying to use a regression line to predict values outside of the range of observed values
54
Outliers
Outliers are data points that lie far from the regression line and outside the overall pattern of the data. We need to identify and remove them
55
Influential Observation
A legitimate data point whose removal causes the regression equation to change considerably. We must determine the reason for an influential observation and try to control for it rather than removing the point
56
Why is the sum of residuals always 0?
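Because the least squares criterion, applied to a line with an intercept, forces the fitted line through the point (xbar, ybar), so the positive and negative residuals cancel exactly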
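A quick numerical check of this property in Python, on invented data: fit the least-squares line, compute the residuals, and add them up.

```python
# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Least-squares slope and intercept
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

# Residual = observed y minus the value predicted by the fitted line
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
residual_sum = sum(residuals)   # zero up to floating-point rounding
```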
57
What are the two components of a valid regression model?
Deterministic portion and stochastic error
58
Deterministic Portion
The part that is explained by the predictor variables in the model, all of the explanatory/predictive information of the model should be in this portion
59
Stochastic Error
The randomness of the error between observed values and expected values (residuals)
60
What do regression residuals estimate?
The true error
61
What does random error look like in SAS?
1) The residuals should not be either systematically high or low
2) The residuals should be centered on zero throughout the range of fitted values
3) Random errors should produce residuals that are normally distributed
62
What does it mean if residuals are not random?
The deterministic portion is not capturing all of the possible explanatory information. This could mean a variable is missing, a higher-order term of a variable in the model is missing, or an interaction between terms already in the model is missing
63
What procedure is used to run a simple linear regression?
PROC REG
64
How do we find the values for our regression equation?
The parameter estimates section
65
How do we know which parameters are good?
Check the p-value of each parameter estimate, which is the result of the hypothesis test that the parameter is equal to 0
66
If the residuals look curved, how do we try to fix it?
Add a quadratic term
67
What are ways we can transform the data?
Use the log of the data
68
What should we look at when trying to determine if the model is good?
1) R-Square or Adjusted R-Sq being close to 1. Adjusted R-Sq is used when there are multiple variables because it penalizes R-Square for the number of variables in the model
2) Parameter estimates and the p-value of each
3) ANOVA table p-value
4) Residual analysis, check they are normally distributed and random
69
Multiple Regression Analysis
Relates two or more independent variables to one dependent variable
70
In multiple regression, what types of variables must we have?
Dependent variable must be continuous, and independent variables can be continuous or categorical
71
What are the two types of multiple regression?
Nonexperimental regression, which deals with samples of subjects that have a variety of naturally occurring variables
Designed regression, which controls the levels of the independent variables experimentally
72
Assumptions for Simple and Multiple Regression
1) The dependent variable must be continuous
2) The data must be independent and identically distributed
3) The errors must be normally distributed with a mean of zero and a constant variance of sigma squared
73
How to turn a categorical variable into numeric values?
For a variable with n categories, use n-1 dummy variables: one category (the reference) is coded 0 on every dummy, and each of the other categories is coded 1 on its own dummy and 0 on all the others
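A minimal Python sketch of dummy coding, with one category chosen as the all-zero reference. The function name, the color data, and the choice of "blue" as the reference are hypothetical.

```python
def dummy_code(values, reference):
    """Code a categorical variable with n categories as n-1 dummy (0/1) columns."""
    levels = sorted(set(values) - {reference})   # every category except the reference
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Hypothetical categorical variable with three categories
colors = ["red", "blue", "green", "blue"]
levels, rows = dummy_code(colors, reference="blue")
# levels is ['green', 'red']; rows is [[0, 1], [0, 0], [1, 0], [0, 0]]
```

The reference category ("blue") is the row of all zeros, so three categories need only two dummy columns.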
74
What is another procedure that you can use to create a linear regression model?
PROC GLM
75
How to see different models of multiple regression?
Use the SELECTION = option
76
How do we select the best regression model by adjusted R-Sq value?
SELECTION = ADJRSQ
77
How to select the best regression model by adding in the variable with the next largest F value as long as the p-value is less than .5?
SELECTION = FORWARD
78
How to select the best regression model by adding in variables with the next largest F statistic, but if the p-value gets too high (above .15) it removes it?
SELECTION = STEPWISE
79
How to select the best regression model by choosing the best model at each level of variables with the best R-Sq value?
SELECTION = MAXR
80
Cooks D Statistic
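A statistic that detects outlying observations by evaluating all the variables simultaneously. Cook's D is never negative, so large values are the ones to investigate; common rules of thumb flag values above 4/n or above 1 (a cutoff of absolute value 2 applies to studentized residuals rather than to Cook's D)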
81
One-Way ANOVA
An inferential method used to test the equality of three or more population means
82
The Null Hypothesis of a One-Way ANOVA
Ho: the population means of all groups are the same
83
The Alternative Hypothesis of a One-Way ANOVA
Ha: at least one population mean is different from the others
84
Why don't we run a bunch of t-tests on the population means?
1) With k groups we have kC2 pairwise tests to run, which gets big very quickly
2) The probability of making at least one Type I error increases with each additional test
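Both points can be put in numbers with a short Python sketch. It assumes, purely for illustration, that the pairwise tests are independent, which real pairwise t-tests on shared groups are not; the function name is hypothetical.

```python
from math import comb

def familywise_error(k_groups, alpha=0.05):
    """Number of pairwise tests and the chance of at least one Type I error.

    Assumes independent tests, which is a simplification for illustration.
    """
    m = comb(k_groups, 2)                 # number of pairs of groups
    return m, 1 - (1 - alpha) ** m        # P(at least one false rejection)

tests, fwer = familywise_error(5)   # 10 tests, error rate about 0.40
```

With only five groups, the chance of at least one false rejection is already around 40% at alpha = 0.05 under this simplification.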
85
Requirements of a One-Way ANOVA
1) There must be k simple random samples
2) The k samples are independent of each other
3) The sampling distributions of the sample mean are normally distributed
4) The populations must have the same variance
86
Between-Sample Variability
The variability among the sample means
87
Within-Sample Variability
The variability of each sample
88
How do between-sample variability and within-sample variability relate to ANOVA?
If the between-sample variability is large relative to the within-sample variability, we have evidence to suggest that the samples come from populations with different means
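The comparison of between-sample and within-sample variability is what the ANOVA F statistic measures. The Python sketch below computes it by hand for three small invented samples; the function name is hypothetical.

```python
from statistics import mean

def one_way_f(groups):
    """F = (between-sample mean square) / (within-sample mean square)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    # Between-sample variability: how far each sample mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-sample variability: spread of the values around their own sample mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical samples with means 2, 3 and 6
f_stat = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])   # about 13
```

Here the between-sample mean square is 13 and the within-sample mean square is 1, so the large ratio suggests the population means differ.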
89
What procedure is run to get the results of an ANOVA?
PROC ANOVA
90
What is a post hoc test?
Additional comparisons between the means to determine which means differ significantly
91
Requirements for Tukey's Test
1) There are k simple random samples from k populations
2) The k samples are independent of each other
3) The populations are normally distributed
4) The populations have the same variance
92
How do we run a Tukey's Test?
Use the TUKEY option in the MEANS statement, after a slash (e.g. MEANS group / TUKEY;)
93
Factorial Design
We have two factors that have multiple different levels within each
94
Crossed Factors
If all levels of factor A combine with all levels of factor B, the factors are crossed
95
Main Effects
The main effect of factor A is the change in the dependent variable that results from change in level of A
The main effect of factor B is the change in the dependent variable that results from change in level of B
96
Interaction Effect
There is an interaction effect if the effect of factor A on the dependent variable varies with factor B
97
What hypotheses are tested in two-way ANOVA?
INTERACTION EFFECT:
Ho: There is no interaction effect between the factors
Ha: There is an interaction effect between the factors

MAIN EFFECTS:
Ho: There is no effect of factor A on the response variable
Ha: There is an effect of factor A on the response variable

Ho: There is no effect of factor B on the response variable
Ha: There is an effect of factor B on the response variable
98
What happens if the null hypothesis regarding the interaction effect is rejected?
We do not interpret the result of the hypotheses involving main effects because this interaction clouds the interpretation of the main effects
99
Requirements for Two-Way ANOVA
1) The populations from which the samples are drawn must be normal
2) The samples are independent
3) The populations all have the same variance
100
How to run a two-way ANOVA in SAS?
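List both factors in the CLASS statement of PROC ANOVA, then in the MODEL statement use DV = IV1 IV2 IV1*IV2 (or the shorthand DV = IV1|IV2) so that both main effects and the interaction are tested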