MIS 301 FINAL SHAUL / SDSU


50 Terms

1. Explained Variance

Variance between samples: An estimate of σ² that is the variance of the sample means multiplied by n (when the sample sizes are the same). If the samples are different sizes, the variance between samples is weighted to account for the different sample sizes. This variance is also called variation due to treatment or explained variation.

(The explained variation is the sum of the squares of the differences between each predicted y-value and the mean of y.)
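Written as a formula (standard notation, not on the card), where ŷᵢ is the predicted value for observation i and ȳ is the mean of the observed y-values:

$$\text{explained variation} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$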

2. Unexplained Variance

Variance within samples: An estimate of σ² that is the average of the sample variances (also known as a pooled variance). When the sample sizes are different, the variance within samples is weighted. This variance is also called the variation due to error or unexplained variation.

(The unexplained variation is the sum of the squares of the differences between the y-value of each ordered pair and the corresponding predicted y-value.)
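In the same notation (yᵢ is the observed value for observation i):

$$\text{unexplained variation} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$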

3. Variance

standard deviation squared
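Equivalently, for a sample (the standard formula, not on the card):

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$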

4. ANOVA test

Analysis of Variance; used to compare the means of three or more independent groups on a continuous outcome variable; parametric

5. ANOVA vs. multiple 2-sample t-tests

The two-sample t-test (also known as the independent samples t-test) is a method used to test whether the unknown population means of two groups are equal or not.
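A minimal sketch in Python with SciPy (the two sample arrays are made up for illustration):

```python
# Sketch: two-sample (independent samples) t-test.
from scipy import stats

group_a = [23, 25, 28, 30, 27, 26]
group_b = [31, 29, 33, 35, 30, 32]

# equal_var=True gives the classic pooled-variance t-test, which assumes
# the two populations have equal variances (see the Assumptions card).
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # small p => means likely differ
```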

6. Assumptions

1. the population follows the normal distribution

2. the populations have equal standard deviations

3. the populations are independent

Each population from which a sample is taken is assumed to be normal.

All samples are randomly selected and independent.

The populations are assumed to have equal standard deviations (or variances).

The factor is a categorical variable.

The response is a numerical variable.

7. Null and Alternative

H0: The null hypothesis: It is a statement of no difference between the variables; they are not related. This can often be considered the status quo, and as a result, if you cannot accept the null, some action is required.

Ha: The alternative hypothesis is the contender and must win with significant evidence to overthrow the status quo. This concept is sometimes referred to as the tyranny of the status quo because, as we will see later, overthrowing the null hypothesis usually requires 90% or greater confidence that this is the proper decision.

8. Variation due to chance

random variation in the data that cannot be attributed to the treatment or model; in ANOVA this is the within-sample (error) variation

9. Categories of Variables

- presage variables

- context variables

- process variables

- product variables

10. criterion variable

the variable in a multiple-regression analysis that the researchers are most interested in understanding or predicting / dependent variable

11. Classification variable

an independent variable that is observed but not controlled by the researcher

12. factor variable

those aspects of a situation that may influence particular phenomena

13. treatment variable

an independent variable that is manipulated in an experiment

14. F-Distribution in ANOVA

F = (estimate of the population variance based on the differences BETWEEN the sample means) / (estimate of the population variance based on the variation WITHIN the samples)

If the ratio is significantly greater than 1, we conclude that the treatment means are not all the same; there is a difference in the treatment means.
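In symbols (standard one-way ANOVA notation with k groups and N total observations; not on the card):

$$F = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}} = \frac{\text{SS}_{\text{between}}/(k-1)}{\text{SS}_{\text{within}}/(N-k)}$$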

15. F distribution (shape)

mathematically defined curve that is the comparison distribution used in an analysis of variance / skewed right / always positive

16. F distribution (variance ratio distribution)

The distribution of the ratio of two independent quantities each of which is distributed like a variance in normally distributed samples. So named in honor of R.A. Fisher who first described the distribution. / range [0, + ∞)

17. Correlation

is mainly about relationships

18. Correlation of X

X: Independent variable and predictor variable

19. Correlation of Y

Y: dependent variable, variable of interest, criterion variable

20. sample & population notation for the correlation coefficient

Population: ρ (rho)

Sample: r

21. Range of r

-1 ≤ r ≤ +1

22. R is measuring

the common variance

23. Correlation Relationship

changes in one variable are associated with changes in another but it is not known whether one variable directly influences the other

In a negative correlation, as x increases y decreases; a perfect negative relationship is r = -1.

If r = 0, there is no linear correlation.

In a positive correlation, as x increases y increases; a perfect positive relationship is r = +1.
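A quick sketch of computing r with NumPy (the data are made up):

```python
# Sketch: sample correlation coefficient r.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# np.corrcoef returns the 2x2 correlation matrix; r is the off-diagonal entry.
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.3f}")  # near +1 here: strong positive linear relationship
```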

24. Linear Relationships

A relationship that has a straight line graph / where one variable changes by consistent amounts as you increase the other variable.

25. Non-linear relationships

Curved graph; the slope changes at different rates at different points on the curve.

Where one variable changes by inconsistent amounts as you increase the other variable.

Ex: emotional investment and performance

26. Proceed with Caution

- Sample size

- Relationships change

- Correlation is NOT causation

- correlation -> causation -> liability

- Effect of sample size

Relationships change over time.

Correlation has to do with measuring the strength of the relationship.

Underlying factors of that relationship may change, and therefore our correlation or our regression model becomes outdated.

27. Null and Alternative Hypothesis

Null hypothesis (Ho)

- stating no difference

Alternative hypothesis (Ha)

- stating there's a difference

28. T-test in R

an inferential statistical analysis used when comparing two samples of data in either a matched groups design or a repeated-measures design
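The matched-groups / repeated-measures design described here corresponds to a paired t-test. A minimal sketch in Python with SciPy (the before/after scores are made up; the course itself may do this in R):

```python
# Sketch: paired (repeated-measures) t-test with SciPy.
# before/after are made-up scores for the same six subjects.
from scipy import stats

before = [70, 68, 75, 80, 72, 77]
after = [74, 71, 78, 83, 75, 80]

# ttest_rel pairs each observation with its counterpart, as a
# matched-groups / repeated-measures design requires.
t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```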

29. degrees of freedom

df = n-1

30. Chance Model

Generates data from random processes to help statisticians investigate such processes.

It is the foundation upon which regression is built.

31. Chance Model

How it's calculated:

no predictor variables

Importance:

to generate data from random processes to help statisticians investigate the process.

How is it graphed:

Slope is 0 so best forecast is always the mean of Y
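In standard notation (not on the card), the chance model is the intercept-only model, so the best forecast is always the mean of y:

$$y_i = \beta_0 + \varepsilon_i, \qquad \hat{y}_i = \bar{y}$$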

32. Full Model

(FM) all predictor variables

33. Restricted Model (RM)

(RM) Some predictor variables

34. Correlations vs Regressions

Coefficient of Determination -> RSQ

RSQ: The percentage of the variation in the y-variable that is accounted for by the variation in the x-variable(s).

Ranges from 0 to 1 (i.e., 0% to 100%).

More practical to interpret than r.
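As a formula (standard notation, tying together the explained and unexplained variation from cards 1 and 2):

$$R^2 = \frac{\text{explained variation}}{\text{total variation}} = 1 - \frac{\text{unexplained variation}}{\text{total variation}}$$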

35. Properties & Qualities regarding residuals (aka error)

Notation for a sample -> e

For a population -> ε (epsilon)

Ordinary least squares regression: minimizing error.

If r = ±1 we would have a perfect fit (every error e = 0).

Σ(Y - Ŷ) = 0, always! (the residuals sum to zero)

36. Plot errors

Randomly scattered around the x-axis and normally distributed.

37. Residual Plot for Sales Predictions

If the points form a pattern, the relationship is non-linear.

We expect a majority of the errors to be near the x-axis with fewer outliers.

38. The variation in the y-variable consists of 2 components:

Y's relationship with the x-variable

Random Factors NOT in the model

39. Isolating the Slope

The effect of marginal inputs on predicted outcomes

A look at major league baseball

X-var : payroll ($mil)

Y-var: Wins

40. Adjusted RSQ

The adjusted R-squared increases only if the new term improves the model more than would be expected by chance. It decreases when a predictor improves the model by less than expected by chance.

r2: sample r-square

p: number of predictors

N: total sample size
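The card lists the symbols but not the formula itself; the standard adjusted R-squared is:

$$\text{adjusted } R^2 = 1 - (1 - r^2)\,\frac{N - 1}{N - p - 1}$$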

41. Multicollinearity

The term used to describe the correlation among the independent variables.

2 or more highly correlated predictor variables

Ideally, each predictor explains a unique portion of the variability in the y-variable; multicollinearity means the correlated predictors overlap in what they explain.

42. multiple regression equation

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_{k-1} x_{k-1} + b_k x_k$$
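A minimal sketch of fitting this equation with Python's statsmodels (the data are made up for illustration; a real analysis would use the course dataset):

```python
# Sketch: fitting y-hat = b0 + b1*x1 + b2*x2 by ordinary least squares.
import numpy as np
import statsmodels.api as sm

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 4.2, 7.9, 8.1, 12.2, 12.8])

X = sm.add_constant(np.column_stack([x1, x2]))  # adds the b0 (intercept) column
model = sm.OLS(y, X).fit()                      # least squares fit
print(model.params)                             # b0, b1, b2
print(model.rsquared, model.rsquared_adj)       # RSQ and adjusted RSQ (card 40)
```

model.summary() prints the kind of summary output, ANOVA table, and coefficient table described in cards 47-49.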

43. Characteristics of R

A single value representing the strength of the simultaneous relationship between the x-variables and the y-variable.

R is never negative (unlike r).

R ranges from 0 to +1 (r ranges from -1 to +1).

R does not indicate the direction of the relationship (unlike r).

R is greater than or equal to |r| for any single x-y relationship.

44. dummy variable

A variable for which all cases falling into a specific category assume the value of 1, and all cases not falling into that category assume a value of 0.

- using 2 category nominal data
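A sketch of building a dummy variable with pandas (the 'gender' column is a made-up example):

```python
# Sketch: encoding a 2-category nominal column as a single 0/1 dummy variable.
import pandas as pd

df = pd.DataFrame({"gender": ["M", "F", "F", "M", "F"]})

# drop_first=True keeps one column: 1 if the case falls in the category, else 0.
dummies = pd.get_dummies(df["gender"], prefix="gender", drop_first=True, dtype=int)
print(dummies)  # a single 'gender_M' column of 0/1 values
```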

45. 1-way ANOVA

to determine the existence of a statistically significant difference among several group means.

The test actually uses variances to help determine if the means are equal or not.
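A minimal sketch with SciPy (three made-up groups):

```python
# Sketch: one-way ANOVA across three independent groups.
from scipy import stats

g1 = [20, 22, 19, 24, 25]
g2 = [28, 30, 27, 26, 29]
g3 = [18, 17, 21, 20, 19]

# f_oneway computes the F statistic (between-group vs. within-group variance)
# and its p-value; a small p-value suggests the means are not all equal.
f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```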

46. multiple regression

a statistical technique that computes the relationship between a predictor variable and a criterion variable, controlling for other predictor variables

1. Prediction, forecasting

2. To determine underlying causes of changes in the y-variable (variable of interest)

Based on the least squares method.

Minimizes squared error, so it does not matter whether an individual error is positive or negative.

47. Summary outputs

Refers to the p-value reported in the regression summary output: an almost infinitesimal level of probability is certainly less than our alpha level of .05 for a 5 percent level of significance.

48. ANOVA Table

The ANOVA table breaks down the components of variation in the data into variation between treatments and error or residual variation.

49. Coefficient table

In regression summary output, the coefficient table lists each predictor's estimated coefficient together with its standard error, t-statistic, and p-value.

50. Collinearity Matrix

Collinearity is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy; a correlation matrix of the predictor variables is used to detect it.