Data analysis midterm 1

102 Terms

1
Correlational designs

Measures two variables as they naturally occur to see if there’s a relationship

2
Operational definition

Describes how your variable is measured and what it means

3
Experimental method

Manipulating one variable while controlling the other variables to see whether it affects the outcome

4
Independent variable

The predictor variable: the variable that is manipulated in an experiment (e.g., sitting with a friend versus sitting with a stranger).

5
Dependent variable

The outcome variable: the variable that is measured as the outcome (e.g., roller coaster enjoyment).

6
Systematic variation

Difference in outcome created by a specific experimental manipulation (e.g., sitting with a stranger versus sitting with a friend)

7
Unsystematic variation

Variation that’s not due to the effect we’re interested in (e.g., natural differences between people, time of day): things that may affect ride enjoyment but are not part of the experiment

8
Within subject

The same entities take part in all experimental conditions

9
Problems for within-subject designs, and the solution

Practice effects, familiarity and boredom

One solution is counterbalancing: some participants start with one condition (e.g., friend) and then move to the other (stranger), and the rest do the reverse order
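A minimal Python sketch of counterbalancing, assuming the two conditions from the roller coaster example and four made-up participant labels. Alternating the order is only one simple scheme and is not necessarily the one used in the course:

```python
# Hypothetical participant labels; the conditions come from the roller coaster example.
participants = ["P1", "P2", "P3", "P4"]
conditions = ("friend", "stranger")

# Alternate the order so that half of the participants start with each condition.
orders = {}
for i, person in enumerate(participants):
    orders[person] = conditions if i % 2 == 0 else conditions[::-1]

for person, order in orders.items():
    print(person, "->", " then ".join(order))
```

Because each condition comes first equally often, practice and boredom effects should be spread evenly across conditions rather than piling up in one of them.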

10
Between subject designs

Different entities take part in each experimental condition

11
Types of variables

Discrete and continuous

12
Discrete variables

Countable values, usually whole numbers (e.g., age in whole years)

13
Continuous variables

Variables with an infinite number of possible values, including fractions and decimals (e.g., time, weight, grades)

14
Scales of measurement

Nominal, ordinal, interval, and ratio

15
Nominal

Data that can be separated into mutually exclusive categories (e.g., jobs: teacher, lawyer, doctor; relationship status: married, single, divorced)

16
Ordinal

Assigning numerical values to categories in an ordered sequence (e.g., gold, silver, bronze in a race = 1, 2, 3)

17
Interval and ratio

Assigning numerical values in an ordered sequence with equal spacing between values (e.g., timed races)

18
Interval

Zero is arbitrary; there is no true zero (e.g., temperature: 0° doesn’t mean there is no temperature)

19
Ratio

Zero is a true, meaningful zero (e.g., a gas tank with zero gas means gas is absent)

20
Measurement error

The difference between the true value and the value we measure. In the card’s image, the measurement error is 3 pounds: the true weight was 100 pounds and the cheap bathroom scale read 103, so we subtract one from the other (103 - 100 = 3)

21
Validity

Is it measuring what needs to be measured?

22
Reliability

Whether an instrument can be interpreted consistently across situations

23
Central tendencies

Mean, median, and mode

24
Mode

The most frequent score

25
Bimodal

Having 2 modes

26
Multimodal

Having several modes

27
Median

The middle score when scores are ordered

28
Sum of scores divided by the number of scores

Mean equation
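A small Python sketch of the three measures of central tendency, using the standard library's `statistics` module. The scores are made up for illustration and do not come from the course:

```python
from statistics import mean, median, multimode

# Hypothetical scores, made up for illustration only.
scores = [2, 3, 3, 5, 7, 7, 8]

mean_score = mean(scores)        # sum of scores divided by the number of scores
median_score = median(scores)    # middle score once the scores are ordered
modes = multimode(scores)        # every score tied for most frequent

print("mean:", mean_score)
print("median:", median_score)
print("modes:", modes)
```

This made-up sample has two modes (3 and 7), so it is bimodal.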

29
Frequency distribution

Organizes data by the number of individuals located within each category

30
term image

N = 10

N = the number of people in the study

31
% = (f / n) × 100, where f is the frequency and n is the number of participants

Percentages equation

32
Percentile ranks

The percentage of people whose score is equal to or less than a specific value (e.g., if I scored in the 70th percentile, 70% of people scored at or below my score)

33
Cumulative frequency

The number of people at or below a score

34
Cumulative percentage equation (percentile rank)

C% = (cf / n) × 100, where cf is the cumulative frequency and n is the number of participants
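A rough Python sketch that builds a frequency table with f, %, cf and C% for each score. The ten scores are invented (N = 10), and the tuple layout of each row is just a choice made for this example:

```python
from collections import Counter

# Hypothetical quiz scores for N = 10 people (made-up data).
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
n = len(scores)

freq = Counter(scores)
table = []   # rows of (score, f, %, cf, C%)
cf = 0
for score in sorted(freq):
    f = freq[score]
    cf += f   # number of people at or below this score
    table.append((score, f, 100 * f / n, cf, 100 * cf / n))

for row in table:
    print(row)
```

In this made-up data, a score of 3 has C% = 60, meaning 60% of people scored 3 or lower.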

35
Symmetrical distribution

If the distribution is perfectly symmetrical with a single peak, the mean, median, and mode are the same

36
term image

Positive skew

37
term image

Negative skew

38
Skew

The asymmetry (lack of symmetry) of the distribution

39
Kurtosis

The heaviness of the tails

40
Leptokurtic

Heavy tails (pointy peak); higher probability of extreme values

41
Platykurtic

Light tails (flatter)

42
term image

Leptokurtic

43
term image

Platykurtic

44
Population

The group of people you want to study (e.g., first-year university students in Canada)

45
Sample

A subset of people drawn from the population you want to study (e.g., first-year Ontario Tech students)

46
Sampling variation

Statistics varying across different samples

47
Xi - mean

Deviation equation

48
Adding the deviations from the mean

Will always equal 0 because the positive and negative deviations cancel out

49
SS equation (sum of squared errors)

SS = sum of the squared deviations = Σ(Xi - mean)²

50
Variance equation

Variance = SS / (n - 1), where SS is the sum of squared deviations

51
Sample standard deviation equation (S)

S = square root of (SS / (n - 1))
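The chain from deviations to SS to variance to standard deviation can be sketched in Python as follows. The five scores are made up, and `stdev` from the standard library is used only as a cross-check:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores, made up for illustration only.
scores = [2, 4, 4, 6, 9]
n = len(scores)
m = mean(scores)

deviations = [x - m for x in scores]     # Xi - mean
ss = sum(d ** 2 for d in deviations)     # sum of squared deviations (SS)
variance = ss / (n - 1)                  # SS divided by n - 1
sd = sqrt(variance)                      # sample standard deviation (S)

print("sum of deviations:", sum(deviations))
print("SS:", ss, "variance:", variance, "S:", sd)
print("cross-check with statistics.stdev:", stdev(scores))
```

The deviations sum to 0, which is why they are squared before being added up.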

52
Are S and SD the same thing?

Yes

53
What does it mean if data is clustered close together?

Standard deviation is smaller

54
What does it mean if data is spread far apart?

Standard deviation is larger

55
What does a z-score do?

Interprets a score in the context of its distribution (mean and standard deviation)

56
Distribution of z-scores

Mean of 0 and standard deviation of 1

57
Z score equation

z = (score - mean) / standard deviation
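A short Python sketch of converting raw scores to z-scores, assuming a made-up set of five scores and the sample standard deviation (n - 1):

```python
from statistics import mean, stdev

# Hypothetical scores, made up for illustration only.
scores = [2, 4, 4, 6, 9]
m = mean(scores)
s = stdev(scores)   # sample standard deviation (divides SS by n - 1)

# z = (score - mean) / standard deviation
z_scores = [(x - m) / s for x in scores]

for x, z in zip(scores, z_scores):
    print(x, round(z, 2))

print("mean of z:", round(mean(z_scores), 10))
print("SD of z:", round(stdev(z_scores), 10))
```

Scores below the mean get negative z-scores, scores above it get positive ones, and the converted scores have a mean of 0 and a standard deviation of 1.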

58
What does it mean if a z-score is positive?

It’s above the mean

59
What does a smaller z-score mean?

It’s closer to the mean (the smaller the absolute value, the closer the score is to average)

60
What does a z-score tell us?

The location of a score within a distribution

61
What percent of scores fall beyond ±1.96?

5%: 2.5% in each tail (positive and negative), so 95% of scores fall between -1.96 and +1.96
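The 1.96 cut-off can be checked with Python's standard library, assuming a standard normal distribution (mean 0, standard deviation 1). `NormalDist` is available in Python 3.8 and later:

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

inside = z.cdf(1.96) - z.cdf(-1.96)   # proportion between -1.96 and +1.96
upper_tail = 1 - z.cdf(1.96)          # proportion above +1.96
lower_tail = z.cdf(-1.96)             # proportion below -1.96

print("between:", round(inside, 3))
print("upper tail:", round(upper_tail, 3))
print("lower tail:", round(lower_tail, 3))
```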

62
Standard error

Tells us the similarity of estimates from different samples

63
If standard error (SE) is small, what does that mean?

Samples produce similar estimates (tightly clustered together), which is good

64
Large SE

Samples produce very different estimates, which is bad (a wider interval is needed to capture that 95%)

65
Confidence intervals

Gives an estimated range of values which likely includes the population mean

66
Confidence interval equation

Mean ± (1.96 × standard error), for a 95% confidence interval
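A minimal Python sketch of a 95% confidence interval around a sample mean. The sample is made up, and the standard error is computed as s / √n, which is the usual formula for the standard error of the mean but is not spelled out in these cards:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample, made up for illustration only.
sample = [2, 4, 4, 6, 9]
n = len(sample)
m = mean(sample)

# Assumption: standard error of the mean = sample SD / sqrt(n).
se = stdev(sample) / sqrt(n)

lower = m - 1.96 * se
upper = m + 1.96 * se

print("mean:", m, "SE:", round(se, 3))
print("95% CI:", round(lower, 2), "to", round(upper, 2))
```

A larger standard error widens the interval, which matches the idea that very different estimates across samples need a wider range to capture that 95%.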

67
What does correlation do?

Examines the association between 2 variables (ranges from -1 to 1)

68
What does covariance do?

Quantifies the relationship between two variables (how much they vary together)

69
What does it mean if two variables deviate from their means by similar amounts?

They’re likely to be related

70
What does R² (r squared) mean?

Squaring the correlation coefficient r gives you the proportion of variance in one variable that is shared with the other
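A sketch of covariance, the correlation coefficient r, and r squared computed by hand in Python. The x and y values are made up, and the n - 1 versions of the formulas are assumed:

```python
from math import sqrt
from statistics import mean

# Hypothetical paired scores, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mx, my = mean(x), mean(y)

# Covariance: average cross-product of the deviations (divided by n - 1).
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of each variable.
sx = sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
sy = sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))

r = cov / (sx * sy)   # standardized covariance, between -1 and 1
r_squared = r ** 2    # proportion of shared variance

print("covariance:", cov)
print("r:", round(r, 3))
print("r squared:", round(r_squared, 3))
```

Dividing the covariance by the two standard deviations is what keeps r between -1 and 1 regardless of the units of measurement.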

71
Issues of correlation

Third variable problem and direction of causality

72
Third variable problem

Causality can’t be assumed, because an unmeasured third variable may be affecting both variables

73
Direction of causality

Correlation coefficients say nothing about which variable causes the other to change

74
r = .1

Small effect (the effect explains 1% of the total variance)

75
r = .3

Medium effect (the effect accounts for 9% of the total variance)

76
r = .5

Large effect (the effect accounts for 25% of the total variance)
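The percentages on the three effect size cards come from squaring r, which a few lines of Python can confirm. The labels are the ones used on the cards:

```python
# Effect size benchmarks from the cards: r value -> label.
benchmarks = {0.1: "small", 0.3: "medium", 0.5: "large"}

# Percentage of variance explained = r squared x 100.
shared = {r: round(r * r * 100, 1) for r in benchmarks}

for r, label in benchmarks.items():
    print(f"r = {r}: {label} effect, {shared[r]}% of variance")
```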

77
What is a regression?

A way of predicting the value of one variable from another

78
Why do we use regression?

To predict values of a dependent variable from one or multiple predictors

79
Any straight line can be defined by

The slope b1 and the y intercept b0

80
Slope (b1) tells us

The direction of the relationship and the strength (if it’s really steep, there’s a stronger relationship between the predictor and outcome)

81
The intercept b0 tells us

Where the line crosses the y-axis (the value of Y when X = 0)

82
b0 =

Y intercept

83
Yi =

The outcome (Y) score for individual i

84
Xi =

The predictor (X) score for individual i

85
b1 =

Slope (effect)

86
Ei =

Error (the residual for individual i)

87
Why do we have an error?

Because not every person’s score will fall exactly on the line
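The pieces of the regression equation, Yi = b0 + b1 × Xi + Ei, can be sketched in Python. The data are made up, and b1 and b0 are estimated with the standard least-squares formulas, which are an assumption here because the cards do not give an estimation method:

```python
from statistics import mean

# Hypothetical data: x is the predictor, y is the outcome (made up).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
mx, my = mean(x), mean(y)

# Assumption: ordinary least-squares estimates of the slope and intercept.
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

predicted = [b0 + b1 * xi for xi in x]             # points on the line
errors = [yi - pi for yi, pi in zip(y, predicted)]  # Ei: distance from the line

print("b0 (intercept):", round(b0, 2))
print("b1 (slope):", round(b1, 2))
print("errors:", [round(e, 2) for e in errors])
```

None of the made-up points sits exactly on the line, so every person has a nonzero error. A slope of 0 would give a flat line, meaning no relationship.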

88
Flat line means

No relationship

89
x variable for regression is

The predictor

90
Y variable of regression is

The outcome

91
SST (total variability) is

The sum of squared differences between the observed data and the mean of Y

92
SSR (residual error variability) is

The sum of squared differences between the regression model’s predictions and the actual data (if the number is big, the regression line doesn’t fit well)

93
If SSM is large, that means

The linear model has made a big improvement to how well the outcome variable can be predicted

94
If SSM is small, it means

Using the linear model is little better than using the mean

95
What does SSM (model sum of squares) look at?

Model variability: the difference between the model’s predictions and the mean of Y (how much better the regression is than the mean)
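A Python sketch of the three sums of squares for a simple regression. The data are made up and the line is fitted with the standard least-squares formulas, which the cards do not state, so treat the fitting step as an assumption:

```python
from statistics import mean

# Hypothetical data: x is the predictor, y is the outcome (made up).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
mx, my = mean(x), mean(y)

# Assumption: ordinary least-squares fit of the line.
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx
predicted = [b0 + b1 * xi for xi in x]

sst = sum((yi - my) ** 2 for yi in y)                      # data vs. mean of Y
ssr = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # data vs. model
ssm = sum((pi - my) ** 2 for pi in predicted)              # model vs. mean of Y

print("SST:", round(sst, 2), "SSR:", round(ssr, 2), "SSM:", round(ssm, 2))
```

For a least-squares line, SST = SSM + SSR, so a large SSM relative to SST means the line predicts the outcome much better than the mean alone.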

96
Mean squares (MS) equation

MS = SS / df (sum of squares divided by degrees of freedom)

97
MSM means

Mean squared model

98
MSR means

Mean squared residual
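A small Python sketch of MS = SS / df for the model and the residual. The SS values, sample size and number of predictors are made up, and the degrees of freedom (k for the model, n - k - 1 for the residual) are the usual regression values rather than something stated on the cards:

```python
# Hypothetical sums of squares from a regression with one predictor (made up).
ss_model = 3.6
ss_residual = 2.4
n = 5   # number of observations (assumed)
k = 1   # number of predictors (assumed)

# Assumption: standard degrees of freedom for a regression with k predictors.
df_model = k
df_residual = n - k - 1

ms_model = ss_model / df_model            # MSM
ms_residual = ss_residual / df_residual   # MSR

print("MSM:", round(ms_model, 2))
print("MSR:", round(ms_residual, 2))
```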

99
For the null hypothesis, what do we assume?

That the null hypothesis is true (there is no effect)

100
R squared

Tells us the proportion of variance accounted for by a regression model (e.g., if R squared is 0.483, the model (garlic worn) accounts for 48.3% of the variation in zombie approach times)
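R squared can be sketched in Python as SSM divided by SST. The sums of squares below are made up, while 0.483 is the value from the card's example:

```python
# Hypothetical sums of squares (made up for illustration).
ss_model = 3.6
ss_total = 6.0

# Proportion of the total variability that the model accounts for.
r_squared = ss_model / ss_total
print("R squared:", round(r_squared, 3))

# Converting the value from the card's example into a percentage.
card_r_squared = 0.483
card_percent = round(card_r_squared * 100, 1)
print("variance accounted for:", card_percent, "%")
```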