SOCI 418 - Social Statistics II

77 Terms

1
New cards

Positive skew

  • When the mean is larger than the median and mode

  • E.g., income is typically positively skewed

  • The tail is to the right

2
New cards

Negative skew

  • When the mode is larger than the median and the mean 

  • The mean is smaller than the median

  • Tail is to the left
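
The ordering of mean, median, and mode in the two skew cards can be checked on a small example. This is a minimal sketch using made-up income values (the `incomes` list is hypothetical, not course data) and Python's standard `statistics` module.

```python
import statistics

# Hypothetical incomes (in thousands): mostly modest values, a few very large ones.
incomes = [20, 25, 25, 30, 35, 40, 45, 60, 150, 400]

mean_income = statistics.mean(incomes)      # pulled toward the long right tail
median_income = statistics.median(incomes)  # middle value, resistant to the tail
mode_income = statistics.mode(incomes)      # most frequent value

# Positive skew: mode < median < mean
print(mode_income, median_income, mean_income)

# Mirroring the data moves the long tail to the left (negative skew),
# so the mean now falls below the median.
mirrored = [-x for x in incomes]
print(statistics.mean(mirrored) < statistics.median(mirrored))
```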

3
New cards

What are the different types of univariate graphs?

  • Histograms

  • Frequency distributions (kernel density plots)

  • (Normal) quantile comparison plots

  • Box plots

4
New cards

Histograms

  • Divide the range of a variable into intervals of equal width, called bins

  • Count the number of observations within each bin

  • Display the frequency counts in a bar graph
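
The three steps above can be sketched in a few lines. The function name `histogram_counts` and the `ages` data are illustrative assumptions; the sketch also assumes the values are not all identical, so that the bin width is greater than zero.

```python
def histogram_counts(values, n_bins):
    """Count observations in n_bins equal-width bins spanning min(values)..max(values)."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins  # assumes high > low
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical ages
ages = [18, 19, 21, 22, 22, 25, 31, 34, 40, 58]
counts = histogram_counts(ages, 4)

# A text "bar graph": one row per bin
for i, count in enumerate(counts):
    print(f"bin {i}: {'#' * count}")
```

Changing `n_bins` changes the shape of the display, which is the arbitrariness noted in the cons of histograms.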

5
New cards

CONS of histograms

  • The visual representation of data depends on the arbitrary origin of bins

  • Shape of histogram depends on arbitrary width of bins

  • Histograms appear discontinuous, even when they display continuous data

  • Bins narrow enough to show detail where data are dense may be too narrow to avoid “noise” where data are sparse

6
New cards

Kernel Density Plots

  • Non-parametric way of smoothing histograms

  • Alternative to histograms by averaging and smoothing them

  • Continuously moves a window of fixed width across the data, calculating a locally weighted average of the number of observations falling within the window

  • Choosing the window width (bandwidth) is largely a matter of trial and error, although statistical theory offers guidance on what works
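
The moving-window idea can be sketched with the simplest possible weighting, where every observation inside the window counts equally (a rectangular kernel). This is an assumption made for brevity: kernel density estimators typically use smooth weights such as a Gaussian. The name `window_density` and the data are hypothetical.

```python
def window_density(values, x, width):
    """Density estimate at x: the share of observations inside a window of the
    given width centred on x, divided by the width (rectangular kernel)."""
    half = width / 2
    inside = sum(1 for v in values if x - half <= v <= x + half)
    return inside / (len(values) * width)

# Hypothetical ages
ages = [18, 19, 21, 22, 22, 25, 31, 34, 40, 58]

# Slide the window across a grid of points with two different widths.
grid = list(range(15, 65, 5))
narrow = [window_density(ages, x, 4) for x in grid]   # more detail, more noise
wide = [window_density(ages, x, 20) for x in grid]    # smoother, less detail

for x, a, b in zip(grid, narrow, wide):
    print(x, round(a, 3), round(b, 3))
```

Comparing the two columns shows why the window width is chosen by trial and error: the narrow window drops to zero wherever data are sparse, while the wide one blurs the peak.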

7
New cards

Quantile comparison plots

  • Compares the distribution of the data with a theoretical distribution

  • Most common question:

    • How closely do our data follow the normal curve?

  • It doesn’t use arbitrary bins or averages

  • The continuity of data is preserved!

  • The more the data points deviate from the comparison line, the more the distribution deviates from the normal curve
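
The coordinates behind a normal quantile comparison plot can be sketched as follows. The function name `normal_quantile_pairs`, the data, and the plotting position `(i - 0.5) / n` are assumptions (that position is one common convention among several).

```python
from statistics import NormalDist, mean, stdev

def normal_quantile_pairs(values):
    """Pair each sorted observation with the normal quantile expected at its rank."""
    ordered = sorted(values)
    n = len(ordered)
    # Normal curve with the same mean and SD as the data
    reference = NormalDist(mean(ordered), stdev(ordered))
    pairs = []
    for i, observed in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # cumulative proportion assigned to rank i
        pairs.append((reference.inv_cdf(p), observed))
    return pairs

# Hypothetical, positively skewed incomes (in thousands)
incomes = [20, 25, 25, 30, 35, 40, 45, 60, 150, 400]
pairs = normal_quantile_pairs(incomes)

# Points on the line "observed = expected" would indicate normal data.
for expected, observed in pairs:
    print(round(expected, 1), observed)
```

In this example the largest observation sits far above its expected normal quantile, which is how a long right tail shows up in the plot.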

8
New cards

Quantile comparison plots allow us to look closely at the _____ of the distribution

tails

9
New cards

Boxplots

  • Shows summary information on the center, spread, and skewness

  • Show individual observations in tails and potential outliers

  • Useful for comparing several distributions, or for checking whether a transformation makes the data look more symmetrical

  • We use box plots when we look at multiple variables 

10
New cards

In boxplots, when the median is ____ in the middle of the box, the distribution is most likely ____

not, skewed

11
New cards

What are the main components of a boxplot?

  • Minimum

  • Q1

  • Median

  • Q3

  • Maximum
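
The five components can be computed directly. This sketch uses `statistics.quantiles` with the "inclusive" method, which is one of several quartile conventions, so other software may give slightly different Q1 and Q3. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical ages
ages = [18, 19, 21, 22, 22, 25, 31, 34, 40, 58]
minimum, q1, median, q3, maximum = five_number_summary(ages)

iqr = q3 - q1  # width of the box
print(minimum, q1, median, q3, maximum, iqr)

# A median sitting closer to Q1 than to Q3 hints at positive skew.
print(median - q1 < q3 - median)
```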

12
New cards

Skewness

The asymmetry of a distribution: in which direction does the longer tail extend?

13
New cards

Center

Where are the mean, median, and mode?

14
New cards

Spread

  • Where is most of the data contained, and what is the range of the data?

  • The difference between Q3 and Q1 (IQR = Q3 - Q1)

  • Minimum and maximum data points (range)

15
New cards

Scatterplots

  • Display the relationship between two quantitative variables

  • Do not work well for discrete variables, or for variables that take only a few distinct values relative to the sample size

16
New cards

In a scatterplot, watch out for skewed data. Data that are skewed need to be _____ !

transformed

17
New cards

Multivariate graphs are helpful to examine ______ for all pairs of variables

bivariate scatterplots

18
New cards

Non-normality

The data do not follow a normal distribution

19
New cards

Heteroskedasticity

Variance is not constant

20
New cards

Non-linearity

The relationship is not linear

21
New cards

Linear transformation

  • Goal is to keep the spacing the same 

  • E.g., inches → cm / Fahrenheit → Celsius / American dollars → Canadian dollars

22
New cards

Values that are _____ before a linear transformation will still be evenly spaced afterwards

evenly spaced
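
A quick check of the spacing claim, using the Fahrenheit to Celsius conversion as the linear transformation. The temperature values are made up for illustration.

```python
def fahrenheit_to_celsius(f):
    """Linear transformation: subtract a constant, then multiply by a constant."""
    return (f - 32) * 5 / 9

# Evenly spaced values, 18 degrees Fahrenheit apart
temps_f = [32, 50, 68, 86]
temps_c = [fahrenheit_to_celsius(f) for f in temps_f]

# Gaps between neighbouring values, before and after
gaps_f = [b - a for a, b in zip(temps_f, temps_f[1:])]
gaps_c = [b - a for a, b in zip(temps_c, temps_c[1:])]

print(gaps_f)  # equal gaps in the original units
print(gaps_c)  # still equal gaps, only the unit has changed
```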

23
New cards

Nonlinear transformation

  • Change spacing and shape, but keep data in order

  • E.g., log, powers, roots

  • Helpful for fixing regression issues

24
New cards

Monotonic increasing function

  • It maintains the order of data

  • If a > b then f(a) > f(b)

25
New cards

Monotonic decreasing function

  • Reverses the order of data

  • If a > b then f(a) < f(b)

26
New cards

Descending powers (log, roots, reciprocals) ____ large values and _____ small ones 

shrink, spread

27
New cards

Descending powers can fix _____

Positive skew

28
New cards

Ascending powers (x²) have the opposite effect; they fix ____

Negative skew
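
The "shrink large values, spread small ones" effect of a descending power can be seen by comparing gaps before and after a log transformation. The income values below are hypothetical.

```python
import math

# Hypothetical, positively skewed incomes (in thousands), already sorted
incomes = [20, 25, 25, 30, 35, 40, 45, 60, 150, 400]
logged = [math.log10(x) for x in incomes]

def tail_to_bottom_ratio(values):
    """Gap between the two largest values relative to the gap between the two smallest."""
    return (values[-1] - values[-2]) / (values[1] - values[0])

raw_ratio = tail_to_bottom_ratio(incomes)  # the top gap dwarfs the bottom gap
log_ratio = tail_to_bottom_ratio(logged)   # the top gap is pulled in

print(raw_ratio, round(log_ratio, 2))

# The log is monotonic increasing, so the order of the data is unchanged.
print(logged == sorted(logged))
```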

29
New cards

All data values must be ______ to use the Box-Cox family of transformations

positive values

30
New cards

How to make positive values in Box-Cox

  • Add a constant (a “start”) to every value before transforming

  • E.g., (X + 3)²: the constant is added first, then the power is applied
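
A minimal sketch of the Box-Cox family with a start, assuming the usual form (x^λ - 1)/λ with the natural log used when λ = 0. The function name `box_cox` and its `start` parameter are illustrative, not from a particular library.

```python
import math

def box_cox(x, lam, start=0.0):
    """Box-Cox power transformation of x after adding a start.

    The start is added first so that the shifted value is positive.
    """
    shifted = x + start
    if shifted <= 0:
        raise ValueError("x + start must be positive")
    if lam == 0:
        return math.log(shifted)  # the log is the lambda = 0 member of the family
    return (shifted ** lam - 1) / lam

# A negative value cannot be transformed directly, but can after a start of 3.
print(box_cox(-2, 0.5, start=3))
print(box_cox(10, 0))   # log transformation
print(box_cox(10, 2))   # ascending power
```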

31
New cards

Power transformations are effective ONLY when ratio of _______ is sufficiently large

highest to lowest data values

32
New cards

Positive skew (right tail too long) use ____ transformations to pull the tail in

log or root

33
New cards

Negative skew (left tail too long) use ____ to stretch the right side and pull the left tail in

powers (x²)

34
New cards

Transformation can help _____ and make data ______

stabilize variance, easier to analyze

35
New cards

Mosteller and Tukey’s bulging rule

It gives guidance on which transformations to try, based on the direction in which a nonlinear relationship bulges

36
New cards

Nominal variables

Unordered categories with no ranking (e.g., gender)

37
New cards

Ordinal variables

Ranked categories; however, the distance between categories cannot be quantified (e.g., education level)

38
New cards

Interval variable

Ranked values with equal, quantifiable distances between them (e.g., temperature)

39
New cards

Dichotomous variable

Works with only two categories. It can be nominal or ordinal.

40
New cards

Interval variables use measures of dispersion:

  • Range

  • Variance

  • Standard Deviation

41
New cards

Sample

Subset of the population

42
New cards

Population parameters

Characteristics of the population (e.g., the population mean) that we want to know and estimate from a sample

43
New cards

Sample distribution

The distribution of values within a single sample

44
New cards

Descriptive statistics

Describe the traits of a population/sample

45
New cards

Inferential statistics

Draw conclusions about a population based on our sample

46
New cards

Theoretical distribution of sample means

  • Take all possible random samples of a given size n

  • Calculate the mean for each sample

  • Plot the distribution of those means
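
For a tiny population the three steps can be carried out exactly. This sketch assumes sampling with replacement and a made-up population of four values; it enumerates every possible sample of size 2 rather than drawing samples at random.

```python
import itertools
import math
import statistics

population = [2, 4, 6, 8]  # hypothetical population
n = 2                      # sample size

# Step 1: all possible samples of size n (with replacement, order kept)
samples = list(itertools.product(population, repeat=n))

# Step 2: the mean of each sample
sample_means = [statistics.mean(s) for s in samples]

# Step 3: summarise the distribution of those means
pop_mean = statistics.mean(population)
mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

# Standard error predicted from the population SD
se_formula = statistics.pstdev(population) / math.sqrt(n)

print(len(samples), pop_mean, mean_of_means)
print(sd_of_means, se_formula)
```

The mean of the sample means matches the population mean, and their standard deviation matches the population SD divided by the square root of n.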

47
New cards

Sample means should cluster around the ______

population mean

48
New cards

The ____ the sample size, the _____ the sample mean tends to be to the population mean

larger, closer

49
New cards

Central Limit Theorem

If all possible random samples of size n are drawn from a population with mean μ and standard deviation σ, then as n gets larger the distribution of sample means becomes approximately normal, with a mean equal to the population mean (μ) and a standard deviation equal to the standard error (SE = σ/√n).

50
New cards

CLT tells us three things:

  • Shape

  • Central tendency

  • Variability

51
New cards

Mean of the distribution of sample means is ____ to the true population mean

equal

52
New cards

If the sample is large enough, the SE will be very _____ and the sample means will cluster around the true population mean

small

53
New cards

When we ____ sample size (n), we ____ standard error

increase, decrease

54
New cards

Standard error

The standard deviation of the distribution of sample means: roughly, the typical distance between a sample mean and the population mean (SE = σ/√n).
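
The link between sample size and standard error in the cards above follows from the formula SE = SD / √n. A small sketch, with a hypothetical standard deviation of 15:

```python
import math

def standard_error(sd, n):
    """Standard error of the mean for a sample of size n."""
    return sd / math.sqrt(n)

sd = 15  # hypothetical standard deviation

# Quadrupling the sample size halves the standard error.
for n in (25, 100, 400):
    print(n, standard_error(sd, n))
```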

55
New cards

The sample mean is an _____ point estimate of the true population mean

unbiased

56
New cards

Standard deviation

How far, on average, the scores in a distribution deviate from the mean of the distribution. It describes the spread of the scores.

57
New cards

Null hypothesis

  • No association between two variables or conditions

  • Statistical independence

  • H0

58
New cards

Alternative hypothesis

  • Research hypothesis

  • There IS an association between two variables or conditions

  • Statistical dependence

59
New cards

We can only ____ or ____ the null hypothesis

reject, fail to reject

60
New cards

We ____ prove the alternative hypothesis to be true

cannot

61
New cards

Falsifiability

A single study can never prove something to be true. We can only fail to prove that it is false.

62
New cards

Type I error

  • Reject null hypothesis when it is actually true

  • E.g., conclude the treatment is effective when it actually has no effect

  • False positive

  • Probability of making Type I error is alpha

63
New cards

Type II error

  • Fail to reject the null hypothesis when it is actually false

  • E.g., conclude the treatment is NOT effective when it actually does have an effect

  • False negative

  • Probability of making Type II error is beta
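
The statement that the probability of a Type I error equals alpha can be illustrated by simulation. This is a sketch under assumed conditions: the null hypothesis is true by construction, the population SD is treated as known (so a z-test is used), and all numeric settings are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(418)  # fixed seed so the run is repeatable

ALPHA = 0.05
MU0, SIGMA, N = 100, 15, 30  # hypothetical null mean, known SD, sample size
critical_z = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed critical value

def rejects_null(sample):
    """Two-tailed z-test of H0: population mean equals MU0."""
    z = (mean(sample) - MU0) / (SIGMA / math.sqrt(N))
    return abs(z) > critical_z

# Every sample is drawn from a population where H0 is TRUE,
# so every rejection is a false positive (Type I error).
trials = 4000
false_positives = sum(
    rejects_null([random.gauss(MU0, SIGMA) for _ in range(N)])
    for _ in range(trials)
)
type_i_rate = false_positives / trials
print(type_i_rate)  # should land near ALPHA
```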

64
New cards

We focus on ____ type I errors

decreasing

65
New cards

If the sample mean falls in the critical region, then we must _____ the null

reject

66
New cards

T-test

A test of the null hypothesis about a population mean, used when the population SD is unknown and is estimated by the sample standard deviation. The test statistic follows the t-distribution, which has heavier tails than the normal distribution.
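
A minimal sketch of the one-sample t statistic, using the sample standard deviation in place of the unknown population SD. The function name `one_sample_t` and the GPA values are hypothetical; the critical value still has to be looked up in a t-table for the returned degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean equals mu0."""
    n = len(sample)
    # Estimated standard error: sample SD (n - 1 denominator) over sqrt(n)
    se = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - mu0) / se
    return t, n - 1

# Hypothetical GPAs, tested against a null value of 3.0
gpas = [3.1, 3.4, 2.9, 3.6, 3.3, 3.0, 3.5, 3.2]
t_stat, df = one_sample_t(gpas, 3.0)
print(round(t_stat, 3), df)
```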

67
New cards

We use the t-distribution when the population standard deviation is _____

unknown

68
New cards

We use the Z distribution when the population standard deviation, and therefore the standard error (SE), is _____

known

69
New cards

When sample size (n) is greater than ____, the t-distribution is roughly the same as the z-distribution (normal distribution)

120

70
New cards

One-tail test

Tests a directional hypothesis: the difference is expected in one specific direction (e.g., women’s GPA is higher than men’s).

71
New cards

Two-tailed test

Tests a non-directional hypothesis: is the population mean equal to or different from a predetermined value? The difference could fall in either direction (e.g., women’s GPA differs from men’s).

72
New cards

Steps for hypothesis testing:

  • State null and alternative hypotheses

  • Set alpha level

  • Find critical regions

  • Collect data and compute the test statistic

  • Once you have the test statistic, decide whether to reject or fail to reject the null
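
The steps above can be sketched for the simplest case, a two-tailed z-test with a known population SD. That choice is an assumption made so the critical value can be computed with the standard library. The function name `z_test` and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test of H0: population mean equals mu0.

    Returns (z statistic, critical value, decision).
    """
    # Steps 2-3: alpha determines the critical regions in both tails
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: test statistic from the collected data
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Step 5: reject only if z falls in a critical region
    decision = "reject H0" if abs(z) > critical else "fail to reject H0"
    return z, critical, decision

# Step 1: H0: mean = 100, H1: mean differs from 100
z, critical, decision = z_test(sample_mean=104, mu0=100, sigma=15, n=100)
print(round(z, 3), round(critical, 3), decision)
```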

73
New cards

Alpha

Probability that hypothesis test will result in Type I error

74
New cards

The most common alpha level:

95% confidence level, alpha = 0.05

75
New cards

Degrees of freedom

df = n - 1 (for a one-sample t-test, where n is the sample size)

76
New cards

Statistical significance is _______ practical importance

not the same as

77
New cards

P-value

The probability of obtaining results at least as extreme as those observed if the null hypothesis were true. It is calculated from the test statistic computed on your data. A small p-value (p < 0.05) indicates that the observed results would be unlikely under the null hypothesis, that is, unlikely to be due to chance alone.
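
For a z statistic, a two-tailed p-value can be computed from the standard normal distribution. A minimal sketch, with `two_tailed_p` as a hypothetical helper name:

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Probability of a z statistic at least this far from 0, in either direction,
    if the null hypothesis is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (0.5, 1.96, 3.0):
    print(z, round(two_tailed_p(z), 4))
```

A z of about 1.96 gives a p-value of about 0.05, which is why 1.96 marks the critical region at the most common alpha level.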