SOCI 418 - Social Statistics II

77 Terms

New cards

Positive skew

When the mean is larger than the median and mode
E.g., income is typically positively skewed
The tail is to the right
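
A small check of the mean/median/mode ordering, as a sketch. The income figures are made up for illustration and are not from the course.

```python
import statistics

# Hypothetical incomes (in thousands); one very high earner creates a right tail.
incomes = [28, 31, 35, 35, 40, 42, 47, 55, 60, 250]

mean_income = statistics.mean(incomes)      # pulled upward by the 250
median_income = statistics.median(incomes)  # middle of the sorted values
mode_income = statistics.mode(incomes)      # most frequent value

# With a long right tail we expect mean > median > mode.
print(mean_income, median_income, mode_income)
```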

New cards

Negative skew

When the mode is larger than the median and the mean
The mean is smaller than the median
Tail is to the left

New cards

What are the different types of univariate graphs?

Histograms
Frequency distributions (kernel density plots)
(Normal) quantile comparison plots
Box plots

New cards

Histograms

Divide the range of the variable into intervals of equal width; we call these bins
Count the number of observations within each bin
Display the frequency counts in a bar graph
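
The three steps above can be sketched in a few lines. The function name `histogram_counts` and the ages are hypothetical; the second call only shows that moving the bin origin changes the picture.

```python
def histogram_counts(values, origin, width, n_bins):
    """Count how many values fall in each equal-width bin [left, left + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - origin) // width)
        if 0 <= index < n_bins:  # values outside the bins are ignored
            counts[index] += 1
    return counts

# Hypothetical ages, for illustration only.
ages = [18, 19, 21, 22, 22, 25, 31, 34, 38, 47]

counts_a = histogram_counts(ages, origin=10, width=10, n_bins=4)
counts_b = histogram_counts(ages, origin=15, width=10, n_bins=4)

# Text version of the bar graph for the first set of bins.
for i, c in enumerate(counts_a):
    left = 10 + i * 10
    print(f"{left}-{left + 10}: {'#' * c}")
```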

New cards

CONS of histograms

The visual representation of data depends on the arbitrary origin of bins
Shape of histogram depends on arbitrary width of bins
Histograms appear discontinuous, even if they actually display continuous data
Bins narrow enough to show detail where data are dense may be too narrow to avoid “noise” where data are thinly dispersed

New cards

Kernel Density Plots

Non-parametric way of smoothing histograms
Alternative to histograms by averaging and smoothing them
Continuously moves a window of fixed width across the data, calculating a locally weighted average of the number of observations falling within the window
Choosing the window width is largely a matter of trial and error, although statistical theory gives guidance on what works
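
A minimal sketch of the idea, assuming a Gaussian kernel as the weighting function (the card does not name a kernel). The name `gaussian_kde`, the data, and the bandwidth of 0.5 are illustrative choices.

```python
import math

def gaussian_kde(data, x, bandwidth):
    """Estimate the density at x as an average of Gaussian bumps centred on each observation."""
    total = 0.0
    for xi in data:
        z = (x - xi) / bandwidth
        total += math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return total / (len(data) * bandwidth)

# Hypothetical observations: a cluster near 2 and one point at 5.
data = [1.0, 1.5, 2.0, 2.2, 5.0]

# Density is higher inside the cluster than far from any observation.
print(gaussian_kde(data, 2.0, bandwidth=0.5))
print(gaussian_kde(data, 8.0, bandwidth=0.5))
```

A larger bandwidth gives a smoother curve and a smaller one a bumpier curve, which is why the width has to be tuned.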

New cards

Quantile comparison plots

Helps to compare the distribution of the data with a theoretical distribution
One common comparison:
- How closely do our data follow the normal curve?
It doesn’t use arbitrary bins or averages
The continuity of data is preserved!
The more the data points deviate from the comparison line, the more the distribution deviates from the normal curve
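
A sketch of how the points of a normal quantile comparison plot can be computed. The plotting position (i - 0.5)/n is one common convention and is an assumption here, as are the function name and the data.

```python
from statistics import NormalDist, mean, stdev

def normal_quantile_pairs(data):
    """Pair each sorted observation with the normal quantile it would match if the data were normal."""
    n = len(data)
    reference = NormalDist(mean(data), stdev(data))
    pairs = []
    for i, value in enumerate(sorted(data), start=1):
        p = (i - 0.5) / n  # assumed plotting position
        pairs.append((reference.inv_cdf(p), value))
    return pairs

# Hypothetical data with one large value in the right tail.
data = [2.1, 2.4, 2.9, 3.0, 3.3, 3.8, 4.1, 9.5]
for theoretical, observed in normal_quantile_pairs(data):
    print(round(theoretical, 2), observed)
```

Points where the observed value sits far from the theoretical one, typically in the tails, signal departure from normality.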

New cards

Quantile comparison plots allow us to look at the _____ of the distribution

tails

New cards

Boxplots

Shows summary information on the center, spread, and skewness
Show individual observations in tails and potential outliers
Useful to compare several distributions or to judge whether a transformation makes data look more symmetrical
We use box plots when we look at multiple variables

New cards

In boxplots, when the median is ____ in the middle, the distribution is most likely ____

not, skewed

New cards

What are the main components of a boxplot?

Minimum
Q1
Median
Q3
Maximum
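
The five components can be computed directly. This is a sketch with made-up data; note that `statistics.quantiles` uses one particular quartile convention, and other software may give slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(data):
    """Return the five values a boxplot is drawn from."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # three cut points: Q1, Q2, Q3
    return {
        "min": min(data),
        "q1": q1,
        "median": statistics.median(data),
        "q3": q3,
        "max": max(data),
    }

# Hypothetical scores; the 30 sits far out in the right tail.
scores = [2, 4, 4, 5, 7, 8, 9, 11, 12, 30]
summary = five_number_summary(scores)
print(summary)
```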

New cards

Skewness

In which direction does the tail of the distribution stretch out?

New cards

Center

Where are the mean, median, and mode?

New cards

Spread

Where is most of the data contained, and what is the range of the data?

The difference between Q3 and Q1 (IQR)
The distance between the minimum and maximum data points (range)

New cards

Scatterplots

Display the relationship between two quantitative variables
Does not work well for discrete (non-continuous) variables, or for variables that take only a few distinct values relative to the sample size

New cards

In a scatterplot, watch out for skewed data. Data that are skewed need to be _____ !

transformed

New cards

Multivariate graphs are helpful to examine ______ for all pairs of variables

bivariate scatterplots

New cards

Non-normality

When the data are not normally distributed

New cards

Heteroskedasticity

Variance is not constant

New cards

Non-linearity

The relationship is not linear

New cards

Linear transformation

Goal is to keep the spacing the same
E.g., inches → cm / Fahrenheit → Celsius / American dollars → Canadian dollars
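
A quick illustration using the Fahrenheit to Celsius conversion. The temperatures are chosen for illustration.

```python
# Evenly spaced Fahrenheit temperatures (18 degrees apart).
fahrenheit = [32, 50, 68, 86]

# Linear transformation: C = (F - 32) * 5 / 9
celsius = [(f - 32) * 5 / 9 for f in fahrenheit]

# Gaps between neighbouring values before and after the transformation.
f_gaps = [b - a for a, b in zip(fahrenheit, fahrenheit[1:])]
c_gaps = [b - a for a, b in zip(celsius, celsius[1:])]

print(celsius)  # 0.0, 10.0, 20.0, 30.0
print(f_gaps, c_gaps)
```

The gaps change size (18 becomes 10) but stay equal to each other, so the shape of the distribution is unchanged.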

New cards

Values that are _____ before a linear transformation will stay that way afterwards

evenly spaced

New cards

Nonlinear transformation

Changes the spacing and shape, but keeps the data in order

E.g., log, powers, roots
Helpful for fixing regression issues

New cards

Monotonic increasing function

It maintains the order of data
If a > b then f(a) > f(b)

New cards

Monotonic decreasing function

Reverses the order of data
If a > b then f(a) < f(b)

New cards

Descending powers (log, roots, reciprocals) ____ large values and _____ small ones

shrink, spread
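
A sketch of this effect with a base-10 log. The values are illustrative.

```python
import math

# Values whose gaps grow tenfold each step, like a long right tail.
values = [1, 10, 100, 1000]
logged = [math.log10(v) for v in values]

raw_gaps = [b - a for a, b in zip(values, values[1:])]
log_gaps = [b - a for a, b in zip(logged, logged[1:])]

print(raw_gaps)  # gaps of 9, 90, 900 on the raw scale
print(log_gaps)  # gaps of about 1 each on the log scale
```

The large values are pulled together and the small ones are spread apart relative to them, while the order is preserved.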

New cards

Descending powers can fix _____

Positive skew

New cards

Ascending powers (x²) have the opposite effect; they fix ____

Negative skew

New cards

The data must contain only ______ to use the Box-Cox family of transformations

positive values

New cards

How to make values positive for Box-Cox

Add a constant (a start) to the data before transforming
e.g., (X + 3)²
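
A sketch of the Box-Cox family with a start, using the standard definition (x^p - 1)/p, with the log for p = 0. The start is added before the transformation so that every value is positive. The data and the start of 3 are hypothetical.

```python
import math

def box_cox(x, power):
    """Box-Cox transformation of a single positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive values; add a start first")
    if power == 0:
        return math.log(x)
    return (x ** power - 1) / power

# Hypothetical data containing zero and a negative value.
raw = [-2, 0, 1, 5]
start = 3                             # assumed start, large enough to make all values positive
shifted = [x + start for x in raw]    # 1, 3, 4, 8

transformed = [box_cox(x, 0.5) for x in shifted]
print(transformed)
```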

New cards

Power transformations are effective ONLY when the ratio of _______ is sufficiently large

highest to lowest data values

New cards

For positive skew (right tail too long), use ____ transformations to pull the tail in

log or root

New cards

For negative skew (left tail too long), use ____ to stretch the right side of the distribution

powers (x²)

New cards

Transformation can help _____ and make data ______

stabilize variance, easier to analyze

New cards

Mosteller and Tukey’s bulging rule

It gives guidance on which transformations to try

New cards

Nominal variables

Simple categories with no inherent order (e.g., gender)

New cards

Ordinal variables

Categories can be ranked; however, we cannot quantify the distance between them (e.g., education level)

New cards

Interval variable

Values can be ranked and the distance between them can be quantified (e.g., temperature)

New cards

Dichotomous variable

Has only two categories. It can be nominal or ordinal.

New cards

Interval variables use measures of dispersion:

Range
Variance
Standard Deviation

New cards

Sample

Subset of the population

New cards

Population parameters

Characteristics of the population that we want to know (e.g., the population mean)

New cards

Sample distribution

The distribution within a sample

New cards

Descriptive statistics

Describe the traits of a population/sample

New cards

Inferential statistics

Make inferences about a population based on our sample

New cards

Theoretical distribution of sample means

Take all possible random samples
Calculate the mean for each sample
Plot the distribution of those means

New cards

Sample means should cluster around the ______

population mean

New cards

The ____ the sample size, the _____ the sample mean tends to be to the population mean

larger, closer

New cards

Central Limit Theorem

If all possible random samples of size n are drawn from a population with mean μ and standard deviation σ, then as n gets larger the distribution of sample means becomes approximately normal, with a mean equal to the population mean (μ) and an SD equal to the standard error (SE = σ/√n).
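
A simulation sketch of the theorem. It draws many random samples rather than all possible ones, and the population (exponential with mean 1 and SD 1), the seed, and the sample sizes are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(418)  # fixed seed so the run is repeatable

def sample_means(n, n_samples):
    """Draw n_samples samples of size n from a skewed population; return their means."""
    # Assumed population: exponential with mean 1 and SD 1 (strongly right-skewed).
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]

means = sample_means(n=50, n_samples=5000)
center = statistics.fmean(means)   # should be close to the population mean, 1
spread = statistics.stdev(means)   # should be close to the SE, 1 / sqrt(50)
expected_se = 1.0 / 50 ** 0.5

print(round(center, 3), round(spread, 3), round(expected_se, 3))
```

Even though the population is skewed, the sample means pile up symmetrically around the population mean, with a spread close to σ/√n.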

New cards

CLT tells us three things:

Shape
Central tendency
Variability

New cards

Mean of the distribution of sample means is ____ to the true population mean

equal

New cards

If sample is big enough the SE will be very _____ and means cluster around the true pop mean

small

New cards

When we ____ sample size (n), we ____ standard error

increase, decrease

New cards

Standard error

The standard deviation of the distribution of sample means (SE = σ/√n). It describes how far a sample mean typically falls from the population mean.
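
A short sketch of SE = σ/√n for a few sample sizes. The population SD of 10 is an assumed value.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean for a given SD and sample size."""
    return sd / math.sqrt(n)

sigma = 10  # assumed population standard deviation
for n in (25, 100, 400):
    print(n, standard_error(sigma, n))  # 2.0, 1.0, 0.5
```

Quadrupling the sample size halves the standard error.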

New cards

Sample mean is an _____ point estimate of the real pop mean

unbiased

New cards

Standard deviation

How far the scores in a distribution typically deviate from the mean of that distribution. It describes the spread of the scores.

New cards

Null hypothesis

No association between two variables or conditions
Statistical independence
H0

New cards

Alternative hypothesis

Research hypothesis
There IS an association between two variables or conditions
Statistical dependence

New cards

We can only ____ or ____ the null hypothesis

reject, fail to reject

New cards

We ____ prove the alternative hypothesis to be true

cannot

New cards

Falsifiability

A single study can never prove something to be true. We can only fail to prove that it is false.

New cards

Type I error

Reject null hypothesis when it is actually true
E.g., conclude the treatment is effective when it does not create any impact
False positive
Probability of making Type I error is alpha

New cards

Type II error

Fail to reject the null hypothesis when it is actually false
E.g., conclude the treatment is NOT effective when it actually does create an impact
False negative
Probability of making Type II error is beta

New cards

We focus on ____ type I errors

decreasing

New cards

If the sample mean falls in the critical region, then we must _____ the null

reject

New cards

T-test

Test of a null hypothesis about a population mean when the population SD is unknown and is estimated using the sample standard deviation. The t-distribution it relies on has heavier tails than the normal distribution.
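
A sketch of the one-sample t statistic, t = (sample mean - μ0) / (s/√n), with df = n - 1. The function name, the data, and the null value are hypothetical; the critical value would still come from a t table.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: population mean = mu0."""
    n = len(sample)
    s = statistics.stdev(sample)   # sample SD estimates the unknown population SD
    se = s / math.sqrt(n)          # estimated standard error
    t = (statistics.mean(sample) - mu0) / se
    return t, n - 1

# Hypothetical scores tested against a null value of 5.
t_stat, df = one_sample_t([5, 7, 9], mu0=5)
print(t_stat, df)
```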

New cards

We use the t-distribution when the population standard deviation is _____

unknown

New cards

We use the Z distribution when the population standard deviation (and therefore the standard error) is _____

known

New cards

When sample size (n) is greater than ____ the t-distribution is roughly the same as z-distribution (normal distribution)

120

New cards

One-tail test

Tests for a difference in one specified direction (e.g., women’s GPA is higher than men’s).

New cards

Two-tailed test

Is the population mean equal to or not equal to a predetermined value? The difference could fall in either direction (e.g., women’s GPA differs from men’s).

New cards

Steps for hypothesis testing:

State null and alternative hypotheses
Set alpha level
Find critical regions
Collect data and compute the test statistic
Once you have the test statistic, decide whether to reject or fail to reject the null
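
The steps above can be sketched with a two-tailed z-test, which assumes the population SD is known. All the numbers (null mean 100, SD 15, n = 36, sample mean 106) are made up for illustration.

```python
from statistics import NormalDist

# Step 1: H0: mu = 100, H1: mu != 100 (two-tailed)
mu0 = 100
sigma = 15          # assumed known population SD
# Step 2: set the alpha level
alpha = 0.05
# Step 3: find the critical region (|z| beyond about 1.96)
critical = NormalDist().inv_cdf(1 - alpha / 2)
# Step 4: collect data and compute the test statistic
n = 36
sample_mean = 106
se = sigma / n ** 0.5            # 15 / 6 = 2.5
z = (sample_mean - mu0) / se     # 6 / 2.5 = 2.4
# Step 5: reject or fail to reject the null
decision = "reject H0" if abs(z) > critical else "fail to reject H0"

print(round(critical, 2), round(z, 2), decision)
```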

New cards

Alpha

Probability that hypothesis test will result in Type I error

New cards

The most common alpha level:

95% confidence level, alpha = 0.05

New cards

Degrees of freedom

df = n - 1

New cards

Statistical significance is _______ practical importance

not the same as

New cards

P-value

The probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. It is calculated from the test statistic computed on your data. A small p-value (p < 0.05) indicates that the observed results would be unlikely if the null hypothesis were true.
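
A sketch of a two-sided p-value computed from a z statistic, assuming the test statistic follows the standard normal distribution under the null. The function name and the z values are illustrative.

```python
from statistics import NormalDist

def two_sided_p_from_z(z):
    """Probability of a z at least this far from 0, in either direction, under the null."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p_small = two_sided_p_from_z(2.4)   # below 0.05, so the null would be rejected at alpha = 0.05
p_large = two_sided_p_from_z(0.5)   # well above 0.05, so we would fail to reject

print(round(p_small, 4), round(p_large, 4))
```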