Stat 201: Module 1-2

0.0(0)

Studied by 0 people

Learn

Practice Test

Spaced Repetition

Match

Flashcards

Card Sorting

1/47

There's no tags or description

Looks like no tags are added yet.

Study Analytics

Name	Mastery	Learn	Test	Matching	Spaced

No study sessions yet.

48 Terms

New cards

Inferential Questions

Questions regarding how summaries, patterns, trends or relationships in a data set extend to the wider population

New cards

Observation

A quantity or a quality (or a set of these) we collect from a given entity/object

New cards

Population

Complete collection of individuals or cases of interest

the population is fixed
you don’t have access to all elements of the population
we mathematically denote the population’s size using upper-case N

New cards

Population Parameter

A quantity that summarizes the population

parameters are constant (not random)
parameters are usually unknown

Whenever we’re interested in a proportion of some value in a population, the population parameter has a specific name: the population proportion

New cards

Census

An exhaustive enumeration or counting of all N individuals in the population

we do this in order to compute the population parameter’s value exactly

New cards

Sample

A collection of observations from a population

New cards

Sample estimate

Numerical characteristic of the sample that estimates the population parameter

New cards

Statistical Inference

The process of using a sample to make a conclusion about the broader population from which it is taken

New cards

Point Estimate (aka Sample Statistic)

Single number calculated from a random sample that estimates an unknown population parameter of interest

the value is only an estimate (our best guess of our population parameter using this sample)
since the sample was random, if we took another random sample and compute the value for that sample, we’d get a different answer
estimates vary from sample to sample due to sampling variability
- if we get a different value for an estimate w/ a different sample, the point estimate might be unreliable
specific name when considering proportion: the sample proportion

New cards

Variable

An attribute of the elements in the population

what we’re measuring for each individual

New cards

Random Sample

A randomly selected subset of the population

ensures our point estimates are accurate
ensures that a sample is unbiased and representative of the population
every individual has an equal chance of being sampled
changes every time you draw a sample
you do have access to all elements of the sample

New cards

Sampling

The act of collecting a sample from the population, which we generally do when we can’t perform a census

we mathematically denote the sample size using lower case n
typically n is much smaller than the population size
sampling is a cheaper alternative than performaing a census

New cards

Representative Sample

A sample is said to be representative if it roughly “looks like” the population

New cards

Generalizable Sample

We say a sample is generalizable if any results based on the sample can generalize to the population

New cards

Biased/Unbiased Sampling Procedure

a sampling procedure is biased if certain individuals in a population have a higher chance of being included in a sample than others
a sampling procedure is unbiased if every individual in a population has an equal chance of being sampled

New cards

Accuracy vs Precision

random sampling ensures out point estimates are accurate
having large sample sizes ensures our point estimates are precise

<ul><li><p>random sampling ensures out point estimates are accurate</p></li><li><p>having large sample sizes ensures our point estimates are precise</p></li></ul><p></p>

New cards

Statistic

A quantity calculated based on a sample

eg. sample mean, sample standard deviation
statistics are used to estimate parameters
statistics are random quantities and depend on the sample drawn

New cards

What happens to the statistics when a new sample is taken?

Statistics will change, since the sample is random. It is likely that the sample will be different every time

New cards

Proportion

The number of entities/objects with a specific characteristic divided by the total number of entities/objects

can be used to describe categorical data

New cards

Variance

The mean of the sum of the squared distances of each observation from the mean value of all observations

New cards

Quantile

A number such that a given percentage of the data is lower than that number

New cards

Correlation

The strength and direction of the relationship between two variables

New cards

Sampling Distribution

A probability distribution of a statistic calculated from all possible samples of a specific size drawn from a population

helps us see how spread out the samples are from each other
each time we take a random sample we will have the same sampling distribution
an ideal sampling distribution would be one bar concentrated on the population parameter
tells us a lot about our statistic
- we could calculate the probability that our statistic would be to a certain range around the parameter

New cards

Sample size on Sampling Distribution

the bigger the sample size, the more narrow the sampling distribution
- there are fewer differences due to sampling variation and the distribution centers more tightly around the same value
the smaller the sample size, the wider the sampling distribution

New cards

What does it mean to have a narrower sampling distribution?

less variability between samples
the values in the ‘table containing all possible samples' will be closer to the population parameter
- it matters less which sample is randomly chosen

New cards

Questions to consider when looking at a sampling distribution

what is the centre of the sampling distribution?
how spread is the sampling distribution?
what is the shape of the sampling distribution?

New cards

What does the sampling distribution show us?

what point estimates are possible (even more: their probabilities of occurring)
where the true parameter is (eg. for means it lies at the mean of the sampling distribution)

New cards

If we knew the population, could we find the sampling distribution?

Yes, the sampling distribution is usually unknown, but technically if we knew population, you could potentially obtain the exact sampling distribution

calculate the statistic across all possible samples
this is only manageable for very tiny problems as there are many possible samples for even small samples

New cards

What is the center of a sampling distribution of the sample mean?

The population mean

New cards

What affects the variability of the sampling distribution?

population: affects the samples taken
sample size
statistic

New cards

Symbols to remember

Population average = μ
Sample average = x̄
Proportion = p
sample proportion = p̂

New cards

Population Distribution

the population distribution is obtained by measuring all the elements in the population
the population distribution is unknown

New cards

Sample Distribution

the sample distribution is obtained by measuring all the elements in the sample
the sample distribution is known
we hope that the sample distribution resembles the population distribution

New cards

Is the mean susceptible to outliers?

Yes, but the median is more robust

New cards

Median

The middle observation of a sorted variable’s data

New cards

Independence

Independent sample: the selection of one element does not influence the selection of another

New cards

Sampling with Replacement

This allows repeated elements in our sample

works better when the population is larger, as there is a lower chance of getting the same values in a sample, and actually collecting new information

Steps:

select one element from the population
put the element back in the population
do steps 1 and 2 n times

New cards

Sampling w/o Replacement

This does not allow repeated elements in our sample

always getting new information
more informative than sampling w/ replacement
impacts independence and the chances of getting chosen
- when the population is large, the chances of getting chosen aren’t changing as much, but are still not independent

Steps:

select one element from the population
remove the element from the population
do steps 1 and 2 n times

New cards

Rule of Thumb: Sample Size

The sample size at most 10% of the population size

New cards

Pros of Sampling w/ Replacement

independent sample: selection of an element doesn’t influence the selection of other elements
variability even when the sample is the same size as the population

Used for bootstrap samples

New cards

Cons of Sampling w/ Replacement

less informative (repeated information)
less efficient

New cards

Pros of Sampling w/o Replacement

more informative (less repeated information)
- more precise parameter estimate

Used for sampling the population

New cards

Cons of Sampling w/o Replacement

Dependence: elements picked affect the chance of the elements you will pick later
- less problematic when sample is small compared to population
- need to check 10% rule
no variability if the sample is the same size as the population

New cards

Standard Error (SE)

The standard deviation of a statistic/sampling distribution

quantifies the amount of variation of the sample statistic around its mean
general rule, as sample size increases, the standard error decreases

New cards

Standard Deviation (σ or s)

The square root of the variance

measures the amount of variation of the values of a variable about its mean

New cards

Surreal Approach to Approximating the Sample Distribution

taking many samples from the population
- the approximation depends on the samples we draw
the more samples that are taken, the more representative they are of the population

New cards

Estimating the sampling distribution w/ bootstrapping

in reality, we only take 1 sample from the population of interest
- the bigger the sample, the better
we use the sample distribution as an estimate of the population distribution and take samples w/ replacement from the original sample (bootstrap samples)
bootstrap samples need to be the same size as the original sample size (smaller = wider bootstrap distribution, larger = narrow bootstrap distribution)
we then calculate the average in every bootstrap sample and plot the bootstrap distribution to approximate the sampling distribution
bootstrap samples can’t be used to improve the original sample, we only use it to estimate the sampling distribution

New cards

The Bootstrap Distribution

an approximation of the sampling distribution (has similar spread and shape)
is centered around the sample statistic (not the parameter)
used to estimate the standard error of a statistic