GW Blok 6 Seminar 2.2 Foundation of inferential statistics

0.0(0)

Studied by 0 people

Call with Kai

Knowt Play

Learn

Practice Test

Spaced Repetition

Match

Flashcards

Card Sorting

1/20

There's no tags or description

Looks like no tags are added yet.

Study Analytics

Name	Mastery	Learn	Test	Matching	Spaced

No study sessions yet.

21 Terms

New cards

What is a population in statistics?

The entire group of individuals or observations we want to study or make conclusions about.
Population → the entire set of relevant units of analysis
- Example: first year students of EPH and GZW
Population → parameter
Parameter → the value of a variable that characterizes a population
- Example: the mean of the expected exam score in the entire cohort of students, but population mean is unknown

New cards

What is a population parameter?

A numerical value that describes a characteristic of the entire population, often denoted using Greek letters (e.g., μ for mean, σ for SD).

New cards

What is a sample in statistics?

A subset of the population selected for analysis, ideally randomly chosen to represent the population.
Sample → any subset of the population
- Example: students who participated the wooclap event in the lecture
Sample → statistic
Statistic → the value of a variable that characterizes a sample
- Example: mean of the expected exam score in the sample (97 students) x bar = 5.77

New cards

What is a sample statistic?

A numerical value calculated from a sample, used to estimate the population parameter (e.g., x̄ for mean, s for SD).
Population parameters are usually unknown and the purpose of inferential statistics is to try to estimate these as precise as possible by means of estimates calculated at sample level.

New cards

What is a random sample?

A sample where every individual in the population has an equal chance of being selected.
What we can do is to select a sample of say 1000 Dutch people such that every Dutch person has an equal probability to be included in your sample

New cards

What is a population distribution?

A population distribution is the (often unknown) distribution of the whole population from which the sample at hand is a random sample with (often unknown) true average μ0 or difference between true averages or any other measure of central tendency or measure of spread) and (also often unknown) standard deviation σ

New cards

What is a sample distribution?

A sample distribution is an observed distribution of the outcome values of the sample at hand of size N and with standard deviation s

New cards

What is the standard error (SE) of the mean?

It quantifies how much sample means vary from the population mean.
Standard error: SD of sampling distribution
SE becomes smaller with a larger sample size, because larger samples provide more accurate estimates, reducing variability between sample means.

<ul><li><p>It quantifies how much sample means vary from the population mean.</p></li><li><p>Standard error: SD of sampling distribution</p></li><li><p>SE becomes smaller with a larger sample size, because larger samples provide more accurate estimates, reducing variability between sample means.</p><p></p></li></ul><p></p>

New cards

What is the sampling distribution?

However, is a distribution of sample averages (or any other statistic of interest, e.g. distribution of sample odds ratios). It cannot be observed in practice because it will only be determined in the situation that you could have obtained an infinitely (a very, very large) large number of samples each of size N and standard deviation s
The theoretical distribution of a statistic (like the sample mean) calculated from all samples of the same sample size drawn from a population.
It shows how the sample mean (x̄) varies across different random samples and centers around the true population mean (μ).
the average of the sampling distribution is exactly equal to the true population average μ0.
If sample size increases, the sampling distribution becomes more peaked (narrower), meaning sample means are more tightly clustered around the true population mean.

<ul><li><p>However, is a distribution of sample averages (or any other statistic of interest, e.g. distribution of sample odds ratios). It cannot be observed in practice because it will only be determined in the situation that you could have obtained an infinitely (a very, very large) large number of samples each of size N and standard deviation s</p></li><li><p>The theoretical distribution of a statistic (like the sample mean) calculated from all samples of the same sample size drawn from a population.</p></li><li><p>It shows how the sample mean (x̄) varies across different random samples and centers around the true population mean (μ).</p></li><li><p>the average of the sampling distribution is exactly equal to the true population average μ0.</p></li><li><p>If sample size increases, the sampling distribution becomes more peaked (narrower), meaning sample means are more tightly clustered around the true population mean.</p></li></ul><p></p>

New cards

What is the region of likely outcomes in a sampling distribution?

What can be concluded is, e.g. that 95 % of all possible samples will fall within two standard deviations around the population average (property of the normal distribution). This region would be called the region of likely outcomes.

<ul><li><p>What can be concluded is, e.g. that 95 % of all possible samples will fall within two standard deviations around the population average (property of the normal distribution). This region would be called the region of likely outcomes.</p><p></p></li></ul><p></p>

New cards

Summary of population, sample and sampling distributions?

X and s indicate known values
mu and sigma are parameters and you know them

<ul><li><p>X and s indicate known values</p></li><li><p>mu and sigma are parameters and you know them</p></li></ul><p></p>

New cards

What is the Central Limit Theorem (CLT)?

If the samples are independent and the sample size is large enough, the sampling distribution of the sample mean will be approximately normal, regardless of the population’s distribution.
- N>40
The distribution of sample mean bar will, as the sample size increases, approach a normal distribution
The mean of all the sample means will be the population mean (mu), niet afhankelijk van steekproefgrootte
- Alleen de vorm is afhankelijk van de sample size
The standard deviation of the sample means will approach sigma/wortel van N = standard error
It allows us to use normal distribution properties to make inferences about the population mean even when the population distribution is not normal.

<ul><li><p>If the samples are independent and the sample size is large enough, the sampling distribution of the sample mean will be approximately normal, regardless of the population’s distribution.</p><ul><li><p>N>40</p></li></ul></li><li><p>The distribution of sample mean bar will, as the sample size increases, approach a normal distribution</p></li><li><p>The mean of all the sample means will be the population mean (mu), niet afhankelijk van steekproefgrootte</p><ul><li><p>Alleen de vorm is afhankelijk van de sample size</p></li></ul></li><li><p>The standard deviation of the sample means will approach sigma/wortel van N = standard error</p></li><li><p>It allows us to use normal distribution properties to make inferences about the population mean even when the population distribution is not normal.</p></li></ul><p></p>

New cards

Central limit theorem visualized

New cards

What is inference in statistics?

Drawing conclusions about the population based on data from a sample.

New cards

What is a confidence interval and what effect does the sample size, CI and low variation have on the CI?

How accurate our sample estimate (e.g., sample mean) is likely to be in estimating the true population parameter.
- Used to express the range in which we are pretty sure the population parameter lies
If sample size increases the CI becomes narrower (more precise estimate)
If confidence level increases, the confidence interval increases, it becomes wider (more certainty = bigger range)
Population with low variation → similar samples with low variation → narrow confidence interval
- And vice versa
Small samples vary more from each other and have less information → wider confidence interval
- And vice versa

<ul><li><p>How accurate our sample estimate (e.g., sample mean) is likely to be in estimating the true population parameter.</p><ul><li><p>Used to express the range in which we are pretty sure the population parameter lies</p></li></ul></li><li><p>If sample size increases the CI becomes narrower (more precise estimate)</p></li><li><p>If confidence level increases, the confidence interval increases, it becomes wider (more certainty = bigger range)</p></li><li><p>Population with low variation → similar samples with low variation → narrow confidence interval</p><ul><li><p>And vice versa</p></li></ul></li><li><p>Small samples vary more from each other and have less information → wider confidence interval</p><ul><li><p>And vice versa</p></li></ul></li></ul><p></p>

New cards

What is a 95% CI and how was it made possible?

95% CI: with 95% confidence, the interval will cover the true mean, with lower bound … and upper bound …
In real life you don’t know what the true value of the population is → 95% CI for the sample value
This is made possible because of the central limit theorem for 𝑋 bar
In general, we can use a (1-alfa)100% -confidence interval (C.I.), with alfa =0.05. Basically, alfa can be chosen between 0 and 1

<ul><li><p>95% CI: with 95% confidence, the interval will cover the true mean, with lower bound … and upper bound …</p></li><li><p>In real life you don’t know what the true value of the population is → 95% CI for the sample value</p></li><li><p>This is made possible because of the central limit theorem for 𝑋 bar</p></li><li><p>In general, we can use a (1-alfa)100% -confidence interval (C.I.), with alfa =0.05. Basically, alfa can be chosen between 0 and 1</p></li></ul><p></p>

New cards

What determines the width of the confidence interval

New cards

How do you calculate the confidence interval?

T krijg je vaak gegeven
Eerst som + en daarna - → betrouwbaarheidsinterval

<ul><li><p>T krijg je vaak gegeven</p></li><li><p>Eerst som + en daarna - → betrouwbaarheidsinterval</p></li></ul><p></p>

New cards

How do you use a 95% confidence interval to test a hypothesis?

Check if the hypothesized value (μ₀) is within the 95% CI:

If μ₀ is inside the CI → not significantly different → not enough evidence provided to reject H0
If μ₀ is outside the CI → significantly different → enough evidence provided to reject H0

New cards

How is the 95% confidence interval related to p-values?

If the null value (μ₀) is outside the 95% CI, then p < 0.05 → significant.
If μ₀ is inside the CI, then p > 0.05 → not significant.

New cards

Some notation