Statistics

Last updated 1:29 PM on 4/4/26

36 Terms

1

When do you use a bar chart?

When the data is qualitative (or discrete).

  • X-axis shows the distinct values of the variable

  • Y-axis shows the number of times each value occurs in the data

2

When do you use a histogram?

Used for continuous quantitative data.

3

How do you calculate a density histogram?

  • Choose an origin t_{0} and a bin width h.

  • Construct a mesh of equally spaced points such that t_{j}=t_{0}+jh

  • Define bins B_{k}=(t_{k-1},t_{k}]

  • The height of the histogram is then Hist(x)=V_{k}/nh where V_{k} is the number of sample values in the bin B_{k}
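A minimal sketch of this construction in Python (the function name and example data are illustrative; the sketch assumes every data point lies above the origin t_{0}):

```python
import math

def density_histogram(data, t0, h):
    """Heights Hist(x) = V_k / (n h) over bins B_k = (t_{k-1}, t_k], t_j = t0 + j h."""
    n = len(data)
    k_max = math.ceil((max(data) - t0) / h)      # enough bins to cover the sample
    counts = [0] * k_max                         # V_k for k = 1, ..., k_max
    for x in data:
        k = math.ceil((x - t0) / h)              # x lies in (t0 + (k-1)h, t0 + kh]
        counts[k - 1] += 1                       # assumes every x > t0
    return [v / (n * h) for v in counts]

# Four points, origin 0, bin width 1: bins (0,1], (1,2], (2,3] hold 1, 2, 1 points
heights = density_histogram([0.5, 1.2, 1.4, 2.7], t0=0.0, h=1.0)
```

Dividing the counts by nh makes the bar areas sum to 1, which is what makes the histogram a density estimate.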

4

When do you use a box and whisker plot?

Used for continuous data and based on the five-number summary.

  • Outlying data points are shown as circles

5

What do we class as an outlier when drawing boxplots?

We use the convention that an observation x_{i} is an outlier if x_{i}<\tilde{Q}(0.25)-1.5IQR or x_{i}>\tilde{Q}(0.75)+1.5IQR
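As a sketch, Python's `statistics.quantiles` with `method="exclusive"` interpolates at r = p(n+1), the same quantile convention used in these notes, so the rule can be checked like this (the function name and example data are ours):

```python
import statistics

def boxplot_outliers(xs):
    """Flag x_i with x_i < Q(0.25) - 1.5*IQR or x_i > Q(0.75) + 1.5*IQR."""
    # method="exclusive" interpolates at r = p(n+1), matching the convention above
    q1, _, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in xs if x < lo or x > hi]

outliers = boxplot_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```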

6

How do we calculate a quantile?

For the quantile \tilde{Q}(p) (with ordered sample x_{1}\le\ldots\le x_{n}):

  • Calculate r=p(n+1)

  • If r is an integer, set \tilde{Q}(p)=x_{r}

  • If r<1, set \tilde{Q}(p)=x_{1}

  • If r>n, set \tilde{Q}(p)=x_{n}

  • Otherwise set \tilde{Q}(p)=x_{\lfloor r \rfloor}+(r-\lfloor r \rfloor)(x_{\lfloor r \rfloor +1}-x_{\lfloor r \rfloor})
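The steps above can be sketched directly in Python (function name illustrative; the input need not be pre-sorted):

```python
import math

def quantile(xs, p):
    """Sample quantile Q~(p) via r = p(n+1) with linear interpolation."""
    x = sorted(xs)                     # x[0] <= ... <= x[n-1]
    n = len(x)
    r = p * (n + 1)
    if r < 1:
        return x[0]
    if r > n:
        return x[-1]
    k = math.floor(r)
    if k == r:
        return x[k - 1]                # r is an integer: the r-th order statistic
    return x[k - 1] + (r - k) * (x[k] - x[k - 1])

med = quantile([3, 1, 4, 2], 0.5)     # r = 2.5: halfway between x_(2) and x_(3)
```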

7

What is a population?

A collection of individuals or items under consideration in a study.

8

What is a variable?

A measure of interest of the population.

  • A value is the state of the variable when measured

  • An observation is a set of measurements made under similar conditions; it may be considered a data point.

9

Types of qualitative and quantitative data?

Qualitative

  • Nominal

    • yes/no

    • eye colour

  • Ordinal

    • child,adult,elderly

    • small,medium,large

Quantitative

  • Discrete

    • count data

  • Continuous

    • income

10

What is a statistic?

Let X_{1},\ldots,X_{n} be a random sample from a distribution with cdf F_{X}

A statistic is then a function of the data h(X_{1},…,X_{n})

  • This statistic is a random variable itself

Repeated sampling and noting the value of the statistic builds up a probability distribution. This is called the sampling distribution.

11

How do we sample without replacement?

  • Definition: Selecting n distinct individuals from a population of size N, where each possible sample has equal probability 1/\binom{N}{n} .

  • Key idea: Once an individual is selected, it cannot be selected again.

Procedure (sequential method):

  1. Randomly choose one individual from the remaining population.

  2. Add them to the sample.

  3. Remove them from the population.

  4. Repeat until n individuals are selected.
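A sketch of the sequential procedure (names are ours; Python's built-in `random.sample` does the same thing in one call):

```python
import random

def srs_without_replacement(population, n, seed=0):
    """Sequential simple random sampling: draw, record, remove, repeat."""
    rng = random.Random(seed)
    pool = list(population)
    sample = []
    for _ in range(n):
        idx = rng.randrange(len(pool))   # uniform over the remaining individuals
        sample.append(pool.pop(idx))     # removal prevents reselection
    return sample

s = srs_without_replacement(range(100), 10)
```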

12

How do we sample with replacement?

  • Definition: Individuals are sampled from a population of size N , and can be selected more than once.

  • Number of possible samples: N^n

  • Probability of a specific sample: N^{-n}.

Procedure:

  1. Randomly select an individual from the population.

  2. Add them to the sample without removing them from the population.

  3. Repeat until n selections are made.
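The same procedure without the removal step gives sampling with replacement (a sketch; the function name is ours):

```python
import random

def srs_with_replacement(population, n, seed=0):
    """Each draw is uniform over the full population; repeats are allowed."""
    rng = random.Random(seed)
    pool = list(population)
    return [rng.choice(pool) for _ in range(n)]   # N^n equally likely ordered samples

s = srs_with_replacement([1, 2, 3], 5)
```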

13

Expectation and variance of the sampling distribution \overline{X}

\mathbb{E}\left(\overline{X}\right)=\mu so as a result \overline{X} is said to be unbiased.

Var\left(\overline{X}\right)=\frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)

For a large population \frac{N-n}{N-1}\thickapprox1

Note:

Var(X)=\mathbb{E}(X^2)-\mathbb{E}(X)^2

14

What is the population variance?

\sigma^2=\frac{1}{N}\sum_{i=1}^{N}v_{i}^2-\mu^2

=\mathbb{E}(X^2)-\mathbb{E}(X)^2

15

What's the sample variance and sample standard deviation?

s^2=\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{i}-\overline{x}\right)^2

=\frac{1}{n-1}\left(\sum_{i=1}^{n}x_{i}^2-n\overline{x}^2\right)

and sample standard deviation s=\sqrt{s²}
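A quick numerical check that the two forms agree (a sketch, with made-up data):

```python
def sample_variance(xs):
    """s^2 via the definition and via the computational shortcut."""
    n = len(xs)
    xbar = sum(xs) / n
    direct = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    shortcut = (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)
    assert abs(direct - shortcut) < 1e-9          # identical up to rounding
    return direct

s2 = sample_variance([2, 4, 4, 4, 5, 5, 7, 9])    # mean 5, so s^2 = 32/7
```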

16

What's the mean and variance of the sampling distribution?

\mathbb{E}\left(\overline{X}\right)=\mu

Var\left(\overline{X}\right)=\frac{\sigma^2}{n}

In the special case where we are sampling from a normal distribution, \overline{X} is also normally distributed.

17

What's the central limit theorem?

Let X be a RV with mean \mu and variance \sigma^2. If \overline{X}_{n} is the mean of a random sample of size n drawn from the distribution of X, then the distribution of the statistic \frac{\overline{X}_{n}-\mu}{\sigma/\sqrt{n}} tends to the standard normal distribution as n tends to infinity.

Thus, for a large random sample from a population with mean \mu and variance \sigma^2, the sample mean \overline{X}_{n} is approximately normally distributed with mean \mu and variance \sigma^2/n.

A rule of thumb is n should be at least 30.
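A small simulation illustrating the theorem with a skewed Exp(1) population (the sample size, repetition count, and seed are arbitrary choices of ours):

```python
import random
import statistics

rng = random.Random(42)
n, reps = 30, 2000
# Means of samples of size 30 from Exp(1), which has mu = 1 and sigma^2 = 1
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]
# CLT: the means cluster around mu = 1 with spread sigma/sqrt(n) ~ 0.18
```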

18

What's the expectation of the sample variance?

\mathbb{E}(s²)=\sigma²

19

What's the chi-squared distribution (\chi^2)?

The continuous random variable Y is said to have a \chi^2 distribution with k degrees of freedom (\chi^2(k)) iff its pdf is given by

f(y)=\frac{1}{2^{\frac{k}{2}}\Gamma\left(\frac{k}{2}\right)}y^{\left(\frac{k}{2}\right)-1}e^{-\frac{y}{2}} if y>0, 0 otherwise.

Note:

  • this is a special case of the Gamma distribution with parameters \alpha=k/2 and \beta=1/2.

  • When k=2, Y\sim Exp(1/2)
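The pdf can be evaluated directly with `math.gamma`; at k = 2 it reduces to the Exp(1/2) density (1/2)e^{-y/2} (function name is ours):

```python
import math

def chi2_pdf(y, k):
    """f(y) = y^(k/2 - 1) e^(-y/2) / (2^(k/2) Gamma(k/2)) for y > 0, else 0."""
    if y <= 0:
        return 0.0
    return y ** (k / 2 - 1) * math.exp(-y / 2) / (2 ** (k / 2) * math.gamma(k / 2))

p = chi2_pdf(1.0, 2)          # should equal (1/2) e^(-1/2)
```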

20

What's the mean and variance of the chi-squared distribution?

\mathbb{E}(Y)=k

Var(Y)=2k

21

What's the connection between the chi-squared distribution and the normal distribution?

Let Z_1,…,Z_k be i.i.d. standard normal RVs. Then the random variable Y=\sum_{i=1}^{k}Z_{i}^2 has a \chi^2 distribution with k degrees of freedom.

22

If X_1, \dots, X_n \sim N(\mu, \sigma^2) independently, what is the distribution of the sample variance S^2?

\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)

Equivalently:

\sum_{i=1}^n \left(\frac{X_i - \bar{X}}{\sigma}\right)^2 \sim \chi^2(n-1)

Key facts:

- Degrees of freedom = n-1

- \bar{X} and S^2 are independent
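A simulation check under an assumed N(0, 4) population: the statistic (n-1)S^2/\sigma^2 should have mean n-1 and variance 2(n-1) (sample size, seed, and repetition count are ours):

```python
import random
import statistics

rng = random.Random(1)
n, sigma2, reps = 5, 4.0, 5000
stat = []
for _ in range(reps):
    xs = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    stat.append((n - 1) * statistics.variance(xs) / sigma2)
# chi^2(n-1) = chi^2(4): mean 4, variance 8
```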

23

What are 2 important properties of estimators?

we would like the estimator \hat{\theta} of \theta to be such that:

  • the sampling distribution of \hat{\theta} is centred about the target parameter, \theta

  • the spread of the sampling distribution of \hat{\theta} is small.

If it has these properties we can expect estimates resulting from experiments to be close to the true value of the population parameter we are trying to estimate.

24

What's the bias of a point estimator?

bias(\hat{\theta})=\mathbb{E}(\hat{\theta})-\theta

The estimator is said to be unbiased if \mathbb{E}(\hat{\theta})=\theta

25

What's the standard error of a point estimator?

\sqrt{Var(\hat{\theta})}

We want the estimator with the smallest variance.

26

What are Method of Moment estimators?

Matching the parameters to the moments of a distribution.

e.g. 1 parameter: match the population mean

2 parameters: match the population mean and variance

27

What is the likelihood function?

Let X_1,…,X_n be an i.i.d random sample from the discrete distribution with pmf p(x|\theta) where \theta is a parameter whose value is unknown.

Given observed data values x_1,…,x_n from this model, the likelihood function is defined as L(\theta)=p(X_1=x_1,…,X_n=x_n|\theta)

Which due to independence is L(\theta)=\prod_{i=1}^{n}p\left(x_{i}\vert\theta\right)

28

What is maximum likelihood estimation?

The maximum likelihood estimator for \theta is the value \hat{\theta} that maximises the joint probability of the observed data, i.e. that maximises the value of the likelihood function L(\theta).

Maximisation of L(\theta)=\prod_{i=1}^{n}p\left(x_{i}\vert\theta\right) leads to a numerical value \hat{\theta} for the estimate of \theta.

29

How can we find the maximum likelihood estimator?

In simple cases it can be found by simple calculus techniques, i.e. solving \frac{dL(\theta)}{d\theta}=0.

It's usually easier algebraically to find the maximum of the log-likelihood l(\theta)=\log L(\theta) because for i.i.d. data, \log L(\theta)=\log\left\lbrack\prod_{i=1}^{n}p\left(x_{i}\vert\theta\right)\right\rbrack=\sum_{i=1}^{n}\log p(x_i|\theta)

It's easier to differentiate a sum of functions rather than a product of functions.

\frac{dl\left(\theta\right)}{d\theta}=\sum_{i=1}^{n}\frac{d\log p\left(x_{i}\vert\theta\right)}{d\theta}=0

The solution is a maximum if the second derivative is less than 0 at \theta=\hat{\theta}
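A worked sketch for an assumed Poisson(\theta) sample: dl/d\theta = \sum x_i/\theta - n = 0 gives the closed form \hat{\theta} = \bar{x}, and a numeric check confirms it is a maximum (the data are made up):

```python
import math

# Poisson(theta): l(theta) = sum(x_i) log(theta) - n*theta - sum(log x_i!)
xs = [2, 3, 1, 4, 0, 2]
theta_hat = sum(xs) / len(xs)      # closed-form MLE: the sample mean

def log_lik(theta):
    return sum(x * math.log(theta) - theta - math.log(math.factorial(x))
               for x in xs)

# l is maximised at theta_hat: nearby values give a smaller log-likelihood
assert log_lik(theta_hat) > log_lik(theta_hat - 0.1)
assert log_lik(theta_hat) > log_lik(theta_hat + 0.1)
```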

30

What is an interval estimator?

An interval estimator gives a range of plausible values for an unknown parameter \theta, rather than a single point estimate. It takes the form:

(L(X), U(X))

where L and U are functions of the sample data. The associated confidence level (e.g. 95%) tells you the probability that the interval contains the true parameter value across repeated samples.

31

What is the coverage probability of an interval estimator?

This is the probability that the interval contains, or 'covers', the true value of the parameter, i.e.

P_\theta[l(X)\le\theta\le u(X)].

We use the notation P_\theta to emphasise that the probability distributions of l(X) and u(X) depend on \theta.

The coverage probability is 1-\alpha, often written as 100(1-\alpha)\%, where \alpha is the significance level (the probability of rejecting a true null hypothesis).

The proportion 1-\alpha is called the confidence level, and the interval endpoints l(X),u(X) are known as the confidence limits.

32

What is the confidence interval formula for the mean of a normal distribution when the variance is known?

For X_1, \ldots, X_n \sim N(\mu, \sigma^2) with \mu unknown and \sigma^2 known, the 100(1-\alpha)\% confidence interval for \mu is:

I(x) = \left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]

where z_{\alpha/2} is the upper \alpha/2 point of the standard normal distribution.
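A sketch using the standard library's `statistics.NormalDist` for the z quantile (the function name and example data are ours):

```python
from statistics import NormalDist, fmean

def ci_mean_known_var(xs, sigma, alpha=0.05):
    """[xbar -/+ z_{alpha/2} sigma / sqrt(n)] for the mean of N(mu, sigma^2)."""
    n = len(xs)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point, ~1.96 at 95%
    half = z * sigma / n ** 0.5
    xbar = fmean(xs)
    return xbar - half, xbar + half

lo, hi = ci_mean_known_var([4.0, 6.0, 5.0, 5.0], sigma=2.0)   # xbar = 5, n = 4
```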

33

Define the Student's t-distribution.

If Z \sim N(0,1) and V \sim \chi^2(\nu) are independent, then:

T = \frac{Z}{\sqrt{V/\nu}} \sim t(\nu)

has a Student t-distribution with \nu degrees of freedom.

It has probability density function

f_{T}\left(x\right)=\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}

It has mean 0 and variance \frac{\nu}{\nu-2} for \nu > 2. As \nu \to \infty, the t-distribution approaches the normal distribution.

34

What is the confidence interval for the mean of a normal distribution with unknown variance?

For X_1, \ldots, X_n \sim N(\mu, \sigma^2) with both \mu and \sigma^2 unknown, the 100(1-\alpha)\% confidence interval for \mu is:

I(x) = \left[\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}}, \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right]

where t_{\alpha/2} is the upper \alpha/2 point of the t(n-1) distribution and s = \sqrt{\frac{1}{n-1}\sum_{i=1}^n(x_i - \bar{x})^2}.
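A stdlib-only sketch: Python's `statistics` module has no inverse cdf for the t-distribution, so the t quantile is passed in from tables (function name and data are ours; t_{0.025}(3) ≈ 3.182):

```python
from statistics import fmean, stdev

def ci_mean_unknown_var(xs, t_quantile):
    """[xbar -/+ t_{alpha/2}(n-1) s / sqrt(n)]; the t quantile comes from tables."""
    n = len(xs)
    half = t_quantile * stdev(xs) / n ** 0.5   # stdev computes s with the n-1 divisor
    xbar = fmean(xs)
    return xbar - half, xbar + half

# n = 4, so 3 degrees of freedom; t_{0.025}(3) ~ 3.182 from tables
lo, hi = ci_mean_unknown_var([4.0, 6.0, 5.0, 5.0], t_quantile=3.182)
```

Note the interval is wider than the known-variance z interval for the same data, reflecting the extra uncertainty from estimating \sigma with s.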

35

How do we construct a confidence interval for the mean of a non-normal distribution?

By the Central Limit Theorem, for large n (typically n \geq 30):

- If variance \sigma^2 is known: I(x) = \left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]

- If variance is unknown: I(x) = \left[\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}}, \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right]

These are approximate 100(1-\alpha)\% confidence intervals.

36

What is the confidence interval for an unknown population proportion p?

For a large random sample from \text{Bi}(1, p), the approximate 100(1-\alpha)\% confidence interval for p is:

\left[\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right]

where \hat{p} = \bar{x} is the sample proportion.

Rule of thumb: n \geq 9\max\left\{\frac{p}{1-p}, \frac{1-p}{p}\right\}
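The proportion interval can be sketched the same way as the z interval for a mean (function name and example counts are ours):

```python
from statistics import NormalDist

def ci_proportion(successes, n, alpha=0.05):
    """Approximate [p_hat -/+ z_{alpha/2} sqrt(p_hat (1 - p_hat) / n)]."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half, p_hat + half

lo, hi = ci_proportion(40, 100)   # p_hat = 0.4
```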
