OCR MEI Further Stats

Last updated 10:25 PM on 6/11/26

35 Terms


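Under what conditions is a binomial distribution suitable?

- fixed number of trials, n

- trials are random and independent of each other

- each trial has only 2 possible outcomes (success / failure), and the number of 'successes' is being modeled

- the probability of success, p, is the same for every trial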


How do you find the mean and variance of a binomial distribution B(n, p)?

Mean = np, Var = npq, where q = 1 - p
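
A quick numerical check of these formulas, as a sketch: the mean and variance are found by summing over the whole distribution and compared with np and npq. The values n = 20 and p = 0.3 are arbitrary examples.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def binomial_mean_var(n: int, p: float) -> tuple[float, float]:
    """E(X), and Var(X) = E(X^2) - E(X)^2, by summing over r = 0..n."""
    mean = sum(r * binomial_pmf(r, n, p) for r in range(n + 1))
    e_x2 = sum(r**2 * binomial_pmf(r, n, p) for r in range(n + 1))
    return mean, e_x2 - mean**2

n, p = 20, 0.3
mean, var = binomial_mean_var(n, p)
print(round(mean, 6), n * p)             # the two values should agree (6)
print(round(var, 6), n * p * (1 - p))    # the two values should agree (about 4.2)
```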


Under what conditions is a Poisson distribution suitable?

When events occur singly, at random, independently of each other, and at a uniform (constant) mean rate within a fixed interval of time or space


What might suggest that a Poisson distribution is unsuitable?

If the sample mean is not approximately equal to the sample variance (for a Poisson distribution, E(X) = Var(X) = lambda)
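
A minimal sketch of this check, using Python's standard-library statistics module on a made-up list of counts. Whether the two values are "approximately equal" is still a judgement call.

```python
from statistics import mean, variance

# Hypothetical raw data: e.g. number of emails received in each of 12 hours
counts = [2, 3, 1, 4, 2, 0, 3, 2, 5, 1, 2, 3]

sample_mean = mean(counts)      # estimates E(X)
sample_var = variance(counts)   # sample variance (divisor n - 1), estimates Var(X)

# For a Poisson model these should be roughly equal, since both estimate lambda
print(f"mean = {sample_mean:.3f}, variance = {sample_var:.3f}")
# prints: mean = 2.333, variance = 1.879
```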


In what conditions could a Poisson distribution be used instead of a binomial distribution?

When n is large and p is small in the binomial model; the Poisson parameter is then lambda = np
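
To see the approximation at work, this sketch compares binomial and Poisson probabilities for one example with large n and small p, using lambda = np. The values n = 500 and p = 0.004 are chosen only for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam)."""
    return exp(-lam) * lam**r / factorial(r)

n, p = 500, 0.004   # large n, small p (example values)
lam = n * p         # lambda = np = 2

for r in range(5):
    # The two columns should be close
    print(r, round(binomial_pmf(r, n, p), 4), round(poisson_pmf(r, lam), 4))
```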


If given raw data, what could be used as an estimate of lambda?

The sample mean or the sample variance, since for a Poisson distribution E(X) = Var(X) = lambda
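
When the raw data come as a frequency table, estimating lambda can be sketched like this (the table is hypothetical). The sample variance is computed as well, since the two should be close if a Poisson model is suitable.

```python
# Hypothetical frequency table: value r -> number of times r was observed
freq = {0: 12, 1: 18, 2: 11, 3: 6, 4: 3}

n = sum(freq.values())
sample_mean = sum(r * f for r, f in freq.items()) / n
# Sample variance with divisor n - 1: (sum of f*r^2 - n * mean^2) / (n - 1)
sample_var = (sum(f * r**2 for r, f in freq.items()) - n * sample_mean**2) / (n - 1)

lam_estimate = sample_mean
print(n, lam_estimate, round(sample_var, 3))   # 50 1.4 1.347
```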


What must 2 Poisson variables be in order to add their parameters together?

Independent of each other: if X ~ Po(lambda) and Y ~ Po(mu) are independent, then X + Y ~ Po(lambda + mu)
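
A numerical check of this rule, as a sketch: the distribution of X + Y is built by convolution, which relies on independence to multiply P(X = k) by P(Y = r - k). The parameters 1.5 and 2.5 are arbitrary examples.

```python
from math import exp, factorial

def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam)."""
    return exp(-lam) * lam**r / factorial(r)

def pmf_of_sum(r: int, lam1: float, lam2: float) -> float:
    """P(X + Y = r) for independent X ~ Po(lam1), Y ~ Po(lam2).
    Independence gives P(X = k and Y = r - k) = P(X = k) * P(Y = r - k)."""
    return sum(poisson_pmf(k, lam1) * poisson_pmf(r - k, lam2) for k in range(r + 1))

lam1, lam2 = 1.5, 2.5
for r in range(6):
    # The two columns should agree: X + Y ~ Po(lam1 + lam2) = Po(4)
    print(r, round(pmf_of_sum(r, lam1, lam2), 6), round(poisson_pmf(r, lam1 + lam2), 6))
```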


Under what conditions is a geometric distribution suitable?

- the number of trials up to and including the first success is being modeled

- trials are random and independent

- each trial has 2 possible outcomes, with the same probability of success, p, every time


In the geometric model X ~ Geo(p): P(X > r) = ?

(1 - p)^r, since X > r means the first r trials are all failures
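
As a sketch, the formula can be checked by adding up the geometric probabilities P(X = k) = (1 - p)^(k - 1) p for k > r, truncated after many terms. The values p = 0.2 and r = 4 are examples.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(X = k) for X ~ Geo(p): k - 1 failures, then a success."""
    return (1 - p) ** (k - 1) * p

def tail_by_summing(r: int, p: float, terms: int = 10_000) -> float:
    """P(X > r) approximated by adding P(X = k) for k = r + 1, ..., r + terms."""
    return sum(geometric_pmf(k, p) for k in range(r + 1, r + terms + 1))

p, r = 0.2, 4
print(round(tail_by_summing(r, p), 6), round((1 - p) ** r, 6))   # 0.4096 0.4096
```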


Layout of goodness of fit test

H0: the model is a good fit for the given data

H1: the model is not a good fit for the given data

Given the observed frequencies (Fo), use the model to find the expected frequencies (Fe).

Combine columns if any Fe < 5

Chi squared test statistic X^2 = sum of (Fo - Fe)^2 / Fe

Degrees of freedom v = no. of columns (after combining) - 1 (because the total must match the sample size) - 1 for each parameter estimated from the data

Use v and the significance level to find the critical value, and compare it with the X^2 value (reject H0 if X^2 is greater than the critical value)

Therefore, we (fail to) reject H0; there is (in)sufficient evidence to suggest ...
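
The calculation steps above can be sketched in Python. The observed and expected frequencies are hypothetical; cells are merged left to right until each expected frequency is at least 5 (one simple way to combine columns, whereas in an exam you choose which adjacent columns to merge); and the critical value is typed in from tables, not computed. The number of estimated parameters is an example assumption.

```python
def combine_small_cells(observed, expected, min_expected=5.0):
    """Merge adjacent cells (left to right) until each expected frequency >= min_expected.
    A small leftover group at the end is merged into the previous cell."""
    obs_out, exp_out = [], []
    o_acc = e_acc = 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= min_expected:
            obs_out.append(o_acc)
            exp_out.append(e_acc)
            o_acc = e_acc = 0.0
    if e_acc > 0:
        if exp_out:
            obs_out[-1] += o_acc
            exp_out[-1] += e_acc
        else:
            obs_out.append(o_acc)
            exp_out.append(e_acc)
    return obs_out, exp_out


def chi_squared(observed, expected):
    """X^2 = sum of (Fo - Fe)^2 / Fe."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data (both totals are 100)
observed = [8, 22, 31, 24, 10, 3, 2]
expected = [10.0, 21.0, 28.0, 23.0, 12.0, 4.0, 2.0]

obs_c, exp_c = combine_small_cells(observed, expected)   # last two cells merged: Fe = 4 + 2 = 6
x2 = chi_squared(obs_c, exp_c)

estimated_parameters = 1            # assumption: e.g. lambda was estimated from the data
v = len(obs_c) - 1 - estimated_parameters
critical_value = 9.488              # from tables: v = 4, 5% significance level

print(round(x2, 3), v)              # 1.313 4
print("reject H0" if x2 > critical_value else "do not reject H0")
```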


What is random on non-random bivariate data?

One variable (the independent / non-random variable) is set by the experimenter to fixed values, and any errors in it are negligible. The other variable (the dependent / random variable) is measured for each of these values.


What is random on random bivariate data?

When both variables are measured and neither is controlled, so both can be considered random. Example: measuring the height and weight of people in a population, as a given height doesn't have a set value for weight


Conditions for valid PMCC test

- it is invalid to use a PMCC test for random on non-random data, as the non-random variable doesn't have a probability distribution

- for random on random data, assume the data come from an underlying bivariate normal distribution


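What does an underlying bivariate normal distribution look like on a scatter diagram? What could prevent this shape?

An elliptical scatter, with points more densely packed towards the center. An outlier or a differently shaped scatter would suggest the distribution is not bivariate normal.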


What does the regression line for random on non-random bivariate data represent?

An estimate of the true value of the random variable for each value of the non-random variable, i.e. the value that would be observed from a perfect experiment with no random error


What do the regression lines for random on random bivariate data show?

The regression line of x on y gives the mean value of x for a given y, and the regression line of y on x gives the mean value of y for a given x.


When making a regression line of y on x, what must you define?

The variables x and y (what each one represents)


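Layout of PMCC hypothesis test

H0: rho = 0, where rho is the population PMCC (no correlation)

H1: rho > 0 or rho < 0 (one-tailed), or rho ≠ 0 (two-tailed)

Calculate r, use the significance level and n to find the critical value, compare r with it, then conclude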
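
As a sketch, r can be computed from r = Sxy / sqrt(Sxx Syy) on hypothetical data. The critical value still has to be looked up in tables for the given n, significance level and number of tails, so the hypothetical helper reject_h0 takes it as an argument; the labels "greater", "less" and "two-sided" are made up for this example.

```python
from math import sqrt

def pmcc(xs, ys):
    """Sample PMCC r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def reject_h0(r, critical_value, alternative):
    """Compare r with a critical value taken from tables.
    alternative: 'greater' (rho > 0), 'less' (rho < 0) or 'two-sided' (rho != 0)."""
    if alternative == "greater":
        return r > critical_value
    if alternative == "less":
        return r < -critical_value
    return abs(r) > critical_value

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pmcc(xs, ys)
print(round(r, 4))   # 0.7746
```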


What is the effect of sample size on a PMCC hypothesis test?

As sample size increases, the critical value decreases, therefore a significant result can occur despite a lower value of the test statistic r


When is Spearman's rank appropriate?

When the scatter diagram shows a monotonic relationship, i.e. as x increases, y always increases (or always decreases), even if the relationship is not linear


Con of using Spearman's rank?

Ranking data loses information -> affects test outcome


Outline the Spearman's rank hypothesis test

H0: There is no association between x and y

H1: There is some association between x and y (or positive / negative association for a one-tailed test)

Calculate the rs value

Use the significance level and n to find the critical value

Compare rs with the critical value, then conclude
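
A sketch of calculating rs with the formula rs = 1 - 6 sum(d^2) / (n(n^2 - 1)), where d is the difference between the two ranks for each data point. This formula assumes there are no tied ranks, and the data are hypothetical.

```python
def ranks(values):
    """Rank values 1, 2, ... from smallest to largest (assumes no ties)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """rs = 1 - 6 * sum(d^2) / (n(n^2 - 1)), valid when there are no tied ranks."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n**2 - 1))

xs = [12, 15, 9, 20, 17, 11]
ys = [30, 34, 25, 40, 41, 28]
rs = spearman_rs(xs, ys)
print(round(rs, 4))   # 0.9429
```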


Describe the regression line of y on x

This regression line minimizes the sum of the squares of the vertical distances between the data points and the line.

Used to predict a y value from a given x value.

Typically used for random on non-random data, with x as the non-random variable
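
A minimal least-squares sketch of the y on x line y = a + bx, with b = Sxy / Sxx and a = ybar - b xbar. The data are hypothetical, imagined as x set by the experimenter and y measured.

```python
def regression_y_on_x(xs, ys):
    """Least-squares line y = a + b*x (minimizes the sum of squared vertical distances)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Hypothetical data: x set by the experimenter, y measured
xs = [10, 20, 30, 40, 50]
ys = [12, 15, 16, 18, 21]
a, b = regression_y_on_x(xs, ys)
print(round(a, 3), round(b, 3))   # 10.1 0.21
print(round(a + b * 35, 3))       # predicted y at x = 35 (interpolation): 17.45
```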


Describe the regression line of x on y

This regression line minimizes the sum of the squares of the horizontal distances between the data points and the line.

Used to predict an x value from a given y value.
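
The x on y line can be sketched by swapping the roles of x and y, giving x = c + dy with d = Sxy / Syy. With the hypothetical data below it is not simply the y on x line rearranged; the two lines only coincide when r = 1 or r = -1.

```python
def regression_x_on_y(xs, ys):
    """Least-squares line x = c + d*y (minimizes the sum of squared horizontal distances)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    d = sxy / syy
    c = mx - d * my
    return c, d

# Hypothetical random on random data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
c, d = regression_x_on_y(xs, ys)
print(c, d)   # -1.0 1.0, i.e. x = -1 + y
# The y on x line for these data is y = 2.2 + 0.6x, i.e. x = (y - 2.2) / 0.6: a different line
```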


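What could affect the reliability / accuracy of predictions made from regression lines?

Interpolation (within the range of the data) is more reliable than extrapolation (outside the range of the data).

A low (PMCC)^2 value suggests the data points lie far from the regression line, so predictions are less reliable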


How to calculate a residual

Used for random on non-random bivariate data

Residual = observed value - value predicted by the regression line
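
A sketch of calculating residuals for hypothetical data, assuming the line y = 2.2 + 0.6x, which is the least-squares y on x line for these data. For a least-squares line with an intercept, the residuals sum to zero.

```python
def residuals(xs, ys, a, b):
    """Residual = observed y - value of y predicted by the line y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data: x set by the experimenter, y measured
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
res = residuals(xs, ys, a=2.2, b=0.6)
print([round(e, 2) for e in res])   # [-0.8, 0.6, 1.0, -0.6, -0.2]
```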


How could a Chi squared test produce a very low value and what is the significance?

The value is very low when Fo is very close to Fe in every cell

A fit this close may suggest the experimental data was falsified / inaccurate -> invalid test


Calculating expected frequencies in a contingency table

Using the observed table:

Fe = (row total x column total) / total sample size

This is the expected frequency if the two variables were independent / had no association.
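
A sketch of this calculation for a hypothetical 2 x 3 observed table, where the rows and columns stand for the categories of the two variables.

```python
def expected_frequencies(observed):
    """Fe = (row total x column total) / total sample size, for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

observed = [[12, 18, 10],
            [28, 22, 10]]
print(expected_frequencies(observed))   # [[16.0, 16.0, 8.0], [24.0, 24.0, 12.0]]
```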


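Outline Chi squared hypothesis test

H0: no association between x and y

H1: there is an association between x and y

This is the only alternative hypothesis, and the test is still one-tailed (only large values of X^2 lead to rejecting H0)!

Calculate the expected frequencies and the X^2 value, combining rows or columns if any Fe < 5

v = (no. of rows - 1) x (no. of columns - 1)

Use v and the significance level to find the critical value

Compare the Chi squared value with the critical value

Draw a conclusion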
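
Putting these steps together, a sketch for a hypothetical 2 x 3 table. The critical value 5.991 (v = 2, 5% level) is typed in from tables, and the function refuses to run if any expected frequency is below 5, since rows or columns would then need combining first.

```python
def chi_squared_association(observed):
    """Return (X^2, v) for a contingency table of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / total
            if fe < 5:
                raise ValueError("expected frequency below 5: combine rows/columns first")
            x2 += (fo - fe) ** 2 / fe
    v = (len(observed) - 1) * (len(observed[0]) - 1)
    return x2, v

observed = [[12, 18, 10],
            [28, 22, 10]]
x2, v = chi_squared_association(observed)
critical_value = 5.991   # from tables: v = 2, 5% significance level
print(round(x2, 3), v)   # 2.917 2
print("reject H0" if x2 > critical_value else "do not reject H0")   # do not reject H0
```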


What is the problem with a sample that is not chosen randomly? (2 marks)

If a sample is not RANDOM, then you are unable to make statistical INFERENCES about the POPULATION based on the sample


Pros and cons of large sample / census (entire population is being tested)

Pros: very accurate and representative

Cons: costly and time-consuming; wasteful if items must be disposed of after testing


Desirable traits of a sample

- unbiased

- representative of entire population

- randomly selected data points


Why use a random sample for a hypothesis test?

- random sample is unbiased

- enables statistical inferences to be made about the population

- a hypothesis test assumes the sample is random; without this assumption the test is not valid


Why are columns combined in Chi Squared tests?

Because otherwise some individual expected frequencies are too low (below 5), which would make the Chi squared test invalid


Advantage of using a larger sample size in a Spearman's rank / PMCC test

As sample size increases, random variation in the sample decreases, and the sample's r / rs value tends to get closer to the population's value