Under what conditions is Binomial distribution suitable?
- set number of trials
- each event is independent and random
- only 2 possible outcomes / number of 'successes' is being modeled
How to find the mean and variance of a binomial distribution
Mean = np, Var = npq, where q = 1 - p
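As a quick check of these formulas, here is a minimal Python sketch. The values n = 10 and p = 0.3 are arbitrary illustrative choices, not from the notes. It computes the mean and variance directly from the binomial probabilities and compares them with np and npq.

```python
from math import comb

def binomial_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 10, 0.3          # illustrative values only
q = 1 - p

# Mean and variance from the definitions E(X) and E(X^2) - E(X)^2
mean = sum(r * binomial_pmf(n, p, r) for r in range(n + 1))
var = sum(r**2 * binomial_pmf(n, p, r) for r in range(n + 1)) - mean**2

print(mean, n * p)       # the two values agree up to rounding
print(var, n * p * q)    # the two values agree up to rounding
```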
Under what conditions is Poisson distribution suitable?
When a given event has a uniform mean rate of occurrence within a fixed interval of time/space, and events are independent of each other
What might suggest Poisson distribution to be unsuitable
If E(X) is not approximately equal to Var(X)
In what conditions could a Poisson distribution be used instead of a Binomial
When 'n' is large and 'P' is small in the binomial model
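To illustrate why the approximation works, the sketch below compares binomial and Poisson probabilities for one assumed case (n = 1000, p = 0.002, chosen only for illustration), using lambda = np.

```python
from math import comb, exp, factorial

def binomial_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(lam, r):
    """P(X = r) for X ~ Po(lam)."""
    return exp(-lam) * lam**r / factorial(r)

n, p = 1000, 0.002      # illustrative: large n, small p
lam = n * p             # Poisson parameter set equal to the binomial mean

# Largest gap between the two models over r = 0, 1, ..., 10
max_diff = max(abs(binomial_pmf(n, p, r) - poisson_pmf(lam, r))
               for r in range(11))
print(max_diff)         # a small number: the two models nearly coincide
```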
If given raw data, what could be used to estimate lambda?
E(X) or Var(X), i.e. the sample mean or the sample variance
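A minimal sketch of this estimate, using made-up counts (the data below are hypothetical). It uses the sample mean as the estimate of lambda and prints the sample variance alongside it, so the two can be compared when judging whether a Poisson model is plausible.

```python
from statistics import mean, variance

# Hypothetical raw data: number of events seen in 10 equal intervals
counts = [1, 3, 2, 0, 4, 2, 1, 4, 2, 1]

lambda_estimate = mean(counts)       # sample mean
sample_variance = variance(counts)   # sample variance (divides by n - 1)

# For a Poisson model these two should be roughly equal;
# how close is "close enough" is a judgement call.
print(lambda_estimate, sample_variance)
```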
What must 2 events be in order to add their Poisson parameters together?
Independent of each other
Under what conditions is Geometric distribution suitable?
- when modelling the number of trials up to and including the first success
- events are random and independent
- 2 possible outcomes
In geo model: P(X>r) = ?
(1-p)^r
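The result follows because X > r means the first r trials are all failures. A short sketch, with p = 0.2 and r = 4 picked arbitrarily for illustration, checks the formula against 1 - P(X <= r).

```python
p, r = 0.2, 4           # illustrative values only

# X > r means the first r trials all fail
closed_form = (1 - p) ** r

# Same probability via 1 - P(X <= r), using P(X = k) = (1 - p)^(k - 1) * p
by_summation = 1 - sum((1 - p) ** (k - 1) * p for k in range(1, r + 1))

print(closed_form, by_summation)   # both approximately 0.4096
```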
Layout of goodness of fit test
Ho: model is a good fit for given data
H1: model is not a good fit for given data
Given observed values, use model to find estimated values.
Combine columns if Fe < 5
Chi squared = sum of (Fe-Fo)^2 / Fe
degrees of freedom v = no. of columns -1 (if set sample size) -1 (for each estimated parameter)
Use v and sign lvl to find crit value and compare it to the Chi squared value
Therefore, we (fail to) reject Ho, (in)sufficient evidence to suggest ...
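The layout above can be sketched in Python. The data are hypothetical (60 rolls of a die tested against a "fair die" model), and the critical value is the 5% figure for 5 degrees of freedom read from standard chi squared tables.

```python
# Hypothetical data: 60 rolls of a die, Ho: a fair-die model is a good fit
observed = [8, 12, 9, 11, 14, 6]
n = sum(observed)
expected = [n / 6] * 6          # the model gives Fe = 10 for every face

# Every Fe is at least 5, so no columns need combining in this example
assert all(fe >= 5 for fe in expected)

chi_squared = sum((fe - fo) ** 2 / fe for fo, fe in zip(observed, expected))

# Fixed sample size costs 1; no parameter was estimated from the data
degrees_of_freedom = len(observed) - 1

critical_value = 11.070         # 5% level, v = 5, from standard tables

reject_h0 = chi_squared > critical_value
print(chi_squared, degrees_of_freedom, reject_h0)
```

Here the test statistic is below the critical value, so we fail to reject Ho: there is insufficient evidence to suggest the fair-die model is a poor fit.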
What is random on non-random bivariate data?
When one variable is measured (dependent / random variable) while another variable is changed in a controlled way (independent / non-random variable). The changes are fixed, and any errors in the controlled variable are negligible.
What is random on random bivariate data?
When measuring 2 variables which can be considered as random. Example: measuring height and weight of population as a given height doesn't have a set value for weight
Conditions for valid PMCC test
- invalid to use PMCC for random on non-random as the non-random variable doesn't have a probability distribution
- assume random on random data has an underlying bivariate normal distribution
What does underlying bivariate normal distribution look like on a scatter diagram. What could prevent this shape on a distribution?
Elliptical shape | one outlier / differently shaped scatter
What does regression line for random on non-random bivariate data represent
The true value of the random variable, i.e. value that would be observed from a perfect experiment
What do the regression lines for random on random bivariate data show?
The mean value of x for a given y, and the mean value of y for a given x.
When making a regression line with y on x, what must you define?
the variables x and y
Layout of PMCC hypothesis test
Ho: rho = 0
H1: rho > 0 or rho < 0 (one-tailed), or rho ≠ 0 (two-tailed)
What is the effect of sample size on PMCC
As sample size increases, the critical value decreases, therefore a significant result can occur despite a lower test statistic
When is Spearman's Rank appropriate?
When the scatter diagram shows a monotonic relationship, i.e. as x increases, y consistently increases (or consistently decreases), though not necessarily linearly
Con of using Spearman's rank?
Ranking data loses information -> affects test outcome
Outline the Spearman's Rank hypothesis test
Ho: There is no association between x and y
H1: There is some association between x and y
Calculate Rs value
Use sign lvl and n to find crit value
Compare Rs to crit value then conclude
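The "calculate Rs" step can be sketched as follows. This assumes no tied ranks, so the shortcut formula Rs = 1 - 6 * (sum of d^2) / (n(n^2 - 1)) applies; the data and the helper name `ranks` are made up for illustration.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

# Hypothetical paired data
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
n = len(x)

# d is the difference between the two ranks of each pair
sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
rs = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

print(rs)   # approximately 0.8; compare with the table value for this n
```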
Describe the regression line of y on x
This regression line minimizes the sum of the squares of the vertical distances between the data points and the line.
Used to predict Y value based on X value.
Typically used in random on non-random scatter
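A minimal sketch of the y on x line, using hypothetical data and the standard least-squares results gradient b = Sxy / Sxx and intercept a = ybar - b * xbar. The helper `sum_sq_vertical` is an illustrative name used to show that nearby lines give a larger sum of squared vertical distances.

```python
# Hypothetical data: x is controlled (non-random), y is measured (random)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b = s_xy / s_xx            # gradient
a = y_bar - b * x_bar      # intercept: the line passes through (x_bar, y_bar)

def sum_sq_vertical(intercept, gradient):
    """Sum of squared vertical distances from the points to a line."""
    return sum((yi - (intercept + gradient * xi)) ** 2
               for xi, yi in zip(x, y))

print(a, b)                               # approximately 0.05 and 1.99
print(sum_sq_vertical(a, b))              # the least-squares minimum
print(sum_sq_vertical(a + 0.1, b))        # larger for a shifted line
```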
Describe the regression line x on y
This regression line minimizes the sum of the squares of the horizontal distances between the data points and the line.
Predicts x value based on y value.
What could affect the reliability / accuracy of predictions made from regression lines?
Interpolation vs extrapolation of data.
Low (PMCC)^2 value suggests data points lie far from regression line
How to calculate residual
Used for random on non-random bivariate data
Observed value - value from regression line
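As a small worked example, assume a regression line of y on x given by y = 0.05 + 1.99x and an observed point (3, 6.2). Both the line and the point are hypothetical.

```python
# Hypothetical regression line of y on x: y = a + b * x
a, b = 0.05, 1.99

def residual(x_value, observed_y):
    """Observed value minus the value given by the regression line."""
    predicted_y = a + b * x_value
    return observed_y - predicted_y

r = residual(3, 6.2)
print(r)   # approximately 0.18, positive so the point lies above the line
```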
How could a Chi squared test produce a very low value and what is the significance?
Low when Fe ≈ Fo in every cell, i.e. the data fit the model almost perfectly
A suspiciously good fit may be evidence that the experimental data were falsified / inaccurate -> invalid test
Calculating cells in frequency table
Using observed table:
(row total x column total) divided by sample size = Fe
This is the expected frequency if the events were independent / had no association.
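The calculation can be sketched for a hypothetical 2 x 3 observed table (the numbers are made up for illustration).

```python
# Hypothetical observed frequencies: 2 rows, 3 columns
observed = [[10, 20, 30],
            [20, 20, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)                      # sample size

# Fe = (row total x column total) / sample size for each cell
expected = [[r * c / n for c in col_totals] for r in row_totals]

print(expected)   # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
```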
Outline Chi squared hypothesis test
Ho: no association between x and y
H1: there is an association between x and y
This is the only alternative hypothesis and is still one tail!
v = (no. of rows - 1)x(no. of columns -1)
Use v and sign lvl to find crit value
compare Chi squared to crit value
draw conclusion
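The whole test can be sketched in Python for a hypothetical 2 x 3 table. The critical value used is the 5% figure for 2 degrees of freedom from standard chi squared tables.

```python
# Hypothetical observed frequencies: 2 rows, 3 columns
observed = [[10, 20, 30],
            [20, 20, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequencies under Ho (no association)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi squared = sum of (Fo - Fe)^2 / Fe over every cell
chi_squared = sum((fo - fe) ** 2 / fe
                  for obs_row, exp_row in zip(observed, expected)
                  for fo, fe in zip(obs_row, exp_row))

v = (len(observed) - 1) * (len(observed[0]) - 1)   # (rows - 1) x (columns - 1)
critical_value = 5.991                              # 5% level, v = 2, from tables

reject_h0 = chi_squared > critical_value
print(chi_squared, v, reject_h0)
```

In this example the test statistic (about 5.33) is below the critical value, so we fail to reject Ho: there is insufficient evidence to suggest an association.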
What is the problem with a non-randomly chosen sample? (2 marks)
If a sample is not RANDOM, then you are unable to make statistical INFERENCES about the POPULATION based on the sample
Pros and cons of large sample / census (entire population is being tested)
Pros: very accurate and representative
Cons: Costly and wasteful if must dispose of items post-test
Desirable traits of a sample
- unbiased
- representative of entire population
- randomly selected data points
Why use a random sample for a hypothesis test?
- random sample is unbiased
- enables statistical inferences to be made about the population
- necessary to assume random sample for validity in hypo test
Why are columns combined in Chi Squared tests?
Because otherwise the individual columns' expected frequencies are too low, which would make the Chi Squared test invalid
Advantage of using a larger sample size in Spearman's / PMCC
As sample size increases, random variation in the sample decreases, and the sample's r/rs value tends to get closer to the population's r/rs value