Correlation Coefficient

  • if the points on a graph can be contained within an ellipse, then there is a correlation.

  • the narrower the elliptical profile, the greater the correlation

  • if a question asks why the PMCC is valid or not valid, mention whether the scatter diagram looks roughly elliptical, which suggests that the underlying distribution may be bivariate Normal

  • Bivariate data are usually displayed on a scatter diagram. If there is a relationship between the two variables, this is called association.

  • correlation is a special case of association that requires two conditions: both variables are random, and the relationship is linear. If these two conditions are met, the association is called a correlation.

  • Sxy by itself does not tell you very much because no allowance has been made for the number of items of data, no allowance has been made for the spread within the data, and no allowance has been made for the units of x and y
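A minimal Python sketch of how the PMCC addresses this: dividing Sxy by sqrt(Sxx × Syy) removes the dependence on sample size, spread and units. The function name `pmcc` and the height/weight figures are illustrative assumptions, not part of these notes.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable has zero spread")
    return sxy / math.sqrt(sxx * syy)

# Made-up data: converting x from cm to m rescales Sxy but leaves r unchanged.
xs_cm = [150, 160, 170, 180, 190]
ys = [52, 58, 61, 70, 74]
xs_m = [x / 100 for x in xs_cm]
r_cm = pmcc(xs_cm, ys)
r_m = pmcc(xs_m, ys)
```

Because the same scale factor appears in both Sxy and sqrt(Sxx), it cancels, so r always lies between -1 and 1 whatever units are used.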

  • Hypothesis testing for PMCC (remember to always define ρ as the population PMCC between x and y):

    H0: ρ = 0, there is no correlation between the two variables

    H1: ρ ≠ 0, there is a correlation (two-tailed test, so it could be either positive or negative)

    H1: ρ > 0, there is a positive correlation between the two variables (one-tailed test)

    H1: ρ < 0, there is a negative correlation between the two variables (one-tailed test)

  • the role of the significance level: for any sample size, it is the probability that, when the null hypothesis is true, the value of r will be further from zero than the critical value (in the direction or directions given by H1). The magnitude of r needs to be bigger than the critical value to reject the null hypothesis.
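The decision rule above can be sketched in Python as follows. The function name `reject_h0` and the labels for the alternative hypothesis are assumptions made for illustration; the critical value itself must still be read from PMCC tables for the relevant n and significance level.

```python
def reject_h0(r, critical_value, alternative="two-sided"):
    """Return True if the sample PMCC r is extreme enough to reject H0.

    critical_value comes from PMCC tables (positive, less than 1).
    alternative is "two-sided", "greater" (H1: positive correlation)
    or "less" (H1: negative correlation).
    """
    if not 0 < critical_value < 1:
        raise ValueError("critical value must lie strictly between 0 and 1")
    if alternative == "two-sided":
        return abs(r) > critical_value
    if alternative == "greater":
        return r > critical_value
    if alternative == "less":
        return r < -critical_value
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

# For n = 10 the 5% two-tailed critical value is 0.6319,
# and the 5% one-tailed critical value is 0.5494.
two_tailed = reject_h0(-0.70, 0.6319)            # strong negative r, two-tailed
wrong_direction = reject_h0(-0.70, 0.5494, "greater")  # sign does not match H1
```

In a one-tailed test the sign of r has to match H1, which is why a strongly negative r does not support a claim of positive correlation.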

  • hypothesis tests using PMCC require the modelling assumptions that both variables are random and that the data are drawn from a bivariate Normal distribution. This is usually the case when the scatter diagram gives an approximately elliptical distribution. If one or both of the distributions is skewed or bimodal, the procedure is likely to be inaccurate.

  • When you have n points, only n-2 of them count towards any test, because two points are effectively used up in defining the line of best fit (any two points always define a line). This number, n-2, is called the degrees of freedom, denoted by ν.

  • in general, degrees of freedom = sample size - number of restrictions

  • A linear relationship established over a particular domain should not be assumed to hold outside that range. Hence extrapolation is unreliable: when a prediction uses a value outside the range of the given data, the prediction should not be trusted.
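As a rough illustration of the extrapolation warning, the sketch below fits a least-squares line and flags any prediction made outside the observed range of x. The helper names `fit_line` and `predict` and the data are hypothetical, and the least-squares formulas (gradient Sxy/Sxx) are an assumption about which line of best fit is intended.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x, with b = Sxy / Sxx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def predict(x_new, xs, ys):
    """Return (predicted y, reliable), where reliable is False for extrapolation."""
    a, b = fit_line(xs, ys)
    reliable = min(xs) <= x_new <= max(xs)
    return a + b * x_new, reliable

# Made-up data lying exactly on y = 1 + 2x, observed for x between 1 and 4
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
inside = predict(2.5, xs, ys)   # interpolation, within the data range
outside = predict(10, xs, ys)   # extrapolation, flagged as unreliable
```

The line still returns a number for x = 10, but nothing in the data confirms that the relationship stays linear that far out.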

  • Cohen's suggestion: an r of 0.1 represents a small effect size, an r of 0.3 represents a medium effect size, and an r of 0.5 represents a large effect size.
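One possible way to apply these benchmarks is sketched below. Treating 0.1, 0.3 and 0.5 as lower bounds of bands, using the magnitude of r, and the label "negligible" for values below 0.1 are all assumptions of this sketch rather than part of Cohen's suggestion as stated above.

```python
def effect_size(r):
    """Classify |r| against Cohen's benchmarks, used here as lower bounds."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)  # the sign gives direction, not strength
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # hypothetical label for values below the smallest benchmark

labels = [effect_size(r) for r in (0.62, -0.35, 0.1, 0.05)]
```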

  • For rank correlation coefficient:

    H0 : there is no association

    H1: there is positive association

  • For Spearman’s rank correlation coefficient (rs):

    H0: there is no association between the variables

    H1: there is association (two-tailed test), or there is positive association, or there is negative association (one-tailed tests)

  • Spearman’s rank: rank the data for each variable from 1 to n, then use these ranks, not the original data, to calculate the coefficient
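A minimal sketch of the ranking step and the calculation, assuming there are no tied values so that the formula 1 - 6Σd²/(n(n²-1)) applies, where d is the difference between the two ranks of each item. The function names `ranks` and `spearman` are illustrative.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); ties are not handled."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks; not handled in this sketch")
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]

def spearman(xs, ys):
    """Spearman's rank correlation coefficient for data without ties."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    rx = ranks(xs)
    ry = ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Made-up data: y doubles each time, so the relationship is not linear,
# but the ranks agree perfectly.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 8, 16, 32]
rs = spearman(xs, ys)
```

Only the order of the values matters here, which is why a curved but steadily increasing relationship still gives a coefficient of 1.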

  • PMCC is not suitable for non-linear data.

  • a t-test can be used to test the significance of a correlation coefficient by applying an appropriate transformation: t = r√(n-2)/√(1-r²), which is compared with the t distribution on n-2 degrees of freedom
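The usual transformation is t = r√(n-2)/√(1-r²), compared with a t distribution on n-2 degrees of freedom. A short sketch follows; the function name `t_statistic` and the example values are assumptions, and the critical value of t still has to be looked up in tables.

```python
import math

def t_statistic(r, n):
    """Convert a sample correlation r from n pairs into a t statistic.

    Returns (t, degrees_of_freedom), with degrees of freedom n - 2.
    """
    if n < 3:
        raise ValueError("need at least 3 data pairs")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return t, df

# Illustrative values: r = 0.6 from n = 18 pairs
t, df = t_statistic(0.6, 18)
```

The sign of t follows the sign of r, so the same statistic serves for one-tailed and two-tailed tests.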