The trade-off for non-parametric tests making fewer assumptions is reduced sensitivity: they are less likely to detect a real difference.
Assumption | Parametric | Non-parametric
Distribution | Normally distributed | Non-normal distribution
Variance | Homogeneous variance | Heterogeneous variance
Data Screening
Testing for normality and homogeneity of variance.
What to do if your data fail these tests!
Normality
The data should be normally distributed, BUT many statistical tests are nevertheless robust to moderate deviations from normality.
Testing for normality
Numerical - Shapiro-Wilk, Kolmogorov-Smirnov, etc.
Graphical - histograms, boxplots, QQ plots, normal probability plots, detrended normal plots, etc.
Hypotheses for Normality Tests
Tested at α = 0.05
H0: The data come from a population which is not different to a normal distribution
HA: The data come from a population which is different to a normal distribution
Graphical Tests of Normality - Histograms
[Figure: example histograms of relatively normal, positively skewed, negatively skewed and bi-modal data]
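Histogram shape can also be summarised numerically. The sketch below (Python standard library only; the data sets are hypothetical) computes the moment coefficient of skewness, g1 = m3 / m2^1.5, which is positive for a long right tail, negative for a long left tail and close to zero for a symmetrical distribution. Statistical packages often report a small-sample adjusted version, so their values may differ slightly.

```python
def sample_skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no small-sample adjustment)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (spread)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment (asymmetry)
    if m2 == 0:
        raise ValueError("all values are identical")
    return m3 / m2 ** 1.5


# Hypothetical data sets, for illustration only
examples = {
    "symmetric": [2, 3, 4, 5, 6],
    "positive": [1, 1, 2, 2, 3, 10],        # long tail of large values
    "negative": [-1, -1, -2, -2, -3, -10],  # mirror image: long tail of small values
    "bi-modal": [1, 1, 1, 9, 9, 9],         # two peaks, but symmetrical
}

for name, values in examples.items():
    print(f"{name:>9}: g1 = {sample_skewness(values):+.2f}")
```

The symmetrical bi-modal data also give g1 = 0, so a skewness value alone cannot replace looking at the histogram.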
Graphical Tests of Normality - Boxplots
[Figure: annotated boxplot showing the median, 25th and 75th percentiles, minimum, maximum and an extreme value]
Graphical Tests of Normality – Normal Probability Plots (QQ Plots)
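[Figure: normal probability (QQ) plots for normal, positively skewed and negatively skewed data]
Normal: points lie close to the straight line.
Positive skew: fat positive tail, with the middle points slightly below the line; a greater frequency of large measurements than expected.
Negative skew: fat negative tail; a greater frequency of small measurements than expected.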
Homogeneity of Variance
The variances of all treatment groups should be similar.
Homogeneity of variance is especially important with unequal group sizes.
Beware unequal and/or small (< 6 per group) sample sizes: these usually do NOT have homogeneous variance.
$\text{Variance} = SD^2$
Hypotheses for Homogeneity of Variance (Levene’s Test)
(α=0.05)
H0: There is no difference in the amount of variance between treatment groups (i.e. they have equal, homogeneous variances).
HA: The variances of the treatment groups are different (i.e. heterogeneous variance).
Levene’s Test
Levene’s test is robust: it still performs well with data sets that deviate from a normal distribution.
Hypotheses for Levene’s Test: tested at (α=0.05)
H0: The treatment groups have equal variances (i.e. homogeneous variance).
HA: The variances of the treatment groups are not all the same (i.e. heterogeneous variance).
p > 0.05, so ACCEPT H0 (the variances are homogeneous)
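Levene's test works by running a one-way ANOVA on the absolute deviations of each value from its group centre: if one group is much more spread out, its deviations are larger on average. The sketch below (Python standard library only, hypothetical data) computes the W statistic and its degrees of freedom. It does not compute a p value, which needs the F distribution with (k - 1, N - k) df, from tables or a package such as RStudio. Implementations differ in the centre they use: the original test uses group means, while a median-centred (Brown-Forsythe) version is the default in, for example, `leveneTest()` from R's `car` package, so results may not match exactly.

```python
import statistics


def levene_w(*groups, center="mean"):
    """Levene's W statistic and its degrees of freedom (k - 1, N - k).

    center="mean" gives Levene's original test; center="median" gives the
    Brown-Forsythe variant, which is more robust to skewed data.
    """
    centre_of = statistics.mean if center == "mean" else statistics.median
    # Step 1: absolute deviation of every value from its own group's centre
    z = []
    for g in groups:
        c = centre_of(g)
        z.append([abs(x - c) for x in g])
    k = len(z)
    n_total = sum(len(g) for g in z)
    # Step 2: one-way ANOVA on those deviations (between vs within groups)
    group_means = [sum(g) / len(g) for g in z]
    grand_mean = sum(sum(g) for g in z) / n_total
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(z, group_means))
    within = sum((x - m) ** 2 for g, m in zip(z, group_means) for x in g)
    if within == 0:
        raise ValueError("deviations show no within-group spread; W is undefined")
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)


# Hypothetical groups, for illustration only
print(levene_w([1, 2, 3], [11, 12, 13]))  # same spread, so W = 0
print(levene_w([1, 2, 3], [0, 10, 20]))   # very different spread, so W > 0
```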
The Importance of Meeting Assumptions
Homogeneity of variance is a more important assumption than normality,
i.e. some deviation from normality will not affect the results as much as non-homogeneous variance will.
Data Transformation
Why transform?
How do you transform data?
Choosing the right transformation.
Why Transform? – Normalise the Data!
Distribution | Transformation
Symmetrical distribution | None needed
Positively (right) skewed | √x or log10(x)
Negatively (left) skewed | √(K - x) or log10(K - x)
Irregular distribution | Give up!
Data Transformation - Common Transformations
Distribution | Transformation
Moderately positively skewed | √x
Substantially positively skewed | log10(x)
Substantially positively skewed (data include zeros) | log10(x + C)
Moderately negatively skewed | √(K - x)
Substantially negatively skewed | log10(K - x)
x = each value in the data
K = your largest value + 1
C = a constant added to make your smallest value 1 (usually 1)
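The transformations in the table above can be sketched as small functions (Python standard library only; the function names are illustrative, not from any package). K and C are computed from the data exactly as defined above, so K - x and x + C both have a smallest value of 1.

```python
import math

# x = each value, K = largest value + 1, C = constant making the smallest value 1


def sqrt_x(data):
    """Moderate positive skew: square root of x."""
    return [math.sqrt(x) for x in data]


def log10_x(data):
    """Substantial positive skew (all values > 0): log10(x)."""
    return [math.log10(x) for x in data]


def log10_x_plus_c(data):
    """Substantial positive skew with zeros: log10(x + C)."""
    c = 1 - min(data)  # shifts the smallest value to exactly 1
    return [math.log10(x + c) for x in data]


def sqrt_k_minus_x(data):
    """Moderate negative skew: square root of (K - x)."""
    k = max(data) + 1
    return [math.sqrt(k - x) for x in data]


def log10_k_minus_x(data):
    """Substantial negative skew: log10(K - x)."""
    k = max(data) + 1
    return [math.log10(k - x) for x in data]


# Hypothetical values, for illustration only
print(log10_x_plus_c([0, 9, 99]))  # C = 1, so log10 of [1, 10, 100]
print(sqrt_k_minus_x([1, 2, 3]))   # K = 4, so square root of [3, 2, 1]
```

Because K - x reverses the order of the values (the largest value becomes the smallest), a negatively skewed variable becomes positively skewed before the square root or log pulls in the tail; keep this reversal in mind when interpreting the direction of any difference.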
How to Transform Data
e.g. log10 transformation:
log10(55) = 1.740
log10(48) = 1.681
How to Transform Data - Examples of Transformations
Value | Square root | Log10
1 | 1 | 0
10 | 3.16 | 1
100 | 10 | 2
1,000 | 31.6 | 3
10,000 | 100 | 4
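The table above can be reproduced with a short sketch (Python standard library only), which also shows the pattern behind it: each tenfold increase in the value adds exactly 1 on the log10 scale and multiplies the square root by about 3.16 (√10), so large values are compressed much more than small ones. That compression is what pulls in a long right tail.

```python
import math


def transformation_table(values):
    """Rows of (value, square root, log10), each shown to 3 significant figures."""
    return [(f"{v:,}", f"{math.sqrt(v):.3g}", f"{math.log10(v):.3g}") for v in values]


for value, root, log in transformation_table([1, 10, 100, 1_000, 10_000]):
    print(f"{value:>7} {root:>6} {log:>4}")

# The two values from the log10 example
print(transformation_table([55, 48]))
```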
When to Transform?
Remember - Transformations on small data sets are rarely very successful.
When p < 0.05 for the normality and/or homogeneity of variance tests AND the deviation is also visible in the plots.
BUT: what if only 1 data vector has p < 0.05?
You MUST transform all the data vectors that you are comparing.
Once transformed, carry out the statistical analysis USING the transformed data.
Report the test statistic, df and p value.
Use the original units (‘back-transform’) for graphs and interpretation so that the results make sense.
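The transform-analyse-back-transform workflow above can be sketched as follows (Python standard library only; the two groups are hypothetical). Both groups get the same log10 transformation. Back-transforming the mean of the log10 values with 10 ** mean gives the geometric mean of the original data, which for positively skewed data is smaller than the ordinary (arithmetic) mean, so it is worth stating which mean you report.

```python
import math
import statistics

# Hypothetical positively skewed data for two groups, for illustration only
groups = {
    "A": [2.0, 3.0, 5.0, 8.0, 40.0],
    "B": [4.0, 6.0, 9.0, 15.0, 90.0],
}

# 1. Transform EVERY group being compared, using the same transformation
logged = {name: [math.log10(x) for x in values] for name, values in groups.items()}

# 2. Run the statistical test on the values in `logged` (not shown here)

# 3. Back-transform for reporting: 10 ** (mean of the log10 values)
for name, values in groups.items():
    back = 10 ** statistics.mean(logged[name])
    print(f"Group {name}: back-transformed mean = {back:.2f}, "
          f"arithmetic mean = {statistics.mean(values):.2f}")
```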
T-Tests
t-tests compare the means of two groups of data (or the mean of one group with an expected value) to establish whether they are different.
When are they used?
Data Assumptions for t-tests
t-tests are parametric tests and therefore have a number of data assumptions:
Normal Distribution
Homogeneity of variance
Random assignment of subjects
If these assumptions are violated, it is not appropriate to perform a t-test.
Instead, it is necessary to perform a non-parametric test (covered later in the course).
Three Types of t-Tests
One-sample
Independent
Dependent / paired t-tests
One-Sample t-Test
Compares the mean of your sample with an expected mean
Example: A scientist measured the body temperature of a sample of crabs and wanted to know whether their mean body temperature had moved towards ambient after a period of exposure. Ambient temperature was 24.3 °C.
H0: µ = 24.3 °C
HA: µ ≠ 24.3 °C
H0: The mean body temperature of this sample of crabs is not significantly different from ambient (24.3 °C)
HA: The mean body temperature of this sample of crabs is significantly different from ambient (24.3 °C)
Two Samples – Independent t-Test
Tests for a difference between the means of two independent samples
Example: In order to deduce the relative intelligence of two species, an anthropologist is investigating the brain capacity of hominoids in the genus Homo.
H0: µ(Hh) = µ(He)
HA: µ(Hh) ≠ µ(He)
H0: The mean brain capacities of Homo habilis and Homo erectus are not different
HA: The mean brain capacities of Homo habilis and Homo erectus are significantly different
Two Samples – Dependent (Paired) t-Test
Testing for differences between the means of repeated or paired measurements on the same samples
Example: As a test for stress, blood cortisol levels (ng/ml) were measured in 8 wombats when they were newly captured and again after one month in captivity.
H0: µ(Time 1) = µ(Time 2)
HA: µ(Time 1) ≠ µ(Time 2)
H0: Mean blood cortisol levels are not different between Time 1 and Time 2
HA: Mean blood cortisol levels are significantly different between Time 1 and Time 2
One-Sample t-Test Explained
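Tests whether the mean of one set of measurements is different from a reference mean:
$t = \frac{\bar{x} - \mu_0}{SE}$
Where:
$\mu_0$ = population (reference) mean
$\bar{x}$ = sample mean
$SE$ = standard error of the sample mean ($s / \sqrt{n}$)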
Hypotheses (tested at α = 0.05):
H0: µ = µ0    HA: µ ≠ µ0
H0: There is no significant difference between the sample mean and the reference mean
HA: The sample mean and the reference mean are not the same
Reject H0 if |t(calculated)| > t(critical) or p < 0.05
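The one-sample calculation and decision rule above can be sketched in a few lines (Python standard library only; the data are hypothetical). `statistics.stdev` uses the n - 1 denominator, i.e. the sample SD. The critical value 4.303 is taken from standard two-tailed t tables for df = 2 at α = 0.05; a package such as RStudio would report an exact p value instead.

```python
import math
import statistics


def one_sample_t(data, reference_mean):
    """t = (sample mean - reference mean) / (s / sqrt(n)), with df = n - 1."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return (statistics.mean(data) - reference_mean) / se, n - 1


def decision(t, t_critical):
    """Two-tailed rule: reject H0 when |t(calculated)| > t(critical)."""
    return "reject H0" if abs(t) > t_critical else "accept H0"


# Hypothetical measurements, for illustration only
t, df = one_sample_t([2.0, 4.0, 6.0], reference_mean=2.0)
print(f"t = {t:.3f}, df = {df}")      # t = 1.732, df = 2
print(decision(t, t_critical=4.303))  # 4.303 = tabled value for df = 2, alpha = 0.05
```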
One-Sample t-Test - Worked Example
Do quokkas from the mainland (Pinjarra) have a similar body size to those found on Rottnest Island?
Adult male quokkas from Rottnest average 22.8 cm. Adult males from the mainland colony have the following body length measurements (cm):
H0: µ = 22.8 cm
HA: µ ≠ 22.8 cm
H0: There is no significant difference in mean body length between the Rottnest Island and mainland quokka colonies
HA: There is a significant difference in mean body length between the Rottnest Island and mainland quokka colonies
RStudio output:
p > 0.05, therefore accept H0
Results of One-Sample t-Test Example
A one-sample t-test (α = 0.05) was performed to determine whether the mean body length of adult male quokkas from two colonies (Rottnest and Pinjarra) was similar. There was no significant difference in mean length between colonies OR Quokkas from both colonies were similar in mean length (t = 0.246; df = 6; p = 0.814).
The mean body lengths of Rottnest and Pinjarra quokkas were 22.8 cm and 22.6 cm (± 0.99 s.e.), respectively.
Worked Example (using T Scores)
Longevity of a sample of 9 humans
Population norms: mean = 80, SD = 10. Sample: mean = 85, n = 9, α = 0.05 (probability level)
CALCULATIONS
Sample mean (85) - population mean (80) = 5
SD (10) ÷ √9 = 3.333
t value = 5 ÷ 3.333 = 1.5
Degrees of freedom (df) = n - 1 (for one-sample and dependent tests)
n - 2 (for independent samples, where n is the total number of observations in both groups)
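The worked example stops at the t value. Completing the decision needs df = 9 - 1 = 8 and a critical value; the sketch below (Python standard library only) takes 2.306 from standard two-tailed t tables for df = 8 at α = 0.05. It uses the summary figures exactly as given on the slide.

```python
import math


def t_from_summary(sample_mean, reference_mean, sd, n):
    """One-sample t from summary statistics, returning (t, SE, df)."""
    se = sd / math.sqrt(n)
    return (sample_mean - reference_mean) / se, se, n - 1


t, se, df = t_from_summary(sample_mean=85, reference_mean=80, sd=10, n=9)
print(f"SE = {se:.3f}, t = {t:.2f}, df = {df}")  # SE = 3.333, t = 1.50, df = 8

t_critical = 2.306  # two-tailed, alpha = 0.05, df = 8, from standard t tables
print("reject H0" if abs(t) > t_critical else "accept H0")
```

Since 1.5 < 2.306, this sample's mean longevity is not significantly different from the population norm at α = 0.05.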
Two-Sample Independent t-Test
t = difference between group means / variability of groups
Assesses the overlap between the distributions of the two groups: the larger the difference between the means relative to the variability, the smaller the overlap.
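Equation:
$t = \frac{\bar{X}_A - \bar{X}_B}{SE_d}$, where $SE_d$ is the standard error of the difference between the two means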
Two sources of variance when looking at groups of data:
between the groups and
within the groups
Big variance BETWEEN groups compared with WITHIN groups = significant difference between groups = REJECT H0
Big variance WITHIN groups compared with BETWEEN groups = NO significant difference between groups = ACCEPT H0
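A minimal sketch of the independent t-test equation (Python standard library only; the data are hypothetical). It assumes the classic pooled-variance (Student's) form of SE_d, which matches the df = n - 2 used in this section. Within-group variability enters through the pooled variance: more spread within the groups gives a larger SE_d and therefore a smaller t.

```python
import math
import statistics


def independent_t(a, b):
    """Student's independent t-test with pooled variance; returns (t, df)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled variance: each group's variance weighted by its df (within-group spread)
    pooled = ((n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)) / df
    se_d = math.sqrt(pooled * (1 / n_a + 1 / n_b))  # SE of the difference between means
    return (statistics.mean(a) - statistics.mean(b)) / se_d, df


# Hypothetical data, for illustration only
t, df = independent_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"t = {t:.3f}, df = {df}")
```

R's `t.test()` applies Welch's correction, which does not pool the variances, unless `var.equal = TRUE` is set, so its t and (usually non-integer) df can differ from this pooled version.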
How to Reject/Accept Hypothesis
Either:
p < 0.05: reject H0 (if using RStudio or any computer package)
Or:
compare t(calculated) with t(critical) (from tables) to judge significance. If |t(calculated)| > t(critical), reject H0.
Two-Sample Independent t-Test - Worked Example 1
In order to deduce the relative intelligence of two species, an anthropologist is investigating the brain capacity of hominoids in the genus Homo.
An independent t-test indicated that the mean brain capacities (cm³) of Homo habilis and Homo erectus were significantly different (t = 11.06; df = 12; p < 0.001). The mean brain capacities of Homo habilis and Homo erectus were 656 and 894 cm³, respectively.
Discussion/conclusions: If brain capacity is indicative of intelligence, Homo erectus was more intelligent than Homo habilis.
Two-Sample Independent t-Test - Worked Example 2
A forensic scientist wants to determine whether two cannabis samples have come from the same supplier, and measures the THC content (% w/v) of cannabis in the two samples (n = 7 per sample).
RESULTS: An independent t-test (n = 7; α = 0.05) indicated that the mean THC content (% w/v) of cannabis from sample one and sample two was not statistically different (t = 0.16; df = 12; p = 0.875). The mean THC contents were 4.043 and 4.057 % w/v for sample one and sample two, respectively.
Dependent (Paired) t-Test
To test whether the means of two sets of paired measurements are different from each other.
Look at the difference within each pair of measurements, then test whether the mean of these differences is significantly different from zero, using a one-sample t-test with a comparison mean of zero.
How to use the formula: calculate the t value and use the tables to reject/accept the null hypothesis.
df = n - 1 (n = number of pairs)
$SE_{diff} = SD_{diff} / \sqrt{n}$
$t = \bar{d} / SE_{diff}$ (mean difference ÷ S.E. of the differences)
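The paired procedure above (differences first, then a one-sample t-test against zero) can be sketched as follows (Python standard library only). The values are hypothetical, standing in for measurements on three animals at two times; they are not the wombat data.

```python
import math
import statistics


def paired_t(time1, time2):
    """Paired t-test: a one-sample t-test of the differences against zero."""
    if len(time1) != len(time2):
        raise ValueError("paired samples must be the same length")
    diffs = [x1 - x2 for x1, x2 in zip(time1, time2)]  # one difference per pair
    n = len(diffs)                                     # n = number of pairs
    se_diff = statistics.stdev(diffs) / math.sqrt(n)   # S.E. = SD / sqrt(n)
    return statistics.mean(diffs) / se_diff, n - 1     # t = mean diff / S.E. diff


# Hypothetical values for three animals at two times, for illustration only
t, df = paired_t([11.0, 14.0, 17.0], [10.0, 12.0, 14.0])
print(f"t = {t:.3f}, df = {df}")
```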
Independent Versus Paired Test
Paired t-test: works on the differences within each pair, so variability among subjects within each group does not enter the SE; it is therefore a more sensitive test.
More likely to be significant
Independent t-test: includes within-group variability, which acts as a penalty factor, resulting in a larger SE and a smaller t value (compare the equations).
Smaller t value = less likely to be significant
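To illustrate the difference in sensitivity, the sketch below (Python standard library only; hypothetical data; pooled-variance form of the independent test) analyses the same numbers both ways. The four subjects differ a lot from each other, but each increases by a similar small amount, so the paired t is about 4.9 while the independent t is only about 0.22. This is only a demonstration: in practice the choice of test follows from the study design, not from which test gives the larger t.

```python
import math
import statistics


def paired_t(x, y):
    """Mean of the within-pair differences divided by their standard error."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def independent_t(x, y):
    """Pooled-variance t: within-group spread enters the standard error."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))


# Hypothetical: subjects differ widely, but each rises by a similar small amount
time1 = [10.0, 20.0, 30.0, 40.0]
time2 = [12.0, 21.0, 33.0, 42.0]

print(f"paired t      = {paired_t(time2, time1):.2f}")
print(f"independent t = {independent_t(time2, time1):.2f}")
```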
Review Question
The water content of a sample of nine sediment cores taken at random is: 6.1, 5.5, 5.3, 6.8, 7.6, 5.3, 6.9, 6.1, 5.7 wt% H₂O
H0: The sample is from a population with a mean water content of 7.0 wt% H₂O
HA: The sample is from a population with a mean water content that is NOT 7.0 wt% H₂O
SD = 0.803; α = 0.05 and 0.01
Determine which hypothesis is supported at each α value
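To check a hand calculation for the review question, the sketch below (Python standard library only) recomputes the SD and the one-sample t from the nine values in the question. The two-tailed critical values for df = 8 (2.306 at α = 0.05, 3.355 at α = 0.01) are taken from standard t tables.

```python
import math
import statistics

water = [6.1, 5.5, 5.3, 6.8, 7.6, 5.3, 6.9, 6.1, 5.7]  # wt% H2O, from the question
reference_mean = 7.0

n = len(water)
sd = statistics.stdev(water)  # sample SD, n - 1 denominator
t = (statistics.mean(water) - reference_mean) / (sd / math.sqrt(n))
df = n - 1
print(f"mean = {statistics.mean(water):.3f}, SD = {sd:.3f}, t = {t:.3f}, df = {df}")

# Two-tailed critical values for df = 8, from standard t tables
t_critical = {0.05: 2.306, 0.01: 3.355}
for alpha, t_crit in t_critical.items():
    verdict = "reject H0" if abs(t) > t_crit else "accept H0"
    print(f"alpha = {alpha}: |t| = {abs(t):.3f} vs {t_crit} -> {verdict}")
```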