Statistical Tests and Practical Skills

Evaluation of Variables using Spearman's Rank Correlation

Spearman's Rank Correlation, denoted $r_s$, is a statistical test used to determine whether a correlation exists between two variables. The test starts from the Null Hypothesis ($H_0$), which states that there is no correlation between the variables being studied. The strength and direction of the correlation are given by the $r_s$ value, which falls on a scale from $-1$ to $+1$. A value of $-1$ indicates a perfect negative correlation, $0$ indicates no correlation, and $+1$ indicates a perfect positive correlation. Values at or beyond $\pm 0.7$ are generally considered to represent a strong correlation. The significance of the correlation is assessed by comparing the calculated $r_s$ value to a provided critical value; if $r_s$ (ignoring its sign) is greater than the critical value, the correlation is deemed significant and the null hypothesis can be rejected. In practice, a significance level of $5\%$ (written as $0.05$) is typically used, meaning there is at most a $5\%$ probability that a correlation this strong would appear by chance if the null hypothesis were true. This is often summarised as being $95\%$ confident that the correlation exists.

The formula used for this calculation is $r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$, where $n$ represents the number of pairs and $\sum D^2$ represents the sum of the squared differences between the ranks in each pair. Critical values vary depending on the sample size ($n$) and the desired significance level. At the $0.05$ significance level, the critical value is $1.000$ for $n=4$, $0.900$ for $n=5$, $0.771$ for $n=6$, $0.679$ for $n=7$, $0.643$ for $n=8$, $0.600$ for $n=9$, $0.564$ for $n=10$, $0.527$ for $n=11$, $0.504$ for $n=12$, $0.478$ for $n=13$, $0.459$ for $n=14$, $0.443$ for $n=15$, and $0.427$ for $n=16$. Other significance levels give different thresholds: $0.1$ gives a lower threshold and $0.01$ a higher one. For example, at $n=6$ the critical values are $0.657$ for $0.1$ and $0.943$ for $0.01$.
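The formula and the comparison against a critical value can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the names `spearman_from_d2`, `is_significant`, and `CRITICAL_0_05` are hypothetical, the table holds only the $0.05$ values quoted above, and comparing the absolute value of $r_s$ is an assumption made so that negative correlations are handled as well.

```python
# Critical values at the 0.05 significance level, copied from the text above.
CRITICAL_0_05 = {
    4: 1.000, 5: 0.900, 6: 0.771, 7: 0.679, 8: 0.643, 9: 0.600,
    10: 0.564, 11: 0.527, 12: 0.504, 13: 0.478, 14: 0.459, 15: 0.443,
    16: 0.427,
}


def spearman_from_d2(sum_d2, n):
    """Apply r_s = 1 - 6 * sum(D^2) / (n * (n^2 - 1))."""
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


def is_significant(r_s, n, table=CRITICAL_0_05):
    """Return True when |r_s| exceeds the tabulated critical value for n pairs."""
    if n not in table:
        raise ValueError(f"no critical value tabulated for n = {n}")
    # Assumption: the sign of r_s is ignored when comparing to the table.
    return abs(r_s) > table[n]


# Hypothetical input: 10 pairs whose squared rank differences sum to 10.
r_s = spearman_from_d2(sum_d2=10, n=10)
print(round(r_s, 3), is_significant(r_s, n=10))
```

With the made-up inputs above, $r_s$ comes out at roughly $0.939$, which exceeds the tabulated $0.564$ for $n=10$.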

A practical example of this test involves measuring distance from a source against the width of a feature across 15 sites. Here $R_1$ is the rank of the distance, $R_2$ is the rank of the width, and $d^2$ is the squared difference between the two ranks; tied widths share the average of the ranks they occupy. At Site 1 (150 m), the width is 0.40 m ($R_1=1$, $R_2=1$, $d^2=0$). Site 2 (300 m) width is 0.80 m ($R_1=2$, $R_2=2$, $d^2=0$). Site 3 (450 m) width is 1.00 m ($R_1=3$, $R_2=4$, $d^2=1$). Site 4 (600 m) width is 0.95 m ($R_1=4$, $R_2=3$, $d^2=1$). Site 5 (750 m) width is 1.20 m ($R_1=5$, $R_2=6$, $d^2=1$). Site 6 (900 m) width is 1.10 m ($R_1=6$, $R_2=5$, $d^2=1$). Site 7 (1050 m) width is 1.30 m ($R_1=7$, $R_2=7$, $d^2=0$). Site 8 (1200 m) width is 1.40 m ($R_1=8$, $R_2=8$, $d^2=0$). Site 9 (1350 m) width is 1.85 m ($R_1=9$, $R_2=9$, $d^2=0$). Site 10 (1500 m) width is 2.40 m ($R_1=10$, $R_2=10$, $d^2=0$). Site 11 (1650 m) width is 2.55 m ($R_1=11$, $R_2=11$, $d^2=0$). Site 12 (1800 m) width is 3.20 m ($R_1=12$, $R_2=12.5$, $d^2=0.25$). Site 13 (1950 m) width is 3.80 m ($R_1=13$, $R_2=15$, $d^2=4$). Site 14 (2100 m) width is 3.60 m ($R_1=14$, $R_2=14$, $d^2=0$). Finally, Site 15 (2250 m) width is 3.20 m ($R_1=15$, $R_2=12.5$, $d^2=6.25$).
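To carry the example through to a result, the following Python sketch ranks both variables, sums the squared rank differences, and applies the formula for $r_s$. The helper `average_ranks` is a hypothetical name introduced here. It gives tied values the mean of the ranks they occupy, which is how the two 3.20 m widths both receive rank 12.5. Note that the simple formula is only an approximation when ties are present.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest, giving ties their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


# Data from the 15 sites in the example above.
distance_m = [150 * k for k in range(1, 16)]  # 150, 300, ..., 2250
width_m = [0.40, 0.80, 1.00, 0.95, 1.20, 1.10, 1.30, 1.40,
           1.85, 2.40, 2.55, 3.20, 3.80, 3.60, 3.20]

r1 = average_ranks(distance_m)
r2 = average_ranks(width_m)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
n = len(distance_m)
r_s = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

print(sum_d2, round(r_s, 3))  # 14.5 0.974
```

The squared differences sum to $14.5$, giving $r_s \approx 0.974$. This is well above the $0.05$ critical value of $0.443$ for $n=15$, so on these figures the positive correlation between distance and width would be judged significant.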

Analyzing Biodiversity with Simpson's Index of Diversity

Simpson's Index of Diversity ($D$) is a statistical measure used to quantify the biodiversity of a habitat by considering both species richness and species evenness. The calculation follows the formula $D = 1 - \sum \left(\frac{n}{N}\right)^2$, where $n$ is the number of individuals of one specific species and $N$ is the total number of all individuals of all species collected. The index operates on a scale from $0$ to $1$. A lower value indicates low biodiversity, which is often characterized by fewer species, unstable or extreme environments, and simple food webs. Conversely, a higher value signifies high biodiversity, indicating a stable environment with more species, more niches, and complex food webs. This index is a powerful tool for evaluating changes in an environment over time by monitoring whether the value is increasing or decreasing.

An ecological survey of diverse insects provides a clear application of this index. The survey recorded the following numbers of individuals ($n$): Northern brown argus butterfly (7), Ladybird (34), Forester moth (6), Wasp (21), Grass spider (12), Bee (37), Hornet (7), Fly (59), and Highland Midge (19). The total number of all organisms ($N$) is $202$. To find the index, the proportion of each species ($\frac{n}{N}$) is squared and then summed, with values rounded to three decimal places. For the Northern brown argus butterfly, $\frac{n}{N} = 0.035$ and $(\frac{n}{N})^2 = 0.001$. For the Ladybird, $\frac{n}{N} = 0.168$ and $(\frac{n}{N})^2 = 0.028$. For the Forester moth, $\frac{n}{N} = 0.030$ and $(\frac{n}{N})^2 = 0.001$. For the Wasp, $\frac{n}{N} = 0.104$ and $(\frac{n}{N})^2 = 0.011$. For the Grass spider, $\frac{n}{N} = 0.059$ and $(\frac{n}{N})^2 = 0.003$. For the Bee, $\frac{n}{N} = 0.183$ and $(\frac{n}{N})^2 = 0.033$. For the Hornet, $\frac{n}{N} = 0.035$ and $(\frac{n}{N})^2 = 0.001$. For the Fly, $\frac{n}{N} = 0.292$ and $(\frac{n}{N})^2 = 0.085$. Finally, for the Highland Midge, $\frac{n}{N} = 0.094$ and $(\frac{n}{N})^2 = 0.009$. The sum of these values, $\sum (\frac{n}{N})^2$, equals $0.172$, leading to a high biodiversity index of $D = 1 - 0.172 = 0.828$.
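The same calculation can be checked with a short Python sketch using the survey counts above. The variable names are illustrative. Because the code keeps full precision instead of rounding each squared proportion to three decimal places, it gives a sum of about $0.174$ and an index of about $0.826$, slightly different from the hand-calculated $0.828$.

```python
# Individuals counted per species, taken from the survey above.
counts = {
    "Northern brown argus butterfly": 7,
    "Ladybird": 34,
    "Forester moth": 6,
    "Wasp": 21,
    "Grass spider": 12,
    "Bee": 37,
    "Hornet": 7,
    "Fly": 59,
    "Highland Midge": 19,
}

N = sum(counts.values())                              # total individuals
sum_sq = sum((n / N) ** 2 for n in counts.values())   # sum of (n/N)^2
D = 1 - sum_sq                                        # Simpson's Index of Diversity

print(N, round(sum_sq, 3), round(D, 3))  # 202 0.174 0.826
```

The small gap between the two answers comes purely from rounding intermediate values, and both point to the same conclusion of high biodiversity.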

Statistical Relationships via Chi-Squared Test

The Chi-Squared Test ($\chi^2$) is used to test for a statistically significant relationship between variables by comparing observed frequencies ($O$) to expected frequencies ($E$). The formula is $\chi^2 = \sum \frac{(O - E)^2}{E}$. To determine significance, the calculated $\chi^2$ value is compared to a critical value obtained from a table. This critical value depends on the degrees of freedom, calculated as the number of categories minus one, and a chosen $p$ value, typically $0.05$ ($5\%$). If the calculated $\chi^2$ is greater than the critical value, the null hypothesis is rejected: the variables are taken to be related, and the difference between observed and expected frequencies is unlikely to be due to chance. If the calculated $\chi^2$ is less than or equal to the critical value, the null hypothesis is not rejected: the variables are not considered related, and any differences may be attributed to chance.

A case study involving acclimatised fish migration patterns illustrates this test. In this study, observed migration data indicated that 40 fish migrated during the night and 40 fish migrated during the day. However, the expected values were 65 fish for the night and 15 fish for the day. For the night group, the calculation involves $(O - E) = -25$, $(O - E)^2 = 625$, and $\frac{(O - E)^2}{E} = 9.6$. Critical values for the test are provided based on degrees of freedom ($df$). For $df=1$, values are $0.016$ (at $p=0.900$), $0.455$ (at $p=0.500$), $2.706$ (at $p=0.100$), $3.841$ (at $p=0.050$), and $6.635$ (at $p=0.010$). For $df=2$, values are $0.211$ ($0.900$), $1.386$ ($0.500$), $4.605$ ($0.100$), $5.991$ ($0.050$), and $9.210$ ($0.010$). For $df=3$, critical values at the same $p$ values are $0.584$, $2.366$, $6.251$, $7.815$, and $11.345$. For $df=4$, values are $1.064$, $3.357$, $7.779$, $9.488$, and $13.277$.
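The text works through the night group only. As a sketch of how the test would be completed, the Python below applies the same formula to both groups and compares the total with the critical value for $df = 1$ at $p = 0.05$. The function name `chi_squared` is illustrative, and the day-group term and the final total are computed here from the stated observed and expected values, not quoted from the case study.

```python
def chi_squared(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [40, 40]  # night, day (from the case study)
expected = [65, 15]  # night, day (from the case study)

chi2 = chi_squared(observed, expected)
df = len(observed) - 1   # number of categories minus one
critical = 3.841         # tabulated value for df = 1 at p = 0.05

print(round(chi2, 2), df, chi2 > critical)  # 51.28 1 True
```

The day group contributes $\frac{625}{15} \approx 41.7$, so the total is about $51.28$. On these figures the calculated value is far above $3.841$, and the null hypothesis would be rejected.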

Measuring Data Reliability with Standard Deviation and Confidence Limits

Standard Deviation ($s$) is a measure used to quantify how much individual data points in a sample deviate from the sample mean. A lower standard deviation indicates that the data points are clustered closely around the mean, suggesting the data is more reliable and consistent. Conversely, a higher standard deviation indicates that the data is more spread out and potentially less reliable. The formula for calculating standard deviation is $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$, where $x$ represents each measurement, $\bar{x}$ is the mean, and $n$ is the number of measurements. Standard deviation is further used to calculate the Standard Error ($SE$) through the formula $SE = \frac{s}{\sqrt{n}}$.
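Both formulas can be sketched in Python as follows. The measurements are made up purely for illustration, and the function names are hypothetical. The standard library's `statistics.stdev` uses the same $n - 1$ denominator, so it serves as a cross-check.

```python
import math
import statistics


def sample_std_dev(data):
    """s = sqrt( sum((x - mean)^2) / (n - 1) )."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))


def standard_error(data):
    """SE = s / sqrt(n)."""
    return sample_std_dev(data) / math.sqrt(len(data))


# Hypothetical measurements, not taken from the text.
measurements = [4, 8, 6, 5, 7]

s = sample_std_dev(measurements)
se = standard_error(measurements)

print(round(s, 3), round(se, 3))
print(math.isclose(s, statistics.stdev(measurements)))
```

For these five values the mean is $6$, the squared deviations sum to $10$, and so $s = \sqrt{10 / 4} \approx 1.581$ and $SE \approx 0.707$.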

The 95% Confidence Limit is a range calculated as $\text{Mean} \pm 2 \times \text{Standard Error}$. It gives the range within which the true mean is expected to lie with $95\%$ confidence, and so provides a statistical basis for judging whether a difference between groups is real or the result of chance. In graphical representations, these limits are displayed as error bars. If the error bars for different groups do not overlap, it indicates $95\%$ confidence that the difference between the groups is significant and not due to chance. If the error bars do overlap, the difference between the groups cannot be treated as significant, as it may be due to chance. This is often applied in competitive scoring scenarios, such as comparing points scored by Team A, Team B, and Team C, where the presence or absence of overlap between their respective confidence intervals determines the significance of the performance differences.
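The overlap check can be sketched in Python as below. The team scores are invented for illustration, since the text gives no figures, and the function names `confidence_limits` and `overlaps` are hypothetical. The limits are computed as mean $\pm\ 2 \times SE$, as described above.

```python
import math
import statistics


def confidence_limits(data):
    """Return (lower, upper) as mean -/+ 2 * standard error."""
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))
    return mean - 2 * se, mean + 2 * se


def overlaps(a, b):
    """True when two (lower, upper) ranges share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical points scored per game, not taken from the text.
team_a = [10, 12, 11, 13, 14]
team_b = [20, 22, 21, 23, 24]
team_c = [12, 14, 13, 15, 16]

limits_a = confidence_limits(team_a)
limits_b = confidence_limits(team_b)
limits_c = confidence_limits(team_c)

print(overlaps(limits_a, limits_b))  # False: difference treated as significant
print(overlaps(limits_a, limits_c))  # True: difference may be due to chance
```

With these made-up scores, the ranges for Team A and Team B are roughly $10.6$ to $13.4$ and $20.6$ to $23.4$, so they do not overlap, whereas Team C's range of roughly $12.6$ to $15.4$ overlaps Team A's.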