Sampling Techniques and Statistical Analysis Review

Probability and Inclusion Probability

  • Inclusion probability is an ordinary probability: the chance that a given unit of the population ends up in the sample.
  • Consider the probability of picking a female from the general human population.
  • Ideally, every unit has an equal chance of being sampled, so that the inclusion probabilities match what the design intends.

Critical Value

  • R's qt() function grabs the t critical value from the t distribution for a given degrees of freedom.
  • It's a fixed number for that degrees of freedom and confidence level.
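
As a worked definition, assuming a two-sided interval at confidence level 1 − α (the symbols α, ν, and T are notation introduced here, not taken from the lecture), the critical value can be written as:

```latex
t^{*} = t_{1-\alpha/2,\;\nu}
\quad\text{such that}\quad
P\left(T_{\nu} \le t^{*}\right) = 1 - \frac{\alpha}{2},
\qquad T_{\nu} \sim t\text{-distribution with } \nu \text{ degrees of freedom}
```

For a 95% interval with ν = 9, this gives t* ≈ 2.262, which is what qt(0.975, 9) returns in R.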

Lecture Outline & Revision

  • Simple random sampling is not always the ideal technique for certain scenarios.
  • It works in theory, but real-world constraints exist.
  • Constraints include limited resources and cost, especially when sampling land for soil carbon content.

Monitoring Question

  • This section will talk about stratified random sampling and revisiting a site.
  • Revisiting a site influences your thinking about the sets.
  • It involves different variations of recomputing a confidence interval band.
  • The focus is on one particular thing called weight.

Confidence Interval Band

  • The lecture discusses why we would want to torture ourselves to calculate confidence intervals differently.

Simple Random Sampling

  • If you toss 10 random points onto a landscape, you might miss something or sample some areas too much.
  • The idea of sampling is that it has to be representative.
  • To mitigate these two problems, you could take more samples, to the point where the chance of either problem is evened out by randomization.
  • Physical limitations can also hinder you, such as a tall rock that prevents you from ever throwing the sampling ring behind it.

Example of Grassland and Wetland

  • Analogy: If an area is 80% grassland and 20% wetland, the ideal simple random sample of 10 would place eight samples in the grassland and two in the wetland.
  • It is very possible that you get zero wetland samples.
  • It is even more likely that you will miss the exact ratio of eight to two.
  • The split is always variable.
  • The goal is to avoid this variable representation.
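
A quick calculation shows how variable the split can be. The sketch below assumes each of the 10 points lands independently in wetland with probability 0.2 (a binomial model, which is an assumption added here and not stated in the lecture):

```python
from math import comb

n = 10            # number of sample points
p_wetland = 0.2   # wetland share of the area

# Chance that none of the 10 points lands in wetland.
p_zero_wetland = (1 - p_wetland) ** n

# Chance of hitting exactly eight grassland and two wetland points.
p_exact_ratio = comb(n, 2) * p_wetland**2 * (1 - p_wetland) ** (n - 2)

print(round(p_zero_wetland, 3), round(p_exact_ratio, 3))
```

Under that assumption there is roughly an 11% chance of getting no wetland samples at all, and only about a 30% chance of the exact eight-to-two split.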

Using Prior Information

  • If you have more information about the population, you can use it without any additional effort in your sampling regime. You just have to ensure that it is used to get a better representation of your population.
  • Knowing this prior information, we can use it to our advantage to perform our sampling.

Stratified Random Sampling

  • That technique is called stratified random sampling.

  • Divide the population into subgroups, which we call strata, and then sample from each stratum using simple random sampling.

  • This involves one more step than randomly picking data points with a random number generator.

  • It will be difficult at first because you need to combine the per-stratum estimates to get one more accurate estimate.

  • You're stratifying and getting individual statistics from each stratum, but how do you then consolidate them into one value that still represents the whole population?

  • Algorithm:

    • Divide
    • Take random samples within
    • Combine
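
The divide and sample-within steps can be sketched as follows. This is a minimal illustration, not the lecture's code: the function name stratified_sample, the plot labels, and the per-stratum sample sizes are all hypothetical. The combine step relies on stratum weights and is not shown here.

```python
import random

def stratified_sample(units, stratum_of, n_per_stratum, seed=0):
    """Divide units into strata, then draw a simple random sample within each."""
    rng = random.Random(seed)

    # Step 1: divide. Each unit is assigned to exactly one stratum.
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)

    # Step 2: simple random sample (without replacement) inside each stratum.
    return {name: rng.sample(members, n_per_stratum[name])
            for name, members in strata.items()}

# Hypothetical landscape: 80 grassland plots and 20 wetland plots.
plots = [("grassland", i) for i in range(80)] + [("wetland", i) for i in range(20)]
sample = stratified_sample(plots, stratum_of=lambda plot: plot[0],
                           n_per_stratum={"grassland": 8, "wetland": 2})
print({name: len(chosen) for name, chosen in sample.items()})
```

Unlike simple random sampling, the eight-to-two split is guaranteed by design rather than left to chance.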

Rules for Strata

  • Stratified random sampling has rules. The first is extremely important: strata must be mutually exclusive and collectively exhaustive.
  • Every sample belongs to exactly one stratum.
  • Boundaries must be clear.
  • Within a stratum, units have the same inclusion probability, hopefully.
  • Each stratum must be sampled.

Good vs. Poor Stratification Choices

  • Good strata are ones where you can truly separate groups.
    • Undergrad versus master's versus PhD.
    • Deciduous forest versus coniferous versus mixed.
    • Income levels: low, medium, high.
  • Poor strata choices are overlapping groups, such as sports fans versus music lovers versus foodies, because one person can belong to several of them.
  • The number of samples per stratum is relative to the size of the stratum, so that it is represented but not overrepresented.
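
One way to make the last point concrete is proportional allocation. The sketch below is an assumption about how to implement it (the function name proportional_allocation and the rounding rule are not from the lecture):

```python
def proportional_allocation(total_samples, weights):
    """Split total_samples across strata in proportion to each stratum weight."""
    allocation = {}
    for name, weight in weights.items():
        # Every stratum must be sampled, so never allocate fewer than one.
        allocation[name] = max(1, round(total_samples * weight))
    return allocation

allocation = proportional_allocation(10, {"grassland": 0.8, "wetland": 0.2})
print(allocation)
```

Because of rounding, the allocated counts may not always add up exactly to total_samples, so the result is worth checking for awkward weights.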

Advantages of Stratified Random Sampling

  • Addresses problems of simple random sampling in small sample sizes.
  • Simple random sampling, given enough samples, will solve all of these problems anyway.
  • Advantageous for small sample sizes.

Simple Random Sampling

  • Not obsolete; still very reliable.
  • You're just gonna have a wider confidence interval margin.
  • People can make decisions based on any confidence interval.

Calculating Confidence Interval

  • Once you have your stratified sample, you go back to the same three things needed to calculate the confidence interval band that describes this sample.
  • You still have to calculate the mean, but now it's a pooled mean.
  • You still have to calculate standard error, but now it's a pooled standard error.
  • Calculate your confidence interval to make your decisions.
  • Both pooled values are based on stratum weight.

Equations

  • The pooled mean and the pooled standard error need to be calculated using the weighting approach in order to make statistical inferences about the true population parameters.
  • w = weight
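
The equations themselves are not written out in these notes. Based on the verbal descriptions in the notes (weighted mean, weighted variances divided by sample size, one degree of freedom lost per stratum), they can be sketched as follows. The symbols h, H, n_h, s_h^2, and t* are notation assumed here:

```latex
\bar{y}_{\text{pooled}} = \sum_{h=1}^{H} w_h \, \bar{y}_h
\qquad
\mathrm{SE}_{\text{pooled}} = \sqrt{\sum_{h=1}^{H} w_h^{2} \, \frac{s_h^{2}}{n_h}}
\qquad
\mathrm{df} = \sum_{h=1}^{H} n_h - H

\text{CI} = \bar{y}_{\text{pooled}} \pm t^{*}_{\mathrm{df}} \cdot \mathrm{SE}_{\text{pooled}}
```

Here w_h is the weight of stratum h, and the mean, variance, and sample size carry the same subscript for that stratum.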

Understanding Pooling by Weight

  • Goes back to inclusion probabilities.
  • For example, the land that makes up 80% of the population should have a weight of 0.8, and the land that makes up 20% should have a weight of 0.2.
  • The biggest stratum represents more of the population than the smallest stratum.
  • To calculate the pooled mean, it's simply 0.8 multiplied by the mean of the big stratum plus 0.2 multiplied by the mean of the small stratum.
  • The weight is based on the population and not on the samples that you take, because samples will vary but the weight is fixed.
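
A minimal sketch of that weighted sum is below. The stratum means (2.0 and 5.0) are made-up illustrative values, and the function name pooled_mean is hypothetical:

```python
def pooled_mean(weights, stratum_means):
    """Weighted mean across strata; weights are population shares summing to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("stratum weights must sum to 1")
    return sum(w * m for w, m in zip(weights, stratum_means))

# Hypothetical stratum means: grassland 2.0, wetland 5.0 (e.g. % soil carbon).
estimate = pooled_mean([0.8, 0.2], [2.0, 5.0])
print(round(estimate, 3))  # 0.8 * 2.0 + 0.2 * 5.0
```

The weights come from the known 80/20 split of the land, not from how many samples happened to fall in each stratum.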

Weight

  • Based on initial strata rules that you specify

Pooled Mean

  • The equation should be relatively simple because you're just multiplying by the weight there.
  • Equation: 0.69% can be used as a basis of how you're going to compute.
  • The weighted mean is the pooled mean.

Pooled Standard Error

  • Relates back to your variance measure.
  • Instead of a single variance term, we use the sum of weighted variances from each stratum.
  • You don't compute individual standard errors per stratum; instead, you start with s^2, which is the variance of each stratum.
  • Multiply by the weight squared.
  • The reason is that variance fundamentally has squared units, so you square the weight; when you eventually take the square root to get the standard error, the units go back to the usual ones. You square it knowing that you've got to square root it.
  • You also have to divide each stratum's variance by its number of samples, because that is what turns it into a standard error.
  • Degrees of freedom are going to be different for a stratified random sample.
  • Instead of just n minus one, we lose one degree of freedom for each stratum.
  • The mean of each stratum is fixed by the time you compute its variance, so you lose one degree of freedom per stratum mean.
  • Degrees of freedom therefore go down.
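
The pooled standard error and the degrees of freedom can be sketched like this. The data values and function names are hypothetical, and each stratum needs at least two samples for a variance to exist:

```python
from math import sqrt
from statistics import variance

def pooled_se(weights, samples):
    """Square root of the sum over strata of weight^2 * s^2 / n."""
    var_of_mean = sum(w**2 * variance(y) / len(y)
                      for w, y in zip(weights, samples))
    return sqrt(var_of_mean)

def stratified_df(samples):
    """Total number of samples minus one degree of freedom per stratum."""
    return sum(len(y) for y in samples) - len(samples)

# Hypothetical measurements from two strata with weights 0.8 and 0.2.
grassland = [1.0, 2.0, 3.0, 4.0]
wetland = [2.0, 4.0]
se = pooled_se([0.8, 0.2], [grassland, wetland])
df = stratified_df([grassland, wetland])
print(round(se, 3), df)
```

With six samples in two strata, the degrees of freedom drop to four rather than the five that a simple random sample of six would have.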

Calculate Confidence Interval

  • After weighting, the 95% confidence interval is the same as before: pooled mean plus or minus the pooled standard error multiplied by the t critical value.
  • The fundamental idea is still the same: central value plus or minus margin of error.
  • For each stratum, you calculate the variance divided by the number of samples. To get the weighted variance, you multiply that by the stratum weight squared, then sum over the strata.
  • You are standardizing to the stratum weight.
  • Then you square root the whole thing to convert from variance to standard error.
  • This is the longer way of calculating what we call the variance of the mean and the standard error of the mean.
  • To compute the confidence interval, it's really just mean plus or minus the t critical value multiplied by SE.
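
The final step is small enough to sketch directly. The numbers below are hypothetical (a pooled mean of 2.6 and a pooled SE of 0.554 with 4 degrees of freedom), and the critical value 2.776 is taken from a t table for 95% and 4 degrees of freedom, which is what qt(0.975, 4) gives in R:

```python
def t_interval(center, se, t_crit):
    """Central value plus or minus margin of error (t critical value times SE)."""
    margin = t_crit * se
    return center - margin, center + margin

# Hypothetical pooled mean and pooled SE; t_crit assumed for 95%, df = 4.
lower, upper = t_interval(center=2.6, se=0.554, t_crit=2.776)
print(round(lower, 2), round(upper, 2))
```

The critical value is passed in as an argument because the Python standard library has no t distribution; any table or statistics package can supply it.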

Comparison of Random Sampling and Stratified Random Sampling

  • Compare a simple random sample with a hypothetical stratified random sample of the same population, where the units were grouped into strata before sampling.
  • When a stratified random sample is computed,
    • You could, in theory, achieve a much, much lower variance of the mean.
    • Your confidence interval band is narrower.
      • You can be more confident about where the true value lies.

Efficiency

  • The sampling efficiency is written as: Var{SRS}/Var{StRS}, where SRS refers to simple random sampling, and StRS refers to stratified random sampling.
  • If the number is bigger than 1, this means that the stratified random sample is more efficient.
  • Sampling efficiency is a ratio of variances.
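
As a small illustration of the ratio, with made-up variances of the mean (0.50 for simple random sampling and 0.25 for stratified random sampling) and a hypothetical function name:

```python
def sampling_efficiency(var_srs, var_strs):
    """Ratio of the variance of the mean under SRS to that under StRS."""
    if var_strs <= 0:
        raise ValueError("variance of the stratified estimate must be positive")
    return var_srs / var_strs

# Hypothetical variances of the mean from the two designs.
efficiency = sampling_efficiency(var_srs=0.50, var_strs=0.25)
print(efficiency, "stratified is more efficient" if efficiency > 1 else "no gain")
```

A value of 2 here would mean the stratified design halves the variance of the mean for the same data.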

Stratification Tips

  • Your stratification can be spatial, temporal, or the way the land is managed.
  • Stratum size determines sample size: try to sample each stratum in proportion to its size.

Monitoring Studies

  • In monitoring studies, you have prior information of that site, and therefore, it is possible to use the information to make a better estimate of the site rather than assuming that you're coming to a new site each time.
  • It has an added statistical advantage that you have prior knowledge.
  • Coming back means there's a change in mean value before and after because things do change over time.

Measuring Soil Carbon Content

  • How do we select sites for that second measurement?
  • Do you return to the same sites or do you select new sites? The choice affects this thing called covariance in our measurements.

Confidence Interval

  • To compute a confidence interval for a monitored site and represent it as one single number, you need a common measurement that incorporates both visits, and that common measurement is usually the difference.
  • The one value that links the two sides of a before-and-after experiment is the difference between them.
  • If you return to the same sites, you're assuming that there's covariance between the before and after measurements, and you include it as an advantage in your calculations.
  • If you are doing a monitoring study where the second visit measures different sites from the same population, then the covariance term disappears.
  • In that case, the variance of the difference is found by adding the two variances together.
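
The effect of the covariance term can be sketched as follows. The before and after values are made up, the function names are hypothetical, and the paired formula used is the standard identity Var(A − B) = Var(A) + Var(B) − 2 Cov(A, B):

```python
from statistics import mean, variance

def sample_covariance(x, y):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def var_of_mean_change(before, after, paired):
    """Variance of (mean after - mean before); subtracts covariance if paired."""
    result = variance(before) / len(before) + variance(after) / len(after)
    if paired:
        # Same sites revisited: the shared site effect is removed.
        result -= 2 * sample_covariance(before, after) / len(before)
    return result

# Hypothetical soil carbon values at four sites, before and after.
before = [1.0, 2.0, 3.0, 4.0]
after = [1.5, 2.4, 3.6, 4.5]
v_new_sites = var_of_mean_change(before, after, paired=False)
v_same_sites = var_of_mean_change(before, after, paired=True)
print(v_same_sites < v_new_sites)
```

When before and after values at the same site are strongly related, the covariance term removes most of the site-to-site variation, which is why revisiting sites gives a tighter estimate of the change.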

Covariance

  • If paired, then you're thinking of covariance

Sampling Design

  • In general, returning to the same site works better, but you lose information of the overall general estimate.

Statistical Analysis

  • 95% coverage is double for the change in mean.
  • Use the t.test function with the paired = TRUE option to do the whole calculation and get a confidence interval.
  • Adding covariance means you tell the t.test function that you're doing a paired t test.
  • The confidence interval calculations are a bit more complex when using covariance, so you perform them with the paired t-test function.
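
In R this is a single call along the lines of t.test(after, before, paired = TRUE). To show what that call works out, here is a rough Python sketch using only the standard library. The data are hypothetical, and the critical value 3.182 is assumed from a t table for 95% and 3 degrees of freedom:

```python
from math import sqrt
from statistics import mean, stdev

def paired_ci(before, after, t_crit):
    """Confidence interval for the mean change, computed on site differences."""
    diffs = [a - b for b, a in zip(before, after)]
    se = stdev(diffs) / sqrt(len(diffs))   # standard error of the mean difference
    center = mean(diffs)
    return center - t_crit * se, center + t_crit * se

# Hypothetical values at four revisited sites; df = 4 - 1 = 3.
before = [1.0, 2.0, 3.0, 4.0]
after = [1.5, 2.4, 3.6, 4.5]
low, high = paired_ci(before, after, t_crit=3.182)
print(round(low, 2), round(high, 2))
```

Working on the differences is what builds the covariance in: each site acts as its own control, so only the variation in the change remains.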