
Lecture 2: Samples and Populations

Samples & Populations

Outline & Objectives
  • A dataset is a single sample of a broader population.

    • It's rare to sample the entire population.

    • This leads to uncertainty in estimated parameters.

  • Data samples can be systematically biased due to participant recruitment and data collection methods.

  • Assuming a normal distribution simplifies statistical calculations.

    • However, not all data is normally distributed.

  • Standard Error of the Mean

    • Resampling to estimate variability.

    • Precision in estimating the population mean from the sample.

Sampling & Populations
  • A histogram represents the sample distribution of the value of interest.

  • There's an underlying population distribution that can't be directly measured.

  • Each measured sample is an approximation of the underlying population.

  • Bigger samples tend to provide more accurate approximations.
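The claim that bigger samples give more accurate approximations can be checked with a small simulation (a sketch in Python using a made-up population, not the lecture's data): draw repeated samples of different sizes and measure how far each sample mean lands from the population mean.

```python
import random
import statistics

# Hypothetical population of 100,000 values (mean ~100, SD ~15).
random.seed(1)
population = [random.gauss(100, 15) for _ in range(100_000)]
pop_mean = statistics.mean(population)

def mean_abs_error(n, repeats=200):
    """Typical distance between a sample mean and the population mean,
    averaged over many repeated samples of size n."""
    errors = []
    for _ in range(repeats):
        sample = random.sample(population, n)
        errors.append(abs(statistics.mean(sample) - pop_mean))
    return statistics.mean(errors)

print(f"N=10:   typical error {mean_abs_error(10):.2f}")
print(f"N=100:  typical error {mean_abs_error(100):.2f}")
print(f"N=1000: typical error {mean_abs_error(1000):.2f}")
```

The typical error shrinks as N grows, which is exactly the pattern the standard error of the mean (introduced later) describes.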

Summary 1
  • Histograms represent the distribution of data samples.

  • The distribution of data samples approximates population distributions, assuming a large enough sample size and minimal bias.

Sampling
  • A ‘population’ is the total set of everyone within a group that we want to test.

    • Examples include all primary school children in the UK, all living adults, and everyone with depressive symptoms.

  • It's usually impossible to test everyone in a population, so we take a ‘sample’.

  • Populations aren't homogeneous, leading to variability that we can't control.

Sampling
  • It's possible to get lucky and find a sub-sample that properly represents the population.

  • However, it's more likely to get a sample with some form of bias, where certain traits are over-represented compared to the population.

Sampling
  • Bias might occur randomly, in which case we can try another sample or recruit a larger sample to balance things out.

  • A bigger issue is systematic bias, e.g., certain people are more or less likely to respond to a recruitment email; this remains true even when recruiting a larger sample.

  • Worst of all, there may be bias in some aspect that we aren’t even aware of…

Sampling Methods

  • Random – Recruitment done completely by chance; participants selected at random from a list.

  • Systematic – A structured approach to selecting participants; every nth participant is selected from a list.

  • Opportunity/Convenience – Recruitment from people closest and/or most accessible to the experimenter; e.g., participants who attended that day’s lecture.

  • Stratified – Recruitment aims to match key characteristics of the target population; e.g., a participant group that matches known age, sex, and political characteristics.

  • Cluster – Whole groups are recruited at once, sometimes combined with other methods; e.g., a whole university football team recruited at once.
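The random and systematic methods from the table can be sketched in a few lines of Python (the sampling frame here is hypothetical):

```python
import random

# Hypothetical sampling frame: a numbered list of 1,000 potential participants.
frame = [f"participant_{i}" for i in range(1, 1001)]

# Random sampling: every member of the frame is equally likely to be chosen.
random.seed(0)
random_sample = random.sample(frame, 50)

# Systematic sampling: every nth member (here n = 20), starting from a
# random offset within the first interval.
n = 20
start = random.randrange(n)
systematic_sample = frame[start::n]

print(len(random_sample), len(systematic_sample))  # 50 participants each
```

Note that both methods assume an unbiased frame to start from; if the list itself under-represents a group, neither method fixes that.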


Ecological Validity
  • Do the variables and conclusions of a study sufficiently reflect the real-world context of its population?

  • If writing about a very specific population (e.g., elite athletes), recruiting a relatively homogeneous sample may be appropriate.

  • However, if drawing conclusions about a very broad population, perhaps even all humans, a narrow sample is unlikely to be appropriate.

Ethics & Sampling
  • Psychologists:

    • Avoid unfair, prejudiced, or discriminatory practice, e.g., in participant selection or in the research content itself.

    • Accept that individuals may choose not to be involved in research, or may withdraw their data.

    • Are alert to the possible consequences of unexpected outcomes and acknowledge the problematic nature of interpreting research findings.

  • Biased or poorly considered sampling can lead to ethical concerns.

  • It can reinforce inequalities and marginalize certain groups.

Example 1 - Students
  • Can the body of knowledge on the psychology of prejudice garnered largely from student samples be applied to the general adult population?

  • For sure, much of it can.

  • The problem, however, is that at this point we cannot be sure which parts, or how much, can. These questions are empirical ones that we as a science should not lose sight of.

Example 2 – WEIRD populations
  • Behavioral scientists routinely publish broad claims about human psychology and behavior based on samples drawn entirely from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies.

  • Researchers implicitly assume either there is little variation across human populations or that these “standard subjects” are as representative of the species as any other population.

  • Are these assumptions justified?

Discussion point
  • What could go wrong if our sampling methods are biased?

  • How can we recruit more diverse & representative samples in our research?

Sampling Methods - Summary
  • Samples can approximate target populations.

    • We can’t test everyone in a target population and often have to use a ‘sample’.

    • Samples contain randomness, and different samples may give different results.

  • We must carefully plan our sample.

    • Who will we recruit?

    • How will they be recruited?

    • How will we maintain our obligation to conduct ethical research?

  • Sampling is a major element in statistics.

    • We can use statistics to estimate how variable our estimates are likely to be across repeated samples from a population.

    • But no statistics can save us from a strongly biased dataset.

Revising Distributions
The ‘Normal’ Distribution
  • A special distribution with some convenient properties.

  • It can be summarised by 2 parameters – the mean and the standard deviation.

  • 68.2% of observations lie within 1 standard deviation of the mean.

  • 95.4% lie within 2 standard deviations.

  • 99.7% lie within 3 standard deviations.
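These percentages can be recovered from the normal distribution itself, since the proportion within k standard deviations is P(|Z| < k) = erf(k/√2). A quick check using only Python's standard library:

```python
import math

def within_k_sd(k):
    """Proportion of a normal distribution within k standard deviations
    of the mean: P(|Z| < k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.1%}")
```

This reproduces the 68.2% / 95.4% / 99.7% rule quoted above.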

The ‘Standard Normal’ Distribution
  • A special case of the normal distribution in which the mean is zero and the standard deviation is one.
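Any variable can be converted to the standard normal scale ("z-scores") by subtracting the mean and dividing by the standard deviation. A minimal sketch with made-up scores:

```python
import statistics

# Hypothetical raw scores.
scores = [12, 15, 9, 20, 14, 11, 17, 13]
mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # Bessel-corrected sample SD

# Standardise: subtract the mean, divide by the SD.
z_scores = [(x - mean) / sd for x in scores]

# The standardised values have mean 0 and standard deviation 1
# (up to floating-point precision).
print(f"mean of z: {statistics.mean(z_scores):.6f}")
print(f"SD of z:   {statistics.stdev(z_scores):.6f}")
```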

The ‘Normal’ Distribution
  • Many tests make use of the assumption that our data is ‘normally’ distributed, which simplifies a lot of calculations.

  • These are called ‘parametric’ tests.

  • There are alternatives if data are not normally distributed, but we typically start with parametric tests.

  • Assessing the shape of our data distribution and whether it meets parametric assumptions is a good first step in any analysis.

Testing for a normal distribution
  • Personality data from 500 participants

The Shapiro-Wilk
  • The Shapiro-Wilk provides an objective test for whether data is normally distributed.

  • Shapiro-Wilk W – a statistic indicating how ‘normal’ the data are; higher values indicate more normal data.

  • Shapiro-Wilk p – the probability of observing this much deviation from normality if the data were truly normal; small values suggest the data are not normally distributed.
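In Python the same test is available as `scipy.stats.shapiro`. The sketch below runs it on simulated normal and skewed data (not the lecture's personality dataset):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated data: one normal variable, one strongly right-skewed variable.
normal_data = rng.normal(loc=3.5, scale=0.35, size=500)
skewed_data = rng.exponential(scale=1.0, size=500)

w_norm, p_norm = stats.shapiro(normal_data)
w_skew, p_skew = stats.shapiro(skewed_data)

print(f"normal data: W = {w_norm:.3f}, p = {p_norm:.3f}")
print(f"skewed data: W = {w_skew:.3f}, p = {p_skew:.3g}")
```

The skewed data gets a lower W and a tiny p, flagging a significant departure from normality; the normal data's W sits close to 1.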

Data types revisited! Are Likert Scales Ordinal or Interval?
  • This is a bit ambiguous and can be tricky to decide.

  • The critical factor is – does the data have a meaningful mean and standard deviation?

  • Ordinal data does not have an interpretable mean, as its value will depend on how we have coded the values.

  • Interval data is more inherently numeric and does have a meaningful mean and standard deviation

Data types revisited! Are Likert Scales Ordinal or Interval?
  • This can be difficult as researchers make a lot of decisions when presenting questions. Some of these examples might be more naturally ordinal and some more interval.

  • This is a massive, ongoing debate!

  • Did the participants see words or numbers? Were the numbers presented continuously?

    • Individual questions are often ordinal

    • Scores aggregated across several single items are often interval (this is the computer practical data)

Data types revisited! Are Likert Scales Ordinal or Interval?
  • In my opinion, it comes down to distributions. Do the data have a distribution that you think is fairly summarised with a mean and standard deviation?

  • If yes, then we can proceed with interval – if no, we should proceed with ordinal.

  • This matters as it relates to whether we can use parametric statistics or whether we need a non-parametric alternative.

Standard Error of the Mean
  • It's all about precision!

Sampling Variability
  • The sample mean is the sum of all the individual data points divided by the total number of data points:

    $\bar{x} = \frac{\sum_j x_j}{N}$

  • The sample standard deviation is the square root of the sum of the squared differences between the sample mean and each individual data point, divided by the total number of data points minus one:

    $s = \sqrt{\frac{\sum_j (x_j - \bar{x})^2}{N - 1}}$

  • Each sample only gives an estimate of the ‘true’ mean.
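These two formulas can be computed by hand; a minimal sketch with a made-up six-point sample:

```python
import math
import statistics

# Hypothetical sample of 6 scores.
x = [4, 7, 5, 6, 8, 6]
N = len(x)

# Sample mean: sum of the data points divided by N.
mean = sum(x) / N

# Sample standard deviation with Bessel's correction (divide by N - 1).
sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (N - 1))

print(mean)           # 6.0
print(round(sd, 4))   # 1.4142

# The stdlib gives the same (Bessel-corrected) answer.
assert math.isclose(sd, statistics.stdev(x))
```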

Bessel’s Correction
  • Why do we need this minus one?

  • Estimates of the population mean from a sample might be wrong, but they are equally likely to be too large or too small.

  • Estimates of the population standard deviation are biased. They are nearly always too small!

  • Bessel’s correction makes the estimated standard deviation a bit bigger to account for this bias. Jamovi will always give you the corrected estimate.

  • Uncorrected: $\hat{\sigma} = \sqrt{\frac{\sum_j (x_j - \bar{x})^2}{N}}$

  • Corrected: $s = \sqrt{\frac{\sum_j (x_j - \bar{x})^2}{N - 1}}$
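The bias, and how Bessel's correction reduces it, can be demonstrated by simulation: draw many small samples from a population whose standard deviation is known, and average the two estimates.

```python
import random
import statistics

# Population with a known standard deviation of 10.
random.seed(7)
true_sd = 10.0

def sd_uncorrected(x):
    """Standard deviation dividing by N (no Bessel's correction)."""
    m = sum(x) / len(x)
    return (sum((xi - m) ** 2 for xi in x) / len(x)) ** 0.5

corrected, uncorrected = [], []
for _ in range(5_000):
    sample = [random.gauss(0, true_sd) for _ in range(5)]  # tiny N = 5 samples
    corrected.append(statistics.stdev(sample))    # divides by N - 1
    uncorrected.append(sd_uncorrected(sample))    # divides by N

print(f"true SD:                {true_sd}")
print(f"mean corrected est.:    {statistics.mean(corrected):.2f}")
print(f"mean uncorrected est.:  {statistics.mean(uncorrected):.2f}")
```

The uncorrected estimate averages well below 10, while the corrected one sits much closer. (Even the corrected standard deviation remains slightly below the true value: Bessel's correction makes the *variance* unbiased, but its square root is still biased a little downward for small samples.)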

Sampling Variability
  • Each sample only gives an estimate of the ‘true’ mean

  • We can’t directly measure the population mean in most cases

  • The best we can do is estimate how close our sample mean might be to the underlying population mean
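One way to estimate that closeness without a formula is resampling, as flagged in the outline: resample the dataset with replacement many times (the bootstrap) and look at how much the mean varies across resamples. A sketch with simulated data:

```python
import random
import statistics

# Simulated sample of 100 observations (not the lecture's dataset).
random.seed(11)
sample = [random.gauss(3.5, 0.35) for _ in range(100)]

boot_means = []
for _ in range(2_000):
    # Resample WITH replacement, same size as the original sample.
    resample = random.choices(sample, k=len(sample))
    boot_means.append(statistics.mean(resample))

# The spread of the bootstrap means approximates the standard error of the mean.
boot_se = statistics.stdev(boot_means)
analytic_se = statistics.stdev(sample) / len(sample) ** 0.5
print(f"bootstrap SE: {boot_se:.4f}, formula SE: {analytic_se:.4f}")
```

The two estimates agree closely, which is why the SEM formula in the next section works as a shortcut when its assumptions hold.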

Standard Error of the Mean
  • The standard error of the mean is the likely variability in our estimate of the population mean from a given data sample

  • A shortcut to the variability of the sampling distribution of the mean when the data are normally distributed.

  • It decreases as the sample size grows larger – larger samples are more reliable!

  • The standard error of the mean is the standard deviation of the sample divided by the square root of the total number of data points

  • $SEM = \frac{s}{\sqrt{N}}$
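The $\sqrt{N}$ in the denominator means quadrupling the sample size roughly halves the SEM; a quick simulated check:

```python
import math
import random
import statistics

# Two simulated datasets from the same population, N = 100 vs N = 400.
random.seed(3)
data_small = [random.gauss(50, 12) for _ in range(100)]
data_large = [random.gauss(50, 12) for _ in range(400)]

sem_small = statistics.stdev(data_small) / math.sqrt(len(data_small))
sem_large = statistics.stdev(data_large) / math.sqrt(len(data_large))

print(f"SEM with N=100: {sem_small:.3f}")
print(f"SEM with N=400: {sem_large:.3f}")  # roughly half of the above
```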

95% Confidence Intervals
  • Confidence intervals are an intuitive way to communicate how reliable our estimate of the mean is likely to be.

  • They provide two values which define a range that has a 95% chance of containing the true mean.

  • $95\%\ \mathrm{CI} = 1.96 \times SEM$

  • $Upper = \bar{x} + CI$

  • $Lower = \bar{x} - CI$
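Putting the SEM and confidence-interval formulas together for a small made-up sample:

```python
import math
import statistics

# Hypothetical sample of 8 scores.
x = [3.2, 3.6, 3.4, 3.9, 3.1, 3.5, 3.7, 3.3]

mean = statistics.mean(x)
sem = statistics.stdev(x) / math.sqrt(len(x))

# 95% confidence interval: mean plus/minus 1.96 standard errors.
ci = 1.96 * sem
lower, upper = mean - ci, mean + ci

print(f"mean = {mean:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

(For small samples like this one, a t-based multiplier slightly wider than 1.96 is technically more accurate; 1.96 is the large-sample value used in the lecture.)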

Summary

  • Population Mean ($\mu$) – The true, unobservable mean of our population.

  • Sample Mean ($\bar{x}$) – A sample estimate of the mean.

  • Standard Deviation ($s$) – A sample estimate of the variability in the data points.

  • Standard Error of the Mean (SEM) – The precision to which our sample mean has been estimated.

  • Confidence Interval (CI) – A range of values around the sample mean that has a 95% chance of containing the population mean.

Example with Extraversion Data
  • Extraversion is a measure of how sociable and outgoing someone is.

  • Let’s take a look at some example data.

Results
  • We can visualise the sample distribution using a histogram (or perhaps a box and whisker chart)


  • Descriptive statistics give us our sample statistics. We have a total of 500 observations and a mean value of 3.49 (this matches what we see on the histogram) and the standard deviation is 0.355


  • We can compute both the standard error and the confidence intervals in the Jamovi Descriptives tab.


  • The standard error of the mean is very small – only 0.0159. This is because we have a large data sample. As a result, the confidence interval specifies a very small range: the true population mean is likely to lie between 3.46 and 3.52. This is a small proportion of the overall variability.
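Those figures follow directly from the formulas above; re-deriving them from the reported N, mean, and SD:

```python
import math

# Figures reported for the extraversion data: N = 500, mean = 3.49, SD = 0.355.
N, mean, sd = 500, 3.49, 0.355

sem = sd / math.sqrt(N)   # standard error of the mean
ci = 1.96 * sem           # half-width of the 95% confidence interval

print(f"SEM    = {sem:.4f}")                            # 0.0159
print(f"95% CI = [{mean - ci:.2f}, {mean + ci:.2f}]")   # [3.46, 3.52]
```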

Examples from data
  • The amount of data affects our precision

  • Simulated data with N = 10

  • Simulated data with N = 25

Examples from really big data
  • With massive datasets we can be extremely precise

Summary
  1. The normal distribution

    • Assuming a normal distribution simplifies many calculations in statistics

    • Not all data is normally distributed, though.

  2. Samples & Populations

    • A dataset is a single sample of a broader population – we can very rarely sample the whole population

    • This leads to uncertainty in the parameters we estimate

    • Data samples can be systematically biased due to participant recruitment and data collection methods.

  3. Standard Error of the Mean

    • How precisely have we estimated the population mean from our sample?

    • The standard error of the mean is the likely variability in our estimate of the population mean from a given data sample
