LECTURE 2: DISTRIBUTION, SAMPLES & POPULATIONS

Outline & Objectives

  • Topics covered: Introduction, Data & Experiments, Distributions, Samples & Populations, Testing Hypotheses (One-Sample t-test, Independent Sample t-test), Statistical Inference, Consolidation week, Non-Parametric alternative tests, Comparing multiple means, Qualitative Methods, Advanced Thematic Analysis, Revision & Open Science
  • Lecture 2 focuses on Distributions, Samples, and Populations.
  • A dataset is a single sample of a broader population.
  • Assuming a normal distribution simplifies many statistical calculations.
  • Resampling is used to estimate variability.
  • Key questions:
    • How precisely have we estimated the population mean from our sample? This leads to uncertainty in the parameters we estimate.
    • Not all data is normally distributed.
    • Data samples can be systematically biased due to participant recruitment and data collection methods.

Populations

  • A 'population' is the entire group we want to test (e.g., all primary school children in the UK, all living adults, everyone with depressive symptoms).
  • We usually cannot test everyone, so we take a 'sample' from the population.
  • Populations are heterogeneous, leading to additional variability that is hard to control.

Samples & Populations

  • A histogram represents the sample distribution of our value of interest.
  • There is an underlying population distribution that we can't directly measure.
  • Each sample we measure is an approximation of the underlying population.
  • Larger samples tend to give more accurate approximations.

Summary 1

  • Histograms represent the distribution of data samples.
  • We can use the distribution of data samples to infer things about the population (assuming the sample is large enough and doesn't have too much bias).

Sampling

  • Potential bias in sampling:
    • Random chance: Can be addressed by taking another sample or recruiting a larger sample.
    • Systematic bias: Occurs when certain people are more or less likely to respond (e.g., to a recruitment email); a larger sample won't fix this.
    • Unrecognized bias: Bias in an aspect we aren’t even aware of.

Sampling Methods

  • Random: Participants selected at random from a list.
  • Systematic: Structured approach, e.g., every 5th participant selected from a list.
  • Opportunity/Convenience: Recruitment from people closest and/or most accessible, e.g., participants who attended that day's lecture.
  • Stratified: Recruitment aims to match key characteristics of the target population, e.g., participant group that matches known age, sex, and political characteristics.
  • Cluster: Whole groups are recruited at once, can be combined with other methods (e.g., whole university football team recruited).
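
As a rough sketch (not part of the lecture materials), the first two methods could be implemented in Python; the participant list, sample size, and interval here are made-up examples:

    import random

    participants = [f"P{i:03d}" for i in range(1, 101)]  # hypothetical list of 100 participant IDs

    # Random sampling: participants selected at random from the list
    random_sample = random.sample(participants, k=20)

    # Systematic sampling: every 5th participant selected from the list
    systematic_sample = participants[::5]

    print(random_sample[:5])
    print(systematic_sample[:5])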

Ethics & Sampling

  • Biased or poorly considered sampling can lead to ethical concerns.
  • It can reinforce inequalities and exclude certain groups.
  • Psychologists must:
    • Avoid any unfair, prejudiced, or discriminatory practice in participant selection or research content.
    • Accept that individuals may choose not to participate or may withdraw their data.
    • Be alert to the possible consequences of unexpected as well as predicted outcomes of work, and the often public nature of the interpretation of research findings.
  • Example 2 - WEIRD populations
    • Behavioral scientists often make broad claims based on samples from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies.
    • Researchers often assume little variation across human populations, or that “standard subjects” are representative of the species, which may not be justified.

Sampling Methods - Summary

  • Samples can approximate target populations.
  • We can't test everyone in a target population and often have to use a 'sample.'
  • We can use statistics to estimate how variable our statistics are likely to be across repeated samples from a population.
  • Samples contain randomness, and different samples may give different results.
  • Careful planning is essential:
    • Who will we recruit?
    • How will they be recruited?
    • How will we maintain our obligation to conduct ethical research?
  • Statistics cannot save us from a strongly biased dataset.

The ‘Normal’ Distribution

  • A special distribution with convenient properties.
  • Summarized by two parameters: the mean and the standard deviation.
  • 68.2% of observations lie within 1 standard deviation of the mean.
  • 95.4% lie within 2 standard deviations.
  • 99.7% lie within 3 standard deviations.
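
These proportions can be checked numerically. A minimal sketch, assuming scipy is available (not part of the lecture materials):

    from scipy.stats import norm

    # Proportion of a normal distribution lying within k standard deviations of the mean
    for k in (1, 2, 3):
        proportion = norm.cdf(k) - norm.cdf(-k)
        print(f"within {k} SD: {proportion:.1%}")  # ~68.3%, ~95.4%, ~99.7%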

The ‘Normal’ Distribution

  • Many tests assume data is 'normally' distributed, simplifying calculations; these are 'parametric' tests.
  • Alternatives exist for non-normally distributed data, but we typically start with parametric tests.
  • Assessing data distribution shape and meeting parametric assumptions is a good first step in any analysis.

Data Skills & Coding

  • Example: Personality data from 500 participants.
  • Testing for a normal distribution.
  • The Shapiro-Wilk test objectively tests whether data is normally distributed.
    • Shapiro-Wilk W: A statistic indicating how ‘normal’ the data is; values closer to 1 indicate more normal data.
    • Shapiro-Wilk p: A probability indicating whether any deviation from normality is statistically significant; a small p (e.g., below .05) suggests the data are not normally distributed.
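
The lecture runs this test in Jamovi; a minimal Python sketch with scipy, using simulated scores in place of the real personality data:

    import numpy as np
    from scipy.stats import shapiro

    rng = np.random.default_rng(1)
    scores = rng.normal(loc=3.5, scale=0.35, size=500)  # simulated stand-in for the 500 participants

    w, p = shapiro(scores)
    print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")
    # W close to 1 and p above .05: no significant deviation from normality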

Data types revisited!

  • Are Likert Scales Ordinal or Interval?
  • Ambiguous and tricky to decide.
  • Critical factor: does the data have an interpretable mean and standard deviation?
  • Ordinal data: Does not have an interpretable mean as its value depends on how we have coded the values
  • Interval data: Inherently numeric and does have a meaningful mean and standard deviation

Data types revisited!

  • This can be difficult as researchers make a lot of decisions when presenting questions. Some of these examples might be more naturally ordinal and some more interval. This is a massive, ongoing debate!
  • Did the participants see words or numbers? Were the numbers presented continuously?
  • Individual questions are often ordinal
  • Scores aggregated across several single items are often interval (this is the computer practical data)

Data types revisited!

  • In my opinion, it comes down to distributions. Do the data have a distribution that you think is fairly summarized with a mean and standard deviation?
  • If yes, then we can proceed with interval; if no, we should proceed with ordinal.
  • This matters because it determines whether we can use parametric statistics or need a non-parametric alternative.

Sampling Variability

  • Each sample only gives an estimate of the ‘true’ mean
  • The mean is the sum of all the individual data points divided by the total number of data points.
  • The standard deviation is the square root of the sum of the squared differences between the sample mean and each individual data point, divided by the total number of data points minus one.
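
Written in symbols (with x_i the individual data points and N the total number of data points), the two definitions above are:

    \bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}, \qquad s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}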

Bessel’s Correction

  • Estimates of the population mean from a sample might be wrong, but they are equally likely to be too large or too small.
  • Estimates of the population standard deviation are biased. They are nearly always too small!
  • Bessel’s correction makes the estimated standard deviation a bit bigger to account for this bias.
  • Jamovi will always give you the corrected estimate.
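
A minimal numpy sketch of the correction (the simulated values are illustrative only): ddof=1 divides by N - 1, which is the corrected estimate Jamovi reports, while ddof=0 divides by N.

    import numpy as np

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=3.5, scale=0.35, size=10)  # a small simulated sample

    sd_uncorrected = np.std(sample, ddof=0)  # divides by N: tends to underestimate the population SD
    sd_corrected = np.std(sample, ddof=1)    # Bessel's correction: divides by N - 1

    print(sd_uncorrected, sd_corrected)      # the corrected estimate is slightly larger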

Sampling — Variability

  • Notation:
    • \bar{x}: Sample mean; calculated directly from the raw data.
    • \mu: True population mean; almost never known for sure.
    • \hat{\mu}: Estimate of the population mean; identical to the sample mean in most cases.
  • The best we can do is estimate how close our sample mean might be to the underlying population mean

Standard Error of the Mean

  • The standard error of the mean is the likely variability in our estimate of the population mean from a given data sample
  • SEM = \frac{s}{\sqrt{N}}: the standard deviation of the sample divided by the square root of the total number of data points.
  • A shortcut to the variability in the sampling distribution of the mean when the data are normally distributed.
  • It decreases as the sample size grows larger - larger samples are more reliable!
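
For example, plugging in the extraversion descriptives used later in this lecture (s = 0.355, N = 500):

    SEM = \frac{0.355}{\sqrt{500}} \approx 0.0159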

95% Confidence Intervals

  • 95% CI = 1.96 * SEM
    • Upper = \bar{x} + CI
    • Lower = \bar{x} - CI
  • Confidence intervals are an intuitive way to communicate how reliable our estimate of the mean is likely to be.
  • They provide two values which define a range that has a 95% chance of containing the true mean.
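
A minimal Python sketch of these formulas, using the extraversion descriptives from the lecture example (the 1.96 multiplier assumes a large, roughly normal sample):

    import math

    mean, sd, n = 3.49, 0.355, 500   # extraversion descriptives from the lecture example

    sem = sd / math.sqrt(n)          # standard error of the mean
    ci = 1.96 * sem                  # half-width of the 95% confidence interval

    lower, upper = mean - ci, mean + ci
    print(f"SEM = {sem:.4f}, 95% CI = [{lower:.2f}, {upper:.2f}]")  # roughly [3.46, 3.52]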

Summary

Metric | Symbol | Meaning
Population Mean | \mu | The true, unobservable mean of our population
Sample Mean | \bar{x} | A sample estimate of the mean
Standard Deviation | s | A sample estimate of the variability in the data points
Standard Error | SEM | The precision to which our sample mean has been estimated
Confidence Intervals | CI | A range of values around the sample mean that has a 95% chance of containing the population mean

Data Skills & Coding

  • Example with Extraversion Data
  • Extraversion is a measure of how sociable and outgoing someone is. Let's take a look at some example data.
  • You will go through a similar process with another dataset in the computer practical sessions.
  • We can visualise the sample distribution using a histogram (or perhaps a box and whisker chart)
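
The lecture does this in Jamovi; a minimal matplotlib sketch, using simulated values in place of the real extraversion data:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    extraversion = rng.normal(loc=3.49, scale=0.355, size=500)  # simulated stand-in for the sample

    plt.hist(extraversion, bins=30)    # histogram of the sample distribution
    plt.xlabel("Extraversion score")
    plt.ylabel("Frequency")
    plt.show()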

Descriptives

  • Descriptive statistics give us our sample statistics.
  • We have a total of 500 observations, a mean value of 3.49 (this matches what we see on the histogram), and a standard deviation of 0.355.
  • We can compute both the standard error and the confidence intervals in the Jamovi Descriptives tab.

Descriptives

  • The standard error of the mean is very small - only 0.0159. This is because we have a large data sample.
  • As a result, the confidence intervals specify a very small range: the true population mean is likely to be between 3.46 and 3.52, which is a small proportion of the overall variability.
  • Simulated data: the amount of data affects our precision. With N = 10, 6 of the simulated samples show a false positive; with N = 25, only 3 do.
  • Examples from really big data: with massive datasets we can be extremely precise (Fig. 1: gender rating gap across platforms).

Summary

  • The normal distribution: Assuming a normal distribution simplifies many calculations in statistics. Not all data is normally distributed though…

  • Samples & Populations: A dataset is a single sample of a broader population - we can very rarely sample the whole Population. Data samples can be systematically biased due to participant recruitment and data collection methods.

  • Standard Error of the Mean: How precisely have we estimated the population mean from our sample? This leads to uncertainty in the parameters we estimate

  • LECTURE 3: TESTING HYPOTHESES: ONE-SAMPLE T-TEST (preview of the next 3 weeks)
  • Differences:
    • Compare one sample to a reference (One-Sample t-test).
    • Compare two samples (Independent Samples and Dependent Samples).
  • Assumptions:
    • Normality: checked with the Shapiro-Wilk test. If the assumption of normality is violated, consider a non-parametric alternative: one-sample test -> Wilcoxon Rank Test; independent-samples test -> Mann-Whitney U. For dependent samples, check the normality of the paired differences, not the raw data.
    • Equal variance: checked with Levene's test. Use Student's t-test when groups have comparable variance; consider Welch's t-test if they do not.