Basics of Quantitative Data Analysis

Statistics and data are crucial for evidence-based decision-making.
Statistics originates from the New Latin "statisticum," referring to state affairs.
Data comes from the Latin "datum," meaning given.
Data and statistics provide a foundation for truth and informed arguments.
Student unions are using evidence-based approaches for campaigns and policy.
The National Student Survey (NSS) aids students in making choices and helps institutions understand their students.
Student unions use NSS data to drive educational change.

Population: The entire group of interest (e.g., all final year undergraduate students).
For the NSS, the population includes all final year undergraduates at a specific university or college, or across the UK.
Sample: A subset of the population from which data is collected (e.g., students who completed the NSS).
A sample is a group of people whose data is being examined.
Representativeness:
- Essential for accurate quantitative data analysis.
- The sample must reflect the demographic characteristics of the broader population.
- Ensuring representativeness involves:
  - High response rates.
  - Inclusion of students from diverse courses and backgrounds.
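One way to check representativeness is to compare each group's share of the respondents with its share of the population. The sketch below does this in Python. The faculty names and head counts are invented for illustration and are not NSS figures.

```python
# Hypothetical head counts by faculty; these are NOT real NSS figures.
population = {"Arts": 400, "Science": 350, "Medicine": 250}
sample = {"Arts": 300, "Science": 280, "Medicine": 120}


def shares(counts):
    """Convert head counts into each group's share of the total."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def representation_gaps(population, sample):
    """Sample share minus population share for each group."""
    pop_share = shares(population)
    samp_share = shares(sample)
    return {group: samp_share[group] - pop_share[group] for group in population}


response_rate = sum(sample.values()) / sum(population.values())
gaps = representation_gaps(population, sample)

print(f"Response rate: {response_rate:.0%}")
for group, gap in gaps.items():
    # A negative gap means the group is under-represented among respondents.
    print(f"{group}: {gap:+.1%} relative to its population share")
```

A high overall response rate can still hide an under-represented group, which is why both checks appear in the list above.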
Descriptive Statistics:
- Describes findings from the sample.
- Does not infer anything about the broader population.
- Example: "85% of survey respondents agreed…"
Inferential Statistics:
- Infers characteristics about the larger population based on the sample.
- Relies on sound methodology, analysis, and interpretation.
- Statistical and significance tests are used to validate claims and assess the sample's reflection of larger population trends.

Univariate Description:
- An individual fact about a specific group.
- Example: "190 MPs are women."
- Full name: univariate descriptive sample statistics.
  - Univariate: one variable.
  - Descriptive: doesn't infer about the population.
  - Sample: based on an observed group.
- Simply states a fact without context or interpretation.
Bivariate Analysis:
- Examines the relationship between two variables.
- Explores whether variation in one variable coincides with variation in another.
- Example: Student Union satisfaction scores in London compared to the rest of the UK.
  - London students' union scores are, on average, four points lower than the UK average.
  - Suggests location might relate to lower satisfaction, but the nature of the relationship is unknown.
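A bivariate comparison like the London example can be sketched as a mean calculated per group. The scores below are hypothetical, chosen only so that the gap comes out at four points, and the comparison here is against the rest of the UK rather than a UK-wide average.

```python
from statistics import mean

# Hypothetical satisfaction scores (percent) for individual students' unions.
scores = [
    ("London", 60), ("London", 63), ("London", 66),
    ("Rest of UK", 68), ("Rest of UK", 70), ("Rest of UK", 64), ("Rest of UK", 66),
]


def mean_by_group(rows):
    """Average the second variable (score) within each value of the first (region)."""
    groups = {}
    for region, score in rows:
        groups.setdefault(region, []).append(score)
    return {region: mean(values) for region, values in groups.items()}


group_means = mean_by_group(scores)
gap = group_means["London"] - group_means["Rest of UK"]

print(group_means)
print(f"London minus rest of UK: {gap} points")  # -4 with these made-up scores
```

The calculation only describes the difference between the two groups; it says nothing about why the difference exists.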
The trick for a good data analyst is to make their findings insightful and impactful.
Presenting data clearly makes descriptive statistics a powerful argument against injustice.

Correlation measures the association between two variables.
It indicates the direction (positive or negative) and strength of the association.
Correlation coefficient (r) is used to measure correlation.
- r ranges from -1 to 1.
  - 0: No association.
  - -1: Perfect negative correlation.
  - 1: Perfect positive correlation.
Scatter plots visually represent correlation by plotting one variable on each axis.
A line of best fit can illustrate the strength and direction of a correlation.
- How closely the points cluster around the line shows the strength of the correlation; the steepness of the line on its own does not.
- The direction indicates whether the correlation is positive or negative.
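As a minimal sketch, the correlation coefficient and a least-squares line of best fit can be computed by hand as follows. The two lists are hypothetical paired scores (for example, satisfaction with teaching and overall satisfaction for six courses), and the function names are illustrative rather than taken from any library.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)


def best_fit_line(xs, ys):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired scores for six courses.
teaching = [55, 60, 65, 70, 75, 80]
overall = [58, 66, 63, 72, 74, 83]

r = pearson_r(teaching, overall)
slope, intercept = best_fit_line(teaching, overall)
print(f"r = {r:.2f}, best-fit line: y = {slope:.2f}x + {intercept:.2f}")
```

Note that r has no units and always falls between -1 and 1, whereas the slope depends on the units in which each variable is measured.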

Correlation does not equal causation.
Just because two variables are correlated does not mean that one causes the other.
Third Variable Problem:
- A third, unmeasured variable might be influencing both variables.
- Example: A correlation between the number of cats and the number of mice on a street may be driven by a third variable, such as a cheese shop opening nearby.
Direction of Causality Problem:
- Correlation does not indicate which variable causes the change in the other.
- Example: It is natural to assume that the cats are eating the mice. But what if mutated mice compete with the cats and deter them from the area?
Human input and knowledge of the world are crucial when interpreting correlations.

To prove that one thing causes another, you need to establish:
- Statistical association or correlation.
- Temporal precedence: the cause must precede the effect.
- Ruling out alternative explanations: if you can't reject alternative explanations, you can't demonstrate causality.
Correlation on its own meets only the first of these conditions, so it does not imply cause and effect.

Significance testing indicates how reliably results from a sample can be generalized to the wider population.
Example: If 75% of 2,000 students respond to a survey and 64% of the respondents say they are satisfied, how confident can we be that this figure represents all 2,000 students?
It quantifies how much confidence we can place in generalizing findings to the population.
A confidence level is chosen, commonly 95% in social research (equivalently, a significance level of 5%).
- A 95% confidence level means accepting a 5% risk of being wrong.
The goal of significance testing is to give more credibility to your argument.
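To make the 95% figure concrete, the sketch below computes a 95% confidence interval for the satisfaction example, reading it as 1,500 respondents (75% of 2,000) of whom 64% are satisfied. It assumes a simple random sample and the usual normal approximation, and it ignores the finite population correction, which would narrow the interval when so large a share of the population responds.

```python
from math import sqrt

Z_95 = 1.96  # standard normal critical value for a two-sided 95% interval


def proportion_confidence_interval(p_hat, n, z=Z_95):
    """Normal-approximation confidence interval for a sample proportion."""
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# Assumed reading of the example: 1,500 respondents, 64% satisfied.
low, high = proportion_confidence_interval(p_hat=0.64, n=1500)
print(f"95% confidence interval: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval spans roughly two and a half percentage points either side of 64%. With a smaller sample the same calculation gives a wider interval.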

Experimental Hypothesis (H1): The initial hypothesis or hunch (e.g., a student union score is lower than the average for London).
Null Hypothesis (H0): States that there is no effect or relationship between the variables.
- Statistical tests look for enough evidence to reject the null hypothesis.
- Example: The null hypothesis would be that a student union does not score lower than the average for London.
Analysis determines the likelihood of observing the collected data if the null hypothesis were true.
A confidence level of 95% is typically required to reject the null hypothesis.

Start with a null hypothesis that there is no relationship between two variables.
Establish an acceptable confidence level (commonly 95%, which corresponds to a 5% significance level).
Test the significance of your findings using online tools.
If the result is significant at the 95% level, reject the null hypothesis.
The significance of a result can be tested using a Z-test.
- Input sample size and percentage into a Z-test to see significance.
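The calculation behind such a tool can be sketched as a one-sample Z-test for a proportion. The benchmark of 70%, the observed 64%, and the sample size of 400 are all hypothetical, and the function name is illustrative. The test is two-sided and relies on the normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z_test(p_hat, n, p_null):
    """Two-sided Z-test of an observed proportion against a benchmark value."""
    # The standard error is computed under the null hypothesis.
    standard_error = sqrt(p_null * (1 - p_null) / n)
    z = (p_hat - p_null) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical inputs: 64% satisfied among 400 respondents, benchmark of 70%.
z, p_value = one_sample_proportion_z_test(p_hat=0.64, n=400, p_null=0.70)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Significant at the 95% level: reject the null hypothesis.")
else:
    print("Not significant at the 95% level: do not reject the null hypothesis.")
```

The p-value is the probability of seeing a gap at least this large if the null hypothesis were true. A p-value below 0.05 corresponds to significance at the 95% level.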

Validate the importance of identified differences.
- Determine if a drop in department scores is statistically significant.
- Assess whether the difference between departments like Dentistry and Medicine is significant.
Support arguments by demonstrating significant associations between variables.
- Suggests that changing one variable can impact the other, especially when relating questions to overall satisfaction.
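To judge whether a difference between two departments is significant, one common option is a two-proportion Z-test. The sketch below uses made-up figures for Dentistry and Medicine, and a pooled standard error with the normal approximation; none of the numbers come from real survey data.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided Z-test for the difference between two sample proportions."""
    # Pool the two samples to estimate the common proportion under the null.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: Dentistry 82% of 120 respondents, Medicine 88% of 300.
z, p_value = two_proportion_z_test(p1=0.82, n1=120, p2=0.88, n2=300)

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("Significant at the 95% level" if p_value < 0.05 else "Not significant at the 95% level")
```

With these invented figures a six-point gap is not significant at the 95% level, because the samples are small. The same gap with much larger samples would be, which is why sample size matters as much as the size of the difference.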