Basics of Quantitative Data Analysis
The Importance of Data
- Statistics and data are crucial for evidence-based decision-making.
- Statistics originates from the Latin "statisticum," referring to state affairs.
- Data comes from the Latin "datum," meaning "something given."
- Data and statistics provide a foundation for truth and informed arguments.
- Student unions are using evidence-based approaches for campaigns and policy.
- The National Student Survey (NSS) aids students in making choices and helps institutions understand their students.
- Student unions use NSS data to drive educational change.
Quantitative Data Analysis: Key Terms
- Population: The entire group of interest (e.g., all final year undergraduate students).
  - For the NSS, the population includes all final year undergraduates at a specific university/college or in the UK.
- Sample: A subset of the population from which data is collected (e.g., students who completed the NSS).
  - A sample is the group of people whose data is being examined.
- Representativeness:
  - Essential for accurate quantitative data analysis.
  - The sample must reflect the demographic characteristics of the broader population.
  - Ensuring representativeness involves:
    - High response rates.
    - Inclusion of students from diverse courses and backgrounds.
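A rough way to check representativeness is to compare each group's share of the sample with its share of the population. The sketch below uses invented faculty names and counts (they are assumptions for illustration, not NSS figures) and reports the response rate plus the gap in percentage points for each group.

```python
# Hypothetical counts: eligible final-year students and survey respondents per faculty.
population = {"Arts": 400, "Science": 600, "Medicine": 200}
sample = {"Arts": 300, "Science": 330, "Medicine": 170}


def response_rate(sample, population):
    """Share of the population that responded."""
    return sum(sample.values()) / sum(population.values())


def share_gaps(sample, population):
    """Sample share minus population share for each group, in percentage points."""
    n_sample = sum(sample.values())
    n_population = sum(population.values())
    return {
        group: 100 * (sample.get(group, 0) / n_sample - population[group] / n_population)
        for group in population
    }


rate = response_rate(sample, population)
gaps = share_gaps(sample, population)

print(f"Response rate: {rate:.0%}")
for group, gap in gaps.items():
    # A positive gap means the group is over-represented in the sample.
    print(f"{group}: {gap:+.1f} percentage points")
```

In this made-up data, Science students are under-represented among respondents, so a headline figure from the sample could lean towards the views of the other two groups.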
- Descriptive Statistics:
  - Describes findings from the sample.
  - Does not infer anything about the broader population.
  - Example: "85% of survey respondents agreed…"
- Inferential Statistics:
  - Infers characteristics about the larger population based on the sample.
  - Relies on sound methodology, analysis, and interpretation.
  - Statistical significance tests are used to support claims and to assess how well the sample reflects trends in the larger population.
Univariate vs. Bivariate Analysis
- Univariate Description:
  - An individual fact about a specific group.
  - Example: "190 MPs are women."
  - Full name: univariate descriptive sample statistics.
    - Univariate: one variable.
    - Descriptive: doesn't infer about the population.
    - Sample: based on an observed group.
  - Simply states a fact without context or interpretation.
- Bivariate Analysis:
  - Examines the relationship between two variables.
  - Explores whether variation in one variable coincides with variation in another.
  - Example: Student Union satisfaction scores in London compared to the rest of the UK.
    - London students' union scores are, on average, four points lower than the UK average.
    - Suggests location might relate to lower satisfaction, but the nature of the relationship is unknown.
- The trick for a good data analyst is to make their findings insightful and impactful.
- Presenting data clearly makes descriptive statistics a powerful argument against injustice.
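The difference between a univariate description and a bivariate comparison can be sketched in a few lines. The scores below are invented for illustration (they are not NSS results): the overall mean describes one variable, while the mean per region relates satisfaction to a second variable, location.

```python
from statistics import mean

# Hypothetical satisfaction scores (percent) for individual students' unions.
scores = [
    ("London", 62), ("London", 66), ("London", 64),
    ("Rest of UK", 70), ("Rest of UK", 66), ("Rest of UK", 68), ("Rest of UK", 72),
]


def mean_by_group(rows):
    """Average score for each region."""
    groups = {}
    for region, score in rows:
        groups.setdefault(region, []).append(score)
    return {region: mean(values) for region, values in groups.items()}


# Univariate: one variable (score), described for the whole observed group.
overall = mean([score for _, score in scores])

# Bivariate: does the score vary with a second variable (region)?
by_region = mean_by_group(scores)
gap = by_region["London"] - by_region["Rest of UK"]

print(f"Overall mean: {overall:.1f}")
print(f"London minus rest of UK: {gap:+.1f} points")
```

The gap only describes the observed unions. On its own it says nothing about why the scores differ.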
Correlation
- Correlation measures the association between two variables.
- It indicates the direction (positive or negative) and strength of the association.
- The correlation coefficient (r) is used to measure correlation.
- r ranges from -1 to 1.
  - 0: No association.
  - -1: Perfect negative correlation.
  - 1: Perfect positive correlation.
- Scatter plots visually represent correlation by plotting one variable on each axis.
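- A line of best fit can illustrate the direction of a correlation.
  - How closely the points cluster around the line shows the strength of the correlation; the steepness of the line depends on the units of the variables and does not measure strength.
  - The direction of the slope indicates whether the correlation is positive or negative.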
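To make r and the line of best fit concrete, here is a minimal sketch that computes Pearson's r and a least-squares line by hand. The data sets are invented for illustration. Two of them share exactly the same slope but have different values of r, which shows that slope and strength of correlation are separate quantities.

```python
from math import sqrt


def _sums(xs, ys):
    """Return the means and the sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and 1."""
    _, _, sxy, sxx, syy = _sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def best_fit(xs, ys):
    """Slope and intercept of the least-squares line of best fit."""
    mean_x, mean_y, sxy, sxx, _ = _sums(xs, ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = [1, 2, 3, 4, 5]
tight = [2, 4, 6, 8, 10]     # points lie exactly on a line
noisy = [4, 1, 8, 5, 12]     # same trend, but scattered around the line
falling = [10, 8, 6, 4, 2]   # y decreases as x increases

for name, ys in [("tight", tight), ("noisy", noisy), ("falling", falling)]:
    slope, intercept = best_fit(xs, ys)
    print(f"{name}: r = {pearson_r(xs, ys):+.2f}, slope = {slope:+.2f}")
```

The "tight" and "noisy" series both have a slope of 2, but only the first has r = 1. The sign of the slope always matches the sign of r.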
Important Considerations Regarding Correlation:
- Correlation does not equal causation.
- Just because two variables are correlated does not mean that one causes the other.
- Third Variable Problem:
  - A third, unmeasured variable might be influencing both variables.
  - Example: A correlation between the number of cats and the number of mice in a street may be driven by a third variable, such as a cheese shop that has opened nearby.
- Direction of Causality Problem:
  - Correlation does not indicate which variable causes the change in the other.
  - Example: It is assumed that cats are eating the mice. But what if mutated mice compete with the cats and deter them from being in the area?
- Human input and knowledge of the world are crucial when interpreting correlations.
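The third variable problem can be illustrated with a small constructed data set. All numbers below are invented: streets near a hypothetical cheese shop have more mice and more cats, yet within each group of streets the two counts are unrelated. Looking at all streets together still produces a strong correlation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical streets: (near_cheese_shop, mice, cats).
streets = [
    (False, 2, 1), (False, 2, 3), (False, 4, 1), (False, 4, 3),
    (True, 8, 5), (True, 8, 7), (True, 10, 5), (True, 10, 7),
]

# Correlation across all streets, ignoring the cheese shop.
r_overall = pearson_r([m for _, m, _ in streets], [c for _, _, c in streets])

# Correlation within each group, i.e. holding the third variable constant.
r_within = {}
for near_shop in (False, True):
    group = [(m, c) for shop, m, c in streets if shop == near_shop]
    r_within[near_shop] = pearson_r([m for m, _ in group], [c for _, c in group])

print(f"All streets: r = {r_overall:.2f}")
for near_shop, r in r_within.items():
    print(f"Near cheese shop = {near_shop}: r = {r:.2f}")
```

Here the overall correlation comes entirely from the difference between the two groups of streets. Deciding whether the shop is a plausible explanation still requires knowledge of the situation, not just the numbers.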
Proving Causality
- To prove that one thing causes another, you need to establish:
  - Statistical association or correlation.
  - Temporal precedence: the cause must precede the effect.
  - Ruling out alternative explanations: if you can't reject alternative explanations, you can't demonstrate causality.
- Correlation doesn't imply cause and effect.
Significance Testing
- Significance testing assesses how safely results from a sample can be inferred to apply to the wider population.
- Example: If 75% of 2,000 students respond to a survey and 64% of the respondents report being satisfied, how confident can we be that this figure represents all 2,000 students?
- It quantifies how confident we can be when generalizing findings to the population.
- A confidence level is chosen, commonly 95% in social research (equivalent to a significance level of 5%).
- A 95% confidence level means accepting a 5% risk of being wrong when claiming that an effect exists.
- The goal of significance testing is to give more credibility to your argument.
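One common way to answer the "how confident can we be" question for a percentage is a margin of error at the 95% level. The sketch below reads the example as 1,500 respondents (75% of 2,000) of whom 64% are satisfied, which is an assumption about what the example intends. It also assumes respondents behave like a simple random sample, which real surveys may not satisfy, and the function name is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p, n, confidence=0.95, population_size=None):
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumes a simple random sample. If population_size is given, a finite
    population correction is applied, which narrows the interval when the
    sample covers a large share of the population.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    standard_error = sqrt(p * (1 - p) / n)
    if population_size is not None:
        standard_error *= sqrt((population_size - n) / (population_size - 1))
    return z * standard_error


p = 0.64   # share of respondents who are satisfied
n = 1500   # respondents: 75% of 2,000 students

plain = margin_of_error(p, n)
corrected = margin_of_error(p, n, population_size=2000)

print(f"Without correction: 64% +/- {100 * plain:.1f} points")
print(f"With finite population correction: 64% +/- {100 * corrected:.1f} points")
```

Under these assumptions the figure for all 2,000 students is likely to sit within a couple of points of 64%. A margin of error does not account for non-response bias, so representativeness still has to be checked separately.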
Hypotheses
- Experimental Hypothesis (H1): The initial hypothesis or hunch (e.g., a student union score is lower than the average for London).
- Null Hypothesis (H0): States that there is no effect or relationship between the variables.
- Statistical tests assess whether the data provide enough evidence to reject the null hypothesis.
- Example: The null hypothesis would be that a student union does not score lower than the average for London.
- Analysis determines the likelihood of observing the collected data if the null hypothesis were true.
- The null hypothesis is typically rejected only if that likelihood is below 5%, which corresponds to a 95% confidence level.
Steps for a Statistically Valid Approach
- Start with a null hypothesis that there is no relationship between two variables.
- Establish an acceptable confidence level (commonly 95%).
- Test the significance of your findings, for example using online tools.
- If the test shows less than a 5% chance of seeing results like yours when the null hypothesis is true, reject the null hypothesis.
- The significance of a result can be tested using a Z-test.
  - Input the sample size and percentage into a Z-test to see whether the result is significant.
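The steps above can be sketched as a one-sample Z-test for a percentage. The figures are invented for illustration: a union where 58% of 400 respondents are satisfied, compared with a London average of 64% that is treated as a fixed, known value. The function name is hypothetical, and the test is one-sided because the hunch is that the union scores lower.

```python
from math import sqrt
from statistics import NormalDist


def z_test_lower(p_sample, n, p_reference, alpha=0.05):
    """One-sided Z-test: is the sample proportion below the reference value?

    Null hypothesis: the true proportion is not lower than p_reference.
    Returns the z statistic, the p-value, and whether to reject the null.
    """
    # Standard error of the sample proportion if the null hypothesis were true.
    standard_error = sqrt(p_reference * (1 - p_reference) / n)
    z = (p_sample - p_reference) / standard_error
    # Probability of a result at least this low if the null hypothesis were true.
    p_value = NormalDist().cdf(z)
    return z, p_value, p_value < alpha


# Hypothetical inputs: sample percentage, sample size, and comparison value.
z, p_value, reject = z_test_lower(p_sample=0.58, n=400, p_reference=0.64)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject the null hypothesis" if reject else "Do not reject the null hypothesis")
```

With these made-up numbers the p-value is well below 0.05, so the null hypothesis would be rejected at the 95% level. With a much smaller sample, the same six-point gap might not be significant.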
Practical Applications of Significance and Correlation with NSS Data
- Validate the importance of identified differences.
  - Determine if a drop in department scores is statistically significant.
  - Assess whether the difference between departments like Dentistry and Medicine is significant.
- Support arguments by demonstrating significant associations between variables.
  - A significant association can suggest that changing one variable may impact the other, especially when relating individual questions to overall satisfaction, although association alone does not prove causation.
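Comparing two departments is a two-sample problem, and one standard approach is a two-proportion Z-test. The sketch below uses invented respondent counts for Dentistry and Medicine (they are not real NSS data), and the function name is hypothetical. The null hypothesis is that both departments have the same underlying satisfaction rate.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(satisfied_a, n_a, satisfied_b, n_b, alpha=0.05):
    """Two-sided Z-test for a difference between two independent proportions."""
    p_a = satisfied_a / n_a
    p_b = satisfied_b / n_b
    # Pooled proportion: the best single estimate if the null hypothesis is true.
    pooled = (satisfied_a + satisfied_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value: a difference this large in either direction.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical counts: 68 of 80 Dentistry and 150 of 200 Medicine respondents satisfied.
z, p_value, significant = two_proportion_z_test(68, 80, 150, 200)

print(f"Dentistry 85% vs Medicine 75%: z = {z:.2f}, p-value = {p_value:.3f}")
print("Significant at 95%" if significant else "Not significant at 95%")
```

In this made-up case a ten-point gap is not significant at the 95% level because the Dentistry sample is small. This is why testing matters before building an argument on a difference between departments.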