Basics of Quantitative Data Analysis

The Importance of Data

  • Statistics and data are crucial for evidence-based decision-making.
  • Statistics originates from the Latin "statisticum," referring to state affairs.
  • Data comes from the Latin "datum," meaning "something given."
  • Data and statistics provide a foundation for truth and informed arguments.
  • Student unions are using evidence-based approaches for campaigns and policy.
  • The National Student Survey (NSS) aids students in making choices and helps institutions understand their students.
  • Student unions use NSS data to drive educational change.

Quantitative Data Analysis: Key Terms

  • Population: The entire group of interest (e.g., all final-year undergraduate students).
  • For the NSS, the population includes all final-year undergraduates at a specific university/college or in the UK.
  • Sample: A subset of the population from which data is collected (e.g., students who completed the NSS).
  • A sample is a group of people whose data is being examined.
  • Representativeness:
    • Essential for accurate quantitative data analysis.
    • The sample must reflect the demographic characteristics of the broader population.
    • Ensuring representativeness involves:
      • High response rates.
      • Inclusion of students from diverse courses and backgrounds.
  • Descriptive Statistics:
    • Describes findings from the sample.
    • Does not infer anything about the broader population.
    • Example: "85% of survey respondents agreed…"
  • Inferential Statistics:
    • Infers characteristics about the larger population based on the sample.
    • Relies on sound methodology, analysis, and interpretation.
    • Statistical significance tests are used to validate claims and assess how well the sample reflects trends in the larger population.
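
The descriptive example above can be reproduced in a few lines. This is a minimal sketch in Python: the responses list is made-up data and percent_agree is a hypothetical helper name, neither of which comes from the NSS.

```python
# Hypothetical survey responses: True means the respondent agreed.
responses = [True] * 85 + [False] * 15


def percent_agree(answers):
    """Descriptive statistic: share of respondents who agreed, as a percentage."""
    return 100 * sum(answers) / len(answers)


# This describes the sample only; it makes no claim about the wider population.
print(f"{percent_agree(responses):.0f}% of survey respondents agreed")
```

Turning this into a statement about all students is the job of inferential statistics and needs a significance test or confidence interval on top.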

Univariate vs. Bivariate Analysis

  • Univariate Description:
    • An individual fact about a specific group.
    • Example: "190 MPs are women."
    • Full name: univariate descriptive sample statistics.
      • Univariate: one variable.
      • Descriptive: doesn't infer about the population.
      • Sample: based on an observed group.
    • Simply states a fact without context or interpretation.
  • Bivariate Analysis:
    • Examines the relationship between two variables.
    • Explores whether variation in one variable coincides with variation in another.
    • Example: Student Union satisfaction scores in London compared to the rest of the UK.
      • London students' union scores are, on average, four points lower than the UK average.
      • Suggests location might relate to lower satisfaction, but the nature of the relationship is unknown.
  • The trick for a good data analyst is to make their findings insightful and impactful.
  • Presenting data clearly makes descriptive statistics a powerful argument against injustice.
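
The London comparison above amounts to splitting one variable (satisfaction score) by another (location) and comparing the averages. A minimal sketch, assuming made-up scores chosen so that the gap comes out at four points; mean_by_group is a hypothetical helper, not part of any NSS tool.

```python
from statistics import mean

# Hypothetical satisfaction scores (percent) for individual students' unions.
scores = [
    ("London", 62), ("London", 66), ("London", 64),
    ("Rest of UK", 67), ("Rest of UK", 70), ("Rest of UK", 67),
]


def mean_by_group(rows):
    """Average the second variable (score) within each value of the first (region)."""
    groups = {}
    for region, score in rows:
        groups.setdefault(region, []).append(score)
    return {region: mean(values) for region, values in groups.items()}


averages = mean_by_group(scores)
gap = averages["London"] - averages["Rest of UK"]
print(averages, gap)
```

The gap describes the two groups; it does not explain why they differ.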

Correlation

  • Correlation measures the association between two variables.
  • It indicates the direction (positive or negative) and strength of the association.
  • Correlation coefficient (r) is used to measure correlation.
    • r ranges from -1 to 1.
      • 0: No association.
      • -1: Perfect negative correlation.
      • 1: Perfect positive correlation.
  • Scatter plots visually represent correlation by plotting one variable on each axis.
  • A line of best fit can illustrate the strength and direction of a correlation.
    • How tightly the points cluster around the line shows the strength of the correlation; the steepness of the line on its own does not.
    • The direction indicates whether the correlation is positive or negative.
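
A minimal sketch of how r can be computed, using the standard Pearson formula. The data are invented for illustration and pearson_r is a hypothetical function name. Note that r depends on how closely the points follow a straight line, not on how steep that line is.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)


x = [1, 2, 3, 4, 5]
steep = [10 * v for v in x]        # steep rising line, perfect fit
shallow = [0.1 * v for v in x]     # shallow rising line, perfect fit
falling = [20 - 2 * v for v in x]  # falling line, perfect fit
scattered = [2, 1, 4, 3, 5]        # rising trend with some scatter

for name, y in [("steep", steep), ("shallow", shallow),
                ("falling", falling), ("scattered", scattered)]:
    print(name, round(pearson_r(x, y), 2))
```

The steep and shallow lines both give r of about 1, the falling line about -1, and the scattered points 0.8.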

Important Considerations Regarding Correlation

  • Correlation does not equal causation.
  • Just because two variables are correlated does not mean that one causes the other.
  • Third Variable Problem:
    • A third, unmeasured variable might be influencing both variables.
    • Example: A correlation between the number of cats and the number of mice in a street may be driven by a third variable, such as a cheese shop that has newly opened nearby.
  • Direction of Causality Problem:
    • Correlation does not indicate which variable causes the change in the other.
    • Example: It is natural to assume that cats reduce the number of mice by eating them. But what if mutated mice compete with the cats and deter them from the area?
  • Human input and knowledge of the world are crucial when interpreting correlations.
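
The third variable problem can be illustrated with a small simulation. This is a sketch under invented assumptions: the cheese, cats, and mice values are random numbers, built so that cats and mice each depend on the cheese shop and never on each other.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)


random.seed(42)  # fixed seed so the illustration is repeatable
n_streets = 2000

# Hypothetical third variable: how strongly each street is affected by the cheese shop.
cheese = [random.gauss(0, 1) for _ in range(n_streets)]
# Cats and mice each follow the cheese shop plus their own independent noise.
mice = [c + random.gauss(0, 0.5) for c in cheese]
cats = [c + random.gauss(0, 0.5) for c in cheese]

r_raw = pearson_r(cats, mice)

# Remove the cheese-shop contribution, which is known exactly only because we built the data.
mice_left = [m - c for m, c in zip(mice, cheese)]
cats_left = [k - c for k, c in zip(cats, cheese)]
r_adjusted = pearson_r(cats_left, mice_left)

print(f"raw r = {r_raw:.2f}, r after removing the third variable = {r_adjusted:.2f}")
```

Cats and mice appear strongly correlated until the third variable is accounted for, at which point the association largely disappears. In real data the third variable is often unmeasured, which is why human judgement matters.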

Proving Causality

  • To prove that one thing causes another, you need to establish:
    • Statistical association or correlation.
    • Temporal precedence: the cause must precede the effect.
    • Ruling out alternative explanations: if you can't reject alternative explanations, you can't demonstrate causality.
  • Correlation doesn't imply cause and effect.

Significance Testing

  • Significance testing assesses how safely results from a sample can be generalized to the wider population.
  • Example: If 75% of a population of 2,000 students respond to a survey and 64% of respondents report being satisfied, how confident can we be that this figure represents all 2,000 students?
  • It assesses the confidence level in generalizing findings to the population.
  • A confidence level is chosen, commonly 95% in social research.
    • A 95% confidence level means accepting a 5% risk of being wrong, which corresponds to a significance level of 0.05.
  • The goal of significance testing is to give more credibility to your argument.
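
One common way to put a number on that confidence is a margin of error around the sample percentage. The sketch below uses the normal approximation for a proportion and reads the example above as 1,500 respondents (75% of 2,000), of whom 64% are satisfied. The function name and the optional finite population correction are illustrative choices, not something prescribed by the NSS, and the calculation assumes respondents behave like a random sample.

```python
import math


def margin_of_error(proportion, n, z=1.96, population_size=None):
    """Half-width of a normal-approximation confidence interval for a proportion.

    z=1.96 corresponds to a 95% confidence level. If population_size is given,
    a finite population correction is applied, which narrows the interval when
    the sample covers a large share of the population.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / n)
    if population_size is not None:
        standard_error *= math.sqrt((population_size - n) / (population_size - 1))
    return z * standard_error


respondents = 1500  # assumed: 75% of a population of 2,000
satisfied = 0.64

moe = margin_of_error(satisfied, respondents)
moe_fpc = margin_of_error(satisfied, respondents, population_size=2000)
print(f"64% plus or minus {100 * moe:.1f} points (no correction)")
print(f"64% plus or minus {100 * moe_fpc:.1f} points (finite population correction)")
```

Under these assumptions the margin is roughly 2.4 points, or about 1.2 points with the correction. Neither figure addresses whether the respondents are representative of the students who did not respond.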

Hypotheses

  • Experimental Hypothesis (H1): The initial hypothesis or hunch (e.g., a student union score is lower than the average for London).
  • Null Hypothesis (H0): States that there is no effect or relationship between the variables.
    • Statistical tests assess whether the evidence is strong enough to reject the null hypothesis.
    • Example: The null hypothesis would be that a student union does not score lower than the average for London.
  • Analysis determines the likelihood of observing the collected data if the null hypothesis were true.
  • A confidence level of 95% is typically required to reject the null hypothesis.

Steps for a Statistically Valid Approach

  1. Start with a null hypothesis that there is no relationship between two variables.
  2. Establish an acceptable confidence level (commonly 95%, that is, a significance level of 0.05).
  3. Test the significance of your findings, for example using online tools.
  4. If the test shows that data like yours would occur less than 5% of the time when the null hypothesis is true, reject the null hypothesis.
  5. The significance of a percentage can be tested using a Z-test.
    • Input the sample size and percentage into a Z-test to see whether the result is significant.
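
The steps above can be sketched as a one-sample Z-test for a percentage. All figures here are invented: a union scoring 60% from 400 respondents, compared with a London average of 65% that is treated as a fixed benchmark. The function names are hypothetical, and online Z-test calculators may use slightly different conventions.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def one_proportion_z_test(sample_proportion, n, benchmark):
    """Return (z, one-sided p-value) for H1: the true proportion is below benchmark.

    The null hypothesis is that the true proportion equals the benchmark.
    """
    standard_error = math.sqrt(benchmark * (1 - benchmark) / n)
    z = (sample_proportion - benchmark) / standard_error
    p_value = normal_cdf(z)  # chance of a score this low or lower if H0 is true
    return z, p_value


ALPHA = 0.05  # significance level matching a 95% confidence level

z, p = one_proportion_z_test(sample_proportion=0.60, n=400, benchmark=0.65)
reject_null = p < ALPHA
print(f"z = {z:.2f}, p = {p:.3f}, reject null hypothesis: {reject_null}")
```

With these made-up numbers the p-value falls below 0.05, so the null hypothesis would be rejected. A smaller sample with the same percentages could easily give the opposite result.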

Practical Applications of Significance and Correlation with NSS Data

  • Validate the importance of identified differences.
    • Determine if a drop in department scores is statistically significant.
    • Assess whether the difference between departments like Dentistry and Medicine is significant.
  • Support arguments by demonstrating significant associations between variables.
    • A significant association suggests, but does not prove, that changing one variable could affect the other; this is especially useful when relating individual questions to overall satisfaction.
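
Comparing two departments can be sketched as a two-proportion Z-test. The department figures below are hypothetical (80% of 120 Dentistry respondents satisfied against 72% of 300 Medicine respondents), and two_proportion_z_test is an illustrative name rather than part of any NSS toolkit.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Return (z, two-sided p-value) for H0: both groups share one true proportion."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # area in both tails beyond |z|
    return z, p_value


ALPHA = 0.05

# Hypothetical data: (proportion satisfied, number of respondents).
dentistry = (0.80, 120)
medicine = (0.72, 300)

z, p = two_proportion_z_test(*dentistry, *medicine)
significant = p < ALPHA
print(f"z = {z:.2f}, p = {p:.3f}, significant at 5%: {significant}")
```

In this invented case an eight-point gap is not significant at the 5% level, which shows why a visible difference between departments should be tested before it is used in an argument.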