Data Interpretation Notes

Data Interpretation

Types of Data

  • Quantitative Data:

    • Numerical data that is factual and not open to interpretation.
    • Examples: Population of a city, median income of a town (from sources like the Census).
  • Qualitative Data:

    • Data in word form, often gathered from surveys and interviews, that is open to interpretation.
    • Describes qualities or characteristics.
    • Examples: Ratings of school lunch, opinions on the president's job performance.

Descriptive vs. Inferential Statistics

  • Descriptive Statistics:

    • Researchers organize and describe data.
    • Focuses on describing the data collected.
  • Inferential Statistics:

    • Researchers use sample data to make predictions and draw conclusions, such as whether an independent variable had an effect.
    • Helps determine if data from a sample can be applied to a population.
    • Techniques are used to make generalizations about a population based on sample data.
    • Aids in testing hypotheses and provides insight into study results.
    • Helps determine whether the results are statistically significant or could plausibly be due to chance.

Hypothesis Testing

  • Hypothesis:

    • A specific prediction about the relationship between variables.
  • Null Hypothesis:

    • A claim that there is no effect or difference between variables.
    • Serves as the baseline for testing.
  • Alternative Hypothesis:

    • A claim that there is an effect or difference between the variables.
    • Often what the researcher is trying to find evidence for.

P-Value

  • Ranges from 0 to 1.
  • Indicates the statistical significance of a study's results.
  • It is the probability of getting results at least as extreme as those observed if the null hypothesis were true.
  • Used to decide whether to reject or fail to reject the null hypothesis.
    • If P-value ≤ 0.05 (the conventional cutoff): Results are statistically significant, meaning they are unlikely to be due to chance alone.
      • Reject the null hypothesis; the data support the alternative hypothesis.
      • Smaller P-value = stronger evidence against the null hypothesis.
    • If P-value is large (e.g., 0.90): Results are consistent with chance or luck.
      • Fail to reject the null hypothesis. This does not prove the null hypothesis is true.
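
To make the idea of a P-value concrete, here is a minimal sketch of an exact permutation test, which is only one of several ways to get a P-value. The scores and the function name permutation_p_value are hypothetical. The code counts how often a random relabelling of the scores gives a group difference at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    total = 0
    at_least_as_extreme = 0
    # Try every way of relabelling the pooled scores into two groups.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in range(len(pooled)) if i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Hypothetical scores for a control group and a treatment group.
control = [1, 2, 3, 4]
treatment = [5, 6, 7, 8]
p_value = permutation_p_value(control, treatment)
print(f"p = {p_value:.4f}")  # 2 of the 70 relabellings are this extreme
print("reject null" if p_value <= 0.05 else "fail to reject null")
```

With these made-up scores only 2 of the 70 possible relabellings are as extreme as the real grouping, so the P-value is about 0.029 and falls below the 0.05 cutoff.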

Effect Size

  • Tells the strength of the relationship between variables.
  • Indicates how meaningful the effect is in real-world terms.
    • Large effect size: Substantial difference between groups.
    • Small effect size: Minor difference between groups.
  • Example: A therapy reduces anxiety with a P-value of 0.05 (statistically significant) but a small effect size (minimal improvement in practical terms).
  • Effect size indicates how much the results matter in real life, while statistical significance indicates whether the results are unlikely to be due to chance.
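
The notes do not name a specific effect-size measure. The sketch below assumes Cohen's d, one common choice, which expresses the difference between two group means in standard-deviation units. The anxiety scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / sqrt(pooled_var)


# Hypothetical anxiety scores (lower is better).
control = [20, 22, 24, 26, 28]
therapy = [19, 21, 23, 25, 27]
d = cohens_d(control, therapy)
print(f"Cohen's d = {d:.2f}")
```

By Cohen's rough conventions, a magnitude near 0.2 is small, 0.5 is medium, and 0.8 is large. Here the magnitude is about 0.32, so the therapy group scores lower but only modestly.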

Displaying Data (Descriptive Statistics)

  • Frequency Distribution Table:

    • Shows how often sets of data occur.
    • Example: A table displaying quiz scores, showing how many students achieved each score.
  • Frequency Polygon:

    • Visual representation of a frequency distribution table.
    • Line graph that connects points marking the frequency of each score or score interval.
  • Histogram:

    • Bar graph showing frequencies through vertical columns.
    • No spaces between the bars (unlike typical bar graphs).
  • Pie Chart:

    • Data is divided into sections of a circle.
    • Each section represents a proportion of the whole.
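
As a small illustration of the displays above, this sketch builds a frequency distribution table from a hypothetical list of quiz scores, prints a rough text histogram next to it, and shows the proportion each score would take up in a pie chart.

```python
from collections import Counter

# Hypothetical quiz scores for ten students.
quiz_scores = [7, 8, 8, 9, 10, 8, 7, 9, 8, 10]

# Count how many students achieved each score.
frequency = Counter(quiz_scores)

print("Score  Frequency  Share")
for score in sorted(frequency):
    count = frequency[score]
    share = count / len(quiz_scores)  # slice of the pie chart
    bar = "#" * count  # one mark per student, like a histogram column
    print(f"{score:>5}  {count:>9}  {share:>5.0%}  {bar}")
```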

Measures of Central Tendency

  • Mean:

    • The average of the data set.
    • Calculated by summing all values and dividing by the number of values.
    • Formula: Mean = \frac{\sum_{i=1}^{n} x_i}{n}, where x_i represents each value and n is the number of values.
  • Regression Toward the Mean:

    • Outliers (very high or very low results) tend to be followed by results closer to the average.
    • Example: A basketball player who usually scores 15 points scores 30 points in one game (outlier). Over the next few games, their score returns closer to 15 points.
    • The more extreme the outlier, the more regression is likely to occur.
  • Mode:

    • The value that occurs most often in a data set.
  • Median:

    • The middle value in a data set when the data is organized from smallest to largest.
    • Odd number of values: The middle value is the median.
    • Even number of values: Average the two middle values to find the median.
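
The three measures can be checked quickly with Python's standard statistics module. The test scores below are hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical test scores (odd number of values).
scores = [70, 85, 85, 90, 100]

score_mean = mean(scores)      # (70 + 85 + 85 + 90 + 100) / 5
score_median = median(scores)  # middle value of the sorted list
score_mode = mode(scores)      # most frequent value

print(score_mean, score_median, score_mode)  # 86 85 85

# Even number of values: the median averages the two middle values.
even_scores = [70, 85, 90, 100]
even_median = median(even_scores)
print(even_median)  # (85 + 90) / 2 = 87.5
```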

Measures of Variability

  • Range:

    • The difference between the highest and lowest values in a data set.
    • Example: If the highest value is 210 and the lowest is 95, the range is 210 - 95 = 115.
  • Standard Deviation:

    • Indicates how spread out the values are: roughly the average distance of values from the mean.
    • Formula: \sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}, where \sigma is the standard deviation, x_i are the values, \mu is the mean, and N is the number of values.
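
A short sketch of both measures, using a hypothetical data set. The standard deviation is computed step by step from the population formula above and then compared with statistics.pstdev from the standard library.

```python
from math import sqrt
from statistics import pstdev

# Hypothetical data set.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest value minus lowest value.
value_range = max(values) - min(values)

# Population standard deviation, following the formula step by step.
mu = sum(values) / len(values)                       # mean
squared_deviations = [(x - mu) ** 2 for x in values]
sigma = sqrt(sum(squared_deviations) / len(values))

print(value_range)     # 9 - 2 = 7
print(sigma)           # 2.0
print(pstdev(values))  # library result, should agree with sigma
```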

Distributions

  • Normal Distribution:

    • Symmetrical bell-shaped curve.
    • Mean, median, and mode are at the center of the distribution.
    • One mode.
  • Skewed Distributions:

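    • Positive Skew: Most scores cluster at the low end (left), with a long tail of high scores stretching to the right; the mean is pulled above the median.
    • Negative Skew: Most scores cluster at the high end (right), with a long tail of low scores stretching to the left; the mean is pulled below the median.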
  • Bimodal Distribution:

    • Distribution with two modes.
    • Has two peaks.
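
One way to check the direction of skew numerically is the third standardized moment, a common skewness statistic that is not part of these notes and is assumed here for illustration. Its sign gives the direction of the tail. The data sets are hypothetical.

```python
from statistics import mean, median, pstdev


def skewness(values):
    """Third standardized moment: positive = right tail, negative = left tail."""
    mu = mean(values)
    sigma = pstdev(values)
    return sum(((x - mu) / sigma) ** 3 for x in values) / len(values)


# Hypothetical scores: mostly low, with a few very high values.
right_tailed = [1, 1, 2, 2, 2, 3, 3, 4, 10, 20]
# Mirror image: mostly high, with a few very low values.
left_tailed = [21 - x for x in right_tailed]

for name, data in [("right-tailed", right_tailed), ("left-tailed", left_tailed)]:
    print(name, "skewness:", round(skewness(data), 2),
          "mean:", mean(data), "median:", median(data))
```

In the right-tailed set the mean (4.8) sits above the median (2.5); in the mirrored set the mean sits below the median.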

Standard Deviation and Z-Scores/Percentiles in Normal Distribution

  • In a normal distribution:

    • About 68% of scores fall within one standard deviation of the mean (roughly 34% on each side).
    • About 95% of scores fall within two standard deviations of the mean.
    • About 99.7% of scores fall within three standard deviations of the mean.
  • Z-Score:

    • Numerical measurement describing how many standard deviations a score is from the mean.
    • Positive Z-score: Higher than the mean.
    • Negative Z-score: Lower than the mean.
    • Allows comparison of scores from different normally distributed data sets.
    • Formula: Z = \frac{x - \mu}{\sigma}, where x is the value, \mu is the mean, and \sigma is the standard deviation.
  • Percentile Rank:

    • Percentage of scores at or below a particular score.
    • Median is the 50th percentile.
    • Example: Being in the 73rd percentile for height means 73% of people are shorter or the same height.
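
The sketch below ties these ideas together using statistics.NormalDist from the Python standard library (3.8+). The mean of 100 and standard deviation of 15 are assumed purely for illustration, and percentile_rank is a hypothetical helper that follows the "at or below" definition above.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean."""
    return (x - mu) / sigma


def percentile_rank(data, score):
    """Percentage of values in data that are at or below score."""
    return 100 * sum(1 for x in data if x <= score) / len(data)


standard_normal = NormalDist()  # mean 0, standard deviation 1

# Share of scores within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    share = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD: {share:.4f}")

# Assumed distribution: mean 100, standard deviation 15.
z = z_score(130, 100, 15)
percentile = 100 * standard_normal.cdf(z)
print(f"z = {z}, percentile = {percentile:.2f}")

# Percentile rank in a small hypothetical sample of heights (cm).
heights = [150, 160, 165, 170, 180]
print(percentile_rank(heights, 165))  # 3 of 5 values are at or below 165
```

The loop gives shares of roughly 0.68, 0.95, and 0.997, matching the figures for a normal distribution. A score of 130 has a Z-score of 2.0, which falls near the 98th percentile.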

Coefficients and Correlational Studies

  • Correlational Studies:

    • Determine the relationship between two variables.
    • Correlation does not equal causation.
  • Correlation Coefficient:

    • Ranges from -1 to +1; the closer the value is to -1 or +1, the stronger the relationship.
    • Value between 0 and 1: Positive correlation (as one variable increases, the other increases).
      • Plotted on a scatter plot, it shows an upward trend.
    • Value between 0 and -1: Negative correlation (as one variable increases, the other decreases).
      • Plotted on a scatter plot, it shows a downward trend.
    • Value at or near 0: No correlation (no linear relationship between the variables).
      • Data points are scattered randomly on a scatter plot.
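
A minimal sketch of computing a correlation coefficient by hand, assuming the Pearson coefficient (the notes do not say which coefficient is meant). The study-hours, absences, and exam-score data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(ss_x * ss_y)


# Hypothetical data: more study hours go with higher scores.
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 60, 61, 70, 77]
r_positive = pearson_r(hours_studied, exam_scores)

# Hypothetical data: more absences go with lower scores.
absences = [5, 4, 3, 2, 1]
r_negative = pearson_r(absences, exam_scores)

print(f"{r_positive:.2f}")  # 0.98, strong positive correlation
print(f"{r_negative:.2f}")  # -0.98, strong negative correlation
```

Even a coefficient this close to 1 only describes how the two variables move together; it does not show that studying causes higher scores.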