Statistics Exam Review

Exam 1 Review


Chapter 1 - Introduction to Statistics

  • Definition of Statistics:

    • A set of tools and techniques used for describing, organizing, and interpreting information or data.

  • Basic Goals of Science:

    • Description

    • Prediction

    • Explanation

  • Types of Statistics:

    • Descriptive Statistics: Used to organize, summarize, and describe data.

    • Inferential Statistics: Used to make inferences from a sample about a larger population and to determine how likely it is that the sample is representative of that population.


Key Terms

  • Sample:

    • A group that data is collected from; a subset of the larger group you’re interested in.

  • Population:

    • The group you’re interested in making conclusions about.

  • Variable:

    • Something that can change or have different values for different individuals.

  • Data:

    • Information collected from the sample, which can include:

      • Continuous Data: Measured on a continuum (e.g., height, weight, age).

      • Categorical Data: Sorts people into categories (e.g., major, hometown).


Chapter 2 - Central Tendency

  • Central Tendency:

    • The data point that describes the center of a distribution; a single number that represents a group of scores.

  • Measures of Central Tendency:

    • Mean:

      • The average value of the data set.

    • Median:

      • The middle value when data is ordered.

    • Mode:

      • The most frequently occurring value in the data set.


Computing the Mean

  • Formula:

    • $X_{bar} = \frac{\Sigma X}{n}$

      • Where $X$ = each individual data score, $\Sigma$ = summation sign, and $n$ = number of items in the sample.

  • Note:

    • The mean is sensitive to extreme scores.
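
The formula can be sketched in a few lines of Python. This is only an illustration: the function name `mean` and the scores are made up, not taken from the course materials.

```python
def mean(scores):
    """Return the sum of the scores divided by n, the number of scores."""
    if not scores:
        raise ValueError("mean requires at least one score")
    return sum(scores) / len(scores)


scores = [2, 4, 4, 5, 10]   # made-up sample, n = 5
x_bar = mean(scores)        # (2 + 4 + 4 + 5 + 10) / 5
print(x_bar)
```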


Computing the Median

  • Steps:

    1. List the numbers in order from lowest to highest.

    2. Find the middle score.

    • If there is an even number of scores, calculate the average of the two middle scores.
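
The steps above can be sketched as follows. The function name `median` and both score lists are hypothetical; the logic simply sorts, then picks the middle score or averages the two middle scores.

```python
def median(scores):
    """Order the scores, then return the middle one (or the average of the two middle ones)."""
    if not scores:
        raise ValueError("median requires at least one score")
    ordered = sorted(scores)          # step 1: lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle score
        return ordered[mid]
    # even count: average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median([7, 1, 3, 9, 5])   # ordered: 1 3 5 7 9
even_median = median([7, 1, 3, 9])     # ordered: 1 3 7 9
print(odd_median, even_median)
```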


Computing the Mode

  • Steps:

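    1. Count how many times each value appears.

    2. Identify the value that appears most often; that value is the mode.

    • Note:

      • The mode can be a category rather than a number, so it also works for categorical data.

    • If two values tie for the highest count, the data is bimodal; if more than two tie, it is multimodal.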
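
A minimal sketch of counting values to find the mode, using `collections.Counter` from Python's standard library. The helper name `modes` and the example data are assumptions for illustration; it returns a list so that ties are visible.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest count (one item = a single mode)."""
    counts = Counter(values)          # how many times each value appears
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


majors = ["psych", "bio", "psych", "math", "bio", "psych"]   # categorical data
single_mode = modes(majors)

two_modes = modes([1, 2, 2, 3, 3, 4])   # 2 and 3 both appear twice
print(single_mode, two_modes)
```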


When to Use Each Measure

  • Mode:

    • When data is categorical.

  • Mean:

    • When data is continuous and there aren’t extreme scores; most common measure of central tendency.

  • Median:

    • When data is continuous and there are extreme scores.
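
To see why extreme scores matter for this choice, the sketch below compares the mean and median before and after adding one extreme value. It uses Python's standard `statistics` module; the income figures are invented.

```python
import statistics

incomes = [30, 32, 35, 38, 40]              # hypothetical incomes, in thousands
incomes_with_extreme = incomes + [500]      # add one extreme score

mean_plain = statistics.mean(incomes)
median_plain = statistics.median(incomes)

mean_extreme = statistics.mean(incomes_with_extreme)
median_extreme = statistics.median(incomes_with_extreme)

# The mean moves a long way; the median barely changes.
print(mean_plain, median_plain)
print(mean_extreme, median_extreme)
```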


Chapter 3 - Variability

  • Variability:

    • How different scores are from each other; it represents the spread of the data.

  • Measures of Variability:

    • Range

    • Standard Deviation

    • Variance


Range

  • Definition:

    • The difference between the highest and lowest scores in a data set.

  • Formula:

    • $r = h - l$

      • Where $r =$ range, $h =$ highest score, $l =$ lowest score.

  • Problems:

    • Uses only the two most extreme scores and ignores every value in between.
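
A short sketch of the range formula and its weakness. The function name `score_range` and the scores are made up; the two lists differ only in their highest score.

```python
def score_range(scores):
    """Return the highest score minus the lowest score (r = h - l)."""
    if not scores:
        raise ValueError("range requires at least one score")
    return max(scores) - min(scores)


r_typical = score_range([12, 15, 15, 16, 18])   # 18 - 12
r_extreme = score_range([12, 15, 15, 16, 90])   # 90 - 12
print(r_typical, r_extreme)
```

Changing a single score changes the range from 6 to 78, even though the other four scores are identical.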


Standard Deviation

  • Definition:

    • The average amount of variability in a set of scores.

    • It indicates how far away individual data points are from the sample mean.

    • Interpretation:

      • Small/low SD: Most data points are close to the mean.

      • Large/high SD: Most data points are far from the mean.

  • Symbols:

    • $s$ or $sd = $ standard deviation.

    • The square root symbol: $\sqrt{}$.

  • Calculating SD:

    • Formula:

      • $s = \sqrt{\frac{\Sigma (X - X_{bar})^2}{n-1}}$

    • Where $\Sigma$ is the summation sign, $X$ is each individual score, $X_{bar}$ is the mean of the sample, and $n$ is the number of scores in the sample.


Steps for Finding Standard Deviation

  1. List all of the scores.

  2. Compute the mean.

  3. Subtract the mean from each score.

  4. Square all of the differences.

  5. Sum all of the squared deviations.

  6. Divide the sum by $n-1$.

  7. Compute the square root of the result from step 6.

  8. Report the results.
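
The steps above can be followed one at a time in code. This is a sketch under the assumption of the sample formula with $n-1$ in the denominator; the function name `sample_sd` and the scores are hypothetical.

```python
import math


def sample_sd(scores):
    """Compute the sample standard deviation by following the listed steps."""
    n = len(scores)                                 # step 1: the listed scores
    if n < 2:
        raise ValueError("need at least two scores to divide by n - 1")
    x_bar = sum(scores) / n                         # step 2: compute the mean
    deviations = [x - x_bar for x in scores]        # step 3: subtract the mean
    squared = [d ** 2 for d in deviations]          # step 4: square the differences
    sum_of_squares = sum(squared)                   # step 5: sum the squared deviations
    variance = sum_of_squares / (n - 1)             # step 6: divide by n - 1
    return math.sqrt(variance)                      # step 7: take the square root


scores = [2, 4, 4, 5, 10]
s = sample_sd(scores)
print(round(s, 2))                                  # step 8: report the result
```

For these scores the mean is 5, the squared deviations are 9, 1, 1, 0, and 25, their sum is 36, and 36 / 4 = 9, so $s = 3$.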


Variance

  • Definition:

    • Variance (denoted as $s^2$) is calculated by squaring the standard deviation.

  • Note:

    • Typically, we report standard deviation rather than variance.
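
A quick check of the relationship between variance and standard deviation, using `stdev` and `variance` from Python's standard `statistics` module (both use the $n-1$ denominator). The scores are made up.

```python
import statistics

scores = [2, 4, 4, 5, 10]

s = statistics.stdev(scores)             # sample standard deviation
s_squared = statistics.variance(scores)  # sample variance

# Squaring the standard deviation gives the variance.
print(s, s_squared, s ** 2)
```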


Outliers/Extreme Values

  • Definition of Outlier:

    • A data point that deviates significantly from the other data points in the sample.

    • Scores more than 2 standard deviations from the mean are potential outliers; scores more than 3 standard deviations from the mean are likely outliers.

  • Impact of Outliers:

    • Outliers can skew the distribution of data.


Finding Outliers/Extreme Values

  • Formula:

    • $X_{bar} \pm (c \times s)$

      • Where $c =$ cutoff value of interest. For example, if looking for anything over 1 SD away, $c = 1$.
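
A minimal sketch of the cutoff formula, assuming the sample standard deviation from `statistics.stdev`. The helper names `outlier_cutoffs` and `find_outliers` and the scores are hypothetical.

```python
import statistics


def outlier_cutoffs(scores, c):
    """Return the lower and upper cutoffs: mean minus and plus c standard deviations."""
    x_bar = statistics.mean(scores)
    s = statistics.stdev(scores)
    return x_bar - c * s, x_bar + c * s


def find_outliers(scores, c=2):
    """Return the scores that fall outside the two cutoffs."""
    low, high = outlier_cutoffs(scores, c)
    return [x for x in scores if x < low or x > high]


scores = [10, 11, 12, 12, 13, 13, 14, 14, 15, 40]

potential = find_outliers(scores, c=2)   # more than 2 SD from the mean
likely = find_outliers(scores, c=3)      # more than 3 SD from the mean
print(potential, likely)
```

Here the mean is 15.4 and $s$ is about 8.77, so 40 falls outside the 2 SD cutoffs but inside the 3 SD cutoffs. Note that the extreme score itself inflates both the mean and $s$.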


Chapter 4 - Graphing Data

  • Types of Graphs:

    • Histograms: Represent the distribution of continuous data.

    • Bar Graphs: Represent the distribution of categorical data.

  • Note:

    • Regardless of the graph type, it is essential to be mindful of axes and scales.


Distributions

  • Impacts on Distributions:

    • Central Tendency: Affects where the distribution sits along the X axis.

    • Variability: Standard deviation affects how flat or peaked the distribution looks (kurtosis).

  • Symmetrical Distributions:

    • In a symmetrical distribution with a single peak, the mode, median, and mean are equal.


Skewness and Kurtosis

  • Skewed Distributions:

    • Lack symmetry; one side has more data points than the other.

      • Positive Skew: Tail extends to the right.

      • Negative Skew: Tail extends to the left.

  • Kurtosis:

    • Refers to how peaked versus flat a distribution is.

      • Platykurtic: Flat distribution indicating more variability.

      • Leptokurtic: Peaked distribution indicating less variability.


Chapter 5 - Correlation Coefficients

  • Purpose of Correlation:

    • To assess the relationship between two variables.

  • Correlation Coefficient:

    • Assigns a number representing the degree of relationship between two variables, represented by “r”.

    • Ranges from -1 to 1.


Direction and Strength of Correlations

  • Indicates:

    • Direction: Positive or negative relationship.

    • Strength: Ranges from none, through weak and moderate, to strong.

      • Perfect positive correlation: $r = 1$.

      • Strong positive correlation: $r$ close to 1.

      • Weak positive correlation: $r$ positive but close to 0.

      • No correlation: $r = 0$.

      • Weak negative correlation: $r$ negative but close to 0.

      • Strong negative correlation: $r$ close to -1.

      • Perfect negative correlation: $r = -1$.


Limitations of Correlation Coefficient

  • Can only identify linear relationships.

  • Restriction of Range:

    • If most subjects have similar scores on one variable, this reduces the magnitude of the correlation and obscures the true relationship.

  • Impact of Outliers:

    • Outliers can have a considerable influence on the correlation coefficient.


Calculating Correlation Coefficient

  • Variables:

    • $r_{xy}$ is the correlation between variables X and Y.

    • $n$ is sample size.

    • $X$ and $Y$: scores on X and Y variables respectively.

    • $XY$: product of each X score times its corresponding Y score.

    • $X^2$: square of each individual's X score.

    • $Y^2$: square of each individual's Y score.


Steps for Calculating Correlation Coefficient

  1. Create a table to organize data.

  2. Fill in the table with the appropriate values.

  3. Sum the columns of the table.

  4. Plug in the summed numbers into the correlation formula.

  5. Calculate the correlation coefficient value.
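
These notes do not reproduce the correlation formula itself. The sketch below assumes the standard raw-score form of Pearson's correlation, which uses exactly the sums listed under Variables: $r_{xy} = \frac{n\Sigma XY - \Sigma X \Sigma Y}{\sqrt{[n\Sigma X^2 - (\Sigma X)^2][n\Sigma Y^2 - (\Sigma Y)^2]}}$. Check it against the formula given in class. The function name `pearson_r` and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Compute r from the column sums of an X, Y, XY, X^2, Y^2 table."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of scores")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # XY column
    sum_x2 = sum(x * x for x in xs)               # X^2 column
    sum_y2 = sum(y * y for y in ys)               # Y^2 column

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when X or Y has no variability")
    return numerator / denominator


hours = [1, 2, 3, 4, 5]      # made-up X scores
grades = [2, 4, 5, 4, 5]     # made-up Y scores
r = pearson_r(hours, grades)
print(round(r, 2))
```

For this data the column sums are $\Sigma X = 15$, $\Sigma Y = 20$, $\Sigma XY = 66$, $\Sigma X^2 = 55$, and $\Sigma Y^2 = 86$, which gives $r = 30 / \sqrt{1500} \approx 0.77$.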


Four Types of Measurement Scales

  • Nominal Scale:

    • Categories that are mutually exclusive.

    • Data is in the form of counts or percentages (e.g., major, hair color, gender).

  • Ordinal Scale:

    • Number rankings where the distance between placements is unclear (e.g., top 10 football teams, class rank, pain scale).

  • Interval Scale:

    • Ordered values with equal spacing, but zero does not mean a complete absence of the attribute (e.g., IQ scores, temperature in Fahrenheit or Celsius, SAT scores).

  • Ratio Scale:

    • Interval measurement where zero indicates a complete absence of the attribute (e.g., money, weight, length); zero has a specific meaning.


Reliability and Validity

  • Reliability:

    • Refers to the consistency of a measurement.

    • Question to consider: How do I know the measure I’m using works consistently?

  • Validity:

    • Refers to the accuracy of a measurement.

    • Question to consider: How do I know the measure I’m using measures what it’s supposed to measure?


Error Score

  • Definitions:

    • Observed Score: Measured score.

    • True Score: The true reflection of an individual’s score.

    • Error Score:

      • The discrepancy between the true and observed score (observed score = true score + error score), also known as measurement error.

  • Goal:

    • Minimize error to enhance reliability and validity.


Types of Reliability

  • Test-retest:

    • Similarity of scores on the same measure at different time points.

  • Parallel forms:

    • Equivalency of different forms of the same measure.

  • Inter-item reliability:

    • Measures the similarity of responses for one person across multiple similar questions (also called internal consistency reliability).

    • Measured via Cronbach’s alpha; can be improved by giving clear instructions, removing confusing items, and increasing the number of items.

  • Inter-rater reliability:

    • Consistency of observations made by multiple people; for categorical ratings, calculated as the number of agreements divided by the number of possible agreements.

    • For continuous data, measured by correlation between raters.
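
A small sketch of the agreements / possible agreements calculation for two raters. The function name `percent_agreement` and the ratings are invented; each position in the lists is one observation coded by both raters.

```python
def percent_agreement(rater_a, rater_b):
    """Return the number of agreements divided by the number of possible agreements."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of observations")
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return agreements / len(rater_a)


rater_a = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]
rater_b = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes"]

agreement = percent_agreement(rater_a, rater_b)   # 6 agreements out of 8
print(agreement)
```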


Types of Validity

  • Content Validity:

    • The measure reflects a good sample of items that assess the construct.

  • Criterion Validity:

    • The measure predicts other relevant indicators.

    • Types include:

      • Concurrent Validity: The measure correlates with a criterion measured at the same time.

      • Predictive Validity: The measure correlates with a criterion obtained at a later time.

  • Construct Validity:

    • The measure relates to similar constructs and does not correlate with dissimilar constructs.

    • Types include:

      • Convergent Validity: The measure correlates with constructs it should.

      • Discriminant Validity: The measure does not correlate with constructs it shouldn't.


Extra Tips

  • Understand how to apply definitions.

  • Practice Problems: Regularly complete practice problems to strengthen understanding.

  • Show All Work: When solving problems, document all steps taken.

  • Report Results: Know how to format and present results accurately.

  • Complete the Study Guide: Ensure all sections are thoroughly reviewed.

  • Ask for Help: Don't hesitate to seek guidance if needed.