Psychological statistics involves defining the language of data, distribution, and inference.
Key concepts include probability and error. In statistics, "error" does not mean a mistake; it refers to the uncertainty inherent in the data.
The goal is to attach probabilities to the inferences drawn from data, which is the basis of statistical analysis.
Key Concepts in Statistics
Population vs. Sample
Population: The totality of units or individuals with which we are concerned in research.
Parameter: A value that describes a population.
Sample: Any portion of the population selected for analysis.
Statistic: A value that describes a sample.
Note: Psychology consistently relies on samples due to practical constraints.
Sampling
Random Sampling
Random Sampling: A method of taking samples in such a way that every unit in the population has an equal chance of being included.
This method aims to minimize bias and ensure that the sample accurately represents the population.
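The distinction between a parameter and a statistic, and the idea of simple random sampling, can be sketched in a few lines of Python. This is a minimal illustration only: the population of 1,000 scores, the seed, and the sample size of 50 are made-up assumptions, not values from these notes.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up test scores (illustrative only).
rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = [rng.gauss(100, 15) for _ in range(1000)]

# Simple random sample without replacement: every unit in the population
# has an equal chance of being selected.
sample = rng.sample(population, k=50)

parameter = statistics.mean(population)  # a value describing the population
statistic = statistics.mean(sample)      # a value describing the sample

print(f"Population mean (parameter): {parameter:.2f}")
print(f"Sample mean (statistic):     {statistic:.2f}")
```

The two means will usually be close but not identical, which is the gap that inferential statistics tries to quantify.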
Types of Statistics
Descriptive Statistics
Definition: Organizes, summarizes, and simplifies data for easier understanding.
Uses: Presentation of data and describing data to make predictions.
Inferential Statistics
Definition: Generalizes findings from samples to populations, including hypothesis testing and studying relationships among variables.
Levels (or Scales) of Measurement
Variables may be measured on one of four scales; the scale determines which statistics can be computed and which conclusions can be drawn:
1. Nominal
2. Ordinal
3. Interval
4. Ratio
Importance of understanding scales: They determine the appropriate statistical techniques and inferential methods.
Scales of Measurement for Qualitative Data
Nominal Scale
Definition: Consists of non-ordered categorical responses without a specific continuum.
Examples: Mood, major, gender.
Ordinal Scale
Definition: Comprises ordered categorical responses that exist on a continuum ranging from low to high, but the intervals are not necessarily equal.
Examples: Anxiety ratings, rank order descriptions.
Scales of Measurement for Quantitative Data
Interval Scale
Definition: Involves numerical responses that are equally spaced, but do not have a true zero point.
Ratio Scale
Definition: Like the interval scale but has a true zero point, allowing for meaningful ratio comparisons.
Examples: Reaction time, accuracy, height, weight.
Frequency Distributions
Definition: Describes the number of subjects falling into each category or score range, condensing raw data into a table of counts.
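As a rough sketch, a frequency distribution for a nominal variable can be tallied with Python's `collections.Counter`. The list of majors below is hypothetical data invented for illustration.

```python
from collections import Counter

# Hypothetical responses on a nominal variable (college major).
majors = ["Psychology", "Biology", "Psychology", "History",
          "Biology", "Psychology"]

# Counter tallies how many observations fall into each category.
frequencies = Counter(majors)

# List categories from most to least frequent.
for category, count in frequencies.most_common():
    print(f"{category}: {count}")
```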
Cross-Tabulation
Definition: A method of categorizing data based on two or more variables.
Example table for political sub-groups:
- Democrats: 24
- Republicans: 1
- Total: 25
Frequency percentage can be calculated as:
$$\text{Frequency Percentage} = \frac{\text{Sub-group total}}{\text{Overall total}} \times 100$$
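Applying the percentage formula to the counts in the example table (24 Democrats, 1 Republican, 25 in total) might look like this in Python. The variable names are illustrative choices.

```python
# Counts taken from the example table above.
counts = {"Democrats": 24, "Republicans": 1}
overall_total = sum(counts.values())  # 24 + 1 = 25

# Frequency percentage = sub-group total / overall total * 100
percentages = {
    group: n * 100 / overall_total for group, n in counts.items()
}

for group, pct in percentages.items():
    print(f"{group}: {pct:.1f}%")
```

With these counts, Democrats make up 96% of the sample and Republicans 4%.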
Data Visualization
Bar Graphs
Rule: Appropriate for nominal scales of measurement.
Show frequency distributions across different categories (e.g., college major).
Histograms
Rule: Used for interval or ratio data (quantitative).
Visually represent distribution of numerical data using bars, emphasizing the frequency of different ranges.
Polygons (Line Graphs)
Used to illustrate frequency distributions over a range of values.
The Normal Distribution
Characteristic: A symmetrical, bell-shaped distribution where approximately 68% of data falls within one standard deviation of the mean, and approximately 95% falls within two.
Commonly used to model variables such as body temperature, IQ scores, and height.
Not every frequency distribution is normal; other shapes include J-shaped, rectangular, and bimodal distributions.
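The 68% and 95% figures can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library and a standard normal distribution (mean 0, SD 1). Choosing the standard normal is an assumption made for convenience, since the proportions are the same for any normal distribution when distances are measured in SD units.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

# Proportion of the distribution within 1 and 2 SDs of the mean.
within_1_sd = z.cdf(1) - z.cdf(-1)
within_2_sd = z.cdf(2) - z.cdf(-2)

print(f"Within 1 SD: {within_1_sd:.4f}")  # about 0.6827
print(f"Within 2 SD: {within_2_sd:.4f}")  # about 0.9545
```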
Measures of Central Tendency
Definitions
Mode: The most frequently occurring value in a data set.
Median: The middle value when data is ordered, providing a central point dividing the data set into two equal halves.
Mean: The arithmetic average, calculated as:
$$\text{Mean } (\bar{x}) = \frac{\text{Sum of all observations}}{\text{Total number of observations}}$$
Notes on Usage
For quantitative data, all three measures are applicable.
For qualitative data, the mode is always appropriate and the median can be used with ordinal data; the mean is not valid.
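A short sketch of the three measures using Python's `statistics` module follows. The scores and the list of majors are hypothetical values chosen so the arithmetic is easy to follow by hand.

```python
import statistics

# Hypothetical quantitative scores, already in order.
scores = [2, 3, 3, 5, 7, 8, 14]

mean = statistics.mean(scores)      # 42 / 7 = 6
median = statistics.median(scores)  # middle (4th) value = 5
mode = statistics.mode(scores)      # most frequent value = 3

print(f"Mean: {mean}, Median: {median}, Mode: {mode}")

# For nominal data only the mode is meaningful.
majors = ["Psychology", "Biology", "Psychology", "History"]
print("Modal major:", statistics.mode(majors))
```

The extreme score of 14 pulls the mean above the median, which is one reason the median is often preferred for skewed data.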
Variability Measures
1. Range
Defined as the distance from the lowest value to the highest value.
Note: Considers only two data points.
2. Variance ($s^2$)
Definition: The average of the squared deviations from the mean.
Calculation involves utilizing all data points:
$$s^2 = \frac{\sum (x - \bar{x})^2}{N}$$
3. Standard Deviation (SD)
The square root of variance, representing the average distance of data points from the mean.
Calculated as: $s = \sqrt{s^2}$
4. Standard Error of the Mean (SEM)
Definition: Standard deviation divided by the square root of the sample size:
$$\text{SEM} = \frac{s}{\sqrt{n}}$$
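The four variability measures can be worked through step by step. The sketch below follows the formulas as given in these notes (dividing by N for the variance) on a small hypothetical data set. It also computes the n - 1 version for comparison, since many textbooks use that divisor when estimating population variance from a sample; that comparison is an addition, not part of the notes.

```python
import math

# Hypothetical scores.
scores = [4, 8, 6, 5, 3, 10]
n = len(scores)
mean = sum(scores) / n  # 36 / 6 = 6.0

# 1. Range: distance from the lowest to the highest value.
value_range = max(scores) - min(scores)  # 10 - 3 = 7

# 2. Variance: average of the squared deviations from the mean.
squared_deviations = [(x - mean) ** 2 for x in scores]  # sums to 34
variance = sum(squared_deviations) / n                  # divisor N, as in the notes
variance_unbiased = sum(squared_deviations) / (n - 1)   # divisor n - 1, for comparison

# 3. Standard deviation: square root of the variance.
sd = math.sqrt(variance)

# 4. Standard error of the mean: SD divided by the square root of n.
sem = sd / math.sqrt(n)

print(f"Range: {value_range}")
print(f"Variance (N): {variance:.3f}, Variance (n - 1): {variance_unbiased:.3f}")
print(f"SD: {sd:.3f}, SEM: {sem:.3f}")
```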
Sampling Error
Definition: Refers to variability among samples that occurs by chance, not reflecting true population characteristics.
Key Questions to Consider:
1. Are the observed differences real or merely due to sampling error?
2. Are our inferences about the population valid?
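Sampling error is easy to see in a small simulation: several random samples drawn from the same population produce different means purely by chance. The population, seed, and sample sizes below are arbitrary assumptions for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for reproducibility

# Hypothetical population of 10,000 scores.
population = [rng.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw five independent random samples of 25 and record each sample mean.
sample_means = []
for _ in range(5):
    sample = rng.sample(population, k=25)
    sample_means.append(statistics.mean(sample))

# The difference between each sample mean and the population mean
# is that sample's sampling error.
for i, m in enumerate(sample_means, start=1):
    print(f"Sample {i}: mean = {m:.2f}, "
          f"sampling error = {m - population_mean:+.2f}")
```

No treatment was applied to any sample, so every difference printed here is due to chance alone.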
Hypothesis Testing
Null Hypothesis (H₀)
Definition: Predicts no differences between means; notation: $H_0: \mu_1 = \mu_2$.
Core principle: The null hypothesis is the hypothesis that is directly tested; the data either lead to its rejection or fail to do so.
Alternative Hypothesis (H₁)
Definition: Predicts differences between groups exist; notation: $H_1: \mu_1 \neq \mu_2$.
Error Types in Hypothesis Testing
Type I Error (Alpha)
Definition: Occurs when the null hypothesis is rejected when it is actually true.
The probability of a Type I error is the significance level (alpha), commonly set at 0.05.
Type II Error (Beta)
Definition: Occurs when the null hypothesis is retained (not rejected) when it is false, meaning real differences exist but are overlooked.
Power Analysis
Definition: The probability of correctly rejecting the null hypothesis when it is false; the ability to detect an effect if one exists.
Strategies to increase power:
1. Increase sample size (n).
2. Decrease variability among sample measurements.
3. Use more precise instruments for measurement.
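The first two strategies can be illustrated with a simplified power calculation. The sketch below assumes a two-sided, one-sample z-test with a known population standard deviation. That test is a simplifying assumption chosen because its power has a closed form, not a method described in these notes. The function name `z_test_power` and all numeric inputs are hypothetical.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test (known sigma).

    effect: true difference between the population mean and the null value
    sigma:  population standard deviation (assumed known)
    n:      sample size
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # critical value for alpha
    shift = effect * (n ** 0.5) / sigma         # effect in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

baseline = z_test_power(effect=5, sigma=15, n=20)
larger_n = z_test_power(effect=5, sigma=15, n=80)       # strategy 1
less_variable = z_test_power(effect=5, sigma=10, n=20)  # strategy 2

print(f"Baseline power:         {baseline:.3f}")
print(f"With larger n:          {larger_n:.3f}")
print(f"With less variability:  {less_variable:.3f}")
```

When the true effect is zero, the same function returns alpha, which ties power back to the Type I error rate.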
Effect Size
Definition: A measure of the magnitude of differences attributed to the treatment, distinguishing practical significance from statistical significance.
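One widely used effect size for a two-group mean difference is Cohen's d, the difference between means divided by the pooled standard deviation. These notes do not name a specific measure, so choosing Cohen's d here is an assumption, and the two groups of scores are hypothetical.

```python
import math
import statistics

# Hypothetical scores for two groups.
treatment = [6, 7, 8, 9, 10]
control = [4, 5, 6, 7, 8]

n1, n2 = len(treatment), len(control)
mean_diff = statistics.mean(treatment) - statistics.mean(control)  # 8 - 6 = 2

# Pooled variance, using the n - 1 sample variances of each group.
var1 = statistics.variance(treatment)  # 2.5
var2 = statistics.variance(control)    # 2.5
pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))

cohens_d = mean_diff / pooled_sd
print(f"Cohen's d: {cohens_d:.3f}")
```

Unlike a p-value, d does not grow with sample size, which is why it speaks to practical rather than statistical significance.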
Tools for Testing Mean Differences
t-Test
Definition: Used when comparing only two groups; types include:
- Independent t-test: Between different subjects.
- Correlated t-test: Within subjects or matched.
Metric used: t-statistic (critical values based on degrees of freedom and alpha).
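For the independent case, the t-statistic can be computed by hand. The sketch below assumes the pooled-variance (Student's) form of the test with equal variances in both groups. The notes do not specify a variant, and the data are hypothetical.

```python
import math
import statistics

# Hypothetical scores from two different groups of subjects.
group_a = [6, 7, 8, 9, 10]
group_b = [4, 5, 6, 7, 8]

n1, n2 = len(group_a), len(group_b)
mean_diff = statistics.mean(group_a) - statistics.mean(group_b)

# Pooled variance from the two sample variances (n - 1 divisor).
pooled_var = (
    (n1 - 1) * statistics.variance(group_a)
    + (n2 - 1) * statistics.variance(group_b)
) / (n1 + n2 - 2)

# Standard error of the difference between the two means.
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_stat = mean_diff / se_diff
df = n1 + n2 - 2  # degrees of freedom

print(f"t({df}) = {t_stat:.3f}")
```

The resulting t is then compared against the critical value for the chosen alpha and these degrees of freedom.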
Analysis of Variance (ANOVA)
Definition: Used for comparing more than two groups; types include:
- Between Subjects: Different participants in each group.
- Within Subjects: Same participants across conditions (repeated measures).
- One-way ANOVA: One independent variable.
- Factorial ANOVA: Multiple independent variables.
Metric used: F-statistic.
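The F-statistic for a one-way, between-subjects ANOVA is the ratio of between-group variance to within-group variance. A minimal hand computation on three hypothetical groups might look like this.

```python
import statistics

# Hypothetical scores for three independent groups.
groups = [
    [1, 2, 3],
    [2, 3, 4],
    [5, 6, 7],
]

all_scores = [x for g in groups for x in g]
grand_mean = statistics.mean(all_scores)

# Between-groups sum of squares: group means around the grand mean.
ss_between = sum(
    len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
)

# Within-groups sum of squares: scores around their own group mean.
ss_within = sum(
    (x - statistics.mean(g)) ** 2 for g in groups for x in g
)

df_between = len(groups) - 1               # k - 1
df_within = len(all_scores) - len(groups)  # N - k

ms_between = ss_between / df_between
ms_within = ss_within / df_within

f_stat = ms_between / ms_within
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```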
Meta-Analysis
Definition: Involves statistical averaging of results from multiple independent studies evaluating the same phenomenon.
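As a rough sketch of the averaging idea, effect sizes from several studies can be combined into a weighted mean. Weighting by sample size is one simple scheme, chosen here as an assumption. The notes do not specify a weighting method, and the study values below are invented.

```python
# Hypothetical effect sizes and sample sizes from three independent studies.
studies = [
    {"effect": 0.2, "n": 100},
    {"effect": 0.5, "n": 50},
    {"effect": 0.8, "n": 50},
]

total_n = sum(s["n"] for s in studies)

# Sample-size-weighted average: larger studies count for more.
weighted_effect = sum(s["effect"] * s["n"] for s in studies) / total_n

# Simple unweighted average, for comparison.
unweighted_effect = sum(s["effect"] for s in studies) / len(studies)

print(f"Weighted mean effect:   {weighted_effect:.3f}")
print(f"Unweighted mean effect: {unweighted_effect:.3f}")
```

Here the largest study reports the smallest effect, so the weighted average falls below the unweighted one.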