PSYCHOLOGICAL STATISTICS

Introduction to Psychological Statistics

Overview of Psychological Statistics

Psychological statistics provides the language of data, distributions, and inference.
Key concepts include probability and error. In statistics, "error" does not mean a mistake; it refers to the uncertainty inherent in data.
The goal is to attach probabilities to inferences, which is the basis of statistical analysis.

Key Concepts in Statistics

Population vs. Sample

Population: The totality of units or individuals with which we are concerned in research.
Parameter: A value that describes a population.
Sample: Any portion of the population selected for analysis.
Statistic: A value that describes a sample.
Note: Psychology almost always relies on samples because measuring an entire population is rarely practical.

Sampling

Random Sampling

Random Sampling: A method of taking samples in such a way that every unit in the population has an equal chance of being included.
This method aims to minimize bias and makes it more likely that the sample accurately represents the population.
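A minimal sketch of simple random sampling, assuming a hypothetical population of 1,000 student ID numbers. The population, the sample size of 50, and the fixed seed are illustrative choices, not part of the notes.

```python
import random

# Hypothetical population: ID numbers for 1,000 students.
population = list(range(1, 1001))

# A fixed seed makes the example reproducible; omit it in real use.
rng = random.Random(42)

# Draw 50 units without replacement. Every unit has the same
# chance of being included in the sample.
sample = rng.sample(population, k=50)

print(len(sample), sample[:5])
```

Because the draw is without replacement, no unit can appear in the sample twice.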

Types of Statistics

Descriptive Statistics

Definition: Organizes, summarizes, and simplifies data for easier understanding.
Uses: Presentation of data and describing data to make predictions.

Inferential Statistics

Definition: Generalizes findings from samples to populations, including hypothesis testing and studying relationships among variables.

Levels (or Scales) of Measurement

Variables may be measured on one of four scales; the scale determines the type of statistics that can be used and the conclusions that can be drawn:
  1. Nominal
  2. Ordinal
  3. Interval
  4. Ratio
Importance of understanding scales: They determine the appropriate statistical techniques and inferential methods.

Scales of Measurement for Qualitative Data

Nominal Scale

Definition: Consists of non-ordered categorical responses without a specific continuum.
Examples: Mood, major, gender.

Ordinal Scale

Definition: Comprises ordered categorical responses that exist on a continuum ranging from low to high, but the intervals are not necessarily equal.
Examples: Anxiety ratings, rank order descriptions.

Scales of Measurement for Quantitative Data

Interval Scale

Definition: Involves numerical responses that are equally spaced, but do not have a true zero point.
Examples: Temperature in Celsius or Fahrenheit, Likert scale ratings (1-5; 1-7), which are commonly treated as interval although strictly ordinal.

Ratio Scale

Definition: Like the interval scale but has a true zero point, allowing for meaningful ratio comparisons.
Examples: Reaction time, accuracy, height, weight.

Frequency Distributions

Definition: Describes the number of subjects falling into particular categories, condensing raw data into a count for each category.

Cross-Tabulation

Definition: A method of categorizing data based on two or more variables.
Example table for political sub-groups:
  - Democrats: 24
  - Republicans: 1
  - Total: 25
Frequency percentage can be calculated as:
$\text{Frequency Percentage} = \frac{\text{Sub-group total}}{\text{Overall total}} \times 100$
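As a worked illustration, the frequency percentage for the two political sub-groups in the example table can be computed as follows. The function name `frequency_percentage` is hypothetical and chosen only for this sketch.

```python
# Counts from the example table above.
counts = {"Democrats": 24, "Republicans": 1}
overall_total = sum(counts.values())  # 25


def frequency_percentage(subgroup_total, overall_total):
    """Sub-group total divided by overall total, times 100."""
    return subgroup_total / overall_total * 100


percentages = {
    group: frequency_percentage(n, overall_total)
    for group, n in counts.items()
}

for group, pct in percentages.items():
    print(f"{group}: {pct:.1f}%")
# Democrats: 96.0%
# Republicans: 4.0%
```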

Data Visualization

Bar Graphs

Rule: Appropriate for nominal scales of measurement.
Show frequency distributions across different categories (e.g., college major).

Histograms

Rule: Used for interval or ratio (quantitative) data.
Visually represent distribution of numerical data using bars, emphasizing the frequency of different ranges.

Polygons (Line Graphs)

Used to illustrate frequency distributions over a range of values.

The Normal Distribution

Characteristic: A symmetrical, bell-shaped distribution in which about 68% of the data fall within one standard deviation of the mean and about 95% fall within two.
Commonly used to model variables such as body temperature, IQ scores, and height.
Not all distributions are normal; other shapes include J-shaped, rectangular, and bimodal distributions.
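The 68% and 95% figures can be checked against the standard normal distribution. This is a small sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

# Proportion of the distribution within 1 and 2 standard deviations of the mean.
within_one_sd = z.cdf(1) - z.cdf(-1)
within_two_sd = z.cdf(2) - z.cdf(-2)

print(f"{within_one_sd:.4f}")  # 0.6827
print(f"{within_two_sd:.4f}")  # 0.9545
```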

Measures of Central Tendency

Definitions

Mode: The most frequently occurring value in a data set.
Median: The middle value when data is ordered, providing a central point dividing the data set into two equal halves.
Mean: The arithmetic average, calculated as:
$\text{Mean} (\bar{x}) = \frac{\text{Sum of all observations}}{\text{Total number of observations}}$
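A short sketch of the three measures on a made-up set of five scores (the data are hypothetical), using Python's standard-library `statistics` module:

```python
from statistics import mean, median, mode

# Hypothetical scores, used only for illustration.
scores = [2, 3, 3, 5, 7]

score_mode = mode(scores)      # most frequent value: 3 appears twice
score_median = median(scores)  # middle value of the ordered data
score_mean = mean(scores)      # sum of observations (20) / number of observations (5)

print(score_mode, score_median, score_mean)
```

Here the mean is pulled above the median by the largest score, which shows how the mean is more sensitive to extreme values than the median or mode.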

Notes on Usage

For quantitative data, all three measures are applicable.
For qualitative data, the mode is always appropriate; the median can also be used for ordinal data; the mean is not valid.

Variability Measures

1. Range

Defined as the distance from the lowest value to the highest value.
Note: Considers only two data points.

2. Variance ($s^2$)

Definition: The average of the squared deviations from the mean.
Calculation involves utilizing all data points:
$s^2 = \frac{\sum (x - \bar{x})^2}{N}$

3. Standard Deviation (SD)

The square root of the variance, representing the typical distance of data points from the mean, expressed in the original units of measurement.
Calculated as:
$s = \sqrt{s^2}$

4. Standard Error of the Mean (SEM)

Definition: Standard deviation divided by the square root of the sample size:
$\text{SEM} = \frac{s}{\sqrt{n}}$
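The four variability measures can be sketched on one hypothetical data set. The variance below divides by N, following the formula given in these notes. Many textbooks and software packages divide by n - 1 when estimating from a sample (Python's `statistics.variance` does, while `statistics.pvariance` divides by N).

```python
import math
from statistics import mean

# Hypothetical scores, used only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
m = mean(scores)  # 5

# 1. Range: distance from the lowest to the highest value.
value_range = max(scores) - min(scores)

# 2. Variance: average squared deviation from the mean (dividing by N).
variance = sum((x - m) ** 2 for x in scores) / n

# 3. Standard deviation: square root of the variance.
sd = math.sqrt(variance)

# 4. Standard error of the mean: SD divided by the square root of n.
sem = sd / math.sqrt(n)

print(value_range, variance, sd, round(sem, 4))
# 7 4.0 2.0 0.7071
```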

Sampling Error

Definition: Refers to variability among samples that occurs by chance, not reflecting true population characteristics.
Key Questions to Consider:
1. Are the observed differences real or merely due to sampling error?
2. Are our inferences about the population valid?
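Sampling error can be made concrete with a small simulation. The sketch below assumes a hypothetical population of 10,000 scores with mean 100 and standard deviation 15, then draws 1,000 random samples of 25. All of these numbers are illustrative assumptions.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed for reproducibility

# Hypothetical population of 10,000 scores (mean 100, SD 15).
population = [rng.gauss(100, 15) for _ in range(10_000)]
population_mean = mean(population)

# Draw 1,000 random samples of size 25 and record each sample mean.
sample_means = [mean(rng.sample(population, k=25)) for _ in range(1_000)]

# The sample means differ from one another and from the population
# mean purely by chance; that variation is sampling error.
print(round(population_mean, 2))
print(round(min(sample_means), 2), round(max(sample_means), 2))

# The spread of the sample means should be close to SD / sqrt(n) = 15 / 5 = 3.
spread_of_means = stdev(sample_means)
print(round(spread_of_means, 2))
```

No treatment was applied to any sample here, so every difference between sample means is due to chance alone. This is why an observed difference between two groups needs a statistical test before it is treated as real.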

Hypothesis Testing

Null Hypothesis (H₀)

Definition: Predicts no differences between means; notation: $H₀: m₁ = m₂$.
Core principle: The null hypothesis is the hypothesis that is directly tested; the data lead either to rejecting it or to failing to reject it.

Alternative Hypothesis (H₁)

Definition: Predicts differences between groups exist; notation: $H₁: m₁ \neq m₂$.

Error Types in Hypothesis Testing

Type I Error (Alpha)

Definition: Occurs when the null hypothesis is rejected when it is actually true.
The probability of a Type I error is alpha, the significance level, commonly set at 0.05.

Type II Error (Beta)

Definition: Occurs when the null hypothesis is not rejected even though it is false, meaning real differences exist but are overlooked. Its probability is beta.

Power Analysis

Definition: The probability of correctly rejecting the null hypothesis when it is false; the ability to detect an effect if one exists.
Strategies to increase power:
  1. Increase sample size (n).
  2. Decrease variability among sample measurements.
  3. Use more precise instruments for measurement.
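The effect of sample size on power can be sketched with a normal approximation. The function below is a hypothetical helper, not a method from these notes: it assumes a two-sided test comparing two equal-sized groups, treats the test statistic as normally distributed (a z-test rather than a t-test), and takes the effect as a standardized mean difference.

```python
import math
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test.

    effect_size is the true mean difference divided by the common
    standard deviation (an assumption of this sketch).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * math.sqrt(n_per_group / 2)
    # Probability that the statistic lands in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


for n in (10, 20, 64, 100):
    print(n, round(approx_power(0.5, n), 3))
```

Power rises as n grows. Reducing variability or measuring more precisely raises the standardized effect size, which increases power in the same way.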

Effect Size

Definition: A measure of the magnitude of differences attributed to the treatment, distinguishing practical significance from statistical significance.
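One common effect size measure for two groups is Cohen's d, the mean difference divided by the pooled standard deviation. The notes do not name a specific measure, so treat this as one illustrative choice; the data and function name are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance uses the n - 1 denominator.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical scores for a treatment and a control group.
treatment = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]

d = cohens_d(treatment, control)
print(round(d, 3))  # 1.265
```

Unlike a p-value, d does not grow with sample size, which is why it speaks to practical rather than statistical significance.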

Tools for Testing Mean Differences

T-Test

Definition: Used when comparing only two groups; types include:
- Independent T-Test: Between different subjects.
- Correlated T-Test: Within subjects or matched.
Metric used: t-statistic (critical values based on degrees of freedom and alpha).
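A minimal sketch of the independent t-test computed by hand, assuming equal variances in the two groups (the pooled-variance form). The data and the function name `independent_t` are hypothetical.

```python
import math
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Return (t, df) for an independent-groups t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / se_diff
    df = n_a + n_b - 2
    return t, df


# Hypothetical scores from two different groups of participants.
group_1 = [5, 6, 7, 8, 9]
group_2 = [3, 4, 5, 6, 7]

t_stat, df = independent_t(group_1, group_2)
print(round(t_stat, 3), df)  # 2.0 8
```

The obtained t is then compared with the critical value for the given degrees of freedom and alpha. For df = 8 and a two-tailed alpha of 0.05, standard t tables give about 2.306, so a t of 2.0 would not lead to rejecting the null hypothesis.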

Analysis of Variance (ANOVA)

Definition: Used for comparing more than two groups; types include:
  - Between Subjects: Different participants in each group.
  - Within Subjects: Same participants across conditions (repeated measures).
  - One-way ANOVA: One independent variable.
  - Factorial ANOVA: Multiple independent variables.
Metric used: F-statistic.
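The F-statistic for a one-way, between-subjects ANOVA is the ratio of between-group variance to within-group variance. The sketch below computes it by hand for three hypothetical groups; the function name and data are illustrative only.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three groups of different participants.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f_stat, df_between, df_within = one_way_anova_f(groups)
print(round(f_stat, 3), df_between, df_within)  # 13.0 2 6
```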

Meta-Analysis

Definition: Involves statistical averaging of results from multiple independent studies evaluating the same phenomenon.
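The idea of averaging results across studies can be sketched as a sample-size-weighted mean of effect sizes. This is a simplification: published meta-analyses typically weight each study by the inverse of its variance. The study values below are hypothetical.

```python
# Hypothetical effect sizes and sample sizes from three independent studies.
studies = [
    {"effect_size": 0.20, "n": 100},
    {"effect_size": 0.50, "n": 50},
    {"effect_size": 0.80, "n": 50},
]


def weighted_mean_effect(studies):
    """Average effect size, weighting each study by its sample size."""
    total_n = sum(s["n"] for s in studies)
    return sum(s["effect_size"] * s["n"] for s in studies) / total_n


combined = weighted_mean_effect(studies)
print(round(combined, 3))  # 0.425
```

Weighting by sample size gives larger studies more influence, so the combined estimate (0.425) sits closer to the largest study's effect than the unweighted average of 0.50 would.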