Correlation
strength and direction of the relationship between 2 variables
Univariate
summarizes 1 variable at a time
Bivariate
compares 2 variables (correlation & linear regression)
Multivariate
compares 2+ variables (cluster analysis, PCA, & correspondence analysis)
Models used for correlation
scatterplots & general linear model
What types of data for correlation?
continuous or ranked, normally distributed data
Pearson’s R
correlation coefficient that measures the strength and direction of linear relationships for normal, interval data (-1 to 0 to 1 scale)
beware outliers, must be linear
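As a sketch of how Pearson's R is computed (pure Python; the data points are made up for illustration):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance of x and y divided by the
    product of their standard deviations (ranges from -1 to 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Perfectly linear increasing data gives r = 1.0
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```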
Covariance
indicates the direction of a linear relationship
deviations
differences between observed values and the mean
standardization
converting variables to a common unit of measurement, using the standard deviation, so that they can be compared
standard deviation
standardized unit of measurement used to show how far away values are from the mean
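Standardization as described above amounts to converting values to z-scores; a minimal sketch with made-up data:

```python
import math

def standardize(values):
    """Convert values to z-scores: deviations from the mean,
    expressed in units of standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # population standard deviation
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

# Standardized values are centered on 0, in standard-deviation units
print(standardize([2, 4, 6, 8]))
```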
partial correlation
quantifies the relationship between two variables while controlling for the effect of a third variable
Correlation Tests
Pearson’s R
Spearman’s rho
Kendall’s tau
Spearman’s rho
correlation coefficient test used for non-parametric ranked data, but not good for tied ranks
Kendall’s tau
correlation coefficient test used for non-parametric ranked data and is better for tied ranks
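A sketch of Spearman's rho using the tie-free shortcut formula rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)), where d is the difference between the ranks of each pair (example data is made up and has no ties):

```python
def spearman_rho(xs, ys):
    """Spearman's rho for data with no tied ranks."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# A monotonic but non-linear relationship still gives rho = 1.0
print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))  # 1.0
```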
Linear Regression
predicts how changes in the independent variable influence the dependent variable, and quantifies the relationship between the two variables
Steps of Linear Regression
identify dependent and independent variables
plot cases in scatterplot
fit a line to the points
What types of data for correlation, linear regression, and PCA?
continuous (interval/ratio)
Slope intercept formula
y = mx + b
y = dependent variable
m = slope
x = independent variable
b = y-intercept
Least square regression line
line, determined through the method of least squares, that passes through the means of the x and y values and minimizes the distance between itself and the observed points
Method of Least Squares
used to estimate parameters of the model and the line that best fits the data
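The method of least squares reduces, for simple regression, to two closed-form estimates for the slope and intercept of y = mx + b. A minimal sketch (example data is made up):

```python
def least_squares_fit(xs, ys):
    """Estimate slope m and intercept b of y = mx + b by the method
    of least squares; the fitted line passes through (mean(x), mean(y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope: covariance of x and y over variance of x
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - m * mean_x
    return m, b

m, b = least_squares_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(m, b)  # 2.0 1.0  (the data lie exactly on y = 2x + 1)
```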
simple regression
outcome variable is predicted from 1 variable
multiple regression
outcome variable is predicted from multiple variables
residuals
like deviations, but for linear regression; assess model fit by looking at the differences between the observed values and the regression line
Coefficient of Determination
output for linear regression; percentage of variance in the dependent variable that can be explained by variance in the independent variable (0 to 1)
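The coefficient of determination can be computed from the residuals: R^2 = 1 - SS_res / SS_tot. A sketch with made-up observed values and predictions:

```python
def r_squared(ys, predictions):
    """Coefficient of determination: the share of variance in the
    dependent variable explained by the model's predictions (0 to 1)."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                   # total variation
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))   # unexplained variation
    return 1 - ss_res / ss_tot

# Observed values vs. predictions from a fitted line y = 2x + 1
print(round(r_squared([3, 5, 7, 10], [3, 5, 7, 9]), 3))  # 0.963
```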
ANOVA Regression
sum of squared differences (model fit)
significance for regression (p-value)
F-value/F-ratio (between-group variance divided by within-group variance; should be > 1)
R
degree of correlation
R²
percentage of the total variation that can be explained by the model
Why linear regression?
allows quantification of the relationship between 2 variables
allows you to make predictions from the independent variable alone, based on a known relationship
explore exceptional cases
Cluster Analysis
groups sets of objects so that objects in the same group (cluster) are more similar to each other than to objects in other groups (can use all data types)
Methods of Cluster Analysis
Hierarchical Cluster Analysis
Non-Hierarchical Cluster Analysis
Hierarchical Cluster Analysis
clusters are formed at every step of the process; as each new case is entered, it is grouped into a progressively larger cluster, and the output is not sorted in a linear order
Dendrogram
tree branch plot used in cluster analysis
Steps for Cluster Analysis
choose variables you want
decide whether to use raw or standardized variables
choose coefficient that quantifies the similarity or dissimilarity between all cases
select method for forming clusters
Cluster Analysis Coefficients
Euclidean Distance
City Block Metrics
Jaccard Coefficient
Simple Matching Coefficient
Euclidean Distance
coefficient method for cluster analysis that uses the Pythagorean theorem to measure the straight-line distance from one point to another; for continuous data
City Block Metric
coefficient method for cluster analysis that makes an x, y grid and moves along one axis at a time, counting how many units apart two points are; for large continuous data sets
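The two continuous-data coefficients above are short formulas; a sketch with made-up points:

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points (Pythagorean theorem)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def city_block(p, q):
    """City-block (Manhattan) distance: move along one axis at a time."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((0, 0), (3, 4)))   # 5.0  (the 3-4-5 triangle)
print(city_block((0, 0), (3, 4)))  # 7
```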
Jaccard Coefficient
coefficient method for cluster analysis that excludes negative (joint-absence) matches from the comparison; best for presence/absence data
Simple Matching Coefficient
coefficient method for cluster analysis that counts negative (joint-absence) matches as weighing the same as positive matches; best for presence/absence data
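A sketch of both presence/absence coefficients, showing how differently they treat joint absences (the 0/1 vectors are made up):

```python
def jaccard(a, b):
    """Jaccard similarity for 0/1 data: joint absences (0,0) are ignored."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either

def simple_matching(a, b):
    """Simple matching: joint absences count the same as joint presences."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

a = [1, 1, 0, 0, 0]
b = [1, 0, 0, 0, 0]
print(jaccard(a, b))          # 0.5  (three shared absences ignored)
print(simple_matching(a, b))  # 0.8  (shared absences count as matches)
```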
Methods for forming clusters
Simple Linkage
Average Linkage
Complete Linkage
Ward’s Procedure
Simple Linkage
method for forming clusters in cluster analysis (also called single linkage) that links clusters based on the shortest distance between their closest members
Average Linkage
method for forming clusters in cluster analysis that links clusters based on the average distance between the members of each cluster
Complete Linkage
method for forming clusters in cluster analysis that links clusters based on the furthest distance between the members of each cluster
Ward’s Procedure
method for forming clusters in cluster analysis that merges the pair of clusters producing the smallest increase in within-cluster variance; best for quantitative data
Types of Hierarchical Clustering
Agglomerative
Divisive
Agglomerative Hierarchical Clustering
uses a bottom-up approach that repeatedly merges clusters into larger ones until a single cluster remains (similarity based on proximity, e.g. Euclidean distance); more commonly used because it is easier to implement
Divisive Hierarchical Clustering
uses a top-down approach that starts from one all-inclusive group and repeatedly splits it into smaller clusters; better for large data sets
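The agglomerative (bottom-up) process with single linkage can be sketched in a few lines: start with every point as its own cluster and repeatedly merge the closest pair. The points below are made up, and real implementations use far more efficient algorithms than this O(n^3) loop:

```python
import math

def single_linkage_merges(points):
    """Agglomerative clustering sketch (single linkage): repeatedly
    merge the two clusters whose closest members are nearest, using
    Euclidean distance, until one cluster remains. Returns the merge
    history (the information a dendrogram plots)."""
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(p, q)
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return history

# Two tight pairs far apart: each pair merges first, then the two groups.
merges = single_linkage_merges([(0, 0), (0, 1), (10, 0), (10, 1)])
for left, right, d in merges:
    print(left, right, round(d, 2))
```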
Rules for how Divisive clusters are formed
Monothetic
Polythetic
Monothetic
rule for forming divisive clusters where decisions are made based on one variable at a time
Polythetic
rule for forming divisive clusters where clusters are made based on multiple variables at a time
Non-Hierarchical Cluster Analysis
uses some measure to evaluate whether or not a case should be in a cluster, merging or splitting clusters instead of arranging them in a hierarchical order; better for small data sets
Simple K-means
partitions cases into non-overlapping groups (clusters) that have no hierarchical relationships between them
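A naive k-means sketch (made-up 2-D points; centroids are seeded with the first k points for simplicity, whereas real implementations seed more carefully):

```python
import math

def k_means(points, k, iterations=10):
    """Naive k-means: seed centroids with the first k points, assign
    each point to its nearest centroid, move each centroid to the mean
    of its assigned points, and repeat."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated groups of three points each
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))  # centroids settle at the two group means
```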
Principal Component Analysis
a way to identify clusters of variables by reducing the number of dimensions in large datasets down to its principal components, which still retain most of the variation in the original data (continuous); tries to explain the maximum amount of total variance in a correlation matrix by transforming the original variables into a smaller set of linear components (correlation & variance)
Variance
spread of data
Principal Components
lines that describe the relationship between variables; predicted from the measured variables
R matrix
table that arranges the correlation between each pair of variables
What type(s) of data for correspondence analysis?
categorical/count
Steps of PCA
cases are plotted in multi-dimensional space
find where the data is most spread out
identify where the center of the spread of points is
Criteria of a Principal Component Line
must pass through center of data
must pass through spread of data along the axis that will capture the most variation
all consecutive lines must be drawn at right angles to the prior line
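For 2-D data the whole PCA process can be sketched directly: center the data, build the 2x2 covariance matrix, and solve for its eigenvalues analytically (each eigenvalue is the variance along one component). The points below are made up:

```python
import math

def pca_2d(points):
    """PCA sketch for 2-D data. Returns the eigenvalues of the
    covariance matrix (largest first); the larger eigenvalue is the
    variance captured by the first principal component."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # eigenvalues of the symmetric matrix [[cxx, cxy], [cxy, cyy]]
    mean = (cxx + cyy) / 2
    delta = math.sqrt(((cxx - cyy) / 2) ** 2 + cxy ** 2)
    return mean + delta, mean - delta

# Points lying exactly on y = x: all variance falls on one component
lam1, lam2 = pca_2d([(1, 1), (2, 2), (3, 3), (4, 4)])
print(lam1, lam2)  # 2.5 0.0
```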
Evaluating PCA/Eigenvectors for significance/meaning
eigenvalues
scree plot
component loading
What kind of plot for correlation, linear regression, PCA, and correspondence analysis?
scatterplots
Eigenvalues
values derived from eigenvectors that measure the distance from one end of the matrix to another in order to understand the distribution of variance (how much variation along that dimension of the data is described; components with eigenvalues > 1 are typically retained)
Eigenvector
measures height and width of ellipse encompassing the data in the scatterplot
Scree Plot
plot of eigenvalues by component; the "elbow" marks where most of the variance has been accounted for and the eigenvalues level off
Component Loading
breaks down principal components to understand correlations between original variables and unit-scaled components (larger loading = stronger correlation)
Correspondence Analysis
plots (scatterplots/biplots) different categories in multidimensional space and reduces it to 2-dimensional space to understand which categories are similar and why (expected vs. observed)
Residual
like error for correspondence analysis; describes how the relationship between 2 dependent variables is influenced by individual differences in participants' performance (difference between model prediction and observed value)
Inertia
degree to which values of rows and columns correspond to each other in correspondence analysis (chi-square/n)
Steps of Correspondence Analysis
compute averages for each row and column
compute the expected values for each cell
compute residuals for each cell
divide the residuals by the expected values
plot indexed residuals in 2 dimensions
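The first four steps above can be sketched for a small contingency table (the counts are made up; the final 2-D plotting step is omitted):

```python
def indexed_residuals(table):
    """First steps of correspondence analysis on a contingency table:
    expected cell values from row/column totals, then indexed residuals
    (observed - expected) / expected for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # expected count under independence: row total * column total / grand total
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    indexed = [
        [(obs - exp) / exp for obs, exp in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    return expected, indexed

# Hypothetical 2x2 count table
expected, indexed = indexed_residuals([[20, 10], [10, 20]])
print(expected)  # every cell expects 15.0 under independence
print(indexed)   # positive where observed > expected, negative otherwise
```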