A collection of vocabulary flashcards based on descriptive statistics and the introduction to R programming.
Descriptive Statistics
Statistical methods used to summarize, organize, and describe the main features of a dataset, such as central tendency (mean, median, mode) and variability (range, standard deviation).
Univariate
A type of statistical analysis focusing on describing or summarizing the characteristics of a single variable at a time, often using measures like frequency distributions, mean, or standard deviation.
Bivariate
A type of statistical analysis that examines the relationship or association between two variables simultaneously, often looking at how changes in one variable correspond to changes in another.
Crosstabs
Also known as contingency tables, crosstabs are a method for displaying and summarizing the relationship between two categorical variables by counting the number of observations that fall into each combination of categories.
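A contingency table can be tallied in a few lines. This is a minimal Python sketch using made-up survey data and the standard-library `Counter` (in R, the equivalent is `table()`):

```python
from collections import Counter

# Hypothetical survey data: two categorical variables per respondent.
smoker   = ["yes", "no", "no", "yes", "no", "no", "yes", "no"]
exercise = ["low", "high", "high", "low", "low", "high", "low", "high"]

# Count the observations falling into each combination of categories.
table = Counter(zip(smoker, exercise))

print(table[("yes", "low")])   # smokers with low exercise
print(table[("no", "high")])   # non-smokers with high exercise
```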
Covariance
A measure indicating the extent to which two variables change together. A positive covariance means they tend to increase or decrease simultaneously, while a negative covariance means one increases as the other decreases. Its magnitude is influenced by the scales of the variables.
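A quick sketch with invented height/weight values shows both the sign of covariance and its scale dependence (R's equivalent is `cov()`):

```python
# Sample covariance computed by hand; the data are made-up illustration values.
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    n = len(xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

height_m  = [1.5, 1.6, 1.7, 1.8, 1.9]
weight_kg = [55, 60, 66, 72, 80]

print(cov(height_m, weight_kg))   # positive: taller tends to go with heavier

# Rescaling a variable rescales the covariance by the same factor:
height_cm = [h * 100 for h in height_m]
print(cov(height_cm, weight_kg))  # 100x larger, same underlying relationship
```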
Correlation
A standardized statistical measure that quantifies the strength and direction of a linear relationship between two variables. Unlike covariance, correlation is scaled to a range (e.g., -1 to +1) making it easier to interpret the strength of the relationship regardless of the variables' units.
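Continuing the same invented height/weight example: rescaling a variable changes the covariance but leaves r untouched (R's `cor()` behaves the same way):

```python
# Pearson's r computed by hand; the height/weight numbers are made up.
def mean(xs):
    return sum(xs) / len(xs)

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

height_m  = [1.5, 1.6, 1.7, 1.8, 1.9]
weight_kg = [55, 60, 66, 72, 80]

r_m  = pearson_r(height_m, weight_kg)
r_cm = pearson_r([h * 100 for h in height_m], weight_kg)  # metres -> centimetres
print(r_m, r_cm)  # same value either way, always between -1 and +1
```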
Pearson’s Correlation (r)
A specific type of correlation coefficient that measures the strength and direction of the linear relationship between two quantitative variables. It requires both variables to be interval or ratio scale and assumes a linear relationship.
Assumptions of Pearson’s r
These are specific conditions that must be met for the Pearson's correlation coefficient to be a valid and reliable measure of a linear relationship. Key assumptions include: 1) both variables are quantitative (interval or ratio scale), 2) the relationship is linear, 3) there is homoscedasticity, and 4) bivariate normality (or sufficiently large sample size).
Homoscedasticity
A statistical assumption, particularly important for regression and correlation, where the variance (or spread) of the residuals (the differences between observed and predicted values) is approximately constant across all levels of the independent variable. In the context of Pearson's r, it means the variability in one variable is similar for all values of the other variable.
Bivariate Normality
An assumption that for any given value of one variable, the values of the other variable are normally distributed, and vice versa. It implies that the joint distribution of the two variables forms a 3D bell shape, which is often desirable for parametric tests involving two variables.
Spearman’s Correlation (ρ)
A nonparametric measure of correlation that assesses the strength and direction of the monotonic (not necessarily linear) relationship between two variables. It is often used when variables are ordinal, or when the assumptions for Pearson's r are violated for interval/ratio variables.
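A sketch with a cubic (monotonic but non-linear) relationship shows why the two coefficients differ; Spearman's ρ is just Pearson's r applied to the ranks (in R: `cor(x, y, method = "spearman")`):

```python
# Spearman's rho = Pearson's r on ranks; the data below have no ties.
def mean(xs):
    return sum(xs) / len(xs)

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def ranks(xs):
    # Assign rank positions 1..n (no tie handling needed here).
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

def spearman_rho(xs, ys):
    return pearson_r(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]    # monotonic but curved, not linear

print(spearman_rho(x, y))  # 1.0: perfect monotonic association
print(pearson_r(x, y))     # below 1: the linearity is imperfect
```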
Monotonic Relationship
A type of relationship between two variables where as one variable increases, the other variable either consistently increases (monotonically increasing) or consistently decreases (monotonically decreasing), but not necessarily at a constant rate (i.e., not strictly linear).
Anscombe’s Quartet
A set of four distinct pairs of datasets that each have nearly identical simple descriptive statistics (e.g., mean, variance, correlation coefficient, linear regression line), but when plotted graphically, they exhibit vastly different distributions and relationships between the variables. It famously illustrates the importance of visualizing data before relying solely on summary statistics.
• Each dataset contains 11 data points and 2 variables, X and Y
• Each dataset has nearly identical descriptive stats and correlations
• BUT the datasets have vastly different scatterplots
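The point can be checked numerically with the first two of Anscombe's datasets (the published values, also built into R as the `anscombe` data frame):

```python
# Anscombe's datasets I and II: same x, very different y patterns.
def mean(xs):
    return sum(xs) / len(xs)

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]  # I: linear cloud
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]   # II: a curve

# Nearly identical summary statistics...
print(mean(y1), mean(y2))                  # both about 7.50
print(pearson_r(x, y1), pearson_r(x, y2))  # both about 0.816
# ...yet a scatterplot of II shows a parabola, not a linear trend.
```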
Range Restriction
A statistical phenomenon that occurs when the variability of scores for one or both variables in a correlation analysis is artificially limited compared to the true population variability. This often leads to an underestimation of the true correlation coefficient, as the full range of the relationship is not observed.
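A small simulation with fixed, made-up noise illustrates the effect: restricting x to its upper range shrinks the observed correlation:

```python
# Range restriction demo; the "noise" values are fixed so the result is deterministic.
def mean(xs):
    return sum(xs) / len(xs)

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

x = list(range(1, 21))
noise = [1.2, -0.8, 0.5, -1.5, 0.9, -0.3, 1.8, -1.1, 0.2, -0.6,
         1.4, -1.9, 0.7, -0.2, 1.1, -1.4, 0.3, -0.9, 1.6, -0.5]
y = [xi + ni for xi, ni in zip(x, noise)]

r_full = pearson_r(x, y)

# Artificially restrict the range: keep only cases with x >= 15.
kept = [(xi, yi) for xi, yi in zip(x, y) if xi >= 15]
r_restricted = pearson_r([p[0] for p in kept], [p[1] for p in kept])

print(r_full, r_restricted)  # the restricted r is noticeably smaller
```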
Spearman’s ρ vs. Pearson’s r
Use Spearman’s ρ instead of Pearson’s r when any of the following
are true about your data:
• The relationship is monotonic but not linear
• The data are at least ordinal or rank-ordered
• The variables have outliers that could distort Pearson’s r
• The assumptions of Pearson’s r aren’t met
• Takeaway: If your scatterplot isn’t a straight-line cloud or if your
data are ranks, ordinal, or contain outliers, check Spearman’s ρ—
it may give a truer measure of the association’s strength.
What’s a “Strong” Correlation?
• Depends on both the value of r and the research domain or
context in which it’s being used
Conventional and often cited guidelines in the behavioral and
social sciences:
• r ≈ |.10| is weak
• r ≈ |.30| is moderate
• r ≥ |.50| is strong
You find a correlation of r = .50 between two variables in your
dataset.
• Question: What factors do you need to consider to determine
whether this correlation is “strong” or “weak”?
Context and field of study:
A moderate correlation in one field might be considered strong in another. For example, a correlation of r = .50 could be a significant finding in the social sciences, where complex human behavior is involved.
Consequences of error:
The "strength" of a correlation also depends on how much error you can tolerate in your predictions.
Sample size:
With a large enough sample size, a correlation of r = .50 could be statistically significant and unlikely to be due to random chance.
Outliers:
Be aware of outliers in your data, as a single extreme data point can significantly affect the correlation coefficient.
Statistical significance:
While r = .50 is a moderate correlation, its statistical significance (p-value) determines the likelihood that this result is real and not due to random chance. A result with a very low p-value is more likely to be "strong" in a statistical sense.
You calculate a correlation of r = .82 between two variables. Your
colleague says, “That’s all we need to know — let’s skip the
scatterplot.”
Question: Why is it still important to examine the scatterplot, and
what could you learn from it that the correlation alone would not
reveal?
Examining the scatterplot is crucial because it provides visual insight into the relationship between the two variables. It can reveal outliers, non-linearity, heteroscedasticity, or departures from bivariate normality, any of which could affect the interpretation and validity of the correlation.
The Y variable in your correlation has low variability. Although this
variable was measured on a scale from 1 to 100, your dataset only
contains scores that range from 80 to 100.
• Question: How is this likely to affect the correlation, and why?
Low variability in the Y variable will likely lower the correlation coefficient, leading to an underestimation of the true relationship between the variables. This is because correlation measures how two variables change together, and if one variable doesn't change much, it's difficult to detect a strong pattern or relationship.