Stats 301 Exam 1 Review

Defining Statistics

Set of tools & techniques used to describe, organize, and interpret data


Goals of science & how stats helps achieve those

Stats helps describe, predict, and explain data


Descriptive Stats

Organize and describe data


Inferential Stats

Infer (guess) something about a larger group (population) from smaller groups (sample)


What is a sample?

A portion or subset OF the population


What is a population?

The overarching group you are studying (large)


What is a variable in stats?

Something that can change (vary) or have different values for different individuals

EX: Age, Major, etc


What is data in stats?

Information collected from the sample on the variables we are interested in (actual numbers & measurements & characteristics)

EX: Engineering, psych, business OR 18,19,20,21, etc


What is continuous data?

variables that can assume any value along some underlying continuum.

EX: height, weight, time


What is categorical data?

a variable that can take on one of a limited, usually fixed, number of possible values.

EX: political affiliation, marital status, and education level


What is central tendency?

a statistical measure that identifies a single typical (average) value in a data distribution

EX: mean, median, and mode


What is the mean and how do you calculate it?

The AVERAGE of the data

most sensitive to outliers
best used when there are NO extreme values in the data set

How to calculate:

x bar = sum of x over n
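
As a quick illustration of the formula above, here is a minimal Python sketch. The scores are made up for the example; it computes the mean by hand and shows how one extreme value pulls it upward.

```python
# Hypothetical scores, invented for illustration only.
scores = [4, 5, 6, 5, 5]

# x bar = sum of x over n
mean = sum(scores) / len(scores)

# Adding a single extreme value drags the mean upward.
scores_with_outlier = scores + [35]
mean_with_outlier = sum(scores_with_outlier) / len(scores_with_outlier)

print(mean)               # 5.0
print(mean_with_outlier)  # 10.0
```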


What is the median and how do you calculate it?

The MIDDLE number in a data set

NOT sensitive to extreme values
Use when extreme values ARE present

How to calculate:

Put data in numerical order
If an odd number of values, find the value in the center

If even number of values, find the two values in the center, add them, and divide by 2.
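
The steps above can be sketched in Python roughly as follows. The function name and the sample values are illustrative, and the sketch assumes a non-empty list of numbers.

```python
def median(values):
    """Return the middle value of a non-empty list of numbers."""
    ordered = sorted(values)   # put the data in numerical order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: take the one in the center.
        return ordered[mid]
    # Even number of values: add the two center values and divide by 2.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))       # 3
print(median([7, 1, 3, 100]))  # 5.0
```

Note that the extreme value 100 barely moves the median, which is why the median is preferred when extreme values are present.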


What is the mode, and how do you calculate it?

The MOST FREQUENT occurring value in the data set

typically used in CATEGORICAL data
you CAN have more than one in a data set (bimodal or multimodal)
LEAST precise and LEAST affected by extreme values

How to calculate:

put values in numerical order
identify the value that occurs MOST often
if two or more values tie for most frequent, they are ALL modes of the data set
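
One way to sketch this in Python is to count each value and keep every value tied for the top count. The helper name `modes` and the example data are assumptions for illustration. Counting works for categorical data too, where sorting is not meaningful.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes(["psych", "business", "psych", "engineering"]))  # ['psych']
print(modes([18, 19, 19, 20, 20]))                           # [19, 20]
```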


When to use which measure of central tendency?

3 Rules

Use mode when data is CATEGORICAL
Use mean when the data is CONTINUOUS and NO outliers
Use median when the data is CONTINUOUS and you think the mean is misleading because of extreme scores

When in doubt, report BOTH!


How do extreme values affect the mean, median, and mode?

Mean = DON’T use when there are extreme values
Median = can use when there are extreme values
Mode = can use when there are extreme values


What is the measure of Variability?

Tells us how DIFFERENT the scores are from each other.

represent the spread or dispersion in the dataset


Why is variability important?

helps us understand the nature of our SAMPLE and the nature of our VARIABLES


What are the 3 measures of variability?

Range
Standard Deviation
Variance


What is range and how do we calculate it?

The DIFFERENCE between the highest and lowest score of a data set

only considers MOST EXTREME values
not very accurate

How to calculate:

Range = h - l (highest score minus lowest score)
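
A tiny Python sketch of the calculation, using made-up scores:

```python
# Hypothetical scores, invented for illustration only.
scores = [12, 15, 15, 16, 40]

# Range = highest score - lowest score
score_range = max(scores) - min(scores)

print(score_range)  # 28
```

Only the two most extreme scores (12 and 40) enter the calculation, which is why the range says little about how the other scores are spread.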


What is standard deviation and how do we calculate it?

The AVERAGE distance of scores from the MEAN

The most commonly used measure of variability
A SMALLER standard deviation means scores are closer to the mean
A LARGER standard deviation means scores are further away from the mean

How to calculate:

(x - xbar) = a single deviation (one score minus the mean)

Sigma (x - xbar) squared = the sum of ALL squared deviations
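
The formula image for this card did not survive, so the following Python sketch is an assumption about the intended calculation. It uses the common sample formula, s = square root of [Sigma (x - xbar) squared / (n - 1)], and also shows the divide-by-n version in case the course uses that instead. The scores are made up.

```python
import math

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
mean = sum(scores) / n                     # x bar

deviations = [x - mean for x in scores]    # each (x - xbar)
sum_sq = sum(d ** 2 for d in deviations)   # Sigma (x - xbar) squared

sample_sd = math.sqrt(sum_sq / (n - 1))    # assumes the n - 1 (sample) formula
population_sd = math.sqrt(sum_sq / n)      # divide-by-n alternative

print(sum_sq)               # 32.0
print(round(sample_sd, 3))  # 2.138
print(population_sd)        # 2.0
```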


What is variance and how do we calculate it?

The standard deviation SQUARED

rarely used to report descriptive stats
more used as a concept

How to calculate:

Variance = SD²
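
A short check of this relationship using Python's standard `statistics` module, with made-up scores. Both functions here use the sample (n - 1) versions, which is an assumption about the course's convention.

```python
import math
import statistics

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

sd = statistics.stdev(scores)           # sample standard deviation
variance = statistics.variance(scores)  # sample variance

# Squaring the SD gives the variance (up to floating-point rounding).
print(math.isclose(sd ** 2, variance))  # True
```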


What are the important Standard Deviation concepts?

By definition, the deviations from the mean always sum (and so average) to ZERO, whatever the shape of the distribution
Because of this, we must square the deviations
Values are squared so that they do NOT cancel each other out
SD is sensitive to extreme values
We take the square root to REVERT back to the original units
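
To see why squaring is needed, this small Python sketch uses made-up, deliberately lopsided scores. The raw deviations from the mean cancel out to zero, while the squared deviations do not.

```python
# Hypothetical, deliberately lopsided scores.
scores = [3, 8, 10, 19]
mean = sum(scores) / len(scores)            # 10.0

deviations = [x - mean for x in scores]     # [-7.0, -2.0, 0.0, 9.0]
sum_of_deviations = sum(deviations)
sum_of_squares = sum(d ** 2 for d in deviations)

print(sum_of_deviations)  # 0.0
print(sum_of_squares)     # 134.0
```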


What is an outlier or extreme value?

A data point that appears to deviate markedly from other data points in the sample


What is the rule of thumb for outliers and extreme values?

Anything more than two standard deviations away from the mean is a potential outlier.
Anything more than three standard deviations away from the mean is likely an outlier.


Formula to calculate outliers

x bar ± (c × s), where c = cutoff value (e.g. 2 or 3) and s = standard deviation
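
A minimal Python sketch of this cutoff rule follows. The function names and scores are hypothetical, and the sample standard deviation from the `statistics` module is assumed.

```python
import statistics

def outlier_bounds(values, cutoff):
    """Return (low, high) = x bar -/+ cutoff * s."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD; an assumption of this sketch
    return mean - cutoff * sd, mean + cutoff * sd

def flag_outliers(values, cutoff=2):
    """Return the values that fall outside the cutoff bounds."""
    low, high = outlier_bounds(values, cutoff)
    return [x for x in values if x < low or x > high]

# Hypothetical scores, invented for illustration only.
scores = [10, 11, 12, 12, 13, 13, 14, 15, 40]

print(flag_outliers(scores, cutoff=2))  # [40]
print(flag_outliers(scores, cutoff=3))  # []
```

With a cutoff of 2 the score 40 is flagged as a potential outlier, but it is not flagged with a cutoff of 3. This is partly because the extreme score itself inflates the standard deviation.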


How do you use standard deviation to understand an individual data point?

determine how far the point deviates from the mean (avg) of the dataset comparing it to the overall data spread
calculate the mean and standard deviation
find the “z” score and use the outlier identification formula


What is a “Z” score aka standard score?

The raw scores that have been adjusted for the mean and standard deviation of the distribution from which the raw scores came.
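
In formula form this adjustment is usually written z = (x - x bar) / s. A small Python sketch with made-up scores follows; the helper name is hypothetical and the sample standard deviation is assumed.

```python
import statistics

def z_score(x, values):
    """How many standard deviations x lies from the mean of values."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD; an assumption of this sketch
    return (x - mean) / sd

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(round(z_score(9, scores), 2))  # 1.87
print(z_score(5, scores))            # 0.0
```

A score equal to the mean has a z score of 0. Positive z scores are above the mean and negative ones are below it.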


What are histograms and how do you identify them?

They show distributions of continuous variables
The height of the bar is the number of times that value occurs
The bars touch on the graph


What are bar graphs and how do you identify them?

They show the frequency of categorical responses
The bars have spaces in between them on graph


How is central tendency described as a distribution?

Distributions can differ in central tendency but not differ otherwise
within each symmetrical distribution, the mean, median, and mode are all the same
aka the same variability, different average


How is variability described as a distribution?

Can have the same central tendency - but different amounts of variability
Some can have the same range but different standard deviations


What is skewness and how is it described in a distribution?

The lack of symmetry in a graph


What is a positive skew and which way does the tail face

When the curve's tail is on the right side of the graph.

Mode sits at the peak, on the left side
The median typically falls in the middle
Mean is pulled toward the tail, on the right side (mode < median < mean)


What is a negative skew and which way does the tail face?

When the curve's tail is on the left side of the graph.

Mode sits at the peak, on the right side
The median typically falls in the middle
Mean is pulled toward the tail, on the left side (mean < median < mode)


What does skewness reflect about the mean, median, and mode?

It reflects where the mean, median, and mode fall in relation to one another


What is the floor effect?

When there is a bottom bound for the values of a data set. MUCH of the data falls around the BOTTOM bound.

creates a positive skew!
majority values fall on the LOW end of the distribution


What is the ceiling effect?

When there is an upper bound for the values of the data set

Creates a negative skew
majority of the values fall at the HIGH end of the distribution


What is kurtosis?

How peaked vs flat the distribution is


What is platykurtic?

LOW kurtosis

relatively FLAT
HIGH variability


What is leptokurtic?

HIGH kurtosis

relatively PEAKED
LOW variability


What can make graphs misleading?

This can occur when visual representations are distorted through manipulation of the axes, scales, and more


What are correlations?

How changes in one variable relate to changes in another variable

THE RELATIONSHIP BETWEEN TWO VARIABLES


When do we use correlations?

They are used when you want to quantify the strength and direction of a linear relationship between two continuous variables


What is a correlation coefficient?

a single number that describes the relationship between two variables


How is correlation coefficient abbreviated, and what does it range from?

Abv. as “r”
Ranges from -1 to 1


What is direction in correlation coefficient?

The sign of the coefficient tells us the direction in which one variable changes relative to the other


What is the relationship of a positive coefficient?

DIRECT relationship

as x increases, y increases


What is the relationship of a negative coefficient?

INVERSE relationship

as x increases, y decreases


What is strength of a correlation coefficient?

The closer the coefficient is to -1 or 1, the stronger the relationship is


What are scatterplots in relation to correlations?

A chart or graph that uses dots to represent values for two different numeric variables


What is an important idea to remember about correlation coefficient?

Correlation does NOT equal causation. Just because two variables are closely related, does not mean that one causes the other.


Understand the chart of correlation relationships


Understand scatter plots and correlation examples


What are the limitations of correlation coefficients?

Can only be used to identify LINEAR relationships
NO curvilinear relationships
Restriction of range


What is the restriction of range?

When there are too many scores that have similar values for a variable, the coefficient cannot capture the true relationship.


Do outliers have a significant effect on correlation coefficients?

YES! They have a huge impact on the correlation coefficient.


What is the coefficient of determination? And how do we calculate it?

The representation of how much variance two variables share

how much of the variance in x can be accounted for by y (and vice versa)
How to calculate it?
simply square the coefficient! r²
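
For example, with a made-up correlation of r = -0.70, the calculation looks like this in Python:

```python
# Hypothetical correlation coefficient, invented for illustration only.
r = -0.70

r_squared = r ** 2                 # coefficient of determination
shared_percent = r_squared * 100   # shared variance as a percentage

print(round(r_squared, 2))    # 0.49
print(round(shared_percent))  # 49
```

Squaring removes the sign, so r = -0.70 and r = 0.70 both mean the two variables share about 49% of their variance.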


How do we calculate/compute the correlation coefficient?

The formula used:

r_xy = the correlation between x and y
n is the sample size
X is each individual's score on the X variable
Y is each individual’s score on the Y variable
XY is the product of each X score times its corresponding Y score
X² is each individual's X score squared
Y² is each individual’s Y score squared
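
The formula image for this card is missing. The symbols listed above match the standard raw-score form of the Pearson correlation, so the following Python sketch assumes that form: r_xy = [n ΣXY - ΣX ΣY] / square root of ([n ΣX² - (ΣX)²] [n ΣY² - (ΣY)²]). The function name and data are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation via the raw-score (computational) formula."""
    n = len(xs)                                  # sample size
    sum_x = sum(xs)                              # sum of X
    sum_y = sum(ys)                              # sum of Y
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of XY
    sum_x2 = sum(x * x for x in xs)              # sum of X squared
    sum_y2 = sum(y * y for y in ys)              # sum of Y squared

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Hypothetical paired scores, invented for illustration only.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]

print(round(pearson_r(x_scores, y_scores), 2))  # 0.77
```

The sketch does not guard against a zero denominator, which happens when either variable has no variability at all.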


What are the numerator and denominator relationships when computing a correlation coefficient?

numerator = how much do x and y go together

denominator = how much do x and y vary on their own


What is an example on how to report a correlation coefficient?

We found a [strong/weak] [positive/negative] correlation between ___ and ___ (r = ___), suggesting that ...


What is coefficient of determination?

The more two variables have in common, the more variance they share


What is the coefficient of alienation (nondetermination)?

The variance that is left over (unexplained) after the shared variance is accounted for: 1 - r²


What is a correlation matrix?

A simple way to report a bunch of correlations at one time


What is r² and how do you calculate it?

This is known as the coefficient of determination and is calculated by squaring the value of r.


What is important to remember about correlation vs causation?

Correlation does NOT equal causation

we can NEVER definitively assume causation from a correlational relationship


What is reverse causation?

The causal direction may be opposite from what has been hypothesized


What is reciprocal causation?

When two variables cause each other

spiral effect


What is measurement in reliability and validity?

The act or process of assigning numbers to phenomena according to a rule.


What are the 4 measurement scales from least to most precise?

Nominal Scale: measure split into categories. A person cannot be in more than one category. Data is presented as counts or percentages.
Ex: hair color, political affiliation
Ordinal Scale: categories are ranked in a hierarchy.
Ex: class ranking
Interval Scale: ranked continuous variables, with equal spacing (intervals) between values
Ex: 1-5 strongly agree to strongly disagree
Ratio Scale: similar to interval, but has a true zero value.
0= complete absence of the attribute


What is an independent variable?

Something that can be manipulated or changed in an experiment.

Ex: the amount of water used


What is a dependent variable?

What you measure/observe as a result of change

Ex: how much the plants had grown


What is reliability?

a measure that is consistent in the values it outputs


What is validity?

the measure is actually measuring what you intended to measure


What is a key note to remember about reliability and validity.

A measure can be reliable and NOT be valid.

But a measure cannot be valid and NOT be reliable.


What is the idea of garbage in, garbage out?

if the data you collected is based on invalid or unreliable measure, your results will be useless.


What is the goal for reliability and validity/ overall stats and testing?

MINIMIZE the error!


What is an observed score?

the ACTUAL score a person receives


What is a true score?

the theoretical score representing a person's actual ability or trait without measurement error (aka the perfect score)


What is an error score?

AKA measurement error, the discrepancy between observed and true score.


What are the types of reliability?

Test-retest: does a person receive the SAME score when they complete the measure at two different points in time?
Parallel test forms: are different versions of the same measurements equivalent?
Internal consistency: do all items in a measure assess the same concept you are trying to measure? Is there a strong correlation between individual items and total scores?
Internal consistency is assessed with Cronbach's alpha
Inter-rater: does the measure produce the same results regardless of who is grading the scale? Can be evaluated by looking at the correlation between raters.


What is important to remember about test-retest and parallel forms?

both can be measured using correlation
the CLOSER the coefficient is to 1, the more reliable the measure is.


What is Cronbach's alpha in relation to internal consistency?

A statistic that reflects the degree of internal consistency among items. It should fall between ZERO and ONE; the closer to 1, the better.
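
The card gives no formula, so the following Python sketch assumes the standard textbook one: alpha = (k / (k - 1)) × (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and survey responses are made up.

```python
import statistics

def cronbach_alpha(rows):
    """Cronbach's alpha; each row is one respondent's item scores."""
    k = len(rows[0])                 # number of items
    items = list(zip(*rows))         # regroup scores by item
    item_variances = sum(statistics.variance(item) for item in items)
    totals = [sum(row) for row in rows]          # each respondent's total
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical responses: 5 respondents x 3 items on a 1-5 scale.
responses = [
    [3, 4, 4],
    [2, 2, 3],
    [5, 5, 4],
    [1, 2, 2],
    [4, 4, 5],
]

print(round(cronbach_alpha(responses), 2))  # 0.94
```

The three items rise and fall together across respondents, so alpha comes out close to 1.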


How to improve Cronbach's alpha?

Increase # of items in the survey
properly format instructions
make sure the administration of the measure is standardized
remove unclear or confusing items


Can validity be assessed with stats?

NO.

requires theory, critical thinking, and lots of data


What are the 3 types of validity?

Content: does the measure cover ALL of what we are trying to measure?
Criterion: does the measure predict other indicators of the same construct?
Construct: is the measure related to the things it should be related to, and unrelated to the things it shouldn’t be? Does it measure the underlying concept you set out to measure? Requires psychological theory