1

Normal Distribution

Symmetric, bell-shaped distribution in which data near the mean are more common than data far from the mean

2

Positive Skew Distribution

Tail is longer on the right

3

Negative Skew Distribution

Tail is longer on the left

4

SOCS

**S**hape, **O**utlier/unusual, **C**enter, **S**pread

5

z-score

Measures how many standard deviations a data point is from the mean. Formula: **z = (X - μ) / σ**

For a normal distribution, the z-score at a given percentile can also be found using **invNorm** on a calculator
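A minimal Python sketch of both ideas, assuming Python 3.8+ for `statistics.NormalDist`. The numbers (score 85, mean 70, SD 10) are made up for illustration, and `inv_cdf` is used here as a stand-in for the calculator's invNorm.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical example: a score of 85 when the mean is 70 and the SD is 10.
z = z_score(85, 70, 10)  # 1.5

# Stand-in for invNorm(0.90): the z-score with 90% of the area to its left.
z_90th = NormalDist().inv_cdf(0.90)
print(z, round(z_90th, 3))
```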

6

Interquartile Range

**Q3 - Q1**

7

Outlier

A data point is an outlier if it is **less than Q1 - 1.5 × IQR** or **greater than Q3 + 1.5 × IQR**
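The usual 1.5 × IQR rule flags values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. Here is a rough Python sketch on a made-up data set. Note that quartile conventions differ: `statistics.quantiles` (default "exclusive" method) may give slightly different Q1 and Q3 than a calculator or textbook.

```python
from statistics import quantiles

def outlier_fences(data):
    """Return (lower fence, upper fence) from the 1.5 x IQR rule."""
    q1, _median, q3 = quantiles(data, n=4)  # quartiles; method is an assumption
    iqr = q3 - q1                            # interquartile range: Q3 - Q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 5, 6, 7, 8, 9, 30]  # hypothetical values
low, high = outlier_fences(data)
outliers = [x for x in data if x < low or x > high]
print(outliers)
```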

8

Standard Deviation

Typical (roughly average) distance between an individual data value and the mean
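As a quick illustration, the standard library can compute this for a small made-up data set. The sketch assumes you want to see both versions: `pstdev` divides by n (population), while `stdev` divides by n - 1 (sample).

```python
from statistics import mean, pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

center = mean(data)
population_sd = pstdev(data)  # treats data as the whole population (divide by n)
sample_sd = stdev(data)       # treats data as a sample (divide by n - 1)
print(center, population_sd, round(sample_sd, 3))
```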

9

Least-Squares Regression Line

Line that minimizes the sum of the squared residuals; used to model associations that are approximately linear on a scatter plot

Formula: **ŷ = a + bx**

10

Residuals

The difference between the actual (observed) value on the scatter plot and the value predicted by the regression line

Formula: **e = y - ŷ**
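To make ŷ = a + bx and e = y - ŷ concrete, here is a small sketch that fits a least-squares line by hand and computes the residuals. The data points are invented, and the function name `least_squares_line` is just a label chosen for this example.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x-bar, y-bar)
    return a, b

xs = [1, 2, 3, 4, 5]  # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]  # hypothetical response values
a, b = least_squares_line(xs, ys)

# Residual for each point: observed y minus predicted y-hat.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(round(a, 2), round(b, 2), [round(e, 2) for e in residuals])
```

The residuals of a least-squares line always sum to zero (up to rounding), which is a handy sanity check.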

11

Line of Best Fit

A line fits the data well when the residuals from the LSRL show a random pattern (no curve or trend) in the residual plot

12

Lurking Variable

One hidden variable causing two other variables to change together

13

Confounding Variable

The effects of multiple explanatory variables on a response variable cannot be separated from one another

14

Simple Random Sample (SRS)

Every possible group of individuals of the chosen sample size has an **equal** chance of being selected

15

Stratified Random Sample

Population is divided into homogeneous groups (strata), and an SRS is used to select individuals from each stratum

16

Cluster Sample

Population is divided into heterogeneous groups, or clusters (ex: by geographic location); an SRS of whole clusters is selected, and every individual in the chosen clusters is sampled

17

Systematic Sample

Selecting individuals at a fixed interval (ex: every 5th person)
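The four random sampling designs (SRS, stratified, cluster, systematic) can be compared side by side in a short sketch. Everything here is hypothetical: a made-up population of 200 students, with `grade` playing the role of strata and `homeroom` playing the role of clusters.

```python
import random

rng = random.Random(0)  # seeded so the sketch is repeatable

# Hypothetical population: 200 students, 4 grades, 10 homerooms.
population = [{"id": i, "grade": 9 + i % 4, "homeroom": i % 10} for i in range(200)]

def simple_random_sample(pop, n):
    """SRS: every group of n individuals is equally likely."""
    return rng.sample(pop, n)

def stratified_sample(pop, key, n_per_stratum):
    """Split into strata by `key`, then take an SRS within each stratum."""
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    return [p for group in strata.values() for p in rng.sample(group, n_per_stratum)]

def cluster_sample(pop, key, n_clusters):
    """Randomly choose whole clusters and keep everyone in them."""
    clusters = sorted({person[key] for person in pop})
    chosen = set(rng.sample(clusters, n_clusters))
    return [p for p in pop if p[key] in chosen]

def systematic_sample(pop, k):
    """Random starting point, then every k-th individual."""
    start = rng.randrange(k)
    return pop[start::k]

srs = simple_random_sample(population, 20)
strat = stratified_sample(population, "grade", 5)
clus = cluster_sample(population, "homeroom", 2)
syst = systematic_sample(population, 5)
print(len(srs), len(strat), len(clus), len(syst))
```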

18

Convenience Sample

Biased sample that selects individuals in a ‘convenient’ manner.

19

Voluntary Response Bias

Asking for volunteers instead of selecting participants

20

Non-response Bias

Researchers choose the participants, but the chosen individuals refuse to participate

21

Response Bias

Anything in the survey design that might induce a SPECIFIC answer

22

Undercoverage Bias

Some portion of the population is left out of the sample or not included in the right proportion

23

Single-Blind

One party (researchers or participants) is blinded

24

Double-Blind

Both parties are blinded

25

Placebo

Fake treatment

26

Conditional Probability

The likelihood of an event happening given that another event has already occurred. It is calculated by dividing the probability of both events occurring by the probability of the given event.

**P(A|B) = P(A ∩ B) / P(B)**
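A small worked example of P(A|B) = P(A ∩ B) / P(B), sketched by counting outcomes for two fair dice. The events A ("the sum is 8") and B ("the first die is even") are chosen purely for illustration.

```python
from fractions import Fraction

# Sample space: all 36 equally likely rolls of two fair dice.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

def prob(event):
    """Probability of an event given as a True/False function of an outcome."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def A(o):  # the sum is 8
    return o[0] + o[1] == 8

def B(o):  # the first die is even
    return o[0] % 2 == 0

p_b = prob(B)
p_a_and_b = prob(lambda o: A(o) and B(o))
p_a_given_b = p_a_and_b / p_b  # P(A|B) = P(A and B) / P(B)
print(p_a_given_b)
```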

27

Compound Events (AND)

Events that occur simultaneously or in conjunction; the outcome requires both events to occur. For **independent** events:

**P(A ∩ B) = P(A) ⋅ P(B)**

28

Compound Events (OR)

In probability, this refers to events that can occur separately or together, resulting in at least one of the events happening.

**P(A ∪ B) = P(A) + P(B) - P(A ∩ B)**

29

Independent Events

One event does not affect the probability of another event

A and B are independent if **P(A ∩ B) = P(A) ⋅ P(B)**

Equivalently, if **P(A|B) = P(A)**

30

Mutually-Exclusive Events

Both events cannot occur at the same time

**P(A ∩ B) = 0**

Uses the (OR) formula, which simplifies to **P(A ∪ B) = P(A) + P(B)**
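The AND, OR, independence, and mutually exclusive rules can all be checked by counting on a two-dice sample space. This is only a sketch; the events below are invented for the example.

```python
from fractions import Fraction

outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

def prob(event):
    """Probability of an event given as a True/False function of an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_even(o):
    return o[0] % 2 == 0

def second_is_six(o):
    return o[1] == 6

def sum_is_2(o):
    return o[0] + o[1] == 2

def sum_is_12(o):
    return o[0] + o[1] == 12

# Independent events: P(A and B) equals P(A) * P(B).
p_and = prob(lambda o: first_even(o) and second_is_six(o))
independent = p_and == prob(first_even) * prob(second_is_six)

# General OR rule: P(A or B) = P(A) + P(B) - P(A and B).
p_or = prob(first_even) + prob(second_is_six) - p_and

# Mutually exclusive events: P(A and B) = 0, so the OR rule is just a sum.
p_both_sums = prob(lambda o: sum_is_2(o) and sum_is_12(o))
p_either_sum = prob(sum_is_2) + prob(sum_is_12)
print(independent, p_or, p_both_sums, p_either_sum)
```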

31

Binomial Probability

A probability distribution that describes the number of successes in a fixed number of independent trials with the same probability of success.

**binompdf(n, p, x)**

**n** = number of trials

**p** = probability of success

**x** = number of successes
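For reference, the probability the calculator returns can be reproduced with the binomial formula C(n, x) · p^x · (1 - p)^(n - x). The function name `binom_pmf` below is just a label for this sketch, with arguments in the same n, p, x order, and the coin-flip numbers are made up.

```python
from math import comb

def binom_pmf(n, p, x):
    """P(exactly x successes in n independent trials with success chance p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical example: exactly 3 heads in 10 flips of a fair coin.
p_three_heads = binom_pmf(10, 0.5, 3)

# The probabilities over all possible x should add up to 1.
total = sum(binom_pmf(10, 0.5, x) for x in range(11))
print(p_three_heads, total)
```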

32

Geometric Distribution

A probability distribution that models the number of trials needed to achieve the first success in a series of independent Bernoulli trials (trials with two possible outcomes).
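As a sketch, the chance that the first success comes on trial x is (1 - p)^(x - 1) · p: x - 1 failures followed by one success. The function name `geom_pmf` and the die-rolling example are assumptions made for illustration.

```python
def geom_pmf(p, x):
    """P(first success occurs on trial x), with success chance p per trial."""
    return (1 - p) ** (x - 1) * p

# Hypothetical example: first 6 appears on the 3rd roll of a fair die.
p_third_roll = geom_pmf(1 / 6, 3)

# Chance that the first 6 appears within the first 3 rolls.
p_within_three = sum(geom_pmf(1 / 6, x) for x in range(1, 4))
print(round(p_third_roll, 4), round(p_within_three, 4))
```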

33

z-test

**Only used if the population standard deviation is known**
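A rough sketch of a one-sample z-test for a mean, assuming the population standard deviation σ is known and a two-sided alternative. The test statistic is z = (x̄ - μ₀) / (σ / √n). The numbers are invented, and `one_sample_z_test` is a name chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return (z statistic, two-sided p-value) for H0: mean = mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical example: sample mean 103 from n = 100, testing mu0 = 100 with sigma = 15.
z_stat, p_value = one_sample_z_test(103, 100, 15, 100)
print(round(z_stat, 2), round(p_value, 4))
```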

34

one-sample t-test

Only used if the experiment has only **one independent variable** and one **group/condition**

35

t-test for independent means

Only used if there is **one independent variable** and **two groups/conditions**

36

Pooled variance
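The card gives no definition here, so the following is only a sketch of the standard textbook formula, s_p² = ((n₁ - 1)s₁² + (n₂ - 1)s₂²) / (n₁ + n₂ - 2), and of how it feeds the t statistic for two independent groups when equal variances are assumed. The two groups are made-up data and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_variance(g1, g2):
    """Weighted average of the two sample variances (weights = degrees of freedom)."""
    n1, n2 = len(g1), len(g2)
    return ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)

def pooled_t_statistic(g1, g2):
    """t statistic for independent means, assuming equal population variances."""
    sp2 = pooled_variance(g1, g2)
    standard_error = sqrt(sp2 * (1 / len(g1) + 1 / len(g2)))
    return (mean(g1) - mean(g2)) / standard_error

group1 = [1, 2, 3, 4, 5]    # hypothetical condition 1
group2 = [2, 4, 6, 8, 10]   # hypothetical condition 2
sp2 = pooled_variance(group1, group2)
t_stat = pooled_t_statistic(group1, group2)
print(sp2, round(t_stat, 3))
```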

