Probability versus Inference
Probability models are the basis for understanding random phenomena, in which repeating an experiment can yield a (slightly) different result each time.
Statistical inference is the science of drawing conclusions from experiments when the results could have come out differently if we repeated the experiment.
population
the entire collection of objects or outcomes about which we are seeking information
sample
a subset of a population, containing the objects or outcomes that are actually observed
simple random sample (SRS)
a sample of size n is a sample chosen by a method where each collection of n population items is equally likely to comprise the sample
conceptual population
a population that consists of all the values that might possibly have been observed
independence in a sample
The items in a sample are independent if knowing the values of some of the items does not help to predict the values of the others.
Items in a simple random sample may be treated as independent in many cases encountered in practice. The exception occurs when the population is finite and the sample comprises a substantial fraction (more than 5%) of the population.
one-sample experiment
an experiment where there is only one population of interest, and a single sample is drawn from it
multisample experiment
an experiment where there are two or more populations of interest, and a sample is drawn from each population
factorial experiment
a multisample experiment where the populations are distinguished from each other by the varying of one or more factors that may affect the outcome
numerical/quantitative data
data with a numerical quantity designating how much or how many is assigned to each item in a sample
categorical/qualitative data
data where sample items are placed into categories, and category names are assigned to sample items
controlled experiment
an experiment where the values of the factors are under the control of the experimenter in order to produce reliable information about cause-and-effect relationships between factors and response
observational study
an experiment where the experimenter simply observes the levels of the factor as they are, without having any control over them
sample mean
the arithmetic mean or average of the observations in the sample (the sum of the numbers in the sample divided by how many values there are); a measure of where the “center” of the data set is
deviation
the distance of a data value from the sample mean
sample variance
an adjusted average of the squared deviations: the sum of the squared deviations divided by (n - 1)
sample standard deviation
the (positive) square root of the sample variance
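A minimal Python sketch of these three definitions (the data values are made up for illustration):

```python
# Sample mean, sample variance (dividing by n - 1), and sample standard
# deviation, computed directly from the definitions above.
import math

data = [4.2, 5.1, 3.9, 6.0, 5.5]   # illustrative data

n = len(data)
mean = sum(data) / n                                   # sum divided by how many values there are
deviations = [x - mean for x in data]                  # distance of each value from the sample mean
variance = sum(d ** 2 for d in deviations) / (n - 1)   # "adjusted average": divide by n - 1
std_dev = math.sqrt(variance)                          # positive square root of the variance

print(mean, variance, std_dev)
```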
outlier
a data point that is either much larger or much smaller than the rest of the data (because of measurement error, data entry error, or just because it’s different); in general, values more than 1.5 x IQR from the closer of Q1 and Q3
sample median
the middle number in an ordered data set (with an even number of values, the average of the two middle numbers)
quartile
the quartiles divide the data set into quarters
first quartile, Q1: the data value in position (0.25)(n + 1)
third quartile, Q3: the data value in position (0.75)(n + 1)
interquartile range (IQR)
the measure of variability that is associated with the quartiles
IQR = Q3 - Q1
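A short Python sketch of the quartile position rule and the 1.5 x IQR outlier fence. The data set and the linear interpolation used for fractional positions are illustrative assumptions:

```python
# Q1 sits at position 0.25(n + 1) and Q3 at position 0.75(n + 1) in the
# sorted data (positions are 1-indexed); interpolate when the position
# falls between two data values.
def value_at_position(sorted_data, pos):
    whole = int(pos)
    frac = pos - whole
    if frac == 0 or whole >= len(sorted_data):
        return sorted_data[min(whole, len(sorted_data)) - 1]
    return sorted_data[whole - 1] + frac * (sorted_data[whole] - sorted_data[whole - 1])

data = sorted([7, 1, 5, 3, 9, 11, 2, 8])   # illustrative data
n = len(data)
q1 = value_at_position(data, 0.25 * (n + 1))
q3 = value_at_position(data, 0.75 * (n + 1))
iqr = q3 - q1

# Outlier fence from the definitions above: more than 1.5 x IQR beyond a quartile.
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
print(q1, q3, iqr, outliers)
```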
robust statistic
a statistic where removing the outliers does not change its value very much
stem-and-leaf plot
Select one or more leading digits for the stem values. The trailing digits become the leaves.
List possible stem values in a vertical column.
Record the leaf for every observation beside the corresponding stem value.
Indicate the units for the stems and leaves someplace in the display.
histogram
Partition the possible values of the variable into intervals of equal width, called classes.
Determine the frequency and relative frequency for each class.
Mark the class boundaries on the horizontal axis.
Above each class interval, draw a rectangle whose height is the frequency or relative frequency.
relative frequency
the relative frequency of a value is the proportion of units that have that value
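A small Python sketch of the frequency and relative-frequency steps; the data, class width, and class boundaries are made up for illustration:

```python
# Tally each observation into equal-width classes, then convert counts
# (frequencies) into proportions (relative frequencies).
data = [2.3, 3.1, 4.7, 5.2, 3.8, 4.1, 2.9, 5.9, 4.4, 3.5]

low, width, num_classes = 2.0, 1.0, 4       # classes: [2,3), [3,4), [4,5), [5,6)
freq = [0] * num_classes
for x in data:
    freq[min(int((x - low) // width), num_classes - 1)] += 1

rel_freq = [f / len(data) for f in freq]    # proportion of values in each class
for i, (f, rf) in enumerate(zip(freq, rel_freq)):
    print(f"[{low + i * width}, {low + (i + 1) * width}): freq={f}, rel freq={rf:.2f}")
```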
When examining the distribution of data, what four aspects must we describe?
Shape
Modes: unimodal, bimodal, or multimodal?
Symmetry: symmetric, right-skewed, or left-skewed?
Center (e.g., mean or median)
Variability (e.g., standard deviation or IQR)
Outliers (unusual observations)
right-skewed (positively skewed)
lower values of the variable are more common with fewer and fewer observations having larger values of the variable; the right “tail” of the distribution is longer than the left tail

left-skewed
higher values of the variable are more common with fewer and fewer observations having smaller values of the variable; the left “tail” of the distribution is longer than the right tail

five-number summary
minimum
first quartile
median
third quartile
maximum
boxplot
graphical display of the five-number summary:
minimum
first quartile
median
third quartile
maximum
extreme outlier
in general, an outlier more than 3 x IQR from the closer of Q1 and Q3
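The five-number summary is also available from Python's standard library; note that statistics.quantiles uses its own interpolation rule, which can differ slightly from the 0.25(n + 1) position rule above. The data values are illustrative:

```python
# Minimum, Q1, median, Q3, maximum via statistics.quantiles.
import statistics

data = sorted([12, 3, 7, 9, 15, 6, 8, 10, 4])   # illustrative data
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
print("min:", data[0], "Q1:", q1, "median:", median, "Q3:", q3, "max:", data[-1])
```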
How do we choose which values to use to describe the center and variability of a distribution?
unimodal and roughly symmetric: use mean and standard deviation
skewed and/or has outliers: use median and IQR (or the five-number summary)
experiment
a process that results in an outcome that cannot be predicted in advance with certainty
sample space, S
the set of all possible outcomes of an experiment
event, E
a subset (collection) of outcomes from the sample space
simple event
an event with exactly one outcome
compound event
an event with more than one outcome
null event
an event with no outcomes
complement of an event
denoted Aᶜ (for an event A), is the event that consists of all outcomes of S that are not in A
union of events
denoted A ∪ B, is the event consisting of all outcomes that are in at least one of the events A or B
intersection of events
denoted A ∩ B, is the event consisting of all outcomes that are in both of the events A and B
mutually exclusive events
events that never occur together: A ∩ B = ∅
exhaustive events
events where their union is the entire sample space: A ∪ B = S
probability
the probability of an event A, written P(A), is the proportion of times that event A would occur in the long run if the experiment were repeated over and over again under the same experimental conditions
probability model P( )
a function that satisfies the following axioms:
P(S) = 1
For any event A, P(A) ≥ 0
If events A1, A2, … are mutually exclusive, then P(A1 ∪ A2 ∪ …) = P(A1) + P(A2) + …
complement rule
for any event A, P(Aᶜ) = 1 - P(A)
general addition rule
for two events A and B, P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
(if A and B are mutually exclusive, then P(A ∪ B) = P(A) + P(B))
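Both rules can be checked by enumeration on a finite sample space of equally likely outcomes; here is a sketch for one roll of a fair six-sided die (the events A and B are arbitrary examples):

```python
# With equally likely outcomes, P(E) = |E| / |S|; exact arithmetic via Fraction.
from fractions import Fraction

S = set(range(1, 7))        # sample space: one die roll
A = {2, 4, 6}               # event: roll is even
B = {4, 5, 6}               # event: roll is at least 4

def P(E):
    return Fraction(len(E), len(S))

assert P(S - A) == 1 - P(A)                  # complement rule
assert P(A | B) == P(A) + P(B) - P(A & B)    # general addition rule
print(P(A | B))                              # 2/3
```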
the conditional probability of event A given that event B has occurred
P(A|B) = P(A ∩ B) / P(B)
(this provides another method for computing P(A ∩ B), known as the general multiplication rule: P(A ∩ B) = P(B)P(A|B))
independent events
two events A and B are independent if
P(A|B) = P(A)
P(B|A) = P(B)
P(A ∩ B) = P(A)P(B)
(if one is true, they are all true)
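A sketch checking conditional probability and (non-)independence by enumerating the 36 equally likely outcomes of two fair dice; the particular events are illustrative:

```python
# P(A|B) = P(A ∩ B) / P(B), and A, B are independent iff P(A ∩ B) = P(A)P(B).
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))      # all (first roll, second roll) pairs

def P(E):
    return Fraction(len(E), len(S))

A = {(i, j) for (i, j) in S if i == 6}       # first die shows 6
B = {(i, j) for (i, j) in S if i + j >= 10}  # sum is at least 10

print(P(A & B) / P(B))                       # P(A|B) = 1/2, while P(A) = 1/6
print(P(A & B) == P(A) * P(B))               # False: A and B are dependent
```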
mutually independent events
events A1, A2, …, An are mutually independent if the probability of each remains the same no matter which of the others occur
law of total probability
if A1, A2, …, An are mutually exclusive and exhaustive events, and B is any event, then P(B) = P(A1 ∩ B) + … + P(An ∩ B)
equivalently, P(B) = P(A1)P(B|A1) + … + P(An)P(B|An)
prior probability
the prior probability P(Ai) is the probability of Ai before knowing that B occurred
posterior probability
the posterior probability P(Ai|B) is the probability of Ai after knowing that B occurred
Bayes’ Rule
the posterior probability of Ak given that event B has occurred
P(Ak|B) = P(B|Ak)P(Ak) / [P(B|A1)P(A1) + … + P(B|An)P(An)]
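A worked Bayes' Rule calculation with made-up numbers: three suppliers form the partition A1, A2, A3, and B is the event that a part is defective. None of these probabilities come from the text:

```python
# Posterior P(Ai|B) = P(B|Ai)P(Ai) / P(B), where P(B) comes from the
# law of total probability.
priors = {"A1": 0.5, "A2": 0.3, "A3": 0.2}          # P(Ai), assumed values
defect_rate = {"A1": 0.01, "A2": 0.02, "A3": 0.05}  # P(B|Ai), assumed values

p_b = sum(defect_rate[a] * priors[a] for a in priors)            # P(B) = 0.021
posterior = {a: defect_rate[a] * priors[a] / p_b for a in priors}
print(posterior)    # e.g., P(A3|B) = 0.010/0.021 ≈ 0.476
```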
random variable
A random variable (RV) assigns a numerical value to each outcome in a sample space. It is customary to denote random variables with uppercase letters.
discrete random variable
A random variable is discrete if its set of possible values is a finite or countably infinite set of individual points. This means that if the possible values are arranged in order, there is a gap between each value and the next one.
distribution of a random variable
The distribution of a random variable describes the values that the variable can take on and the probabilities with which it takes on those values.
probability mass function
The probability mass function (pmf) of a discrete random variable 𝑋 specifies the probability that a random variable 𝑋 takes on value 𝑥:
𝑝(𝑥) = 𝑃(𝑋 = 𝑥)
p(x) ≥ 0 for any real x
the sum of p(x) for all possible values x equals 1
cumulative distribution function
The cumulative distribution function (cdf) of a random variable 𝑋 specifies the probability that a random variable 𝑋 takes on a value that is less than or equal to 𝑥:
𝐹(𝑥)=𝑃(𝑋 ≤ 𝑥).
Note: For a discrete random variable, the graph of 𝐹(𝑥) consists of a series of horizontal lines (called “steps”) with jumps at each of the possible values of 𝑋. The size of the jump at any point 𝑥 is equal to the value of the probability mass function 𝑝(𝑥) at that point 𝑥.

mean of discrete random variable
the sum of x*p(x) for all x
The mean of X is sometimes called the expectation or the expected value of X.

variance of a discrete random variable
σ² = the sum of (x - μ)²*p(x) for all x, where μ is the mean of X (equivalently, the sum of x²*p(x) for all x, minus μ²)
standard deviation
The standard deviation is the (positive) square root of the variance:
σ = √σ²
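A Python sketch tying the pmf, cdf, mean, variance, and standard deviation definitions together for one small discrete random variable; the pmf values are invented for illustration:

```python
import math

pmf = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}                      # p(x) = P(X = x)
assert abs(sum(pmf.values()) - 1) < 1e-12                   # pmf sums to 1

def cdf(x):                                                 # F(x) = P(X <= x)
    return sum(p for v, p in pmf.items() if v <= x)

mean = sum(x * p for x, p in pmf.items())                   # sum of x*p(x)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items()) # sum of (x - mean)^2 * p(x)
std_dev = math.sqrt(variance)

print(cdf(1.5), mean, variance, std_dev)   # F jumps only at the values 0, 1, 2, 3
```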
continuous random variable
A random variable is continuous if its set of possible values is an interval (with finite or infinite endpoints)
For any continuous random variable X and any number c, 𝑃(𝑋 = 𝑐) = 0
probability density function
Distribution of a continuous random variable is described by the probability density function (pdf), denoted by 𝒇(𝒙), where probabilities for a continuous random variable are the areas under the pdf
the integral from negative infinity to infinity of a pdf is 1
f(x) ≥ 0 for all x

uniform distribution
a continuous random variable X is uniform on the interval [a, b], written X ~ U(a, b), if its pdf is constant there: f(x) = 1/(b - a) for a ≤ x ≤ b, and f(x) = 0 otherwise
mean of a continuous random variable
μ = the integral of x*f(x) dx from -∞ to ∞
variance of a continuous random variable
σ² = the integral of (x - μ)²*f(x) dx from -∞ to ∞ (equivalently, the integral of x²*f(x) dx minus μ²)
median of a continuous random variable
the point xm for which the area under the pdf to the left of xm equals 0.5, i.e., P(X ≤ xm) = 0.5
100pth percentile
if 𝑝 is a number between 0 and 1, the 100pth percentile is the point 𝑥100p that solves the equation:
the area under the pdf to the left of x100p equals p; that is, F(x100p) = P(X ≤ x100p) = p
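The uniform distribution makes these integrals concrete, since each one has a closed form; a sketch with illustrative endpoints a and b:

```python
# For X ~ U(a, b), f(x) = 1/(b - a) on [a, b], so the defining integrals
# reduce to the closed forms below.
a, b = 2.0, 10.0                     # illustrative endpoints

mean = (a + b) / 2                   # integral of x*f(x) dx
variance = (b - a) ** 2 / 12         # integral of (x - mean)^2 * f(x) dx
median = (a + b) / 2                 # F(median) = 0.5 (equals the mean here)

def percentile(p):                   # solve F(x) = p for 0 < p < 1
    return a + p * (b - a)

print(mean, variance, median, percentile(0.95))   # 6.0 5.33... 6.0 9.6
```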
jointly distributed random variables
When two or more random variables are associated with each item in a population, the random variables are said to be jointly distributed.
If all the random variables are discrete, they are said to be jointly discrete. If all the random variables are continuous, they are said to be jointly continuous.
joint probability mass function
If X and Y are jointly discrete random variables, the joint probability mass function (joint pmf) is:
pXY(x,y) = P(X = x and Y = y)
for all (x,y), pXY(x,y) >= 0
the sum of pXY(x,y) over all (x,y) equals 1
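A sketch of a small joint pmf stored as a table keyed by (x, y), checking both properties above; the probabilities are invented for illustration:

```python
joint_pmf = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.25, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.25,
}

assert all(p >= 0 for p in joint_pmf.values())     # each pXY(x, y) >= 0
assert abs(sum(joint_pmf.values()) - 1) < 1e-12    # probabilities sum to 1

print(joint_pmf[(1, 0)])    # P(X = 1 and Y = 0) = 0.25
```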
