A _______ is a characteristic that changes or varies over time and/or for different individuals or objects under consideration
variable
An ___________________ is the individual or object on which a variable is measured
experimental unit
A single ______ or data value results when a variable is actually measured on an experimental unit
measurement
A _______________ is the set of all measurements of interest to the investigator
population
A ______ is a subset of measurements selected from the population of interest
sample
_______________ data result when a single variable is measured on a single experimental unit
univariate
____________ data result when two variables are measured on a single experimental unit
bivariate
________ data result when more than two variables are measured
multivariate
___________ variables measure a numerical quantity or amount on each experimental unit
quantitative
A _____________ variable can assume infinitely many values corresponding to the points on a line interval. There are no gaps
continuous
When constructing a graph, we need to first construct a _________________ and then use it to create a graph called a ______________
statistical table, data distribution
The sum of the relative frequencies is ______
1
A ____________ is the familiar circular graph that shows how the measurements are distributed among the categories
pie chart
A _______________ shows the same distribution of measurements among the categories, with the height of the bar measuring how often a particular category was observed
bar chart
For a pie chart, the angle of the sector for a category = ____________ * 360
relative frequency
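A minimal sketch of the angle rule above, assuming a small made-up list of category labels (the `ratings` data are invented for illustration):

```python
from collections import Counter

# Hypothetical qualitative data, invented for illustration only.
ratings = ["A", "B", "A", "C", "B", "A", "D", "A"]

counts = Counter(ratings)   # frequency of each category
n = len(ratings)            # total number of measurements

# Relative frequency = frequency / n.
rel_freq = {cat: freq / n for cat, freq in counts.items()}

# Sector angle for a category = relative frequency * 360 degrees.
angles = {cat: rf * 360 for cat, rf in rel_freq.items()}

for cat in sorted(angles):
    print(cat, counts[cat], rel_freq[cat], angles[cat])
```

The relative frequencies add up to 1, so the sector angles add up to 360.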
Pie charts and bar charts are _____________ to qualitative data
not exclusive
A variable that can take on as many values as there are numbers in an interval is called a _____________ variable
continuous
Time series data are most effectively presented on a ___________ with time as the horizontal axis. The idea is to try to find a pattern or __________ that will likely continue into the future
line chart, trend
For a histogram, a ______ is a subinterval created when you divide up the interval from the smallest to the largest measurement
class
The ________ is the difference between the upper and lower class boundaries
width
The class _____________ is the number of measurements falling into that particular class
frequency
Histogram Steps
1. Choose the number of classes, usually between 5 and ______. The more data you have, the more ______ you should use
12, classes
2. Find the approximate class _______ by dividing the difference between the largest and smallest values by the number of classes
width
3. Round the approximate class width up to a convenient number
4. If the data is discrete, you might assign one class for each integer value. For a large number of integer values, you may need to group them into classes
5. List the class boundaries. The _________ class must include the smallest measurement. Then add the remaining classes, including the left boundary point but not the right.
lowest
6. Build a statistical table containing the classes, their ___, and their relative frequencies.
7. Draw the histogram like a bar graph, with the class intervals on the horizontal axis and relative frequencies as the bar height
frequency
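The histogram steps can be sketched in code. This is only an illustration: the `data` values, the choice of 5 classes, and the rule "round the width up to the next multiple of 0.5" are all assumptions, not part of the cards.

```python
import math

# Hypothetical measurements, invented for illustration only.
data = [1.2, 2.5, 2.9, 3.1, 3.8, 4.4, 4.7, 5.0, 5.6, 6.3, 6.9, 7.5]

# Step 1: choose the number of classes (assumed choice).
num_classes = 5

# Step 2: approximate class width = (largest - smallest) / number of classes.
approx_width = (max(data) - min(data)) / num_classes

# Step 3: round the width up to a convenient number
# (assumption: "convenient" means the next multiple of 0.5).
width = math.ceil(approx_width / 0.5) * 0.5

# Step 5: the lowest class must contain the smallest measurement.
start = math.floor(min(data))

# Step 6: build the table; each class includes its left boundary but not its right.
table = []
for i in range(num_classes):
    lower = start + i * width
    upper = lower + width
    freq = sum(1 for x in data if lower <= x < upper)
    table.append((lower, upper, freq, freq / len(data)))

for lower, upper, freq, rel in table:
    print(f"{lower} to < {upper}: frequency {freq}, relative frequency {rel:.3f}")
```

Step 7 (drawing the bars) would plot the relative frequencies in `table` against the class intervals.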
A distribution is ___________ if the left and right sides of the distribution, when divided at the middle value, form mirror images
symmetrical
A distribution is _____________________ if a greater proportion of the measurements lie to the right of the peak value
skewed to the right
A distribution is ___________ if it has one peak
unimodal
There are three types of measures of variability: ____________, ___________, and ___________
range, variance, standard deviation
The _____________ of a set of n measurements is defined as the difference between the largest and smallest measurements
range
The variance of a population of N measurements is the average of the squares of the ____________ of the measurements about their mean μ
deviations
The variance of a sample of n measurements is the sum of the ______ of the measurements about their mean _______ divided by _________
squared deviations, x̄ (x bar), n - 1
The measures of variability can be negative. This statement is ________
false
If the measure of variability is equal to zero, all the data should have ____________
the same value
The range and standard deviation have the same _________ as the original data
unit
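A short sketch of the three measures of variability, assuming a small made-up data set. It computes the population form (divide by N) and the sample form (divide by n - 1) side by side, and compares them with Python's standard `statistics` module.

```python
import statistics

# Hypothetical measurements, invented for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: largest measurement minus smallest measurement.
data_range = max(data) - min(data)

# Squared deviations of the measurements about their mean.
mean = statistics.mean(data)
squared_devs = [(x - mean) ** 2 for x in data]

# Population variance: average of the squared deviations (divide by N).
pop_var = sum(squared_devs) / len(data)

# Sample variance: sum of squared deviations divided by n - 1.
samp_var = sum(squared_devs) / (len(data) - 1)

# Standard deviation: positive square root of the variance,
# so it is in the same units as the original data.
samp_sd = samp_var ** 0.5

print(data_range, pop_var, samp_var, samp_sd)
```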
By Tchebysheff's Theorem, given a number k greater than or equal to 1 and a set of n measurements, at least ________ of the measurements will lie within k _________ of their mean
1 - (1/k^2), standard deviations
Suppose μ is the population mean and σ is the standard deviation. Answer the following questions using Tchebysheff's Theorem:
a. At least none of the measurements lie in the interval μ__σ to μ__σ
b. At least 3/4 of the measurements lie in the interval μ__σ to μ__σ
c. At least 8/9 of the measurements lie in the interval μ__σ to μ__σ
a. -1, +1
b. -2, +2
c. -3, +3
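The three answers above follow from the bound 1 - (1/k^2). A minimal sketch (the function name `tchebysheff_lower_bound` is made up for illustration):

```python
from fractions import Fraction

def tchebysheff_lower_bound(k):
    """Minimum fraction of measurements within k standard deviations of the mean."""
    k = Fraction(k)
    if k < 1:
        raise ValueError("k must be greater than or equal to 1")
    return 1 - 1 / (k * k)

for k in (1, 2, 3):
    print(k, tchebysheff_lower_bound(k))
```

With k = 1 the bound is 0 ("at least none"), with k = 2 it is 3/4, and with k = 3 it is 8/9.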
The Empirical Rule: if the distribution of the data is ____________, then
a. The interval ( μ+/-σ ) contains approximately ______% of the measurements
b. The interval (μ+/-2σ) contains approximately _____% of the measurements
c. The interval (μ+/-3σ) contains approximately _________ of the measurements
mound shaped
a. 68
b. 95
c. 99.7%
The Empirical Rule requires the distribution to be ____________. Tchebysheff's Theorem makes no assumption about the shape of the distribution
mound shaped (unimodal and roughly symmetric)
A measure of center is a measure along the ____________ that locates the _____ of the distribution
horizontal axis, center
There are three different measures of center: __________, __________, ____________
mean, median, mode
The arithmetic mean is the sum of the data points of interest divided by ___________. For a population, we use the notation ________. For a sample, we use the notation _________
the total number of data points, mu (μ), x̄ (x bar)
The ___________ m of a set of n measurements is the value of x that falls in the middle position when the measurements are ordered from _______ to __________
median, smallest, largest
Mean and median ____________ coincide with each other. We can use them to infer the shape of the distribution
do not always
When the distribution is ________, mean and median are the same
symmetric
When the distribution is skewed to the right, mean is ___________ than the median
larger
When the distribution is skewed to the left, mean is __________ than the median
smaller
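A quick check of the mean/median relationship, assuming three small made-up data sets (one symmetric, one with a long right tail, one with a long left tail):

```python
import statistics

# Hypothetical data sets, invented for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # one large value stretches the right tail
left_skewed = [1, 17, 18, 18, 19]          # one small value stretches the left tail

for name, data in [("symmetric", symmetric),
                   ("right skewed", right_skewed),
                   ("left skewed", left_skewed)]:
    print(name, "mean =", statistics.mean(data), "median =", statistics.median(data))
```

The extreme value pulls the mean toward the long tail, while the median stays near the bulk of the data.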
The _______________ is the category that occurs most frequently, or the most frequently occurring value of x
mode
Mode is generally used to describe a ________ dataset
large
Mean and median can be used for both ________ and _______ datasets
large, small
It is _____________ to have more than one mode in the dataset
possible
Do we want more or less variability in the data in the following examples?
a. the lifetime of machines produced by a company
b. The SAT score
a. less
b. more
Measures of __________ can help you create a mental picture of the spread of the data
variability
The lower quartile (first quartile) Q1, is the value of x that is greater than ______ of the measurements and is less than the remaining _________
25%, 75%
The second quartile is the ________
median
The upper quartile (third quartile) Q3, is the value of x that is greater than ______ of the measurements and is less than the remaining _________
75%, 25%
The interquartile range for a set of measurements is the difference between the ___________ and ______________
third quartile, first quartile
We can use five numbers to summarize the data: _____________, _______, _________, ________, and __________
minimum, Q1, median, Q3, maximum
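A sketch of the five-number summary and the interquartile range, assuming made-up data. It uses `statistics.quantiles` with `method="exclusive"`, which places the quartiles at positions based on n + 1; other quartile conventions exist and can give slightly different values.

```python
import statistics

# Hypothetical measurements, invented for illustration only.
data = [2, 4, 5, 7, 8, 9, 10, 12, 13, 15, 18]

# Q1, median (second quartile), Q3 using the (n + 1)-based position convention.
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")

# Interquartile range: third quartile minus first quartile.
iqr = q3 - q1

# Five-number summary: minimum, Q1, median, Q3, maximum.
summary = (min(data), q1, median, q3, max(data))

print(summary, iqr)
```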
A box plot can be used to detect ________
outliers
An ___________ is the process by which an observation (or measurement) is obtained
experiment
A ___________ is the outcome observed on a single repetition of an experiment
simple event
Experiment: Toss a die and observe the number on the upper face. List the simple events in the experiment:
1, 2, 3, 4, 5, 6
An _________ is a collection of simple events.
Event
Two events are _________________ if, when one event occurs, the other cannot, and vice versa
mutually exclusive
Simple events are all mutually exclusive (true/false)
true
The set of all simple events is called the __________
sample space
Some experiments can be generated in stages, and the sample space can be displayed in a ______________
tree diagram
If you repeat the experiment more and more times, n becomes larger and larger, and eventually you generate the entire population. In this population, the _________________ of the event A is defined as the probability of event A
relative frequency
Each probability must lie between ____ and ____
0, 1
The sum of the probabilities for all _____________ in the sample space S equals 1
simple events
The probability of an event A is equal to the sum of the probabilities of the _______________ contained in A
simple events
How to calculate the probability of an event
1. List all the ___________ in the sample space
simple events
2. Assign an appropriate ________ to each simple event
probability
3. Determine which simple events result in the __________ of interest
event
4. ____________ the probabilities of the simple events that result in the event of interest
sum
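The four steps can be sketched with the die-toss experiment, assuming a fair die (so each simple event gets probability 1/6) and taking "an even number is observed" as the event of interest:

```python
from fractions import Fraction

# Step 1: list all the simple events in the sample space.
sample_space = [1, 2, 3, 4, 5, 6]

# Step 2: assign a probability to each simple event (assumption: fair die).
prob = {s: Fraction(1, 6) for s in sample_space}

# Step 3: determine which simple events result in the event of interest
# (here: the number on the upper face is even).
event_a = [s for s in sample_space if s % 2 == 0]

# Step 4: sum the probabilities of those simple events.
p_a = sum(prob[s] for s in event_a)

print(event_a, p_a)
```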
What are the three rules for counting the number of simple events?
1. The ________ rule
mn
2. A counting rule for ____
permutations
3. A counting rule for ______________
combinations
Z-score is a measurement of _______________
relative standing
A z-score measures the distance between a particular observation x and the ________, measured in units of ____________. Its formula is z = (measurement - mean) / standard deviation
mean, standard deviation
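A minimal sketch of the z-score calculation. The function name `z_score` and the example numbers are made up for illustration.

```python
def z_score(x, mean, std_dev):
    """Distance between x and the mean, in units of standard deviation."""
    return (x - mean) / std_dev

# Hypothetical example: an observation of 85 when the mean is 70
# and the standard deviation is 10.
z = z_score(85, 70, 10)
print(z)
```

A positive z-score means the observation lies above the mean, and a negative one means it lies below.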
A percentile is another measure of relative standing, most often used for __________ data sets
large
The p-th percentile is the value of x that is greater than __________% of the measurements and is less than the remaining ________%
p, 100-p
When the ordering or arrangement of the objects is important, you can use a counting rule for ________
permutations
Sometimes the ordering or arrangement of the objects is not important, but only the objects that are chosen. In this case, you can use a counting rule for ____________
combinations
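The three counting rules can be checked by brute force on small cases. The examples (two dice, choosing 3 of 5 letters) are made up for illustration. `math.perm` and `math.comb` require Python 3.8 or later.

```python
import math
from itertools import combinations, permutations, product

# mn rule: a two-stage experiment with m outcomes in stage 1 and
# n outcomes in stage 2 has m * n simple events (here: two dice).
m, n = 6, 6
two_dice = list(product(range(1, m + 1), range(1, n + 1)))

# Permutations: order matters (arrange 3 of 5 distinct objects).
letters = "ABCDE"
ordered = list(permutations(letters, 3))

# Combinations: order does not matter (choose 3 of 5 distinct objects).
unordered = list(combinations(letters, 3))

print(len(two_dice), m * n)
print(len(ordered), math.perm(5, 3))
print(len(unordered), math.comb(5, 3))
```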
The ________ of events A and B, denoted by A ∪ B, is the event that either A or B or both occur
union
The ___________ of events A and B, denoted by A ∩ B, is the event that both A and B occur
intersection
The _________ of an event A, denoted by A^c, is the event that A does not occur
complement
Simple events are mutually exclusive (true/false)
true
Event A and its complement are mutually exclusive no matter what A is (true/false)
true
Are mutually exclusive events independent? (yes/no)
no
Are two independent events mutually exclusive? (yes/no)
no
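A small check of the last two answers on a fair die. It assumes the standard definition of independence, P(A ∩ B) = P(A) * P(B), which the cards do not state. The events `even`, `odd`, and `low` are made up for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # fair die assumed

def prob(event):
    """Probability of an event when all simple events are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def mutually_exclusive(a, b):
    """True when A and B share no simple events."""
    return len(a & b) == 0

def independent(a, b):
    """True when P(A and B) equals P(A) * P(B) (assumed definition)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
odd = {1, 3, 5}
low = {1, 2}

# Mutually exclusive but not independent: knowing "even" occurred rules out "odd".
print(mutually_exclusive(even, odd), independent(even, odd))

# Independent but not mutually exclusive: both occur when a 2 is rolled.
print(mutually_exclusive(even, low), independent(even, low))
```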
A _____________ (type ___ error) is the event that the test is positive for a given condition, given that the person does not have the condition
false positive, I
A ______________ (type _____ error) is the event that the test is negative for a given condition, given that the person has the condition
false negative, II
A variable X is a __________________ if the value that it assumes, corresponding to the outcome of an experiment, is a chance or random event
random variable
Quantitative variables are classified as either ___________ or ______________, according to the values that X can assume
discrete, continuous
We defined probability as the limiting value of the _______________________ as the experiment is repeated over and over again
relative frequency
Now we define the probability distribution for a random variable X as the ____________________ distribution constructed for the entire population of measurements
relative frequency
The ________________ for a discrete random variable is a formula, table, or graph that gives all the possible values of X, and the probability p(x)=P(X=x) associated with each value x
probability distribution
Requirements for a Discrete Probability Distribution
A. __________ ≤ p(x) ≤ _________
B. Σ p(x) = _______, summing over all values of x
A. 0, 1
B. 1
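The two requirements can be checked mechanically. In this sketch the function name `is_valid_distribution` and both example tables are made up for illustration. The valid table is the number of heads in two tosses of a fair coin.

```python
import math
from fractions import Fraction

def is_valid_distribution(p):
    """Check both requirements for a discrete probability distribution.

    p maps each possible value x to its probability p(x).
    """
    # Requirement A: every probability lies between 0 and 1.
    in_range = all(0 <= px <= 1 for px in p.values())
    # Requirement B: the probabilities sum to 1 (tolerance allows for floats).
    sums_to_one = math.isclose(sum(p.values()), 1)
    return in_range and sums_to_one

# Number of heads in two tosses of a fair coin.
heads = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Not a distribution: the probabilities sum to 1.1.
broken = {0: 0.5, 1: 0.6}

print(is_valid_distribution(heads), is_valid_distribution(broken))
```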
Comparing the relative frequency distribution and the probability distribution: the difference is that the relative frequency distribution describes a ________ of n measurements, while the probability distribution is constructed as a model for the entire __________ of measurements
sample, population