random variable
a numerical description of the outcome of an experiment
discrete random variable
may assume either a finite number of values or an infinite sequence of values
continuous random variable
may assume any numerical value in an interval or collection of intervals
probability distribution
for a random variable, describes how probabilities are distributed over the values of the random variable
a discrete probability distribution can be described with a table, graph, or formula
two types of discrete probability distributions
one type uses the rules of assigning probabilities to experimental outcomes to determine the probability for each value of the random variable
the other type uses a special mathematical formula to compute the probability for each value of the random variable
probability function (discrete prob dist)
provides the probability for each value of the random variable
f(x) ≥ 0
Σ f(x) = 1
several discrete probability distributions specified by formulas
discrete-uniform
binomial
poisson
hypergeometric
in addition to tables and graphs, a formula that gives the probability function, f(x), for every value of x is often used to describe the probability distributions
discrete uniform probability distribution
the simplest example of a discrete probability distribution given by a formula
f(x) = 1/n
n is the number of values the random variable may assume (the values of the random variable are equally likely)
expected value
the mean of a random variable; a measure of its central location
a weighted average of the values the random variable may assume. the weights are the probabilities
does not have to be a value the random variable can assume
variance
summarizes the variability in the values of the random variable
sigma squared
a weighted average of the squared deviations of a random variable from its mean
the weights are the probabilities
standard deviation
sigma
the positive square root of the variance
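Below is a minimal sketch (plain Python, no libraries) of these three definitions; the values and probabilities are illustrative numbers, not taken from the cards.

```python
# Expected value, variance, and standard deviation of a discrete random variable.
import math

values = [0, 1, 2, 3, 4]                 # values the random variable may assume
probs  = [0.18, 0.39, 0.24, 0.14, 0.05]  # f(x) for each value; must sum to 1

mean = sum(x * p for x, p in zip(values, probs))                    # E(x) = sum of x*f(x)
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))  # sigma^2: weighted squared deviations
std_dev = math.sqrt(variance)                                       # sigma: positive square root of the variance

print(mean, variance, std_dev)
```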
four properties of a binomial experiment
the experiment consists of a sequence of n identical trials
two outcomes, success and failure, are possible on each trial
the probability of a success, denoted by p, does not change from trial to trial (stationarity assumption)
the trials are independent
binomial probability distribution
our interest is in the # of successes occurring in n trials
we let x denote the # of successes occurring in n trials
=binom.dist
expected value (binomial dist)
E(x) = μ = np
variance (binomial dist)
Var(x) = σ² = np(1 - p)
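A hedged sketch of the binomial formulas, assuming scipy is available; scipy.stats.binom plays the same role as Excel's =BINOM.DIST, and n and p are illustrative values.

```python
from scipy.stats import binom

n, p = 10, 0.30
print(binom.pmf(4, n, p))        # P(x = 4), like =BINOM.DIST(4, 10, 0.3, FALSE)
print(binom.cdf(4, n, p))        # P(x <= 4), like =BINOM.DIST(4, 10, 0.3, TRUE)
print(n * p, n * p * (1 - p))    # E(x) = np and Var(x) = np(1 - p)
```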
poisson probability distribution
a Poisson random variable is often useful in estimating the # of occurrences over a specified interval of time or space
it is a discrete random variable that may assume an infinite sequence of values (x = 0,1,2,….)
ex:
# of knotholes in 14 linear feet of pine board
# of vehicles arriving at a toll booth in one hour
specified interval of time or space. infinite sequence of values
two properties of a poisson experiment
the probability of an occurrence is the same for any two intervals of equal length
the occurrence or nonoccurrence in any interval is independent of the occurrence or nonoccurrence in any other interval
mean and variance are equal
poisson probability function
since there's no stated upper limit for the # of occurrences, the probability function f(x) is applicable for values x = 0, 1, 2, … without limit
in practical applications, x will eventually become large enough so that f(x) is approximately zero and the probability of any larger values of x becomes negligible
= poisson.dist
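A sketch of Poisson probabilities, assuming scipy is available; scipy.stats.poisson mirrors =POISSON.DIST, and the mean of 6 occurrences per interval is an illustrative value.

```python
from scipy.stats import poisson

mu = 6                            # mean (and variance) of the Poisson distribution
print(poisson.pmf(4, mu))         # P(x = 4), like =POISSON.DIST(4, 6, FALSE)
print(poisson.cdf(4, mu))         # P(x <= 4), like =POISSON.DIST(4, 6, TRUE)
print(1 - poisson.cdf(10, mu))    # P(x > 10): f(x) becomes negligible for large x
```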
types of continuous probability distributions
uniform probability distribution
normal probability distribution
exponential probability distribution

continuous random variable
can assume any value in an interval on the real line or in a collection of intervals
it’s not possible to talk about the probability of the random variable assuming a particular value
instead, we talk about the probability of the random variable assuming a value within a given interval
the probability of the random variable assuming a value within some given interval from x1 to x2 is defined to be the area under the graph of the probability density function between x1 and x2

uniform probability distribution
a random variable is uniformly distributed whenever the probability is proportional to the interval’s length
probability density function:
f(x) = 1/(b - a)
a is the smallest value the variable can assume
b is the largest value the variable can assume

expected value & variance of x for uniform probability distribution
E(x) = (a+b)/2
Var(x) = (b-a)²/12
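A minimal sketch (plain Python) of probability as area under f(x) = 1/(b - a), together with the expected value and variance formulas above; a, b, x1, and x2 are illustrative values.

```python
a, b = 120, 140                   # smallest and largest values x can assume
x1, x2 = 125, 135                 # interval of interest (a <= x1 <= x2 <= b)

prob = (x2 - x1) * (1 / (b - a))  # area of a rectangle: width * height
mean = (a + b) / 2                # E(x) = (a + b)/2
var  = (b - a) ** 2 / 12          # Var(x) = (b - a)^2 / 12
print(prob, mean, var)
```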
area as a measure of probability
the area under the graph of f(x) and probability are identical
this is valid for all continuous random variables
the probability that x takes on a value between some lower value x1 and some higher value x2 can be found by computing the area under the graph of f(x) over the interval from x1 to x2
normal probability distribution
the most important distribution for describing a continuous random variable
widely used in statistical inference
ex:
heights of people
rainfall amounts
test scores
scientific measurements
first described by Abraham de Moivre in The Doctrine of Chances

normal probability distribution characteristics
the distribution is symmetric; its skewness measure is zero
the entire family of normal probability distributions is defined by two parameters: the mean μ and the standard deviation σ
the highest point on the normal curve is at the mean, which is also the median and mode
the mean can be any numerical value: negative, zero, or positive
the standard deviation determines the width of the curve: larger values result in wider, flatter curves
probabilities for the normal random variable are given by areas under the curve. the total area under the curve is 1 (0.5 to the left of the mean and 0.5 to the right)
empirical rule:
68.26% of values lie within ±1 standard deviation of the mean
95.44% of values lie within ±2 standard deviations of the mean
99.72% of values lie within ±3 standard deviations of the mean
standard normal probability distribution characteristics
a random variable having a normal distribution with a mean of 0 and a standard deviation of 1 is said to have a standard normal probability distribution
the letter z is used to designate the standard normal random variable
we can think of z as a measure of the number of standard deviations a value is away from the mean
z-score
standardized value
it denotes the number of standard deviations a data value xi is from the mean
=standardize
excel for normal distributions
norm.s.dist is used to compute the cumulative probability given a z value
norm.s.inv is used to compute the z value given a cumulative probability
the s reminds us that they relate to the standard normal probability distribution
norm.dist is used to compute the cumulative probability given an x value
norm.inv is used to compute the x value given a cumulative probability
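A sketch of the four Excel functions above and =STANDARDIZE, using scipy.stats.norm as a stand-in (assuming scipy is available); μ, σ, x, and the cumulative probability are illustrative values.

```python
from scipy.stats import norm

mu, sigma, x = 100, 15, 118
z = (x - mu) / sigma              # =STANDARDIZE(x, mu, sigma)

print(norm.cdf(z))                # =NORM.S.DIST(z, TRUE): cumulative probability given a z value
print(norm.ppf(0.975))            # =NORM.S.INV(0.975): z value given a cumulative probability
print(norm.cdf(x, mu, sigma))     # =NORM.DIST(x, mu, sigma, TRUE): cumulative probability given x
print(norm.ppf(0.975, mu, sigma)) # =NORM.INV(0.975, mu, sigma): x value given a cumulative probability
```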
exponential probability distribution
useful in describing the time it takes to complete a task
ex:
time between vehicle arrivals at a toll booth
time required to complete a questionnaire
distance between major defects in a highway
in waiting line applications, the exponential distribution is often used for service times
the mean and standard deviation are equal
skewed to the right, with a skewness measure of 2

excel exponential probabilities
expon.dist function can be used to compute exponential probabilities
3 arguments:
1st : the value of the random variable x
2nd: 1/μ (the reciprocal of the mean, i.e., the rate of occurrences per unit interval)
3rd: TRUE or FALSE (cumulative; always TRUE in practice)
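A sketch of exponential probabilities, assuming scipy is available; note that =EXPON.DIST takes the rate 1/μ while scipy.stats.expon takes the mean directly via scale. The mean and x below are illustrative values.

```python
from scipy.stats import expon

mu = 3                               # mean time between occurrences
x = 2
print(expon.cdf(x, scale=mu))        # P(x <= 2), like =EXPON.DIST(2, 1/3, TRUE)
print(1 - expon.cdf(x, scale=mu))    # P(x > 2)
```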
relationship between the poisson and exponential distributions
the poisson distribution provides an appropriate description of the # of occurrences per interval
the exponential distribution provides an appropriate description of the length of the interval between occurrences

discrete probability distributions summary

continuous probability distributions summary

element
the entity on which data are collected
population
a collection of all the elements of interest
sample
subset of the population
sampled population
the population from which the sample is drawn
frame
a list of the elements that the sample will be selected from
the reason we select a sample:
to collect data to answer a research question about a population
the sample results provide only estimates of the values of the population characteristics
the reason is simply that the sample contains only a portion of the population
with proper sampling methods, the sample results can provide “good” estimates of the population characteristics
finite populations
defined by lists such as:
organization membership roster
credit card account numbers
inventory product numbers
a simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected
sampling with replacement
replacing each sampled element before selecting subsequent elements
sampling without replacement
the procedure used most often
in larger sampling projects…
computer-generated random numbers are often used to automate the sample selection process
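A minimal sketch of computer-generated random number selection for a simple random sample without replacement; the frame of account numbers is hypothetical.

```python
import random

frame = [f"ACCT{i:04d}" for i in range(1, 1001)]  # hypothetical frame of N = 1000 elements
n = 30

random.seed(1)                      # fixed seed only so the sketch is reproducible
sample = random.sample(frame, n)    # each possible sample of size n is equally likely
print(sample[:5])
```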
sampling from an infinite population
sometimes we want to select a sample, but find it’s not possible to obtain a list of all elements in the population
as a result, we can’t construct a frame for the population
hence, we cannot use the random number selection procedure
on-going process
populations are often generated by an ongoing process where there is no upper limit on the # of units that can be generated
some examples of on-going processes, with infinite populations, are:
parts being manufactured on a production line
transactions occurring at a bank
telephone calls arriving at a technical help desk
customers entering a store
a random sample from an infinite population
is a sample selected such that the following conditions are satisfied
each element selected comes from the population of interest
each element is selected independently
in the case of an infinite population, we must select a random sample in order to make valid statistical inferences about the population from which the sample is taken
point estimation
a form of statistical inference
we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter
point estimator
the sample statistic (such as the sample mean or the sample standard deviation s) used to provide the point estimate of the population parameter
target population
the population we want to make inferences about
sampled population
the population from which the sample is actually taken
whenever a sample is used to make inferences about a population, we should make sure that the target population and the sampled population are in close agreement
process of statistical inference
a random sample is selected from the population; the sample data provide a value of a sample statistic (such as the sample mean), and that value is used to make inferences about the corresponding population parameter (such as the population mean μ)
sampling distribution of x (sample mean)
the probability distribution of all possible values of the sample mean
expected value of sample mean = population mean
when the expected value of the point estimator equals the population parameter, we say the point estimator is unbiased
sampling distribution of sample mean
when the population has a normal distribution, the sampling distribution of x is normally distributed for any sample size
in most applications, the sampling distribution of x can be approximated by a normal distribution whenever the sample size is 30 or more
in cases where the population is highly skewed or outliers are present, samples of size 50 may be needed
the sampling distribution of x can be used to provide probability information about how close the sample mean x is to the population mean μ
central limit theorem
when the population from which we are selecting a random sample does not have a normal distribution, this theorem helps in identifying the shape of the sampling distribution of x
in selecting random samples of size n from a population, the sampling distribution of the sample mean x can be approximated by a normal distribution as the sample size becomes large
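A simulation sketch of the central limit theorem, assuming numpy is available: sample means of size n = 30 drawn from a right-skewed (exponential) population are approximately normally distributed; the population mean and repetition count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
pop_mean = 10                      # exponential population: mean = standard deviation = 10
n, repetitions = 30, 5000

# repeatedly draw a sample of size n and record its mean
sample_means = [rng.exponential(pop_mean, size=n).mean() for _ in range(repetitions)]

print(np.mean(sample_means))       # close to the population mean, 10
print(np.std(sample_means))        # close to sigma/sqrt(n) = 10/sqrt(30)
```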
other sampling methods
stratified random sampling
cluster sampling
systematic sampling
convenience sampling
judgment sampling
stratified random sampling
the population is first divided into groups of elements called strata
each element in the population belongs to one and only one stratum
best results are obtained when the elements within each stratum are as much alike as possible (ex: a homogeneous group)
a simple random sample is taken from each stratum
formulas are available for combining the stratum sample results into one population parameter estimate
advantage: if the strata are homogeneous, this method is as “precise” as simple random sampling but with a smaller total sample size
ex: the basis for forming the strata might be department, location, age, industry type, and so on
cluster sampling
the population is first divided into separate groups of elements called clusters
ideally each cluster is a representative small-scale version of the population (ex: heterogeneous group)
a simple random sample of the clusters is taken
all elements within each sampled (chosen) cluster form the sample
ex: a primary application is area sampling, where clusters are city blocks or other well-defined areas
advantage: the close proximity of elements can be cost effective (ex: many sample observations can be obtained in a short time)
disadvantage: this method generally requires a larger total sample size than simple or stratified random sampling
systematic sampling
if a sample size of n is desired from a population containing N elements, we might sample one element for every N/n elements in the population
we randomly select one of the first N/n elements from the population list
we then select every (N/n)th element that follows in the population list
this method has the properties of a simple random sample, especially if the list of the population elements is a random ordering
advantage: the sample usually will be easier to identify than it would be if simple random sampling were used
example: selecting every 100th listing in a telephone book after the first randomly selected listing
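A minimal sketch of systematic sampling as described above; the population list and sample size are illustrative.

```python
import random

frame = list(range(1, 1001))   # hypothetical ordered population list, N = 1000
n = 10
k = len(frame) // n            # N/n = 100: sample one element per 100

random.seed(2)
start = random.randrange(k)    # randomly select one of the first N/n elements
sample = frame[start::k]       # then take every (N/n)th element that follows
print(sample)
```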
convenience sampling
it is a nonprobability sampling technique: items are included in the sample without known probabilities of being selected
the sample is identified primarily by convenience
ex: a professor conducting research might use student volunteers to constitute a sample
advantage: sample selection and data collection are relatively easy
disadvantage: it’s impossible to determine how representative of the population the sample is
judgment sampling
the person most knowledgeable on the subject of the study selects elements of the population that he or she feels are most representative of the population
it is a nonprobability sampling technique
ex: a reporter might sample 3 or 4 senators, judging them as reflecting the general opinion of the senate
advantage: it’s a relatively easy way of selecting a sample
disadvantage: the quality of the sample results depends on the judgment of the person selecting the sample
recommendation
it is recommended that probability sampling methods (simple random, stratified, cluster, or systematic) be used
for these methods, formulas are available for evaluating the “goodness” of the sample results in terms of the closeness of the results to the population parameters being estimated
an evaluation of the goodness cannot be made with nonprobability (convenience or judgment) sampling methods
a point estimator cannot…
be expected to provide the exact value of the population parameter
an interval estimate can be computed by adding and subtracting a margin of error to the point estimate (point estimate ± margin of error)
the purpose of an interval estimate is to provide information about how close the point estimate is to the value of the parameter
the general form of an interval estimate of a population mean is
sample mean ± margin of error
interval estimate of a population mean: sigma known
in order to develop an interval estimate of a population mean, the margin of error must be computed using either:
the population standard deviation sigma, or
the sample standard deviation s
sigma is rarely known exactly, but often a good estimate can be obtained based on historical data or other info
we refer to such cases as the sigma known case
there is a 1 - α probability that the value of a sample mean will provide a margin of error of z_(α/2) σ/√n or less
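A sketch of the σ known interval estimate, assuming scipy is available for z_(α/2); the sample mean, σ, n, and confidence level are illustrative values.

```python
from math import sqrt
from scipy.stats import norm

x_bar, sigma, n = 82.0, 20.0, 100
alpha = 0.05

z = norm.ppf(1 - alpha / 2)                 # z_(alpha/2), about 1.96 for 95% confidence
margin_of_error = z * sigma / sqrt(n)
print(x_bar - margin_of_error, x_bar + margin_of_error)   # point estimate +/- margin of error
```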

meaning of confidence
because 100(1 - α)% of all the intervals constructed as sample mean ± margin of error will contain the population mean, we say we are 100(1 - α)% confident that the interval includes μ; 1 - α is the confidence coefficient and 100(1 - α)% is the confidence level

adequate sample size when sigma is known
in most applications, a sample size of n=30 is adequate
if the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended
if the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will suffice
if the population is believed to be at least approximately normal, a sample size of less than 15 can be used
interval estimate of a population mean: sigma unknown
if an estimate of the population standard deviation sigma cannot be developed prior to sampling, we use the sample standard deviation s to estimate sigma
this is the sigma unknown case
in this case, the interval estimate for u is based on the t distribution
t distribution
William Gosset, writing under the pen name “Student”, developed the t distribution
he was an Oxford graduate in mathematics and worked for the Guinness Brewery in Dublin
developed it while working on small-scale materials and temperature experiments
the t distribution is a family of similar probability distributions
a specific t distribution depends on a parameter known as the degrees of freedom
a t distribution w more degrees of freedom has less dispersion
as the degrees of freedom increase, the difference between the t distribution and the standard normal probability distribution becomes smaller and smaller
for more than 100 degrees of freedom, the standard normal z value provides a good approximation to the t value
the standard normal z values can be found in the infinite degrees of freedom row of the t distribution table

degrees of freedom
refer to the # of independent pieces of info that go into the computation of s
sigma unknown adequate sample size
usually a sample size of n=30 is adequate to develop an interval estimate of a population mean
if the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended
if the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will suffice
if the population is believed to be at least approximately normal, a sample size of less than 15 can be used
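A sketch of the σ unknown case, assuming scipy is available for the t value: the interval uses the sample standard deviation s and a t value with n - 1 degrees of freedom. The data values are illustrative.

```python
from statistics import mean, stdev
from math import sqrt
from scipy.stats import t

data = [54, 61, 58, 49, 66, 57, 63, 52, 60, 55]
n = len(data)
x_bar, s = mean(data), stdev(data)          # s estimates the unknown sigma
alpha = 0.05

t_value = t.ppf(1 - alpha / 2, df=n - 1)    # t_(alpha/2) with n - 1 degrees of freedom
margin_of_error = t_value * s / sqrt(n)
print(x_bar - margin_of_error, x_bar + margin_of_error)
```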
summary of interval estimation procedures for a population mean
if σ can be assumed known (the σ known case), the interval is sample mean ± z_(α/2) σ/√n; if σ cannot be assumed known (the σ unknown case), use the sample standard deviation s and the interval sample mean ± t_(α/2) s/√n with n - 1 degrees of freedom

sample size for an interval estimate of a population mean
let E = the desired margin of error
E is the amount added to and subtracted from the point estimate to obtain an interval estimate
if a desired margin of error is selected prior to sampling, the sample size necessary to satisfy the margin of error can be determined
planning value
the Necessary Sample Size equation requires a value for the population standard deviation σ
if σ is unknown, a preliminary or planning value for σ can be used in the equation; for example:
use the estimate of the population st dev computed in a previous study
use a pilot study to select a preliminary sample and use the sample standard deviation from that study
use judgment or a best guess for the value of sigma
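A sketch of the necessary sample size calculation n = (z_(α/2) σ / E)², rounded up, assuming scipy is available; the desired margin of error and planning value for σ are illustrative.

```python
from math import ceil
from scipy.stats import norm

E = 2.0                    # desired margin of error
planning_sigma = 15.0      # planning value for the population standard deviation
alpha = 0.05

z = norm.ppf(1 - alpha / 2)
n = ceil((z * planning_sigma / E) ** 2)   # round up to the next whole number
print(n)
```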