pain and suffering 😞 💔

99 Terms

New cards

Data

the facts & figures collected, analyzed, and summarized for presentation and interpretation.

New cards

Dataset

all the data collected for a particular analysis

New cards

Element

the entity on which data is collected

New cards

Variable

a characteristic of interest of an element

New cards

Observation

the variables associated with an individual element

New cards

Categorical

uses labels or names to identify categories; measured on a nominal or ordinal scale

New cards

Quantitative

uses numeric values that indicate how much or how many

New cards

Cross-sectional

data collected at the same, or approximately the same, point in time

New cards

Time Series

data collected over several time periods

New cards

Panel

combination of cross-sectional and time series data

New cards

Descriptive

describe data or variables

New cards

Population

is the set of all elements of interest in a particular study

New cards

Sample

is a subset of the population

New cards

Statistical Inference

uses data from a sample to make estimates and test hypotheses about the characteristics of a population

New cards

Row 1 contains the __; column A contains the __; the rest of the worksheet contains the __

variable names; elements; data in the dataset

New cards

Descriptive Analytics

describes what has happened in the past

New cards

Predictive Analytics

uses statistical models built from past data to predict the future [forecasting] or assess the impact of one variable on another [inference]

New cards

Prescriptive Analytics

uses models seeking to find a best (optimal) solution. Often these are some type of optimization model

New cards

Volume

the number of observations

New cards

Velocity

the speed at which data is collected

New cards

Variety

the different types and forms that data take

New cards

Veracity

the reliability of the data generated

New cards

Data Mining

focuses on extracting predictive information from big data

New cards

Frequency Distribution

a tabular summary of data showing the number (i.e. frequency) of observations in each of several non-overlapping categories (classes)

New cards

Relative Frequency

frequency of a class divided by n, the total number of observations

New cards

Percent Frequency

relative frequency x 100
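
A minimal Python sketch of the three distributions defined above (frequency, relative frequency, percent frequency). The purchase data are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter

# Hypothetical categorical sample (illustrative data, not from the notes)
purchases = ["Coke", "Pepsi", "Coke", "Sprite", "Coke",
             "Pepsi", "Coke", "Sprite", "Pepsi", "Coke"]

n = len(purchases)                                    # total number of observations
frequency = Counter(purchases)                        # count of observations per class
relative = {k: v / n for k, v in frequency.items()}   # frequency of the class / n
percent = {k: 100 * r for k, r in relative.items()}   # relative frequency x 100

for category in frequency:
    print(category, frequency[category], relative[category], percent[category])
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any table built this way.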

New cards

Bar Chart and Pie Chart

visual displays of frequency, relative frequency, and percent frequency distributions

New cards

Histogram

A visual display of a frequency, relative frequency or percent frequency distribution, where the variable of interest is on the horizontal axis and the frequency, relative frequency or percent frequency is on the vertical axis

New cards

Cumulative Percent Frequency Distribution

Shows proportion/percentage of data items with values less than or equal to the upper limits of each class

New cards

Number of Classes

Between 5 and 20

Small datasets have fewer; larger datasets have more

New cards

Width of the Class

Generally, the same for each class

Approx. width = (largest value - smallest value) / # of classes

New cards

Class Limits

Each data observation must belong to exactly one class

New cards

Relative Frequency Distribution

Frequency of the class / n

New cards

Crosstabulation

a tabular summary of data for two variables (either categorical or quantitative)

Example: suppose we have data from a sample of 300 restaurants on overall quality and meal price; a crosstabulation lets us see whether there is a pattern between the two variables

New cards

Scatter Diagram & Trendline

a scatter diagram is a graphical display of the relationship between two quantitative variables

a trendline provides an approximation (i.e. an estimate) of the relationship; which can be positive, negative or none

New cards

Side-by-Side & Stacked Bar Charts

These are extensions of a basic bar chart as they are used to display and compare two variables.

A side-by-side bar chart depicts multiple bar charts on the same display

A stacked bar chart has one bar broken into segments of a different color showing the relative frequency of each class

New cards

Mode

is the value that occurs with the greatest frequency. If two values tie for most frequent, the data are bimodal; if more than two do, the data are multimodal

New cards

Geometric Mean

A measure of location found by taking the nth root of the product of n values
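
As a rough illustration of the definition, the sketch below takes the nth root of the product of n values. The growth factors are hypothetical.

```python
import math

# Hypothetical annual growth factors (1.10 means +10%); not from the notes
growth_factors = [1.10, 0.95, 1.20]

n = len(growth_factors)
# nth root of the product of the n values
geometric_mean = math.prod(growth_factors) ** (1 / n)
print(round(geometric_mean, 4))
```

This should agree with Excel's =geomean( applied to the same cells.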

New cards

Percentile

provides information about how the data is spread over the interval from the smallest to the largest value

New cards

Quartiles

represent how the data is spread over four parts, each containing approximately 25% of the observations

New cards

Range

largest value - smallest value

a measure of variability or dispersion of the data

New cards

Interquartile Range

Q3 - Q1, is the range of the middle 50% of the data

a measure of variability or dispersion in the data

New cards

Variance

measures variability using all the data, since it is based on the difference between the value of xi and the mean

The difference is called deviation about the mean

For a sample, the deviation is xi - x̄

For a population, a deviation is xi - 𝜇
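
A short sketch of how the deviations about the mean turn into a variance. The data are illustrative, and the divisors (n - 1 for a sample, N for a population) are the standard formulas rather than something stated on this card.

```python
import statistics

# Illustrative sample of five values (assumed data)
sample = [46, 54, 42, 46, 32]

n = len(sample)
x_bar = sum(sample) / n                      # sample mean
deviations = [x - x_bar for x in sample]     # xi - x_bar for each observation
sum_sq = sum(d ** 2 for d in deviations)     # sum of squared deviations

sample_variance = sum_sq / (n - 1)           # treats the data as a sample
population_variance = sum_sq / n             # treats the data as the whole population

# The standard library gives the same sample variance
assert sample_variance == statistics.variance(sample)
print(x_bar, sample_variance, population_variance)
```

The deviations themselves always sum to zero, which is why they are squared before averaging.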

New cards

Distribution Shape

is measured by skewness

if the shape of the data is skewed to the left, the skewness is negative (mean < median)

if to the right then skewness is positive (mean > median)

if the data is symmetric, then skewness is zero (mean = median)

New cards

Coefficient of Variation

This is a measure of how large the standard deviation is relative to the mean

New cards

Z-Score

measures the relative location of a value in the dataset, i.e. how far that value is from the mean

yields a standardized value: the # of standard deviations the value lies from the mean

calculated from the mean and std. deviation: z = (xi - x̄) / s
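
A minimal sketch of the z-score calculation, assuming the usual sample formula z = (xi - x̄) / s. The data are illustrative.

```python
import statistics

# Illustrative sample (assumed data)
sample = [46, 54, 42, 46, 32]

x_bar = statistics.mean(sample)    # sample mean
s = statistics.stdev(sample)       # sample standard deviation

# Number of standard deviations each value lies from the mean
z_scores = [(x - x_bar) / s for x in sample]
print(z_scores)
```

A negative z-score means the value is below the mean; a positive one means it is above.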

New cards

Chebyshev’s Theorem

allows us to make statements about the proportion of data values that must be within a specified # of standard deviations of the mean: at least (1 - 1/z^2) of the values, for any z > 1

If z = 2, at least 75% of the data must be within 2 std. dev. of the mean

If z = 3, at least 89% of the data must be within 3 std. dev. of the mean

If z = 4, at least 94% of the data must be within 4 std. dev. of the mean
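
The three percentages above follow from the bound 1 - 1/z^2. The helper below is a hypothetical name used only for this sketch.

```python
def chebyshev_min_proportion(z: float) -> float:
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem only applies for z > 1")
    return 1 - 1 / z ** 2

for z in (2, 3, 4):
    # Prints the lower bound as a percentage for each z
    print(z, round(100 * chebyshev_min_proportion(z), 1))
```

These are lower bounds that hold for any distribution shape, so the actual share of data within z standard deviations is often larger.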

New cards

If data is bell shaped around the mean, we know:

Approx. 68% of the data is within one s of the sample mean (x̄)

Approx. 95% of the data is within two s of the sample mean (x̄)

Approx. 99.7% of the data is within three s of the sample mean (x̄)

New cards

Detecting Outliers

extreme values relative to the rest of the data

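z-scores can help identify outliers: any observation with a z-score less than -3 or greater than +3 (|z| > 3) is treated as an outlier

the Interquartile Range can also help: a common rule flags values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)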
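
A hedged sketch of both checks on made-up data. The 1.5 x IQR fences are a common convention assumed here, and `statistics.quantiles` with its default "exclusive" method is used as a stand-in for Excel's Quartile.Exc.

```python
import statistics

# Hypothetical data: 20 typical values plus one extreme value
data = [48, 49, 50, 51, 52] * 4 + [120]

# z-score check: flag observations with |z| > 3
x_bar = statistics.mean(data)
s = statistics.stdev(data)
z_outliers = [x for x in data if abs((x - x_bar) / s) > 3]

# IQR check (assumed convention): outside Q1 - 1.5*IQR or Q3 + 1.5*IQR
q1, _median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
iqr_outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(z_outliers, iqr_outliers)
```

Note that in very small samples no z-score can exceed 3 in absolute value, so the IQR check is the more useful of the two there.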

New cards

Covariance

is a descriptive measure of the linear association between two variables

Sxy = sample covariance,
if Sxy > 0, then there is positive linear association between x and y
if Sxy < 0, then there is negative linear association between x and y

New cards

Sample Correlation Coefficient

ranges from -1 to +1

If 1, then all data is on a positively sloped line

-1 = data would be on a negatively sloped line

As the data points deviate further from a straight line, the correlation coefficient moves closer to 0
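
A small sketch of the sample covariance and sample correlation coefficient computed by hand, assuming the standard formulas (divide by n - 1; r = Sxy / (Sx * Sy)). The paired data are illustrative.

```python
import math

# Illustrative paired observations (assumed data)
x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance: average product of paired deviations, with n - 1 divisor
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Sample correlation coefficient, always between -1 and +1
r_xy = s_xy / (s_x * s_y)
print(s_xy, round(r_xy, 2))
```

Unlike the covariance, the correlation coefficient does not depend on the units of x and y, which makes it easier to interpret.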

New cards

Probability

a numerical measure of the likelihood of an event occurring

a probability ranges from 0 to 1

New cards

Experiment

a process generating well-defined outcomes

ex: rolling a 6-sided die results in six possible outcomes: S = {1,2,3,4,5,6}

New cards

Combinations

A counting rule for the # of experimental outcomes when selecting n objects from a set of N objects, where the order of selection does not matter

New cards

Permutations

A counting rule computing the # of experimental outcomes when n objects are to be selected from a set of N objects where the order is important
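
Both counting rules are available in Python's standard library. The values N = 5 and n = 2 are arbitrary choices for illustration.

```python
import math

N, n = 5, 2   # select n objects from a set of N (illustrative values)

combinations = math.comb(N, n)   # order does not matter: N! / (n! (N - n)!)
permutations = math.perm(N, n)   # order matters:         N! / (N - n)!

print(combinations, permutations)
```

Each combination of n objects can be ordered in n! ways, so the permutation count is always the combination count times n!.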

New cards

Requirements of Assigning Probabilities

The probability assigned to each outcome must be between 0 and 1
The sum of the probabilities for all outcomes must be equal to 1

New cards

Classical Method

used when all experimental outcomes are equally likely, ex: a coin toss or a roll of a 6-sided die

the probability of each outcome is 1 divided by the total # of possible outcomes

New cards

Relative Frequency Method

used when data are available to estimate the proportion of time the experimental outcomes will occur if the experiment is repeated a large # of times

New cards

Subjective Method

used when outcomes are not equally likely and data is unavailable

New cards

Probability of an Event

the probability of an event is equal to the sum of the probabilities of the sample points in the event

P(C) = P(2,6) + P(2,7) + P(3,6)
P(C) = 0.15 + 0.15 + 0.10 = 0.35

P(S) = P(2,6) + P(2,7)
P(S) = 0.15 + 0.15 = 0.30

New cards

Union of Two Events

the event containing all sample points belonging to Event A, Event B or both

denoted by A ∪ B (whole bubble diagram)

New cards

Intersection of Two Events

the event containing the sample points belonging to both A and B

denoted by A ∩ B (only the middle of the bubble diagram)

New cards

Addition Law

useful when we want to know the probability that at least one of two events occurs

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
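
A quick check of the addition law on the 6-sided die experiment. The events A and B are hypothetical choices, and outcomes are assumed equally likely.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a die
A = {2, 4, 6}            # hypothetical event: roll is even
B = {4, 5, 6}            # hypothetical event: roll is greater than 3

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event), len(S))

# Addition law: subtract the intersection so it is not counted twice
p_union = prob(A) + prob(B) - prob(A & B)

# Same answer as counting the union directly
assert p_union == prob(A | B)
print(p_union)
```

If A and B were mutually exclusive, the intersection term would be 0 and the law would reduce to P(A) + P(B).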

New cards

Mutually Exclusive Events

occur when two events have no sample points in common

New cards

Conditional Probability

probabilities are often influenced by whether a related event already occurred.

suppose event A occurs with probability P(A). If a related event B has already occurred, this new information yields a new probability for A, called the conditional probability: P(A|B)

New cards

Joint Probability

the probability of the intersection of two events

New cards

Random Variable

a numeric description of the outcome of an experiment and is either discrete or continuous

New cards

Bivariate Probability Distribution

a probability distribution involving two random variables

New cards

Marginal Probabilities

the sum of the joint probabilities (by row and column)

New cards

Independent Events

Event A and Event B are independent if: P(A|B) = P(A)

or P(B|A) = P(B)

New cards

Multiplication Law

used to compute the probability of the intersection of two events: P(A ∩ B) = P(B)P(A|B) = P(A)P(B|A)
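
A sketch tying conditional probability, the multiplication law, and the independence test together on a die roll. The events are hypothetical, and the definition P(A|B) = P(A and B) / P(B) is the standard one, assumed here.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}            # hypothetical event: even
B = {4, 5, 6}            # hypothetical event: greater than 3

def prob(event):
    return Fraction(len(event), len(S))

# Conditional probability of A given that B has occurred
p_a_given_b = prob(A & B) / prob(B)

# Multiplication law recovers the joint (intersection) probability
p_joint = prob(B) * p_a_given_b

# Independence check: A and B are independent only if P(A|B) == P(A)
independent = p_a_given_b == prob(A)

print(p_a_given_b, p_joint, independent)
```

Here knowing that the roll is greater than 3 changes the probability of an even roll, so the two events are not independent.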

New cards

Discrete Random Variable

may assume either a finite number of values or an infinite sequence of values such as 0, 1, 2, …

examples are a toss of a coin, the # of customers who place an order, or the product chosen by a customer from two options

New cards

Continuous Random Variables

any numerical value in an interval or collection of intervals

examples are the time a customer visits a webpage, the ounces in a soft drink, and the value of a stock in one year

New cards

Variance

measures the variability or dispersion of the random variable

New cards

Standard Deviation

the positive square root of the variance

New cards

Bivariate Probability Distribution

involves two random variables, such as rolling a die two times or recording the percentage change for a stock fund and a bond fund over a year

often the analyst is interested in the relationship between the two random variables, look at covariance and correlation coefficient as measures of the linear association between the two

New cards

Binomial Probability Distribution

is based on four properties:

the experiment consists of a sequence of n identical trials

two outcomes are possible on each trial: success or failure

the probability of success (p) and the probability of failure (1-p) do not change from trial to trial

the trials are independent

New cards

Using Excel to Compute Binomial Probabilities

Enter formula, Binom.Dist

needs a value for x, n, and p

mark either true (cumulative probability) or false (probability)

ex. =Binom.Dist(B5,$D$1,$D$2,FALSE)
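
For comparison, a Python sketch of what the FALSE and TRUE settings compute, assuming the standard binomial formula f(x) = C(n, x) p^x (1-p)^(n-x). The function names and the values n = 3, p = 0.3 are illustrative.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(exactly x successes in n trials); like the FALSE setting."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(x or fewer successes in n trials); like the TRUE setting."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

n, p = 3, 0.3   # illustrative values
print(binomial_pmf(1, n, p), binomial_cdf(1, n, p))
```

The cumulative value is just the running total of the individual probabilities from 0 up to x.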

New cards

Poisson Probability Distribution

used to estimate the # of occurrences over a specified interval of time or space

New cards

Using Excel to Compute Poisson Probabilities

Enter formula, Poisson.Dist

need a value of x, the value of μ (mean), and TRUE (cumulative probability) or FALSE (probability)

ex. =Poisson.Dist(A4,$D$1,FALSE)
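
A rough Python equivalent, assuming the standard Poisson formula f(x) = μ^x e^(-μ) / x!. The function names and μ = 10 are illustrative.

```python
import math

def poisson_pmf(x: int, mu: float) -> float:
    """P(exactly x occurrences in the interval); like the FALSE setting."""
    return mu ** x * math.exp(-mu) / math.factorial(x)

def poisson_cdf(x: int, mu: float) -> float:
    """P(x or fewer occurrences in the interval); like the TRUE setting."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))

mu = 10   # illustrative mean number of occurrences per interval
print(round(poisson_pmf(5, mu), 4), round(poisson_cdf(5, mu), 4))
```

Unlike the binomial, there is no upper limit on x, so the probabilities only sum to 1 over all non-negative integers.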

New cards

Hypergeometric Probability Distribution

similar to the binomial distribution, except the trials are not independent and the probability of success changes from trial to trial

r is the # of success in population N, and N-r is the # of failures

New cards

Using Excel to Compute Hypergeometric Probability

Enter Hypgeom.Dist

needs values for x, n (the # of trials), r, and N, and either TRUE or FALSE

=Hypgeom.Dist(1,3,5,12,TRUE)
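
A sketch of the same calculation in Python, reading the arguments 1, 3, 5, 12 in the Excel example as x, n, r, N and assuming the standard formula f(x) = C(r, x) C(N-r, n-x) / C(N, n). The function names are illustrative.

```python
import math

def hypergeom_pmf(x: int, n: int, r: int, N: int) -> float:
    """P(exactly x successes in n draws without replacement)."""
    return math.comb(r, x) * math.comb(N - r, n - x) / math.comb(N, n)

def hypergeom_cdf(x: int, n: int, r: int, N: int) -> float:
    """P(x or fewer successes); like the TRUE setting."""
    return sum(hypergeom_pmf(k, n, r, N) for k in range(x + 1))

# x = 1 success, n = 3 draws, r = 5 successes in a population of N = 12
print(round(hypergeom_cdf(1, 3, 5, 12), 4))
```

Because the draws are made without replacement, each pick changes the mix of successes and failures left in the population.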

New cards

Continuous Random Variable

probabilities are computed differently than for a discrete random variable

for discrete, we compute the probability at a specific value of x

for continuous random variables, we compute the probability that the random variable assumes any value in an interval

computing the area under the probability density function, f(x)

New cards

Difference Between Discrete and Continuous Random Variables

for a discrete random variable, we compute the probability that the variable takes on a specific value; for a continuous random variable, we compute the probability that the variable falls within an interval

the probability of a continuous random variable within some given interval is defined to be the area under the graph of the probability density function

(a single point is an interval of 0, so the probability of a single value in the continuous case is 0)

New cards

Using Excel to Compute Exponential Probabilities

Enter Expon.Dist

needs x, a value for 1/μ, and TRUE or FALSE

lower tail: =Expon.Dist(18,1/15,TRUE)

interval: =Expon.Dist(18,1/15,TRUE) - Expon.Dist(6,1/15,TRUE)

upper tail: =1 - Expon.Dist(8,1/15,TRUE)
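
The three Excel formulas can be mirrored in Python, assuming the standard exponential cumulative probability P(x <= x0) = 1 - e^(-x0/μ) with μ = 15 as in the formulas above. The function name is illustrative.

```python
import math

def expon_cdf(x0: float, mu: float) -> float:
    """Cumulative probability P(x <= x0) for an exponential with mean mu."""
    return 1 - math.exp(-x0 / mu)

mu = 15   # mean, so the rate passed to Excel is 1/15

lower_tail = expon_cdf(18, mu)                     # P(x <= 18)
interval = expon_cdf(18, mu) - expon_cdf(6, mu)    # P(6 <= x <= 18)
upper_tail = 1 - expon_cdf(8, mu)                  # P(x >= 8)

print(round(lower_tail, 4), round(interval, 4), round(upper_tail, 4))
```

Each result is an area under the density curve, which is why an interval probability is the difference of two cumulative values.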

New cards

Normal Probability Distribution

most used probability distribution for continuous random variables

it provides a description of likely results obtained through sampling

bell curve

New cards

Characteristics of the Normal Distribution

only two parameters: μ and σ

highest point is the mean, which is also the median and the mode

the mean can take on any numerical value

the normal distribution is symmetric; skewness = 0

the std. dev. (σ) determines how flat or wide the curve is (larger σ = wider/flatter curves)

probabilities for a normal random variable are given by the area under the normal curve (total area under the curve = 1)

68.3% of values are within 1σ of μ, 95.4% within 2σ, 99.7% within 3σ

New cards

Using Excel to Compute Normal Probabilities

Enter Norm.Dist

find value for x, μ (mean), and standard deviation, and TRUE/FALSE

lower tail: =Norm.Dist(20000,36500,5000,TRUE)

interval: =Norm.Dist(40000,36500,5000,TRUE) - Norm.Dist(20000,36500,5000,TRUE)

upper tail: =1 - Norm.Dist(40000,36500,5000,TRUE)
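
A Python sketch of the same three probabilities using the standard library's `statistics.NormalDist`, with μ = 36500 and σ = 5000 taken from the lower-tail and upper-tail formulas above.

```python
from statistics import NormalDist

dist = NormalDist(mu=36500, sigma=5000)

lower_tail = dist.cdf(20000)                     # P(x <= 20000)
interval = dist.cdf(40000) - dist.cdf(20000)     # P(20000 <= x <= 40000)
upper_tail = 1 - dist.cdf(40000)                 # P(x >= 40000)

print(round(lower_tail, 4), round(interval, 4), round(upper_tail, 4))
```

The `cdf` method plays the role of Norm.Dist with TRUE: it returns the area under the curve to the left of x.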

New cards

Using Excel to Compute Normal Probabilities (but for value of x)

x value with 0.10 in lower tail: =Norm.Inv(0.1,36500,5000)

x value with 0.025 in upper tail: =Norm.Inv(0.975,36500,5000)
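
Going the other way (from a tail probability to an x value) can be sketched with `NormalDist.inv_cdf`, the standard-library counterpart of Norm.Inv, using the same μ = 36500 and σ = 5000.

```python
from statistics import NormalDist

dist = NormalDist(mu=36500, sigma=5000)

# x value with 0.10 of the area in the lower tail
x_lower = dist.inv_cdf(0.10)

# x value with 0.025 in the upper tail, i.e. 0.975 of the area to its left
x_upper = dist.inv_cdf(0.975)

print(round(x_lower, 1), round(x_upper, 1))
```

The inverse function always takes the area to the left, so an upper-tail probability has to be converted to 1 minus that probability first.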

New cards

Standard Normal Probability Distribution

is where the μ is 0 and the std. dev. is 1

New cards

Using Excel to Compute Standard Normal Probabilities and Z-Values

Enter Norm.S.Dist

find value of z and TRUE or FALSE (WE USE TRUE)

P(z <= V); if V = 1: =Norm.S.Dist(1,TRUE)

P(V1 <= z <= V2); if V1 = -0.5 and V2 = 1.25: =Norm.S.Dist(1.25,TRUE) - Norm.S.Dist(-0.5,TRUE)

P(z >= V); if V = 1.58: =1-Norm.S.Dist(1.58,TRUE)

New cards

Using Excel to Compute Standard Normal Probabilities and Z-Values (but for z-values)

z-value with 0.025 in lower tail: =Norm.S.Inv(0.025)

z-value with 0.025 in upper tail: =Norm.S.Inv(0.975)

New cards

Using Excel to Calculate E(x) or μ, σ^2 & σ

Mean: =sumproduct(A:A,B:B)

Squared Deviation from Mean: =(A2 - F$2)^2

Variance: =sumproduct(C:C,B:B)

Standard Deviation: =sqrt(B11)
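
The worksheet steps above can be sketched in Python as follows, assuming column A holds the values x and column B their probabilities f(x). The distribution itself is illustrative.

```python
import math

# Illustrative discrete distribution: values x and probabilities f(x)
x_values = [0, 1, 2, 3, 4, 5]
f_values = [0.18, 0.39, 0.24, 0.14, 0.04, 0.01]

# Mean: sum of x * f(x), like SUMPRODUCT over the two columns
mean = sum(x * f for x, f in zip(x_values, f_values))

# Squared deviation from the mean for each x
sq_dev = [(x - mean) ** 2 for x in x_values]

# Variance: sum of squared deviation * f(x); standard deviation is its root
variance = sum(d * f for d, f in zip(sq_dev, f_values))
std_dev = math.sqrt(variance)

print(round(mean, 2), round(variance, 2), round(std_dev, 4))
```

Weighting by f(x) is what distinguishes this from the sample variance, where every observation counts equally.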

New cards

Using Excel to Calculate the Sample Covariance and Sample Correlation Coefficient

Enter =covariance.s(
and select the cells

Enter =correl(
and select the cells

New cards

Using Excel to Compute the Geometric Mean

=geomean(
select the cells

New cards

Using Excel to Compute Percentiles & Quartiles

Enter =Percentile.Exc(
select the cells

Enter =Quartile.Exc(
select the cells

New cards

Using Excel to Calculate the Sample Variance and Sample Standard Deviation

Enter =var.s(
select the cells

Enter =stdev.s(
select the cells

New cards

Using Excel’s Descriptive Statistics Tool

Apply Tools:

click on Data in the Ribbon

click on Data Analysis

choose Descriptive Statistics

New cards

Using Excel’s Recommended Chart Tools to Construct a Histogram (to show a class with no data)

right click any cell in the row labels column

click field settings

click Layout and Print

choose show items with no data; click OK
