pain and suffering 😞 💔

99 Terms

1

Data

the facts & figures collected, analyzed, and summarized for presentation and interpretation.

2

Dataset

all the data collected for a particular analysis

3

Element

the entity on which data is collected

4

Variable

a characteristic of interest of an element

5

Observation

the variables associated with an individual element

6

Categorical

data that use labels or names to identify categories; measured on a nominal or ordinal scale

7

Quantitative

data that use numeric values to indicate how much or how many; measured on an interval or ratio scale

8

Cross-sectional

data collected at the same, or approximately the same, point in time

9

Time Series

data collected over several time periods

10

Panel

combination of cross-sectional and time series data

11

Descriptive

describe data or variables

12

Population

the set of all elements of interest in a particular study

13

Sample

is a subset of the population

14

Statistical Inference

uses data from a sample to make estimates and test hypotheses about the characteristics of a population

15

Row 1 contains the __; column A contains the __; the rest of the worksheet contains the __

variable names; elements; data in the dataset

16

Descriptive Analytics

techniques that describe what has happened in the past

17

Predictive Analytics

uses statistical models built from past data to predict the future [forecasting] or assess the impact of one variable on another [inference]

18

Prescriptive Analytics

uses models seeking to find a best (optimal) solution. Often these are some type of optimization model

19

Volume

the number of observations

20

Velocity

the speed at which data is collected

21

Variety

the different types and structures of the data collected

22

Veracity

the reliability of the data generated

23

Data Mining

focuses on extracting predictive information from big data

24

Frequency Distribution

a tabular summary of data showing the number (i.e. frequency) of observations in each of several nonoverlapping categories

25

Relative Frequency

frequency of a class divided by n, the total number of observations

26

Percent Frequency

relative frequency x 100
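
The three frequency measures above can be sketched in a few lines of Python. The purchase data and the variable names are made up for illustration; only the formulas (count, count / n, and relative frequency x 100) come from the cards.

```python
from collections import Counter

# Hypothetical sample of 10 soft drink purchases (illustrative data only)
purchases = ["Coke", "Pepsi", "Coke", "Sprite", "Coke",
             "Pepsi", "Coke", "Sprite", "Pepsi", "Coke"]

n = len(purchases)
frequency = Counter(purchases)                        # observations per class
relative = {k: c / n for k, c in frequency.items()}   # frequency / n
percent = {k: 100 * r for k, r in relative.items()}   # relative frequency x 100

for k in frequency:
    print(f"{k:8s} {frequency[k]:3d} {relative[k]:6.2f} {percent[k]:6.1f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any frequency table.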

27

Bar Chart and Pie Chart

a visual display of a frequency, relative frequency, or percent frequency distribution for categorical data

28

Histogram

A visual display of a frequency, relative frequency or percent frequency distribution, where the variable of interest is on the horizontal axis and the frequency, relative frequency or percent frequency is on the vertical axis

29

Cumulative Percent Frequency Distribution

Shows proportion/percentage of data items with values less than or equal to the upper limits of each class

30

Number of Classes

Between 5 and 20

Small datasets have fewer; larger datasets have more

31

Width of the Class

Generally, the same for each class

Approx. width = (largest value - smallest value) / # of classes
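
As a rough sketch of the width rule, the snippet below picks a class width and counts observations per class. The data values, the choice of 5 classes, and the starting limit of 10 are all assumptions made for the example.

```python
import math

# Hypothetical data: 20 audit times in days (illustrative values)
data = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
        22, 23, 22, 21, 33, 28, 14, 18, 16, 13]
num_classes = 5

approx_width = (max(data) - min(data)) / num_classes   # (33 - 12) / 5
width = math.ceil(approx_width)                        # round up to a convenient width

start = 10   # assumed convenient lower limit at or below the smallest value
classes = [(lo, lo + width - 1)
           for lo in range(start, start + width * num_classes, width)]

# Each observation falls in exactly one class because the limits do not overlap
counts = {c: sum(c[0] <= x <= c[1] for x in data) for c in classes}
```

Rounding the width up rather than down keeps the largest value inside the last class for this data; with other data the starting limit may need adjusting.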

32

Class Limits

Each data observation must belong to one and only one class

33

Relative Frequency Distribution

Frequency of the class / n

34

Crosstabulation

a tabular summary of data for two variables (either categorical or quantitative)

Example: suppose we have data from a sample of 300 restaurants on overall quality and meal price. A crosstabulation lets us see whether there is a pattern between the two variables.

35

Scatter Diagram & Trendline

a scatter diagram is a graphical display of the relationship between two quantitative variables

a trendline provides an approximation (i.e. an estimate) of the relationship; which can be positive, negative or none

36

Side-by-Side & Stacked Bar Charts

These are extensions of a basic bar chart as they are used to display and compare two variables.

A side-by-side bar chart depicts multiple bar charts on the same display

A stacked bar chart has one bar broken into segments of a different color showing the relative frequency of each class

37

Mode

is the value that occurs with the greatest frequency. If two values are tied for most frequent, the data are bimodal; if more than two are tied, the data are multimodal

38

Geometric Mean

A measure of location found by taking the nth root of the product of n values
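
A small sketch of the geometric mean, using Python's standard `statistics.geometric_mean`. The growth factors are hypothetical; a typical use is averaging growth rates over several periods.

```python
from statistics import geometric_mean

# Hypothetical growth factors (1 + return) for three years: +10%, -5%, +20%
growth_factors = [1.10, 0.95, 1.20]

gm = geometric_mean(growth_factors)   # nth root of the product of the n values
mean_growth_rate = gm - 1             # average growth rate per period
```

Raising `gm` to the power n gives back the product of the original values, which is what makes it the right average for compounding.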

39

Percentile

provides information about how the data is spread over the interval from the smallest to the largest value

40

Quartiles

represent how the data is spread over four parts, each containing approximately 25% of the observations

41

Range

largest value - smallest value

a measure of variability or dispersion of the data

42

Interquartile Range

Q3 - Q1, is the range of the middle 50% of the data

a measure of variability or dispersion in the data

43

Variance

measures variability using all the data, since it is based on the difference between the value of xi and the mean

The difference is called deviation about the mean

For a sample, the deviation is xi - x̄

For a population, the deviation is xi - μ
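
The deviations about the mean lead directly to the variance. The sketch below assumes the usual definitions (sample variance divides the sum of squared deviations by n - 1, population variance by N); the five data values are hypothetical.

```python
from statistics import mean, variance, pvariance, stdev

# Hypothetical sample of five class sizes
x = [46, 54, 42, 46, 32]

x_bar = mean(x)                                       # sample mean
deviations = [xi - x_bar for xi in x]                 # deviations about the mean
s2 = sum(d ** 2 for d in deviations) / (len(x) - 1)   # sample variance (n - 1)

s2_library = variance(x)    # same quantity from the standard library
sigma2 = pvariance(x)       # population variance divides by N instead
s = stdev(x)                # sample standard deviation, sqrt of s2
```

The deviations always sum to zero, which is why they are squared before averaging.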

44

Distribution Shape

is measured by skewness

if the shape of the data is skewed to the left, the skewness is negative (mean < median)

if to the right then skewness is positive (mean > median)

if the data is symmetric, then skewness is zero (mean = median)

45

Coefficient of Variation

This is a measure of how large the standard deviation is relative to the mean

46

Z-Score

measures the relative location of values in the dataset, helps determine how far a particular value is from the mean

yields a standardized value: the number of standard deviations the observation is from the mean

a measure of the relative location of the observation in the dataset

uses mean and std. deviation in calc.
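
A minimal sketch of the z-score calculation, z = (xi - mean) / standard deviation, on a hypothetical sample:

```python
from statistics import mean, stdev

x = [46, 54, 42, 46, 32]        # hypothetical sample
x_bar, s = mean(x), stdev(x)    # sample mean and sample standard deviation

# Number of standard deviations each observation lies from the mean
z = [(xi - x_bar) / s for xi in x]
```

A negative z-score means the observation is below the mean; a z-score of 0 means it equals the mean.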

47

Chebyshev's Theorem

allows us to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean: at least (1 - 1/z^2) of the values, for any z > 1

If z = 2, at least 75% of the data must be within 2 std. dev. of the mean

If z = 3, at least 89% of the data must be within 3 std. dev. of the mean

If z = 4, at least 94% of the data must be within 4 std. dev. of the mean
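
The percentages on this card come from the bound 1 - 1/z^2. A short sketch follows; the function name is made up for illustration.

```python
def chebyshev_min_proportion(z: float) -> float:
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

for z in (2, 3, 4):
    print(z, f"{chebyshev_min_proportion(z):.0%}")
```

The bound holds for any distribution shape, which is why it is looser than the bell-shaped percentages.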

48

If data is bell shaped around the mean, we know:

Approx. 68% of the data is within one s of the sample mean (x̄)

Approx. 95% of the data is within two s of the sample mean (x̄)

Approx. 99.7% of the data is within three s of the sample mean (x̄)

49

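Detecting Outliers

outliers are extreme values relative to the rest of the data

z-scores can help identify outliers: any observation with a z-score less than -3 or greater than +3 is treated as an outlier

The interquartile range can also help: values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR) are flagged as outliers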
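
One common way to use the interquartile range here is the 1.5 x IQR rule: flag anything below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR). The sketch below uses hypothetical salary data and `statistics.quantiles`, whose default "exclusive" method is, as far as I know, the same interpolation Excel's Quartile.Exc uses.

```python
from statistics import quantiles

# Hypothetical monthly starting salaries; the last value looks unusually high
data = [5710, 5755, 5850, 5880, 5880, 5890,
        5920, 5940, 5950, 6050, 6130, 6325]

q1, q2, q3 = quantiles(data, n=4)   # quartiles, default "exclusive" method
iqr = q3 - q1                       # range of the middle 50% of the data

lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_limit or x > upper_limit]
```

Unlike the z-score rule, these limits are not pulled around by the extreme values themselves, since quartiles ignore how far out the tails go.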

50

Covariance

is a descriptive measure of the linear association between two variables

Sxy = sample covariance,
if Sxy > 0, then there is positive linear association between x and y
if Sxy < 0, then there is negative linear association between x and y

51

Sample Correlation Coefficient

ranges from -1 to +1

If 1, then all data is on a positively sloped line

-1 = data would be on a negatively sloped line

As the data points deviate from a straight line, the correlation coefficient moves closer to 0
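
A sketch of the sample covariance and sample correlation coefficient computed by hand, assuming the usual definitions: Sxy is the sum of products of deviations divided by n - 1, and the correlation is Sxy / (Sx * Sy). The data are hypothetical.

```python
from math import sqrt

# Hypothetical data: x = number of commercials, y = weekly sales ($100s)
x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sample covariance: average product of deviations, dividing by n - 1
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of x and y
s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

r_xy = s_xy / (s_x * s_y)   # always between -1 and +1
```

Covariance depends on the units of x and y; dividing by both standard deviations removes the units, which is why the correlation coefficient is easier to interpret.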

52

Probability

a numerical measure of the likelihood of an event occurring

a probability ranges from 0 to 1

53

Experiment

a process generating well-defined outcomes

ex: rolling a 6-sided die results in six possible outcomes: S = {1,2,3,4,5,6}

54

Combinations

A counting rule for the number of experimental outcomes when selecting n objects from a set of N objects, where the order of selection does not matter

55

Permutations

A counting rule computing the # of experimental outcomes when n objects are to be selected from a set of N objects where the order is important
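
Both counting rules are available in Python's `math` module. The values N = 5 and n = 2 are just an example.

```python
from math import comb, perm

N, n = 5, 2   # hypothetical: select 2 parts from a group of 5

combinations = comb(N, n)   # N! / (n! (N - n)!), order ignored
permutations = perm(N, n)   # N! / (N - n)!, order matters

print(combinations, permutations)   # 10 20
```

There are always at least as many permutations as combinations, because each combination of n objects can be ordered in n! ways.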

56

Requirements of Assigning Probabilities

  1. The probability assigned to each outcome must be between 0 and 1

  2. The sum of the probabilities for all outcomes must be equal to 1

57

Classical Method

used when all experimental outcomes are equally likely, e.g. a coin toss or a roll of a 6-sided die

with n possible outcomes, each outcome is assigned a probability of 1/n

58

Relative Frequency Method

used when data are available to estimate the proportion of time the experimental outcomes will occur if the experiment is repeated a large # of times

59

Subjective Method

used when outcomes are not equally likely and data is unavailable

60

Probability of an Event

the probability of an event is equal to the sum of the probabilities of the sample points in the event

P(C) = P(2,6) + P(2,7) + P(3,6)
P(C) = 0.15 + 0.15 + 0.10 = 0.35

P(S) = P(2,6) + P(2,7)
P(S) = 0.15 + 0.15 = 0.30

61

Union of Two Events

the event containing all sample points belonging to Event A, Event B or both

denoted by A ∪ B (whole bubble diagram)

62

Intersection of Two Events

the event containing the sample points belonging to both A and B

denoted by A ∩ B (only the middle of the bubble diagram)

63

Addition Law

useful when we want to know the probability that at least one of two events occurs

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
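
The addition law can be checked on the die-rolling experiment using Python sets. The events A and B below are hypothetical, and the helper `P` assumes equally likely outcomes (the classical method).

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a fair die
A = {2, 4, 6}            # hypothetical event: the roll is even
B = {4, 5, 6}            # hypothetical event: the roll is greater than 3

def P(event: set) -> Fraction:
    """Classical method: equally likely outcomes, so P = |event| / |S|."""
    return Fraction(len(event), len(S))

direct = P(A | B)                    # probability of the union, counted directly
by_law = P(A) + P(B) - P(A & B)      # addition law
```

Subtracting the intersection corrects for the outcomes 4 and 6, which would otherwise be counted twice. For mutually exclusive events the intersection is empty and the law reduces to P(A) + P(B).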

64

Mutually Exclusive Events

occur when two events have no sample points in common

65

Conditional Probability

probabilities are often influenced by whether a related event already occurred.

suppose A occurs with probability P(A). If event B has already occurred, this new information results in a new probability for A, called the conditional probability: P(A|B) = P(A ∩ B) / P(B)

66

Joint Probability

the probability of the intersection of two events

67

Random Variable

a numeric description of the outcome of an experiment and is either discrete or continuous

68

Bivariate Probability Distribution

a probability distribution involving two random variables

69

Marginal Probabilities

the sum of the joint probabilities (by row and column)

70

Independent Events

Event A and Event B are independent if: P(A|B) = P(A)

or P(B|A) = P(B)

71

Multiplication Law

used to compute the probability of the intersection of two events
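
The sketch below ties together joint, marginal, and conditional probabilities, the multiplication law, and the independence check, assuming the standard forms P(A|M) = P(A ∩ M) / P(M) and P(A ∩ M) = P(M) P(A|M). The joint probability table is hypothetical.

```python
from fractions import Fraction as F

# Hypothetical joint probability table:
# rows = officer is a man (M) or woman (W), columns = promoted (A) or not (Ac)
joint = {
    ("M", "A"): F(24, 100), ("M", "Ac"): F(56, 100),
    ("W", "A"): F(3, 100),  ("W", "Ac"): F(17, 100),
}

# Marginal probabilities: sum the joint probabilities along a row or a column
p_M = joint["M", "A"] + joint["M", "Ac"]
p_A = joint["M", "A"] + joint["W", "A"]

# Conditional probability: P(A | M) = P(A and M) / P(M)
p_A_given_M = joint["M", "A"] / p_M

# Multiplication law: P(A and M) = P(M) * P(A | M)
recovered_joint = p_M * p_A_given_M

# Independence would require P(A | M) == P(A); here it does not hold
independent = p_A_given_M == p_A
```

Exact fractions are used instead of floats so the equality checks are not affected by rounding.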

72

Discrete Random Variable

may assume either a finite number of values or an infinite sequence of values such as 0, 1, 2, …

examples are a toss of a coin, the # of customers who place an order, or the product chosen by a customer from two options

73

Continuous Random Variables

any numerical value in an interval or collection of intervals

examples are the time a customer spends on a webpage, ounces in a soft drink, the value of a stock in one year

74

Variance

measures the variability or dispersion of the random variable

75

Standard Deviation

the positive square root of the variance

76

Bivariate Probability Distribution

involves two random variables, such as rolling a die two times or recording the percentage change for a stock fund and a bond fund over a year

often the analyst is interested in the relationship between the two random variables, look at covariance and correlation coefficient as measures of the linear association between the two

77

Binomial Probability Distribution

is based on four properties:

  1. the experiment consists of a sequence of n identical trials

  2. two outcomes are possible on each trial: success or failure

  3. the probability of success (p) and the probability of failure (1-p) do not change from trial to trial

  4. the trials are independent

78

Using Excel to Compute Binomial Probabilities

Enter formula, Binom.Dist

needs a value for x, n, and p

mark either true (cumulative probability) or false (probability)

ex. =Binom.Dist(B5,$D$1,$D$2,FALSE)
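
For comparison outside Excel, here is a rough Python equivalent of the two modes of Binom.Dist, assuming the standard binomial formula f(x) = C(n, x) p^x (1 - p)^(n - x). The function names and the 3-customer example are made up for illustration.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x), the FALSE (non-cumulative) mode."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x), the TRUE (cumulative) mode."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# Hypothetical example: 3 customers, each buys with probability 0.30
p_exactly_2 = binom_pmf(2, 3, 0.30)
p_at_most_1 = binom_cdf(1, 3, 0.30)
```

The cumulative value is just the sum of the individual probabilities from 0 up to x, which is what the TRUE flag does in the spreadsheet.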

79

Poisson Probability Distribution

this distribution relates to the case for estimating the # of occurrences over a specified interval of space/time

80

Using Excel to Compute Poisson Probabilities

Enter formula, Poisson.Dist

need a value of x, the value of μ (mean), and TRUE (cumulative probability) or FALSE (probability)

ex. =Poisson.Dist(A4,$D$1,FALSE)
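
A rough Python counterpart to Poisson.Dist, assuming the standard Poisson formula f(x) = μ^x e^(-μ) / x!. The function names and the mean of 10 arrivals per interval are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) for a Poisson distribution with mean mu (FALSE mode)."""
    return mu ** x * exp(-mu) / factorial(x)

def poisson_cdf(x: int, mu: float) -> float:
    """P(X <= x), the cumulative (TRUE) mode."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))

# Hypothetical example: an average of 10 arrivals per 15-minute interval
p_exactly_5 = poisson_pmf(5, 10)
p_at_most_5 = poisson_cdf(5, 10)
```

The mean must be stated for the same interval length as the question; for a different interval, rescale μ first.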

81

Hypergeometric Probability Distribution

similar to the binomial distribution, except the trials are not independent and the probability of success changes from trial to trial

r is the # of success in population N, and N-r is the # of failures

82

Using Excel to Compute Hypergeometric Probability

Enter Hypgeom.Dist

needs a value for x, n (sample size), r, and N, and either TRUE or FALSE

=Hypgeom.Dist(1,3,5,12,TRUE)
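
The same calculation can be sketched in Python, assuming the standard hypergeometric formula f(x) = C(r, x) C(N - r, n - x) / C(N, n). The numbers match the Excel example (x = 1 success in a sample of n = 3, with r = 5 successes in a population of N = 12); the function names are made up.

```python
from math import comb

def hypergeom_pmf(x: int, n: int, r: int, N: int) -> float:
    """P(X = x): x successes in a sample of n, drawn without replacement."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

def hypergeom_cdf(x: int, n: int, r: int, N: int) -> float:
    """P(X <= x), the cumulative (TRUE) mode."""
    return sum(hypergeom_pmf(k, n, r, N) for k in range(x + 1))

p_exactly_1 = hypergeom_pmf(1, 3, 5, 12)   # 105/220
p_at_most_1 = hypergeom_cdf(1, 3, 5, 12)   # 140/220
```

Because sampling is without replacement, each draw changes the mix of successes and failures left in the population, which is the difference from the binomial case.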

83

Continuous Random Variable

probabilities are computed differently than for a discrete random variable

for discrete, we compute the probability at a specific value of x

for continuous random variables, we compute the probability that the random variable assumes any value in an interval

computing the area under the probability density function, f(x)

84

Difference Between Discrete and Continuous Random Variables

for a discrete random variable, we compute the probability that the variable takes on a specific value; for a continuous random variable, we compute the probability that the variable falls within an interval

the probability of a continuous random variable within some given interval is defined to be the area under the graph of the probability density function

(a single point is an interval of 0, so the probability of a single value in the continuous case is 0)

85

Using Excel to Compute Exponential Probabilities

Enter Expon.Dist

needs x, a value for 1/μ, and TRUE or FALSE

P(x <= 18): =Expon.Dist(18,1/15,TRUE)

P(6 <= x <= 18): =Expon.Dist(18,1/15,TRUE)-Expon.Dist(6,1/15,TRUE)

P(x >= 8): =1 - Expon.Dist(8,1/15,TRUE)
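
The three Excel formulas above can be mirrored in Python, assuming the standard exponential cumulative probability P(x <= x0) = 1 - e^(-x0/μ) with mean μ = 15, as in the examples. The function name is made up.

```python
from math import exp

def expon_cdf(x: float, mu: float) -> float:
    """P(X <= x) for an exponential distribution with mean mu."""
    return 1 - exp(-x / mu)

mu = 15   # mean, so the rate passed to Excel is 1/15

p_at_most_18 = expon_cdf(18, mu)                    # lower tail
p_6_to_18 = expon_cdf(18, mu) - expon_cdf(6, mu)    # interval
p_at_least_8 = 1 - expon_cdf(8, mu)                 # upper tail
```

Every probability here is an area under the density curve, so an interval is the difference of two cumulative values and an upper tail is 1 minus one.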

86

Normal Probability Distribution

most used probability distribution for continuous random variables

it provides a description of likely results obtained through sampling

bell curve

87

Characteristics of the Normal Distribution

only two parameters: μ and σ

highest point is the mean, which is also the median and the mode

the mean can take on any numerical value

the normal distribution is symmetric; skewness = 0

the std. dev. (σ) determines how flat or wide the curve is (larger σ = wider/flatter curves)

probabilities for a normal random variable are given by the area under the normal curve (total area under the curve = 1)

68.3% of values are within 1σ of μ, 95.4% within 2σ, and 99.7% within 3σ

88

Using Excel to Compute Normal Probabilities

Enter Norm.Dist

find value for x, μ (mean), and standard deviation, and TRUE/FALSE

lower tail: =Norm.Dist(20000,36500,5000,TRUE)

interval: =Norm.Dist(40000,36500,5000,TRUE) - Norm.Dist(20000,36500,5000,TRUE)

upper tail: =1 - Norm.Dist(40000,36500,5000,TRUE)

89

Using Excel to Compute Normal Probabilities (but for value of x)

x value with 0.10 in lower tail: =Norm.Inv(0.1,36500,5000)

x value with 0.025 in upper tail: =Norm.Inv(0.975,36500,5000)
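
The Norm.Dist and Norm.Inv calculations can be sketched with Python's standard `statistics.NormalDist`, using the same mean (36,500) and standard deviation (5,000) as the Excel examples. The variable names are made up.

```python
from statistics import NormalDist

dist = NormalDist(mu=36500, sigma=5000)

# Cumulative probabilities (the TRUE mode of Norm.Dist)
lower_tail = dist.cdf(20000)                   # P(x <= 20000)
interval = dist.cdf(40000) - dist.cdf(20000)   # P(20000 <= x <= 40000)
upper_tail = 1 - dist.cdf(40000)               # P(x >= 40000)

# Inverse lookups (Norm.Inv): the x value with a given area to its left
x_low = dist.inv_cdf(0.10)     # 0.10 in the lower tail
x_high = dist.inv_cdf(0.975)   # 0.025 in the upper tail
```

The inverse function always takes the area to the left, so an upper-tail area of 0.025 is entered as 0.975.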

90

Standard Normal Probability Distribution

is where μ is 0 and the std. dev. is 1

91

Using Excel to Compute Standard Normal Probabilities and Z-Values

Enter Norm.S.Dist

find value of z and TRUE or FALSE (WE USE TRUE)

P(z <= 1): =Norm.S.Dist(1,TRUE)

P(-0.5 <= z <= 1.25): =Norm.S.Dist(1.25,TRUE) - Norm.S.Dist(-0.5,TRUE)

P(z >= 1.58): =1-Norm.S.Dist(1.58,TRUE)

92

Using Excel to Compute Standard Normal Probabilities and Z-Values (but for z-values)

z-value with 0.025 in lower tail: =Norm.S.Inv(0.025)

z-value with 0.025 in upper tail: =Norm.S.Inv(0.975)
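
The standard normal lookups work the same way in Python: `NormalDist()` with no arguments has mean 0 and standard deviation 1. This sketch reuses the z-values from the Excel examples; the variable names are made up.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

p_below = z.cdf(1)                      # P(z <= 1)
p_between = z.cdf(1.25) - z.cdf(-0.5)   # P(-0.5 <= z <= 1.25)
p_above = 1 - z.cdf(1.58)               # P(z >= 1.58)

z_lower = z.inv_cdf(0.025)   # z-value with 0.025 in the lower tail
z_upper = z.inv_cdf(0.975)   # z-value with 0.025 in the upper tail
```

By symmetry the two tail z-values are equal in size and opposite in sign.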

93

Using Excel to Calculate E(x) or μ, σ^2 & σ

Mean: =sumproduct(A:A,B:B)

Squared Deviation from Mean: =(A2 - F$2)^2

Variance: =sumproduct(C:C,B:B)

Standard Deviation: =sqrt(B11)
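
The spreadsheet steps correspond to E(x) = Σ x f(x) and σ^2 = Σ (x - μ)^2 f(x). A sketch with a hypothetical discrete distribution:

```python
from math import sqrt

# Hypothetical discrete distribution: x = cars sold in a day, f = probability
x = [0, 1, 2, 3, 4, 5]
f = [0.18, 0.39, 0.24, 0.14, 0.04, 0.01]

mu = sum(xi * fi for xi, fi in zip(x, f))                # E(x), the SUMPRODUCT step
var = sum((xi - mu) ** 2 * fi for xi, fi in zip(x, f))   # sigma^2, weighted squared deviations
sigma = sqrt(var)                                        # sigma, the SQRT step
```

Each squared deviation is weighted by its probability, so unlikely values far from the mean contribute less than their distance alone would suggest.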

94

Using Excel to Calculate the Sample Covariance and Sample Correlation Coefficient

Enter =covariance.s(
and select the cells

Enter =correl(
and select the cells

95

Using Excel to Compute the Geometric Mean

=geomean(
select the cells

96

Using Excel to Compute Percentiles & Quartiles

Enter =Percentile.Exc(
select the cells

Enter =Quartile.Exc(
select the cells

97

Using Excel to Calculate the Sample Variance and Sample Standard Deviation

Enter =var.s(
select the cells

Enter =stdev.s(
select the cells

98

Using Excel's Descriptive Statistics Tool

Apply Tools:

click on Data in the Ribbon

click on Data Analysis

choose Descriptive Statistics

99

Using Excel's Recommended Chart Tools to Construct a Histogram (to show a class with no data)

right click any cell in the row labels column

click field settings

click Layout and Print

choose show items with no data; click OK