buss1020

208 Terms

1
Categorical variables
have values that can only be placed into categories such as yes or no
2
Numerical variables
have values that represent quantities; divided into discrete and continuous variables
3
Discrete variables
result from a counting process, so they take only whole-number (integer) values
4
Continuous variables
result from a measuring process and do not have to be integers
5
Variables can be further identified by what measurements?
interval scale (no true zero point, e.g. temperature) or ratio scale (true zero point, e.g. height)
6
Probability sample
a sample in which items are selected based on known probabilities
7
Non-probability samples
no guarantee that the true population will be represented (convenience, self-selection, judgment, quota sampling)
8
framework for conducting statistical analysis
DCOVA- define, collect, organise, visualise, analyse
9
Branches of stats
descriptive (summarise the sample, e.g. mean, median, range), inferential (estimation and hypothesis testing), predictive (e.g. predicting how much a customer will spend)
10
Types of categorical variables
Nominal variables (no natural order, e.g. eye colour, car make)
Ordinal variables (ordered categories, e.g. clothing sizes)
11
Sources of data
SODOS
Surveys (customer satisfaction), organisations (e.g. a country's historical weather records), designed experiments (consumer testing), observational studies (traffic volume), streaming sources (website traffic)
12
Types of survey error
1. coverage error (groups excluded from the frame)
2. nonresponse error (failure to collect data from all selected items)
3. sampling error (chance variation from sample to sample that always exists)
4. measurement error (weakness in the measurement method)
13
Central tendency
the point around which data values cluster
14
Main measures of central tendency that estimate central point using different methods
Mean, median, mode
15
Geometric mean
the nth root of the product of n values; used to find the average rate of change (e.g. the average rate of return) over several periods. It is always less than or equal to the arithmetic mean
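A minimal Python sketch of the geometric mean rate of return, assuming returns are given as decimals (0.10 for 10%). The function name `geometric_mean_return` and the two returns are hypothetical.

```python
import math

def geometric_mean_return(returns):
    """Geometric mean rate of return: [(1+R1)(1+R2)...(1+Rn)]^(1/n) - 1."""
    growth = math.prod(1 + r for r in returns)  # product of the growth factors
    return growth ** (1 / len(returns)) - 1

returns = [0.50, -0.30]  # hypothetical: +50% in year 1, -30% in year 2
arith = sum(returns) / len(returns)
geo = geometric_mean_return(returns)
print(f"arithmetic mean: {arith:.4f}, geometric mean: {geo:.4f}")
```

Here the arithmetic mean suggests 10% per year, but 100 invested grows only to 105 after two years, which matches the geometric mean of roughly 2.47% per year.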
16
Measures of variation
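range- difference between the largest and smallest values
variance- average squared deviation from the mean (the sample variance divides by n - 1)
standard deviation- square root of the variance; measures variation around the sample mean in the original units
coefficient of variation- standard deviation divided by the mean (× 100%); shows the relative variation between data sets measured in different units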
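A short Python sketch computing the four measures for a small hypothetical sample. `statistics.variance` and `statistics.stdev` use the sample (n - 1) formulas, which should match Excel's VAR.S and STDEV.S.

```python
import statistics

data = [10, 12, 14, 15, 19]  # hypothetical sample

value_range = max(data) - min(data)
variance = statistics.variance(data)         # sample variance, divides by n - 1
std_dev = statistics.stdev(data)             # square root of the sample variance
cv = std_dev / statistics.mean(data) * 100   # coefficient of variation, in %

print(value_range, variance, round(std_dev, 3), round(cv, 1))
```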
17
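Advantages/disadvantages of the arithmetic mean
easy and quick to calculate and useful in many situations BUT it is sensitive to extreme values, so it can overstate the typical value; sometimes inappropriate (e.g. for skewed data or average rates of return)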
18
Z score
a value that gives an observation's position in the data set (how many SDs it lies from the mean)

Z = (X - Xbar) / S
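A minimal Python sketch of the Z-score formula, using the sample mean and sample SD; the function name `z_scores` and the data are hypothetical. Excel's STANDARDIZE(x, mean, standard_dev) performs the same calculation for one value.

```python
import statistics

def z_scores(data):
    """Z = (X - Xbar) / S for every observation, using the sample SD."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - xbar) / s for x in data]

data = [10, 12, 14, 15, 19]  # hypothetical sample
print([round(z, 2) for z in z_scores(data)])
```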
19
Skewness
mean < median: left skewed
mean > median: right skewed
mean = median: symmetrical
20
How to calculate outliers
1. convert all data points into Z-scores using the Z-score formula (values below -3 or above +3 are considered outliers)
2. use the COUNTIFS function (array,">-3",array,"
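The Z-score outlier check can be sketched in Python as below, assuming the usual cutoff of ±3 (the -3 in the COUNTIFS criteria suggests the same rule). The function name `find_outliers` and the data are hypothetical.

```python
import statistics

def find_outliers(data, cutoff=3):
    """Return values whose Z-score is below -cutoff or above +cutoff."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    return [x for x in data if abs((x - xbar) / s) > cutoff]

data = [48, 50, 52, 49, 51] * 4 + [100]  # hypothetical data with one extreme value
print(find_outliers(data))  # → [100]
```

With very small samples no Z-score can exceed 3 (the largest possible |Z| is (n - 1)/√n), so this rule is only useful for moderately sized data sets.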
21
To insert pivot table in excel
Insert > PivotTable, then drag fields from the PivotTable field list into the Values, Columns and Rows areas
22
Kurtosis
Measures the concentration of values in the centre as compared to the tails
23
Quartiles and box plots
IQR = Q3 - Q1
a measure of variability that is not influenced by outliers
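A Python sketch of the IQR, showing that replacing the largest value with an extreme one leaves it unchanged. The `"exclusive"` method places quartiles at position (n + 1)p, which should agree with Excel's QUARTILE.EXC; textbook rounding rules may give slightly different quartiles for some n. The data are hypothetical.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1
print(q1, median, q3, iqr)  # → 3.0 6.0 9.0 6.0

# Replace the largest value with an extreme one: the IQR does not move
data_with_outlier = data[:-1] + [1100]
q1_o, _, q3_o = statistics.quantiles(data_with_outlier, n=4, method="exclusive")
print(q3_o - q1_o)  # → 6.0
```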
24
Chebyshev's Rule
Regardless of how the data are distributed, at least (1 - 1/k^2) × 100% of the values will fall within k standard deviations of the mean (for k > 1)
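A tiny Python sketch of Chebyshev's bound; the function name `chebyshev_min_share` is hypothetical.

```python
def chebyshev_min_share(k):
    """Minimum proportion of values within k SDs of the mean, for k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is only informative for k > 1")
    return 1 - 1 / k**2

for k in (2, 3):
    print(k, f"{chebyshev_min_share(k):.1%}")  # → 2 75.0%, then 3 88.9%
```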
25
Ethical considerations of descriptive stats
1. Assuming small differences are meaningful
2. Equating statistical significance with real world significance
3. Neglecting to look at extremes
4. trusting coincidence
5. Deceptive graphs
26
Monty hall problem
initial intuitions about a game of chance can be drastically incorrect
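A quick simulation makes the counter-intuitive result visible. This is a sketch under the standard assumptions (the host always opens a door that hides no car and was not picked); the function name `monty_hall` is hypothetical.

```python
import random

def monty_hall(trials, switch, seed=0):
    """Simulate the game; return the share of trials where the player wins the car."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # Host opens a door that is neither the player's pick nor the car
        # (which of two such doors he opens does not affect the win rates).
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == car
    return wins / trials

print(monty_hall(100_000, switch=False), monty_hall(100_000, switch=True))
```

Staying wins about 1/3 of the time and switching about 2/3, although intuition suggests 50/50.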
27
Sample space
the set of all possible outcomes from an experiment
28
simple event
described by a single characteristic
29
Joint events
are described by two or more characteristics
30
complement of event
The complement of event A consists of all outcomes that are NOT in A.
31
Visualising
Tree diagram or contingency table
32
Marginal probability
refers to the probability of a single event
33
Joint Probability
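the probability of two or more events occurring together (A AND B); for A OR B use the general addition rule P(A or B) = P(A) + P(B) - P(A and B), i.e. subtract the intersection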
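A Python sketch reading marginal and joint probabilities from a 2×2 contingency table and applying the general addition rule. The counts, labels and the helper `prob` are hypothetical; `Fraction` keeps the results exact.

```python
from fractions import Fraction

# Hypothetical 2x2 contingency table of counts (A = planned to buy, B = bought)
table = {
    ("A", "B"): 200, ("A", "not B"): 50,
    ("not A", "B"): 100, ("not A", "not B"): 650,
}
total = sum(table.values())

def prob(row=None, col=None):
    """Probability of the cells matching the given row and/or column label."""
    count = sum(n for (r, c), n in table.items()
                if (row is None or r == row) and (col is None or c == col))
    return Fraction(count, total)

p_a, p_b = prob(row="A"), prob(col="B")   # marginal probabilities
p_a_and_b = prob(row="A", col="B")        # joint probability
p_a_or_b = p_a + p_b - p_a_and_b          # general addition rule
print(p_a, p_b, p_a_and_b, p_a_or_b)      # → 1/4 3/10 1/5 7/20
```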
34
Mutually exclusive events
Events that cannot occur at the same time.
35
Collectively exhaustive events
events that together cover the entire sample space (if events are mutually exclusive and collectively exhaustive, their probabilities sum to 1)
36
Conditional Probability
the probability of one event, given the knowledge that another event has occurred
37
independence
when the occurrence of one event does not affect the probability of another
38
multiplication rule
General rule: P(A and B) = P(A|B) × P(B). If A and B are independent, P(A|B) = P(A), so just multiply: P(A and B) = P(A) × P(B). If they are dependent, you must first find the conditional probability P(A|B)
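A Python sketch of conditional probability, the general multiplication rule and an independence check, using hypothetical contingency-table counts.

```python
from fractions import Fraction

# Hypothetical counts from a 2x2 contingency table
n_a_and_b, n_a_only, n_b_only, n_neither = 200, 50, 100, 650
total = n_a_and_b + n_a_only + n_b_only + n_neither

p_a = Fraction(n_a_and_b + n_a_only, total)
p_b = Fraction(n_a_and_b + n_b_only, total)
p_a_and_b = Fraction(n_a_and_b, total)

p_a_given_b = p_a_and_b / p_b            # conditional probability P(A|B)
assert p_a_given_b * p_b == p_a_and_b    # multiplication rule P(A and B) = P(A|B)P(B)
independent = p_a_given_b == p_a         # independence check: does P(A|B) equal P(A)?
print(p_a_given_b, independent)          # → 2/3 False
```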
39
Bayes' Theorem
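allows us to reverse the conditioning between two events or variables

Conditional probability: P(A|B) = P(A and B) / P(B)
The values needed are not always given in the question, so reverse the conditioning with:
P(B|A) = P(A|B) × P(B) / P(A), where P(A) = P(A and B) + P(A and B') = P(A|B) × P(B) + P(A|B') × P(B')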
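A Python sketch of Bayes' theorem with the denominator expanded as above. The function name `bayes` and the screening-test numbers are hypothetical.

```python
from fractions import Fraction

def bayes(p_b, p_a_given_b, p_a_given_not_b):
    """P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B')P(B')]."""
    p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)  # total probability of A
    return p_a_given_b * p_b / p_a

# Hypothetical test: B = has the condition, A = positive result
posterior = bayes(Fraction(1, 100), Fraction(9, 10), Fraction(1, 20))
print(posterior)  # → 2/13
```

Even with a 90% detection rate, only about 15% of positives have the condition, because the condition is rare.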
40
Multiplication counting rule
Determines the number of possible outcomes when there are k1 events on the first trial, k2 on the second, ..., kn on the nth trial (the possible events differ from trial to trial): k1 × k2 × ... × kn, e.g. flip a coin on the first trial and roll a die on the second: 2 × 6 = 12
41
Repetition counting rule
Determines the number of possible outcomes when k events can occur on each of n trials: k^n, e.g. 3 trials with 6 possible events each (roll a die 3 times): 6^3 = 216
42
Factorial Counting Rule
Counts the number of ways n items can be arranged in order: n!, e.g. 5 books on a shelf can be arranged in 5! = 120 different ways

EXCEL- FACT(n)
43
permutations counting rule
Finds the number of ways a subset of a group of items can be arranged in order, e.g. from 5 books, put 3 on a shelf in order: 5!/(5-3)! = 60 ways

EXCEL- PERMUT(number available, number chosen)
44
combinations counting rule
Counts the number of ways X items can be selected from N items, irrespective of order, e.g. from 5 books, select 3 to read: 5!/(3!2!) = 10 possible sets of 3

EXCEL- COMBIN(N, X)
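The five counting rules above, checked in Python with the same book, coin and die examples. `math.perm` and `math.comb` require Python 3.8 or later.

```python
import math

multiplication = 2 * 6          # coin then die: k1 x k2
repetition = 6 ** 3             # 3 rolls of a die: k^n
factorial = math.factorial(5)   # 5 books in order, like Excel FACT(5)
permutations = math.perm(5, 3)  # 3 of 5 books in order, like PERMUT(5, 3)
combinations = math.comb(5, 3)  # 3 of 5 books, order ignored, like COMBIN(5, 3)
print(multiplication, repetition, factorial, permutations, combinations)
# → 12 216 120 60 10
```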
45
Ethical considerations of probability
using probability in advertising
46
Expectation
the mean of a probability distribution: multiply each possible outcome by its probability and sum the products, E(X) = Σ x P(x)

EXCEL- SUMPRODUCT(values, probabilities)
47
A random variable
assigns numerical values to the possible outcomes of an uncertain event
48
discrete random variables
take only a countable number of values, each with its own probability

The outcomes are mutually exclusive and collectively exhaustive, so their probabilities sum to 1
49
valid probability distribution
- all probabilities sum to 1 and are between 0 and 1
50
Summary measures for discrete probability distribution (bar graph)
expected value, variance, standard deviation
51
Variance of a random discrete variable
the expected value of the squared deviation from the mean, Var(X) = Σ (x - E(X))^2 P(x)
1. first find the mean; then, for each outcome, subtract the mean and square the result
2. multiply each squared deviation by its probability and sum the products

EXCEL- SUMPRODUCT((X-mean)^2, P)

Outcomes with higher probabilities carry more weight in both the expectation and the variance
52
Standard deviation of a discrete random variable
square root of variance
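A Python sketch of the summary measures for a discrete distribution, including the validity check (each probability between 0 and 1, all summing to 1). The function name `discrete_summary` and the distribution are hypothetical.

```python
import math

def discrete_summary(values, probs):
    """Return (E(X), variance, SD) for a discrete probability distribution."""
    if any(p < 0 or p > 1 for p in probs) or not math.isclose(sum(probs), 1):
        raise ValueError("not a valid probability distribution")
    mean = sum(x * p for x, p in zip(values, probs))               # like SUMPRODUCT(X, P)
    var = sum((x - mean) ** 2 * p for x, p in zip(values, probs))  # like SUMPRODUCT((X-mean)^2, P)
    return mean, var, math.sqrt(var)

# Hypothetical distribution of the number of sales per day
print(discrete_summary([0, 1, 2, 3], [0.1, 0.4, 0.3, 0.2]))
```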
53
Binomial distribution
Used when the random variable X counts the number of "events of interest" (successes) in a fixed number of observations or trials, e.g. testing 10 lightbulbs and counting the defects
54
Requirements for binomial distribution
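1. Each trial is independent
2. The probability of an event of interest is the same on every trial
3. Only two mutually exclusive outcomes (event of interest or not)
4. A fixed number of trials (n)
5. π (pi) = probability of the event of interest on each trial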
55
To make calculations for binomial distribution
EXCEL- BINOM.DIST(x, n, π, TRUE or FALSE)

FALSE: probability of exactly x events, P(X = x)
TRUE: cumulative probability, P(X ≤ x)
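A Python sketch that mirrors the two modes of BINOM.DIST using the binomial formula C(n, x) π^x (1 - π)^(n - x). The function name `binom_dist` and the 10% defect rate are hypothetical.

```python
import math

def binom_dist(x, n, prob, cumulative):
    """Like Excel BINOM.DIST: P(X = x) if cumulative is False, else P(X <= x)."""
    def pmf(k):
        return math.comb(n, k) * prob**k * (1 - prob) ** (n - k)
    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)

# Hypothetical: 10 lightbulbs, each defective with probability 0.1
print(round(binom_dist(0, 10, 0.1, False), 4))  # P(no defects)
print(round(binom_dist(2, 10, 0.1, True), 4))   # P(at most 2 defects)
```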
56
Poisson distribution
describes the probability that a certain number of events (successes) occurs in a given time period, based on the average number of events, e.g. if customers arrive at a store at an average of 3 per minute, what is the probability that 2 customers arrive in 3 minutes?
57
Requirements for the Poisson distribution
1. The probability that an event occurs in one window is the same for all other windows
2. The number of events in one window is independent of the number that occurs in other windows
3. The probability that 2 or more events will occur in a window approaches 0 as the window becomes smaller
58
Calculate poisson distribution
λ (lambda) = average (expected) number of events per window, x = number of events observed; the mean and the variance both equal λ, SD = √λ

EXCEL- POISSON.DIST(x, λ, TRUE or FALSE)
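A Python sketch of the Poisson formula e^(-λ) λ^x / x!, applied to the customer example. It assumes the rate scales with the window length (3 per minute gives λ = 9 for 3 minutes), which follows from the requirements above. The function name `poisson_dist` is hypothetical.

```python
import math

def poisson_dist(x, lam, cumulative):
    """Like Excel POISSON.DIST: P(X = x), or P(X <= x) if cumulative."""
    def pmf(k):
        return math.exp(-lam) * lam**k / math.factorial(k)
    return sum(pmf(k) for k in range(x + 1)) if cumulative else pmf(x)

# 3 customers per minute on average, so lambda = 3 * 3 = 9 for a 3-minute window
print(round(poisson_dist(2, 9, False), 4))
```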
59
Hypergeometric distribution
similar to the binomial distribution, except that the probability of an event of interest does not remain constant from trial to trial, as in sampling without replacement from a finite population.
- fixed number of trials

N = population size
A = number of items of interest in the population
N - A = number of items not of interest in the population
n = sample size
x = number of events of interest in the sample
n - x = number of events not of interest in the sample
60
Requirements for Hypergeometric
1. "n" trials in a sample taken from a finite population size "N"
2. the sample is taken without replacement
3. the outcomes of trials are dependent
4. finding the probability of a particular number of events of interest
61
Calculating the hypergeometric distribution
EXCEL- HYPGEOM.DIST(x, n, A, N, TRUE or FALSE)
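A Python sketch of the hypergeometric probability C(A, x) C(N - A, n - x) / C(N, n), with arguments in the same order as HYPGEOM.DIST. The function name `hypgeom_pmf` and the numbers are hypothetical.

```python
from fractions import Fraction
from math import comb

def hypgeom_pmf(x, n, A, N):
    """P(X = x) = C(A, x) * C(N - A, n - x) / C(N, n)."""
    return Fraction(comb(A, x) * comb(N - A, n - x), comb(N, n))

# Hypothetical: 10 items, 4 of interest, sample 3 without replacement
print(hypgeom_pmf(2, 3, 4, 10))  # → 3/10
```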
62
Application of covariance to summing random variables
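written σXY = Σ (x - E(X))(y - E(Y)) P(x, y); used in Var(X + Y) = Var(X) + Var(Y) + 2σXY
EXCEL- SUMPRODUCT(P(x,y), X-E(X), Y-E(Y)) (the terms are multiplied, not added)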
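A Python sketch of the covariance of a joint discrete distribution and its use in the variance of a sum. The two-outcome joint distribution is hypothetical; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical joint distribution: (x, y, P(x, y))
joint = [(1, 2, Fraction(1, 2)), (3, 6, Fraction(1, 2))]

e_x = sum(x * p for x, _, p in joint)
e_y = sum(y * p for _, y, p in joint)
cov = sum((x - e_x) * (y - e_y) * p for x, y, p in joint)  # terms multiplied, not added
var_x = sum((x - e_x) ** 2 * p for x, _, p in joint)
var_y = sum((y - e_y) ** 2 * p for _, y, p in joint)
var_sum = var_x + var_y + 2 * cov                           # Var(X + Y)
print(cov, var_sum)  # → 2 9
```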
63
Positive covariance
there is a positive relationship between X and Y, when X increases, Y also increases on average
64
Negative covariance
the variables have a negative linear relationship, so as one variable increases the other variable on average decreases
65
Continuous random variables
can potentially take any value in a range, depending on how accurately or precisely it can be measured
The range may extend to infinity
66
probability for range of values
to find a probability from a probability density function, find the area under the curve across that range
67
Interpreting PDFs
instead of a probability for each value of X, a continuous random variable has what's called a probability density function

1. The height of the density curve represents the relative likelihood of each X value
2. the area under the entire density curve is always exactly 1 and the area under the curve for a single value for X equals zero
68
Normal distributions
a symmetrical, bell-shaped curve representing a continuous variable whose values cluster around the mean
69
properties of normal distribution
1. Symmetric about the mean
- mode, median, and mean are at the same point
- the area under the curve to the right of the mean equals the area to the left of the mean (each area = 0.5)
2. the tails approach, but never touch, the horizontal axis (zero)
3. the area under the curve is exactly 1 by definition
70
Compute probabilities for normal distribution
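EXCEL- NORM.DIST(x, mean, SD, TRUE) gives P(X ≤ x)
If finding P(a < X < b): subtract the smaller cumulative probability from the larger, NORM.DIST(b, mean, SD, TRUE) - NORM.DIST(a, mean, SD, TRUE)

If finding the boundary value given a probability:
NORM.INV(probability, mean, SD)

If finding P(X > x), use the complement 1 - NORM.DIST(x, mean, SD, TRUE)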
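The same four calculations in Python, using `statistics.NormalDist` (Python 3.8+): `cdf` plays the role of NORM.DIST(..., TRUE) and `inv_cdf` the role of NORM.INV. The distribution parameters are hypothetical.

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=15)  # hypothetical: X ~ Normal(mean=100, SD=15)

p_below = X.cdf(115)                # P(X <= 115), like NORM.DIST(115, 100, 15, TRUE)
p_between = X.cdf(130) - X.cdf(85)  # larger minus smaller
p_above = 1 - X.cdf(130)            # complement for "greater than"
cutoff = X.inv_cdf(0.90)            # value with 90% below it, like NORM.INV(0.9, 100, 15)
print(round(p_below, 4), round(p_between, 4), round(p_above, 4), round(cutoff, 2))
```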
71
Assessing normality
histogram, box plot, check whether mean = median = mode (approximately)
72
uniform distribution
also called the rectangular distribution because it has equal density for all possible outcomes of the random variable
73
Calculating uniform distribution
height = 1/(b - a), for a ≤ x ≤ b
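Because the density is a flat rectangle, a probability is just width × height. A Python sketch, with the hypothetical function name `uniform_prob`:

```python
def uniform_prob(c, d, a, b):
    """P(c <= X <= d) for X uniform on [a, b]: interval width times height 1/(b - a)."""
    c, d = max(c, a), min(d, b)  # clip the interval to the support [a, b]
    return max(d - c, 0) / (b - a)

print(uniform_prob(2, 5, 0, 10))  # → 0.3
```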
74
exponential distributions
used to measure the length of time between two occurrences of an event
related to the Poisson distribution
75
calculate exponential distribution
EXCEL- EXPON.DIST(x, λ, TRUE) gives P(X ≤ x) = 1 - e^(-λx)
has a long right tail

λ (lambda) = the rate specified in the question, or the reciprocal of the mean (λ = 1/mean)
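A Python sketch of the exponential cumulative probability 1 - e^(-λx); the function name `expon_cdf` and the 0.5-minute mean are hypothetical.

```python
import math

def expon_cdf(x, lam):
    """P(X <= x) = 1 - e^(-lambda * x), like Excel EXPON.DIST(x, lambda, TRUE)."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

# Hypothetical: mean time between arrivals is 0.5 minutes, so lambda = 1 / 0.5 = 2 per minute
lam = 1 / 0.5
print(round(expon_cdf(1, lam), 4))  # P(next arrival within 1 minute)
```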
76
sampling distribution
whenever we collect a sample and calculate a statistic for that sample, the sample statistic is a random variable whose distribution depends on the population distribution and the chosen sample size

the larger the sample size, the closer the sample mean tends to be to the population mean

The sampling distribution tells us how accurate the statistic is for estimating the corresponding population parameter
77
Sample mean of sampling distribution
the mean of all possible sample means is always equal to the population mean
78
sample distributions of the mean rules
1. The mean of the sample mean is always equal to the population mean
2. the standard error of the sample mean is equal to the population standard deviation divided by the square root of the sample size
3. the standard error of the sample mean is smaller than the population standard deviation when n > 1 (and gets smaller as the sample size grows)
4. If the population is normally distributed the sample mean also follows a normal distribution
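A small simulation can check rules 1 and 2. This sketch assumes a hypothetical Normal(50, 10) population and samples of size 25, so the standard error should be close to 10/√25 = 2; the function name `simulate_sample_means` is hypothetical.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=1):
    """Draw `reps` samples of size n from Normal(mu, sigma); return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

mu, sigma, n = 50, 10, 25  # hypothetical population and sample size
means = simulate_sample_means(mu, sigma, n, reps=20_000)
print(round(statistics.fmean(means), 2))  # close to mu = 50
print(round(statistics.stdev(means), 2))  # close to sigma / sqrt(n) = 2
```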
79
If population is normally distributed
the sample mean will also be normally distributed
80
Sample proportions
π (pi) = true population proportion, p = sample proportion (the sample statistic); the average value of p equals π; standard error of the sample proportion = √(π(1 - π)/n)
81
How do we know if P is normally distributed?
p is approximately normally distributed when nπ ≥ 5 and n(1 - π) ≥ 5
82
How to calculate proportion distributions
NORM.DIST(p, π, standard error, TRUE), with the sample proportion as x
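A Python sketch of the whole procedure: check the normality conditions, compute the standard error, then find a cumulative probability. The values π = 0.4, n = 100 and the cut-off 0.45 are hypothetical.

```python
import math
from statistics import NormalDist

pi, n = 0.4, 100  # hypothetical population proportion and sample size
if n * pi < 5 or n * (1 - pi) < 5:
    raise ValueError("normal approximation not appropriate")
se = math.sqrt(pi * (1 - pi) / n)       # standard error of the sample proportion
p_below = NormalDist(pi, se).cdf(0.45)  # P(p <= 0.45), like NORM.DIST(0.45, 0.4, se, TRUE)
print(round(se, 4), round(p_below, 4))
```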
83
finite population correction factor (FPC)
An adjustment to the standard error, used when the sample is 5 percent or more of the total population (sampling without replacement)

Always less than 1 (for n > 1), so it reduces the standard error: the larger the share of the population sampled, the lower the uncertainty

FPC = √((N - n)/(N - 1))
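A Python sketch of the standard error of the mean with the FPC applied only when n/N ≥ 5%. The function name `standard_error` and the numbers are hypothetical.

```python
import math

def standard_error(sigma, n, N=None):
    """sigma / sqrt(n), multiplied by the FPC sqrt((N - n)/(N - 1)) when n/N >= 5%."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N >= 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(round(standard_error(10, 100), 4), round(standard_error(10, 100, N=1000), 4))
```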
84
Variance of the sample mean
sigma^2/n
85
If asked which 2 values the sample mean lies between
subtract the percentage from 100% and split the remainder equally between the two tails, e.g. for 0.95 the bounds are at cumulative probabilities 0.025 and 0.975 (use NORM.INV at each)
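A Python sketch of the two bounds for the sample mean, using the sampling distribution Normal(μ, σ/√n). The values μ = 50, σ = 10, n = 25 are hypothetical.

```python
import math
from statistics import NormalDist

mu, sigma, n, level = 50, 10, 25, 0.95            # hypothetical values
xbar_dist = NormalDist(mu, sigma / math.sqrt(n))  # sampling distribution of the mean
tail = (1 - level) / 2                            # 0.025 in each tail
lower, upper = xbar_dist.inv_cdf(tail), xbar_dist.inv_cdf(1 - tail)
print(round(lower, 2), round(upper, 2))
```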
86
Question says probability is larger than X
1- Norm.dist
87
measures of normality
CLT holds and symmetry is an assumption
88
Confidence intervals
confidence intervals help us estimate population parameters

We use the sampling distribution to construct the interval
89
Point estimates for confidence intervals
Population mean- use the sample mean
Standard deviation known- use Z
Standard deviation unknown- use the sample SD S (standard error S/√n) and t
Population proportion- use the sample proportion p and Z
90
Interpret confidence interval
for a 95% confidence interval, if we collect many samples of size n and construct a confidence interval for each, about 95% of them will contain the true unknown parameter
91
General formula for confidence intervals
point estimate ± (critical value) × (standard error)
92
Finding critical value Za/2 for when sigma known and proportions- confidence intervals
NORM.S.INV(1 - α/2) (NORM.S.INV(α/2) gives the same value with a negative sign)
α = 1 - confidence level (e.g. 0.05 for 95%)
93
90% conf
1.645
94
95% conf
1.96
95
99% conf
2.58
96
Excel formula for the margin of error (the amount added to and subtracted from the point estimate) when σ is known
CONFIDENCE.NORM(α, population SD, n)
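A Python sketch of a confidence interval for the mean when σ is known, which also reproduces the critical values from the cards (2.58 is 2.576 rounded). The function name `z_interval` and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """xbar +/- Z_(alpha/2) * sigma / sqrt(n), for a known population SD."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # like NORM.S.INV(1 - alpha/2)
    margin = z_crit * sigma / math.sqrt(n)        # like CONFIDENCE.NORM(alpha, sigma, n)
    return xbar - margin, xbar + margin

for level in (0.90, 0.95, 0.99):
    print(level, round(NormalDist().inv_cdf(1 - (1 - level) / 2), 3))  # critical values ≈ 1.645, 1.96, 2.576
print(z_interval(xbar=350, sigma=15, n=36))
```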
97
Student's t-distribution
if the population SD (σ) is unknown, we substitute the sample SD (S), so the standard error becomes S/√n

this introduces extra uncertainty, so t is used instead of Z
98
Degrees of freedom
n-1
as the sample size grows, so do the degrees of freedom, and the t distribution gets closer to the normal distribution
As the degrees of freedom increase, critical t values decrease
99
To find the t-critical value in excel
T.INV.2T(α, df) (two-tailed)
T.INV(1 - α/2, df) (T.INV uses the cumulative left-tail probability)
100
Excel formula for the margin of error when σ is unknown
CONFIDENCE.T(α, S, n) (S = sample SD; Excel divides by √n itself)
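A Python sketch of a t confidence interval for the mean. Python's standard library has no t distribution, so this sketch takes the critical value as an input (e.g. from T.INV.2T(0.05, 9) ≈ 2.262 for 95% with 9 df; if SciPy is installed, `scipy.stats.t.ppf(1 - alpha/2, df)` also gives it). The function name `t_interval` and the sample are hypothetical.

```python
import math
import statistics

def t_interval(sample, t_crit):
    """xbar +/- t_crit * S / sqrt(n); t_crit comes from T.INV.2T(alpha, n - 1)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)        # sample SD, not the standard error
    margin = t_crit * s / math.sqrt(n)  # like CONFIDENCE.T(alpha, s, n)
    return xbar - margin, xbar + margin

sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]  # hypothetical, n = 10
print(t_interval(sample, t_crit=2.262))            # 2.262 = t for 95% with 9 df (rounded)
```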