buss1020


208 Terms

1
Categorical variables
have values that can only be placed into categories such as yes or no
2
Numerical variables
have values that represent quantities; divided into discrete and continuous variables
3
Discrete variables
result from a counting process and take a finite or countable set of values, usually integers
4
Continuous variables
result from a measuring process and do not have to be integers
5
Variables can be further identified by what measurements?
interval scale (no true zero point, e.g. temperature in °C) or ratio scale (true zero point, e.g. height)
6
Probability sample
a sample in which items are selected based on known probabilities (e.g. simple random, systematic, stratified, cluster)
7
Non-probability samples
items are selected without knowing their probabilities of selection, so there is no guarantee the true population will be represented (convenience, self-selection, judgment, quota)
8
framework for conducting statistical analysis
DCOVA- define, collect, organise, visualise, analyse
9
Branches of stats
descriptive (describe the sample, e.g. mean, median, range), inferential (estimation and hypothesis testing), predictive (e.g. predict how much a customer will spend)
10
Types of categorical variables
Nominal variables, categories with no natural order (eye colour, car)
Ordinal variables, categories with a natural order (clothing sizes)
11
Sources of data
SODOS
Surveys (customer satisfaction), organisations (historic weather conditions for a country), designed experiments (consumer testing), observational studies (volume of traffic), streaming services (website traffic)
12
Types of survey error
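  1. coverage error (some groups are excluded from the sampling frame)

  2. nonresponse error (failure to collect data from all items in the sample)

  3. sampling error (chance variation between samples that will always exist)

  4. measurement error (weakness in the survey method or question design)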

13
Central tendency
the value around which the data cluster
14
Main measures of central tendency that estimate central point using different methods
Mean, median, mode
15
Geometric mean
the nth root of the product of n values; used to measure the average rate of change (e.g. rate of return) over time, and always equal to or less than the arithmetic mean
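A small sketch of why the geometric mean is preferred for returns. The two yearly returns are made-up figures, and converting each return to a growth factor (1 + return) before averaging is the usual convention rather than something stated on the card.

```python
from statistics import geometric_mean, mean

# Hypothetical yearly returns: +50% in year 1, then -50% in year 2.
returns = [0.50, -0.50]
growth_factors = [1 + r for r in returns]   # 1.5 and 0.5

# Arithmetic mean of the returns suggests "no change on average".
arithmetic_avg = mean(returns)

# Geometric mean of the growth factors gives the true average yearly rate.
geometric_avg = geometric_mean(growth_factors) - 1

print(f"arithmetic: {arithmetic_avg:.4f}, geometric: {geometric_avg:.4f}")
```

An investment of 100 ends at 100 * 1.5 * 0.5 = 75, so the negative geometric average describes the outcome better than the arithmetic average of zero.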
16
Measures of variation
range- difference between the largest and smallest values
variance- average squared deviation from the mean
standard deviation- square root of the variance; the typical variation around the mean
coefficient of variation- SD as a percentage of the mean; compares the relative variation of data sets measured in different units
17
Advantages/disadvantages of arithmetic mean
easy and quick to calculate and appropriate in many situations BUT sensitive to extreme values, and it overstates the average rate of change for compounding data such as investment returns
18
Z score
tells us the position of an observation in the data set (how many SDs it is away from the mean)

Z = (X-Xbar)/S
19
Skewness
mean < median = left skewed
mean > median = right skewed
mean = median = symmetrical
20
How to calculate outliers
  1. convert all data points into Z-scores using the formula

  2. use the COUNTIFS function (array,">-3",array,"<3") to count the Z-scores between -3 and 3

  3. any data point with a Z-score greater than 3 or less than -3 is an outlier
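The same outlier check can be sketched outside Excel. The data set below is invented for illustration, and the sample standard deviation (dividing by n-1) is assumed for S.

```python
from statistics import mean, stdev

def z_scores(data):
    """Return the Z-score (X - Xbar) / S for each value, using the sample SD."""
    xbar = mean(data)
    s = stdev(data)
    return [(x - xbar) / s for x in data]

def outliers(data, threshold=3.0):
    """Return the values whose Z-score is below -threshold or above +threshold."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > threshold]

# Hypothetical data: nineteen typical values and one extreme value.
sample = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11,
          10, 9, 12, 10, 11, 10, 9, 11, 10, 60]

print(outliers(sample))
```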

21
To insert pivot table in excel
Insert tab, PivotTable (far left), then drag fields into the Values, Columns and Rows areas
22
Kurtosis
Measures the concentration of values in the centre as compared to the tails
23
Quartiles and box plots
IQR = Q3-Q1
measure of variability not influenced by outliers
24
Chebyshev's Rule
Regardless of how the data are distributed, at least (1-1/k^2)*100% of the values will fall within k standard deviations of the mean (for k > 1)
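A quick sketch of the bound for a few values of k. The function name is only illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is only informative for k > 1")
    return 1 - 1 / k**2

for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.2%} of values")
```

For k = 2 the bound is 75%, which holds for any distribution; a normal distribution does much better (about 95%) because the rule makes no assumption about shape.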
25
Ethical considerations of descriptive stats
  1. Assuming small differences are meaningful

  2. Equating statistical significance with real world significance

  3. Neglecting to look at extremes

  4. Trusting coincidence

  5. Deceptive graphs

26
Monty hall problem
initial intuitions about a game of chance can be drastically incorrect
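To see why intuition fails here, the game can be enumerated exactly. This sketch assumes the standard rules (three doors, one car, the host always opens a goat door the player did not pick), which the card does not spell out.

```python
from fractions import Fraction
from itertools import product

def win_probability(switch):
    """Exact win probability over all equally likely (car door, first pick) pairs."""
    doors = range(3)
    wins = 0
    total = 0
    for car, pick in product(doors, repeat=2):
        total += 1
        if switch:
            # The host removes a goat door, so switching wins exactly
            # when the first pick was not the car.
            wins += pick != car
        else:
            wins += pick == car
    return Fraction(wins, total)

print("stay:", win_probability(switch=False))
print("switch:", win_probability(switch=True))
```

Switching wins two times out of three, even though many people expect a 50/50 chance once one door has been opened.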
27
Sample space
the set of all possible outcomes from an experiment
28
simple event
described by a single characteristic
29
Joint events
are described by two or more characteristics
30
complement of event
The complement of event A consists of all outcomes that are NOT in A.
31
Visualising
Tree diagram or contingency table
32
Marginal probability
refers to the probability of a single event
33
Joint Probability
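the probability of two or more events occurring together (AND); for OR, add the individual probabilities and subtract the intersection: P(A or B) = P(A) + P(B) - P(A and B)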
34
Mutually exclusive events
Events that cannot occur at the same time.
35
Collectively exhaustive events
events that together cover the entire sample space (if mutually exclusive and collectively exhaustive then probability sums to 1)
36
Conditional Probability
the probability of one event, given the knowledge that another event has occurred
37
independence
when the occurrence of one event does not affect the probability of another
38
multiplication rule
To find the probability of A AND B, multiply the probabilities: if A and B are independent, P(A and B) = P(A)*P(B); if they are dependent, use the conditional probability, P(A and B) = P(A|B)*P(B)
39
Bayes' Theorem
allows us to reverse the conditioning between two events or variables

P(B|A) = P(A and B)/P(A) = P(A|B)*P(B)/P(A)
the denominator is not always given in the question, so find it with:
P(A) = P(A and B) + P(A and B')
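A worked sketch of reversing the conditioning, written as P(B|A) = P(A|B)P(B) / [P(A and B) + P(A and B')]. The screening-test figures are invented for illustration: B is "has the condition" and A is "tests positive".

```python
def bayes(p_a_given_b, p_b, p_a_given_not_b):
    """Return P(B|A) from P(A|B), P(B) and P(A|B')."""
    p_a_and_b = p_a_given_b * p_b                  # P(A and B)
    p_a_and_not_b = p_a_given_not_b * (1 - p_b)    # P(A and B')
    p_a = p_a_and_b + p_a_and_not_b                # total probability of A
    return p_a_and_b / p_a

# Hypothetical inputs: 1% prevalence, 95% true-positive rate, 5% false-positive rate.
posterior = bayes(p_a_given_b=0.95, p_b=0.01, p_a_given_not_b=0.05)
print(f"P(condition | positive test) = {posterior:.4f}")
```

Even with an accurate test, the posterior stays low because the condition is rare, which is the kind of result the reversed conditioning is meant to reveal.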
40
Multiplication counting rule
Determines the number of possible outcomes when the number of possible events differs from trial to trial: with k1 events on the first trial, k2 on the second, ..., kn on the nth, there are k1*k2*...*kn outcomes (flip a coin in the first trial and roll a die in the second: 2*6 = 12)
41
Repetition counting rule
Determines the number of possible outcomes when any of k events can occur on each of n trials: k^n (3 trials with 6 possible events: 6^3 = 216)
42
Factorial Counting Rule
Involves counting the number of ways a set of n items can be arranged in order: n! e.g. with 5 books on a shelf, how many different ways can the books be arranged? 5! = 120

EXCEL- FACT(n)
43
permutations counting rule
Allows you to find out the number of ways in which a subset of an entire group of items can be arranged in order, e.g. have 5 books and are going to put 3 on a shelf, how many different orderings are there? 5!/(5-3)! = 60

EXCEL- PERMUT(n, x)
44
combinations counting rule
The number of ways X items can be selected from N items, irrespective of order, e.g. have 5 books and select 3 to read, how many different sets of 3 are possible? 5!/(3!*2!) = 10

EXCEL- COMBIN(N, X)
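The five counting rules can be checked with Python's standard library, using the same book, coin and die examples as the cards. `math.perm` and `math.comb` play the role of Excel's PERMUT and COMBIN (Python 3.8 or later is assumed).

```python
import math

# Multiplication rule: coin flip (2 outcomes) then die roll (6 outcomes).
coin_then_die = 2 * 6

# Repetition rule: 6 possible events on each of 3 trials.
three_die_rolls = 6 ** 3

# Factorial rule: ways to arrange 5 books in order.
arrangements = math.factorial(5)

# Permutations: ordered arrangements of 3 books chosen from 5.
ordered_subsets = math.perm(5, 3)

# Combinations: unordered sets of 3 books chosen from 5.
unordered_subsets = math.comb(5, 3)

print(coin_then_die, three_die_rolls, arrangements, ordered_subsets, unordered_subsets)
```

Permutations exceed combinations by a factor of 3! = 6 here, because each set of 3 books can be ordered in 6 ways.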
45
Ethical considerations of probability
using probability in advertising
46
Expectation
the mean (weighted average): multiply each possible outcome by its probability and sum the results

EXCEL- SUMPRODUCT(outcome values, probabilities)
47
A random variable
assigns numerical values to the possible outcomes of an uncertain event
48
discrete random variables
can only have probabilities at certain values and take only a countable number of values

The outcomes are mutually exclusive and collectively exhaustive, so their probabilities sum to 1
49
valid probability distribution
- all probabilities sum to 1 and are between 0 and 1
50
Summary measures for discrete probability distribution (bar graph)
expected value, variance, standard deviation
51
Variance of a random discrete variable

expected squared deviation from the mean

  1. first find the mean, then for each outcome subtract the mean and square the result

  2. SUMPRODUCT the squared deviations with their respective probabilities

EXCEL- SUMPRODUCT((X-mean)^2, p)

Outcomes with a higher probability carry more weight in the calculation
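The expectation, variance and standard deviation steps can be sketched as below. The outcomes and probabilities are a made-up distribution; the sums mirror what SUMPRODUCT does in Excel.

```python
from math import sqrt

# Hypothetical discrete distribution (e.g. number of sales per day).
outcomes = [0, 1, 2, 3]
probabilities = [0.1, 0.3, 0.4, 0.2]

# Expected value: each outcome weighted by its probability.
expected = sum(x * p for x, p in zip(outcomes, probabilities))

# Variance: squared deviation from the mean, weighted by probability.
variance = sum((x - expected) ** 2 * p for x, p in zip(outcomes, probabilities))

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(f"E(X) = {expected:.2f}, Var(X) = {variance:.2f}, SD(X) = {std_dev:.2f}")
```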

52
Standard deviation of a discrete random variable
square root of variance
53
Binomial distribution
Used when the random variable X counts the number of "events of interest" (successes) occurring in a fixed number of observations or trials, e.g. taking 10 lightbulbs and counting the defective ones
54
Requirements for binomial distribution
  1. Each trial is independent

  2. Probability is always the same

  3. Two outcomes

  4. Set number of trials

  5. π (pi) = probability of the event of interest occurring on each trial

55
To make calculations for binomial distribution
BINOM.DIST(x,n,prob, TRUE or FALSE)

FALSE when you want the probability at a single point, P(X = x)
TRUE when you want the cumulative probability, P(X <= x)
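A sketch of what the FALSE and TRUE options compute, using the lightbulb example. The 10% defect rate is an assumed figure, and the function names are illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): the point probability (cumulative = FALSE)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x): the cumulative probability (cumulative = TRUE)."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Hypothetical: 10 lightbulbs, each defective with probability 0.1.
print(f"P(exactly 1 defect)  = {binomial_pmf(1, 10, 0.1):.4f}")
print(f"P(at most 1 defect)  = {binomial_cdf(1, 10, 0.1):.4f}")
print(f"P(2 or more defects) = {1 - binomial_cdf(1, 10, 0.1):.4f}")
```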
56
Poisson distribution
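describes the probability that a certain number of events (successes) will occur in a given window of time (or space), given the average number per window, e.g. given that on average 3 customers per minute arrive at a store, what is the probability that 2 customers arrive in 3 minutes? (rescale the average to the window first: 3*3 = 9 expected in 3 minutes)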
57
Requirements for the Poisson distribution
  1. The probability that an event occurs in one window is the same for all other windows

  2. The number of events in one window is independent of the number that occurs in other windows

  3. The probability that 2 or more events will occur in a window approaches 0 as the window becomes smaller

58
Calculate poisson distribution
λ (lambda) = average (expected number of events in the window), x = number of events observed, variance = λ, SD = square root of λ

EXCEL- POISSON.DIST(x, λ, TRUE or FALSE)
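A sketch of the point probability that POISSON.DIST returns with FALSE, applied to the customer-arrival example. Rescaling the rate of 3 per minute to the 3-minute window (an expected 9 arrivals) is an assumption about how that question is meant to be read.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) when events occur at an average of lam per window."""
    return exp(-lam) * lam**x / factorial(x)

rate_per_minute = 3
window_minutes = 3
lam = rate_per_minute * window_minutes   # expected arrivals in the window

print(f"P(2 arrivals in 3 minutes) = {poisson_pmf(2, lam):.6f}")
```

The probability is small because 2 arrivals is far below the 9 expected in that window.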
59
Hypergeometric distribution
similar to the binomial distribution, except that the probability of an event of interest does not remain constant from trial to trial, as when sampling without replacement from a finite population.
-fixed number of trials

N = population size
A = number of items of interest in the population
N-A = number of items not of interest in the population
n = sample size
x = number of items of interest in the sample
n-x = number of items not of interest in the sample
60
Requirements for Hypergeometric
  1. "n" trials in a sample taken from a finite population size "N"

  2. the sample is taken without replacement

  3. the outcomes of the trials are dependent

  4. finding the probability of a particular number of events of interest

61
Calculating the hypergeometric distribution
EXCEL- HYPGEOM.DIST(x,n,A,N, true or false)
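A sketch of the point probability behind HYPGEOM.DIST with FALSE, keeping the same argument order (x, n, A, N). The population figures are invented for illustration.

```python
from math import comb

def hypergeometric_pmf(x, n, A, N):
    """P(X = x): x items of interest in a sample of n, drawn without
    replacement from a population of N that contains A items of interest."""
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)

# Hypothetical: 20 staff, 5 of them trained; pick 4 at random.
print(f"P(exactly 2 trained) = {hypergeometric_pmf(2, 4, 5, 20):.4f}")
```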
62
Application of covariance to summing random variables
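written as sigma(X,Y); the expected product of the deviations of X and Y from their means
EXCEL- SUMPRODUCT(x-E(X), y-E(Y), p(x,y))
needed for the variance of a sum: Var(X+Y) = Var(X) + Var(Y) + 2*sigma(X,Y)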
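A sketch of the covariance calculation and its use in the variance of a sum, Var(X+Y) = Var(X) + Var(Y) + 2*Cov(X,Y). The three scenarios and their probabilities are hypothetical returns on two investments.

```python
# Hypothetical joint distribution: (x, y, probability) for each scenario.
scenarios = [(-100, 200, 0.2), (50, 50, 0.5), (250, -100, 0.3)]

e_x = sum(x * p for x, _, p in scenarios)
e_y = sum(y * p for _, y, p in scenarios)

var_x = sum((x - e_x) ** 2 * p for x, _, p in scenarios)
var_y = sum((y - e_y) ** 2 * p for _, y, p in scenarios)

# Covariance: product of the two deviations, weighted by the joint probability.
cov_xy = sum((x - e_x) * (y - e_y) * p for x, y, p in scenarios)

# Variance of the sum X + Y.
var_sum = var_x + var_y + 2 * cov_xy

print(f"Cov(X,Y) = {cov_xy:.1f}, Var(X+Y) = {var_sum:.1f}")
```

The negative covariance makes the combined variance far smaller than either individual variance, which is the idea behind diversification.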
63
Positive covariance
there is a positive relationship between X and Y, when X increases, Y also increases on average
64
Negative covariance
the variables have a negative linear relationship, so as one variable increases the other variable on average decreases
65
Continuous random variables
can potentially take any value in a range, depending on how accurately or precisely it can be measured
Range could be up to infinity
66
probability for range of values
for a continuous random variable, the probability of a range of values is the area under the probability density function across that range
67
Interpreting PDFs

instead of probabilities for each value of X, a continuous random variable has what is called a probability density function

  1. Each density curve represents the relative likelihood for each X value

  2. the area under the entire density curve is always exactly 1 and the area under the curve for a single value for X equals zero

68
Normal distributions
symmetrical, bell-shaped curve showing that the values of a continuous variable cluster around the mean
69
properties of normal distribution
  1. Symmetric about the mean

  • mode, median, and mean are at the same point

  • the area under the curve to the right of the mean is equal to the area under the curve to the left of the mean (each area = 0.5)

  2. the curve approaches, but never touches, zero

  3. the area under the curve is exactly 1 by definition

70
Compute probabilities for normal distribution
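EXCEL- NORM.DIST(x, mean, SD, TRUE) gives P(X < x)
If finding the probability between two values: NORM.DIST(larger value) - NORM.DIST(smaller value)

If finding the boundary value given a probability:
NORM.INV(probability, mean, SD)

If finding P(X > a value), use the complement: 1 - NORM.DIST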
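The same four calculations can be sketched with Python's `statistics.NormalDist`, where `cdf` corresponds to a cumulative NORM.DIST and `inv_cdf` to NORM.INV. The mean of 100 and SD of 15 are assumed values for illustration.

```python
from statistics import NormalDist

# Hypothetical normally distributed scores.
scores = NormalDist(mu=100, sigma=15)

# P(X < 115): cumulative probability below a value.
p_below = scores.cdf(115)

# P(85 < X < 115): larger cumulative probability minus the smaller one.
p_between = scores.cdf(115) - scores.cdf(85)

# P(X > 130): complement of the cumulative probability.
p_above = 1 - scores.cdf(130)

# Boundary value with 97.5% of the distribution below it.
cutoff = scores.inv_cdf(0.975)

print(f"{p_below:.4f} {p_between:.4f} {p_above:.4f} {cutoff:.2f}")
```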
71
Assessing normality
histogram, box plot, mean = median = mode
72
uniform distribution
also called the rectangular distribution because it has equal density for all possible outcomes of the random variable
73
Calculating uniform distribution
height = 1/(b-a), where a and b are the minimum and maximum values
74
exponential distributions
used to measure the length of time between two occurrences of an event
related to poisson distribution
75
calculate exponential distribution
EXCEL- EXPON.DIST(x, λ, TRUE or FALSE)
has a long right tail

λ (lambda) = reciprocal of the mean, or the rate specified in the question.
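A sketch of the cumulative probability that EXPON.DIST returns with TRUE, P(X <= x) = 1 - e^(-rate*x). The 5-minute average gap between arrivals is a made-up figure.

```python
from math import exp

def exponential_cdf(x, rate):
    """P(X <= x) for an exponential distribution with the given rate (lambda)."""
    if x < 0:
        return 0.0
    return 1 - exp(-rate * x)

mean_gap_minutes = 5            # hypothetical average time between arrivals
rate = 1 / mean_gap_minutes     # lambda is the reciprocal of the mean

print(f"P(next arrival within 5 minutes) = {exponential_cdf(5, rate):.4f}")
print(f"P(wait longer than 10 minutes)   = {1 - exponential_cdf(10, rate):.4f}")
```

Because of the long right tail, the chance of arriving within the mean waiting time is about 63%, not 50%.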
76
sampling distribution
whenever we collect a sample and calculate a statistic for that sample, that sample statistic is a random variable and has a distribution that depends on the population distribution and the chosen sample size

the larger the sample size, the closer the sample mean tends to be to the population mean

Sampling distribution helps us know how accurate the statistic is for estimating the corresponding population parameter
77
Sample mean of sampling distribution
the mean of the sample means is always the same as the population mean
78
sample distributions of the mean rules
  1. The mean of the sample mean is always equal to the population mean

  2. the standard error of the sample mean is equal to the population standard deviation divided by the square root of the sample size

  3. the standard error of the sample mean is smaller than the population standard deviation when n>1 (and gets smaller as the sample size grows)

  4. If the population is normally distributed the sample mean also follows a normal distribution

79
If population is normally distributed
the sample mean will also be normally distributed
80
Sample proportions
π (pi) = true population proportion, p = sample proportion (the sample statistic), the average value of the sample proportion is equal to π, standard error of the sample proportion = square root(π(1-π)/n)
81
How do we know if P is normally distributed?
needs to fulfil the criteria n*π >= 5 and n*(1-π) >= 5
82
How to calculate proportion distributions
NORM.DIST(sample proportion, π, standard error of the proportion, TRUE)
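A sketch of the proportion calculation under the normal approximation, including the n*π and n*(1-π) check. The population proportion of 0.5 and sample size of 100 are assumed values, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_cdf(p, pi, n):
    """P(sample proportion <= p) under the normal approximation."""
    if n * pi < 5 or n * (1 - pi) < 5:
        raise ValueError("normal approximation needs n*pi >= 5 and n*(1-pi) >= 5")
    standard_error = sqrt(pi * (1 - pi) / n)
    return NormalDist(mu=pi, sigma=standard_error).cdf(p)

# Hypothetical: true proportion 0.5, sample of 100 (standard error 0.05).
print(f"P(p <= 0.45) = {proportion_cdf(0.45, pi=0.5, n=100):.4f}")
```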
83
finite population correction factor (FPC)
An adjustment to the standard error that is made when sampling without replacement and the sample is 5 percent or more of the total population

Always less than 1 when n > 1, so it reduces the standard error: the larger the share of the population that is sampled, the less uncertainty remains

FPC = square root((N-n)/(N-1))
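A sketch of the standard error of the sample mean with and without the correction factor. The population SD, sample size and population size are made-up figures, and the function name is illustrative.

```python
from math import sqrt

def standard_error(sigma, n, population_size=None):
    """Standard error of the sample mean, sigma / sqrt(n), multiplied by the
    finite population correction when a population size is supplied."""
    se = sigma / sqrt(n)
    if population_size is not None:
        fpc = sqrt((population_size - n) / (population_size - 1))
        se *= fpc
    return se

# Hypothetical: sigma = 10, sample of 100 from a population of 1000 (10% sampled).
print(f"without FPC: {standard_error(10, 100):.4f}")
print(f"with FPC:    {standard_error(10, 100, population_size=1000):.4f}")
```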
84
Variance of the sample mean
sigma^2/n
85
If asks sample mean lies between which 2 values
convert the percentage into tail probabilities: subtract it from 100% and split the remainder equally between the two tails. Thus 0.95 means one critical value at cumulative probability 0.025 and the other at 0.975
86
Question says probability is larger than X
1 - NORM.DIST(X, mean, standard error, TRUE)
87
measures of normality
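the CLT holds (for a large enough sample the sample mean is approximately normal); for small samples, a roughly symmetric population is assumed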
88
Confidence intervals
confidence intervals help us to estimate the population parameters

Now use sampling distribution to estimate
89
Point estimates for confidence intervals
Population mean- use the sample mean
Standard deviation known- use Z
Standard deviation unknown- use the sample standard deviation and t
Population proportion- use the sample proportion and Z
90
Interpret confidence interval
for a 95% confidence interval, if we collect many samples of size n, and construct confidence intervals for each, 95% of them will contain the true unknown parameter
91
General formula for confidence intervals
point estimate ± (critical value) * (standard error)
92
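Finding critical value Z(a/2) when sigma is known, and for proportions (confidence intervals)
NORM.S.INV(1-a/2) for the positive critical value (NORM.S.INV(a/2) gives the same value with a negative sign)
a = 1 - confidence level (e.g. 0.05 for 95% confidence)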
93
90% conf
1.645
94
95% conf
1.96
95
99% conf
2.58
96
Excel formula for the margin of error to add to and subtract from the point estimate when sigma is known (confidence intervals)
CONFIDENCE.NORM(alpha, population SD, n)
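A sketch of a confidence interval for the mean when sigma is known, following point estimate ± critical value * standard error. The sample mean, sigma and sample size are invented figures; the margin computed here is the quantity the Excel function above returns.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) limits for the population mean, sigma known."""
    alpha = 1 - confidence
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. about 1.96 for 95%
    margin = z_critical * sigma / sqrt(n)              # critical value * standard error
    return sample_mean - margin, sample_mean + margin

# Hypothetical: sample mean 50, population SD 10, sample size 25.
lower, upper = z_confidence_interval(50, 10, 25, confidence=0.95)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Raising the confidence level widens the interval, while a larger sample narrows it.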
97
Student's t-distribution
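if the population SD is unknown we substitute the sample SD (S), giving the standard error S/square root(n)

this introduces extra uncertainty, so the t-distribution has fatter tails than the normal distribution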
98
Degrees of freedom
n-1
as sample size grows so do the degrees of freedom, and the t-distribution gets closer to the normal distribution
As degrees of freedom increase, t critical values decrease
99
To find the t-critical value in excel
T.INV.2T(a, dof) for the two-tailed critical value
T.INV(1-a/2, dof) gives the same value from the cumulative (left-tail) probability
100
Excel formula for the margin of error to add to and subtract from the point estimate when sigma is unknown
CONFIDENCE.T(a, S, n), where S is the sample standard deviation