buss1020

208 Terms

New cards

Categorical variables

have values that can only be placed into categories such as yes or no

New cards

Numerical variables

have values that represent quantities; they are divided into discrete and continuous variables

New cards

Discrete variables

result from a counting process and take a countable number of values (usually integers)

New cards

Continuous variables

result from a measuring process and do not have to be integers

New cards

Variables can be further identified by what measurements?

interval scale (no true zero point, e.g. temperature) or ratio scale (true zero point, e.g. height)

New cards

Probability sample

a sample in which items are selected based on known probabilities of selection

New cards

Non-probability samples

no guarantee true population will be represented (convenience, self-selection, judgment, quota)

New cards

framework for conducting statistical analysis

DCOVA- define, collect, organise, visualise, analyse

New cards

Branches of stats

descriptive (describe the sample e.g mean, median, range), inferential (estimations and hypothesis), predictive (predict how much customer spend)

New cards

Types of categorical variables

Nominal variables: categories with no natural order (eye colour, car)
Ordinal variables: categories with a natural order (clothing sizes)

New cards

Sources of data

SODOS
Surveys (cust satisfaction), organisations (historic weather conditions from country), designed experiments (consumer testing), observational studies (volume of traffic), streaming services (website traffic sites)

New cards

Types of survey error

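1. coverage error (some groups are excluded from the sampling frame)
2. non-response error (failure to collect data from everyone sampled)
3. sampling error (chance variation between samples that will always exist)
4. measurement error (weakness in the survey method or question design)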

New cards

Central tendency

point data clusters around

New cards

Main measures of central tendency that estimate central point using different methods

Mean, median, mode

New cards

Geometric mean

the average rate of return of a set of values over time, calculated as the nth root of the product of the n terms; it is always less than or equal to the arithmetic mean
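
A minimal sketch of the comparison with the arithmetic mean. It assumes the rate-of-return form of the geometric mean, in which each return R is turned into a growth factor (1 + R) before multiplying; the function name and the two returns are hypothetical.

```python
import math

def geometric_mean_return(returns):
    """Geometric mean rate of return for per-period returns given as decimals."""
    # Multiply the growth factors (1 + R), take the nth root, subtract 1.
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1

# Hypothetical returns: +50% in year 1, then -50% in year 2.
returns = [0.5, -0.5]
arithmetic = sum(returns) / len(returns)
geometric = geometric_mean_return(returns)

print("arithmetic mean:", arithmetic)
print("geometric mean:", geometric)
```

Here the arithmetic mean suggests no change, while the geometric mean is negative because the investment really did lose value over the two periods.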

New cards

Measures of variation

range- difference between the largest and smallest values
variance- average squared deviation from the mean
standard deviation- variation around the sample mean (square root of the variance)
coefficient of variation- shows the relative variation between two data sets measured in different units

New cards

Ad/dis of arithmetic mean

easy and quick to calculate and good for many things BUT tends to overstate the actual average, sometimes inappropriate

New cards

Z score

tells us the position of an observation in the data set (how many SD it is away from the mean)

Z = (X - Xbar)/S

New cards

Skewness

mean < median = left skewed
mean > median = right skewed
mean = median = symmetrical

New cards

How to calculate outliers

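1. convert all data points into Z-scores using the formula Z = (X - Xbar)/S
2. use the COUNTIFS function, e.g. COUNTIFS(array,">-3",array,"<3"), to count the Z-scores within 3 SD of the mean; observations outside that range are treated as outliers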
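
The two steps can be sketched in Python as follows. The data set is hypothetical, and the cut-off of 3 standard deviations is an assumption taken from the ">-3" bound in the card.

```python
import statistics

def z_scores(data):
    """Convert each observation to a Z-score using the sample mean and sample SD."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return [(x - mean) / s for x in data]

# Hypothetical data set with one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
        12, 10, 11, 13, 12, 11, 12, 10, 13, 60]

zs = z_scores(data)
# Count the Z-scores strictly between -3 and 3 (the COUNTIFS step).
within_3_sd = sum(1 for z in zs if -3 < z < 3)
# Anything outside that range is flagged as an outlier.
outliers = [x for x, z in zip(data, zs) if abs(z) >= 3]

print("within 3 SD:", within_3_sd)
print("outliers:", outliers)
```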

New cards

To insert pivot table in excel

Insert tab > PivotTable (far left), then drag the fields to Values, Columns and Rows

New cards

Kurtosis

Measures the concentration of values in the centre as compared to the tails

New cards

Quartiles and box plots

IQR = Q3 - Q1
measure of variability not influenced by outliers

New cards

Chebyshev's Rule

Regardless of how the data is distributed, at least (1 - 1/k^2)*100% of the values will fall within k standard deviations of the mean (for k > 1)
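
A small sketch of the rule as a function, to show the minimum percentages it guarantees for a few values of k. The function name is hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is only informative for k > 1")
    return (1 - 1 / k**2) * 100

for k in (2, 3, 4):
    print(k, chebyshev_lower_bound(k))
```

For k = 2 the bound is 75%, which is much weaker than the roughly 95% that holds for normally distributed data, because the rule makes no assumption about the shape of the distribution.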

New cards

Ethical considerations of descriptive stats

1. Assuming small differences are meaningful
2. Equating statistical significance with real world significance
3. Neglecting to look at extremes
4. trusting coincidence
5. Deceptive graphs

New cards

Monty hall problem

initial intuitions about a game of chance can be drastically incorrect

New cards

Sample space

the set of all possible outcomes from an experiment

New cards

simple event

described by a single characteristic

New cards

Joint events

are described by two or more characteristics

New cards

complement of event

The complement of event A consists of all outcomes that are NOT in A.

New cards

Visualising

Tree diagram or contingency table

New cards

Marginal probability

refers to the probability of a single event

New cards

Joint Probability

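considers two or more events occurring together: P(A and B). For P(A or B), add the individual probabilities and subtract the intersection: P(A or B) = P(A) + P(B) - P(A and B)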

New cards

Mutually exclusive events

Events that cannot occur at the same time.

New cards

Collectively exhaustive events

events that together cover the entire sample space (if mutually exclusive and collectively exhaustive then probability sums to 1)

New cards

Conditional Probability

the probability of one event, given the knowledge that another event has occurred

New cards

independence

when the occurrence of one event does not affect the probability of another

New cards

multiplication rule

To find the probability that A and B both occur, multiply probabilities. If A and B are independent, P(A and B) = P(A)*P(B); if they are dependent, use the conditional probability: P(A and B) = P(A|B)*P(B)

New cards

Bayes' Theorem

allows us to reverse the conditioning between two events or variables

P(A|B) = P(A and B)/P(B)
but as these values are not always given in the question, reverse the conditioning:
P(B|A) = P(A|B)*P(B)/P(A), where P(A) = P(A and B) + P(A and B')
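
A worked sketch of reversing the conditioning, written so that P(A) is built from the two joint probabilities P(A and B) and P(A and B'). The events and all three input probabilities are hypothetical.

```python
def bayes_posterior(p_a_given_b, p_b, p_a_given_not_b):
    """Return P(B|A) from P(A|B), P(B) and P(A|B')."""
    p_a_and_b = p_a_given_b * p_b                # P(A and B)
    p_a_and_not_b = p_a_given_not_b * (1 - p_b)  # P(A and B')
    p_a = p_a_and_b + p_a_and_not_b              # total probability of A
    return p_a_and_b / p_a

# Hypothetical example: B = "item is defective", A = "inspection flags the item".
# P(B) = 0.01, P(A|B) = 0.90, P(A|B') = 0.05
posterior = bayes_posterior(p_a_given_b=0.90, p_b=0.01, p_a_given_not_b=0.05)
print("P(defective | flagged) =", posterior)
```

Even with a 90% detection rate the posterior stays well below one half, because non-defective items are so much more common.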

New cards

Multiplication counting rule

Determines the number of possible outcomes when the number of possible events differs from trial to trial: with k1 events on the first trial, k2 on the second, ..., kn on the nth, there are k1*k2*...*kn outcomes (e.g. flip a coin in the first trial and roll a die in the second: 2*6 = 12)

New cards

Repetition counting rule

Determines the number of possible outcomes when any of k events can occur on each of n trials: k^n (e.g. 3 trials with 6 possible events each: 6^3 = 216)

New cards

Factorial Counting Rule

Involves counting the number of ways a set of n items can be arranged in order (n!), e.g. with 5 books on a shelf, in how many different orders can the books be arranged?

EXCEL- FACT(n)

New cards

permutations counting rule

Allows you to find the number of ways in which a subset of an entire group of items can be arranged in order, e.g. you have 5 books and are going to put 3 on a shelf: in how many different ways can they be ordered?

EXCEL- PERMUT(have, choice)

New cards

combinations counting rule

Counts the number of ways X items can be selected from N items, irrespective of order, e.g. you have 5 books and select 3 to read: how many different sets of 3 are possible?

EXCEL- COMBIN (N,X)
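
The five counting rules can be checked against the cards' own examples with Python's standard `math` module, where `math.factorial`, `math.perm` and `math.comb` play the roles of FACT, PERMUT and COMBIN. The variable names are illustrative only.

```python
import math

# Multiplication rule: coin flip (2 outcomes) then die roll (6 outcomes).
coin_then_die = 2 * 6

# Repetition rule: 6 possible events on each of 3 trials.
three_rolls = 6 ** 3

# Factorial rule: orderings of 5 books on a shelf.
arrange_5_books = math.factorial(5)

# Permutations: ordered arrangements of 3 books chosen from 5.
order_3_of_5 = math.perm(5, 3)

# Combinations: unordered sets of 3 books chosen from 5.
choose_3_of_5 = math.comb(5, 3)

print(coin_then_die, three_rolls, arrange_5_books, order_3_of_5, choose_3_of_5)
```

Permutations exceed combinations by a factor of 3! = 6 here, since each set of 3 books can be ordered in 6 ways.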

New cards

Ethical considerations of probability

using probability in advertising

New cards

Expectation

the mean of a random variable: multiply each possible outcome by its associated probability, then sum the results

EXCEL- SUMPRODUCT(observed values, probabilities)

New cards

A random variable

assigns numerical values to the possible outcomes of an uncertain event

New cards

discrete random variables

can only have probabilities at certain values and take only a countable number of values

The outcomes are usually mutually exclusive and collectively exhaustive, so their probabilities sum to 1

New cards

valid probability distribution

- all probabilities sum to 1 and are between 0 and 1

New cards

Summary measures for discrete probability distribution (bar graph)

expected value, variance, standard deviation

New cards

Variance of a random discrete variable

expected squared deviation from the mean
1. first find the mean (expected value), then for each outcome subtract the mean and square the result
2. sum-product these squared deviations with their respective probabilities

EXCEL- SUMPRODUCT((X-E(X))^2, p)

Outcomes with a higher probability contribute more to the expectation
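
A short sketch of the expected value, variance and standard deviation calculations for a discrete random variable, mirroring the SUMPRODUCT approach. The outcomes and probabilities are hypothetical.

```python
import math

def expected_value(values, probs):
    """Sum of each outcome multiplied by its probability."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """Probability-weighted sum of squared deviations from the expected value."""
    mu = expected_value(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

# Hypothetical distribution: number of sales per day.
values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]

# A valid distribution has probabilities between 0 and 1 that sum to 1.
assert all(0 <= p <= 1 for p in probs) and math.isclose(sum(probs), 1.0)

mu = expected_value(values, probs)
var = variance(values, probs)
sd = math.sqrt(var)
print(mu, var, sd)
```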

New cards

Standard deviation of a discrete random variable

square root of variance

New cards

Binomial distribution

Used when the random variable X counts the number of "events of interest" (successes) occurring in a fixed number of observations or trials, e.g. taking 10 lightbulbs and counting the defects

New cards

Requirements for binomial distribution

1. Each trial is independent
2. Probability is always the same
3. Two outcomes
4. Set number of trials
5. π (pi) = probability of the event of interest occurring on each trial

New cards

To make calculations for binomial distribution

BINOM.DIST(x,n,prob, TRUE or FALSE)

FALSE when want to calculate probability at a point
TRUE when want cumulative probability
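
A sketch of what the two modes compute, using the standard binomial formula rather than Excel. The function names and the 5% defect rate are hypothetical; the 10 lightbulbs follow the earlier card.

```python
import math

def binom_pmf(x, n, p):
    """P(X = x): the point probability (the FALSE mode)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P(X <= x): the cumulative probability (the TRUE mode)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# Hypothetical: 10 lightbulbs, each defective with probability 0.05.
n, p = 10, 0.05
p_no_defects = binom_pmf(0, n, p)
p_at_most_one = binom_cdf(1, n, p)
p_two_or_more = 1 - p_at_most_one  # complement for "at least 2"

print(p_no_defects, p_at_most_one, p_two_or_more)
```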

New cards

Poisson distribution

describes the probability that a certain number of events (successes) will occur in a given time period, given the average rate of occurrence, e.g. given that on average 3 customers per minute arrive at a store, what is the probability that 2 customers arrive in 3 minutes

New cards

Requirements for the Poisson distribution

1. The probability that an event occurs in one window is the same for all other windows
2. The number of events in one window is independent of the number that occurs in other windows
3. The probability that 2 or more events will occur in a window approaches 0 as the window becomes smaller

New cards

Calculate poisson distribution

lambda (λ) = average (expected) number of events in the window, x = number of events observed, variance = lambda, SD = square root of lambda

EXCEL- POISSON.DIST(x, lambda, TRUE or FALSE)
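
A sketch using the store example from the Poisson card: an average of 3 customers per minute over a 3-minute window gives lambda = 9. The function names are hypothetical, and the formula is the standard Poisson probability rather than Excel's implementation.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam (the FALSE mode)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x): the cumulative probability (the TRUE mode)."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

rate_per_minute = 3
minutes = 3
lam = rate_per_minute * minutes  # expected arrivals in the whole window

p_exactly_2 = poisson_pmf(2, lam)
p_at_most_2 = poisson_cdf(2, lam)
print(p_exactly_2, p_at_most_2)
```

The key step is rescaling lambda to the window in the question before calculating the probability.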

New cards

Hypergeometric distribution

similar to a multinomial distribution, except that the probabilities associated with the classes do not remain constant, as in sampling without replacement from a finite population.
- fixed number of trials

A = number of items of interest in the population
N-A = number of items not of interest in the population
n = sample size
x = number of events of interest in the sample
n-x = number of events not of interest in the sample

New cards

Requirements for Hypergeometric

1. "n" trials in a sample taken from a finite population size "N"
2. the sample is taken without replacement
3. the outcomes of the trials are dependent
4. finding the probability of a particular number of events of interest

New cards

Calculating hypergeometric distribution

EXCEL- HYPGEOM.DIST(x,n,A,N, true or false)
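
A sketch of the point probability behind HYPGEOM.DIST, using the same symbols (x, n, A, N) and the standard hypergeometric formula. The function name and the numbers are hypothetical.

```python
import math

def hypergeom_pmf(x, n, A, N):
    """P(X = x): x items of interest in a sample of n, drawn without
    replacement from a population of N that contains A items of interest."""
    return math.comb(A, x) * math.comb(N - A, n - x) / math.comb(N, n)

# Hypothetical: population of 20 items, 5 of interest, sample of 4.
N, A, n = 20, 5, 4
p_exactly_1 = hypergeom_pmf(1, n, A, N)
p_at_most_1 = sum(hypergeom_pmf(k, n, A, N) for k in range(2))  # cumulative

print(p_exactly_1, p_at_most_1)
```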

New cards

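Application of covariance to summing random variables

written sigma(X,Y): the probability-weighted sum of the products of the deviations of X and Y from their expected values

EXCEL- SUMPRODUCT(p(x,y), x-E(X), y-E(Y))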
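
A sketch of the covariance of two discrete random variables, in which each pair of deviations is multiplied together and weighted by the joint probability p(x, y). The joint distribution is hypothetical.

```python
def covariance(joint):
    """Covariance from a list of (x, y, p) joint outcomes."""
    e_x = sum(x * p for x, _, p in joint)
    e_y = sum(y * p for _, y, p in joint)
    # Probability-weighted sum of the products of the deviations.
    return sum((x - e_x) * (y - e_y) * p for x, y, p in joint)

# Hypothetical joint distribution: returns on two investments in three scenarios.
joint = [
    (10, 5, 0.2),
    (20, 15, 0.5),
    (30, 10, 0.3),
]

cov_xy = covariance(joint)
print("covariance:", cov_xy)
```

The result is positive here, so X and Y tend to move in the same direction on average.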

New cards

Positive covariance

there is a positive relationship between X and Y, when X increases, Y also increases on average

New cards

Negative covariance

the variables have a negative linear relationship, so as one variable increases the other variable on average decreases

New cards

Continuous random variables

can potentially take any value in a range, depending on how accurately or precisely it can be measured
Range could be up to infinity

New cards

probability for range of values

to find a probability from a probability density function, find the area under the curve across the range of values

New cards

Interpreting PDFs

instead of probabilities for each value of X, a continuous random variable has what is called a probability density function

1. Each density curve represents the relative likelihood for each X value
2. the area under the entire density curve is always exactly 1 and the area under the curve for a single value for X equals zero

New cards

Normal distributions

symmetrical, bell-shaped curve showing that the values of a continuous variable cluster around the mean

New cards

properties of normal distribution

1. Symmetric about the mean
- mode, median, and mean are at the same point
- the area under the curve to the right of the mean is equal to the area under the curve to the left of the mean (area = 0.5)
2. the curve approaches, but never touches zero
3. the area under the curve is exactly 1 by definition

New cards

Compute probabilities for normal distribution

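EXCEL- NORM.DIST(x, mean, SD, TRUE) gives P(X < x)
If finding the probability between two values, subtract: NORM.DIST(larger value) - NORM.DIST(smaller value)

If finding the boundary value given a probability, use
NORM.INV(probability, mean, SD)

If finding P(X > a value), use the complement: 1 - NORM.DIST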
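
The same calculations can be sketched with `statistics.NormalDist` from Python's standard library, where `cdf` corresponds to the cumulative NORM.DIST and `inv_cdf` to NORM.INV. The mean of 100 and SD of 15 are hypothetical.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # hypothetical mean and SD

# P(X < 115): cumulative probability below a value.
p_below = dist.cdf(115)

# P(85 < X < 115): larger cumulative probability minus the smaller one.
p_between = dist.cdf(115) - dist.cdf(85)

# P(X > 130): complement of the cumulative probability.
p_above = 1 - dist.cdf(130)

# Boundary value with 97.5% of the distribution below it.
boundary = dist.inv_cdf(0.975)

print(p_below, p_between, p_above, boundary)
```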

New cards

Assessing normality

histogram, box plot, mean = median = mode

New cards

uniform distribution

also called the rectangular distribution because it has equal density for all possible outcomes of the random variable

New cards

Calculating uniform distribution

height = 1/(b-a)

New cards

exponential distributions

used to measure the length of time between two occurrences of an event
related to poisson distribution

New cards

calculate exponential distribution

EXCEL- EXPON.DIST(x, lambda, TRUE or FALSE)
has a long right tail

lambda (λ) = reciprocal of the mean, or the rate specified in the question.
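
A sketch of the cumulative exponential probability, P(X <= x) = 1 - e^(-lambda*x), with lambda taken as the reciprocal of the mean. The function name and the mean gap of 4 minutes between events are hypothetical.

```python
import math

def expon_cdf(x, lam):
    """P(X <= x) for an exponential distribution with rate lam."""
    return 1 - math.exp(-lam * x)

mean_gap = 4          # hypothetical: 4 minutes between arrivals on average
lam = 1 / mean_gap    # rate = reciprocal of the mean

p_within_2 = expon_cdf(2, lam)          # next arrival within 2 minutes
p_longer_than_10 = 1 - expon_cdf(10, lam)  # complement for "more than 10"

print(p_within_2, p_longer_than_10)
```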

New cards

sampling distribution

whenever we collect a sample and calculate a statistic for that sample, that sample statistic is a random variable with a distribution that depends on the population distribution and the chosen sample size

the larger the sample size, the closer the sample mean tends to be to the population mean

Sampling distribution helps us know how accurate the statistic is for estimating the corresponding population parameter

New cards

Sample mean of sampling distribution

it is always the case that the mean of the sample means is the same as the population mean

New cards

sample distributions of the mean rules

1. The mean of the sample mean is always equal to the population mean
2. the standard error of the sample mean is equal to the population standard deviation divided by the square root of the sample size
3. the standard error of the sample mean is smaller than the population standard deviation when n > 1 (and gets smaller as the sample size grows)
4. If the population is normally distributed the sample mean also follows a normal distribution
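
A sketch of rules 2 to 4: the standard error is the population standard deviation divided by the square root of n, and for a normal population the sample mean is itself normally distributed around the population mean. The population values and sample size are hypothetical.

```python
import math
from statistics import NormalDist

def standard_error(sigma, n):
    """Standard error of the sample mean."""
    return sigma / math.sqrt(n)

mu, sigma, n = 100, 15, 25  # hypothetical population mean, SD and sample size
se = standard_error(sigma, n)

# Sampling distribution of the mean: same centre, narrower spread.
sample_mean_dist = NormalDist(mu, se)
p_between = sample_mean_dist.cdf(106) - sample_mean_dist.cdf(94)

print("standard error:", se)
print("P(94 < sample mean < 106):", p_between)
```

Quadrupling the sample size to 100 would halve the standard error, which is why larger samples give more precise estimates.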

New cards

If population is normally distributed

the sample mean will also be normally distributed

New cards

Sample proportions

π (pi) = true population proportion, p = sample proportion (the sample statistic); the average value of the sample proportion is equal to π; standard error of the sample proportion = square root of (π(1-π)/n)

New cards

How do we know if P is normally distributed?

needs to fulfill the criteria n*π >= 5 and n(1-π) >= 5

New cards

How to calculate proportion distributions

NORM.DIST(sample proportion, π, standard error, TRUE)
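
A sketch that first checks the two normality criteria and then computes a probability for the sample proportion. The population proportion of 0.4 and the sample size of 100 are hypothetical, as are the function names.

```python
import math
from statistics import NormalDist

def proportion_standard_error(pi, n):
    """Standard error of the sample proportion."""
    return math.sqrt(pi * (1 - pi) / n)

def normal_approximation_ok(pi, n):
    """Both expected counts must be at least 5."""
    return n * pi >= 5 and n * (1 - pi) >= 5

pi, n = 0.4, 100  # hypothetical population proportion and sample size
se = proportion_standard_error(pi, n)

if normal_approximation_ok(pi, n):
    # P(sample proportion <= 0.45), the cumulative NORM.DIST calculation.
    p_at_most_045 = NormalDist(pi, se).cdf(0.45)
    print("standard error:", se)
    print("P(p <= 0.45):", p_at_most_045)
```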

New cards

finite population correction factor (FPC)

An adjustment to the standard error (and so to the required sample size) that is made when sampling without replacement and the sample is expected to be 5 percent or more of the total population

Always less than 1 when n > 1, so it reduces the standard error: as the sample covers more of the population, uncertainty decreases and accuracy increases

square root((N-n)/(N-1))
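
A sketch of the correction factor and its effect on the standard error of the mean. N is the population size and n the sample size; the numbers and function names are hypothetical.

```python
import math

def fpc(N, n):
    """Finite population correction factor."""
    return math.sqrt((N - n) / (N - 1))

def corrected_standard_error(sigma, n, N):
    """Standard error of the mean multiplied by the correction factor."""
    return sigma / math.sqrt(n) * fpc(N, n)

N, n, sigma = 1000, 100, 15  # hypothetical: the sample is 10% of the population
factor = fpc(N, n)
se_plain = sigma / math.sqrt(n)
se_corrected = corrected_standard_error(sigma, n, N)

print(factor, se_plain, se_corrected)
```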

New cards

Variance of the sample mean

sigma^2/n

New cards

If asks sample mean lies between which 2 values

then realise that 100% minus the given percentage is the probability left in the two tails, split equally between them. Thus 95% (0.95) means one critical value at a cumulative probability of 0.025 and one at 0.975

New cards

Question says probability is larger than X

1 - NORM.DIST

New cards

measures of normality

the CLT holds (the sample size is large enough) and the population is assumed to be roughly symmetric

New cards

Confidence intervals

confidence intervals help us to estimate the population parameters

Now use sampling distribution to estimate

New cards

Point estimates for confidence intervals

Population mean- use the sample mean
Standard deviation known- use Z
standard deviation unknown- use standard error and T
population proportion- Z

New cards

Interpret confidence interval

for a 95% confidence interval, if we collect many samples of size n, and construct confidence intervals for each, 95% of them will contain the true unknown parameter

New cards

General formula for confidence intervals

point estimate ± (critical value) * (standard error)

New cards

Finding critical value Za/2 for when sigma known and proportions- confidence intervals

NORM.S.INV(1 - a/2) (NORM.S.INV(a/2) gives the same value with a negative sign)
a = 1 - confidence level (e.g. 0.05 for 95% confidence)
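
A sketch of finding the Z critical value and building a confidence interval for the mean when sigma is known. `NormalDist().inv_cdf` stands in for NORM.S.INV. The sample mean, sigma and n are hypothetical, and alpha is taken as 1 minus the confidence level.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Upper critical value of the standard normal for a two-sided interval."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_confidence_interval(sample_mean, sigma, n, confidence):
    """point estimate +/- (critical value) * (standard error), sigma known."""
    margin = z_critical(confidence) * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 50 from n = 100, known sigma = 10.
low, high = mean_confidence_interval(50, 10, 100, 0.95)

print("z for 90%:", z_critical(0.90))
print("z for 95%:", z_critical(0.95))
print("95% interval:", low, high)
```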

New cards

90% conf

1.645

New cards

95% conf

1.96