Quantitative Data
Discrete and Continuous
Discrete Data
Can only take specific, separate values that are counted, such as number of students in a class
Continuous Data
Can take any value within a range and is measured rather than counted, such as age, height, time
Qualitative Data
Categorical, ordinal and nominal
Ordinal Data
Places categories in order and conveys a ranking, such as clothing sizes (small, medium, large)
Nominal Data
Does not convey ranking such as ethnicity, gender
What type of data is the number of cars a family owns?
Discrete
What type of data is the type of accommodation (such as budget, tourist, superior)?
Ordinal - conveys a ranking
What type of data is favourite fruit preference at the market?
Nominal, conveys no ranking
What type of data is time spent at the market?
Continuous - time is measured and can take any value within a range
Weekly household spending is divided into these groups: less than $50, $50-$100, $150-$200. What type of variable is this?
Categorical & Ordinal (defines categories and placed in order to convey a ranking)
Cross tabulation
Compares categorical with categorical
scatter plot
Compares numerical with numerical
frequency table
analyses 1 categorical variable. E.g. the favourite stall of people at the market
Stacked / clustered bar chart
compares categorical with categorical, e.g. proportion of M/F choosing each favourite stall
Relative frequency histogram
compares categorical with numerical (e.g. market spend of various occupational groups)
If the 2 variables are "favourite stall" and "if visitors are regular or not", a cross tabulation should be used because
both variables are categorical and define a particular category
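A cross tabulation can be sketched as a count of each pair of categories. The survey responses below are made up for illustration; only the idea of counting category pairs comes from the cards above.

```python
from collections import Counter

# Hypothetical survey responses: (favourite stall, regular visitor?)
responses = [
    ("fruit", "yes"),
    ("fruit", "no"),
    ("bakery", "yes"),
    ("fruit", "yes"),
    ("coffee", "no"),
    ("bakery", "yes"),
]

# Each cell of the cross tabulation is the count of one category pair.
crosstab = Counter(responses)

for (stall, regular), count in sorted(crosstab.items()):
    print(f"{stall:<8} regular={regular:<4} count={count}")
```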
Mean
simple average
median
middle value (when the data is ranked in order)
mode
most frequent
trimmed mean
without most extreme 5%
Range
maximum - minimum
interquartile range
75th percentile minus 25th percentile
variance
represents spread of data around the mean. Standard deviation squared
standard deviation
square root of variance; a higher value means the data is more spread out
co-efficient of variation
standard deviation divided by the mean; compares variability between groups whose values have different magnitudes
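The summary measures above can be tried out with Python's standard `statistics` module. This is a minimal sketch on a made-up sample; note that `statistics.variance` and `statistics.stdev` use the sample (n - 1) formulas, and that quartile conventions differ between tools, so the interquartile range may not match other software exactly.

```python
import statistics

# Hypothetical sample: dollars spent by seven market visitors.
spend = [12, 15, 15, 18, 20, 22, 38]

mean = statistics.mean(spend)            # simple average
median = statistics.median(spend)        # middle value of the sorted data
mode = statistics.mode(spend)            # most frequent value
data_range = max(spend) - min(spend)     # maximum - minimum

# Quartiles (default "exclusive" method); IQR = 75th - 25th percentile.
q1, _, q3 = statistics.quantiles(spend, n=4)
iqr = q3 - q1

variance = statistics.variance(spend)    # sample variance (divides by n - 1)
std_dev = statistics.stdev(spend)        # square root of the variance
cv = std_dev / mean                      # coefficient of variation

print(mean, median, mode, data_range, iqr, variance, std_dev, cv)
```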
skewness
positive = right-skewed, negative = left-skewed
significantly skewed
the skewness statistic is more than twice its standard error
Kurtosis
measures the extent to which observations cluster around the central point
What is it called when the kurtosis statistic is zero?
normal distribution
data clusters close to centre: positive or negative kurtosis?
positive
data clusters further from centre: positive or negative kurtosis?
negative
co-variance
measures co-movement between 2 variables
co-efficient of correlation
measures the linear relationship between 2 variables
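As a rough sketch, co-variance and correlation can be computed by hand with the usual sample formulas (dividing by n - 1). The function names and the two data lists are illustrative assumptions, not part of the original cards.

```python
import math

def covariance(xs, ys):
    """Sample co-variance: average co-movement of xs and ys around their means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Co-variance rescaled by both standard deviations, so it lies in [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: minutes at the market and income (in $000s).
minutes = [10, 20, 30, 40, 50]
income = [30, 35, 45, 50, 65]

print(covariance(minutes, income))
print(correlation(minutes, income))
```

A correlation near +1 or -1 indicates a strong linear relationship; a value near 0 indicates little linear relationship.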
What graph would show the following: comparing time spent at the market with average income?
scatterplot, as it measures numerical by numerical
population
whole collection under analysis
sample
a portion of the population
parameter
summary measure describing a characteristic of the whole population
statistic
summary measure computed to describe a characteristic of a sample
primary data
collected yourself
secondary data
taken from another source
observational data
you observe and record
experimental data
data you've obtained through experiments
simple random sampling
everyone is equally likely to get chosen from the population. E.g. randomly picking a certain number of students
systematic random sampling
selecting on a fixed system. E.g. randomly choosing a starting point, then taking every k'th item thereafter
Stratified random sampling
dividing population into homogenous groups (similar characteristics) then taking a random sample from each group, e.g. dividing students by which degree they take then taking a random sample from each degree
cluster sampling
dividing population into several clusters that aren't homogenous but are each representative of the population, then randomly selecting whole clusters to sample
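The four sampling methods above can be sketched with Python's `random` module. Everything here is hypothetical: the population of 100 students, the degree labels, and the function names are chosen only to illustrate the ideas.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 students, each tagged with a degree.
DEGREES = ["Arts", "Science", "Commerce", "Law"]
population = [{"id": i, "degree": DEGREES[i % 4]} for i in range(100)]

def simple_random(pop, n):
    """Every member is equally likely to be chosen."""
    return rng.sample(pop, n)

def systematic(pop, k):
    """Random starting point, then every k'th member thereafter."""
    start = rng.randrange(k)
    return pop[start::k]

def stratified(pop, key, n_per_group):
    """Split into homogenous groups, then sample randomly from each group."""
    groups = {}
    for member in pop:
        groups.setdefault(member[key], []).append(member)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, n_per_group))
    return sample

def cluster(pop, cluster_size, n_clusters):
    """Split into clusters, then randomly select whole clusters."""
    clusters = [pop[i:i + cluster_size] for i in range(0, len(pop), cluster_size)]
    chosen = rng.sample(clusters, n_clusters)
    return [member for group in chosen for member in group]
```

Stratified sampling guarantees every group appears in the sample, which is why it suits cases where small groups might otherwise be missed.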
You want to sample residential halls but worry that a random sample won't include the small halls. Which sampling method should you use?
Stratified random sampling
Non sampling errors
human errors
coverage errors
when the sample has targeted the wrong subjects
non-response error
when subject chooses to not respond, impacting the data
measurement error
caused by poorly worded questions or misunderstanding
margin of error
quantified measure of sampling error
probability
how likely an event is to occur
how is probability written
P(event)
What is ∪ in probability
union - probability that either event occurs (one, the other, or both), e.g. P(A ∪ B)
what is ∩ in probability
intersection - probability that both events occur together, e.g. P(A ∩ B)
collectively exhaustive
when the outcomes given are the only possible outcomes
complement
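the complement of an event A is "not A": every outcome in which A does not occur. An event and its complement cannot both happen and together cover every outcome, so their probabilities add to 1. E.g. P(A) + P(not A) = 1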
A Priori Classical
when the probability is known in advance from prior knowledge of the process, without running experiments
Empirical (relative frequency)
when you choose to work out the probability through experiments rather than information
Subjective
when the probability is based in your opinion
Conditional Probability
the probability of an event occurring given that another event has already occurred.
How is conditional probability written
P(A | B) e.g. P(Student | Female) "what is the probability that it is a student, given that they're female"
how is conditional probability calculated?
P(A ∩ B) / P(B)
Marginal probability
total probability of a row or column
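Joint, marginal, and conditional probabilities can all be read off a cross tabulation of counts. The table below is invented for illustration, and the function names are assumptions; the calculation itself is the standard one, P(A given B) = P(A and B) / P(B).

```python
# Hypothetical cross tabulation of 200 market visitors (counts).
counts = {
    ("student", "female"): 30,
    ("student", "male"): 20,
    ("non-student", "female"): 70,
    ("non-student", "male"): 80,
}
total = sum(counts.values())

def joint(status, gender):
    """P(status and gender): one cell divided by the grand total."""
    return counts[(status, gender)] / total

def marginal_gender(gender):
    """P(gender): a whole column total divided by the grand total."""
    return sum(c for (_, g), c in counts.items() if g == gender) / total

def conditional(status, gender):
    """P(status given gender) = P(status and gender) / P(gender)."""
    return joint(status, gender) / marginal_gender(gender)

print(joint("student", "female"))        # P(Student and Female)
print(marginal_gender("female"))         # P(Female)
print(conditional("student", "female"))  # P(Student given Female)
```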
Probability independence
when the probability of one event does not influence the probability of another event occurring
When does co-variance = 0?
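when variables are independent (independence always gives a co-variance of 0, but a co-variance of 0 does not by itself prove independence)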
Random Variables
variables with multiple possible values and an associated probability of getting each value
Discrete Random Variables
can only take a countable number of values, e.g. the number of 6's rolled on a die over 2 rolls: there can only be either 0 sixes, 1 six, or 2 sixes.
Expected Value defined
the value we expect based on the probabilities that exist.
Expected Value formula
E = ∑ [x • P(x)]
Variance
measures data spread around the mean
Variance formula
V(X) = ∑ [P(xi) • (xi - μ)²], where μ = E(X)
Binomial Distribution
discrete probability distribution with 4 characteristics
what are the 4 binomial characteristics
has to be 2 outcomes to every trial (success or fail)
fixed number of trials
probability of success remains the same for every trial
trials are independent (the outcomes don't affect each other)
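For a variable that meets these four characteristics, the probability of exactly k successes in n trials is given by the standard binomial formula, C(n, k) • p^k • (1 - p)^(n - k). The formula is not stated on the cards, so treat this as a supplementary sketch; the function name is an assumption.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Number of sixes in 2 rolls of a fair die: n = 2 trials, p = 1/6.
for sixes in range(3):
    print(sixes, binomial_pmf(sixes, 2, 1 / 6))
```

The three probabilities cover every possible outcome (0, 1, or 2 sixes), so they add to 1.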
Discrete Random Variables
Cannot be divided, whole numbers, e.g. number of phone calls in a day, number of visitors
Expected Value
what we expect on average, weighting each value by its probability. Example: E(x) = (0 x 0.25) + (1 x 0.5) + (2 x 0.25) = 1
Variance
spread of the data. Formula is similar to expected value: V(x) = ((0² x 0.25) + (1² x 0.5) + (2² x 0.25))-1²
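The worked numbers above (values 0, 1, 2 with probabilities 0.25, 0.5, 0.25) can be checked in a few lines. The sketch computes the variance two ways, from the definition and from the shortcut E(X²) - [E(X)]², to show that they agree; the variable names are illustrative.

```python
# Distribution from the worked example: values and their probabilities.
values = [0, 1, 2]
probs = [0.25, 0.5, 0.25]

# Expected value: sum of each value times its probability.
expected = sum(x * p for x, p in zip(values, probs))

# Variance from the definition: probability-weighted squared deviations.
var_definition = sum(p * (x - expected) ** 2 for x, p in zip(values, probs))

# Variance from the shortcut: E(X^2) minus the expected value squared.
var_shortcut = sum(x ** 2 * p for x, p in zip(values, probs)) - expected ** 2

print(expected, var_definition, var_shortcut)
```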
Poisson Probability Distribution
A discrete probability distribution used to find probabilities of the number of times a certain event occurs in a specified time interval (no fixed number of trials)
4 characteristics of Poisson
number of successes in an interval is independent of number of successes in any other interval
Probability is the same for all equal sized intervals
probability of success in an interval is proportional to the size of the interval
probability of more than one success in an interval approaches zero as the interval becomes smaller
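Poisson probabilities follow the standard formula P(X = k) = e^(-λ) • λ^k / k!, where λ is the average number of events per interval. The formula is not given on the cards, so this is a supplementary sketch; the function name and the rate of 3 calls per hour are assumptions.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events in an interval
    when the average number of events per interval is lam."""
    return exp(-lam) * lam ** k / factorial(k)

# Hypothetical example: an average of 3 phone calls per hour.
for calls in range(6):
    print(calls, poisson_pmf(calls, 3))
```

Unlike the binomial, there is no fixed number of trials, so k can be any whole number from 0 upwards.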
Empirical Rule
68% within 1 standard dev, 95% within 2 standard dev, 99.7% (almost all) within 3 standard dev of the mean
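For a normal distribution, the share of data within 1, 2, and 3 standard deviations of the mean is roughly 68%, 95%, and 99.7%. A quick check using `statistics.NormalDist` from the standard library (the helper name `share_within` is an assumption):

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```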
normal distribution
A function that represents the distribution of variables as a symmetrical bell-shaped graph.
Standardized Z-Distribution
mean = 0, standard deviation = 1
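Any data set can be converted to z-scores by subtracting the mean and dividing by the standard deviation, z = (x - mean) / standard deviation. A minimal sketch on made-up data; it uses the population standard deviation (`pstdev`) so that the standardized values have a standard deviation of exactly 1.

```python
import statistics

def standardize(values):
    """Convert each value to a z-score: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / sd for v in values]

# Hypothetical sample: dollars spent by seven market visitors.
spend_values = [12, 15, 15, 18, 20, 22, 38]
z_scores = standardize(spend_values)
print([round(z, 2) for z in z_scores])
```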
How to recognise if data is normally distributed
graph is mound shaped and symmetrical
mean = median
empirical rule applies (68% within 1, 95% within 2, 99.7% within 3 standard dev)
skewness & kurtosis close to 0
Graphs to show normally distributed data
histogram
box plot
stem & leaf
Q-Q / P-P plot
What does a sample statistic do
makes an inference about a population parameter when you can't measure the entire population.
A quantitative estimate involves
a mean "what is the mean grade of the students"
what are x̅ and μ
x̅ is the mean of a sample (a statistic); μ is the mean of the whole population (a parameter)
A qualitative estimate involves
a proportion "what proportion of the population is from Christchurch?"
Interval Estimates
estimations of a range of values of a population parameter. E.g. we expect μ to fall within $75-$100, or, we expect P to fall within 0.25-0.50
Point Estimates
estimates an exact value of a parameter using a single value. Unlikely to estimate correctly so use interval estimate instead
how to calculate confidence intervals
point estimate plus or minus margin of error (critical Z or t value for the confidence level x standard error)
standard error
is the standard deviation of the sample mean/proportion and represents the accuracy of the sample mean/proportion
when would you use the z distribution when trying to estimate a confidence interval
when the population standard deviation is known
the population is normally distributed, or the sample is large
When would you use the t distribution when trying to estimate a confidence interval
population standard deviation is unknown
population is normally distributed, or the sample is large
when would you use the Z distribution when trying to estimate a confidence interval
for proportions, as the standard error is calculated from the proportion itself, so no separate population ST.D is needed
What are the Z values
99% = 2.576, 95% = 1.96, 90% = 1.645
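These Z values can be recovered from the standard normal distribution, and then used to build a confidence interval for a mean when the population standard deviation is known. The sketch below assumes the usual standard error formula, standard deviation / √n; the function names and the example figures (mean $80, standard deviation $20, n = 100) are made up.

```python
from math import sqrt
from statistics import NormalDist

def z_value(confidence):
    """Critical Z value that leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Point estimate plus or minus (Z value x standard error),
    assuming the population standard deviation sigma is known."""
    standard_error = sigma / sqrt(n)
    margin = z_value(confidence) * standard_error
    return sample_mean - margin, sample_mean + margin

for level in (0.99, 0.95, 0.90):
    print(level, round(z_value(level), 3))

# Hypothetical example: sample mean $80, known sigma $20, sample of 100.
low, high = mean_confidence_interval(80, 20, 100)
print(round(low, 2), round(high, 2))
```

A higher confidence level uses a larger Z value and therefore gives a wider interval.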