pros and cons of IQR
+ less affected by outliers because it only considers the middle 50% of the data
- doesn’t consider the entire data set
- shouldn’t be used on its own; consider the range alongside the IQR
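A minimal sketch of the IQR card above, assuming a made-up sample and the "inclusive" quartile method from Python's statistics module (other quartile conventions give slightly different numbers). It compares how the IQR and the range react when one value is replaced by an outlier.

```python
import statistics

def iqr(values):
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def value_range(values):
    return max(values) - min(values)

# Hypothetical data, chosen only for illustration
data = [2, 4, 4, 5, 6, 7, 8, 9, 10]
data_with_outlier = data[:-1] + [100]  # replace the largest value with an outlier

print("IQR:  ", iqr(data), "->", iqr(data_with_outlier))
print("Range:", value_range(data), "->", value_range(data_with_outlier))
```

The IQR stays the same because the outlier sits outside the middle 50%, while the range changes a lot, which is why looking at both is useful.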
pros and cons of mean
+ considers entire data set
- affected by outliers
- not a good measure if data is skewed as it might not reflect central location
pros and cons of median
+ less affected by outliers
+ better measure of central location when data is skewed
- only considers the middle value and leaves out the rest of the data
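A small sketch of the mean and median cards above, using made-up numbers. It shows the mean being pulled by a single outlier while the median does not move.

```python
import statistics

# Hypothetical sample, for illustration only
sample = [10, 12, 13, 15, 20]
sample_with_outlier = [10, 12, 13, 15, 200]  # same data, largest value replaced

mean_before = statistics.mean(sample)
mean_after = statistics.mean(sample_with_outlier)
median_before = statistics.median(sample)
median_after = statistics.median(sample_with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```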
pros and cons of variance
+ shows how observations differ from sample mean
- harder to interpret because it is in squared units (hence the standard deviation is used instead)
- affected by outliers and skewed data
pros and cons of SD
+ shows how far the data deviate from the mean, in the same units as the data
- is affected by outliers and skewed data
coefficient of variation
how large the SD is relative to the sample mean (CV = SD / mean)
+ a relative measure of variability, so it can be used to compare between data sets
- affected by outliers
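A rough sketch tying the variance, SD and coefficient of variation cards together, assuming hypothetical height data recorded in centimetres and then in metres. The variance and SD change with the units, while the CV (SD divided by the mean) does not.

```python
import statistics

def coefficient_of_variation(values):
    # sample SD relative to the sample mean; unit-free
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical heights, for illustration only
heights_cm = [150, 160, 170, 180, 190]
heights_m = [h / 100 for h in heights_cm]

for label, values in (("cm", heights_cm), ("m", heights_m)):
    print(
        label,
        "variance:", statistics.variance(values),   # squared units
        "sd:", statistics.stdev(values),            # same units as the data
        "cv:", coefficient_of_variation(values),    # no units
    )
```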
Covariance
- only tells you the direction of the association (positive or negative), not its strength
+ gives you an idea of how the data are related / the trend between the variables
the unit of covariance is the product of the units of the two variables
shows the direction of the linear relationship between two random variables
Correlation
+ tells you the direction and strength of the linear relationship
- doesn’t prove causation and can overlook other variables
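A minimal sketch of the covariance and correlation cards above, written by hand with the sample (n - 1) formulas and made-up data. Rescaling x (for example, changing its units) changes the covariance but leaves the correlation unchanged, which is why only the correlation speaks to strength.

```python
import math

def sample_covariance(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

def sample_correlation(xs, ys):
    # covariance divided by the product of the two sample SDs
    sd_x = math.sqrt(sample_covariance(xs, xs))
    sd_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical paired data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_rescaled = [100 * value for value in x]  # e.g. metres -> centimetres

print("covariance: ", sample_covariance(x, y), "->", sample_covariance(x_rescaled, y))
print("correlation:", sample_correlation(x, y), "->", sample_correlation(x_rescaled, y))
```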
pros and cons of pie charts
+ each slice represents a category’s proportion of the whole, which makes it good for categorical data
- can be confusing if there are too many categories, as it becomes hard to discern the relative size of each slice in comparison to the whole
Pros and cons of bar charts
+ the height of each bar is proportional to the value it represents
+ effective for emphasising differences between groups
- can become cluttered if too many categories are included
pros and cons of histograms
+ good for showing whether the data are skewed or symmetric
used for quantitative continuous data over a continuous interval
- very sensitive to the bin size chosen, and it can be hard to compare between data sets
pros and cons of scatterplots
+ shows the relationship between 2 variables
- they don’t prove causation
- can get visually confusing if too many data points are clustered in one region, covering each other
pros and cons of box and whisker plot
+ good for large data sets
+ good for comparison
- can’t identify mode
- doesn’t show the detailed shape of the data (e.g. the number of peaks)
Marginal distribution
The marginal distribution of a random variable is the probability distribution of that variable alone, obtained by summing the joint probabilities over all possible values of the other variable.
Joint distribution and its properties
The joint distribution allows us to understand the relationship between two random variables by considering the probability of different combinations of their values. RVs are jointly distributed if their probabilities are defined over the same sample space.
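A small sketch of the marginal and joint distribution cards above, assuming a made-up joint table for two discrete random variables X and Y stored as a dictionary keyed by (x, y). Each marginal is obtained by summing the joint probabilities over the other variable.

```python
from collections import defaultdict

# Hypothetical joint probabilities Pr(X = x, Y = y); they sum to 1
joint = {
    (0, 0): 0.1,
    (0, 1): 0.2,
    (1, 0): 0.3,
    (1, 1): 0.4,
}

def marginal(joint_table, index):
    # index 0 gives the marginal of X, index 1 the marginal of Y
    totals = defaultdict(float)
    for outcome, probability in joint_table.items():
        totals[outcome[index]] += probability
    return dict(totals)

marginal_x = marginal(joint, 0)
marginal_y = marginal(joint, 1)
print("Pr(X):", marginal_x)
print("Pr(Y):", marginal_y)
```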

Characteristics of a PDF
f(x) is non-negative, i.e. f(x) ≥ 0 for every value x of the RV, because probabilities can’t be negative
the RV can be any number from -infinity to +infinity
the total area under the PDF is equal to 1
the probability of each individual point is 0
probabilities are represented by the area under the pdf
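A rough numerical check of the PDF properties above, assuming the standard normal density as the example PDF and a simple trapezoid rule for the area (both are illustrative choices, not part of the original cards). The interval [-8, 8] stands in for -infinity to +infinity, since the density is negligible beyond it.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under(pdf, a, b, steps=10_000):
    # trapezoid-rule approximation of the area under pdf between a and b
    width = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * width)
    return total * width

total_area = area_under(normal_pdf, -8, 8)   # should be very close to 1
prob_0_to_1 = area_under(normal_pdf, 0, 1)   # Pr(0 < X < 1) as an area
prob_at_1 = area_under(normal_pdf, 1, 1)     # zero-width interval: a single point

print(total_area, prob_0_to_1, prob_at_1)
```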

Characteristics of a normal distribution
distribution is bell shaped
distribution is symmetric
the mean=mode=median
the mean determines the location of the distribution and the variance determines the width of distribution/spread
single peak at the mode
X can take any value from -infinity to infinity
what is a continuous random variable
A variable with an uncountably infinite number of outcomes, e.g. the time taken to download a file or the weekly spending of qm 1 students
Key characteristics of continuous random variables
the variable can take on any value within a specific range (rather than just discrete countable values)
any individual point has a probability of 0
Pr(a < X < b) = Pr(a ≤ X ≤ b)
Characteristics of T distribution
bell shaped
symmetric around mean
has heavier tails than the standard normal distribution
the degrees of freedom determine how heavy the tails are
has a higher variance than standard normal distribution
what percentage of a normal distribution lies within 1, 2 and 3 SDs of the mean
68.27%, 95.45%, 99.73%
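The three percentages above can be reproduced with a short sketch, using the identity Pr(|Z| < k) = erf(k / sqrt(2)) for a standard normal Z. The function name is made up for illustration.

```python
import math

def share_within_k_sds(k):
    # Pr(mean - k*SD < X < mean + k*SD) for any normal distribution
    return math.erf(k / math.sqrt(2))

percentages = [round(100 * share_within_k_sds(k), 2) for k in (1, 2, 3)]
print(percentages)
```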
characteristics of standard normal
mean of 0, variance of 1
symmetric distribution
for t distribution what are E(X) and Var(X)
if v > 1, E(X) = 0
if v > 2, Var(X) = v/(v-2), so the variance is always greater than 1
as the degrees of freedom increase, it approaches the standard normal and the fat tails vanish
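A minimal sketch of the variance formula Var(X) = v/(v-2) for the t distribution, with v the degrees of freedom (the function name is hypothetical). It illustrates that the variance stays above 1 and shrinks towards 1, the standard normal variance, as v grows.

```python
def t_variance(v):
    # The variance is only defined for more than 2 degrees of freedom
    if v <= 2:
        raise ValueError("variance of the t distribution requires v > 2")
    return v / (v - 2)

for v in (3, 5, 30, 1000):
    print(v, t_variance(v))
```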

What is a random variable
it is a rule or function that assigns a numerical value to each outcome of a random experiment
what is a discrete random variable
a variable with a finite or countably infinite number of outcomes
conditional mean
the expected value of one variable for a given value of another variable
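A small sketch of the conditional mean, assuming a made-up joint table for two discrete random variables X and Y. E(Y | X = x) is computed by weighting each y by Pr(X = x, Y = y) and dividing by the marginal Pr(X = x).

```python
# Hypothetical joint probabilities Pr(X = x, Y = y); they sum to 1
joint = {
    (0, 0): 0.1,
    (0, 1): 0.2,
    (1, 0): 0.3,
    (1, 1): 0.4,
}

def conditional_mean_of_y(joint_table, x_value):
    # marginal probability Pr(X = x_value)
    p_x = sum(p for (x, _), p in joint_table.items() if x == x_value)
    if p_x == 0:
        raise ValueError("Pr(X = x) is zero, so the conditional mean is undefined")
    weighted = sum(y * p for (x, y), p in joint_table.items() if x == x_value)
    return weighted / p_x

print(conditional_mean_of_y(joint, 0))  # 0.2 / 0.3
print(conditional_mean_of_y(joint, 1))  # 0.4 / 0.7
```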
what is binomial distribution
The binomial distribution is a discrete probability distribution that models the number of successes in a fixed number of independent trials, where each trial has only two possible outcomes (success or failure) and the same probability of success.
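A short sketch of the binomial probabilities, assuming n trials with success probability p and the usual formula Pr(X = k) = C(n, k) p^k (1 - p)^(n - k). The values n = 10 and p = 0.5 are made up for illustration.

```python
import math

def binomial_pmf(k, n, p):
    # probability of exactly k successes in n independent trials
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5  # hypothetical: 10 fair coin flips
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

prob_five = pmf[5]
total = sum(pmf)                                  # probabilities sum to 1
mean = sum(k * prob for k, prob in enumerate(pmf))  # equals n * p

print(prob_five, total, mean)
```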