Categorical variables (measured in groups/categories)
Nominal - values represent categories with no order (e.g. eye color, ice cream flavor)
Ordinal - values have an order/hierarchy, but the size of the difference between values is unknown (e.g. shirt sizes)
Continuous variables (measures on a continuous scale)
Interval - equal differences in values = equal differences in the property, but 0 does not indicate absence of the property (e.g. IQ test - 0 does not mean no intelligence)
Ratio - like interval, but has an absolute 0 and measurements can be compared by calculating ratios (e.g. height)
graphs
numerical values describing specific features - descriptive stats
Bar charts - nominal level measurement
Proportion/percentages
absolute frequency (count) = no. of times a value is observed
relative frequency (percentage) = no. of times value is observed, as a percentage (aka relative to total number of observations)
valid frequency = relative frequency computed over only the people who gave a valid answer (missing answers excluded)
cumulative frequency = relative frequency of a group plus the relative frequencies of all previous groups
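The four frequency types can be sketched in a few lines of Python. The survey answers below are made up for illustration, and using `None` to mark a missing answer is an assumption of this sketch. The cumulative column here adds up the valid percentages, so that it ends at 100%; that is one possible choice, not the only one.

```python
from collections import Counter

# Hypothetical survey answers; None marks a missing (invalid) answer.
answers = ["blue", "brown", "brown", None, "green", "brown", "blue", None]

counts = Counter(a for a in answers if a is not None)  # absolute frequencies
n_total = len(answers)          # all observations, including missing ones
n_valid = sum(counts.values())  # valid answers only

cumulative = 0.0
table = []
for value, count in sorted(counts.items()):
    relative = 100 * count / n_total  # percentage of all observations
    valid = 100 * count / n_valid     # percentage of valid answers
    cumulative += valid               # running total of valid percentages
    table.append((value, count, relative, valid, cumulative))

for value, count, relative, valid, cum in table:
    print(f"{value:<6} count={count} relative={relative:.1f}% "
          f"valid={valid:.1f}% cumulative={cum:.1f}%")
```

Because two of the eight answers are missing, the relative percentages add up to 75% while the valid percentages add up to 100%.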
Mode = the value that occurs most often (the only measure of center that works for nominal data)
Histogram - interval/ratio
Mean = sum of values divided by no. of values (only for interval/ratio data)
Measure of center = the point around which most data is concentrated (e.g. mean, mode, median)
Spread = how much data values differ from each other and from the measure of center (big spread = data varies more)
Measure of spread: range, mean absolute deviation, variance, standard deviation
Variance = the mean squared deviation of values from the mean (flaw: its unit is the square of the variable's unit, so it's harder to interpret) (small variance = small spread)
Standard deviation (sigma σ) = square root of variance = measure of spread around the mean, in the same unit as the variable (squaring the deviations gives more emphasis to extreme values)
do Sum of Squares (SS): calculate the mean, subtract the mean from every score (these are the deviations), square all the deviations, add the squared deviations to get SS
find variance: σ² = SS divided by n
find root to get st. dev.: σ = square root of variance
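The three steps above can be followed literally in Python. The scores are made-up data chosen so the numbers come out round. Dividing by n, as these notes do, gives the population variance; a sample variance would divide by n - 1 instead.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

n = len(scores)
mean = sum(scores) / n                             # calculate the mean
deviations = [x - mean for x in scores]            # subtract the mean from every score
sum_of_squares = sum(d ** 2 for d in deviations)   # square and add the deviations
variance = sum_of_squares / n                      # sigma squared = SS / n
std_dev = math.sqrt(variance)                      # sigma = square root of variance

print(mean, sum_of_squares, variance, std_dev)  # prints: 5.0 32.0 4.0 2.0
```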
Box plot - interval/ratio
Median = order values low to high and count them (n); if n is odd, divide n by 2, round up, and take the value at that position // if n is even, take the mean of the middle 2 values
Range = maximum value - minimum value
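Quartiles - order values, find median (Q2), split the data into a lower and an upper half at the median, find the medians of those halves (Q1, Q3) // for an even no. of values, the medians will be means of the middle 2 values
Interquartile range (IQR) = Q3 - Q1 = the range of the middle 50% of the data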
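A minimal sketch of the median, quartile, range, and IQR recipes, using made-up data. The helper names `median` and `quartiles` are hypothetical. One assumption is worth flagging: for an odd number of values this sketch leaves the median itself out of both halves. Other quartile conventions exist and can give slightly different results.

```python
def median(ordered):
    """Median of a list that is already sorted low to high."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: mean of the middle 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n, the median is excluded from both halves.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)

data = [7, 1, 3, 9, 5, 11, 13, 15]  # hypothetical data
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
value_range = max(data) - min(data)

print(q1, q2, q3, iqr, value_range)  # prints: 4.0 8.0 12.0 8.0 14
```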
Outlier - can heavily influence mean and standard dev., but median stays the same or barely changes
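To see the effect of an outlier, the sketch below replaces the largest value of a small made-up data set with an extreme one and compares the statistics before and after. It uses Python's standard `statistics` module; `pstdev` is the population standard deviation (SS divided by n, then the square root).

```python
import statistics

clean = [4, 5, 5, 6, 7]          # hypothetical data
with_outlier = [4, 5, 5, 6, 60]  # same data, largest value replaced by an outlier

for label, values in (("clean", clean), ("with outlier", with_outlier)):
    print(
        f"{label:<13}"
        f" mean={statistics.mean(values):.2f}"
        f" median={statistics.median(values)}"
        f" st.dev={statistics.pstdev(values):.2f}"
    )
```

The mean jumps from 5.4 to 16 and the standard deviation grows a lot, while the median is 5 in both cases.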
Nested bar chart - for nominal data
Proportions & conditional proportions
Side-by-side boxplots - nominal + interval/ratio
Median & IQR
Scatter plot - interval/ratio
Correlation coefficient (r) in [-1, +1] - sign shows direction of linear relationship, magnitude shows strength (closer to -1 or +1 = stronger rel)
!! no correlation =/= no relation (the relation can be nonlinear) + careful w. outliers
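A small sketch of why zero correlation does not mean no relation. It assumes the coefficient meant here is Pearson's r, and the helper name `pearson_r` and the data are made up for illustration. The second data set follows y = x² exactly, so y depends completely on x, yet r comes out as 0 because the relation is not linear.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-deviation of x and y, scaled into [-1, +1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)  # sum of squares of x
    ss_y = sum((y - mean_y) ** 2 for y in ys)  # sum of squares of y
    return co_dev / math.sqrt(ss_x * ss_y)

xs = [-2, -1, 0, 1, 2]
linear = [2 * x + 1 for x in xs]    # perfect increasing line
falling = [-3 * x for x in xs]      # perfect decreasing line
curved = [x ** 2 for x in xs]       # perfect but nonlinear relation

print(pearson_r(xs, linear))   # prints: 1.0
print(pearson_r(xs, falling))  # prints: -1.0
print(pearson_r(xs, curved))   # prints: 0.0
```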