Variable
Something that can be measured or manipulated
Dependent variable
What is measured (the outcome) (The y axis)
Independent variable
What is changed (predictor variable) (The x axis)
Ordinal
Measured on a scale of ordered categories (ranks); the distances between categories are not necessarily equal
Continuous
Measured with numbers that can take any value within a range
Model
Simplified representation of a system
Frequency distribution (empirical)
Associates each possible outcome with a frequency value
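A minimal sketch of an empirical frequency distribution in base R. The die rolls are made-up data, used only for illustration.

```r
# Hypothetical outcomes of eight die rolls (made up for illustration)
rolls <- c(1, 3, 3, 6, 2, 3, 6, 1)

# table() counts how often each observed outcome occurs
freq <- table(rolls)
print(freq)

# Dividing by the total turns the counts into relative frequencies
rel_freq <- freq / sum(freq)
print(rel_freq)
```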
Uniform distribution (theoretical)
Probability is uniformly spread across all outcomes
Normal distribution
Aka the bell curve OR the Gaussian distribution - for continuous data, symmetrical around the mean.
Standard deviation
Quantifies the degree of dispersion (spread) of the data around the mean
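As a rough illustration, the standard deviation can be computed by hand and compared with R's built-in `sd()`. The numbers are made up, and the sketch assumes the sample formula (dividing by n - 1), which is what `sd()` uses.

```r
# Hypothetical data, made up for illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

m <- mean(x)

# Sample standard deviation: square root of the summed squared
# deviations from the mean, divided by n - 1
sd_manual <- sqrt(sum((x - m)^2) / (length(x) - 1))

# Should agree with the built-in function
print(all.equal(sd_manual, sd(x)))
```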
Parameters (normal distribution)
The mean and the standard deviation. A parameter is a property of a distribution.
Mode
Most frequently occurring value, peak value in frequency distribution.
Median
Value in the middle of the ordered data. If there is an even number of values, it is the mean of the two middle numbers, so it can be a theoretical value that does not occur in the data.
Mean
Sum of all the values divided by the number of values.
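The three measures of central tendency above can be sketched in base R as follows. The data are made up, and `mode_value` is a name introduced here for illustration, since base R has no built-in function for the statistical mode.

```r
# Hypothetical data, made up for illustration
x <- c(2, 3, 3, 5, 7, 10)

mean_value <- mean(x)      # sum(x) / length(x)
median_value <- median(x)  # even number of values: mean of the two middle ones (3 and 5)

# One way to get the mode: tabulate the values and take the most frequent one
counts <- table(x)
mode_value <- as.numeric(names(counts)[which.max(counts)])

print(c(mean = mean_value, median = median_value, mode = mode_value))
```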
Bimodal graph
Has 2 modes (peaks); the mean and median typically fall between them
Kurtosis
Fancy word for how spiky (peaked) the distribution is; the higher the spike, the better the mean fits the data.
Bar plot
X is categorical, y is numerical (e.g. a count or a mean per category)
Box plot
X is categorical, y is numerical
Scatterplot
X is numerical, y is numerical
Conditional means
Means that shift depending on what value some other piece of data assumes
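A small sketch of conditional means in base R: the mean of `score` is computed separately for each value of `group`. The data frame and its column names are hypothetical.

```r
# Hypothetical data: scores from two groups (made up for illustration)
d <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  score = c(1, 2, 3, 7, 8, 9)
)

# tapply() applies mean() to score within each level of group
cond_means <- tapply(d$score, d$group, mean)
print(cond_means)
```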
Bivariate statistics
Describing the relationship between 2 variables
Regression line
Represents the average (conditional mean) of y for each value of x, summarizing the relationship between the two variables
Slopes
The change in y over the change in x
Intercepts
The point where the line crosses the y-axis (the predicted value of y when x = 0)
Coefficients of the regression model
Slope and the intercept
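A minimal sketch of how the two coefficients can be obtained in R with `lm()`. The data are made up for illustration, and the variable names are assumptions.

```r
# Hypothetical data, made up for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Fit a simple linear regression of y on x
model <- lm(y ~ x)

# coef() returns the intercept and the slope
b <- coef(model)
intercept <- b[["(Intercept)"]]  # about 0.05
slope <- b[["x"]]                # about 1.99

print(b)
```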
Fitted value
The value of y that a regression model, fitted to a dataset, predicts for a given value of x.
Residuals
Represent the information that is left over after removing the effect of the explanatory variables. (Represented by the e, for error, at the end of the regression equation)
Observed values =
Fitted values plus residuals
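The identity above can be checked in R on a small made-up dataset. This is a sketch with hypothetical data, using `fitted()` and `resid()` from base R.

```r
# Hypothetical data, made up for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

model <- lm(y ~ x)

fitted_values <- unname(fitted(model))  # predictions on the regression line
residual_values <- unname(resid(model)) # what the line leaves unexplained

# Observed values = fitted values + residuals
print(all.equal(fitted_values + residual_values, y))

# Equivalently: residual = observed value - fitted value
print(all.equal(residual_values, y - fitted_values))
```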
Null model
A model with only an intercept: it predicts the mean of the dependent variable for every observation, so its residuals are deviations from that mean.
Residuals
The difference between an observed data point and the value the regression line predicts for it.
SSE
Sum of squared errors (the sum of the squared residuals)
R squared
1 - (SSE model / SSE null)
What does r squared show?
How much of the null model's error the model removes. Closer to 1: little error is left. Closer to 0: lots of error remains.
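As a sketch, R squared can be computed by hand from the two SSE values and compared with what `summary()` reports. The data are made up for illustration.

```r
# Hypothetical data, made up for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

model <- lm(y ~ x)       # model with a slope
null_model <- lm(y ~ 1)  # null model: intercept only, predicts mean(y)

# SSE: sum of squared residuals
sse_model <- sum(resid(model)^2)
sse_null <- sum(resid(null_model)^2)

r_squared <- 1 - sse_model / sse_null

# Should match the value reported by summary()
print(all.equal(r_squared, summary(model)$r.squared))
```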
Residual formula
Observed value - fitted value
Correlation
A relationship between two variables; it is not the same as causation
Positive correlation
Relationship is linear and going upwards (as x increases, y increases)
Negative correlation
Relationship is linear and going downwards (as x increases, y decreases)
Pearson's r
A number that tells you how strongly correlated two variables are
What does Pearson's r show?
Goes from -1 to 1. A negative correlation has a minus sign and a positive correlation a plus sign; close to zero means not much correlation.
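A short sketch of the range of Pearson's r using base R's `cor()`, which computes Pearson's r by default. All values are made up for illustration.

```r
# Hypothetical data, made up for illustration
x <- c(1, 2, 3, 4, 5)

r_up <- cor(x, 2 * x + 1)          # perfect positive relationship: r = 1
r_down <- cor(x, -2 * x + 1)       # perfect negative relationship: r = -1
r_none <- cor(x, c(3, 1, 4, 1, 3)) # no linear relationship: r = 0

print(c(r_up, r_down, r_none))
```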
Centering
A linear transformation: center a predictor variable by subtracting its mean from each data point
Z-score
A value that has been transformed to a unit that quantifies how many standard deviations it is from the mean (center the value, then divide by the standard deviation)
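Centering and z-scoring can be sketched in base R like this. The data are made up; the comparison with `scale()` is included only as a cross-check.

```r
# Hypothetical predictor values, made up for illustration
x <- c(10, 20, 30, 40, 50)

# Centering: subtract the mean from each data point
x_centered <- x - mean(x)  # -20 -10 0 10 20

# Z-scoring: divide the centered values by the standard deviation
x_z <- x_centered / sd(x)

# After z-scoring, the mean is 0 and the standard deviation is 1
print(c(mean = mean(x_z), sd = sd(x_z)))

# scale() does both steps at once (it returns a one-column matrix)
print(all.equal(as.numeric(scale(x)), x_z))
```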
Linear transformation
Changing all the numbers by adding, subtracting, multiplying or dividing by a constant; the relationships between the values aren't changed
Log-transformation
The power to which a base must be raised to yield a given number - logarithms need bases
Non-linear transformations
Large numbers are affected more than small ones
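To illustrate why a log transformation is non-linear, this sketch compares the gaps between values before and after taking base-10 logs. The numbers are made up.

```r
# Hypothetical values spanning several orders of magnitude
x <- c(10, 100, 1000)

raw_gaps <- diff(x)         # 90 900: the gaps grow on the raw scale
log_gaps <- diff(log10(x))  # 1 1: equal gaps after the log transformation

print(raw_gaps)
print(log_gaps)
```

The largest value is pulled in the most, which is what "large numbers are affected more than small ones" describes.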
Log e
The natural logarithm; its base e is approximately 2.72
R: mutate()
Adds new columns computed from existing ones, e.g. a linear transformation such as a column of centered values or z-scores
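A minimal sketch of `mutate()` from the dplyr package, assuming dplyr is installed. The data frame `d` and the column names `rt`, `rt_c` and `rt_z` are hypothetical.

```r
library(dplyr)

# Hypothetical response times, made up for illustration
d <- data.frame(rt = c(400, 500, 600, 700, 800))

# mutate() adds new columns; later columns can use earlier ones
d <- mutate(
  d,
  rt_c = rt - mean(rt),  # centered
  rt_z = rt_c / sd(rt)   # z-scored
)

print(d)
```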
Log transformations in R
log() (the natural log, base e, by default)
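A brief sketch of the base R logarithm functions. The variable names are introduced here for illustration only.

```r
# log() uses base e unless told otherwise
ln_e <- log(exp(1))             # 1

# Other bases: the base argument, or the log10() and log2() shortcuts
log_100 <- log(100, base = 10)  # 2
log10_100 <- log10(100)         # 2
log2_8 <- log2(8)               # 3

# exp() undoes the natural log
back <- exp(log(5))             # 5

print(c(ln_e, log_100, log10_100, log2_8, back))
```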
R: manually adding slope and intercept
geom_abline(aes(intercept = a, slope = b)), where a and b are the intercept and slope values
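A sketch of adding a regression line by hand with ggplot2, assuming ggplot2 is installed. The data are made up, and the intercept and slope are passed as plain arguments to `geom_abline()` rather than inside `aes()`; that is a style choice made here, suited to a single fixed line.

```r
library(ggplot2)

# Hypothetical data, made up for illustration
d <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1)
)

# Take the coefficients from a fitted regression model
b <- coef(lm(y ~ x, data = d))

# Scatterplot with the line drawn from the intercept and slope
p <- ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  geom_abline(intercept = b[["(Intercept)"]], slope = b[["x"]])

print(p)
```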