Sample standard deviation
Measure of the spread or variability of a sample data set: the average distance of each data point from the sample mean.
Continuous numeric
Any value within a given range, includes fractions and decimals
Ordinal categorical
Represents categories with order and rank, such as ‘low,’ ‘medium,’ and ‘high’
Nominal categorical
No inherent order; groups items into categories without ranking
Continuous categorical
Grouping continuous (measurable) variables into distinct categories
Binary data
Categorises information/data into only two possible categories
Expected value
Average of all possible outcomes for a random variable, calculated by multiplying each outcome by its probability and summing the results
Variance
How spread out a set of data is from the mean value. It is calculated as the average of the squared differences from the mean.
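Both definitions above reduce to short sums for a discrete random variable. A minimal Python sketch (the outcomes and probabilities are made-up illustration data):

```python
# Hypothetical discrete random variable: a loaded die.
values = [1, 2, 3, 4, 5, 6]
probs  = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]

# Expected value: multiply each outcome by its probability and sum.
expected = sum(x * p for x, p in zip(values, probs))

# Variance: the expected squared deviation from the mean.
variance = sum(p * (x - expected) ** 2 for x, p in zip(values, probs))

print(expected)  # 4.5
print(variance)  # 3.25
```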
Random variables
Assigns a numerical value to each outcome of a random phenomenon
Pr(X)
Probability someone/something is X (independent)
Pr(Y)
Probability someone/something is Y (dependent)
Pr(Y|X)
Probability someone/something is Y given they’re X
Continuous random variable
Variable that can take on any value within a given range
Standard error
The standard deviation of a statistic’s sampling distribution, used to measure how much a sample statistic (like the mean) is likely to differ from the true population parameter
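For a sample mean, the standard error is the sample standard deviation divided by the square root of the sample size. A sketch using Python's standard library (the sample values are made up):

```python
import math
import statistics

sample = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]   # made-up sample data

s = statistics.stdev(sample)               # sample SD (n - 1 denominator)
se = s / math.sqrt(len(sample))            # standard error of the mean
print(se)
```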
Sampling distribution
Probability distribution of a statistic (like the mean or standard deviation) that is calculated from multiple, repeated random samples taken from the same population.
t-distribution
For small sample sizes or when the population standard deviation is unknown
F-distribution
Probability distribution of the ratio of two independent chi-square distributed random variables, each divided by its degrees of freedom. Commonly used in ANOVA tests and linear regression modelling.
X2-distribution (chi-squared distribution)
With k degrees of freedom, the distribution of the sum of the squares of k independent standard normal variables
Z-distribution
A special normal distribution where the mean is 0 and the SD is 1
Z-score
Indicates how many standard deviations a data point is from the mean in a Z-distribution.
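The Z-score formula is (x − mean) / SD. A one-line sketch (the IQ-style numbers are illustrative only):

```python
def z_score(x, mean, sd):
    # Number of standard deviations x lies from the mean.
    return (x - mean) / sd

# A value of 130 in a distribution with mean 100 and SD 15:
print(z_score(130, 100, 15))  # 2.0
```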
T-score
Standardises data, allowing for comparisons between different datasets. Used to find the upper and lower bounds of a CI for a population mean, especially with small sample sizes. A larger t-score suggests a greater difference between the sample mean and the population mean, while a smaller t-score indicates more similarity.
pnorm
Calculates the cumulative distribution function (CDF) of the normal distribution: the probability that the corresponding continuous variable takes a value less than or equal to the argument of the function (R-function)
dnorm
Calculates the normal probability density function. Returns the height of the normal curve at a specific point, which represents the relative likelihood of that value occurring, but not a direct probability (R-function)
qnorm
When given an area (probability), finds the boundary value (quantile) that determines that area (R-function)
rnorm
Generates a random vector of numbers drawn from a normal distribution (R-function)
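Python's standard library offers close analogues of these four R functions via `statistics.NormalDist`, which can help connect the definitions; a sketch (the numeric arguments are illustrative):

```python
import random
from statistics import NormalDist

std = NormalDist(mu=0, sigma=1)  # the Z-distribution

p = std.cdf(1.96)        # like pnorm(1.96): P(Z <= 1.96), about 0.975
d = std.pdf(0)           # like dnorm(0): curve height at 0, about 0.399
q = std.inv_cdf(0.975)   # like qnorm(0.975): boundary value, about 1.96
r = [random.gauss(0, 1) for _ in range(5)]  # like rnorm(5)
```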
cor
Calculates correlation coefficients between variables; supports Pearson, Spearman, and Kendall correlation, and can be applied to individual vectors or entire data frames to generate correlation matrices (R-function)
prop.test
For testing the null hypothesis that the proportions (probabilities of success) in several groups are the same, or that they equal certain values (R-function)
lm
Used to fit linear models. Can be used to carry out regression, single stratum analysis of variance and analysis of covariance (R-function)
summary
Produces a summary of the results of a fitted model (R-function)
chisq.test
Conducts Chi-squared tests for independence and goodness of fit (R-function).
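The test statistic behind `chisq.test` is the sum of (observed − expected)² / expected over all categories. A hand-rolled sketch of just the statistic, with made-up counts:

```python
# Chi-squared goodness-of-fit statistic (illustrative counts).
observed = [48, 35, 17]
expected = [50, 30, 20]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_sq)
```

The statistic would then be compared against a chi-squared distribution with the appropriate degrees of freedom to obtain a p-value.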
predict
Used to generate predictions based on fitted models, estimating outcomes for new data. (R-function)
fit
A general term for the model object returned by an R model-fitting function such as lm, describing the statistical model that best fits a dataset.
TukeyHSD
Creates a set of confidence intervals on the differences between the means of the levels of a factor, with the specified family-wise probability of coverage (R-function)
y-bar
The sample mean, represents the average value of a set of observations in a dataset.
x-bar
The sample mean of the independent variable, representing its average value in a dataset.
p-hat
The sample proportion, representing the estimated probability of a certain outcome in a dataset.
Null hypothesis
A statement that there is no effect or no difference, used as a starting point for statistical analysis and hypothesis testing.
Alternate hypothesis
A statement that indicates the presence of an effect or a difference, opposing the null hypothesis in statistical testing.
P-value
the probability of obtaining results at least as extreme as the ones observed, assuming the null hypothesis is true
μd
represents the mean difference between paired observations in hypothesis testing.
Normal model
A statistical model that represents data distributions characterized by a bell-shaped curve, defined by its mean and standard deviation.
chi-squared model test
A statistical method used to determine if there is a significant association between categorical variables by comparing observed frequencies to expected frequencies.
Linear regression model
Used to model the relationship between two variables and estimate the value of a response using a line of best fit
Binomial model
A statistical model that describes the number of successes in a fixed number of independent Bernoulli trials, used for binary outcome predictions.
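The binomial probability of exactly k successes in n trials is C(n, k)·pᵏ·(1 − p)ⁿ⁻ᵏ. A short stdlib sketch (the coin-flip numbers are illustrative):

```python
import math

def binom_pmf(k, n, p):
    # P(exactly k successes in n independent Bernoulli trials,
    # each with success probability p)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 2 heads in 10 fair coin flips:
print(binom_pmf(2, 10, 0.5))  # 45/1024, about 0.0439
```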
Bernoulli trial
A random experiment with exactly two possible outcomes, commonly referred to as success and failure. Each trial is independent, and the probability of success remains constant.
Hypothesis test
A statistical method used to determine whether there is enough evidence to reject a null hypothesis in favor of an alternative hypothesis.
True difference
The actual difference in outcomes between two groups being compared in a statistical hypothesis test.
β1
slope coefficient in a simple linear regression model, representing the average change in the dependent variable (Y) for a one-unit increase in the independent variable (X)
β0
intercept term in a regression model, representing the average value of the dependent variable (Y) when all independent variables (X) are zero
β̂0
refers to the estimated y-intercept of a regression line, based on a sample of data
β̂1
the estimated slope from the sample data
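The least-squares estimates β̂1 and β̂0 can be computed directly from the sample: β̂1 = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)², and β̂0 = ȳ − β̂1·x̄. A Python sketch with made-up data:

```python
# Least-squares estimates for simple linear regression (made-up data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Estimated slope: covariance of x and y over variance of x.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))

# Estimated intercept: the line passes through (x_bar, y_bar).
b0 = y_bar - b1 * x_bar
```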
β̂₂
the estimated coefficient for the second predictor variable in a multiple regression model.
β₂
true regression coefficient for the second predictor variable in the population regression model.
Appropriate multiplier
the critical value (such as a z* or t* value) that multiplies the standard error when constructing a confidence interval, chosen to match the desired confidence level.
R2
represents the proportion of variance in the dependent variable that can be explained by the independent variables in a regression model.
Non-constant variance
in regression, the situation where the variance of the residuals changes across fitted values rather than staying constant, violating a key model assumption
Independence assumption
is the assumption that the observations are statistically independent of each other in a given analysis or model.
dependence assumption
is the assumption that the observations are correlated or related in some way, and thus may influence each other in statistical analysis.
linearity assumption
is the assumption that the relationship between the independent and dependent variables can be accurately modeled as a straight line.
F-value
Used in ANOVA and regression analysis to check whether your model explains a significant amount of variation in the outcome variable Y
ANOVA
a statistical test used to determine if there are significant differences between the means of three or more groups
X2-test
a statistical hypothesis test used to determine if there is a significant difference between observed and expected frequencies
Selection Bias
a systematic error that occurs when a sample used for analysis does not accurately represent the population it is intended to study, leading to skewed and unreliable conclusions
Information bias
a systematic error resulting from inaccurate measurement or reporting of data, which can lead to distorted outcomes and conclusions.
Research studies
are investigations designed to test hypotheses, analyze data, and derive conclusions about health-related questions.
CARE
refers to the process of ensuring that health interventions are effectively implemented and sustained in practice, focusing on patient-centered outcomes and quality of care.
Type I error
occurs when a null hypothesis is incorrectly rejected, suggesting a false positive result.
Type II error
occurs when a false null hypothesis is not rejected, indicating a false negative result.
Population parameters
are numerical characteristics or measures of a population, such as means or variances, that summarize the data.
Strata/stratum
are distinct subgroups within a population, often used in stratified sampling to ensure representation across different segments.
Var[Y]
represents the variance of the random variable Y, indicating the spread or dispersion of its possible values.
Sample
a subset of individuals or observations selected from a larger population to make inferences about the whole group
Population
the entire, well-defined set of individuals, organisms, or data points (e.g., all patients with a specific disease, all hospitals in a region) that a researcher aims to study and make inferences about
Sampling
the process of selecting a representative subset of individuals from a larger population to make inferences about health-related characteristics
Continuous variables
Can take on any value, e.g. Height, Weight, Age, Blood pressure, often interested in the mean or average value
Investigating samples
Summarise in tables and graphs; for categorical data present proportions or percentages; for continuous data we usually want to know where the centre is (central tendency) and how spread out the data is
Errors
Two different sorts
1) Errors that make our answers more uncertain i.e. more variability (can’t be avoided)
2) Errors that move us away from the truth (important to avoid)
95% confidence interval
A range of values, calculated from sample data, that is likely to contain the true population parameter (such as the mean) 95% of the time if the study were repeated
Calculation: Sample Mean ± (Critical Value × Standard Error)
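The calculation above can be sketched in Python's standard library. This uses the large-sample z multiplier (≈1.96); with a small sample like this one, a t multiplier would be more appropriate, and the data are made up:

```python
import math
import statistics
from statistics import NormalDist

sample = [12.0, 15.0, 14.0, 10.0, 13.0, 12.0, 14.0, 16.0]  # made-up data
n = len(sample)

mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean

# Critical value for 95% confidence: qnorm(0.975) in R terms.
z = NormalDist().inv_cdf(0.975)                # about 1.96

lower, upper = mean - z * se, mean + z * se    # Sample Mean +/- (CV x SE)
print(lower, upper)
```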