What is the rough outline of research projects?
1) Study design.
2) Data collection.
3) Data cleaning.
4) Data analysis.
5) Interpretation.
6) Conclusions and write-up.
What does “bias” mean?
Bias means measurements that are incorrect in some systematic way. (N.B. it is not meant to imply deliberate malice.)
Why do we take a representative sample of a population?
A representative sample means our study cohort has roughly the same proportions of different groups as the overall population. This is very important if the results are meant to apply to the wider population.
Testing the entire population is usually expensive, time-consuming, and unrealistic.
What is margin of error?
Expected differences between the study cohort and whole population can be expressed as a 'margin of error'.
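The card does not give a formula, so the following is only a sketch of one common convention: the approximate 95% margin of error for a proportion estimated from a simple random sample. The function name, the sample values, and the choice of z = 1.96 are assumptions made for illustration.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    z = 1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical survey: 50% of 1000 respondents answered "yes".
moe = margin_of_error(p_hat=0.5, n=1000)
print(f"Margin of error: +/- {moe:.3f}")  # about 0.031, i.e. roughly 3 percentage points
```

Larger samples shrink the margin of error, but only in proportion to the square root of the sample size.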
What are the two components of data analysis?
Descriptive analysis: start here!
• What's happening in your data?
• Describe overall trends, number and proportions by group, etc
Statistical analysis:
• Apply hypothesis tests, statistical modelling, predictive modelling, etc.
• Exact steps here depend on study design, what data is available, etc.
• Might involve getting a computer to do calculations - remember we still need our human brains to set that up and interpret the outputs!
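As a small illustration of the descriptive step (numbers and proportions by group), here is a sketch using made-up group labels. The data and variable names are hypothetical.

```python
from collections import Counter

# Hypothetical group label for each participant in a study.
groups = ["control", "treated", "treated", "control", "treated"]

counts = Counter(groups)   # number of participants per group
total = len(groups)
proportions = {group: count / total for group, count in counts.items()}

for group in sorted(counts):
    print(f"{group}: n={counts[group]}, proportion={proportions[group]:.2f}")
```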
What is quantitative / numerical data?
Two categories
• Continuous: can take any possible real value. e.g. temperature, distance.
• Discrete: can only take particular values. e.g. number of animals with a disease, number of languages a person speaks, result of rolling a die.
What is qualitative / categorical data?
Two categories
• Nominal: distinct, unordered categories. e.g. hair colour, gender, superhero team.
• Ordinal: categories with some order or hierarchy. e.g. order of people finishing a race, student satisfaction ratings, movie rating.
Data types can be ________.
Transformed
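One way to picture a transformation is turning continuous data into ordinal categories. The sketch below bins temperatures into ordered bands; the thresholds and band names are arbitrary assumptions chosen for the example.

```python
def temperature_band(celsius: float) -> str:
    """Map a continuous temperature to an ordinal category (illustrative cut-offs)."""
    if celsius < 10:
        return "cold"
    if celsius < 20:
        return "mild"
    return "hot"

temps = [3.5, 12.0, 25.1, 19.9]                # continuous data
bands = [temperature_band(t) for t in temps]   # ordinal data
print(bands)  # ['cold', 'mild', 'hot', 'mild']
```

Note that this kind of transformation loses information: the bands cannot be converted back into the original temperatures.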
What are the measures of central tendency?
Mean, median, and mode (the three types of 'averages')
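A minimal sketch of the three averages using Python's standard `statistics` module. The data values are made up for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum divided by count: 30 / 6 = 5
median = statistics.median(data)  # middle of the sorted values: (3 + 5) / 2 = 4.0
mode = statistics.mode(data)      # most frequent value: 3

print(mean, median, mode)
```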
What are measures of dispersion?
• Maximum and minimum (not necessarily unique)
• Variance, standard deviation, and interquartile range (how 'spread out' is the data?)
The ______ value reflects where a majority of the data actually lies.
median
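To see why, consider a hypothetical data set with one extreme value. The numbers below are invented; the point is only that a single outlier pulls the mean far more than the median.

```python
import statistics

# Hypothetical incomes (in thousands); one value is far larger than the rest.
incomes = [20, 22, 25, 27, 30, 400]

mean_income = statistics.mean(incomes)      # 524 / 6, about 87.3
median_income = statistics.median(incomes)  # (25 + 27) / 2 = 26.0

print(f"mean = {mean_income:.1f}, median = {median_income}")
```

Five of the six values sit between 20 and 30, which the median reflects and the mean does not.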
What is the variance?
The variance is the average squared deviation of the data points from the mean. It is usually written as s² (for a sample) or σ² (Greek letter sigma, for a population).
What is standard deviation?
Standard deviation (written as s or sigma) is the square root of the variance.
Variance and standard deviation are the…
Measures of how spread out the data is around the middle.
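A short sketch of both measures with the standard `statistics` module, using made-up data. The module distinguishes the population versions (divide by n) from the sample versions (divide by n - 1); which one applies depends on whether the data is the whole population or a sample from it.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5

pop_variance = statistics.pvariance(data)    # squared deviations sum to 32; 32 / 8 = 4
pop_std_dev = statistics.pstdev(data)        # square root of 4 = 2.0
sample_variance = statistics.variance(data)  # 32 / 7, about 4.57
sample_std_dev = statistics.stdev(data)      # square root of the sample variance

print(pop_variance, pop_std_dev)
print(round(sample_variance, 2), round(sample_std_dev, 2))
```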
What is the interquartile range?
The median is the 'middle' value of the data, and can be found by lining the data points up in increasing value and finding the middle one.
The 'quartiles' are the 'middle' value of each half after splitting by the median.
• '1st quartile' is in the lower half
• '3rd quartile' is in the upper half
• (the median is technically the '2nd quartile')
The IQR is the 3rd quartile minus the 1st quartile.
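The median-of-halves procedure described above can be sketched as follows. One assumption is needed: with an odd number of data points, this version leaves the median itself out of both halves. Other conventions exist and can give slightly different quartiles. The function name and data are hypothetical.

```python
import statistics

def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median of each half of the sorted data.

    Assumption: for an odd number of points, the middle value is excluded
    from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )

q1, q2, q3 = quartiles_by_halves([1, 3, 4, 6, 7, 8, 10, 12, 15])
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 3.5 7 11.0 7.5
```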
What are box plots most useful for?
Really useful for showing differences between groups of continuous data.
• Middle line of the box is the median.
• Top & bottom edges are Q3 and Q1.
• Lines extend to max and min.
• Height of the box is IQR.
What are bar plots most useful for?
Useful for showing 'counts' of each item/group of categorical data.
Also useful for comparing groups.
What are histograms useful for?
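Groups continuous data into intervals ('bins') and shows how many data points fall in each one, which reveals the overall shape of the distribution.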
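The grouping step behind a histogram can be sketched without any plotting library. The bin width and the height data below are assumptions chosen for illustration.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall in each bin; keys are the lower edge of each bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical heights in centimetres.
heights = [151, 158, 162, 163, 167, 171, 174, 179, 183]
counts = histogram_counts(heights, bin_width=10)

# Print a rough text histogram, one '#' per data point.
for lower_edge, count in counts.items():
    print(f"{lower_edge}-{lower_edge + 9}: {'#' * count}")
```

Changing `bin_width` changes how the same data looks, so it is worth trying more than one value.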
What is a density plot useful for?
Useful when you want to understand the distribution of continuous data.
For each value of the data (x axis), shows the estimated density of data points around that value (y axis); the total area under the curve is 1.
Basically a really smoothed-out histogram.
What is a pie chart useful for?
Can be useful for showing proportions of a total amount.
There are often better options that communicate results more clearly.
What are scatter plots useful for?
Useful when looking for associations between two continuous variables.
Can compare groups within data.
Can show 'best fit' lines of statistical models.
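The card does not say which model produces the 'best fit' line. As one common example, here is a sketch of an ordinary least-squares straight line, with hypothetical data and names.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = least_squares_line(hours, scores)
print(f"score = {slope:.1f} * hours + {intercept:.1f}")  # score = 4.1 * hours + 47.7
```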
What are line plots useful for?
Useful e.g. for plotting continuous data over time.
Lines between data points imply a connection and can be useful for guiding the eye of the reader.
Not always appropriate, e.g. for unconnected data on the same plot.