Biostatistics 1: Descriptive Statistics

22 Terms

1

What is the rough outline of research projects?

1) Study design.

2) Data collection.

3) Data cleaning.

4) Data analysis.

5) Interpretation.

6) Conclusions and write-up.

2

What does “bias” mean?

Bias means measurements that are incorrect in some systematic way. (N.B. it is not meant to imply deliberate malice.)

3

Why do we take a representative sample of a population?

A representative sample means our study cohort has roughly the same proportions of different groups as the overall population. This is very important if the results are to apply to the wider population.

  • Testing the entire population is expensive, unrealistic, and time-consuming.

4

What is margin of error?

The expected difference between the study cohort and the whole population can be expressed as a 'margin of error'.
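
As a rough illustration, one common way to put a number on this for a sample proportion is the normal-approximation 95% margin of error, 1.96 × sqrt(p(1 − p) / n). The choice of formula, the function name `margin_of_error`, and the survey numbers below are assumptions for illustration; they are not part of the original card.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Uses the normal approximation; z = 1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical survey: 40% of a 1,000-person cohort report some trait.
moe = margin_of_error(0.40, 1000)
print(f"Estimate: 40% +/- {moe * 100:.1f} percentage points")
```

A larger cohort shrinks the margin of error, which is one reason sample size matters at the study design stage.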

5

What are the two components of data analysis?

Descriptive analysis: start here!

• What's happening in your data?

• Describe overall trends, numbers and proportions by group, etc.

Statistical analysis:

• Apply hypothesis tests, statistical modelling, predictive modelling, etc.

• Exact steps here depend on study design, what data is available, etc.

• Might involve getting a computer to do calculations - remember we still need our human brains to set that up and interpret the outputs!

6

What is quantitative / numerical data?

  • Two categories

• Continuous: can take any possible real value. e.g. temperature, distance.

• Discrete: can only take particular values. e.g. number of animals with a disease, number of languages a person speaks, result of rolling a die.

7

What is qualitative / categorical data?

  • Two categories

• Nominal: distinct, unordered categories. e.g. hair colour, gender, superhero team.

• Ordinal: categories with some order or hierarchy. e.g. order of people finishing a race, student satisfaction ratings, movie rating.

8

Data types can be ________.

Transformed
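
One possible example of a transformation is binning a continuous variable into ordered categories, which turns continuous data into ordinal data. The sketch below is only an illustration: the function name `age_group` and the cut-offs are arbitrary assumptions, not taken from the card.

```python
def age_group(age_years: float) -> str:
    """Map a continuous age onto an ordered category (cut-offs are arbitrary)."""
    if age_years < 18:
        return "child"
    if age_years < 65:
        return "adult"
    return "older adult"

# Hypothetical continuous measurements.
ages = [4.5, 17.9, 18.0, 42.3, 70.1]
groups = [age_group(a) for a in ages]
print(groups)
```

Note that this kind of transformation loses information: the exact ages cannot be recovered from the groups.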

9

What are the measures of central tendency?

Mean, median, and mode (the three types of 'averages')
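
A minimal sketch of the three averages, using Python's standard `statistics` module on a small made-up data set (the numbers are hypothetical):

```python
import statistics

data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of the values divided by how many there are
median = statistics.median(data)  # middle value (average of the two middle values here)
mode = statistics.mode(data)      # most frequently occurring value

print(mean, median, mode)
```

The three can disagree on the same data, so it helps to say which 'average' is being reported.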

10

What are measures of dispersion?

• Maximum and minimum (not necessarily unique)

• Variance, standard deviation, and interquartile range (how 'spread out' is the data?)

11

The ______ value reflects where a majority of the data actually lies.

median
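
To see why, here is a small sketch comparing how the mean and the median react to a single extreme value. The data are hypothetical.

```python
import statistics

values = [21, 23, 24, 25, 27]   # hypothetical measurements
with_outlier = values + [500]   # the same data plus one extreme value

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```

The mean is pulled far away from most of the data by the outlier, while the median barely moves.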

12

What is the variance?

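The variance measures the average squared distance of the data points from the mean. It is usually written as s-squared (s²) or, using the Greek letter, sigma-squared (σ²).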

13

What is standard deviation?

Standard deviation (written as s or sigma) is the square root of the variance.

14

Variance and standard deviation are the…

Measures of how spread out the data is around the middle
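
A minimal sketch of both calculations on made-up data. It assumes the common convention that s² is the sample variance (dividing by n − 1) and σ² is the population variance (dividing by n); the card itself does not state which divisor is meant.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
n = len(data)
mean = sum(data) / n

# Sum of squared distances from the mean.
squared_deviations = sum((x - mean) ** 2 for x in data)

sample_variance = squared_deviations / (n - 1)  # s-squared
population_variance = squared_deviations / n    # sigma-squared

sample_sd = math.sqrt(sample_variance)          # s
population_sd = math.sqrt(population_variance)  # sigma

print(sample_variance, sample_sd)
print(population_variance, population_sd)
```

The standard library gives the same results via `statistics.variance`, `statistics.stdev`, `statistics.pvariance`, and `statistics.pstdev`.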

15

What is the interquartile range?

The median is the 'middle' value of the data, and can be found by lining the data points up in increasing value and finding the middle one.

The 'quartiles' are the 'middle' value of each half after splitting by the median.

• '1st quartile' is in the lower half

• '3rd quartile' is in the upper half

• (the median is technically the '2nd quartile')

The IQR is the 3rd quartile minus the 1st quartile.
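
The 'median of each half' procedure can be sketched as follows. The helper name `quartiles` is hypothetical, and the sketch assumes that the median itself is left out of both halves when the number of data points is odd. Other conventions exist, so statistical software may report slightly different quartiles.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the 'median of each half' method.

    Assumes at least two values. With an odd count, the middle value is
    excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # hypothetical values
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```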

16

What are box plots most useful for?

Really useful for showing differences between groups of continuous data.

• Middle line of the box is the median.

• Top & bottom edges are Q3 and Q1.

• Lines extend to max and min.

• Height of the box is IQR.

17

What are bar plots most useful for?

Useful for showing 'counts' of each item/group of categorical data.

Also useful for comparing groups.
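
The counts behind a bar plot can be sketched with `collections.Counter`. The data are made up, and the text 'bars' are only a stand-in for a real plotting library.

```python
from collections import Counter

# Hypothetical nominal data: one hair colour per person.
hair_colours = ["brown", "black", "brown", "blonde", "red", "brown", "black"]

counts = Counter(hair_colours)

# Print one text 'bar' per category, most common first.
for category, count in counts.most_common():
    print(f"{category:<7} {'#' * count} ({count})")
```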

18

What are histograms useful for?

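Histograms group numerical data together into bins (ranges of values) and show how many data points fall in each bin. Useful for seeing the overall shape of the distribution.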
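
A minimal sketch of the grouping step behind a histogram: each value is assigned to a bin of fixed width and the bins are counted. The function name `histogram_counts`, the bin width, and the data are assumptions for illustration.

```python
def histogram_counts(values, bin_width, start=0):
    """Count how many values fall in each bin [left, left + bin_width).

    Returns a dict mapping each bin's left edge to its count. Empty bins are omitted.
    """
    counts = {}
    for v in values:
        k = (v - start) // bin_width   # index of the bin this value falls in
        left = start + k * bin_width   # left edge of that bin
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical ages in whole years, grouped into 10-year bins.
ages = [3, 7, 12, 15, 18, 21, 24, 29, 33, 41]
bins = histogram_counts(ages, bin_width=10)

for left, count in bins.items():
    print(f"{left:>2}-{left + 9:<2} {'#' * count}")
```

The choice of bin width changes how the histogram looks, so it is worth trying more than one.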

19

What is a density plot useful for?

Useful when you want to understand the distribution of continuous data.

For each value of the data (x axis), shows the relative proportion (density) of data points at or near that value (y axis).

Basically a really smoothed-out histogram.

20

What is a pie chart useful for?

Can be useful for showing proportions of a total amount.

There are often better options that communicate results more clearly.

21

What are scatter plots useful for?

Useful when looking for associations between two continuous variables.

  • Can compare groups within data.

  • Can show 'best fit' lines of statistical models.
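
As one example of a 'best fit' line, the sketch below fits a straight line by ordinary least squares. The card does not say which statistical model is meant, so this choice, the function name `least_squares_line`, and the data are all assumptions.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired measurements: height (cm) and weight (kg).
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 84]

slope, intercept = least_squares_line(heights, weights)
print(f"weight = {slope:.2f} * height + {intercept:.1f}")
```

A fitted line describes an association in the data; on its own it does not show that one variable causes the other.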

22

What are line plots useful for?

Useful for plotting continuous data over time, for example.

  • Lines between data points imply a connection and can help guide the reader's eye.

  • Not always appropriate, e.g. for unconnected data shown on the same plot.