Types of Evidence
Personal testimony
Reputable research journal
Reproducible research
Nature of data collection
Confounding Variables
Confounding (or confusion) occurs when the treatment and control groups differ by some third variable that influences the response being studied.
Selection bias
When some participants are more likely to be chosen than others.
Randomised Controlled Trial
It involves randomly assigning participants to different groups (treatment and control) to receive different interventions.
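A minimal sketch of random assignment, assuming a simple even split into two groups. The function name `randomise` and the participant IDs are hypothetical, chosen only for illustration:

```python
import random

def randomise(participants, seed=None):
    """Randomly split participants into treatment and control groups."""
    rng = random.Random(seed)      # seeded only so the example is repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)          # chance, not the investigator, decides the order
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant IDs 1..20
groups = randomise(range(1, 21), seed=0)
print(groups)
```

Because chance alone decides who lands in which group, any third variable should be balanced across the groups on average.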
Randomised Controlled Double-Blind Trial
Participants are randomly assigned to treatment or control groups, and neither the participants nor the researchers know who receives the treatment.
Consent bias
When participants choose whether or not they take part in the experiment
Survivor bias
Only happens after the study
An observed "improvement" may appear simply because the sickest subjects drop out.
Adherer bias
Certain participants (adherers) keep taking their treatment or placebo while non-adherers stop, so an apparent "improvement" in the treatment group may be due to the adherers rather than the treatment.
Observational studies
An observational study is one in which the investigator cannot use randomisation for allocation to groups. The assignment of subjects is outside the control of the investigator.
What are the three precautions of observational studies?
Cannot establish causation – Observational studies can only show associations, not direct cause-and-effect relationships.
May appear like an RCT – They can resemble randomised trials in design but lack random assignment, which introduces bias.
Subject to confounding – Results can be misleading if other hidden variables (confounders) influence both the independent and dependent variables.
What is Simpson’s paradox?
It’s when a trend appears in separate groups of data but reverses or disappears when the groups are combined, often due to a confounding variable. It highlights how misleading conclusions can arise if data isn't properly stratified.
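A small numerical sketch of the reversal, using made-up counts (not real study data). Treatments "A" and "B" and the severity groups are hypothetical labels:

```python
# (successes, total) per treatment within each severity group; counts are invented
counts = {
    "mild":   {"A": (9, 10),   "B": (80, 100)},
    "severe": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, total):
    return successes / total

# Success rate of each treatment inside each group
within = {g: {t: rate(*c) for t, c in arms.items()} for g, arms in counts.items()}

# Success rate of each treatment after pooling the groups
combined = {}
for t in ("A", "B"):
    s = sum(counts[g][t][0] for g in counts)
    n = sum(counts[g][t][1] for g in counts)
    combined[t] = rate(s, n)

print(within)    # A has the higher rate in both groups
print(combined)  # B has the higher rate once the groups are pooled
```

Here severity acts as the confounder: A is given mostly to severe cases and B mostly to mild ones, so the pooled rates mislead.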
What is an IDA?
IDA is a first general look at the data, without formally answering the research questions.
What are the four things involved in IDA?
Data background: checking the quality and integrity of the data
Data structure: What information has been collected
Data wrangling: Scraping, cleaning, tidying, reshaping, splitting, combining
Data summaries: Graphical and numerical
What is a variable in data analysis?
A feature or attribute measured about each subject; in tidy data, these are columns.
What is data in statistics?
Information about the set of subjects being studied, usually referring to a sample, not the full population.
What does IDA stand for and what does it do?
Initial Data Analysis – it gives a general look at the data to understand its quality, structure, and suitability for answering research questions.
What are the four main steps involved in IDA?
1. Data background, 2. Data structure, 3. Data wrangling, 4. Data summaries.
What is the difference between qualitative and quantitative variables?
Qualitative variables describe categories, while quantitative variables represent numeric measurements.
What is the rule of thumb for the number of histogram bins?
Use 10–15 bins to avoid over- or under-condensing the data.
What is a density histogram?
A histogram where block area shows the percentage of subjects; total area equals 100%.
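As a sketch of the area idea, the block height can be taken as (share of subjects in the bin) / (bin width), so that the areas add up to 1 (that is, 100%). The helper `density_histogram`, the sample values and the bin edges are all hypothetical:

```python
def density_histogram(values, edges):
    """Return block heights so that each block's area is the share of values in its bin."""
    n = len(values)
    heights = []
    for left, right in zip(edges[:-1], edges[1:]):
        is_last = right == edges[-1]
        # Bins are [left, right); the final bin also includes its right edge.
        count = sum(1 for v in values if left <= v < right or (is_last and v == right))
        heights.append(count / (n * (right - left)))
    return heights

values = [1, 2, 2, 3, 5, 8, 9, 10]   # made-up sample
edges = [0, 2, 5, 10]                # unequal bin widths on purpose
heights = density_histogram(values, edges)
areas = [h * (r - l) for h, l, r in zip(heights, edges[:-1], edges[1:])]
print(heights, sum(areas))
```

With unequal bin widths, the heights differ from the raw counts, but the total area still equals 1.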
How do you calculate the IQR?
IQR = 75th percentile – 25th percentile.
How are outliers defined in a boxplot?
Values below the lower threshold, LT = Q1 – 1.5×IQR, or above the upper threshold, UT = Q3 + 1.5×IQR, are outliers.
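A short sketch of the IQR and the outlier thresholds on a made-up sample. Quartile conventions differ between textbooks and software, so the `method="inclusive"` choice below is an assumption, not the only valid one:

```python
import statistics

data = [2, 4, 5, 6, 7, 8, 9, 30]  # made-up sample

# Q1, median and Q3 under the "inclusive" quartile convention (an assumption)
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
lower_threshold = q1 - 1.5 * iqr   # LT
upper_threshold = q3 + 1.5 * iqr   # UT
outliers = [x for x in data if x < lower_threshold or x > upper_threshold]

print(q1, q3, iqr, lower_threshold, upper_threshold, outliers)
```

In this sample only the value 30 falls outside the thresholds.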
What are the different types of histograms?
Standard histogram and density histogram.
What is a sliced histogram?
A histogram sliced by a qualitative variable to show its distribution within intervals.
What are the three types of boxplots mentioned?
Simple, comparative (filtered by a qualitative variable), and filtered with color/shape.
What are the main features of numerical summaries?
Maximum, minimum, centre (mean, median), and spread (standard deviation, range, IQR).
What is the mean?
The unique balancing point of the histogram where left and right sides cancel out.
What is the median?
The middle value when data is ordered; splits the data in half.
When is the median more useful than the mean?
When data is skewed or contains outliers, since the median is robust.
What is robustness in statistics?
A robust statistic is not strongly affected by outliers; e.g., the median and IQR.
How do mean and median compare in different data shapes?
Symmetric: mean ≈ median
Left-skewed: mean < median
Right-skewed: mean > median
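These comparisons can be checked on small made-up samples. The three lists below are hypothetical, and the left-skewed one is just the mirror image of the right-skewed one:

```python
import statistics

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # long tail to the right
left_skewed = [-x for x in right_skewed]   # mirror image: long tail to the left

def centre(values):
    """Return (mean, median) for a list of numbers."""
    return statistics.mean(values), statistics.median(values)

for name, sample in [("symmetric", symmetric),
                     ("right-skewed", right_skewed),
                     ("left-skewed", left_skewed)]:
    mean, median = centre(sample)
    print(name, mean, median)
```

The single large value 20 pulls the mean towards the tail, while the median stays put.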
What is standard deviation?
The root mean square of the gaps from the mean; measures data spread.
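A sketch of the "root mean square of the gaps" calculation on a made-up sample. It assumes the population form (dividing by n); the sample form divides by n - 1 and gives a slightly larger value:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

mean = sum(data) / len(data)
gaps = [x - mean for x in data]                 # gap of each value from the mean
mean_square = sum(g ** 2 for g in gaps) / len(data)
sd = math.sqrt(mean_square)                     # root of the mean of the squared gaps

print(mean, sd)
print(statistics.pstdev(data))  # the standard library's population SD, for comparison
```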
What is the IQR?
Interquartile range = Q3 - Q1; it’s the spread of the middle 50% of data.
When is IQR more appropriate than standard deviation?
For skewed data, because IQR is robust and not influenced by outliers.
What do standard deviation intervals represent?
For roughly normal (bell-shaped) data:
About 68% of data within 1 SD of the mean
About 95% within 2 SD
About 99.7% within 3 SD
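These percentages can be reproduced under the assumption that the data follow a normal curve, where the share within k SDs of the mean is erf(k / sqrt(2)). The helper name `share_within` is hypothetical:

```python
import math

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(k, round(100 * share, 1))
```

For data that are far from bell-shaped, the actual shares can differ noticeably from these figures.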
What is a z-score (standard unit)?
The number of standard deviations a value is from the mean.
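As a sketch, a z-score is (value - mean) / SD. The sample below is made up, and the population SD is assumed:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

mean = statistics.mean(data)
sd = statistics.pstdev(data)     # population SD (an assumption; the sample SD is also used)

def z_score(x):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(9), z_score(2))
```

A positive z-score means the value is above the mean, and a negative one means it is below.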