Data1001 Exam

New cards

Types of Evidence

Personal testimony
Reputable research journal
Reproducible research
Nature of data collection

New cards

Confounding Variables

Confounding (or confusion) occurs when the Treatment and Control Groups differ by some third variable which influences the response that is being studied

New cards

Selection bias

When some participants are more likely to be chosen than others, so the sample is not representative of the population.

New cards

Randomised Controlled Trial

It involves randomly assigning participants to different groups (treatment and control) to receive different interventions.
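
Random assignment can be sketched in a few lines. This is a minimal illustration, not a trial protocol: the participant IDs, the `randomise` helper, and the seed are all hypothetical.

```python
import random

def randomise(participants, seed=None):
    """Randomly split participants into a treatment and a control group."""
    rng = random.Random(seed)      # seeded only so the example is repeatable
    shuffled = list(participants)  # copy, so the input list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs P01..P20
participants = [f"P{i:02d}" for i in range(1, 21)]
treatment, control = randomise(participants, seed=1001)

print("Treatment:", sorted(treatment))
print("Control:  ", sorted(control))
```

Because chance alone decides who lands in which group, any third variable should be spread roughly evenly across the two groups, which is what guards against confounding.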

New cards

Randomised Controlled Double-Blind Trial

Participants are randomly assigned to treatment or control groups, and neither the participants nor the researchers know who receives the treatment.

New cards

Consent bias

When participants choose whether or not they take part in the experiment

New cards

Survivor bias

Arises once the study is under way, as subjects drop out.
An observed "improvement" may appear only because the sickest subjects have dropped out.

New cards

Adherer bias

Certain participants (adherers) keep taking their assigned treatment, even when it is a placebo, while non-adherers do not. An apparent "improvement" in the treatment group may therefore be due to the adherers rather than the treatment itself.

New cards

Observational studies

An observational study is one in which the investigator cannot use randomisation for allocation to groups. The assignment of subjects is outside the control of the investigator.

New cards

What are the three precautions of observational studies?

Cannot establish causation – Observational studies can only show associations, not direct cause-and-effect relationships.
May appear like an RCT – They can resemble randomised trials in design but lack random assignment, which introduces bias.
Subject to confounding – Results can be misleading if other hidden variables (confounders) influence both the independent and dependent variables.

New cards

What is Simpson’s paradox?

It’s when a trend appears in separate groups of data but reverses or disappears when the groups are combined, often due to a confounding variable. It highlights how misleading conclusions can arise if data isn't properly stratified.
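
A small numerical sketch can make the reversal concrete. The counts below are illustrative only (two hypothetical treatments, A and B, with patients split by severity); the point is the arithmetic, not the numbers themselves.

```python
# Illustrative (successes, patients) counts for two hypothetical treatments,
# stratified by a third variable: severity of the condition.
counts = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}

def success_rate(successes, patients):
    return successes / patients

# Within each severity group, compare A against B.
within = {
    severity: (success_rate(*counts["A"][severity]),
               success_rate(*counts["B"][severity]))
    for severity in ("mild", "severe")
}

# Combine the groups by adding up successes and patients per treatment.
overall = {
    name: success_rate(sum(s for s, _ in groups.values()),
                       sum(n for _, n in groups.values()))
    for name, groups in counts.items()
}

for severity, (rate_a, rate_b) in within.items():
    print(f"{severity:>7}: A = {rate_a:.1%}, B = {rate_b:.1%}")
print(f"overall: A = {overall['A']:.1%}, B = {overall['B']:.1%}")
```

In this sketch A has the higher success rate in both severity groups, yet B looks better once the groups are combined, because A was given mostly to severe cases. Severity acts as the confounding variable.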

New cards

What is an IDA?

IDA is a first general look at the data, without formally answering the research questions.

New cards

What are the four things involved in IDA?

Data background: checking the quality and integrity of the data
Data structure: What information has been collected
Data wrangling: Scraping, cleaning, tidying, reshaping, splitting, combining
Data summaries: Graphical and numerical

New cards

What is a variable in data analysis?

A feature or attribute measured about each subject; in tidy data, these are columns.

New cards

What is data in statistics?

Information about the set of subjects being studied, usually referring to a sample, not the full population.

New cards

What does IDA stand for and what does it do?

Initial Data Analysis – it gives a general look at the data to understand its quality, structure, and suitability for answering research questions.

New cards

What are the four main steps involved in IDA?

1. Data background, 2. Data structure, 3. Data wrangling, 4. Data summaries.

New cards

What is the difference between qualitative and quantitative variables?

Qualitative variables describe categories, while quantitative variables represent numeric measurements.

New cards

What is the rule of thumb for the number of histogram bins?

Use about 10 to 15 bins so the data is neither over- nor under-condensed.

New cards

What is a density histogram?

A histogram where block area shows the percentage of subjects; total area equals 100%.
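
As a rough sketch of the area rule, the height of each block can be computed as the percentage of subjects in the interval divided by the interval width. The data and class intervals below are hypothetical, and the intervals are assumed to be left-closed and right-open.

```python
# Hypothetical data and class intervals (note the unequal widths).
data = [3, 7, 11, 12, 14, 16, 19, 22, 30, 38]
edges = [0, 10, 20, 40]

# Count the values falling in each interval [lo, hi).
counts = [0] * (len(edges) - 1)
for x in data:
    for i in range(len(counts)):
        if edges[i] <= x < edges[i + 1]:
            counts[i] += 1
            break

# Percentage of subjects per interval, then height = percentage / width.
percentages = [100 * c / len(data) for c in counts]
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
heights = [p / w for p, w in zip(percentages, widths)]

# The area of each block recovers the percentage; the areas total 100%.
areas = [h * w for h, w in zip(heights, widths)]
print(counts, heights, sum(areas))
```

Using area rather than height is what keeps unequal-width intervals comparable: the widest interval here holds 30% of the subjects but has the lowest block.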

New cards

How do you calculate the IQR?

IQR = 75th percentile – 25th percentile.

New cards

How are outliers defined in a boxplot?

Values below the lower threshold LT = Q1 – 1.5×IQR or above the upper threshold UT = Q3 + 1.5×IQR are outliers.
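
The thresholds can be checked on a small sample. This is a sketch with hypothetical data; note that software packages use slightly different quartile conventions, and the "inclusive" method of Python's `statistics.quantiles` is just one of them, chosen here as an assumption.

```python
import statistics

# Hypothetical sample with one unusually large value.
data = [1, 3, 5, 7, 9, 11, 13, 15, 40]

# Quartiles: the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_threshold = q1 - 1.5 * iqr   # LT
upper_threshold = q3 + 1.5 * iqr   # UT

outliers = [x for x in data if x < lower_threshold or x > upper_threshold]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, LT={lower_threshold}, UT={upper_threshold}")
print("Outliers:", outliers)
```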

New cards

What are the different types of histograms?

Standard histogram and density histogram.

New cards

What is a sliced histogram?

A histogram sliced by a qualitative variable to show its distribution within intervals.

New cards

What are the three types of boxplots mentioned?

Simple, comparative (filtered by a qualitative variable), and filtered with color/shape.

New cards

What are the main features of numerical summaries?

Maximum, minimum, centre (mean, median), and spread (standard deviation, range, IQR).

New cards

What is the mean?

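The unique balancing point of the histogram, where the gaps from the data on the left and right sides cancel out; equivalently, the sum of the values divided by the number of values.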

New cards

What is the median?

The middle value when data is ordered; splits the data in half.

New cards

When is the median more useful than the mean?

When data is skewed or contains outliers, since the median is robust.

New cards

What is robustness in statistics?

A robust statistic is not strongly affected by outliers; e.g., the median and IQR.
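
A quick way to see robustness is to change one value into an outlier and compare the summaries. The two samples below are hypothetical.

```python
import statistics

clean = [10, 12, 13, 15, 17]
with_outlier = [10, 12, 13, 15, 170]   # same data, last value mis-recorded

mean_clean = statistics.mean(clean)
mean_outlier = statistics.mean(with_outlier)
median_clean = statistics.median(clean)
median_outlier = statistics.median(with_outlier)

print("mean:  ", mean_clean, "->", mean_outlier)      # pulled towards the outlier
print("median:", median_clean, "->", median_outlier)  # unchanged
```

The mean jumps because every value contributes to it, while the median only depends on the middle of the ordered data.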

New cards

How do mean and median compare in different data shapes?

Symmetric: mean ≈ median
Left-skewed: mean < median
Right-skewed: mean > median

New cards

What is standard deviation?

The root mean square of the gaps from the mean; measures data spread.
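
Reading "root mean square of the gaps" from right to left gives the calculation: take the gaps, square them, average, then take the square root. A minimal sketch on hypothetical data, assuming the population version of the SD (dividing by n; the sample version divides by n - 1 instead):

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample

mean = sum(data) / len(data)
gaps = [x - mean for x in data]               # gaps from the mean
mean_square = sum(g ** 2 for g in gaps) / len(gaps)
sd = math.sqrt(mean_square)                   # root of the mean of the squares

print("mean:", mean, "SD:", sd)
# The standard library's population SD should agree.
print("pstdev:", statistics.pstdev(data))
```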

New cards

What is the IQR?

Interquartile range = Q3 - Q1; it’s the spread of the middle 50% of data.

New cards

When is IQR more appropriate than standard deviation?

For skewed data, because IQR is robust and not influenced by outliers.

New cards

What do standard deviation intervals represent?

For roughly normal (bell-shaped) data:
About 68% of data lies within 1 SD of the mean
About 95% within 2 SD
About 99.7% within 3 SD
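
These percentages come from the normal curve and can be checked with Python's `statistics.NormalDist`. The sketch assumes the data follows a normal distribution exactly; real data will only match approximately.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of a normal distribution within k SDs of the mean.
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, proportion in within.items():
    print(f"within {k} SD: {proportion:.1%}")
```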

New cards

What is a z-score (standard unit)?

The number of standard deviations a value is from the mean.
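
In symbols this is z = (value - mean) / SD. A short sketch on hypothetical data, assuming the population SD; the helper name `z_score` is made up for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample

mean = statistics.mean(data)
sd = statistics.pstdev(data)      # population SD (assumption)

def z_score(value):
    """Number of SDs that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

z_high = z_score(9)
z_low = z_score(4)
print(z_high, z_low)
```

A positive z-score means the value is above the mean and a negative one means it is below.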

New cards