Data1001 Exam

54 Terms

1
New cards

Types of Evidence

  • Personal testimony

  • Reputable research journal

  • Reproducible research

  • Nature of data collection

2
New cards

Confounding Variables

Confounding (or confusion) occurs when the treatment and control groups differ by some third variable that influences the response being studied.

3
New cards

Selection bias

When some participants are more likely to be chosen than others, so the groups being compared are not representative.

4
New cards

Randomised Controlled Trial

It involves randomly assigning participants to different groups (treatment and control) to receive different interventions. 
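A minimal sketch of random assignment, assuming ten hypothetical participant labels and an even split; the function name `randomise` and the fixed seed are illustrative choices, not part of the course material.

```python
import random


def randomise(participants, seed=0):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)      # fixed seed only so the sketch is repeatable
    shuffled = participants[:]     # copy, so the input list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participant labels P1..P10
participants = [f"P{i}" for i in range(1, 11)]
treatment, control = randomise(participants)
print("Treatment:", treatment)
print("Control:  ", control)
```

Because chance alone decides who lands in each group, any third variable should be spread roughly evenly across treatment and control.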

5
New cards

Randomised Controlled Double-Blind Trial

Participants are randomly assigned to treatment or control groups, and neither the participants nor the researchers know who receives the treatment.

6
New cards

Consent bias

When participants choose whether or not to take part in the experiment, so those who take part may differ from those who do not.

7
New cards

Survivor bias

  • Arises only after the study has begun

  • An observed "improvement" may happen because there are dropouts of the sickest subjects

8
New cards

Adherer bias

Certain participants (adherers) keep taking the treatment (or placebo) while non-adherers stop. An apparent "improvement" in the treatment group may then be due to the adherers rather than the treatment itself.

9
New cards

Observational studies

An observational study is one in which the investigator cannot use randomisation for allocation to groups. The assignment of subjects is outside the control of the investigator.

10
New cards

What are the three precautions of observational studies?

  • Cannot establish causation – Observational studies can only show associations, not direct cause-and-effect relationships.

  • May appear like an RCT – They can resemble randomised trials in design but lack random assignment, which leaves them open to bias.

  • Subject to confounding – Results can be misleading if other hidden variables (confounders) influence both the independent and dependent variables.

11
New cards

What is Simpson’s paradox?

It’s when a trend appears in separate groups of data but reverses or disappears when the groups are combined, often due to a confounding variable. It highlights how misleading conclusions can arise if data isn't properly stratified.
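A small sketch of the reversal, using illustrative success counts patterned on a widely quoted kidney-stone treatment comparison. Treat the numbers and the names `counts`, `rate` and `overall` as assumptions for illustration only.

```python
# (successes, total) for each treatment, split by a confounder (stone size).
# Illustrative counts only.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, total):
    """Success proportion for one group."""
    return successes / total


def overall(treatment):
    """Success proportion after pooling the groups for one treatment."""
    successes = sum(s for s, _ in counts[treatment].values())
    total = sum(n for _, n in counts[treatment].values())
    return rate(successes, total)


for group in ("small", "large"):
    a = rate(*counts["A"][group])
    b = rate(*counts["B"][group])
    print(f"{group:>5}: A = {a:.3f}, B = {b:.3f}")

print(f"  all: A = {overall('A'):.3f}, B = {overall('B'):.3f}")
```

Within each stone size A does better, yet B looks better once the groups are pooled, because A was given far more often to the harder (large-stone) cases.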

12
New cards

What is an IDA?

IDA is a first general look at the data, without formally answering the research questions.

13
New cards

What are the four things involved in IDA?

  • Data background: checking the quality and integrity of the data

  • Data structure: What information has been collected

  • Data wrangling: Scraping, cleaning, tidying, reshaping, splitting, combining

  • Data summaries: Graphical and numerical

14
New cards

What is a variable in data analysis?

A feature or attribute measured about each subject; in tidy data, these are columns.

15
New cards

What is data in statistics?

Information about the set of subjects being studied, usually referring to a sample, not the full population.

16
New cards

What does IDA stand for and what does it do?

Initial Data Analysis – it gives a general look at the data to understand its quality, structure, and suitability for answering research questions.

17
New cards

What are the four main steps involved in IDA?

1. Data background, 2. Data structure, 3. Data wrangling, 4. Data summaries.

19
New cards

What is the difference between qualitative and quantitative variables?

Qualitative variables describe categories, while quantitative variables represent numeric measurements.

20
New cards

What is the rule of thumb for the number of histogram bins?

Use between 10–15 bins to avoid over- or under-condensing the data.

21
New cards

What is a density histogram?

A histogram where block area shows the percentage of subjects; total area equals 100%.
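A rough sketch of how block heights could be computed so that areas are percentages, assuming hypothetical data and unequal bin edges; `density_heights` is an illustrative helper, not a library function.

```python
def density_heights(data, edges):
    """Height of each block = (percentage of subjects in the bin) / (bin width)."""
    n = len(data)
    heights = []
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        is_last = i == len(edges) - 2
        # Bins are [lo, hi); the last bin also includes its right endpoint.
        count = sum(1 for x in data if lo <= x < hi or (is_last and x == hi))
        heights.append(100 * count / n / (hi - lo))
    return heights


# Hypothetical data and bin edges
data = [1, 2, 2, 3, 5, 6, 8, 9, 9, 10]
edges = [0, 2, 5, 10]

heights = density_heights(data, edges)
areas = [h * (hi - lo) for h, lo, hi in zip(heights, edges, edges[1:])]
print("heights (% per unit):", heights)
print("areas (%):", areas, "total:", sum(areas))
```

Dividing by the bin width is what lets bins of different widths be compared fairly: the area, not the height, carries the percentage.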

22
New cards

How do you calculate the IQR?

IQR = 75th percentile – 25th percentile.

23
New cards

How are outliers defined in a boxplot?

Values below the lower threshold, LT = Q1 – 1.5×IQR, or above the upper threshold, UT = Q3 + 1.5×IQR, are outliers.
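The thresholds can be sketched as follows, assuming a small hypothetical data set. Quartile conventions differ between textbooks and software, so the `method="inclusive"` choice in `statistics.quantiles` is an assumption and other conventions may give slightly different thresholds.

```python
import statistics

# Hypothetical data with one unusually large value
data = [2, 4, 5, 6, 7, 8, 9, 10, 30]

# Quartiles under one common convention (an assumption, see above)
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_threshold = q1 - 1.5 * iqr   # LT
upper_threshold = q3 + 1.5 * iqr   # UT

outliers = [x for x in data if x < lower_threshold or x > upper_threshold]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"LT = {lower_threshold}, UT = {upper_threshold}")
print("Outliers:", outliers)
```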

24
New cards

What are the different types of histograms?

Standard histogram and density histogram.

25
New cards

What is a sliced histogram?

A histogram sliced by a qualitative variable to show its distribution within intervals.

26
New cards

What are the three types of boxplots mentioned?

Simple, comparative (filtered by a qualitative variable), and filtered with color/shape.

27
New cards

What are the main features of numerical summaries?

Maximum, minimum, centre (mean, median), and spread (standard deviation, range, IQR).

28
New cards

What is the mean?

The unique balancing point of the histogram, where the gaps from the mean on the left and right sides cancel out (the average of the data).

29
New cards

What is the median?

The middle value when data is ordered; splits the data in half.

30
New cards

When is the median more useful than the mean?

When data is skewed or contains outliers, since the median is robust.

31
New cards

What is robustness in statistics?

A robust statistic is not strongly affected by outliers; e.g., the median and IQR.

32
New cards

How do mean and median compare in different data shapes?

  • Symmetric: mean ≈ median

  • Left-skewed: mean < median

  • Right-skewed: mean > median
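A quick check of these three cases, using small hypothetical data sets chosen to have the stated shapes:

```python
import statistics

# Hypothetical data sets, one per shape
shapes = {
    "symmetric": [1, 2, 3, 4, 5],
    "left-skewed": [-20, -4, -3, -3, -2, -2, -1],   # long tail to the left
    "right-skewed": [1, 2, 2, 3, 3, 4, 20],         # long tail to the right
}

summary = {}
for name, values in shapes.items():
    summary[name] = (statistics.mean(values), statistics.median(values))
    mean, median = summary[name]
    print(f"{name:>12}: mean = {mean}, median = {median}")
```

The single extreme value drags the mean toward the tail while the median stays near the bulk of the data, which is why the median is described as robust.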

33
New cards

What is standard deviation?

The root mean square of the gaps from the mean; measures data spread.
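Read literally, "root mean square of the gaps" gives the population SD (dividing by n). A minimal sketch on hypothetical data, with `rms_sd` as an illustrative helper name:

```python
import math
import statistics


def rms_sd(data):
    """Population SD: square the gaps, take their mean, then the square root."""
    mean = sum(data) / len(data)
    gaps = [x - mean for x in data]
    mean_square = sum(g * g for g in gaps) / len(data)
    return math.sqrt(mean_square)


# Hypothetical data
data = [2, 4, 4, 4, 5, 5, 7, 9]

print("RMS of gaps:      ", rms_sd(data))
print("statistics.pstdev:", statistics.pstdev(data))
```

The sample SD divides by n - 1 instead of n (`statistics.stdev`), so it is slightly larger on the same data.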

34
New cards

What is the IQR?

Interquartile range = Q3 - Q1; it’s the spread of the middle 50% of data.

35
New cards

When is IQR more appropriate than standard deviation?

For skewed data, because IQR is robust and not influenced by outliers.

36
New cards

What do standard deviation intervals represent?

For roughly normal (bell-shaped) data, approximately:

  • 68% of data lies within 1 SD of the mean

  • 95% within 2 SD of the mean

  • 99.7% within 3 SD of the mean

37
New cards

What is a z-score (standard unit)?

The number of standard deviations a value is from the mean.
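As a formula, z = (value - mean) / SD. A minimal sketch on hypothetical data, assuming the population SD is used; `z_scores` is an illustrative helper name.

```python
import statistics


def z_scores(data):
    """Convert each value to standard units: (value - mean) / SD."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)   # population SD (an assumption of this sketch)
    return [(x - mean) / sd for x in data]


# Hypothetical data
data = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(data)

for value, score in zip(data, z):
    print(f"value = {value}, z = {score:+.2f}")
```

A positive z-score means the value is above the mean and a negative one means it is below; values equal to the mean have z = 0.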
