Exam Revision

1

Types of evidence

  • Personal testimony

  • Reputable research journal

  • Reproducible research

  • Nature of the data collection

2

Selection bias

When some individuals are more likely than others to be chosen as participants, so the sample does not represent the population.

3

Randomised controlled trial

Participants are randomly allocated to two groups: one group receives the intervention (the experimental group), while the other (the control group) receives a placebo, standard treatment, or no treatment at all.

4

Randomised controlled double-blind trial

A research study where neither the participants nor the researchers know who is receiving the intervention or placebo, reducing bias in results.

5

Consent bias

When participants choose whether or not to take part in the experiment, so those who take part may differ from those who don't.

6

Survivor bias

  • Only arises once the study is under way

  • An observed "improvement" may occur because the sickest subjects drop out

7

Adherer bias

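Participants who keep taking their assigned treatment (adherers), even when it is a placebo, can differ from those who stop (non-adherers), so an observed "improvement" in the treatment group may be due to the adherers rather than the treatment itself.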

8

3 precautions with observational studies

  1. Observational studies can’t establish causation, only association.

  2. Observational studies may be presented as if they were RCTs.

  3. Confounding variables may influence results if not properly controlled.

9

Observational studies

An observational study is one in which the investigator cannot use randomisation for allocation to groups. The assignment of subjects is outside the control of the investigator.

10

Contemporaneous control

A contemporaneous control group is observed at the same time as the treatment group.

11

What is Simpson's paradox?

It’s when a trend appears in separate groups of data but reverses or disappears when the groups are combined, often due to a confounding variable. It highlights how misleading conclusions can arise if data isn't properly stratified.
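To see the reversal concretely, here is a minimal Python sketch. The counts are made up for illustration, and "severity" plays the role of the confounding variable: treatment A has the higher success rate within each severity group, yet B looks better once the groups are pooled.

```python
# Made-up counts (successes, patients) for two treatments,
# split by a confounding variable: case severity.
data = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}

def rate(successes, total):
    """Success rate as a proportion."""
    return successes / total

# Within each severity group, compare the two treatments.
for group in ("mild", "severe"):
    a = rate(*data["A"][group])
    b = rate(*data["B"][group])
    print(f"{group:>6}: A = {a:.3f}, B = {b:.3f}")

def pooled(treatment):
    """Success rate after combining the groups (summing successes and totals)."""
    successes = sum(s for s, _ in data[treatment].values())
    total = sum(t for _, t in data[treatment].values())
    return rate(successes, total)

print(f"pooled: A = {pooled('A'):.3f}, B = {pooled('B'):.3f}")
```

The reversal comes from the imbalance in group sizes: A was mostly given to severe cases, which succeed less often whichever treatment is used. Comparing within each stratum removes that imbalance.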

12

What is data in statistics?

Data is information about the subjects being studied, usually referring to a sample rather than the whole population.

13

What does IDA stand for and what is it?

Initial Data Analysis – a first look at the data before answering research questions. It checks quality, structure, and suggests patterns or new questions.

14

What is a variable?

A variable measures or describes an attribute of the subjects; each column in a tidy dataset is a variable.

15

What does high dimensional data mean?

The number of variables (p) exceeds the number of subjects (n); this is common in big data.

16

What are the two main types of variables?

Quantitative (numerical) and qualitative (categorical)

17

How is one qualitative variable visualised?

With a single barplot, where categories are on the x-axis.

18

How are two qualitative variables visualised?

With a double barplot using colour to show the second variable.

19

What is a histogram used for?

To visualise the distribution of a quantitative variable across class intervals.

20

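What’s the difference between a standard and a density histogram?

A standard histogram shows counts as bar heights; a density histogram shows percentages as bar areas (height = percentage ÷ interval width), so the total area is 100%.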
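A small Python sketch of how density-histogram heights are computed, using hypothetical data and deliberately unequal class intervals. Each bar's height is its percentage divided by its interval width, so the bar areas (height × width) add up to 100%.

```python
# Hypothetical data, binned into deliberately unequal class intervals.
data = [2, 3, 3, 5, 6, 6, 7, 8, 12, 15]
breaks = [0, 5, 10, 20]  # intervals [0, 5), [5, 10), [10, 20)

n = len(data)
heights = []
for lo, hi in zip(breaks[:-1], breaks[1:]):
    count = sum(lo <= x < hi for x in data)  # left-closed intervals
    percent = 100 * count / n
    height = percent / (hi - lo)  # density scale: percent per unit of x
    heights.append(height)
    print(f"[{lo}, {hi}): count = {count}, {percent:.0f}%, height = {height:.1f}% per unit")

# Each bar's area is height * width; the areas add up to 100%.
area = sum(h * (hi - lo) for h, lo, hi in zip(heights, breaks[:-1], breaks[1:]))
print(f"total area = {area:.1f}%")
```

With unequal widths, a count histogram would make the wide [10, 20) bar look larger than it deserves; the density scale corrects for this.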

21

What is the rule of thumb for number of histogram intervals?

Use 10–15 class intervals to avoid over- or under-condensing the data.

22

What is a sliced histogram?

A histogram where a qualitative variable is shown by slicing each bar with colour.

23

What do the edges of a boxplot represent?

The 25th and 75th percentiles; the box shows the middle 50% of the data.

24

How are outliers identified in a boxplot?

They lie below the lower threshold (LT) or above the upper threshold (UT):
LT = Q1 − 1.5×IQR
UT = Q3 + 1.5×IQR
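The thresholds can be computed directly. This Python sketch uses hypothetical data and `statistics.quantiles` with `method="inclusive"`, which interpolates quartiles the same way as R's default `quantile()`; other quartile conventions (including the hinges R's `boxplot()` uses) can give slightly different values for small samples.

```python
import statistics

data = [4, 7, 8, 9, 10, 11, 12, 13, 30]  # hypothetical values

# Quartiles by linear interpolation ("inclusive" method).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lt = q1 - 1.5 * iqr  # lower threshold
ut = q3 + 1.5 * iqr  # upper threshold

# Outliers are the points beyond either threshold.
outliers = [x for x in data if x < lt or x > ut]
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, LT = {lt}, UT = {ut}, outliers = {outliers}")
```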

25

What is a comparative boxplot?

A boxplot comparing a quantitative variable across levels of a qualitative variable.

26

What is a filtered scatterplot?

A scatterplot that shows additional variables using colour or shape.

27

Why is age usually treated as quantitative?

Because quantitative data can be converted to qualitative (e.g. grouped into age bands), but not the other way around.

28

What is the purpose of numerical summaries?

To summarise the data with single numbers (statistics), making it easier to communicate and compare key features like centre and spread.

29

What are the main types of numerical summaries?

Maximum, minimum, centre (mean, median), and spread (standard deviation, range, IQR).

30

What is the mean?

The balancing point of a distribution: the deviations above and below it sum to zero.

31

What is the median?

The middle value of an ordered dataset; 50% of values lie above and 50% below it.

32

When is the median more useful than the mean?

When the data is skewed or contains outliers, because the median is robust and unaffected by extreme values.
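A quick Python check of robustness, using hypothetical values: replacing the largest value with an extreme one drags the mean far to the right but leaves the median where it was.

```python
import statistics

incomes = [40, 45, 50, 55, 60]       # hypothetical values, e.g. in $1000s
with_outlier = incomes[:-1] + [600]  # replace the largest value with an extreme one

for label, xs in (("original", incomes), ("with outlier", with_outlier)):
    print(f"{label:>12}: mean = {statistics.mean(xs):.1f}, median = {statistics.median(xs)}")
```

The outlier also creates a right skew, which is why the mean ends up above the median.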

33

When is the mean more useful than the median?

For symmetric data with few outliers; e.g., calculating averages for prediction or reporting.

34

How do mean and median behave with skewed data?

  • Left skew: mean < median

  • Right skew: mean > median

  • Symmetric: mean ≈ median

35

What is robustness in statistics?

A property of a summary (like the median or IQR) where it remains reliable even with outliers or skewed data.

36

Why must the mean of the gaps (deviations from the mean) always equal zero?

Because the mean is the balancing point; all positive and negative deviations from the mean cancel out.

37

What does standard deviation measure?

The average spread or dispersion of data points from the mean.

38

What is the RMS (root mean square) in standard deviation?

It calculates the square root of the average squared deviations from the mean.

39

When is it okay to treat a dataset as a population vs. a sample?

If the dataset includes all subjects of interest (e.g. all house sales in one suburb in one month), it's a population; otherwise, it's a sample.
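A minimal Python sketch of the SD as the RMS of the gaps, on hypothetical data. It also checks that the gaps sum to zero, and contrasts dividing by n (treating the data as a population, as `statistics.pstdev` does) with dividing by n − 1 (treating it as a sample, as `statistics.stdev` and R's `sd()` do).

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
n = len(data)
mean = sum(data) / n

gaps = [x - mean for x in data]      # deviations from the mean
print(f"sum of gaps = {sum(gaps)}")  # zero: the mean is the balancing point

# Population SD: the RMS of the gaps (divide by n).
sd_pop = math.sqrt(sum(g ** 2 for g in gaps) / n)
# Sample SD: divide by n - 1 instead.
sd_sample = math.sqrt(sum(g ** 2 for g in gaps) / (n - 1))

print(sd_pop, statistics.pstdev(data))
print(sd_sample, statistics.stdev(data))
```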

40

What percentage of data falls within 1, 2, and 3 standard deviations of the mean, if the data is approximately normal?

  • 68% within 1 SD

  • 95% within 2 SDs

  • 99.7% within 3 SDs
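These percentages can be checked against the standard normal curve with Python's `statistics.NormalDist`. The rule is exact only for normal curves; for other data it is at best an approximation.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1
areas = {}
for k in (1, 2, 3):
    # Area under the curve between -k and +k standard deviations.
    areas[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {areas[k]:.1%}")
```

The exact areas are about 68.3%, 95.4% and 99.7%; the rule's figures are rounded.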

41

What is the interquartile range (IQR)?

The range of the middle 50% of the data, calculated as Q3 − Q1; it's robust against outliers.

42

What is the difference between quartiles and quantiles?

Quartiles divide data into 4 parts, while quantiles divide data into q equal parts.

43

What are standard units (z-scores)?

The number of standard deviations a data point is from the mean:

z = (x − mean) / SD
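A short Python sketch, on hypothetical heights in cm, converting values to standard units with z = (x − mean) / SD (using the population SD here). By construction the resulting z-scores have mean 0 and SD 1.

```python
import statistics

heights = [160, 165, 170, 175, 180]  # hypothetical heights in cm
mean = statistics.mean(heights)
sd = statistics.pstdev(heights)  # population SD; statistics.stdev() gives the sample SD

z_scores = [(x - mean) / sd for x in heights]
print([round(z, 2) for z in z_scores])
```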
44

45

What is the normal curve?

A symmetric, bell-shaped probability density function for a continuous variable.

46

What are the parameters of a normal curve?

The population mean (μ) and the population standard deviation (σ).

47

What does the total area under a density histogram represent?

The total probability: 1 (or 100%).

48

What does P(X < x) represent in a normal curve?

The area under the curve to the left of x.

49

What rule do all normal curves follow?

The 68%-95%-99.7% rule.

50

How can a general normal curve be rescaled?

By converting to the standard normal curve (mean 0, SD 1) using z-scores.

51

What does the pnorm() function do in R?

Calculates the area under the normal curve to the left of a given x, i.e. P(X < x).

52

What does the qnorm() function do in R?

Finds the x-value that has a given area to its left under the normal curve (the inverse of pnorm()).
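Python's standard library has no `pnorm()` or `qnorm()`, but `statistics.NormalDist` offers equivalents: `cdf()` gives the area to the left of x and `inv_cdf()` goes the other way. The exam-mark distribution below (mean 65, SD 10) is hypothetical.

```python
from statistics import NormalDist

# Hypothetical example: exam marks ~ Normal(mean = 65, SD = 10).
marks = NormalDist(mu=65, sigma=10)

p_below_80 = marks.cdf(80)       # like pnorm(80, 65, 10): area to the left of 80
p_above_80 = 1 - marks.cdf(80)   # like pnorm(80, 65, 10, lower.tail = FALSE)
cutoff_90 = marks.inv_cdf(0.90)  # like qnorm(0.9, 65, 10): the 90th percentile

print(round(p_below_80, 4), round(p_above_80, 4), round(cutoff_90, 2))
```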
53

How can you check whether data is normally distributed?

Use graphical summaries (e.g. a histogram), check the 68%-95%-99.7% rule, and draw a QQ plot.

54

What is measurement error?

The difference between a measured value and the exact value.

55

What causes chance error?

Random variation when a measurement is repeated.

56

How can chance error be estimated?

Replicate the measurement and calculate the standard deviation of the replicates.
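A sketch of estimating chance error from replicates, using hypothetical weighings in grams. The sample SD of the replicates estimates the typical size of the chance error in one measurement.

```python
import statistics

# Hypothetical: the same object weighed 6 times on one scale, in grams.
replicates = [100.2, 99.8, 100.1, 99.9, 100.3, 99.7]

average = statistics.mean(replicates)
# Sample SD of the replicates: an estimate of the typical chance error.
chance_error = statistics.stdev(replicates)
print(f"average = {average:.2f} g, typical chance error about {chance_error:.2f} g")

# A constant bias (here +0.5 g on every reading) shifts the average
# but leaves the SD unchanged, so replication cannot detect it.
biased = [x + 0.5 for x in replicates]
print(f"biased SD = {statistics.stdev(biased):.2f} g")
```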
57

What is bias in measurement?

A constant error added to (or subtracted from) every measurement.

58

What are the 6 steps in linear regression?

Scatterplot, correlation, regression line, residual plot, check assumptions, predict.

59

What does a scatterplot show?

The relationship between two quantitative variables.

60

What does the correlation coefficient measure?

The strength and direction of a linear relationship.

61

What does r = +1 or r = −1 mean?

A perfect linear association: all points lie exactly on a straight line.

62

What is the regression line?

The line that best predicts Y from X.

63

What is a residual?

The difference between the actual and predicted Y value (actual − predicted).

64

What does a good residual plot look like?

Random scatter around the horizontal line at zero, with no pattern.

65

What does homoscedasticity mean?

Equal spread of residuals across the range of X.

66

What is the equation for a regression line?

Y = a + bX, where a is the intercept and b is the slope.

67

When can we use the regression line for prediction?

After checking the assumptions and model fit.

68

What is RMS error?

The root mean square of the residuals; it measures the typical size of the prediction error.
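The regression steps on these cards can be sketched in Python with hypothetical data. The slope and intercept use the standard least-squares results b = r × SD_Y / SD_X and a = mean_Y − b × mean_X, which the cards do not state explicitly; the RMS error is then the RMS of the residuals.

```python
import math

# Hypothetical paired data: x = hours studied, y = exam mark.
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)  # population SDs
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)

# Correlation: the average product of the two variables in standard units.
r = sum(((xi - mean_x) / sd_x) * ((yi - mean_y) / sd_y) for xi, yi in zip(x, y)) / n

# Least-squares regression line Y = a + bX.
b = r * sd_y / sd_x
a = mean_y - b * mean_x

predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # actual - predicted
rms_error = math.sqrt(sum(e ** 2 for e in residuals) / n)

print(f"r = {r:.3f}, line: Y = {a:.2f} + {b:.2f}X, RMS error = {rms_error:.2f}")
```

As a cross-check, the residuals sum to (essentially) zero and the RMS error equals SD_Y × √(1 − r²).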
69

What is ecological correlation?

A correlation computed from group averages rather than individuals; it often overstates the strength of the association between individuals.

70

Does correlation imply causation?

No; correlation shows association, not causation.

71

What are standard units in regression?

Z-scores: the number of SDs a value is from the mean.

72

What does extrapolation mean?

Predicting outside the range of the observed data.

73

What is a vertical strip in a scatterplot?

The data at one value (or a narrow range) of X; used to assess spread and homoscedasticity.

74

75

What is the prosecutor's fallacy?

Mistaking P(DNA match | Innocent) for P(Innocent | DNA match).
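A worked example of the fallacy, with hypothetical numbers: suppose 1,000,000 people could have left the DNA, exactly one of them is guilty (and always matches), and an innocent person matches by coincidence with probability 1 in 100,000. The sketch also assumes every person was equally likely to be the source before the DNA evidence (a uniform prior), which is a strong simplification.

```python
# Hypothetical numbers, for illustration only.
population = 1_000_000         # people who could have left the DNA
p_match_given_innocent = 1e-5  # chance an innocent person matches by coincidence

# Assumption: exactly one guilty person, who always matches.
innocent = population - 1
expected_innocent_matches = innocent * p_match_given_innocent  # about 10 people
guilty_matches = 1

# Bayes' rule with a uniform prior: among everyone who matches,
# what fraction is innocent?
p_innocent_given_match = expected_innocent_matches / (expected_innocent_matches + guilty_matches)

print(f"P(match | innocent) = {p_match_given_innocent}")
print(f"P(innocent | match) = {p_innocent_given_match:.2f}")
```

Although P(match | innocent) is tiny, about 10 innocent people are expected to match as well, so under these assumptions a match alone leaves roughly a 91% chance of innocence.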
76

What is chance (probability)?

The long-run frequency (percentage of times) with which an event occurs when the process is repeated.

77

What is a complement in probability?

The complement of an event is that the event does not occur: P(Event) = 1 − P(Complement).

78

What is conditional probability?

P(Event A | Event B): the probability that A occurs given that B has occurred.

79

What is the multiplication principle?

P(A and B) = P(A) × P(B | A).

80

What does it mean for two events to be independent?

P(B | A) = P(B); knowing that A occurred doesn't change the chance of B.

81

What ensures independence in sampling?

Drawing with replacement.

82

What ensures dependence in sampling?

Drawing without replacement.
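A small Python sketch contrasting the two sampling schemes with exact fractions: the chance of drawing two aces from a standard 52-card deck, first with replacement (independent draws) and then without (dependent draws, via the multiplication principle).

```python
from fractions import Fraction

aces, deck = 4, 52  # standard 52-card deck

# With replacement: P(B | A) = P(B), so the probabilities simply multiply.
with_repl = Fraction(aces, deck) * Fraction(aces, deck)

# Without replacement: P(A and B) = P(A) × P(B | A), and one ace is gone.
without_repl = Fraction(aces, deck) * Fraction(aces - 1, deck - 1)

print(with_repl, float(with_repl))
print(without_repl, float(without_repl))
```

Removing the first ace lowers the chance of a second one, from 1/169 to 1/221.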
83

What are mutually exclusive events?

Events that cannot occur at the same time.

84

What is the addition rule for mutually exclusive events?

P(A or B) = P(A) + P(B).

85

What is a binomial trial?

A chance process with a fixed number n of independent trials, each with two possible outcomes and the same probability of success p.

86

What does dbinom() do in R?

Calculates exact binomial probabilities, P(X = x).

87

What does pbinom() do in R?

Calculates cumulative binomial probabilities, P(X ≤ x).
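Outside R, the same quantities take only a few lines of Python. The helper names `binom_pmf` and `binom_cdf` are hypothetical stand-ins for `dbinom()` and `pbinom()`, built on `math.comb` (Python 3.8+).

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p); plays the role of R's dbinom(k, n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k); plays the role of R's pbinom(k, n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Example: 10 tosses of a fair coin.
print(binom_pmf(5, 10, 0.5))  # P(exactly 5 heads): 0.24609375
print(binom_cdf(3, 10, 0.5))  # P(at most 3 heads): 0.171875
```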
88

What is chance variability?

Random variation in outcomes from a chance process.

89

What is the law of large numbers?

As the number of trials increases, the observed proportion gets closer to the expected proportion (the probability).
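A seeded Python simulation of fair coin tosses illustrates the law. The seed is arbitrary; the exact numbers change from seed to seed, but the proportion of heads typically settles near 0.5.

```python
import random

random.seed(2024)  # arbitrary fixed seed so the run is reproducible

heads = 0
for n in range(1, 100_001):
    heads += random.random() < 0.5  # one fair coin toss (True counts as 1)
    if n in (10, 100, 1_000, 10_000, 100_000):
        proportion = heads / n
        gap = heads - n / 2  # number of heads minus the expected number
        print(f"n = {n:>6}: proportion = {proportion:.4f}, heads - expected = {gap:+.0f}")
```

The gap between the count of heads and its expected value does not tend to shrink; it typically grows in size, roughly like √n. The law of large numbers is about proportions, not about earlier deviations being cancelled out, which is why the gambler's fallacy fails.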
90

What is the gambler's fallacy?

The mistaken belief that chance deviations will 'even out' in the short run.

91

What is the box model?

A way to model a chance process as random draws of tickets from a box (the population).

92

What is chance error?

Observed value − expected value.

93

What is the standard error (SE)?

The SD of the chance error; it gives the typical size of the chance error.

94

What is the expected value (EV)?

The long-run average outcome of a chance process.

95

How does the box model apply to gambling?

The box contains the possible winnings/losses for one play; each draw is one play.

96

How do you model binary outcomes in a box?

Use 1 for success and 0 for failure.
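A sketch of the box-model calculations for a sum of draws made at random with replacement, using the standard formulas EV = number of draws × average of the box and SE = √(number of draws) × SD of the box (the square root law, which these cards do not state). The box is a hypothetical $1 bet on red in American roulette: 18 tickets of +1 and 20 of −1.

```python
import math

# Hypothetical box for a $1 bet on red in American roulette:
# 18 tickets worth +1 and 20 tickets worth -1.
box = [1] * 18 + [-1] * 20
draws = 100  # number of plays, drawn with replacement

box_mean = sum(box) / len(box)
# Population SD of the tickets in the box.
box_sd = math.sqrt(sum((t - box_mean) ** 2 for t in box) / len(box))

ev_sum = draws * box_mean           # expected net gain over all plays
se_sum = math.sqrt(draws) * box_sd  # typical size of the chance error in the sum

print(f"EV = {ev_sum:.2f}, SE = {se_sum:.2f}")
```

For a box with only two kinds of tickets, the SD of the box also has the shortcut (big − small) × √(p × (1 − p)), where p is the fraction of big tickets; for a 0/1 box this is √(p × (1 − p)).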
97

What is the normal approximation to the binomial?

Using the normal curve to estimate binomial probabilities.

98

What is the central limit theorem?

When drawing at random with replacement, the sum (or mean) of a large number of draws is approximately normally distributed, even if the box itself is not normal.

99

When can you use the normal approximation?

When the number of draws is large (rule of thumb: more than 30) and the box is not too skewed.

100

What is continuity correction?

Adjusting the endpoints by 0.5 when using the continuous normal curve to approximate a discrete distribution.
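A check of the normal approximation, with and without continuity correction, against the exact binomial probability for a hypothetical 100 tosses of a fair coin. The normal curve uses the binomial's EV (n × p) and SE (√(n × p × (1 − p))).

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5  # e.g. 100 tosses of a fair coin
k = 55           # target: P(X <= 55)

# Exact binomial probability, summed term by term.
exact = sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

# Normal curve with the binomial's EV and SE.
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
without_cc = approx.cdf(k)
with_cc = approx.cdf(k + 0.5)  # continuity correction: extend to the edge of the bar

print(f"exact = {exact:.4f}, normal = {without_cc:.4f}, normal + correction = {with_cc:.4f}")
```

Using the area to the left of 55.5 rather than 55 brings the approximation much closer to the exact value.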