Exam Revision

126 Terms

New cards

Types of evidence

Personal testimony
Reputable research journal
Reproducible research
Nature of the data collection

New cards

Selection bias

When some participants are more likely to be chosen than others, so the sample does not represent the population.

New cards

Randomised controlled trial

Participants are randomly allocated to two groups: one (the experimental group) receives the intervention, while the other (the control group) receives a placebo, standard treatment, or no treatment at all.

New cards

Randomised controlled double-blind trial

A research study where neither the participants nor the researchers know who is receiving the intervention or placebo, reducing bias in results.

New cards

Consent bias

When participants choose whether or not to take part in the experiment.

New cards

Survivor bias

Occurs after the study has started.
An observed "improvement" may arise because the sickest subjects drop out.

New cards

Adherer bias

Some participants (adherers) keep taking their assigned treatment or placebo while others (non-adherers) stop, so an apparent "improvement" in the treatment group may be due to the adherers rather than the treatment.

New cards

3 precautions of observational studies

Observational studies can't establish causation, only association.
An observational study may be presented as though it were an RCT.
Confounding variables may influence results if not properly controlled.

New cards

Observational studies

An observational study is one in which the investigator cannot use randomisation for allocation to groups. The assignment of subjects is outside the control of the investigator.

New cards

Contemporaneous control

A contemporaneous control group is studied at the same time as the treatment group.

New cards

What is Simpson's paradox?

It’s when a trend appears in separate groups of data but reverses or disappears when the groups are combined, often due to a confounding variable. It highlights how misleading conclusions can arise if data isn't properly stratified.
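
A small numerical sketch can make the reversal concrete. The counts below are hypothetical, chosen only so that treatment A wins inside each subgroup yet loses once the subgroups are pooled; severity plays the role of the confounder because A is given mostly to severe cases.

```python
# Hypothetical counts, chosen only to show the reversal: (recovered, total).
groups = {
    "mild":   {"A": (90, 100),   "B": (800, 1000)},
    "severe": {"A": (300, 1000), "B": (20, 100)},
}

# Recovery rate for each treatment within each subgroup.
within = {
    name: {arm: recovered / total for arm, (recovered, total) in arms.items()}
    for name, arms in groups.items()
}

# Recovery rate for each treatment after pooling the subgroups.
combined = {}
for arm in ("A", "B"):
    recovered = sum(groups[name][arm][0] for name in groups)
    total = sum(groups[name][arm][1] for name in groups)
    combined[arm] = recovered / total

print(within)    # A has the higher rate in both subgroups
print(combined)  # B has the higher rate once the subgroups are pooled
```

Comparing rates within each level of the suspected confounder, rather than only in the pooled table, is what "stratifying" means here.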

New cards

What is data in statistics?

Data is information about the subjects being studied, usually referring to a sample rather than the whole population.

New cards

What does IDA stand for and what is it?

Initial Data Analysis – a first look at the data before answering research questions. It checks quality, structure, and suggests patterns or new questions.

New cards

What is a variable?

A variable measures or describes an attribute of the subjects; each column in a tidy dataset is a variable.

New cards

What does high dimensional data mean?

There are more variables (p) than subjects (n), common in big data.

New cards

What are the two main types of variables?

Quantitative (numerical) and qualitative (categorical)

New cards

How is one qualitative variable visualised?

With a single barplot, where categories are on the x-axis.

New cards

How are two qualitative variables visualised?

With a double barplot using colour to show the second variable.

New cards

What is a histogram used for?

To visualise the distribution of a quantitative variable across class intervals.

New cards

What’s the difference between a standard and density histogram?

A standard histogram shows counts as bar heights; a density histogram scales the bars so that their areas are percentages and the total area is 100%.

New cards

What is the rule of thumb for number of histogram intervals?

Use 10–15 class intervals to avoid over/under condensing data.

New cards

What is a sliced histogram?

A histogram where a qualitative variable is shown by slicing each bar with colour.

New cards

What do the edges of a boxplot represent?

The 25th and 75th percentiles; the box shows the middle 50% of the data.

New cards

How are outliers identified in a boxplot?

They are outside the thresholds:
LT = Q1 − 1.5×IQR
UT = Q3 + 1.5×IQR
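
The thresholds can be sketched in a few lines of Python. The data values and the function name `boxplot_thresholds` are hypothetical, and quartile conventions differ between tools, so the exact Q1 and Q3 may not match another package.

```python
import statistics

def boxplot_thresholds(values):
    """Return (LT, UT) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    # With n=4, statistics.quantiles returns [Q1, median, Q3].
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 5, 6, 7, 8, 9, 30]  # hypothetical values
lt, ut = boxplot_thresholds(data)

# Points outside [LT, UT] are flagged as outliers.
outliers = [x for x in data if x < lt or x > ut]
print(lt, ut, outliers)
```

Only the value 30 falls outside the thresholds in this example.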

New cards

What is a comparative boxplot?

A boxplot comparing a quantitative variable across levels of a qualitative variable.

New cards

What is a filtered scatterplot?

A scatterplot with more variables shown using colour or shape to distinguish them.

New cards

Why is age usually treated as quantitative?

Because it's easier to convert from quantitative to qualitative, not the other way around.

New cards

What is the purpose of numerical summaries?

To reduce the data to a few statistics, making it easier to communicate and compare key features like centre and spread.

New cards

What are the main types of numerical summaries?

Maximum, minimum, centre (mean, median), and spread (standard deviation, range, IQR).

New cards

What is the mean?

The balancing point of a distribution, where the sum of deviations on both sides equals zero.

New cards

What is the median?

The middle value of an ordered dataset; 50% of values lie above and 50% below it.

New cards

When is the median more useful than the mean?

When the data is skewed or contains outliers, because the median is robust and unaffected by extreme values.

New cards

When is the mean more useful than the median?

For symmetric data with few outliers; e.g., calculating averages for prediction or reporting.

New cards

How do mean and median behave with skewed data?

Left skew: mean < median
Right skew: mean > median
Symmetric: mean ≈ median
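
As a quick check of the skew rule, the sketch below uses a made-up list with one large value in the right tail, then mirrors it to get a left-skewed list.

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 50]   # hypothetical values, long right tail
left_skewed = [-x for x in right_skewed]   # mirror image, long left tail

right_mean = statistics.mean(right_skewed)
right_median = statistics.median(right_skewed)
left_mean = statistics.mean(left_skewed)
left_median = statistics.median(left_skewed)

print(right_mean, right_median)  # the mean is pulled towards the right tail
print(left_mean, left_median)    # the mean is pulled towards the left tail
```

The single extreme value drags the mean but leaves the median almost untouched, which is why the median is described as robust.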

New cards

What is robustness in statistics?

A property of a summary (like the median or IQR) where it remains reliable even with outliers or skewed data.

New cards

Why must the mean gap (the average deviation from the mean) always equal zero?

Because the mean is the balancing point; all positive and negative deviations from the mean cancel out.

New cards

What does standard deviation measure?

The average spread or dispersion of data points from the mean.

New cards

What is the RMS (root mean square) in standard deviation?

It calculates the square root of the average squared deviations from the mean.
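
A minimal sketch of the RMS calculation, assuming the dataset is treated as a population (divide by n; the sample version divides by n − 1). The data and the function name `population_sd` are illustrative.

```python
import math

def population_sd(values):
    """SD as the root of the mean of the squared gaps from the mean."""
    mean = sum(values) / len(values)
    gaps = [x - mean for x in values]            # deviations from the mean
    mean_square = sum(g * g for g in gaps) / len(gaps)
    return math.sqrt(mean_square)

data = [2, 4, 4, 4, 5, 5, 7, 9]                  # hypothetical values
mean = sum(data) / len(data)
gaps = [x - mean for x in data]

print(sum(gaps))            # the gaps cancel, so the mean gap is zero
print(population_sd(data))  # root of the mean squared gap
```

Squaring the gaps before averaging is what stops the positive and negative deviations from cancelling.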

New cards

When is it okay to treat a dataset as a population vs. a sample?

If the dataset includes all subjects of interest (e.g. all house sales in one suburb in one month), it's a population; otherwise, it's a sample.

New cards

What percentage of data falls within 1, 2, and 3 standard deviations of the mean?

For approximately normal data:
About 68% within 1 SD
About 95% within 2 SDs
About 99.7% within 3 SDs

New cards

What is the interquartile range (IQR)?

The range of the middle 50% of the data, calculated as Q3 − Q1; it's robust against outliers.

New cards

What is the difference between quartiles and quantiles?

Quartiles divide data into 4 parts, while quantiles divide data into q equal parts.

New cards

What are standard units (z-scores)?

The number of standard deviations a data point is from the mean:
z = (x − mean) / SD

New cards

What is the normal curve?

A probability density function for a continuous variable.

New cards

What are the parameters of a normal curve?

Population mean and population standard deviation.

New cards

What does area under a density histogram represent?

Total probability = 1 (or 100%).

New cards

What does P(X < x) represent in a normal curve?

The area under the curve to the left of x.

New cards

What rule do all normal curves follow?

The 68%-95%-99.7% rule.
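
The rule can be checked numerically with the standard normal curve. This sketch uses Python's standard-library `statistics.NormalDist`; the dictionary name `areas` is just for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

# Area under the curve within k SDs of the mean, for k = 1, 2, 3.
areas = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, area in areas.items():
    print(f"within {k} SD: {area:.1%}")
```

Because any normal curve can be rescaled to the standard normal, the same three areas apply whatever the mean and SD.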

New cards

How can a general normal curve be rescaled?

By converting to standard normal using z-scores.

New cards

What does the pnorm() function do in R?

Calculates the area under the normal curve to the left of a given x, i.e. P(X < x).

New cards

What does the qnorm() function do in R?

Finds the x-value that has a given area under the normal curve to its left.
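
For comparison, here is a rough Python analogue of the two R functions, using the standard library's `NormalDist`. The mean of 170 and SD of 10 are hypothetical values chosen for the example.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)   # hypothetical normal curve

# Area to the left of x = 185, similar to pnorm(185, 170, 10) in R.
p = heights.cdf(185)

# x-value with 97.5% of the area to its left, similar to qnorm(0.975, 170, 10).
x = heights.inv_cdf(0.975)

# The same area via standard units: convert x to a z-score first.
z = (185 - 170) / 10
p_from_z = NormalDist().cdf(z)

print(round(p, 4), round(x, 1), round(p_from_z, 4))
```

The two routes to the area agree, which is the sense in which a general normal curve can be rescaled to the standard normal.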

New cards

How to check if data is normally distributed?

Use graphical summaries, 68-95-99.7 rule, and QQ plot.

New cards

What is measurement error?

Difference between measured and exact value.

New cards

What causes chance error?

Random variation when repeating a measurement.

New cards

How to estimate chance error?

Replicate the measurement and calculate standard deviation.

New cards

What is bias in measurement?

A constant error added/subtracted from each measurement.

New cards

What are the 6 steps in linear regression?

Scatterplot, correlation, regression line, residual plot, check assumptions, predict.

New cards

What does a scatterplot show?

Relationship between two quantitative variables.

New cards

What does the correlation coefficient measure?

Strength and direction of a linear relationship.

New cards

What does r = +1 or -1 mean?

Perfect linear association.

New cards

What is the regression line?

Line that best predicts Y from X.

New cards

What is a residual?

Difference between actual and predicted Y value.

New cards

What does a good residual plot look like?

Random scatter around horizontal line.

New cards

What does homoscedasticity mean?

Equal spread of residuals across the range of X.

New cards

What is the equation for a regression line?

Y = a + bX.

New cards

When can we use the regression line for prediction?

After checking assumptions and model fit.

New cards

What is RMS error?

Root mean square of residuals; measures prediction error.
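
The regression cards above can be tied together in one small sketch: compute r from standard units, get the line Y = a + bX, then the residuals and their RMS. The data and function names are hypothetical, and population SDs (divide by n) are assumed throughout.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Population SD: root mean square of the gaps from the mean."""
    m = mean(values)
    return math.sqrt(mean([(v - m) ** 2 for v in values]))

def regression_summary(xs, ys):
    mx, my, sx, sy = mean(xs), mean(ys), sd(xs), sd(ys)
    # r is the mean of the products of the standard units.
    r = mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])
    b = r * sy / sx          # slope
    a = my - b * mx          # intercept: the line passes through (mx, my)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    rms_error = math.sqrt(mean([e ** 2 for e in residuals]))
    return a, b, r, rms_error

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b, r, rms_error = regression_summary(xs, ys)
print(a, b, r, rms_error)
```

The RMS error here also equals sqrt(1 − r²) × SD of Y, so a stronger correlation means smaller typical prediction errors.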

New cards

What is ecological correlation?

Correlation calculated from group means rather than individuals; it often overstates the strength of the association at the individual level.

New cards

Does correlation imply causation?

No, correlation shows association, not cause.

New cards

What are standard units in regression?

Z-scores; number of SDs from the mean.

New cards

What does extrapolation mean?

Predicting outside the range of observed data.

New cards

What is a vertical strip in a scatterplot?

Data at one value of X; used to assess spread/homoscedasticity.

New cards

What is the prosecutor's fallacy?

Mistaking P(DNA match | Innocent) for P(Innocent | DNA match).
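
A worked example shows how far apart the two probabilities can be. All numbers below are hypothetical: a pool of 1,000,001 possible sources containing exactly one guilty person, and a 1 in 100,000 chance that an innocent person matches.

```python
from fractions import Fraction

# Hypothetical inputs.
p_match_given_innocent = Fraction(1, 100_000)
p_match_given_guilty = Fraction(1)
p_guilty = Fraction(1, 1_000_001)
p_innocent = 1 - p_guilty

# Total probability of a match, then Bayes' rule.
p_match = (p_match_given_innocent * p_innocent
           + p_match_given_guilty * p_guilty)
p_innocent_given_match = p_match_given_innocent * p_innocent / p_match

print(p_match_given_innocent)   # tiny
print(p_innocent_given_match)   # large
```

Under these assumptions about ten innocent people are expected to match alongside the one guilty person, so a match alone leaves innocence far more likely than the tiny P(DNA match | Innocent) suggests.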

New cards

What is chance (probability)?

Long-run frequency of an event occurring.

New cards

What is a complement in probability?

The complement of an event is the event not occurring, so P(Event) = 1 − P(Complement).

New cards

What is conditional probability?

P(Event A | Event B): probability A occurs given B occurred.

New cards

What is the multiplication principle?

P(A and B) = P(A) × P(B | A).

New cards

What does it mean for two events to be independent?

P(B | A) = P(B); knowing A doesn't affect B.

New cards

What ensures independence in sampling?

Drawing with replacement.

New cards

What ensures dependence in sampling?

Drawing without replacement.
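
To illustrate, the sketch below applies the multiplication principle to drawing two aces from a standard 52-card deck, once without replacement and once with. The card example is an illustration, not part of the deck.

```python
from fractions import Fraction

p_first_ace = Fraction(4, 52)

# Without replacement: the second draw depends on the first,
# so P(second ace | first ace) = 3/51.
p_two_aces_without = p_first_ace * Fraction(3, 51)

# With replacement: the draws are independent,
# so P(second ace | first ace) = P(ace) = 4/52.
p_two_aces_with = p_first_ace * Fraction(4, 52)

print(p_two_aces_without, p_two_aces_with)
```

The same rule P(A and B) = P(A) × P(B | A) covers both cases; independence just means the conditional probability equals the unconditional one.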

New cards

What are mutually exclusive events?

Events that cannot occur together.

New cards

What is the addition rule for mutually exclusive events?

P(A or B) = P(A) + P(B).

New cards

What is a binomial trial?

An experiment with a fixed number n of independent trials, each with two possible outcomes and a constant probability p of success.

New cards

What does dbinom() do in R?

Calculates exact binomial probabilities, P(X = x).

New cards

What does pbinom() do in R?

Calculates cumulative binomial probabilities, P(X ≤ x).
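
For intuition about what the two R functions compute, here is a minimal Python sketch of the same quantities. The function names `binom_pmf` and `binom_cdf` are made up for this example; the comments note the R calls they are meant to resemble.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for n independent trials with success chance p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k): add up the exact probabilities from 0 to k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Ten tosses of a fair coin.
exactly_five = binom_pmf(5, 10, 0.5)   # resembles dbinom(5, 10, 0.5)
at_most_five = binom_cdf(5, 10, 0.5)   # resembles pbinom(5, 10, 0.5)
print(exactly_five, at_most_five)
```

The cumulative value is just a running total of the exact probabilities, which is why the two functions are usually learned together.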

New cards

What is chance variability?

Random variation in outcomes from a chance process.

New cards

What is the law of large numbers?

As trials increase, observed proportion approaches expected.

New cards

What is the gambler's fallacy?

Belief that deviations will 'even out' in the short term.

New cards

What is the box model?

A way to simulate random draws from a population.

New cards

What is chance error?

Observed Value - Expected Value.

New cards

What is the standard error (SE)?

The SD of the chance error.

New cards

What is the expected value (EV)?

The average outcome in a chance process.

New cards

How does the box model apply to gambling?

Box contains winnings/losses; draws = plays.

New cards

How do you model binary outcomes in a box?

Use 1 for success and 0 for failure.
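
A 0/1 box makes the EV and SE easy to compute. The sketch below assumes random draws with replacement, for which EV of the sum = n × mean of the box and SE of the sum = √n × SD of the box; the box contents and the function name `box_summary` are hypothetical.

```python
import math

def box_summary(box, n_draws):
    """EV and SE for the sum and the mean of n draws with replacement."""
    box_mean = sum(box) / len(box)
    box_sd = math.sqrt(sum((t - box_mean) ** 2 for t in box) / len(box))
    return {
        "ev_sum": n_draws * box_mean,
        "se_sum": math.sqrt(n_draws) * box_sd,
        "ev_mean": box_mean,
        "se_mean": box_sd / math.sqrt(n_draws),
    }

# One success ticket and three failure tickets: a 25% chance of success.
box = [1, 0, 0, 0]
summary = box_summary(box, n_draws=100)
print(summary)
```

With 100 draws the expected number of successes is 25, give or take about 4.3, which is the sense in which the SE measures the likely size of the chance error.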

New cards

What is the normal approximation to the binomial?

Using the normal curve to estimate binomial probabilities.

New cards

What is the central limit theorem?

Sums/means of large random samples approximately follow a normal distribution, whatever the shape of the population.

New cards

When can you use the normal approximation?

When the number of draws is large (rule of thumb: more than 30) and the data are not too skewed.

New cards

What is continuity correction?

Adjusting endpoints by 0.5 when approximating discrete data with normal.
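
The effect of the correction can be seen by comparing an exact binomial probability with its normal approximation. This sketch uses a hypothetical case, 100 tosses of a fair coin and P(X ≤ 55), with Python's standard-library `NormalDist`.

```python
import math
from math import comb
from statistics import NormalDist

n, p, k = 100, 0.5, 55   # hypothetical example: P(X <= 55) in 100 fair tosses

# Exact binomial probability.
exact = sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

# Normal curve with the same EV and SE as the binomial count.
approx_curve = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

without_correction = approx_curve.cdf(k)        # area to the left of 55
with_correction = approx_curve.cdf(k + 0.5)     # area to the left of 55.5

print(exact, without_correction, with_correction)
```

Extending the endpoint to 55.5 captures the whole bar for 55 in the probability histogram, so the corrected area sits much closer to the exact value.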