[Q] Topic 2 - Data and Graphical summaries

Description and Tags

Vocabulary practice flashcards covering key terms and concepts from the DATA1001/1901 Semester 1 lecture notes, including data types, EDA, probability, regression, and hypothesis testing.

Last updated 9:02 AM on 6/3/26

29 Terms

1

What is data?

Data: Information about a subject; often a sample of a population.

2

What is quantitative data?

Quantitative Data: Numerical information; can be continuous (any value in a range, including decimals) or discrete (separate countable values, usually whole numbers).

3

What is a continuous variable?

Continuous Variable: A quantitative variable that can take any value within a range, including decimals.

4

What is a discrete variable?

Discrete Variable: A quantitative variable that takes only separate, countable values, typically whole numbers.

5

What is qualitative data?

Qualitative Data: Labels or categories; arithmetic on them is not meaningful.

6

What is nominal data?

Nominal Data: Qualitative data with no order (like gender).

7

What is ordinal data?

Ordinal Data: Qualitative data with a natural order, where the gaps between categories are not necessarily equal.

8

What is exploratory data analysis (EDA)?

EDA: First look at data; includes Background, Structure, Wrangling, Summaries.

9

What is tidy data?

Tidy Data: Variables are columns, observations are rows.
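
As a rough illustration of the tidy layout, the sketch below reshapes a small "wide" table into one row per observation. The table, the column names, and the helper `to_tidy` are all hypothetical and chosen only for this example.

```python
# Hypothetical "wide" table: one row per student, one column per quiz.
wide = [
    {"student": "A", "quiz1": 7, "quiz2": 9},
    {"student": "B", "quiz1": 5, "quiz2": 8},
]

def to_tidy(rows, id_key, value_keys):
    """Return one row per (id, quiz) observation, one column per variable."""
    tidy = []
    for row in rows:
        for key in value_keys:
            tidy.append({id_key: row[id_key], "quiz": key, "score": row[key]})
    return tidy

tidy = to_tidy(wide, "student", ["quiz1", "quiz2"])
for row in tidy:
    print(row)
```

In the reshaped table the variables (student, quiz, score) are the columns and each quiz attempt is its own row.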

10

What is self-selection bias?

Self-selection Bias: Bias that arises when individuals decide for themselves whether to take part, so the sample differs systematically from the population.

11

What is a randomised controlled trial (RCT)?

RCT: A study design in which subjects are randomly assigned to treatment and control groups, so that a difference in outcomes can be attributed to the treatment.

12

What is data linkage?

Data Linkage: Combining data from different sources.

13

What is a comparative boxplot?

Comparative Boxplot: Side-by-side boxplots comparing a quantitative variable across the categories of a qualitative variable.

14

What is a scatterplot?

Scatterplot: Visualizes the relationship between two quantitative variables.

15

What is a density histogram?

Density Histogram: Histogram showing density on the y-axis (total area = 100%).
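
A minimal sketch of how the density scale can be computed, assuming density = (proportion of data in the bin) / (bin width). The data values, the bin edges, and the helper `density_histogram` are made up for illustration.

```python
def density_histogram(data, edges):
    """Return the density of each bin: proportion in the bin / bin width."""
    n = len(data)
    densities = []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        last = i == len(edges) - 2  # the final bin also includes its right edge
        count = sum(1 for x in data if lo <= x < hi or (last and x == hi))
        densities.append(count / n / (hi - lo))
    return densities

data = [1, 2, 2, 3, 5, 6, 7, 9]  # made-up observations
edges = [0, 4, 6, 10]            # unequal bin widths are allowed

densities = density_histogram(data, edges)
areas = [d * (hi - lo) for d, lo, hi in zip(densities, edges[:-1], edges[1:])]
total_area = sum(areas)
print(densities, total_area)
```

Each bar's area is the proportion of data in that bin, which is why the areas sum to 1 (100%) even when the bins have different widths.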

16

What is the interquartile range (IQR)?

IQR: The spread of the middle 50% of data; IQR = Q3 - Q1.
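
One possible way to compute the IQR uses `statistics.quantiles` from the Python standard library. Quartile conventions differ between textbooks and software, so other tools may give slightly different Q1 and Q3 for the same data; the data below are made up.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # made-up observations

# n=4 returns the three quartile cut points (default "exclusive" method).
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
print(q1, q3, iqr)
```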

17

What is standard deviation (SD)?

SD: Roughly the average distance of data points from the mean (formally, the root-mean-square deviation); always ≥ 0.
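
The sketch below computes the population SD as a root-mean-square deviation and compares it with the plain mean absolute deviation, to show that the two "average distances" are related but not identical. The data are made up.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
mean = statistics.fmean(data)

# Population SD: square root of the mean squared deviation from the mean.
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))

# Mean absolute deviation, shown only for comparison.
mad = sum(abs(x - mean) for x in data) / len(data)

print(mean, sd, mad)
```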

18

What are robust statistics?

Robust Statistics: Measures such as the median and IQR that are little affected by outliers.
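
A small made-up example of robustness: replacing one value with an extreme outlier moves the mean a long way but leaves the median where it was.

```python
import statistics

clean = [2, 4, 4, 4, 5, 5, 7, 9]          # made-up observations
with_outlier = [2, 4, 4, 4, 5, 5, 7, 90]  # same data, largest value made extreme

mean_shift = statistics.fmean(with_outlier) - statistics.fmean(clean)
median_shift = statistics.median(with_outlier) - statistics.median(clean)
print(mean_shift, median_shift)
```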

19

What is chance error?

Chance Error: Random measurement error; its size is estimated by the standard deviation of replicate measurements.

20

What is bias?

Bias: A consistent error in one direction; often harder to detect than chance error because every measurement is shifted the same way.

21

What is the normal distribution rule?

Normal Distribution Rule: About 68%, 95%, and 99.7% of data lie within 1, 2, and 3 SDs of the mean, respectively.
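
The percentages in the rule can be checked numerically. For a normal distribution, the share of values within k SDs of the mean equals erf(k / √2); the helper name `share_within` is just for this sketch.

```python
import math

def share_within(k):
    """Share of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))
```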

22

What is correlation (r)?

Correlation (r): Measure of linear association; range from -1 to +1.
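
A minimal sketch of the Pearson correlation coefficient computed from its definition. The function name `correlation` and the sample points are illustrative only.

```python
import math

def correlation(xs, ys):
    """Pearson r: covariation of x and y scaled by the spread of each."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(correlation([1, 2, 3], [2, 4, 6]))  # points on a rising line
print(correlation([1, 2, 3], [6, 4, 2]))  # points on a falling line
```

Points lying exactly on a rising line give r at the top of the range, and points on a falling line give r at the bottom.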

23

What is the central limit theorem?

Central Limit Theorem: The sum (or average) of a large number of independent draws from the same population is approximately normally distributed.
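
A rough simulation sketch of the idea: sums of many uniform random draws should behave approximately normally, so about 68% of them should fall within 1 SD of their mean. The seed, the number of draws, and the number of repetitions are arbitrary choices for this example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

def sample_sum(n):
    """Sum of n independent draws from a uniform(0, 1) distribution."""
    return sum(random.random() for _ in range(n))

n_draws = 50
sums = [sample_sum(n_draws) for _ in range(5000)]

mean = statistics.fmean(sums)
sd = statistics.pstdev(sums)
share_within_1sd = sum(1 for s in sums if abs(s - mean) <= sd) / len(sums)
print(round(share_within_1sd, 3))
```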

24

What is the prosecutor's fallacy?

Prosecutor's Fallacy: Mistaking P(evidence | innocent) for P(innocent | evidence).
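
A worked sketch using Bayes' rule shows how far apart the two probabilities can be. Every number here is hypothetical: a 1 in 10,000 chance that an innocent person matches the evidence, a certain match for the guilty person, and one guilty person among 100,001 possible suspects.

```python
from fractions import Fraction

# All inputs are hypothetical and chosen only for illustration.
p_match_given_innocent = Fraction(1, 10_000)
p_match_given_guilty = Fraction(1)
prior_guilty = Fraction(1, 100_001)
prior_innocent = 1 - prior_guilty

# Total probability that a randomly chosen suspect matches.
p_match = (p_match_given_innocent * prior_innocent
           + p_match_given_guilty * prior_guilty)

# Bayes' rule: P(innocent | match).
p_innocent_given_match = p_match_given_innocent * prior_innocent / p_match
print(p_match_given_innocent, p_innocent_given_match)
```

Under these assumptions P(evidence | innocent) is tiny, yet P(innocent | evidence) is large, because there are so many innocent people who could match by chance.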

25

What are residuals?

Residuals: Vertical differences between the observed values and the values predicted by the regression line; for a least-squares line they average 0.
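
The sketch below fits a least-squares line to a few made-up points and checks that the residuals average to zero. The helper `fit_line` is written for this example and is not taken from any course material.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
# Residual = observed y minus the y predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)
print(slope, intercept, residuals)
```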

26

What is R-squared (R²)?

R-squared (R²): Proportion of the variation in the dependent variable that is explained by the independent variable.
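
One way to compute R² for a simple linear regression is 1 - SS_residual / SS_total, sketched below on made-up data. The function name `r_squared` is illustrative.

```python
def r_squared(xs, ys):
    """R² of the least-squares line of ys on xs: 1 - SS_residual / SS_total."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

r2 = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # made-up data
print(round(r2, 3))
```

With a single predictor, this value equals the square of the correlation coefficient r.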

27

What is a p-value?

P-value: Probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true.
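
As a small worked case, suppose a coin lands heads 9 times in 10 tosses and the null hypothesis is that the coin is fair. The one-sided p-value is the chance of 9 or more heads under that null. The scenario and the helper `binomial_upper_tail` are hypothetical.

```python
from math import comb

def binomial_upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical data: 9 heads in 10 tosses, null hypothesis of a fair coin.
p_value = binomial_upper_tail(10, 9)
print(p_value)
```

Here the tail contains the outcomes 9 heads and 10 heads, so the p-value is 11/1024.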

28

What is Cochran's Rule?

Cochran's Rule: Every expected frequency in a chi-squared test should be ≥ 5 for the test to be reliable.
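
A sketch of checking the rule as stated on this card, for a two-way table of counts. Expected counts are taken as (row total × column total) / grand total; the table and the helper names are made up for illustration.

```python
def expected_counts(table):
    """Expected cell counts under independence for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def meets_rule(table, minimum=5):
    """True if every expected count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(table) for e in row)

observed = [[12, 8], [3, 2]]  # made-up observed counts
expected = expected_counts(observed)
rule_ok = meets_rule(observed)
print(expected, rule_ok)
```

In this example the second row has expected counts of 3 and 2, so the check fails and the chi-squared approximation would be doubtful.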

29

What is YAML?

YAML: Configuration format in Quarto; correct indentation is key.