[Q] Topic 2 - Data and Graphical summaries

Description and Tags

Vocabulary practice flashcards covering key terms and concepts from the DATA1001/1901 Semester 1 lecture notes, including data types, EDA, probability, regression, and hypothesis testing.

Last updated 10:20 AM on 6/10/26

29 Terms

1

What is data?

Information about a subject, which can be represented in various forms.

2

What is quantitative data?

Numerical information that can be measured or counted.

3

What is a continuous variable?

A quantitative variable that can take any real value within a range, not just whole numbers.

4

What is a discrete variable?

A quantitative variable that takes only separate, countable values, typically whole numbers.

5

What is qualitative data?

Non-numerical data categorized into groups or labels.

6

What is nominal data?

Qualitative data that has no inherent order or ranking.

7

What is ordinal data?

Qualitative data that has a defined order or ranking.

8

What is exploratory data analysis (EDA)?

The initial examination and interpretation of data to uncover patterns.

9

What is tidy data?

A structured way of organizing data where each variable is a column, each observation is a row, and each cell holds a single value.
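
A minimal sketch of reshaping a "wide" table into tidy form. The student and quiz data, and the helper name `to_tidy`, are hypothetical and only for illustration.

```python
# Hypothetical "wide" table: one row per student, one column per quiz.
wide_rows = [
    {"student": "A", "quiz1": 7, "quiz2": 9},
    {"student": "B", "quiz1": 5, "quiz2": 8},
]


def to_tidy(rows, id_column, value_columns):
    """Return one row per (student, quiz) observation."""
    tidy = []
    for row in rows:
        for column in value_columns:
            tidy.append(
                {id_column: row[id_column], "quiz": column, "score": row[column]}
            )
    return tidy


tidy_rows = to_tidy(wide_rows, "student", ["quiz1", "quiz2"])
for row in tidy_rows:
    print(row)
```

In the tidy version, "quiz" and "score" are each a single column, and every row is one observation.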

10

What is self-selection bias?

A bias introduced when participants choose their own involvement in a study.

11

What is a randomised controlled trial (RCT)?

A study design that randomly assigns participants to intervention or control groups to establish causation.

12

What is data linkage?

The process of combining data from different sources to enrich analysis.

13

What is a comparative boxplot?

Side-by-side boxplots that compare the distribution of a quantitative variable across the groups of a qualitative variable.

14

What is a scatterplot?

A chart that displays the relationship between two quantitative variables using points.

15

What is a density histogram?

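A histogram where the area of each bar represents the proportion of data in that bin, so the bar height is a density (proportion per unit width) rather than a frequency, and the total area is 1.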
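
A small sketch of how density-scale heights could be computed by hand. The data, the bin edges, and the function name `density_heights` are made up for illustration. Bins are assumed to be left-closed, with the last bin including its right edge.

```python
def density_heights(data, edges):
    """Return bar heights so that each bar's area equals its proportion of the data."""
    counts = [0] * (len(edges) - 1)
    for value in data:
        for i in range(len(counts)):
            last_bin = i == len(counts) - 1
            # Bins are [left, right), except the last bin, which also
            # includes its right edge.
            if edges[i] <= value < edges[i + 1] or (last_bin and value == edges[i + 1]):
                counts[i] += 1
                break
    n = len(data)
    return [count / n / (edges[i + 1] - edges[i]) for i, count in enumerate(counts)]


data = [1, 2, 2, 3, 7]          # hypothetical observations
edges = [0, 2, 4, 8]            # hypothetical bin edges (unequal widths)
heights = density_heights(data, edges)
areas = [h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights)]
print(heights)
print(sum(areas))
```

Because the last bin is twice as wide as the others, its height is halved relative to a bin of width 2 holding the same proportion of data.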

16

What is the interquartile range (IQR)?

The range that captures the middle 50% of the data, calculated as Q3 - Q1.
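
A quick sketch of the calculation using Python's standard library. The data are made up, and the "inclusive" quartile method is only one of several conventions, so other software may give slightly different quartiles.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical observations

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q3, iqr)
```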

17

What is standard deviation (SD)?

A statistical measure of spread that indicates roughly how far data points are from the mean, calculated as the root-mean-square of the deviations from the mean.
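
A short sketch of the population SD computed as a root-mean-square deviation, checked against `statistics.pstdev`. The data are made up. The sample SD (`statistics.stdev`) would divide by n - 1 instead of n.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

mean = sum(data) / len(data)
# Square each deviation from the mean, average them, then take the square root.
manual_sd = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))
library_sd = statistics.pstdev(data)
print(manual_sd, library_sd)
```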

18

What are robust statistics?

Statistical measures, such as median and IQR, that are not greatly affected by outliers.
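
A brief illustration with made-up numbers: replacing one value with an extreme outlier shifts the mean a lot but leaves the median unchanged.

```python
import statistics

clean = [1, 2, 3, 4, 5]           # hypothetical data
with_outlier = [1, 2, 3, 4, 500]  # same data with one extreme value

mean_clean = statistics.mean(clean)
mean_outlier = statistics.mean(with_outlier)
median_clean = statistics.median(clean)
median_outlier = statistics.median(with_outlier)

print("mean:  ", mean_clean, "->", mean_outlier)
print("median:", median_clean, "->", median_outlier)
```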

19

What is chance error?

The random error that can occur in measurements due to variability.

20

What is bias?

A consistent error in measurement that skews results in a specific direction.

21

What is the normal distribution rule?

A statistical principle stating that, for normally distributed data, approximately 68%, 95%, and 99.7% of values fall within ±1, ±2, and ±3 standard deviations of the mean, respectively.
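
The three percentages can be checked against the standard normal curve. This is a minimal sketch using `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Proportion of the distribution within k standard deviations of the mean.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)
}
for k, proportion in coverage.items():
    print(k, round(proportion, 4))
```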

22

What is correlation (r)?

A statistical measure that indicates the strength and direction of a linear relationship between two variables.
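
A sketch of the calculation for the Pearson correlation coefficient, written out by hand. The function name `correlation` and the data are illustrative only.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2, 4, 6, 8, 10])    # perfectly increasing line
r_down = correlation(xs, [10, 8, 6, 4, 2])  # perfectly decreasing line
print(r_up, r_down)
```

The sign of r gives the direction of the linear relationship, and values closer to ±1 indicate a stronger one.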

23

What is the central limit theorem?

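A fundamental statistical principle stating that the sum (or mean) of a large number of independent draws from the same distribution is approximately normally distributed, regardless of the shape of the original distribution.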
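
A small simulation sketch, assuming draws from a uniform distribution on [0, 1). The sample size, number of repeats, and seed are arbitrary choices. The means of repeated samples should cluster around 0.5 with an SD close to the theoretical value.

```python
import math
import random
import statistics

random.seed(1)        # arbitrary seed so the run is repeatable
sample_size = 30      # draws per sample (assumed)
repeats = 2000        # number of simulated samples (assumed)

sample_means = [
    statistics.fmean([random.random() for _ in range(sample_size)])
    for _ in range(repeats)
]

# A uniform [0, 1) draw has mean 0.5 and SD sqrt(1/12), so the SD of a
# sample mean is that SD divided by sqrt(sample_size).
theoretical_sd = math.sqrt(1 / 12) / math.sqrt(sample_size)
observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.pstdev(sample_means)
print(observed_mean, observed_sd, theoretical_sd)
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual draws are uniform.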

24

What is the prosecutor's fallacy?

The error of treating the probability of the evidence given innocence as if it were the probability of innocence given the evidence.
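
A worked sketch with entirely hypothetical numbers: a forensic match that occurs by chance for 1 in 10,000 innocent people, a pool of 100,000 innocent people, and one guilty person who is assumed always to match.

```python
def prob_innocent_given_match(p_match_if_innocent, n_innocent, n_guilty=1):
    """P(innocent | match), assuming every guilty person matches."""
    expected_innocent_matches = p_match_if_innocent * n_innocent
    guilty_matches = n_guilty
    return expected_innocent_matches / (expected_innocent_matches + guilty_matches)


p_evidence_given_innocent = 1 / 10_000  # hypothetical chance-match rate
p_innocent_given_evidence = prob_innocent_given_match(
    p_evidence_given_innocent, n_innocent=100_000
)
print(p_evidence_given_innocent)   # tiny
print(p_innocent_given_evidence)   # large
```

Under these assumptions about 10 innocent people are expected to match alongside the 1 guilty person, so a matching person is still innocent with probability about 10/11, even though the chance-match rate is only 0.0001.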

25

What are residuals?

The differences between observed values and the values predicted by a regression model.
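
A minimal sketch for a simple least-squares line, with made-up data and a hypothetical helper `fit_line`. Each residual is the observed y minus the predicted y.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [1, 2, 3, 4]  # hypothetical data
ys = [2, 4, 5, 8]
intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(slope, intercept)
print(residuals)
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding).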

26

What is R-squared (R²)?

A statistic that represents the proportion of variance in the dependent variable that can be explained by the independent variable.
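
A sketch of the calculation for a simple linear regression, using made-up data and a hypothetical helper `r_squared`. It computes 1 minus the ratio of residual variation to total variation.

```python
def r_squared(xs, ys):
    """R-squared of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # Variation left over after fitting the line.
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its mean.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


r2 = r_squared([1, 2, 3, 4], [2, 4, 5, 8])  # hypothetical data
print(round(r2, 4))
```

In simple linear regression this value equals the square of the correlation coefficient r.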

27

What is a p-value?

The probability of obtaining results at least as extreme as the observed results, given that the null hypothesis is true.
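
A worked sketch with a hypothetical experiment: 9 heads in 10 coin tosses, with the null hypothesis that the coin is fair and a one-sided alternative that it favours heads. The p-value is the chance of 9 or more heads under the null.

```python
from math import comb


def binomial_upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Null hypothesis: fair coin (p = 0.5). Observed: 9 heads in 10 tosses.
p_value = binomial_upper_tail(n=10, k=9)
print(p_value)
```

Here the p-value is 11/1024, about 0.011: the 10 ways of getting exactly 9 heads plus the 1 way of getting 10, out of 1024 equally likely sequences.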

28

What is Cochran's Rule?

A rule stating that the expected frequency for each category in a chi-squared test should be at least 5.
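
A small sketch of checking the rule before a chi-squared goodness-of-fit test, assuming a fair six-sided die as the null model. The helper names and sample sizes are hypothetical.

```python
def expected_counts(total, probabilities):
    """Expected frequency in each category under the null model."""
    return [total * p for p in probabilities]


def satisfies_cochran(expected, minimum=5):
    """True if every expected frequency is at least `minimum`."""
    return all(e >= minimum for e in expected)


fair_die = [1 / 6] * 6
small_sample = expected_counts(20, fair_die)  # about 3.3 per face
large_sample = expected_counts(60, fair_die)  # about 10 per face

print(satisfies_cochran(small_sample))
print(satisfies_cochran(large_sample))
```

With only 20 rolls the expected counts fall below 5, so the chi-squared approximation may be unreliable; with 60 rolls the rule is met.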

29

What is YAML?

A human-readable data serialization format used for configuration files with a focus on simplicity and readability.