Data Science - Large Datasets and Scientific Argumentation

Last updated 11:33 PM on 5/18/26

38 Terms

1

What are large datasets?

Large datasets in science are expansive collections of information that exceed the capacity of traditional tools for storage, management, and analysis.

2

What are common features of large datasets?

They contain many data points, multiple variables, patterns, trends, and often require digital tools for analysis.

3

How are large datasets collected?

Large datasets can be collected through surveys, experiments, sensors, satellites, databases, or online tracking systems.

4

What are large datasets used for?

They are used to identify patterns, make predictions, support research, and solve real-world problems.

5

Give an example of a large dataset application

Weather forecasting uses large datasets, such as satellite and sensor readings, to predict weather patterns.

6

How do you develop a question using a dataset?

Create a question that can be answered by analysing variables within the dataset.

Example: “Is there a relationship between study time and test scores?”

7

What is descriptive statistics?

Descriptive statistics are methods used to summarise and describe data.

8

What is the mean?

The mean is the average value found by adding all values and dividing by the total number of values.

9

What is the median?

The median is the middle value when data is arranged in order. If there is an even number of values, it is the mean of the two middle values.

10

What is the mode?

The mode is the value that appears most frequently in a dataset.
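
The three measures of centre defined above can be checked with a minimal sketch using Python's standard `statistics` module. The `scores` list is invented for illustration and does not come from any real dataset.

```python
import statistics

# Hypothetical test scores, made up for illustration only.
scores = [62, 75, 75, 80, 88, 91]

# Mean: add all values and divide by the number of values.
mean_score = statistics.mean(scores)

# Median: middle value of the ordered data; with an even count,
# the two middle values are averaged.
median_score = statistics.median(scores)

# Mode: the value that appears most often.
mode_score = statistics.mode(scores)

print(mean_score, median_score, mode_score)  # 78.5 77.5 75
```

Here the mean and median differ slightly because the low score of 62 pulls the mean down more than it moves the median.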

11

What is range?

Range is the difference between the highest and lowest values in a dataset.

12

What is the IQR?

The interquartile range (IQR) measures the spread of the middle 50% of data. It is calculated as the upper quartile minus the lower quartile (Q3 - Q1).

13

What is standard deviation?

Standard deviation measures how spread out data values are from the mean.
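
The measures of spread above (range, IQR, and standard deviation) can be sketched the same way. The `values` list is hypothetical. Note two assumptions: quartiles can be calculated by several conventions, and this sketch picks the `"inclusive"` method of `statistics.quantiles`; it also uses the population standard deviation (`pstdev`) rather than the sample version (`stdev`).

```python
import statistics

# Hypothetical measurements, made up for illustration only.
values = [4, 8, 15, 16, 23, 42]

# Range: highest value minus lowest value.
data_range = max(values) - min(values)

# Quartiles: cut points that split the ordered data into four parts.
# The "inclusive" method is one of several conventions (an assumption here).
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

# IQR: spread of the middle 50% of the data.
iqr = q3 - q1

# Population standard deviation: typical distance of values from the mean.
std_dev = statistics.pstdev(values)

print(data_range, iqr, round(std_dev, 2))
```

Other tools or textbooks may give a slightly different IQR for the same data because they use a different quartile convention.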

14

What are the benefits of descriptive statistics?

They simplify large datasets, identify patterns, and make data easier to communicate.

15

What are weaknesses of descriptive statistics?

They may hide important details, ignore causes, and sometimes oversimplify data.

16

What is univariate analysis?

Univariate analysis examines one variable at a time.

17

What is included in a univariate analysis?

Measures of centre and spread, graphs, and the identification of patterns or anomalies.

18

What is a histogram?

A histogram displays the frequency distribution of continuous numerical data.
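
As a rough illustration of how a histogram is built, the sketch below groups continuous values into equal-width bins and counts how many fall in each. The `heights_cm` data, the bin width of 10, and the helper `bin_start` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical heights in centimetres, made up for illustration only.
heights_cm = [151.2, 158.9, 160.4, 163.0, 165.5, 167.1, 171.8, 174.3, 179.6]
bin_width = 10  # assumed bin width


def bin_start(value, width):
    """Return the lower edge of the bin that contains value."""
    return int(value // width) * width


# Count how many values fall into each bin (the frequency distribution).
counts = Counter(bin_start(h, bin_width) for h in heights_cm)

# Print a simple text histogram, one row per bin.
for start in sorted(counts):
    print(f"{start}-{start + bin_width}: {'#' * counts[start]}")
```

Changing `bin_width` changes the shape of the histogram, so the choice of bins is part of how the data is presented.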

19

What is a box plot?

A box plot shows the median, quartiles, spread, and possible outliers in data.

20

What is bivariate analysis?

Bivariate analysis examines the relationship between two variables.

21

What is a scatter plot?

A scatter plot graphs pairs of data values to show relationships between variables.

22

What is correlation?

Correlation describes the strength and direction of a relationship between two variables.

23

What does a positive correlation mean?

As one variable increases, the other also tends to increase.

24

What does a negative correlation mean?

As one variable increases, the other tends to decrease.

25

What does zero correlation mean?

There is no linear relationship between the variables.

26

What is a correlation coefficient?

A correlation coefficient is a number between -1 and +1 that measures the strength and direction of the linear relationship between two variables.

27

How do you interpret a correlation coefficient?

  • Close to +1 = strong positive correlation

  • Close to -1 = strong negative correlation

  • Close to 0 = weak or no correlation
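
The interpretation above can be tried out with a minimal sketch of one common correlation coefficient, Pearson's r. The cards do not name a specific coefficient, so Pearson's r is an assumption here, as are the function name `pearson_r` and the study-time data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists.

    Assumes both lists vary (a constant list would cause division by zero).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(var_x * var_y)


# Hypothetical data: hours studied and test scores for five students.
study_hours = [1, 2, 3, 4, 5]
test_scores = [52, 58, 65, 71, 80]

r = pearson_r(study_hours, test_scores)
print(round(r, 3))  # close to +1: strong positive correlation
```

A value this close to +1 shows a strong association in the made-up data; on its own it says nothing about whether studying caused the higher scores.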

28

Does correlation prove causation?

No. Correlation only shows association, not cause and effect.

29

What conditions are needed to establish causation?

There must be a clear relationship, controlled variables, repeated evidence, and a logical scientific explanation.

30

Why are large datasets important in science?

They improve accuracy and reliability, and they allow scientists to identify trends and patterns.

31

What is an outlier/anomaly in a dataset?

An outlier (or anomaly) is a data point that does not fit the overall pattern.

32

Why is identifying outliers important?

Outliers may indicate errors, unusual events, or important discoveries.
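
One widely used way to flag possible outliers is the 1.5 × IQR rule, the same convention many box plots use. The cards do not specify a method, so this rule, the function name `iqr_outliers`, and the `readings` data are assumptions for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    # Quartile convention is an assumption; other methods shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical temperature readings with one suspicious value.
readings = [20.1, 19.8, 20.4, 20.0, 19.9, 35.2, 20.2]

flagged = iqr_outliers(readings)
print(flagged)  # [35.2]
```

A flagged value is only a candidate: it still has to be checked to decide whether it is an error, an unusual event, or a genuine finding.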

33

How should findings from data analysis be communicated?

Using clear scientific language, graphs, tables, statistics, and conclusions.

34

Why are graphs useful in data analysis?

Graphs help visualise patterns, trends, and relationships clearly.

35

What are evidence-based decisions?

Decisions made using reliable data and scientific evidence.

36

Why should implications of decisions be assessed?

To understand possible effects, risks, benefits, and consequences of decisions.

37

Give an example of using data for decision making

Using pollution data to decide whether stricter environmental laws are needed.

38

Why is statistical analysis important in science?

Statistical analysis helps scientists interpret data accurately and determine whether results are meaningful.