TOPIC #2: Data and graphical summaries

Description and Tags

Module One DATA1001

17 Terms

1

What is data?

Information about the set of subjects being studied. Data most commonly refers to a sample (a part of the population) rather than the whole population.

2

What is IDA (initial data analysis)?

A first general look at the data, without formally answering the research questions.

3

What is involved in an IDA?

Data background - checking the quality and integrity of the data.
Data structure - what information has been collected?
Data wrangling - scraping, cleaning, tidying, reshaping, etc.
Data summaries - graphical and numerical.

4

What are the types of variables?

quant (quantitative or numerical)
qual (qualitative or categorical)

5

Describe the categories of data

6

How is one variable of qualitative data commonly summarised?

By a single barplot, where the categories are on the x axis.
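A minimal sketch of the counting behind a single barplot, assuming a made-up sample of eye colours (the data and variable names are hypothetical, not from the course). Each category gets one bar whose length is the number of subjects in it.

```python
from collections import Counter

# Hypothetical sample: one qualitative variable (eye colour of 8 subjects).
eye_colour = ["brown", "blue", "brown", "green", "blue", "brown", "brown", "blue"]

# Count how many subjects fall in each category.
counts = Counter(eye_colour)

# Text version of a barplot: one bar per category, bar length = count.
for category, n in sorted(counts.items()):
    print(f"{category:<6} {'#' * n} ({n})")
```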

7

How are two variables of qualitative data commonly summarised?

By a double barplot, where the first variable's categories are on the x-axis and the second variable is shown as the colour within the bars.
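A rough sketch of the two-way counts that a double barplot displays, assuming a hypothetical survey of transport mode and a yes/no answer (both variables are invented for illustration). The first variable picks the bar; the second splits the bar into coloured parts.

```python
from collections import Counter

# Hypothetical sample: two qualitative variables recorded per subject.
subjects = [
    ("car", "yes"), ("car", "no"), ("bus", "yes"),
    ("car", "yes"), ("bus", "no"), ("bus", "no"),
]

# Count each (first variable, second variable) combination.
pair_counts = Counter(subjects)

# One bar per transport mode; the yes/no counts are the coloured parts of that bar.
for transport in sorted({t for t, _ in subjects}):
    parts = {answer: pair_counts[(transport, answer)] for answer in ("yes", "no")}
    print(transport, parts)
```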

8

What is a simple histogram used for?

Used to summarise quantitative data, to see how a variable is distributed across different class intervals.

9

What are the two types of histograms?

standard histogram - the y axis represents the number of subjects in each class interval

density/probability histogram - the area of each block represents the percentage of subjects in a particular class interval
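A small sketch of the difference between the two histogram types, assuming made-up values and class intervals that include the left endpoint but not the right (an assumption; conventions vary). For the density histogram, the block height is the percentage divided by the interval width, so that the area equals the percentage.

```python
# Hypothetical data and class intervals [0, 5), [5, 10), [10, 20).
values = [1, 2, 2, 3, 5, 6, 7, 12, 15, 18]
breaks = [0, 5, 10, 20]

n = len(values)
counts = []     # heights for a standard histogram
densities = []  # heights for a density histogram (% per unit)
areas = []      # area of each density block (% of subjects)

for left, right in zip(breaks, breaks[1:]):
    count = sum(left <= v < right for v in values)
    percent = 100 * count / n
    width = right - left
    counts.append(count)
    densities.append(percent / width)
    areas.append(percent / width * width)

print("counts:   ", counts)
print("densities:", densities)
print("areas:    ", areas, "total =", sum(areas))
```

Note that the last two intervals hold the same number of subjects, but the wider one has a lower density block.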

10

What does a simple boxplot show?

the distribution of a single quantitative variable, based on percentiles.

11

Explain a simple boxplot.

The box contains the middle 50% of the data (between the 25th and 75th percentiles), with a line at the 50th percentile (the median).
The whiskers extend to the minimum and maximum values that lie within the lower and upper thresholds, that is, excluding outliers. Outliers are typically plotted as individual points outside the whiskers.

12

How do you calculate the IQR?

IQR = 75th percentile - 25th percentile
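A minimal sketch of the IQR calculation, assuming made-up data and Python's `statistics.quantiles` with the "inclusive" method. Software packages use slightly different percentile conventions, so the quartiles from another tool may differ a little.

```python
from statistics import quantiles

# Hypothetical sample of 9 measurements.
data = [2, 4, 4, 5, 7, 8, 9, 11, 12]

# n=4 splits the data into quarters and returns the three cut points:
# the 25th, 50th and 75th percentiles.
q1, median, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
print("25th percentile:", q1)
print("75th percentile:", q3)
print("IQR:", iqr)
```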

13

How do you calculate the lower threshold?

The lower threshold is calculated as the 25th percentile - 1.5 × IQR

14

How do you calculate the upper threshold?

The upper threshold is calculated as the 75th percentile + 1.5 × IQR
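The lower and upper threshold cards can be sketched together. This is an illustrative example on made-up data, again assuming `statistics.quantiles` with the "inclusive" method for the percentiles; other percentile conventions can shift the thresholds slightly.

```python
from statistics import quantiles

# Hypothetical sample with one unusually large value.
data = [2, 4, 4, 5, 7, 8, 9, 11, 30]

q1, _, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Thresholds: 1.5 IQRs below the 25th and above the 75th percentile.
lower_threshold = q1 - 1.5 * iqr
upper_threshold = q3 + 1.5 * iqr

# Points outside the thresholds are outliers; the rest set the whisker ends.
outliers = [x for x in data if x < lower_threshold or x > upper_threshold]
inside = [x for x in data if lower_threshold <= x <= upper_threshold]

print("thresholds:", lower_threshold, upper_threshold)
print("outliers:", outliers)
print("whiskers reach:", min(inside), "and", max(inside))
```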

15

What is a comparative boxplot?

A comparative boxplot takes a quantitative variable and splits it by the categories of a qualitative variable, drawing one boxplot per category so the groups can be compared.
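A rough sketch of the grouping behind a comparative boxplot, assuming hypothetical records with a group label ("A" or "B") and a measurement. Each group gets its own five-number summary (minimum, 25th percentile, median, 75th percentile, maximum), which is what each box would be drawn from.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical records: (qualitative group, quantitative measurement).
records = [
    ("A", 3), ("A", 5), ("A", 7), ("A", 9), ("A", 11),
    ("B", 10), ("B", 12), ("B", 14), ("B", 16), ("B", 18),
]

# Split the quantitative variable by the qualitative variable.
groups = defaultdict(list)
for group, value in records:
    groups[group].append(value)

# One five-number summary per group, i.e. one boxplot per group.
summaries = {}
for group, values in sorted(groups.items()):
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    summaries[group] = (min(values), q1, median, q3, max(values))
    print(group, summaries[group])
```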

16

What is a simple scatterplot?

A plot that examines the relationship between two quantitative variables by plotting one on the x-axis and the other on the y-axis.

17

Summarise how the type of variables we have determines what type of graphical summary we use.

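One qualitative variable - single barplot.
Two qualitative variables - double barplot.
One quantitative variable - histogram or simple boxplot.
One quantitative and one qualitative variable - comparative boxplot.
Two quantitative variables - scatterplot.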