Exploring Data

LO1 - What is the importance of stats in our world, including current challenges?

Statistics is everywhere in our developing world and is a vital part of examining evidence and solving problems in the workplace.

Problems with data include:

Recognising where data originates and how it is managed
The agency of data: maintaining privacy while also ensuring honesty and accessibility
Big data, which can be difficult to process and manage

LO2 - What are the types of bias?

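Confounding - an unmeasured (lurking) variable affects both the groups being compared and the outcome, distorting the apparent effect
Selection - the way participants are chosen or allocated to groups is not random, so the groups differ systematically
Observer - the experimenter knows which condition each participant is in and, expecting a certain result, influences the measurements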

LO2 - What are the types of evidence?

Personal testimony (anecdote) - cannot be generalised and is highly subjective and prone to bias
Journal articles (peer-reviewed) - are designed to be reproducible and are generally more reliable

LO2 - What are the main types of study design behind datasets?

Randomised Controlled Trial (RCT) > the most desired study design
- Splits the participants into a control group and a treatment group > isolates the effect of the independent variable, so a cause-and-effect relationship can be established
- Targets bias through random allocation and double-blind experiments (neither participants nor experimenters know who is in which group)
Observational study >
- Can find associations but cannot establish causation
- The data can be split into subgroups to control for a suspected confounder
- Can lead to Simpson’s Paradox
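Random allocation, the key feature of an RCT, can be sketched in a few lines. This is a minimal illustration assuming a simple equal split of a list of hypothetical participant IDs; `randomly_allocate` is a hypothetical helper, and real trials often use more elaborate schemes (e.g. block or stratified randomisation).

```python
import random


def randomly_allocate(participants, seed=None):
    """Shuffle participants and split them into control and treatment groups.

    Because chance alone decides who goes where, confounders (known or unknown)
    tend to be balanced between the groups on average.
    """
    rng = random.Random(seed)       # seeded generator so the split is reproducible
    shuffled = list(participants)   # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}


if __name__ == "__main__":
    # Hypothetical participant IDs, for illustration only
    ids = [f"P{i:02d}" for i in range(1, 11)]
    groups = randomly_allocate(ids, seed=1)
    print("Control:  ", groups["control"])
    print("Treatment:", groups["treatment"])
```

In a double-blind trial, these group labels would additionally be hidden from both the participants and the people measuring the outcome.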

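LO2 - What is Simpson’s Paradox?

In observational data, a trend that appears within each of several groups disappears or reverses when the groups are pooled together, usually because a confounding variable is distributed unevenly across the groups.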
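A small worked example makes the reversal concrete. The counts below are hypothetical, chosen only to illustrate the effect: two treatments, A and B, each given to "small" and "large" cases, with successes counted per subgroup.

```python
# Hypothetical counts for illustration only: (successes, total) per treatment and subgroup
data = {
    "A": {"small": (80, 90), "large": (190, 260)},
    "B": {"small": (230, 270), "large": (50, 80)},
}


def subgroup_rates(data):
    """Success rate of each treatment within each subgroup."""
    return {
        treatment: {group: s / n for group, (s, n) in groups.items()}
        for treatment, groups in data.items()
    }


def pooled_rates(data):
    """Success rate of each treatment after pooling (ignoring) the subgroups."""
    pooled = {}
    for treatment, groups in data.items():
        successes = sum(s for s, _ in groups.values())
        total = sum(n for _, n in groups.values())
        pooled[treatment] = successes / total
    return pooled


if __name__ == "__main__":
    by_group = subgroup_rates(data)
    pooled = pooled_rates(data)
    for group in ("small", "large"):
        print(f"{group}: A = {by_group['A'][group]:.3f}, B = {by_group['B'][group]:.3f}")
    print(f"pooled: A = {pooled['A']:.3f}, B = {pooled['B']:.3f}")
```

Treatment A has the higher success rate in both subgroups (0.889 vs 0.852, and 0.731 vs 0.625), yet B looks better once the groups are pooled (0.800 vs 0.771). The subgroup acts as a confounder: A was mostly given to the harder "large" cases (260 of its 350), while B was mostly given to "small" ones.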

LO2 - What is Domain Knowledge?

Domain knowledge is the background information about the context or topic the data describes; it is essential for every stage of data analysis.

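LO3 - Describe the breakdown of data analysis and the types of data

IDA (Initial Data Analysis):

Data background
Data structure
Data cleaning
Data summaries

Data types:
Qualitative (categorical) or Quantitative (numerical)
- Qual > Ordinal (ordered categories) or Nominal (unordered categories)
- Quant > Discrete (countable values) or Continuous (any value within a range)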

LO3 - Types of Graphical Summaries for Quantitative and Qualitative Data

Qualitative:

> 1 Variable > Single bar plot

> 2 Variables > Double bar plot (side-by-side, coloured by the second variable)

Quantitative:

> 1 Variable > Single histogram or a single boxplot

> 2 Variables > Scatter plot

Quantitative and Qualitative:

1 × 1 > Histograms sliced by group, or comparative boxplots

2 × 2 > Scatter plot filtered (or coloured) by the qualitative variable

LO3 - What are the measures of centre when summarising data numerically?

Mean > the average of the data; the balancing point of the histogram

Not robust: it is pulled towards the tail by skewness and outliers, so it can be misleading when outliers are present

Median > the middle value of the sorted data; splits the area of the histogram in half

Robust: it is barely affected by outliers, so it better represents the centre of skewed data
Comparing the two reveals the shape: for right-skewed data the mean is usually above the median, for left-skewed data it is usually below, and for symmetric data they are close
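The difference in robustness is easy to check numerically. A minimal sketch using Python's standard `statistics` module, with hypothetical values and one deliberately extreme outlier:

```python
from statistics import mean, median

# Hypothetical data: seven values, then the same values plus one extreme outlier
values = [2, 3, 3, 4, 5, 5, 6]
with_outlier = values + [60]

print("mean:  ", mean(values), "->", mean(with_outlier))
print("median:", median(values), "->", median(with_outlier))
```

Adding the single outlier 60 drags the mean from 4 to 11, while the median only moves from 4 to 4.5. The mean now sits well above the median, which is the signature of right skew.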

LO3 - What are the measures of spread when summarising data numerically?

Standard deviation (SD) > measures the typical distance of the data from the mean; the population SD is $\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

The sample SD divides by $n - 1$ instead of $n$; the population SD uses $n$ as above
If the histogram is roughly bell-shaped (normal), about 68%, 95% and 99.7% of the data lie within 1, 2 and 3 SDs of the mean
IQR = Q3 - Q1 > the range of the middle 50% of the data
Used to flag outliers (commonly values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR) and shown as the box in a boxplot
- Combining mean and SD: coefficient of variation CV = SD/mean > compares relative variability (volatility)
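The spread measures above can be computed side by side. This sketch uses Python's `statistics` module on hypothetical data; note that `statistics.quantiles` uses its "exclusive" method by default, and other software (for example R's default) may use a different quartile convention, so Q1, Q3 and the IQR can differ slightly between tools.

```python
from math import sqrt
from statistics import mean, pstdev, stdev, quantiles

# Hypothetical data for illustration; 30 is a deliberate outlier
data = [4, 8, 6, 5, 3, 7, 9, 5, 6, 30]
n = len(data)
m = mean(data)

# Population SD straight from the formula (divide by n), and via the library
pop_sd_manual = sqrt(sum((x - m) ** 2 for x in data) / n)
pop_sd = pstdev(data)

# Sample SD divides by n - 1, so it is slightly larger than the population SD
sample_sd = stdev(data)

# Quartiles with the module's default ("exclusive") method
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Common boxplot rule: flag points more than 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Coefficient of variation: SD relative to the mean
cv = sample_sd / m

print(f"mean = {m}, population SD = {pop_sd:.3f}, sample SD = {sample_sd:.3f}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, outliers = {outliers}, CV = {cv:.3f}")
```

With this convention the quartiles are 4.75 and 8.25, so the IQR is 3.5 and only the value 30 falls outside the fences.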

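LO3 - What happens when the data is shifted or scaled?

Shifting (adding a constant c to every value) > the histogram moves left or right; the mean changes by c but the SD stays the same

Scaling (multiplying every value by a constant a) > the histogram stretches or shrinks; the mean is multiplied by a and the SD by |a|, so both change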
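These rules can be verified directly. A minimal sketch with hypothetical values, a shift of c = 10 and a scale factor of a = 3:

```python
from statistics import mean, stdev

# Hypothetical data for illustration
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
c = 10.0  # shift: add c to every value
a = 3.0   # scale: multiply every value by a

shifted = [x + c for x in data]
scaled = [a * x for x in data]

for label, values in [("original", data), ("shifted", shifted), ("scaled", scaled)]:
    print(f"{label:>8}: mean = {mean(values):.3f}, SD = {stdev(values):.3f}")
```

The shifted data has a mean 10 higher but the same SD, while the scaled data has both its mean and its SD multiplied by 3. Multiplying by a negative constant would flip the sign of the mean but still multiply the SD by |a|, since SD is never negative.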
