Exploring Data


LO1 - What is the importance of statistics in our world, including current challenges?

Statistics is everywhere in our developing world and is a vital part of examining evidence and solving problems in the workplace.

Problems with data include:

  • Recognising where data comes from and how it is managed

  • The agency of data: maintaining privacy while keeping data honest and accessible

  • Big data can be difficult to process and manage.


LO2 - What are the types of bias?

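  • Confounding - a third variable, often unknown or unmeasured, influences the result and distorts the apparent relationship in the data

  • Selection - the way participants are chosen or placed into groups is not random, so the groups differ systematically

  • Observer - the experimenter knows the condition and expects a certain result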


LO2 - What are the types of evidence?

  • Personal testimony - cannot be generalised and is highly biased and subjective

  • Journal articles - peer reviewed and written so the work can be reproduced, so generally more reliable


LO2 - What are the main types of study design behind datasets?

  1. Randomised Controlled Trial (RCT) > this is the most desired study design

    • Splits the participants into a control group and a treatment group > isolates the effect of the independent variable, so the study can show causation rather than just association

    • Targets bias through random allocation (against selection bias and confounding) and double-blinding (against observer bias)

  2. Observational Study >

    • This cannot establish causation, only association

    • Can be split into subgroups to control for a suspected confounding variable

    • Can lead to Simpson’s Paradox
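
Random allocation, as used in an RCT, can be sketched in a few lines. This is only an illustration: the participant IDs, the 50/50 split, and the function name `randomly_allocate` are assumptions, not part of the notes.

```python
import random

def randomly_allocate(participants, seed=None):
    """Shuffle participants and split them into (control, treatment).

    Hypothetical helper: chance alone decides who lands in which group,
    which is what protects against selection bias.
    """
    rng = random.Random(seed)        # seeded so the split can be repeated
    shuffled = list(participants)    # copy, so the input is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Made-up participant IDs, for illustration only.
participants = [f"P{i:02d}" for i in range(1, 21)]
control, treatment = randomly_allocate(participants, seed=1)

print("control:  ", sorted(control))
print("treatment:", sorted(treatment))
```

Because the split depends only on the shuffle, neither the researcher nor the participant chooses the group.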


LO2 - What is Simpson’s Paradox?

In observational data, a trend that appears within each subgroup can disappear or even reverse when the subgroups are pooled together, usually because a confounding variable is hidden by the pooling.
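
A small numeric sketch can make the reversal concrete. The counts below are made up for illustration (they are not from any real study), and the subgroup and treatment names are assumptions.

```python
# (successes, trials) for treatments A and B in two subgroups.
# Made-up counts, chosen only to show the effect.
data = {
    "mild":   {"A": (18, 20), "B": (64, 80)},
    "severe": {"A": (24, 80), "B": (4, 20)},
}

def rate(successes, trials):
    """Success rate as a proportion."""
    return successes / trials

# Within each subgroup, compare the two treatments.
for group, arms in data.items():
    a, b = rate(*arms["A"]), rate(*arms["B"])
    print(f"{group:>6}: A = {a:.0%}, B = {b:.0%}")

# Pool the subgroups by adding up successes and trials per treatment.
pooled = {}
for arm in ("A", "B"):
    successes = sum(data[g][arm][0] for g in data)
    trials = sum(data[g][arm][1] for g in data)
    pooled[arm] = rate(successes, trials)

print(f"pooled: A = {pooled['A']:.0%}, B = {pooled['B']:.0%}")
```

Here A has the higher rate inside both subgroups, yet B has the higher pooled rate, because A was mostly given to the severe cases. Severity acts as the confounder.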


LO2 - What is Domain Knowledge?

Domain knowledge is the important background information about the context or topic that the data is examining. It is needed for all data analysis.


LO3 - Describe the breakdown of data analysis and the types of data

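IDA (Initial Data Analysis):

  • Data background

  • Data structure

  • Data cleaning

  • Data summaries

Data:

  • Qualitative or Quantitative

    • Qualitative > Ordinal (ordered categories) or Nominal (unordered categories)

    • Quantitative > Discrete (counts) or Continuous (measurements)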


LO3 - Types of Graphical Summaries for Quantitative and Qualitative Data

Qualitative:

> 1 Variable > Single Bar plot

> 2 Variables > Double Bar plot (coloured)

Quantitative:

> 1 Variable > Single Histogram or a Single Box Plot

> 2 Variables > A scatter plot

Quantitative and Qualitative:

> 1 × 1 > A sliced Histogram, or a Comparative Box Plot

> 2 × 2 > A filtered scatterplot


LO3 - What are the measures of the centre of data when summarising numerically?

mean > the average of the data, sits at the balancing point of a histogram

  • is not robust: it is pulled towards outliers and the long tail of skewed data, so it can mislead when outliers are present

median > the middle value of the ordered data, splits the area of the histogram in half

  • is robust: outliers and skew barely move it, so it stays close to the bulk of the data when the data are skewed

    Report both together: a clear gap between the mean and the median signals skew or outliers
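
The difference in robustness can be checked with Python's standard `statistics` module. The values are made up for illustration, and the single outlier of 100 is an assumption chosen to exaggerate the effect.

```python
from statistics import mean, median

values = [2, 3, 3, 4, 5]        # made-up data, roughly symmetric
with_outlier = values + [100]   # same data plus one extreme value

# The mean is dragged towards the outlier; the median barely moves.
print("mean:  ", mean(values), "->", mean(with_outlier))
print("median:", median(values), "->", median(with_outlier))
```

Adding one extreme value moves the mean from 3.4 to 19.5, while the median only moves from 3 to 3.5.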


LO3 - What are the measures used to summarise the spread of data?

Standard Deviation (SD) > measures the typical distance between the data and the mean: sqrt(sum((x - mean)²)/n)

  • the population SD divides by n (the formula above); the sample SD divides by n - 1 instead

  • for data that follow a normal distribution, about 68%, 95% and 99.7% of values lie within 1, 2 and 3 SDs of the mean

    IQR = Q3 - Q1 > the range of the middle 50% of the data

    is robust to outliers, is used to flag them, and is shown as the box in a boxplot

    • Combination of mean and SD: coefficient of variation CV = SD/mean > measures spread relative to the mean (volatility)
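
These spread measures can be computed with the standard `statistics` module, as a rough sketch. The data are made up, and two choices here are assumptions rather than something stated in the notes: the quartile method (Python's default, which can differ slightly from other software) and the common 1.5 × IQR rule for flagging outliers.

```python
from statistics import mean, pstdev, stdev, quantiles

values = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data

pop_sd = pstdev(values)    # population SD: divides by n
sample_sd = stdev(values)  # sample SD: divides by n - 1

# Quartiles using Python's default method (an assumption; conventions vary).
q1, q2, q3 = quantiles(values, n=4)
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the quartiles are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in values if x < lower_fence or x > upper_fence]

# Coefficient of variation: SD relative to the mean.
cv = pop_sd / mean(values)

print("population SD:", pop_sd)
print("sample SD:    ", sample_sd)
print("IQR:          ", iqr)
print("outliers:     ", outliers)
print("CV:           ", cv)
```

The sample SD always comes out a little larger than the population SD, because it divides the same sum of squares by a smaller number.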


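LO3 - What happens when the data are shifted or scaled?

shifts - a constant is added to every value, so the data move left or right (the mean changes by that constant, the SD stays the same)

scales - every value is multiplied by a constant, so both change: the mean and the SD are each multiplied by that constant (the SD by its absolute value)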
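
Both rules can be verified numerically. The data, the shift of 10, and the scale factor of 3 are made-up values for illustration.

```python
from statistics import mean, pstdev

values = [2, 4, 4, 4, 5, 5, 7, 9]    # made-up data: mean 5, SD 2

shifted = [x + 10 for x in values]   # add a constant to every value
scaled = [3 * x for x in values]     # multiply every value by a constant

for label, data in (("original", values), ("shifted", shifted), ("scaled", scaled)):
    print(f"{label:>8}: mean = {mean(data)}, SD = {pstdev(data)}")
```

Shifting by 10 moves the mean from 5 to 15 and leaves the SD at 2. Scaling by 3 moves the mean to 15 and the SD to 6.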
