LO1 - what is the importance of stats in our world, including current challenges?
Statistics is everywhere within our developing world and is a vital part of examining and problem solving in the workplace.
Problems with data include:
The recognition of data origin and its management
The agency of data, in order to maintain privacy but also honesty and accessibility
Big data can be difficult to process and manage.
LO2 - What are the types of bias?
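Confounding - a third variable, often unmeasured, is related to both the explanatory and response variables and distorts the apparent relationship
Selection - the way participants are chosen or allocated to groups is not random, so the groups differ systematically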
Observer - the experimenter knows the condition and expects a certain result
LO2 - What are the types of evidence?
Personal testimony - cannot be generalised and is highly biased and subjective
Journal articles - peer-reviewed and documented well enough to be reproduced, so generally more reliable
LO2 - What are the main types of study design behind datasets?
Randomised Controlled Trial (RCT) > this is the most desired study design
Splits the participants into a control group and a treatment group > allows the effect of the independent variable to be tested accurately, so causation can be established
Targets bias through the use of random allocation and double blind experiments
Observational Study >
This cannot establish causation, only association
Can be split into subgroups to control for possible confounding variables
Can lead to Simpson's Paradox
LO2 - What is Simpson's Paradox?
In observational data, a trend that appears within each subgroup can disappear or reverse when the subgroups are pooled together, usually because of a confounding variable.
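A minimal sketch of the pooling effect, using made-up illustrative counts (not data from the course). The names `counts`, `rate` and `pooled_rate` are hypothetical helpers for this example only. Treatment A has the higher success rate in each subgroup, yet B looks better once the subgroups are pooled, because A was mostly used on the harder "large" cases.

```python
# Illustrative counts only: (successes, total) per treatment and subgroup.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, total):
    """Success rate within a single subgroup."""
    return successes / total

def pooled_rate(groups):
    """Success rate after pooling all subgroups of one treatment."""
    successes = sum(s for s, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return successes / total

# Within each subgroup, A has the higher success rate.
for subgroup in ("small", "large"):
    a = rate(*counts["A"][subgroup])
    b = rate(*counts["B"][subgroup])
    print(f"{subgroup}: A={a:.3f}  B={b:.3f}")

# After pooling, the comparison reverses.
pooled_a = pooled_rate(counts["A"])
pooled_b = pooled_rate(counts["B"])
print(f"pooled: A={pooled_a:.3f}  B={pooled_b:.3f}")
```

Here the subgroup (case size) acts as the confounder: it affects both which treatment was given and how likely success was.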
LO2 - What is Domain Knowledge?
Domain knowledge is the important background information about the context or topic that the data is examining. It is important for all data analysis.
LO3 - Describe the breakdown of data analysis and the types of data
IDA (Initial Data Analysis):
Data background
Data structure
Data cleaning
Data summaries
Data:
Qualitative or Quantitative
Qual > Ordinal or Nominal
Quant > Discrete or Continuous
LO3 - Types of Graphical Summaries for Quantitative and Qualitative Data
Qualitative:
> 1 Variable > Single Bar plot
> 2 Variables > Double Bar plot (coloured)
Quantitative:
> 1 Variable > Single Histogram or a Single Box Plot
> 2 Variables > A scatter plot
Quantitative and Qualitative
1 × 1 > A sliced Histogram, or a Comparative Box Plot
2 × 2 > A filtered scatterplot
LO3 - What are the measures of the centre of data when summarising numerically?
mean > the average of the data, sits as the balancing point on a histogram
is not robust, so it is pulled towards the tail of skewed data (misleading when there are outliers)
median > the middle value of the data, splits the area of the histogram in half
is very robust and is barely affected by outliers - gives a better idea of the typical value when the data is skewed
report both together to represent the centre of the data - a gap between them signals skew or outliers
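A small sketch of the robustness point above, using Python's standard `statistics` module. The values in `data` are made up for illustration. Adding one extreme value moves the mean a long way but barely moves the median.

```python
import statistics

# Made-up illustrative values.
data = [2, 3, 3, 4, 5]
with_outlier = data + [100]  # add one extreme value

mean_before = statistics.mean(data)
median_before = statistics.median(data)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```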
LO3 - What are the measures used to summarise the spread of data?
Standard Deviation (SD) > measures the typical gap between the data and the mean - how the data spreads in relation to the mean: sqrt(sum((data - mean)²) / n)
the sample SD divides by n - 1 instead of n; the population SD uses the equation above
when the histogram follows a normal distribution, about 68%, 95% and 99.7% of the data lie within 1, 2 and 3 SDs of the mean
IQR = Q3 - Q1 - examines the range of the middle 50% of the data
is used to identify outliers and is shown in the boxplot
Combination of mean and SD: coefficient of variation CV = SD/mean > examines volatility
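The spread measures above can be sketched with the standard `statistics` module. The values in `data` are made up for illustration. Note that quartile conventions differ between software packages, so the IQR from `statistics.quantiles` (default method) may not match other tools exactly, and using the sample SD in the CV is an assumption made here.

```python
import statistics

# Made-up illustrative values.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)

# Population SD divides by n; sample SD divides by n - 1.
pop_sd = statistics.pstdev(data)
sample_sd = statistics.stdev(data)

# Quartiles: n=4 returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Coefficient of variation: spread relative to the mean (sample SD assumed).
cv = sample_sd / mean

print(f"mean={mean}, population SD={pop_sd:.3f}, sample SD={sample_sd:.3f}")
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"CV={cv:.3f}")
```

The sample SD is always slightly larger than the population SD for the same values, because it divides by the smaller number n - 1.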
LO3 - What happens when the data is shifted or scaled?
shifts - a constant is added to every value, so the data moves left or right (SD remains, mean changes)
scales - every value is multiplied by a constant, so the data moves and its scale changes - both the mean and the SD change
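A quick check of the shift and scale behaviour, again with made-up values and the standard `statistics` module. The constants 10 and 3 are arbitrary choices for the example.

```python
import statistics

# Made-up illustrative values.
data = [2, 4, 4, 4, 5, 5, 7, 9]

shifted = [x + 10 for x in data]  # shift: add a constant to every value
scaled = [x * 3 for x in data]    # scale: multiply every value by a constant

for label, values in (("original", data), ("shifted", shifted), ("scaled", scaled)):
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    print(f"{label:8s} mean={mean}  SD={sd:.1f}")
```

Shifting moves the mean by the added constant and leaves the SD alone; scaling multiplies both the mean and the SD by the constant (by its absolute value for the SD).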