Presenting data

Why and How

Scientific research – have a hypothesis and use scientific methods to draw a conclusion

Variable – anything that varies

Data – values that variables receive in measurements

Results – systematic variation in the data that relates to the hypothesis and research question

Analysis – reveals or tests patterns in data, produces results

Three ways of presenting results:

Text – verbal description. Establishes unambiguous meaning and logical relationships, but risks getting lost in details

Table – spatially organised representation of single, precise specifications. Communicates a larger number of specific details, but meaning and logical relationships aren't communicated

Graphics – visualisation of data patterns.

Focus on and highlight the main patterns in the data, and communicate different kinds of information in parallel.

But specific details get lost and visual interpretation may depend on the viewer

Two main uses:

to assess the quality of data (detect faults in your experiment, characterise the data distribution and identify outliers)
to visualise results (highlight and explain the main results)

When making graphics, focus on dimensionality and format

How to use graphical tools?
1. How many variables
  - One variable (univariate): data quality, distribution, outliers
  - Two variables (bivariate): independent/dependent variable, simple relationships
  - More than two variables (multivariate): complex relationships (interaction)
1. Types of variation. Independent variation, dependent variation and other variation (noise), which is not part of the testing logic but interferes with the tested variation and defines the reliability of results. Typically the independent variable goes on the X-axis, the dependent variable on the Y-axis, and noise is shown as error bars
1. Types of scales:
Nominal scale – distances in the graphic are unrelated to the data
Ordinal scale – oriented (order is meaningful)
Interval scale – distances are meaningful
Ratio scale – ratios are meaningful
1. Resolution – put screenshot.
Decrease resolution through aggregation (e.g. by categorising for nominal data, or by using the mean and standard deviation for quantitative data) to avoid clutter
1. Pattern – different kinds of diagrams reveal different patterns and relationships
Freedom rating – the more freedom, the more versatile the diagram, so it can describe more complex relationships but is also less organised

Pie chart has low freedom rating
Variable 1 – a continuous, ratio-scaled axis is represented as a circle
Variable 2 – discrete data is shown as wedges
Advantage – Part-whole relationships – relate parts to the whole
But limited scope – limited to 2 sources of variation (no error bars) and requires ratio-scale and finite data
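
The part-whole relationship can be made concrete with a little arithmetic: each wedge takes the same share of the 360-degree circle as its category takes of the whole data set. A minimal Python sketch, in which the `answers` data and the `wedge_angles` helper are hypothetical names invented for illustration:

```python
from collections import Counter

# Hypothetical nominal observations, invented for illustration.
answers = ["yes", "no", "yes", "yes", "undecided", "no", "yes", "yes"]


def wedge_angles(observations):
    """Map each category to its share of the full circle, in degrees."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {category: 360 * n / total for category, n in counts.items()}


angles = wedge_angles(answers)
for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
```

The angles always add up to the full circle, which is one way to see why a pie chart needs finite, ratio-scaled data: a wedge only has meaning as a fraction of a known total.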

Bar chart has higher freedom rating
Variable 1 – discrete, more intuitive with regular steps
Variable 2 – quantitative scale along the other axis
Can add error bars
Advantage: grouping – highlights distance from zero or another baseline, highlights differences across groups/conditions and visualises different groups/conditions
Disadvantage: useless for continuous data along both axes. Risks clutter when there are many data points

Line chart has even higher freedom rating
Variable 1 – continuous, at least ordinal scale (otherwise the connecting line is misleading)
Variable 2 – continuous, at least ordinal scale
Error bars possible
Additional continuous variables: grouping of data by separate lines identified by visual appearance
Advantage:
Disadvantage:

Scatter chart has a still higher freedom rating
Variable 1 and 2 – continuous X- and Y-axis, at least interval scale
Error bars possible along both axes
Additional variables – identify by visual appearance
Advantage: covariation
Disadvantage: lacks structure, risks clutter

Simple

Typically use graphics to see whether the data follow a normal or a skewed distribution

Also typically used to highlight main results for comparison and references

Aggregation by counting each value and summing up a binary variable (frequency)

Aggregate by frequency to visualise the distribution

Proportion – a part, share or number considered in comparative relation to a whole (fraction) = relative frequency

Percentage – per 100, where 100 is the full set
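
Frequency, proportion and percentage are three views of the same count. The sketch below is a minimal Python illustration using only the standard library; the `ratings` data and the `frequency_table` helper are hypothetical and not part of the original notes.

```python
from collections import Counter

# Hypothetical ordinal ratings on a 1-5 scale, invented for illustration.
ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 3]


def frequency_table(values):
    """Aggregate by frequency: count each value, then relate it to the whole."""
    counts = Counter(values)
    n = len(values)
    return {
        value: {
            "frequency": count,              # absolute count
            "proportion": count / n,         # relative frequency (fraction)
            "percentage": 100 * count / n,   # per 100
        }
        for value, count in sorted(counts.items())
    }


table = frequency_table(ratings)
for value, row in table.items():
    print(value, row)
```

Plotting `value` against `frequency` from this table gives the frequency chart; swapping in `proportion` changes the scale of the axis but not the shape.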

In frequency charts the dependent variable is on the X-axis and the frequency on the Y-axis

Nominal scale: use a pie chart; a bar chart also works.
But a line chart would be misleading (it would suggest a trend where there is none) and a scattergram would not be clear

Ordinal scale: a bar chart works well
But pie, line or scatter charts would not be clear

Ratio scale with discrete data: a scattergram works well

Aggregating continuous data by frequency is useless (almost every value occurs only once)
So we use binning, which transforms continuous data into discrete data by allocating the continuous data to intervals (the bins) SCREENSHOT
A stem-and-leaf display forms a histogram
Histogram – a bar chart using bins to display continuous data
Highlights difference from zero on the Y-axis
Frequency density – frequency of data per equal interval (frequency/bin size)
Probability density – probability of data per value
Frequency density allows estimation of probability density, which is fundamental for statistical testing
Assess the shape of the distribution on the histogram: symmetrical (no tail or 2 tails), negative skew (left tail) or positive skew (right tail)
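
Binning and frequency density can be sketched in a few lines of Python. This is an illustrative sketch only: the reaction times, the bin edges and the helper names `bin_counts` and `frequency_density` are all assumptions made for the example, and the choice of half-open bins is one common convention among several.

```python
def bin_counts(values, edges):
    """Count how many values fall into each bin.

    Bins are half-open, [edges[i], edges[i + 1]), except the last bin,
    which also includes its right edge. Values outside the edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for x in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= x < edges[i + 1] or (is_last and x == edges[-1]):
                counts[i] += 1
                break
    return counts


def frequency_density(counts, edges):
    """Frequency divided by bin size (frequency / bin size)."""
    return [c / (edges[i + 1] - edges[i]) for i, c in enumerate(counts)]


# Hypothetical reaction times in milliseconds, invented for illustration.
times = [212, 230, 245, 251, 260, 268, 274, 281, 305, 390]
edges = [200, 250, 300, 350, 400]

counts = bin_counts(times, edges)            # frequency per bin
density = frequency_density(counts, edges)   # frequency per millisecond

# A crude text histogram: one '#' per observation in the bin.
for i, c in enumerate(counts):
    print(f"{edges[i]}-{edges[i + 1]}: {'#' * c}")
```

In this made-up sample most observations sit in the two lowest bins and a few trail off to the right, which is the pattern the notes call positive skew.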

Cumulation – a collection of objects laid on top of each other
Cumulative sum or running total: sum up progressively
Gives a cumulative histogram showing cumulative frequency
Cumulative frequencies are an estimate of cumulative probabilities
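
A running total is simple to compute. The sketch below uses `itertools.accumulate` from the Python standard library; the bin counts are hypothetical values chosen for illustration.

```python
from itertools import accumulate

# Hypothetical frequencies per histogram bin, invented for illustration.
counts = [3, 5, 1, 1]

# Running total: each entry is the sum of all frequencies up to that bin.
cumulative = list(accumulate(counts))

# Dividing by the grand total gives cumulative proportions, which serve
# as an estimate of cumulative probabilities.
total = cumulative[-1]
cumulative_proportion = [c / total for c in cumulative]

print(cumulative)
print(cumulative_proportion)
```

The last cumulative frequency always equals the sample size, so the last cumulative proportion is always 1.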

Can simplify data by only analysing what has changed

The standard deviation of a binary variable depends only on the proportions
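
To see why, code the binary variable as 0 and 1 and let p be the proportion of 1s. The population standard deviation (dividing by n) then equals the square root of p(1 - p). The Python sketch below checks this on hypothetical data; note that the sample standard deviation (dividing by n - 1) differs slightly and also depends on n.

```python
from math import sqrt
from statistics import pstdev

# Hypothetical binary outcomes (1 = success, 0 = failure), invented for illustration.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

p = sum(outcomes) / len(outcomes)        # proportion of 1s
sd_from_proportion = sqrt(p * (1 - p))   # uses the proportion only
sd_from_data = pstdev(outcomes)          # population SD computed from the raw values

print(p, sd_from_proportion, sd_from_data)
```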

Aggregation by central tendency: obtain the mean and standard deviation. Plotting the mean and standard deviation in a bar chart makes the display much simpler
But the mean and standard deviation should only be used for symmetrical distributions
Mean and SD misrepresent asymmetrical distributions
Median and quartiles do not assume symmetry
A boxplot does not assume symmetry. It shows outliers and the interquartile range (which is the width of the box)
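
The difference between the two kinds of summary shows up clearly on skewed data. The sketch below uses Python's `statistics` module on a hypothetical sample. Two assumptions are not from the notes: quartiles are computed with the module's default method, and outliers are flagged with the common 1.5 x IQR rule.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical positively skewed sample, invented for illustration.
data = [2, 3, 3, 4, 4, 5, 5, 6, 7, 30]

# Summary that assumes symmetry.
m, sd = mean(data), stdev(data)

# Summary that does not assume symmetry: the numbers behind a boxplot.
q1, q2, q3 = quantiles(data, n=4)   # quartiles; q2 is the median
iqr = q3 - q1                       # interquartile range, the size of the box

# Assumed convention: points more than 1.5 * IQR beyond the box are outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]

print(f"mean={m:.2f} sd={sd:.2f}")
print(f"median={q2} quartiles=({q1}, {q3}) outliers={outliers}")
```

In this made-up sample the single extreme value pulls the mean above the third quartile, while the median stays with the bulk of the data.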

When the IV and DV are both qualitative and nominal/ordinal data:
Use a bar chart

When the IV is qualitative and nominal/ordinal and the DV is quantitative, interval/ratio and discrete, use a bar chart or boxplot

When the dependent and independent variables are both quantitative and continuous:
- Use a line chart because it better highlights trends
For a bivariate distribution use a scattergram

Complex

Data format (scale and resolution)

Qualitative IV and Qualitative DV (nominal/ordinal)

Use bar chart as pie charts are difficult to compare

Qualitative IV (nominal/ordinal) and quantitative DV (interval/ratio, discrete)

Aggregation by central tendency

Boxplot provides more information about distributions than a bar chart with mean and standard deviation

Quantitative IV (interval/ratio, discrete) and quantitative DV (interval/ratio, continuous/discrete)

Use line graph

Quantitative IV (interval/ratio, continuous) and quantitative DV (interval/ratio, discrete/continuous)

Line graph

IV - predictor variable

DV - outcome variable

For a bivariate distribution use a scattergram
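
The selection rules in this section can be collected into a small lookup. The Python sketch below is only a summary aid: the function name `suggest_chart` and its parameters are hypothetical, and it picks the boxplot over the bar chart for a quantitative DV because the notes say the boxplot carries more information about distributions.

```python
QUALITATIVE = {"nominal", "ordinal"}
QUANTITATIVE = {"interval", "ratio"}


def suggest_chart(iv_scale, dv_scale, bivariate_distribution=False):
    """Return the chart type these notes recommend for an IV/DV pair."""
    if bivariate_distribution:
        return "scattergram"
    if iv_scale in QUALITATIVE and dv_scale in QUALITATIVE:
        return "bar chart"
    if iv_scale in QUALITATIVE and dv_scale in QUANTITATIVE:
        return "boxplot"
    if iv_scale in QUANTITATIVE and dv_scale in QUANTITATIVE:
        return "line graph"
    # Combinations the notes do not cover are reported rather than guessed.
    raise ValueError(f"no rule in the notes for IV={iv_scale!r}, DV={dv_scale!r}")


print(suggest_chart("nominal", "ordinal"))
print(suggest_chart("ordinal", "ratio"))
print(suggest_chart("interval", "ratio"))
print(suggest_chart("ratio", "ratio", bivariate_distribution=True))
```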