Skills Earth Sciences

Statistical plots

When you analyse large data sets, you will often need to apply statistics as well. Plotting your data is especially helpful in the initial analysis stage, because it lets you study how the data is distributed. Although statistical analysis is beyond the scope of this module, we discuss some basic statistical visualisations here, focusing on data distribution and error bars.

[collapsibles]

[collapse title = “Visualising data distribution”]

Several ways exist to visualise the distribution of your data, the frequency of values, and comparisons between groups. The most common ones are discussed below. For more information on visualising data distribution, we recommend you read this paper by Weissgerber et al. (2019).

Box plots display the distribution of quantitative data through quartiles. They can be used to compare the medians and distributions of different categories. They indicate the median, symmetry, spread, and skewness of the data. Since they take up less space than other types of distribution graphs, they are particularly useful if you want to visualise many groups. However, box plots conceal significant details about the shape of the distribution, such as whether your data is Gaussian, bimodal, or multimodal.
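To make the elements of a box plot concrete, the sketch below computes the numbers a box plot is drawn from, using only the Python standard library. The sample values and variable names are hypothetical, and the 1.5 × IQR whisker rule is an assumption (a common convention; plotting software may use a different one).

```python
import statistics

# Hypothetical sample, e.g. grain sizes in micrometres (illustrative values only)
grain_sizes = [12, 15, 15, 18, 20, 22, 25, 27, 30, 55]

# The three quartiles: lower edge of the box, median line, upper edge of the box
q1, median, q3 = statistics.quantiles(grain_sizes, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range, the height of the box

# Assumed whisker convention: whiskers reach the most extreme data points
# that still lie within 1.5 * IQR of the box; points beyond are outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
whisker_low = min(x for x in grain_sizes if x >= lower_fence)
whisker_high = max(x for x in grain_sizes if x <= upper_fence)
outliers = [x for x in grain_sizes if x < lower_fence or x > upper_fence]

print("box:", q1, "to", q3, "| median:", median)
print("whiskers:", whisker_low, "to", whisker_high)
print("outliers:", outliers)
```

In this sample the value 55 lies beyond the upper fence, so it would be drawn as a separate outlier point rather than as part of the whisker.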

Violin plots are a more advanced version of box plots that also indicate probability density. They therefore also show the shape of the data distribution. However, they are more complicated to interpret.

Histograms display the distribution of quantitative data binned into intervals, with bars indicating the frequency in each interval. They give insight into what type of distribution your data has, where values are concentrated, and what the extremes are. They give more information about the distribution than box plots, but they also take up more space and make it difficult to compare different categories.
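The binning behind a histogram can be sketched in a few lines of standard-library Python. The sample values, the bin width of 10, and the starting value of 10 are all hypothetical choices for illustration. In practice you would choose bins that suit your own data.

```python
# Hypothetical sample (illustrative values only)
values = [12, 15, 15, 18, 20, 22, 25, 27, 30, 55]

bin_width = 10  # assumed bin width
start = 10      # assumed left edge of the first bin

# Count how many values fall into each interval [left, left + bin_width)
counts = {}
for v in values:
    left = start + ((v - start) // bin_width) * bin_width
    counts[left] = counts.get(left, 0) + 1

# Print a simple text histogram, including empty bins
for left in range(start, max(counts) + bin_width, bin_width):
    n = counts.get(left, 0)
    print(f"{left:>3}-{left + bin_width:<3} | {'#' * n}")
```

The empty bin between 40 and 50 shows that the largest value is separated from the rest of the data by a gap.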

[/collapse]

[collapsibles]

[collapse title = “Error bars”]

Error bars represent variability and are used to indicate the uncertainty or error in your plotted data. Generally, error bars are based on the standard deviation around a mean and are shown as 1 sigma or 2 sigma error margins (covering approximately 68% and 95% of the values respectively, if the data is normally distributed). However, error bars can also indicate other uncertainties. Therefore, when you use error bars, think about what they indicate and state this in the figure caption.
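As a minimal sketch of where error bar lengths can come from, the example below computes 1 sigma and 2 sigma margins around a mean with the Python standard library. The measurements are hypothetical. The standard error of the mean is included only as one example of another quantity that error bars sometimes show, which is why the caption should state what was plotted.

```python
import math
import statistics

# Hypothetical repeated measurements of the same quantity (illustrative values)
measurements = [9.8, 10.1, 10.0, 10.3, 9.9, 10.2, 9.7]

mean = statistics.mean(measurements)
sd = statistics.stdev(measurements)  # sample standard deviation (divides by n - 1)

# Error bar limits for 1 sigma and 2 sigma margins
one_sigma = (mean - sd, mean + sd)
two_sigma = (mean - 2 * sd, mean + 2 * sd)

# A different choice: the standard error of the mean, which is smaller than
# the standard deviation and describes the uncertainty of the mean itself
sem = sd / math.sqrt(len(measurements))

print(f"mean = {mean:.2f}")
print(f"1 sigma: {one_sigma[0]:.2f} to {one_sigma[1]:.2f}")
print(f"2 sigma: {two_sigma[0]:.2f} to {two_sigma[1]:.2f}")
print(f"standard error of the mean = {sem:.3f}")
```

The same data gives error bars of quite different lengths depending on which of these quantities you plot.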

[/collapse]