Investigating data quality with dataXplore

by Francesca Mancini

 

dataXplore is a user-friendly web-based app for investigating the quality of species occurrence data. It is built as an interface to the R package occAssess (Boyd et al. 2021), but using it does not require any programming skills.

 

Species occurrence data are often biased and may not accurately reflect real species distributions and their changes through time. It is therefore important to scrutinise these data for common forms of bias before they are used to derive any metrics of species distribution or trends. dataXplore produces a set of metrics and visualisations to screen a user-supplied dataset for common forms of bias:

  • bias in time: does the number of records change through time?
  • taxonomic bias: does the number of species recorded change through time? Are common species under-recorded?
  • spatial bias: is the distribution of the data representative of the area of interest? Are the data distributed randomly in space or are they clustered? How does the spatial coverage change through time? How many sites are visited consistently across years?
  • environmental bias: are the data representative of the environmental space?
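
To make the questions above concrete, here is a minimal Python sketch of the kind of per-period summaries they refer to: number of records, number of species recorded, and sites visited in every period. It is not how dataXplore or occAssess computes its metrics; the records, the periods and the function name summarise_by_period are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical occurrence records: (species, year, site identifier).
records = [
    ("Bombus terrestris", 2001, "A"),
    ("Bombus terrestris", 2003, "B"),
    ("Apis mellifera", 2004, "A"),
    ("Bombus terrestris", 2012, "A"),
    ("Apis mellifera", 2013, "A"),
    ("Andrena fulva", 2015, "C"),
    ("Bombus terrestris", 2016, "C"),
]

# Hypothetical time periods, given as inclusive (start_year, end_year).
periods = [(2000, 2009), (2010, 2019)]


def summarise_by_period(records, periods):
    """Count records, species and sites per period, and find the
    sites that were visited in every period."""
    n_records = defaultdict(int)
    species_seen = defaultdict(set)
    sites_seen = defaultdict(set)

    for species, year, site in records:
        for period in periods:
            start, end = period
            if start <= year <= end:
                n_records[period] += 1
                species_seen[period].add(species)
                sites_seen[period].add(site)

    summary = {
        period: {
            "records": n_records[period],
            "species": len(species_seen[period]),
            "sites": len(sites_seen[period]),
        }
        for period in periods
    }

    # Sites with at least one record in every period.
    if periods:
        consistent = set.intersection(*(sites_seen[p] for p in periods))
    else:
        consistent = set()

    return summary, consistent


summary, consistent_sites = summarise_by_period(records, periods)
for period, stats in summary.items():
    print(period, stats)
print("Sites visited in every period:", sorted(consistent_sites))
```

In this toy dataset the number of records and the number of species both increase between the two periods, while only one site is visited in both. Patterns like these are worth checking before interpreting any apparent trend.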

 

Although dataXplore provides a tool to investigate the quality of species occurrence data, it does not provide formal recommendations as to whether the data are of sufficient quality for any specific use. This is because the usability of species occurrence data depends not only on their biases but also on the question being asked and the method used to answer it. It is possible to derive useful inferences from biased data where the biases can be modelled or reduced.

 

dataXplore was developed by Dylan Carbone and Francesca Mancini at UKCEH through the NERC Knowledge Exchange Fellowship: Bringing the Data Revolution to Nature Recovery. We thank all the project partners and particularly Stuart Fraser, Tom Hartley, Adam Fraser and Euan Mckenzie for providing invaluable feedback during the development of the app. If you have any feedback on dataXplore, please contact Francesca Mancini or open an issue on GitHub.

 

Reference

  • Boyd, R.J., Powney, G.D., Carvell, C., & Pescott, O.L. 2021. occAssess: An R package for assessing potential biases in species occurrence data. Ecology and Evolution, 11(22): 16177–16187. doi.org/10.1002/ece3.8299