An accurate statement of the provenance of data is essential in biomedical research. Powerful data manipulation tools available in the tidyverse R package ecosystem (Wickham et al., 2019) provide the infrastructure to assemble, clean and filter data prior to statistical analysis. Manual documentation of the steps taken in the data pipeline and of the provenance of data is a cumbersome and error-prone task, which may restrict reproducibility. dtrackr is a wrapper around a subset of the standard tidyverse data manipulation tools that allows automatic tracking of the processing steps applied to a data set prior to statistical analysis. It allows early detection and reporting of data quality problems, and automatically documents a pipeline of data transformations as a flowchart in a format suitable for scientific publication, including, but not limited to, CONSORT diagrams (Schulz et al., 2010).