We present a toolkit for markerless skeleton tracking and marker-based object tracking that fuses data from an arbitrary number of depth cameras. Because depth-camera-based skeletal tracking is inherently inaccurate due to technological limitations, our goal was to pre-estimate systematic errors for given tracking situations and thereby improve the fusion. Previous work has analyzed various aspects of depth-camera accuracy; however, to the best of our knowledge, there has been neither a systematic error model nor an application of such a model to skeletal fusion. This paper presents such a model for the Kinect v2 camera, built by statistical modelling on datasets captured with these cameras alongside a marker-based ground-truth system. By applying the model, we predict data quality with an error of around 3.2 cm and improve the overall accuracy of the fusion output by 68%. Our toolkit is available to other researchers, enabling capture spaces that are larger and, thanks to the error model, more accurate than those covered by a single depth camera.