Further information on publisher's website: https://doi.org/10.1007/978-3-319-59063-9_7
Publisher's copyright statement: The final publication is available at Springer via https://doi.org/10.1007/978-3-319-59063-9_7
Abstract. Existing object detection frameworks in the deep learning field generally over-detect objects, and use non-maximum suppression (NMS) to filter out excess detections, leaving one bounding box per object. This works well so long as the ground-truth bounding boxes do not overlap heavily, as would be the case with objects that partially occlude each other, or are packed densely together. In these cases it would be beneficial, and more elegant, to have a fully end-to-end system that outputs the correct number of objects without requiring a separate NMS stage. In this paper we discuss the challenges involved in solving this problem, and demonstrate preliminary results from a prototype system.
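To make the role of NMS concrete, the following is a minimal sketch of the common greedy variant. It is not code from this paper. The function names, the (x1, y1, x2, y2) box format, the example boxes and scores, and the IoU threshold of 0.5 are all assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already-kept box
    does not exceed iou_threshold (an assumed, illustrative value).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Case 1: two well-separated objects, each detected twice.
separate = [(0, 0, 10, 10), (1, 0, 11, 10), (30, 0, 40, 10), (31, 0, 41, 10)]
kept_separate = greedy_nms(separate, [0.9, 0.8, 0.95, 0.7])

# Case 2: two distinct objects whose true boxes overlap heavily,
# each detected exactly once.
occluded = [(0, 0, 10, 10), (3, 0, 13, 10)]
kept_occluded = greedy_nms(occluded, [0.9, 0.85])
```

In the first case the sketch keeps one box per object, which is the intended behaviour. In the second case the two boxes belong to different objects, but their IoU is above the threshold, so the lower-scoring one is suppressed and only one of the two objects survives. Raising the threshold would recover it at the cost of letting more duplicate detections through, which is the trade-off that motivates removing the separate NMS stage.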