“…1 shows, this can help disambiguate two people partially occluding each other. These mid-level features, however, pose fundamental challenges to the common network-flow and related formulations of data association that use object detections (e.g., [14,13,2,19,21,17]). In particular, a person to be tracked is typically represented by an unknown number of mid-level features, which split and merge in both space and time.…”