Tim Meinhardt scite author profile

The problem of tracking multiple objects in a video sequence poses several challenging tasks. For tracking-bydetection, these include object re-identification, motion prediction and dealing with occlusions. We present a tracker (without bells and whistles) that accomplishes tracking without specifically targeting any of these tasks, in particular, we perform no training or optimization on tracking data. To this end, we exploit the bounding box regression of an object detector to predict the position of an object in the next frame, thereby converting a detector into a Tracktor. We demonstrate the potential of Tracktor and provide a new state-of-the-art on three multi-object tracking benchmarks by extending it with a straightforward re-identification and camera motion compensation.We then perform an analysis on the performance and failure cases of several state-of-the-art tracking methods in comparison to our Tracktor. Surprisingly, none of the dedicated tracking methods are considerably better in dealing with complex tracking scenarios, namely, small and occluded objects or missing detections. However, our approach tackles most of the easy tracking scenarios. Therefore, we motivate our approach as a new tracking paradigm and point out promising future research directions. Overall, Tracktor yields superior tracking performance than any current tracking method and our analysis exposes remaining and unsolved tracking challenges to inspire future research directions.

show abstract

Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems

Meinhardt

et al. 2017

View full text Add to dashboard Cite

While variational methods have been among the most powerful tools for solving linear inverse problems in imaging, deep (convolutional) neural networks have recently taken the lead in many challenging benchmarks. A remaining drawback of deep learning approaches is their requirement for an expensive retraining whenever the specific problem, the noise level, noise type, or desired measure of fidelity changes. On the contrary, variational methods have a plug-and-play nature as they usually consist of separate data fidelity and regularization terms.In this paper we study the possibility of replacing the proximal operator of the regularization used in many convex energy minimization algorithms by a denoising neural network. The latter therefore serves as an implicit natural image prior, while the data term can still be chosen independently. Using a fixed denoising neural network in exemplary problems of image deconvolution with different blur kernels and image demosaicking, we obtain state-of-the-art reconstruction results. These indicate the high generalizability of our approach and a reduction of the need for problemspecific training. Additionally, we discuss novel results on the analysis of possible optimization algorithms to incorporate the network into, as well as the choices of algorithm parameters and their relation to the noise level the neural network is trained on.

show abstract

TrackFormer: Multi-Object Tracking with Transformers

Meinhardt¹,

Kirillov²,

Leal-Taixé³

et al. 2021

Preprint

128

View full text Add to dashboard Cite

We present TrackFormer, an end-to-end multi-object tracking and segmentation model based on an encoderdecoder Transformer architecture. Our approach introduces track query embeddings which follow objects through a video sequence in an autoregressive fashion. New track queries are spawned by the DETR object detector and embed the position of their corresponding object over time. The Transformer decoder adjusts track query embeddings from frame to frame, thereby following the changing object positions. TrackFormer achieves a seamless data association between frames in a new tracking-by-attention paradigm by self-and encoder-decoder attention mechanisms which simultaneously reason about location, occlusion, and object identity. TrackFormer yields state-ofthe-art performance on the tasks of multi-object tracking (MOT17) and segmentation (MOTS20). We hope our unified way of performing detection and tracking will foster future research in multi-object tracking and video understanding. Code will be made publicly available.

show abstract

TrackFormer: Multi-Object Tracking with Transformers

et al. 2022

View full text Add to dashboard Cite

Holismokes

Cañameras

Schuldt

Suyu

et al. 2020

A&A

View full text Add to dashboard Cite

We present a systematic search for wide-separation (with Einstein radius θE ≳ 1.5″), galaxy-scale strong lenses in the 30 000 deg2 of the Pan-STARRS 3π survey on the Northern sky. With long time delays of a few days to weeks, these types of systems are particularly well-suited for catching strongly lensed supernovae with spatially-resolved multiple images and offer new insights on early-phase supernova spectroscopy and cosmography. We produced a set of realistic simulations by painting lensed COSMOS sources on Pan-STARRS image cutouts of lens luminous red galaxies (LRGs) with redshift and velocity dispersion known from the sloan digital sky survey (SDSS). First, we computed the photometry of mock lenses in gri bands and applied a simple catalog-level neural network to identify a sample of 1 050 207 galaxies with similar colors and magnitudes as the mocks. Second, we trained a convolutional neural network (CNN) on Pan-STARRS gri image cutouts to classify this sample and obtain sets of 105 760 and 12 382 lens candidates with scores of pCNN > 0.5 and > 0.9, respectively. Extensive tests showed that CNN performances rely heavily on the design of lens simulations and the choice of negative examples for training, but little on the network architecture. The CNN correctly classified 14 out of 16 test lenses, which are previously confirmed lens systems above the detection limit of Pan-STARRS. Finally, we visually inspected all galaxies with pCNN > 0.9 to assemble a final set of 330 high-quality newly-discovered lens candidates while recovering 23 published systems. For a subset, SDSS spectroscopy on the lens central regions proves that our method correctly identifies lens LRGs at z ∼ 0.1–0.7. Five spectra also show robust signatures of high-redshift background sources, and Pan-STARRS imaging confirms one of them as a quadruply-imaged red source at zs = 1.185, which is likely a recently quenched galaxy strongly lensed by a foreground LRG at zd = 0.3155. In the future, high-resolution imaging and spectroscopic follow-up will be required to validate Pan-STARRS lens candidates and derive strong lensing models. We also expect that the efficient and automated two-step classification method presented in this paper will be applicable to the ∼4 mag deeper gri stacks from the Rubin Observatory Legacy Survey of Space and Time (LSST) with minor adjustments.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Tim Meinhardt

Tracking Without Bells and Whistles

Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems

TrackFormer: Multi-Object Tracking with Transformers

TrackFormer: Multi-Object Tracking with Transformers

Holismokes

Contact Info

Product

Resources

About