In recent years, the dominant paradigm for text spotting has been to combine the tasks of text detection and recognition into a single end-to-end framework. Under this paradigm, both tasks are accomplished by operating over a shared global feature map extracted from the input image. Among the main challenges that end-to-end approaches face is performance degradation when recognizing text across scale variations (smaller or larger text) and arbitrary word rotation angles. In this work, we address these challenges by proposing a novel global-to-local attention mechanism for text spotting, termed GLASS, that fuses together global and local features. The global features are extracted from the shared backbone, preserving contextual information from the entire image, while the local features are computed individually on resized, high-resolution rotated word crops. The information extracted from the local crops alleviates much of the inherent difficulty with scale and word rotation. We present a performance analysis across scales and angles, highlighting improvement at scale and angle extremities. In addition, we introduce an orientation-aware loss term supervising the detection task, and show its contribution to both detection and recognition performance across all angles. Finally, we show that GLASS is general by incorporating it into other leading text spotting architectures, improving their text spotting performance. Our method achieves state-of-the-art results on multiple benchmarks, including the newly released TextOCR.
Plankton interact with the environment according to their size and three-dimensional (3D) structure. To study them outdoors, these translucent specimens are imaged in situ: light projects through a specimen in each image. Each specimen has a random scale, drawn from the population's size distribution, and a random, unknown pose, and it appears only once before drifting away. We achieve 3D tomography using such a random ensemble to statistically estimate an average volumetric distribution for the plankton type and specimen size. To counter errors due to non-rigid deformations, we weight the data, drawing on advanced models developed for cryo-electron microscopy. The weights convey confidence in the quality of each datum, based on a statistical error model. We demonstrate the approach on live plankton using an underwater field microscope.
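The confidence-weighted ensemble estimate described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, array shapes, and the assumption that each specimen has already been back-projected into an aligned per-specimen volume estimate are all hypothetical, and the statistical error model that produces the confidence weights is treated as given.

```python
import numpy as np

def weighted_average_volume(volumes, weights):
    """Combine per-specimen volume estimates into one average
    volumetric distribution, weighting each datum by its confidence.

    volumes : sequence of N aligned 3D arrays, shape (X, Y, Z) each
    weights : sequence of N non-negative confidence values
    """
    volumes = np.asarray(volumes, dtype=float)   # (N, X, Y, Z)
    weights = np.asarray(weights, dtype=float)   # (N,)
    weights = weights / weights.sum()            # normalize confidences
    # Weighted mean over the ensemble axis: low-confidence specimens
    # (e.g. strongly deformed ones) contribute less to the average.
    return np.tensordot(weights, volumes, axes=1)
```

Down-weighting rather than discarding low-confidence data keeps the ensemble large, which matters when each specimen contributes only a single projection.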
<p>The climate is strongly affected by its interaction with clouds. Reducing major errors in climate predictions requires a much finer understanding of cloud physics than current knowledge provides. Current knowledge is based on empirical remote sensing data analyzed under the assumption that the atmosphere and clouds consist of very broad, uniform layers. To help overcome this problem, 3D scattering computed tomography (CT) has been suggested as a way to study clouds.</p><p>CT is a powerful way to recover the inner structure of three-dimensional (3D) volumetric heterogeneous objects, and it has extensive use in many research and operational domains. Aside from its common usage in medicine, CT is used for sensing geophysical terrestrial structures, atmospheric pollution, and fluid dynamics. CT requires imaging from multiple directions, and in nearly all CT approaches the object is considered static during image acquisition. However, in many cases the object changes while the multi-view images are acquired sequentially. Thus, effort has been invested in expanding 3D CT to four-dimensional (4D) spatiotemporal CT. This effort has been directed at linear CT modalities, which are computationally easier to handle and have therefore been popular in medical imaging. However, linear CT modalities do not apply to clouds: clouds constitute a scattering medium, and therefore radiative transfer is non-linear in the clouds' content.</p><p>This work focuses on the challenge of 4D scattering CT of clouds. Scattering CT of clouds requires high-resolution multi-view images from space. There are spaceborne and high-altitude systems that may provide such data, for example AirMSPI, MAIA, HARP, and AirHARP. An additional planned system is the CloudCT formation, funded by the ERC. However, these systems are costly. Deploying them in large numbers to simultaneously acquire images of the same clouds from many angles can be impractical. 
Therefore, the platforms are planned to move above the clouds: a sequence of images is taken in order to span and sample a wide angular breadth. However, the clouds evolve while the angular span is sampled.</p><p>We pose conditions under which this task can be performed. These conditions concern temporal sampling and angular breadth in relation to the correlation time of the evolving cloud. We then generalize scattering CT. The generalization seeks spatiotemporal recovery of the cloud extinction field at high resolution (10 m), using data taken by a small number of moving cameras. We present an optimization-based method to achieve this, and demonstrate the method both in rigorous simulations and on real data.</p>
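The optimization-based recovery can be sketched in skeletal form. This is only an illustration of the optimization structure, not the paper's method: the true forward model is nonlinear radiative transfer, whereas here a generic self-adjoint linear operator `A` stands in, and all names, the fixed step size, and the quadratic temporal-smoothness regularizer are assumptions.

```python
import numpy as np

def recover_4d(A, images, shape, n_iters=200, lr=0.1, tau=0.1):
    """Gradient descent on sum_t ||A(beta_t) - image_t||^2 / 2
    plus tau/2 * sum_t ||beta_{t+1} - beta_t||^2 (temporal smoothness).

    A      : forward operator (assumed self-adjoint for this sketch)
    images : one measured image per acquisition time
    shape  : spatial shape of the extinction field
    """
    T = len(images)                       # number of acquisition times
    beta = np.zeros((T,) + shape)         # extinction field per time step
    for _ in range(n_iters):
        grad = np.zeros_like(beta)
        for t in range(T):
            resid = A(beta[t]) - images[t]   # data misfit at time t
            grad[t] = A(resid)               # adjoint step (A self-adjoint)
        # Gradient of the temporal-smoothness penalty
        d = beta[1:] - beta[:-1]
        grad[1:] += tau * d
        grad[:-1] -= tau * d
        beta -= lr * grad
    return beta
```

The temporal coupling term is what lets a small number of moving cameras suffice: each time step is under-constrained on its own, but the regularizer shares information across the sequence, in line with the sampling conditions posed above.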