Human life is populated with articulated objects. Current Category-level Articulation Pose Estimation (CAPE) methods are studied under the single-instance setting with a fixed kinematic structure for each category. Considering these limitations, we reformulate this problem setting for real-world environments and propose a CAPE-Real (CAPER) task setting. This setting allows varied kinematic structures within a semantic category and multiple instances co-existing in an observation of the real world. To support this task, we build an articulated model repository ReArt-48 and present an efficient dataset generation pipeline, which contains Fast Articulated Object Modeling (FAOM) and Semi-Authentic MixEd Reality Technique (SAMERT). Accompanying the pipeline, we build a large-scale mixed-reality dataset ReArtMix and a real-world dataset ReArtVal. We also propose an effective framework ReArtNOCS that exploits RGB-D input to estimate part-level pose for multiple instances in a single forward pass. Extensive experiments demonstrate that the proposed ReArtNOCS achieves good performance on both the CAPER and CAPE settings. We believe it can serve as a strong baseline for future research on the CAPER task.
Aerial terrain mapping has been used for many years to monitor natural habitats and ecosystems, assist in urban planning, and monitor trends in land usage. Recent improvements in digital imaging, LiDAR, and synthetic aperture radar have facilitated the generation of 3-D terrain models for analysis in these applications. Unfortunately, these systems typically require large manned aircraft and significant post-processing of data before viewable results are produced. This inhibits use of these technologies in time-critical applications such as disaster relief, autonomous obstacle avoidance, and landing-zone assessment for a vertical take-off and landing aircraft. This paper describes a wide-baseline stereo vision system that enables near-real-time generation of dense 3-D terrain maps. The key advantage of computational stereo vision over monocular structure-from-motion is that terrain can be reconstructed from a single synchronized pair of calibrated images. The paper describes a working prototype, and presents a novel approach for combining separate stereo maps into larger terrain mosaics. The new stereo system and algorithm have an accuracy ranging from 56 cm to 65 cm across the field of view at an altitude of 40 m. Also, dense correlation of the imagery generates over 2200 points/m². The system weighs just 3.1 kg, roughly one-fourth the weight of comparable high-altitude mapping systems, at ca. one-tenth the cost. The paper also describes potential implementations using Field-Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs) for real-time operation.
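The key relationship behind the stereo system described above is the standard triangulation formula Z = f·B/d: depth is proportional to the camera baseline and inversely proportional to disparity, which is why a wide baseline improves depth resolution at a 40 m altitude. A minimal sketch of this computation (the focal length and baseline values below are illustrative assumptions, not figures from the paper):

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Triangulate metric depth from one disparity measurement of a calibrated,
    rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px

# Illustrative (assumed) calibration: focal length 1000 px, baseline 1.0 m.
# A measured disparity of 25 px then corresponds to a depth of 40 m.
print(depth_from_disparity(25.0, 1000.0, 1.0))  # → 40.0
```

Note how the wide baseline pays off: at fixed depth, a larger B yields a larger disparity, so a one-pixel matching error perturbs the recovered depth less.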
Using the National Standards' 5 C's as a framework, the authors examine how successfully students in Spanish community service-learning make connections across academic disciplines, to information and viewpoints that they encounter in the community, and to concrete social action. We use data from students, instructors, and community partners involved in a community service-learning Spanish course to present three cases: a student who made connections and took action, another who could not make connections beyond her own experience vis-à-vis the concept of poverty, and one representative case of a student who excelled in every traditional academic context yet did not take action.
Abstract—Much research has emphasized stereo disparity as a source of depth information. To a lesser extent, camera vergence and lens focus have also been investigated for their utility in depth recovery. Each of these visual cues exhibits shortcomings when used individually, in the sense that none alone can be used to reconstruct surfaces for real scenes that often cover a wide field of view and a large range of depth. This paper presents an approach to integration of these cues that attempts to exploit their complementary strengths and weaknesses through active control of camera focus and orientations. In addition, the aperture and zoom settings of the cameras are controlled. The result is an active vision system that dynamically and cooperatively interleaves image acquisition with surface estimation. A dense composite map of a single contiguous surface is synthesized by automatically scanning the surface and combining estimates of adjacent, local surface patches. This problem is formulated as one of minimizing a pair of objective functions. The first such function is concerned with the selection of a target for fixation. The second objective function guides the surface estimation process in the vicinity of the fixation point. Calibration parameters of the cameras are treated as variables during optimization, thus making camera calibration an integral, flexible component of surface estimation. An implementation of this method is described, and a performance evaluation of the system is presented. An average absolute error of less than 0.15% in estimated depth was achieved for a large surface having a depth of approximately 2 m.

Index Terms—Active vision, camera calibration, fixation, range from focus, range from stereo, range from vergence, surface estimation, visual cue integration, visual target selection.
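One common way to combine depth cues with complementary reliabilities, as the abstract above motivates, is inverse-variance weighting: each cue's estimate is weighted by its confidence, so the most reliable cue at the current fixation point dominates. This is a hedged sketch of that general idea, not the paper's actual pair of objective functions; all numbers below are hypothetical:

```python
def fuse_depth_cues(estimates):
    """Fuse independent depth estimates by inverse-variance weighting.

    estimates: list of (depth_m, variance) pairs, one per cue
    (e.g. stereo, focus, vergence). Returns the fused depth in meters.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    return sum(w * z for (z, _), w in zip(estimates, weights)) / total

# Hypothetical readings near a 2 m surface: stereo is the most certain cue here.
cues = [(2.00, 0.0004),   # stereo
        (2.10, 0.0100),   # focus
        (1.95, 0.0025)]   # vergence
fused = fuse_depth_cues(cues)  # close to the stereo estimate, ≈ 1.997 m
```

The same weighting falls out of maximum-likelihood estimation under independent Gaussian noise, which is why it is a natural baseline for cue integration.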