The current state of research in landmark recognition highlights the good accuracy that can be achieved by embedding techniques such as Fisher vector and VLAD. None of these techniques exploits spatial information, i.e. they consider all the features and the corresponding descriptors without embedding their location in the image. This paper presents a new variant of the well-known VLAD (Vector of Locally Aggregated Descriptors) embedding technique which accounts, to a certain degree, for the location of features. The driving motivation comes from the observation that, usually, the most interesting part of an image (e.g., the landmark to be recognized) is close to the center of the image, while the features at the borders are irrelevant features which do not depend on the landmark. The proposed variant, called locVLAD (location-aware VLAD), computes the mean of two global descriptors: the VLAD computed on the entire original image, and the one computed on a cropped image from which a certain percentage of the image borders has been removed. This simple variant achieves higher accuracy than the existing state-of-the-art approach. Experiments are conducted on two public datasets (ZuBuD and Holidays), which are used both for training and testing. Moreover, a more balanced version of ZuBuD is proposed. 2017. A location-aware embedding technique for accurate landmark recognition.
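The averaging step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that local descriptors, their pixel positions, and a codebook are already available, and it approximates "VLAD on the cropped image" by keeping only the features located inside the central region rather than re-extracting features from a cropped image. The names `vlad`, `loc_vlad`, and the `border` parameter are hypothetical.

```python
import numpy as np

def vlad(descriptors, codebook):
    """Plain VLAD: per-codeword sum of residuals, flattened and L2-normalised."""
    k, d = codebook.shape
    v = np.zeros((k, d))
    if len(descriptors) > 0:
        # Squared distance between every descriptor and every codeword.
        diffs = descriptors[:, None, :] - codebook[None, :, :]
        nearest = (diffs ** 2).sum(axis=2).argmin(axis=1)
        for i in range(k):
            members = descriptors[nearest == i]
            if len(members) > 0:
                v[i] = (members - codebook[i]).sum(axis=0)
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def loc_vlad(descriptors, positions, image_size, codebook, border=0.1):
    """Mean of the VLAD over all features and the VLAD over central features.

    Assumption: the cropped image is emulated by discarding features whose
    (x, y) position lies within `border` (a fraction of width/height) of an edge.
    """
    width, height = image_size
    x, y = positions[:, 0], positions[:, 1]
    inside = (
        (x >= border * width) & (x <= (1 - border) * width)
        & (y >= border * height) & (y <= (1 - border) * height)
    )
    full = vlad(descriptors, codebook)
    centre = vlad(descriptors[inside], codebook)
    return 0.5 * (full + centre)
```

With this formulation, features near the center contribute to both terms while border features contribute to only one, so the central region is effectively weighted more heavily.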
This paper proposes a novel prediction tool for improving the compression performance of texture atlases. This algorithm, called Geometry-Aware (GA) intra coding, takes advantage of the topology of the associated 3D meshes in order to reduce the redundancies in the texture map. For texture processing, the concept of conventional intra prediction, used in video compression, has been adapted to consider neighboring information on the 3D surface. We have also studied how this prediction tool can be integrated into a complete coding solution. In particular, a block scanning strategy and a graph-based transform for residual coding have been proposed. Results show that the knowledge of the mesh topology significantly improves the compression efficiency of texture atlases.
Omni-directional images are characterized by their high resolution (usually 8K) and therefore require high compression efficiency. Existing methods project the spherical content onto one or multiple planes and process the mapped content with classical 2D video coding algorithms. However, this projection induces sub-optimality. Indeed, after projection, the statistical properties of the pixels are modified, the connectivity between neighboring pixels on the sphere might be lost, and finally, the sampling is not uniform. Therefore, we propose to process uniformly distributed pixels directly on the sphere to achieve high compression efficiency. In particular, a scanning order and a prediction scheme are proposed to exploit, directly on the sphere, the statistical dependencies between the pixels. A Graph Fourier Transform is also applied to exploit local dependencies while taking into account the 3D geometry. Experimental results demonstrate that the proposed method provides up to 5.6% bitrate reduction and on average around 2% bitrate reduction over state-of-the-art methods.
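As a rough illustration of the Graph Fourier Transform mentioned in this abstract, the sketch below builds a graph over 3D points on the sphere and projects a pixel signal onto the eigenvectors of the graph Laplacian. The Gaussian weighting of Euclidean distances, the fully connected graph, and the names `gaussian_weights`, `graph_fourier_basis`, `gft`, and `igft` are assumptions made for the example; the abstract does not specify how the graph is constructed.

```python
import numpy as np

def gaussian_weights(points, sigma=0.5):
    """Assumed edge weights: Gaussian kernel on 3D Euclidean distance."""
    diffs = points[:, None, :] - points[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=2)
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)  # no self-loops
    return weights

def graph_fourier_basis(weights):
    """Eigen-decomposition of the combinatorial Laplacian L = D - W."""
    laplacian = np.diag(weights.sum(axis=1)) - weights
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return eigvals, eigvecs

def gft(signal, basis):
    """Forward transform: project the signal onto the Laplacian eigenvectors."""
    return basis.T @ signal

def igft(coeffs, basis):
    """Inverse transform: reconstruct the signal from its coefficients."""
    return basis @ coeffs
```

Because the basis is orthonormal, `igft(gft(x, U), U)` recovers `x`, and a smooth signal on the graph concentrates its energy in the coefficients associated with small eigenvalues, which is what makes the transform useful for compression.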
An immersive visual experience can be obtained by allowing the user to navigate in 360-degree visual content. This content is stored at high resolution and requires a lot of storage space on the server. The transmission depends on the user's request, and only the spatial region requested by the user is transmitted, to avoid wasting network bandwidth. Therefore, storage and transmission rates are both critical. Splitting the rate into storage and transmission has not been formally considered in the literature for evaluating 360-degree content compression algorithms. In this paper, we propose a framework to evaluate the coding efficiency of 360-degree content while discriminating between storage and transmission rates and taking into account user dependency. This brings the flexibility to compare different coding methods based on the storage capacity of the server and the network bandwidth of users.
A 3D mesh object is usually represented as a combination of several entities including geometrical information (i.e., the triangles and their position in space) and a texture atlas/map (i.e. a giant 2D image containing all the texture information that is mapped to the 3D object at the rendering stage). This atlas is usually compressed using a conventional 2D image coder, thus without taking into account the geometrical information. Moreover, the whole image is usually decoded even though only a subpart of the mesh is observed by a user. In this paper, we propose a novel approach to compress a texture atlas of a 3D model that enables random access during decoding, and nevertheless takes into account the correlation driven by the geometrical information. The experimental results demonstrate the benefits of the proposed coder.
Interactive video communication has recently been proposed for multi-view videos. In this scheme, the server has to store the views as compactly as possible, while being able to transmit them independently to the users, who are allowed to navigate interactively among the views and hence request only a subset of them. To achieve this goal, the compression must be done using model-based coding, in which the correlation between the predicted view generated on the user side and the original view has to be modeled by a statistical distribution. In this paper, we propose a framework for lossless fixed-length source coding that selects, from a candidate set of models, the model that incurs the lowest extra rate cost to the system. Moreover, in cases where the depth image is available, we provide a method to estimate the correlation model.
This study suggests an application of human-robot interaction based on a three-dimensional, real-time, monocular head pose tracker in which active appearance models (AAMs) are used to extract facial features. In order to improve the texture model, two probabilistic approaches are proposed for principal component analysis in the presence of missing values. It is observed that using the suggested Bayesian model not only increases the fitting accuracy of the model, but also reduces the number of model parameters, which may increase the speed of model fitting. Moreover, contrary to the common assumption in AAM, the gradient matrix should not be assumed to be constant. In this investigation, a method is suggested in which the gradient matrix is adapted, as far as possible, to new images during model fitting on video sequences. In the next step, using the suggested methods, the operator's head pose is estimated by the POSIT algorithm. By implementing it on a PeopleBot robot, an enhancement of the interaction between human and robot is presented, in which the head pose is used to control the orientation of the robot camera.