State-of-the-art methods for self-supervised learning (SSL) build representations by maximizing the similarity between different augmented "views" of a sample. Because these approaches try to match views of the same sample, they can be too myopic and fail to produce meaningful results when augmentations are not sufficiently rich. This motivates the use of the dataset itself to find similar, yet distinct, samples to serve as views for one another. In this paper, we introduce Mine Your Own vieW (MYOW), a new approach for building across-sample prediction into SSL. The idea behind our approach is to actively mine views, finding samples that are close in the representation space of the network, and then predict, from one sample's latent representation, the representation of a nearby sample. In addition to showing the promise of MYOW on standard datasets used in computer vision, we highlight the power of this idea in a novel application in neuroscience where rich augmentations are not already established. When applied to neural datasets, MYOW outperforms other self-supervised approaches in all examples (in some cases by more than 10%), and surpasses the supervised baseline for most datasets. By learning to predict the latent representation of similar samples, we show that it is possible to learn good representations in new domains where augmentations are still limited.
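The mining step behind MYOW can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: here each sample's mined view is simply its Euclidean nearest neighbor within a batch of embeddings (the method mines k nearest neighbors in the network's representation space and predicts the neighbor's representation through a predictor head), and the toy clusters are invented for the example.

```python
import numpy as np

def mine_views(embeddings: np.ndarray) -> np.ndarray:
    """For each row, return the index of its nearest neighbor
    (excluding itself) under Euclidean distance."""
    d = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample cannot be its own mined view
    return d.argmin(axis=1)

# Toy example: two well-separated clusters; mined views stay within a cluster.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=0.0, scale=0.1, size=(4, 8))
cluster_b = rng.normal(loc=5.0, scale=0.1, size=(4, 8))
batch = np.vstack([cluster_a, cluster_b])
nn_idx = mine_views(batch)  # indices of "close but distinct" samples
```

In training, the mined index would select the target whose (projected) representation the network learns to predict from the anchor sample's representation.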
Meaningful and simplified representations of neural activity can yield insights into how and what information is being processed within a neural circuit. However, without labels, finding representations that reveal the link between the brain and behavior can be challenging. Here, we introduce a novel unsupervised approach for learning disentangled representations of neural activity called Swap-VAE. Our approach combines a generative modeling framework with an instance-specific alignment loss that tries to maximize the representational similarity between transformed views of the input (brain state). These transformed (or augmented) views are created by dropping out neurons and jittering samples in time, which intuitively should lead the network to a representation that maintains both temporal consistency and invariance to the specific neurons used to represent the neural state. Through evaluations on both synthetic data and neural recordings from hundreds of neurons in different primate brains, we show that it is possible to build representations that disentangle neural datasets along relevant latent dimensions linked to behavior.
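The two augmentations described above, neuron dropout and temporal jitter, can be sketched as follows. The function name, parameter values, and synthetic spike counts are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def augment(x: np.ndarray, rng, drop_prob: float = 0.2, max_jitter: int = 2) -> np.ndarray:
    """Create one augmented view of a (time, neurons) activity matrix
    by shifting it in time and zeroing out a random subset of neurons."""
    keep = rng.random(x.shape[1]) >= drop_prob          # keep each neuron w.p. 1 - drop_prob
    jitter = rng.integers(-max_jitter, max_jitter + 1)  # random temporal shift
    return np.roll(x, jitter, axis=0) * keep            # jitter, then drop neurons

rng = np.random.default_rng(1)
x = rng.poisson(2.0, size=(10, 50)).astype(float)  # synthetic spike counts
view1, view2 = augment(x, rng), augment(x, rng)    # two views of one brain state
```

A model trained to align `view1` and `view2` is pushed toward representations that are temporally consistent and do not depend on which particular neurons happened to be observed.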
With the emergence of naked-eye 3D mobile devices, mobile 3D video services are becoming increasingly important for video service providers, such as YouTube and Netflix, while multi-view 3D videos have the potential to inspire a variety of innovative applications. However, enabling multi-view 3D video services may overwhelm WiFi networks when every view of a video is multicast. In this paper, therefore, we propose to incorporate depth-image-based rendering (DIBR), which allows each mobile client to synthesize the desired view from nearby left and right views, in order to effectively reduce bandwidth consumption. Moreover, when a client suffers from packet losses, retransmissions incur additional bandwidth consumption and excess delay, which in turn undermines the quality of experience in video applications. To address this issue, we first establish the merit of view protection via DIBR for multi-view video multicast through a mathematical analysis, and then design a new protocol, named Multi-View Group Management Protocol (MVGMP), to support users dynamically joining and leaving and changing their desired views. Simulation results demonstrate that our protocol effectively reduces bandwidth consumption and increases the probability that each client can successfully play back its desired views in a multi-view 3D video.
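The condition under which DIBR lets a client recover its view can be sketched as follows. The names `playable` and `dibr_range` are illustrative assumptions, and the sketch ignores packet loss and group management, which the analysis and MVGMP address.

```python
def playable(desired: int, transmitted: set, dibr_range: int = 1) -> bool:
    """A client can play its desired view if that view is multicast directly,
    or if some left neighbor and some right neighbor within DIBR range are
    both multicast, so the desired view can be synthesized."""
    if desired in transmitted:
        return True
    has_left = any(desired - d in transmitted for d in range(1, dibr_range + 1))
    has_right = any(desired + d in transmitted for d in range(1, dibr_range + 1))
    return has_left and has_right

# Multicasting only the odd views still serves every view in 1..5,
# halving the number of views that must be sent.
transmitted = {1, 3, 5}
assert all(playable(v, transmitted) for v in range(1, 6))
```

This is the source of the bandwidth savings: the server can multicast a subset of views while every subscribed view remains either received or synthesizable.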
Optimal transport (OT) is a widely used technique for distribution alignment, with applications throughout the machine learning, graphics, and vision communities. Without any additional structural assumptions on transport, however, OT can be fragile to outliers or noise, especially in high dimensions. Here, we introduce a new form of structured OT that simultaneously learns low-dimensional structure in data while leveraging this structure to solve the alignment task. Compared with OT, the resulting transport plan has better structural interpretability, highlighting the connections between individual data points and local geometry, and is more robust to noise and sampling. We apply the method to synthetic as well as real datasets, where we show that our method can facilitate alignment in noisy settings and can be used to both correct and interpret domain shift.
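As background for the alignment task, a minimal numpy sketch of standard entropy-regularized OT (the Sinkhorn iterations) is shown below. This is the unstructured baseline the paper builds on, not the structured variant, which additionally learns and exploits low-dimensional structure; the toy point clouds are invented for the example.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=200):
    """Entropic OT: return a transport plan P with marginals (approximately)
    a and b, minimizing <P, C> minus eps times the entropy of P."""
    K = np.exp(-C / eps)            # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):         # alternate marginal rescalings
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Align two small point clouds related by a constant shift.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
y = x + 1.0                                        # target is a shifted copy
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1) # squared-distance cost
C = C / C.max()                                    # normalize for stability
a = b = np.full(5, 1 / 5)
P = sinkhorn(a, b, C)
```

Without structural assumptions, each entry of `P` is fit independently of local geometry, which is the fragility to noise and outliers that the structured formulation targets.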