Inspired by the success of the General Language Understanding Evaluation (GLUE) benchmark, we introduce the Biomedical Language Understanding Evaluation (BLUE) benchmark to facilitate research on pre-training language representations in the biomedical domain. The benchmark consists of five tasks with ten datasets that cover both biomedical and clinical texts with different dataset sizes and difficulties. We also evaluate several baselines based on BERT and ELMo and find that the BERT model pre-trained on PubMed abstracts and MIMIC-III clinical notes achieves the best results. We make the datasets, pre-trained models, and code publicly available at https://github.com/ncbi-nlp/BLUE_Benchmark.
Holographic displays promise unprecedented capabilities for direct-view displays as well as virtual and augmented reality applications. However, one of the biggest challenges for computer-generated holography (CGH) is the fundamental tradeoff between algorithm runtime and achieved image quality, which has prevented high-quality holographic image synthesis at fast speeds. Moreover, the image quality achieved by most holographic displays is low because of the mismatch between the optical wave propagation in the physical display and its simulated model. Here, we develop an algorithmic CGH framework that achieves unprecedented image fidelity and real-time framerates. Our framework comprises several parts, including a novel camera-in-the-loop optimization strategy that allows us either to optimize a hologram directly or to train an interpretable model of the optical wave propagation, and a neural network architecture that represents the first CGH algorithm capable of generating full-color, high-quality holographic images at 1080p resolution in real time.
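The abstract does not say which simulated propagation model is being corrected. A common idealized choice for near-field holographic displays is the angular spectrum method, and the sketch below assumes it: it propagates a complex field from the SLM plane by a given distance using an FFT-based transfer function. The function name, wavelength, pixel pitch, and distance are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pixel_pitch, distance):
    """Propagate a complex 2D field by `distance` using the angular spectrum method.

    Assumption: this idealized free-space model stands in for the paper's
    simulated propagation model. Evanescent components are discarded.
    """
    ny, nx = field.shape
    # Spatial frequency grids (cycles per metre) in np.fft.fft2 ordering.
    fx = np.fft.fftfreq(nx, d=pixel_pitch)
    fy = np.fft.fftfreq(ny, d=pixel_pitch)
    FX, FY = np.meshgrid(fx, fy)
    # Squared longitudinal frequency; negative values correspond to evanescent waves.
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    propagating = arg > 0
    kz = 2 * np.pi * np.sqrt(np.where(propagating, arg, 0.0))
    # Unit-modulus transfer function for propagating waves, zero otherwise.
    transfer = np.where(propagating, np.exp(1j * kz * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# Usage: propagate a random phase-only SLM pattern 5 cm and inspect the intensity.
rng = np.random.default_rng(0)
slm_field = np.exp(1j * rng.uniform(0, 2 * np.pi, (64, 64)))
intensity = np.abs(angular_spectrum_propagate(slm_field, 520e-9, 8e-6, 0.05)) ** 2
```

Because every propagating component has unit-modulus transfer, this model conserves energy and is exactly invertible by propagating back by `-distance`. A real display deviates from such an ideal model, which is the mismatch that camera-in-the-loop calibration is meant to capture.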
Fig. 1. One of the applications of the proposed end-to-end computational camera design paradigm is achromatic extended depth of field. When capturing an image with a regular singlet lens (top left), out-of-focus regions are blurry and chromatic aberrations further degrade the image quality. With our framework, we optimize the profile of a refractive optical element that achieves both depth and chromatic invariance. This element is fabricated using diamond turning (right) or photolithography. After processing an image recorded with this optical element by simple Wiener deconvolution, we obtain an all-in-focus image with little chromatic aberration (top center). Point spread functions for both the regular lens and the optimized optical element are shown at the bottom. In this paper, we explore several applications that demonstrate the efficacy of our novel approach to domain-specific computational camera design.
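The caption names Wiener deconvolution as the only post-processing step. A minimal single-channel sketch follows, assuming circular convolution, a known PSF stored with its center at index (0, 0), and a constant noise-to-signal ratio `nsr` (a hypothetical tuning parameter; the caption does not give the value used).

```python
import numpy as np

def wiener_deconvolve(blurred, psf, nsr=1e-2):
    """Frequency-domain Wiener deconvolution with a constant noise-to-signal ratio.

    Assumes circular convolution and a PSF of the same shape as the image,
    centred at index (0, 0) (apply np.fft.ifftshift to a centred kernel).
    """
    H = np.fft.fft2(psf)
    G = np.fft.fft2(blurred)
    # Wiener filter conj(H) / (|H|^2 + NSR): inverts H where it is strong and
    # attenuates frequencies where |H|^2 falls below the noise level.
    estimate = np.conj(H) / (np.abs(H) ** 2 + nsr) * G
    return np.real(np.fft.ifft2(estimate))
```

For a color image, one would apply this per channel with the PSF measured for that channel. A depth- and chromatically invariant PSF is what lets a single, fixed filter like this one produce an all-in-focus result.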
Near-eye displays using holographic projection are emerging as an exciting display approach for virtual and augmented reality at high resolution without complex optical setups, shifting optical complexity to computation. While precise phase modulation hardware is becoming available, phase retrieval algorithms are still in their infancy, and holographic display approaches resort to heuristic encoding methods or iterative methods relying on various relaxations.
In this work, we depart from such existing approximations and solve the phase retrieval problem for a hologram of a scene at a single depth at a given time by revisiting complex Wirtinger derivatives; we also extend our framework to render 3D volumetric scenes. Using Wirtinger derivatives allows us to pose the phase retrieval problem as a quadratic problem that can be minimized with first-order optimization methods. The proposed Wirtinger Holography is flexible and facilitates the use of different loss functions, including learned perceptual losses parametrized by deep neural networks, as well as stochastic optimization methods. We validate this framework by demonstrating holographic reconstructions with an order of magnitude lower error, both in simulation and on an experimental hardware prototype.
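To make the role of Wirtinger derivatives concrete, the sketch below computes the gradient of an L2 intensity loss with respect to the phase of a phase-only SLM and runs plain gradient descent on it. Several pieces are assumptions rather than the paper's method: a single far-field propagation step modeled as a unitary 2D FFT, a plain intensity loss instead of a perceptual one, and fixed-step gradient descent (the abstract says only "first-order optimization methods"). The function names and the step size are hypothetical.

```python
import numpy as np

def loss_and_grad(phase, target_intensity):
    """L2 intensity loss and its gradient w.r.t. the SLM phase via Wirtinger calculus.

    Assumed propagation model: z = F(exp(i * phase)) with a unitary 2D FFT.
    """
    u = np.exp(1j * phase)                       # phase-only SLM field
    z = np.fft.fft2(u, norm="ortho")             # propagated field
    residual = np.abs(z) ** 2 - target_intensity
    loss = np.sum(residual ** 2)
    # Wirtinger derivative dL/d(conj z) = 2 * residual * z, pulled back through
    # the unitary FFT by its adjoint, the orthonormal inverse FFT.
    dL_dconj_u = np.fft.ifft2(2 * residual * z, norm="ortho")
    # Chain rule to the real phase: dL/dphi = 2 * Re(conj(dL/dconj_u) * i * u).
    grad = 2 * np.real(np.conj(dL_dconj_u) * 1j * u)
    return loss, grad

def wirtinger_phase_retrieval(target_intensity, steps=300, lr=0.01, seed=0):
    """Fixed-step gradient descent on the SLM phase; returns phase and loss history."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0, 2 * np.pi, target_intensity.shape)
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(phase, target_intensity)
        history.append(loss)
        phase -= lr * grad
    return phase, history
```

The target intensity should be scaled so that its total energy matches that of the unit-modulus SLM field (the sum of the target equals the number of pixels), since a unitary propagator cannot change total energy. Swapping in another differentiable loss only changes the expression for `dL/d(conj z)`, which mirrors the flexibility the abstract describes.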
We posit that user behavior during natural viewing of images contains an abundance of information about the content of images as well as information related to user intent and user-defined content importance. In this paper, we conduct experiments to better understand the relationship between images, the eye movements people make while viewing images, and how people construct natural language to describe images. We explore these relationships in the context of two commonly used computer vision datasets. We further relate human cues to the outputs of current visual recognition systems and demonstrate prototype applications for gaze-enabled detection and annotation.
Typical camera optics consist of a system of individual elements that are designed to compensate for the aberrations of a single lens. Recent computational cameras shift some of this correction task from the optics to post-capture processing, reducing the imaging optics to only a few optical elements. However, these systems only achieve reasonable image quality by limiting the field of view (FOV) to a few degrees, effectively ignoring severe off-axis aberrations with blur sizes of several hundred pixels.
In this paper, we propose a lens design and learned reconstruction architecture that lift this limitation and provide an order-of-magnitude increase in field of view using only a single thin-plate lens element. Specifically, we design a lens to produce spatially shift-invariant point spread functions over the full FOV that are tailored to the proposed reconstruction architecture. We achieve this with a mixture PSF, consisting of a peak and a low-pass component, which provides residual contrast instead of a small spot size as in traditional lens designs. To perform the reconstruction, we train a deep network on captured data from a display lab setup, eliminating the need for manual acquisition of training data in the field. We assess the proposed method in simulation and experimentally with a prototype camera system.
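The abstract describes the mixture PSF only qualitatively. To illustrate why a peak plus a low-pass component preserves residual contrast, the sketch below models both components as Gaussians and inspects the resulting modulation transfer function (MTF). The Gaussian shapes, the peak weight, and the widths are hypothetical; in the paper the PSF results from an optimized thin-plate lens profile.

```python
import numpy as np

def gaussian_psf(n, sigma):
    """Centred, unit-sum 2D Gaussian on an n x n grid (sigma in pixels)."""
    y, x = np.mgrid[-(n // 2):n - n // 2, -(n // 2):n - n // 2]
    g = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def mixture_psf(n, peak_weight=0.3, sigma_peak=0.5, sigma_lowpass=20.0):
    """Hypothetical mixture PSF: a sharp peak plus a wide low-pass halo, unit sum."""
    return (peak_weight * gaussian_psf(n, sigma_peak)
            + (1.0 - peak_weight) * gaussian_psf(n, sigma_lowpass))

def mtf(psf):
    """Modulation transfer function: magnitude of the PSF's Fourier transform."""
    return np.abs(np.fft.fft2(np.fft.ifftshift(psf)))

# Compare contrast at 16/128 cycles per pixel for the mixture and a pure low-pass blur.
n = 128
print(mtf(mixture_psf(n))[0, 16], mtf(gaussian_psf(n, 20.0))[0, 16])
```

Because the Fourier transform is linear, the mixture's MTF is `w * MTF_peak + (1 - w) * MTF_lowpass`. At mid and high frequencies, the low-pass term vanishes while the peak term keeps the MTF near `w`, so detail survives at reduced contrast rather than being erased. That residual contrast is what a learned reconstruction network can amplify.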
We compare our system against existing single-element designs, including an aspherical lens and a pinhole, and against a complex multi-element lens, validating high-quality large field-of-view (i.e., 53°) imaging performance using only a single thin-plate element.