Word sense disambiguation (WSD) is a basic task in natural language processing (NLP); its purpose is to choose the correct sense of an ambiguous word according to its context. In biomedical WSD, recent research has used context embeddings built by concatenating or averaging word embeddings to represent the sense of a context. These simple linear operations on neighboring words ignore sequence information and may lead to flawed semantic representations. In this paper, we present a novel language model based on Bi-LSTM that embeds an entire sentential context in continuous space while taking word order into account. We demonstrate that our language model can generate high-quality context representations in an unsupervised manner. Unlike previous work that directly predicts word senses, our model classifies a word in context by building sense embeddings, which helps us set a new state-of-the-art result (macro/micro average) on both the MSH and NLM datasets. In addition, with the same language model, we propose semi-supervised learning based on label propagation (LP) to reduce the dependence on labeled biomedical data. The results show that this method comes close to the state-of-the-art results produced by our Bi-LSTM while using less labeled training data. INDEX TERMS Word sense disambiguation, semi-supervised learning, context embedding, biomedical domain.
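The claim that averaging word embeddings discards word order can be illustrated with a minimal sketch. The two-dimensional vectors and the toy vocabulary below are invented for illustration and are not taken from the paper; the point is only that two contexts with the same words in a different order receive the same averaged representation.

```python
def average_embedding(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Hypothetical toy embeddings (not from the paper).
toy_embeddings = {
    "cold": [1.0, 0.0],
    "causes": [0.0, 1.0],
    "fever": [1.0, 1.0],
}

context_a = ["cold", "causes", "fever"]
context_b = ["fever", "causes", "cold"]  # same words, reversed order

avg_a = average_embedding([toy_embeddings[w] for w in context_a])
avg_b = average_embedding([toy_embeddings[w] for w in context_b])

# Averaging is order-invariant, so both contexts map to the same vector.
print(avg_a == avg_b)
```

A sequence model such as a Bi-LSTM reads the words in order, so in general it can assign different representations to these two contexts.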
How deep neural networks (DNNs) learn from noisy labels has been studied extensively in image classification but much less in image segmentation. So far, our understanding of the learning behavior of DNNs trained with noisy segmentation labels remains limited. In this study, we address this deficiency in both binary segmentation of biological microscopy images and multi-class segmentation of natural images. We generate extremely noisy labels by randomly sampling a small fraction (e.g., 10%) or flipping a large fraction (e.g., 90%) of the ground truth labels. When trained with these noisy labels, DNNs provide largely the same segmentation performance as when trained with the original ground truth. This indicates that, in supervised training for semantic segmentation, DNNs learn structures hidden in the labels rather than the pixel-level labels per se. We refer to these hidden structures in labels as meta-structures. When DNNs are trained with labels whose meta-structure has been perturbed in different ways, we find consistent degradation in their segmentation performance. In contrast, incorporating meta-structure information substantially improves the performance of an unsupervised segmentation model developed for binary semantic segmentation. We define meta-structures mathematically as spatial density distributions and show both theoretically and experimentally how this formulation explains key observed learning behavior of DNNs.
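The two kinds of label noise mentioned above can be sketched for a binary mask as follows. This is one possible reading, not the authors' procedure: "sampling" is assumed here to mean keeping each foreground pixel independently with a given probability, and "flipping" is assumed to mean inverting each pixel independently with a given probability. The function names are hypothetical.

```python
import random

def sample_labels(mask, keep_fraction, rng):
    """Keep each foreground pixel (1) with probability keep_fraction;
    all other pixels become background (0)."""
    return [
        [1 if (px == 1 and rng.random() < keep_fraction) else 0 for px in row]
        for row in mask
    ]

def flip_labels(mask, flip_fraction, rng):
    """Invert each pixel (0 <-> 1) with probability flip_fraction."""
    return [
        [1 - px if rng.random() < flip_fraction else px for px in row]
        for row in mask
    ]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
ground_truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

sparse = sample_labels(ground_truth, 0.1, rng)   # roughly 10% of foreground kept
flipped = flip_labels(ground_truth, 0.9, rng)    # roughly 90% of pixels inverted
```

Under both assumed schemes the noise is applied independently per pixel, so the spatial density of foreground labels still tracks the original objects, which is the kind of hidden structure the abstract calls a meta-structure.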
The low signal-to-noise ratio and missing imaging wedge of cryo-electron tomography (cryo-ET) data pose substantial technical challenges for identifying and localizing macromolecules of interest in crowded intracellular environments. Currently, mainstream approaches to 3D particle picking in cryo-ET either follow the 'segment-then-cluster' strategy or extract potential structural regions as sub-tomograms and then classify them. Unlike these two-step methods, we solve the problem using a one-step instance segmentation approach, termed 3D-SOLOv2. Specifically, the category and mask of each 3D particle are predicted according to the particle's location and size. To address the lack of real masks for 3D particles in cryo-ET, we propose a Gaussian-shaped mask to approximate them. When tested on the simulated datasets of the SHREC2020 challenge, our model achieves the fastest inference speed and state-of-the-art performance for both localization and classification tasks. When tested on the real cryo-ET dataset EMPIAR-10045, our model also achieves better performance than other methods.
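The idea of a Gaussian-shaped mask can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the isotropic Gaussian below, the function name `gaussian_mask`, and the use of a single `sigma` as a stand-in for particle size are all assumptions made for illustration.

```python
import math

def gaussian_mask(shape, center, sigma):
    """Return a 3D nested list whose value at voxel (z, y, x) is
    exp(-d^2 / (2 * sigma^2)), where d is the distance to center."""
    depth, height, width = shape
    cz, cy, cx = center
    return [
        [
            [
                math.exp(
                    -((z - cz) ** 2 + (y - cy) ** 2 + (x - cx) ** 2)
                    / (2.0 * sigma ** 2)
                )
                for x in range(width)
            ]
            for y in range(height)
        ]
        for z in range(depth)
    ]

# A 5x5x5 volume with a hypothetical particle centered at (2, 2, 2).
mask = gaussian_mask((5, 5, 5), (2, 2, 2), sigma=1.0)
```

The mask equals 1 at the particle center and decays smoothly with distance, so it can serve as a soft target when only particle coordinates and approximate sizes are annotated.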