Author name disambiguation in bibliographic databases is the problem of grouping together scientific publications written by the same person, accounting for potential homonyms and/or synonyms. Among solutions to this problem, digital libraries are increasingly offering tools for authors to manually curate their publications and claim those that are theirs. Indirectly, these tools allow for the inexpensive collection of large amounts of annotated training data, which can be further leveraged to build a complementary automated disambiguation system capable of inferring patterns for identifying publications written by the same person. Building on more than 1 million publicly released crowdsourced annotations, we propose an automated author disambiguation solution exploiting this data (i) to learn an accurate classifier for identifying coreferring authors and (ii) to guide the clustering of scientific publications by distinct authors in a semi-supervised way. To the best of our knowledge, our analysis is the first to be carried out on data of this size and coverage. With respect to the state of the art, we validate the general pipeline used in most existing solutions and improve on it by (i) proposing phonetic-based blocking strategies, thereby increasing recall; and (ii) adding strong ethnicity-sensitive features for learning a linkage function, thereby tailoring disambiguation to non-Western author names whenever necessary.
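To make the idea of phonetic-based blocking concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that American Soundex is the phonetic algorithm and that the blocking key is computed from the surname alone; the abstract specifies neither. The names `soundex` and `block_signatures` are hypothetical helpers introduced for illustration.

```python
import unicodedata
from collections import defaultdict

# Soundex consonant classes (letters that sound alike share a digit).
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character American Soundex key of a name ('' if no letters)."""
    # Strip accents so that "Müller" and "Muller" are treated alike.
    ascii_name = (unicodedata.normalize("NFKD", name)
                  .encode("ascii", "ignore").decode("ascii").lower())
    letters = [c for c in ascii_name if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # 'h' and 'w' do not separate equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # vowels reset prev, so a repeated code is kept
    return (key + "000")[:4]


def block_signatures(signatures):
    """Group (signature_id, surname) pairs by the phonetic key of the surname.

    Assumption: the surname alone defines the block. Only signatures in the
    same block would later be compared by the linkage function.
    """
    blocks = defaultdict(list)
    for sig_id, surname in signatures:
        blocks[soundex(surname)].append(sig_id)
    return dict(blocks)


if __name__ == "__main__":
    sigs = [("s1", "Müller"), ("s2", "Muller"), ("s3", "Mueller"), ("s4", "Smith")]
    print(block_signatures(sigs))
```

In this sketch the three spelling variants of the same surname fall into one block, whereas a blocking key based on exact string matching would separate them. This is the sense in which phonetic blocking can increase recall: signatures placed in different blocks are never compared, so spelling variants that a strict key splits apart would never be linked.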
Modern microscopes create a data deluge, with gigabytes of data generated each second and terabytes per day. Storing and processing this data is a severe bottleneck, not fully alleviated by data compression. We argue that this is because images are processed as grids of pixels. To address this, we propose a content-adaptive representation of fluorescence microscopy images, the Adaptive Particle Representation (APR). The APR replaces pixels with particles positioned according to image content. The APR overcomes storage bottlenecks, as data compression does, but additionally overcomes memory and processing bottlenecks. Using noisy 3D images, we show that the APR adaptively represents the content of an image while maintaining image quality and that it enables orders-of-magnitude benefits across a range of image processing tasks. The APR provides a simple and efficient content-aware representation of fluorescence microscopy images.