The paper presents a novel web-based platform for experimental workflow development in historical document digitisation and analysis. The platform has been developed as part of the IMPACT project, providing a range of tools and services for transforming physical documents into digital resources. It explains the main drivers in developing the technical framework and its architecture, how and by whom it can be used and presents some initial results. The main idea lies in setting up an interoperable and distributed infrastructure based on loose coupling of tools via web services that are wrapped in modular workflow templates which can be executed, combined and evaluated in many different ways. As the workflows are registered through a Web 2.0 environment, which is integrated with a workflow management system, users can easily discover, share, rate and tag workflows and thereby support the building of capacity across the whole community. Where ground truth is available, the workflow templates can also be used to compare and evaluate new methods in a transparent and flexible way.
In this paper, we present deep learning frameworks for audio-visual scene classification (SC) and indicate how individual visual and audio features as well as their combination affect SC performance. Our extensive experiments, which are conducted on DCASE (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1B development dataset, achieve the best classification accuracy of 82.2%, 91.1%, and 93.9% with audio input only, visual input only, and both audio-visual input, respectively. The highest classification accuracy of 93.9%, obtained from an ensemble of audio-based and visual-based frameworks, shows an improvement of 16.5% compared with DCASE baseline.
We present DreamDrug, a crowdsourced dataset for detecting mentions of drugs in noisy user-generated item listings from darknet markets. Our dataset contains nearly 15,000 manually annotated drug entities in over 3,500 item listings scraped from the darknet market platform "DreamMarket" in 2017. We also train and evaluate baseline models for detecting these entities, using contextual language models fine-tuned in a few-shot setting and on the full dataset, and examine the effect of pretraining on in-domain unannotated corpora.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.