This paper introduces DAN+, a new multi-domain corpus and annotation guidelines for Danish nested named entities (NEs) and lexical normalization to support research on cross-lingual cross-domain learning for a less-resourced language. We empirically assess three strategies to model the two-layer Named Entity Recognition (NER) task. We compare transfer capabilities from German versus in-language annotation from scratch. We examine language-specific versus multilingual BERT, and study the effect of lexical normalization on NER. Our results show that 1) the most robust strategy is multi-task learning which is rivaled by multi-label decoding, 2) BERT-based NER models are sensitive to domain shifts, and 3) in-language BERT and lexical normalization are the most beneficial on the least canonical data. Our results also show that an out-of-domain setup remains challenging, while performance on news plateaus quickly. This highlights the importance of cross-domain evaluation of cross-lingual transfer.
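One of the strategies the abstract names, multi-label decoding, can be illustrated with a minimal sketch (ours, not the paper's code): the two BIO tag layers of a nested entity are joined into a single combined label per token, so a flat sequence labeler can decode both layers at once. The tag sequences below are invented examples.

```python
def merge_layers(outer_tags, inner_tags):
    """Join two BIO tag layers into combined labels like 'B-ORG|B-LOC',
    turning two-layer nested NER into flat sequence labeling."""
    return [f"{o}|{i}" for o, i in zip(outer_tags, inner_tags)]

# Toy example: a token span that is an ORG on the outer layer and
# starts with a LOC on the inner layer.
outer = ["B-ORG", "I-ORG", "O"]
inner = ["B-LOC", "O", "O"]
merged = merge_layers(outer, inner)
# merged == ["B-ORG|B-LOC", "I-ORG|O", "O|O"]
```

At prediction time the combined label is simply split on `|` to recover both layers.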
Skill Extraction (SE) is an important and widely-studied task useful to gain insights into labor market dynamics. However, there is a lacuna of datasets and annotation guidelines; available datasets are few and contain crowdsourced labels on the span-level or labels from a predefined skill inventory. To address this gap, we introduce SKILLSPAN, a novel SE dataset consisting of 14.5K sentences and over 12.5K annotated spans. We release its respective guidelines created over three different sources annotated for hard and soft skills by domain experts. We introduce a BERT baseline (Devlin et al., 2019). To improve upon this baseline, we experiment with language models that are optimized for long spans (Joshi et al., 2020; Beltagy et al., 2020), continuous pre-training on the job posting domain (Han and Eisenstein, 2019; Gururangan et al., 2020), and multi-task learning (Caruana, 1997). Our results show that the domain-adapted models significantly outperform their non-adapted counterparts, and single-task outperforms multi-task learning. * Equal contribution.
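Since SKILLSPAN annotates skills as spans, span extraction from a BIO-tagged sequence is the core decoding step. A minimal sketch (ours, with an invented example sentence, not the dataset's tooling):

```python
def bio_to_spans(tags):
    """Collect (start, end, label) spans from a BIO tag sequence;
    end is exclusive, following Python slicing conventions."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:          # close any open span
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is not None:
            continue                       # extend the open span
        else:
            if start is not None:
                spans.append((start, i, label))
                start = None
    if start is not None:                  # span running to end of sentence
        spans.append((start, len(tags), label))
    return spans

# Toy example: "manage large teams" tagged as one SKILL span.
tags = ["O", "B-SKILL", "I-SKILL", "I-SKILL", "O"]
spans = bio_to_spans(tags)
# spans == [(1, 4, "SKILL")]
```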
In recent years, distributed optical fiber acoustic sensing (DAS) technology has been increasingly used for vertical seismic profile (VSP) exploration. Although this technology offers high spatial resolution, strong resistance to high temperatures and pressures, and long sensing distance, DAS seismic noise has expanded beyond random noise to include optical abnormal noise, fading noise, horizontal noise, and more. This seriously degrades the quality of the seismic data and poses major challenges for subsequent imaging, inversion, and interpretation. Moreover, this noise is more complex and harder to suppress simultaneously with traditional methods. Therefore, to effectively improve the signal-to-noise ratio (SNR) of DAS seismic data, we introduce a denoising network named the attention-guided denoising convolutional neural network (ADNet). The network is composed of four blocks: a sparse block (SB), a feature enhancement block (FEB), an attention block (AB), and a reconstruction block (RB). It alternates different kinds of convolutions to enlarge the receptive field and extract global features of the input, while an attention mechanism is introduced to extract hidden noise information from the complex background. The network predicts the noise, and the denoised data are obtained by subtracting the predicted results from the noisy inputs. In addition, we construct a large number of complex forward models to build a pure seismic data training set that enhances the network's suitability. This combined design improves denoising performance while reducing computational cost and memory consumption. Results on both synthetic and field data show that the network denoises seismic images and recovers weak effective signals better than conventional methods and common networks.
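The residual scheme described above (the network predicts the noise; the clean signal is recovered by subtraction) can be sketched as follows. This is an illustration of the general idea, not ADNet itself; the function names and the toy oracle predictor are our own.

```python
def residual_denoise(noisy_trace, noise_predictor):
    """Residual denoising: clean = noisy - predicted noise.
    `noise_predictor` stands in for the trained network."""
    predicted_noise = noise_predictor(noisy_trace)
    return [y - n for y, n in zip(noisy_trace, predicted_noise)]

# Toy usage with an oracle that returns the exact additive noise,
# so subtraction recovers the clean samples.
noisy = [3.0, 2.0, 0.5]
oracle = lambda trace: [1.0, 0.5, 0.5]   # pretend noise estimates
clean = residual_denoise(noisy, oracle)
# clean == [2.0, 1.5, 0.0]
```

Learning the noise rather than the clean signal is a common design choice in denoising CNNs, since the residual is often easier to model than the full signal.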
This paper describes our system for assessing humour intensity in edited news headlines, developed as part of our participation in Task 7 of SemEval-2020 on "Humor, Emphasis and Sentiment". Various factors need to be accounted for in order to assess the funniness of an edited headline. We propose an architecture that uses hand-crafted features, knowledge bases, and a language model to understand humour, and combines them in a regression model. Our system outperforms two baselines. In general, automatic humour assessment remains a difficult task.
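The combination step the abstract describes (hand-crafted features, knowledge-base features, and a language-model signal feeding one regressor) can be sketched minimally. All names and values here are illustrative assumptions, not the system's actual features.

```python
def combine_features(handcrafted, kb_features, lm_score):
    """Concatenate heterogeneous feature groups into one vector
    that a downstream regression model can consume."""
    return list(handcrafted) + list(kb_features) + [lm_score]

# Toy usage: two hand-crafted features, one knowledge-base feature,
# and a language-model log-probability for the edited headline.
vector = combine_features([0.2, 0.7], [1.0], -3.5)
# vector == [0.2, 0.7, 1.0, -3.5]
```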