Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.513
Retrieving Skills from Job Descriptions: A Language Model Based Extreme Multi-label Classification Framework

Abstract: We introduce a deep learning model to learn the set of enumerated job skills associated with a job description. In our analysis of a large-scale government job portal, mycareersfuture.sg, we observe that as many as 65% of job descriptions omit a significant number of relevant skills. Our model addresses this task as an extreme multi-label classification (XMLC) problem, where descriptions are the evidence for the binary relevance of thousands of individual skills. Building upon th…
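The binary-relevance framing the abstract describes (each of thousands of skills is an independent yes/no decision given the description) can be sketched in a few lines. The skill names, logits, and threshold below are illustrative stand-ins, not values from the paper:

```python
import math

def sigmoid(x):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_skills(logits, skill_names, threshold=0.5):
    """Binary-relevance decoding for extreme multi-label
    classification: keep every skill whose independent
    relevance probability clears the threshold."""
    return [name for name, z in zip(skill_names, logits)
            if sigmoid(z) >= threshold]

# Toy logits a trained text encoder might emit for four skills.
skills = ["python", "sql", "excel", "docker"]
logits = [2.1, -0.3, 1.4, -1.7]
print(predict_skills(logits, skills))  # ['python', 'excel']
```

In a real XMLC setting the logit vector spans thousands of labels, but the per-skill decision rule is the same.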


Cited by 21 publications (24 citation statements)
References 13 publications
“…As can be seen in Table 1, many works do not release their data (apart from Sayfullina et al., 2018 and Bhola et al., 2020) and none release their annotation guidelines. In addition, none of the previous studies approach SE as a span-level extraction task with state-of-the-art language models, nor did they¹ release a dataset of this magnitude with manually annotated (long) spans of competences by domain experts.…”
¹ https://github.com/kris927b/SkillSpan
Section: Related Work
confidence: 99%
“…Previous work in SE shows promising progress, but is held back by a lack of available datasets and annotation guidelines. Only two out of 14 studies release their dataset, and those limit themselves to crowd-sourced labels (Sayfullina et al., 2018) or annotations from a predefined list of skills at the document level (Bhola et al., 2020). Additionally, none of the 14 previously mentioned studies release their annotation guidelines, which obscures what counts as a competence.…”
Section: Introduction
confidence: 99%
“…In our context, for example, the "labels" that curricular and job content share are skills. Due to the sheer volume of possible skills, this becomes an extreme multi-label classification (XMC) task, as recently pointed out by Bhola et al. (2020). In their closely related work, BERT models are employed to learn an embedding for a job description, and XMC models then classify each embedding into a subset of skills drawn from a large predetermined pool.…”
Section: Natural Language Processing
confidence: 99%
“…Therefore, computing Equation (1) translates into the task of predicting skills from the content of syllabi. Multiple NLP approaches are possible for this task; here we present an example inspired by Bhola et al. (2020), who frame skill identification as a multi-label classification problem. Specifically, we describe a BERT-LSTM architecture as illustrated in Figure 1.…”
Section: Economic Dimension: Skill Overlap
confidence: 99%
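The pipeline these citing works describe (encode a description into a fixed-size vector, then score every skill in a large predetermined pool) can be sketched with a toy linear scoring layer. Everything here is illustrative: a real system would use a trained BERT encoder for the embedding and learned per-skill weight vectors.

```python
def rank_skills(embedding, skill_weights, k=3):
    """Score every skill in a pool against a fixed-size
    description embedding via a dot product, then return
    the top-k skills. A stand-in for the linear output
    layer of an XMC classifier over thousands of labels."""
    scores = {
        skill: sum(e * w for e, w in zip(embedding, weights))
        for skill, weights in skill_weights.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy 3-dim "description embedding" and per-skill weights.
embedding = [1.0, 0.0, 0.5]
pool = {
    "sql":   [0.9, 0.1, 0.0],   # score 0.90
    "java":  [0.2, 0.8, 0.1],   # score 0.25
    "ml":    [0.5, 0.0, 1.0],   # score 1.00
    "excel": [0.0, 0.1, 0.1],   # score 0.05
}
print(rank_skills(embedding, pool, k=2))  # ['ml', 'sql']
```

Ranking with a cutoff (top-k) and thresholding independent sigmoid scores are the two common decoding choices in XMC; the cited works' exact decoding rule is not stated on this page.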