Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.513
Retrieving Skills from Job Descriptions: A Language Model Based Extreme Multi-label Classification Framework

Abstract: We introduce a deep learning model to learn the set of enumerated job skills associated with a job description. In our analysis of a large-scale government job portal, mycareersfuture.sg, we observe that as many as 65% of job descriptions omit a significant number of relevant skills. Our model addresses this task as an extreme multi-label classification (XMLC) problem, where descriptions are the evidence for the binary relevance of thousands of individual skills. Building upon th…
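The binary-relevance framing the abstract describes (each of thousands of skills is an independent yes/no decision given the description) can be sketched in a few lines. The skill names, logits, and threshold below are illustrative stand-ins, not values from the paper:

```python
import math

def sigmoid(x):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_skills(logits, skill_names, threshold=0.5):
    """Binary-relevance decoding for extreme multi-label
    classification: keep every skill whose independent
    relevance probability clears the threshold."""
    return [name for name, z in zip(skill_names, logits)
            if sigmoid(z) >= threshold]

# Toy logits a trained text encoder might emit for four skills.
skills = ["python", "sql", "excel", "docker"]
logits = [2.1, -0.3, 1.4, -1.7]
print(predict_skills(logits, skills))  # ['python', 'excel']
```

In a real XMLC setting the logit vector spans thousands of labels, but the per-skill decision rule is the same.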


Cited by 21 publications (24 citation statements)
References 13 publications
“…As can be seen in Table 1, many works do not release their data (apart from Sayfullina et al., 2018 and Bhola et al., 2020) and none release their annotation guidelines. In addition, none of the previous studies approach SE as a span-level extraction task with state-of-the-art language models, nor did they¹ release a dataset of this magnitude with manually annotated (long) spans of competences by domain experts.…”
¹ https://github.com/kris927b/SkillSpan
Section: Related Work
confidence: 99%
“…Previous work in SE shows promising progress, but is held back by a lack of available datasets and annotation guidelines. Only two out of 14 studies release their dataset, and those limit themselves to crowd-sourced labels (Sayfullina et al., 2018) or annotations from a predefined list of skills at the document level (Bhola et al., 2020). Additionally, none of the 14 previously mentioned studies release their annotation guidelines, which obscures what counts as a competence.…”
Section: Introduction
confidence: 99%
“…In our context, for example, the "labels" that curricular and job content share are skills. Due to the sheer volume of possible skills, this becomes an extreme multi-label classification (XMC) task, as recently pointed out by Bhola et al. (2020). In their closely related work, BERT models are employed to learn an embedding for a job description, and XMC models then classify each embedding into a subset of skills drawn from a large predetermined pool.…”
Section: Natural Language Processing
confidence: 99%
“…Therefore, computing Equation (1) translates into the task of predicting skills from the content of syllabi. Multiple NLP approaches are possible for this task; here we present an example inspired by Bhola et al. (2020), who frame skill identification as a multi-label classification problem. Specifically, we describe a BERT-LSTM architecture as illustrated in Figure 1.…”
Section: Economic Dimension: Skill Overlap
confidence: 99%
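The pipeline these citing works describe (encode a description into a fixed-size vector, then score every skill in a large predetermined pool) can be sketched with a toy linear scoring layer. Everything here is illustrative: a real system would use a trained BERT encoder for the embedding and learned per-skill weight vectors.

```python
def rank_skills(embedding, skill_weights, k=3):
    """Score every skill in a pool against a fixed-size
    description embedding via a dot product, then return
    the top-k skills. A stand-in for the linear output
    layer of an XMC classifier over thousands of labels."""
    scores = {
        skill: sum(e * w for e, w in zip(embedding, weights))
        for skill, weights in skill_weights.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy 3-dim "description embedding" and per-skill weights.
embedding = [1.0, 0.0, 0.5]
pool = {
    "sql":   [0.9, 0.1, 0.0],   # score 0.90
    "java":  [0.2, 0.8, 0.1],   # score 0.25
    "ml":    [0.5, 0.0, 1.0],   # score 1.00
    "excel": [0.0, 0.1, 0.1],   # score 0.05
}
print(rank_skills(embedding, pool, k=2))  # ['ml', 'sql']
```

Ranking with a cutoff (top-k) and thresholding independent sigmoid scores are the two common decoding choices in XMC; the cited works' exact decoding rule is not stated on this page.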