Findings of the Association for Computational Linguistics: ACL 2022
DOI: 10.18653/v1/2022.findings-acl.34
MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction

Abstract: Keyphrase extraction (KPE) automatically extracts phrases in a document that provide a concise summary of the core content, which benefits downstream information retrieval and NLP tasks. Previous state-of-the-art (SOTA) methods select candidate keyphrases based on the similarity between learned representations of the candidates and the document. They suffer performance degradation on long documents due to the discrepancy between sequence lengths, which causes a mismatch between representations of keyphrase candidates…

Cited by 24 publications (16 citation statements). References 39 publications (39 reference statements).
“…While KP-Miner achieved a better performance for 14 of the 17 datasets with an average of 6.08% in F1-score, RaKUn was improved by a cross-dataset average of 4.46% for 14 of the 17 datasets. We observed the most change in (Sun et al, 2020) -SIFRank+ (Sun et al, 2020) -MDERank (Zhang et al, 2022) --…”
Section: Pos-tag Patterns (mentioning)
Confidence: 72%
“…A more recent method is SIFRank (Sun et al, 2020), which combines sentence embedding model SIF and autoregressive pre-trained language model ELMo, and it was upgraded to SIFRank+ by position-biased weight to improve its performance for long documents. Lastly, MDERank (Zhang et al, 2022) considers the similarity between the embeddings of the source document and its masked version for candidate ranking.…”
Section: Unsupervised AKE Methods (mentioning)
Confidence: 99%
“…First, we use the MDERank [46] algorithm to extract named entities. The MDERank algorithm selects keywords through candidate word masking and similarity calculation.…”
Section: Knowledge Graph Construction and Vulnerability Feature Analysis (mentioning)
Confidence: 99%
“…One of the recent methods that has achieved the best performance is the methods that use deep learning algorithms such as [23]- [25]. The development of sentence embedding techniques also contributed to the emergence of AKE methods that use these techniques as [26]- [28].…”
Section: Keyphrases Extraction Approaches (mentioning)
Confidence: 99%
“…The keyphrases are selected from among the candidate keyphrases that have the greatest cosine similarity to the document using the maximal margin relevance, to avoid repetition of extracting the same keyphrases. MDERank [28], an unsupervised method that uses BERT technique [50] to embed the document and its variants. The principle of MDERank is to create variants for the original document while masking some phrases in these variants.…”
Section: Present Keyphrases Extraction (mentioning)
Confidence: 99%
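The citation statements above describe MDERank's core principle: create a masked variant of the document for each candidate phrase, then rank candidates by how far the masked document's embedding drifts from the original document's embedding (a candidate whose removal changes the embedding most is most important). A minimal sketch of that ranking logic, using a toy bag-of-words embedding as a stand-in for the BERT encoder the paper actually uses; the function names and example text are illustrative assumptions, not the authors' implementation:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" standing in for a pre-trained
    # language-model encoder (MDERank uses BERT here).
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mderank(document, candidates, top_k=3):
    """Rank candidates by how much masking them changes the document.

    The masked document that is LEAST similar to the original carries
    the most core content, so lower similarity means higher rank.
    """
    doc_emb = embed(document)
    scored = []
    for cand in candidates:
        # Mask every occurrence of the candidate phrase.
        masked = re.sub(re.escape(cand), "[MASK]", document,
                        flags=re.IGNORECASE)
        scored.append((cosine(doc_emb, embed(masked)), cand))
    scored.sort()  # ascending similarity: most important first
    return [cand for _, cand in scored[:top_k]]

doc = ("Keyphrase extraction automatically extracts phrases that "
       "summarize the core content of a document. Keyphrase extraction "
       "benefits information retrieval tasks.")
print(mderank(doc, ["keyphrase extraction", "phrases", "tasks"]))
```

Masking "keyphrase extraction" removes four tokens from the toy vector, so its masked variant diverges most and it ranks first; in the real method, the pre-trained encoder captures this contextually rather than by raw token counts.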