2021
DOI: 10.48550/arxiv.2105.12848
Preprint

BERTifying the Hidden Markov Model for Multi-Source Weakly Supervised Named Entity Recognition

Abstract: We study the problem of learning a named entity recognition (NER) tagger using noisy labels from multiple weak supervision sources. Though cheap to obtain, the labels from weak supervision sources are often incomplete, inaccurate, and contradictory, making it difficult to learn an accurate NER model. To address this challenge, we propose a conditional hidden Markov model (CHMM), which can effectively infer true labels from multi-source noisy labels in an unsupervised way. CHMM enhances the classic hidden Markov…
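To make the mechanism concrete: a CHMM-style model treats the true tag sequence as hidden states and the weak sources' labels as observations, with the HMM parameters conditioned on each token (in the paper, predicted from BERT embeddings). Below is a minimal illustrative sketch, not the authors' implementation; the dimensions, the random stand-ins for network outputs, and the conditional-independence assumption across sources are all assumptions made for the example.

```python
# Token-conditioned HMM forward pass over multi-source weak labels.
# Everything here is illustrative: in CHMM the transition/emission
# probabilities would come from a network over BERT embeddings,
# not from random logits.
import numpy as np

rng = np.random.default_rng(0)
K, S, T = 5, 3, 8              # hidden label states, weak sources, tokens

def row_softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

trans = row_softmax(rng.normal(size=(T, K, K)))  # p(z_t | z_{t-1}, x_t)
emit = row_softmax(rng.normal(size=(S, K, K)))   # p(o_s | z) per source
prior = row_softmax(rng.normal(size=K))          # p(z_1)
obs = rng.integers(0, K, size=(S, T))            # weak labels per source

def obs_lik(t):
    # Joint likelihood of all sources' labels at token t for each hidden
    # state, assuming sources are independent given the true label.
    lik = np.ones(K)
    for s in range(S):
        lik *= emit[s, :, obs[s, t]]
    return lik

# Forward recursion, renormalized each step to give filtering posteriors.
alpha = prior * obs_lik(0)
alpha /= alpha.sum()
for t in range(1, T):
    alpha = (alpha @ trans[t]) * obs_lik(t)
    alpha /= alpha.sum()
print("posterior over the true label at the last token:", np.round(alpha, 3))
```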

Cited by 2 publications (9 citation statements) | References 34 publications (14 reference statements)

Citation statements (ordered by relevance):
“…Early named entity recognition tasks were generally based on manually constructed rules and dictionary methods [9], but such approaches rely heavily on human intervention, which often limits their cost-effectiveness and portability. With the development of machine learning techniques, researchers have used statistical machine learning algorithms to construct mathematical models, such as Hidden Markov Models (HMMs) [10,11] and Support Vector Machines (SVMs) [12–14]. Although these models have achieved good results on some corpora, they can only capture partial contextual information.…”
Section: Related Work
confidence: 99%
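A toy decoder makes the quoted limitation concrete: in a first-order HMM, each tag depends only on its immediate predecessor, so the model sees just one step of context. The sketch below uses made-up parameters and a tiny tag set purely for illustration.

```python
# Viterbi decoding for a classic first-order HMM NER tagger.
# The Markov assumption is visible in the recursion: each step
# conditions on the previous tag alone ("partial context").
import numpy as np

tags = ["O", "B-PER", "I-PER"]
K, T, V = len(tags), 4, 10
rng = np.random.default_rng(1)

log_prior = np.log(np.full(K, 1.0 / K))
log_trans = np.log(rng.dirichlet(np.ones(K), size=K))  # p(tag_t | tag_{t-1})
log_emit = np.log(rng.dirichlet(np.ones(V), size=K))   # p(word | tag)
words = [3, 7, 7, 1]                                   # toy word ids

delta = log_prior + log_emit[:, words[0]]
back = np.zeros((T, K), dtype=int)
for t in range(1, T):
    scores = delta[:, None] + log_trans       # (prev tag, current tag)
    back[t] = scores.argmax(axis=0)
    delta = scores.max(axis=0) + log_emit[:, words[t]]

path = [int(delta.argmax())]                  # backtrace the best sequence
for t in range(T - 1, 0, -1):
    path.append(int(back[t][path[-1]]))
print([tags[k] for k in reversed(path)])
```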
“…(1) In the probabilistic graphical model approach (and in addition to the HMM-based models [20,21,26,36,39]), Rodrigues et al. [32], as early as 2014, used a partially directed graph containing a CRF to model truth inference from crowdsourced labels; (2) in the deep learning approach (and in addition to the "source-specific perturbation" methods [17,26,46]), other methods [17, 33–35] are based either on an end-to-end deep neural architecture [33], on a customized optimization objective solved with coordinate ascent [34,35], or on an iterative solving framework similar to the expectation-maximization algorithm [4]. However, none of these methods has the advantages of the recently proposed neuralized HMM-based graphical models [18,19] and of our Neural-Hidden-CRF in principled modeling of the variables of interest and in harnessing the contextual information provided by advanced deep learning models. Additionally, it is worth noting the many established weak supervision (WS) methods that address the standard independent classification scenario [3, 5, 43–45].…”
Section: Related Work
confidence: 99%
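The "iterative solving framework similar to the expectation-maximization algorithm" has a classic instance in Dawid-Skene-style truth inference for the independent classification setting. Below is a minimal sketch with synthetic data; the source-accuracy level, smoothing constant, and iteration count are arbitrary choices for the example, not taken from any cited paper.

```python
# Dawid-Skene-style EM: infer true labels from several noisy labelers
# by alternating between estimating per-source confusion matrices
# (M-step) and posteriors over each item's true label (E-step).
import numpy as np

rng = np.random.default_rng(2)
N, S, K = 100, 4, 3                  # items, sources, classes
truth = rng.integers(0, K, size=N)
# Each source is correct ~70% of the time, else uniformly random.
labels = np.where(rng.random((S, N)) < 0.7, truth,
                  rng.integers(0, K, size=(S, N)))

# Initialize posteriors with a soft majority vote.
post = np.zeros((N, K))
for s in range(S):
    post[np.arange(N), labels[s]] += 1.0
post /= post.sum(axis=1, keepdims=True)

for _ in range(20):
    # M-step: confusion matrices conf[s, true, observed] and class prior.
    conf = np.full((S, K, K), 1e-6)
    for s in range(S):
        for k in range(K):
            conf[s, :, k] += post[labels[s] == k].sum(axis=0)
    conf /= conf.sum(axis=2, keepdims=True)
    prior = post.mean(axis=0)
    # E-step: recompute each item's posterior over its true label.
    log_post = np.tile(np.log(prior), (N, 1))
    for s in range(S):
        log_post += np.log(conf[s, :, labels[s]].T)
    post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)

print("accuracy of inferred labels:", (post.argmax(axis=1) == truth).mean())
```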
“…This global normalization approach is unlike all HMMs, where $(\mathbf{y}^{(i)}, \mathbf{t}^{(i)})$ is split into multiple unidirectionally dependent random variables (i.e., a set consisting of many $(y_l^{(i,j)}, d_l^{(i)})$) under strict independence assumptions, and each conditional probability distribution between random variables is normalized (e.g., the per-step local normalization in Li et al. [18,19]) to obtain the probability of the joint distribution $(\mathbf{y}^{(i)}, \mathbf{t}^{(i)})$. Simply put, our approach models and trains holistically (the learned knowledge is global), whereas the HMMs [18,19] decompose the modeling into multiple unidirectionally dependent local regions and model patterns at the scale of a single step (the learned knowledge is local). As a result of this holistic undirected graphical modeling, our method yields model parameters that are not constrained to probabilistic forms, and thus enjoys more flexible scoring.…”
Section: $c_j$
confidence: 99%
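The normalization contrast drawn in this passage can be shown in a few lines: a locally normalized (HMM-style) model turns each step's scores into a proper conditional distribution, while a globally normalized (CRF-style) model scores whole paths and divides once by a partition function computed with the forward trick. The scores below are arbitrary, and scipy's logsumexp is assumed to be available.

```python
# Local (per-step) vs. global (whole-sequence) normalization.
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(3)
K, T = 4, 6
trans = rng.normal(size=(T, K, K))   # unnormalized transition scores
unary = rng.normal(size=(T, K))      # unnormalized emission scores

# HMM-style: normalize every conditional step, constraining parameters
# to probability simplices.
log_local = trans - logsumexp(trans, axis=2, keepdims=True)
print("each local row sums to 1:",
      np.allclose(np.exp(log_local).sum(axis=2), 1.0))

# CRF-style: sum raw scores along a path; normalize once by the global
# log-partition over all K**T paths (forward algorithm in log space).
def log_partition(unary, trans):
    a = unary[0]
    for t in range(1, len(unary)):
        a = logsumexp(a[:, None] + trans[t], axis=0) + unary[t]
    return logsumexp(a)

path = rng.integers(0, K, size=T)
path_score = unary[np.arange(T), path].sum() + sum(
    trans[t, path[t - 1], path[t]] for t in range(1, T))
print("CRF log-probability of the path:",
      path_score - log_partition(unary, trans))
```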