Proceedings of the 22nd Conference on Computational Natural Language Learning 2018
DOI: 10.18653/v1/k18-1001

Embedded-State Latent Conditional Random Fields for Sequence Labeling

Abstract: Complex textual information extraction tasks are often posed as sequence labeling or shallow parsing, where fields are extracted using local labels made consistent through probabilistic inference in a graphical model with constrained transitions. Recently, it has become common to locally parametrize these models using rich features extracted by recurrent neural networks (such as LSTM), while enforcing consistent outputs through a simple linear-chain model, representing Markovian dependencies between successive…
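For readers unfamiliar with the setup the abstract describes, the sketch below shows the standard combination of LSTM-extracted local features with a linear-chain CRF over label transitions. It is a generic illustration under assumed sizes and names, not the paper's implementation.

```python
# Minimal sketch (not the paper's code): a linear-chain CRF on top of LSTM
# features. Emission scores come from a BiLSTM; a single transition matrix
# encodes Markovian dependencies between successive labels.
import torch
import torch.nn as nn

class LSTMCRF(nn.Module):
    def __init__(self, vocab_size, num_labels, emb_dim=64, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.emit = nn.Linear(2 * hidden_dim, num_labels)               # local (emission) scores
        self.trans = nn.Parameter(torch.zeros(num_labels, num_labels))  # transition scores

    def log_partition(self, emissions):
        # Forward algorithm: log-sum-exp over all label sequences.
        # emissions: (seq_len, num_labels) for a single sentence.
        alpha = emissions[0]
        for t in range(1, emissions.size(0)):
            # alpha[j] = logsumexp_i(alpha[i] + trans[i, j]) + emissions[t, j]
            alpha = torch.logsumexp(alpha.unsqueeze(1) + self.trans, dim=0) + emissions[t]
        return torch.logsumexp(alpha, dim=0)

    def neg_log_likelihood(self, tokens, labels):
        feats, _ = self.lstm(self.embed(tokens).unsqueeze(0))
        emissions = self.emit(feats.squeeze(0))                  # (seq_len, num_labels)
        gold = emissions[torch.arange(len(labels)), labels].sum()
        gold += self.trans[labels[:-1], labels[1:]].sum()
        return self.log_partition(emissions) - gold              # -log p(labels | tokens)

# Toy usage: a 3-token sentence with 4 possible labels.
model = LSTMCRF(vocab_size=100, num_labels=4)
loss = model.neg_log_likelihood(torch.tensor([5, 17, 3]), torch.tensor([0, 2, 1]))
loss.backward()
```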

Cited by 4 publications (7 citation statements) | References 16 publications
“…We train a variety of citation field extraction models including one based on RoBERTa [Liu et al., 2019]. We show that this model trained only on the UMass CFE dataset matches state-of-the-art results [Thai et al., 2018]. We then show that training the BERT-based model on our large automatically generated dataset drastically improves the results, outperforming the state-of-the-art approach by 1.2 points of F1, a 24.48% relative reduction in error.…”
Section: Introduction
confidence: 87%
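The statement above casts citation field extraction as token-level labeling with a pretrained encoder. A minimal sketch of that setup, assuming a Hugging Face RoBERTa token-classification head and an illustrative BIO label set (the field names, example string, and label alignment are placeholders, not the authors' code):

```python
# Hedged sketch: citation field extraction as token classification with a
# pretrained RoBERTa encoder. Labels, example, and sizes are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical citation-field label set (BIO scheme).
LABELS = ["O", "B-author", "I-author", "B-title", "I-title", "B-year"]
label2id = {l: i for i, l in enumerate(LABELS)}

tokenizer = AutoTokenizer.from_pretrained("roberta-base", add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained("roberta-base", num_labels=len(LABELS))

# One toy reference string, pre-tokenized into words with word-level labels.
words = ["Thai", ",", "D.", "Embedded-State", "Latent", "CRFs", ".", "2018", "."]
word_labels = ["B-author", "I-author", "I-author", "B-title", "I-title", "I-title", "O", "B-year", "O"]

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
# Align word-level labels to subword tokens; special tokens get -100 (ignored by the loss).
aligned = [-100 if wid is None else label2id[word_labels[wid]]
           for wid in enc.word_ids(batch_index=0)]
labels = torch.tensor([aligned])

loss = model(**enc, labels=labels).loss   # cross-entropy over token labels
loss.backward()                           # a real training loop would follow
```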
“…Low-rank structure has been explored in HMMs [Siddiqi et al., 2009], in a generalization of PCFGs called weighted tree automata [Rabusseau et al., 2015], and in conditional random fields [Thai et al., 2018]. The reduced-rank HMM [Siddiqi et al., 2009] has at most 50 states, and relies on spectral methods for training.…”
Section: Related Work
confidence: 99%
“…We extend the low-rank assumption to neural parameterizations, which have been shown to be effective for generalization [Kim et al., 2019, Chiu and Rush, 2020], and directly optimize the evidence via gradient descent. Finally, Thai et al. [2018] do not take advantage of the low-rank parameterization of their CRF potentials for faster inference via low-rank matrix products, a missed opportunity. Instead, the low-rank parameterization is used only as a regularizer, with the full potentials instantiated during inference.…”
Section: Related Work
confidence: 99%
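The speedup this statement refers to can be illustrated directly: if the transition potential matrix factors as a product of two thin nonnegative matrices, each step of the sum-product forward recursion costs two O(K·R) matrix-vector products instead of one O(K²) product. The sketch below is an assumption-laden illustration of that general idea, not any of the cited papers' code.

```python
# Forward (sum-product) recursion with full vs. low-rank transition potentials.
# Psi = A @ B.T with nonnegative rank-R factors, R << K labels.
import torch

K, R, T = 500, 16, 40                      # labels, rank, sequence length
A = torch.rand(K, R)                       # nonnegative low-rank factors
B = torch.rand(K, R)
Psi = A @ B.T                              # full K x K transition potentials
emis = torch.rand(T, K)                    # per-position emission potentials

def forward_full(Psi, emis):
    alpha = emis[0]
    for t in range(1, len(emis)):
        alpha = (alpha @ Psi) * emis[t]    # O(K^2) per step
        alpha = alpha / alpha.sum()        # rescale for numerical stability
    return alpha

def forward_lowrank(A, B, emis):
    alpha = emis[0]
    for t in range(1, len(emis)):
        alpha = ((alpha @ A) @ B.T) * emis[t]   # O(K*R) per step
        alpha = alpha / alpha.sum()
    return alpha

# Both recursions compute the same (rescaled) forward messages.
print(torch.allclose(forward_full(Psi, emis), forward_lowrank(A, B, emis), atol=1e-5))
```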
“…LDCRF has been shown to outperform HMM, CRF and HCRF on several sequence labeling tasks (Sun et al., 2008). Thai et al. (2018) recently proposed a very similar model called Embedded-State Latent CRFs. They claim to factorize the log potentials as the novelty over an LDCRF; however, such a factorization is not reflected in their model structure or mathematical descriptions.…”
Section: Literature Review
confidence: 99%
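For context, the factorization at issue in the last statement is a low-rank decomposition of the transition log-potential matrix via label embeddings; one common way to write it (illustrative notation, not necessarily the authors' exact formulation) is:

```latex
% Illustrative form of the factorization discussed above: each of the K labels
% gets a d-dimensional embedding, and the K x K matrix of transition
% log-potentials is constrained to rank d << K.
\[
  \log \Psi(y_{t-1} = i,\, y_t = j) \;=\; u_i^{\top} v_j ,
  \qquad u_i,\, v_j \in \mathbb{R}^{d}, \quad d \ll K ,
\]
\[
  \text{or in matrix form,} \qquad \log \Psi \;=\; U V^{\top},
  \qquad U,\, V \in \mathbb{R}^{K \times d} .
\]
```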