2008 · DOI: 10.1007/s11390-008-9157-4
Scaling Conditional Random Fields by One-Against-the-Other Decomposition

Abstract: As a powerful sequence labeling model, conditional random fields (CRFs) have had successful applications in many natural language processing (NLP) tasks. However, the high complexity of CRF training restricts it to a very small tag (or label) set, because training becomes intractable as the tag set grows. This paper proposes an improved decomposed training and joint decoding algorithm for CRF learning. Instead of training a single CRF model for all tags, it trains a binary sub-CRF independently for each tag…
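The one-against-the-other decomposition described in the abstract can be sketched as follows. This is an illustrative assumption of the setup, not the authors' code: the tag set, gold sequence, and `binarize` helper are invented for the example. Each tag gets its own binary sub-CRF, trained on a view of the corpus where every other tag collapses to a background label.

```python
# Sketch of one-against-the-other decomposition for sequence labeling.
# TAGS and the gold sequence are hypothetical; a real system would train
# one binary CRF per view and combine the sub-models at decoding time.

TAGS = ["NN", "VB", "JJ"]  # example tag set (assumed for illustration)

def binarize(sentence_tags, target):
    """Binary view for one sub-CRF: keep the target tag,
    collapse every other tag to the background label 'O'."""
    return [t if t == target else "O" for t in sentence_tags]

gold = ["NN", "VB", "NN", "JJ"]

# One binarized training view per tag; each view would feed one sub-CRF.
views = {tag: binarize(gold, tag) for tag in TAGS}
print(views["NN"])  # ['NN', 'O', 'NN', 'O']
```

Each sub-CRF thus sees only a two-label problem, which keeps training tractable even when the full tag set is large; the open question the paper addresses is how to decode the sub-models jointly.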

Cited by 3 publications (3 citation statements) · References 11 publications
“…CRFs [4,9] are the state-of-the-art approach in information extraction; they exploit sequence characteristics to improve labeling and have been widely used in many fields, such as NLP and IE tasks. To better incorporate two-dimensional neighborhood dependencies, Zhu et al. [1] propose a two-dimensional Conditional Random Fields (2DCRFs) model for semantic annotation of Web objects.…”
Section: Related Work
confidence: 99%
“…The other is the conditional random fields (CRFs) model [23] for supervised segmentation via character tagging, conventionally trained only on a pre-segmented corpus. The latter is a state-of-the-art approach that has set new performance records in the field, as illustrated in [52,55], although its efficiency remains to be improved by various means [58,59]. All scores given by the goodness measures are discretized in the same way for use as feature values in the CRFs model.…”
Section: Introduction
confidence: 99%
“…They adopted the construction features and time features of new words, but the experimental results were not good. Peng [10] treated Chinese word segmentation and new word identification as a unified step using the Conditional Random Field (CRF) model [11,12]; the method only detects new words and does not assign POS tags to them. Peng [10] also showed that the character-based model outperforms the word-based model in new word identification.…”
Section: Introduction
confidence: 99%
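The character-based model mentioned in the statement above is commonly realized by converting a segmented sentence into per-character position tags (the BMES scheme widely used in Chinese word segmentation). The sketch below shows only that generic conversion, under assumed helper names; it is not Peng's implementation.

```python
# Generic character-based view for word segmentation: each character of a
# word is tagged B (begin), M (middle), E (end), or S (single-character word).
# Function names are illustrative, not from any cited system.

def word_to_char_tags(word):
    """BMES tags for one word."""
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]

def segment_to_tags(words):
    """Flatten a segmented sentence into a per-character tag sequence."""
    return [tag for word in words for tag in word_to_char_tags(word)]

print(segment_to_tags(["中国", "人"]))  # ['B', 'E', 'S']
```

A CRF trained over such character tags can label unseen character strings, which is why the character-based formulation handles new words that a word-based model has never observed.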