Proceedings of the 6th Conference on Natural Language Learning - COLING-02, 2002
DOI: 10.3115/1118853.1118871
A comparison of algorithms for maximum entropy parameter estimation

Abstract: Conditional maximum entropy (ME) models provide a general-purpose machine learning technique which has been successfully applied to fields as diverse as computer vision and econometrics, and which is used for a wide variety of classification problems in natural language processing. However, the flexibility of ME models is not without cost. While parameter estimation for ME models is conceptually straightforward, in practice ME models for typical natural language tasks are very large, and may well contain many …
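The abstract's claim that estimation is conceptually straightforward can be made concrete: fitting a conditional ME model means maximizing the conditional log-likelihood, whose gradient is the difference between the model's expected feature counts and the empirical counts. The sketch below is a minimal illustration rather than the paper's implementation; it runs that objective through L-BFGS (the method the paper's comparison favors) via scipy, and the toy data, dimensions, and feature representation are assumptions made for the example.

# A minimal sketch, not the paper's implementation: conditional maximum
# entropy estimation as maximization of the conditional log-likelihood,
# optimized with L-BFGS. Toy data, sizes, and the class-copied feature
# representation are illustrative assumptions. Requires numpy and scipy.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d, k = 200, 10, 3                    # examples, features, classes (assumed)
X = rng.normal(size=(n, d))             # context features
y = rng.integers(0, k, size=n)          # observed classes

def objective(w_flat):
    W = w_flat.reshape(d, k)
    scores = X @ W
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the softmax
    P = np.exp(scores)
    P /= P.sum(axis=1, keepdims=True)             # conditional model p(y|x)
    ll = np.log(P[np.arange(n), y]).sum()         # conditional log-likelihood
    Y = np.zeros((n, k))
    Y[np.arange(n), y] = 1.0
    grad = X.T @ (P - Y)      # expected minus empirical feature counts
    return -ll, grad.ravel()  # minimize the negative log-likelihood

res = minimize(objective, np.zeros(d * k), jac=True, method="L-BFGS-B")
print("converged:", res.success, "negative log-likelihood:", res.fun)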

Cited by 417 publications (319 citation statements). References: 17 publications.
“…Thus, we iterate between belief propagation, analytical Maximum Likelihood updates and Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) in an expectation maximization routine, helping scalability and handling computational restrictions. We note that L-BFGS is, to date, the preferred algorithm for fitting maximum-entropy models and CRFs, and has largely surpassed generalized iterative scaling algorithms for use with MEMMs (Malouf (2002)). …”
Section: Parameter and Location Estimation (citation type: mentioning, confidence: 99%)
“…Formal details of the disambiguation model are presented in [25]. For training the maximum entropy models, we use an implementation by [14].…”
Section: Maximum Entropy Disambiguation Model (citation type: mentioning, confidence: 99%)
“…The traditional method for training CRFs is iterative scaling algorithms [6,21]. Since those methods are very slow for classification [20], we use quasi-Newton methods, such as L-BFGS [8], which are significantly more efficient [10,20].…”
Section: Training CRFs (citation type: mentioning, confidence: 99%)
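For contrast with the quasi-Newton approach endorsed in the quoted passages, here is a hedged sketch of Generalized Iterative Scaling (GIS), the family of methods those passages describe L-BFGS as having surpassed. It reuses the toy setup from the earlier sketch; the iteration count and smoothing constant are arbitrary illustrative choices. One detail worth noting: with class-copied binary features, the total feature count for a pair (x, c) depends only on x, so the slack feature GIS normally requires cancels in the conditional distribution and its weight can be omitted.

# A hedged sketch of Generalized Iterative Scaling (GIS), the iterative
# scaling family that the citing papers say L-BFGS has largely surpassed.
# Same toy setup as the L-BFGS sketch above; epsilon and the iteration
# count are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 10, 3
X = (rng.random((n, d)) < 0.3).astype(float)   # binary context features
y = rng.integers(0, k, size=n)

# GIS requires every (x, c) pair to have the same total feature count C.
# With class-copied features that total is sum(x), independent of c, so
# the usual slack feature depends only on x and cancels in p(c|x); its
# weight is therefore omitted here.
C = max(X.sum(axis=1).max(), 1.0)

def model_probs(W):
    scores = X @ W
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the softmax
    P = np.exp(scores)
    return P / P.sum(axis=1, keepdims=True)

Y = np.zeros((n, k))
Y[np.arange(n), y] = 1.0
observed = X.T @ Y                     # empirical feature counts, (d, k)

W = np.zeros((d, k))
eps = 1e-10                            # guard against log(0) for unseen pairs
for _ in range(100):
    expected = X.T @ model_probs(W)    # model feature expectations
    # GIS update: lambda_i += (1/C) * log(observed_i / expected_i)
    W += np.log((observed + eps) / (expected + eps)) / C
print("log-likelihood:", np.log(model_probs(W)[np.arange(n), y]).sum())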