Conditional maximum entropy (ME) models provide a general-purpose machine learning technique that has been successfully applied to fields as diverse as computer vision and econometrics, and that is used for a wide variety of classification problems in natural language processing. However, the flexibility of ME models is not without cost. While parameter estimation for ME models is conceptually straightforward, in practice ME models for typical natural language tasks are very large and may well contain many thousands of free parameters. In this paper, we consider a number of algorithms for estimating the parameters of ME models, including iterative scaling, gradient ascent, conjugate gradient, and variable metric methods. Surprisingly, the standardly used iterative scaling algorithms perform quite poorly in comparison to the others, and on all of the test problems a limited-memory variable metric algorithm outperformed the other choices.
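The estimation problem the abstract describes can be made concrete with a minimal sketch. The code below trains a tiny conditional maxent (multinomial logistic) model by plain gradient ascent, one of the simpler methods the paper compares; the data, feature vectors, learning rate, and iteration count are all invented for illustration and are not from the paper.

```python
import math

# Toy conditional maxent model: two classes, two binary features,
# trained by gradient ascent on the conditional log-likelihood.
# All data and hyperparameters here are illustrative assumptions.

# Training data: (feature vector, class label). Feature 0 perfectly
# predicts class 0; feature 1 is uninformative.
data = [((1.0, 0.0), 0), ((1.0, 1.0), 0), ((0.0, 1.0), 1), ((0.0, 0.0), 1)]
n_classes, n_feats = 2, 2
w = [[0.0] * n_feats for _ in range(n_classes)]  # one weight vector per class

def probs(x):
    """p(y | x) under the current weights (softmax over class scores)."""
    scores = [sum(wy[i] * x[i] for i in range(n_feats)) for wy in w]
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]

# Gradient of the conditional log-likelihood:
# observed feature counts minus expected feature counts under the model.
lr = 0.5
for _ in range(200):
    grad = [[0.0] * n_feats for _ in range(n_classes)]
    for x, y in data:
        p = probs(x)
        for c in range(n_classes):
            for i in range(n_feats):
                grad[c][i] += ((1.0 if c == y else 0.0) - p[c]) * x[i]
    for c in range(n_classes):
        for i in range(n_feats):
            w[c][i] += lr * grad[c][i]

print(probs((1.0, 0.0))[0])  # close to 1: feature 0 now signals class 0
```

The iterative scaling and limited-memory variable metric (L-BFGS) methods the paper evaluates optimize the same objective; only the update rule in the training loop differs.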
Crosslinguistically, inflectional morphology exhibits a spectacular range of complexity in both the structure of individual words and the organization of the systems that words participate in. We distinguish two dimensions in the analysis of morphological complexity. Enumerative complexity (E-complexity) reflects the number of morphosyntactic distinctions that languages make and the strategies employed to encode them, concerning either the internal composition of words or the arrangement of classes of words into inflection classes. This, we argue, is constrained by integrative complexity (I-complexity). The I-complexity of an inflectional system reflects the difficulty that a paradigmatic system poses for language users (rather than lexicographers) in information-theoretic terms. This becomes clear by distinguishing average paradigm entropy from average conditional entropy. The average entropy of a paradigm is the uncertainty in guessing the realization for a particular cell of the paradigm of a particular lexeme (given knowledge of the possible exponents). This gives one a measure of the complexity of a morphological system—systems with more exponents and more inflection classes will in general have higher average paradigm entropy—but it presupposes a problem that adult native speakers will never encounter. In order to know that a lexeme exists, the speaker must have heard at least one word form, so in the worst case a speaker will be faced with predicting a word form based on knowledge of one other word form of that lexeme. Thus, a better measure of morphological complexity is the average conditional entropy: the average uncertainty in guessing the realization of one randomly selected cell in the paradigm of a lexeme given the realization of one other randomly selected cell. This is the I-complexity of paradigm organization.
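The contrast between the two measures can be sketched on a toy system. The three inflection classes and suffixes below are invented for illustration (they are not from the paper); the point is that knowing one cell of a paradigm reduces the uncertainty about another, so the average conditional entropy comes out lower than the average paradigm entropy.

```python
import math
from collections import Counter, defaultdict

# Toy inflectional system: three equiprobable inflection classes,
# each realizing two paradigm cells (sg, pl) with a suffix.
# Classes and suffixes are invented for illustration.
classes = {
    "I":   {"sg": "-a", "pl": "-i"},
    "II":  {"sg": "-o", "pl": "-i"},
    "III": {"sg": "-o", "pl": "-e"},
}
p_class = 1.0 / len(classes)
cells = ["sg", "pl"]

def entropy(dist):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Average paradigm entropy: mean uncertainty of each cell in isolation,
# knowing only the possible exponents and their frequencies.
cell_H = []
for c in cells:
    dist = Counter()
    for forms in classes.values():
        dist[forms[c]] += p_class
    cell_H.append(entropy(dist))
avg_paradigm_H = sum(cell_H) / len(cell_H)

# Average conditional entropy: uncertainty about one cell given the
# realization of another, averaged over ordered cell pairs.
pair_H = []
for known in cells:
    for target in cells:
        if known == target:
            continue
        joint = defaultdict(Counter)
        for forms in classes.values():
            joint[forms[known]][forms[target]] += p_class
        h = 0.0
        for targets in joint.values():
            p_known = sum(targets.values())
            cond = {t: p / p_known for t, p in targets.items()}
            h += p_known * entropy(cond)
        pair_H.append(h)
avg_cond_H = sum(pair_H) / len(pair_H)

print(avg_paradigm_H, avg_cond_H)  # conditional entropy is lower
```

In this toy system, hearing the singular "-a" fully determines the plural, while "-o" leaves a two-way choice, so the conditional measure (about 0.67 bits) is lower than the cell-by-cell measure (about 0.92 bits), mirroring the paper's distinction between E-complexity and I-complexity.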
Viewed from this information-theoretic perspective, languages that appear to differ greatly in their E-complexity—the number of exponents, inflectional classes, and principal parts—can actually be quite similar in terms of the challenge they pose for a language user who already knows how the system works. We adduce evidence for this hypothesis from three sources: a comparison between languages of varying degrees of E-complexity, a case study from the particularly challenging conjugational system of Chiquihuitlán Mazatec, and a Monte Carlo simulation modeling the encoding of morphosyntactic properties into formal expressions. The results of these analyses provide evidence for the crucial status of words and paradigms for understanding morphological organization.
Humans show an amazing ability to produce novel words based on previous experience. What analogical processes are at work in this process, and how do analogical generalizations emerge from complex morphological systems? This chapter addresses these questions with new quantitative measures. Words are construed as recombinant gestalts. The predictive value of particular words in relation to others is calculated in terms of measures of conditional entropy. When applied to Tundra Nenets nominal paradigms, the model captures central aspects of morphological organization and learning.
(Research paper)
Purpose: To evaluate and extend existing natural language processing techniques into the domain of informal online political discussions.
Design/methodology/approach: A database of postings from a U.S. political discussion site was collected, along with self-reported political orientation data for the users. A variety of sentiment analysis, text classification, and social network analysis methods were applied to the postings and evaluated against the users' self-descriptions.
Findings: Purely text-based methods performed poorly, but could be improved using techniques which took into account the users' position in the online community.
Research limitations: The techniques we applied here are fairly simple, and more sophisticated learning algorithms may yield better results for text-based classification.
Practical implications: This work suggests that social network analysis is an important tool for performing natural language processing tasks with informal web texts.