Michael Collins scite author profile

Testing metrics for password creation policies by attacking large sets of revealed passwords

In this paper we attempt to determine the effectiveness of using entropy, as defined in NIST SP800-63, as a measurement of the security provided by various password creation policies. This is accomplished by modeling the success rate of current password cracking techniques against real user passwords. These data sets were collected from several different websites, the largest one containing over 32 million passwords. This focus on actual attack methodologies and real user passwords quite possibly makes this one of the largest studies on password security to date. In addition, we examine what these results mean for standard password creation policies, such as minimum password length and character set requirements.

Fusion of Detected Objects in Text for Visual Question Answering

Alberti, Ling, Collins et al. 2019

140

To advance models of multimodal context, we introduce a simple yet powerful neural architecture for data that combines vision and natural language. The "Bounding Boxes in Text Transformer" (B2T2) also leverages referential information binding words to portions of the image in a single unified architecture. B2T2 is highly effective on the Visual Commonsense Reasoning benchmark, achieving a new state-of-the-art with a 25% relative reduction in error rate compared to published baselines and obtaining the best performance to date on the public leaderboard (as of May 22, 2019). A detailed ablation analysis shows that the early integration of the visual features into the text analysis is key to the effectiveness of the new architecture. A reference implementation of our models is provided.

Information Density and Dependency Length as Complementary Cognitive Models

Collins 2013, J Psycholinguist Res

Certain English constructions permit two syntactic alternations. (1) a. I looked up the number. b. I looked the number up. (2) a. He is often at the office. b. He often is at the office. This study investigates the relationship between syntactic alternations and processing difficulty. What cognitive mechanisms are responsible for our attraction to some alternations and our aversion to others? This article reviews three psycholinguistic models of the relationship between syntactic alternations and processing: Maximum Per Word Surprisal (building on the ideas of Hale, in Proceedings of the 2nd Meeting of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Pittsburgh, PA, pp 159-166, 2001), Uniform Information Density (UID) (Levy and Jaeger in Adv Neural Inf Process Syst 19:849-856, 2007; inter alia), and Dependency Length Minimization (DLM) (Gildea and Temperley in Cognit Sci 34:286-310, 2010). Each theory makes predictions about which alternations native speakers should favor. Subjects were recruited using Amazon Mechanical Turk and asked to judge which of two competing syntactic alternations sounded more natural. Logistic regression analysis on the resulting data suggests that both UID and DLM are powerful predictors of human preferences. We conclude that alternations that approach uniform information density and minimize dependency length are easier to process than those that do not.

Tutorial: Machine Learning Methods in Natural Language Processing

Collins 2003

Statistical or machine learning approaches have become quite prominent in the Natural Language Processing literature. Common techniques include generative models such as Hidden Markov Models or Probabilistic Context-Free Grammars, and more general noisy-channel models such as the statistical approach to machine translation pioneered by researchers at IBM in the early 90s. Recent work has considered discriminative methods such as (conditional) Markov random fields, or large-margin methods. This tutorial will describe several of these techniques. The methods will be motivated through a number of natural language problems: from part-of-speech tagging and parsing, to machine translation, dialogue systems, and information extraction problems. I will also concentrate on links to the COLT and kernel methods literature: for example, covering kernels over the discrete structures found in NLP, online algorithms for NLP problems, and the issues in extending generalization bounds from classification problems to NLP problems such as parsing.

Implementing Criterion-Referenced Assessment

Wong, Briguglio, Singh et al. 2007

Spoken language translation with mid-90's technology: a case study

Rayner, Bretan, Carter et al. 1993

Proceedings of the 2003 conference on Empirical methods in natural language processing -

Collins, Steedman 2003

Genre and Form in Working-Class Life Writing, from Haymarket to the New Deal

Collins 2017

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Part of the Research Solutions Family.

Michael Collins

Testing metrics for password creation policies by attacking large sets of revealed passwords
