Thitidej Tularak scite author profile

Thitidej Tularak

2 Publications

26 Citation Statements Received

46 Citation Statements Given

Affiliations

Cornell University

Publications

Classification of Homogeneous Data With Large Alphabets

Kelly, Wagner, Tularak et al. 2013. IEEE Trans. Inform. Theory

Given training sequences generated by two distinct, but unknown, distributions sharing a common alphabet, we study the problem of determining whether a third test sequence was generated according to the first or second distribution using only the training data. To better model sources such as natural language, for which the underlying distributions are difficult to learn, we allow the alphabet size to grow and therefore the probability distributions to change with the blocklength. Our primary focus is the situation in which the underlying probabilities are all of the same order, and in this regime we give conditions on the alphabet growth rate and distributions guaranteeing the existence of universally consistent tests, i.e., tests having a probability of error tending to zero with the blocklength for any underlying distributions. We show that some commonly used statistical tests are universally consistent provided the alphabet growth is sub-linear in the blocklength, but that these tests are inconsistent for linear growth rates. We then propose a classifier that is universally consistent with up to quadratic alphabet growth and show that no classifier can handle the case in which the alphabet grows quadratically or faster. If the tester is given the underlying distributions in place of the training data, we prove that consistent testing is possible regardless of the growth of the underlying alphabet. Our results are then used to illuminate the problem of classifying arbitrary (i.e., non-homogeneous) distributions on growing alphabets.

Universal hypothesis testing in the learning-limited regime

Kelly, Tularak, Wagner et al. 2010

Given training sequences generated by two distinct, but unknown, distributions sharing a common alphabet, we seek a classifier that can correctly decide whether a third test sequence is generated by the first or second distribution using only the training data. To model 'limited learning' we allow the alphabet size to grow and therefore the probability distributions to change with the blocklength. We prove that a natural choice, namely a generalized likelihood ratio test, is universally consistent (has a probability of error tending to zero with the blocklength for all underlying distributions) when the alphabet size is sub-linear in the blocklength, but inconsistent for linear alphabet growth. For up to quadratic alphabet growth, in a regime where all probabilities are of the same order, we prove the universal consistency of a new test and show there are no such tests when the alphabet grows quadratically or faster.
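
For readers unfamiliar with the setup, the following is a minimal sketch of one textbook form of a generalized likelihood ratio test for this classification problem. It is an illustration under stated assumptions, not necessarily the exact statistic analyzed in the paper: each hypothesis is scored by the maximized log-likelihood obtained when the test sequence is pooled with one of the training sequences. The function names `max_log_likelihood` and `glrt_classify` are hypothetical.

```python
from collections import Counter
from math import log


def max_log_likelihood(seq):
    """Log-likelihood of seq under its own empirical distribution,
    i.e. the maximum over all i.i.d. distributions on the alphabet."""
    n = len(seq)
    return sum(c * log(c / n) for c in Counter(seq).values())


def glrt_classify(train1, train2, test):
    """Return 1 if the test sequence is better explained by the source of
    train1, else 2 (illustrative pooled-likelihood comparison; ties go to 1)."""
    train1, train2, test = list(train1), list(train2), list(test)
    # Hypothesis 1: test shares a distribution with train1.
    h1 = max_log_likelihood(train1 + test) + max_log_likelihood(train2)
    # Hypothesis 2: test shares a distribution with train2.
    h2 = max_log_likelihood(train1) + max_log_likelihood(train2 + test)
    return 1 if h1 >= h2 else 2


if __name__ == "__main__":
    x = "a" * 90 + "b" * 10   # training data from the first source
    y = "a" * 10 + "b" * 90   # training data from the second source
    print(glrt_classify(x, y, "a" * 9 + "b"))
    print(glrt_classify(x, y, "b" * 9 + "a"))
```

With a small fixed alphabet and long training sequences, as in this toy example, the empirical distributions are reliable. The abstract's point is that this kind of test stops being consistent once the alphabet grows linearly with the blocklength, since most symbols are then observed too rarely to estimate.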
