Jean Nonnemaker scite author profile

Jean Nonnemaker

4 Publications

28 Citation Statements Received

14 Citation Statements Given


Affiliations

Lehigh University

Publications


Using synthetic data safely in classification

Nonnemaker

Baird

2009


When is it safe to use synthetic data in supervised classification? Trainable classifier technologies require large, representative training sets consisting of samples labeled with their true class. Acquiring such training sets is difficult and costly. One way to alleviate this problem is to enlarge training sets by generating artificial, synthetic samples. Of course, this immediately raises many questions, perhaps the first being "Why should we trust artificially generated data to be an accurate representative of the real distributions?" Other questions include "When will training on synthetic data work as well as, or better than, training on real data?" We distinguish between sample space (the set of real samples), generator space (all samples that can be generated synthetically), and finally, feature space (the set of samples in terms of finite numerical values). In this paper, we discuss a series of experiments in which we produced synthetic data in generator space, that is, by convex interpolation among the generating parameters for samples, and showed that we could amplify real data to produce a classifier that is as accurate as a classifier trained on real data. Specifically, we have explored the feasibility of varying the generating parameters for Knuth's Metafont system to see if previously unseen fonts could also be recognized.
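The idea of convex interpolation among generating parameters can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `convex_interpolate`, the three-element parameter vectors, and their values are hypothetical stand-ins for real generator parameters such as those of Metafont.

```python
import random

def convex_interpolate(param_sets, rng):
    """Return a random convex combination of the given parameter vectors."""
    # Positive weights normalised to sum to 1, so the result stays inside
    # the convex hull of the input parameter vectors.
    raw = [rng.expovariate(1.0) for _ in param_sets]
    total = sum(raw)
    weights = [w / total for w in raw]
    dim = len(param_sets[0])
    return [sum(w * p[i] for w, p in zip(weights, param_sets))
            for i in range(dim)]

# Hypothetical generating parameters for three known fonts (made-up values).
fonts = [
    [0.20, 1.00, 0.05],
    [0.35, 0.80, 0.10],
    [0.25, 0.90, 0.00],
]

rng = random.Random(0)  # fixed seed so the run is repeatable
synthetic = [convex_interpolate(fonts, rng) for _ in range(5)]
for params in synthetic:
    print([round(v, 3) for v in params])
```

Each synthetic vector would then be passed to the generator to render new training samples; because the weights are non-negative and sum to 1, every interpolated parameter lies between the smallest and largest value seen among the real fonts.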


Versatile document image content extraction

Baird

Moll

Nonnemaker

et al. 2006


We offer a preliminary report on a research program to investigate versatile algorithms for document image content extraction, that is, locating regions containing handwriting, machine-print text, graphics, line-art, logos, photographs, noise, etc. To solve this problem in its full generality requires coping with a vast diversity of document and image types. Automatically trainable methods are highly desirable, as well as extremely high speed in order to process large collections. Significant obstacles include the expense of preparing correctly labeled ("ground-truthed") samples, unresolved methodological questions in specifying the domain (e.g., what is a representative collection of document images?), and a lack of consensus among researchers on how to evaluate content-extraction performance. Our research strategy emphasizes versatility first: that is, we concentrate at the outset on designing methods that promise to work across the broadest possible range of cases. This strategy has several important implications: the classifiers must be trainable in reasonable time on vast data sets, and expensive ground-truthed data sets must be complemented by amplification using generative models. These and other design and architectural issues are discussed. We propose a trainable classification methodology that marries k-d trees and hash-driven table lookup and describe preliminary experiments.
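The abstract names the combination of k-d trees and hash-driven table lookup without giving details, so the following Python sketch is only one possible reading, not the authors' design. It assumes midpoint cuts that cycle through the feature dimensions (k-d style) and uses a Python dictionary as the hash table; the function names, the two-dimensional features, and the labels are all hypothetical.

```python
from collections import Counter, defaultdict

def bin_key(x, lows, highs, depth):
    """Bit string from `depth` midpoint cuts, cycling through dimensions."""
    lo, hi = list(lows), list(highs)
    bits = []
    for d in range(depth):
        axis = d % len(x)
        mid = (lo[axis] + hi[axis]) / 2.0
        if x[axis] < mid:
            bits.append("0")
            hi[axis] = mid   # keep the lower half along this axis
        else:
            bits.append("1")
            lo[axis] = mid   # keep the upper half along this axis
    return "".join(bits)

def train(samples, lows, highs, depth):
    """Count class labels per bin; the dict acts as the hash table."""
    table = defaultdict(Counter)
    for x, label in samples:
        table[bin_key(x, lows, highs, depth)][label] += 1
    return table

def classify(table, x, lows, highs, depth):
    """Return the majority label of the bin containing x, or None if empty."""
    counts = table.get(bin_key(x, lows, highs, depth))
    return counts.most_common(1)[0][0] if counts else None

# Made-up 2-D feature vectors with content-type labels.
samples = [((0.10, 0.10), "text"), ((0.20, 0.15), "text"),
           ((0.90, 0.80), "photo")]
lows, highs, depth = (0.0, 0.0), (1.0, 1.0), 4
table = train(samples, lows, highs, depth)
near_text = classify(table, (0.12, 0.20), lows, highs, depth)
unseen_bin = classify(table, (0.60, 0.10), lows, highs, depth)
```

In this sketch, training and classification each cost one pass of `depth` comparisons plus a single hash lookup per sample, which is the kind of speed the abstract asks for; a query that falls in a bin with no training samples returns `None` rather than a guess.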


Demis

Jiang

Kessler

Nonnemaker

2002


Modern interaction systems are usually event-driven. New input devices often require new event types, and handling input from the user becomes increasingly complex. Frequently, the WIMP (Windows, Icons, Menus, Pointer) paradigm widely used today is not suitable for interactive applications, such as virtual reality applications, that use more than the standard mouse and keyboard input devices. In this paper, we present the design and implementation of the Dynamic Event Model for Interactive System (DEMIS). DEMIS is a middleware between the operating system and the application that supports various input device events while using generic event recognition to detect composite events.
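Composite-event detection of the kind described can be sketched in Python as follows. This is not the DEMIS API, which the abstract does not show: the `Event` and `SequenceRecognizer` classes, the event names, and the 0.5-second window are all hypothetical, and the sketch only covers the simple case of a fixed sequence of primitive events arriving within a time window.

```python
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    time: float  # seconds

class SequenceRecognizer:
    """Emit a composite event when `pattern` occurs in order within `window`."""

    def __init__(self, composite_name, pattern, window):
        self.composite_name = composite_name
        self.pattern = pattern
        self.window = window
        self._matched = []  # primitive events matched so far

    def feed(self, event):
        # Discard a partial match whose first event is now too old.
        if self._matched and event.time - self._matched[0].time > self.window:
            self._matched = []
        expected = self.pattern[len(self._matched)]
        if event.name == expected:
            self._matched.append(event)
        elif event.name == self.pattern[0]:
            self._matched = [event]   # restart the match from this event
        else:
            self._matched = []
        if len(self._matched) == len(self.pattern):
            self._matched = []
            return Event(self.composite_name, event.time)
        return None

# Device-independent names: any input device could produce "down"/"up".
rec = SequenceRecognizer("double_click", ["down", "up", "down", "up"], 0.5)
stream = [Event("down", 0.0), Event("up", 0.1),
          Event("down", 0.2), Event("up", 0.3)]
composites = [c for c in (rec.feed(e) for e in stream) if c is not None]
```

The recognizer depends only on event names and timestamps, so the same pattern would apply to a mouse button, a wand trigger, or a glove gesture, which is the sense in which such recognition can be called generic.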


Demis

Jiang

Kessler

Nonnemaker

2002


