Abstract. Spam filtering is a text categorization task that has attracted significant attention due to the steadily growing volume of junk email on the Internet. While current best-practice systems use Naive Bayes filtering and other probabilistic methods, we propose a statistical but non-probabilistic classifier based on the Winnow algorithm. The feature space considered by most current methods is either limited in expressivity or imposes a large computational cost. We introduce orthogonal sparse bigrams (OSB) as a feature combination technique that overcomes both of these weaknesses. By combining Winnow and OSB with refined preprocessing and tokenization techniques, we reach an accuracy of 99.68% on a difficult test corpus, compared to the 98.88% previously reported by the CRM114 classifier on the same corpus.
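As a rough illustration of the OSB idea described above, the sketch below pairs each token with every later token inside a small sliding window, marking skipped positions with a placeholder so that each pair also encodes its distance. The window size, the `<skip>` marker, and the function name are illustrative choices, not taken from the paper.

```python
def osb_features(tokens, window=5):
    """Sketch of orthogonal sparse bigram (OSB) feature generation.

    Each token is combined with every following token within the
    window; "<skip>" placeholders record how many positions were
    skipped, so "a b" and "a <skip> c" are distinct features.
    """
    features = []
    for i in range(len(tokens)):
        for dist in range(1, window):
            j = i + dist
            if j >= len(tokens):
                break
            # one joint feature per (token, distance, token) triple
            features.append(
                " ".join([tokens[i]] + ["<skip>"] * (dist - 1) + [tokens[j]])
            )
    return features
```

For example, `osb_features("buy cheap pills now".split(), window=3)` yields pairs such as `"buy cheap"` and `"buy <skip> pills"`, giving more expressive features than unigrams at far lower cost than full n-gram expansion.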
Abstract. The purpose of information extraction (IE) is to find desired pieces of information in natural language texts and store them in a form suitable for automatic processing. Providing annotated training data to adapt a trainable IE system to a new domain requires a considerable amount of work. To address this, we explore incremental learning: training documents are annotated sequentially by a user and immediately incorporated into the extraction model. The system can thus support the user by proposing extractions based on the current extraction model, reducing the user's workload over time. We introduce an approach that models IE as a token classification task and allows incremental training. To provide sufficient information to the token classifiers, we use rich, tree-based context representations of each token as feature vectors. These representations draw on the heuristically deduced document structure in addition to linguistic and semantic information. We consider the resulting feature vectors as ordered and combine proximate features into more expressive joint features, called "Orthogonal Sparse Bigrams" (OSB). Our results indicate that this setup makes it possible to employ IE in an incremental fashion without a serious performance penalty.
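Both abstracts build on the Winnow algorithm, a mistake-driven linear classifier with multiplicative weight updates that lends itself to exactly the incremental, one-document-at-a-time training described above. The following is a minimal sketch of the classic Winnow scheme; the promotion/demotion factors, the per-feature threshold, and the class names are illustrative defaults, not the parameters used in either paper.

```python
class Winnow:
    """Minimal sketch of the Winnow classifier (mistake-driven,
    multiplicative updates). Weights start at 1.0 when a feature
    is first seen; they are promoted by `alpha` on false negatives
    and demoted by `beta` on false positives.
    """

    def __init__(self, alpha=1.5, beta=0.5, threshold=1.0):
        self.alpha = alpha          # promotion factor (> 1)
        self.beta = beta            # demotion factor (< 1)
        self.threshold = threshold  # per-feature decision threshold
        self.weights = {}

    def score(self, features):
        # sum of weights of the active features (unseen features -> 1.0)
        return sum(self.weights.setdefault(f, 1.0) for f in features)

    def predict(self, features):
        # compare the average weight against the per-feature threshold
        return self.score(features) >= self.threshold * len(features)

    def train(self, features, label):
        """Update only on mistakes: promote active weights for missed
        positives, demote them for false positives."""
        if self.predict(features) != label:
            factor = self.alpha if label else self.beta
            for f in features:
                self.weights[f] *= factor
```

Because each `train` call touches only the weights of the active features, a single annotated document can be folded into the model immediately, which is what makes the incremental workflow in the second abstract feasible.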