A two-tier model for the description of morphological, syntactic and semantic variations of multi-word terms is presented. It is applied to term normalization of French and English corpora in the medical and agricultural domains. Five different sources of morphological and semantic knowledge are exploited (MULTEXT, CELEX, AGROVOC, WordNet 1.6, and the Microsoft Word97 thesaurus).
Abstract. Recent developments in computational terminology call for the design of multiple and complementary tools for the acquisition, the structuring and the exploitation of terminological data. This paper proposes to bridge the gap between term acquisition and thesaurus construction by offering a framework for the automatic structuring of multi-word candidate terms with the help of corpus-based links between single-word terms. Hypernym links acquired through an information extraction procedure are projected on multi-word terms through the recognition of semantic variations. The induced hierarchy is incomplete but provides an automatic generalization of single-word term relations to the multi-word terms that are pervasive in technical thesauri and corpora.
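The projection step described above can be illustrated with a small sketch on hypothetical data (this is an interpretation of the abstract, not the authors' implementation): a hypernym link between two single words is lifted to a pair of multi-word terms that differ only by that word pair in the same position.

```python
# Sketch: project single-word hypernym links onto multi-word terms.
# Hypothetical data; illustrates the induction idea only.

single_hypernyms = {("apple", "fruit"), ("wheat", "cereal")}  # (hyponym, hypernym)

terms = ["apple juice", "fruit juice", "wheat flour", "cereal flour", "apple tree"]

def induced_links(terms, single_hypernyms):
    """Induce hypernym links between multi-word terms that differ in
    exactly one position by a known (hyponym, hypernym) word pair."""
    links = set()
    split = [t.split() for t in terms]
    for a in split:
        for b in split:
            if len(a) != len(b):
                continue
            diffs = [(x, y) for x, y in zip(a, b) if x != y]
            if len(diffs) == 1 and diffs[0] in single_hypernyms:
                links.add((" ".join(a), " ".join(b)))
    return links

print(sorted(induced_links(terms, single_hypernyms)))
# → [('apple juice', 'fruit juice'), ('wheat flour', 'cereal flour')]
```

Note that "apple tree" induces no link: no candidate term substitutes a known hypernym of "apple" or "tree" in the same position, which is why the induced hierarchy remains incomplete.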
Terms are often supposed not to be prone to variation. Empirical observation of terms in various corpora (telecommunication, physics, medicine) shows, on the contrary, the quantitative and qualitative importance of term variation. We give a precise linguistic description of the rules relating controlled terms to observed variants, and of the constraints on these rules. This description leads to novel means of enriching terminologies, either by generating possible term variants or by simplifying nominal parse trees in order to discover potential variants.
A system for the automatic production of controlled index terms is presented using linguistically-motivated techniques. This includes a finite-state part-of-speech tagger, a derivational morphological processor for analysis and generation, and a unification-based shallow-level parser using transformational rules over syntactic patterns. The contribution of this research is the successful combination of parsing over a seed term list coupled with derivational morphology to achieve greater coverage of multi-word terms for indexing and retrieval. Final results are evaluated for precision and recall, and implications for indexing and retrieval are discussed.
Array-based comparative genomic hybridization (aCGH) is a powerful tool to detect genomic imbalances in the human genome. The analysis of aCGH data sets has revealed the existence of a widespread technical artifact termed 'waves', characterized by an undulating data profile along the chromosome. Here, we describe the development of a novel noise-reduction algorithm, the waves aCGH correction algorithm (WACA), based on GC content and fragment size correction. WACA efficiently removes the wave artifact, thereby greatly improving the accuracy of aCGH data analysis. We describe the application of WACA to both real and simulated aCGH data sets, and demonstrate that our algorithm, by systematically correcting for all known sources of bias, is a significant improvement on existing aCGH noise reduction algorithms. WACA and associated files are freely available as Supplementary Data.
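The correction principle — model the probe log2 ratios as a function of GC content, then subtract the fitted trend — can be sketched as follows. This is an illustrative least-squares stand-in on simulated data, not the published WACA code (which also corrects for fragment size and other biases):

```python
# Illustrative GC-content wave correction: regress log2 ratios on probe
# GC content and subtract the fitted trend. A simple least-squares
# stand-in for the published WACA algorithm, not its actual code.
import random

def correct_gc_wave(log2_ratios, gc_content):
    n = len(log2_ratios)
    mean_gc = sum(gc_content) / n
    mean_r = sum(log2_ratios) / n
    cov = sum((g - mean_gc) * (r - mean_r) for g, r in zip(gc_content, log2_ratios))
    var = sum((g - mean_gc) ** 2 for g in gc_content)
    slope = cov / var
    intercept = mean_r - slope * mean_gc
    return [r - (intercept + slope * g) for r, g in zip(log2_ratios, gc_content)]

def std(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

# Simulated probes: a GC-driven "wave" plus measurement noise
random.seed(0)
gc = [random.uniform(0.3, 0.7) for _ in range(1000)]
ratios = [2.0 * (g - 0.5) + random.gauss(0, 0.05) for g in gc]

corrected = correct_gc_wave(ratios, gc)
print(std(ratios) > 0.2, std(corrected) < 0.1)  # wave variance removed
```

After correction, only the simulated measurement noise remains, which is the intended effect of removing the undulating GC-driven profile.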
Motivation

Terms are known to be excellent descriptors of the informational content of textual documents (Srinivasan, 1996), but they are subject to numerous linguistic variations. Terms cannot be retrieved properly with coarse text-simplification techniques (e.g. stemming); their identification requires precise and efficient NLP techniques. We have developed a domain-independent system for automatic term recognition from unrestricted text. The system presented in this paper takes as input a list of controlled terms and a corpus; it detects and marks occurrences of term variants within the corpus. The system takes as input a precompiled (automatically or manually) term list and transforms it dynamically into a more complete term list by adding automatically generated variants.

Acknowledgments: We would like to thank the NLP Group of Columbia University, Bell Laboratories - Lucent Technologies, and the Institut Universitaire de Technologie de Nantes for their support of the exchange visitor program for the first author. We also thank the Institut de l'Information Scientifique et Technique (INIST-CNRS) for providing us with the agricultural corpus and the associated term list, and Didier Bourigault for providing us with terms extracted from the newspaper corpus through LEXTER (Bourigault, 1993).
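The variant-detection step described above — marking corpus occurrences of a controlled term's variants — can be sketched with a regular expression that tolerates inserted modifiers between a term's content words. This is an illustrative simplification only; the actual system uses a part-of-speech tagger, a morphological processor and a shallow parser rather than regexes:

```python
# Sketch: match insertion variants of a controlled multi-word term,
# e.g. "blood cell" also matching "blood mononuclear cell(s)".
# Illustrative only; not the paper's tagger/parser-based system.
import re

def variant_pattern(term):
    """Build a regex matching the term's words in order, each with an
    optional plural 's', allowing up to two intervening words."""
    parts = [re.escape(w) + r"s?" for w in term.split()]
    gap = r"\s+(?:\w+\s+){0,2}"
    return re.compile(r"\b" + gap.join(parts) + r"\b", re.I)

pattern = variant_pattern("blood cell")
text = "Peripheral blood cells and blood mononuclear cells were isolated."
print([m.group(0) for m in pattern.finditer(text)])
# → ['blood cells', 'blood mononuclear cells']
```

Such a surface pattern over-generates (any two intervening words are accepted); the linguistically-motivated pipeline constrains insertions to valid modifiers and handles derivational variants (e.g. "cellular") that no plural-tolerant regex can capture.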
This method extends the limits of term extraction as currently practiced in the IR community: it takes into account the multiple morphological and syntactic ways linguistic concepts are expressed within language. Our approach is a unique hybrid in allowing the use of manually produced precompiled data as input, combined with fully automatic computational methods for generating term expansions. Our results indicate that we can expand term variations by at least 30% within a scientific corpus.

Background and Introduction

NLP techniques have been applied to the extraction of information from corpora for tasks such as free indexing (extraction of descriptors from corpora) (Metzler and Haas, 1989; Schwarz, 1990; Sheridan and Smeaton, 1992; Strzalkowski, 1996), term acquisition (Smadja and McKeown, 1991; Bourigault, 1993; Justeson and Katz, 1995; Daille, 1996), and the extraction of linguistic information, e.g. support verbs (Grefenstette and Teufel, 1995) and the event structure of verbs (Klavans and Chodorow, 1992). Although useful, these approaches suffer from two weaknesses, which we address. First is the issue...