Olivier Ferret scite author profile

Due to the compelling improvements brought by BERT, many recent representation models adopted the Transformer architecture as their main building block, consequently inheriting the wordpiece tokenization system despite it not being intrinsically linked to the notion of Transformers. While this system is thought to achieve a good balance between the flexibility of characters and the efficiency of full words, using predefined wordpiece vocabularies from the general domain is not always suitable, especially when building models for specialized domains (e.g., the medical domain). Moreover, adopting a wordpiece tokenization shifts the focus from the word level to the subword level, making the models conceptually more complex and arguably less convenient in practice. For these reasons, we propose CharacterBERT, a new variant of BERT that drops the wordpiece system altogether and uses a Character-CNN module instead to represent entire words by consulting their characters. We show that this new model improves the performance of BERT on a variety of medical domain tasks while at the same time producing robust, word-level, and open-vocabulary representations.

show abstract

Neural Architecture for Temporal Relation Extraction: A Bi-LSTM Approach for Detecting Narrative Containers

Tourille¹,

Ferret²,

Névéol³

et al. 2017

View full text Add to dashboard Cite

show abstract

Generative Event Schema Induction with Entity Disambiguation

Nguyen

Tannier

Ferret

et al. 2015

View full text Add to dashboard Cite

This paper presents a generative model to event schema induction. Previous methods in the literature only use head words to represent entities. However, elements other than head words contain useful information. For instance, an armed man is more discriminative than man. Our model takes into account this information and precisely represents it using probabilistic topic distributions. We illustrate that such information plays an important role in parameter estimation. Mostly, it makes topic distributions more coherent and more discriminative. Experimental results on benchmark dataset empirically confirm this enhancement.

show abstract

Multimodal Entity Linking for Tweets

Adjali

Besançon

Ferret

et al. 2020

View full text Add to dashboard Cite

In many information extraction applications, entity linking (EL) has emerged as a crucial task that allows leveraging information about named entities from a knowledge base. In this paper, we address the task of multimodal entity linking (MEL), an emerging research field in which textual and visual information is used to map an ambiguous mention to an entity in a knowledge base (KB). First, we propose a method for building a fully annotated Twitter dataset for MEL, where entities are defined in a Twitter KB. Then, we propose a model for jointly learning a representation of both mentions and entities from their textual and visual contexts. We demonstrate the effectiveness of the proposed model by evaluating it on the proposed dataset and highlight the importance of leveraging visual information when it is available.

show abstract

Searching News Articles Using an Event Knowledge Graph Leveraged by Wikidata

Rudnik

Ehrhart

Ferret

et al. 2019

View full text Add to dashboard Cite

News agencies produce thousands of multimedia stories describing events happening in the world that are either scheduled such as sports competitions, political summits and elections, or breaking events such as military conflicts, terrorist attacks, natural disasters, etc. When writing up those stories, journalists refer to contextual background and to compare with past similar events. However, searching for precise facts described in stories is hard. In this paper, we propose a general method that leverages the Wikidata knowledge base to produce semantic annotations of news articles. Next, we describe a semantic search engine that supports both keyword based search in news articles and structured data search providing filters for properties belonging to specific event schemas that are automatically inferred.

show abstract

Using collocations for topic segmentation and link detection

Ferret

2002

View full text Add to dashboard Cite

We present in this paper a method for achieving in an integrated way two tasks of topic analysis: segmentation and link detection. This method combines word repetition and the lexical cohesion stated by a collocation network to compensate for the respective weaknesses of the two approaches. We report an evaluation of our method for segmentation on two corpora, one in French and one in English, and we propose an evaluation measure that specifically suits that kind of systems.

show abstract

LIMSI-COT at SemEval-2016 Task 12: Temporal relation identification using a pipeline of classifiers

Tourille¹,

Ferret²,

Névéol³

et al. 2016

View full text Add to dashboard Cite

show abstract

LIMSI-COT at SemEval-2017 Task 12: Neural Architecture for Temporal Information Extraction from Clinical Narratives

Tourille¹,

Ferret²,

Tannier³

et al. 2017

View full text Add to dashboard Cite

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

334 Leonard St

Brooklyn, NY 11211

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Olivier Ferret

CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters

Neural Architecture for Temporal Relation Extraction: A Bi-LSTM Approach for Detecting Narrative Containers

Generative Event Schema Induction with Entity Disambiguation

Multimodal Entity Linking for Tweets

Searching News Articles Using an Event Knowledge Graph Leveraged by Wikidata

Using collocations for topic segmentation and link detection

LIMSI-COT at SemEval-2016 Task 12: Temporal relation identification using a pipeline of classifiers

LIMSI-COT at SemEval-2017 Task 12: Neural Architecture for Temporal Information Extraction from Clinical Narratives

Contact Info

Product

Resources

About