Motivation Recent neural approaches to event extraction from text have mainly focused on flat events in the general domain, with fewer attempts to detect nested and overlapping events. Existing systems are built on given entities and depend on external syntactic tools. Results We propose an end-to-end neural nested event extraction model named DeepEventMine that extracts multiple overlapping directed acyclic graph structures from a raw sentence. On top of the Bidirectional Encoder Representations from Transformers (BERT) model, our model detects nested entities and triggers, roles, nested events and their modifications in an end-to-end manner without any syntactic tools. DeepEventMine achieves new state-of-the-art performance on seven biomedical nested event extraction tasks. Even when gold entities are unavailable, the model can detect events from raw text with promising performance. Availability and implementation Our code and models to reproduce the results are available at: https://github.com/aistairc/DeepEventMine Supplementary information Supplementary data are available at Bioinformatics online.
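The key idea that lets such a model handle nested and overlapping structures is to score every candidate text span independently rather than tagging tokens with a single label each. The following is a minimal sketch of that span-enumeration idea; the class, dimensions, and random stand-in for BERT token encodings are our assumptions for illustration, not DeepEventMine's actual layers.

```python
import torch
import torch.nn as nn

class SpanClassifier(nn.Module):
    """Enumerate all spans up to max_width and classify each one.

    Because every span is scored independently, nested and overlapping
    entity/trigger spans can both receive labels, unlike token-level
    BIO tagging (illustrative sketch only)."""
    def __init__(self, hidden=32, num_labels=5, max_width=4):
        super().__init__()
        self.max_width = max_width
        # span representation = [start-token vector; end-token vector]
        self.scorer = nn.Linear(2 * hidden, num_labels)

    def forward(self, token_reprs):            # (seq_len, hidden)
        seq_len = token_reprs.size(0)
        spans, scores = [], []
        for i in range(seq_len):
            for j in range(i, min(i + self.max_width, seq_len)):
                rep = torch.cat([token_reprs[i], token_reprs[j]])
                spans.append((i, j))
                scores.append(self.scorer(rep))
        return spans, torch.stack(scores)      # (num_spans, num_labels)

# random encodings for a 6-token sentence, standing in for BERT outputs
reprs = torch.randn(6, 32)
spans, scores = SpanClassifier()(reprs)
```

In the full model, detected trigger and entity spans are then paired to predict role edges, from which the event DAGs are assembled.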
Existing biomedical coreference resolution systems depend on features and/or rules based on syntactic parsers. In this paper, we investigate the utility of a state-of-the-art general-domain neural coreference resolution system on biomedical texts. The system is end-to-end and does not depend on any syntactic parsers. We also investigate domain-specific features to enhance the system for biomedical texts. Experimental results on the BioNLP Protein Coreference dataset and the CRAFT corpus show that, with no parser information, the adapted system compares favorably with systems that depend on parser information, achieving F1 scores of 51.23% on the BioNLP dataset and 36.33% on the CRAFT corpus. In-domain embeddings and domain-specific features helped improve performance on the BioNLP dataset, but not on the CRAFT corpus.
This paper describes our system developed for the coreference resolution task of the CRAFT Shared Tasks 2019. The CRAFT corpus is more challenging than other existing corpora because it contains full-text articles. We employ an existing span-based state-of-the-art neural coreference resolution system as a baseline and enhance it with two techniques to capture long-distance coreferent pairs. First, we filter noisy mentions based on parse trees while increasing the number of antecedent candidates. Second, instead of relying on LSTMs, we integrate the highly expressive language model BERT into our model. Experimental results show that our proposed systems significantly outperform the baseline. The best-performing system obtained F-scores of 44%, 48%, 39%, 49%, 40%, and 57% on the test set with the B³, BLANC, CEAFE, CEAFM, LEA, and MUC metrics, respectively. Additionally, the proposed model is able to detect coreferent pairs at long distances, even of more than 200 sentences.
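Span-based coreference systems of this family resolve each mention by scoring it against its preceding candidate antecedents plus a dummy "no antecedent" option; widening that candidate window is what allows long-distance links. The sketch below illustrates that antecedent-scoring step with a toy feed-forward scorer; the class, dimensions, and random stand-in for span embeddings are our assumptions, not the shared-task system's exact architecture.

```python
import torch
import torch.nn as nn

class AntecedentScorer(nn.Module):
    """Score each mention against all preceding mentions plus a dummy
    'no antecedent' option (index == mention index), as in span-based
    coreference models (illustrative sketch only)."""
    def __init__(self, dim=16):
        super().__init__()
        # pair features: [mention; antecedent; elementwise product]
        self.ffnn = nn.Sequential(
            nn.Linear(3 * dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, mentions):                   # (num_mentions, dim)
        n = mentions.size(0)
        best = []
        for i in range(1, n):
            pairs = torch.cat([mentions[i].expand(i, -1),
                               mentions[:i],
                               mentions[i] * mentions[:i]], dim=-1)
            scores = self.ffnn(pairs).squeeze(-1)  # (i,)
            # dummy antecedent with fixed score 0 means "start a new chain"
            scores = torch.cat([scores, torch.zeros(1)])
            best.append(int(scores.argmax()))
        return best           # chosen antecedent index for each mention

mentions = torch.randn(5, 16)   # stand-in for BERT-based span embeddings
links = AntecedentScorer()(mentions)
```

In practice the candidate set is pruned and capped, which is why increasing the number of antecedent candidates matters for full-text articles.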
Phrase-based machine translation (MT) systems require large bilingual corpora for training. Nevertheless, such corpora are unavailable for most language pairs in the world, creating a bottleneck for the development of MT. The Asian language pairs studied here (Japanese, Indonesian, and Malay paired with Vietnamese) are no exception: no large bilingual corpora exist for these low-resource pairs. Furthermore, although these languages are widely used, there is no prior work on MT between them, which hinders its development. In this article, we conduct an empirical study of leveraging additional resources to improve MT for these Asian low-resource language pairs: translation from Japanese, Indonesian, and Malay to Vietnamese. Our approach rests on two strategies: building bilingual corpora from comparable data, and phrase pivot translation over existing bilingual corpora of each language paired with English. Bilingual corpora were built from Wikipedia bilingual titles to enlarge the bilingual data for the low-resource languages. Additionally, we introduce a model that combines the additional resources into an effective solution for improving MT on these languages. Experimental results show the effectiveness of our systems, with improvements of +2 to +7 BLEU points. This work contributes to the development of MT for low-resource languages and opens a promising direction for progress on these Asian language pairs.
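Phrase pivot translation triangulates two phrase tables through the shared pivot language: p(tgt | src) is estimated by summing p(tgt | e) · p(e | src) over English pivot phrases e. The toy function below illustrates this triangulation; the function name and miniature phrase tables are ours, and real Moses-style tables carry four feature scores and require pruning.

```python
from collections import defaultdict

def pivot_phrase_table(src_pivot, pivot_tgt):
    """Triangulate two phrase tables through a pivot language:
    p(tgt | src) = sum over pivot phrases e of p(tgt | e) * p(e | src).
    Toy illustration of phrase pivoting."""
    table = defaultdict(float)
    for src, pivots in src_pivot.items():
        for e, p_e_src in pivots.items():
            for tgt, p_tgt_e in pivot_tgt.get(e, {}).items():
                table[(src, tgt)] += p_e_src * p_tgt_e
    return dict(table)

# toy Japanese->English and English->Vietnamese phrase probabilities
ja_en = {"犬": {"dog": 0.9, "hound": 0.1}}
en_vi = {"dog": {"chó": 1.0}, "hound": {"chó": 0.8}}
ja_vi = pivot_phrase_table(ja_en, en_vi)
# ja_vi[("犬", "chó")] combines both pivot paths: 0.9*1.0 + 0.1*0.8
```

Summing over all pivot paths is what lets a Japanese–Vietnamese table emerge even though the two languages never co-occur in the training corpora.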
This paper explores a novel approach to developing a dialogue system that can hold entertaining conversations with users. It proposes a method to improve current goal-driven dialogue systems, which support users in specific tasks, by satisfying users' goals through entertaining conversation. It then develops a dialogue system in which a set of features is considered to generate entertaining conversations while reasonably prolonging the originally too-short dialogue. The game refinement measure is employed to assess attractiveness, since a conversation in a dialogue system can be seen as a process in which a player makes shots or moves to win a game. Dialogues generated by the proposed method are evaluated by human subjects, and the results confirm the effectiveness of the proposed method. The present idea is a promising way to realize dialogue systems with entertaining conversations, although further investigation is needed.
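The game refinement measure is commonly given as R = sqrt(G) / T, where G is the average number of successful outcomes (e.g. goals) and T the average number of attempts per game. The sketch below computes it for a classic sports example; how the paper maps dialogue turns onto G and T is not specified here, so that mapping remains an assumption of this illustration.

```python
import math

def game_refinement(goals, attempts):
    """Game refinement value R = sqrt(G) / T.

    G: average number of successful outcomes per game.
    T: average number of attempts per game.
    Values around 0.07-0.08 are often cited as a comfortable
    zone of attractiveness for well-tuned games."""
    return math.sqrt(goals) / attempts

# e.g. association football: roughly 2.64 goals from 22 shots per game
r = game_refinement(2.64, 22)
```

In a dialogue setting, one would treat conversational "successes" as G and exchanged turns as T, then tune the dialogue length so R stays near that comfortable zone.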