Deep Learning Architecture for Complex Word Identification

Hertog, Dirk De; Tack, Anaïs

doi:10.18653/v1/w18-0539

Cited by 16 publications

(9 citation statements)

References 12 publications


“…Our system is ranked directly above TMU [62]. This system is based on the frequency of the target word in a Wikipedia Corpus and a learner corpus subsequently trained on a random forest classifier, as well as a deep learning architecture with word/char embeddings, word length and frequency counts named NLP-CIC [63].…”

Section: Discussion (mentioning)

confidence: 99%

Lexical Simplification System to Improve Web Accessibility

2021


People with intellectual, language and learning disabilities face accessibility barriers when reading texts with complex words. Following accessibility guidelines, complex words can be identified, and easy synonyms and definitions can be provided for them as reading aids. To offer support to these reading aids, a lexical simplification system for Spanish has been developed and is presented in this article. The system covers the complex word identification (CWI) task and offers replacement candidates with the substitute generation and selection (SG/SS) task. These tasks have followed machine learning techniques and contextual embeddings using Easy Reading and Plain Language resources, such as dictionaries and corpora. Additionally, due to the polysemy present in the language, the system provides definitions for complex words, which are disambiguated by a rule-based method supported by a state-of-the-art embedding resource. This system is integrated into a web system that provides an easy way to improve the readability and comprehension of Spanish texts. The results obtained are satisfactory; in the CWI task, better results were obtained than with other systems that used the same dataset. The SG/SS task results are comparable to similar works in the English language and provide a solid starting point to improve this task for the Spanish language. Finally, the results of the disambiguation process evaluation were good when evaluated by a linguistic expert. These findings represent an additional advancement in the lexical simplification of texts in Spanish and in a generic domain using easy-to-read resources, among others, to provide systematic support to compliance with accessibility guidelines.


“…The submitted systems mainly use traditional machine learning classifiers (e.g. SVM, Random Forests) with features (Butnaru and Ionescu, 2018; Kajiwara and Komachi, 2018), deep learning methods (Hartmann and Dos Santos, 2018; De Hertog and Tack, 2018) and ensemble methods (Gooding and Kochmar, 2018; Aroyehun et al., 2018). More recently, (Gooding and Kochmar, 2019) propose a new perspective by treating CWI as a sequence labeling task that can detect both complex words and phrases.…”

Section: Complex Word Identification (mentioning)

confidence: 99%

DeepBlueAI at SemEval-2021 Task 1: Lexical Complexity Prediction with A Deep Ensemble Approach

Pan, Song, Wang et al. 2021

Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)


Lexical complexity plays an important role in reading comprehension. Lexical complexity prediction (LCP) can be used not only as part of Lexical Simplification systems, but also as a stand-alone application to help people read better. This paper presents the winning system we submitted to the LCP Shared Task of SemEval 2021, which is capable of dealing with both subtasks. We first perform fine-tuning on a number of pre-trained language models (PLMs) with various hyperparameters and different training strategies such as pseudo-labelling and data augmentation. Then an effective stacking mechanism is applied on top of the fine-tuned PLMs to obtain the final prediction. Experimental results on the Complex dataset show the validity of our method and we rank first and second for subtask 2 and 1.

Multi-words

Context 1: SEM confirmed many of the observations made by confocal microscopy. Complexity score: 0.64473

Context 2: SJ and SVJ carried out confocal microscopy on whole-mounts of stria vascularis. Complexity score: 0.7750

Single word

Context 1: They shall be to you for a refuge from the avenger of blood. Complexity score: 0.3475

Context 2: There will be a pavilion for a shade in the daytime from the heat, and for a refuge and for a shelter from storm and from rain.


“…ITEC addresses both the binary and probabilistic classification task for the English and Spanish multilingual datasets (De Hertog and Tack, 2018). They have used 5 different aspects of the target word in the process of feature extractions, namely, word embedding, morphological structure, psychological measures, corpus counts, and topical information.…”

Section: Shared Task Systems (mentioning)

confidence: 99%

A Report on the Complex Word Identification Shared Task 2018

Yimam, Biemann, Malmasi et al. 2018

Proceedings of the Thirteenth Workshop on Innovative Use of NLP For Building Educational Applications

Self Cite


We report the findings of the second Complex Word Identification (CWI) shared task organized as part of the BEA workshop colocated with NAACL-HLT'2018. The second CWI shared task featured multilingual and multi-genre datasets divided into four tracks: English monolingual, German monolingual, Spanish monolingual, and a multilingual track with a French test set, and two tasks: binary classification and probabilistic classification. A total of 12 teams submitted their results in different task/track combinations and 11 of them wrote system description papers that are referred to in this report and appear in the BEA workshop proceedings.
