A Report on the Complex Word Identification Shared Task 2018

Yimam, Seid Muhie; Biemann, Chris; Malmasi, Shervin; Paetzold, Gustavo Henrique; Specia, Lucia; Štajner, Sanja; Tack, Anaïs; Zampieri, Marcos

doi:10.18653/v1/w18-0507

Cited by 92 publications

(108 citation statements)

References 32 publications

“…word length and frequency) [17]. Among these markers, especially frequency has been treated in detail and ascertained multiple times to have a strong relation to word difficulty based on a variety of evaluation methods ranging from decision trees to deep recurrent neural networks [18], [19] and not only in English but other languages as well [20], [21]. This relation is possibly due to the fact that frequent use of a word or word-family can enhance peoples' familiarity to it (e.g.…”

Section: Background and Related Work (mentioning)

confidence: 99%

An Algorithm for Automatic Collation of Vocabulary Decks Based on Word Frequency

Yücel, Supitayakul, Monden, et al. 2020

IEICE Trans. Inf. & Syst.

This study focuses on computer-based foreign language vocabulary learning systems. Our objective is to automatically build vocabulary decks with desired relative difficulty relations. To this end, we exploit the fact that word frequency is a good indicator of vocabulary difficulty. For composing the decks, we pose two requirements: uniformity and diversity. Namely, the difficulty levels of the cards in the same deck need to be uniform enough that they can be grouped together, and the difficulty levels of the cards in different decks need to be diverse enough that they can be placed in different decks. To assess uniformity and diversity, we use rank-biserial correlation and propose an iterative algorithm that helps attain the desired levels of uniformity and diversity based on word frequency in daily language use. In experiments, we employed spaced repetition flashcard software and presented users with various decks built with the proposed algorithm, containing cards of different content types. From users' activity logs, we derived several behavioral variables and examined the polyserial correlation between these variables and difficulty levels across different word classes. This analysis confirmed that the decks compiled with the proposed algorithm have an effect on behavioral variables in line with expectations. In addition, a series of experiments with decks involving varying content types confirmed that this relation is independent of word class.

“…Simplification can be targeted by identifying complex words (e.g. Paetzold and Specia, 2016; Yimam et al, 2018), and then performing lexical simplification (e.g. Glavaš and Štajner, 2015; Glavaš and Vulić, 2018; Horn et al, 2014; Kriz et al, 2018).…”

Section: Related Work (mentioning)

confidence: 99%

Metaphors in Text Simplification: To change or not to change, that is the question

Clausen, Năstase 2019

Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

We present an analysis of metaphors in news text simplification. Using features that capture general and metaphor-specific characteristics, we test whether we can automatically identify which metaphors will be changed or preserved, and whether some features have different predictive power for metaphors and for literal words. The experiments show that Age of Acquisition is the most distinctive feature for both metaphors and literal words. Features that capture Imageability and Concreteness are useful when used alone, but within the full set of features they lose their impact. Frequency of use seems to be the best feature for differentiating metaphors that should be changed from those that should be preserved.

“…We foresee several directions of future work for this research: evaluating other machine learning (ML) methods for the task besides Logistic Regression, for example Decision Trees, Random Forest, Bagging, Boosting, and SVM; increasing the number of features in the ML-based approach, using word length, number of syllables, number of senses, and synonyms in thesauri; and evaluating advanced deep learning models, which are a trend in the field for this task. Yimam, S. M., Biemann, C., Malmasi, S., Paetzold, G. H., Specia, L., Štajner, S., Tack, A., and Zampieri, M. (2018). A report on the complex word identification shared task 2018. arXiv preprint arXiv:1804.09132.…”

Section: Conclusions and Future Work (unclassified)

“…There are some tools for Brazilian Portuguese such as the Flesch Index [30], which is adapted for Portuguese and used in the Microsoft Word, and mainly the Coh-Metrix-Port and AIC, developed in the PorSimples project [3], whose goal is to simplify Web texts for people with poor literacy levels. These tools, however, do not meet the needs of educators in the classroom: there are no classifiers able to discriminate the level of complexity of each year focus of this study -3rd to 7th years, using metrics of the many language levels.…”

Section: Introduction (mentioning)

confidence: 99%

“…being focused on is replaced by a synonym, which can ask for adjustments in the writing of the words of the sentence, such as the adequacy of gender and/or number. In recent years there has been great activity in this field of research, especially for English [17,4,29,31,30], but also for other languages such as Japanese and multilingual and cross-lingual scenarios [12,13,32,30]. Only two studies focus on children [12,13].…”

mentioning

confidence: 99%

Adaptação lexical automática em textos informativos para o Ensino Fundamental

Hartmann

The teaching of reading and text comprehension in schools should be natural; that is, it should respect each student's individuality while providing motivation for these activities. Any problem in the stages of developing these skills can lead to blocks or loss of interest, since each child has their own pace. Matching the complexity level of a text to a student's reading ability is decisive for the student to progress in their own way and reach the reading comprehension levels expected for a grade or cycle. The general objective of this work is to advance research on Lexical Adaptation in Portuguese, covering Lexical Simplification and Lexical Elaboration, so that a more complex informative text can be adapted for elementary school (Ensino Fundamental) students, who will then be able to understand its content better. This work makes several contributions, among them: (i) compilation of three lexicons delimiting lexical complexity across school years, through a selection of dictionaries suggested by the Programa Nacional do Livro Didático (National Textbook Program); (ii) compilation of a corpus of 7,645 informative texts written for children attending elementary school; (iii) compilation of a corpus of 36,413 subtitles of films and series in the Family and Animation genres, which illustrates the material children hear in daily life; (iv) creation of a repository with psycholinguistic metrics for 26,874 Brazilian Portuguese words; (v) creation of the SIMPLEX-PB dataset, containing 1,582 instances with sentences and identified complex words. The dataset also includes a short definition of each complex word, a binary indication of whether the complex word is a technical term, 38 lexical complexity features for each complex word in the resource, and lists of synonyms for the complex word, ranked by the children; (vi) a search for Lexical Elaboration patterns and for occurrences of elaboration of the complex words in SIMPLEX-PB (about 50% of the elaborations found were performed on words marked as technical terms); (vii) evaluation of methods for Complex Word Identification for Brazilian Portuguese; and (viii) evaluation of Complex Word Identification methods in the Lexical Simplification task, considering elementary school children as the target audience. The methods resulting from this doctoral project, called Adap2Kids, can be used to support content generation for children and can also be applied to material presented in the classroom, so as to adapt its lexicon to the needs of each elementary school cycle.
