2019
DOI: 10.13053/cys-23-3-3271

Multi-Head Multi-Layer Attention to Deep Language Representations for Grammatical Error Detection

Abstract: It is known that a deep neural network model pre-trained with large-scale data greatly improves the accuracy of various tasks, especially when there are resource constraints. However, the information needed to solve a given task can vary, and simply using the output of the final layer is not necessarily sufficient. Moreover, to our knowledge, exploiting large language representation models to detect grammatical errors has not yet been studied. In this work, we investigate the effect of utilizing information no…
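The abstract argues that the final BERT layer alone may not carry all the information a task needs. The sketch below is our own illustration of that idea, not the authors' implementation: a simplified scalar attention over all hidden layers feeding a per-token error classifier. It assumes the HuggingFace transformers library; the class name, checkpoint (bert-base-cased), and example sentence are placeholders, and the paper's actual multi-head attention formulation is richer than this single-weight-per-layer variant.

```python
# Sketch only: weighted combination of all BERT layers for token-level GED.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast


class MultiLayerAttentionGED(nn.Module):
    def __init__(self, model_name="bert-base-cased", num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name, output_hidden_states=True)
        num_layers = self.bert.config.num_hidden_layers + 1  # embedding layer + 12 encoder layers
        # One learnable scalar per layer, normalized with softmax (a simplification of the paper's attention).
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        hidden = torch.stack(outputs.hidden_states, dim=0)        # (layers, batch, seq, hidden)
        weights = torch.softmax(self.layer_logits, dim=0).view(-1, 1, 1, 1)
        mixed = (weights * hidden).sum(dim=0)                      # weighted sum over layers
        return self.classifier(mixed)                              # per-token correct/incorrect logits


tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = MultiLayerAttentionGED()
batch = tokenizer(["He go to school yesterday ."], return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])        # shape: (1, seq_len, 2)
```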

Cited by 20 publications (23 citation statements)
References 16 publications
Citing publications: 2019–2023

Citation statements, ordered by relevance:
“…BERT can integrate information in raw corpora (BooksCorpus and English Wikipedia) while considering task-specific information contained in the target dataset. Kaneko and Komachi (2019) use BERT contextualized representations to achieve state-of-the-art results for word-based GED tasks. In addition to improving results in the GED task, BERT (Devlin et al., 2018) has been shown to be a powerful feature extractor for various other tasks.…”
Section: Sentence Embedding
confidence: 99%
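As a side note to the quoted "Sentence Embedding" discussion, here is a minimal, hedged sketch of using BERT as a frozen feature extractor and mean-pooling final-layer token vectors into a sentence embedding; the checkpoint name and the pooling choice are our assumptions, not taken from the cited papers.

```python
# Sketch only: frozen BERT as a feature extractor for a sentence embedding.
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

sentence = "She have many books on her shelf ."
enc = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    out = bert(**enc)

mask = enc["attention_mask"].unsqueeze(-1)                          # mask out padding positions
embedding = (out.last_hidden_state * mask).sum(1) / mask.sum(1)     # mean-pooled vector, shape (1, 768)
```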
“…The goal of their study was to predict the token-level labels on a sentence level using the attention mechanism for zero-shot sequence labeling. Kaneko and Komachi (2019) proposed a model applying attention to each layer of BERT for GED and achieved state-of-the-art results in word-level GED tasks. Our BERT model predicts grammatical quality on a sentence level for re-ranking.…”
Section: Related Work
confidence: 99%
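The quoted passage describes scoring grammatical quality at the sentence level and using the score for re-ranking. A minimal sketch of that workflow, assuming a sequence classifier whose positive label means "grammatical", might look like the following; the checkpoint and label convention are placeholders rather than the cited authors' setup, and the classification head here is untrained.

```python
# Sketch only: rank candidate sentences by a sentence-level grammaticality score.
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
scorer = BertForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2).eval()

candidates = [
    "He goes to school every day .",
    "He go to school every day .",
]

def grammaticality(sentence: str) -> float:
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = scorer(**enc).logits
    # Assumption: label index 1 means "grammatical"; a freshly initialized head gives arbitrary scores.
    return torch.softmax(logits, dim=-1)[0, 1].item()

best = max(candidates, key=grammaticality)   # keep the highest-scoring candidate
```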
“…Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019) can consider information from large-scale raw corpora as well as task-specific information by fine-tuning on the target task corpora. Moreover, BERT is known to be effective in distinguishing grammatical sentences from ungrammatical ones (Kaneko and Komachi, 2019). They proposed a grammatical error detection (GED) model based on BERT that achieved state-of-the-art results in word-level GED tasks.…”
Section: Introduction
confidence: 99%
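The quoted passage refers to fine-tuning BERT on target-task corpora for word-level GED. A minimal fine-tuning sketch, treating GED as binary token classification with subword label alignment, is shown below; the optimizer, learning rate, and label scheme are typical assumptions rather than the configuration reported in the paper.

```python
# Sketch only: one fine-tuning step for word-level GED as binary token classification.
import torch
from transformers import BertForTokenClassification, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = BertForTokenClassification.from_pretrained("bert-base-cased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

words = ["He", "go", "to", "school", "yesterday", "."]
labels_per_word = [0, 1, 0, 0, 0, 0]   # 1 marks the erroneous token ("go")

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
# Map word-level labels onto subword tokens; special tokens get -100 and are ignored by the loss.
labels = [labels_per_word[i] if i is not None else -100 for i in enc.word_ids(0)]
labels = torch.tensor([labels])

loss = model(**enc, labels=labels).loss
loss.backward()
optimizer.step()
```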