Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning (2019)
DOI: 10.18653/v1/k19-2007

HIT-SCIR at MRP 2019: A Unified Pipeline for Meaning Representation Parsing via Efficient Training and Effective Encoding

Abstract: This paper describes our system (HIT-SCIR) for the CoNLL 2019 shared task: Cross-Framework Meaning Representation Parsing. We extended the basic transition-based parser with two improvements: a) Efficient Training by realizing stack LSTM parallel training; b) Effective Encoding via adopting deep contextualized word embeddings BERT (Devlin et al., 2019). Generally, we proposed a unified pipeline to meaning representation parsing, including framework-specific transition-based parsers, BERT-enhanced word represent…
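The "effective encoding" half of this recipe is easy to picture in code. The sketch below is an illustrative reconstruction, not the authors' implementation: it produces one contextual vector per word by averaging BERT wordpiece states, where the model name and the averaging pooling strategy are assumptions.

```python
# Hedged sketch: BERT-enhanced word representations for a parser.
# Model choice and wordpiece pooling are assumptions, not the paper's setup.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
bert = AutoModel.from_pretrained("bert-base-cased")

def encode_words(words):
    """Return one contextual vector per input word (averaged wordpieces)."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]  # (n_pieces, 768)
    word_ids = enc.word_ids()                      # wordpiece -> word index
    vecs = [hidden[[j for j, w in enumerate(word_ids) if w == i]].mean(dim=0)
            for i in range(len(words))]
    return torch.stack(vecs)                       # (n_words, 768)

print(encode_words(["Parsing", "meaning", "representations"]).shape)
```

A transition-based parser can then consume these vectors as the word-level input to its stack LSTM, which is where the paper's batched ("parallel") training would apply.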

Cited by 33 publications (62 citation statements)
References 32 publications
“…The table compares ERG parsing results to a selection of 'real' submissions to the shared task, viz. the top performers within each framework and for the task overall: HIT-SCIR (Che et al., 2019), Peking (Chen et al., 2019), SJTU-NICT (Bai and Zhao, 2019), and SUDA-Alibaba (Zhang et al., 2019). In contrast to the ERG parser, all of these systems are purely data-driven, in the sense that they do not incorporate manually curated linguistic knowledge (beyond finite-state tokenization rules, maybe) but rather learn all their parameters exclusively from the shared task training data.…”
Section: Results (citation type: mentioning, confidence: 99%)
“…The overall results of our system using the official MRP metric are presented in Table 3. All reported scores are macro-averaged F1 scores over all frameworks.

| System | Tops | Labels | Properties | Anchors | Edges | Attributes | All |
|---|---|---|---|---|---|---|---|
| HIT-SCIR (Che et al., 2019) | 90.41% (2) | 70.85% (3) | 69.86% (1) | 77.61% (2) | 79.37% (1) | 12.40% (1) | 86.20% (1) |
| SJTU-NICT (Li et al., 2019) | 91.50% (1) | 71.24% (2) | 68.73% (2) | 77.62% (1) | 77.74% (2) | 9.40% (2) | 85.27% (2) |
| SUDA-Alibaba (Zhang et al., 2019b) | 86.01% (5) | 69.50% (4) | 68.24% (3) | 77.11% (3) | 76.85% (3) | 8.16% (3) | 83.96% (3) |
| Saarland (Donatelli et al., 2019) | 86.70% (4) | 71.33% (1) | 61.11% (5) | 75.08% (5) | 75.01% (4) | – | 81.87% (4) |

Table 3: Overall results, macro-averaged on all frameworks. We present F1 scores and ranks (in parentheses) compared to official ST submissions.…”
Section: Results (citation type: mentioning, confidence: 99%)
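The "All" column in this excerpt's Table 3 and the per-framework tables in the next excerpt are connected by simple arithmetic: a macro-average is the unweighted mean over frameworks. A minimal check, using HIT-SCIR's per-framework overall F1 scores from Table 4 below:

```python
# Macro-averaging check: the unweighted mean of HIT-SCIR's five per-framework
# overall F1 scores (Table 4) reproduces the 86.20% "All" entry in Table 3.
per_framework_f1 = {"DM": 95.08, "PSD": 90.55, "EDS": 90.75,
                    "UCCA": 81.67, "AMR": 72.94}
macro_f1 = sum(per_framework_f1.values()) / len(per_framework_f1)
print(f"{macro_f1:.2f}%")  # -> 86.20%
```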
“…In contrast to a transition-based system, we build the graph in a layer-wise fashion, with operations joined in groups.

(a) DM framework
| System | Tops | Labels | Properties | Anchors | Edges | Attributes | All |
|---|---|---|---|---|---|---|---|
| HIT-SCIR (Che et al., 2019) | 92.65% (3) | 93.00% (4) | 95.33% (3) | 99.28% (1) | 92.54% (2) | – | 95.08% (2) |
| SJTU-NICT (Li et al., 2019) | 93.26% (2) | 94.89% (3) | 95.49% (2) | 99.27% (2) | 92.39% (3) | – | 95.50% (1) |
| SUDA-Alibaba (Zhang et al., 2019b) | 91.13% (6) | 90.27% (8) | 91.51% (7) | 98.16% (8) | 89.84% (7) | – | 92.26% (7) |
| Saarland (Donatelli et al., 2019) | 85.87% (8) | … | … | … | … | … | … |

(b) PSD framework
| System | Tops | Labels | Properties | Anchors | Edges | Attributes | All |
|---|---|---|---|---|---|---|---|
| HIT-SCIR (Che et al., 2019) | 96.03% (3) | 89.30% (5) | 93.10% (1) | 99.12% (1) | 79.65% (3) | – | 90.55% (4) |
| SJTU-NICT (Li et al., 2019) | 96.30% (1) | 93.14% (4) | 91.57% (5) | 99.11% (2) | 80.27% (1) | – | 91.19% (3) |
| SUDA-Alibaba (Zhang et al., 2019b) | 86.55% (8) | 84.51% (8) | 85.03% (8) | 97.51% (8) | 75.22% (7) | – | 85.56% (8) |
| Saarland (Donatelli et al., 2019) | 93.50% (6) | … | … | … | … | … | … |

(c) EDS framework
| System | Tops | Labels | Properties | Anchors | Edges | Attributes | All |
|---|---|---|---|---|---|---|---|
| HIT-SCIR (Che et al., 2019) | 85.23% (5) | 89.45% (3) | 89.54% (2) | 94.29% (2) | 88.77% (3) | – | 90.75% (2) |
| SJTU-NICT (Li et al., 2019) | 87.72% (3) | 89.42% (4) | 77.53% (4) | 93.37% (3) | 87.82% (4) | – | 89.90% (3) |
| SUDA-Alibaba (Zhang et al., 2019b) | 89.94% (2) | 91.20% (1) | 89.72% (1) | 94.86% (1) | 89.66% (2) | – | 91.85% (1) |
| Saarland (Donatelli et al., 2019) | 86.31% (4) | … | … | … | … | … | … |

(d) UCCA framework
| System | Tops | Labels | Properties | Anchors | Edges | Attributes | All |
|---|---|---|---|---|---|---|---|
| HIT-SCIR (Che et al., 2019) | 100.00% (1) | – | – | 95.36% (3) | 72.66% (1) | 61.98% (1) | 81.67% (1) |
| SJTU-NICT (Li et al., 2019) | 95.31% (5) | – | – | 96.36% (1) | 65.56% (3) | 47.00% (2) | 77.80% (3) |
| SUDA-Alibaba (Zhang et al., 2019b) | 99.56% (3) | – | – | 95.02% (4) | 67.74% (2) | 40.80% (3) | 78.43% (2) |
| Saarland (Donatelli et al., 2019) | 80… | … | … | … | … | … | … |

(e) AMR framework
| System | Tops | Labels | Properties | Anchors | Edges | Attributes | All |
|---|---|---|---|---|---|---|---|
| HIT-SCIR (Che et al., 2019) | 78.15% (7) | 82.51% (2) | 71.33% (5) | – | 63.21% (2) | – | 72.94% (2) |
| SJTU-NICT (Li et al., 2019) | 84.88% (4) | 78.78% (5) | 79.08% (1) | – | 62.64% (3) | – | 71.97% (3) |
| SUDA-Alibaba (Zhang et al., 2019b) | 62.86% (9) | 81.53% (4) | 74.96% (3) | – | 61.78% (5) | – | 71.72% (5) |
| Saarland (Donatelli et al., 2019) | 86.89% (1) | 74.02% (6) | 40.79% (7) | – | 62.16% (4) | – | 66.72% (6) |

Table 4: Results on individual frameworks. We present F1 scores and ranks (in parentheses) compared to official ST submissions.…”
Section: Discussion (citation type: mentioning, confidence: 99%)
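The excerpt contrasts layer-wise graph construction with transition-based parsing, the approach used by HIT-SCIR. As a rough illustration of the latter (with a generic action inventory, not the exact HIT-SCIR transition set), a stack/buffer machine adds at most one edge per step:

```python
# Generic transition-based graph construction: a stack/buffer machine where
# each transition moves a node or adds a single edge. Action names are
# illustrative, not the HIT-SCIR inventory.
def run_transitions(words, actions):
    stack, buffer, edges = [], list(range(len(words))), []
    for act in actions:
        if act == "SHIFT":            # move the next buffer node to the stack
            stack.append(buffer.pop(0))
        elif act == "REDUCE":         # discard a finished node from the stack
            stack.pop()
        elif act == "LEFT-EDGE":      # edge from buffer front to stack top
            edges.append((buffer[0], stack[-1]))
        elif act == "RIGHT-EDGE":     # edge from stack top to buffer front
            edges.append((stack[-1], buffer[0]))
    return edges

# "chase" (index 1) gets edges to both "dogs" and "cats", one per transition:
print(run_transitions(["dogs", "chase", "cats"],
                      ["SHIFT", "LEFT-EDGE", "SHIFT", "RIGHT-EDGE"]))
# -> [(1, 0), (1, 2)]
```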
“…We use dropout (Srivastava et al., 2014) between MLP layers, and recurrent dropout (Gal and Ghahramani, 2016) between BiLSTM layers, both with p = 0.4. We also use word, lemma, and coarse- and fine-grained POS tag dropout with α = 0.2.

| Framework | TUPA single-task (All) | TUPA single-task (LPPS) | TUPA multi-task (All) | TUPA multi-task (LPPS) | Best system (All) | Best system (LPPS) |
|---|---|---|---|---|---|---|
| … | … | … | … | … | … (Donatelli et al., 2019) | 88.46 |
| EDS | 81.00 | 81.36 | 73.95 | 74.81 | 91.85 | 92.55 |
| UCCA | 27.56 | 40.06 | 23.65 | 41.03 | 81.67 (Che et al., 2019) | 82.61 (Che et al., 2019) |
| AMR | 44.73 | 47.04 | 33.75 | 43.37 | 73.38 (Cao et al., 2019) | 73.11 (Donatelli et al., 2019) |
| Overall | 57.70 | 57.55 | 45.34 | 50.64 | 86.20 (Che et al., 2019) | 84.88 (Donatelli et al., 2019) |

Table 2: Official test MRP F-scores (in %) for TUPA (single-task and multi-task). For comparison, the highest score achieved for each framework and evaluation set is shown.…”
Section: Hyperparameters (citation type: mentioning, confidence: 99%)
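The dropout settings quoted above map onto standard components. A minimal sketch, assuming layer sizes that are not in the excerpt; the frequency-dependent token-dropout rule p = α/(α + count(w)) is a common convention for α-parameterized word dropout, not something the excerpt states:

```python
# Sketch of the quoted regularization settings. Sizes (768/256) are assumed;
# the alpha-based token-dropout formula is a common convention, not quoted.
# (The recurrent dropout between BiLSTM layers would need a variational-
# dropout LSTM implementation and is omitted here.)
import random
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(768, 256), nn.ReLU(),
    nn.Dropout(p=0.4),              # dropout between MLP layers, p = 0.4
    nn.Linear(256, 128),
)

def maybe_drop_token(word, counts, alpha=0.2, unk="<UNK>"):
    """During training, replace a token with UNK; rarer words drop more often."""
    return unk if random.random() < alpha / (alpha + counts[word]) else word
```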