Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1233

Interpretation of Natural Language Rules in Conversational Machine Reading

Abstract: Most work in machine reading focuses on question answering problems where the answer is directly expressed in the text to read. However, many real-world question answering problems require the reading of text not because it contains the literal answer, but because it contains a recipe to derive an answer together with the reader's background knowledge. One example is the task of interpreting regulations to answer "Can I...?" or "Do I have to...?" questions such as "I am working in Canada. Do I have to carry on…
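The task sketched in the abstract takes a rule text, a user question, and a user-specific scenario, and either answers or asks a follow-up question to resolve a missing condition. Below is a minimal illustrative sketch of that input/output structure in Python; the class, field names, and decision stub are hypothetical, not the dataset's actual schema or the paper's model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class CMRInstance:
    """One conversational machine reading instance (illustrative schema,
    not the dataset's actual JSON format)."""
    rule_text: str   # regulation snippet the answer must be derived from
    question: str    # user question, e.g. "Do I have to ...?"
    scenario: str    # background the user volunteers up front
    history: List[Tuple[str, str]] = field(default_factory=list)  # (follow-up, answer) pairs

def next_action(inst: CMRInstance) -> str:
    """Stub standing in for a trained model: a real system interprets
    rule_text against scenario + history and returns "Yes", "No",
    "Irrelevant", or a follow-up question probing an unresolved condition."""
    if not inst.history:
        # A condition in the rule is still unchecked: ask about it.
        return "Do you work full time?"
    return "Yes"  # all conditions resolved in the user's favour

example = CMRInstance(
    rule_text="Workers in Canada must carry a permit if employed full time.",
    question="I am working in Canada. Do I have to carry a permit?",
    scenario="I started a job in Toronto last month.",
)
print(next_action(example))  # -> a clarifying follow-up question
```

The point is only the interface: the answer is never quoted from the rule text; it is derived from the rule together with the user's circumstances, which may take several dialogue turns to establish.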

Cited by 121 publications (182 citation statements)
References 26 publications
Citation types: 2 supporting, 159 mentioning, 0 contrasting

Citation statements, ordered by relevance:
“…We report the results of our approach, the various baselines, as well as the previous state-of-the-art (SOTA) scores where applicable in Tables 1 and 2 for SHARC and in Table 3 for DAILY DIALOG. On the SHARC dataset, we observe very poor BLEU-4 performance for the encoder-decoder Transformer (E&D), which is consistent with results from Saeidi et al. (2018), who could not get an LSTM-based network to work without an additional classification head. Adding BERT (E&D+B) slightly improves performance.…”
Section: Results (supporting)
confidence: 84%
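The BLEU-4 figure in this excerpt is the standard 4-gram overlap metric applied to generated text, here generated follow-up questions. Below is a minimal sketch of how such a score could be computed with NLTK; the example tokens and smoothing choice are illustrative, and the cited papers' exact evaluation scripts may differ.

```python
# Sketch of a BLEU-4 computation with NLTK (pip install nltk).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "do you work full time in canada".split()   # gold follow-up question
generated = "are you working full time in canada".split()  # model output

# BLEU-4: geometric mean of 1- to 4-gram precisions; smoothing avoids a
# zero score when some higher-order n-gram has no match.
score = sentence_bleu(
    [reference], generated,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU-4: {score:.3f}")
```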
“…The CSQA dataset [30] takes preliminary steps towards the sequential KG-QA paradigm, but it is extremely artificial: initial and follow-up questions are generated semi-automatically via templates, and sequential utterances are only simulated by stitching questions with shared entities or relations in a thread, without a logical flow. QBLink [9], CoQA [27], and ShARC [29] are recent resources for sequential QA over text. The SQA resource [16], derived from WikiTableQuestions [25], is aimed at driving conversational QA over (relatively small) Web tables.…”
Section: The ConvQuestions Benchmark, 4.1 Benchmark Creation (mentioning)
confidence: 99%
“…However, such table-cell search methods cannot scale to real-world, large-scale curated KGs. QBLink [9], CoQA [27], and ShARC [29] are recent benchmarks aimed at driving conversational QA over text, and the allied paradigm in text comprehension on interactive QA [18]. Hixon et al [13] try to learn concept knowledge graphs from conversational dialogues over science questions, but such KGs are fundamentally different from curated ones like Wikidata with millions of facts.…”
Section: Related Work (mentioning)
confidence: 99%
“…The most closely related datasets to ROPES are ShARC (Saeidi et al., 2018), OpenBookQA (Mihaylov et al., 2018), and QuaRel (Tafjord et al., 2019). ShARC shares the same goal of understanding causes and effects (in terms of specified rules), but frames it as a dialogue where the system also has to generate questions to gain complete information.…”
Section: Related Work (mentioning)
confidence: 99%