Solr Integration in the Anserini Information Retrieval Toolkit

Clancy, Ryan; Eskildsen, Toke; Ruest, Nick; Lin, Jimmy

doi:10.1145/3331184.3331401

Cited by 3 publications (2 citation statements); references 8 publications (5 reference statements).

“…Third, the integration of search capabilities with a document store allows dstlr to focus analyses on subsets of documents, as demonstrated in Clancy et al (2019b). For convenience, our open-source search toolkit Anserini (Yang et al, 2018) provides a number of connectors for ingesting document collections into Solr (Clancy et al, 2019a), under different index architectures. The execution layer, which relies on Apache Spark, coordinates the two major phases of knowledge graph construction: extraction and enrichment.…”

Section: System Overview (mentioning, confidence: 99%)
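The quoted passage notes that Anserini provides connectors for ingesting document collections into Solr. Anserini itself is a Java toolkit, so the following is not its connector code. It is a minimal Python sketch of what "ingesting into Solr" involves: documents are grouped into batches and each batch becomes one POST of a JSON array to Solr's update handler. The core name `hypothetical_core`, the URL `http://localhost:8983`, and the field names `id` and `contents` are all assumptions.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple
from urllib.request import Request


def batched(docs: Iterable[Dict[str, str]], batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Group documents into lists of at most batch_size items."""
    batch: List[Dict[str, str]] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield batch


def to_solr_doc(doc_id: str, contents: str) -> Dict[str, str]:
    # Assumed schema: a unique "id" field plus a full-text "contents" field.
    return {"id": doc_id, "contents": contents}


def build_update_request(base_url: str, core: str, docs: List[Dict[str, str]]) -> Request:
    """Prepare (but do not send) a POST of a JSON array of documents to /update."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        f"{base_url}/solr/{core}/update",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    collection: List[Tuple[str, str]] = [
        ("doc1", "first document"),
        ("doc2", "second document"),
        ("doc3", "third document"),
    ]
    docs = (to_solr_doc(doc_id, text) for doc_id, text in collection)
    update_requests = [
        build_update_request("http://localhost:8983", "hypothetical_core", batch)
        for batch in batched(docs, batch_size=2)
    ]
    for req in update_requests:
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
        # urllib.request.urlopen(req) would actually send the request. The documents
        # only become searchable after a commit (e.g. "?commit=true" on the URL).
```

Batching trades memory for fewer HTTP round trips. A real connector would also handle retries and commit policy, and this sketch leaves both out.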

Scalable Knowledge Graph Construction from Text Collections

Clancy, Ilyas, Lin (2019). Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER). Self-citation.

We present a scalable, open-source platform that "distills" a potentially large text collection into a knowledge graph. Our platform takes documents stored in Apache Solr and scales out the Stanford CoreNLP toolkit via Apache Spark integration to extract mentions and relations, which are then ingested into the Neo4j graph database. The raw knowledge graph is then enriched with facts extracted from an external knowledge graph. The complete product can be manipulated by various applications using Neo4j's native Cypher query language. We present a subgraph-matching approach to align extracted relations with external facts and show that fact verification, locating textual support for asserted facts, detecting inconsistent and missing facts, and extracting distantly-supervised training data can all be performed within the same framework.
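The abstract describes aligning extracted relations with facts from an external knowledge graph by subgraph matching in Cypher. The following is only a rough in-memory analogue. It is not dstlr's implementation and not Cypher. It assumes:

* relations are plain (subject, relation, object) triples;
* entity names are already resolved and match exactly as strings;
* each subject has a single correct object per relation.

The last assumption means that a subject with several legitimate objects (for example, successive office holders) would be wrongly flagged as inconsistent. The category names and the `ceo` relation in the usage data are hypothetical labels.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, NamedTuple, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object) from the external graph


class Mention(NamedTuple):
    """A relation extracted from text, with the document it came from."""
    subject: str
    relation: str
    obj: str
    doc_id: str


def align(mentions: List[Mention], external: List[Fact]) -> Dict[str, list]:
    # Index extracted mentions: (subject, relation) -> object -> supporting doc ids.
    index: DefaultDict[Tuple[str, str], DefaultDict[str, Set[str]]] = defaultdict(
        lambda: defaultdict(set)
    )
    for m in mentions:
        index[(m.subject, m.relation)][m.obj].add(m.doc_id)

    result: Dict[str, list] = {
        "supported": [], "inconsistent": [], "unsupported": [], "candidate_missing": []
    }
    external_keys = set()
    for subj, rel, obj in external:
        external_keys.add((subj, rel))
        objects = index.get((subj, rel))  # .get does not insert a default entry
        if not objects:
            # No textual evidence at all for this (subject, relation) pair.
            result["unsupported"].append((subj, rel, obj))
        elif obj in objects:
            # Text asserts the same object: record the supporting documents.
            result["supported"].append(((subj, rel, obj), sorted(objects[obj])))
        else:
            # Text asserts a different object for the same subject and relation.
            result["inconsistent"].append(((subj, rel, obj), sorted(objects)))

    # Extracted pairs the external graph says nothing about: candidates to add.
    for (subj, rel), objects in sorted(index.items()):
        if (subj, rel) not in external_keys:
            for obj in sorted(objects):
                result["candidate_missing"].append((subj, rel, obj))
    return result


if __name__ == "__main__":
    mentions = [
        Mention("Acme", "ceo", "Jane Doe", "d1"),
        Mention("Acme", "ceo", "Jane Doe", "d2"),
        Mention("Globex", "ceo", "Hank Scorpio", "d3"),
        Mention("Initech", "ceo", "Bill Lumbergh", "d4"),
    ]
    external = [("Acme", "ceo", "Jane Doe"), ("Globex", "ceo", "John Roe"),
                ("Umbrella", "ceo", "Ann Smith")]
    for category, items in align(mentions, external).items():
        print(category, items)
```

In a graph database, each branch above would be a pattern match over nodes and edges rather than a dictionary lookup. The classification logic is the part this sketch is meant to convey.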

“…Third, the integration of search capabilities with a document store allows dstlr to focus analyses on subsets of documents, as demonstrated in Clancy et al (2019b). For convenience, our open-source search toolkit Anserini provides a number of connectors for ingesting document collections into Solr (Clancy et al, 2019a), under different index architectures. The execution layer, which relies on Apache Spark, coordinates the two major phases of knowledge graph construction: extraction and enrichment.…”

Section: System Overview (mentioning, confidence: 99%)

Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER) (2019)

We present the results of the second Fact Extraction and VERification (FEVER2.0) Shared Task. The task challenged participants both to build systems that verify factoid claims using evidence retrieved from Wikipedia and to generate adversarial attacks against other participants' systems. The shared task had three phases: building, breaking and fixing. There were 8 systems in the builders' round, three of which were new qualifying submissions for this shared task. Five adversaries generated instances designed to induce classification errors, and one builder submitted a fixed system that had a higher FEVER score and resilience than its first submission. All but one of the newly submitted systems attained FEVER scores higher than the best-performing system from the first shared task, and under adversarial evaluation, all systems exhibited losses in FEVER score. There was great variety in adversarial attack types as well as in the techniques used to generate the attacks. In this paper, we present the results of the shared task and a summary of the systems, highlighting commonalities and innovations among participating systems.
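The abstract compares systems by FEVER score. The metric was defined for the first FEVER shared task. A claim counts as correct only if the predicted label is correct and, for SUPPORTS or REFUTES claims, the predicted evidence fully contains at least one gold evidence set. The score is the fraction of correct claims.

This is a minimal sketch under these assumptions:

* Evidence is represented as hypothetical (page title, sentence index) pairs.
* Only the first five predicted sentences are considered. This is our understanding of the official scorer's limit, and the `max_evidence` parameter keeps it adjustable.

It does not compute the FEVER 2.0 resilience measure.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Sentence = Tuple[str, int]  # (Wikipedia page title, sentence index)


@dataclass
class Instance:
    gold_label: str                           # "SUPPORTS", "REFUTES" or "NOT ENOUGH INFO"
    gold_evidence_sets: List[List[Sentence]]  # any one complete set suffices
    pred_label: str
    pred_evidence: List[Sentence]             # ranked, best first


def is_strictly_correct(inst: Instance, max_evidence: int = 5) -> bool:
    """Correct label and, unless NEI, a gold evidence set fully retrieved."""
    if inst.pred_label != inst.gold_label:
        return False
    if inst.gold_label == "NOT ENOUGH INFO":
        return True  # NEI claims carry no evidence requirement
    predicted = set(inst.pred_evidence[:max_evidence])
    return any(set(group) <= predicted for group in inst.gold_evidence_sets)


def fever_score(instances: Sequence[Instance], max_evidence: int = 5) -> float:
    """Fraction of claims that are strictly correct."""
    if not instances:
        return 0.0
    correct = sum(is_strictly_correct(i, max_evidence) for i in instances)
    return correct / len(instances)


if __name__ == "__main__":
    data = [
        Instance("SUPPORTS", [[("Page_A", 0), ("Page_B", 2)]], "SUPPORTS",
                 [("Page_A", 0), ("Page_B", 2)]),                # label and evidence right
        Instance("REFUTES", [[("Page_C", 1)]], "REFUTES", [("Page_D", 3)]),  # evidence wrong
        Instance("NOT ENOUGH INFO", [], "NOT ENOUGH INFO", []),  # label right, no evidence needed
    ]
    print(fever_score(data))
```

Requiring a complete evidence set makes this metric stricter than label accuracy. A system that guesses the right label without retrieving supporting sentences gets no credit.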
