SPaR.txt, a Cheap Shallow Parsing Approach for Regulatory Texts

Kruiper, Ruben; Konstas, Ioannis; Gray, Alasdair J. G.; Sadeghineko, Farhad; Watson, Richard; Kumar, Bimal

doi:10.18653/v1/2021.nllp-1.14

Cited by 4 publications

(1 citation statement)

References 37 publications

“…Recent approaches to the modeling and extraction of regulations in the construction domain vary greatly both in their choice of semantic representation and their methods for mapping text to such representations. Kruiper et al. (2021) create the ScotReg corpus of Scottish building regulations, define a sequence labeling task that is a combination of shallow parsing (chunking) and semantic role labeling, assigning labels such as Action and Object to spans of text that are also syntactic constituents, and annotate 200 sentences using this representation to create the SPaR.txt dataset, which they use to train a standard deep learning architecture consisting of BERT embeddings, bidirectional Long Short-Term Memory (bi-LSTM) and Conditional Random Fields (CRFs). On the test portion of the dataset their models achieve precision, recall, and F1 scores around 80%.…”

Section: NLP in the Construction Domain (mentioning)

confidence: 99%

BRISE-Plandok: a German legal corpus of building regulations

Recski, Iklódi, Lellmann et al. 2023

Preprint

We present the BRISE-Plandok corpus, a collection of 250 text documents with over 7,000 sentences from the Zoning Map of the City of Vienna, annotated manually with formal representations of the rules they convey. The generic rule format used by the corpus enables automated compliance checking of building plans, a process developed as part of the BRISE project. The format also allows for conversion to multiple logic formalisms, including dyadic deontic logic, enabling automated reasoning. Annotation guidelines were developed in collaboration with experts of the city's building inspection office, describing nearly 100 domain-specific attributes with examples. Each document was annotated independently by two trained annotators and subsequently reviewed by the authors. A rule-based system for the automatic extraction of rules from text was developed and used in the annotation process to provide suggestions. The reviewed dataset was also used to train a set of baseline machine learning models for the task of attribute extraction, the main step in the rule extraction process. Both the rule-based system and the ML baselines are evaluated on the annotated dataset and released as open-source software. We also describe and release the framework used for generating and parsing the interactive xlsx spreadsheets used by annotators.

Bringing order into the realm of Transformer-based language models for artificial intelligence and law

Greco, Tagarelli 2023

Artif Intell Law

Transformer-based language models (TLMs) have widely been recognized to be a cutting-edge technology for the successful development of deep-learning-based solutions to problems and applications that require natural language processing and understanding. As in other textual domains, TLMs have pushed the state of the art of AI approaches for many tasks of interest in the legal domain. Although the first Transformer model was proposed about six years ago, this technology has progressed at an unprecedented rate, with BERT and related models serving as a major reference, also in the legal domain. This article provides the first systematic overview of TLM-based methods for AI-driven problems and tasks in the legal sphere. A major goal is to highlight research advances in this field so as to understand, on the one hand, how Transformers have contributed to the success of AI in supporting legal processes, and on the other hand, what the current limitations and opportunities for further research development are.

BRISE-plandok: a German legal corpus of building regulations

Recski, Iklódi, Lellmann et al. 2024

Lang Resources & Evaluation

We present the BRISE-Plandok corpus, a collection of 250 text documents with a total of over 7000 sentences from the Zoning Map of the City of Vienna, annotated manually with formal representations of the rules they convey. The generic rule format used by the corpus enables automated compliance checking of building plans, a process developed as part of the BRISE (https://smartcity.wien.gv.at/en/brise/) project. The format also allows for conversion to multiple logic formalisms, including dyadic deontic logic, enabling automated reasoning. Annotation guidelines were developed in collaboration with experts of the city’s building inspection office, describing nearly 100 domain-specific attributes with examples. Each document was annotated independently by two trained annotators and subsequently reviewed by the authors. A rule-based system for the automatic extraction of rules from text was developed and used in the annotation process to provide suggestions. The reviewed dataset was also used to train a set of baseline machine learning models for the task of attribute extraction, the main step in the rule extraction process. Both the rule-based system and the ML baselines are evaluated on the annotated dataset and released as open-source software. We also describe and release the framework used for generating and parsing the interactive xlsx spreadsheets used by annotators.
