Identification of risk features using text mining and BERT-based models: Application to an oil refinery

Safety occurrence reports can contain valuable information on how incidents occur, revealing knowledge that can assist safety practitioners. This paper presents and discusses a literature review exploring how Natural Language Processing (NLP) has been applied to occurrence reports within safety-critical industries, informing further research on the topic and highlighting common challenges. Uses of NLP include automatically classifying occurrence reports against categories, extracting entities such as causes and consequences from the text, and semantic searching of occurrence databases. The review revealed that machine learning models form the dominant method when applying NLP, although rule-based algorithms still provide a viable option for some entity extraction tasks. Recent deep learning models such as Bidirectional Encoder Representations from Transformers (BERT) now achieve high accuracy while eliminating the need to substantially pre-process text. The construction of safety-themed datasets would benefit the application of NLP to occurrence reporting, as this would allow current language models to be fine-tuned for safety tasks. An interesting approach is topic modelling, which represents a shift away from prescriptive classification taxonomies by splitting data into "topics". While many papers focus on the computational accuracy of models, they would also benefit from real-world trials to further establish their usefulness. It is anticipated that NLP will soon become a mainstream tool used by safety practitioners to efficiently process and gain knowledge from safety-related text.


“…Fine-tune the model. The "standard" model is further trained on a specific dataset (e.g., collection of safety assessment reports) [42,76].…”

Section: "During Final Apch To Lndg Zone R-hand Eng Cowling Exited Ac... (mentioning)

confidence: 99%

A Scoping Literature Review of Natural Language Processing Application to Safety Occurrence Reports

Ricketts, Barry, Guo, et al. 2023

Safety


“…Ansaldi et al 34 an ontology has been defined considering safety documents and applied to the analysis of equipment aging in a liquid fuel depot of an industrial establishment. Macêdo et al, 35 BERT is combined with information coming from risk assessment documentation and pre-hazard analysis spreadsheets to identify risk features and potential hazards in O&G refineries. Bin et al, 36 a NLP technique based on text chains is developed to extract fault features from accident reports of high-speed trains, with the objective of maintenance improvement.…”

Section: NLP For Accident Classification (mentioning)

confidence: 99%

A framework based on Natural Language Processing and Machine Learning for the classification of the severity of road accidents from reports

Valcamonico, Baraldi, Amigoni, et al. 2022

Proceedings of the Institution of Mechanical Engineers, Part O:


Road safety analysis is typically performed by domain experts on the basis of the information contained in accident reports. The main challenges are the difficulty of considering a large number of reports in textual form and the subjectivity of the expert judgments contained in reports. This work develops a framework based on the combination of Natural Language Processing (NLP) and Machine Learning (ML) for the automatic classification of accidents with the final aim of assisting experts in performing road safety analyses. Two different models for the representation of the textual reports (Hierarchical Dirichlet Processes (HDPs) and Doc2vec) and three ML-based classifiers (Artificial Neural Networks (ANNs), Decision Trees (DTs) and Random Forests (RFs)) are compared. The framework is applied to a repository of road accident reports provided by the US National Highway Traffic Safety Administration. The best trade-off between accuracy of the classification and explainability of the obtained results is achieved by combining HDP topic modeling and RF classification.


“…B. Macêdo et al 2022; J. Macêdo et al 2020), each document contains the description of potential accident events from different processing units of an oil refinery and their qualitative assessment of frequency of occurrence and severity of consequences.…”

Section: Dataset (mentioning)

confidence: 99%

“…B. Macêdo et al 2022) extracted textual data from preliminary hazard analysis reports regarding different systems of an oil refinery. The extracted data were used to train classifiers to predict the possible consequences given the occurrence of a spill and their respective expected frequency and severity level.…”

Section: Introduction (mentioning)

confidence: 99%

Identification of Features of Rare Risk Events in Oil Refineries Using Natural Language Processing (NLP)

Macêdo, Moura, Lins, et al. 2022

Book of Extended Abstracts for the 32nd European Safety and Reliability Conference


Accidents in the process industry can be prevented and their consequences mitigated by performing proper risk analyses to inform decision making. Quantitative risk analysis (QRA) is one of the main frameworks used for risk assessment and for identifying all the hazards to which a plant is exposed. Crucial tasks that ensure the comprehensiveness of QRA are carried out by experts, who examine a set of engineering documents during structured meetings. The hazards identified are stored as textual data in documents, which retain valuable information about the risks related to the analyzed system. In this context, Natural Language Processing (NLP) techniques have emerged as a way of extracting, organizing and classifying relevant information from text. However, a challenge arises when addressing catastrophic accidents, which are rare and for which only limited information is therefore available. Some accidents are postulated as possible in principle, but have no historical occurrence records. Developing techniques to characterize features of such rare events is a challenging task, yet quite useful for QRA, whose outcomes guide the design of preventive measures and support decision making. In this paper, we apply data augmentation (DA) and investigate different configurations to address the rare event issue. DA is applied to obtain a balanced and sufficiently large training set. The final aim of this work is to develop a model capable of characterizing relevant features of rare or unseen accidental scenarios in support of hazard and operability study (HAZOP) and preliminary hazard analysis (PrHA).


Identification of risk features using text mining and BERT-based models: Application to an oil refinery

Cited by 28 publications

References 69 publications
