Accidents in the process industry can be prevented, and their consequences mitigated, by performing proper risk analyses to inform decision making. Quantitative risk analysis (QRA) is one of the main frameworks used for risk assessment and for identifying the hazards to which a plant is exposed. Ensuring the comprehensiveness of QRA depends on crucial tasks in which experts examine a set of engineering documents during structured meetings. The hazards identified are stored as textual data in documents, which retain valuable information about the risks related to the analyzed system. In this context, Natural Language Processing (NLP) techniques have emerged as a way to extract, organize, and classify relevant information from text. However, a challenge arises when addressing catastrophic accidents that are rare and for which, consequently, only limited information is available. Some accidents are postulated as possible in principle but have no historical record of occurrence. Developing techniques to characterize features of such rare events is challenging, yet quite useful for QRA, whose outcomes guide the design of preventive measures and support decision making. In this paper, we apply data augmentation (DA) and investigate different configurations to address the rare event issue. DA is used to obtain a balanced and sufficiently large training set. The final aim of this work is to develop a model capable of characterizing relevant features of rare or unseen accidental scenarios in support of hazard and operability study (HAZOP) and preliminary hazard analysis (PrHA).