Extracting relational triples from unstructured text is crucial for large-scale knowledge graph construction. However, few existing works excel in solving the overlapping triple problem where multiple relational triples in the same sentence share the same entities. In this work, we introduce a fresh perspective to revisit the relational triple extraction task and propose a novel cascade binary tagging framework (CASREL) derived from a principled problem formulation. Instead of treating relations as discrete labels as in previous works, our new framework models relations as functions that map subjects to objects in a sentence, which naturally handles the overlapping problem. Experiments show that the CAS-REL framework already outperforms state-ofthe-art methods even when its encoder module uses a randomly initialized BERT encoder, showing the power of the new tagging framework. It enjoys further performance boost when employing a pre-trained BERT encoder, outperforming the strongest baseline by 17.5 and 30.2 absolute gain in F1-score on two public datasets NYT and WebNLG, respectively. In-depth analysis on different scenarios of overlapping triples shows that the method delivers consistent performance gain across all these scenarios. The source code and data are released online 1 .
Virtual personal assistants (VPA) (e.g.
Radiation pneumonitis (RP) is one of the major toxicities of thoracic radiation therapy. RP incidence has been proven to be closely associated with the dosimetric factors and normal tissue control possibility (NTCP) factors. However, because these factors only utilize limited information of the dose distribution, the prediction abilities of these factors are modest. We adopted the dosiomics method for RP prediction. The dosiomics method first extracts spatial features of the dose distribution within ipsilateral, contralateral, and total lungs, and then uses these extracted features to construct prediction model via univariate and multivariate logistic regression (LR). The dosiomics method is validated using 70 non-small cell lung cancer (NSCLC) patients treated with volumetric modulated arc therapy (VMAT) radiotherapy. Dosimetric and NTCP factors based prediction models are also constructed to compare with the dosiomics features based prediction model. For the dosimetric, NTCP and dosiomics factors/features, the most significant single factors/features are the mean dose, parallel/serial (PS) NTCP and gray level co-occurrence matrix (GLCM) contrast of ipsilateral lung, respectively. And the area under curve (AUC) of univariate LR is 0.665, 0.710 and 0.709, respectively. The second significant factors are V 5 of contralateral lung, equivalent uniform dose (EUD) derived from PS NTCP of contralateral lung and the low gray level run emphasis of gray level run length matrix (GLRLM) of total lungs. The AUC of multivariate LR is improved to 0.676, 0.744, and 0.782, respectively. The results demonstrate that the univariate LR of dosiomics features has approximate predictive ability with NTCP factors, and the multivariate LR outperforms both the dosimetric and NTCP factors. In conclusion, the spatial features of dose distribution extracted by the dosiomics method effectively improves the prediction ability.
Bugs are prevalent in software systems. Some bugs are critical and need to be fixed right away, whereas others are minor and their fixes could be postponed until resources are available. In this work, we propose a new approach leveraging information retrieval, in particular BM25-based document similarity function, to automatically predict the severity of bug reports. Our approach automatically analyzes bug reports reported in the past along with their assigned severity labels, and recommends severity labels to newly reported bug reports. Duplicate bug reports are utilized to determine what bug report features, be it textual, ordinal, or categorical, are important. We focus on predicting fine-grained severity labels, namely the different severity labels of Bugzilla including: blocker, critical, major, minor, and trivial. Compared to the existing state-of-the-art study on fine-grained severity prediction, namely the work by Menzies and Marcus, our approach brings significant improvement.
Abstract-Bugs are prevalent in software systems. To improve the reliability of software systems, developers often allow end users to provide feedback on bugs that they encounter. Users could perform this by sending a bug report in a bug report management system like Bugzilla. This process however is uncoordinated and distributed, which means that many users could submit bug reports reporting the same problem. These are referred to as duplicate bug reports. The existence of many duplicate bug reports may cause much unnecessary manual efforts as often a triager would need to manually tag bug reports as being duplicates. Recently, there have been a number of studies that investigate duplicate bug report problem which in effect answer the following question: given a new bug report, retrieve k other similar bug reports. This, however, still requires substantive manual effort which could be reduced further. Jalbert and Weimer are the first to introduce the direct detection of duplicate bug reports; it answers the question: given a new bug report, classify if it as a duplicate bug report or not. In this paper, we extend Jalbert and Weimer's work by improving the accuracy of automated duplicate bug report identification. We experiments with bug reports from Mozilla bug tracking system which were reported between February 2005 to October 2005, and find that we could improve the accuracy of the previous approach by about 160%.
Issue tracking systems are valuable resources during software maintenance activities and contain information about the issues faced during the development of a project as well as after its release. Many projects receive many reports of bugs and it is challenging for developers to manually debug and fix them. To mitigate this problem, past studies have proposed information retrieval (IR)-based bug localization techniques, which takes as input a textual description of a bug stored in an issue tracking system, and returns a list of potentially buggy source code files.These studies often evaluate their effectiveness on issue reports marked as bugs in issue tracking systems, using as ground truth the set of files that are modified in commits that fix each bug. However, there are a number of potential biases that can impact the validity of the results reported in these studies. First, issue reports marked as bugs might not be reports of bugs due to error in the reporting and classification process. Many issue reports are about documentation update, request for improvement, refactoring, code cleanups, etc. Second, bug reports might already explicitly specify the buggy program files and for these reports bug localization techniques are not needed. Third, files that get modified in commits that fix the bugs might not contain the bug.This study investigates the extent these potential biases affect the results of a bug localization technique and whether bug localization researchers need to consider these potential biases when evaluating their solutions. In this paper, we analyse issue reports from three different projects: HTTPClient, Jackrabbit, and Lucene-Java to examine the impact of above three biases on bug localization. Our results show that one of these biases significantly and substantially impacts bug localization results, while the other two biases have negligible or minor impact.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.