Modeling Dynamic Pairwise Attention for Crime Classification over Legal Articles

Wang, Pengfei; Yang, Ze; Niu, Shuzi; Zhang, Yongfeng; Zhang, Lei; Niu, Shaozhang

doi:10.1145/3209978.3210057

Cited by 38 publications

(24 citation statements)

References 25 publications

“…Recent examples are multi-label learning based on SVM (Chang et al, 2017), based on deep learning (Mai et al, 2018), and based on ensemble classification (Büyükçakir et al, 2018). For very large classification space, extreme multi-label classification is proposed, e.g., a method based on graph embedding (Tagami, 2017), a method based on convolutional neural network (CNN) (Liu et al, 2017), and a method based on attention model of neural networks (Wang et al, 2018). Moreover, label hierarchy also can be considered so that part-of, is-a, and inclusion relationships are extracted from external data sources such as Wikipedia in the classification task (Bairi et al, 2016; Xie et al, 2017).…”

Section: Related Work | mentioning

confidence: 99%

Application of a Novel Subject Classification Scheme for a Bibliographic Database Using a Data-Driven Correspondence

Kurakawa

Sun

Ando

2020

Front. Big Data

A novel subject classification scheme should often be applied to a preclassified bibliographic database for the research evaluation task. Generally, adopting a new subject classification scheme is labor intensive and time consuming, and an effective and efficient approach is necessary. Hence, we propose an approach to apply a new subject classification scheme for a subject-classified database using a data-driven correspondence between the new and present ones. In this paper, we define a subject classification model of the bibliographic database comprising a topological space. Then, we show our approach based on this model, wherein forming a compact topological space is required for a novel subject classification scheme. To form the space, a correspondence between two subject classification schemes using a research project database is utilized as data. As a case study, we applied our approach to a practical example. It is a tool used as world proprietary benchmarking for research evaluation based on a citation database. We tried to add a novel subject classification of a research project database.

“…In addition, they use a unified k or a threshold value, which causes an increase in errors especially when the prediction probability is not precise. There is a work which jointly learns a multi-label classification model and a threshold predictor to gain different fixed thresholds for the different labels [17]. However, it ignores the co-occurring relation between labels.…”

Section: Charge (Label) Law Privision (External Knowledge) | mentioning

confidence: 99%

“…With the rise of joint learning, there are also attempts to combine legal article recommendations with charge prediction for multi-task learning [6,12]; Some studies are based on reading comprehension and hierarchical multi-label classification [11,16]. Inspired by the success of attention mechanism in NLP task, Wang et al handled charge prediction task by incorporating an attention mechanism [17]. Different from them, our paper studies how to joint the impacts of the similarity relation, the difference relation and the co-occurring relation in a unified framework.…”

Section: Multi-label Charge Prediction | mentioning

confidence: 99%

“…Label number learning is to learn the number of labels based on an output probability. There are some researches for using naïve top-k strategy [10] and threshold strategy [17]. Yang et al summarized the whole process of multi-label classification and used a Multilayer Perceptron (MLP) to get thresholds for label output probabilities [19].…”

Section: Label Number Learning | mentioning

confidence: 99%
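
The two label-decision baselines named in the statement above, the top-k strategy and the threshold strategy, can be illustrated with a minimal Python sketch. This is not code from either the cited or the citing paper; the function names, the example charges, and the probability values are all hypothetical, chosen only to show how the two strategies differ on the same output probabilities.

```python
from typing import Dict, List


def top_k_labels(probs: Dict[str, float], k: int) -> List[str]:
    """Top-k strategy: always return the k most probable labels."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return ranked[:k]


def threshold_labels(probs: Dict[str, float], threshold: float) -> List[str]:
    """Threshold strategy: return every label whose probability reaches the cutoff."""
    return [label for label, p in probs.items() if p >= threshold]


# Hypothetical output probabilities for one case description.
probs = {"theft": 0.81, "robbery": 0.47, "fraud": 0.12}

print(top_k_labels(probs, k=2))      # ['theft', 'robbery']
print(threshold_labels(probs, 0.5))  # ['theft']
```

With a single k or a single cutoff shared by all cases, the number of predicted labels is either fixed (top-k) or sensitive to how well calibrated the probabilities are (threshold), which is the weakness the citing paper attributes to these baselines.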

“…Our framework adopts deep models as feature extractors, such as TextCNN [9], CRNN [14], DPCNN [7], CNN&Attention [15], Bi-GRU [2], Bi-LSTM [3], etc. For the RQ1, we compare our proposed DMA with the soft-attention and the transformer which are baselines [4,8]; For the RQ2, we compare our proposed NLN with the threshold strategy and the top-k strategy which are baselines [10,17]; For the RQ3, we compare our proposed DMA and NLN with DMA and the threshold strategy which is the label decision baseline in our framework.…”

Section: Baselines | mentioning

confidence: 99%

A Relation Learning Hierarchical Framework for Multi-label Charge Prediction

Duan

Lin

2020

Lecture Notes in Computer Science

In legal field, multi-label charge prediction is a popular and foundational task to predict charges (labels) by a case description (a fact). From perspectives of content analysis and label decision, there are two major difficulties. One is content confusion that the case descriptions of some charges are almost identical. The other is dynamic label number that the numbers of labels (label number) of different cases may be different. In this paper, we propose a relation learning hierarchical framework for multi-label charge prediction with two models, i.e., dynamic merging attention (DMA) and number learning network (NLN). Specially, DMA can improve the charge prediction performance by dynamically learning the similarity relation between a fact and external knowledge (provisions) and the difference relation between different provisions, which alleviates the phenomenon of content confusion. NLN mitigates the dynamic label number by learning the co-occurring relation between labels. Moreover, we put the two models into a unified framework to enhance their effects. Conducted on a public large real-world law dataset, experimental results demonstrate that our framework with DMA and NLN outperforms well-known baselines by more than 3%-23%.
