2020
DOI: 10.1007/s41109-020-00321-y
Improving topic modeling through homophily for legal documents

Abstract: Topic modeling that can automatically assign topics to legal documents is very important in the domain of computational law. The relevance of the modeled topics strongly depends on the legal context they are used in. On the other hand, references to laws and prior cases are key elements for judges to rule on a case. Taken together, these references form a network, whose structure can be analysed with network analysis. However, the content of the referenced documents may not always be accessible. Even in that cas…

Cited by 8 publications (4 citation statements)
References 39 publications (44 reference statements)
“…The length of the shortest element sentence is 30-40 words, and the length of the longest element sentence can reach more than 300 words. The traditional model mostly uses fixed parameters as the vector dimension and fills the vector with 0 for short sentences, which cannot effectively capture the characteristic representation of sentences with different lengths [5]. In order to weaken the negative impact of the length difference of different sentences on the effect of the model, the multi-head self-attention mechanism (MAT) based on the mask method is further integrated into the BERT-CNN model.…”
Section: Introduction
confidence: 99%
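The mask-based attention described in that statement can be sketched minimally: padded key positions are assigned a very large negative score before the softmax, so zero padding contributes no attention weight. This is a hypothetical single-head illustration of the general technique, not the authors' BERT-CNN (MAT) implementation; all names and shapes here are invented for the example.

```python
import numpy as np

def masked_self_attention(x, mask):
    """Single-head scaled dot-product self-attention over a padded sentence.

    x    : (seq_len, d) token vectors, with zero rows for padding
    mask : (seq_len,) boolean, True for real tokens, False for padding
    Returns the attended output and the attention-weight matrix.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                    # (seq_len, seq_len)
    # Padded keys receive a huge negative score, so softmax sends them to 0.
    scores = np.where(mask[None, :], scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights

# Example: a 5-position window where the last 2 positions are zero padding.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
x[3:] = 0.0                                          # short sentence, padded
mask = np.array([True, True, True, False, False])
out, weights = masked_self_attention(x, mask)        # padded keys get ~0 weight
```

In a multi-head variant the same mask is simply broadcast across all heads, which is what lets the model handle 30-word and 300-word element sentences with one fixed vector dimension.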
“…In this matrix, the measure scale goes from [-1, 1], with 1 denoting the strongest connotation. The work of [26] defines topic variety as the proportion of unique words across all themes. The scale goes from [0, 1], with 0 denoting superfluous topics and 1 denoting topics with more variety.…”
Section: Discussion
confidence: 99%
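The topic-diversity measure attributed to [26] above — the proportion of unique words across all topics, on a [0, 1] scale — can be sketched in a few lines. The topic word lists below are invented for illustration:

```python
def topic_diversity(topics):
    """Proportion of unique words among the top words of all topics.

    topics : list of lists, each the top words of one topic
    Returns a value in [0, 1]: 0 means fully redundant topics,
    1 means every topic uses entirely distinct words.
    """
    all_words = [w for topic in topics for w in topic]
    return len(set(all_words)) / len(all_words)

topics = [
    ["court", "judge", "ruling"],
    ["contract", "party", "clause"],
    ["court", "appeal", "ruling"],   # overlaps with the first topic
]
diversity = topic_diversity(topics)  # 7 unique words out of 9 -> 7/9
```

In practice the measure is computed over the top-k words of each topic (k = 25 is a common choice), so redundant topics that recycle the same vocabulary are penalized directly.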
“…The designed model reduces computational complexity, but the performance of long-document classification was not improved. An improved topic-modeling analysis method was introduced in [16] for legal case document classification, but it failed to use a multilayer network model for legal case document categorization.…”
Section: Related Work
confidence: 99%