Kazuki Ashihara scite author profile

Kazuki Ashihara

4 Publications

7 Citation Statements Received

89 Citation Statements Given


Affiliations

Osaka University, Osaka Health Science University

Publications


Improving topic modeling through homophily for legal documents

et al. 2020


Topic modeling that can automatically assign topics to legal documents is very important in the domain of computational law. The relevance of the modeled topics strongly depends on the legal context they are used in. On the other hand, references to laws and prior cases are key elements for judges to rule on a case. Taken together, these references form a network whose structure can be analysed with network analysis. However, the content of the referenced documents may not always be accessible. Even in that case, the reference structure itself shows that documents share similar latent characteristics. We propose to use this latent structure to improve topic modeling of legal cases using document homophily. In this paper, we explore the use of homophily networks extracted from two types of references, prior cases and statute laws, to enhance topic modeling on legal case documents. We conduct a detailed analysis of a dataset of rich legal cases, the COLIEE dataset, to create these networks. The homophily networks consist of nodes for legal cases and weighted edges for the two families of references between the case nodes. We further propose models that use the edge weights for topic modeling. In particular, we propose a cutting model and a weighting model to improve the relational topic model (RTM). The cutting model uses edges with weights higher than a threshold as document links in RTM; the weighting model uses the edge weights to weight the link probability function in RTM. The weights can be obtained either from the co-citations or from the cosine similarity based on an embedding of the homophily networks. Experiments show that the use of the homophily networks for topic modeling significantly outperforms previous studies, and that the weighting model is more effective than the cutting model.


Contextualized Multi-Sense Word Embedding

Ashihara

Kajiwara

Arase

et al. 2019

Journal of Natural Language Processing


Currently, distributed word representations are employed in many natural language processing tasks. However, when a single representation is generated for each word, the meanings of a polysemous word cannot be differentiated because they are all integrated into that one representation. Therefore, several attempts have been made to generate a different representation per meaning based on parts of speech or the topic of a sentence. However, these methods are too coarse to deal with polysemy. In this paper, we propose two methods to generate more subtle multiple word representations. The first method generates multiple word representations using the word in a dependency relationship as a clue. The second method employs a bi-directional language model to generate a word representation that considers all the words in the context. Extensive evaluation on the Lexical Substitution task and the Context-Aware Word Similarity task confirmed the effectiveness of our approaches in generating more subtle multiple word representations.


Legal Information as a Complex Network: Improving Topic Modeling Through Homophily

Ashihara

Chu

Renoust

et al. 2019


Contextualized context2vec

Ashihara

Kajiwara

Arase

et al. 2019


Lexical substitution ranks substitution candidates from the viewpoint of paraphrasability for a target word in a given sentence. There are two major approaches for lexical substitution: (1) generating contextualized word embeddings by assigning multiple embeddings to one word and (2) generating context embeddings using the sentence. Herein, we propose a method that combines these two approaches to contextualize word embeddings for lexical substitution. Experiments demonstrate that our method outperforms the current state-of-the-art method. We also create CEFR-LP, a new evaluation dataset for the lexical substitution task. It has a wider coverage of substitution candidates than previous datasets and assigns English proficiency levels to all target words and substitution candidates.


