Huan Yee Koh scite author profile

Huan Yee Koh

5 Publications

19 Citation Statements Received

435 Citation Statements Given

How they've been cited

How they cite others

296

425

Affiliations

Monash University

Publications

An Empirical Survey on Long Document Summarization: Datasets, Models, and Metrics

Koh, Liu, et al. 2022

ACM Comput. Surv.

Long documents such as academic articles and business reports have been the standard format to detail out important issues and complicated subjects that require extra attention. An automatic summarization system that can effectively condense long documents into short and concise texts to encapsulate the most important information would thus be significant in aiding the reader’s comprehension. Recently, with the advent of neural architectures, significant research efforts have been made to advance automatic text summarization systems, and numerous studies on the challenges of extending these systems to the long document domain have emerged. In this survey, we provide a comprehensive overview of the research on long document summarization and a systematic evaluation across the three principal components of its research setting: benchmark datasets, summarization models, and evaluation metrics. For each component, we organize the literature within the context of long document summarization and conduct an empirical analysis to broaden the perspective on current research progress. The empirical analysis includes a study on the intrinsic characteristics of benchmark datasets, a multi-dimensional analysis of summarization models, and a review of the summarization evaluation metrics. Based on the overall findings, we conclude by proposing possible directions for future exploration in this rapidly growing field.

Leveraging Information Bottleneck for Scientific Document Summarization

Ju, Liu, Koh, et al. 2021

This paper presents an unsupervised extractive approach to summarize scientific long documents based on the Information Bottleneck principle. Inspired by previous work which uses the Information Bottleneck principle for sentence compression, we extend it to document level summarization with two separate steps. In the first step, we use signal(s) as queries to retrieve the key content from the source document. Then, a pre-trained language model conducts further sentence search and edit to return the final extracted summaries. Importantly, our work can be flexibly extended to a multi-view framework by different signals. Automatic evaluation on three scientific document datasets verifies the effectiveness of the proposed framework. The further human evaluation suggests that the extracted summaries cover more content aspects than previous systems.

The application of artificial intelligence to accelerate G protein‐coupled receptor drug discovery

Nguyen, Koh, et al. 2023

British J Pharmacology

The application of artificial intelligence (AI) approaches to drug discovery for G protein‐coupled receptors (GPCRs) is a rapidly expanding area. Artificial intelligence can be used at multiple stages during the drug discovery process, from aiding our understanding of the fundamental actions of GPCRs to the discovery of new ligand‐GPCR interactions or the prediction of clinical responses. Here, we provide an overview of the concepts behind artificial intelligence, including the subfields of machine learning and deep learning. We summarise the published applications of artificial intelligence to different stages of the GPCR drug discovery process. Finally, we reflect on the benefits and limitations of artificial intelligence and share our vision for the exciting potential for further development of applications to aid GPCR drug discovery. In addition to making the drug discovery process “faster, smarter and cheaper,” we anticipate that the application of artificial intelligence will create exciting new opportunities for GPCR drug discovery.

How Far are We from Robust Long Abstractive Summarization?

Koh, Ju, Zhang, et al. 2022

Preprint

Abstractive summarization has made tremendous progress in recent years. In this work, we perform fine-grained human annotations to evaluate long document abstractive summarization systems (i.e., models and metrics) with the aim of implementing them to generate reliable summaries. For long document abstractive models, we show that the constant strive for state-of-the-art ROUGE results can lead us to generate more relevant summaries but not factual ones. For long document evaluation metrics, human evaluation results show that ROUGE remains the best at evaluating the relevancy of a summary. It also reveals important limitations of factuality metrics in detecting different types of factual errors and the reasons behind the effectiveness of BARTScore. We then suggest promising directions in the endeavor of developing factual consistency metrics. Finally, we release our annotated long document dataset with the hope that it can contribute to the development of metrics across a broader range of summarization settings.

How Far are We from Robust Long Abstractive Summarization?

Koh, Ju, Zhang, et al. 2022
