Yucheng Cai scite author profile

Variational Latent-State GPT for Semi-Supervised Task-Oriented Dialog Systems

Liu, Cai, Lin, et al. 2023

IEEE/ACM Trans. Audio Speech Lang. Process.

Recently, two approaches, fine-tuning large pre-trained language models and variational training, have separately attracted significant interest for semi-supervised end-to-end task-oriented dialog (TOD) systems. In this paper, we propose the Variational Latent-State GPT model (VLS-GPT), which is the first to combine the strengths of the two approaches. Among the many possible model choices, we propose a generative model and an inference model for variational learning of the end-to-end TOD system, both as auto-regressive language models based on GPT-2, which can be further trained over a mix of labeled and unlabeled dialog data in a semi-supervised manner. We develop the strategy of sampling-then-forward-computation, which successfully overcomes the memory explosion issue of using GPT in variational learning and speeds up training. Semi-supervised TOD experiments are conducted on two benchmark multi-domain datasets of different languages, MultiWOZ2.1 and CrossWOZ. VLS-GPT is shown to significantly outperform both supervised-only and semi-supervised baselines.

Variational semi-supervised learning with latent variable models (LVMs) generally assumes that the unlabeled and labeled data are drawn from the same distribution, except that the unlabeled data are missing data (without labels) (Kingma and Welling 2014). This often occurs in real-world situations, e.g., unlabeled in-domain dialogs between customers and human agents are easily available.

Pruning the Unimportant or Redundant Filters? Synergy Makes Better

Cai, Zhuowen, Guo, et al. 2021

Building Markovian Generative Architectures Over Pretrained LM Backbones for Efficient Task-Oriented Dialog Systems

Liu, Cai, et al. 2023

Multiple Symmetric Lipomas: A Case Report

Feng, Cai, et al. 2020

Chinese Journal of Plastic and Reconstructive Surgery

Learning-Based Autonomous Channel Access in the Presence of Hidden Terminals

Shao, Cai, Wang, et al. 2024

IEEE Trans. on Mobile Comput.

Improving multi-scale detection layers in the deep learning network for wheat spike detection based on interpretive analysis

et al. 2023

Background: Detecting and counting wheat spikes is essential for predicting and measuring wheat yield. However, current wheat spike detection research often applies new network structures directly. Few studies combine prior knowledge of wheat spike size characteristics to design a suitable wheat spike detection model, and it remains unclear whether the complex detection layers of the network play their intended role.

Results: This study proposes an interpretive analysis method for quantitatively evaluating the role of three-scale detection layers in a deep learning-based wheat spike detection model. The attention scores in each detection layer of the YOLOv5 network are calculated using the Gradient-weighted Class Activation Mapping (Grad-CAM) algorithm by comparing the labeled wheat spike bounding boxes with the attention areas of the network. By refining the multi-scale detection layers using the attention scores, a better wheat spike detection network is obtained. The experiments on the Global Wheat Head Detection (GWHD) dataset show that the large-scale detection layer performs poorly, while the medium-scale detection layer performs best among the three-scale detection layers. Consequently, the large-scale detection layer is removed, a micro-scale detection layer is added, and the feature extraction ability in the medium-scale detection layer is enhanced. The refined model increases the detection accuracy and reduces the network complexity by decreasing the network parameters.

Conclusion: The proposed interpretive analysis method evaluates the contribution of different detection layers in the wheat spike detection network and provides a correct network improvement scheme. The findings of this study will offer a useful reference for future applications of deep network refinement in this field.

Revisiting Markovian Generative Architectures for Efficient Task-Oriented Dialog Systems
