Gan Chun scite author profile

Gan Chun

5 Publications

29 Citation Statements Received

130 Citation Statements Given

Affiliations

University of Wisconsin–Madison

Publications

Vocabulary Learning via Optimal Transport for Neural Machine Translation

Xu¹, Zhou², Chun³ et al. 2021

The choice of token vocabulary affects the performance of machine translation. This paper aims to figure out what makes a good vocabulary and whether one can find the optimal vocabulary without trial training. To answer these questions, we first provide an alternative understanding of the role of vocabulary from the perspective of information theory. Motivated by this, we formulate the quest of vocabularization, that is, finding the best token dictionary with a proper size, as an optimal transport (OT) problem. We propose VOLT, a simple and efficient solution without trial training. Empirical results show that VOLT outperforms widely-used vocabularies in diverse scenarios, including WMT-14 English-German and TED's 52 translation directions. For example, VOLT achieves 70% vocabulary size reduction and 0.5 BLEU gain on English-German translation. Also, compared to BPE-search, VOLT reduces the search time from 384 GPU hours to 30 GPU hours on English-German translation. Code is available at https://github.com/Jingjing-NLP/VOLT.

Probabilistic Graph Reasoning for Natural Proof Generation

Sun, Zhang², Chen et al. 2021

In this paper, we investigate the problem of reasoning over natural language statements. Prior neural-based approaches do not explicitly consider the inter-dependency among answers and their proofs. We propose PROBR, a novel approach for joint answer prediction and proof generation. PROBR defines a joint probabilistic distribution over all possible proof graphs and answers via an induced graphical model. We then optimize the model using variational approximation on top of neural textual representation. Experiments on multiple datasets under diverse settings (fully supervised, few-shot and zero-shot evaluation) verify the effectiveness of PROBR, e.g., achieving 10%-30% improvement on QA accuracy in few/zero-shot evaluation. Our code and models can be found at https://github.com/changzhisun/PRobr/.

Dyadic decomposition of convex domains of finite type and applications

Chun, Khan 2022

Math. Z.

Vocabulary Learning via Optimal Transport for Neural Machine Translation

Zhou², Chun et al. 2020

Preprint

It is well accepted that the choice of token vocabulary largely affects the performance of machine translation. However, due to expensive trial costs, most studies only conduct simple trials with dominant approaches (e.g., BPE) and commonly used vocabulary sizes. In this paper, we find an exciting relation between an information-theoretic feature and BLEU scores. With this observation, we formulate the quest of vocabularization, that is, finding the best token dictionary with a proper size, as an optimal transport problem. We then propose VOLT, a simple and efficient vocabularization solution without the full and costly trial training. We evaluate our approach on multiple machine translation tasks, including WMT-14 English-German translation, TED bilingual translation, and TED multilingual translation. Empirical results show that VOLT beats widely-used vocabularies on diverse scenarios. For example, VOLT achieves 70% vocabulary size reduction and 0.6 BLEU gain on English-German translation. Also, one advantage of VOLT lies in its low resource consumption. Compared to naive BPE-search, VOLT reduces the search time from 288 GPU hours to 0.5 CPU hours.

Global Newlander-Nirenberg theorem for domains with $C^2$ boundary

Chun¹, Gong² 2020

Preprint

The Newlander-Nirenberg theorem says that a formally integrable complex structure is locally equivalent to the standard complex structure in the complex Euclidean space. In this paper, we consider two natural generalizations of the Newlander-Nirenberg theorem under the presence of a $C^2$ strictly pseudoconvex boundary. When a given formally integrable complex structure $X$ is defined on the closure of a bounded strictly pseudoconvex domain $D \subset \mathbb{C}^n$ with $C^2$ boundary, we show the existence of global holomorphic coordinate systems defined on $D$ that transform $X$ into the standard complex structure provided that $X$ is sufficiently close to the standard complex structure. Moreover, we show that such closeness is stable under a small $C^2$ perturbation of $\partial D$. As a consequence, when a given formally integrable complex structure is defined on a one-sided neighborhood of some point in a $C^2$ real hypersurface $M \subset \mathbb{C}^n$, we prove the existence of local one-sided holomorphic coordinate systems provided that $M$ is strictly pseudoconvex with respect to the given complex structure. We also obtain results when the structures are finite smooth.
