A robust evaluation metric has a profound impact on the development of text generation systems. A desirable metric compares system output against references based on their semantics rather than surface forms. In this paper we investigate strategies to encode system and reference texts to devise a metric that shows a high correlation with human judgment of text quality. We validate our new metric, namely MoverScore, on a number of text generation tasks including summarization, machine translation, image captioning, and data-to-text generation, where the outputs are produced by a variety of neural and non-neural systems. Our findings suggest that metrics combining contextualized representations with a distance measure perform the best. Such metrics also demonstrate strong generalization capability across tasks. For ease of use, we make our metrics available as a web service.
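The idea of pairing token representations with a distance measure can be illustrated with a minimal sketch. It is not the MoverScore implementation: it assumes uniform token weights, uses a relaxed word mover's distance (each token moves all of its mass to its nearest counterpart) instead of an exact transport solver, and replaces contextualized embeddings with hypothetical 3-dimensional toy vectors. The function names are illustrative only.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def relaxed_mover_distance(system_vecs, reference_vecs):
    """Relaxed word mover's distance with uniform token weights (an assumption).

    Each token sends all of its mass to its nearest token on the other side.
    The larger of the two directional costs is a lower bound on the exact
    transport cost.
    """
    def one_way(src, dst):
        return sum(min(cosine_distance(s, d) for d in dst) for s in src) / len(src)

    return max(one_way(system_vecs, reference_vecs),
               one_way(reference_vecs, system_vecs))

# Hypothetical toy embeddings standing in for contextualized token vectors.
reference = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
close_output = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
far_output = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

d_close = relaxed_mover_distance(close_output, reference)
d_far = relaxed_mover_distance(far_output, reference)
print(d_close < d_far)  # the semantically closer output gets the smaller distance
```

A lower distance means the system output is closer to the reference in embedding space. Turning this into a quality score (for example by negating it) is a design choice left open here.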
Sparse-view CT reconstruction algorithms via total variation (TV) optimize the data iteratively on the basis of a noise- and artifact-reducing model, resulting in significant radiation dose reduction while maintaining image quality. However, the piecewise constant assumption of TV minimization often leads to the appearance of noticeable patchy artifacts in reconstructed images. To obviate this drawback, we present a penalized weighted least-squares (PWLS) scheme to retain the image quality by incorporating the new concept of total generalized variation (TGV) regularization. We refer to the proposed scheme as “PWLS-TGV” for simplicity. Specifically, TGV regularization utilizes higher order derivatives of the objective image, and the weighted least-squares term considers data-dependent variance estimation; together, these contribute to improving the image quality with sparse-view projection measurement. Subsequently, an alternating optimization algorithm was adopted to minimize the associated objective function. To evaluate the PWLS-TGV method, both qualitative and quantitative studies were conducted by using digital and physical phantoms. Experimental results show that the present PWLS-TGV method can achieve images with several noticeable gains over the original TV-based method in terms of accuracy and resolution properties.
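For orientation, a PWLS objective with a second-order TGV penalty is commonly written in the following form. This is a sketch based on the standard definitions, not a formula quoted from the paper, and all symbols are assumptions: y is the measured projection data, A the system matrix, Σ a diagonal matrix of estimated data variances, β the regularization strength, α₀ and α₁ the TGV weights, v an auxiliary vector field, and 𝓔(v) its symmetrized gradient.

```latex
\begin{aligned}
\hat{\mu} &= \arg\min_{\mu}\; (y - A\mu)^{\top} \Sigma^{-1} (y - A\mu)
             + \beta\, \mathrm{TGV}_{\alpha}^{2}(\mu), \\
\mathrm{TGV}_{\alpha}^{2}(\mu) &= \min_{v}\; \alpha_{1} \lVert \nabla \mu - v \rVert_{1}
             + \alpha_{0} \lVert \mathcal{E}(v) \rVert_{1}.
\end{aligned}
```

Because the penalty depends on both μ and v, an alternating scheme that updates one variable while holding the other fixed is a natural fit.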
Cerebral perfusion X-ray computed tomography (PCT) imaging, which detects and characterizes the ischemic penumbra and assesses blood-brain barrier permeability in acute stroke or chronic cerebrovascular diseases, has been developed extensively over the past decades. However, due to its sequential scan protocol, the associated radiation dose has raised significant concerns for patients. Therefore, in this study we developed an iterative image reconstruction algorithm based on the maximum a posteriori (MAP) principle to yield a clinically acceptable cerebral PCT image with lower milliampere seconds (mAs). To preserve the edges of the reconstructed image, an edge-preserving prior was designed using a normal-dose pre-contrast unenhanced scan. For simplicity, the present algorithm was termed “MAP-ndiNLM”. Evaluations with the digital phantom and the simulated low-dose clinical brain PCT datasets clearly demonstrate that the MAP-ndiNLM method can achieve more significant gains than the existing FBP and MAP-Huber algorithms, with better image noise reduction, low-contrast object detection and resolution preservation. More importantly, the MAP-ndiNLM method can yield more accurate kinetic enhanced details and diagnostic hemodynamic parameter maps than the MAP-Huber method.
This paper presents a robust Adaptive Fuzzy Neural Controller (AFNC) suitable for identification and control of a class of uncertain MIMO nonlinear systems. The proposed controller has the following salient features: (1) Self-organizing fuzzy neural structure, i.e. fuzzy control rules can be generated or deleted automatically; (2) Online learning ability of uncertain MIMO nonlinear systems; (3) Fast learning speed; (4) Adaptive control; (5) Robust control, where global stability of the system is established using the Lyapunov approach. A simulation example is included to confirm the validity and performance of the proposed control algorithm.
Peer review is a core element of the scientific process, particularly in conference-centered fields such as ML and NLP. However, only a few studies have evaluated its properties empirically. Aiming to fill this gap, we present a corpus that contains over 4k reviews and 1.2k author responses from ACL-2018. We quantitatively and qualitatively assess the corpus. This includes a pilot study on paper weaknesses given by reviewers and on the quality of author responses. We then focus on the role of the rebuttal phase, and propose a novel task to predict after-rebuttal (i.e., final) scores from initial reviews and author responses. Although author responses do have a marginal (and statistically significant) influence on the final scores, especially for borderline papers, our results suggest that a reviewer's final score is largely determined by her initial score and the distance to the other reviewers' initial scores. In this context, we discuss the conformity bias inherent to peer reviewing, a bias that has largely been overlooked in previous research. We hope our analyses will help better assess the usefulness of the rebuttal phase in NLP conferences.
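The two signals named above, a reviewer's initial score and its distance to the other reviewers' initial scores, can be computed as in the following sketch. The abstract does not say how the distance is measured, so the signed gap to the mean of the other reviewers is an assumption, and the function name and example scores are hypothetical.

```python
from statistics import mean

def rebuttal_features(initial_scores, reviewer_index):
    """Return (own initial score, signed gap to the mean of the other reviewers).

    A positive gap means the other reviewers scored the paper higher on average.
    """
    own = initial_scores[reviewer_index]
    others = [s for i, s in enumerate(initial_scores) if i != reviewer_index]
    if not others:
        raise ValueError("need at least two reviewers to compute a gap")
    return own, mean(others) - own

# Hypothetical initial scores from three reviewers of the same paper.
own, gap = rebuttal_features([2, 4, 5], reviewer_index=0)
print(own, gap)
```

Features like these could feed any regressor for the after-rebuttal score. A reviewer who tends to move toward the peer mean would show final scores that shift in the direction of the gap.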
Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual textual similarity. In this paper, we concern ourselves with reference-free machine translation (MT) evaluation where we directly compare source texts to (sometimes low-quality) system translations, which represents a natural adversarial setup for multilingual encoders. Reference-free evaluation holds the promise of web-scale comparison of MT systems. We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER. We find that they perform poorly as semantic encoders for reference-free MT evaluation and identify their two key limitations, namely, (a) a semantic mismatch between representations of mutual translations and, more prominently, (b) the inability to punish "translationese", i.e., low-quality literal translations. We propose two partial remedies: (1) post-hoc re-alignment of the vector spaces and (2) coupling of semantic-similarity based metrics with target-side language modeling. In segment-level MT evaluation, our best metric surpasses reference-based BLEU by 5.7 correlation points. We make our MT evaluation code available.
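The second remedy, coupling a semantic-similarity metric with a target-side language model, can be sketched as a weighted sum. This illustrates the general idea only: the linear form, the length normalization, the weight of 0.1, and all names and numbers below are assumptions rather than details taken from the paper.

```python
def reference_free_score(semantic_similarity, lm_log_prob, num_tokens, lm_weight=0.1):
    """Combine cross-lingual similarity with a length-normalized LM fluency term.

    semantic_similarity: similarity between source and translation (higher is better).
    lm_log_prob: total log-probability of the translation under a target-side LM.
    num_tokens: number of tokens in the translation, used for normalization.
    lm_weight: hypothetical mixing weight for the fluency term.
    """
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    fluency = lm_log_prob / num_tokens
    return semantic_similarity + lm_weight * fluency

# Two hypothetical translations with the same semantic similarity to the source:
# a fluent one and a literal, "translationese" one the LM finds less likely.
fluent = reference_free_score(0.8, lm_log_prob=-10.0, num_tokens=5)
literal = reference_free_score(0.8, lm_log_prob=-30.0, num_tokens=5)
print(fluent > literal)
```

With equal semantic similarity, the fluency term is what separates the two candidates, which is the effect the coupling is meant to achieve.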