Zihao Zheng scite author profile

Research into the area of multiparty dialog has grown considerably over recent years. We present the Molweni dataset, a machine reading comprehension (MRC) dataset with discourse structure built over multiparty dialog. Molweni's source is sampled from the Ubuntu Chat Corpus and includes 10,000 dialogs comprising 88,303 utterances. We annotate 30,066 questions on this corpus, including both answerable and unanswerable questions. Molweni also uniquely contributes discourse dependency annotations in a modified Segmented Discourse Representation Theory (SDRT) style for all of its multiparty dialogs, contributing large-scale data (78,245 annotated discourse relations) to bear on the task of multiparty dialog discourse parsing. Our experiments show that Molweni is a challenging dataset for current MRC models: BERT-wwm, a current, strong SQuAD 2.0 performer, achieves only 67.7% F1 on Molweni's questions, a significant 20+% drop compared with its SQuAD 2.0 performance.

Disordered Antigens and Epitope Overlap Between Anti–Citrullinated Protein Antibodies and Rheumatoid Factor in Rheumatoid Arthritis

Zheng

Mergaert

Fahmy

et al. 2019

Arthritis & Rheumatology

Objective. Anti-citrullinated protein antibodies (ACPAs) and rheumatoid factor (RF) are commonly present in rheumatoid arthritis (RA) without a clear rationale for their coexistence. Moreover, autoantibodies develop against proteins with different posttranslational modifications and native proteins without obvious unifying characteristics of the antigens. We undertook this study to broadly evaluate autoantibody binding in seronegative and seropositive RA to identify novel features of reactivity. Methods. An array was created using a total of 172,828 native peptides, citrulline-containing peptides, and homocitrulline-containing peptides derived primarily from proteins citrullinated in the rheumatoid joint. IgG and IgM binding to peptides were compared between cyclic citrullinated peptide (CCP)-positive RF+, CCP+RF−, CCP−RF+, and CCP−RF− serum from RA patients (n = 48) and controls (n = 12). IgG-bound and endogenously citrullinated peptides were analyzed for amino acid patterns and predictors of intrinsic disorder, i.e., unstable 3-dimensional structure. Binding to IgG-derived peptides was specifically evaluated. Enzyme-linked immunosorbent assay confirmed key results. Results. Broadly, CCP+RF+ patients had high citrulline-specific IgG binding to array peptides and CCP+RF− and CCP−RF+ patients had modest citrulline-specific IgG binding (median Z scores 3.02, 1.42, and 0.75, respectively; P < 0.0001). All RA groups had low homocitrulline-specific binding. CCP+RF+ patients had moderate IgG binding to native peptides (median Z score 2.38; P < 0.0001). The highest IgG binding was to citrulline-containing peptides, irrespective of protein identity, especially if citrulline was adjacent to glycine or serine, motifs also seen in endogenous citrullination in the rheumatoid joint. Highly bound peptides had multiple features predictive of disorder. IgG from CCP+RF+ patients targeted citrulline-containing IgG-derived peptides. Conclusion. Disordered antigens, which are frequently citrullinated, and common epitopes for ACPAs and RF are potentially unifying features for RA autoantibodies.

DADgraph: A Discourse-aware Dialogue Graph Neural Network for Multiparty Dialogue Machine Reading Comprehension

Liu

Zheng

et al. 2021

MixTwice: large-scale hypothesis testing for peptide arrays by variance mixing

Zheng

Mergaert

Ong

et al. 2021

Peptide microarrays have emerged as a powerful technology in immunoproteomics as they provide a tool to measure the abundance of different antibodies in patient serum samples. The high dimensionality and small sample size of many experiments challenge conventional statistical approaches, including those aiming to control the false discovery rate (FDR). Motivated by limitations in reproducibility and power of current methods, we advance an empirical Bayesian tool that computes local false discovery rate statistics and local false sign rate statistics when provided with data on estimated effects and estimated standard errors from all the measured peptides. As the name suggests, the MixTwice tool involves the estimation of two mixing distributions, one on underlying effects and one on underlying variance parameters. Constrained optimization techniques provide for model fitting of mixing distributions under weak shape constraints (unimodality of the effect distribution). Numerical experiments show that MixTwice can accurately estimate generative parameters and powerfully identify non-null peptides. In a peptide array study of rheumatoid arthritis (RA), MixTwice recovers meaningful peptide markers in one case where the signal is weak, and has strong reproducibility properties in one case where the signal is strong. Availability: MixTwice is available as an R software package at https://cran.r-project.org/web/packages/MixTwice/. Supplementary information: Supplementary data are available at Bioinformatics online.

Rheumatoid Factor and Anti–Modified Protein Antibody Reactivities Converge on IgG Epitopes

Mergaert

Zheng

Denny

et al. 2022

Arthritis & Rheumatology

Objective. Rheumatoid arthritis (RA) patients often develop rheumatoid factors (RFs), antibodies that bind IgG Fc, and anti-modified protein antibodies (AMPAs), multireactive autoantibodies that commonly bind citrullinated, homocitrullinated, and acetylated antigens. Recently, antibodies that bind citrulline-containing IgG epitopes were discovered in RA, suggesting that additional undiscovered IgG epitopes could exist and that IgG could be a shared antigen for RFs and AMPAs. This study was undertaken to reveal new IgG epitopes in rheumatic disease and to determine if multireactive AMPAs bind IgG. Methods. Using sera from patients with RA, systemic lupus erythematosus, Sjögren's disease (SjD), or spondyloarthropathy, IgG binding to native, citrulline-containing, and homocitrulline-containing linear epitopes of the IgG constant region was evaluated by peptide array, with highly bound epitopes further evaluated by enzyme-linked immunosorbent assay (ELISA). Binding of monoclonal AMPAs to IgG-derived peptides and IgG Fc was also evaluated by ELISA. Results. Seropositive RA sera showed high IgG binding to multiple citrulline- and homocitrulline-containing IgG-derived peptides, whereas anti-SSA+ sera from SjD patients showed consistent binding to a single linear native epitope of IgG in the hinge region. Monoclonal AMPAs bound citrulline- and homocitrulline-containing IgG peptides and modified IgG Fc. Conclusion. The repertoire of epitopes bound by AMPAs includes modified IgG epitopes, positioning IgG as a common antigen that connects the otherwise divergent reactivities of RFs and AMPAs.

Molweni: A Challenge Multiparty Dialogues-based Machine Reading Comprehension Dataset with Discourse Structure

Li

Liu

Kan

et al. 2020

Preprint

Similarity Calculation via Passage-Level Event Connection Graph

2022

Recently, many information processing applications have appeared on the web in response to user demand. Since text is one of the most popular data formats across the web, measuring text similarity has become a key challenge for many web applications. Web text is often used to record events, especially in news. A single text often mentions multiple events, but only the core event decides its main topic. This core event should therefore carry the most weight when measuring text similarity. For this reason, this paper constructs a passage-level event connection graph to model the relations among the events mentioned in one text. The graph is composed of many subgraphs formed by triggers and arguments extracted sentence by sentence. The subgraphs are connected via overlapping arguments. Using a centrality measure, the core event can be identified from the graph and used to measure text similarity. Moreover, two improvements based on vector tuning are provided to better model the relations among events. The first finds triggers that are semantically similar; linking them in the event connection graph lets the graph cover the relations among events more comprehensively. The second applies graph embedding to integrate the global information carried by the entire event connection graph into the core event, so that text similarity is partially guided by the full-text content. Experimental results show that, by measuring text similarity from a passage-level event representation perspective, our calculation achieves better results than unsupervised methods and results comparable to some supervised neural methods. In addition, our calculation is unsupervised and can be applied in many domains without preparing training data.

An Annotation Scheme of A Large-scale Multi-party Dialogues Dataset for Discourse Parsing and Machine Comprehension

Li

Liu

Qin

et al. 2019

Preprint

In this paper, we propose a scheme for annotating large-scale multi-party chat dialogues for discourse parsing and machine comprehension. The main goal of this project is to help understand multi-party dialogues. Our dataset is based on the Ubuntu Chat Corpus. For each multi-party dialogue, we annotate the discourse structure and question-answer pairs. To our knowledge, this is the first large-scale corpus for multi-party dialogue discourse parsing, and we are the first to propose the task of multi-party dialogue machine reading comprehension.

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

