Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community‐wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non‐physiological complexes. The non‐physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein‐protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non‐physiological complexes. Two combined predictors were then built: a simple consensus score generated from the best‐performing score of each of the 13 groups, and a cross‐validated Random Forest (RF) classifier. Both approaches showed excellent performance, with areas under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming the individual scores developed by the different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non‐physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
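The idea of combining one score per group into a consensus and evaluating it by the area under the ROC curve can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the rank normalization followed by averaging, the toy scores, and the names `rank_normalize`, `consensus_score`, `roc_auc`, `group_a` and `group_b` are all assumptions introduced here. The abstract does not say how its consensus score was actually computed.

```python
from typing import Dict, List, Sequence


def rank_normalize(values: Sequence[float]) -> List[float]:
    """Map values to average ranks scaled to [0, 1]; tied values share a rank."""
    n = len(values)
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return [r / (n - 1) for r in ranks]


def consensus_score(score_table: Dict[str, Sequence[float]]) -> List[float]:
    """Average the rank-normalized scores of all groups (an assumed scheme)."""
    normalized = [rank_normalize(v) for v in score_table.values()]
    n_items = len(normalized[0])
    return [sum(col[i] for col in normalized) / len(normalized)
            for i in range(n_items)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outscores a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = physiological, 0 = non-physiological.
labels = [1, 1, 1, 0, 0, 0]
scores = {
    "group_a": [0.9, 0.4, 0.7, 0.5, 0.2, 0.1],
    "group_b": [0.3, 0.8, 0.6, 0.2, 0.7, 0.1],
}
consensus = consensus_score(scores)
for name, values in scores.items():
    print(name, round(roc_auc(labels, values), 3))
print("consensus", round(roc_auc(labels, consensus), 3))
```

On this toy input each individual score misranks at least one pair, while the averaged ranks separate the two classes, which mirrors the kind of gain the abstract reports for combined scores.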
Comparing two sets of multivariate samples is a central problem in data analysis. From a statistical standpoint, the simplest way to perform such a comparison is to resort to a non-parametric two-sample test (TST), which checks whether the two sets can be seen as i.i.d. samples of an identical unknown distribution (the null hypothesis). If the null is rejected, one wishes to identify regions accounting for this difference. This paper presents a two-stage method providing feedback on this difference, based upon a combination of statistical learning (regression) and computational topology methods. Consider two populations, each given as a point cloud in R^d. In the first step, we assign a label to each set and we compute, for each sample point, a discrepancy measure based on comparing an estimate of the conditional probability distribution of the label given a position versus the global unconditional label distribution. In the second step, we study the height function defined at each point by the aforementioned estimated discrepancy. Topological persistence is used to identify persistent local minima of this height function, their basins defining regions of points with high discrepancy and in spatial proximity. Experiments are reported both on synthetic and real data (satellite images and handwritten digit images), ranging in dimension from d = 2 to d = 784, illustrating the ability of our method to localize discrepancies. More generally, the ability to provide feedback downstream of a TST may prove broadly useful in exploratory statistics and data science.
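The first step described above, a per-point discrepancy between a local estimate of the label distribution and the global label proportion, can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the k-nearest-neighbour estimator, the Kullback-Leibler divergence between Bernoulli distributions, the toy point clouds, and all function names are choices made here. The abstract does not name the regressor or the divergence it uses, and the persistence-based second step is not covered.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_label_fraction(points: Sequence[Point], labels: Sequence[int],
                       query_index: int, k: int) -> float:
    """Estimate P(label = 1 | position) from the k nearest other points."""
    query = points[query_index]
    by_distance = sorted(
        (math.dist(query, p), i)
        for i, p in enumerate(points) if i != query_index
    )
    neighbours = [i for _, i in by_distance[:k]]
    return sum(labels[i] for i in neighbours) / len(neighbours)


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)), assuming 0 < q < 1."""
    def term(a: float, b: float) -> float:
        # Convention: 0 * log(0 / b) = 0.
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(p, q) + term(1.0 - p, 1.0 - q)


def discrepancy_per_point(points: Sequence[Point], labels: Sequence[int],
                          k: int) -> List[float]:
    """Compare the local label estimate with the global label proportion."""
    global_fraction = sum(labels) / len(labels)
    return [
        bernoulli_kl(knn_label_fraction(points, labels, i, k), global_fraction)
        for i in range(len(points))
    ]


# Toy clouds in R^2 (hypothetical): the two populations overlap near the
# origin, and each also has a cluster the other population lacks.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),          # mixed region
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),          # label 1 only
    (-5.0, -5.0), (-5.1, -5.0), (-5.0, -5.1), (-5.1, -5.1),  # label 0 only
]
labels = [0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0]
discrepancy = discrepancy_per_point(points, labels, k=3)
for point, value in zip(points, discrepancy):
    print(point, round(value, 4))
```

Points in the two one-sided clusters receive a markedly larger discrepancy than points in the mixed region, which is the signal the second step would then group into spatially coherent regions.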
Rheumatoid arthritis (RA) is associated with abnormal B-cell functions implicating antibody-dependent and -independent mechanisms. B cells have emerged as important cytokine-producing cells, and cytokines are well-known drivers of RA pathogenesis. To identify novel cytokine-mediated B-cell functions in RA, we comprehensively analysed the capacity of B cells from RA patients with an inadequate response to disease-modifying anti-rheumatic drugs to produce cytokines in comparison with healthy donors (HD). RA B cells displayed a constitutively higher production of the pathogenic factors interleukin (IL)-8 and Gro-α, while their production of several cytokines upon activation via the B cell receptor for antigen (BCR) was broadly suppressed, including a loss of the expression of the protective factor TRAIL, compared to HD B cells. These defects were partly erased after treatment with the IL-6-signalling inhibitor tocilizumab, indicating that abnormal IL-6 signalling contributed to these abnormalities. Notably, the clinical response of individual patients to tocilizumab therapy could be predicted using the amounts of MIP-1β and β-NGF produced by these patients' B cells before treatment. Taken together, our study highlights hitherto unknown abnormal B-cell functions in RA patients, which are related to the unbalanced cytokine network, and are potentially relevant for RA pathogenesis and treatment.