Background: Although the etiology of chronic lymphocytic leukemia (CLL), the most common type of adult leukemia, is still unclear, strong evidence implicates antigen involvement in disease ontogeny and evolution. Primary and 3D structure analysis has been utilised in order to discover indications of antigenic pressure. The latter has been mostly based on the 3D models of the clonotypic B cell receptor immunoglobulin (BcR IG) amino acid sequences. Therefore, their accuracy is directly dependent on the quality of the model construction algorithms and the specific methods used to compare the ensuing models. Thus far, reliable and robust methods that can group the IG 3D models based on their structural characteristics are missing.
Results: Here we propose a novel method for clustering a set of proteins based on their 3D structure focusing on 3D structures of BcR IG from a large series of patients with CLL. The method combines techniques from the areas of bioinformatics, 3D object recognition and machine learning. The clustering procedure is based on the extraction of 3D descriptors, encoding various properties of the local and global geometrical structure of the proteins. The descriptors are extracted from aligned pairs of proteins. A combination of individual 3D descriptors is also used as an additional method. The comparison of the automatically generated clusters to manual annotation by experts shows an increased accuracy when using the 3D descriptors compared to plain bioinformatics-based comparison. The accuracy is increased even more when using the combination of 3D descriptors.
Conclusions: The experimental results verify that the use of 3D descriptors commonly used for 3D object recognition can be effectively applied to distinguishing structural differences of proteins.
The proposed approach can be applied to provide hints for the existence of structural groups in a large set of unannotated BcR IG protein files in both CLL and, by logical extension, other contexts where it is relevant to characterize BcR IG structural similarity. The method does not present any limitations in application and can be extended to other types of proteins.
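The combination of individual 3D descriptors mentioned above could, for instance, be realised by fusing one distance matrix per descriptor. The sketch below is an illustration only: the abstract does not state how the descriptors are combined, so the min-max normalisation, the weighted average and the function names `normalize` and `fuse` are all assumptions, and the two toy matrices are invented values for three proteins.

```python
import numpy as np


def normalize(dist):
    """Min-max scale a symmetric distance matrix to [0, 1].

    Only off-diagonal entries define the range; the diagonal is reset to 0.
    """
    dist = np.asarray(dist, dtype=float)
    off_diagonal = dist[~np.eye(len(dist), dtype=bool)]
    lo, hi = off_diagonal.min(), off_diagonal.max()
    if hi == lo:
        # All pairs are equally distant: the matrix carries no ranking.
        return np.zeros_like(dist)
    scaled = (dist - lo) / (hi - lo)
    np.fill_diagonal(scaled, 0.0)
    return scaled


def fuse(matrices, weights=None):
    """Weighted average of normalised distance matrices.

    Hypothetical fusion rule, not the rule used in the paper.
    """
    scaled = [normalize(m) for m in matrices]
    if weights is None:
        weights = np.ones(len(scaled))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * m for w, m in zip(weights, scaled))


# Toy distances between three proteins under two different descriptors.
d_a = [[0, 1, 4], [1, 0, 3], [4, 3, 0]]
d_b = [[0, 10, 20], [10, 0, 30], [20, 30, 0]]
fused = fuse([d_a, d_b])
```

Normalising before averaging keeps a descriptor with a large numeric range from dominating the fused matrix; the result can be passed to any clustering routine that accepts precomputed distances.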
Immunoglobulins (Igs) are crucial for the defense against pathogens, but they are also important in many clinical and biotechnological applications. Their characteristics, and ultimately their function, depend on their three-dimensional (3D) structure; however, the procedures to determine it experimentally are extremely laborious and demanding. Hence, the ability to gain insight into the structure of Igs at large relies on the availability of tools and algorithms for producing accurate Ig structural models based on their primary sequence alone. These models can then be used to determine structural and eventually functional similarities between different Igs. An example of such a task is the clustering of Igs based on their structure to determine meaningful common features, such as the possible existence of common molecular targets (antigens). Several approaches have been proposed for this task, yet their results have been hindered mainly by the lack of efficient clustering methods based on the similarity of 3D structure descriptors. Here, we present a novel workflow for robust Ig 3D modeling and automated clustering. We validated our protocol in chronic lymphocytic leukemia (CLL), where the clonotypic Igs are critically implicated in disease ontogeny and evolution. Indeed, immunogenetic studies on the clonotypic Igs have strongly implicated antigen selection in the pathogenesis of CLL, while also providing robust prognostic information. In the present study, we used the structure prediction tools PIGS and I-TASSER to create the 3D models and the TM-align algorithm to superpose them. The innovation of the current methodology resides in the use of methods adapted from 3D content-based search methodologies to determine the local structural similarity between the 3D models.
The Fast Point Feature Histograms descriptors derived from the structurally aligned parts are used to compute a distance matrix, which is then used as input for the clustering procedure. Clustering analysis of the data is performed with agglomerative and density-based clustering approaches. The first method is unsupervised whereas the second belongs to the semi-supervised type, i.e. it requires a predefined number of clusters. To evaluate the quality of the workflow described here, we performed a supervised analysis of 125 Ig 3D models originating from 5 CLL stereotyped subsets, i.e. subgroups sharing (quasi) identical IGs, namely subsets #1, #2, #4, #6 and #8. The reasoning behind this choice was that (i) homologous Ig primary sequences can reasonably be expected to be reflected in overall similar 3D structures, hence providing a reference for evaluating the developed workflow; and (ii) these subsets are well characterized at both the clinical and biological levels. Subset size distribution was as follows: subset #1 (IGHV clan I/IGKV1(D)-39), n=37; subset #2 (IGHV3-21/IGLV3-21), n=43; subset #4 (IGHV4-34/IGKV2-30), n=22; subset #6 (IGHV1-69/IGKV3-20), n=12; and subset #8 (IGHV4-39/IGKV1(D)-39), n=11. Overall, we obtained a high level of clustering accuracy, i.e. the Ig 3D model clusters matched to a very high degree the subsets defined by Ig primary sequence similarity. In detail, 5 Ig 3D model clusters were produced: (i) cluster 1 containing 37/37 (100%) subset #1 models and one (8.3%) subset #6 model; (ii) cluster 2 containing 43/43 (100%) subset #2 models; (iii) cluster 3 containing 21/22 (95.5%) subset #4 models; (iv) cluster 4 containing 11/12 (91.7%) subset #6 models; and (v) cluster 5 containing 11/11 (100%) subset #8 models along with a single (4.5%) subset #4 model (subsets #4 and #8 concern IgG CLL, in itself a rarity for CLL).
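The step from descriptors to a distance matrix to clusters, and the comparison against the known subsets, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each model is reduced to a single 33-bin histogram (33 is the bin count of the FPFH signature in the Point Cloud Library), the histograms are synthetic, the Euclidean metric and average linkage are arbitrary choices, and the helper names `cluster_descriptors` and `majority_agreement` are hypothetical. The second part only tallies the cluster compositions reported above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def cluster_descriptors(histograms, n_clusters):
    """Pairwise distances -> agglomerative tree -> flat cluster labels."""
    condensed = pdist(histograms, metric="euclidean")  # assumed metric
    tree = linkage(condensed, method="average")        # assumed linkage
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return squareform(condensed), labels


def majority_agreement(labels, truth):
    """Fraction of models that fall in the majority subset of their cluster."""
    labels, truth = np.asarray(labels), np.asarray(truth)
    matched = 0
    for cluster in np.unique(labels):
        _, counts = np.unique(truth[labels == cluster], return_counts=True)
        matched += counts.max()
    return matched / len(truth)


# Synthetic stand-in data: one noisy 33-bin histogram per model,
# with subset sizes taken from the text (125 models in total).
sizes = {"#1": 37, "#2": 43, "#4": 22, "#6": 12, "#8": 11}
rng = np.random.default_rng(0)
rows, truth = [], []
for k, (subset, n) in enumerate(sizes.items()):
    centre = np.zeros(33)
    centre[k * 6:(k + 1) * 6] = 1.0  # invented, well-separated profiles
    rows.append(centre + rng.normal(0.0, 0.01, size=(n, 33)))
    truth += [subset] * n
dist, labels = cluster_descriptors(np.vstack(rows), n_clusters=5)
synthetic_agreement = majority_agreement(labels, truth)

# Tally of the cluster compositions reported in the abstract.
reported = {
    1: {"#1": 37, "#6": 1},
    2: {"#2": 43},
    3: {"#4": 21},
    4: {"#6": 11},
    5: {"#8": 11, "#4": 1},
}
rep_labels = [c for c, m in reported.items() for s, n in m.items() for _ in range(n)]
rep_truth = [s for c, m in reported.items() for s, n in m.items() for _ in range(n)]
reported_agreement = majority_agreement(rep_labels, rep_truth)  # 123 of 125
```

Counting the majority subset in each reported cluster gives 37 + 43 + 21 + 11 + 11 = 123 of 125 models, which is consistent with the two misplaced models (one from subset #6, one from subset #4) described in the text.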
These findings support the conclusion that the workflow described here enables robust clustering of 3D models produced from Ig sequences from patients with CLL. Furthermore, they indicate that CLL classification based on stereotypy of Ig primary sequences is likely also verified at the Ig 3D structural level. Studies are ongoing both to address the minor discrepancies observed here and to produce the unsupervised 3D clustering of the IGs from a large series of both stereotyped and non-stereotyped CLL cases.
Disclosures: Rosenquist: Gilead Sciences: Speakers Bureau. Stamatopoulos: Gilead: Consultancy, Honoraria, Research Funding; Abbvie: Honoraria, Other: Travel expenses; Janssen: Honoraria, Other: Travel expenses, Research Funding; Novartis: Honoraria, Research Funding.
A core task in achieving fast and accurate retrieval in a content-based 3D search and retrieval system is to determine an efficient and effective method for measuring similarity between the 3D models. In this paper the "cascaded fusion of local descriptors" is proposed for efficient retrieval of classified 3D models, based on a 2D coloured logo retrieval methodological approach, suitably modified for the purpose of 3D search and retrieval tasks that are widely used in the augmented reality (AR)