While big data helps improve decision-making and model development, its use often raises privacy concerns. One example is retrieving drivers’ origin and destination information from smartphone navigation apps to develop a route choice behavior model. To preserve privacy while still taking advantage of big data in navigation applications, the authors propose a federated learning approach, which has shown promise in predicting the next word on smartphone keyboards without sending text to the server. Additional benefits of federated learning are reduced data communication, because model parameters are sent instead of the raw data, and distribution of the computational burden across smartphones rather than concentrating it on the main server. Results on real-world route navigation usage data from about 30,000 drivers over one year showed that the proposed federated learning approach achieved accuracy very similar to that of a traditional centralized global model while preserving privacy.
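The parameter-sharing idea described above can be illustrated with a minimal sketch. It assumes federated averaging (weighted averaging of client parameters by local sample count) as the aggregation rule, which the abstract does not name, and it uses a toy linear model on synthetic data in place of the authors' route choice model. All function names, hyperparameters, and data here are hypothetical.

```python
import random


def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few gradient-descent steps on one client's private data.

    Toy model (an assumption, not the route choice model): y ~ w0 + w1 * x
    with squared-error loss. Only the updated weights are returned, so the
    raw (x, y) pairs never leave the client.
    """
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        g0 = 2.0 * sum((w0 + w1 * x - y) for x, y in data) / n
        g1 = 2.0 * sum((w0 + w1 * x - y) * x for x, y in data) / n
        w0 -= lr * g0
        w1 -= lr * g1
    return (w0, w1)


def federated_average(client_weights, client_sizes):
    """Server step: average client weights, weighted by local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return tuple(
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    )


def train_federated(clients, rounds=200):
    """Alternate local training on every client with server-side averaging."""
    weights = (0.0, 0.0)
    sizes = [len(data) for data in clients]
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in clients]
        weights = federated_average(updates, sizes)
    return weights


def make_clients(seed=0, n_clients=5):
    """Synthetic stand-in for per-device data: y = 1 + 2x plus small noise."""
    rng = random.Random(seed)
    clients = []
    for _ in range(n_clients):
        data = []
        for _ in range(rng.randint(20, 40)):
            x = rng.uniform(-1.0, 1.0)
            data.append((x, 1.0 + 2.0 * x + rng.gauss(0.0, 0.05)))
        clients.append(data)
    return clients


if __name__ == "__main__":
    w0, w1 = train_federated(make_clients())
    print(f"intercept={w0:.3f} slope={w1:.3f}")
```

In this sketch the server only ever sees two floating-point parameters per client per round, which is the source of both the privacy and the communication savings mentioned above.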
Recent computational advances in the accurate prediction of protein three-dimensional (3D) structures from amino acid sequences now present a unique opportunity to decipher the interrelationships between proteins. This task entails, but is not equivalent to, a problem of 3D structure comparison and classification. Historically, protein domain classification has been a largely manual and subjective activity, relying upon various heuristics. Databases such as CATH represent significant steps towards a more systematic (and automatable) approach, yet there still remains much room for the development of more scalable and quantitative classification methods, grounded in machine learning. We suspect that re-examining these relationships via a Deep Learning (DL) approach may entail a large-scale restructuring of classification schemes, improved with respect to the interpretability of distant relationships between proteins. Here, we describe our training of DL models on protein domain structures (and their associated physicochemical properties) in order to evaluate classification properties at CATH's homologous superfamily (SF) level. To achieve this, we have devised and applied an extension of image-classification methods and image segmentation techniques, utilizing a convolutional autoencoder model architecture. Our DL architecture allows models to learn structural features that, in a sense, 'define' different homologous SFs. We evaluate and quantify pairwise 'distances' between SFs by building one model per SF and comparing the loss functions of the models. Hierarchical clustering on these distance matrices provides a new view of protein interrelationships, a view that extends beyond simple structural/geometric similarity, and towards the realm of structure/function properties.
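The step from per-SF model losses to a clustering can be sketched as follows. The abstract does not specify how losses are turned into distances, so this sketch assumes a symmetrized cross-loss (the mean of "model i evaluated on SF j" and "model j evaluated on SF i") and average-linkage agglomerative clustering. The labels and loss values are made-up placeholders for illustration, not results.

```python
def loss_to_distance(loss):
    """Turn a cross-loss matrix into a symmetric distance matrix.

    loss[i][j] is assumed to be the mean reconstruction loss of the model
    trained on superfamily i when evaluated on domains from superfamily j.
    The symmetrization rule is an assumption of this sketch.
    """
    n = len(loss)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = 0.5 * (loss[i][j] + loss[j][i])
    return dist


def average_linkage(dist, labels):
    """Agglomerative clustering; returns the merge history, closest first."""
    clusters = {i: [i] for i in range(len(labels))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                # Average pairwise distance between members of a and b.
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                d = sum(dist[i][j] for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((
            sorted(labels[i] for i in clusters[a]),
            sorted(labels[i] for i in clusters[b]),
            d,
        ))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


# Placeholder superfamily labels and illustrative (fabricated) loss values.
LABELS = ["SF_A", "SF_B", "SF_C", "SF_D"]
LOSS = [
    [0.10, 0.20, 0.80, 0.90],
    [0.30, 0.10, 0.70, 0.80],
    [0.90, 0.80, 0.10, 0.25],
    [0.85, 0.75, 0.15, 0.10],
]

if __name__ == "__main__":
    for left, right, d in average_linkage(loss_to_distance(LOSS), LABELS):
        print(left, "+", right, f"at distance {d:.4f}")
```

For real analyses, a library routine such as SciPy's hierarchical clustering would be the more usual choice; the pure-Python version here only makes the mechanics explicit.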