Fast and reliable detection of patients with severe and heterogeneous illnesses is a major goal of precision medicine [1,2]. Patients with leukaemia can be identified using machine learning on the basis of their blood transcriptomes [3]. However, there is an increasing divide between what is technically possible and what is allowed, because of privacy legislation [4,5]. Here, to facilitate the integration of any medical data from any data owner worldwide without violating privacy laws, we introduce Swarm Learning—a decentralized machine-learning approach that unites edge computing, blockchain-based peer-to-peer networking and coordination while maintaining confidentiality without the need for a central coordinator, thereby going beyond federated learning. To illustrate the feasibility of using Swarm Learning to develop disease classifiers using distributed data, we chose four use cases of heterogeneous diseases (COVID-19, tuberculosis, leukaemia and lung pathologies). With more than 16,400 blood transcriptomes derived from 127 clinical studies with non-uniform distributions of cases and controls and substantial study biases, as well as more than 95,000 chest X-ray images, we show that Swarm Learning classifiers outperform those developed at individual sites. In addition, Swarm Learning completely fulfils local confidentiality regulations by design. We believe that this approach will notably accelerate the introduction of precision medicine.
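The core idea above—peers train on local data and merge parameters directly with one another, with no central coordinator—can be illustrated with a minimal sketch. This is a toy simulation under assumed conventions (a model as a flat list of weights, a merge rule of element-wise averaging across all peers); it is not the paper's implementation and omits the blockchain-based coordination layer entirely.

```python
# Toy sketch of Swarm Learning-style decentralized model merging.
# Assumption: each peer holds private data (represented here only by
# its local gradient) and a model given as a flat list of weights.

def local_update(weights, gradient, lr=0.1):
    """One step of local training on a peer's private data."""
    return [w - lr * g for w, g in zip(weights, gradient)]

def swarm_merge(all_weights):
    """Decentralized merge: every peer adopts the element-wise mean
    of all peers' parameters, as if exchanged over a P2P network
    rather than through a central parameter server."""
    n = len(all_weights)
    return [sum(ws) / n for ws in zip(*all_weights)]

# Three peers with different starting weights (stand-ins for models
# trained on non-uniform local cohorts).
peers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
grads = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

peers = [local_update(w, g) for w, g in zip(peers, grads)]
merged = swarm_merge(peers)
print(merged)  # every peer ends up holding the same merged model
```

The design point the sketch makes is that merging is symmetric: no single node ever holds the others' raw data, only exchanged parameters, which is what lets local confidentiality rules be satisfied by construction.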
Although RDF/XML has been widely recognized as the standard vehicle for representing semantic information on the Web, an enormous amount of semantic data is still being encoded in HTML documents that are designed primarily for human consumption and not directly amenable to machine processing. This paper seeks to bridge this semantic gap by addressing the fundamental problem of automatically annotating HTML documents with semantic labels. Exploiting a key observation that semantically related items exhibit consistency in presentation style as well as spatial locality in template-based content-rich HTML documents, we have developed a novel framework for automatically partitioning such documents into semantic structures. Our framework tightly couples structural analysis of documents with semantic analysis incorporating domain ontologies and lexical databases such as WordNet. We present experimental evidence of the effectiveness of our techniques on a large collection of HTML documents from various news portals.
Diagnosis and treatment planning for patients can be significantly improved by comparison with clinical images of other patients with similar anatomical and pathological characteristics. This requires the images to be annotated using a common vocabulary from clinical ontologies. Current approaches to such annotation are typically manual, consume extensive clinician time, and cannot be scaled to the large amounts of imaging data in hospitals. On the other hand, automated image analysis, while very scalable, does not leverage standardized semantics and thus cannot be reused across specific applications. In our work, we describe an automated and context-sensitive workflow based on an image parsing system complemented by an ontology-based context-sensitive annotation tool. A unique characteristic of our framework is that it brings together the diverse paradigms of machine-learning-based image analysis and ontology-based modeling for accurate and scalable semantic image annotation.
Actual Quantifiability is a concept in MapReduce that is based on two assumptions: (1) every mapper is cautious, i.e., does not exclude any reducer's key-value split pattern choice from consideration, and (2) every mapper respects the reducer's key-value split pattern preferences, i.e., deems one reducer's key-value split pattern choice to be infinitely more likely than another whenever it presumes the reducer to prefer the one to the other. In this paper we provide a new approach to actual quantifiability, by assuming that mappers have asymmetric uncertainty about the reducer's key-value utilities. We show that, if the uncertainty of each mapper about the reducer's key-value utilities vanishes gradually in some regular manner, then the key-value split pattern choices it can quantifiably make under common conjecture in quantifiability are all actually quantifiable in the original MapReduce with no uncertainty about the reducer's utilities.