Effective ranking functions are an essential part of commercial search engines. We focus on developing a regression framework for learning ranking functions that improve the relevance of search engines serving diverse streams of user queries. We explore supervised learning methodology from machine learning and distinguish two types of relevance judgments used as training data: 1) absolute relevance judgments arising from explicit labeling of search results; and 2) relative relevance judgments extracted from user clickthroughs of search results or converted from the absolute relevance judgments. We propose a novel optimization framework emphasizing the use of relative relevance judgments. The main contribution is the development of an algorithm based on regression that can be applied to objective functions involving preference data, i.e., data indicating that one document is more relevant than another with respect to a query. Experiments are carried out using data sets obtained from a commercial search engine. Our results show significant improvements of our proposed methods over some existing methods.
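As a rough illustration of what an objective over preference data can look like, the sketch below fits a linear scoring function to pairs of the form (preferred document, less relevant document) for one query. It is not the algorithm from the abstract above: the squared hinge loss, the linear scorer, plain gradient descent, the names `score`, `pairwise_loss`, and `train`, the margin, the learning rate, and the toy feature vectors are all assumptions made for the example.

```python
def score(w, x):
    """Linear scoring function (an assumption; the abstract does not fix a model class)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pairwise_loss(w, pairs, margin=1.0):
    """Squared hinge loss over preference pairs.

    Each pair is (better, worse). A pair contributes to the loss only when
    the preferred document fails to outscore the other by at least `margin`.
    """
    total = 0.0
    for better, worse in pairs:
        gap = margin - (score(w, better) - score(w, worse))
        if gap > 0:
            total += 0.5 * gap * gap
    return total


def train(pairs, dim, margin=1.0, lr=0.1, epochs=200):
    """Minimize pairwise_loss with batch gradient descent."""
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for better, worse in pairs:
            gap = margin - (score(w, better) - score(w, worse))
            if gap > 0:
                # d/dw of 0.5 * gap^2 is -gap * (better - worse)
                for j in range(dim):
                    grad[j] -= gap * (better[j] - worse[j])
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Hypothetical two-feature documents; the first of each pair is preferred.
pairs = [
    ((0.9, 0.2), (0.1, 0.3)),
    ((0.8, 0.7), (0.3, 0.6)),
    ((0.7, 0.1), (0.2, 0.9)),
]
w = train(pairs, dim=2)
for better, worse in pairs:
    print(score(w, better) > score(w, worse))
```

Only the ordering within each pair enters the loss, so no absolute relevance label is needed; absolute labels could be converted into such pairs by comparing documents retrieved for the same query.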
Modern healthcare systems now rely on advanced computing methods and technologies, such as Internet of Things (IoT) devices and clouds, to collect and analyze personal health data at an unprecedented scale and depth. Patients, doctors, healthcare providers, and researchers depend on analytical models derived from such data sources to remotely monitor patients, diagnose diseases early, and find personalized treatments and medications. However, without appropriate privacy protection, conducting data analytics can become a privacy nightmare. In this article, we present the research challenges in developing practical privacy-preserving analytics in healthcare information systems. The study is based on kHealth, a personalized digital healthcare information system that is being developed and tested for disease monitoring. We analyze the data and analytic requirements for the involved parties, identify the privacy assets, analyze existing privacy substrates, and discuss the potential tradeoff among privacy, efficiency, and model quality.
Data perturbation is a popular technique for privacy-preserving data mining. The major challenge of data perturbation is balancing privacy protection and data quality, which are normally considered a pair of conflicting factors. We propose that selectively preserving only the task- or model-specific information during perturbation would improve this balance. Geometric data perturbation, consisting of random rotation perturbation, random translation perturbation, and noise addition, aims at preserving the important geometric properties of a multidimensional dataset while providing a better privacy guarantee for data classification modeling. A preliminary study has shown that random geometric perturbation can preserve model accuracy well for several popular classification models, including kernel methods, linear classifiers, and SVM classifiers, but it also revealed some security concerns about random geometric perturbation. In this paper, we address some potential attacks on random geometric perturbation and design several methods to reduce the threat of these attacks. An experimental study shows that the enhanced geometric perturbation can provide a satisfactory privacy guarantee while still preserving model accuracy well for the discussed data classification models.
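The three components named above (random rotation, random translation, and noise addition) can be sketched in a few lines. This is a minimal illustration restricted to two dimensions, not the authors' implementation: the function names `geometric_perturb` and `pairwise_distances`, the Gaussian noise model, the noise level, and the toy data are assumptions made for the example.

```python
import math
import random


def geometric_perturb(points, theta, shift, noise_sd, rng):
    """Rotate 2-D points by theta, translate by shift, then add Gaussian noise."""
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for x, y in points:
        rx = c * x - s * y + shift[0] + rng.gauss(0.0, noise_sd)
        ry = s * x + c * y + shift[1] + rng.gauss(0.0, noise_sd)
        out.append((rx, ry))
    return out


def pairwise_distances(points):
    """Euclidean distances between all unordered pairs of points."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


rng = random.Random(0)
data = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)]
theta = rng.uniform(0, 2 * math.pi)               # random rotation angle
shift = (rng.uniform(-1, 1), rng.uniform(-1, 1))  # random translation vector

rigid = geometric_perturb(data, theta, shift, 0.0, rng)   # rotation + translation only
noisy = geometric_perturb(data, theta, shift, 0.05, rng)  # with noise addition

# Largest change in any pairwise distance under each variant.
original = pairwise_distances(data)
print(max(abs(a - b) for a, b in zip(original, pairwise_distances(rigid))))
print(max(abs(a - b) for a, b in zip(original, pairwise_distances(noisy))))
```

With the noise turned off, rotation and translation leave every pairwise distance unchanged up to floating-point error, which is one sense in which geometric properties are preserved for distance-based classifiers. The noise term distorts those distances slightly, trading some of that exactness for additional perturbation.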
Accurately measuring antibody repertoire sequence composition in a small amount of blood is challenging yet important for understanding repertoire responses to infection and vaccination. We develop molecular identifier clustering-based immune repertoire sequencing (MIDCIRS) and use it to study age-related antibody repertoire development and diversification before and during acute malaria in infants (< 12 months old) and toddlers (12–47 months old) with 4−8 ml of blood. Here, we show this accurate and high-coverage repertoire-sequencing method can use as few as 1000 naive B cells. Unexpectedly, we discover high levels of somatic hypermutation in infants as young as 3 months old. Antibody clonal lineage analysis reveals that somatic hypermutation levels are increased in both infants and toddlers upon infection, and memory B cells isolated from individuals who previously experienced malaria continue to induce somatic hypermutations upon malaria rechallenge. These results highlight the potential of antibody repertoire diversification in infants and toddlers.
The fabrication of functional tissues is important for tissue engineering, regenerative medicine, and biological research. While current 3D bioprinting technologies struggle with the precise arrangement of bioinks (composed of cells...