Estimating the Support of a High-Dimensional Distribution

Schölkopf, Bernhard; Platt, John; Shawe‐Taylor, John; Smola, Alex; Williamson, Robert C.

doi:10.1162/089976601750264965

Cited by 4,668 publications

(3,272 citation statements)

References 24 publications

(33 reference statements)

Citation types: Supporting, Mentioning (3,208), Contrasting, Unclassified

“…[28] TDDs should consider correlation between any pair of descriptors because considering correlation is a way to avoid detecting unrealistic x coordinates.…”

Section: Training Data Density (Tdds) · mentioning

confidence: 99%

“…[28] Unlike the two-class problem which consists of positive and negative classes, the one-class is the positive class, meaning that data do not have labels to be classified. In OCSVM algorithm, a SVM model is constructed between training dataset and the origin, aiming at constructing a discrimination model between them.…”

Section: One-class Support Vector Machine (Ocsvm) · mentioning

confidence: 99%
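
For orientation, the "model between the training dataset and the origin" mentioned in the statement above is usually written as the ν-parameterised one-class SVM problem below. This is a sketch in standard notation, not text taken from the citing paper: ℓ is the number of training points, Φ the kernel feature map, ξ_i slack variables, ρ the offset of the separating hyperplane, and ν ∈ (0, 1] the trade-off parameter.

```latex
\begin{align}
  \min_{w,\,\xi,\,\rho}\quad
    & \frac{1}{2}\lVert w \rVert^{2}
      + \frac{1}{\nu \ell} \sum_{i=1}^{\ell} \xi_i - \rho \\
  \text{subject to}\quad
    & \bigl(w \cdot \Phi(x_i)\bigr) \ge \rho - \xi_i,
      \qquad \xi_i \ge 0, \qquad i = 1, \dots, \ell, \\
  f(x) \;=\;
    & \operatorname{sgn}\bigl( (w \cdot \Phi(x)) - \rho \bigr).
\end{align}
```

Points with f(x) = +1 fall inside the estimated support; ν upper-bounds the fraction of training points left outside it and lower-bounds the fraction of support vectors.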

Finding Chemical Structures Corresponding to a Set of Coordinates in Chemical Descriptor Space

Miyao, Funatsu. 2017. Mol. Inf.

When chemical structures are searched based on descriptor values, or descriptors are interpreted based on values, it is important that corresponding chemical structures actually exist. In order to consider the existence of chemical structures located in a specific region in the chemical space, we propose to search them inside training data domains (TDDs), which are dense areas of a training dataset in the chemical space. We investigated TDDs' features using diverse and local datasets, assuming that GDB11 is the chemical universe. These two analyses showed that considering TDDs gives a higher chance of finding chemical structures than a random search-based method, and that novel chemical structures actually exist inside TDDs. In addition to those findings, we tested the hypothesis that chemical structures were distributed over limited areas of chemical space. This hypothesis was confirmed by the fact that distances among chemical structures in several descriptor spaces were much shorter than those among randomly generated coordinates in the training data range.

“…Implicitly, the absence of an edge represents the conditional independence of the according variables. Several algorithms to infer GMs from purely binary data are publicly available as R packages (Wainwright et al ., 2006; Höfling & Tibshirani, 2009; Guo et al ., 2010; Ravikumar et al ., 2010). Their counterparts for purely continuous data are Gaussian graphical models (GGMs), which use partial correlations to infer graphs.…”

Section: From Omics To Systems Biology · mentioning

confidence: 99%
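
The role of partial correlations in the statement above can be illustrated with a small sketch. It uses synthetic data, not data from the cited review, and the helper names `pearson` and `partial_corr` are hypothetical. Two variables driven by a common third variable are clearly correlated, yet their partial correlation given that variable is close to zero, which is the situation a Gaussian graphical model represents by omitting the edge.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given a single variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic example: x and y both depend on z but not directly on each other.
random.seed(0)
z = [random.gauss(0, 1) for _ in range(2000)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

marginal = pearson(x, y)          # clearly non-zero: z induces a correlation
partial = partial_corr(x, y, z)   # close to zero: no direct x-y dependence
print(f"marginal={marginal:.3f} partial={partial:.3f}")
```

With more than three variables, the partial correlations are normally read off the inverse covariance matrix instead of this three-variable formula.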

Integration of ‘omics’ data in aging research: from biomarkers to systems biology

et al. 2015

Summary: Age is the strongest risk factor for many diseases including neurodegenerative disorders, coronary heart disease, type 2 diabetes and cancer. Due to increasing life expectancy and low birth rates, the incidence of age‐related diseases is increasing in industrialized countries. Therefore, understanding the relationship between diseases and aging and facilitating healthy aging are major goals in medical research. In the last decades, the dimension of biological data has drastically increased with high‐throughput technologies now measuring thousands of (epi) genetic, expression and metabolic variables. The most common and so far successful approach to the analysis of these data is the so‐called reductionist approach. It consists of separately testing each variable for association with the phenotype of interest such as age or age‐related disease. However, a large portion of the observed phenotypic variance remains unexplained and a comprehensive understanding of most complex phenotypes is lacking. Systems biology aims to integrate data from different experiments to gain an understanding of the system as a whole rather than focusing on individual factors. It thus allows deeper insights into the mechanisms of complex traits, which are caused by the joint influence of several, interacting changes in the biological system. In this review, we look at the current progress of applying omics technologies to identify biomarkers of aging. We then survey existing systems biology approaches that allow for an integration of different types of data and highlight the need for further developments in this area to improve epidemiologic investigations.

“…Figure 8 plots several thousand pairs of these four major vital signs, drawn from a random subset of patients. To highlight major clusters of typical vital sign values, we used a one-class SVM [39] with a radial basis function kernel (ν = 0.5, γ = 0.1) and set our outliers fraction to %0.5. This simple plotting alone reveals several relationships that could not be captured by simple threshold alarms.…”

Section: B Multivariate Analysis · mentioning

confidence: 99%
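
A minimal sketch of the kind of fit described in the statement above, assuming scikit-learn's `OneClassSVM` as the implementation (the citing paper does not name its software here). The data are synthetic stand-ins for pairs of vital-sign readings, and the standardisation step is an added assumption so that γ = 0.1 acts on features of comparable scale.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-in for paired vital-sign readings (NOT the paper's data):
# column 0 resembles a heart rate, column 1 an oxygen saturation.
rng = np.random.default_rng(0)
X = rng.normal(loc=[80.0, 97.0], scale=[10.0, 1.5], size=(500, 2))

# Assumption: standardise each feature before applying the RBF kernel.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# RBF-kernel one-class SVM with the parameters quoted above.
model = OneClassSVM(kernel="rbf", nu=0.5, gamma=0.1).fit(X_std)

labels = model.predict(X_std)            # +1 inside the estimated support, -1 outside
scores = model.decision_function(X_std)  # signed score; negative values lie outside
outlier_fraction = float(np.mean(labels == -1))
print(f"fraction of training points flagged: {outlier_fraction:.2f}")
```

In this implementation `nu` is itself an upper bound on the fraction of training points left outside the boundary, so a separate, smaller outlier fraction would have to be applied in another step, for example by thresholding `scores`.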

Clinician-in-the-Loop Annotation of ICU Bedside Alarm Data

Roederer, Dimartino, Gutsche, et al. 2016. 2016 IEEE First International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE)

In this work, we describe the state of clinical monitoring in the intensive care unit and operating room, where patients are at their most fragile and thus monitoring is most heightened. We describe how large amounts of data generated by monitoring patients' physiologic signals, along with the ubiquitous aspecific threshold alarms in use today, cause dangerous alarm fatigue for medical caregivers. In order to build more specific, more useful alarms, we gathered a novel data set that would allow us to assess the number, types, and utility of alarms currently in use in the intensive care unit. To do this, we developed a system to collect physiologic monitor data, alarms, and annotations of those alarms provided electronically by clinicians. We describe the collection process for this novel data set and provide a preliminary description of the data.
