Samuel Kaski scite author profile

Self organization of a massive document collection

This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the self-organizing map (SOM) algorithm. Statistical representations of the documents' vocabularies are used as the feature vectors. The main goal of our work has been to scale up the SOM algorithm to deal with large amounts of high-dimensional data. In a practical experiment we mapped 6,840,568 patent abstracts onto a 1,002,240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms.
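The random projection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the projection matrix was built, so the Gaussian matrix, the toy vocabulary size, and the `term_weights` array below are all assumptions made for illustration only; only the 500-dimensional target comes from the text.

```python
import numpy as np

def random_projection(histograms, target_dim=500, seed=0):
    """Project rows of `histograms` (n_docs x n_terms) down to `target_dim` dimensions."""
    rng = np.random.default_rng(seed)
    n_terms = histograms.shape[1]
    # Assumption: Gaussian entries, scaled so squared norms are preserved in expectation.
    projection = rng.standard_normal((n_terms, target_dim)) / np.sqrt(target_dim)
    return histograms @ projection

# Toy data: 20 documents over a hypothetical 5000-word vocabulary.
rng = np.random.default_rng(1)
counts = rng.poisson(0.05, size=(20, 5000)).astype(float)
term_weights = rng.uniform(0.5, 2.0, size=5000)  # hypothetical per-word weights
weighted = counts * term_weights                 # weighted word histograms

reduced = random_projection(weighted)            # 500-dimensional feature vectors
print(reduced.shape)
```

Pairwise distances between the reduced vectors approximate those between the original histograms, which is what makes the low-dimensional vectors usable as SOM inputs.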


Fundamentals and Recent Developments in Approximate Bayesian Computation

Lintusaari et al. 2016


Bayesian inference plays an important role in phylogenetics, evolutionary biology, and in many other branches of science. It provides a principled framework for dealing with uncertainty and quantifying how it changes in the light of new evidence. For many complex models and inference problems, however, only approximate quantitative answers are obtainable. Approximate Bayesian computation (ABC) refers to a family of algorithms for approximate inference that makes a minimal set of assumptions by only requiring that sampling from a model is possible. We explain here the fundamentals of ABC, review the classical algorithms, and highlight recent developments. [ABC; approximate Bayesian computation; Bayesian inference; likelihood-free inference; phylogenetics; simulator-based models; stochastic simulation models; tree-based models.]
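As a rough illustration of the idea that ABC only needs the ability to sample from a model, here is a minimal sketch of rejection ABC, one of the classical algorithms in this family. The toy Gaussian model, the uniform prior, the sample-mean summary, and the tolerance `eps` are assumptions chosen for the example, not details taken from the article.

```python
import numpy as np

def abc_rejection(observed, simulate, sample_prior, distance, eps, n_draws, rng):
    """Keep prior draws whose simulated data land within `eps` of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)              # propose a parameter from the prior
        simulated = simulate(theta, rng)       # run the simulator; no likelihood needed
        if distance(simulated, observed) <= eps:
            accepted.append(theta)
    return np.array(accepted)

rng = np.random.default_rng(0)
observed = rng.normal(2.0, 1.0, size=100)      # toy "observed" data, true mean 2.0

posterior = abc_rejection(
    observed,
    simulate=lambda theta, rng: rng.normal(theta, 1.0, size=100),
    sample_prior=lambda rng: rng.uniform(-5.0, 5.0),
    distance=lambda x, y: abs(x.mean() - y.mean()),  # compare sample means
    eps=0.1,
    n_draws=20000,
    rng=rng,
)
print(len(posterior), posterior.mean())
```

A smaller `eps` gives a closer approximation to the posterior at the cost of fewer accepted draws, which is the basic trade-off that more refined ABC algorithms try to ease.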


Plasmids Shaped the Recent Emergence of the Major Nosocomial Pathogen Enterococcus faecium

Arredondo-Alonso, Top, McNally et al. 2020, mBio



Enterococcus faecium is a gut commensal of humans and animals but is also listed on the WHO global priority list of multidrug-resistant pathogens. Many of its antibiotic resistance traits reside on plasmids and have the potential to be disseminated by horizontal gene transfer. Here, we present the first comprehensive population-wide analysis of the pan-plasmidome of a clinically important bacterium, by whole-genome sequence analysis of 1,644 isolates from hospital, commensal, and animal sources of E. faecium. Long-read sequencing on a selection of isolates resulted in the completion of 305 plasmids that exhibited high levels of sequence modularity. We further investigated the entirety of all plasmids of each isolate (plasmidome) using a combination of short-read sequencing and machine-learning classifiers. Clustering of the plasmid sequences unraveled different E. faecium populations with a clear association with hospitalized patient isolates, suggesting different optimal configurations of plasmids in the hospital environment. The characterization of these populations allowed us to identify common mechanisms of plasmid stabilization such as toxin-antitoxin systems and genes exclusively present in particular plasmidome populations exemplified by copper resistance, phosphotransferase systems, or bacteriocin genes potentially involved in niche adaptation. Based on the distribution of k-mer distances between isolates, we concluded that plasmidomes rather than chromosomes are most informative for source specificity of E. faecium.

IMPORTANCE Enterococcus faecium is one of the most frequent nosocomial pathogens of hospital-acquired infections. E. faecium has gained resistance against most commonly available antibiotics, most notably, against ampicillin, gentamicin, and vancomycin, which renders infections difficult to treat. Many antibiotic resistance traits, in particular, vancomycin resistance, can be encoded in autonomous and extrachromosomal elements called plasmids. These sequences can be disseminated to other isolates by horizontal gene transfer and confer novel mechanisms to source specificity. In our study, we elucidated the total plasmid content, referred to as the plasmidome, of 1,644 E. faecium isolates by using short- and long-read whole-genome technologies with the combination of a machine-learning classifier. This was fundamental to investigate the full collection of plasmid sequences present in our collection (pan-plasmidome) and to observe the potential transfer of plasmid sequences between E. faecium hosts. We observed that E. faecium isolates from hospitalized patients carried a larger number of plasmid sequences compared to that from other sources, and they elucidated different configurations of plasmidome populations in the hospital environment. We assessed the contribution of different genomic components and observed that plasmid sequences have the highest contribution to source specificity. Our study suggests that E. faecium plasmids are regulated by complex ecological constraints rather than physical interaction between hosts.


Dimensionality reduction by random mapping: fast similarity computation for clustering

Kaski



Neighborhood Preservation in Nonlinear Projection Methods: An Experimental Study

2001


Several measures have been proposed for comparing nonlinear projection methods but so far no comparisons have taken into account one of their most important properties, the trustworthiness of the resulting neighborhood or proximity relationships. One of the main uses of nonlinear mapping methods is to visualize multivariate data, and in such visualizations it is crucial that the visualized proximities can be trusted: if two data samples are close to each other on the display they should be close-by in the original space as well. A local measure of trustworthiness is proposed and it is shown for three data sets that neighborhood relationships visualized by the Self-Organizing Map and its variant, the Generative Topographic Mapping, are more trustworthy than visualizations produced by traditional multidimensional scaling-based nonlinear projection methods.
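The abstract does not state the formula for the trustworthiness measure. The sketch below uses the rank-based form in which the measure is commonly cited, which is an assumption here: points that appear among the k nearest neighbors on the display but not in the original space are penalized by how far down they rank in the original space. The function name, the toy data, and the choice k = 5 are illustrative only.

```python
import numpy as np

def trustworthiness(X, Y, k):
    """Rank-based trustworthiness of display Y for data X (assumes k < n / 2, no ties)."""
    n = X.shape[0]
    d_orig = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d_disp = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    np.fill_diagonal(d_orig, np.inf)   # a point is never its own neighbor
    np.fill_diagonal(d_disp, np.inf)

    rows = np.arange(n)[:, None]
    # rank_orig[i, j] = rank of j among the neighbors of i in the original space (1 = nearest).
    order_orig = np.argsort(d_orig, axis=1)
    rank_orig = np.empty((n, n), dtype=int)
    rank_orig[rows, order_orig] = np.arange(1, n + 1)[None, :]

    # Original-space ranks of the k nearest neighbors on the display.
    nn_disp = np.argsort(d_disp, axis=1)[:, :k]
    ranks = rank_orig[rows, nn_disp]
    # Only display neighbors that are not original-space k-neighbors contribute.
    penalty = np.clip(ranks - k, 0, None).sum()
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
t_same = trustworthiness(X, X.copy(), k=5)    # display identical to the data
t_proj = trustworthiness(X, X[:, :2], k=5)    # crude projection onto two coordinates
print(t_same, t_proj)
```

A display that reproduces every original neighborhood scores 1, and the score drops as the display places points next to each other that were far apart in the original space.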


Local multidimensional scaling

2006


WEBSOM – Self-organizing maps of document collections

et al. 1998


Group Factor Analysis

Klami, Virtanen, Leppäaho et al. 2015, IEEE Trans. Neural Netw. Learning Syst.



Factor analysis provides linear factors that describe relationships between individual variables of a data set. We extend this classical formulation into linear factors that describe relationships between groups of variables, where each group represents either a set of related variables or a data set. The model also naturally extends canonical correlation analysis to more than two sets, in a way that is more flexible than previous extensions. Our solution is formulated as variational inference of a latent variable model with structural sparsity, and it consists of two hierarchical levels: The higher level models the relationships between the groups, whereas the lower models the observed variables given the higher level. We show that the resulting solution solves the group factor analysis problem accurately, outperforming alternative factor analysis based solutions as well as more straightforward implementations of group factor analysis. The method is demonstrated on two life science data sets, one on brain activation and the other on systems biology, illustrating its applicability to the analysis of different types of high-dimensional data sources.


scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Self organization of a massive document collection
