We present a model-based clustering method, SWIFT (Scalable Weighted Iterative Flow-clustering Technique), for digesting large, high-dimensional datasets obtained via modern flow cytometry into more compact representations that are well suited for further automated or manual analysis. Key attributes of the method include the following: (a) the analysis is conducted in the multidimensional space, retaining the semantics of the data, (b) an iterative weighted sampling procedure is used to maintain modest computational complexity and to retain discrimination of extremely small subpopulations (hundreds of cells from datasets containing tens of millions), and (c) a splitting and merging procedure is incorporated in the algorithm to preserve distinguishability between biologically distinct populations, while still providing a significant compaction relative to the original data. This article presents a detailed algorithmic description of SWIFT, outlining the application-driven motivations for the different design choices, a discussion of the computational complexity of the different steps, and results obtained with SWIFT for synthetic data and relatively simple experimental data that allow validation of the desirable attributes. A companion paper (Part 2) highlights the use of SWIFT, in combination with additional computational tools, for more challenging biological problems. © 2014 The Authors. Published by Wiley Periodicals Inc.
A multistage clustering and data processing method, SWIFT (detailed in a companion manuscript), has been developed to detect rare subpopulations in large, high-dimensional flow cytometry datasets. An iterative sampling procedure initially fits the data to multidimensional Gaussian distributions; splitting and merging stages then use a criterion of unimodality to optimize the detection of rare subpopulations, to converge on a consistent cluster number, and to describe non-Gaussian distributions. Probabilistic assignment of cells to clusters, together with visualization and manipulation of clusters by their cluster medians, facilitates the application of expert knowledge using standard flow cytometry programs. The dual problems of rigorously comparing similar complex samples, and enumerating absent or very rare cell subpopulations in negative controls, were solved by assigning cells in multiple samples to a cluster template derived from a single or combined sample. Comparison of antigen-stimulated and control human peripheral blood cell samples demonstrated that SWIFT could identify biologically significant subpopulations, such as rare cytokine-producing influenza-specific T cells. A sensitivity of better than one part per million was attained in very large samples. Results were highly consistent on biological replicates, yet the analysis was sensitive enough to show that multiple samples from the same subject were more similar than samples from different subjects. A companion manuscript (Part 1) details the algorithmic development of SWIFT. © 2014 The Authors. Published by Wiley Periodicals Inc.
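The idea of probabilistically assigning cells from several samples to one shared cluster template can be illustrated with a small sketch. This is not the SWIFT implementation: it assumes diagonal-covariance Gaussian clusters, and the `template` structure, the function names, and all numbers are hypothetical choices made for the illustration.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x (assumed form)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def assign_to_template(cell, template):
    """Posterior probability of each template cluster for a single cell.

    `template` is a hypothetical list of (weight, mean, variance) tuples.
    """
    log_terms = [
        math.log(w) + log_gaussian_diag(cell, m, v) for w, m, v in template
    ]
    top = max(log_terms)  # subtract the maximum for numerical stability
    unnorm = [math.exp(t - top) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Toy two-marker template: one abundant cluster and one rare cluster.
# All values are made up for illustration.
template = [
    (0.999, [1.0, 1.0], [0.25, 0.25]),
    (0.001, [5.0, 5.0], [0.25, 0.25]),
]

# Cells from two different samples are scored against the same template,
# so cluster membership is directly comparable between the samples.
probs_stim = assign_to_template([5.1, 4.9], template)
probs_ctrl = assign_to_template([1.2, 0.8], template)
```

Because both samples are scored against the same fixed template, a cluster that is empty in a control sample simply receives no cells, rather than being redefined by a separate clustering run.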
We present a computational framework for automatically quantifying verbal and nonverbal behaviors in the context of job interviews. The proposed framework is trained by analyzing the videos of 138 interview sessions with 69 internship-seeking undergraduates at the Massachusetts Institute of Technology (MIT). Our automated analysis includes facial expressions (e.g., smiles, head gestures, facial tracking points), language (e.g., word counts, topic modeling), and prosodic information (e.g., pitch, intonation, and pauses) of the interviewees. The ground truth labels are derived by taking a weighted average over the ratings of 9 independent judges. Our framework can automatically predict the ratings for interview traits such as excitement, friendliness, and engagement with correlation coefficients of 0.75 or higher, and can quantify the relative importance of prosody, language, and facial expressions. Based on the relative feature weights learned by the regression models, our framework recommends speaking more fluently, using fewer filler words, speaking as "we" (vs. "I"), using more unique words, and smiling more. We also find that the students who were rated highly while answering the first interview question were also rated highly overall (i.e., first impressions matter). Finally, our MIT Interview dataset will be made available to other researchers to further validate and expand our findings.
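The two quantities mentioned above, a weighted average of judge ratings and a correlation coefficient between predicted and ground-truth ratings, can be sketched as follows. The abstract does not say how the judge weights were chosen or which correlation was used, so the weights, the Pearson correlation, the function names, and the toy numbers are all assumptions made for illustration.

```python
import math


def weighted_average(ratings, weights):
    """Weighted mean of one interview's ratings across judges."""
    return sum(r * w for r, w in zip(ratings, weights)) / sum(weights)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy data: 4 interviews rated by 3 judges (the study used 9 judges).
judge_weights = [0.5, 0.3, 0.2]  # hypothetical per-judge weights
ratings_per_interview = [
    [5.0, 6.0, 5.5],
    [3.0, 2.5, 3.5],
    [6.5, 6.0, 7.0],
    [4.0, 4.5, 4.0],
]

ground_truth = [weighted_average(r, judge_weights) for r in ratings_per_interview]
predicted = [5.2, 3.4, 6.1, 4.3]  # made-up model outputs
agreement = pearson(predicted, ground_truth)
```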
We address the problem of automatically aligning natural language sentences with corresponding video segments without any direct supervision. Most existing algorithms for integrating language with videos rely on hand-aligned parallel data, where each natural language sentence is manually aligned with its corresponding image or video segment. Recently, fully unsupervised alignment of text with video has been shown to be feasible using hierarchical generative models. In contrast to these previous generative models, we propose three latent-variable discriminative models for the unsupervised alignment task. The proposed discriminative models can incorporate domain knowledge by adding diverse and overlapping features. The results show that the discriminative models outperform the generative models in terms of alignment accuracy.