This paper presents a PCA-based sequence-to-vector (seq2vec) dimension reduction method for text classification, called tree-structured multi-stage principal component analysis (TMPCA). The theoretical analysis and applicability of TMPCA are demonstrated as an extension of our previous work (Su, Huang, & Kuo, in press). Unlike conventional word-to-vector embedding methods, TMPCA conducts dimension reduction at the sequence level without labeled training data. Furthermore, it preserves the sequential structure of input sequences. We show that TMPCA is computationally efficient and show mathematically that it preserves strong mutual information between its input and output, which facilitates sequence-based text classification. Experimental results also demonstrate that a dense (fully connected) network trained on TMPCA-preprocessed data achieves better performance than state-of-the-art fastText and other neural-network-based solutions.
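As a rough illustration of the tree-structured, multi-stage idea, the sketch below assumes that each stage concatenates adjacent, non-overlapping pairs of vectors and projects each pair back to the original dimension with PCA, so that a sequence of N word vectors (N a power of two) collapses to a single vector after log2(N) stages. The abstract does not spell out these details, so the pairing scheme, the function names `fit_pca` and `tmpca_fit_transform`, and the array layout are assumptions made for this example only.

```python
import numpy as np

def fit_pca(X, k):
    """Return (mean, components) holding the top-k principal directions of X's rows."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def tmpca_fit_transform(seqs):
    """Hypothetical TMPCA-style reduction (assumed pairing scheme).

    seqs: array of shape (num_sequences, N, D), with N a power of two.
    Returns (stages, vectors), where vectors has shape (num_sequences, D)
    and stages holds the (mean, components) fitted at each level of the tree.
    """
    S, N, D = seqs.shape
    if N < 1 or N & (N - 1):
        raise ValueError("sequence length N must be a power of two")
    stages = []
    x = seqs
    while x.shape[1] > 1:
        # Concatenate adjacent non-overlapping pairs: (S, n, D) -> (S * n/2, 2D).
        flat = x.reshape(-1, 2 * D)
        mean, comps = fit_pca(flat, D)
        # Project each 2D-dimensional pair back down to D dimensions.
        x = ((flat - mean) @ comps.T).reshape(S, -1, D)
        stages.append((mean, comps))
    return stages, x[:, 0, :]
```

No labels are used anywhere in the fit, which matches the unsupervised setting described above. The resulting fixed-length vectors could then be fed to a dense classifier; note that the sketch needs at least D pair samples at every stage for the PCA to return D components.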
A closed-form solution exists in two-class linear discriminant analysis (LDA), which discriminates two Gaussian-distributed classes in a multi-dimensional feature space. In this work, we interpret the multilayer perceptron (MLP) as a generalization of a two-class LDA system so that it can handle an input composed of multiple Gaussian modalities belonging to multiple classes. Besides the input layer l_in and the output layer l_out, the MLP of interest consists of two intermediate layers, l_1 and l_2. We propose a feedforward design that has three stages: 1) from l_in to l_1: half-space partitionings accomplished by multiple parallel LDAs, 2) from l_1 to l_2: subspace isolation, where one Gaussian modality is represented by one neuron, 3) from l_2 to l_out: class-wise subspace mergence, where each Gaussian modality is connected to its target class. Through this process, we present an automatic MLP design that can specify the network architecture (i.e., the number of layers and the number of neurons at each layer) and all filter weights in a feedforward one-pass fashion. This design can be generalized to an arbitrary distribution by leveraging the Gaussian mixture model (GMM). Experiments are conducted to compare the performance of the traditional backpropagation-based MLP (BP-MLP) and the new feedforward MLP (FF-MLP).
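For readers unfamiliar with the closed-form two-class LDA that serves as the building block here, the following minimal sketch shows the textbook solution under the usual assumptions of a shared covariance matrix and equal class priors: the weight vector is the pooled covariance inverse applied to the difference of class means, and the bias places the boundary at the midpoint. This is the standard result, not the FF-MLP construction itself; the names `fit_two_class_lda` and `predict` are hypothetical.

```python
import numpy as np

def fit_two_class_lda(X0, X1):
    """Closed-form two-class LDA (shared covariance and equal priors assumed).

    X0, X1: arrays of shape (n0, d) and (n1, d) holding samples of class 0 and 1.
    Returns (w, b) such that w @ x + b > 0 predicts class 1.
    """
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # Pooled within-class covariance estimate.
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (n0 + n1 - 2)
    # w = S^{-1} (mu1 - mu0), computed without forming the explicit inverse.
    w = np.linalg.solve(S, mu1 - mu0)
    # With equal priors the decision boundary passes through the midpoint of the means.
    b = -0.5 * w @ (mu0 + mu1)
    return w, b

def predict(X, w, b):
    """Return 0/1 labels for the rows of X."""
    return (X @ w + b > 0).astype(int)
```

Each such (w, b) pair defines one half-space partition, which is the role a single neuron could play in the first stage of the design described above.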
Objective: Recent advances in light-sheet fluorescence microscopy (LSFM) enable 3-dimensional (3-D) imaging of cardiac architecture and mechanics in toto. However, segmentation of the cardiac trabecular network to quantify cardiac injury remains a challenge.
Methods: We employed the "subspace approximation with augmented kernels (Saak) transform" for accurate and efficient quantification of light-sheet image stacks following chemotherapy treatment. We established a machine learning framework with augmented kernels based on the Karhunen-Loeve Transform (KLT) to preserve linearity and reversibility of rectification.
Results: The Saak transform-based machine learning enhances computational efficiency and obviates the iterative optimization of a cost function needed for neural networks, minimizing the number of training datasets required for segmentation in our scenario. The integration of forward and inverse Saak transforms can also serve as a lightweight module to filter adversarial perturbations and reconstruct estimated images, salvaging the robustness of existing classification methods. The accuracy and robustness of the Saak transform are evident from tests using Dice similarity coefficients and various adversarial perturbation algorithms, respectively. The addition of edge detection further allows for quantifying the surface area to volume ratio (SVR) of the myocardium in response to chemotherapy-induced cardiac remodeling.
Conclusion: The combination of Saak transform, random forest, and edge detection improves segmentation efficiency 20-fold compared with manual processing.
Significance: This new methodology establishes a robust framework for post-processing of light-sheet images and creates a data-driven machine learning approach for automated quantification of cardiac ultra-structure.
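The Dice similarity coefficient mentioned in the results is a standard overlap measure between a predicted and a reference segmentation mask, defined as twice the size of their intersection divided by the sum of their sizes. The sketch below is a generic implementation for binary masks; the function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the study.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice similarity between two binary masks of the same shape.

    Returns 2 * |pred AND truth| / (|pred| + |truth|), a value in [0, 1].
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")
    total = pred.sum() + truth.sum()
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return float(2.0 * np.logical_and(pred, truth).sum() / total)
```

The same function applies unchanged to 3-D image stacks, since it only counts voxels and does not depend on the number of dimensions.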