What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial-scale problems, using Big Models (up to hundreds of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization strategies employ fine-grained operations and scheduling beyond the classic bulk-synchronous processing paradigm popularized by MapReduce, or even specialized graph-based execution that relies on graph representations of ML programs. The variety of approaches tends to pull systems and algorithms design in different directions, and it remains difficult to find a universal platform applicable to a wide range of ML programs at scale. We propose a general-purpose framework that systematically addresses data- and model-parallel challenges in large-scale ML by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions. This presents unique opportunities for an integrative system design, such as bounded-error network synchronization and dynamic scheduling based on ML program structure. We demonstrate the efficacy of these system designs against well-known implementations of modern ML algorithms, allowing ML programs to run in much less time and at considerably larger model sizes, even on modestly sized compute clusters.
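The "bounded-error network synchronization" mentioned above is commonly realized as a stale synchronous parallel (SSP) coordination rule in a shared parameter store. Below is a minimal single-process sketch of that rule; the class name, schedule, and API are illustrative assumptions for this note, not the framework's actual interface.

# Minimal sketch of bounded-staleness ("stale synchronous parallel")
# coordination. All names here (SSPTable, the worker schedule) are
# illustrative assumptions, not a real framework API.
from collections import defaultdict

class SSPTable:
    """Shared parameter table enforcing a staleness bound s: a worker
    may proceed only if it is at most s iterations ahead of the slowest."""
    def __init__(self, num_workers, staleness):
        self.params = defaultdict(float)   # shared model parameters
        self.clocks = [0] * num_workers    # per-worker iteration counters
        self.staleness = staleness

    def can_proceed(self, worker_id):
        # Bounded error: fast workers block only when more than
        # `staleness` iterations ahead of the slowest worker.
        return self.clocks[worker_id] - min(self.clocks) <= self.staleness

    def update(self, key, delta):
        self.params[key] += delta          # additive updates commute

    def clock(self, worker_id):
        self.clocks[worker_id] += 1

# Round-robin-ish simulation where worker 0 races ahead and gets blocked.
table = SSPTable(num_workers=3, staleness=2)
schedule = [0, 0, 0, 0, 1, 2, 0, 0, 1, 2, 1, 2]
for w in schedule:
    if table.can_proceed(w):
        table.update("theta", 0.1)         # stand-in for a gradient step
        table.clock(w)
    # else: in a real system the fast worker would wait here
print(round(table.params["theta"], 2), table.clocks)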
Given multiple data sets of relational data that share a number of dimensions, how can we efficiently decompose our data into the latent factors? Factorization of a single matrix or tensor has attracted much attention, as, e.g., in the Netflix challenge, with users rating movies. However, we often have additional side information, e.g., demographic data about the users in the Netflix example above. Incorporating this additional information leads to the coupled factorization problem, which so far has been solved only for relatively small datasets. We provide a distributed, scalable method for decomposing matrices, tensors, and coupled data sets through stochastic gradient descent on a variety of objective functions. We offer the following contributions: (1) Versatility: our algorithm can perform matrix, tensor, and coupled factorization, with flexible objective functions including the Frobenius norm, the Frobenius norm with an ℓ1-induced sparsity penalty, and non-negative factorization. (2) Scalability: FlexiFaCT scales to unprecedented sizes in both the data and the model, with up to billions of parameters, and runs on standard Hadoop. (3) Convergence: proofs showing that FlexiFaCT converges on this variety of objective functions, even with projections.
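As a single-machine illustration of the core update, the sketch below runs SGD over the observed entries of a matrix, with a soft-thresholding (ℓ1 proximal) step and an optional non-negativity projection. It omits the distributed stratification that lets the actual method run on Hadoop; the function name and hyperparameters are assumptions.

# Sketch of SGD matrix factorization R ≈ W @ H.T on observed (non-NaN)
# entries, with an l1 proximal step and optional non-negativity projection.
import numpy as np

def sgd_factorize(R, rank=2, lr=0.02, lam=0.01, epochs=200, nonneg=False, seed=0):
    rng = np.random.default_rng(seed)
    n, m = R.shape
    W = rng.standard_normal((n, rank)) * 0.1
    H = rng.standard_normal((m, rank)) * 0.1
    obs = np.argwhere(~np.isnan(R))        # indices of observed entries
    for _ in range(epochs):
        rng.shuffle(obs)
        for i, j in obs:
            err = R[i, j] - W[i] @ H[j]
            gw, gh = err * H[j], err * W[i]
            W[i] += lr * gw
            H[j] += lr * gh
            # soft-thresholding: the proximal step for the l1 penalty
            for M, k in ((W, i), (H, j)):
                M[k] = np.sign(M[k]) * np.maximum(np.abs(M[k]) - lr * lam, 0.0)
                if nonneg:
                    M[k] = np.maximum(M[k], 0.0)   # project onto orthant
    return W, H

# Toy usage: a small ratings-style matrix with missing entries as NaN.
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 1.0, 5.0]])
W, H = sgd_factorize(R)
print(np.round(W @ H.T, 2))   # reconstruction; NaN positions are predictions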
We investigate the temporal resolution of documents, such as determining the date of publication of a story based on its text. We describe and evaluate a model that builds histograms encoding the probability of different temporal periods for a document. We construct the histograms from the Kullback-Leibler divergence between the language model for a test document and supervised language models for each time interval. Initial results indicate that this language-modeling approach is effective for predicting the dates of publication of short stories, which contain few explicit mentions of years.
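A toy sketch of that scoring rule: build a smoothed unigram language model per time period, then rank periods by KL divergence from the test document's model. The corpora, smoothing constant, and period granularity below are illustrative assumptions.

# Rank candidate time periods by KL(doc model || period model); lower is better.
import math
from collections import Counter

def unigram_lm(text, vocab, alpha=0.01):
    """Laplace-smoothed unigram language model over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)

# Tiny stand-in corpora for two periods (real models would use large corpora).
periods = {
    "1900s": "the carriage rattled along the cobblestones at dusk",
    "1990s": "she checked the answering machine and faxed the report",
}
doc = "the fax machine hummed as she filed the report"
vocab = set(doc.lower().split()).union(*(set(t.split()) for t in periods.values()))
models = {p: unigram_lm(t, vocab) for p, t in periods.items()}
doc_lm = unigram_lm(doc, vocab)
scores = {p: kl_divergence(doc_lm, m) for p, m in models.items()}
print(min(scores, key=scores.get), scores)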
In this paper, we describe a novel methodology, grounded in techniques from the field of machine learning, for modeling emerging social structure as it develops in threaded discussion forums, with an eye towards application in the threaded discussions of massive open online courses (MOOCs). This modeling approach integrates two simpler, well-established prior techniques, one related to social network structure and another related to the thematic structure of text. As an illustrative application of the integrated technique, we use it as a lens for exploring student dropout behavior in three different MOOCs. In particular, we use the model to identify twenty emerging subcommunities within the threaded discussions of each of the three MOOCs. We then use a survival model to measure the impact of participation in the identified subcommunities on attrition among students who participated in the course discussion forums of the three courses. In each of the three MOOCs we find evidence that participation in two to four of the twenty subcommunities is associated with significantly higher or lower dropout rates than average. A qualitative post-hoc analysis illustrates how the learned models can be used as a lens for understanding the values and focus of discussion within the subcommunities, and, in this illustrative example, for reasoning about how those relate to the detected differences in dropout rates across the three courses. Our qualitative analysis demonstrates that the patterns that emerge make sense: subcommunities associated with lower attrition show evidence of stronger expressed motivation to actively participate in the course and of stronger cognitive engagement with the material, while subcommunities associated with higher attrition show the opposite.
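A minimal sketch of the survival-analysis step under stated assumptions: given per-student time-to-dropout and binary subcommunity-participation indicators, a Cox proportional hazards model (here via the lifelines library) estimates whether membership in a given subcommunity is associated with faster or slower attrition. The column names and the tiny synthetic dataset are invented for illustration; the paper's actual covariates and data differ.

# Regress time-to-dropout on subcommunity-participation indicators.
import pandas as pd
from lifelines import CoxPHFitter

# One row per student: weeks until dropout (or course end), whether dropout
# was observed, and flags for participation in two hypothetical subcommunities.
df = pd.DataFrame({
    "weeks":     [3, 8, 12, 5, 12, 7, 2, 12, 6, 4, 10, 9],
    "dropped":   [1, 1,  0, 1,  0, 1, 1,  0, 1, 1,  1, 0],
    "subcomm_a": [0, 1,  1, 0,  1, 0, 0,  1, 1, 0,  0, 1],
    "subcomm_b": [1, 0,  1, 1,  0, 1, 0,  0, 0, 1,  1, 0],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="weeks", event_col="dropped")
# Hazard ratios > 1 flag subcommunities associated with faster attrition,
# < 1 with retention; the paper reports 2-4 such subcommunities per MOOC.
print(cph.summary[["coef", "exp(coef)", "p"]])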
For critical load-bearing structures, it is often necessary to experimentally determine the load distribution on the structure so that accurate finite element models can be developed for stress and fatigue-life predictions. An inverse-problem approach is presented here for computing or calibrating the loads and boundary conditions acting on a structure. This enables the creation of more accurate finite element models, especially for structures that have complicated load distributions and compliant boundary conditions. The method presented here involves minimizing the least-squares error between the strains computed using the finite element model and the strains and displacements obtained experimentally. The nodal loads and the compliance at fixed boundaries are treated as the variables in the optimization problem. The compliance is modeled as springs attached at the nodes on the boundary where the structure is restrained. The method is verified by computing the loads and boundary conditions when displacements, maximum shear strains, or both are available at a large number of points on the surface of the structure. The experimental data set was generated using the luminescent photoelastic coating (LPC) technique.
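A toy sketch of the inverse formulation under heavy simplification: a one-dimensional spring model stands in for the finite element model, the unknowns are an applied load and a boundary spring stiffness, and scipy's least_squares minimizes the residual between predicted and "measured" responses. Everything below is an illustrative assumption, not the paper's actual setup or data.

# Recover an unknown load and a boundary spring stiffness from noisy
# displacement "measurements" by nonlinear least squares.
import numpy as np
from scipy.optimize import least_squares

def predicted_displacements(x, n_nodes=5):
    """Stand-in forward model: a uniform bar restrained through a spring.
    x = [axial load at free end, support spring stiffness]."""
    load, k_support = x
    k_elem = 100.0                          # known element stiffness
    u = np.zeros(n_nodes)
    u[0] = load / k_support                 # compliant boundary, not rigid
    for i in range(1, n_nodes):
        u[i] = u[i - 1] + load / k_elem     # uniform axial force in the bar
    return u

# Synthetic "experimental" data from a known truth, plus measurement noise.
rng = np.random.default_rng(0)
true_x = np.array([10.0, 500.0])
measured = predicted_displacements(true_x) + rng.normal(0, 1e-3, 5)

def residuals(x):
    return predicted_displacements(x) - measured

fit = least_squares(residuals, x0=[1.0, 100.0],
                    bounds=([0.0, 1.0], [np.inf, np.inf]))
print("recovered load, support stiffness:", np.round(fit.x, 2))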