Machine learning (ML) systems have to support a wide variety of tensor operations. However, such ML systems were largely developed without asking: what are the foundational abstractions necessary for building machine learning systems? We believe that proper computational and implementation abstractions will allow for the construction of self-configuring, declarative ML systems, especially when the goal is to execute tensor operations in a distributed environment or to partition them across multiple AI accelerators (ASICs). To this end, we first introduce a tensor relational algebra (TRA), which is expressive enough to encode any tensor operation that can be written in the Einstein notation. We consider how TRA expressions can be rewritten into an implementation algebra (IA) that enables effective implementation in a distributed environment, as well as how expressions in the IA can be optimized. Our empirical study shows that the optimized implementation provided by the IA can match or even outperform carefully engineered HPC or ML systems for large-scale tensor manipulations and ML workflows on distributed clusters.
We consider the question: what is the abstraction that should be implemented by the computational engine of a machine learning system? Current machine learning systems typically push whole tensors through a series of compute kernels such as matrix multiplications or activation functions, where each kernel runs on an AI accelerator (ASIC) such as a GPU. This implementation abstraction provides little built-in support for ML systems to scale past a single machine, or for handling large models with matrices or tensors that do not easily fit into the RAM of an ASIC. In this paper, we present an alternative implementation abstraction called the tensor relational algebra (TRA). The TRA is a set-based algebra based on the relational algebra. Expressions in the TRA operate over binary tensor relations, where keys are multi-dimensional arrays and values are tensors. The TRA is easily executed with high efficiency in a parallel or distributed environment, and is amenable to automatic optimization. Our empirical study shows that the optimized TRA-based back-end can significantly outperform alternatives for running ML workflows in distributed clusters.
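To make the binary tensor relation concrete, the sketch below models one as a Python dictionary mapping integer key tuples (block coordinates) to dense tensor blocks, and expresses a blocked matrix multiply as a join on the shared inner key followed by an aggregation. This is only a minimal illustration of the idea under those assumptions, not the TRA implementation described above; the helper names `to_tensor_relation` and `tra_matmul` are invented for this example.

```python
import numpy as np
from collections import defaultdict

# A tensor relation maps integer key tuples (block coordinates) to dense
# tensor values (blocks). Here a matrix is chunked into a grid of blocks.
def to_tensor_relation(mat, block):
    rel = {}
    for i in range(0, mat.shape[0], block):
        for j in range(0, mat.shape[1], block):
            rel[(i // block, j // block)] = mat[i:i + block, j:j + block]
    return rel

# TRA-style matrix multiply: join blocks on the shared inner key, apply the
# matrix-multiply kernel to the joined tensor values, then aggregate (sum)
# over the inner dimension.
def tra_matmul(rel_a, rel_b):
    out = defaultdict(lambda: 0)
    for (i, k), a_blk in rel_a.items():
        for (k2, j), b_blk in rel_b.items():
            if k == k2:  # join predicate on the inner index
                out[(i, j)] = out[(i, j)] + a_blk @ b_blk  # kernel + aggregation
    return dict(out)

if __name__ == "__main__":
    A, B = np.random.rand(4, 4), np.random.rand(4, 4)
    C = tra_matmul(to_tensor_relation(A, 2), to_tensor_relation(B, 2))
    # Reassemble the blocks and check against the monolithic product.
    full = np.block([[C[(0, 0)], C[(0, 1)]], [C[(1, 0)], C[(1, 1)]]])
    assert np.allclose(full, A @ B)
```

In a distributed setting, the join on the inner key determines how blocks must be shuffled between workers, which is the kind of decision an IA-style optimizer can reason about.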
A novel surface profilometry method is proposed that adopts a single optical-grating projection setup with a small projection angle. The height distribution of the measured surface is retrieved by calculating the coordinates of the intersection between the projecting ray and the observing sight line, while the position of the observing point in the deformed fringe pattern is detected by fringe optical flow. The relationship between the optical flow and the height distribution of the tested surface is established. Simulations and preliminary experimental results show that the proposed method is feasible for measuring a complex surface. The main advantage of the proposed method is that the height distribution of the measured surface can be obtained directly, without a phase-to-height transformation.
In this work, a new method of measuring surface shape based on Brox optical flow estimation is presented. The measuring system consists of a projector, a measured object, and a charge-coupled device (CCD) camera. The grating fringes are projected onto the reference plane at a small angle. Two fringe images, one before and one after placing the measured object on the reference plane, are captured. The optical flow field between the two images is then evaluated using the Brox optical flow algorithm. The theoretical relationship between the optical flow field and the height of the measured surface is established. Using this relationship, the height distribution of the measured object can be retrieved quickly without a phase-to-height transformation. However, the calculated height distribution was found to deviate from its true value. To solve this problem, a correction scheme suited to the optical flow method is proposed; with it, the accuracy of the calculated result is greatly improved. Simulations and experiments verify the feasibility of the proposed method and the accuracy of the correction scheme. The results show that the proposed method is more accurate than the Fourier transform method. Compared with traditional surface shape measurement, the optical flow method has several advantages: (1) only two frames are required to recover the height distribution; (2) the measurement and calculation procedures are relatively simple, so the method is less time-consuming; (3) because the optical flow method inherently involves a time factor, it is better suited to dynamic measurement; and (4) there are no restrictions on the projection pattern.
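The pipeline described above (two fringe images, a dense optical flow field, then a flow-to-height conversion) can be sketched as follows. This is only an illustration of the structure: OpenCV's Farneback estimator stands in for the Brox algorithm, and the flow-to-height mapping is a simplified small-angle triangulation assumption rather than the paper's exact relationship or its correction scheme. The function name `height_from_fringe_flow` and its parameters are invented for this sketch.

```python
import cv2
import numpy as np

def height_from_fringe_flow(reference_img, deformed_img,
                            projection_angle_rad, pixel_size_mm=1.0):
    """Estimate a height map from two grayscale fringe images
    (reference plane vs. object on the reference plane)."""
    # Dense optical flow between the two fringe patterns.
    # (Farneback is used here as a widely available stand-in for Brox.)
    flow = cv2.calcOpticalFlowFarneback(reference_img, deformed_img, None,
                                        0.5, 4, 21, 5, 7, 1.5, 0)
    u = flow[..., 0]  # fringe displacement along the projection direction

    # Simplified small-angle triangulation assumption: a displacement of
    # u pixels maps to height h = u * pixel_size / tan(projection_angle).
    return (u * pixel_size_mm) / np.tan(projection_angle_rad)
```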
Distributed machine learning (ML) can bring more computational resources to bear than single-machine learning, thus enabling reductions in training time. Distributed learning partitions models and data over many machines, allowing model and dataset sizes beyond the available compute power and memory of a single machine. In practice, though, distributed ML is challenging when distribution is mandatory rather than chosen by the practitioner. In such scenarios, data may be unavoidably separated among workers due to limited memory capacity per worker, or even because of data privacy issues. Existing distributed methods then either fail outright because transfer costs across workers dominate, or do not apply at all. We propose a new approach to distributed fully connected neural network learning, called independent subnet training (IST), to handle these cases. In IST, the original network is decomposed into a set of narrow subnetworks with the same depth. These subnetworks are trained locally before parameters are exchanged to produce new subnets, and the training cycle repeats. Such a naturally "model parallel" approach limits memory usage by storing only a portion of the network parameters on each device. Additionally, there is no requirement to share data between workers (i.e., subnet training is local and independent), and communication volume and frequency are reduced by decomposing the original network into independent subnets. These properties of IST address issues arising from distributed data, slow interconnects, or limited device memory, making IST a suitable approach for cases of mandatory distribution. We show experimentally that IST results in training times that are much lower than those of common distributed learning approaches.
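A minimal sketch of the IST decomposition step for a single hidden layer is shown below, assuming a two-layer fully connected network stored as NumPy weight matrices. Hidden units are randomly partitioned across workers, each worker receives only the input-to-hidden rows and hidden-to-output columns for its units, and after local training the pieces are written back into the full network. The local training steps and the parameter exchange are elided; the helper names `decompose` and `reassemble` are invented here and are not from the original work.

```python
import numpy as np

def decompose(W1, W2, n_workers, rng):
    """Split the hidden units of a 2-layer MLP into disjoint narrow subnets."""
    hidden = W1.shape[0]
    perm = rng.permutation(hidden)
    parts = np.array_split(perm, n_workers)
    # Each subnet keeps the rows of W1 and columns of W2 for its hidden units.
    return [(idx, W1[idx, :], W2[:, idx]) for idx in parts]

def reassemble(subnets, W1, W2):
    """Write locally trained subnet parameters back into the full network."""
    for idx, w1_part, w2_part in subnets:
        W1[idx, :] = w1_part
        W2[:, idx] = w2_part
    return W1, W2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((256, 784))   # input -> hidden weights
    W2 = rng.standard_normal((10, 256))    # hidden -> output weights
    subnets = decompose(W1, W2, n_workers=4, rng=rng)
    # ... each worker trains its (w1_part, w2_part) locally for some steps ...
    W1, W2 = reassemble(subnets, W1, W2)
```

Because each worker holds only its slice of the weights and never needs the other workers' data, this decomposition is what limits per-device memory and reduces communication to the periodic regroup-and-exchange step.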