Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. These large data sets are not amenable to study using traditional database techniques, if only because they can be too large to fit in a single relational database. On the other hand, many of the analyses done on them can be expressed using simple, easily distributed computations: filtering, aggregation, extraction of statistics, and so on. We present a system for automating such analyses. A filtering phase, in which a query is expressed using a new procedural programming language, emits data to an aggregation phase. Both phases are distributed over hundreds or even thousands of computers. The results are then collated and saved to a file. The design – including the separation into two phases, the form of the programming language, and the properties of the aggregators – exploits the parallelism inherent in having data and computation distributed across many machines.
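The two-phase structure described above can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not the system's actual language or implementation: the names Record, Emit, filter, and Aggregate are hypothetical, goroutines stand in for machines, and a simple per-key sum stands in for the aggregators. The filter sees one record at a time, and because summing is commutative and associative the result does not depend on the order in which values arrive.

```go
package main

import (
	"fmt"
	"sync"
)

// Record is a hypothetical flat log record (not from the paper).
type Record struct {
	Host  string
	Bytes int
}

// Emit is one value sent from the filtering phase to the aggregation phase.
type Emit struct {
	Key   string
	Value int
}

// filter examines a single record in isolation and emits zero or one value.
func filter(r Record, out chan<- Emit) {
	if r.Bytes > 0 {
		out <- Emit{Key: r.Host, Value: r.Bytes}
	}
}

// Aggregate runs filter over every shard concurrently (one goroutine per
// shard, standing in for one machine) and sums the emitted values per key.
func Aggregate(shards [][]Record) map[string]int {
	out := make(chan Emit)
	var wg sync.WaitGroup
	for _, shard := range shards {
		wg.Add(1)
		go func(recs []Record) {
			defer wg.Done()
			for _, r := range recs {
				filter(r, out)
			}
		}(shard)
	}
	// Close the channel once every filter goroutine has finished.
	go func() {
		wg.Wait()
		close(out)
	}()

	totals := make(map[string]int)
	for e := range out {
		totals[e.Key] += e.Value
	}
	return totals
}

func main() {
	shards := [][]Record{
		{{"a", 10}, {"b", 0}},
		{{"a", 5}, {"b", 7}},
	}
	totals := Aggregate(shards)
	fmt.Println(totals["a"], totals["b"]) // prints "15 7"
}
```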
We describe a design for generics in Go inspired by previous work on Featherweight Java by Igarashi, Pierce, and Wadler. Whereas subtyping in Java is nominal, in Go it is structural, and whereas generics in Java are defined via erasure, in Go we use monomorphisation. Although monomorphisation is widely used, we are one of the first to formalise it. Our design also supports a solution to The Expression Problem.
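To make the idea of monomorphisation concrete, here is a minimal sketch in Go. It assumes the type-parameter syntax of Go 1.18 and later, which may differ from the notation used in the paper, and the names Square, SumBy, and SumBySquare are hypothetical. SumBySquare is a hand-written copy of the generic function with T replaced by Square, the kind of specialised instance a monomorphising translation would generate for each type argument used in a program. It is not a claim about what any particular Go compiler emits.

```go
package main

import "fmt"

// Square is a hypothetical concrete type used as the type argument.
type Square struct{ Side float64 }

// Area returns the area of the square.
func (s Square) Area() float64 { return s.Side * s.Side }

// SumBy is generic: it works for any element type T.
func SumBy[T any](xs []T, f func(T) float64) float64 {
	total := 0.0
	for _, x := range xs {
		total += f(x)
	}
	return total
}

// SumBySquare is the same function specialised by hand to T = Square.
// It contains no type parameters at all.
func SumBySquare(xs []Square, f func(Square) float64) float64 {
	total := 0.0
	for _, x := range xs {
		total += f(x)
	}
	return total
}

func main() {
	sq := []Square{{2}, {3}}
	// Square.Area is a method expression of type func(Square) float64.
	fmt.Println(SumBy(sq, Square.Area), SumBySquare(sq, Square.Area)) // prints "13 13"
}
```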
Released as open source in November 2009, Go has become the foundation for critical infrastructure at every major cloud provider. Its creators look back on how Go got here and why it has stuck around.
Very large data sets, such as telephone call records, network logs, high-resolution satellite images, or web document repositories, are not easily analyzed using traditional database techniques. They may be simply too large, grow too fast, or may not fit well in a database schema. They tend to span multiple disks and machines. On the other hand, these large data sets often have a flat and regular structure that permits distributed filtering and aggregation. We present a system and language for such analyses. A filtering phase, in which a query is expressed using the procedural programming language Sawzall, emits data to an aggregation phase. Both phases are distributed over hundreds or even thousands of computers. The language constructs and execution model of Sawzall have been devised to enable parallel execution without the need for complex dependency analysis. Even with our fairly traditional implementation of the Sawzall execution engine, we observe nearly perfect scalability as we add more machines. (Joint work with Sean Dorward, Rob Pike, and Sean Quinlan.)
Abstract. This paper presents a simple and efficient method for instruction scheduling within basic blocks. An implementation proved to be extremely small while producing results comparable to those of other, more complicated techniques. The algorithm is of quadratic complexity in the number of instructions, but a linear run-time is achieved in practice. Because no (code) look-ahead is needed, the algorithm is even suitable for one-pass compilers.