Recently, the Log-Structured Merge-tree (LSM-tree) has been widely adopted in the storage layer of modern NoSQL systems. As a result, a large number of research efforts from both the database and operating systems communities have tried to improve various aspects of LSM-trees. In this paper, we survey recent research on LSM-trees so that readers can learn about the state of the art in LSM-based storage techniques. We provide a general taxonomy to classify the LSM-tree literature, survey the efforts in detail, and discuss their strengths and trade-offs. We further survey several representative LSM-based open-source NoSQL systems and discuss potential future research directions arising from the survey.
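For readers new to the structure being surveyed, the basic LSM-tree write and read paths can be sketched in a few lines. This is a minimal toy, assuming a Python dict as the in-memory component (memtable) and sorted Python lists standing in for immutable on-disk components; the class name `TinyLSM` and the parameter `memtable_limit` are hypothetical. Real systems add write-ahead logging, Bloom filters, deletes (tombstones), and leveled or tiered merge policies.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Hypothetical toy LSM-tree: a mutable memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable_limit = memtable_limit
        self.memtable: Dict[str, str] = {}
        # Sorted runs, newest first; each run is a list of (key, value) pairs sorted by key.
        self.runs: List[List[Tuple[str, str]]] = []

    def put(self, key: str, value: str) -> None:
        # Writes only touch memory, which is why the LSM write path is fast.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Freeze the memtable into an immutable sorted run (stand-in for a disk component).
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        # Search newest data first so that recent versions shadow older ones.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))  # first entry whose key is >= key
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def merge_all(self) -> None:
        # Merge every run into one; apply oldest first so newer values overwrite older ones.
        merged: Dict[str, str] = {}
        for run in reversed(self.runs):
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

For example, with `memtable_limit=2`, after `put("a", "1")`, `put("b", "2")`, and `put("a", "3")`, `get("a")` returns `"3"` because the memtable is consulted before the older sorted run.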
In recent years, the Log-Structured Merge (LSM) tree has been widely adopted by NoSQL and NewSQL systems for its superior write performance. Despite its popularity, however, most existing work has focused on LSM-based key-value stores with only a single LSM-tree; auxiliary structures, which are critical for supporting ad-hoc queries, have received much less attention. In this paper, we focus on efficient data ingestion and query processing for general-purpose LSM-based storage systems. We first propose and evaluate a series of optimizations for efficient batched point lookups, significantly broadening the range of applicability of LSM-based secondary indexes. We then present several new and efficient maintenance strategies for LSM-based storage systems. Finally, we have implemented and experimentally evaluated the proposed techniques in the context of the Apache AsterixDB system, and we present the results here.
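The abstract does not spell out how batched point lookups work. As a rough illustration only, the sketch below shows the general sort-then-sweep idea behind batching: sort the lookup keys once (for example, primary keys returned by a secondary-index search), then pass over each component in key order instead of issuing independent random probes. It assumes each LSM component is an in-memory list of (key, value) pairs sorted by key, ordered newest first; the function name `batched_point_lookup` is hypothetical, and this is not the specific set of optimizations the paper proposes.

```python
from typing import Dict, List, Sequence, Tuple


def batched_point_lookup(
    keys: Sequence[int],
    components: List[List[Tuple[int, str]]],
) -> Dict[int, str]:
    """Look up many keys at once against LSM components ordered newest first.

    The batch is sorted once, and each component is then scanned in a single
    forward pass, so it is visited in key order rather than by random probes.
    """
    pending = sorted(set(keys))
    found: Dict[int, str] = {}
    for component in components:  # newest first: the first hit for a key wins
        if not pending:
            break
        still_pending: List[int] = []
        pos = 0
        for key in pending:
            # The component cursor never moves backwards because keys are sorted.
            while pos < len(component) and component[pos][0] < key:
                pos += 1
            if pos < len(component) and component[pos][0] == key:
                found[key] = component[pos][1]
            else:
                still_pending.append(key)  # try older components
        pending = still_pending
    return found
```

In a disk-resident system the forward pass would translate into reading each component's pages in order, which is where batching pays off; this in-memory sketch only shows the access pattern.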
PVLDB Reference Format: Chen Luo, Michael J. Carey. Efficient Data Ingestion and Query Processing for LSM-Based Storage Systems (Extended Version). PVLDB, 12(5): xxxx-yyyy, 2019.
The Log-Structured Merge-Tree (LSM-tree) has been widely adopted in modern NoSQL systems for its superior write performance. Despite their popularity, LSM-trees have been criticized for suffering from write stalls and large performance variance due to the inherent mismatch between their fast in-memory writes and slow background I/O operations. In this paper, we use a simple yet effective two-phase experimental approach to evaluate write stalls for various LSM-tree designs. We further explore the design choices of LSM merge schedulers to minimize write stalls given a disk bandwidth budget. We have conducted extensive experiments in the context of the Apache AsterixDB system, and we present the results here.
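The mismatch between fast in-memory writes and slow background I/O can be illustrated with a toy discrete-time model. This is a minimal sketch under stated assumptions, not the paper's two-phase approach or its merge schedulers: writes arrive at a fixed `write_rate`, a full memtable is sealed and flushed at `flush_bandwidth` units per tick, and at most `max_memtables` memory components may exist at once (all hypothetical parameters). Merges are ignored entirely, and writes that cannot be admitted wait in a backlog.

```python
from collections import deque
from typing import Deque


def simulate_write_stalls(
    ticks: int,
    write_rate: int,
    flush_bandwidth: int,
    memtable_size: int,
    max_memtables: int = 2,
) -> int:
    """Count ticks in which some writes had to wait for a free memory component.

    Toy model (an assumption): writes fill a mutable memtable; a full memtable is
    sealed and flushed in the background; at most `max_memtables` memtables
    (active + sealed) may exist at the same time.
    """
    active = 0                    # units held by the mutable memtable
    sealed: Deque[int] = deque()  # remaining units of each sealed memtable, oldest first
    backlog = 0                   # writes that arrived but have not been admitted yet
    stalled_ticks = 0
    for _ in range(ticks):
        # Background I/O: spend this tick's flush budget on the oldest sealed memtables.
        budget = flush_bandwidth
        while sealed and budget > 0:
            done = min(budget, sealed[0])
            sealed[0] -= done
            budget -= done
            if sealed[0] == 0:
                sealed.popleft()
        # Foreground writes: admit as much of the backlog as memory allows.
        backlog += write_rate
        while backlog > 0:
            room = memtable_size - active
            if room == 0:
                if len(sealed) < max_memtables - 1:
                    sealed.append(active)  # seal the full memtable and start a new one
                    active = 0
                    continue
                break  # every memory component is in use: the remaining writes stall
            admitted = min(room, backlog)
            active += admitted
            backlog -= admitted
        if backlog > 0:
            stalled_ticks += 1
    return stalled_ticks
```

For example, `simulate_write_stalls(100, write_rate=5, flush_bandwidth=10, memtable_size=20)` reports no stalls, while `write_rate=10` with `flush_bandwidth=2` makes writes start stalling within a few ticks, because the flush can no longer free memory as fast as writes fill it.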
We report the coupling of inkjet printing with electrospray ionization mass spectrometry (ESI-MS) to detect picoliter droplets, in which the liquid volume and its position on the tip can be precisely controlled to form ultrafine droplets for successive ionization of the analyte. A single computer-controlled rectangular pulse was applied to the piezoelectric device on the inkjet microchip to eject each picoliter droplet. The voltage and width of the driving pulse were optimized for reproducible ejection of low-viscosity solvent. Each droplet had a volume of about 600 pL, and a trigger of 10 droplets was selected as the best inlet mode, taking the relative standard deviation of the droplets into consideration. After several materials were tested, graphite was chosen as the target substrate to which the high voltage was applied for ionization. A high-speed camera was used to capture the break-up of a droplet. The distance between the inkjet nozzle and the tip was set at 2 cm to avoid a short circuit. The effects of the tip diameter, the sample volume, and the sample concentration on the mass signal intensity were examined. A smaller tip diameter gave greater signal intensity and a lower limit of detection, while a smaller liquid volume gave higher ionization efficiency. Caffeine was quantified by linear regression over the range of 1 to 200 ppm, using theobromine as the internal standard. Several real samples were also analyzed with the instrument.
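Two quantities in this abstract can be made concrete. At about 600 pL per droplet, a 10-droplet trigger delivers roughly 10 × 600 pL = 6 nL of sample. The relative standard deviation (RSD) used to choose the trigger size, and an internal-standard calibration line, can be sketched as follows. This assumes the calibration response is the analyte-to-internal-standard (caffeine/theobromine) intensity ratio fitted by ordinary least squares; the abstract does not state the exact fitting procedure, and the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def relative_std_dev(signals: Sequence[float]) -> float:
    """Relative standard deviation in percent: 100 * sample stdev / mean."""
    return 100.0 * stdev(signals) / mean(signals)


def internal_standard_calibration(
    concentrations: Sequence[float],
    analyte_signal: Sequence[float],
    standard_signal: Sequence[float],
) -> Tuple[float, float]:
    """Fit ratio = slope * concentration + intercept by ordinary least squares.

    The response is the analyte/internal-standard intensity ratio, which cancels
    droplet-to-droplet variation that affects both ions equally (an assumption).
    """
    ratios: List[float] = [a / s for a, s in zip(analyte_signal, standard_signal)]
    x_bar, y_bar = mean(concentrations), mean(ratios)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(concentrations, ratios))
    sxx = sum((x - x_bar) ** 2 for x in concentrations)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_concentration(ratio: float, slope: float, intercept: float) -> float:
    """Invert the calibration line to estimate an unknown sample's concentration."""
    return (ratio - intercept) / slope
```

Any calibration data passed to these functions would come from the experiment itself; none is reported in the abstract.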