In this paper we survey, consolidate, and present the state of the art in distributed database concurrency control. The heart of our analysis is a decomposition of the concurrency control problem into two major subproblems: read-write and write-write synchronization. We describe a series of synchronization techniques for solving each subproblem and show how to combine these techniques into algorithms for solving the entire concurrency control problem. Such algorithms are called "concurrency control methods." We describe 48 principal methods, including all practical algorithms that have appeared in the literature plus several new ones. We concentrate on the structure and correctness of concurrency control algorithms. Issues of performance are given only secondary treatment.
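A minimal, single-site sketch of one technique from this family, basic timestamp ordering, may help make the read-write / write-write decomposition concrete. It assumes every transaction carries a unique positive integer timestamp and uses the Thomas Write Rule for write-write synchronization; the class and method names are hypothetical, and it omits commit processing, replication, and the distribution that a full concurrency control method would need.

```python
from dataclasses import dataclass
from typing import Dict


class Abort(Exception):
    """Raised when an operation arrives too late in timestamp order."""


@dataclass
class ItemState:
    read_ts: int = 0     # largest timestamp of any transaction that has read the item
    write_ts: int = 0    # largest timestamp of any write installed on the item
    value: object = None


class BasicTOScheduler:
    """Hypothetical single-site sketch of basic timestamp ordering (T/O).

    rw synchronization: reject a read that arrives after a younger write,
    and a write that arrives after a younger read.
    ww synchronization: Thomas Write Rule, i.e. silently drop a write that
    is older than the one already installed instead of aborting.
    """

    def __init__(self) -> None:
        self.items: Dict[str, ItemState] = {}

    def _item(self, key: str) -> ItemState:
        return self.items.setdefault(key, ItemState())

    def read(self, ts: int, key: str) -> object:
        item = self._item(key)
        if ts < item.write_ts:  # rw conflict: a younger transaction already overwrote the value
            raise Abort(f"T{ts} read of {key!r} rejected: written by T{item.write_ts}")
        item.read_ts = max(item.read_ts, ts)
        return item.value

    def write(self, ts: int, key: str, value: object) -> bool:
        """Return True if the write is installed, False if dropped by the Thomas Write Rule."""
        item = self._item(key)
        if ts < item.read_ts:  # rw conflict: a younger transaction already read the old value
            raise Abort(f"T{ts} write of {key!r} rejected: read by T{item.read_ts}")
        if ts < item.write_ts:  # ww conflict: obsolete write, skip it rather than abort
            return False
        item.write_ts = ts
        item.value = value
        return True


s = BasicTOScheduler()
s.write(2, "y", "v2")
print(s.write(1, "y", "v1"))  # older write is ignored under the Thomas Write Rule
print(s.read(4, "y"))
```

Each check handles one kind of conflict on its own, which mirrors the idea of solving read-write and write-write synchronization separately and then combining them into a single method.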
We analyzed the whole genome sequences of a family of four, consisting of two siblings and their parents. Family-based sequencing allowed us to delineate recombination sites precisely, identify 70% of the sequencing errors, and identify very rare SNVs. We also directly estimated a human intergeneration mutation rate of ~1.1 × 10⁻⁸ per position per haploid genome. Both offspring in this family have two recessive disorders: Miller syndrome, for which the gene was concurrently identified, and primary ciliary dyskinesia, for which causative genes have been previously identified. Family-based genome analysis enabled us to narrow the candidate genes for both of these Mendelian disorders to only four. Our results demonstrate the unique value of complete genome sequencing in families.
The availability of dense genetic linkage maps of mammalian genomes makes feasible a wide range of studies, including positional cloning of monogenic traits, genetic dissection of polygenic traits, construction of genome-wide physical maps, rapid marker-assisted construction of congenic strains, and evolutionary comparisons. We have been engaged for the past five years in a concerted effort to produce a dense genetic map of the laboratory mouse. Here we present the final report of this project. The map contains 7,377 genetic markers, consisting of 6,580 highly informative simple sequence length polymorphisms integrated with 797 restriction fragment length polymorphisms in mouse genes. The average spacing between markers is about 0.2 centimorgans or 400 kilobases.
A physical map has been constructed of the human genome containing 15,086 sequence-tagged sites (STSs), with an average spacing of 199 kilobases. The project involved assembly of a radiation hybrid map of the human genome containing 6193 loci and incorporated a genetic linkage map of the human genome containing 5264 loci. This information was combined with the results of STS-content screening of 10,850 loci against a yeast artificial chromosome library to produce an integrated map, anchored by the radiation hybrid and genetic maps. The map provides radiation hybrid coverage of 99 percent and physical coverage of 94 percent of the human genome. The map also represents an early step in an international project to generate a transcript map of the human genome, with more than 3235 expressed sequences localized. The STSs in the map provide a scaffold for initiating large-scale sequencing of the human genome.
We have constructed a genetic map of the mouse genome containing 4,006 simple sequence length polymorphisms (SSLPs). The map provides an average spacing of 0.35 centiMorgans (cM) between markers, corresponding to about 750 kb. Approximately 90% of the genome lies within 1.1 cM of a marker and 99% lies within 2.2 cM. The markers have an average polymorphism rate of 50% in crosses between laboratory strains. The markers are distributed in a relatively uniform fashion across the genome, although some deviations from randomness can be detected. In particular, there is a significant underrepresentation of markers on the X chromosome. This map represents the two-thirds point toward our goal of developing a mouse genetic map containing 6,000 SSLPs.
This paper describes the techniques used to optimize relational queries in the SDD-1 distributed database system. Queries are submitted to SDD-1 in a high-level procedural language called Datalanguage. Optimization begins by translating each Datalanguage query into a relational calculus form called an envelope, which is essentially an aggregate-free QUEL query. This paper is primarily concerned with the optimization of envelopes. Envelopes are processed in two phases. The first phase executes relational operations at various sites of the distributed database in order to delimit a subset of the database that contains all data relevant to the envelope. This subset is called a reduction of the database. The second phase transmits the reduction to one designated site, and the query is executed locally at that site. The critical optimization problem is to perform the reduction phase efficiently. Success depends on designing a good repertoire of operators to use during this phase, and an effective algorithm for deciding which of these operators to use in processing a given envelope against a given database. The principal reduction operator that we employ is called a semijoin. In this paper we define the semijoin operator, explain why semijoin is an effective reduction operator, and present an algorithm that constructs a cost-effective program of semijoins, given an envelope and a database.
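To make the semijoin concrete, here is a minimal Python sketch over in-memory relations represented as lists of dicts. It shows only the operator itself, R ⋉ S = π_attrs(R)(R ⋈ S), and why it is a safe reduction: the tuples it removes from R could never contribute to the join. The function names and the `emp`/`dept` relations are hypothetical, and the sketch models neither SDD-1's cost model nor the paper's algorithm for choosing a program of semijoins.

```python
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, object]


def project(rel: List[Row], attrs: Sequence[str]) -> Set[Tuple[object, ...]]:
    """Distinct values of rel on the join attributes (what one site would ship)."""
    return {tuple(row[a] for a in attrs) for row in rel}


def semijoin(r: List[Row], s: List[Row], attrs: Sequence[str]) -> List[Row]:
    """R semijoin S: the tuples of r that match at least one tuple of s on attrs."""
    keys = project(s, attrs)  # only this projection of s needs to travel to r's site
    return [row for row in r if tuple(row[a] for a in attrs) in keys]


def natural_join(r: List[Row], s: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Hash join of r and s on attrs, used here only to check the reduction."""
    index: Dict[Tuple[object, ...], List[Row]] = {}
    for row in s:
        index.setdefault(tuple(row[a] for a in attrs), []).append(row)
    return [{**row, **match}
            for row in r
            for match in index.get(tuple(row[a] for a in attrs), [])]


# Hypothetical relations assumed to be stored at two different sites.
emp = [{"name": "ann", "dept": 1}, {"name": "bob", "dept": 2}, {"name": "cy", "dept": 3}]
dept = [{"dept": 1, "site": "A"}, {"dept": 3, "site": "B"}]

reduced_emp = semijoin(emp, dept, ("dept",))
print(reduced_emp)  # bob is dropped: no tuple of dept has dept == 2
# Joining the reduced relation gives the same answer as joining the original.
print(natural_join(reduced_emp, dept, ("dept",)) == natural_join(emp, dept, ("dept",)))
```

Whether a given semijoin pays off depends on whether the projection it ships is small relative to the tuples of R it eliminates; deciding that across a whole envelope is the optimization problem the paper addresses.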