Hagar Barak scite author profile

Hagar Barak

3 Publications

19 Citation Statements Received

13 Citation Statements Given


Affiliations

Princeton University

Publications

A Relative-Entropy Algorithm for Genomic Fingerprinting Captures Host-Phage Similarities

et al. 2005

The degeneracy of codons allows a multitude of possible sequences to code for the same protein. Hidden within the particular choice of sequence for each organism are over 100 previously undiscovered biologically significant, short oligonucleotides (length, 2 to 7 nucleotides). We present an information-theoretic algorithm that finds these novel signals. Applying this algorithm to the 209 sequenced bacterial genomes in the NCBI database, we determine a set of oligonucleotides for each bacterium which uniquely characterizes the organism. Some of these signals have known biological functions, like restriction enzyme binding sites, but most are new. An accompanying scoring algorithm is introduced that accurately (92%) places sequences of 100 kb with their correct species among the choice of hundreds. This algorithm also does far better than previous methods at relating phage genomes to their bacterial hosts, suggesting that the lists of oligonucleotides are "genomic fingerprints" that encode information about the effects of the cellular environment on DNA sequence. Our approach provides a novel basis for phylogeny and is potentially ideally suited for classifying the short DNA fragments obtained by environmental shotgun sequencing. The methods developed here can be readily extended to other problems in bioinformatics.

Genome analysis has uncovered many sequence differences among organisms. Both mononucleotide and dinucleotide content, as well as codon usage, vary widely among genomes (6). The size of even small bacterial genomes is statistically sufficient to determine a substantially richer set of sequence-based features describing each organism. However, many of these features have remained elusive, in the coding regions in particular, due to complicated constraints. Each (protein-coding) gene encodes a particular protein, which constrains its possible nucleotide sequence. Because the genetic code is degenerate, this constraint still allows for an enormous number of possible DNA sequences for each gene. Also, the overall codon usage in each gene is known to have strong biological consequences, possibly determined by isoaccepting tRNA abundances (5). In order to isolate new features within the coding regions, these constraints must be factored out.

To solve this problem, we create a background genome that shares exactly the above-described constraints with the real genome but is otherwise random (4). The background genome encodes all the same proteins, and the codon usage is precisely matched for each gene. The hidden features for which we are searching are contained in the differences between the background genome and the real genome. The problem is reduced to extracting these differences.

We have incorporated information theory into an algorithm to systematically compute the over- and underrepresented strings of nucleotides (words) in the real genome compared to those of the background (see Materials and Methods for details). A major difficulty in finding these words is that they are not independent. For example, if ...
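The background-genome idea described in the excerpt can be illustrated with a minimal Python sketch. It is not the authors' implementation, whose details are in the paper's Materials and Methods and are not reproduced here. The sketch assumes that genes are in-frame strings over uppercase A, C, G, T, that the standard genetic code applies, and that shuffling codons among positions encoding the same amino acid is an acceptable way to keep each gene's protein and codon usage fixed. The function names, the pseudocount, and the per-word score p*log2(p/q) are all hypothetical choices made for illustration, and the sketch does not address the dependence between words that the excerpt mentions.

```python
import math
import random
from collections import Counter, defaultdict

# Standard genetic code in the conventional TCAG ordering (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(gene):
    """Split an in-frame gene into codons, dropping any trailing partial codon."""
    usable = len(gene) - len(gene) % 3
    return [gene[i:i + 3] for i in range(0, usable, 3)]


def translate(gene):
    """Translate a gene into its amino-acid string ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[c] for c in split_codons(gene))


def shuffle_synonymous(gene, rng):
    """Hypothetical background gene: permute codons among positions that
    encode the same amino acid, so the protein and the gene's codon counts
    are unchanged while the codon order is randomized."""
    codons = split_codons(gene)
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def word_counts(genes, k):
    """Count every overlapping word of length k within each gene."""
    counts = Counter()
    for gene in genes:
        for i in range(len(gene) - k + 1):
            counts[gene[i:i + k]] += 1
    return counts


def relative_entropy_terms(real_genes, background_genes, k, pseudocount=1.0):
    """Per-word term p*log2(p/q) comparing real (p) and background (q)
    word frequencies. Positive terms mark overrepresented words and
    negative terms underrepresented ones. The pseudocount is an
    illustrative smoothing choice, not taken from the source."""
    p_counts = word_counts(real_genes, k)
    q_counts = word_counts(background_genes, k)
    words = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + pseudocount * len(words)
    q_total = sum(q_counts.values()) + pseudocount * len(words)
    terms = {}
    for word in words:
        p = (p_counts[word] + pseudocount) / p_total
        q = (q_counts[word] + pseudocount) / q_total
        terms[word] = p * math.log2(p / q)
    return terms


if __name__ == "__main__":
    rng = random.Random(0)
    genes = ["ATGGCTGCAGCGGCCAAAAAGTTATTGCTTTAA"]  # toy gene, not real data
    background = [shuffle_synonymous(g, rng) for g in genes]
    terms = relative_entropy_terms(genes, background, k=3)
    for word, score in sorted(terms.items(), key=lambda kv: -abs(kv[1]))[:5]:
        print(word, round(score, 4))
```

Because the shuffle only exchanges synonymous codons within a gene, any difference in word counts between the real and shuffled genes comes from codon order and from words spanning codon boundaries, which is the kind of signal the excerpt says remains once the protein and codon-usage constraints are factored out. In practice one would presumably average over many shuffled backgrounds rather than a single one.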

Catherine Hanley, Louis: The French Prince Who Invaded England. New Haven: Yale University Press, 2016. Pp. xv, 279; 22 color plates, 2 maps, and 2 tables. $40. ISBN: 978-0-300-21745-2.

Barak

2018

Speculum

The Managerial Revolution of the Thirteenth Century

Barak

2017


