Transcription factors (TFs) regulate the expression of genes involved in myriad cellular processes through sequence-specific interactions with DNA. To predict DNA regulatory elements and the TFs targeting them with greater accuracy, detailed knowledge of the binding preferences of TFs is needed. Protein binding microarray (PBM) technology permits rapid, high-throughput characterization of the in vitro DNA binding specificities of proteins [1]. Here, we present a novel, maximally compact, synthetic DNA sequence design that represents all possible DNA sequence variants of a given length k (i.e., all "k-mers") on a single, universal microarray. We constructed such all k-mer microarrays covering all 10-base-pair (bp) binding sites by converting high-density single-stranded oligonucleotide arrays to double-stranded DNA arrays. Using these microarrays, we comprehensively determined the binding specificities over a full range of affinities for five TFs of diverse structural classes from yeast, worm, mouse, and human. Importantly, the unbiased coverage of all k-mers permits an interrogation of binding site preferences, including nucleotide interdependencies, at unprecedented resolution.
Our group has recently developed a compact, universal protein binding microarray (PBM) that can be used to determine the binding preferences of transcription factors (TFs). This design represents all possible sequence variants of a given length k (i.e., all k-mers) on a single array, allowing a complete characterization of the binding specificities of a given TF. Here, we present the mathematical foundations of this design based on de Bruijn sequences generated by linear feedback shift registers. We show that these sequences represent the maximum number of variants for any given set of array dimensions (i.e., number of spots and spot lengths), while also exhibiting desirable pseudo-randomness properties. Moreover, de Bruijn sequences can be selected that represent gapped sequence patterns, further increasing the coverage of the array. This design yields a powerful experimental platform that allows the binding preferences of TFs to be determined with unprecedented resolution.
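As an illustration of the construction described above, the following is a minimal sketch of how a de Bruijn sequence over the DNA alphabet might be generated with a linear feedback shift register over GF(4) and then cut into overlapping array spots. It is not the authors' implementation. The brute-force search for feedback coefficients, the encoding of GF(4) as the integers 0 to 3, the mapping of those integers to A, C, G, T, and the names `lfsr_cycle`, `de_bruijn_dna`, and `split_into_spots` are all assumptions made for the example, as are the small values of `k` and `spot_len` in the usage lines.

```python
from itertools import product

# GF(4) = {0, 1, a, a+1} encoded as 0..3 (an assumed encoding):
# addition is bitwise XOR, multiplication uses this table.
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]

def lfsr_cycle(coeffs):
    """Run s[n+k] = sum(coeffs[i] * s[n+i]) over GF(4) from state (0,...,0,1).

    Returns the output symbols if the register visits all 4**k - 1 nonzero
    states before repeating (maximal period), otherwise None.
    """
    k = len(coeffs)
    state = (0,) * (k - 1) + (1,)
    seen, out = set(), []
    for _ in range(4 ** k - 1):
        if state in seen:
            return None  # state repeated early: period is not maximal
        seen.add(state)
        out.append(state[0])
        feedback = 0
        for c, s in zip(coeffs, state):
            feedback ^= GF4_MUL[c][s]
        state = state[1:] + (feedback,)
    return out

def de_bruijn_dna(k):
    """Cyclic DNA sequence of length 4**k containing every k-mer once."""
    for coeffs in product(range(4), repeat=k):
        if coeffs[0] == 0:
            continue  # a nonzero first coefficient keeps the update invertible
        cycle = lfsr_cycle(coeffs)
        if cycle is not None:
            # The cycle starts with k-1 zeros; one extra zero adds the
            # all-zero k-mer that a linear register can never produce.
            return "".join("ACGT"[x] for x in [0] + cycle)
    raise ValueError("no maximal-period feedback coefficients found")

def split_into_spots(cyclic_seq, k, spot_len):
    """Cut the sequence into spots that overlap by k-1 bases."""
    linear = cyclic_seq + cyclic_seq[: k - 1]  # unroll the cycle
    step = spot_len - k + 1                    # new k-mers per spot
    return [linear[i : i + spot_len] for i in range(0, len(linear) - k + 1, step)]

k, spot_len = 3, 10  # small illustrative values, not the array's real dimensions
seq = de_bruijn_dna(k)
spots = split_into_spots(seq, k, spot_len)
covered = {s[i : i + k] for s in spots for i in range(len(s) - k + 1)}
print(len(seq), len(spots), len(covered))
```

Because consecutive spots overlap by k-1 bases, each spot of length `spot_len` contributes `spot_len - k + 1` new k-mers, which is the most a spot of that length can hold. The exhaustive coefficient search is only practical for small k; a real design at k = 10 would presumably start from a known primitive polynomial over GF(4) rather than searching.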