The complete intron-exon organization of the gene encoding human perlecan (HSPG2), the major heparan sulfate proteoglycan of basement membranes, has been elucidated, and specific exons have been assigned to coding sequences for the modular domains of the protein core. The gene was composed of 94 exons, spanning >120 kbp of genomic DNA. The exon arrangement was analyzed vis-a-vis the modular structure of perlecan, which harbors protein domains homologous to the low density lipoprotein receptor, laminin, epidermal growth factor, and neural cell adhesion molecule. The exon size and the intron phases were highly conserved when compared to the corresponding domains of the homologous genes, suggesting that most of this modular proteoglycan has evolved from a common ancestor by gene duplication or exon shuffling. The 5' flanking region revealed a structural organization characteristic of housekeeping and growth control-related genes. It lacked canonical TATA or CAAT boxes, but it contained several GC boxes with binding sites for the transcription factors SP1 and ETF. Consistent with the lack of a TATA element, the perlecan gene contained multiple transcription initiation sites distributed over 80 bp of genomic DNA. These results offer insights into the evolution of this chimeric molecule and provide the molecular basis for understanding the transcriptional control of this important gene.

In the past few years proteoglycans have assumed a pivotal role in diverse areas of human biology, not only because of their physicochemical attributes but also because of their involvement in regulating cellular growth and differentiation (1, 2). A key player is perlecan (HSPG2), the major heparan sulfate proteoglycan of basement membranes and extracellular matrices (2-7). Complete cDNA cloning of the human species (6, 7) predicts a protein core of ~467 kDa excluding any posttranslational modification, thus making perlecan one of the largest gene products of the human body.
It is now apparent that the heparan sulfate proteoglycan originally isolated from the Engelbreth-Holm-Swarm (EHS) tumor (8) is identical to that found in the pericellular matrices of human colon carcinoma cells (9, 10), human lung fibroblasts (11), bovine endothelial cells (12), and mouse mammary epithelial cells (13). The protein core of perlecan has undoubtedly arisen from the reuse of protein modules previously identified in other extracellular matrix and ligand molecules. It comprises five distinct domains, with only the first domain, the heparan sulfate-binding region, unique to perlecan (6). The other four domains exhibit homology to the low density lipoprotein (LDL) receptor, the N-terminal region of laminin A and B short arms, the neural cell adhesion molecule (N-CAM), and the globular C terminus of the laminin A chain, respectively (5-7). Because of its complex molecular organization, strategic topology, and widespread distribution