Predicting quantitative effects of gene regulatory elements (GREs) on gene expression is a longstanding challenge in biology. Machine learning models for gene expression prediction may be able to address this challenge, but they require experimental datasets that link large numbers of GREs to their quantitative effects. However, current methods to generate such datasets experimentally are either restricted to specific applications or limited by their technical complexity and error-proneness. Here we introduce DNA-based phenotypic recording as a widely applicable and practical approach to generate very large datasets linking GREs to quantitative functional readouts of high precision, temporal resolution, and dynamic range, relying solely on sequencing. This is enabled by a novel DNA architecture comprising a site-specific recombinase, a GRE that controls recombinase expression, and a DNA substrate modifiable by the recombinase. Both the GRE sequence and the substrate state can be determined in a single sequencing read, and the frequency of modified substrates amongst constructs harbouring the same GRE is a quantitative, internally normalized readout of this GRE's effect on recombinase expression. Using next-generation sequencing, the quantitative expression effects of extremely large GRE sets can be assessed in parallel. As a proof of principle, we apply this approach to record translation kinetics of more than 300,000 bacterial ribosome binding sites (RBSs), collecting over 2.7 million sequence-function pairs in a single experiment. Further, we generalize from these large-scale …

Recent progress in DNA sequencing and synthesis has facilitated reading and (re-)writing of the genetic makeup of biological systems on a massive scale 1,2.
Despite this progress, the relationship between a genetic sequence and its functional properties is poorly understood, and thus the question "what to write" remains largely unanswered 3,4. Since the number of possible sequences scales exponentially with their length, the theoretical sequence space cannot be exhaustively explored by experiments, even for small GREs 5-7. Therefore, innovative high-throughput (HTP) approaches are required that allow the collection of a quantitative functional readout for large numbers of genetic sequences 7,8. At the same time, novel methods are required that identify statistical patterns and dependencies in the resulting datasets to generate models that accurately predict the properties of untested sequences. Deep learning maximizes the benefit of data collection at large scale owing to its ability to capture complex, nonlinear dependencies and to its computational scalability 9, which has led to several successful applications in computational biology, from genomics to proteomics 10-15. These methods promise to model sequence-function dependencies with minimal prior assumptions, provided that large experimental training datasets that link sequence to quantitative measure ...
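The internally normalized readout described above — the fraction of recombinase-modified substrates among all constructs carrying the same GRE — can be computed directly from paired observations of GRE sequence and substrate state, one pair per sequencing read. A minimal sketch of this tally (the read representation and function name are illustrative assumptions, not from the study):

```python
from collections import defaultdict

def recombination_fraction(reads):
    """Per-GRE internally normalized readout.

    `reads` is an iterable of (gre_sequence, substrate_modified) pairs,
    one per sequencing read; substrate_modified is True when the read
    shows the recombinase-flipped substrate state.
    Returns {gre_sequence: fraction_of_modified_substrates}.
    """
    counts = defaultdict(lambda: [0, 0])  # gre -> [modified, total]
    for gre, modified in reads:
        counts[gre][1] += 1
        if modified:
            counts[gre][0] += 1
    return {gre: mod / total for gre, (mod, total) in counts.items()}

# Toy reads for two hypothetical RBS sequences:
reads = [
    ("AGGAGG", True), ("AGGAGG", True), ("AGGAGG", False),
    ("AGCAGG", True), ("AGCAGG", False),
]
fractions = recombination_fraction(reads)
# "AGGAGG": 2 of 3 reads modified; "AGCAGG": 1 of 2
```

Because modified and unmodified reads for a given GRE come from the same pool of constructs, the fraction is normalized within each GRE and needs no external calibration — which is what makes the readout directly comparable across hundreds of thousands of sequences in one experiment.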