Identifying all essential genomic components is critical for the assembly of minimal artificial life. In the genome-reduced bacterium Mycoplasma pneumoniae, we found that small ORFs (smORFs; < 100 residues), accounting for 10% of all ORFs, are the most frequently essential genomic components (53%), followed by conventional ORFs (49%). Essentiality of smORFs may be explained by their function as members of protein and/or DNA/RNA complexes. In larger proteins, essentiality applied to individual domains rather than to entire proteins, a notion we could confirm by expression of truncated domains. The fraction of essential non-coding RNAs (ncRNAs) that do not overlap with essential genes is 5% higher than that of non-transcribed regions (0.9%), pointing to important functions of the former. We found that the minimal essential genome comprises 33% (269,410 bp) of the M. pneumoniae genome. Our data highlight an unexpected hidden layer of smORFs with essential functions, as well as non-coding regions, thus changing the focus when aiming to define the minimal essential genome.
A new genome-scale metabolic reconstruction of M. pneumoniae is used in combination with external metabolite measurements and protein abundance measurements to quantitatively explore the energy metabolism of this genome-reduced human pathogen.
Identification of small open reading frames (smORFs) encoding small proteins (≤ 100 amino acids; SEPs) is a challenge in the fields of genome annotation and protein discovery. Here, by combining a novel bioinformatics tool (RanSEPs) with "-omics" approaches, we were able to describe 109 bacterial small ORFomes. Predictions were first validated by performing an exhaustive search of SEPs present in the Mycoplasma pneumoniae proteome via mass spectrometry, which illustrated the limitations of shotgun approaches. Then, RanSEPs predictions were validated and compared with other tools using proteomic datasets from different bacterial species and SEPs from the literature. We found that up to 16 ± 9% of proteins in an organism could be classified as SEPs. Integration of RanSEPs predictions with transcriptomics data showed that some annotated non-coding RNAs could in fact encode SEPs. A functional study of SEPs highlighted an enrichment in the membrane, translation, metabolism, and nucleotide-binding categories. Additionally, 9.7% of the SEPs included a predicted N-terminal signal peptide. We envision RanSEPs as a tool to unmask the hidden universe of small bacterial proteins.
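To make the smORF size criterion concrete, here is a minimal sketch of a naive candidate scan. It is not the RanSEPs method and does not reproduce any result above. It only lists forward-strand open reading frames whose encoded length falls within a chosen range. The function name `find_small_orfs`, its parameters, and the `min_aa` cutoff are hypothetical. The default stop codons assume translation table 4 (used by mycoplasmas, where TGA encodes tryptophan), and the 100-residue ceiling follows the ≤ 100 amino acid definition given above.

```python
def find_small_orfs(seq, max_aa=100, min_aa=10,
                    starts=("ATG",), stops=("TAA", "TAG")):
    """Naively list forward-strand ORFs encoding min_aa..max_aa residues.

    Returns tuples (start, end, frame, n_aa), where start/end are 0-based
    half-open nucleotide coordinates (stop codon included) and n_aa counts
    codons from the start codon up to, but not including, the stop codon.
    Assumptions: forward strand only; default stops follow translation
    table 4 (TGA is not a stop); the first start codon in an open frame
    is taken as the ORF start.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon in starts:
                orf_start = i  # open a new candidate ORF
            elif orf_start is not None and codon in stops:
                n_aa = (i - orf_start) // 3
                if min_aa <= n_aa <= max_aa:
                    hits.append((orf_start, i + 3, frame, n_aa))
                orf_start = None  # close the ORF and keep scanning
    return hits


if __name__ == "__main__":
    # Toy sequence: one 12-codon ORF in frame 2.
    toy = "CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"
    print(find_small_orfs(toy))
```

A scan like this over-predicts heavily, since most short ORFs in a genome are not translated. That gap is why the abstract combines a dedicated predictor with proteomics and transcriptomics evidence instead of relying on length alone.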