Alterations in cancer genomes originate from mutational processes taking place throughout oncogenesis and cancer progression. We show that likeliness and entropy are two properties of somatic mutations that are crucial in cancer evolution: with respect to both properties, cancer-driver mutations stand out as distinct from the bulk of passenger mutations. Our analysis can identify novel cancer driver genes and differentiate between gain-of-function and loss-of-function mutations.

Mattiuz et al. bioRxiv preprint first posted online Jun. 24, 2018; doi: http://dx.doi.org/10.1101/354324. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Since driver mutations are under positive selection [11], their mutational pattern might diverge from that observed in the much more numerous passenger mutations. To test this notion, we used a dataset of cancer mutations derived from the one generated by Chang et al. [16] using several cancer genome resources. The full dataset comprises ~2 million single-nucleotide variants present in over 11,000 cancer exomes from patients who had one of 41 tumour types. To calculate the probability of non-synonymous mutations, we applied to this dataset a Markov model trained on synonymous mutations, as they are mostly neutral. We preferred a zero-order model to a higher-order one because we are dealing with coding sequences, where, by virtue of the triplet genetic code, higher-order patterns are confounded by constraints related to the protein sequence. We estimated the parameters of the model's transition matrix from synonymous mutations and refer to the result as the Mutational Background Model (see Methods, Fig. 1a), as these mutations reflect the outcomes of errors in the replicative/repair pathways and/or exposure to mutagens during cancer onset and progression. Next, we used the background model to calculate two scores for each group of non-synonymous mutations (GNSM: the set of all mutations hitting the same codon in a given transcript among all patients).

(a) Mutational likeliness: this measures the probability that a given GNSM results simply from the background model. A negative value of this parameter indicates a nucleotide change that does not conform to the overall mutational pattern of the tumour; in other words, a decreased likeliness of an individual mutation may reflect selective pressure on that mutation.

(b) Mutational entropy: this score, calculated by applying the Shannon entropy to each mutated codon, measures the bias towards a specific amino acid that is encoded by a GNSM compared to the expectations from the background model (Fig. 1a). Mutational entropy is m...