A simple model-based approach to inferring and visualizing cancer mutation signatures
Yuichi Shiraishi, Georg Tremmel, Satoru Miyano, Matthew Stephens
Recent advances in sequencing technologies have enabled the production
of massive amounts of data on somatic mutations from cancer genomes.
These data have led to the detection of characteristic patterns of
somatic mutations, or ``mutation signatures'', at an unprecedented
resolution, with the potential for new insights into the causes and
mechanisms of tumorigenesis.
Here we present new methods for modelling, identifying and visualizing
such mutation signatures. Our methods greatly simplify mutation
signature models compared with existing
approaches, reducing the number of parameters by orders of magnitude 
even while increasing the contextual factors (e.g. the number of 
flanking bases) that are accounted for. This improves both sensitivity 
and robustness of inferred signatures. We also provide a new intuitive 
way to visualize the signatures, analogous to the use of sequence logos 
to visualize transcription factor binding sites. 
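
To make the scale of the parameter reduction concrete, the following is a minimal counting sketch. It assumes, for illustration only, that the simplification comes from treating the substitution type and each flanking base as independent features of a signature; that assumption, the function names and the default of 6 substitution types are not stated in the text above.

```python
# Hypothetical helper names; a counting sketch, not the pmsignature API.

def full_model_params(n_flank: int, n_sub_types: int = 6, n_bases: int = 4) -> int:
    """Free parameters per signature when every combination of
    substitution type and flanking bases has its own probability."""
    return n_sub_types * n_bases ** n_flank - 1


def independent_model_params(n_flank: int, n_sub_types: int = 6, n_bases: int = 4) -> int:
    """Free parameters per signature under the ASSUMED independence of
    the substitution type and each flanking base."""
    return (n_sub_types - 1) + n_flank * (n_bases - 1)


# n_flank = 2: one base on each side; n_flank = 4: two bases on each side.
for n_flank in (2, 4):
    print(n_flank, full_model_params(n_flank), independent_model_params(n_flank))
```

Under these assumptions, widening the context from one to two flanking bases on each side takes the unconstrained count from 95 to 1535 parameters per signature, while the independent count only grows from 11 to 17.
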
We illustrate our new methods on somatic mutation data from urothelial
carcinoma of the upper urinary tract, and on a larger dataset from 30
diverse cancer types. The results illustrate several important features
of our methods, including the ability of our new visualization tool to
clearly highlight the key features of each signature, the improved
robustness of signature inferences from small sample sizes, and more
detailed inference of signature characteristics such as strand biases
and sequence context effects at the base two positions 5' to the
mutated site.
The overall framework of our work is based on probabilistic models that
are closely connected with ``mixed-membership models'', which are widely
used in population genetic admixture analysis and in machine learning
for document clustering. We argue that recognizing these relationships
should help improve understanding of mutation signature extraction
problems, and should suggest ways to further improve the statistical
methods.
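
The mixed-membership idea can be sketched as a small generative simulation. The sketch assumes that each tumour sample plays the role of a document (or an admixed individual), each mutation the role of a word, and each signature the role of a topic (or ancestral population). The function name, the three mutation categories and all probabilities below are made up for illustration; this is not the pmsignature implementation.

```python
import random


def sample_mutations(memberships, signatures, n_mutations, seed=0):
    """Simulate the mutations of one sample under a mixed-membership model.

    memberships: list of K weights (summing to 1) giving the sample's
                 proportion of mutations attributed to each signature.
    signatures:  list of K dicts mapping a mutation category to its
                 probability under that signature.
    """
    rng = random.Random(seed)
    mutations = []
    for _ in range(n_mutations):
        # Step 1: pick which signature generated this mutation.
        k = rng.choices(range(len(memberships)), weights=memberships)[0]
        # Step 2: draw the mutation category from that signature.
        categories = list(signatures[k])
        weights = [signatures[k][c] for c in categories]
        mutations.append(rng.choices(categories, weights=weights)[0])
    return mutations


# Illustrative (made-up) signatures over three mutation categories.
signatures = [
    {"C>T": 0.8, "C>A": 0.1, "T>C": 0.1},
    {"C>T": 0.1, "C>A": 0.7, "T>C": 0.2},
]
memberships = [0.7, 0.3]  # this sample is 70% signature 0, 30% signature 1
mutations = sample_mutations(memberships, signatures, n_mutations=1000)
```

Signature extraction is the inverse problem: given only the observed mutations across many samples, estimate both the signatures and each sample's membership proportions.
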
Our methods are implemented in an R package 
pmsignature (https://github.com/friend1ws/pmsignature)
and a web application available at 
https://friend1ws.shinyapps.io/pmsignature_shiny/.