“…As for protein identities (IDs), all IDs mapping to a peptide were preserved to keep full information about the splicing isoforms that map to a peptide.
- The data were log2-transformed, triplicate measurements were averaged for all HeLa cell lines (median value; the peptide had to be quantified in at least two injections), and only full peptide profiles (i.e., peptides quantified in all twelve HeLa cell lines) were kept for the downstream analysis.
- Peptides mapping to several genes were excluded; the remaining peptides were classified according to whether they map uniquely to one splicing isoform of a gene (unique peptides) or to multiple splicing isoforms of the same gene (shared peptides).
- Both unique and shared peptides were then collapsed to create a matrix of quantified proteins and protein groups, respectively (by summing; at least two peptides were required for a protein/protein group).
- A similar approach was applied to the pSILAC data. After k_loss calculation, only full peptide k_loss profiles were used, with peptides mapping to multiple genes removed.
- The k_loss values were log2-transformed and collapsed to create a protein and protein group matrix (average; at least two peptides were required for a protein/protein group).
- The two data sets were then merged in a protein‐centric way; i.e., the protein k_loss values were mapped to the protein expression matrix using the protein/protein group IDs.
- To map the mRNA abundance data to the assembled protein matrix, we exploited the fact that in our protein FASTA database, each entry was annotated by a unique Ensembl transcript ID (ENST) and thus could be easily mapped to its corresponding transcript abundance in the RNA‐Seq data (i.e., FPKM).
- We first mapped the unique proteins (UQ) to the abundance of the corresponding transcripts.
- For the protein groups quantified based on shared peptide quantities, we applied a sample‐specific protein inference similar to a strategy described previously (Liu et al, 2017b). For each splicing isoform included in a shared protein group, we retrieved its average abundance at the mRNA level (the average was calculated from all HeLa variants).
- The major (i.e., the most abundant) splicing isoform at the mRNA level was then selected as the best representative transcript ID for the whole protein group (shared major, SM) and was used for transcript abundance mapping.
…”
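
The steps quoted above can be sketched in a few lines of pandas. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the table layouts (a peptide-by-injection intensity table with a two-level column index named `cell_line` and `injection`, a peptide-to-protein mapping, an FPKM table indexed by ENST) and the parameter names are all hypothetical. The aggregation is applied to whatever values are passed in; the sketch does not settle on which scale the summation was performed in the original analysis.

```python
import numpy as np
import pandas as pd


def peptide_full_profiles(intensities: pd.DataFrame,
                          min_injections: int = 2) -> pd.DataFrame:
    """Return log2 median quantities per cell line, full profiles only.

    Assumed layout: rows are peptides, columns are a MultiIndex with
    levels named 'cell_line' and 'injection'.
    """
    # Treat zeros as missing so that log2 never sees a non-positive value.
    log2 = np.log2(intensities.where(intensities > 0))
    grouped = log2.T.groupby(level="cell_line")
    median = grouped.median().T        # median over replicate injections
    n_quantified = grouped.count().T   # non-missing injections per cell line
    # Keep a cell-line value only if enough injections quantified the peptide.
    median = median.where(n_quantified >= min_injections)
    # Full profile: the peptide must be quantified in every cell line.
    return median.dropna(how="any")


def collapse_to_proteins(peptides: pd.DataFrame,
                         peptide_to_protein: pd.Series,
                         min_peptides: int = 2,
                         how: str = "sum") -> pd.DataFrame:
    """Collapse peptide rows into protein / protein-group rows.

    `how` is 'sum' for the expression matrix and 'mean' for the pSILAC
    matrix, following the aggregation named in the text.
    """
    groups = peptide_to_protein.reindex(peptides.index)
    n_peptides = groups.value_counts()
    # Require at least `min_peptides` peptides per protein/protein group.
    keep = groups.isin(n_peptides[n_peptides >= min_peptides].index)
    return peptides[keep].groupby(groups[keep]).agg(how)


def select_major_isoform(group_members: dict[str, list[str]],
                         fpkm: pd.DataFrame) -> pd.Series:
    """Pick the most abundant transcript per shared protein group.

    Assumed layout: `fpkm` has ENST IDs as index and cell lines as columns;
    `group_members` maps a protein-group ID to its member ENST IDs.
    """
    mean_fpkm = fpkm.mean(axis=1)      # average over all cell lines
    major = {}
    for group_id, transcripts in group_members.items():
        present = [t for t in transcripts if t in mean_fpkm.index]
        if present:                    # skip groups with no RNA-Seq entry
            major[group_id] = mean_fpkm.loc[present].idxmax()
    return pd.Series(major, name="representative_enst")
```

Passing the output of `peptide_full_profiles` to `collapse_to_proteins` gives the protein-level matrix; the Series returned by `select_major_isoform` can then be used to look up one transcript abundance row per shared protein group.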