Supplementary Materials Table S2

Although tens of thousands of phosphorylation sites have been identified in human cells, approaches to determine the functional importance of each phosphosite are lacking. Here, we manually curated 112 datasets of phospho-enriched proteins generated from 104 different human cell types or tissues. We reanalyzed the 6,801 proteomics experiments that passed our quality control criteria, creating a reference phosphoproteome containing 119,809 human phosphosites. To prioritize functional sites, we used machine learning to identify 59 features indicative of proteomic, structural, regulatory or evolutionary relevance and integrate them into a single functional score. Our approach identifies regulatory phosphosites across different molecular mechanisms, processes and diseases, and reveals genetic susceptibilities at a genomic scale. Several novel regulatory phosphosites were validated, including a role in neuronal differentiation for phosphosites in SMARCC2, a known member of the SWI/SNF chromatin remodeling complex.

Protein phosphorylation is a post-translational modification (PTM) involved in the regulation of most biological processes, and its misregulation has been linked to many human diseases1,2. The full extent of human phosphorylation is still an open question under active investigation through mass spectrometry (MS) approaches3. Notably, an in-depth study of a single cell line identified over 50,000 phosphopeptides and suggested that 75% of the proteome can be phosphorylated4. The aggregation of such studies has led to the identification of over 200,000 phosphosites in resources such as PhosphoSitePlus (PSP)5. Although analytical challenges remain, the bottleneck in the study of phosphorylation is shifting towards its functional characterization6. Given that phosphorylation can be poorly conserved, it has been suggested that not all phosphorylation is relevant for fitness7–9.
Consequently, prioritization strategies are needed to facilitate the discovery of highly relevant phosphosites10. Different methodologies have been proposed, including identifying phosphosites that are highly conserved11,12, are located at interface positions13–15 or show strong regulation, as well as combinations of such features10,16. Mutational studies have also been used to characterize relevant phosphorylations17, but cannot yet be applied to human phosphorylation at scale. Machine learning methods remain a poorly explored approach to study the functional relevance of phosphorylation. Here, we generated the largest human phosphoproteome dataset to date, identifying 119,809 human phosphosites. For each phosphosite, we compiled annotations covering 59 features and integrated them into a single score of functional relevance, named here the phosphosite functional score. This score can correctly identify regulatory phosphosites for a diverse set of mechanisms and predict the impact of deleterious mutations.

Results

Mass spectrometry-based proteomics map of the human phosphoproteome

In order to create a comprehensive MS-based definition of the human phosphoproteome, we curated 112 public human phospho-enriched datasets derived from 104 different cell types and/or tissues from the PRIDE database18 (Supplementary Table 1). Using MaxQuant, we jointly re-analyzed the subset of 6,801 human MS experiments passing the quality control criteria, corresponding to 575 days of accumulated instrument time19 (Methods). The joint analysis (deposited in PRIDE, dataset PXD012174) ensured adequate control of the false discovery rate (FDR), estimated using a target-decoy strategy20 (Methods). FDR was estimated at the level of the peptide-spectrum match (PSM), the protein and the presence of phosphosite modification(s), and kept at 1% (Methods).
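To make the target-decoy idea concrete, the following is a minimal, generic sketch of how an FDR threshold can be derived from a ranked list of matches. It is not MaxQuant's implementation: the function names, the `(score, is_decoy)` input layout and the simple decoys/targets estimator are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical input layout: one (score, is_decoy) pair per PSM.
Psm = Tuple[float, bool]


def target_decoy_qvalues(psms: List[Psm]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) tuples sorted by descending score."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Running FDR estimate at each rank: decoys seen so far / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or any lower-scoring rank,
    # which makes the accepted set monotone in the score.
    qvals = [0.0] * len(ranked)
    running_min = float("inf")
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvals)]


def accepted_targets(psms: List[Psm], max_fdr: float = 0.01) -> List[float]:
    """Scores of target PSMs whose q-value is at or below max_fdr."""
    return [s for s, is_decoy, q in target_decoy_qvalues(psms)
            if not is_decoy and q <= max_fdr]
```

With a 1% cutoff, `accepted_targets` keeps only the target matches ranked above the point where decoy hits begin to accumulate. The same logic could in principle be applied separately at the PSM, protein and site level.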
The modification localization probability (also known as False Localisation Rate) was also estimated, reflecting the confidence of pinpointing which residue bears the phosphorylation. Probabilities above 75% indicate highly confident localizations (Class I sites). We identified 11.7 million phosphorylated peptide-spectrum matches (PSM-level FDR 1%), corresponding to 181,774 phosphopeptides spanning 203,930 phosphorylated serines, tyrosines or threonines. Of these, only 119,809 sites passed the 1% site-level FDR correction (59% true positive sites), with 90,443 classified as Class I.
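The localization cutoff and the reported pass rate can be checked with a few lines of arithmetic. This is a sketch only: the constant and function names are hypothetical, and the probability is assumed to be expressed as a fraction between 0 and 1 rather than as a percentage.

```python
# Threshold taken from the text: probabilities above 75% mark Class I sites.
CLASS_I_THRESHOLD = 0.75


def is_class_i(localization_probability: float) -> bool:
    """True if a site's localization probability exceeds the Class I cutoff."""
    return localization_probability > CLASS_I_THRESHOLD


# Counts quoted in the text.
candidate_sites = 203_930    # phosphorylated S/T/Y before site-level FDR
sites_passing_fdr = 119_809  # sites retained at 1% site-level FDR

# Fraction of candidate sites retained after site-level FDR correction.
fraction_passing = sites_passing_fdr / candidate_sites
print(f"{fraction_passing:.1%} of candidate sites pass the site-level FDR")
```

The ratio of the two counts rounds to the 59% quoted in the text, so the reported percentage agrees with the site counts.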