PEELing

Tutorial

Overview

Molecular compartmentalization is vital for cellular physiology. Spatially-resolved proteomics allows biologists to survey protein composition and dynamics with subcellular resolution. Here, we present PEELing (proteome extraction from enzymatic labeling data), an integrated package and user-friendly web service for analyzing spatially-resolved proteomics data. PEELing assesses data quality using curated or user-defined references, performs cutoff analysis to remove contaminants, connects to databases for functional annotation, and generates data visualizations—providing a streamlined and reproducible workflow to explore spatially-resolved proteomics data.

Besides this plug-and-play web portal, PEELing is also available as a command-line program at https://github.com/JaneliaSciComp/peeling.

Manuscript preprint: https://www.biorxiv.org/content/10.1101/2023.04.21.537871. Please note that this bioRxiv preprint will be updated as we add new functionality to PEELing.

Input

In this tutorial, we use a published dataset (download) as an example: the cell-surface proteome of mouse Purkinje cells at postnatal day 15 (Shuster, Li et al., 2022; PMID: 36220098). As shown below in the experimental layout, we performed tandem mass tag (TMT)-based quantitative mass spectrometry using 4 TMT channels: 2 for labelled replicates (129C and 128C; HRP+, H2O2+) and 2 for non-labelled controls (127C and 127N; HRP or H2O2 omitted).

[Figure: experimental layout]

Cellular Compartment

Choose a cellular compartment to use pre-set references or select “Other (Custom TP / FP lists)” to upload user-defined references.

If “Other (Custom TP / FP lists)” is selected, upload a true positive (TP) reference and a false positive (FP) reference as tab-separated value (.tsv) files. Each .tsv file must contain a single column: a header row followed by one UniProt accession number per row (e.g., A0A023I7E1).
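
A reference file of this shape can be produced with a few lines of Python. This is only a sketch: the helper name `write_reference_tsv` and the header text `Accession` are illustrative assumptions rather than anything PEELing prescribes, and the two accessions are simply the examples quoted in this tutorial, not a real TP list.

```python
import csv
import io

def write_reference_tsv(handle, accessions, header="Accession"):
    """Write a one-column TSV: a header row, then one accession per row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([header])
    for accession in accessions:
        writer.writerow([accession.strip()])

# Write to an in-memory buffer here; for a real file, use
# open("tp.tsv", "w", newline="") instead (hypothetical file name).
buffer = io.StringIO()
write_reference_tsv(buffer, ["A0A023I7E1", "Q9JHU4"])
reference_text = buffer.getvalue()
```

The same helper could be used for both the TP and the FP list, since the two files share one layout.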

[Figure: TP list in an Excel file]

The TP reference contains proteins known to be in the chosen cellular compartment. For instance, the cell-surface TP reference (download) is specified by the UniProt query: ((cc_scl_term:SL-0112) OR (cc_scl_term:SL-0243) OR (keyword:KW-0732) OR (cc_scl_term:SL-9906) OR (cc_scl_term:SL-9907)) AND (reviewed:true), which includes SwissProt-reviewed extracellular (SL-0112), secreted (SL-0243), signal peptide-containing (KW-0732), type II transmembrane (SL-9906), and type III transmembrane (SL-9907) proteins.

The FP reference contains proteins not localized to the chosen cellular compartment. For instance, the cell-surface FP reference (download) is specified by the UniProt query: (((cc_scl_term:SL-0091) OR (cc_scl_term:SL-0173) OR (cc_scl_term:SL-0191)) AND (reviewed:true)) NOT (((cc_scl_term:SL-0112) OR (cc_scl_term:SL-0243) OR (keyword:KW-0732) OR (cc_scl_term:SL-9906) OR (cc_scl_term:SL-9907)) AND (reviewed:true)), which includes SwissProt-reviewed cytosolic (SL-0091), mitochondrial (SL-0173), and nuclear (SL-0191) proteins that are not found on the cell surface. Some cell-surface proteins, such as the Notch family proteins, also localize to intracellular compartments; they are not considered false positives and are therefore excluded from the FP reference.
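
The NOT clause in the FP query amounts to a set difference: intracellular proteins minus anything that also matches the cell-surface query. For a custom compartment, the same exclusion could be sketched as below. The function name `build_fp_reference` and the IDs are made-up placeholders, not real annotations or part of PEELing.

```python
def build_fp_reference(intracellular, surface):
    """Return intracellular accessions that are not also annotated as cell-surface."""
    return sorted(set(intracellular) - set(surface))

# Placeholder IDs for illustration only.
intracellular = ["id_a", "id_b", "id_c"]
surface = ["id_b", "id_z"]  # id_b is dual-localized, like the Notch example
fp_reference = build_fp_reference(intracellular, surface)
```

Here `id_b` is dropped from the FP list because it also appears in the surface set.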

Data File (.tsv)

Upload the data file (.tsv tab-separated value file only) in the following format:

The first column contains the UniProt accession numbers of proteins (e.g., Q9JHU4). Although PEELing automatically connects to UniProt and maps the accession numbers in the user input, it is best to use up-to-date databases during mass spectrometry data processing to minimize obsolete accession numbers.

The remaining columns contain the labelled-to-control ratios of proteins. The ratios can be derived from any mass spectrometry quantification strategy: SILAC, TMT, iTRAQ, label-free, or others. In the provided example using TMT, 2 labelled replicates (129C and 128C) and 2 non-labelled controls (127C and 127N) produce 4 labelled-to-control ratios (129C:127C, 129C:127N, 128C:127C, 128C:127N).

The first row lists the names (column headers) of the labelled-to-control ratios. We recommend keeping them informative and concise, since these names are displayed in the output.
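
As a rough illustration of this layout, the sketch below builds such a table from per-channel intensities by pairing every labelled channel with every control channel. It rests on several assumptions: the intensities are made up, the helper `build_ratio_table` and the `Accession` header are hypothetical, and plain (untransformed) ratios are used because this tutorial does not specify any normalization or log transformation.

```python
import csv
import io
from itertools import product

def build_ratio_table(intensities, labelled, controls):
    """Return (header, rows) with one labelled:control ratio column per channel pair."""
    pairs = list(product(labelled, controls))
    header = ["Accession"] + [f"{lab}:{ctl}" for lab, ctl in pairs]
    rows = []
    for accession, channels in intensities.items():
        # Assumes every control intensity is non-zero.
        ratios = [channels[lab] / channels[ctl] for lab, ctl in pairs]
        rows.append([accession] + ratios)
    return header, rows

# Made-up intensities for a single protein.
intensities = {"Q9JHU4": {"129C": 8.0, "128C": 6.0, "127C": 2.0, "127N": 4.0}}
header, rows = build_ratio_table(intensities, ["129C", "128C"], ["127C", "127N"])

out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
tsv_text = out.getvalue()
```

The resulting text has the accession in the first column and the four ratio columns named after their channel pairs.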

To convert an .xlsx Excel file to a .tsv file, open it in Excel, export it as “Text (Tab delimited) (*.txt)”, and then manually change the file extension from “.txt” to “.tsv”.

# Non-Labelled Controls and # Labelled Replicates

For spatially-resolved proteomics, including non-labelled controls is essential (or at least strongly recommended): they capture non-specific bead binders and other contaminants and thus enable the cutoff analysis. Enter the number of non-labelled controls under “# Non-Labelled Controls” (e.g., 2 for the provided example) and the number of labelled replicates under “# Labelled Replicates” (e.g., 2 for the provided example).

PEELing expects the .tsv input file to contain all possible labelled-to-control ratios. In the provided example, there are 2 labelled replicates (129C and 128C) and 2 non-labelled controls (127C and 127N). Thus, 4 ratios (129C:127C, 129C:127N, 128C:127C, 128C:127N) should be included in the .tsv file.
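
Before uploading, it can help to confirm that the number of ratio columns equals the number of labelled replicates times the number of non-labelled controls. The check below is a minimal sketch of that arithmetic. `check_ratio_columns` is a hypothetical helper, and how PEELing itself validates the input is not described here.

```python
def check_ratio_columns(header, n_labelled, n_controls):
    """Return True if the header has one ID column plus n_labelled * n_controls ratio columns."""
    expected = n_labelled * n_controls
    found = len(header) - 1  # the first column holds the accession numbers
    return found == expected

header = ["Accession", "129C:127C", "129C:127N", "128C:127C", "128C:127N"]
ok = check_ratio_columns(header, n_labelled=2, n_controls=2)
```

For the provided example, 2 × 2 = 4 ratio columns are expected, so `ok` is `True`.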

Tolerance (optional)

PEELing conducts cutoff analysis on each labelled-to-control ratio individually and, for the final proteome, retains only those proteins that pass the cutoff of all ratios, or of all but a user-defined number of them, which further eliminates contaminants. The “Tolerance” setting is optional and lets users control the stringency of this filter. By default, it is set to 0, meaning that a protein must pass the cutoff of all ratios to be included in the final proteome; in the provided example, a protein must pass the cutoff of all 4 ratios. If Tolerance is set to 1, a protein can fail the cutoff of 1 ratio and still be included; in the provided example, a protein passing the cutoff of any 3 ratios is included. In general, if Tolerance is set to n, a protein can fail the cutoff in up to n ratios and still be included in the final proteome. We recommend keeping the tolerance value small to better filter out contaminants.
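
The tolerance rule can be written as a one-line comparison: count the ratios in which a protein fails the cutoff and keep the protein if that count does not exceed the tolerance. The sketch below assumes the per-ratio pass/fail results are already available; the function name and protein labels are illustrative, not PEELing's internals.

```python
def passes_tolerance(passed_flags, tolerance=0):
    """Keep a protein if it fails the cutoff in at most `tolerance` ratios."""
    failures = sum(1 for passed in passed_flags if not passed)
    return failures <= tolerance

# Pass/fail results across 4 ratios for three made-up proteins.
per_ratio = {
    "protein_A": [True, True, True, True],
    "protein_B": [True, True, False, True],
    "protein_C": [True, False, False, True],
}
strict = [p for p, flags in per_ratio.items() if passes_tolerance(flags, 0)]
relaxed = [p for p, flags in per_ratio.items() if passes_tolerance(flags, 1)]
```

With Tolerance 0 only `protein_A` is kept; with Tolerance 1, `protein_B` is kept as well, while `protein_C` fails two ratios and is excluded in both cases.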

Plot Format (optional)

Choose the output/downloadable plot format here.

Click “Submit” and take a sip of coffee!

Output

Failed ID Mapping

To obtain the best match with our automatically updated references, PEELing maps the protein IDs in the user-submitted data file to their latest versions using the UniProt API. Occasionally, the UniProt server may fail to map some of the IDs, possibly due to a temporarily high workload. In this case, PEELing shows the number of failed IDs in a “Note” at the top of the Results section, and the user can resubmit until all IDs are successfully mapped. If all IDs are successfully mapped, this “Note” does not appear.

Correlation Analysis

This section shows correlation plots and coefficients for evaluating whether replicates are consistent with one another or show overall discrepancies.

Note: The heatmap and the scatter plot of the default ratio pair are included in the downloadable results. To include the scatter plot of another ratio pair, please make sure to “Make” the plot before downloading. PEELing does not automatically make scatter plots for all ratio pairs.
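
For readers who want to reproduce a pairwise coefficient outside the web portal, a Pearson correlation between two ratio columns can be computed as sketched below. This tutorial does not state which coefficient PEELing reports, so the choice of Pearson is an assumption, and the values are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up ratios of four proteins in two ratio columns.
ratio_a = [1.0, 2.0, 3.0, 4.0]
ratio_b = [2.1, 3.9, 6.2, 7.8]
r = pearson(ratio_a, ratio_b)
```

A value of `r` close to 1 indicates that the two ratio columns rank and scale proteins consistently.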

Quality Checks and Cutoff Plots

For each labelled-to-control ratio, selectable from the tabs on the left (e.g., 129C:127C below), two plots are shown:

The left plot shows the true-positive rate (TPR, blue), the false-positive rate (FPR, orange), and their difference (TPR–FPR, green) plotted against the ratio-based ranking (x-axis). In a successful enrichment experiment, TPR (blue) increases quickly while FPR (orange) rises slowly. Consequently, TPR–FPR (green) first increases and then declines, forming a single peak, and the cutoff is placed at this maximum.

The right plot is a receiver operating characteristic (ROC) curve, in which the y-axis represents TPR and the x-axis represents FPR. In a successful experiment, the ROC curve bends toward the upper-left corner as shown below. In the ROC curve, the cutoff point is marked as a red dot, along with its corresponding ranking position, protein identity, TPR, and FPR.
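
The logic behind both plots can be sketched in a few lines: walk down the ranked protein list, update TPR and FPR at each rank, and record where TPR–FPR is largest. This is a simplified illustration, not PEELing's actual implementation. In particular, it assumes that TPR and FPR are normalized by the number of TP and FP proteins detected in the data, which this tutorial does not specify, and all names and IDs are made up.

```python
def find_cutoff(ranked_ids, tp_ids, fp_ids):
    """Return ((index, tpr, fpr) at the maximum of TPR - FPR, list of ROC points).

    ranked_ids must be sorted from the highest to the lowest ratio.
    """
    tp_ids, fp_ids = set(tp_ids), set(fp_ids)
    total_tp = sum(1 for pid in ranked_ids if pid in tp_ids)
    total_fp = sum(1 for pid in ranked_ids if pid in fp_ids)
    if total_tp == 0 or total_fp == 0:
        raise ValueError("need at least one TP and one FP protein in the data")
    seen_tp = seen_fp = 0
    best = None
    best_diff = float("-inf")
    roc_points = []
    for index, pid in enumerate(ranked_ids):
        if pid in tp_ids:
            seen_tp += 1
        if pid in fp_ids:
            seen_fp += 1
        tpr = seen_tp / total_tp
        fpr = seen_fp / total_fp
        roc_points.append((fpr, tpr))  # one point of the ROC curve per rank
        if tpr - fpr > best_diff:      # strict ">" keeps the first maximum on ties
            best_diff = tpr - fpr
            best = (index, tpr, fpr)
    return best, roc_points

# s* = known surface (TP), c* = known cytosolic (FP), u1 = unannotated.
ranked = ["s1", "s2", "s3", "c1", "u1", "c2", "s4", "c3", "c4"]
cutoff, roc_points = find_cutoff(ranked, {"s1", "s2", "s3", "s4"}, {"c1", "c2", "c3", "c4"})
index, tpr, fpr = cutoff
retained = ranked[: index + 1]  # proteins ranked at or above the cutoff
```

In this toy ranking, TPR–FPR peaks after the third protein, so the first three proteins are retained.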

[Figure: cutoff plot and ROC curve for the 129C:127C ratio]

If the TPR–FPR value fluctuates up and down without forming a single peak, or the ROC curve follows the diagonal line without bending toward the upper-left corner, the enrichment was likely suboptimal or failed. This could be due to an abundance of contaminants being enriched. In such cases, it is not recommended to use PEELing for further analysis. Instead, improved sample preparation or other filtering methods should be considered to address the issue.

Plots of all ratios are automatically included in the downloadable results.

Post-Cutoff Proteome

This section lists the proteins that pass the cutoff analysis, along with their total number.

Click each UniProt accession number to jump to the corresponding UniProt protein page.

Top Surface Proteins

This section lists the 100 most enriched proteins according to each labelled-to-control ratio.

Click each UniProt accession number to jump to the corresponding UniProt protein page.
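
A comparable per-ratio ranking can be reproduced offline by sorting proteins by one ratio column and taking the first entries. The sketch below uses made-up IDs and values, and `top_enriched` is a hypothetical helper rather than part of PEELing.

```python
def top_enriched(ratios, n=100):
    """Return up to n accession IDs, highest ratio first."""
    ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
    return [accession for accession, _ in ranked[:n]]

# Made-up ratios for one labelled-to-control ratio column.
ratios = {"id_a": 1.2, "id_b": 5.0, "id_c": 3.3}
top_two = top_enriched(ratios, n=2)
```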

Protein Location and Function Annotation (Panther)

Select the corresponding organism and click “Send” to perform protein ontology and pathway analyses of the post-cutoff proteome through the Panther server. The top 10 terms, ranked by false discovery rate (FDR), are listed for protein localization (Panther GO Slim Cellular Component), function (Panther GO Slim Biological Process), and pathway (Reactome).

Users can also click “Panther” in the title to go to Panther’s website and run this analysis there by submitting the protein list from the “Post-Cutoff Proteome” section or the post-cutoff-proteome.txt file in the downloadable results.

Note: Sometimes the Panther server becomes unresponsive. If it takes too long for the results to come back, please resend the request later.

Download Results

All analysis results including plots are downloadable by clicking on the "Download Results" button.