
Improving mass spectrometry data searching workflow to maximize protein identifications
Shadab Ahmad1, Amol Prakash1, David Sarracino1, Bryan Krastins1, MingMing Ning2, Barbara Frewen1, Scott Peterman1, Gregory
Byram1, Maryann S. Vogelsang1, Gouri Vadali1, Jennifer Sutton1, Mary F. Lopez1
1Thermo Fisher Scientific, BRIMS (Biomarker Research in Mass Spectrometry), Cambridge, MA
2Massachusetts General Hospital, Boston, MA
Purpose: Development of a comprehensive protein identification
workflow that helps identify more high-confidence peptide/protein
IDs, including post-translational modifications, than traditional
Methods: Use of combinations of multiple search engines (e.g.,
SEQUEST and Mascot), where the combination of PTMs for each search
node was judiciously chosen based on UniProtKB-relative PTM
abundances from high-quality, manually curated, proteome-wide data.
Results: Substantial increase in high-confidence,
Percolator-validated peptide/protein identifications compared with
the standard SEQUEST and Mascot workflow.
Mass spectrometry has become an established method for protein
identification and characterization in recent years. The number of
protein identifications from complex biological samples depends on
many factors, ranging from the data acquisition strategy to the
MS/MS data searching method. Unfortunately, only a fraction of the
spectra generated for any complex biological sample have confident
peptide matches. Many users overlook several factors in their data
searching strategy, including the appropriate combination of
post-translational modifications (PTMs), coding SNPs [2], protein
isoforms, and iterative searching, that could help identify these
unmatched spectra. Here we develop a comprehensive protein
identification workflow that identifies a higher number of
high-confidence peptide/protein IDs and also identifies multiple
PTMs and partially cleaved peptides in a single
with Thermo QExactive benchtop mass spectrometer, with top 15
data dependent MS/MS using HCD fragmentation.
Data Analysis
The acquired data were searched with Proteome Discoverer 1.4
(Thermo Fisher Scientific) using the comprehensive workflow and also
with a general SEQUEST workflow with standard PTMs (oxidation of
methionine as a dynamic modification and alkylation as a static
modification) coupled with Percolator validation (General Search).
Peptide Identification
We compared the results from our comprehensive searching
workflow with those of the general search. On average, the
number of high-confidence peptide identifications (FDR ≤ 0.01)
increased by approximately 70% with the comprehensive workflow
relative to general searches, whereas the number of
medium-confidence peptide identifications (FDR ≤ 0.05) roughly
doubled relative to general searches (Figure 2).
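The two confidence tiers used above can be illustrated with a minimal Python sketch that bins peptide-spectrum matches by q-value. The peptide labels and q-values below are invented for the example and are not data from this study; the function name is hypothetical, and treating "medium" as the band between 0.01 and 0.05 (rather than everything at or below 0.05) is an assumption.

```python
def confidence_tier(q_value, high=0.01, medium=0.05):
    """Return 'high', 'medium', or 'low' for a q-value.

    Assumption: 'medium' means 0.01 < q <= 0.05, i.e. the tiers
    do not overlap.
    """
    if q_value <= high:
        return "high"
    if q_value <= medium:
        return "medium"
    return "low"


# Hypothetical q-values, for illustration only.
psms = {
    "PEPTIDEA": 0.002,
    "PEPTIDEB": 0.03,
    "PEPTIDEC": 0.2,
    "PEPTIDED": 0.01,
}

# Count how many matches fall into each tier.
counts = {}
for q in psms.values():
    tier = confidence_tier(q)
    counts[tier] = counts.get(tier, 0) + 1

print(counts)
```

Counting the "high" tier separately for the general and the comprehensive search is what a comparison such as the one in Figure 2 would rest on.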
Moreover, the comprehensive workflow identified several
high-confidence peptides with multiple PTMs, which shows the
importance of the right combination of PTMs in a search node (Table 1).
Table 1. Examples of peptides containing multiple PTMs from the
comprehensive search.
R1(ADP-Ribosyl); G7(Myristoyl);
S2(Phospho); S4(Phospho); K8(Methyl);
Y4(Phospho); A6(Acetyl)
M9(Oxidation); C10(Carboxymethyl);
F13(Amidated); E17(Carboxy); F20(Amidated)
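Modification strings in the notation used in Table 1 (residue letter, position, modification name in parentheses) can be split into structured records for counting or filtering. The sketch below is one way to do that under the assumption that every entry follows this exact pattern; the function name and regular expression are illustrative and not part of any search engine's API.

```python
import re

# One entry looks like "S2(Phospho)": residue letter, position, name.
MOD_PATTERN = re.compile(r"([A-Z])(\d+)\(([^)]+)\)")


def parse_modifications(text):
    """Parse 'S2(Phospho); K8(Methyl)' into (residue, position, name) tuples."""
    return [
        (residue, int(position), name)
        for residue, position, name in MOD_PATTERN.findall(text)
    ]


mods = parse_modifications("M9(Oxidation); C10(Carboxymethyl);")
for residue, position, name in mods:
    print(residue, position, name)
```

A peptide carries multiple PTMs in the sense of Table 1 when the parsed list contains more than one distinct modification name.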
We further investigated the matched and unmatched spectra for the
general search and our comprehensive search. The percentage of
matched spectra improved substantially with the comprehensive
search workflow (Figure 4, Table 2).
FIGURE 4. Comprehensive workflow increases number of matched spectra
FIGURE 2. Comprehensive workflow increases number of
peptide identifications
Comprehensive workflow development
We developed a comprehensive MS/MS searching workflow within
Proteome Discoverer using a combination of multiple search
engines (Figure 1) in an iterative fashion to maximize the number of
protein/peptide identifications by considering the most frequently
found PTMs [1], sequence isoforms of proteins, and partially cleaved
peptides. The effects of various factors on peptide identification
were explored and implemented in the process, including protein
isoforms, missed cleavage sites, semi-tryptic digestion and, most
importantly, the appropriate combination of PTMs in each search node.
The combinations of PTMs were judiciously chosen based on the
UniProtKB-relative abundance of each PTM found experimentally
and putatively, from high-quality, manually curated, proteome-wide
data [1]. The workflows were tested on plasma and urine samples
acquired on a hybrid Orbitrap mass spectrometer.
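The idea of iterative searching with a different PTM combination per search node can be sketched in Python as follows. This is a toy model, not the Proteome Discoverer workflow: it assumes that each node only sees spectra left unmatched by earlier nodes, and it replaces a real search engine with a simple rule (a spectrum matches when every modification it carries is allowed by the node). All names, node definitions, and spectra are hypothetical.

```python
def run_iterative_search(spectra, nodes):
    """Pass spectra through search nodes in order.

    spectra: dict of spectrum id -> set of modifications the peptide carries
    nodes:   list of (node name, set of modifications the node allows)
    Returns (dict of spectrum id -> matching node, sorted unmatched ids).
    """
    matched = {}
    remaining = dict(spectra)
    for node_name, allowed_mods in nodes:
        still_unmatched = {}
        for spectrum_id, required_mods in remaining.items():
            # Toy stand-in for a search engine: match only if the node
            # allows every modification present on the peptide.
            if required_mods <= allowed_mods:
                matched[spectrum_id] = node_name
            else:
                still_unmatched[spectrum_id] = required_mods
        remaining = still_unmatched  # the next node sees only the leftovers
    return matched, sorted(remaining)


# Hypothetical spectra and nodes, for illustration only.
spectra = {
    "scan1": frozenset(),
    "scan2": frozenset({"Oxidation"}),
    "scan3": frozenset({"Phospho", "Methyl"}),
    "scan4": frozenset({"Myristoyl"}),
}
nodes = [
    ("node1", {"Oxidation"}),
    ("node2", {"Phospho", "Methyl", "Acetyl"}),
]

matched, unmatched = run_iterative_search(spectra, nodes)
print(matched)
print(unmatched)
```

In this toy run, a spectrum carrying two PTMs is only recovered by the node whose modification set contains both, which is the reason the choice of PTM combination per node matters.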
Table 2. Comparative table for matched spectra
FIGURE 1. Structure of the comprehensive workflow
The comprehensive workflow was found to increase the number of
high-confidence proteins (FDR ≤ 0.01) by 63% and the number of
high-confidence grouped proteins by 44% relative to the general
search. Moreover, the comprehensive workflow increased the number of
high-confidence grouped proteins (with at least two high-confidence
peptides for every protein in the group) by 15% (Figure 3).
Matched Spectra                 Matched Spectra
General     Comprehensive       General     Comprehensive
27.9 %      43.5 %              26.0 %      38.5 %
15.5 %      34.4 %              14.5 %      30.1 %
19.9 %      32.8 %              19.1 %      30.1 %
 8.2 %      18.1 %               8.0 %      16.8 %
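The size of the improvement in Table 2 can be expressed either in percentage points or as a relative increase. The short Python sketch below computes both for the values listed in the table, assuming that the values alternate General, Comprehensive in the order given; the function name is illustrative.

```python
# (general %, comprehensive %) pairs in the order listed in Table 2.
# Assumption: the values alternate General, Comprehensive.
pairs = [
    (27.9, 43.5), (26.0, 38.5),
    (15.5, 34.4), (14.5, 30.1),
    (19.9, 32.8), (19.1, 30.1),
    (8.2, 18.1), (8.0, 16.8),
]


def gain(general, comprehensive):
    """Return (percentage-point gain, relative increase in %), rounded to 0.1."""
    points = round(comprehensive - general, 1)
    relative = round(100 * (comprehensive - general) / general, 1)
    return points, relative


for general, comprehensive in pairs:
    points, relative = gain(general, comprehensive)
    print(f"{general:5.1f} % -> {comprehensive:5.1f} %: "
          f"+{points} points, +{relative} % relative")
```

Under that pairing, the comprehensive value is higher than the general value in every pair.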
FIGURE 3. Comprehensive workflow increases number of
grouped protein identifications (with at least two peptide hits
per protein)
• The comprehensive workflow identified approximately 70% more
high-confidence peptides than the general search strategy.
• The comprehensive workflow increased the number of
high-confidence protein identifications and high-confidence grouped
protein identifications by approximately 63% and 44%, respectively,
compared with the general search approach.
• The comprehensive workflow identified a large number of
high-confidence peptides with multiple PTMs.
Sample Preparation
To evaluate the performance of the comprehensive workflow, we used
four human samples from two sources: (a) urine (one sample) and
(b) plasma (three samples). Human urine and plasma samples
were collected with full consent and approval. The samples were
subjected to reduction and alkylation followed by digestion with
Liquid Chromatography and Mass Spectrometry
The digested samples were separated on a C18 column with a 5-45%
acetonitrile gradient in 0.1% formic acid using a nano-LC system.
The urine sample (sample no. 1) and a plasma sample (sample no. 2)
were run for 140 minutes and 90 minutes, respectively, and the
data were acquired with an LTQ Orbitrap Velos MS with top-11 and
top-10 data-dependent MS/MS, respectively, using CID fragmentation.
Two further plasma samples (samples no. 3 and 4) were run for 250
minutes and 240 minutes, respectively, and the data were acquired
with a Thermo Q Exactive benchtop mass spectrometer, with top-15
data-dependent MS/MS using HCD fragmentation.
• The percentage of matched spectra improved substantially with
the comprehensive search workflow.
1. Khoury GA, Baliban RC, Floudas CA. Proteome-wide post-translational
modification statistics: frequency analysis and curation of the
swiss-prot database. Sci Rep. 2011 Sep 13;1.
2. Schandorff S, Olsen JV, Bunkenborg J, Blagoev B, Zhang Y,
Andersen JS, Mann M. A mass spectrometry-friendly database
for cSNP identification. Nat Methods. 2007 Jun;4(6):465-6.
SEQUEST and Percolator are registered trademarks of University of Washington. All other trademarks are the property of Thermo Fisher
Scientific and its subsidiaries.
This information is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of
