Proteome Discoverer Workflow

Report
HPP
SpHPP
Use of SEQUEST search results with ProteoRed.org MIAPE Extractor
La Cristalera, Miraflores de la Sierra, 10-11 December 2012
INDEX
1. A working workflow to extract MIAPE information from Proteome Discoverer 1.3 search results using ProteoRed MIAPE Toolkit
Óscar Gallardo, Joan Villanueva, Montserrat Carrascal, Joaquín Abián
2. Data dependent acquisition using inclusion list (IL)
Joan Villanueva, Óscar Gallardo, Joaquín Abián, Montserrat Carrascal
MASCOT WORKFLOW
[Diagram: Mass spectra (RAW, MGF) -> Identification (Mascot) -> Output file (mzIdentML) -> MIAPE Extractor -> MIAPE generation (MIAPE Generator Tool) -> MIAPE MS and MIAPE MSI]
Ó. Gallardo
PROTEOME DISCOVERER WORKFLOW
[Diagram: Mass spectra (RAW, MGF) -> Identification (Proteome Discoverer) -> Output file (MSF); MIAPE Extractor input: mzIdentML]
[Diagram: RAW -> MGF, with a tool credited as (GPL) LP-CSIC/UAB 2011-2012]
[Diagram: RAW -> MGF with Proteome Discoverer (Discoverer Daemon)]
[Diagram: RAW -> Identification with Proteome Discoverer (Discoverer Daemon) -> Output file (MSF); MGF and mzIdentML -> MIAPE Extractor]
ProCon 0.9.152 (A. Medina, August 2012)
[Diagram: MSF -> mzIdentML]
ProCon 0.9.162
...........................................................67% finished
.....................TaxID for organismName unknown: Leptospira interrogans serogroup Icterohaemorrhagiae serovar Lai
.TaxID for organismName unknown: Sphaerochaeta globosa
...TaxID for organismName unknown: Leptospira borgpetersenii serovar
.....
MyProgressBar for getSpectrumIdentificationListAndProteinDetectionListAndPeptideEvidences for SEQ finished
SequenceCollection written
CV term for unknown modification Deamidated / +0.984 Da (N, Q) not found.
CV term for unknown modification Acetyl / +42.011 Da (Any NTerminus) not found.
Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
at de.mpc.Prot2MzIdent.AParamHandler.createThresholdParameterList(AParamHandler.java:526)
at de.mpc.Prot2MzIdent.PD12ToMzIdentML.getProteinDetectionProtocol(PD12ToMzIdentML.java:851)
1. ProCon 0.9.162 was unable to correctly interpret the Controlled Vocabulary used by Proteome Discoverer to identify Post-Translational Modifications (PTMs).
2. ProCon 0.9.162 also had problems with its internal array references.
[Diagram: MSF + .Prot.XML -> mzIdentML]
ProCon 0.9.163 (2)
...........................................................67% finished
.....................TaxID for organismName unknown: Leptospira interrogans serogroup Icterohaemorrhagiae serovar Lai
.TaxID for organismName unknown: Sphaerochaeta globosa
...TaxID for organismName unknown: Leptospira borgpetersenii serovar
.....
MyProgressBar for getSpectrumIdentificationListAndProteinDetectionListAndPeptideEvidences for SEQ finished
SequenceCollection written
CV term for unknown modification Deamidated / +0.984 Da (N, Q) not found.
CV term for unknown modification Acetyl / +42.011 Da (Any NTerminus) not found.
Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
at de.mpc.Prot2MzIdent.AParamHandler.createThresholdParameterList(AParamHandler.java:526)
at de.mpc.Prot2MzIdent.PD12ToMzIdentML.getProteinDetectionProtocol(PD12ToMzIdentML.java:851)
1. ProCon 0.9.163 was unable to correctly identify Post-Translational Modifications (PTMs), marking all of them as "unknown modification" in the resulting mzIdentML file.
2. ProCon 0.9.163 still had problems with its internal array references.
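The "CV term ... not found" messages in the log above name the modification and its mass shift, which is enough to look up a controlled-vocabulary accession by hand. The following is only a minimal sketch of that idea, not ProCon's code: the function name `resolve_modification` and the two-entry lookup table are assumptions made for illustration, and a real converter would read the complete Unimod or PSI-MOD vocabulary instead.

```python
import re

# Hand-written lookup for the two modifications seen in the log (assumption:
# a real tool would load the full Unimod vocabulary). Values are
# (accession, monoisotopic mass shift in Da).
UNIMOD = {
    "Deamidated": ("UNIMOD:7", 0.984016),
    "Acetyl": ("UNIMOD:1", 42.010565),
}

# Matches lines such as:
# "CV term for unknown modification Acetyl / +42.011 Da (Any NTerminus) not found."
LOG_PATTERN = re.compile(
    r"CV term for unknown modification (?P<name>\S+) / "
    r"(?P<delta>[+-]\d+\.\d+) Da \((?P<sites>[^)]*)\) not found\."
)


def resolve_modification(log_line, tolerance=0.01):
    """Return a dict with the CV accession for a logged modification, or None."""
    match = LOG_PATTERN.search(log_line)
    if match is None:
        return None
    entry = UNIMOD.get(match.group("name"))
    if entry is None:
        return None
    accession, mono_delta = entry
    # Accept the term only if the logged mass shift agrees with the table.
    if abs(mono_delta - float(match.group("delta"))) > tolerance:
        return None
    sites = [site.strip() for site in match.group("sites").split(",")]
    return {"name": match.group("name"), "accession": accession, "sites": sites}
```

Checking the mass shift as well as the name guards against two different modifications that happen to share a label.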
[Diagram: MSF + .Prot.XML -> mzIdentML]
ProCon 0.9.164 (3)
[Diagram: MSF + .Prot.XML -> mzIdentML]
[Diagram: RAW -> Proteome Discoverer (Discoverer Daemon) -> MGF (mass spectra) and MSF (identification output file); MSF -> .Prot.XML; MSF + .Prot.XML -> mzIdentML; MGF and mzIdentML -> MIAPE Extractor -> MIAPE generation (MIAPE Generator Tool)]
Spectra IDs didn't match between the MGF file and the mzIdentML file.
[Diagram: the same workflow with the spectrum identifiers marked: ID (MSF output file), ID mgf (MGF) and ID mzid (mzIdentML), next to the PepMS, Charge and RT fields]
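The diagram names PepMS, Charge and RT next to the mismatched spectrum IDs, which suggests linking spectra through those three values rather than through the ID itself. The sketch below illustrates that idea under stated assumptions: the MGF spectra carry the standard `TITLE`, `PEPMASS`, `CHARGE` and `RTINSECONDS` headers, the helper names (`parse_mgf`, `match_key`, `index_mgf`) are hypothetical, and the rounding precision is arbitrary. It is not the procedure used by the authors.

```python
def parse_mgf(text):
    """Collect the header fields of every BEGIN IONS / END IONS block."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN IONS":
            current = {}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)  # peak lines have no "=" and are skipped
            current[key.upper()] = value
    return spectra


def match_key(pepmass, charge, rt_seconds):
    """Key built from precursor m/z, charge and retention time (rounded)."""
    return (round(float(pepmass), 3), int(charge), round(float(rt_seconds), 1))


def index_mgf(spectra):
    """Map (PepMS, Charge, RT) keys to the MGF spectrum title."""
    index = {}
    for spectrum in spectra:
        pepmass = spectrum["PEPMASS"].split()[0]  # PEPMASS may also carry an intensity
        charge = spectrum["CHARGE"].rstrip("+")   # e.g. "2+" -> "2" (positive mode only)
        key = match_key(pepmass, charge, spectrum["RTINSECONDS"])
        index[key] = spectrum["TITLE"]
    return index
```

An identification read from the mzIdentML file can then be looked up with `index.get(match_key(mz, z, rt))`. Rounded keys can miss values that fall on either side of a rounding boundary, so a tolerance-based comparison would be more robust on real data.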
[Diagram: the same workflow with the spectrum ID marked on the mass spectra, the output file (MSF), the .Prot.XML file and the MGF file, next to the PepMS, Charge and RT fields]
[Diagram: complete workflow. RAW -> Proteome Discoverer (Discoverer Daemon) -> MGF (mass spectra) and MSF (identification output file); MSF -> .Prot.XML; MSF + .Prot.XML -> mzIdentML; MGF and mzIdentML -> MIAPE Extractor -> MIAPE generation (MIAPE Generator Tool) -> MIAPE MS and MIAPE MSI]
WORK IN PROGRESS
1. Export of Prot.XML files from the MSF files, and the subsequent conversion of MSF + Prot.XML files to mzIdentML files, is not automated.
   ProCon developers are working on a new version that doesn't need Prot.XML files, making the conversion process much faster and easier.
2. ProCon still has some errors, is very slow with large files, and is memory hungry.

1. Uploading of MSF + mzIdentML files through MIAPE Extractor is not yet automated.
   We are working on an automation script to automate MIAPE Extractor data extraction: MIAPE Extractor Automator v.2.
2. Although we can generate MIAPE data from Sequest search results, MIAPE Toolkit doesn't work very well with this data for the analysis stage: we cannot retrieve the identified proteins, there are problems with the Sequest Score fields, …
   Development of MIAPE Extractor and the MIAPE Generator tool continues, with improvements in each version.
Data dependent acquisition with inclusion list
RATIONALE FOR USING DDP WITH INCLUSION LIST (IL):
a.- Most target proteins assigned to the groups of the shotgun project were not detected using shotgun approaches.
b.- The few detected peptides were not optimal for MRM analysis (not proteotypic, containing Met/Cys, with missed cleavages).
c.- Preliminary tests at LP-CSIC/UAB using targeted approaches required a limited list of peptides (the list of target m/z values had to be restricted to 20-30) and failed to detect the target proteins.
DDP with an inclusion list increases the probability of positively detecting low-abundance proteins/peptides without the constraints of targeted approaches.
16 PROTEINS SELECTED FOR INCLUSION LIST
- 6 proteins assigned to the LP-CSIC/UAB laboratory
- 10 proteins assigned to MRM labs and not detected by shotgun
J. Villanueva
Laboratory  Uniprot  Name
Canals      P69905   HBA_HUMAN
FB          Q6GPI1   CTRB2_HUMAN
CG          P24855   DNAS1_HUMAN
MPV         Q6A1A2   PDPK2_HUMAN
FC          P16444   DPEP1_HUMAN
CG          Q9BSW7   SYT17_HUMAN
CG          P11597   CETP_HUMAN
MPV         P15391   CD19_HUMAN
CG          Q53FZ2   ACSM3_HUMAN
FV          Q8N4N3   KLH36_HUMAN
Abian       Q9BUU2   METTL22_HUMAN
Abian       P33076   CIITA_HUMAN
Abian       Q9Y661   HS3ST4_HUMAN
Abian       Q14703   MBTPS1_HUMAN
Abian       B7ZMK8   PRSS36_HUMAN
Abian       A4GXA9   EME2_HUMAN
Procedure: Data Dependent with IL
To obtain the inclusion list:
1.- All tryptic peptides of 7-25 amino acids.
2.- m/z values calculated assuming z=2 and z=3 for all peptides.
3.- Duplicate m/z values filtered out (software requirement).
Number of m/z values in the inclusion list: 556 (282 peptides)
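The three steps above can be sketched in a few lines. This is an illustration under assumptions that the slide does not state: no missed cleavages, no cleavage before proline, unmodified residues with standard monoisotopic masses, and m/z values compared after rounding to four decimals. The function names are hypothetical. Under these assumptions the computed values for VFHSLAK (z=2) and GCTLLLTARPR (z=3) fall within about 0.001 of the m/z values listed for those peptides in this presentation.

```python
# Standard monoisotopic residue masses in Da (unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_peptides(sequence, min_len=7, max_len=25):
    """Step 1: cleave after K/R (assumed: not before P, no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if is_last or (residue in "KR" and sequence[i + 1] != "P"):
            peptide = sequence[start:i + 1]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
            start = i + 1
    return peptides


def peptide_mz(peptide, charge):
    """Step 2: m/z of the protonated peptide at the given charge."""
    neutral = sum(RESIDUE_MASS[residue] for residue in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def inclusion_list(peptides, charges=(2, 3), decimals=4):
    """Step 3: one entry per distinct m/z value, sorted ascending."""
    values = {round(peptide_mz(p, z), decimals) for p in peptides for z in charges}
    return sorted(values)
```

Two charge states for 282 peptides give 564 candidate values, so the 556 entries reported above are consistent with a small number of duplicates being filtered out, for example from peptides that differ only by Leu/Ile.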
Samples CCD18 and MCF7
Aliquot 250 µg protein
OffGel (12 fractions)
FASP digestion
LC-MS/MS (DDP, IL, Targeted)
Proteome Discoverer
Signal ID              m/z
P33076_GCTLLLTARPR     400.9013
P11597_VFHSLAK         401.2348
P16444_YPDLIAELLR      401.5646
Q53FZ2_EGWGNLK         402.2062
P24855_YDIALVQEVR      402.5561
Q8N4N3_VASMNQR         403.2032
Q8N4N3_VKPAVCSLLPK     404.5779
Q14703_APCPGCSHLTLK    409.5392
Q9Y661_AISDYTQTLSK     409.5473
Q9BSW7_TAVEQWHSLR      409.5478
P69905_VDPVNFK         409.7243
P16444_TLEQMDVVHR      409.8769
A4GXA9_MGLLAVGPDLSR    410.2292
DATA DEPENDENT WITH INCLUSION LIST: LTQ-ORBITRAP
Sample VH: MCF-7
[Figure: MS traces (TIC, FTMS + p NSI, Full ms [400.00-1800.00]), RT 0.00-140.02 min, for Offgel Fr6 (NL: 2.17E9) and Offgel Fr7 (NL: 9.66E8). Axes: Relative Abundance vs. Time (min).]
RESULTS: Inclusion list and targeted
DATA PROCESSING FOR IL DATA:
1.- MGF generation with PDv1.3
2.- Database search: Proteome Discoverer and Mascot
3.- FDR 5%
RESULT:
Data dependent with IL: none of the 282 listed peptides was detected (the same result as in the targeted experiments)
-> Low amount of target proteins
-> Proteins not expressed in these cells
Chromosome 16 protein description: Data Dependent Analysis
DATA PROCESSING:
1.- MGF generation with PDv1.3
2.- Database search: Proteome Discoverer (and Mascot)
3.- Search results and filtering (1% FDR): MIAPE Extractor (Data Inspector Module) and Proteome Discoverer.
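The slides do not say how the FDR is estimated in either tool. As an illustration only, the sketch below assumes a target-decoy search in which every peptide-spectrum match carries a score (higher is better) and a decoy flag, and estimates the FDR at each score cutoff as decoys divided by targets. The function name and the input layout are hypothetical.

```python
def filter_by_fdr(psms, max_fdr=0.01):
    """Keep target PSMs above the deepest score cutoff with estimated FDR <= max_fdr.

    psms: iterable of (score, is_decoy) tuples; a higher score means a better match.
    The FDR at each rank is estimated as decoys / targets (assumed target-decoy setup).
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked PSMs accepted
    for rank, (_score, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = rank
    # Report only the target hits inside the accepted region.
    return [psm for psm in ranked[:cutoff] if not psm[1]]
```

Calling `filter_by_fdr(psms, max_fdr=0.05)` would correspond to the 5% threshold used for the inclusion-list data. Because score scales and decoy handling differ between search engines, a filter of this kind can return different peptide and protein counts for the same data.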
Work in progress:
MIAPE EXTRACTOR:
The data could be uploaded and the FDR filtering could be completed.
Data Inspector Module: errors detected that remain to be solved: unable to extract protein information from SEQUEST data.
Work in progress...
Number of proteins that passed the 1% FDR filter:
1.- Significant differences between search algorithms. An in-depth data revision is needed.

                                           MIAPE EXTRACTOR             PROTEOME DISCOVERER
Sample  Acquisition method  Search method  Num peptides  Num proteins  Num peptides  Num proteins
MCF7    DDP                 MASCOT         3079          2316          --            --
MCF7    DDP                 SEQUEST        3561          1422          3616          1282
CCD18   DDP                 MASCOT         3102          2370          3765          1180
CCD18   DDP                 SEQUEST        2250          980           2475          946
Thank you for your attention.
Any questions?
La Cristalera, Miraflores de la Sierra, 10-11 December 2012
