Proteome scanning of PDZ domain interactions using support vector machines
Motivation
PDZ domains mediate important biological processes through the recognition of short linear motifs. Two recent, independent high-throughput experiments, one using protein microarrays and one using phage display, have been used to detect PDZ domain interactions. Several computational predictors of PDZ domain interactions have also been developed; however, they are trained using only protein microarray data or focus on limited subsets of PDZ domains. An accurate predictor of genomic PDZ domain interactions would allow the proteomes of organisms to be scanned for potential binders. Such an application requires a predictor that is precise as well as accurate, because a given proteome contains thousands of possible interactors and even a low false positive rate would produce many spurious predictions. Once validated, however, these predictions would increase the coverage of current PDZ domain interaction networks and further our understanding of the biological processes they mediate.
Results
We developed a PDZ domain interaction predictor using support vector machines (SVMs) trained with both protein microarray and phage display data. Because the phage display data consisted of positive interactions only, we developed a method to deterministically generate artificial negative interactions so that these data could be used for training. Through extensive blind testing, we showed that the SVM could predict interactions in different organisms. We then used the SVM to scan the proteomes of different organisms to predict binders for several PDZ domains. Predictions were validated using PDZBase or protein microarray data. A comparison of F1 measures and false positive rates (FPRs) between the SVM and published or commonly used predictors demonstrated the SVM’s improved accuracy and precision.
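For readers unfamiliar with the two measures used in the comparison, the following is a minimal Java sketch of how an F1 measure and a false positive rate can be computed from confusion-matrix counts. The class name `BinaryMetrics`, its method names, and the counts in `main` are hypothetical and chosen for illustration only; this is not the project's evaluation code, and the numbers are not results from the study.

```java
/**
 * Illustrative helper (hypothetical, not part of the PDZSVM source):
 * standard binary-classification measures from confusion-matrix counts.
 */
final class BinaryMetrics {
    private BinaryMetrics() {}

    /** Precision = TP / (TP + FP); returns 0 when nothing was predicted positive. */
    static double precision(int tp, int fp) {
        return (tp + fp) == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    /** Recall = TP / (TP + FN); returns 0 when there are no true positives to find. */
    static double recall(int tp, int fn) {
        return (tp + fn) == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** F1 = harmonic mean of precision and recall = 2TP / (2TP + FP + FN). */
    static double f1(int tp, int fp, int fn) {
        int denom = 2 * tp + fp + fn;
        return denom == 0 ? 0.0 : 2.0 * tp / denom;
    }

    /** FPR = FP / (FP + TN); returns 0 when there are no negatives. */
    static double falsePositiveRate(int fp, int tn) {
        return (fp + tn) == 0 ? 0.0 : (double) fp / (fp + tn);
    }

    public static void main(String[] args) {
        // Made-up counts for one predictor on one validation set.
        int tp = 8, fp = 2, fn = 2, tn = 88;
        System.out.printf("precision = %.3f%n", precision(tp, fp));
        System.out.printf("recall    = %.3f%n", recall(tp, fn));
        System.out.printf("F1        = %.3f%n", f1(tp, fp, fn));
        System.out.printf("FPR       = %.3f%n", falsePositiveRate(fp, tn));
    }
}
```

In a proteome scan, where negatives vastly outnumber positives, the FPR is the more telling of the two: a small change in FPR corresponds to a large change in the absolute number of false predictions.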
Supplementary Data
- Supplementary Document (Link)
- PDZSVM Data Files PDZSVMData.zip
  - Models
    - Chen model parameter and binding site encoding files
    - Stiffler model parameter files
  - Proteomes
    - Ensembl proteome files for Human, Worm and Fly
  - Experiment Interaction files (in peptide file format)
    - Fly files from Chen
    - Human files from Sidhu
    - Mouse files from Stiffler
    - Worm files from Chen
  - Curated Interaction files (flat files)
    - PDZBase for Human (Worm and Fly included, but not used)
    - Human Protein Reference Database
  - Phage codon bias files
Availability and Implementation
The predictor is implemented in Java. Source code and dependencies are freely available upon request.
- Dependencies:
  - jfreechart 1.0.12 (and dependencies)
  - weka 3.9.1
  - auc calculator (Davis & Goadrich, 2006)
  - BioJava 1.5
  - iText 2.1.3
  - jmatio
  - BRAIN 1.0.5 (pdzsvm)
  - libSVM 2.8.9 (pdzsvm)
 
Team
- Shirley Hui
- Gary Bader
