| Size: 1313 Comment:  | Size: 5111 Comment:  | 
| Line 3: | Line 3: | 
| == Table of Contents == [[TableOfContents()]] | |
| Line 4: | Line 7: | 
| * Predict specificity of peptide recognition domain from the primary amino acid sequence. | * Computationally predict specificity of peptide recognition domain from the primary amino acid sequences | 
| Line 7: | Line 10: | 
| == Strategy == | == Background == * [wiki:/PDZ PDZ Domains] * [wiki:/MachineLearning Machine Learning] == Strategy/Ideas == * [wiki:/Strategy Strategy and Ideas] == Data == * [wiki:/PDZData PDZ Data] == Experiments == * [wiki:/Experiments Experiments and Results] | 
| Line 10: | Line 24: | 
| * [wiki:/Log Status Log] | * [wiki:/Log Status] | 
| Line 12: | Line 26: | 
| == Tasks == | ## == Tasks == ## ## 1. --(Learn SVN, Brain code (!ResidueResidueCorrelation))-- ## 1. Literature review related to domain specificity (background activity), PDZ domains (from Ioana's project) ## 1. --(Run !ResidueResidue correlation analysis on PDZ domain data: 1-1 version + try others e.g. 1-2 (Requires: PDZ profiles from Gary))-- ## 1. MSA subproject ## 1. --(Learn basics of multiple sequence alignment (Baxevanis, chapter 12))-- ## 1. Find and evaluate MSA algorithms (compare notes with Stacy) + evaluate Superfamily, PFAM databases of protein family alignments ## 1. Try different multiple sequence alignment algorithms (MSA) on the PDZ domain sequences to see if they affect the correlation results. ## 1. Benchmark/validate correlation subproject ## 1. We know H (PDZ), T @-2 (peptide) correlation ## 1. Look at structures (e.g. 1N7T and 1BE9) to see if correlated residues/positions are close to each other and compatible (physicochemically). We need to focus on ## PDZ structures that have bound peptides (search in PDB) ## 1. Build set of known true and false correlations for use in evaluating prediction algorithm (Note: also ask Dev Sidhu, when available). See [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=10871264 Baldi et al. review] ## 1. Amino acid group subproject ## 1. Learn about amino acid groups ## 1. Define an initial aa grouping (reasonable grouping from Levy paper) ## 1. Add new feature to !ResidueResidueCorrelation class so it considers grouping + run on PDZ data. This involves implementing the groups as a reduced alphabet (amino acids in a group are considered equivalent) ## 1. Try all groupings to see how it affects the results (from Levy paper) ## 1. See if we can incorporate aa similarity defined by substitution matrix approach (e.g. BLOSUM, PAM, GONNET) into our method, instead of grouping ## 1. Similarly, evaluate aa similarity defined by factor analysis (Atchley et al paper) ## 1. Think about new PDZ domain features that can be used for prediction. |
| Line 14: | Line 48: | 
| 1. Learn SVN, Brain code (ResidueResidueCorrelation) 1. Literature review related to domain specificity (background activity) 1. Run ResidueResidue correlation analysis on PDZ domain data: 1-1 version + try others e.g. 1-2 (Requires: PDZ profiles from Gary) 1. Implement new feature: amino acid groups (learn amino acid groups) + run on PDZ data 1. Think about new PDZ domain features that can be used for prediction. | ## == Ideas == ## * [wiki:/MachineLearning Machine Learning Page] ## * With current correlation counting calculation, Weight calculation by how many peptides are in the peptides file (i.e. normalize the correlation calculation in some way) ## * Build tools to help interpret correlations in the context of multiple sequence alignments (and later structures). ## * Use of structural data (PDZ domain structures) (may require homology modeling) ## * Use of machine learning methods (SVM for classification and boosting decision tree for interpretable learning model) ## * Analysis of correlation within domain and peptide (inter-residue correlation) maybe correspondence analysis ## * Analysis of SNPs and how they affect domain binding (including correlations between SNPs) ## * Define the binding site of the PDZ domain based on phage display data. Given that identical binding sites between two PDZ domains should correspond to identical ## binding specificities, find the set of PDZ domain sites that correlate perfectly with binding specificity. | 
| Line 20: | Line 58: | 
| == Ideas == * Use of structural data (PDZ domain structures) (may require homology modeling) * Use of machine learning methods (SVM for classification and boosting decision tree for interpretable learning model) * Analysis of correlation within domain and peptide (inter-residue correlation) maybe correspondence analysis | ## == Courses == ## === Biology === ## * [http://bio250y.chass.utoronto.ca/ BIO250] - Cell and Molecular Biology ## * Classes: Tues/Thurs - 1-2 PM (Convocation Hall) OR Mon - 6-8 PM (MC 102-Mechanical Engineering Building) ## * Textbook: [http://www.amazon.com/Molecular-Biology-Fourth-Bruce-Alberts/dp/0815332181/ref=pd_sim_b_1/105-5132391-0345258?ie=UTF8&qid=1188913552&sr=1-4 Molecular Biology of the Cell 4th Ed.] Alberts et al. ## === Protein Structure === ## * BCH340H1 - Proteins: from Structure to Proteomics ## * Classes: Winter 2008 ## * Textbook: ? ## * Previous Course Web Pages: ## * [http://arrhenius.med.utoronto.ca/~chan/bch340h04-outline.html 2004 Chan] ## * [http://xtal.uhnres.utoronto.ca/prive/BCH340/ 2006 Prive] ## === Machine Learning === ## * CSC2515 - Machine Learning ## * Previous Course Web Pages: ## * [http://www.cs.toronto.edu/~roweis/csc2515/ 2003-2006 Roweis] ## == Committee Meetings == ## * [wiki:/Meeting Notes] == Tools/Resources == * [wiki:/ToolsResources Tools and Resources] == Related Literature == * [http://www.connotea.org/rss/user/s2hui?download=view Literature List on Connotea] * [http://www.baderlab.org/DomainSpecificityPredictionProject/Reading Molecular Biology of the Cell] | 
| Line 29: | Line 90: | 
| == Documents == == Background Literature == * The Structure and Function of Proline Recognition Domains, Zarrinpar et al., 2003 attachment:Structure_Function_Pro_Recog_Domains_Zarrinpar_et_al_2003.pdf | 
Goals
- Computationally predict the specificity of peptide recognition domains from their primary amino acid sequences
- Analyze PDZ, WW and then SH3 domains
Background
- [wiki:/PDZ PDZ Domains]
- [wiki:/MachineLearning Machine Learning] 
Strategy/Ideas
- [wiki:/Strategy Strategy and Ideas]
Data
- [wiki:/PDZData PDZ Data]
Experiments
- [wiki:/Experiments Experiments and Results]
Status
- [wiki:/Log Status]
Tools/Resources
- [wiki:/ToolsResources Tools and Resources] 
Related Literature
- [http://www.connotea.org/rss/user/s2hui?download=view Literature List on Connotea] 
- [http://www.baderlab.org/DomainSpecificityPredictionProject/Reading Molecular Biology of the Cell] 
Team
- Shirley Hui
- Gary Bader
