Enrichment Map Genesets
Summary
- Enrichment Map Genesets are a collection of gene set files in GMT format (compatible with GSEA), updated monthly from the original source locations. They are available with the following identifiers:
  - Entrez gene IDs
  - UniProt accessions
  - Gene symbols
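Each line of a GMT file describes one gene set as tab-separated fields: the gene set name, a description, then the member gene IDs. The sketch below, which assumes a POSIX shell with awk, shows how to inspect a file. The file example.gmt, its contents, and the function name gmt_summary are made up for illustration.

```shell
# Build a tiny, made-up GMT file. Each line is one gene set:
# name <TAB> description <TAB> gene 1 <TAB> gene 2 ...
printf 'SET_A\tExample pathway A\t1956\t2064\t5290\n' >  example.gmt
printf 'SET_B\tExample pathway B\t7157\t672\n'        >> example.gmt

# gmt_summary FILE (hypothetical helper): print each gene set name and its
# gene count (fields 3..NF are genes), skipping blank lines.
gmt_summary() {
    awk -F'\t' 'NF > 0 { print $1, NF - 2 }' "$1"
}

gmt_summary example.gmt   # prints "SET_A 3" then "SET_B 2"
grep -c . example.gmt     # number of gene sets (non-empty lines): 2
```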
 
Sources
| Source | File origin | File type | ID extracted | Update frequency of source | Number of pathways | Notes |
|---|---|---|---|---|---|---|
| KEGG | KEGG FTP site (July 2011) | gmt | Gene symbol | static as of July 1, 2011 | 236 | Not available in BioPAX; available as a flat file, which is translated into GMT files. |
| MSigDB - c2 | static (needs to be updated manually) | gmt | Entrez gene | sporadically | 880 total | Only BioCarta and the other sets not covered elsewhere are needed, as all other sources are currently covered. |
| NCI | scripted grab from Pathway Commons | gmt | Entrez gene | every 4 months | 217 | Still has next-step issues in BioPAX gene set extraction. |
|  |  | biopax | Entrez gene | sporadically | ? | Can't parse BioPAX level 3. |
|  |  | biopax | Entrez gene | static | 386 | BioPAX level 3 file is a complete mess; currently obtained from MSigDB instead. |
| IOB | directly from IOB, static (July 2011) | biopax | Entrez gene | sporadically | 35 | BioPAX pathways need fixing so that species information is correct, but the information is still extractable. |
| NetPath | www.netpath.org/browse (scripted grab of files numbered 1-25) | biopax | Entrez gene | static | 25 | BioPAX pathways need fixing so that species information is correct, but the information is still extractable. |
| HumanCyc | scripted grab of zipped release from password-protected website | biopax | UniProt | updated periodically | 249 | Available in BioPAX level 2 and level 3. |
| Reactome | scripted grab of zipped release from website | biopax | UniProt | with each release | 1117 (release 37) | No way of getting the release version from the BioPAX file. |
| GO | scripted grab from EBI FTP site (human) | GAF | UniProt | released once a month | 13,034 (no GO IEA) | Obtained directly from the original curators of the annotations. |
| MSigDB - c3 | grab from MSigDB | gmt | Entrez gene | sporadically | 221 miRs |  |
File Structure
< > denotes a directory.
- <Release> - directory named according to the date the sets were updated.
  - <Species>
    - <Identifier> - either Entrez gene, UniProt, or gene symbol.
      - <GO>
        - BP = biological process
        - MF = molecular function
        - CC = cellular component
        - All = BP + MF + CC
        - no_GO_IEA - indicates that the file excludes GO annotations with evidence codes 'IEA' (inferred from electronic annotation), 'ND' (no biological data available), and 'RCA' (inferred from reviewed computational analysis).
        - with_GO_IEA - indicates that the file includes GO annotations with evidence codes 'IEA', 'ND', and 'RCA'.
      - <Pathways>
      - <miRs>
      - <TF>
      - <Disease phenotypes>
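The no_GO_IEA / with_GO_IEA distinction can be illustrated on a GO annotation file (GAF), where column 7 holds the evidence code and column 9 the aspect (P, F or C, matching BP, MF and CC). The following is a minimal sketch, assuming GAF 2.x input with UniProt accessions in column 2. It is not the pipeline actually used to build these files. The function name gaf_to_gmt, the sample annotations, the use of the GO ID as the description (real files would carry term names), and the choice to also skip NOT-qualified annotations are all illustrative assumptions.

```shell
# A tiny, made-up GAF excerpt (columns: 1 DB, 2 object ID, 3 symbol,
# 4 qualifier, 5 GO ID, 6 reference, 7 evidence code, 8 with/from, 9 aspect).
printf '!gaf-version: 2.2\n' > sample.gaf
printf 'UniProtKB\tP04637\tTP53\tinvolved_in\tGO:0006915\tREF:1\tIDA\t\tP\n'  >> sample.gaf
printf 'UniProtKB\tP38398\tBRCA1\tinvolved_in\tGO:0006915\tREF:2\tIEA\t\tP\n' >> sample.gaf
printf 'UniProtKB\tQ00987\tMDM2\tinvolved_in\tGO:0006915\tREF:3\tTAS\t\tP\n'  >> sample.gaf

# gaf_to_gmt GAF ASPECT (hypothetical helper): one GMT line per GO term for
# the given aspect (P = BP, F = MF, C = CC), listing each accession once and
# dropping IEA, ND and RCA annotations, as the no_GO_IEA files do.
gaf_to_gmt() {
    awk -F'\t' -v aspect="$2" '
        /^!/ { next }                                   # GAF header lines
        $9 != aspect { next }                           # other ontology
        $7 == "IEA" || $7 == "ND" || $7 == "RCA" { next }
        $4 ~ /NOT/ { next }                             # assumption: skip negated annotations
        !(($5, $2) in seen) { seen[$5, $2] = 1; genes[$5] = genes[$5] "\t" $2 }
        END { for (term in genes) print term "\t" term genes[term] }  # description = GO ID
    ' "$1"
}

# Prints GO:0006915, GO:0006915, P04637, Q00987 (tab-separated);
# BRCA1 is dropped because its only annotation is IEA.
gaf_to_gmt sample.gaf P
```

Dropping the `$7` test would give the with_GO_IEA variant.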
 
 
 
- Each <Identifier> directory also contains amalgamated gene set files:
  - AllPathways - contains all pathway sources in the Pathways directory.
  - GOPathways - contains all GO gene sets (BP, MF, CC) and all pathway sources in the Pathways directory.
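An amalgamated file such as AllPathways can be approximated by concatenating the per-source GMT files. The following is a minimal sketch, assuming a POSIX shell with awk. The function name amalgamate_gmt and the sample files are hypothetical. Keeping only the first gene set seen under each name is an assumption that downstream tools expect unique names. It is not a documented property of AllPathways; merging or renaming duplicates would be equally valid policies.

```shell
# Two tiny, made-up per-source GMT files that share the set name P2.
printf 'P1\tsource a\tg1\nP2\tsource a\tg2\n' > source_a.gmt
printf 'P2\tsource b\tg3\nP3\tsource b\tg4\n' > source_b.gmt

# amalgamate_gmt OUT FILE... (hypothetical helper): concatenate GMT files
# into OUT, skipping blank lines and keeping only the first gene set seen
# under each name (column 1).
amalgamate_gmt() {
    out=$1
    shift
    awk -F'\t' 'NF > 0 && !seen[$1]++' "$@" > "$out"
}

amalgamate_gmt AllPathways_demo.gmt source_a.gmt source_b.gmt
cut -f1 AllPathways_demo.gmt   # P1, P2 (the copy from source_a), P3
```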
 
Creating customized Genesets
- Download the gene set files you would like to include in your customized set (for example, Human_IOB_Entrezgene.gmt and Human_NetPath_Entrezgene.gmt).
- Concatenate them into a single GMT file:
cat Human_IOB_Entrezgene.gmt Human_NetPath_Entrezgene.gmt > MyCustomizedSet.gmt
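A plain cat has one pitfall. If a downloaded file lacks a trailing newline, its last gene set is glued onto the first gene set of the next file. The following is a minimal sketch of a safer variant plus a basic structural check, assuming a POSIX shell with awk. The function names concat_gmt and check_gmt and the demo files are hypothetical. The check only enforces that every gene set has a name, a description and at least one gene.

```shell
# concat_gmt OUT FILE... (hypothetical helper): like cat, but awk re-emits
# every record followed by a newline, so a file missing its final newline
# cannot merge two gene sets into one line.
concat_gmt() {
    out=$1
    shift
    awk 1 "$@" > "$out"
}

# check_gmt FILE (hypothetical helper): report non-blank lines with fewer
# than 3 tab-separated fields; exit status is 1 if any are found.
check_gmt() {
    awk -F'\t' 'NF > 0 && NF < 3 { print FILENAME ":" FNR ": too few fields"; bad = 1 }
                END { exit bad }' "$1"
}

# Demo with made-up files; the first one deliberately lacks a final newline.
printf 'SET_A\tdesc\t1956\t2064' > no_newline.gmt
printf 'SET_B\tdesc\t7157\n'     > with_newline.gmt
concat_gmt demo.gmt no_newline.gmt with_newline.gmt
check_gmt demo.gmt && echo "demo.gmt looks well-formed"
```

For the files above, `concat_gmt MyCustomizedSet.gmt Human_IOB_Entrezgene.gmt Human_NetPath_Entrezgene.gmt` followed by `check_gmt MyCustomizedSet.gmt` would replace the plain cat.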
