Summary of Affymetrix Microarray Data Analysis
1. Affymetrix Microarray Data
- CEL files: contain probe intensity values; a higher intensity indicates greater transcript abundance, i.e. a more active gene
- CDF (chip description file) files: specify the probe and probe set to which each cell belongs
- Terms:
  - Probe: a 25-base oligonucleotide used to probe RNA targets
  - Probe pair: a unit composed of a perfect match (PM) probe and its mismatch (MM) probe
  - Probe pair set: the PMs and MMs related to a common affyID. A group of probe pairs corresponds to a particular gene or a fraction of a gene; some genes are represented by more than one probe set.
  - affyID: an identifier for a probe set (which can be a gene or a fraction of a gene) represented on the array
 
2. Microarray Experimental Designs
- Biological and technical replicates
- Pooling (biological averaging), blocking, randomization
- Sample size determination
3. Data Exploration
- MA plots:
  - M values are log fold changes, M = log2(T/C) = log2(T) - log2(C)
  - A values are average log intensities between two arrays, A = (log2(T) + log2(C))/2
- Images, residual images
- Histograms, boxplots
- RNA degradation plots
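
The M and A values defined for MA plots above can be computed directly from two arrays of intensities. The following is a minimal sketch in plain Python, not code from any microarray package; the function name `ma_values` and the sample intensities are hypothetical and chosen only for illustration.

```python
import math

def ma_values(t, c):
    """Return (M, A) lists for two equal-length lists of positive intensities.

    M = log2(T) - log2(C)        (log fold change)
    A = (log2(T) + log2(C)) / 2  (average log intensity)
    """
    m = [math.log2(ti) - math.log2(ci) for ti, ci in zip(t, c)]
    a = [(math.log2(ti) + math.log2(ci)) / 2 for ti, ci in zip(t, c)]
    return m, a

# Hypothetical intensities for three probes on two arrays T and C
t = [400.0, 100.0, 64.0]
c = [100.0, 100.0, 256.0]
m, a = ma_values(t, c)
```

Plotting `m` against `a` gives the MA plot; points far from M = 0 are probes whose intensity differs between the two arrays.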
4. Data Preprocessing
- Approaches: background correction, normalization, PM correction, and summarization
- Background correction methods:
  - rma: robust multiarray average method (Irizarry et al. 2003)
  - mas: Affymetrix Microarray Suite background correction method (2002)
  - GCRMA: modified RMA to estimate nonspecific binding (Wu et al. 2004)
 
- Normalization methods:
  - quantile, contrast and loess: discussed and compared by Bolstad et al. (2003)
  - constant (scaling): the approach taken by Affymetrix, usually done after summarization
  - invariantset: used in the dChip software (Li and Wong 2001)
  - qspline: normalizes by fitting splines to the quantiles (Workman et al. 2002)
 
- PM correction methods:
  - mas: an ideal mismatch subtracted from PM (Affymetrix 2002)
  - pmonly: no adjustment to the PM values
  - subtractmm: subtract MM from PM (Affymetrix MAS 4.0 1999)
 
- Summarization methods:
  - avgdiff: the average (Affymetrix MAS 4.0 1999)
  - mas: Tukey biweight on log2(PM-CM) (Affymetrix MAS 5.0 2002)
  - liwong: model-based expression index (MBEI) (Li and Wong 2001), fitting one of the following multi-chip models to each probeset:
    - y_ij = theta_i * phi_j + epsilon_ij, where y_ij = PM_ij - MM_ij
    - y_ij = mu_i + theta_i * phi_j + epsilon_ij, where y_ij = PM_ij
  - medianpolish: used in the RMA expression summary (Irizarry et al. 2003). A multichip linear model is fit to the data from each probeset:
    - y_ij = alpha_i + beta_j + epsilon_ij, where y_ij are the background-adjusted, normalized, and log-transformed PM intensities
  - playerout: Lazaridis et al. (2002)
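
To make the quantile normalization method listed above more concrete, here is a minimal sketch in plain Python. It follows the usual description of the technique (give every array the same empirical distribution by replacing each value with the mean of the values sharing its rank across arrays); it is not the code of any particular package. The function name `quantile_normalize`, the tie handling, and the sample data are assumptions made for illustration.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length intensity lists (one per array).

    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties are broken by position (an assumption of this sketch).
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(arr) for arr in arrays]
    # Mean of the k-th smallest value across all arrays, for each rank k
    rank_means = [sum(arr[k] for arr in sorted_arrays) / len(arrays)
                  for k in range(n)]
    normalized = []
    for arr in arrays:
        # Indices of this array's values, from smallest to largest
        order = sorted(range(n), key=lambda i: arr[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

# Hypothetical intensities: two arrays, three probes each
array1 = [2.0, 4.0, 6.0]
array2 = [9.0, 1.0, 5.0]
norm1, norm2 = quantile_normalize([array1, array2])
```

After normalization both arrays contain the same set of values (here 1.5, 4.5, and 7.5), arranged according to each array's original ranking.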
 
|| Popular methods || Background correction || Normalization || PM correction || Summarization ||
|| RMA || rma || quantile || pmonly || medianpolish (log2 scale) ||
|| MAS5 || mas || constant (scaling), after summarization || mas || mas (log2 scale) ||
|| MBEI || none || invariantset || subtractmm || liwong ||
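
The medianpolish summarization used by RMA fits the additive model y_ij = alpha_i + beta_j + epsilon_ij by repeatedly sweeping out row and column medians. The sketch below is a generic Tukey median polish in plain Python, not the implementation used by any package. It assumes that i indexes arrays (rows) and j indexes probes (columns); that layout, the function names `median_polish` and `summarize`, and the sample values are hypothetical.

```python
from statistics import median

def median_polish(y, max_iter=10, tol=1e-8):
    """Fit y[i][j] ~ overall + row_eff[i] + col_eff[j] by median polish.

    y holds log2 intensities; rows are arrays and columns are probes
    (an assumed layout). Returns (overall, row_eff, col_eff, residuals).
    """
    n_rows, n_cols = len(y), len(y[0])
    resid = [list(row) for row in y]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep the row medians out of the residuals into the row effects
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [v - row_med[i] for v in resid[i]]
        # Re-center the column effects, moving their median into the overall term
        shift = median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]
        # Sweep the column medians out of the residuals into the column effects
        col_med = [median(resid[i][j] for i in range(n_rows))
                   for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        # Re-center the row effects in the same way
        shift = median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]
        # Stop once a full sweep no longer changes anything
        if max(abs(v) for v in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid

def summarize(y):
    """Per-array expression summary: overall term plus the array (row) effect."""
    overall, row_eff, _, _ = median_polish(y)
    return [overall + r for r in row_eff]

# Hypothetical log2 PM intensities for one probeset: 3 arrays x 4 probes
y = [
    [8.0, 9.0, 7.0, 10.0],
    [10.0, 11.0, 9.0, 12.0],
    [9.0, 10.0, 8.0, 11.0],
]
expression = summarize(y)
```

Because medians rather than means are swept out, a single outlying probe has little influence on the per-array summary, which is the usual motivation for using this fit in a robust expression measure.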
