Data Analysis Methods

Fluorescent images of hybridized microarrays were obtained using the GenePix 4000 microarray scanner. Images were analyzed with ScanAlyze. Single spots or areas of the array with obvious blemishes were flagged and excluded from subsequent analyses. Fluorescence ratios (along with numerous quality control parameters; see ScanAlyze manual) were stored in a custom database. Raw data files for each array containing all measured values and manual flags are available here. A set of clones that consistently behaved poorly across arrays was identified and excluded from all analyses; a list of excluded elements is available here. Fluorescence ratios were calibrated independently for each array by applying a single scaling factor to all fluorescent ratios from each array; this scaling factor was computed so that the median fluorescence ratio of well-measured spots on each array was 1.0.

All non-flagged array elements for which the fluorescent intensity in each channel was greater than 1.4 times the local background were considered well-measured. The ratio values were log-transformed (base 2) and stored in a table (rows=individual cDNA clones, columns=single mRNA samples). Where samples had been analyzed on multiple arrays, multiple observations for an array element for a single sample were averaged. Array elements that were not well-measured on at least 80% of the 96 mRNA samples were excluded. Data for the remaining genes were centered by subtracting (in log space) the median observed value, to remove any effect of the amount of RNA in the reference pool. This dataset contains 4,026 array elements and is available here. Hierarchical clustering was applied to both axes using the weighted pair-group method with centroid average (WPGMC) (Sneath and Sokal 1973) as implemented in the program Cluster. The distance matrixes used were Pearson correlation for clustering the arrays and the inner product of vectors normalized to magnitude 1 for the genes (this is a slight variant of Pearson correlation; see Cluster manual computational details). The results were analyzed with TreeView. The datasets used for Figs. 3 and Figs. 4 are also available.