UTHSC Brain mRNA U74Av2 (Dec03) HWT1PM
RECOMMENDED BRAIN DATA SET. This December 2003 data freeze provides estimates of mRNA expression in brains of BXD recombinant inbred mice measured using Affymetrix U74Av2 microarrays. Data were generated at the University of Tennessee Health Science Center (UTHSC). Over 300 brain samples from 35 strains were hybridized in small pools (n=3) to 100 arrays. Data were processed using a new method called the Heritability Weighted Transform (HWT) developed by Kenneth F. Manly and Robert W. Williams. Our initial results demonstrate that the HWT1PM transform generates estimates of gene expression that yield more significant QTLs than RMA, dChip, PDNN, or MAS 5.
About the cases used to generate this set of data:
This data set includes estimates of gene expression for 35 genetically uniform lines of mice: C57BL/6J (B6, or simply B), DBA/2J (D2 or D), their B6D2 F1 intercross, and 32 BXD recombinant inbred (RI) strains derived by crossing female B6 mice with male D2 mice and then inbreeding progeny for over 21 generations. This set of RI strains is a remarkable resource because many of these strains have been extensively phenotyped for hundreds of interesting traits over a 25-year period. A significant advantage of this RI set is that the two parental strains (B6 and D2) have both been extensively sequenced and are known to differ at approximately 1.8 million SNPs. Coding variants (mostly single nucleotide polymorphisms and insertion-deletions) that may produce interesting phenotypes can be rapidly identified in this particular RI set.
BXD1 through BXD32 were produced by Benjamin A. Taylor starting in the late 1970s. BXD33 through BXD42 were also produced by Taylor, but from a second set of crosses initiated in the early 1990s. These strains are all available from the Jackson Laboratory, Bar Harbor, Maine. BXD43 through BXD99 were produced by Lu Lu, Jeremy Peirce, Lee M. Silver, and Robert W. Williams in the late 1990s and early 2000s using advanced intercross progeny (Peirce et al. 2004). Only two of these incipient strains are included in the current database (BXD67 and BXD68).
In this mRNA expression database we generally used progeny of stock obtained from The Jackson Laboratory between 1999 and 2001. Animals were generated in-house at the University of Alabama by John Mountz and Hui-Chen Hsu and at the University of Tennessee Health Science Center by Lu Lu and Robert Williams.
The table below lists the arrays by strain, sex, and age. Each array was hybridized to a pool of mRNA from three mice.
How to download these data:
All standard Affymetrix file types (DAT, CEL, RPT, CHP, TXT) can be downloaded for this data set by selecting the strain names in the table above and then selecting the appropriate file. Alternatively, you can download the particular transform as an Excel workbook that contains both individual arrays and strain means and SEMs. Please refer to the Usage Conditions and Limitations page and the References page for background on appropriate use and citations of these data.
About the samples used to generate these data:
Each array was hybridized with labeled cRNA generated from a pool of three brains from adult animals usually of the same age and always of the same sex. The brain region included most of the forebrain and midbrain, bilaterally. However, the sample excluded the olfactory bulbs, retinas, and the posterior pituitary (all formally part of the forebrain). A total of 100 such pooled samples were arrayed: 74 from females and 26 from males. Animals ranged in age from 56 to 441 days, usually with a balanced design: one pool at approximately 8 weeks, one pool at approximately 20 weeks, and one pool at approximately 1 year. Strain averages of mRNA expression level are therefore typically based on three pooled biological replicate arrays. This data set does not incorporate statistical adjustment for possible effects of age and sex. Users can select the strain symbol in the table above to review details about the specific cases and array processing center (DP = Divyen Patel at Genome Explorations, Inc; TS = Thomas Sutter at University of Memphis). You can also click on the individual symbols (males or females) to view the array image.
About the array platform:
Affymetrix U74Av2 GeneChip: The expression data were generated using 100 U74Av2 arrays. The chromosomal locations of U74Av2 probe sets were determined by BLAT analysis of concatenated probe sequences using the Mouse Genome Sequencing Consortium May 2004 (mm5) assembly. This BLAT analysis is performed periodically by Yanhua Qu as each new build of the mouse genome is released (see http://genome.ucsc.edu/cgi-bin/hgBlat?command=start&org=mouse). We thank Yan Cui (UTHSC) for allowing us to use his Linux cluster to perform this analysis. You can confirm the BLAT alignment results yourself by clicking on the Verify UCSC and Verify Ensembl links in the Trait Data and Editing Form (see the buttons to the right side of the Location line).
Most probe sets on the U74Av2 array consist of a total of 32 probes, divided into 16 perfect match (PM) probes and 16 mismatch controls (MM). Each probe set has an identifier code that includes a unique number, an underscore character, and several suffix characters that highlight design features. The most common probe set suffix is at. This code indicates that the probes should hybridize relatively selectively with the complementary anti-sense target (i.e., the complementary RNA) produced from a single gene. Other codes include:
f_at (sequence family): Some probes in this probe set will hybridize to identical and/or slightly different sequences of related gene transcripts.
s_at (similarity constraint): All probes in this probe set target common sequences found in transcripts from several genes.
g_at (common groups): Some probes in this set target identical sequences in multiple genes and some target unique sequences in the intended target gene.
r_at (rules dropped): Probe sets for which it was not possible to pick a full set of unique probes using the Affymetrix probe selection rules. Probes were picked after dropping some of the selection rules.
i_at (incomplete): Designates probe sets for which there are fewer than the standard numbers of unique probes specified in the design (16 perfect match for the U74Av2).
st (sense target): Designates a sense target; almost always generated in error.
Descriptions for the probe set extensions were taken from the Affymetrix GeneChip Expression Analysis Fundamentals.
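The suffix codes listed above can be read off a probe set identifier mechanically. The following is a minimal sketch in Python, assuming identifiers follow the number, underscore, suffix pattern described above; the function name and the example identifiers are hypothetical, and control probe sets that use other naming patterns are not handled.

```python
# Suffix codes and short labels, taken from the list above.
SUFFIX_MEANINGS = {
    "at": "anti-sense target",
    "f_at": "sequence family",
    "s_at": "similarity constraint",
    "g_at": "common groups",
    "r_at": "rules dropped",
    "i_at": "incomplete",
    "st": "sense target",
}


def classify_probe_set(probe_set_id):
    """Return the design label for an identifier such as '12345_f_at'.

    Assumes the identifier is <number>_<suffix>; anything else raises
    ValueError, and unlisted suffixes are reported as 'unknown'.
    """
    number, separator, suffix = probe_set_id.partition("_")
    if not separator or not number:
        raise ValueError(f"unexpected identifier format: {probe_set_id!r}")
    return SUFFIX_MEANINGS.get(suffix, "unknown")


# Hypothetical identifiers, for illustration only.
for example_id in ("12345_at", "12345_f_at", "12345_st"):
    print(example_id, "->", classify_probe_set(example_id))
```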
About data processing:
HWT1PM is an acronym for heritability weighted transform version 1, perfect match probes only.
Most Affymetrix transforms generate a single consensus estimate of expression based on as many as 32 probes that hybridize with variable selectivity to the target transcript. Each probe could be given an equal weight to derive a consensus estimate of expression (essentially one vote per probe). However, the hybridization performance of probes and their ability to generate a biologically meaningful estimate of mRNA level are highly variable and idiosyncratic, depending on melting temperature, stacking energy, the mixture of background transcripts, and characteristics of the reactions used to extract mRNA and to generate labeled cRNA. A simple way to evaluate the performance of probes is to compute their heritability within a large data set.
Heritability is essentially the ratio of genetic variance to the total variance. A highly informative probe is one with little variability within strain but a great deal of variability among strains; essentially the main effect of "strain" in an analysis of variance (ANOVA). Heritability estimated in this way is necessary but not sufficient to define a QTL. To define a QTL, the variation must also correlate with genotypes at some genomic location(s).
We have studied 35 strains and can therefore estimate the "between-strain variance." We have also typically performed three biological replicates within strain. Therefore, we can estimate genetic and non-genetic sources of variance. In our study we have minimized non-genetic variance by pooling samples and by rearing all mice in a standard laboratory environment. We are in a good position to estimate these two variance components and compute the heritability of the 490,000 probes on the U74Av2 array. All of these estimates, both for the perfect match (PM) and mismatch (MM) probes, are provided in the PROBE INFORMATION table associated with every transcript (click on the word "Probe" in any of the TRAIT DATA pages).
Estimation of Heritability: Individual probe intensities from Affymetrix U74Av2 microarrays were log2-transformed and normalized to a standard array-wide mean of 8 units and a standard deviation of 2 units as described for several other data sets (e.g., UTHSC Brain mRNA U74Av2 (Dec03) MAS5).
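The normalization step can be sketched as follows. This is an illustrative Python sketch, not the code used to build the data set; the function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import math
from statistics import mean, pstdev


def normalize_array(intensities, target_mean=8.0, target_sd=2.0):
    """Log2-transform raw probe intensities from one array, then rescale
    them so the array-wide mean is target_mean and the SD is target_sd.

    Assumption: the population SD (pstdev) is used; intensities must be > 0.
    """
    logged = [math.log2(x) for x in intensities]
    mu = mean(logged)
    sd = pstdev(logged)
    return [target_mean + target_sd * (value - mu) / sd for value in logged]


# Made-up intensities for a tiny "array".
normalized = normalize_array([100, 200, 400, 800, 1600])
print(round(mean(normalized), 6), round(pstdev(normalized), 6))
```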
For each probe, the mean squared deviation within strains (MSw) and the mean squared deviation between strains (MSb) were calculated by ANOVA. Raw heritability was estimated as (MSb-MSw)/(n x MSt), where n is the average number of replicates per strain (usually 3) and MSt is the total variance in the 100-array data set. These particular raw heritability estimates are provided in the PROBE INFORMATION table for each transcript (click on the blue word "Probe" in any of the TRAIT DATA pages and then scroll to the far right column labeled 100brains h2). Note that these raw heritabilities may have negative values because they are calculated from the difference of two estimates subject to sampling error.
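The raw heritability calculation for a single probe can be sketched in Python as shown here. This is a sketch under stated assumptions rather than the original implementation: the function name and input layout are hypothetical, MSb and MSw are taken to be the usual one-way ANOVA mean squares, and MSt is taken to be the sample variance of all values for the probe.

```python
from statistics import mean


def raw_heritability(values_by_strain):
    """Estimate raw heritability, (MSb - MSw) / (n x MSt), for one probe.

    values_by_strain maps a strain name to the list of normalized values
    from that strain's replicate arrays (hypothetical input layout).
    """
    groups = list(values_by_strain.values())
    all_values = [v for group in groups for v in group]
    k = len(groups)              # number of strains
    n_total = len(all_values)    # number of arrays
    grand_mean = mean(all_values)

    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    ms_b = ss_between / (k - 1)
    ms_w = ss_within / (n_total - k)
    ms_t = (ss_between + ss_within) / (n_total - 1)  # total variance
    n_avg = n_total / k          # average replicates per strain
    return (ms_b - ms_w) / (n_avg * ms_t)


# Made-up values: strains differ strongly, replicates agree closely.
strong = {"A": [7.0, 7.1, 6.9], "B": [9.0, 9.1, 8.9], "C": [8.0, 8.1, 7.9]}
# Made-up values: no strain difference, only replicate noise.
noise = {"A": [7.0, 9.0], "B": [9.0, 7.0]}
print(raw_heritability(strong), raw_heritability(noise))
```

With the made-up values above, the first estimate is large and the second is negative, which illustrates why raw estimates are not confined to the range 0 to 1.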
Adjusted heritability was derived from raw heritability by assigning values of 0 and 1, respectively, to raw heritability values below 0.0 or above 1.0. Weights for each probe were calculated by dividing the adjusted heritability by the mean adjusted heritability for all probes in the probe set. In essence this divides the 16 total votes (there are 16 PM probes per probe set) on the basis of their heritability scores. For example, if 8 of the probes had a heritability of 0.5, 4 had a heritability of 0.25, and 4 had a heritability of 0, then these three groups would get weights of 1.6, 0.8, and 0, respectively, in generating the consensus estimate of expression level (the mean adjusted heritability is 5/16 = 0.3125, and the 16 weights sum to 16). Expression estimates for each probe set were calculated as the weighted average of the probe-specific means, using the heritability weights just described. The final expression estimates for each strain were calculated as an unweighted average of all biological replicates within each strain.
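The weighting step can be sketched in a few lines of Python. The function name is hypothetical, and raising an error when every probe has an adjusted heritability of 0 is an assumption, because the text does not say how that case was handled.

```python
def heritability_weights(raw_h2):
    """Convert raw heritabilities for the PM probes of one probe set
    into weights whose mean is 1 (so 16 probes share 16 'votes')."""
    # Adjusted heritability: clip raw values to the range 0..1.
    adjusted = [min(1.0, max(0.0, h)) for h in raw_h2]
    mean_adjusted = sum(adjusted) / len(adjusted)
    if mean_adjusted == 0:
        # Assumption: the source does not describe this case.
        raise ValueError("all adjusted heritabilities are zero")
    return [h / mean_adjusted for h in adjusted]


# The example from the text: 8 probes at 0.5, 4 at 0.25, 4 at 0.
weights = heritability_weights([0.5] * 8 + [0.25] * 4 + [0.0] * 4)
print(weights[0], weights[8], weights[12], sum(weights))
```

Because each weight is the adjusted heritability divided by the probe set mean, the weights always sum to the number of probes, which is the sense in which the 16 votes are redistributed.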
General Comment: From a statistical point of view the 100-array data set we are working with has four dimensions. The first dimension is genetic, and is formed by the set of genetically distinct inbred strains (n = 35) and their genotypes. The second dimension is non-genetic and is represented by the replicate samples within each isogenic line. The third dimension is formed by the multiple probes that make up each probe set. There are up to 32 probes per probe set, but in this transform we have focused attention only on the 16 PM probes. Finally, the fourth dimension is represented by the 12422 probe sets that target different transcripts. For genetic analysis and QTL mapping, dimensions 2 and 3 must be collapsed into a single estimate of mean gene expression for each strain that can be compared with genotypes (dimension 1). Heritability is determined by the relative expression variance contributed by dimensions 1 and 2. The HWT1PM method uses the information from dimensions 1 and 2 to define weights that allow dimension 3 to be collapsed using a weighted average. Dimension 2 is still collapsed using a simple average.
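The collapse of dimensions 2 and 3 for one probe set can be sketched in Python as follows. The data layout, function name, and numbers are hypothetical. Because the same weights are applied to every array, averaging probes first and replicates second gives the same strain mean as the reverse order.

```python
from statistics import mean


def strain_means(probe_set_data, weights):
    """Collapse one probe set to a single value per strain.

    probe_set_data maps strain -> list of replicate arrays, where each
    replicate is a list of PM probe values (hypothetical layout).
    weights holds one heritability weight per PM probe.
    """
    total_weight = sum(weights)
    result = {}
    for strain, replicates in probe_set_data.items():
        # Dimension 3: weighted average across probes within each array.
        per_array = [
            sum(w * v for w, v in zip(weights, probes)) / total_weight
            for probes in replicates
        ]
        # Dimension 2: simple average across replicate arrays.
        result[strain] = mean(per_array)
    return result


# Made-up example with two probes and two replicate arrays per strain.
data = {"B6": [[8.0, 10.0], [9.0, 11.0]], "D2": [[7.0, 10.0], [7.5, 10.5]]}
print(strain_means(data, [1.5, 0.5]))
```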
Data source acknowledgment:
Data were generated with funds to RWW from the Dunavant Chair of Excellence, University of Tennessee Health Science Center, Department of Pediatrics. The majority of arrays were processed at Genome Explorations by Dr. Divyen Patel. We thank Guomin Zhou for generating advanced intercross stock used to produce most of the new BXD RI strains.
Information about this text file:
This text file was originally generated by RWW and KFM, December 2003. Updated by RWW, Oct 31 and Nov 6, 2004, and by KFM, Nov 8, 2004.