Weighted Analysis of Microarray Experiments (WAME)
WAME was generalised to its current form over three papers. For the main results, see the abstracts of the papers below.
The WAME procedure is available as an R package.
The R package
Note that this is an alpha release. Try help(package="WAME") for documented functions.
WAME 0.0.5 (2006-03-17)
WAME 0.0.7 (2006-10-24)
WAME 0.0.8 (2006-12-05) Primary changes: Included help. Try help(package="WAME") for documented functions.
WAME 0.0.9 (2006-12-12) Primary changes: Estimate of Sigma now includes the error code from nlm as an attribute.
WAME 0.0.10 (2006-12-14) Primary changes: Minor changes in cross.plot.
WAME 0.0.11 (2007-06-12) Minor changes.
Type the following line in R to install the WAME package:
install.packages("WAME", contriburl="http://wame.math.chalmers.se")
or
update.packages(oldPkgs="WAME", contriburl="http://wame.math.chalmers.se")
to update the package.
Send bug reports and suggestions to Anders Sjögren or Erik Kristiansson.
Anders Sjögren*, Erik Kristiansson, Mats Rudemo, and Olle Nerman,
Submitted to BMC Bioinformatics and available as preprint 2007:29, Mathematical Sciences, Chalmers University of Technology, ISSN 1652-9715.
* Corresponding author: anders.sjogren@math.chalmers.se.
Abstract
Background:
In DNA microarray experiments, measurements from different biological samples are often assumed to be independent and to have identical variance. For many datasets these assumptions have been shown to be invalid and typically lead to too optimistic p-values. A method called WAME has been proposed where a variance is estimated for each sample and a covariance is estimated for each pair of samples. The current version of WAME is, however, limited to experiments with paired design, e.g. two-channel microarrays.
Results:
The WAME procedure is extended to general microarray experiments, making it capable of handling both one- and two-channel datasets. Two public one-channel datasets are analysed and WAME detects both unequal variances and correlations. WAME is compared to other common methods: fold-change ranking, ordinary linear model with t-tests, LIMMA and weighted LIMMA. The p-value distributions are shown to differ greatly between the examined methods. In a resampling-based simulation study, the p-values generated by WAME are found to be substantially more correct than the alternatives when a relatively small proportion of the genes is regulated. WAME is also shown to have higher power than the other methods. WAME is available as an R-package.
Conclusions:
The WAME procedure is generalized and the limitation to paired-design microarray datasets is removed. The examined other methods produce invalid p-values in many cases, while WAME is shown to produce essentially valid p-values when a relatively small proportion of genes is regulated. WAME is also shown to have higher power than the examined alternative methods.
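The weighting idea described in the abstract above can be illustrated with a small sketch in base R. It does not use the WAME package or its functions; the covariance matrix `Sigma` and the measurements `x` are made-up values, and the estimator shown is the standard generalised least squares mean for a known covariance matrix, used here only as an assumption-laden stand-in for the kind of weighting that unequal variances and correlations induce.

```r
# Toy illustration of covariance-based weighting (not the WAME package API).
# Sigma is a hypothetical covariance matrix for three arrays:
# arrays 1 and 2 are positively correlated, array 3 is noisier.
Sigma <- matrix(c(1.0, 0.5, 0.0,
                  0.5, 1.0, 0.0,
                  0.0, 0.0, 4.0),
                nrow = 3, byrow = TRUE)

ones <- rep(1, 3)

# Generalised least squares weights for a common mean:
# w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)
Sinv1 <- solve(Sigma, ones)
w <- Sinv1 / sum(Sinv1)

# Hypothetical log-ratios for one gene on the three arrays.
x <- c(1.2, 0.8, 3.0)

weighted_mean <- sum(w * x)   # down-weights the noisy array 3
plain_mean <- mean(x)         # treats all arrays equally

# Variance of each estimator when the true covariance is Sigma.
var_weighted <- 1 / sum(Sinv1)
var_plain <- as.numeric(t(ones) %*% Sigma %*% ones) / length(x)^2
```

In this toy setting the weighted estimator has a smaller variance than the unweighted mean. In practice the covariance matrix is not known and has to be estimated from the data, which the sketch leaves out.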
Erik Kristiansson*, Anders Sjögren, Mats Rudemo, and Olle Nerman,
Statistical Applications in Genetics and Molecular Biology: 5(1), Article 10.
* Corresponding author: erikkr@math.chalmers.se.
Abstract
In microarray experiments, several steps may cause sub-optimal quality and the need for quality control is strong. Often the experiments are complex, with several conditions studied simultaneously. A generalised linear model for paired microarray experiments is proposed as a generalisation of the paired two-sample method by Kristiansson et al. (2005). Quality variation is modelled by different variance scales for different (pairs of) arrays, and shared sources of variation are modelled by covariances between arrays. The gene-wise variance estimates are moderated in an empirical Bayes approach. Due to correlations, all data is typically used in the inference of any linear combination of parameters. Both real and simulated data are analysed. Unequal variances and strong correlations are found in real data, leading to further examination of the fit of the model and of the nature of the datasets in general. The empirical distributions of the test statistics are found to have a considerably improved match to the null distribution compared to previous methods, which implies more correct p-values provided that most genes are non-differentially expressed. In fact, assuming independent observations with identical variances typically leads to optimistic p-values. The method is shown to perform better than the alternatives in the simulation study.
Supplementary figures
ApoAI cross-plot
Cardiac cross-plot
Erik Kristiansson*, Anders Sjögren*+, Mats Rudemo, and Olle Nerman,
Statistical Applications in Genetics and Molecular Biology: 4(1), Article 30.
* Both authors contributed equally, order was randomised.
+ Corresponding author: anders.sjogren@math.chalmers.se.
Abstract
In microarray experiments quality often varies, for example between samples and between arrays. The need for quality control is therefore strong. A statistical model and a corresponding analysis method are suggested for experiments with pairing, including designs with individuals observed before and after treatment and many experiments with two-colour spotted arrays. The model is of mixed type with some parameters estimated by an empirical Bayes method. Differences in quality are modelled by individual variances and correlations between repetitions. The method is applied to three real and several simulated datasets. Two of the real datasets are of Affymetrix type with patients profiled before and after treatment, and the third dataset is of two-colour spotted cDNA type. In all cases, the patients or arrays had different estimated variances, leading to distinctly unequal weights in the analysis. We also suggest plots which illustrate the variances and correlations that affect the weights computed by our analysis method. For simulated data the improvement relative to previously published methods without weighting is shown to be substantial.
The supplementary source code is now deprecated and the functionality is available through the R package above.