Tuesday, March 11, 2014

Allefeld 2014: Searchlight-based multi-voxel pattern analysis of fMRI by cross-validated MANOVA

A recent paper by Carsten Allefeld and John-Dylan Haynes, "Searchlight-based multi-voxel pattern analysis of fMRI by cross-validated MANOVA" (see full citation below), caught my eye. The paper advocates using a MANOVA-related statistic for searchlight analysis instead of classification-based statistics (like linear SVM accuracy). Carsten implemented the full procedure for SPM8 and Matlab; the code is available on his website.

In this post I'm going to describe the statistic proposed in the paper, leaving the discussion of when (sorts of hypotheses, dataset structures) a MANOVA-type statistic might be most suitable for a (possible) later post. There's quite a bit more in the paper (and to the method) than what's summarized here!

MANOVA-related statistics have been used/proposed for searchlight analysis before, including in Kriegeskorte's original searchlight paper and implementations in BrainVoyager and pyMVPA. From what I can tell (please let me know otherwise), these previous MANOVA searchlights fit the MANOVA on the entire dataset at once: all examples/timepoints, no cross-validation. Allefeld and Haynes propose doing MANOVA-type searchlights a bit differently: “cross-validated MANOVA” and “standardized pattern distinctness”.

Most of the paper's equations review multivariate statistics and the MANOVA; the “cross-validated MANOVA” and “standardized pattern distinctness” proposed in the paper are in equations 14 to 17:
  • Equation 14 is the equation for the Hotelling-Lawley Trace statistic, which Allefeld refers to as D ("pattern distinctness").
  • Equation 15 shows how Allefeld and Haynes propose to calculate the statistic in a "cross-validated" way. Partitioning on the runs, they obtain a D for each partition by calculating the residual sum-of-squares matrix (E) and the first part of the H equation from the not-left-out-runs, but the second part of the H equation from the left-out run.
  • Equation 16 averages the D from each "cross-validation" fold, then multiplies the average by a correction factor calculated from the number of runs, voxels, and timepoints.
  • Finally, equation 17 is the equation for “standardized pattern distinctness”: dividing the value from equation 16 by the square root of the number of voxels in the searchlight.
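As a little R sketch of how equations 16 and 17 fit together (the numbers are hypothetical; fold.Ds, correction.factor, and n.voxels are my placeholder names, and I've set the correction factor to 1 rather than reproducing its exact formula from the paper):

```r
fold.Ds <- c(0.8, 1.1, 0.95);  # hypothetical per-fold D values (equation 15)
correction.factor <- 1;  # placeholder; the paper derives it from the number of runs, voxels, and timepoints
n.voxels <- 27;  # searchlight size

D.hat <- mean(fold.Ds) * correction.factor;  # equation 16: average over folds, then correct
D.std <- D.hat / sqrt(n.voxels);  # equation 17: standardized pattern distinctness
```
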
To understand the method a bit better I coded up a two-class version in R, using the same toy dataset as my searchlight demo. Note that this is a minimal example to show how the "cross-validation" works, not necessarily what would be useful for an actual analysis, and not showing all parts of the procedure.

The key part from the demo code is below. The dataset is a matrix, with the voxels (for a single searchlight) in the columns and the examples (volumes) in the rows. There are two classes, "a" and "b". For simplicity, the "left-out run" is called "test" and the others "train", though this is not training and testing as meant in machine learning. train.key is a vector giving the class labels for each row of the training dataset.
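If you'd like to run the snippets below without the demo files, a random toy dataset in the same layout can be generated like this (the sizes and seed are arbitrary; only the variable names train.data, test.data, train.key, and test.key are assumed by the later code):

```r
set.seed(42);  # arbitrary seed, for reproducibility

n.voxels <- 5;  # voxels in this searchlight
n.train <- 12;  # examples (rows) in the not-left-out runs
n.test <- 4;    # examples (rows) in the left-out run

# random "fMRI" data: examples in the rows, voxels in the columns
train.data <- matrix(rnorm(n.train * n.voxels), nrow=n.train);
test.data <- matrix(rnorm(n.test * n.voxels), nrow=n.test);

# class labels for each row: half "a", half "b"
train.key <- rep(c("a", "b"), each=n.train/2);
test.key <- rep(c("a", "b"), each=n.test/2);
```
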

For Hotelling's T2 we first calculate the "pooled" sample covariance matrix in the usual way, but using the training data only:
S123 <- ((length(which(train.key == "a")) - 1) * var(train.data[which(train.key == "a"),]) +
         (length(which(train.key == "b")) - 1) * var(train.data[which(train.key == "b"),])) /
        (length(which(train.key == "a")) + length(which(train.key == "b")) - 2);

To make the key equation more readable we store the total number of "a" and "b" examples:
a.count <- length(which(train.key == "a")) + length(which(test.key == "a"));
b.count <- length(which(train.key == "b")) + length(which(test.key == "b"));

and the across-examples mean vectors:
a.test.mean <- apply(test.data[which(test.key == "a"),], 2, mean);
b.test.mean <- apply(test.data[which(test.key == "b"),], 2, mean);

a.train.mean <- apply(train.data[which(train.key == "a"),], 2, mean);
b.train.mean <- apply(train.data[which(train.key == "b"),], 2, mean);

Now we can calculate Hotelling's T2 (D) for this "cross-validation fold" (note that solve(S123) returns the inverse of the matrix S123):
((a.count*b.count)/(a.count+b.count)) * (t(a.train.mean-b.train.mean) %*% solve(S123) %*% (a.test.mean-b.test.mean));

The key is that, paralleling equation 15, the covariance matrix is computed from the training data, multiplied on the left by the mean difference vector from the training data, then on the right by the mean difference vector from the testing data.
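For reference, the pieces above can be collected into a single per-fold function; this is just my minimal two-class sketch (the function name fold.D is made up, and this is not the full cvMANOVA code, which estimates covariance from the GLM residuals and handles arbitrary contrasts):

```r
# cross-validated Hotelling's T2-style statistic for one fold;
# data matrices have the examples in the rows and voxels in the columns
fold.D <- function(train.data, train.key, test.data, test.key) {
  a.train <- train.data[train.key == "a", , drop=FALSE];
  b.train <- train.data[train.key == "b", , drop=FALSE];
  a.test <- test.data[test.key == "a", , drop=FALSE];
  b.test <- test.data[test.key == "b", , drop=FALSE];

  # pooled sample covariance, from the training data only
  S <- ((nrow(a.train) - 1) * var(a.train) + (nrow(b.train) - 1) * var(b.train)) /
       (nrow(a.train) + nrow(b.train) - 2);

  a.count <- nrow(a.train) + nrow(a.test);  # total "a" examples
  b.count <- nrow(b.train) + nrow(b.test);  # total "b" examples

  # training-set mean difference on the left, test-set mean difference on the right
  ((a.count * b.count) / (a.count + b.count)) *
    (t(colMeans(a.train) - colMeans(b.train)) %*% solve(S) %*%
       (colMeans(a.test) - colMeans(b.test)));
}
```

Looping this function over the runs (each run serving once as the left-out "test" set) and averaging the returned values would parallel the fold-averaging in equation 16.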

Should we think of this way of splitting the Hotelling-Lawley Trace calculation as cross-validation? It is certainly similar: a statistic is computed on data subsets, then combined over the subsets. It feels different to me though, partly because the statistic is calculated from the "training" and "testing" sets together, and partly because I'm not used to thinking in terms of covariance matrices. I'd like to explore how the statistic behaves with different cross-validation schemes (e.g. partitioning on participants or groups of runs), and how it compares to non-cross-validated MANOVA. It'd also be interesting to compare the statistic's performance to statistics that don't model the covariance, such as Gaussian Naive Bayes.

Interesting stuff; I hope this post helps you understand the procedure, and keeps us all thinking about the statistics we choose for our analyses.

Allefeld C, & Haynes JD (2014). Searchlight-based multi-voxel pattern analysis of fMRI by cross-validated MANOVA. NeuroImage, 89, 345-357. PMID: 24296330


  1. Hi Jo,

    thank you very much for publicizing our work on your blog. Just a few comments:

    – To my knowledge, MANOVA-related statistics have been used in MVPA by Kriegeskorte et al. (2006; Mahalanobis distance) and Haynes and Rees (2005; Wilks' Lambda). BrainVoyager's "MANOVA" option actually implements Mahalanobis distance and Hotelling's T-squared. As far as I know, PyMVPA only supports classifiers so far, but I believe the authors are interested in integrating cvMANOVA in the future.

    – Equation 14 is the equation for something that can be considered the "true value" of the Lawley–Hotelling trace, but normally that trace is only considered as a test statistic, i.e. computed from a sample. That's why we chose to introduce the new term "pattern distinctness" and symbol D to denote that true value, and to further distinguish between the Lawley–Hotelling trace as a biased estimator of D and our cross-validated unbiased estimator ^D.

    – As you already mentioned, there are some differences between your example of a cross-validated Hotelling's T-square and our cross-validated MANOVA. The two most important ones are: cvMANOVA uses not just an estimate of covariance pooled across the two classes in question, but utilizes the residuals of the full multivariate GLM across all "training" volumes, obtaining a much more precise covariance estimate. And, cvMANOVA works not just for two classes, but for arbitrary contrasts in a complex experimental design.

    – It is true that cross-validation is most often used in a classifier-training & -testing context, but the concept is more general than that. Wikipedia describes it as a procedure for model validation that guards against overfitting, and gives an initial example that doesn't use classification, but linear regression. Overfitting in that case leads to an underestimation of the model error as measured by the MSE. Cross-validation removes most of that estimation bias, and the remaining part is corrected by a factor that depends on the number of data points and the number of regressors. Though our procedure might not be quite as transparent, the correspondence should be clear. Instead of cross-validating residual error, we essentially use a *product* of the estimated effect Beta_Delta between "training" and "test" data, which can be seen as a "correlation"-type measure of model fit, leading to an unbiased estimator of explained variance.


    1. Thanks for your clarifying comments! I'll respond a bit to the last one here.

      My idea of cross-validation comes from the way it is framed in a machine learning context (e.g. Hastie's "The Elements of Statistical Learning"), where keeping the training and testing sets totally independent on each fold is of prime importance; hints that the training and testing sets are mixed together at any stage of analysis are a major red flag.

      But perhaps a more precise way to see the difference between your cvMANOVA and a “standard” leave-one-run-out linear SVM MVPA is not that the cvMANOVA *mixes* the training and testing sets, but rather that the statistic for each fold is calculated from all test examples *simultaneously*.

      When using a linear SVM we give the algorithm the training dataset, and it returns the weights (w) and constant (b) of the decision hyperplane (e.g. http://mvpa.blogspot.com/2014/02/code-snippet-extracting-weights-from.html). We then plug each test set example into that hyperplane equation and record if it gives the correct class (or not), then we average these proportions across the cross-validation folds.

      But I think we could describe your procedure in the same terms, and with a diagram like Figure 2 in http://mvpa.blogspot.com/2013/06/mvpa-permutation-schemes-permutation.html: instead of getting the equation for a decision hyperplane from the training data we get the matrix (for the little two-class example in the post) produced from ((a.count*b.count)/(a.count+b.count)) * (t(a.train.mean-b.train.mean) %*% solve(S123)). Then, instead of calculating the proportion of test set examples classified correctly (one at a time through the fit hyperplane equation), we do the matrix multiplication with the entire test set at once to produce the statistic for the cross-validation fold. These statistics (proportion correct or pattern distinctness) are then averaged across the folds and carried to the next stage of analysis.

      So, I am beginning to warm to the idea that your cvMANOVA is doing cross-validation in a sensible manner. ;)

    2. Exactly. Training and test data have to be kept separate in the sense that the model built from (the classifier trained on) the training data must not be influenced by the test data, and the test data used to validate the model (test the classifier) must not be modified based on properties of the training data (or the model / classifier). But what comes out of the cross-validation procedure, the accuracy or our ^D, is always (an average of fold-wise) functions of training *and* test data. If it were only a function of one, the other part of the data would be wasted. :-)

      Concerning "simultaneously": maybe here the difference between us is that I'm very much used to decoding of run-wise beta estimates, which is the standard in our lab, and therefore for two classes there are only two test data points in each fold. So in a way the test statistic is computed "simultaneously" for all trials of one class, and again "simultaneously" for all trials of the second class.