Tuesday, August 18, 2015

demo: permutation tests for within-subjects cross-validation

This R code demonstrates how to carry out a permutation test at the group level when using within-subjects cross-validation. I describe permutation schemes in a pair of PRNI conference papers: see DOI:10.1109/PRNI.2013.44 for an introduction and single-subject analyses, and this one for the group level, including the fully balanced within-subjects approach used in the demo. A blog post from a few years ago also describes some of the issues, using examples structured quite similarly to this demo.

For this demo I used part of the dataset from doi:10.1093/cercor/bhu327, which can be downloaded from the OSF site. I did a lot of temporal compression with this dataset, which is useful for the demo, since only about 18 MB of files need to be downloaded. Unfortunately, none of the analyses we did for the paper are quite suitable for demonstrating simple permutation testing with within-subjects cross-validation, so this demo performs a new analysis. The demo analysis is valid, just not really sensible for the paper's hypotheses (so don't be confused when you can't find it in the paper!).


The above figure is generated by the demo code, and shows the results of the test. The demo uses 18 subjects' data, and their null distributions are shown as blue histograms. The true-labeled accuracy for each person is plotted as a red line, and listed in the title, along with its p-value, calculated from the shown null distribution (the best-possible p-value, 1/2906, rounds to 0).

The dataset used for the demo has no missing data: each person has six runs, with four examples (two of each class) in each run. Thus, I can use a single set of labels for the permutation test, carrying out the relabelings and classifications in each person individually (since it's within-subjects cross-validation), but with the null distribution for each person built from the same relabelings. Using the same relabelings in each person allows the group-level null distribution (green, in the image above) to be built from the across-subjects average accuracy for each relabeling. In a previous post I called this fully-balanced strategy "single corresponding stripes", illustrated with the image below; see that post (or the demo code) for more detail.

The histogram of the across-subjects means (green histogram; black column) is narrower than the individual subjects' histograms. This is sensible: for any particular permutation relabeling, one person might have a high accuracy and another a low accuracy; averaging the values together gives a value closer to chance. Rephrased, each individual has at least one permutation with very low (0.2) accuracy (as can be seen in the blue histograms), but different relabelings produced that low accuracy in different people, so the lowest group-mean accuracy was only 0.4.
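To make the bookkeeping behind these histograms concrete, here's a minimal sketch of the shared-relabeling scheme (not the actual demo code: sub.data, perm.labels, and run.cv are hypothetical placeholders, with run.cv standing in for the within-subjects cross-validation of one person's data with one label vector):

 # true.labels: the single label vector, valid for every person (no missing data)
 # perm.labels: one row per relabeling; the SAME relabelings are used for every person
 n.subs <- 18;
 n.perms <- nrow(perm.labels);
 true.acc <- rep(NA, n.subs);
 perm.acc <- array(NA, c(n.subs, n.perms));
 for (s in 1:n.subs) {   # classifications are carried out within each person separately
   true.acc[s] <- run.cv(sub.data[[s]], true.labels);
   for (p in 1:n.perms) {
     perm.acc[s,p] <- run.cv(sub.data[[s]], perm.labels[p,]);   # perm.acc[s,] is person s's blue histogram
   }
 }
 group.true <- mean(true.acc);       # true-labeled across-subjects mean accuracy
 group.null <- colMeans(perm.acc);   # the green histogram: one group mean per relabeling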

The true-labeled group mean of 0.69 was higher than all of the permuted-label group means, giving a p-value of 0.0003 = 1/2906 (all 2906 possible permutation relabelings were run). The equivalent t-test is shown in the last panel, and also produces a very significant p-value.
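The group-level p-value is just where the true-labeled group mean falls within that null distribution; a sketch, continuing from the hypothetical variables above (counting conventions vary slightly, so the demo code may tally it a bit differently):

 n.perms <- length(group.null);                        # 2906 relabelings here
 p.group <- sum(group.null >= group.true) / n.perms;   # proportion of permuted group means at or above the true one
 if (p.group == 0) { p.group <- 1/n.perms; }           # best-possible p-value when no permuted mean reaches the true mean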

Friday, August 14, 2015

start planning: PRNI 2016

Start planning: the 6th International Workshop on Pattern Recognition in Neuroimaging will be held at the University of Trento (Italy), (probably) 21-24 June 2016. PRNI is an OHBM satellite conference, generally held several days before and somewhat geographically near the OHBM conference, which next year is in Geneva, Switzerland, 26-30 June 2016. Once the hosting is sorted out, the website will be linked (as for past meetings) from www.prni.org.

The paper deadline will be mid-March 2016, perhaps with a later deadline for short abstract-only poster submissions. Previous workshops' papers were published by IEEE; search for "Pattern Recognition in Neuroimaging". Conference papers are standard in many engineering-related fields, but might be unfamiliar to people from a psychology or neuroscience background. Basically, PRNI papers are "real" publications: they are peer-reviewed, published in proceedings, indexed, and citable. The paper guidelines aren't settled yet, but will likely be similar to those of previous years. MVPA methods papers (applications, statistical testing, ...) are a good fit for PRNI; not just fMRI, but any neuroimaging modality is welcome.

Monday, August 3, 2015

R? python? MATLAB?

I've been asked a few times why I use R for MVPA, and what I think people just getting into MVPA should use. I don't think that there is a universally "best" package for MVPA (or neuroimaging, or statistics), but here are some musings.

The question as to why I personally started using R for MVPA is easy: I started before MVPA packages were available, so I had to write my own scripts, and I prefer scripting in R (then and now). Whether to keep using my own scripts or switch to pyMVPA (or some other package) is something I reconsider occasionally.

A very big reason to use an established package is that it's a known quantity: coding bugs have hopefully been caught, and analyses can be reproduced. Some packages are more open (and have more stringent tests) than others, but in general, the more eyes that have studied the code and tried the routines, the better. This need for openness was one of my motivations for starting this blog: to post bits of code and detailed methods descriptions. I think the more code and details we share (blog, OSF, github, whatever), the better, regardless of what software we use (and I wish code was hosted by journals, but that's another issue).

I'm a very, very big fan of using R for statistical analyses, and of knitr (sweave and RMarkdown are also viable options in R) for documenting the various analyses, results, impressions, and decisions as the research progresses (see my demo here). My usual workflow is to switch to knitr once an analysis reaches the "what happened?" stage, regardless of the program that generated the data being analyzed (e.g., I have knitr files summarizing the motivation and procedure for, and calculating the results of, cvMANOVA analyses run in MATLAB). Python has the iPython Notebook, which is somewhat similar to knitr (not as aesthetically pleasing, I think, but that's a matter of taste); I don't think MATLAB has anything equivalent. Update 12 August 2015 (Thanks, Dov!): MATLAB comes with Publishing Markup, which (at a quick glance) looks similar to RMarkdown.
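For flavor, here's a tiny sketch of the sort of R Markdown file I mean (file and column names are hypothetical); knitting it (e.g., with rmarkdown::render("summary.Rmd")) produces a single report interleaving notes, code, numbers, and figures:

 ---
 title: "cvMANOVA analyses: what happened?"
 output: html_document
 ---
 Notes on the motivation, procedure, and impressions go here as ordinary text.
 ```{r}
 acc <- read.csv("accuracies.csv");   # hypothetical results file written out by the MATLAB analysis
 summary(acc$accuracy);
 hist(acc$accuracy, main="accuracy across subjects");
 ```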

All neuroimaging (and psychology, neuroscience, ...) graduate students should expect to learn a proper statistical analysis language, by which I mostly mean R, with MATLAB and python coming in as secondary options. In practice, if you have proficiency in one of these programs you can use the others as needed (the syntax isn't that different), or have them work together (e.g., calling MATLAB routines from R; calling R functions from python). The exact same MVPA can be scripted in all three languages (e.g., read in NIfTI images, fit a linear SVM, write the results into a file), and I don't see that any one of the three is clearly best or worst. MATLAB has serious licensing issues (and expense), and python dependencies can be a major headache, but which program is favored seems to go more with field (engineers favor MATLAB, statisticians R) and personal preference than with intrinsic qualities.

So, what should a person getting started with MVPA use? I'd say an R, python, or MATLAB-based package/set of scripts, with the exact choice depending on (probably most important!) what your colleagues are using, personal preference and experience (e.g., if you know python inside and out, try pyMVPA), and what software you're using for image preprocessing (e.g., if SPM, try PRoNTO). Post-MVPA (and non-MVPA) investigations will likely involve R at some point (e.g., for fitting mixed models or making publication-quality graphs), since it has the most comprehensive set of functions (statisticians favor R), but that doesn't mean everything needs to be run in R.

But don't start from scratch; use existing scripts/programs/functions as much as possible. You should mostly be writing code for analysis-specific things (e.g., the cross-validation scheme, which subjects are patients, which ROIs to include), not general things (like reading NIfTI images, training an SVM, fitting a linear model). Well-validated functions exist for those more general things (e.g., oro.nifti, libsvm); use them.
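As a sketch of that division of labor in R (file names, labels, and run structure here are hypothetical; oro.nifti and e1071, an R interface to libsvm, are real packages), the packages handle the general steps, and only the last few lines are analysis-specific:

 library(oro.nifti);   # well-validated NIfTI reading
 library(e1071);       # svm() wraps libsvm
 
 # general things: existing functions read the images and fit the classifier
 msk <- readNIfTI("ROI_mask.nii.gz", reorient=FALSE);
 img <- readNIfTI("sub1_examples.nii.gz", reorient=FALSE);   # 4D: one volume per example
 vox <- which(msk == 1);   # linear indices of the in-ROI voxels
 dat <- t(sapply(1:dim(img)[4], function(i) { img[,,,i][vox]; }));   # examples (rows) by voxels
 
 # analysis-specific things: the labels, run assignments, and cross-validation scheme
 lbls <- rep(c("a","b"), dim(dat)[1]/2);   # hypothetical class labels
 runs <- rep(1:6, each=dim(dat)[1]/6);     # hypothetical run assignments
 accs <- rep(NA, 6);
 for (r in 1:6) {   # leave-one-run-out cross-validation
   fit <- svm(dat[runs != r,], as.factor(lbls[runs != r]), kernel="linear", cost=1);
   accs[r] <- mean(predict(fit, dat[runs == r,]) == lbls[runs == r]);
 }
 mean(accs);   # mean accuracy across the cross-validation folds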

Monday, July 6, 2015

Notes from the OHBM 2015 "Statistical Assessment of MVPA Results" Morning Workshop

Thanks to everyone that attended and gave feedback on the OHBM morning workshop "Statistical Assessment of MVPA Results" that Yaroslav Halchenko and I organized! We've received several requests for slides and materials related to the workshop, so I'll collect them here. It appears that material from the meeting will also be searchable from links on the main OHBM 2015 page. As always, all rights are reserved, and we expect to be fully cited, acknowledged, and consulted for any uses of this material.

I started the workshop off with a tutorial on permutation testing aimed at introducing issues particularly relevant for MVPA (and neuroimaging datasets in general). I'll eventually post a version of the slides, but some of the material is already available in more detail in two PRNI conference papers:
  • Etzel, J.A., 2015. MVPA Permutation Schemes: Permutation Testing for the Group Level. 5th International Workshop on Pattern Recognition in NeuroImaging (PRNI 2015). Stanford, CA, USA. In press; full text here, and in ResearchGate.
  • Etzel, J.A., Braver, T.S., 2013. MVPA Permutation Schemes: Permutation Testing in the Land of Cross-Validation. 3rd International Workshop on Pattern Recognition in NeuroImaging (PRNI 2013). IEEE, Philadelphia, PA, USA. DOI:10.1109/PRNI.2013.44. Full text here, and in ResearchGate.
Next, Johannes Stelzer gave a talk entitled "Nonparametric methods for correcting the multiple comparisons problem in classification-based fMRI", the slides for which are available here.

Then, Nikolaus Kriegeskorte gave a talk entitled "Inference on computational models from predictions of representational geometries", the slides for which are available here.

Finally, Yaroslav Halchenko finished the session with a talk giving an "Overview of statistical evaluation techniques adopted by publicly available MVPA toolboxes", the slides for which are available here.

Monday, May 18, 2015

resampling images with wb_command -volume-affine-resample

I often need to resample images without performing other calculations, for example, making a 3x3x3 mm voxel version of an anatomical image with 1x1x1 mm voxels for use as an underlay. This can be done with ImCalc in SPM, but that's a bit annoying, as it requires firing up SPM, and only outputs two-part NIfTI images (minor annoyances, but still).

The wb_command -volume-affine-resample program gets the resampling done at the command prompt with a single long command:

 wb_command -volume-affine-resample d:/temp/inImage.nii.gz d:/temp/affine.txt d:/temp/matchImage.nii CUBIC d:/temp/outImage.nii  

If the wb_command program isn't on your path, run the command at the command prompt from wherever wb_command.exe (or the equivalent for your platform) is installed. A lot of things need to be specified:
  • inImage.nii.gz is the image you want to resample (for example, the 1x1x1 mm anatomical image)
  • affine.txt is a text file with the transformation to apply (see below)
  • matchImage.nii is the image with the dimensions you want the output image to have - what inImage should be transformed to match (for example, the 3x3x3 mm functional image)
  • CUBIC is how to do the resampling; other options are TRILINEAR and ENCLOSING_VOXEL
  • outImage.nii is the new image that will be written: inImage resampled to match matchImage; specifying outImage.nii.gz instead will cause a gzipped NIfTI to be written.
The program writes outImage as a one-file (not a header-image pair) NIfTI. It takes input images as both compressed (i.e., .nii.gz) and uncompressed (i.e., .nii) one-file NIfTIs, but didn't like a header-image pair for input.

You need to specify an affine transform, but here I don't want to warp anything, so the matrix is the identity (all 1s and 0s, with zero translation); just put this matrix into a plain text file (I called it affine.txt):
 1 0 0 0  
 0 1 0 0  
 0 0 1 0  

UPDATE 20 May 2015: Changed the resampling method to CUBIC and added a note that the program can output compressed images, as suggested by Tim Coalson.

Friday, May 15, 2015

MVPA on the surface: to interpolate or not to interpolate?

A few weeks ago I posted about a set of ROI-based MVPA results using HCP images, comparing the results of doing the analysis with the surface or volume version of the dataset. As mentioned there, there hasn't been a huge amount of MVPA with surface data, but there has been some, particularly using the algorithms in Surfing (they're also in pyMVPA and CoSMoMVPA), described by Nikolaas Oosterhof and colleagues (Oosterhof et al., 2011).

The general strategy in all MVPA (volume or surface) is usually to change the fMRI timeseries as little as possible; motion correction is pretty much unavoidable, but is sometimes the only whole-brain image manipulation applied: voxels are kept in the acquired resolution, not smoothed, not slice-time corrected, not spatially normalized to an atlas (i.e., each individual analyzed in their own space, allowing the people to have differently-shaped brains). The hope is that this minimal preprocessing will maximize spatial resolution: since we want to detect voxel-level patterns, let's change the voxels as little as possible.

The surface searchlighting procedure in Surfing follows this minimum-voxel-manipulation strategy, using a combination of surface and volume representations: voxel timecourses are used, but adjacency is determined from the surface representation. Rephrased, even though the searchlights are drawn following the surface (using a high-resolution surface representation), the functional data is not interpolated, but rather kept as voxels: each surface vertex is spatially mapped to a voxel, allowing multiple vertices to fall within a single voxel in highly folded areas. Figure 2 from the Surfing documentation shows this dual surface-and-volume way of working with the data, and describes the voxel selection procedure in more detail. In the terms I've used to describe my own searchlight code, the Surfing procedure results in a lookup table (listing which voxels constitute the searchlight for each center voxel), with the searchlights shaped to follow the surface in a particular way.
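In case the lookup table idea isn't familiar, here's a sketch of how I use one (lookup.tbl, dat, lbls, and run.cv are hypothetical placeholders; Surfing, pyMVPA, and CoSMoMVPA build the actual surface-derived neighborhoods). The table has one row per center voxel, each row listing the voxels in that center's searchlight, NA-padded since surface-following searchlights vary in size:

 # lookup.tbl rows hold the voxel indices in each center's searchlight, chosen by
 # following the surface, so two voxels that are close in i,j,k space can have
 # quite different searchlights.
 sl.acc <- rep(NA, nrow(lookup.tbl));
 for (v in 1:nrow(lookup.tbl)) {
   sl.vox <- lookup.tbl[v, !is.na(lookup.tbl[v,])];   # this searchlight's voxels
   sl.acc[v] <- run.cv(dat[, sl.vox], lbls);   # classify using only those voxels' (uninterpolated) timecourses
 }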

It should be possible to do this (Surfing-style, surface searchlights with voxel timecourses) with the released HCP data. The HCP volumetric task-fMRI images are spatially normalized to the MNI atlas, which will simplify things, since the same lookup table can be used with all people, though possibly at the cost of some spatial normalization-caused distortions. [EDIT 17 May 2015: Nick Oosterhof pointed out that even with MNI-normalized volumetric fMRI data, the subject-space surfaces could be used to map adjacent vertices, in which case each person would need their own lookup table. With this mapping, the same i,j,k-coordinate voxel could have different searchlights in different people.]

The HCP task fMRI data is also available as (CIFTI-format) surfaces, which were generated by resampling the (spatially-normalized) voxels' timecourses into surface vertices. The timecourses in the HCP surface fMRI data have thus been interpolated several times, including to volumetric MNI space and to the vertices.

Is this extra interpolation beneficial or not? Comparisons are needed, and I'd love to hear about any if you've tried them. The ones I've done so far are with comparatively large parcels, not searchlights, and certainly not the last word.

grey matter musings

fMRI data is always acquired as volumes, usually (in humans) with voxels something like 2x2x2 to 4x4x4 mm in size. Some people have argued that, for maximum power, analyses should concentrate on the grey matter, ideally as surface representations. This strikes me as a bit dicey: fMRI data is acquired at the same resolution all over the brain; it isn't more precise where the brain is more folded (areas with more folding have closer-spaced vertices in the surface representation, so multiple vertices can fall within a single voxel).

But how much of a problem is this? How does the typically-acquired fMRI voxel size compare to the size of the grey matter? Trying to separate out fMRI signals from the grey matter is a very different proposition if something like ten voxels typically fit within the ribbon vs. just one.

Fischl and Dale (2000, PNAS, "Measuring the thickness of the human cerebral cortex from magnetic resonance images") answer my basic question of how wide the grey matter typically is in adults: 2.5 mm. Their Figure 3 shows the histogram of grey matter thickness they found in one person's cortex; in that person, "More than 99% of the surface is between 1- and 4.5-mm thick."

So, it's more typical for the grey matter to be one fMRI voxel wide than multiple. A 4x4x4 mm functional voxel will be wider than nearly all grey matter; most voxels overlapping the grey matter will contain only a fractional proportion of grey matter, not grey matter alone. Things are better with 2x2x2 mm acquired voxels, but a voxel falling completely into the grey matter will still be fairly unusual, and even these totally-grey voxels will be surrounded on several sides by non-grey-matter voxels. To make it concrete, here's a sketch of common fMRI voxel sizes on a perfectly straight grey matter ribbon.



This nearness of all-grey, some-grey, and no-grey voxels is problematic for analysis. An obvious issue is blurring from motion: head motion of a mm or two within a run is almost impossible to avoid, and will totally change the proportion of grey matter within a given voxel. Even if there were no motion at all, the differing proportions of grey matter cause problems ("partial volume effects"; see for example): if all the signal came from the grey matter, the furthest-right 2 mm voxels in the image above would be less informative than the adjacent 2 mm voxel centered in the grey, just because of the differing proportion of grey. Field inhomogeneity effects, scanner drift, slice-time correction, resampling, smoothing, spatial normalization, etc. cause further blurring.
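To put a rough number on how unusual an all-grey voxel is, here's a crude back-of-envelope calculation (one dimension only, a perfectly straight ribbon at the 2.5 mm average thickness, and voxel centers falling uniformly within the ribbon; real cortical geometry is of course far messier):

 ribbon <- 2.5;   # average grey matter thickness, mm (Fischl & Dale 2000)
 for (vox.size in c(2, 3, 4)) {   # common acquired voxel widths, mm
   # proportion of in-ribbon center positions for which the whole voxel lies within the ribbon
   prop.all.grey <- max(0, ribbon - vox.size) / ribbon;
   cat(vox.size, "mm voxels: proportion entirely grey =", round(prop.all.grey, 2), "\n");
 }
 # 2 mm: 0.2; 3 mm and 4 mm: 0 -- most voxels touching the grey matter are only partly grey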

But the cortical grey matter is of course not perfectly flat like in the sketch: it's twisted and folded in three dimensions, as shown here in Figure 1 from Fischl and Dale (2000). This folding complicates things further: individual voxels still have varying amounts of grey matter, but can also encompass structures that are far apart when measured along the surface.





This figure shows panels C (left) and D (right) from Figure 2 of Kang et al. (2007, Magnetic Resonance Imaging, "Improving the resolution of functional brain imaging"), and illustrates some of these "complications". The yellow outline at left is the grey-white boundary on an anatomical image (1x1x1 mm), with two functional voxels superimposed, one in red and one in green (the squares mark the voxels' corners; they had 1.88x1.88x5 mm functional voxels). The right panel shows the same two voxels' locations on a surface flat map (dark areas grey matter, light areas white matter). In their words, "Although the centers of the filled squares in the corners of the red and green functional voxels in (C) are the same distance apart in the 3-D space and points in the same voxel must be within 5.35 mm, functional activations in the red voxel spread to areas over 30 mm apart on the flat map, while activations in the green voxel remain close to each other."

Volume-to-surface mapping algorithms and processing pipelines attempt to minimize these problems, but there's no perfect solution: acquired voxels will necessarily not perfectly fall within the grey matter ribbon. We shouldn't allow the perfect to be the enemy of the good (no fMRI research would ever occur!) and give up on grey matter-localized analyses entirely, but we also shouldn't discount or minimize the additional difficulties and assumptions in surface-based fMRI analysis.