Wednesday, May 10, 2017

task fMRI motion censoring (scrubbing) #3: impact

 This is the third post in a series, which started with what we were aiming for with the censoring, then how we implemented it for a dataset. Here, I'll show an example of how the censoring changed the GLM results. The GLMs were run in afni, using 3dDeconvolve and TENT functions; the task timing is a complex mixed design (trials embedded in blocks), so the model has both event-related and block-related ("sustained activity") components.

Setting our censoring threshold to FD > 0.9 removed very few frames; less than 2% in most runs, and never more than 12%. I wondered if such a modest amount of censoring would have a noticeable impact, but it looks like it did (which is a bit of a reminder of GLM sensitivity, but that's another issue).

Here's an example person. In the Stroop task for this person 1 frame was censored in the Bas session, 15 in the Pro, and 0 in the Rea. There are more than 1000 frames in each session, so this is not many at all. The acquisition was with the MB4 paradigm, so we have a pair of runs (AP - PA encoding) for the task each session. Here are motion and FD traces for the person, the Pro session (highest censoring); this is a pretty good set of traces, with the biggest spikes censored at FD > 0.9 (red x).

Now, here are F-statistic images for the same task and person, for the sustained ("block") effect from a GLM. The no-censoring image is first, followed by the with-censoring image. The first row for each is the Bas session, followed by Pro, then Rea; the color scaling is the same in each.

The third (Rea) row is identical in the two images: no censoring (so they better match, since each session went into a separate GLM). The top rows (Bas) look very similar, though the peak values (rng in upper right) vary slightly. The second row (Pro), with the 15 censored frames, varies quite a bit between the two. I found the area marked with the blue arrows particularly interesting: the white matter is much brighter in the no-censoring version, and this white-matter pattern isn't present in the other (less spiky motion) runs, and looks very much like an artifact (not BOLD-ish); particularly the k=44 slice.

Similar effects are seen in the TENTs: high-motion people and sessions tend to have spikier (and less HRF-like) shapes, which is ameliorated a bit by the censoring. There seems to be a bit less ringing with censoring, as well. So, while these are preliminary, qualitative assessments, I'm encouraged that this small amount of censoring may be sensible.

Thursday, May 4, 2017

task fMRI motion censoring (scrubbing) #2: implementing

In the previous post I showed some plots of motion regressors (with enorm and FD) from an ongoing study, with qualitative descriptions and examples of the sort of motion we're seeing. In this post I'll describe some of the literature about motion censoring for task fMRI, and how we decided to implement censoring for this study.

Probably the most directly relevant recent paper for censoring task fMRI datasets, and the one whose recommendations we're broadly following, is Siegel et al. (2014). They explored the effects of censoring on three datasets at various FD thresholds. As is reasonable, given the great variations in experiments, they refrain from making "universal recommendations", but do provide useful summaries and guidelines.

As in most things, there's no free lunch with censoring: increasing the amount of censoring reduces the number of trials available for response estimation, but hopefully lets those estimates be more reliable. Siegel et al. (2014) found that a threshold of FD > 0.9 did well in many cases, and generally suggest fairly light censoring - removing the highest-motion frames, not every frame with any movement (see the Discussion section, page 1994). Further, they suggest removing only the above-threshold frames, not adjacent frames (page 1992):
"In a one-factor ANOVA, the FD > 0.9, (f0, b0) mask produced significantly higher zscores than all of the other masks except FD > 1.1 mm (f0,b1) which was not significantly different. On the basis of these results, we do not recommend removing volumes proceeding or following high-motion volumes ..."
Siegel et al. (2014) didn't attempt to interpolate censored frames, citing the difficulty in accurately interpolating gaps of more than one TR. This strikes me as reasonable, particularly in task designs, where, depending on the analysis, it may be best to simply omit trials with above-threshold movement.

Setting the censoring threshold for any particular study is at least partially subjective, which is unfortunate, given the already-too-many experimenter degrees of freedom. We decided to see if the FD > 0.9 threshold suggested by Siegel et al. (2014) seemed reasonable: did it capture rare spiky motion, but not oscillations? What percentage of frames were censored? This effort is what led to the images in the previous post: I marked the censored frames on plots of each run's motion, and we judged whether the marked frames seemed reasonable. In our case, no run had more than 12% of the frames censored, and most had less than 2%, so we decided to proceed with the FD > 0.9 threshold.
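For concreteness, here is a minimal Python sketch of this sort of thresholding; it is not the code we actually ran. It assumes the commonly used FD definition (the sum of the absolute frame-to-frame differences of the six realignment parameters, with rotations converted to mm of arc on a 50 mm sphere), and that the input has translations in mm followed by rotations in degrees. The function names and the toy motion trace are made up for illustration.

```python
import numpy as np

def framewise_displacement(motion, radius_mm=50.0):
    """FD per frame from an (n_frames, 6) array: x, y, z in mm, then three rotations in degrees.

    Assumption: rotations are converted to arc length on a sphere of radius_mm.
    The first frame has no predecessor, so its FD is set to 0.
    """
    motion = np.asarray(motion, dtype=float)
    rot_mm = np.deg2rad(motion[:, 3:6]) * radius_mm   # degrees -> mm of arc
    params = np.hstack([motion[:, :3], rot_mm])
    diffs = np.abs(np.diff(params, axis=0))           # frame-to-frame changes
    return np.concatenate([[0.0], diffs.sum(axis=1)])

def censor_mask(fd, threshold=0.9):
    """True for frames to censor (FD strictly above threshold)."""
    return np.asarray(fd) > threshold

# Toy run: slow random drift plus one abrupt 2 mm jump in x at frame 500.
rng = np.random.default_rng(0)
motion = np.cumsum(rng.normal(0, 0.01, size=(1000, 6)), axis=0)
motion[500:, 0] += 2.0

fd = framewise_displacement(motion)
censored = censor_mask(fd, threshold=0.9)
print(f"{censored.sum()} of {censored.size} frames censored "
      f"({100.0 * censored.mean():.2f}%)")
```

Only the flagged frames are marked, with no neighbors added, in line with the (f0, b0) mask described in the quote above.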

Looking at papers citing Siegel et al. (2014), I found one using FD > 0.9 for censoring (Davis, Goldwater, & Giron, 2017), one with FD > 0.8 (O'Hearn et al., 2016), and one with FD > 0.5 (Bakkour et al., 2017). Others mention censoring for motion, but without giving details, and I've heard people mention censoring based on standard deviations of the estimates within the particular dataset. Censoring based on enorm values is pretty similar to the FD used by Siegel, though afni tends to recommend a smaller threshold, such as 0.3, for adult task fMRI. I don't have time to compile a summary of common enorm-based thresholds, but would be interested if someone else finds or creates one!
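To make the enorm comparison a bit more concrete, here is a small sketch. It assumes that enorm is the Euclidean norm of the frame-to-frame differences of the six motion parameters, taken as-is (translations in mm, rotations in degrees, no conversion); that is my understanding of the afni measure, but treat the details as an assumption. The function name and toy values are hypothetical.

```python
import numpy as np

def enorm(motion):
    """Euclidean norm of the frame-to-frame change in an (n_frames, 6) motion array.

    Assumption: parameters are used without unit conversion, and the first
    frame (which has no predecessor) is given a value of 0.
    """
    motion = np.asarray(motion, dtype=float)
    diffs = np.diff(motion, axis=0)
    return np.concatenate([[0.0], np.sqrt((diffs ** 2).sum(axis=1))])

# Toy trace: a 0.4 mm step in y at frame 3, and a 0.2 degree change at frame 5.
motion = np.zeros((6, 6))
motion[3:, 1] = 0.4
motion[5, 4] += 0.2

e = enorm(motion)
print("frames above 0.3:", np.flatnonzero(e > 0.3))
```

Because the norm squares each difference before summing, a frame with movement spread across several parameters gets a smaller enorm than the sum of absolute differences would give, which is one reason the two thresholds aren't interchangeable.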

A final consideration is whether to use only censoring, or censoring plus having the motion estimates as nuisance regressors in the GLM. As summarized in Siegel et al. (2014), page 1992:
"Motion censoring generally outperformed motion regressions. Censoring at FD > 0.9 mm performed significantly better than the best regression .... To see whether a combination of censoring and regression might most benefit the data, a GLM was created using the default censoring settings and regressions of the derivatives of realignment estimates. The changes in z-score produced by this GLM were not significantly different from censoring alone ..." 
We are currently including 6 motion regressors in the (3dREMLfit) GLMs, plus omitting the censored frames. We might need to reevaluate that choice at some point; we did a bit of looking at including more regressors, but haven't previously considered using only censoring.
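The combination (motion regressors in the model, plus dropping the censored frames) can be sketched with ordinary least squares. This is only an illustration of the idea under simplifying assumptions: a single boxcar task regressor, white noise, and plain OLS, none of which matches what 3dREMLfit actually does. All names and the simulated data are hypothetical.

```python
import numpy as np

def fit_glm_censored(y, task, motion, censored):
    """OLS fit of y on [intercept, task, motion columns], omitting censored frames.

    Returns the coefficient vector: intercept, task, then one per motion column.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), task, motion])
    keep = ~np.asarray(censored, dtype=bool)          # rows (frames) to retain
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return beta

# Simulated voxel: baseline 100, task effect 2, plus one large artifact at frame 50.
rng = np.random.default_rng(1)
n = 200
task = (np.arange(n) % 40 < 20).astype(float)         # 20-on / 20-off boxcar
motion = rng.normal(0, 0.05, size=(n, 6))             # stand-in motion regressors
y = 100 + 2.0 * task + rng.normal(0, 0.5, size=n)
y[50] += 40.0

censored = np.zeros(n, dtype=bool)
censored[50] = True

beta_all = fit_glm_censored(y, task, motion, np.zeros(n, dtype=bool))
beta_cens = fit_glm_censored(y, task, motion, censored)
print("task estimate without censoring:", beta_all[1])
print("task estimate with censoring:   ", beta_cens[1])
```

Dropping rows from both the data and the design matrix is the simplest way to express censoring; the remaining frames are fit as if the censored ones had never been acquired.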

Friday, April 21, 2017

task fMRI motion censoring (scrubbing) #1: categorizing

Motion ... whether caused by head movement, other movement, breathing, or something else, it is one of the banes of fMRI. Motion artifacts are a huge issue for resting state fMRI, but not only there: they cause big problems in task fMRI as well. The best thing to do, of course, is to minimize movement during acquisition, by consistent head positioning and bracing with pads (or other systems). But no system is perfect (or able to eliminate breathing and heart beats), so we need to consider motion in the analyses. Here (as usual, though it's certainly not perfect) I'll use the motion traces (by which I mean the x, y, z, roll, pitch, yaw values produced during realignment and often used as nuisance regressors) as a proxy for motion.

Before deciding on any sort of censoring scheme for a study, it's good to look at the motion from all of the people, to get an idea of general movement categories. This post will show some runs I've decided are representative; exemplars of different sorts of movement. For background, these are individual runs from a cognitive task fMRI study, mostly with an MB4 acquisition scheme (details here).

All of these plots have vertical grey lines at one-minute intervals; the runs are around 12 minutes long. The horizontal green lines show the timing of the three task blocks present in each run; tasks were presented at random times and of varying durations during these blocks. The top pane has the translation (mm) and rotation (degrees) from the Movement_Regressors.txt file produced during (HCP-style) preprocessing. The second pane has the enorm and FD versions of the same motion traces, in mm.

I'll start with really nice traces, then work through to some that are not so nice, illustrating our qualitative categorization. I think it's useful to "calibrate your eyes" in this way to have a baseline understanding of some of the data characteristics before starting serious analyses or data manipulations.

Best possible: freakishly smooth: not even 0.5 mm translation over the entire 12 minute run; the little jiggles are probably related to breathing, and are also incredibly regular.

Not perfect, but very good; isolated spiky movement. This trace has very little drifting, almost entirely regular oscillations. This is the sort of movement that seems exactly suited to motion censoring: quite nice, except for a few short periods. (The frames censored with a threshold of FD > 0.9 are marked by red x.)

The next category is traces with prominent oscillations, but otherwise pretty clean (not terribly spiky or drifting), and fairly consistent in magnitude and frequency across the run. We'll be using these types of runs without censoring in our analyses (at least for now).

Finally, there are the ones of more questionable quality and utility: numerous spikes, drifting, and/or changes in oscillation magnitude. Frames to be censored at FD > 0.9 are marked, but that's only designed to detect spikes. Slow drifts have generally been considered less problematic for task fMRI than spikes, and we generally have comparatively few drifts in this dataset, regardless.

Spiking and drifting are fairly familiar in motion traces; oscillations, less so. (Though I'm sure this sort of movement existed prior to SMS!) It is certainly possible that the oscillation changes (e.g., third image in last set, second in previous pair) reflect changes in respiration rate (perhaps at least somewhat due to entraining to the task timing), which could affect BOLD in all sorts of problematic ways, and for extended periods. We're actively looking into ways to quantify these sorts of effects and minimize (or at least understand) their impacts, but I don't think there are any simple answers. We have respiration and pulse recordings for most runs, but haven't yet been working with those in detail.

Tuesday, March 21, 2017

upcoming events: PRNI and OHBM

June will be busy for me: I'll be attending PRNI in Toronto, then on to Vancouver for OHBM. Here's a bit of a sales pitch; hope to see many of you there!

PRNI (Pattern Recognition in NeuroImaging) is a great little conference, focused on machine learning neuroimaging applications (lots of fMRI, but also EEG, MEG, etc.). It has aspects of both engineering conferences, with proceedings (you can submit a short paper - and there's still time; the deadline isn't until 14 April) and psychology conferences (you can submit a short abstract for poster presentation). We're still collecting tutorial proposals, but have a good lineup of invited speakers: Randy McIntosh, Janaina Mourao-Miranda, Rajeev Raizada, and Irina Rish. Besides the website, we put announcements on facebook and twitter (@prniworkshop).

At OHBM I'll be speaking at the PR4NI tutorial on Sunday, "A new MVPA-er’s guide to fMRI datasets". Here's the abstract: "fMRI datasets have properties which make the application of machine learning (pattern recognition) techniques challenging – and exciting! This talk will introduce some of the properties most relevant for MVPA, particularly the strong temporal and spatial dependencies inherent in BOLD imaging. These dependencies mean that some fMRI experimental designs are more suitable for MVPA than others, due, for example, to how the tasks are distributed within scanner runs. I will also introduce some of the necessary analysis choices, such as how to summarize the response in time (e.g., convolving with an HRF), which brain areas to include, and feature selection techniques."

I also organized a symposium, which will be Tuesday morning, "High resolution fMRI via multiband (SMS) acquisition: opportunities and limitations". This symposium isn't about MVPA, but rather the practicalities of high-resolution fMRI: working with fMRI datasets with small voxels (say, 2 mm or so isotropic) and multiband acquisitions is different than single-shot fMRI datasets with 3 mm voxels. I will put up a separate post with talk and speaker details soon - I think it'll be a great session. Finally, I'll be presenting a poster sometime, about MVPA-ish twin similarity analyses using parts of the HCP dataset.

Wednesday, March 1, 2017

adjusting my mental model: movement correlation after preprocessing

It's time to adjust my mental model of the fMRI signal: there's a lot more correlation with movement in the timecourses after preprocessing than I'd expected. That movement really affects fMRI is not at all new, of course, and is why including the motion regressors as covariates in GLMs is standard. But I'd pictured that after preprocessing (assuming it went well and included realignment and spatial normalization) the correlation with movement left in the voxel timecourses should be pretty low (something like normally distributed, centered on 0, ranging between -0.2 to 0.2), without much spatial structure (i.e., maybe some ringing, but fairly uniformly present over the brain). Asking around, I think this is a fairly common mental model, but it looks to be quite wrong.

For exploring, I simply used afni's 3dTcorr1D program to correlate the timecourse of every voxel in several preprocessed task fMRI datasets with each of the six motion regressors (generated during preprocessing). 3dTcorr1D makes an image with 6 entries in the 4th dimension (sub-brick, in afni-speak), one for each of the 6 motion columns; the value in each voxel is the Pearson correlation between that voxel's timecourse and the movement column. I plotted these correlations on brains, and made histograms to summarize the distribution.
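The calculation itself is simple enough to sketch in numpy, for anyone who wants to try it outside of afni. This is not 3dTcorr1D, just the same Pearson correlation written out; it assumes the brain data has already been reshaped to voxels by frames, and the function name and simulated data are hypothetical.

```python
import numpy as np

def voxelwise_motion_correlation(data, motion):
    """Pearson r between every voxel timecourse and every motion regressor.

    data:   (n_voxels, n_frames) array of preprocessed timecourses
    motion: (n_frames, n_regressors) array, e.g. the six realignment columns
    Returns an (n_voxels, n_regressors) array of correlations.
    Constant timecourses (zero variance) produce NaN.
    """
    data = np.asarray(data, dtype=float)
    motion = np.asarray(motion, dtype=float)
    dz = data - data.mean(axis=1, keepdims=True)       # center each voxel
    mz = motion - motion.mean(axis=0, keepdims=True)   # center each regressor
    dz /= np.linalg.norm(dz, axis=1, keepdims=True)    # unit length per voxel
    mz /= np.linalg.norm(mz, axis=0, keepdims=True)    # unit length per regressor
    return dz @ mz

# Simulated example: 50 voxels, 300 frames; voxel 0 is a sign-flipped copy of x.
rng = np.random.default_rng(2)
motion = np.cumsum(rng.normal(0, 0.02, size=(300, 6)), axis=0)
data = rng.normal(0, 1, size=(50, 300))
data[0] = -motion[:, 0]

r = voxelwise_motion_correlation(data, motion)
counts, edges = np.histogram(r[:, 0], bins=20, range=(-1, 1))
print("correlation of voxel 0 with x:", r[0, 0])
print("histogram counts for x:", counts)
```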

The correlations are much higher than I expected, even in people with very little movement. Here's an example; more follow. Below are the motion regressors from a single task run (about 12.5 minutes long; HCP-style preprocessing; MB8 protocol), the correlation with each motion regressor, and a (density-style) histogram of the voxel-wise correlations. Color scaling for this and all brain images is from 1 (hottest) to -1 (coolest), not showing correlations between -0.25 and 0.25.

If my expectation (correlations normally distributed, centered on 0, ranging between -0.2 to 0.2) was right, there shouldn't be any color on these images at all, but there's clearly quite a bit: many voxels correlate around 0.5 with roll and pitch (4th and 5th brain rows are mostly "hot" colors), and around -0.5 with x, y, and z (first three rows mostly "cool" colors). There's some structure to the peak correlations (e.g., a hotter strip along the left side in slice 46), which may correspond with sulci or large vessels, but it's rather speckly. Note that this is a pretty low motion subject overall: less than 2 mm drift over the 12 minute run, and only 4 volumes marked for censoring (I didn't censor before the correlation).

Looking at other people and datasets, including from non-SMS acquisitions with larger voxels and longer TRs, it appears that correlations of 0.5 are pretty common: this isn't just some sort of weird effect that only shows up with high-resolution acquisitions. For another example, these are the histograms and motion regressors for four runs from one person included in this study (acquired with 4 mm isotropic voxels; run duration 7.6 min, TR 2.5 sec, SPM preprocessing). The corresponding brain images are below the jump.

So, really visible motion (which at least sometimes is linked to respiration) in the voxel activity timecourses (such as here) is to be expected. Unfortunately, the correlation is not always (or even usually) uniform across the brain or grey matter, such as below (just the correlation with x and y translation). It also looks like very little (under a mm) motion is needed to induce large correlations.
What to do? Well, adjust our mental models of how much correlation with movement is left in the activation timeseries after preprocessing: there's quite a bit. I'll be exploring further, particularly isolating the task windows (since I work with task, not resting state, datasets): how are the correlations during tasks? I'm not at all sure that applying a global signal regression-type step would be beneficial, given the lack of homogeneity across the brain (though I know there are at least a few reports on using it with task data). Censoring high-movement trials (i.e., not including them) is likely sensible. Interestingly, I've found similar MVPA performance in multiple cases with temporal compression by averaging and fitting a model (PEIs), which would not have been my guess looking at these correlation levels. Perhaps averaging across enough timepoints and trials balances out some of the residual motion effects? I am concerned, however, about respiration (and motion) effects remaining in the timecourses: it's clear that some people adjust their breathing to task timing, and we don't want to be interpreting a breath-holding effect as due to motivation.

Any other thoughts or experiences? Are you surprised by these correlation levels, or is it what you've already known?

Wednesday, February 22, 2017

a methodological tour de force: "The effect of spatial resolution on decoding accuracy in fMRI multivariate pattern analysis"

"The effect of spatial resolution on decoding accuracy in fMRI multivariate pattern analysis" by Gardumi et al. (full citation below) is an impressive methodological tour de force: comprehensive analyses clearly described (even their group-level permutation scheme!). Some of its themes are similar to those by Coutanche, Solomon, & Thompson-Schill (2016) which I posted about yesterday: understanding the spatial scale of fMRI information.

The approach in Gardumi et al. (2016) is different than that of Coutanche et al. (2016): they started with 7T images acquired with 1.1 mm isotropic voxels, then reconstructed the images at 2.2 and 3.3 mm effective resolution as well, by zero-padding the k-space images, as illustrated in their Figure 1, below.
A neat part of this approach is that the resulting images have the same voxel size, but lower effective resolution, making it possible to directly compare analyses including the same number of voxels (which is good, since MVPA performance generally interacts with the number of voxels). Changing the effective resolution this way also avoids the issues related to differences between acquiring 1.1 and 3.3 mm voxels (e.g., movement sensitivity): only a single scanning session was used for each person.
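The general idea of lowering the effective resolution while keeping the voxel grid can be illustrated with a toy numpy example: transform an image to k-space, zero everything outside a central block, and transform back. This is only a sketch of the principle, applied to an already-reconstructed image; Gardumi et al. worked with the k-space data during reconstruction, which is not equivalent. The function name, the rectangular mask, and the use of the real part are my assumptions.

```python
import math
import numpy as np

def reduce_effective_resolution(image, factor):
    """Zero the outer part of k-space, keeping the central 1/factor along each axis.

    The output has the same shape (voxel grid) as the input but is smoother.
    Assumption: a rectangular mask; the small imaginary part that the
    asymmetric mask can leave behind is discarded.
    """
    image = np.asarray(image, dtype=float)
    k = np.fft.fftshift(np.fft.fftn(image))            # zero frequency at the center
    mask = np.zeros(image.shape, dtype=bool)
    slices = []
    for n in image.shape:
        half = math.ceil(n / (2 * factor))
        center = n // 2
        slices.append(slice(max(0, center - half), min(n, center + half)))
    mask[tuple(slices)] = True
    k_low = np.where(mask, k, 0)                       # zero the high frequencies
    return np.real(np.fft.ifftn(np.fft.ifftshift(k_low)))

# Toy "image": white noise on a 64 x 64 grid.
rng = np.random.default_rng(3)
img = rng.normal(size=(64, 64))
low2 = reduce_effective_resolution(img, 2)
low3 = reduce_effective_resolution(img, 3)
print(img.shape, low2.shape, low3.shape)
print("standard deviations:", img.std(), low2.std(), low3.std())
```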

Another interesting aspect is that they had two classification tasks: decoding the speaker or the spoken vowel (see paper for details; they used auditory stimuli and single-subject-defined auditory anatomical ROIs). One of their results summary figures is below (lots more detail in the paper!), showing group-level average accuracy for the two classifications at each effective resolution. As an aside, the x-axis is the number of voxels included, picked from univariate tests (n most active voxels from training set GLM): accuracy increased for both until around 1000 voxels were included, then leveled off (again, see the paper for details), which matches my general experience of plateauing performance (e.g.).

Anyway, Figure 6 (plus other tests that they describe) shows that smaller voxels generally did better for their vowel decoding classification, but not for speaker decoding. In the discussion Gardumi et al. (2016) tie this to previous literature findings "that informative voxels in the auditory cortex are widely distributed for vowel decoding, while more clustered for speaker decoding."

Yesterday I wrote that I'm not "convinced that it's safe to infer information spatial resolution from voxel resolution" ... am I convinced by Gardumi et al.? Yes, I think so. Below is my cartoon for how it could work. The blue squares are the brain region, the white circles informative parts, and the red squares voxels at two different sizes. Suppose that you need around a quarter of the voxel to be informative for its activity to be biased (and so contribute to a classification): this is much easier to obtain with small voxels than large ones if the informative parts are widely distributed (left), but about as easy to obtain with both small and large voxels if the informative parts are clustered (right).

So, I now am thinking that it can sometimes be valid to make inferences about the spatial distribution of information from comparisons across voxel resolutions. The way in which the different voxel resolutions are obtained strikes me as very important: I have a lot more reservations about inferences when the resolutions are generated by different acquisition sequences than by k-space zeroing. And perhaps some of my change of heart is due to the different mental models I have of "widely distributed" or "clustered" information as opposed to "coarse" or "fine-grained" spatial resolution. Both of my cartoons above have 10 informative bits (circles): would you describe the one on left as fine-grained and the one on the right as coarse-grained?

Gardumi A, Ivanov D, Hausfeld L, Valente G, Formisano E, & Uludağ K (2016). The effect of spatial resolution on decoding accuracy in fMRI multivariate pattern analysis. NeuroImage, 132, 32-42. PMID: 26899782

Tuesday, February 21, 2017

interesting approach: "A meta-analysis of fMRI decoding: Quantifying influences on human visual population codes"

A (fairly) recent paper from Coutanche, Solomon, & Thompson-Schill, "A meta-analysis of fMRI decoding: Quantifying influences on human visual population codes" (full citation below), has an interesting approach to the effort to understand the spatial scale at which information is present in fMRI signals.  

Coutanche, Solomon, & Thompson-Schill 2016 describes a meta-analysis of visual MVPA studies, the details of which (and most findings) I won't get into here. But I do want to highlight their use of the different spatial resolutions (acquired voxel size) across studies to get at spatial resolution. In their words,
"Multi-voxel decoding should be optimized (all else being equal) when the voxel size of acquired data matches the spatial resolution (i.e., granularity) of a region's information-containing patterns. We hypothesized that if V1 holds a more fine-grained map of information than later visual regions, employing larger voxels should not benefit decoding in V1, but may benefit decoding in post-V1 regions (through greater signal-to-noise at the scale of these patterns). .... Naturally, at a certain point, increasing the voxel size is expected to impair performance for any region."
They found that,
"The results of our regression analyses supported this: using larger voxels improved decoding in V2 (B=0.01, p=0.049), unlike V1 (B=0.002, p=0.451). ... These findings are consistent with later visual regions holding coarser multi-voxel codes, while V1 relies on fine-grained patterns."
The "all else being equal" does a lot of work, since there are major interactions between acquisition parameters and the signal-to-noise in the resulting functional images (I'm far from convinced that using voxels around 2 mm isotropic or smaller is a good idea for general task fMRI, but that's another topic!). But if I take as a starting assumption that we have equally good signal-to-noise across a sensible range of voxel sizes, do I accept that decoding should then be optimized "when the voxel size of acquired data matches the spatial resolution (i.e., granularity) of a region's information-containing patterns"?

The idea that, "at a certain point, increasing the voxel size is expected to impair performance for any region", strikes me as plausible: if the voxels are large enough to encompass the entire region, only the average activity of the region as a whole can be used in the analysis, losing any information contained in within-region activation patterns. However, no brain region exists in a vacuum - they are surrounded by other brain structures - and fMRI voxels don't have sharply-defined edges, so in practice, too-large voxels will have signal from adjacent regions, and the combination of regions might have quite a lot of information.

Matching the voxel size to the spatial resolution might indeed optimize decoding if the brain were a fixed grid (so the voxels could be aligned to coincide perfectly with the grid), but I'm not convinced that it's a useful aim for actual fMRI datasets: even if both the voxels and spatial resolution were at 1 mm isotropic, the chance that the voxels would align with the brain grid seems vanishingly small. Setting the voxel size to something like 2 mm seems better, with the aim of having each voxel contain at least one of the 1 mm information units (in other words, setting the voxels larger than the true spatial resolution).

Overall, I accept the idea that voxel size could be used as a marker of spatial resolution in the abstract: a fixed (unmoving, not surrounded by other regions) region and equally good signal-to-noise across the range of voxel sizes. In actual fMRI datasets, I'm not as convinced that it's safe to infer information spatial resolution from voxel resolution, but it is an intriguing idea.

UPDATE 22 Feb 2017: see musings on Gardumi et al. (2016).

Coutanche, M., Solomon, S., & Thompson-Schill, S. (2016). A meta-analysis of fMRI decoding: Quantifying influences on human visual population codes. Neuropsychologia, 82, 134-141. DOI: 10.1016/j.neuropsychologia.2016.01.018