Tuesday, April 15, 2014

not having fun with R: as.integer, truncating, and rounding

I had a very unpleasant R debugging experience this morning: when is what you see not what you actually have?

The screenshot at left reproduces what I was seeing: inds1 and inds2 both are shown as the vector 2, 4, but inds2 selects the 2nd and 3rd array elements, not the 2nd and 4th, as expected.

The code I used to create inds1 and inds2 makes the problem clear - the second number is just a bit larger than 4 in inds1, but just a bit smaller than 4 in inds2:
inds1 <- c(2, 4.00000001);
inds2 <- c(2, 3.99999999999);

So, what happened?

The inds1 and inds2 arrays are type numeric (double), but they are shown on the screen rounded, in this case without any digits past the decimal point. They look like integers, but are not. When the arrays are used to specify array indices, R coerces them to integers, and that coercion (like as.integer()) truncates rather than rounds. Thus, inds2[2] becomes 3.


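I will now be including tests in my code to be sure what I think are integers are actually integers, such as isTRUE(all.equal(inds, as.integer(inds))) (the isTRUE() wrapper is needed because all.equal() returns a description of the difference, not FALSE, when the values differ).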
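
To make the behavior concrete, here is a minimal sketch using the same two vectors. The character vector x and the helper is.whole.number() are illustrative names of my own, not part of the original debugging session.

```r
inds1 <- c(2, 4.00000001)
inds2 <- c(2, 3.99999999999)
x <- c("a", "b", "c", "d")   # hypothetical array to index into

print(inds2)                 # default printing rounds the values for display
print(inds2, digits = 15)    # asking for more digits reveals the non-integer value

sel1 <- x[inds1]             # indices are truncated to 2 and 4
sel2 <- x[inds2]             # indices are truncated to 2 and 3
trunc2 <- as.integer(inds2)  # explicit coercion truncates toward zero
round2 <- round(inds2)       # rounding, if that is what was intended

# hypothetical helper: TRUE only if truncating changes nothing
# (within the default tolerance of all.equal)
is.whole.number <- function(v) isTRUE(all.equal(v, as.integer(v)))
ok1 <- is.whole.number(inds1)
ok2 <- is.whole.number(inds2)
```

If the values are supposed to be whole numbers, calling round() before using them as indices avoids the silent truncation.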

Friday, March 28, 2014

connectome workbench: working with volumes

I think of plotting surfaces when I think of the Connectome Workbench, but Workbench can do quite a few nice things with volumes as well. If you haven't already read my previous HCP tutorial posts, read those first, because I'm going to be skipping quite a bit of the introductory info. I'm using the most recent version of the Workbench, 0.84.

I'm very pleased with the screenshots in this post, and will describe how I made them. The pictures show both a surface and volume-slices view of the accuracy map resulting from my searchlight demo.

Before firing up the Workbench, you'll need both the volumetric (NIfTI) and surface (*.shape.gii or *.func.gii) versions of the file you want to plot, plus the corresponding anatomical underlays. In this case, the demo searchlight accuracy map is aligned to the MNI anatomical template, so we'll use the atlas_Conte69_74k_pals anatomy, as before. Thankfully, the atlas download includes both surface and volumetric versions, so that's ready to go. The demo volumetric NIfTI can be downloaded here, and the wb_command -volume-to-surface-mapping program will create the corresponding *.shape.gii file, also as before.

Now that we have all the files we need, we can plot them in Workbench. I started a new .spec file based off the Conte69 anatomy so that I could skip loading the files I didn't need (e.g. borders) and save the ones I do need, but that's not essential (see here for explanation).

Once Workbench is open, we need to load in the images, both volume and surface, that we want to overlay, plus the volume anatomy, since that isn't included in the default Conte69_atlas-v2.LR.32k_fs_LR.wb.spec. These four files (searchlightAccuracies_rad2.nii.gz, Conte69_AverageT1w.nii.gz, searchlight_right.shape.gii, and searchlight_left.shape.gii) can all be loaded (as Volume Files and Metric Files) through repeated use of the File -> Open File dialog.

Finally, we can start making nice pictures. I found it convenient to work with just two Workbench tabs, one for the surface and one for the volume version of the dataset. You can close extra tabs by clicking the little red boxes at the top of each, and open new ones with File -> New Tab.

Go to the first tab and click the "All" radio button in the "View" part of the Toolbar. The window will change to show the surface of both hemispheres, but not the searchlight map; change the bottom two METRIC entries in the Overlay ToolBox to searchlight_right.shape.gii and searchlight_left.shape.gii. You can turn the brain around with the mouse to see the overlay (adjust the color scaling via the little wrench icon). Now, switch the top dropdown box to VOLUME searchlightAccuracies_rad2.nii.gz. This will plot the volume data inside the surface, projected onto planes (twist the brain around to see the planes). You can adjust the planes' location in the "Slice Indices/Coords" part of the Toolbar.

Go to the other tab and click the "Volume" radio button in the "View" part of the Toolbar. Set the bottom dropdown box to VOLUME Conte69_AverageT1w.nii.gz (the anatomic template), then the upper dropdown box to VOLUME searchlightAccuracies_rad2.nii.gz. When both of these layers are checked On, the view should look something like the right side of this image.

Now we have two tabs open, one with a surface, and one with a volume. To view both side-by-side like in the screenshot, click View -> Enter Tile Tabs. There are a lot of neat things you can do in the "Tile Tabs" view; play around with it and check out the tutorial.

Another feature of the Workbench I really like for volumetric data is making "montages": views of many slices at once, like the top screenshot in this post. To make these, click the "M" button in the "Slice Plane" part of the Toolbar. You can then switch the number of slices shown by adjusting the boxes in the "Montage" part of the Toolbar, and where it starts showing slices in the "Slice Indices/Coords" part of the Toolbar. The montage view isn't limited to axial slices; to show another orientation, click the corresponding button in the "Slice Plane" part of the Toolbar.

So, this was an overview of what I thought was pretty nice when using the Workbench with volumetric images. I did find a few things frustrating, particularly the Yoking; I just couldn't make yoking work. Also, volume-only montage view is rather like MRIcron ... but the view doesn't recenter on clicked coordinates; you adjust the position through the "Slice Indices/Coords" part of the Toolbar. It would also be neat to plot horizontal lines on the surface when viewing data like in the top screenshot, where the horizontal lines indicate the slices shown in the volume montage. But, some nice features, and I'll probably be using the Workbench with volumetric data quite a bit more in the future.

Tuesday, March 25, 2014

NIfTI, CIFTI, GIFTI in the HCP and Workbench: a primer

The HCP is releasing preprocessed data in both volumetric NIfTI and surface/volumetric CIFTI formats. Working with the HCP files, or doing much of anything with the Workbench, requires navigating through a plethora of .*.nii and .*.gii files. In this post I'll explain why we need all these files, and how they relate to each other. Disclaimer: I'm writing this as a primer from the viewpoint of someone familiar with volumetric fMRI data analysis; it is not at all a full description of everything the files can be used for. Also, though I'm referring to the HCP and Workbench, these file formats are used by other projects and software.

For a starting point, consider how we work with volumetric NIfTI files. Neuroimagers often think about volumetric NIfTI files as storing functional data in a 4d matrix (x, y, z, and time). Libraries such as oro.nifti make reading NIfTI files fairly easy: they create a 3d or 4d matrix of voxel values, plus an object with the header information.

While you can get an idea of the anatomy by looking at slices of the 4d functional data matrix, analyses generally rely on having a 3d matrix of anatomical data (binary mask of regions, anatomic scan, etc) perfectly aligned to the 4d functional data. So, the 4d NIfTI file doesn't contain everything we need: we get some alignment information out of the header (qfactor, etc), but also need the registered 3d anatomical data. For a concrete example, I had to provide two files for the little ROI-based analysis demo: the dataset (4d NIfTI with preprocessed BOLD) and the ROI mask (binary 3d NIfTI showing the voxels corresponding to the anatomical region of interest), plus stating that the dataset was normalized to the MNI anatomical atlas (so that we can overlay the data on the correct anatomical template).

Now, on to CIFTI. CIFTI-2 files follow the NIfTI-2 file format specification (CIFTI-2 is a "flavor" of NIfTI-2, so both use the *.nii file extension), and both consist of a data matrix and headers. In the case of the HCP data, the functional timecourses are in the data matrix part of *.dtseries.nii CIFTI files. Like NIfTI volume files, the CIFTI file contains information about where voxels are, though this information is stored in a different place (in the extension containing the CIFTI XML). But, paralleling how you need an anatomic file to figure out exactly where the voxels in a volumetric NIfTI lie, you need other files (not just the CIFTI) to tell you where the surface vertices lie, and how they're connected (the "triangles", etc). Aside: While I wrote "surface vertices" in this paragraph, note that the HCP CIFTIs store both surface vertices (for the cortical sheet) and volumetric voxels (for sub-cortical structures).

These "other files" are not a single file but multiple; as many as necessary. Having all of these files is akin to having multiple ROI files available for an analysis: you won't use each ROI in each analysis, just the ones corresponding to the anatomical area (or whatever) you need for a particular test. The "other files" for the HCP are not just ROIs, but can also be underlying anatomy at different inflation levels, maps of tissue types, etc.

For example, at left is a screenshot showing some of the "other files" provided for each HCP person in the released datasets. These files are from /100307_Q3/MNINonLinear/Native/: the maps are in subject space. Many files with similar names are in /100307_Q3/MNINonLinear/fsaverage_LR32k/: maps of the same structures/types, but aligned to the MNI template anatomy (specifically, the 32k Conte69 mesh; see page 112 of Glasser et al. 2013).

And now we're encountering GIFTI files: many of the "other files" are in GIFTI format, with the extension .*.gii. The naming of the "other files" (the last bit before the .gii) in the HCP tends to follow the CARET conventions, and gives a hint as to what sort of information they contain:

*.surf.gii, "gifti surface files", contain only vertex coordinates and triangles (which vertices are connected). The HCP *.surf.gii files are mostly structures that you might want to overlay data onto, such as 100307.L.inflated.native.surf.gii (left hemisphere, inflated) and 100307.L.midthickness.native.surf.gii (left hemisphere, not inflated at all, but rather halfway through the thickness of the cortical ribbon).

*.func.gii and *.shape.gii, "metric files", contain data values for every vertex. Essentially, these are data arrays whose indices correspond to a surface file - you need a matching surface file to know where in the brain to put the data stored in a metric file. For example, a metric file from the HCP release is 100307.L.corrThickness.native.shape.gii: the cortical thickness at each vertex.

For an example of how these files work together, my tutorial on plotting a NIfTI image with the Workbench uses the wb_command -volume-to-surface-mapping program to create .shape.gii files aligned to Conte69.*.midthickness.32k_fs_LR.surf.gii. The data from the volumetric NIfTI (e.g. searchlight accuracies at each voxel) is stored (by vertex) in .shape.gii files, but a shape.gii file by itself isn't enough to plot the data properly on a surface: you need an aligned .surf.gii file as well. Paralleling how you need an aligned anatomy to properly overlay a volumetric NIfTI ROI, you need an aligned surf.gii to know how to properly locate the data from a metric file.

Whew! Hopefully this primer helps explain why so many files are released with the HCP data, and a bit about how they work together. For additional information see the Workbench Glossary, as well as Glasser et al. 2013. If you've found any references particularly useful that I haven't already linked to, please send them along and I'll add links.

I want to end this post with a BIG thank you to Tim Coalson, who patiently (and repeatedly) walked me through these file types and how they relate to each other.

Tuesday, March 11, 2014

Allefeld 2014: Searchlight-based multi-voxel pattern analysis of fMRI by cross-validated MANOVA

A recent paper by Carsten Allefeld and John-Dylan Haynes, "Searchlight-based multi-voxel pattern analysis of fMRI by cross-validated MANOVA" (see full citation below), caught my eye. The paper advocates using a MANOVA-related statistic for searchlight analysis instead of classification-based statistics (like linear SVM accuracy). Carsten implemented the full procedure for SPM8 and Matlab; the code is available on his website.

In this post I'm going to describe the statistic proposed in the paper, leaving the discussion of when (sorts of hypotheses, dataset structures) a MANOVA-type statistic might be most suitable for a (possible) later post. There's quite a bit more in the paper (and to the method) than what's summarized here!

MANOVA-related statistics have been used/proposed for searchlight analysis before, including Kriegeskorte's original paper and implementations in BrainVoyager and pyMVPA. From what I can tell (please let me know otherwise), these previous MANOVA searchlights fit the MANOVA on the entire dataset at once: all examples/timepoints, no cross-validation. Allefeld and Haynes propose doing MANOVA-type searchlights a bit differently: “cross-validated MANOVA” and “standardized pattern distinctness”.

Most of the paper's equations review multivariate statistics and the MANOVA; the “cross-validated MANOVA” and “standardized pattern distinctness” proposed in the paper are in equations 14 to 17:
  • Equation 14 is the equation for the Hotelling-Lawley Trace statistic, which Allefeld refers to as D ("pattern distinctness").
  • Equation 15 shows how Allefeld and Haynes propose to calculate the statistic in a "cross-validated" way. Partitioning on the runs, they obtain a D for each partition by calculating the residual sum-of-squares matrix (E) and the first part of the H equation from the not-left-out-runs, but the second part of the H equation from the left-out run.
  • Equation 16 averages the D from each "cross-validation" fold, then multiplies the average by a correction factor calculated from the number of runs, voxels, and timepoints.
  • Finally, equation 17 is the equation for “standardized pattern distinctness”: dividing the value from equation 16 by the square root of the number of voxels in the searchlight.
To understand the method a bit better I coded up a two-class version in R, using the same toy dataset as my searchlight demo. Note that this is a minimal example to show how the "cross-validation" works, not necessarily what would be useful for an actual analysis, and not showing all parts of the procedure.

The key part from the demo code is below. The dataset is a matrix, with the voxels (for a single searchlight) in the columns and the examples (volumes) in the rows. There are two classes, "a" and "b". For simplicity, the "left-out run" is called "test" and the others "train", though this is not training and testing as meant in machine learning. train.key is a vector giving the class labels for each row of the training dataset.

For Hotelling's T2 we first calculate the "pooled" sample covariance matrix in the usual way, but using the training data only:
S123 <- ((length(which(train.key == "a"))-1) * var(train.data[which(train.key == "a"),]) +
         (length(which(train.key == "b"))-1) * var(train.data[which(train.key == "b"),])) /
        (length(which(train.key == "a")) + length(which(train.key == "b")) - 2);

To make the key equation more readable we store the total number of "a" and "b" examples:
a.count <- length(which(train.key == "a")) + length(which(test.key == "a"));
b.count <- length(which(train.key == "b")) + length(which(test.key == "b"));


and the across-examples mean vectors:
a.test.mean <- apply(test.data[which(test.key == "a"),], 2, mean);
b.test.mean <- apply(test.data[which(test.key == "b"),], 2, mean) ;  

a.train.mean <- apply(train.data[which(train.key == "a"),], 2, mean); 
b.train.mean <- apply(train.data[which(train.key == "b"),], 2, mean);

Now we can calculate Hotelling's T2 (D) for this "cross-validation fold" (note that solve(S123) returns the inverse of matrix S123):
((a.count*b.count)/(a.count+b.count)) * (t(a.train.mean-b.train.mean) %*% solve(S123) %*% (a.test.mean-b.test.mean));

The key is that, paralleling equation 15, the covariance matrix is computed from the training data and inverted, then multiplied on the left by the mean difference vector from the training data and on the right by the mean difference vector from the testing data.
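
The pieces above can be put together into a runnable sketch that loops over the runs. Everything about the data here is an assumption for illustration: the values are simulated, and the names all.data, all.key, all.run, and fold.D are my own. The sketch covers only the per-fold statistic and its average; it does not apply the correction factor of equation 16 or the standardization of equation 17.

```r
set.seed(42)
n.vox <- 4    # voxels in the (simulated) searchlight
n.runs <- 3   # number of runs
n.per <- 10   # examples per class per run

# simulated dataset: examples in rows, voxels in columns
all.key <- rep(rep(c("a", "b"), each = n.per), times = n.runs)
all.run <- rep(1:n.runs, each = 2 * n.per)
all.data <- matrix(rnorm(length(all.key) * n.vox), ncol = n.vox)
all.data[all.key == "a", ] <- all.data[all.key == "a", ] + 2  # class "a" is shifted

# D for one fold: test.run is the left-out run
fold.D <- function(test.run) {
  train.data <- all.data[all.run != test.run, ]
  train.key <- all.key[all.run != test.run]
  test.data <- all.data[all.run == test.run, ]
  test.key <- all.key[all.run == test.run]

  # pooled covariance matrix, from the training data only
  n.a <- sum(train.key == "a")
  n.b <- sum(train.key == "b")
  S123 <- ((n.a - 1) * var(train.data[train.key == "a", ]) +
           (n.b - 1) * var(train.data[train.key == "b", ])) / (n.a + n.b - 2)

  # total number of examples of each class (train plus test)
  a.count <- n.a + sum(test.key == "a")
  b.count <- n.b + sum(test.key == "b")

  # mean difference vectors, separately for training and testing data
  train.diff <- colMeans(train.data[train.key == "a", ]) -
                colMeans(train.data[train.key == "b", ])
  test.diff <- colMeans(test.data[test.key == "a", ]) -
               colMeans(test.data[test.key == "b", ])

  as.numeric(((a.count * b.count) / (a.count + b.count)) *
             (t(train.diff) %*% solve(S123) %*% test.diff))
}

D.folds <- sapply(1:n.runs, fold.D)  # one D per left-out run
D.mean <- mean(D.folds)              # average over folds, uncorrected
```

Because the training and testing mean differences are estimated from different runs, a fold's D can be negative when the two disagree, which is not possible when both come from the same data.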

Should we think of this way of splitting the Hotelling-Lawley Trace calculation as cross-validation? It is certainly similar: a statistic is computed on data subsets, then combined over the subsets. It feels different to me though, partly because the statistic is calculated from the "training" and "testing" sets together, and partly because I'm not used to thinking in terms of covariance matrices. I'd like to explore how the statistic behaves with different cross-validation schemes (e.g. partitioning on participants or groups of runs), and how it compares to non-cross-validated MANOVA. It'd also be interesting to compare the statistic's performance to those that don't model covariance, such as Gaussian Naive Bayes.

Interesting stuff; I hope this post helps you understand the procedure, and to keep us all thinking about the statistics we choose for our analyses.


Allefeld C, & Haynes JD (2014). Searchlight-based multi-voxel pattern analysis of fMRI by cross-validated MANOVA. NeuroImage, 89, 345-57. PMID: 24296330

Wednesday, March 5, 2014

"not esoteric mathematic theory"

This has to be one of the best introductions to a section on the assumptions for a statistical test I've ever seen:
"All parametric statistical procedures are inferential procedures (i.e., they make inferences about populations). Mathematics and logic dictate that inferences be based on assumptions, and so like any other parametric statistical technique, the MANOVA has assumptions with which scientists must be concerned. These assumptions are not esoteric mathematic theory but conditions of the data that must be assessed (and hopefully satisfied) before trying to interpret the results of a MANOVA."
page 253 of Reading and Understanding Multivariate Statistics by Grimm & Yarnold (1995), "Multivariate Analysis of Variance" chapter by Kevin P. Weinfurt.

Saturday, February 22, 2014

RSA: how to describe with a single number?


RSA (Representational Similarity Analysis) can make very pretty matrices, but sometimes we want to describe the RSA matrix by a single number.

For example, at left is an RSA matrix for a dataset with six examples in each of two classes (w and f). The matrix was calculated from a single ROI and person, using Pearson correlation. Following convention, dark blue indicates correlation of +1 and red, correlation of -1.

Concretely, the darkest-red cell (f3-w3) shows that the activity in the ROI voxels on trial f3 was negatively correlated with the activity on trial w3.

We can see that this matrix is sensible: there is more blue (positive correlation, greater similarity) in the w-w and f-f cells than the w-f cells. Restated, the activation in the ROI's voxels was more correlated (less dissimilar) on trials of the same type than on trials of different types, as we'd hope.

But what if I want to describe this matrix with a single number, for example, so that I can see if the RSA produced similar results to classifying w vs. f with a linear SVM (e.g., do people with higher accuracy have a more blue-and-red RSA matrix)?

My approach has been to "average the triangles": subtract the mean of the different-type similarity cells from the mean of the same-type similarity cells. In the figure at left, the blue triangles are the same-type cells (w-w and f-f), and the red, the different-type cells (w-f and f-w).

Logically, if the same-type trials are more similar than the different-type trials, this average-the-triangles-then-subtract measure will produce a positive value, with larger values being a "better" RSA.

(Note: I generally calculate Pearson correlations, then use the Fisher r-to-z transform before and after mathematical operations, but left that out of the text so far. The exact method needed will of course depend on the (dis)similarity metric employed.)
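
As a minimal sketch of the average-the-triangles calculation, here it is on simulated data with the same layout as the example (six w and six f trials). The data, the variable names, and the exact placement of the Fisher transform (average in z, convert each mean back to r, then subtract) are my assumptions for illustration.

```r
set.seed(1)
n.vox <- 50
labels <- rep(c("w", "f"), each = 6)

# simulated ROI data: each trial is its class pattern plus noise
w.pattern <- rnorm(n.vox)
f.pattern <- rnorm(n.vox)
trials <- rbind(t(replicate(6, w.pattern + rnorm(n.vox))),
                t(replicate(6, f.pattern + rnorm(n.vox))))
rownames(trials) <- paste0(labels, 1:6)

rsa <- cor(t(trials))                # 12 x 12 trial-by-trial Pearson correlations

lower <- lower.tri(rsa)              # use each pair once, skip the diagonal
same <- outer(labels, labels, "==")  # TRUE for w-w and f-f cells

# Fisher r-to-z (atanh) before averaging, z-to-r (tanh) afterwards
same.mean <- tanh(mean(atanh(rsa[lower & same])))
diff.mean <- tanh(mean(atanh(rsa[lower & !same])))
score <- same.mean - diff.mean       # positive if same-type trials are more similar
```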

Reading some papers today, I realized that this triangle-averaging-and-subtracting method is a special case of a technique that's been written up a few times, and which Pereira & Botvinick 2013 called generating a "similarity structure score". The same general strategy was followed in Rothlein & Rapp 2014, who summarize it neatly in their Figure 2, which I copied a bit of at left. (full citations below)

The key idea is that you first create what Rothlein calls a "predicted RSM" (and Pereira a "similarity structure scoring matrix", and me a "template"): a matrix the same shape as your RSA matrices, but filled with -1, 0, or 1, reflecting the (RSA matrix) pattern you're testing for. You then calculate a score for how much each real RSA matrix matches the template. Rothlein 2014 calculates the Pearson correlation; Pereira suggests a few different techniques.

Pereira 2013 suggests scaling the template matrix "so that the weight of rewards and penalties is balanced"; this is the same as my triangle-averaging-and-subtracting method if the template matrix has +1 for all same-type cells and -1 for all different-type cells, and the template and RSA matrices are multiplied cell-wise, then summed.
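
The equivalence can be checked numerically. This sketch uses simulated data and raw correlations (no Fisher transform), and scales the template by the number of cells of each type; that particular scaling is my assumption about one way to "balance" the weights, chosen because it makes the match exact. It also computes a Pearson correlation between the unscaled template and the RSA matrix, in the spirit of Rothlein's approach.

```r
set.seed(1)
labels <- rep(c("w", "f"), each = 6)

# simulated ROI data: the w trials share a pattern in half of the voxels
trials <- matrix(rnorm(12 * 50), nrow = 12)
trials[labels == "w", 1:25] <- trials[labels == "w", 1:25] + 1
rsa <- cor(t(trials))

lower <- lower.tri(rsa)              # each pair once, diagonal excluded
same <- outer(labels, labels, "==")  # TRUE for same-type cells

# template: +1 for same-type cells, -1 for different-type cells
template <- ifelse(same, 1, -1)

# scaled template: weights within each cell type sum to +1 or -1
scaled <- template
scaled[same] <- 1 / sum(lower & same)
scaled[!same] <- -1 / sum(lower & !same)

template.score <- sum(scaled[lower] * rsa[lower])  # multiply cell-wise, then sum
triangle.score <- mean(rsa[lower & same]) - mean(rsa[lower & !same])
pearson.score <- cor(template[lower], rsa[lower])  # correlation with the template
```

Here template.score and triangle.score agree to within floating-point error, while pearson.score is on a different scale.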

I was curious to compare the statistics produced by the different methods, so I calculated a "similarity structure score" using my triangle-averaging-and-subtracting version and Rothlein's Pearson correlation-based method, in both cases using a template matrix with +1 for all same-type cells and -1 for all different-type cells. The result is at the left.

The different plotting symbols and colors indicate different ROIs, and the thin lines are linear regression lines through the two stimulus sets in the dataset (details don't matter here). The thick grey lines are for x=0, y=0, and x=y.

It's clear that the similarity structure scores produced by these two methods are highly correlated, with the Pearson correlation producing numerically larger values. I don't see a big reason to pick one method or another in this case; the decision could have quite a bit more impact in other cases, such as when the template matrix is sparse (lots of zeros).

So, at the top of the post I used a motivating example of wanting to see if the RSA produced similar results to classifying w vs. f with a linear SVM ... did it? Yes.

At left is plotted the RSA statistic against the classification accuracy, with the plotting symbols and colors indicating different ROIs. The two stimulus types are indicated by background color - one has a brown regression line and brown symbol backgrounds, the other, a black regression line and no symbol background coloring.

Interestingly, there is a tight linear correlation between the accuracy and RSA scores within each stimulus type, but not across types. Looking at different parts of the dataset, I don't think this is a non-linear relationship (e.g. larger RSA values at higher accuracies) but rather that the different stimulus types actually have different regression slopes. But I'd be curious to hear if anyone has done these sorts of comparisons in a more rigorous manner.


Francisco Pereira & Matthew Botvinick (2013). Simitar: simplified searching of statistically significant similarity structure. Pattern Recognition in Neuroimaging (PRNI), 2013 International Workshop on, 1-4. DOI: 10.1109/PRNI.2013.10

David Rothlein & Brenda Rapp (2014). The similarity structure of distributed neural responses reveals the multiple representations of letters. NeuroImage, 89, 331-344. DOI: 10.1016/j.neuroimage.2013.11.054

Sunday, February 16, 2014

code snippet: extracting weights from the linear svm in R

Here are a few snippets of code, showing how to extract the weights, decision hyperplane, and distance to the hyperplane from a linear SVM fit in R (e1071 interface to libsvm).

The setup: train.data and test.data are matrices with examples in the rows and voxels in the columns (i.e. two non-intersecting subsets of the entire dataset, making up a single cross-validation fold). The first column in each (named "target") contains the class labels, and the rest of the columns ("vox1", "vox2", ...) have the BOLD activation values.

fit (train) the SVM using train.data:
fit <- svm(target~., data=train.data, type="C-classification", kernel="linear", cost=1, scale=FALSE); 

extract the weights and constant from the fit SVM:
w <- t(fit$coefs) %*% fit$SV; 
b <- -1 * fit$rho;  # sometimes called w0

Now, the equation of the decision hyperplane is 0 = b + w1*vox1 + w2*vox2 ...

The distance of each point (test case) from the hyperplane can be calculated as usual for a point-plane distance:
((w %*% t(test.data[i,2:ncol(test.data)])) + b) / sqrt(w %*% t(w));

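The class of any test point is determined by which side of the hyperplane it falls on, sign((w * x) + b), where x is the vector of voxel values for test case i (excluding the "target" column):
sign((w %*% t(test.data[i,2:ncol(test.data)])) + b);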
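
For a way to check the extracted weights, here is a self-contained sketch on simulated data. It assumes the e1071 package is installed and that train.data and test.data are data frames with a factor "target" column; the make.data() helper and the simulated values are hypothetical. The sketch compares the hand-computed w*x + b against the decision values that predict() reports for the same fit.

```r
library(e1071)

set.seed(7)
# hypothetical helper: n.per examples of each class, 5 simulated voxels
make.data <- function(n.per) {
  vox <- matrix(rnorm(2 * n.per * 5), ncol = 5)
  vox[1:n.per, ] <- vox[1:n.per, ] + 1.5   # class "a" is shifted
  colnames(vox) <- paste0("vox", 1:5)
  data.frame(target = factor(rep(c("a", "b"), each = n.per)), vox)
}
train.data <- make.data(20)
test.data <- make.data(10)

fit <- svm(target ~ ., data = train.data, type = "C-classification",
           kernel = "linear", cost = 1, scale = FALSE)

w <- t(fit$coefs) %*% fit$SV   # 1 x 5 row of weights
b <- -1 * fit$rho              # constant

# w*x + b for every test case at once, then the point-plane distance
test.vox <- as.matrix(test.data[, -1])
decision <- as.vector(test.vox %*% t(w) + b)
distance <- decision / sqrt(sum(w^2))

# decision values as reported by predict(), for comparison
pred <- predict(fit, test.data, decision.values = TRUE)
libsvm.decision <- as.vector(attr(pred, "decision.values"))
same.values <- isTRUE(all.equal(decision, libsvm.decision))
```

Which class sits on the positive side depends on the label ordering in the fit, so it is safest to confirm the mapping by comparing sign(decision) against the labels returned by predict().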