Saturday, February 22, 2014

RSA: how to describe with a single number?

UPDATE (6 July 2018): see this post for updates; I don't recommend using Pearson correlation to quantify similarity to the template matrix anymore.


RSA (Representational Similarity Analysis) can make very pretty matrices, but sometimes we want to describe the RSA matrix by a single number.

For example, at left is an RSA matrix for a dataset with six examples in each of two classes (w and f). The matrix was calculated from a single ROI and person, using Pearson correlation. Following convention, dark blue indicates correlation of +1 and red, correlation of -1.

Concretely, the darkest-red cell (f3-w3) shows that the activity in the ROI voxels on trial f3 was negatively correlated with the activity on trial w3.

We can see that this matrix is sensible: there is more blue (positive correlation, greater similarity) in the w-w and f-f cells than the w-f cells. Restated, the activation in the ROI's voxels was more correlated (less dissimilar) on trials of the same type than on trials of different types, as we'd hope.

But what if I want to describe this matrix with a single number, for example, so that I can see if the RSA produced similar results to classifying w vs. f with a linear SVM (e.g., do people with higher accuracy have a more blue-and-red RSA matrix)?

My approach has been to "average the triangles": subtract the mean of the different-type similarity cells from the mean of the same-type similarity cells. In the figure at left, the blue triangles are the same-type cells (w-w and f-f), and the red, the different-type cells (w-f and f-w).

Logically, if the same-type trials are more similar than the different-type trials, this average-the-triangles-then-subtract measure will produce a positive value, with larger values being a "better" RSA.

(Note: I generally calculate Pearson correlations, then use the Fisher r-to-z transform before and after mathematical operations, but left that out of the text so far. The exact method needed will of course depend on the (dis)similarity metric employed.)
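In R, the calculation boils down to something like the sketch below; rsa.mat stands for the 12x12 Pearson correlation matrix from the figure (rows and columns assumed ordered w1-w6 then f1-f6), and the variable names are just for illustration, not from any package.

# sketch of the average-the-triangles score; rsa.mat and the other names are illustrative only
labels <- rep(c("w", "f"), each=6);         # class of each row/column of rsa.mat
same.type <- outer(labels, labels, "==");   # TRUE where a cell compares same-type trials
off.diag <- lower.tri(rsa.mat);             # use one triangle, skipping the diagonal
z.mat <- atanh(rsa.mat);                    # Fisher r-to-z transform before averaging
score <- mean(z.mat[off.diag & same.type]) - mean(z.mat[off.diag & !same.type]);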

Reading some papers today, I realized that this triangle-averaging-and-subtracting method is a special case of a technique that's been written up a few times, and which Pereira & Botvinick 2013 called generating a "similarity structure score". The same general strategy was followed in Rothlein & Rapp 2014, who summarize it neatly in their Figure 2, which I copied a bit of at left. (full citations below)

The key idea is that you first create what Rothlein calls a "predicted RSM" (and Pereira a "similarity structure scoring matrix", and me a "template"): a matrix the same shape as your RSA matrices, but filled with -1, 0, or 1, reflecting the (RSA matrix) pattern you're testing for. You then calculate a score for how much each real RSA matrix matches the template: Rothlein 2014 calculates the Pearson correlation between the two, while Pereira suggests a few different techniques.

Pereira 2013 suggests scaling the template matrix "so that the weight of rewards and penalties is balanced"; this is the same as my triangle-averaging-and-subtracting method if the template matrix has +1 for all same-type cells and -1 for all different-type cells, and the template and RSA matrices are multiplied cell-wise, then summed.
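Continuing the sketch from above (same illustrative names), the two scorings against a +1/-1 template look something like the following; this is my paraphrase of the methods, not code from either paper.

# a +1/-1 template: +1 for same-type cells, -1 for different-type cells
template <- ifelse(outer(labels, labels, "=="), 1, -1);
off.diag <- lower.tri(rsa.mat);   # score the off-diagonal cells only

# Rothlein-style: Pearson correlation between the RSA matrix cells and the template cells
score.cor <- cor(rsa.mat[off.diag], template[off.diag]);

# Pereira-style: rescale so the +1 and -1 cells carry equal total weight, then
# multiply cell-wise and sum; with this template it reduces to triangle-averaging-and-subtracting
tmpl <- template[off.diag];
tmpl[tmpl > 0] <- 1 / sum(tmpl > 0);
tmpl[tmpl < 0] <- -1 / sum(tmpl < 0);
score.dot <- sum(rsa.mat[off.diag] * tmpl);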

I was curious to compare the statistics produced by the different methods, so I calculated a "similarity structure score" using my triangle-averaging-and-subtracting version and Rothlein's Pearson correlation-based method, in both cases using a template matrix with +1 for all same-type cells and -1 for all different-type cells. The result is at the left.

The different plotting symbols and colors indicate different ROIs, and the thin lines are linear regression lines through the two stimulus sets in the dataset (details don't matter here). The thick grey lines are for x=0, y=0, and x=y.

It's clear that the similarity structure scores produced by these two methods are highly correlated, with the Pearson correlation producing numerically larger values. I don't see a big reason to pick one method or another in this case; the decision could have quite a bit more impact in other cases, such as when the template matrix is sparse (lots of zeros).

So, at the top of the post I used a motivating example of wanting to see if the RSA produced similar results to classifying w vs. f with a linear SVM ... did it? Yes.

At left is plotted the RSA statistic against the classification accuracy, with the plotting symbols and colors indicating different ROIs. The two stimulus types are indicated by background color - one has a brown regression line and brown symbol backgrounds, the other, a black regression line and no symbol background coloring.

Interestingly, there is a tight linear correlation between the accuracy and RSA scores within each stimulus type, but not across types. Looking at different parts of the dataset, I don't think this is a non-linear relationship (e.g. larger RSA values at higher accuracies) but rather that the different stimulus types actually have different regression slopes. But I'd be curious to hear if anyone has done these sorts of comparisons in a more rigorous manner.


Francisco Pereira & Matthew Botvinick (2013). Simitar: simplified searching of statistically significant similarity structure. Pattern Recognition in Neuroimaging (PRNI), 2013 International Workshop on, 1-4. DOI: 10.1109/PRNI.2013.10

David Rothlein & Brenda Rapp (2014). The similarity structure of distributed neural responses reveals the multiple representations of letters. NeuroImage, 89, 331-344. DOI: 10.1016/j.neuroimage.2013.11.054

Sunday, February 16, 2014

code snippet: extracting weights from the linear svm in R

Here are a few snippets of code showing how to extract the weights, decision hyperplane, and distance to the hyperplane from a linear SVM fit in R (e1071 interface to libsvm).

The setup: train.data and test.data are data frames with the examples in the rows and voxels in the columns (i.e. two non-intersecting subsets of the entire dataset, making up a single cross-validation fold). The first column in each (named "target") contains the class labels, and the rest of the columns ("vox1", "vox2", ...) have the BOLD activation values.
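If you want to run the snippets end-to-end, toy data in that layout can be made along these lines; the sizes, random voxel values, and the make.set helper are purely illustrative.

library(e1071);   # svm() comes from the e1071 interface to libsvm
set.seed(42);
make.set <- function(n.per.class, n.vox) {   # illustrative helper, not part of e1071
  x <- rbind(matrix(rnorm(n.per.class * n.vox, mean=0), ncol=n.vox),
             matrix(rnorm(n.per.class * n.vox, mean=1), ncol=n.vox));
  d <- data.frame(target=factor(rep(c("w", "f"), each=n.per.class)), x);
  names(d) <- c("target", paste0("vox", 1:n.vox));
  d;
}
train.data <- make.set(20, 10);
test.data <- make.set(10, 10);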

fit (train) the SVM using train.data:
fit <- svm(target~., data=train.data, type="C-classification", kernel="linear", cost=1, scale=FALSE); 

extract the weights and constant from the fit SVM:
w <- t(fit$coefs) %*% fit$SV;   # weight vector, one entry per voxel
b <- -1 * fit$rho;              # constant offset, sometimes called w0

Now, the equation of the decision hyperplane is 0 = b + w1*vox1 + w2*vox2 ...

The distance of each point (test case) from the hyperplane can be calculated as usual for a point-plane distance:
((w %*% t(test.data[i,2:ncol(test.data)])) + b) / sqrt(w %*% t(w));

The class of any test point is determined by sign((w * x) + b):

sign((w %*% t(test.data[i,2:ncol(test.data)])) + b);
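Putting the pieces together, the distances and sides for all of the test examples can be computed at once and cross-tabulated against predict() as a sanity check; this vectorized form is just my rearrangement of the lines above.

test.x <- as.matrix(test.data[, 2:ncol(test.data)]);   # voxel values only, examples in rows
w.norm <- sqrt(sum(w^2));                              # length of the weight vector
dists <- as.vector(test.x %*% t(w) + b) / w.norm;      # signed point-to-hyperplane distances
sides <- sign(as.vector(test.x %*% t(w) + b));         # which side of the hyperplane each example falls on
table(sides, predict(fit, test.data));                 # sides should line up with the svm's predicted classes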

Thursday, February 6, 2014

pointer: "A discussion of causal inference on fMRI data"

In case you missed it, there's a very interesting discussion of DCM, GCM, and causal modelling in general on Russ Poldrack's blog.

I've never dug properly into these methods myself, partly because I'm busy enough with MVPA methods, but also from quite a bit of skepticism that they are mature enough to be reliably interpretable at the moment. Their discussion added detail to some of the things I've been generally skeptical about, and is a must-read for people thinking about starting with causal modelling of fMRI data.