Sunday, January 3, 2016

yep, it's worth the time to publish code

I have a paper that's getting close to being published in PLOSone. After review, but prior to publication, they require you to not only agree in principle to sharing the dataset, but that it actually be shared. This can be tricky with neuroimaging datasets (size, privacy, etc.), but is of course critically important. It's easy to procrastinate on putting in the time necessary to make the datasets and associated code public; and easy to be annoyed at PLOS for requiring sharing to be set up prior to publication, despite appreciating that such a policy is highly beneficial.

As you can guess from the post title, in the process of cleaning up the code and files for uploading to the OSF, I found a coding bug. (There can't be many more horrible feelings as a scientist than finding a bug in the code for a paper that's already been accepted!) The bug was that when calculating the accuracy across the cross-validation folds, one of the fold's accuracies was omitted. Thankfully, this was a 15-fold cross-validation, so fixing the code so that the mean is calculated over all 15 folds instead of just 14 made only a minuscule difference in the final results, nothing that changed any interpretations.

Thankfully, since the paper is not yet published, it was simple to correct the relevant table. But had I waited to prepare the code for publishing until after the paper had been published (or not reviewed the code at all), I would not have caught the error. Releasing the code necessary to create particular figures and tables is a less complicated undertaking than designing fully reproducible analyses, such as what Russ Poldrack's group is working on, but still nontrivial, in terms of both effort and benefits.

How to avoid this happening again? A practical step might be to do the "code clean up" step as soon as a manuscript goes under review: organize the scripts, batch files, whatever, that generated each figure and table in the manuscript, and, after reviewing them yourself, have a colleague spend a few hours looking them over and confirming that they run and are sensible. In my experience, it's easy for results to become disassociated from how they were generated (e.g., figures pasted into a powerpoint file), and so for bugs to persist undetected (or errors, such as a 2-voxel radius searchlight map being labeled as from a 3-voxel radius searchlight analysis). Keeping the code and commentary together in working files (e.g., via knitr) helps, but there's often no way around rerunning the entire final analysis pipeline (i.e., starting with preprocessed NIfTIs) to be sure that all the steps actually were performed as described.

UPDATE 1 February 2016: The paper and code are now live.