Publishing chemical structures in the 21st century
This entry is somewhere between a rant and a call for comments, but it is on a topic that is close to my heart, and one that I think may interest a few others: reproducibility of science and publication practices. During the past two weeks, I've been annoyed several times when trying to reproduce (or build upon) published computational work by looking at the structures the authors had worked on, reported, or predicted. And there, I was sorely disappointed: in many cases, the structures are not readily available!
Out of the seven papers I've had to work with recently (all published in 2009 or later), here are the various behaviors I have observed:
- structures described in a short format in the paper itself
- full listing of atomic positions in supporting information, in PDF format
- a screenshot (bitmap image) of a full listing of atomic positions is included as PDF supporting information
- in one case, the structures were not included at all: only their unit cell parameters were given (and a reference for the experimental crystallographic structure from which calculations were started)
What bothers me is that, in all cases, it takes a non-trivial amount of time to produce a structure file, either by copy-pasting or by retyping the information, while it would have cost the authors nothing to publish the structures in a standard, text-based, data-minable format: CIF for crystals, XYZ for molecules, CML if you like it, etc. This can be done either by publishing the files as supporting information or by depositing them in a database. As a referee, I would definitely have flagged this in my review, in the name of reproducibility and good scientific practice.
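To illustrate how little effort a text-based format takes, here is a minimal sketch of writing an XYZ file: an atom count, a free-text comment line, then one "element x y z" line per atom. It assumes Cartesian coordinates in ångströms; the function name `to_xyz` is hypothetical and not taken from any particular package.

```python
def to_xyz(symbols, positions, comment=""):
    """Return a molecule as an XYZ-format string (hypothetical helper).

    symbols:   element symbols, e.g. ["O", "H", "H"]
    positions: Cartesian (x, y, z) tuples, assumed to be in angstroms
    """
    if len(symbols) != len(positions):
        raise ValueError("symbols and positions must have the same length")
    # Line 1: number of atoms; line 2: single-line comment (newlines stripped)
    lines = [str(len(symbols)), comment.replace("\n", " ")]
    for sym, (x, y, z) in zip(symbols, positions):
        # One atom per line: element symbol followed by its coordinates
        lines.append(f"{sym:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Approximate water geometry, for illustration only
    water = to_xyz(
        ["O", "H", "H"],
        [(0.0, 0.0, 0.117), (0.0, 0.757, -0.469), (0.0, -0.757, -0.469)],
        comment="water",
    )
    print(water, end="")
```

Crystals are less trivial, since a CIF also needs cell parameters and ideally symmetry information; a general-purpose library such as ASE (`ase.io.write`) can write both XYZ and CIF, so even there the cost to authors is small.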
So this was the rant part. Now for the call for comments: given that the practice outlined above persists, I wonder what the arguments against publishing structures in such formats are. In particular, why is the standard in computational/theoretical chemistry so different from that of, e.g., experimental crystallography, where deposition into databases is the norm?
As an author, referee, and/or editor, what is your point of view on this? Is mine a minority view, or are things the way they are simply because of the system's inertia?