FX's blog: musings on chemistry, among other things…

Friday 11 December 2015

Same-journal citation in Chemistry

(9 months, no blog entry…)

This week, I read two articles on very different aspects of citation patterns. The first one was an analysis, by Stuart Cantrill over at the Nature Chemistry blog, of the journal's impact factor citation distribution. The second was a very enlightening Science paper from 2012 on Coercive Citation in Academic Publishing, and it got me wondering: I have never, so far, experienced coercive citation from an editor (i.e. instructions from an editor asking to add more citations to their own journal). However, I have been in the past — several times — advised by wise and knowledgeable (read: older) colleagues to make sure I was including same-journal references before submitting a manuscript. Usually it comes with some logic behind it, like “it makes the editor less likely to judge the paper out of scope if they see a lot of citations from their own journal”. But although that logic could apply to specialized journals, it makes no sense for more generic journals…

So, I wanted to look at the citation patterns among the “top three” general audience chemistry journals, namely Nature Chemistry, JACS and Angewandte Chemie. I took a sample of 2000 papers each from the 2010–2014 period (only 1200 for Nature Chem, which publishes fewer papers), and looked at the distribution of references from those papers. Let's first look at the most-frequently cited journals for each source:

Most-frequently cited journals

First, we see that in all three journals, JACS and Angewandte are the most cited journals (in this order). This makes sense: they really are the top general journals in chemistry and publish a lot of papers every year (many more than Nature Chem, and this difference in publication volume explains the much lower spot of the latter). After that, you can begin to spot differences in the journals cited: Nature Chem features more heavily interdisciplinary journals Science, Nature, and PNAS. On the other hand, Angewandte (and JACS to a smaller extent) clearly features more citations to subfield-specific journals, and in particular organic chemistry journals. This would reflect a heavier focus of Angewandte on organic chemistry and synthesis, something that is regularly mentioned in chemistry circles (mostly by people outside molecular chemistry)! On the other hand, the only subfield-specific journal to make it into Nature Chem’s top 10 is a physical chemistry journal, the venerable Journal of Chemical Physics (which is one of my personal favorites).

Next, let's focus on citations between the three journals themselves:

Same-journal citations

For column X (source) and row Y (citation target), the table gives the percentage of the references found in X that point to Y. For example, 12.1% of the references found in Angewandte are to articles in Angewandte, while 14.8% are to papers in JACS. What is interesting (to me) is to look at same-journal citation, i.e. whether papers in journal X are more likely to be cited in this same journal than in other journals. And… it is the case, by about 2% to 3% in each case. So there is a small but significant excess of same-journal citation. I can see three possible explanations:

  1. One explanation may be that these journals have different audiences, and therefore it is natural that they feature more self-citation than other journals. I think this is definitely not true in terms of the subfields of chemistry, since all three are general chemistry journals with broad readership.
  2. There might be a geographical effect, with more US authors in JACS and more German authors in Angewandte… but are you really more likely to cite your (geographical) neighbor rather than the work of chemists from another continent? I do not think this can account for the differences observed.
  3. The final reason is that there may still exist, consciously or unconsciously, a tendency to include (or favor) same-journal references when writing a manuscript for a specific journal.
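
For anyone who wants to build a table like the one above from their own reference data, here is a minimal Python sketch. It is an illustration only, not the script behind the figures in this post: the function name `citation_shares`, the input format (one `(source journal, cited journal)` pair per reference) and the toy data are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def citation_shares(references):
    """Return {source: {target: percent}} from (source, target) pairs.

    Each pair stands for one reference: a paper in `source` citing a paper
    in `target`. Percentages are relative to all references found in `source`.
    """
    counts = defaultdict(Counter)
    for source, target in references:
        counts[source][target] += 1
    shares = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        shares[source] = {t: 100.0 * n / total for t, n in targets.items()}
    return shares

# Toy data, invented for illustration only (not the real sample).
toy = (
    [("JACS", "JACS")] * 3
    + [("JACS", "Angew")] * 1
    + [("Angew", "JACS")] * 2
    + [("Angew", "Angew")] * 2
)
shares = citation_shares(toy)
for source, row in shares.items():
    print(source, row)
```

Same-journal citation is then simply the diagonal, `shares[X][X]`, to be compared with how often X is cited from the other journals.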

Let me know in the comments or on Twitter what you think! There surely are other possible reasons I have failed to see…

PS: on the topic of geographical diversity of these journals, you can go back and see my earlier post on the globalization of chemistry as seen through publications in the field…

Friday 6 March 2015

Tired of “issues”, “concerns” and “worrying signs” in published papers

I don't know if bad academic conduct in scholarly publishing is increasing, or if I just notice it more often now that I have an active Twitter presence… but it's seriously annoying to see the accumulation of issues in published papers. And because I don't like it, and I think it's everyone's responsibility to act to prevent it (including me), I end up spending time to write to editors, etc. Time that could be used for other stuff, like, I don't know… my own research!

Recent examples that annoy me significantly:

  • TEM images with worrying signs of manipulation (and that's using very cautious language for something that looks clear-cut) in 4 papers by the same authors… including 3 papers in a journal where the author is also an editor.

  • Plagiarism accidents where, after 10 months of careful consideration, the editor decides it was “not intentional”. Like, you were cleaning your keyboard, and repeatedly pressed copy-change window-paste by accident? (Full story over there)
  • A paper with (in my view) unwarranted citations to my work, on an unrelated topic. I wrote to the editor to report it, and posted my concerns on Twitter, and the editor told me they didn't like that. Also, they decided that it was OK for the authors to cite me since “they have put some thought into their selection”, and it was up to them to cite whomever they wanted. Turns out, commenters on PubPeer later found other issues with that paper, including figures/tables duplication with other papers (which are, though, on different topics). I wrote again to the editors, but I am not holding my breath: too many seem to like the old-school, closed-door, nothing-to-see-here conduct.

Gaming intermission: spot the differences between the tables from this 2012 paper

and from this 2015 paper

I'm tired of this. Just how common really is academic misconduct in chemistry publications?

Monday 2 February 2015

Postdoc position open in my group

I'm happy to announce there is an open postdoc position in my research group:

  • Rational Design of Framework Materials with Tailored Mechanical and Thermal Properties: A Combined Physics/Chemistry Approach
  • Part of a collaboration between myself (François-Xavier Coudert; Chimie ParisTech) and Lydéric Bocquet (École normale supérieure).
  • Physical and molecular modeling at multiple scales: finite element calculations, molecular dynamics, quantum chemistry. We'll also 3D-print stuff and squeeze it, because it's fun.
  • Full announcement (project details, etc.) here
  • Starting… well, when we find the right candidate! We have the funding ready, but we're willing to match her (or his) schedule.

Oh, and did I say it was right in the center of Paris? And an amazing environment to work in, scientifically, too…

Monday 8 December 2014

Why correcting the scientific record is hard

Here I explain, from a very simple example at a modest scale, why correcting the scientific record is difficult in today's academic world. I think it highlights, in a very simple way, some flaws of our systems that we can hopefully address. (Also worth noting: although the example comes from the condensed matter field, you don't have to know anything about physics to read the text below!)

I have recently worked, with an MSc student in my group, on the understanding of elastic behavior of porous materials. During the course of his internship, we spent a lot of time understanding exactly how elastic stability criteria work in the most generic case. Stability conditions, known as Born stability criteria, have very simple and well-known mathematical expressions for cubic crystals, but things get more complex for systems with lower symmetry. While the fundamentals have been set forth very clearly a long time ago by Max Born (in particular in his 1954 book, Dynamical Theory of Crystal Lattices), his and subsequent books usually give explicit expressions only for common high-symmetry crystals (cubic, hexagonal, etc.)
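
To make this concrete, here is a minimal Python sketch of the two situations. For a cubic crystal the conditions reduce to C11 − C12 > 0, C11 + 2 C12 > 0 and C44 > 0; in the generic case, the criterion is that the 6×6 stiffness matrix (in Voigt notation) is positive definite, which the sketch tests by attempting a Cholesky factorization. The function names and the numerical values are hypothetical, chosen only for illustration.

```python
import math

def is_stable_cubic(c11, c12, c44):
    """Born criteria for a cubic crystal (three independent constants)."""
    return (c11 - c12 > 0) and (c11 + 2 * c12 > 0) and (c44 > 0)

def is_positive_definite(matrix):
    """Generic test: attempt a Cholesky factorization of a symmetric matrix.

    All pivots are strictly positive if and only if the matrix is
    positive definite.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= 0:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True

def cubic_stiffness(c11, c12, c44):
    """6x6 stiffness matrix of a cubic crystal in Voigt notation."""
    c = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            c[i][j] = c11 if i == j else c12
        c[i + 3][i + 3] = c44
    return c

# Arbitrary illustrative values (think GPa), not taken from any real material.
stable = (200.0, 100.0, 50.0)
unstable = (100.0, 150.0, 50.0)   # violates C11 - C12 > 0
for constants in (stable, unstable):
    print(constants,
          is_stable_cubic(*constants),
          is_positive_definite(cubic_stiffness(*constants)))
```

The closed-form cubic test and the generic matrix test agree on both examples, and only the generic one carries over unchanged to lower symmetries.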

Now, during the course of the MSc project, we stumbled onto some papers that quote incorrect mathematical formulations of the Born conditions, while usually citing the original Born book (in which these expressions are not found). Most of the errors arise from people incorrectly generalizing the "cubic" conditions. We looked a bit more, and found more examples of such errors, in papers between 2007 and 2014. Now, if you find several mistakes in a series of related equations, in a dozen papers published in a given field throughout a decade, what do you do?

Over the course of a few days, we wrote a short paper, explaining the general form of the conditions, and explicitly listing their mathematical expression for each possible crystal class. We uploaded it to the arXiv, and sent it for publication to a well-read journal in the field (Phys. Rev. B). I highlighted, in the cover letter to the editor (see below), that it was meant to be a concise and pedagogical reference that would benefit the research community, and it seemed to us worthy of publication.

First, it took the editor some time to send the paper out for review. He first sought input from a member of the editorial board, we were told. My cover letter was apparently convincing, as the manuscript was sent for review and we got three referee reports on this. I won't quote them in full (I don't have their permission), but they can be summarized as:

  1. After a considerable amount of thinking, this is appropriate for publication in PRB. The multiple erroneous references cited show it is valuable, though it shouldn't be needed in an ideal world. [Then some useful comments for improving the work. Thanks!]
  2. The basic principles are well-established, the derivations are mathematically simple. Reject.
  3. I’m in favor of publishing the paper; whether in PRB is an editorial decision best left to the PRB editors. [Also known as the “no risk” approach to refereeing ;-)]

The editor decided the paper couldn't be published, but that we could resubmit if we felt we had a strong response. It underwent further review, during which one of the referees said something very interesting:

I have checked "Enough significant new physics? No" both times, but still tend to agree with the third referee -- this should be published somewhere

I think this is interesting, because it shows why correcting the scientific record is hard: it is considered not new science. The current standards for publication highlight (with some good reason) the original and the sexy. But while doing so, we need to keep room for corrections, comments and allow the discussion to go on, through peer-reviewed articles.

Another interesting point, made by the reviewer, is the unstated comment that maybe PRB is too good a journal for this sort of work. I think it also shows that corrections and pedagogical papers are deemed less important than your “regular” research article.

Finally, in our case, the paper was accepted and is now published. We managed to convince the reviewers and editor of the value of our paper to the community, which should be, in my eyes, an important yardstick for publication.

Let me know your thoughts, in comments or on Twitter. Also: what do you think should come next? Contacting the authors? Post-publication peer review?

Wednesday 26 November 2014

Author-produced PDF from LaTeX on the arXiv

arXiv.org has a policy that articles written in TeX/LaTeX should be uploaded as source (tex + bibliography + figures), rather than as a standalone whole-article PDF file. They enforce this policy automatically, by detecting whether the PDF file you upload has been generated from TeX, and blocking your submission if that's the case.

They have their reasons, explained in the policy linked. However, as any blanket policy enforced automatically by a computer program, it is bound to make mistakes sometimes. One case that particularly annoyed me: it rejects all PDF files including TeX-made figures, even if the PDF of the figure was then included in a MS Word manuscript and the whole thing converted to PDF. That was particularly annoying, because for a long period nobody at arXiv replied to my requests, and my files were just being rejected.

There are other reasons why I don't believe this strict policy is a good thing, even when it is technically accurate:

  • I take great care of the manuscripts I submit, including non-standard fonts and sometimes typography / figures placement, sometimes with manual editing of the PDF before sending it to the publisher. I would rather people see those than the default LaTeX-styled version of my preprint. (yes, I'm a bit of a perfectionist when it comes to typography; I won't apologize)
  • If the inclusion of proper metadata is the issue, there are many PDF manipulation solutions that can do that in an automated manner.
  • Why should TeX users be treated more harshly than others? arXiv hosts some very badly formatted Word-produced (or LibreOffice-produced) PDF files.

In any case, here's how to fool the arXiv TeX detector:

1. It is looking for TeX-specific keys in information dictionaries in the PDF. Those look like this:

/PTEX.FileName (./figures/TE.pdf)
/PTEX.PageNumber 1
/PTEX.InfoDict 279 0 R
/PTEX.Fullbanner (This is pdfTeX, Version 3.14159265-2.6-1.40.15 (TeX Live 2014) kpathsea version 6.2.0)

The first three indicate the inclusion of a PDF figure, including its original file name (I consider this bad, because it could actually leak information about the document's author, such as home directory). The last one is only included once, indicating what version of TeX produced the document.

2. Those keys cannot be turned off from the TeX source, they're hardcoded in the pdftex program.

3. But you can replace all of these lines with blank characters, without invalidating the PDF. You cannot remove those characters, because that would mess up the look-up tables (called the Xref tables). But replacing each character with a space will result in a document that is still perfectly valid according to the PDF specification.

Using the sed command-line utility to do so is simple:

sed -e '/PTEX\./s/./ /g' < submitted.pdf > arXiv.pdf

will produce a file named arXiv.pdf from your original PDF file submitted.pdf
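
The same idea can be sketched in Python, working on raw bytes so that the result does not depend on how a given sed handles non-text data. This is an illustrative equivalent under my own assumptions, not a tested replacement for the command above: the function name `blank_ptex_lines` and the sample bytes are hypothetical.

```python
def blank_ptex_lines(data: bytes) -> bytes:
    """Overwrite every line containing b"PTEX." with spaces, keeping its length.

    Keeping the byte count unchanged is what preserves the offsets stored
    in the Xref tables.
    """
    out = []
    for line in data.split(b"\n"):
        if b"PTEX." in line:
            line = b" " * len(line)
        out.append(line)
    return b"\n".join(out)

# Small made-up fragment standing in for part of a PDF dictionary.
sample = (
    b"<< /Type /XObject\n"
    b"/PTEX.FileName (./figures/TE.pdf)\n"
    b"/PTEX.PageNumber 1\n"
    b">>\n"
)
cleaned = blank_ptex_lines(sample)
print(len(sample) == len(cleaned), b"PTEX" in cleaned)

# To process a real file (same file names as in the sed example):
# with open("submitted.pdf", "rb") as src, open("arXiv.pdf", "wb") as dst:
#     dst.write(blank_ptex_lines(src.read()))
```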

Took me half an hour to figure that out in detail, and half an hour to write. Maybe it can save some other poor academics this same amount of time! Let me know in the comments if you ever had trouble of the sort…

Wednesday 10 September 2014

Periodic table: PDF and Illustrator template

Well, I have really neglected this blog. So little free time… a lot of which is spent on Twitter!

I recently needed a Periodic Table template for Adobe Illustrator, and the ones I found were outdated (and black and white only). I improved one (original by Brian D’Alessandro), including new elements Cn, Fl and Lv, as well as a color version in addition to b&w. I am sharing it here so Google can find it:

Oh, and you can look on Twitter to see people share their "own" periodic tables, with list of elements they have worked with at some point, under the hashtag #MyPeriodicTable. You can also share yours!

Saturday 24 May 2014

Publishing chemical structures in the 21st century

This entry is somewhere between a rant and a call for comments, but it is on a topic that is close to my heart, and which I think might interest a few others: reproducibility of science and publication practices. During the past two weeks, I've been annoyed a few times by trying to reproduce (or build upon) published computational work, by looking at the structures the authors had worked on/reported/predicted. And there, I was sorely disappointed: in many cases, the structures are not readily available!

Out of the seven papers I've had to work with recently (all published in 2009 or later), here are the various behaviors I have observed:

  1. structures described in short format in the paper itself: Structure
  2. full listing of atomic positions in supporting information, in PDF format
  3. a screenshot (bitmap image) of a full listing of atomic positions is included as PDF supporting information
  4. in one case, the structures were not included at all: only their unit cell parameters were given (and a reference for the experimental crystallographic structure from which calculations were started)

What bothers me is that, in all cases, it takes a non-trivial amount of time to produce a structure file, either by copy-pasting information or retyping it, while it would have cost the authors nothing to publish the structures in a standard text-based data-minable format: CIF file for crystals, XYZ for molecules, CML if you like it, etc. This can be achieved either by publishing it as supporting information, or by depositing it in a database. As a referee, I would definitely have flagged that in my review, in the name of reproducibility and good scientific practices.

So this was the rant part. Now, the call for comments: given that the practice outlined above endures, I wonder: what are arguments against this? And in particular, how is the standard for computational/theoretical chemistry so different from, e.g., experimental crystallography (where deposition into databases is the norm)?

As author, as referee and/or as editor, what is your point of view on this? Is mine a minority view, or are things the way they are simply because of the system's inertia?

Thursday 24 April 2014

Creating a Twitter bot to survey the literature for me (& others?)

I'm starting an experiment… like some Twitter colleagues around me have done in their respective field: I'm creating a twitterbot to survey the MOF (metal–organic frameworks) literature for me. It's an attempt to try and bring to Twitter (which I use more and more) my earlier workflow for keeping an eye on the literature. I used to subscribe to RSS feeds from certain key journals, and browse through them when I would have the time. The upside is that now and then you read stuff that's outside of your own research subfield. The downside… is that it's very time consuming. So, I could only follow a few journals…

Exponential growth of MOF papers

How does it work?

The original inspiration dates back a few months: I became aware of this possibly through Sylvain Deville's announcement of his IT_papers bot. But I didn't exactly follow the "established" methodology… All the blog posts I could find about setting up a twitterbot for scientific literature rely on keyword-based queries of databases (Pubmed, Google Scholar, etc.). I didn't want to follow this approach, so instead my bot relies on filtering RSS feeds through Yahoo pipes. The Pipes workflow is very simple:

Yahoo Pipes workflow

Simply copy-paste a large number of journals' RSS feeds (I've got all potential RSC, ACS and Wiley journals covered… I'm probably missing some from Elsevier, but they're not as active in my field). Join all, filter titles and abstracts for specific keywords, and… voilà! I then used dlvr.it to post the resulting RSS feed to Twitter.
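
For readers who would rather not depend on a hosted service, the join-and-filter step can be sketched in a few lines of Python. This is only an illustration of the logic, not a description of what Pipes does internally: the keyword pattern, the function name `matching_items` and the two toy feeds are assumptions, and the sketch only handles plain RSS 2.0 without XML namespaces (real journal feeds may need more care).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical keyword pattern: "MOF"/"MOFs" as whole words, or the full name.
PATTERN = re.compile(r"\bmofs?\b|metal[-–]organic framework", re.IGNORECASE)

def matching_items(rss_documents, pattern=PATTERN):
    """Join several RSS 2.0 documents and keep the items whose title or
    description matches the keyword pattern."""
    kept = []
    for xml_text in rss_documents:
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            title = item.findtext("title", default="")
            summary = item.findtext("description", default="")
            if pattern.search(title + " " + summary):
                kept.append((title, item.findtext("link", default="")))
    return kept

# Two made-up feeds, for illustration only.
feed_a = """<rss version="2.0"><channel><title>Journal A</title>
<item><title>A flexible MOF for CO2 capture</title><link>https://example.org/a1</link><description>Adsorption study.</description></item>
<item><title>Total synthesis of a natural product</title><link>https://example.org/a2</link><description>Twelve steps.</description></item>
</channel></rss>"""
feed_b = """<rss version="2.0"><channel><title>Journal B</title>
<item><title>Negative thermal expansion</title><link>https://example.org/b1</link><description>A study of metal-organic frameworks.</description></item>
</channel></rss>"""

selected = matching_items([feed_a, feed_b])
for title, link in selected:
    print(title, link)
```

Filtering on both title and description is what lets the second feed's item through, even though its title never mentions MOFs.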

The future

Given the large number of MOF papers published every day (figure 1 above), I don't know how manageable the resulting feed will be, and whether I'll end up using it as my tool for staying up-to-date on the torrent of MOF literature…

More importantly, I don't know if others will find it useful. So, I welcome all feedback on this initiative, whether through comments below this entry, Twitter messages, etc.

Wednesday 26 February 2014

Globalization of chemistry over 5 decades (part 2)

This article is the second part of a series on the evolution of chemistry papers between 1961 and 2011, in which I play with data from JACS papers. Part 1 is here.

In the previous post, I looked at the inflation of authors and references in chemistry writing that has taken place since the 60's. As Matteo Cavalleri put it: “More of everything”. Reflecting back, one of the surprises (to me) was that the phenomenon is not recent, but has been rather progressive since the 60's. As a researcher who's been in academia for 10 years, I thought it had begun one or two decades ago… but this is a long-lived trend.

Going forward, today we will explore the “world” of chemistry authors and publishers: who writes papers in JACS? what's their diversity (in terms of affiliations, country, etc)? How much did globalization affect chemistry research & writing?

Author diversity

So, the number of authors for a given paper increases over time, as does the number of affiliations… and since the absolute number of papers increased greatly in the meantime (1364 papers in 1961 vs. 3176 in 2011), there are necessarily more authors in JACS today than 5 decades ago. But those authors are not all different… and I wondered whether JACS is, in part, a cozy “club” with members publishing multiple papers a year, or whether publication in JACS is a rare event in the chemist's typical year (I know the answer for computational chemistry, sure, but I don't know much about the publishing habits of, e.g., organic chemists). So, here's the number of papers in JACS, per author and for one given year:

Multiple authorship in JACS

As expected, the majority of authors published only one JACS paper in one year (which, of course, doesn't mean that people publish on average one JACS paper per year…). However, the proportion of "multiple papers" authors is far from negligible: 26% in 1961, and 14% in 2011. I actually quite like the idea that this number is going down over time, because I interpret it as a sign that JACS authorship is becoming more diverse.

I also looked at the 1% (of authors with the most papers per year): there is relatively little change there. To be in the 1% in 1961, you needed 6 papers in a year, while you needed only 5 in 2011. For the anecdotal value, the most prolific author in 1961 was Herbert C. Brown (at Purdue University), with 16 JACS papers (out of the 538 of his entire career), including some of a 23-part series of articles (here's “Hydroboration. XXIII”, for example)! In 2011, the title goes to Shunichi Fukuzumi, with 15 papers.
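
The counting behind these numbers is simple enough to sketch in Python. What follows is an illustration under stated assumptions, not the script used for this post: the function names and the toy author lists are made up, and author names are treated as unique identifiers (real data would need name disambiguation).

```python
from collections import Counter

def papers_per_author(author_lists):
    """Count, for one year, how many papers each author signed."""
    counts = Counter()
    for authors in author_lists:
        counts.update(set(authors))  # one count per paper and per author
    return counts

def multi_paper_share(counts):
    """Percentage of authors with more than one paper that year."""
    multi = sum(1 for n in counts.values() if n > 1)
    return 100.0 * multi / len(counts)

def top_percent_threshold(counts, percent=1.0):
    """Smallest paper count found among the top `percent` % of authors."""
    ranked = sorted(counts.values(), reverse=True)
    size = max(1, int(len(ranked) * percent / 100))
    return ranked[size - 1]

# Toy data: four papers, with made-up author labels.
papers = [["A", "B"], ["A", "C"], ["A"], ["D", "B"]]
counts = papers_per_author(papers)
print(counts.most_common(1))
print(multi_paper_share(counts))
print(top_percent_threshold(counts, percent=25))
```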


In 1961, the Journal of the American Chemical Society was almost exactly that. I don't have systematic affiliation data, so I cannot analyze the geographic distribution, but the top 9 authors were North-American (all working in the US, except Canadian chemist Saul Winstein). The #10 was British, and a few Europeans start to appear in the list after that point. Among the “top 5%” authors, all were North American or European.

Fast forward to 2011, the top 3 authors were Japanese. The top of the list is dominated by US and Japanese chemists, with sparse European presence. The first Korean author appears at #21 (Wonwoo Nam), tied with the first Chinese (Lei Liu; 7 papers each). I didn't find any Indian colleague in the “top 1%”.

To look at this in a more quantitative manner, let's look at the distribution of countries (this is not per author, but per affiliation):

Affiliation distribution per country, JACS

(In the legend, only countries with ≥3 papers are featured). Obviously, over the course of 30 years, globalisation had a large impact on authorship: the US share of authors has gone down from two-thirds to just under half, while other countries have progressed. China went from 0.1% of papers to 7%. Europe, as a whole, has grown from 14% to 25%; diversity within European countries has also increased.

Still, US dominance remains strong, and China's slice of the whole is rather small… It seems that, even today, there still remains an overly large prevalence of American authors in the Journal of the American Chemical Society. Which is not, after all, a bad thing in itself (other countries have their specific journals), but can reinforce bias when bibliometric factors are used in evaluation of researchers.

Gender balance

Oh, just kidding. I'd love to study this further, but given that no systematic data is available, there's not much I can do. I scrolled the list of 1961 and 1981 top authors looking for female scientists, and I got bored before I found any. In 2011, there were 3 female chemists among the first 36 authors (8%): Naomi Mizorogi (University of Tsukuba), Wei Wang (UCLA), and Melanie Sanford (University of Michigan).

Friday 21 February 2014

Evolution of chemistry writing over 5 decades (part 1)

If there's one thing that a scientist likes, it's playing with data! To do some quantitative analysis of how publication in chemistry has evolved over the years, I looked at the evolution of JACS articles spanning 5 decades, with statistics on published papers in 1961, 1971, 1981, 1991, 2001 and 2011. So, what can we say?

Average number of authors

JACS authors per paper

The histograms above show the number of authors for a given paper. There is a clear trend towards papers with longer author lists, with the average number of authors increasing from 2.4 (in 1961) to 5.3 (in 2011). Of particular note is the quasi-disappearance of single author papers: while they represented 13.7% of published papers in 1961 (188 out of 1364), there were only 10 in 2011 (out of 3176 papers, i.e. 0.3%).

This is in line with the general idea of modern research being more collaborative, which we can try to confirm by looking at the number of different affiliations per paper (only plotted from 1981 to 2011, as earlier data on affiliation is not available):

JACS affiliations per paper

The average number of affiliations increased from 1.3 to 2.4 over this period, somewhat faster than the increase in authorship. I feel, however, it's important to note there still is a large number of single-affiliation papers, which I did not exactly expect (most of my own papers being part of collaborations).

Number of references

Here's the number of references per paper, in 1961, 1991 and 2011:

References per JACS paper

The increase in references (average of 18.6 in 1961 vs. 49.1 in 2011) is clear, and is (at least partly) responsible for the background inflation of impact factors, for example. However, it should be noted that the increase has been gradual since 1961, so it cannot be driven by the more recent focus on bibliometrics.

That's it for today, my time is up (rugby's over), but next time I'll focus on the writing of chemistry itself: how the language of titles and abstracts has evolved since 1961… In the meantime, comments on the statistics above or suggestions of additional analyses are very welcome!

Sunday 16 February 2014

The blog is dead

The blog is dead, they tell me.
I've never been one to follow trends much, and here I am, late in February 2014, opening a blog. A scientist's blog, a chemistry blog.
Why? I've been thinking about this for a while, but I cannot find a single good reason to write a blog. I find many small ones, however: to start some new discussions; to meet some new people; to try and play a (small) active part in the (r)evolution in scientific publication & communication that is about to (needs to?) rock our world; to procrastinate, with style; to assuage my love of never-ending lists, of (literal) parentheticals, and semi-colons.
So, here you have it: my blog. I'm FX, which is short for François-Xavier (Coudert). After years of having an anonymous read-only Twitter account (to follow some friends), I decided last December to start using it under my real name. Results are, so far, inconclusive… but I'm having fun. Part of the incentive to start a blog is to write some longer material, when I am frustrated by Twitter's style.
To say what, exactly? That's easy, ’cause I love to talk! I'll talk about my field (theoretical methods in physical chemistry, molecular simulation, statistical thermodynamics… and a bit of materials science), although I also cover some of it for Computational Chemistry Highlights. I'll also talk about scientific publication and the evolutions that are necessary to it. In particular, I intend to write a post soon about the specific issues that Chemistry, as a field, faces in these recent evolutions.
So, assuming someöne reads this: whether you're a seasoned veteran of the chemical blogosphere, a passerby, or my wife, thanks for reading, and please use the comments to make remarks. For example, I started a blogroll with some links from my RSS feeds, but I welcome any suggestions for links to add!