UOMF: Reference Management with Org Mode

This is an article from a series of blog postings. Please do read my "Using Org Mode Features" (UOMF) series page for explanations on articles of this series.

Updates
- 2019-09-25: added to blog series "Using Org Mode Features"
- 2021-06-16: comment on oc.el
- 2022-08-08: A Workflow by Koustuv Sinha

While I was a PIM researcher at Graz University of Technology, I was using Emacs Org-mode for managing references to white papers and books.

My starting point was this description of a workflow. I added many features to the workflow and described it on GitHub.

John Kitchin from Carnegie Mellon University (blog, Twitter, GitHub) is an awesome Org-mode user and contributor. Every workflow he has implemented with Org-mode is a great source of inspiration for many similar workflows I use. If you're into teaching or doing research, you definitely have to follow his work!

So John has implemented reference management with Org-mode as well: org-ref recently got a major update. That update prompted me to write this blog post, in which I explain my method and his method and compare the two approaches. This should give you a good starting point for deciding how to use Org-mode for your own reference management workflow.

My Reference Management Workflow

My method for managing information on papers is based on the premise that I write papers in (pdf)LaTeX directly and not with Org-mode. It consists of one Org-mode file holding one heading per reference with zero or more sub-headings. Here is one example which has a sub-heading for the abstract:

 *** Voit2012 - TagTrees: Improving Personal Information Management using Associative Navigation   :PIM:
 :PROPERTIES:
 :CREATED: <2012-09-17 Mon 17:48>
 :ID: Voit2012b
 :END:

 [[bib:Voit2012][Voit2012.bib]]
 [[pdf:Voit2012][Voit2012.pdf]]

 **** Abstract

 #+BEGIN_QUOTE
 This dissertation gives an overview of research related to Personal
 [...]
 #+END_QUOTE

I am using my own reftex-set-cite-format for adding new references to my collection. It's described on GitHub and you can find my currently used Emacs setup on GitHub as well. To add a new reference heading in Org-mode I only have to manually create and write a Bibtex file, press C-c ) h, and select the new reference.

Each reference has one Bibtex file (Voit2012.bib), one PDF file (Voit2012.pdf), and one optional PDF file containing PDF notes (Voit2012-notes.pdf) in the same folder. To link to any of them, I press C-c ) with b, r, or p for inserting links to a Bibtex file, Org-mode reference, or PDF file. To make those links work, I added them to org-link-abbrev-alist (described here).

Occasionally I want to have one big Bibtex file holding all references. To generate it, I use the following script:

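#!/bin/sh
# Rebuild one big Bibtex file from the per-reference files (Voit2012.bib, ...).
# "rm -f" keeps the chain going when references.bib does not exist yet.
cd ~/archive/library && \
   rm -f references.bib && \
   cat [A-Z]*.bib > references.bib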

Since I was writing my papers in LaTeX directly (no Org-mode export), I wrote two handy scripts to support my workflow with references. I used references in my TeX file like \cite{Voit2012}, compiled the document, and got warnings about missing references because I had not provided any Bibtex file yet. Then I invoked a shell script which parses the temporary LaTeX files containing those warnings and generates a Bibtex file. This way, I got Bibtex files that hold only the few references cited in the paper I was currently working on.

You can find one version of the script for Bibtex and another version of the script for biber/Biblatex on GitHub.
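The idea behind these scripts can be sketched roughly as follows. This is not the script from GitHub, just a minimal illustration under a few assumptions: the warnings are read from the LaTeX log file and have the usual form "Citation `Voit2012' on page 1 undefined", every reference has its own KEY.bib file in the library folder, and the function names missing_keys and collect_bib are made up for this example. Warnings that LaTeX wraps across two log lines are not handled.

```shell
#!/bin/sh
# Sketch: build a per-paper Bibtex file from "undefined citation" warnings.
# Assumed arguments (hypothetical, not taken from the original scripts):
#   $1 = LaTeX log file, $2 = library folder with one KEY.bib per reference,
#   $3 = Bibtex file to generate

# Print each undefined citation key found in the log file $1, once.
missing_keys() {
    sed -n "s/.*Citation \`\([^']*\)'.*undefined.*/\1/p" "$1" | sort -u
}

# Concatenate the per-reference Bibtex files of all missing keys into $3.
collect_bib() {
    log=$1; library=$2; out=$3
    : > "$out"    # start with an empty output file
    for key in $(missing_keys "$log"); do
        if [ -f "$library/$key.bib" ]; then
            cat "$library/$key.bib" >> "$out"
        else
            echo "no Bibtex file for $key" >&2
        fi
    done
}

# Run only when called with three arguments, so the file can also be sourced.
if [ "$#" -eq 3 ]; then
    collect_bib "$1" "$2" "$3"
fi
```

Assuming the sketch is saved as collect_bib.sh, a call could look like "sh collect_bib.sh paper.log ~/archive/library paper.bib", followed by another LaTeX run that now finds the references.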

As a very cool bonus, I developed a method to extract PDF annotations. I was reading research papers on my Android tablet, doing simple highlighting and writing some remarks as annotations in the PDF file using RepliGo Reader for Android, which has unfortunately been discontinued.

However, any other app writing standard PDF annotations to the PDF file should do. If not, you have to find out how your PDF tool is storing annotations by looking at the PDF file source directly and modifying the parsing lines in my script.

After reading a paper and highlighting its important phrases and words, I stored the annotated paper as Voit2012-notes.pdf in my library directory as mentioned above. With C-c ) n I can add a sub-heading to a reference that contains a small babel script. It executes vkextract_annotations_to_orgmode_snippet.sh with the reference. The script parses the notes file and inserts every highlighted word and every written annotation directly into my Org-mode file. This way, my paper summaries were generated automatically and I could search and find references by keywords directly in Org-mode. How cool is that?
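To give an impression of what "parsing the notes file" can mean, here is a rough sketch. It is not vkextract_annotations_to_orgmode_snippet.sh, and it rests on an assumption that does not hold for every PDF tool: the annotation text is stored uncompressed in the PDF source as "/Contents (some text)". Annotations that are compressed, hex-encoded, or contain parentheses are not found this way. The function name annotations_to_org is made up for this example.

```shell
#!/bin/sh
# Sketch only: print the annotation texts of a PDF file as Org-mode list items.
# Assumption: annotations appear uncompressed as "/Contents (some text)".

# $1 = annotated PDF file, e.g. Voit2012-notes.pdf
annotations_to_org() {
    # grep -a treats the PDF as text, -o prints only the matching parts
    LC_ALL=C grep -a -o '/Contents *([^)]*)' "$1" |
        sed 's/^\/Contents *(\(.*\))$/- \1/'
}

# Run only when called with a file name, so the file can also be sourced.
if [ "$#" -eq 1 ]; then
    annotations_to_org "$1"
fi
```

The resulting lines can be pasted below the reference heading. If your PDF tool stores annotations differently, the grep pattern is the part to adapt.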

org-ref

John's workflow with org-ref seems to be built with the intention to write white papers directly in Org-mode and use the export functionality to get LaTeX or PDF files. He has invested much more effort in his method than I did with mine. Therefore, I can't describe his method in great detail as I did above with mine.

You can find the source and documentation of org-ref on GitHub.

John made a screencast with eleven and a half minutes of awesomeness describing the basics of his method:

He also describes the new features of the recent update in ten minutes:

As you can see, his feature set is clearly more advanced than my method. To mention just two of many features: you can drag and drop PDF files to generate Bibtex entries, and you can drag and drop DOI URLs to download the PDF papers and generate Bibtex entries.

Superficial Comparison

I have to admit that I did not try out John's method myself. I just watched the screencast videos and read some of the documentation files. Currently, I have no need to manage references. If I had, I would definitely check out and use John's method.

However, my method does have some advantages from my point of view. It offers a bit more support for writing in LaTeX (not Org-mode). For example, each of my papers comes with a Bibtex file that holds only the references of this specific paper and not my complete set of references.

As far as I remember, I was required to write all papers in formats that the Org-mode LaTeX export was not able to deliver without substantial additional effort. Org-mode does not export to the ACM format. Org-mode does not export to the (mostly not very well done) LaTeX templates of conference proceedings. Additionally, I always had to tweak the LaTeX source here and there in order to satisfy restrictions on space, my stupid level of typographic perfection, or other things I can't accomplish with the Org-mode to LaTeX export. I wonder how John is able to deal with this annoyance.

Update: John added a comment below which is quite interesting - you should check it out.

My method uses shell scripts that have to be rewritten for users of Windows systems. This is a clear disadvantage, although not for me.

Clearly, it is no big deal to "migrate" the few unique features of my method to John's workflow. For example, extracting PDF annotations can be added to John's method without any additional effort at all. I'd love to have a positive influence on John's method with this blog entry and my scripts. Maybe you are willing to send him pull requests with minor improvements here and there. It's always better to have a bigger community using the same method than to tinker with your small scripts all by yourself.

If you want to add your opinion or new ideas to this topic, please leave a comment below!

Update 2021-06-16: There is this blog article with a DIY method and soon, oc.el is about to introduce some out-of-the-box functionality for reference management with Org. This looks promising:

 [cite/style/sub-style:global prefix;cite prefix @key1 cite suffix; global suffix]

Examples:

 simple [cite:@low2001]
 simple with locator suffix [cite:@low2001 p.23]
 citet style [cite/text:@low2001]
 multi-cite with global prefix: [cite:see ;@low2001;@mcneill2011]

A Workflow by Koustuv Sinha

Irreal featured a blog article by Koustuv Sinha who is writing about his paper reading workflow that covers discovering, managing, syncing and annotating using Org-mode and bits from org-ref mentioned above.

This is a really nice workflow, and I would most probably start by testing this one if I had to work with research papers again.
