
UOMF: Linking to Text Within PDF Files


Update 2020-12-11: switch from org-link-abbrev-alist to org-link-set-parameters

This is an article from a series of blog postings. Please do read my "Using Org Mode Features" (UOMF) series page for explanations on articles of this series.

By reading this article, you will learn how to create custom link abbreviations for web pages and custom link types that jump to text within PDF files.

Motivation

At work, I have to deal with a set of specification documents. To fulfill the specifications properly, I frequently need to revisit the exact wording of a specification item.

Therefore, I want to have a representation of each specification item within my Org mode files. It should look like this:

 * [[JSPEC:Spec1234][Spec1234]] The title of the spec item (MUST; SpecFileX)
 :PROPERTIES:
 :SPEC: [[SpecFileX:Spec1234][Spec1234]]
 :END:

 #+begin_verse
 This is the specification description.
 #+end_verse	  

Two things might seem unfamiliar to you:

 [[JSPEC:Spec1234][Spec1234]]	  

... and ...

  [[SpecFileX:Spec1234][Spec1234]]	  

Both are link types I defined myself. The first one, JSPEC, links to our Jira instance, where every specification item has a representation. The second one, SpecFileX, links to the "Spec1234" identifier (plain text) within the specification document in PDF format.

This way, I keep my personal notes on this specification item within Org mode. The representation within our Jira instance and the original paragraph in the PDF file are each just a link away.

The following sections explain how it's done.

Linking to Jira

This method works for any web page that takes a string as a query string. You could create links to Wikipedia pages, online dictionaries, and so forth.

To add your custom link type to the list of known abbreviation links, you need to add this to your configuration:

(add-to-list 'org-link-abbrev-alist '("JSPEC" . "https://jira.example.com/browse/SPEC-0?jql=project%20%3D%20SPEC%20AND%20Spec-ID%20~%20%22%s%22"))	  

Please note that the actual server address as well as the query depend on your situation.

With this, you are then able to define a link like

 [[JSPEC:Spec4321]]	  

... and it gets translated into ...

 https://jira.example.com/browse/SPEC-0?jql=project%20%3D%20SPEC%20AND%20Spec-ID%20~%20%22Spec4321%22	  
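The same mechanism can be sketched for any other site. The following is a minimal example, assuming a hypothetical abbreviation named "WP" that points to English Wikipedia article pages; neither the name nor the URL is part of my actual setup. org-link-expand-abbrev lets you inspect the resulting URL without opening anything:

```elisp
(require 'org)

;; Hypothetical abbreviation "WP": the %s placeholder is replaced by
;; whatever follows "WP:" in the link.
(add-to-list 'org-link-abbrev-alist
             '("WP" . "https://en.wikipedia.org/wiki/%s"))

;; Return the expanded link as a string, without visiting it.
;; Evaluate with C-x C-e to see the result in the echo area.
(org-link-expand-abbrev "WP:Emacs")
```

In an Org file, the link would then be written as [[WP:Emacs]].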

This was easy. Now to the more complex link type.

Linking to an External PDF File

First of all, we need to find out how to open a PDF file and jump to a specific match for a text string. On my system, I'm using the Okular PDF reader, which accepts the command line parameter --find to search for a string.

The following command line ...

 okular /home/user/documents/SpecFileX.pdf --find "Spec4321"	  

... translates in Elisp to ...

(start-process "" nil "okular" "/home/user/documents/SpecFileX.pdf" "--find" "Spec4321")	  

If you're using Okular as well, try this invocation with one of your own documents and a string it contains to confirm that it works on your system.
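For trying out the invocation from within Emacs, a small interactive command can help. This is only a sketch: the command name my-okular-find-demo is made up for illustration, and it assumes that the okular executable is on your PATH.

```elisp
(defun my-okular-find-demo (file text)
  "Open the PDF FILE in Okular and search for TEXT.
Hypothetical helper for testing; not part of the setup below."
  (interactive "fPDF file: \nsText to find: ")
  ;; Fail early with a readable message if Okular is not installed.
  (unless (executable-find "okular")
    (user-error "Could not find the okular executable on PATH"))
  ;; No process buffer is needed, hence the nil.
  (start-process "okular-find" nil
                 "okular" (expand-file-name file) "--find" text))
```

Call it with M-x my-okular-find-demo, pick a PDF file, and enter a string that occurs in it.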

For each specification PDF document, you need to define a separate link type, because a simple link of this kind cannot carry two parameters, namely the PDF file name and the query string.

For our hypothetical SpecFileX.pdf file, the setup looks like this.

We need one function per specification document that opens this document and searches for the given specification item:

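(defun my-SpecFileX (spec)
  "Open SpecFileX.pdf in Okular and jump to the first match of SPEC."
  ;; Start Okular asynchronously, without a process buffer.
  (start-process "" nil "okular" "/home/user/documents/SpecFileX.pdf" "--find" spec)
   "")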

Create a new link type that uses this function:

(org-link-set-parameters "SpecFileX"
                         :follow #'my-SpecFileX)

Now, you can create links like ...

 [[SpecFileX:Spec4321]]	  

... which results in Okular opening the file /home/user/documents/SpecFileX.pdf and jumping to the first occurrence of the text "Spec4321". This also works with text that contains spaces.
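If you have several specification documents, writing one function per file gets repetitive. As a sketch of one possible shortcut, the link types can be generated from a list. The variable my-spec-pdf-files and the second entry SpecFileY are hypothetical and only serve as an illustration:

```elisp
(require 'org)

;; Hypothetical mapping from link type name to PDF file.
(defvar my-spec-pdf-files
  '(("SpecFileX" . "/home/user/documents/SpecFileX.pdf")
    ("SpecFileY" . "/home/user/documents/SpecFileY.pdf"))
  "Alist of (LINK-TYPE . PDF-FILE) pairs for specification documents.")

;; Register one link type per entry.  The backquote inserts the file
;; name into each lambda, so this works without lexical binding.
(dolist (entry my-spec-pdf-files)
  (let ((type (car entry))
        (file (cdr entry)))
    (org-link-set-parameters
     type
     :follow `(lambda (spec &optional _arg)
                (start-process "okular" nil
                               "okular" ,file "--find" spec)))))
```

With this in place, [[SpecFileY:Spec4321]] would behave like the SpecFileX link above, just for the other file.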

Don't Use org-link-abbrev-alist for PDF Links

Please do not use org-link-abbrev-alist for that:

(add-to-list 'org-link-abbrev-alist '("SpecFileX" . "%(my-SpecFileX)")) ;; WRONG!	  

Unfortunately, with this approach many Org mode operations cause the link to be "visited". For example, when changing the priority or sometimes when unfolding the hierarchy, you end up with many open Okular instances: one for each visible link.

yantar92 sent me a comment via email and explained very well why the org-link-abbrev-alist method does not work:

Hi,
org-link-abbrev-alist should not be used to define code opening links. It is merely a transformer of your custom link "abbreviation" to another known link. It must return string and not have side effects.
The reason of it is because link "abbrevs" do not define actual new link type, so org-mode need to transform abbrevs to actual link and then look into the way, for example, to fontify the link or to check if link should have a tooltip. Thus, your function is called in many different cases when you do not really open the link.
Instead you should better define a new link type (not abbreviation). Then, you can set :follow parameter of your custom link type that will actually be triggered when you open the link.
All you need is
(org-link-set-parameters "SpecFileX" :follow #'my-SpecFileX)
instead of
(add-to-list 'org-link-abbrev-alist '("SpecFileX" . "%(my-SpecFileX)"))

So I changed from org-link-abbrev-alist to org-link-set-parameters and everything worked fine.

