
Implementing Proper English Title Capitalization With Emacs Elisp


Update 2015-06-17: moved my solution to GitHub and added link

In iTunes, I use a script by Doug Adams that renames music tracks so that they get proper English title capitalization. For example, "a hard day's night" gets renamed to "A Hard Day's Night". But not all words in a title should be capitalized. Words like "a" and "the" stay lowercase unless they start the title, and so forth.

But what are those rules, and how can I implement an Emacs Lisp function so that I don't have to worry about title capitalization myself?

What Is Proper English Title Capitalization?

Well, the answer is not quite simple. There are many different points of view:

A posting on Stack Exchange is a good starting point. It also has a comment that gives a four-point list of rules:

  1. Always capitalize the first and the last word.
  2. Capitalize all nouns, pronouns, adjectives, verbs, adverbs, and subordinate conjunctions ("as", "because", "although", "if", etc.).
  3. Lowercase all articles, coordinate conjunctions ("and", "or", "nor"), and prepositions regardless of length, when they are other than the first or last word. (Note: NIVA prefers to capitalize prepositions of five characters or more ("after", "among", "between").)
  4. Lowercase the "to" in an infinitive.
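To make the list more concrete, here is a minimal sketch of rules 1 and 3 applied to a plain string. The names my/title-lowercase-words and my/title-case-string are hypothetical, and the word list is only a small sample of articles, conjunctions and prepositions, not a complete one. Rules 2 and 4 are not modeled separately: every word that is not in the list simply gets its first letter upcased.

```elisp
;; Hypothetical names; the word list is an illustrative sample only.
(defvar my/title-lowercase-words
  '("a" "an" "the" "and" "or" "nor" "but" "to"
    "of" "in" "on" "at" "by" "for" "from")
  "Sample of words that stay lowercase inside a title (rule 3).")

(defun my/title-case-string (title)
  "Return TITLE capitalized according to rules 1 and 3 (a sketch)."
  (let* ((words (split-string title "[ \t\n]+" t))
         (last-index (1- (length words)))
         (index 0)
         result)
    (dolist (word words)
      (push (if (and (member (downcase word) my/title-lowercase-words)
                     ;; Rule 1: the first and last word are never lowercased.
                     (< 0 index last-index))
                (downcase word)
              ;; Upcase only the first character, so "day's" stays "Day's".
              (concat (upcase (substring word 0 1)) (substring word 1)))
            result)
      (setq index (1+ index)))
    (mapconcat #'identity (nreverse result) " ")))
```

With this sample list, (my/title-case-string "a hard day's night") returns "A Hard Day's Night". Note that whitespace is normalized to single spaces, which is fine for a heading but not for arbitrary text.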

Wikipedia summarizes different approaches and compares them as well.

A particular web page from grammarbook was linked quite often. In its long list of capitalization rules, rule number 16a deals with titles:

Rule 16a. Composition titles: which words should be capitalized in titles of books, plays, films, songs, poems, essays, chapters, etc.? This is a vexing matter, and policies vary. The usual advice is to capitalize only the "important" words. But this isn't really very helpful. Aren't all words in a title important?
The following rules for capitalizing composition titles are universal.

  - Capitalize the title's first and last word.
  - Capitalize verbs, including all forms of the verb to be (is, are, was, etc.).
  - Capitalize all pronouns, including it, he, who, that, etc.
  - Capitalize not.
  - Do not capitalize a, an, or the unless it is first or last in the title.
  - Do not capitalize the word and, or, or nor unless it is first or last in the title.
  - Do not capitalize the word to, with or without an infinitive, unless it is first or last in the title.

Otherwise, styles, methods, and opinions vary. Small words such as or, as, if, and but are capped by some, but lowercased by others.
The major bone of contention is prepositions. The Associated Press Stylebook recommends capitalizing all prepositions of more than three letters (e.g., With, About, Across). Others advise lowercase until a preposition reaches five or more letters. Still others say not to capitalize any preposition, even big words like regarding or underneath.
Hyphenated words in a title also present problems. There are no set rules. Some writers, editors, and publishers choose not to capitalize words following hyphens unless they are proper nouns or proper adjectives (Ex-Marine but Ex-husband). Others capitalize any word that would otherwise be capped in titles (Prize-Winning, Up-to-Date).

Another web page comes up with this list of rules:

I had to do some research for basic terms as well. This way, I found out what coordinating conjunctions are ("And, but, for, nor, or, so, and yet—these are the seven coordinating conjunctions.") and what kind of prepositions the English language has.

Title Capitalization Within Emacs

I was looking for a method to capitalize headings/titles in Emacs. I found s-titleise-s, which simply capitalizes every word and therefore clearly does not cover my issue.

Sadly, I don't know enough Elisp to code it myself (yet). So I posted on the Emacs devel mailing list. Emanuel Berg was kind enough to answer me:

Good idea, I can never memorize those goofy rules. It could be a cool thing to have for example in BibLaTeX.
You don't have to know a lot of Elisp to do that. Here is a start. Only you'll have to insert the "stopwords" yourself.

He added a function make-a-title() which turned out to be a great help for me:

(setq do-not-capitalize '("ah" "oh" "eh"))

(defun make-a-title (beg end)
  (interactive "r")
  (save-excursion
    (goto-char beg)
    (forward-word)
    (backward-word)
    (while (< (point) end)
      (if (member (thing-at-point 'word t) do-not-capitalize)
          (forward-word)
        (capitalize-word 1))
      (forward-word)
      (backward-word))))

What's still missing in his make-a-title():

I extracted a list of not-to-capitalize words from the rules above so that they fulfilled my requirements.

My Solution

With my limited Elisp knowledge, I was able to modify and complete Emanuel's function so that the missing things are fixed.

You can find my version on GitHub.

I mapped this function to my-map c (with my-map being C-, at the moment). It's quite cool to use it on headings and titles of any kind. This blog entry has properly capitalized headings thanks to this function.

Furthermore, I also tried to come up with a unit test. However, this test comes back with an error, which should not be the case for this input:

(ert-deftest my-title-capitalization ()
  "Tests proper English title capitalization; FIXXME: doesn't work yet"
  (should (string= (with-temp-buffer
                     (insert "the presentation of this heading of my own from my keyboard and yet\n")
                     (goto-char (point-min))
                     (set-mark-command nil)
                     (goto-char (point-max))
                     ;(transient-mark-mode 1)
                     (my-title-capitalization)
                     (buffer-string))
                   "The Presentation of This Heading of My Own from My Keyboard and Yet\n")))

If you've got an idea what I did wrong, drop a comment below.
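One possible cause, assuming my-title-capitalization takes (beg end) with (interactive "r") like make-a-title does: the "r" spec only fills in the region bounds when the command is invoked interactively. When it is called from Lisp without arguments, Emacs signals wrong-number-of-arguments, no matter where the mark is. The following sketch shows the pattern with a hypothetical stand-in command, my/demo-title-region, which merely capitalizes every word; it is not the real function.

```elisp
(require 'ert)

;; Hypothetical stand-in: same (beg end) signature as make-a-title,
;; but it only capitalizes every word in the region.
(defun my/demo-title-region (beg end)
  "Capitalize each word between BEG and END."
  (interactive "r")
  (capitalize-region beg end))

(ert-deftest my/demo-title-region-test ()
  "Call the command from Lisp with explicit region bounds."
  (should (string= (with-temp-buffer
                     (insert "the presentation of this heading\n")
                     ;; No mark and no transient-mark-mode needed:
                     ;; the bounds are passed as arguments.
                     (my/demo-title-region (point-min) (point-max))
                     (buffer-string))
                   "The Presentation Of This Heading\n")))
```

If the real function has a different signature, this does not apply, but passing (point-min) and (point-max) explicitly is usually simpler in tests than setting the mark.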

After suggestions from an email comment, I am going to add it as a package on marmalade.

Comments

David Mann commented:

I find the function (copied and pasted from your blog) fails with this phrase: "this is the time that tries men’s souls" produces "This Is the Time That Tries Men'S Souls" and hangs with the cursor on the ’S’ of Souls. I also find the function hangs with other phrases that don’t contain an apostrophe. Would like to see it work because it’s a very useful idea!

You're absolutely right.

Unfortunately, although I found something about this issue on the web, I could not fix it in my script so far.

Barry Fishman added additional input to the Elisp code so that the issue with apostrophes is resolved.
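For illustration, the "Men'S" effect typically appears when the apostrophe has punctuation syntax, so that the "s" after it counts as a word of its own. One way around this (a sketch only, and not necessarily what Barry Fishman's change does) is to capitalize under a temporary syntax table in which the apostrophe is a word constituent. The function name below is hypothetical, and it capitalizes every word without any stopword handling.

```elisp
(defun my/capitalize-keeping-apostrophes (string)
  "Return STRING with each word capitalized, keeping \"men's\" as one word."
  (with-temp-buffer
    (let ((table (make-syntax-table (syntax-table))))
      ;; Treat the ASCII apostrophe and the typographic one (U+2019)
      ;; as word constituents instead of punctuation.
      (modify-syntax-entry ?' "w" table)
      (modify-syntax-entry ?\u2019 "w" table)
      (with-syntax-table table
        (insert string)
        (capitalize-region (point-min) (point-max))
        (buffer-string)))))
```

Called with "this is the time that tries men's souls", this should return "This Is The Time That Tries Men's Souls". The same temporary syntax table could be wrapped around a forward-word based loop so that word motion and capitalization agree on where a word ends.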

