Update 2015-06-17: moved my solution to GitHub and added link
In iTunes, I use a script by Doug Adams that renames music tracks so that they get proper English title capitalization. For example, "a hard day's night" gets renamed to "A Hard Day's Night". But not every word in a title should be capitalized. Words like "a" and "the" stay lowercase unless they start the title, and so forth.
But what are those rules, and how can I implement an Emacs Lisp function so that I don't have to worry about title capitalization myself?
Well, the answer is not quite simple. There are many different points of view:
A posting on Stack Exchange is a good starting point. It also has a comment which gives a four-point list of rules:
- Always capitalize the first and the last word.
- Capitalize all nouns, pronouns, adjectives, verbs, adverbs, and subordinate conjunctions ("as", "because", "although", "if", etc.).
- Lowercase all articles, coordinate conjunctions ("and", "or", "nor"), and prepositions regardless of length, when they are other than the first or last word. (Note: NIVA prefers to capitalize prepositions of five characters or more ("after", "among", "between").)
- Lowercase the "to" in an infinitive.
Wikipedia summarizes different approaches and compares them as well.
A particular web page from grammarbook was linked quite often. Among the long list of capitalization stuff, rule number 16a deals with titles:
Rule 16a. Composition titles: which words should be capitalized in titles of books, plays, films, songs, poems, essays, chapters, etc.? This is a vexing matter, and policies vary. The usual advice is to capitalize only the "important" words. But this isn't really very helpful. Aren't all words in a title important?
The following rules for capitalizing composition titles are universal.
- Capitalize the title's first and last word.
- Capitalize verbs, including all forms of the verb to be (is, are, was, etc.).
- Capitalize all pronouns, including it, he, who, that, etc.
- Capitalize not.
- Do not capitalize a, an, or the unless it is first or last in the title.
- Do not capitalize the word and, or, or nor unless it is first or last in the title.
- Do not capitalize the word to, with or without an infinitive, unless it is first or last in the title.
Otherwise, styles, methods, and opinions vary. Small words such as or, as, if, and but are capped by some, but lowercased by others.
The major bone of contention is prepositions. The Associated Press Stylebook recommends capitalizing all prepositions of more than three letters (e.g., With, About, Across). Others advise lowercase until a preposition reaches five or more letters. Still others say not to capitalize any preposition, even big words like regarding or underneath.
Hyphenated words in a title also present problems. There are no set rules. Some writers, editors, and publishers choose not to capitalize words following hyphens unless they are proper nouns or proper adjectives (Ex-Marine but Ex-husband). Others capitalize any word that would otherwise be capped in titles (Prize-Winning, Up-to-Date).
Another web page comes up with this list of rules:
- As you have probably noticed "short" words, those with less than five letters, are generally lowercase in titles, unless they are the first or last words in a title.
- Generally, we do not capitalize:
  - Articles: a, an, the
  - Coordinating Conjunctions: and, but, or, for, nor, etc.
  - Prepositions (fewer than five letters): on, at, to, from, by, etc.
- When in doubt and you do not have a reference guide in front of you, here is one general rule to remember recommended by The U.S. Government Printing Office Style Manual:
  - "Capitalize all words in titles of publications and documents, except a, an, the, at, by, for, in, of, on, to, up, and, as, but, it, or, and nor."
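The GPO rule quoted above is simple enough to sketch in a few lines of Emacs Lisp. This is only an illustration: the function and variable names are hypothetical, and always capitalizing the first and last word is borrowed from the other rule lists, not from the GPO sentence itself.

```elisp
;; Sketch only: the names below are hypothetical, not from any package.
(defconst my-sketch-gpo-lowercase-words
  '("a" "an" "the" "at" "by" "for" "in" "of" "on" "to" "up"
    "and" "as" "but" "it" "or" "nor")
  "Words the quoted GPO rule leaves in lower case.")

(defun my-sketch-gpo-title-case (title)
  "Return TITLE capitalized following the quoted GPO rule.
Assumption: the first and last word are always capitalized."
  (let* ((words (split-string title "[ \t\n]+" t))
         (last-index (1- (length words)))
         (index 0)
         (result '()))
    (dolist (word words)
      (push (if (and (> index 0)
                     (< index last-index)
                     (member (downcase word) my-sketch-gpo-lowercase-words))
                (downcase word)
              ;; Upcase only the first character; leave the rest untouched.
              (concat (upcase (substring word 0 1)) (substring word 1)))
            result)
      (setq index (1+ index)))
    (mapconcat #'identity (nreverse result) " ")))

(my-sketch-gpo-title-case "a hard day's night")
;; => "A Hard Day's Night"
```

Working on a plain string keeps the rule separate from any buffer handling, which makes it easy to try different word lists.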
I had to do some research for basic terms as well. This way, I found out what coordinating conjunctions are ("And, but, for, nor, or, so, and yet—these are the seven coordinating conjunctions.") and what kind of prepositions the English language has.
I was looking for a method to capitalize headings/titles in Emacs. I found s-titleise-s, which simply capitalizes every word and therefore clearly does not cover my use case.
Sadly, I don't know enough Elisp to code it myself (yet). So I posted on the Emacs devel mailing list. Emanuel Berg was kind enough to answer me:
Good idea, I can never memorize those goofy rules. It could be a cool thing to have for example in BibLaTeX.
You don't have to know a lot of Elisp to do that. Here is a start. Only you'll have to insert the "stopwords" yourself.
He added a function make-a-title(), which turned out to be a great help for me:
    (setq do-not-capitalize '("ah" "oh" "eh"))

    (defun make-a-title (beg end)
      (interactive "r")
      (save-excursion
        (goto-char beg)
        (forward-word)
        (backward-word)
        (while (< (point) end)
          (if (member (thing-at-point 'word t) do-not-capitalize)
              (forward-word)
            (capitalize-word 1))
          (forward-word)
          (backward-word))))
What's still missing in his function:
- Always capitalize first word
- Always capitalize last word
- List of do-not-capitalize words
I extracted a list of not-to-capitalize words from the rules above so that they fulfilled my requirements.
With my limited Elisp knowledge, I could modify and complete Emanuel's function so that the missing things are fixed.
You can find my version on GitHub.
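To give an idea of what such a completion can look like, here is a minimal sketch that covers the three missing points. It is not the version on GitHub: the names are hypothetical, the word list is just one possible selection from the rules quoted above, and tokens are assumed to be separated by whitespace.

```elisp
;; Illustrative sketch only; hypothetical names, not the GitHub version.
(defvar my-sketch-lowercase-words
  '("a" "an" "the"                         ; articles
    "and" "but" "or" "for" "nor"           ; coordinating conjunctions
    "on" "at" "to" "from" "by" "in" "of")  ; short prepositions
  "Words kept in lower case unless they are first or last.")

(defun my-sketch-title-region (beg end)
  "Apply title capitalization to the text between BEG and END."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (let ((first-p t))
        ;; A token is a run of non-whitespace characters.
        (while (re-search-forward "[^ \t\n]+" nil t)
          (let* ((token-beg (match-beginning 0))
                 (token (match-string 0))
                 ;; Last token: nothing but whitespace follows.
                 (last-p (not (save-excursion
                                (re-search-forward "[^ \t\n]" nil t)))))
            (unless (and (not first-p)
                         (not last-p)
                         (member (downcase token)
                                 my-sketch-lowercase-words))
              ;; Upcase only the first character of the token.
              (upcase-region token-beg (1+ token-beg))
              (goto-char token-beg)
              (skip-chars-forward "^ \t\n"))
            (setq first-p nil)))))))
```

Because the loop is driven by a search that always moves forward and stops at the end of the narrowed region, it terminates even when the last word ends exactly at the region boundary.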
I mapped this function to my-map c (with C-, at the moment). It's quite cool to use it on headings and titles of any kind. This blog entry has properly capitalized headings thanks to this function.
Furthermore, I also tried to come up with a unit test. However, the test fails with an error, which should not happen for this input:
    (ert-deftest my-title-capitalization ()
      "Tests proper English title capitalization; FIXXME: doesn't work yet"
      (should (string=
               (with-temp-buffer
                 (insert "the presentation of this heading of my own from my keyboard and yet\n")
                 (goto-char (point-min))
                 (set-mark-command nil)
                 (goto-char (point-max))
                 ;(transient-mark-mode 1)
                 (my-title-capitalization)
                 (buffer-string))
               "The Presentation of This Heading of My Own from My Keyboard and Yet\n")))
If you've got an idea what I did wrong, drop a comment below.
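One thing worth checking, assuming my-title-capitalization keeps the (beg end) argument list and the (interactive "r") spec of make-a-title: the interactive spec is only consulted for interactive calls, so a plain call from Lisp without arguments signals a wrong-number-of-arguments error, no matter where the mark is. The sketch below shows the calling convention with a hypothetical stand-in command; it is not the real function.

```elisp
(require 'ert)

;; Hypothetical stand-in with the same calling convention as make-a-title.
(defun my-sketch-region-command (beg end)
  "Upcase the text between BEG and END."
  (interactive "r")
  (upcase-region beg end))

(ert-deftest my-sketch-region-command-test ()
  "Call a region command from Lisp with explicit bounds."
  (should (string=
           (with-temp-buffer
             (insert "a hard day's night\n")
             ;; No mark needed: pass the region bounds directly.
             (my-sketch-region-command (point-min) (point-max))
             (buffer-string))
           "A HARD DAY'S NIGHT\n")))
```

Passing (point-min) and (point-max) explicitly also keeps the test independent of mark and transient-mark-mode state.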
After suggestions from an email comment, I am going to add it as a package on Marmalade.
David Mann commented:
I find the function (copied and pasted from your blog) fails with this phrase:
this is the time that tries men’s souls
produces
This Is the Time That Tries Men'S Souls
and hangs with the cursor on the ’S’ of Souls. I also find the function hangs with other phrases that don’t contain an apostrophe. Would like to see it work because it’s a very useful idea!
You're absolutely right.
Unfortunately, although I found something about this issue on the web, I could not fix it in my script so far.
Barry Fishman added additional input to the Elisp code so that the issue with apostrophes is resolved.
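Barry's change itself is not reproduced here. As a minimal sketch of one way to avoid the Men'S effect, assuming it comes from the apostrophe being treated as a word boundary by the word-based commands: upcase only the first letter of a whitespace-delimited token and leave everything after it alone. The helper name is hypothetical.

```elisp
;; Hypothetical helper; not Barry's fix and not part of the GitHub version.
(defun my-sketch-upcase-first-letter (token)
  "Return TOKEN with its first alphabetic character upcased.
Characters after the first letter, including apostrophes, are unchanged."
  (if (string-match "[[:alpha:]]" token)
      (let ((pos (match-beginning 0)))
        (concat (substring token 0 pos)
                (upcase (substring token pos (1+ pos)))
                (substring token (1+ pos))))
    ;; No letter in TOKEN: return it unchanged.
    token))

(my-sketch-upcase-first-letter "men's")
;; => "Men's"
```

Since the helper never looks at word syntax, the result does not depend on the syntax table of the current buffer.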