
Don't Do Complex Folder Hierarchies - They Don't Work and This Is Why and What to Do Instead


I often read comments from people who are trying to come up with a clever, deeply nested directory hierarchy to manage their personal files. You will frequently find discussions about these topics on this sub-reddit. I don't recommend investing effort in complex directory structures. Here is why, and what to do instead.

Note: Please do read this article in order to learn about the terms "directory" and "folder" and what my personal directory hierarchy looks like.

Me Being World Champion on Complex Hierarchies

Many years ago, I felt like I was world champion in deriving the perfect directory structure for myself. I sat down, did my requirements analysis, analyzed the data I already had and came up with a concept for my personal directory structure that made perfect sense to me.

I had similar hierarchies for emails, bookmarks and my files, even though I did not know about the series of whitepapers by Boardman et al. from 2001 until at least 2004. They studied and proposed the same hierarchy approaches for multiple things such as emails, bookmarks, files and so forth.

My own hierarchies worked well for quite some time, until my life moved on and my requirements and data changed with it. Suddenly, the perfect hierarchy was not so perfect any more.

So I did the whole thing again. Requirements analysis, data analysis resulting in a new concept. Much better than the one before. Re-ordering many items from the old hierarchy to a new, better one is tedious.

After multiple iterations of this, I realized how much effort I had invested in just the next temporary "solution". This had to stop. There had to be a better way of doing this.

Looking for a Better Solution

Learning about Inbox Zero for email management, I got rid of my hierarchy there. It was a rewarding experience, not having to move emails into more or less fitting sub-folders. To my surprise and relief, an inbox folder and a single archive folder were enough. I won't go into details on my email workflow here. The thing I want to emphasize is that moving away from complex structures with over a hundred sub-folders to a minimal one is not only possible but might even come with many benefits attached.

Ten years ago, I had the fortune of writing a PhD thesis in the field of Personal Information Management. I was able to spend a great amount of time reading whitepapers and books on this topic. I learned the basics that started with great research work from the seventies onward. I learned that we're actually using systems in our daily life that are known to follow badly designed concepts. Much better concepts were developed, scientifically tested and approved. Unfortunately, most concepts were never adopted by the software industry. Open source projects - with a few unsuccessful exceptions - mostly seem to imitate concepts introduced by commercial products.

Having been in the IT domain for decades, I don't expect any mid-term (r)evolution of our daily-life systems that would provide significantly different and more advanced concepts. If you have followed my blog for some time, you might have already read why the desktop metaphor is limiting our abilities, how Ted Nelson explained in 1979 why centralized (cloud) concepts are bad, what computing concepts I would like to see and use in the future and why I came to the conclusion that today's software won't innovate as much as hardware does.

Some Basics

My thoughts here are mainly about personal information management with files, in contrast to collaborative information management involving a group of people. Things quickly get way more complex when more than one person is involved. I have thought a bit about these collaborative situations. So far, I have not been able to come up with a recommendation or even a solution. In short: when it comes to more than one person, a common point of view needs to be defined and followed. The only satisfying concept in my opinion would be a system where every participant is able to maintain a completely separate, individual file organization method. This is something I have never seen anywhere so far.

Furthermore, we need to distinguish between search-based retrieval and navigation-based retrieval. I will mention a somewhat combined approach later on.

File management concepts and navigation-based retrieval were my research domain.

The Problem With Strict Hierarchies

Early computer file systems were designed in a completely different context. The number of files produced by a computer user was no larger than a few dozen or a few hundred. That is far less than the many hundreds of thousands of files I curate in my home directory nowadays. I may have more files than the average person, but research shows that the average person will end up with one hundred thousand files, 440,000 emails and 120,000 digital photographs.

Because of technical limitations, and because there was no requirement to manage hundreds of thousands of user files, the early file systems did not come with sophisticated file management features. On the contrary: the features of those file systems were even more limited than the possibilities the average librarian had back then.

Books were ordered on shelves in one particular order. There is no way to order physical things in more than one way at once unless you have an endless supply of identical or similar items and space.

The problem with this is that when you order things using one concept, you are not ordering them by any of the infinitely many other concepts. So when you order books according to their release date, you're not ordering them by the main author, the publisher, the topic, any identification number (ISBN), the country, or size and color.

To manage physical items like books effectively, people came up with workarounds: for each item there were multiple index cards. One index card was put into the shelf ordered by author, another copy of that index card was put into the shelf ordered by publisher, and so forth.

This way, a book had more than one physical place where we could find it. We had multiple places with surrogates, the index cards. The old file systems did not even have a feature like that.
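
To make the index card idea concrete, here is a minimal Python sketch. The catalogue entries and the function name build_card_index are made up for illustration; the point is only that each book sits on the shelf once, while any number of card drawers can point to it.

```python
from collections import defaultdict

# Hypothetical catalogue; titles, authors and publishers are invented.
books = [
    {"title": "Book A", "author": "Smith", "publisher": "Acme", "year": 1990},
    {"title": "Book B", "author": "Jones", "publisher": "Acme", "year": 1985},
    {"title": "Book C", "author": "Smith", "publisher": "Orbit", "year": 2001},
]

# The shelf holds every book exactly once, in one physical order (by year).
shelf = sorted(books, key=lambda book: book["year"])


def build_card_index(shelf, field):
    """Return one drawer of index cards: field value -> shelf positions."""
    drawer = defaultdict(list)
    for position, book in enumerate(shelf):
        drawer[book[field]].append(position)
    return dict(drawer)


# Two drawers of surrogates pointing at the same single shelf.
by_author = build_card_index(shelf, "author")
by_publisher = build_card_index(shelf, "publisher")

# Retrieval goes through a drawer first, then to the shelf position.
smith_titles = [shelf[position]["title"] for position in by_author["Smith"]]
```

Adding a third drawer, for example by topic, costs one more call and does not require moving a single book.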

In this discussion, I'm deliberately neglecting the file system workarounds that were developed. Please do read this article on why I don't consider symlinks and hardlinks a practical solution for the average user.

People who try to come up with a complex and nested directory hierarchy are still following the limiting patterns that were defined in the fifties of the previous century when the file systems were designed.

This results in various more or less obvious issues.

Navigation is usually only understood as using the default file browser (the default ones such as File Explorer or Finder are not really good tools IMHO) and going through the directory structure top to bottom to get to the item. This requires selection decisions on each level of the directory tree. You need to either locate the target file or select the next (sub-)directory to go to. Some issues related to this are:

  1. The decisions between sub-directories on each level are not distinct. Unfortunately, you cannot put real-life items in a totally strict hierarchy without logical conflicts. Whatever structure you come up with, I can easily construct endless examples where its uniqueness fails. This is a crucial thing to know when designing complex hierarchies. This is also a well-known disadvantage of dated concepts like the Dewey Decimal Classification. Don't get me started on this one.
  2. Sometimes it is hard to recognize that you're not going to find the item in this sub-hierarchy. You have to go through a number of locations until you realize that you followed the wrong sub-hierarchy. This is lost effort and often leads to the wrong conclusion that the file you're looking for does not exist here. This is the worst case scenario for every retrieval task.
  3. The decisions you make while navigating through your directories are usually influenced by your current mental context. This context is different from the mental context you had when storing the items. For example, when you save an image from your birthday party, you most likely choose a directory related to your birthday event. When retrieving, you're probably looking for an image of Aunt Sally. That is a totally different context, and chances are that you don't look for the perfectly nice photograph of Aunt Sally in the directory of your birthday because, in this particular situation, you forgot that she was attending the party.

In the literature, you will find more on these topics when you look for "semantic cueing". As a side-note, I tend to like "temporal cueing" for certain retrieval tasks. This is why I created Memacs and its eco-system.

Another thing worth mentioning is related to retrieval tasks where you don't know exactly what to look for. For example, when you are looking for a nice image to use as a background for a presentation slide on the topic of privacy and IT security. This is almost impossible to do using navigation without knowing exactly which image you have in what directory. Serendipity is hard with a strict hierarchy of directories.

Search

In contrast to navigation-based retrieval, desktop search comes to mind.

On the one hand, only a fraction of people use desktop search for file retrieval even though they do have such a system in place. All relevant studies show that for file retrieval on the local computer system or local network, navigation is chosen over search in the absolute majority of cases.

Please note that it's not my opinion that this should be that way. I think that all users should be proficient with both retrieval methods and use them according to the current retrieval context.

On the other hand, search does come with its conceptual disadvantages as well. Without going into too much detail, the downsides mostly relate to the psychological hurdle of coming up with a suitable search query. And: you have to know what you're looking for to a certain degree.

Furthermore, some desktop search tools are just not well designed. For example, despite the fact that Apple Spotlight search in general works much better than Windows search, the visualization of the results is very simplified and therefore limiting. Fifteen years ago I was using Copernic Desktop Search which was - in my opinion - much more advanced than today's desktop search engines.

Search-based retrieval is the subject of way more scientific papers than navigation-based retrieval. Since this article is not about search, I won't discuss search methods any further here.

Life Beyond a Complex Directory Hierarchy

So what to do instead of having deeply nested directories?

My recommendation is that you follow the same rationale that librarians did centuries ago. Don't spend much effort in organizing the files. Follow a very flat hierarchy concept and invest your effort in advanced retrieval methods instead.

Therefore, my approach is to use a very simple hierarchy. To support any complex retrieval approach, I'm using a combination of retrieval methods:

  1. Classical file navigation: mostly done using my zsh (shell) or dired
    • Most of my retrieval tasks are done this way.
  2. File navigation based on tags (see next section)
    • Some of my retrieval tasks rely on this method and would not be possible with other methods.
  3. When I want to retrieve various kinds of information related to a specific event, I use my Memacs setup.
    • It lists all of my data in great detail on a time-line, which enables me to connect files that otherwise would not have any connection at all.
    • This is used rarely. However, when I fire up my time-line, it's always a god-like experience. :-)

I'm not using desktop search for retrieval at the moment. On the one hand, I have not yet found a good implementation that does not come with severe disadvantages (performance, usability, result visualization, ...). On the other hand, my navigation method works quite well in my personal domain. By limiting the potential directories in which any given file may reside, desktop search also loses some of its necessity. But that might just apply to my personal situation.

You have to come up with your own set of retrieval methods. I don't expect everybody to follow my personal recommendation and approach described in the next section. Up to here, my article should reflect the general case. If you are interested in my personal solution, continue reading.

One of My Methods: filetags and TagTrees

I've created a set of tools and a method to support my file management and retrieval tasks. It was introduced and explained in this article that also contains a video of a talk I gave together with a small demo.

My approach with filetags, date2name, appendfilename, move2archive and TagTrees offers people more efficient file management and multi-classification using tags. Instead of curating a directory structure, you should curate a controlled vocabulary of tags. This way, you can circumvent the strict hierarchy for information. With a decent (but not too big) set of tags, filetags is able to derive a complete directory structure called TagTrees which offers you many different navigational paths to the same file. This time, as long as you don't choose tags that do not apply (which is less likely than choosing directories that do not apply), you will find your file within the TagTrees in each case.

Similar to the index cards of the librarians, the file is represented as a link. And it is represented not once but many, many times. You can use the associative part of your brain instead of the part that remembers where you stored the item in a totally different context.
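
The following Python sketch illustrates the principle of many navigational paths leading to the same file. It is not the filetags implementation: it builds the paths in memory instead of creating links on disk, the function names are hypothetical, and it assumes that tags are stored in the file name after a " -- " separator.

```python
from itertools import permutations


def parse_tags(filename, separator=" -- "):
    """Split 'name -- tag1 tag2.ext' into its tags (assumed naming scheme)."""
    stem = filename.rsplit(".", 1)[0]
    if separator not in stem:
        return []
    return stem.split(separator, 1)[1].split()


def tagtree_paths(filenames, max_depth=2):
    """Map every tag path up to max_depth to the files reachable through it."""
    tree = {}
    for filename in filenames:
        tags = parse_tags(filename)
        for depth in range(1, min(max_depth, len(tags)) + 1):
            # Every ordering of tags becomes its own navigation path.
            for path in permutations(tags, depth):
                tree.setdefault("/".join(path), []).append(filename)
    return tree


# Hypothetical example files.
files = [
    "2019-07-02 castle gate -- italy vacation architecture.jpg",
    "2019-07-03 beach -- italy vacation.jpg",
]
tree = tagtree_paths(files)

# The castle photo is reachable via "architecture/italy" as well as
# "italy/architecture", "vacation/italy", and so on.
```

The number of paths grows quickly with the number of tags per file and with the depth, which is one reason to keep the controlled vocabulary small and the depth limited.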

This way, you can store your vacation photographs from Italy with your vacation-centric mental context, assigning tags accordingly. A retrieval process within TagTrees allows for going through the navigation-only directories based on the tags and selecting an arbitrary set and order of tags for reaching the very same file. This way, you can find a nice image of a medieval gate from that castle for your slide deck even though you did not remember that you took that photograph with the old locks and coatings.

Distinct Sets of Files

I don't think that a single big pile of files would work for a person. There are some types of data I keep in separate sub-hierarchies.

You can't apply general patterns to source code repositories. I'd never suggest using anything other than the repository's own structure for source code. A repo is atomic in this sense. However, IMHO you should not build a complex hierarchy for your different repos. But that's a different debate.

If you have a large number of books, you can decide to maintain a separate sub-hierarchy in $HOME/books/ with its own controlled vocabulary (CV) specifically curated for managing books. There, you can use tags like "research", "pim", "retrieval", "psychology" to tag a specific book. With filetags, you can have different CVs for different sub-hierarchies of directories.

By using TagTree navigation in your file browser, you can then quickly navigate through thousands of books using your tags and limit your "skimming folder" (where you end up locating the item in question) by using two or three tag folders. You won't need more, and if your CV is a good one, you end up with only a handful of items to choose from.
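
As a rough sketch of why two or three tag folders are usually enough, the following Python snippet narrows a small, invented library step by step. The file names and the function skim_folder are hypothetical; only the tags are taken from the example above.

```python
def skim_folder(library, *tag_folders):
    """Return the items that carry every tag chosen along the path."""
    wanted = set(tag_folders)
    return sorted(item for item, tags in library.items() if wanted <= tags)


# Hypothetical book files with their tags.
library = {
    "book-01.pdf": {"research", "pim"},
    "book-02.pdf": {"research", "pim", "retrieval"},
    "book-03.pdf": {"psychology"},
    "book-04.pdf": {"research", "psychology"},
}

one_level = skim_folder(library, "research")
two_levels = skim_folder(library, "research", "pim")
three_levels = skim_folder(library, "research", "pim", "retrieval")
```

Each additional tag folder can only shrink the result, so the list to skim through gets shorter with every step.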

And guess what: this way you'll rediscover books you have already forgotten about. That is much more likely than in a strict hierarchy. It is also an interesting, different topic.

Tagging Is Not Like Tagging

There is a whole universe between one method/system using tags and another. Most of the tag-based systems I've seen are crap. Mine is also far from perfect but in my daily work, it turns out to be useful or at least a significant step forward.

Keep in mind that the process of tagging is something that needs to be learned as well. Curating a nice set of tags in a controlled vocabulary is a good start. In general: tags that are rarely used are bad tags, and tags that are used too often might not be good tags either. There is a slide dedicated to the topic of "good tagging practice" in the video linked on this page (starting at 24:30).
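
A simple way to check a vocabulary against this rule of thumb is to count how often each tag is used. The Python sketch below does that; the function name review_vocabulary and both thresholds (a tag used on at most one file counts as rare, a tag on 80 percent or more of the files counts as overused) are arbitrary assumptions, not values from the talk.

```python
from collections import Counter


def review_vocabulary(tagged_files, rare_max=1, common_share=0.8):
    """Return (rare, overused) tags; both thresholds are assumptions."""
    # Count each tag at most once per file.
    counts = Counter(tag for tags in tagged_files.values() for tag in set(tags))
    total = len(tagged_files)
    rare = sorted(tag for tag, n in counts.items() if n <= rare_max)
    overused = sorted(tag for tag, n in counts.items() if n / total >= common_share)
    return rare, overused


# Hypothetical files and their tags.
tagged_files = {
    "a.pdf": ["pim", "research"],
    "b.pdf": ["pim", "retrieval"],
    "c.pdf": ["pim", "psychology", "research"],
    "d.pdf": ["pim", "research"],
    "e.pdf": ["pim", "retrieval"],
}
rare, overused = review_vocabulary(tagged_files)
```

In this example "psychology" is flagged as rare and "pim" as overused: a tag that sits on every file does not help to tell files apart.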

Use What's Working

My retrieval tasks are not tag-based all the time. Not at all. I've got quite a good map of my files in my head. Since I'm using a flat concept, I end up in the target folder with the first step. For this, I'm using advanced folder jumping tools based on frecency such as z or my-dired-recent-dirs(). Then I'm locating the file using grep or a similar filter mechanism if necessary. No tags involved at all.
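
Frecency combines how frequently and how recently a directory was visited. The Python sketch below shows one possible scoring under stated assumptions: each visit counts as 1 and loses half of its weight every 30 days. This is not the formula that z or my-dired-recent-dirs() actually use, and the directory names and visit times are invented.

```python
DAY = 86400  # seconds per day


def frecency(visit_times, now, half_life_days=30.0):
    """Sum of visits, each weighted by exponential decay with its age."""
    half_life = half_life_days * DAY
    return sum(0.5 ** ((now - visited) / half_life) for visited in visit_times)


def rank_directories(history, now):
    """Order directories from highest to lowest frecency score."""
    return sorted(history, key=lambda d: frecency(history[d], now), reverse=True)


now = 100 * DAY
# Hypothetical visit history: directory -> visit timestamps in seconds.
history = {
    "~/archive/2015": [1 * DAY, 2 * DAY, 3 * DAY, 4 * DAY],  # frequent, long ago
    "~/projects/talk": [98 * DAY, 99 * DAY],                 # recent
    "~/books": [70 * DAY],                                   # one visit, a month ago
}
ranking = rank_directories(history, now)
```

With this history, the two recent visits to ~/projects/talk outweigh the four old visits to ~/archive/2015, so a jump tool built on such a score would offer the talk directory first.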

TagTrees are perfect for locating files on a given topic, such as locating books on a specific topic, choosing images for presentation slides or managing movies.

I don't think that the method I personally use is the perfect choice for everybody. However, for people looking for an alternative way of doing things, my tools and methods do provide a nice pool to choose from. At least I tend to think that my methods provide reasonable answers to the issues that come with directory hierarchies and a large number of files.

If you have your own method or experience you would like to share, please comment below!
