
Design Decisions: File System Links versus LNK Files


Update 2018-08-25: comment by goldfire

This is the second article of my Design Decisions series.

It is a rather long story about various aspects of IT history and their impact on user-facing features. If you read through to the end, you are rewarded with some insight into the differences between file link features and how they evolved over recent decades.

Why Linking?

Please read this blog post about the Desktop Metaphor and why it is considered a hindrance in today's virtual worlds. I'll wait here until you return.

Back from the other article and understood its point? Good. Let's continue.

Links within a file system might provide a workaround for the limitations of the desktop metaphor until we get better systems that overcome those limitations of real-world metaphors in virtual computer systems. I was and I am working hard on tools and methods to overcome those limitations with today's systems and to define new gold standards for systems yet to come.

Keep in mind that for those reasons, file system based links ought to have a much more dominant role in our daily work than they have now. I personally use links quite intensively in my workflows. They simplify my digital life quite well.

Disclaimer

This blog article is not based on deeper knowledge of the design decision process at Microsoft back in the 90s. I don't have any special insider information.

Unfortunately, I could not find any source describing the design decision process. If you do have such a source, please drop me a line (links below). In general, the research for this article got no deeper than web pages offer on this topic. If I wrote something wrong, please drop me a line so that I am able to correct the facts.

As always: this article is not here to blame anybody. This article is here to give us a lesson on things that might have been solved differently.

Situation Before the Design Decision

Traditional POSIX/UNIX-compatible systems have two distinct features within their file systems that provide the possibility of having access to one file or directory (same as "folder") at multiple places within the file system hierarchy.

Unfortunately, the possibility of having something in many places is not something the usual computer user makes use of. This is partly because computer users are not educated enough and partly because the technical implementations do not support serendipity or good usability. The story of the latter in the case of Microsoft Windows is discussed in this article.

If you already know about symbolic links and hard links, you might skip this section and continue reading "The Decision".

Inodes

For understanding the differences between link types, we are going to discuss some details of file systems.

An important basic data structure of many modern file systems is called an inode. For the sake of this article, an inode is the smallest entity a file system is able to reference. Inodes have a fixed size and therefore files usually consist of a list of linked inodes where their content is stored.

There are different kinds of inodes: directories, files, symbolic links, data inodes. Each inode has its unique number, the inode ID. With those IDs, inodes may be referred to or linked to. This way, a directory links to the inodes of contained sub-directories and files. Each file stores the list of data inode IDs that hold the content of the file. You get the picture.

A file or directory inode stores associated meta-data such as file permissions, creation time, access time, modification time, and so forth.
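
To make this a bit more tangible, here is a minimal Python sketch that asks the operating system for the inode ID and some of the meta-data mentioned above. The helper name describe and the file name are made up for illustration. Real file systems differ in detail from the simplified inode model used in this article.

```python
import os
import stat
import tempfile
import time

def describe(path):
    """Return a few inode-level facts about path (symlinks are not followed)."""
    info = os.lstat(path)
    return {
        "inode_id": info.st_ino,
        "is_directory": stat.S_ISDIR(info.st_mode),
        "permissions": stat.filemode(info.st_mode),
        "modified": time.ctime(info.st_mtime),
    }

# Create a throwaway directory with one file and look at both.
with tempfile.TemporaryDirectory() as folder:
    document = os.path.join(folder, "A Document.txt")
    with open(document, "w") as handle:
        handle.write("hello")
    print(describe(folder))
    print(describe(document))
```

The directory and the file each report their own inode ID, which is what the link types below build upon.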

Now let us dig into some link types.

Hard Links

The first possibility to link items is called a "hard link" (or "hardlink"). As an end-user you can think of a hard link as a set of file names that are uni-directional links to the data representation of a file or directory.

Usually, a file has one association between the file inode and its list of data inodes. When the file gets deleted, both (file inode and data inodes) get freed up.

When you have multiple hard links (file inodes) pointing to the same data representation (data inodes), the data inodes have a link counter that is greater than one. When five file inodes link to the same set of data inodes, the data inodes store a link counter of five.

If one of the five file inodes gets deleted, the link counter of all data inodes gets decreased by one. Only when the last remaining file inode pointing to the data inodes gets deleted do their link counters reach zero and the data inodes get deleted as well.

When the path location of one file inode is changed (move file), the hard link still points to the correct data inodes. This is all managed within the file system itself without bothering the operating system layer above.

For example, you have a file A Document.txt and you create a hardlink which is named Another Document.txt. Actually, you can't tell which one was the "original" one. When A Document.txt gets deleted, Another Document.txt still exists (having the very same content). There is no original and its hardlink - they both are equivalent.
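
The example above can be tried out with a few lines of Python. This is a sketch that assumes a file system with hard link support (most POSIX file systems qualify). On such systems the link counter described above is visible as st_nlink. The function name hard_link_demo is only for illustration.

```python
import os
import tempfile

def hard_link_demo(folder):
    """Create a file plus a hard link, delete the first name, inspect the rest."""
    first = os.path.join(folder, "A Document.txt")
    second = os.path.join(folder, "Another Document.txt")
    with open(first, "w") as handle:
        handle.write("shared content")
    os.link(first, second)  # a second name for the very same data
    same_inode = os.stat(first).st_ino == os.stat(second).st_ino
    links_before = os.stat(second).st_nlink
    os.remove(first)  # removes one name, not the data
    links_after = os.stat(second).st_nlink
    with open(second) as handle:
        surviving = handle.read()
    return same_inode, links_before, links_after, surviving

with tempfile.TemporaryDirectory() as folder:
    print(hard_link_demo(folder))
```

Both names report the same inode ID, the counter drops by one when the first name is removed, and the content is still readable through the remaining name.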

Symbolic Links

The second link type is called a "symbolic link" (or "symlink"). The symlink inode directly stores the path to a file inode and not its ID.

As a simplification, you can think of a symlink as a stored path to its link target as text. Please note that symlinks are not implemented as files per se. They are a feature of the file system itself and therefore, symlinks are transparent to programs. For a program, it doesn't matter whether or not it is working on a symlink or its link target. This is very important to remember for the rest of this article.

Because a symlink stores the path to its target (instead of its inode ID or the list of data inodes), renaming or moving the target results in a symlink that points to a non-existing target. The symlink is broken.
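
A small Python sketch of this behavior, assuming a system where the current user may create symlinks. The file names and the function name broken_symlink_demo are made up.

```python
import os
import tempfile

def broken_symlink_demo(folder):
    """Show that a symlink stores a path and breaks when the target moves."""
    target = os.path.join(folder, "original.txt")
    link = os.path.join(folder, "link.txt")
    with open(target, "w") as handle:
        handle.write("content")
    os.symlink(target, link)
    stores_path = os.readlink(link) == target  # the link holds the path as text
    works_before = os.path.exists(link)        # exists() follows the link
    os.rename(target, os.path.join(folder, "renamed.txt"))
    works_after = os.path.exists(link)         # the stored path leads nowhere now
    still_a_link = os.path.islink(link)        # the link itself is still there
    return stores_path, works_before, works_after, still_a_link

with tempfile.TemporaryDirectory() as folder:
    print(broken_symlink_demo(folder))
```

After the rename, the symlink still exists as an object in the directory, but following it fails.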

Implications and Characteristics of Symlinks and Hardlinks

Although symlinks as well as hardlinks are features of the file system and therefore are transparent, there are differences that are interesting to know.

As mentioned above, symlinks are different from hardlinks as they might link to a non-existing target and thus may get "broken". On the other hand, symlinks have a big advantage compared to hardlinks. Hardlinks are limited to linking file names to their content (data representation) within the very same file system partition. If you have multiple hard disks or multiple partitions on one disk, you cannot have a hard link and its content on different partitions.

In contrast to that, symlinks can link to arbitrary link targets. This holds true even for detachable storage such as USB thumb drives. When the thumb drive containing the link target is attached ("mounted"), the link is working. When the thumb drive is not attached, the link is broken. This is a good thing to have in many situations.

Symlinks are clearly visible as such in typical file browsers as well as in the shell. In contrast to that, you can't tell whether a hardlink is a hardlink or the "original" file since there is no difference between the "original" file and its hardlinks.

As you already know by now: for symlinks, there is always the original file and one (or more) symlinks to the original. The hardlinks of a set, in contrast, are indistinguishable from each other. They just link to the same data representation, their content.

Software that deals with files, such as synchronization tools, has to decide how to handle symlinks. One option is to respect their link character and synchronize them as links, which might lead to broken links on the "other side" (no original on the other side, or the original has a different location). The other option is to resolve the symlinks and synchronize the content of the original file, which results in a normal file on the "other side". The latter has the advantage that there will be no broken links after synchronization. For hardlinks, synchronization software does not have to respect anything since each hardlink is an original file in its own right.
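
The two strategies can be sketched in Python with shutil.copyfile and its follow_symlinks parameter. This is only an illustration of the decision a copying tool has to make, not how any particular synchronization tool works. The names sync_symlink and sync_demo are hypothetical.

```python
import os
import shutil
import tempfile

def sync_symlink(link, destination, resolve):
    """Copy link to destination, either resolved (content) or as a link."""
    shutil.copyfile(link, destination, follow_symlinks=resolve)
    return "link" if os.path.islink(destination) else "file"

def sync_demo():
    with tempfile.TemporaryDirectory() as here, \
            tempfile.TemporaryDirectory() as other_side:
        with open(os.path.join(here, "original.txt"), "w") as handle:
            handle.write("content")
        link = os.path.join(here, "link.txt")
        os.symlink("original.txt", link)  # relative path to the original
        as_link = os.path.join(other_side, "as_link.txt")
        as_file = os.path.join(other_side, "as_file.txt")
        kind_link = sync_symlink(link, as_link, resolve=False)
        kind_file = sync_symlink(link, as_file, resolve=True)
        # exists() follows links, so a broken link reports False
        return (kind_link, os.path.exists(as_link),
                kind_file, os.path.exists(as_file))

print(sync_demo())
```

Copied as a link, the result is broken on the "other side" because original.txt was never copied there. Copied in resolved form, the result is a normal file with the content.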

A simplified and somewhat "filtered" example showing a directory that contains two files that are hard linked and a symbolic link.

The Decision

Microsoft developed the FAT file system in the late seventies. It was the default file system for all operating systems from DOS (the successor of the Quick and Dirty Operating System - no joke) onward until it got replaced by Windows NT-based operating systems with their more advanced NTFS.

When creating Windows 95, Microsoft became aware that they needed a way to provide links. On the one hand, they wanted to provide access to files or folders in more than one place. On the other hand, they wanted to assign meta-data such as icons to executable files.

So they had to come up with a design and a technical decision that would introduce a new feature to their operating system.

The FAT file system does not support something comparable to Alternate Data Streams (ADS) from NTFS which would have been a method to extend FAT functionality with new functions. Differences between file systems that provide something like ADS and that do not provide such functionality still result in files like FINDER.DAT, RESOURCE.FRK, or .DS_Store (macOS). You might have stumbled over one or two of them already.

Instead of implementing the well-known concept of symbolic links and hard links in their FAT file system, Microsoft decided to introduce something completely different: they invented LNK files. LNK files are also referred to as "shortcuts" or "link files".

File Explorer showing the context menu of a file which offers a shortcut creation feature.

LNK files contain binary data (not human readable) in normal files with the file extension .lnk. Their file content needs to be read and interpreted by any software that needs to resolve the link. Therefore, LNK files are not a feature of the file system. They are files with a certain kind of content which needs interpretation. This is very important to keep in mind for the discussion below.
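
To illustrate what "needs to be read and interpreted" means, here is a minimal Python sketch that only checks whether some bytes start like a LNK file. It relies on two constants from Microsoft's published Shell Link binary format: a header size of 0x4C and a fixed class identifier. It does not resolve the link target, which requires parsing much more of the format. The function name looks_like_lnk is made up.

```python
import struct
import uuid

# Constants from the published Shell Link (.LNK) binary format:
# a 4-byte header size of 0x4C followed by a fixed 16-byte class ID.
LNK_HEADER_SIZE = 0x4C
LNK_CLSID = uuid.UUID("00021401-0000-0000-C000-000000000046")

def looks_like_lnk(data):
    """Return True if data starts like a shell link header."""
    if len(data) < 20:
        return False
    (header_size,) = struct.unpack_from("<I", data, 0)
    clsid = uuid.UUID(bytes_le=bytes(data[4:20]))
    return header_size == LNK_HEADER_SIZE and clsid == LNK_CLSID

# A synthetic header for demonstration, not a real shortcut.
fake_header = struct.pack("<I", LNK_HEADER_SIZE) + LNK_CLSID.bytes_le
print(looks_like_lnk(fake_header))
print(looks_like_lnk(b"plain text file"))
```

Every program that wants to follow a shortcut has to start with a check like this and then continue parsing, whereas a symlink is resolved by the file system before the program ever sees the bytes.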

A user is able to create a LNK file in the File Explorer by various methods:

File Explorer showing the drag-and-drop method to create a shortcut.

LNK files are similar to .desktop files from the UNIX/Linux universe, which were established later on. However, .desktop files are not a user-facing linking feature like symlinks or hardlinks. Their focus is on adding meta-data such as icons and translations to executables rather than on linking arbitrary files to different locations within the file system hierarchy.

File Explorer showing the properties of a LNK file.

Microsoft hides certain things from the user. For example, from Windows 95 upwards, the UI hides file extensions by default. Please change this behavior for security purposes now. The principle of hiding parts of the truth is also very common on Apple systems and might be a "bad design decision" article on its own. However, even when file extensions are shown, the LNK file extension .lnk is hidden on Windows.

The same LNK file is shown in File Explorer and in the cmd.exe application.

It was intended that double-clicking a LNK file is equivalent to double-clicking the file to which it refers. This is true for most cases but there are still situations where this does not work as expected and results in annoying "missing DLL" errors when the properties of the LNK file and the link target differ.

In order to mitigate some cases of broken links for LNK files, Windows 9x-based versions of Windows use a simple search algorithm to fix certain broken shortcuts.

So much about the thing Microsoft came up with. But how about the negative impact that came with this decision to use LNK files instead of hardlinks and symlinks?

Consequences

Not using the proven concept with symlinks/hardlinks and developing a completely new thing came with negative consequences. Here is a selection of the most annoying effects of that decision.

End-User Software Support for LNK Files

As we have learned, LNK files are not handled by the file system layer. They are ordinary files in a binary format. Every view or editing program that needs to resolve the contained link needs to implement an algorithm that is able to interpret the content.

This way, when you implement a text editor, you have to add additional functionality in order to open linked files via LNK files. Therefore, for many years, there were many programs which could not be used on LNK files. That means that LNK files are not a general way of linking files.

Symlinks as well as hardlinks do not require special handling within editing software tools. A text editor does not have to contain algorithms to open up symlinks or hardlinks different from normal files because they are a file system feature, solving the issue on the file system layer and not on the application layer.
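
Here is a sketch of that difference from the point of view of a very naive "editor" that just opens a path and reads it. It assumes a system where symlinks and hard links can be created. The shortcut in this sketch is a stand-in file with placeholder bytes, not a real LNK file. All names are hypothetical.

```python
import os
import tempfile

def read_file(path):
    """What a naive editor does: open the path and read the bytes."""
    with open(path, "rb") as handle:
        return handle.read()

def editor_demo(folder):
    original = os.path.join(folder, "notes.txt")
    via_symlink = os.path.join(folder, "symlink.txt")
    via_hardlink = os.path.join(folder, "hardlink.txt")
    shortcut = os.path.join(folder, "notes.txt.lnk")
    with open(original, "wb") as handle:
        handle.write(b"my notes")
    os.symlink(original, via_symlink)
    os.link(original, via_hardlink)
    with open(shortcut, "wb") as handle:
        handle.write(b"(stand-in for binary link data)")
    expected = read_file(original)
    return (read_file(via_symlink) == expected,
            read_file(via_hardlink) == expected,
            read_file(shortcut) == expected)

with tempfile.TemporaryDirectory() as folder:
    print(editor_demo(folder))
```

The same unmodified read_file function gets the notes through the symlink and the hardlink. Through the shortcut file it only gets the link data itself.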

Short recap from above: For symlinks, copying tools (re-creating items) have to decide how to deal with them (resolving or copying) but editing tools do not have to handle symlinks in any special way. For hardlinks, no special handling is needed even for copying tools.

When dealing with LNK files, any software (copying as well as viewing/editing tools) has to implement LNK file support on its own. You need to write the routines to open a file and then you need to write the routines to open a LNK file separately. All applications. Bad idea.

Performance

Once again: symlinks/hardlinks are features of the file system, whereas LNK files are files like any other, stored on top of the file system.

Now imagine a software tool that creates many links at once. This is not an unusual thing to do. There are many backup solutions that use symlink/hardlink mechanisms to avoid redundancies when creating incremental backups that still contain a full set of files.

I developed tagstore and filetags that generate TagTrees. TagTrees are a clever trick to combine navigation and search for file retrieval. In order to implement TagTrees using the file systems of today's operating systems without introducing redundancies, I had to use links. Because of the nature of the concept, TagTrees tend to use many links. The number of links required grows exponentially with the number of tags per file.
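
To get a feeling for this growth, here is a rough Python sketch. It rests on an assumption that is not taken from the tagstore or filetags sources: that a TagTree offers one directory path for every ordered selection of an item's distinct tags, with one link to the item in each of them. The real tools may build their trees differently, so take the numbers as an illustration of the growth only.

```python
from itertools import permutations

def tagtree_paths(tags):
    """All directory paths built from ordered selections of distinct tags.

    Assumption for illustration: one link per such path.
    """
    paths = []
    for depth in range(1, len(tags) + 1):
        for selection in permutations(tags, depth):
            paths.append("/".join(selection))
    return paths

def links_per_item(tag_count):
    """Number of links for one item carrying tag_count tags."""
    tags = [f"tag{index}" for index in range(tag_count)]
    return len(tagtree_paths(tags))

print(tagtree_paths(["work", "2018"]))
for count in range(4, 9):
    print(count, links_per_item(count))
```

Under this assumption, every additional tag multiplies the number of links per item several times over, which is why link creation speed matters so much here.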

Long story short, my tools generate many links. My tools support macOS, Windows and GNU/Linux. All operating systems except Windows support symlinks and hardlinks. Therefore, my macOS and GNU/Linux tools are using symlinks. Since they are file system features, the software tells the file system to create a link which is done quite fast.

Windows has to use LNK files instead. Creating a LNK file involves the application layer (my tools or File Explorer), the operating system layer, and the file system layer. Our internal analysis of what happens when a single LNK file is created revealed dozens of internal layer switches, which take much longer than a simple call to the file system layer to create a symlink/hardlink.


Back in 2012, we analyzed the performance differences on three different systems. Each system ran the set of tests three times and the mean values are summarized below. Unfortunately, I did not write down all the hardware details. This is not that severe since operating system I/O is the bottleneck for this operation and the differences were high enough that hardware differences were almost negligible.

The following table summarizes the time spent to create the links for different numbers of tags per file. The more tags, the more links had to be created.

Tags per item    h1, symlinks [s]    h2, LNK files [s]    h3, LNK files [s]
4                0.02                0.35                 0.46
5                0.08                1.26                 2.39
6                0.38                7.55                 14.09
7                4.13                51.06                104.75
8                88.87               421.05               798.04

As you can see, although the first hardware (h1) has the poorest general hardware performance, creating the necessary number of symbolic links is much faster than generating the same number of LNK files on the two Windows systems (h2, h3) with the better hardware.

This has a huge performance impact when you run a backup solution that is using links, when your file server has to resolve links and so forth.
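
If you want to get a rough number for your own system, the symlink side of such a measurement can be sketched in Python as follows. This is not the test harness we used in 2012. It only times os.symlink calls in a temporary directory, and the function name is made up. Creating LNK files for comparison requires the Windows shell APIs and is not shown here.

```python
import os
import tempfile
import time

def time_symlink_creation(count):
    """Create count symlinks to one file; return (seconds, links created)."""
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "target.txt")
        open(target, "w").close()
        start = time.perf_counter()
        for index in range(count):
            os.symlink(target, os.path.join(folder, f"link{index}"))
        elapsed = time.perf_counter() - start
        created = sum(
            os.path.islink(os.path.join(folder, name))
            for name in os.listdir(folder)
        )
    return elapsed, created

elapsed, created = time_symlink_creation(1000)
print(f"{created} symlinks in {elapsed:.3f} s")
```

The result depends heavily on the file system and the storage underneath, so it is only meaningful when compared against other measurements on the same machine.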

NTFS Features Related to Links

Now that we have learned how Microsoft introduced links on top of their FAT file system, let us discuss how they used their experience to introduce links in their modern file system, the New Technology File System or in short: NTFS.

I do have to apologize upfront: this is now getting a bit complicated. Certain features were introduced with NTFS versions that shipped with various Windows versions, existing features were modified from one version to the other, features changed their complete nature, names got changed and overall it really gives you the impression that there was never a clear strategy related to these things.

The Alternate Data Streams were already mentioned above. They were introduced with Windows NT 3.1 in order to support Services for Macintosh (SFM). This was discontinued later on. Currently, besides various malware that uses ADS to hide from the user, you only encounter ADS when you get asked if it is okay to start a downloaded file (once). This information is stored within these streams. Copying a file from NTFS to a different file system does not preserve information stored in ADS.

More interesting for the purpose of linking stuff are the NTFS reparse points which were introduced with Windows 2000. They are the technological object within NTFS that allows extensions. Different from ADS, they allow for NTFS symbolic links, directory junction points, volume mount points and Unix domain sockets.

So, let's go through the link-related features one by one.

NTFS symbolic links were rather simple links with NTFS 3.0. Later on, they got extended with NTFS 3.1 (Windows XP) so that they could link to more file system objects.

However, they were not exposed to user mode, so users were out of luck. Later, Microsoft provided mklink which allowed only Administrators to create links. This changed only with Windows 10 Insiders build 14972 where non-Administrator accounts could create links for the first time.
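
For cross-platform tools this means that creating a symlink may simply fail on Windows, depending on the Windows version and the privileges of the account. A defensive Python sketch, with a hypothetical helper name, could look like this:

```python
import os
import tempfile

def try_symlink(target, link):
    """Try to create a symlink; report failure instead of raising.

    On Windows, os.symlink can raise OSError when the account lacks the
    privilege to create symbolic links.
    """
    try:
        os.symlink(target, link)
    except (OSError, NotImplementedError) as error:
        return False, str(error)
    return True, ""

with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "target.txt")
    open(target, "w").close()
    created, reason = try_symlink(target, os.path.join(folder, "link.txt"))
    print(created, reason)
```

A tool can then fall back to a different mechanism (for example a copy or a LNK file) when the symlink could not be created.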

Another thing that puzzles me is the fact that symbolic links to files are different from symbolic links to directories which are called NTFS junction points (since NTFS 3.0 with Windows 2000). Junction points are limited to absolute paths on local drives whereas symbolic links also allow for cross-partition links or even remote SMB paths.

Furthermore, Microsoft prohibited redirecting certain system directories using junction points. They also recommend not linking directories like Users, Program Files (x86), Program Files or ProgramData because this might break things like updates or Windows Store Apps.

When Microsoft shipped Windows Vista, they added another thing to this mess to replace Windows 2000 and Windows XP junction points: soft links. They are able to link to files, directories and cross-partition items. For linking to remote items, the remote system had to support soft links as well which limits the linking ability to Windows Vista or newer.

NTFS hard links for files are similar to the UNIX hard links as described above. They have similar properties and do feature the link counter mechanism but with a limit of 1024 links to a file. They were introduced with Windows NT4 and are limited to same-partition links. Only with Windows 2000, you could use API functions to create or remove hard links. Administrators were able to use the mklink command line tool.

The volume mount points are an interesting story but I will deal with them in a separate blog article on drive letters.

NTFS Links in the Wild

I hope you could follow my lines through the wild NTFS feature show.

The good news is that Microsoft introduced way better methods to link file system items with various NTFS versions. So far, so good.

In real life, no user is using NTFS reparse point features. The UI does not expose the possibility to create reparse points. Until recently, only the Administrator was allowed to create or even remove them.

But there is an even bigger argument against using reparse points. There is a pattern popular on Windows systems, and I don't know when it emerged. When an application opens a file like Document.docx, it renames the file to a temporary name like ~Document.docx, starting with a tilde character. As the user modifies the content of the file, changes are written to memory and to Document.docx. When the user closes the application, the temporary file ~Document.docx gets removed and Document.docx now contains the updated content.

Despite the fact that Microsoft tells a different story (omitting renaming the original file), I found out that many programs work like that.

This workflow has one big downside when somebody is using reparse point features like symbolic links. The original file gets renamed to a temporary file, a completely new file is written with the modified content, and the temporary file gets removed. This method replaces any links with new copies instead of preserving them.
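
The effect can be reproduced with a short Python sketch. It imitates the save pattern described above on a system with symlink support. It is a simplified stand-in and not the code of any real application. The function names are made up.

```python
import os
import tempfile

def save_by_rename(path, new_content):
    """Imitate the pattern: rename the original away, write anew, clean up."""
    folder, name = os.path.split(path)
    backup = os.path.join(folder, "~" + name)
    os.rename(path, backup)          # renames the link itself, not its target
    with open(path, "w") as handle:  # creates a brand-new regular file
        handle.write(new_content)
    os.remove(backup)                # removes the renamed link

def save_demo(folder):
    original = os.path.join(folder, "Document.docx")
    link = os.path.join(folder, "Link to Document.docx")
    with open(original, "w") as handle:
        handle.write("old")
    os.symlink(original, link)
    save_by_rename(link, "new")      # the user edited the file via the link
    with open(original) as handle:
        original_content = handle.read()
    with open(link) as handle:
        link_content = handle.read()
    return os.path.islink(link), original_content, link_content

with tempfile.TemporaryDirectory() as folder:
    print(save_demo(folder))
```

After saving, the former link is a regular file with the new content while the original still holds the old content. The link got silently replaced by a diverging copy.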

This way, tools like tagstore or filetags (mentioned above) can not use file system supported features for linking even when the Administrator limitation was dropped. Links get replaced by copies silently. This is the most important argument against (symbolic) links on Windows systems.

Therefore, Windows users are doomed to use the low performing, limited LNK files instead of advanced NTFS file system features.

Lessons Learned

From a high-level perspective and for the reasons I explained in the first section, I tend to say that introducing limitations from the real world into the virtual world is a bad idea. Crutches are an important tool for people who cannot walk without them. But it is not a particularly good idea to run a marathon with crutches. In my opinion, we are all running marathons using crutches and we are so used to them that we don't see their negative aspects any more.

However, introducing the desktop metaphor was not the idea of Microsoft, whose design decisions I explained in this article. From that perspective, we should concentrate on different lessons.

I would love to know how the discussion evolved before they invented the LNK file concept. There ought to be good reasons for it, although I think that those reasons were more likely political or financial than related to technology or even usability. From my point of view, it simply was not a good technological solution to introduce the LNK concept. Having come to the point where Microsoft recognized the need for links, they should have taken a closer look at what was out there, what the good parts were and where they could even improve.

With NTFS, Microsoft had the once-in-a-lifetime opportunity of a fresh new start. Within NTFS they implemented various UNIX features in order to maintain a certain level of POSIX compliance for Windows NT. With the development of a completely new file system, it would have been obvious to re-implement the symlink/hardlink concept. This concept is not only well proven in terms of implementation. All computer professionals knew how to use those features. It was common knowledge to build on. Re-inventing the wheel always neglects those effects and resets each participant right to the start.

Furthermore, implementing and using a totally new concept takes its toll. Teething troubles are inevitable. This may explain the chaotic picture of how various NTFS features evolved over time.

This thing with the temporary files that replace links with copies is also a new pattern which turned out to break things. Even though NTFS has offered various link features for decades, users can't use them because of those wrong patterns.

With decades of experience with POSIX file system features and the ability to use much more powerful hardware, Microsoft was in a position to build their NT system on the shoulders of giants and outperform any competitor performance- and feature-wise. Instead, Microsoft engineers came up with something that has poor performance, needs adaptation for each and every application that needs to deal with files, and requires special handling for the Windows platform.

If you have programming knowledge, take a look at the many exceptions to the rule for the Windows platform in standard programming language libraries such as Lib/os.py of Python. Most of the time, you can distinguish between "this is how it's done on Windows" and "all other platforms". In some cases, there is not even a Windows counterpart for a given functionality at all.

Many people tend to think that this "keeping Windows different" is based on a general strategy of tying people to this platform. You will read about this notion in future articles of this blog post series.

My prediction is that for various reasons mentioned above, we will end up using LNK files on the Windows platform forever.

Comments

Goldfire commented on the reddit thread on this topic, adding several excellent points that put things into perspective:

Lots of good information here, but you've overlooked a few things that I think are pretty important:
1. You don't seem to be aware of the reason why link creation has historically been a privileged operation. You should look into the security implications of links in Windows applications. More than a few exploits have come out of applications not handling links properly because it's non-trivial to get right and they weren't expecting them (since they weren't always supported). Here's a slide deck (https://www.slideshare.net/OWASPdelhi/abusing-symlinks-on-windows) about that by James Forshaw of Project Zero, who's done a lot of great work on this and related topics.
2. Shortcuts are not just links to file system paths. They can link to anything in the shell namespace. That includes things that aren't files and can only be addressed in the shell namespace; things like My Computer ("This PC" nowadays) or the control panel, among others.
3. Shortcuts are more similar to the Windows 3 concept of program groups than symlinks would have been (you could think of them as just a generalization of that concept, in fact). Windows and DOS users were not familiar with symlinks, and adding them would have been a big new concept for those users to have to learn. The conceptual continuity was very important to keep Windows 95 usable for people who were mostly going to be coming to it from DOS or Windows 3, not from any Unix.
4. Shortcuts can contain command-line arguments and other metadata that symlinks cannot implement without also having the filesystem provide a more general metadata mechanism. As you pointed out, FAT did not have this. The trouble is that any design for adding that feature would have to have been highly compatible with existing FAT implementations, and it seems plausible that satisfying that constraint was simply not possible. NTFS had a much more extensible design from the beginning, so it was possible to add support to it for more types of reparse points over time.
In general, a lot of decisions that went into the design of Win32 that look silly today with the benefit of hindsight were really pretty sensible at the time they were made. It's important to consider the full historical context when evaluating things like this.

I can follow goldfire's arguments. Security is important, indeed. However, POSIX-compatible systems do have symlinks/hardlinks and provide even more access to stuff via file handles (RAM, raw devices, ...). I don't say it's trivial but it was solved before.

The notion of providing more than just symlinks could probably be solved via the existing BAT file method. Maybe I should think of LNK files as extended BAT files. Interesting thought. The downsides of having to support those extended BAT files still persist though.

Totally agree that the full historical context is necessary. As far as I remember, I had to fight the negative aspects I mentioned above right from the start of LNK files though. My first OS was DOS v3.0 and so I went through the whole Microsoft story up to Windows NT, when I started to use other operating systems as well, which widened my point of view dramatically.

Thanks for that comment!

