(One)-minute geek news

Group talks, Laboratoire de Chimie, ENS de Lyon

Links


On a UNIX-like machine, everything is a file. Seriously, it is impressive when you think about it: almost every command is the name of a file (yes, even the command/file [), the environment variables of a process can be read from a file (/proc/<pid>/environ on Linux), directories are files (special files of course, but still files), and even the standard input and output of each process are files!

Of course, there are different types of files (beware: I am not talking about the different formats of standard files): standard files, directories, FIFOs, links, ... Here, we will focus on links, as they can be powerful (we will see it in the next trick) but tricky for the inexperienced user.
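To make these file types concrete, here is a minimal sketch that creates one file of each kind and checks its type with the standard test operators. The scratch directory and all the file names are placeholders of mine, and I assume mktemp -d is available (it is on common Linux and BSD systems).

```shell
#!/bin/sh
# Sketch: create files of several types in a scratch directory and
# check each type with the standard `test` ([ ... ]) operators.
set -eu
scratch=$(mktemp -d)            # hypothetical scratch directory
trap 'rm -rf "$scratch"' EXIT   # clean up when the script ends
cd "$scratch"

touch regular_file              # standard (regular) file
mkdir some_dir                  # directory
mkfifo some_fifo                # FIFO (named pipe)
ln -s regular_file some_link    # symbolic link

[ -f regular_file ] && echo "regular_file is a standard file"
[ -d some_dir ]     && echo "some_dir is a directory"
[ -p some_fifo ]    && echo "some_fifo is a FIFO"
[ -L some_link ]    && echo "some_link is a symbolic link"

# The first character of each `ls -l` line encodes the type: - d p l
ls -l
```

The first column printed by ls -l is usually the quickest way to see which type of file you are dealing with.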

The basics

There are two different kinds of links: hard links and symbolic links (also called soft links).

The distinction is crucial, since there are huge differences between them.

However, both can be created (and replaced) using the command ln.

Soft links (symbolic links)

Soft links are basically shortcuts. They are the UNIX equivalent of Windows shortcuts.

They can be thought of as files that simply contain a path to another file. When a soft link is accessed, the user is redirected to the file this soft link points to. Please note that the targeted file can be of any type. In other words, you can link to standard files, directories, ... even to links themselves.

To create a soft link "my_link" pointing to the path "target_path", simply run ln -s target_path my_link

By default, this command fails if the file "my_link" already exists. You can force it to be overwritten with the -f command-line option of ln: ln -s -f target_path my_link
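The two commands above can be tried safely in a scratch directory. This is only a sketch: "target_a" and "target_b" are made-up file names, and I assume the readlink command is available to display the path stored in a link.

```shell
#!/bin/sh
# Sketch: create a soft link, then retarget it with -f.
set -eu
scratch=$(mktemp -d)            # hypothetical scratch directory
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"

echo "first"  > target_a
echo "second" > target_b

ln -s target_a my_link          # create the link
readlink my_link                # prints the stored path: target_a
cat my_link                     # follows the link and prints: first

# A second plain `ln -s` on the same name fails, since my_link exists.
if ln -s target_b my_link 2>/dev/null; then
    echo "unexpected: the link was overwritten without -f"
fi

ln -s -f target_b my_link       # -f replaces the existing link
stored_path=$(readlink my_link)
content=$(cat my_link)
echo "$stored_path -> $content" # prints: target_b -> second
```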

Beware: the stored path is neither checked nor updated. So if you create a soft link to a path that does not exist, your link will be broken. Similarly, if you create a soft link to a file that you subsequently move, your link will be broken too.

Besides, as you may know, a path can be written in two different forms on a UNIX-like system:
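Both ways of breaking a link can be reproduced in a few lines. In this sketch, is_broken is a small helper of my own (not a standard command): it reports a name that is a symbolic link (-L) but leads nowhere when followed (-e fails).

```shell
#!/bin/sh
# Sketch: two ways to end up with a broken soft link, and how to detect it.
set -eu
scratch=$(mktemp -d)            # hypothetical scratch directory
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"

ln -s does_not_exist dangling   # ln does not check that the target exists

echo "data" > real_file
ln -s real_file moved_later
mv real_file renamed_file       # the path stored in the link is not updated

# Hypothetical helper: true if $1 is a symbolic link whose target is missing.
is_broken() {
    [ -L "$1" ] && [ ! -e "$1" ]
}

for link in dangling moved_later; do
    if is_broken "$link"; then
        echo "$link is broken (stored path: $(readlink "$link"))"
    fi
done
```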

  • Absolute path: an absolute path does not depend on the current working directory (e.g. "/usr/local/bin" ; "/proc/cpuinfo" ; "/sys/devices/virtual/thermal/thermal_zone0/temp"). It starts from the root directory "/", and represents a branch in the tree-like filesystem structure.
  • Relative path: a relative path depends on the current working directory (cwd) (e.g. "../" (the parent directory of the cwd) ; "./sub_dir/" (the directory named "sub_dir" in the cwd, equivalent to simply "sub_dir/") ; "dummy_filename" (the file named "dummy_filename" in the cwd, equivalent to "./dummy_filename") ; "../../data/../data2/CONTCAR" (they can be complicated)). It starts from the current working directory.

I recommend avoiding soft links to relative paths, as they can be pretty tricky for the inexperienced user. But if you still want/need to create one, here are some pieces of advice:
  • If your link path is relative, be aware that it is relative to the directory containing the link, not to your current working directory. This means that if you move your link, it might be broken.
  • I insist: the relative link path is resolved from the location of the link. This is particularly tricky if you want to create a link in a directory which is not your current working directory. For example, ln -s ./my_file ../my_link will create a soft link to the file "../my_file" (relative to the current working directory), which is "./my_file" relative to the link "position". For such cases, you might want to consider the -r option of GNU ln: the previous example is equivalent to the command ln -s -r ../my_file ../my_link, which also allows for the autocompletion of the targeted path argument.
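The example above can be checked with the following sketch. The "work" sub-directory and "my_link_r" are names I made up, and I assume -r may be missing (it is a GNU ln option), so that part is guarded and simply skipped on systems without it.

```shell
#!/bin/sh
# Sketch: a relative link path is resolved from the link's own directory.
set -eu
scratch=$(mktemp -d)            # hypothetical scratch directory
trap 'rm -rf "$scratch"' EXIT
mkdir "$scratch/work"
echo "payload" > "$scratch/my_file"
cd "$scratch/work"              # the cwd is a sub-directory of the scratch dir

# The stored path ./my_file is resolved from the link's directory (..),
# so the link designates ../my_file as seen from the cwd.
ln -s ./my_file ../my_link
plain_path=$(readlink ../my_link)
plain_content=$(cat ../my_link)
echo "plain: $plain_path -> $plain_content"   # plain: ./my_file -> payload

# With -r (assumed to be GNU ln), the target is written relative to the cwd
# and ln works out the path to store in the link.
relative_path="(ln -r not supported here)"
if ln -s -r ../my_file ../my_link_r 2>/dev/null; then
    relative_path=$(readlink ../my_link_r)
fi
echo "with -r: $relative_path"
```

Running readlink on a freshly created link is a cheap habit that catches most relative-path mistakes.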

Examples of application:

  • You are currently working in a few directories that have very long names and are tedious to type (even with autocompletion): you can just create a shortcut! (e.g. ln -s ~/simulations/PhD/first_year/mol_CT/Pt/6x6x6/ ~/cur_simul/)
  • I have actually extended this directory bookmarking with a tool facilitating bookmark management, available on GitHub: INSERT LINK. Enjoy!
  • You want to have identical copies of a file (whose path will not change), and make sure that all those copies are always synchronised, without using too much disk space. (e.g. you want to try different input parameters for the same structure (whose coordinates are in "~/simul_dir/structures/water_test/POSCAR"). Each directory dedicated to a specific simulation can contain a link to "~/simul_dir/structures/water_test/POSCAR" ("CUTOFF_50/POSCAR", "CUTOFF_75/POSCAR", "CUTOFF_100/POSCAR") without wasting disk space!)
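The shared-structure use case could look like the sketch below. To keep it harmless, a scratch directory stands in for "~/simul_dir" and the POSCAR content is a placeholder, both assumptions of mine.

```shell
#!/bin/sh
# Sketch: several simulation directories sharing one structure file.
set -eu
scratch=$(mktemp -d)            # stands in for ~/simul_dir (assumption)
trap 'rm -rf "$scratch"' EXIT

structure="$scratch/structures/water_test/POSCAR"
mkdir -p "$(dirname "$structure")"
echo "placeholder coordinates" > "$structure"

# One directory per simulation, each holding a link to the shared structure.
# The stored path is absolute, so it does not depend on where the link lives.
for cutoff in 50 75 100; do
    mkdir -p "$scratch/CUTOFF_$cutoff"
    ln -s "$structure" "$scratch/CUTOFF_$cutoff/POSCAR"
done

# Editing the single real file is visible through every link.
echo "updated coordinates" > "$structure"
seen_through_link=$(cat "$scratch/CUTOFF_75/POSCAR")
echo "$seen_through_link"       # prints: updated coordinates
```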

Hard links

To understand hard links, it is necessary to understand how files are stored on the physical hard drive (the following is highly simplified, see here for more details).

First, let us remind ourselves that most of the physical storage of a disk is divided among the partitions present on that disk (I won't talk about how partitions are handled...). When I talk about disk space/blocks, I refer to the disk space/blocks allocated to the specific partition considered.

The content of a file is stored in some blocks of the hard drive. The addresses of these blocks are stored in an inode, along with numerous other metadata about the file: size, date of last modification, ... The filename is not part of the inode, since a file can have several of them, but the number of filenames associated with the inode is. Those inodes are themselves stored in a dedicated part of the disk.

Directories are just files whose content is a table associating each filename "contained" in this directory to its corresponding inode number.
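This table can be displayed with the -i option of ls, which prints the inode number next to each name. The sketch below uses made-up names ("alpha", "beta", "sub") in a scratch directory. It also shows that the entry "." inside a directory carries the same inode number as the directory's name in its parent.

```shell
#!/bin/sh
# Sketch: look at the filename -> inode table stored in a directory.
set -eu
scratch=$(mktemp -d)            # hypothetical scratch directory
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"
touch alpha beta
mkdir sub

# -i prints the inode number in front of each name; -a also shows the
# entries "." (the directory itself) and ".." (its parent).
ls -ia

# "sub" (an entry of this directory) and "." (an entry inside sub)
# are two directory entries for the same inode.
inode_of_sub=$(ls -di sub | awk '{print $1}')
inode_of_dot=$(ls -di sub/. | awk '{print $1}')
echo "sub: $inode_of_sub, sub/.: $inode_of_dot"
```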

When a file is deleted (with rm for example), the directory entry pointing to its inode is removed (from the file that is the directory), and the counter of filenames (i.e. directory entries) pointing to this inode, which is stored in the inode itself, is decreased. If this counter reaches 0, the inode is freed. (Note that the data associated with this file is not erased at that point. It will be overwritten eventually, but as long as it is not, it can still be recovered!)

Why use inodes, instead of directly associating a filename with block addresses? So that a file (i.e. the thing indexed by its inode) can be associated with multiple filenames. So basically to allow hard links (which some filesystems, such as FAT, do not support).
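The counter can be watched directly: it is the second column of ls -l. In this sketch, link_count is a helper of my own, and the second filename is created with ln without -s, which adds a directory entry for the same inode.

```shell
#!/bin/sh
# Sketch: rm removes a directory entry and decrements the link counter.
set -eu
scratch=$(mktemp -d)            # hypothetical scratch directory
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"

# Hypothetical helper: the second column of `ls -l` is the number of
# directory entries pointing to the file's inode.
link_count() {
    ls -ld "$1" | awk '{print $2}'
}

echo "some data" > original
ln original second_name         # a second directory entry, same inode
count_before=$(link_count original)

rm original                     # removes one entry, decrements the counter
count_after=$(link_count second_name)
remaining=$(cat second_name)    # the counter is not 0: the data is intact

echo "links before rm: $count_before, after rm: $count_after"
echo "content still reachable: $remaining"
```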

So a hard link is an entry in a directory that refers to an inode shared by an already existing standard file.

It is a link to an inode, and can be seen as a link to the contents of a file. The same standard file can have multiple hard links referring to its content (basically, multiple paths/filenames), and is considered non-deleted as long as there exists at least one link to its inode.

So, a soft link can be seen as a link to a path/filename, while a hard link can be seen as a link to an inode.

To create a hard link to an already existing file, simply use the command ln without the -s option: ln target_path my_link
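Here is a short sketch of that command in a scratch directory. It checks that both names carry the same inode number (via ls -i) and that writing through one name is visible through the other. The file content is a placeholder.

```shell
#!/bin/sh
# Sketch: a hard link is a second name for the same inode.
set -eu
scratch=$(mktemp -d)            # hypothetical scratch directory
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"

echo "line 1" > target_path
ln target_path my_link          # hard link: no -s

inode_target=$(ls -i target_path | awk '{print $1}')
inode_link=$(ls -i my_link | awk '{print $1}')
echo "inodes: $inode_target $inode_link"

# Appending through one name changes what the other name shows,
# because both names designate the same content.
echo "line 2" >> my_link
lines_in_target=$(wc -l < target_path | tr -d ' ')
echo "target_path now has $lines_in_target lines"
```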

Advantages:

  • Renaming or moving a file (within the same filesystem) does not change its inode. Consequently, a valid hard link will not be broken if you perform those operations on the target file.
  • With soft links, you have a "real" file plus multiple links to its path, and each soft link is itself a file with its own inode. With hard links, you only have a single file (so a single inode), and multiple directory entries pointing to it.
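The first advantage is easy to observe by giving the same file one link of each kind and then moving it. This is a sketch with made-up names, and the move stays inside one scratch directory, hence inside one filesystem.

```shell
#!/bin/sh
# Sketch: moving the target breaks a soft link but not a hard link.
set -eu
scratch=$(mktemp -d)            # hypothetical scratch directory
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"

echo "results" > data
ln -s data soft_link            # stores the path "data"
ln data hard_link               # second name for the inode of data

mkdir archive
mv data archive/data_renamed    # same filesystem: the inode is unchanged

hard_content=$(cat hard_link)
if [ -e soft_link ]; then soft_state="valid"; else soft_state="broken"; fi
echo "hard link reads: $hard_content; soft link is $soft_state"
```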

Drawback:

  • The target of a hard link cannot be a directory. (because it would break the tree-like structure of the filesystem, since hard links are not distinguished from "original" directory entries, unlike soft links which are separate files)
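You can see this restriction by trying it. In the sketch below, I expect the hard link attempt to be refused on common systems such as Linux, while the soft link to the same directory is accepted. All names are placeholders.

```shell
#!/bin/sh
# Sketch: ln refuses to hard-link a directory, but soft-linking one is fine.
set -eu
scratch=$(mktemp -d)            # hypothetical scratch directory
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"
mkdir some_dir

if ln some_dir dir_hard_link 2>/dev/null; then
    hard_result="created"
else
    hard_result="refused"
fi

ln -s some_dir dir_soft_link
if [ -d dir_soft_link ]; then soft_result="created"; else soft_result="failed"; fi

echo "hard link to a directory: $hard_result"
echo "soft link to a directory: $soft_result"
```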

Take away idea

If you want to duplicate a file and never need the copy to diverge from the original, use ln instead of cp.
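The difference between the two commands shows up as soon as the original is rewritten. This is a minimal sketch with placeholder names: the cp copy keeps the old content, while the ln "copy" follows the original because it is the same file.

```shell
#!/bin/sh
# Sketch: cp makes an independent copy, ln adds a name to the same file.
set -eu
scratch=$(mktemp -d)            # hypothetical scratch directory
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"

echo "version 1" > original
cp original copy_cp             # independent copy: its own inode and blocks
ln original copy_ln             # extra name for the same inode

echo "version 2" > original     # rewrite the content in place

cp_content=$(cat copy_cp)
ln_content=$(cat copy_ln)
echo "cp copy: $cp_content"     # prints: cp copy: version 1
echo "ln copy: $ln_content"     # prints: ln copy: version 2
```

One caveat: this only holds when the file is rewritten in place. A tool that writes a new file and renames it over the original (sed -i works this way, for instance) gives the name a new inode, and the other hard links keep the old content.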

So, less cp, more ln...