(One)-minute geek news

Group talks, Laboratoire de Chimie, ENS de Lyon

Pathname expansion


Bash performs several expansions before executing a command. Among them is pathname expansion, which is usually underused.

Basic expressions

Bash expands paths given in the form of a pattern by replacing the pattern with all matching existing paths, sorted alphabetically, each one passed to the command as a separate argument. By default, a pattern that matches nothing is left unchanged.

Patterns can be seen as regular expressions for paths. But keep in mind that the special pattern characters have meanings very different from those of the usual regular expression operators in the Unix world.

Let us recall the basic operators:

  • * (usually called wildcard) matches any string, including the empty one, that does not contain a slash / (so it can only complete a single path component). Example: cp2k/test/*.xyz matches all XYZ files directly located in the cp2k/test/ directory (but not in its subdirectories: cp2k/test/water/H2O.xyz would not be matched...)
  • ? matches any single character (with the same no-slash restriction). Example: cp2k/test/H2?.xyz will match cp2k/test/H2O.xyz and cp2k/test/H2S.xyz but not cp2k/test/H2O_old.xyz
  • [...] matches a single character among the set described by .... Multiple types of sets exist, and can be combined:
    • Basic sets. Example: []a9a!^-] would simply match a single ] or a or 9 or ! or ^ or - (note that this is probably the most tricky basic set...)
    • Expression ranges. Example: [3-6] would match a single digit between 3 and 6 inclusive (this is therefore equivalent to [3456]), while [r-u] would probably be equivalent to [rRsStTu]... This depends on the sorting order used on your machine, and with most default configurations, the order used is (non-exhaustive list, tested on the PSMN): `^~<=>|_-,;:!.'"()[]{}@$\&#%+012...789aAbBcCdDeEéèë...xXyYzZ
    • Character classes: [[:class:]] matches a single character belonging to the class class. Example: [[:lower:][:digit:]] would match either a single lowercase letter or a single digit. The following classes are available: alnum alpha ascii blank cntrl digit graph lower print punct space upper word xdigit (these are the POSIX classes, plus ascii and word, which are Bash additions; for detailed information, see this page).
    • Negations: if your selection starts with a ^ or !, it matches any single character that is not in the following set. Example: [^a[:upper:]] would match any character that is neither an uppercase letter nor the character a.
A pattern is recognized as such by Bash if it contains at least one unescaped special pattern character from the list above. These operators can be combined in the same pattern.
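
The operators above can be tried safely in a throwaway directory. The following is only a sketch: all file names are made up to mirror the examples in the list, and LC_ALL=C is set on purpose so that ranges and sorting follow plain ASCII order, whatever your locale is.

```bash
#!/usr/bin/env bash
# Illustrative sandbox: every file and directory name below is hypothetical.
export LC_ALL=C                       # plain ASCII order for ranges and sorting
sandbox=$(mktemp -d)
mkdir -p "$sandbox/cp2k/test/water"
cd "$sandbox/cp2k/test" || exit 1
touch H2O.xyz H2S.xyz H2O_old.xyz run3.inp run7.inp Notes.txt water/H2O.xyz

star=$(echo *.xyz)                    # * stops at slashes: water/H2O.xyz is left out
question=$(echo H2?.xyz)              # ? stands for exactly one character
range=$(echo run[3-6].inp)            # a single digit from 3 to 6
class=$(echo run[[:digit:]].inp)      # a single digit, whatever its value
negated=$(echo [!a[:upper:]]*)        # first character is neither "a" nor uppercase

printf '%-10s %s\n' '*.xyz' "$star" 'H2?.xyz' "$question" \
    'range' "$range" 'class' "$class" 'negated' "$negated"

cd "$OLDPWD" && rm -rf "$sandbox"     # clean up the sandbox
```

With these files, the range pattern keeps only run3.inp, while the character class keeps both run3.inp and run7.inp. Using echo here is just a convenient way to look at the result of an expansion before passing the same pattern to a real command.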

Globstar: a hidden gem

But Bash usually also comes with a largely unknown feature: an additional pathname expansion mechanism triggered by the syntax **, called globstar.

The idea is simple: a globstar, used as a whole path component, matches any string that completes the pattern into a valid path, without the no-slash restriction. In other words, the globstar is some sort of recursive wildcard!

Confused? Let us see an example: with globstar enabled, grep '^Ru' cp2k/**/*.xyz would print the Ru atoms of all your XYZ files within the cp2k directory and all its subdirectories (it would match cp2k/first_try.xyz, cp2k/test/H2O.xyz, cp2k/single_point/alumine/old/surface.xyz, ...). Note that ** only recurses when it stands alone between slashes: cp2k/**.xyz behaves exactly like cp2k/*.xyz. The wildcard-only equivalent would be: grep '^Ru' cp2k/*.xyz cp2k/*/*.xyz cp2k/*/*/*.xyz cp2k/*/*/*/*.xyz ...
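
To see what the globstar changes in practice, here is a small sketch, assuming Bash 4.0 or later. The directory tree is hypothetical and simply reuses the file names of the example above. It compares ** used as a whole path component, ** glued to other characters, and the same pattern once globstar is switched off.

```bash
#!/usr/bin/env bash
# Requires Bash 4.0 or later. All paths below are hypothetical.
export LC_ALL=C
sandbox=$(mktemp -d)
mkdir -p "$sandbox/cp2k/test" "$sandbox/cp2k/single_point/alumine/old"
cd "$sandbox" || exit 1
touch cp2k/first_try.xyz cp2k/test/H2O.xyz \
      cp2k/single_point/alumine/old/surface.xyz

shopt -s globstar
recursive=$(echo cp2k/**/*.xyz)   # ** alone between slashes: zero or more directories
glued=$(echo cp2k/**.xyz)         # ** glued to other characters: same as a single *

shopt -u globstar
plain=$(echo cp2k/**/*.xyz)       # globstar off: ** is just *, exactly one directory

printf 'recursive: %s\nglued:     %s\nplain:     %s\n' "$recursive" "$glued" "$plain"

cd "$OLDPWD" && rm -rf "$sandbox" # clean up the sandbox
```

Only the first form finds all three files. The glued form stays in cp2k/ itself, and the plain form goes exactly one directory down.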

This simply awesome feature is still largely unknown... Why? I guess because it has to be enabled first (it is disabled by default on most Bash configurations) with the not really intuitive command shopt -s globstar. It also requires Bash 4.0 or later. So add it to your .bashrc already!

Customize your .bashrc

Enable globstar by adding the following lines to your ~/.bashrc file:

# If set, the pattern "**" used in a pathname expansion context will
# match all files and zero or more directories and subdirectories.
shopt -s globstar
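
If the same ~/.bashrc is shared between machines with different Bash versions, a slightly more defensive variant may be useful. This is only a sketch: the globstar_state variable is an illustrative name, and the version check assumes that globstar is available from Bash 4.0 onwards.

```bash
# Enable globstar only when the running Bash supports it (4.0 or later),
# so that older shells do not print an error at startup.
if (( BASH_VERSINFO[0] >= 4 )); then
    shopt -s globstar
fi

# shopt -q prints nothing: its exit status is 0 when the option is on.
if shopt -q globstar 2>/dev/null; then
    globstar_state=on
else
    globstar_state=off
fi
echo "globstar is $globstar_state"
```

The last lines are not needed in a real ~/.bashrc. They only show how to check, from a script or an interactive session, whether the option is currently active.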