(One)-minute geek news

Group talks, Laboratoire de Chimie, ENS de Lyon

Sponge


When you try to edit a file "in-place" with sed, awk, ... you usually end up with an empty file at the end. In this geek news, we are going to see how to deal with such a case.

The reasons

When you attempt to read from a file and write to that same file in a single command (or pipe), you usually end up with a truncated file (i.e. empty).

To simplify, such commands work as a kind of workflow (a stream): everything is processed as it comes, without retention of data. This was cleverly designed to avoid the creation of intermediary stuff (i.e. files mostly).

One big consequence is that your output file is very likely to be written to while your input file is still being read. Usually, this is fine, unless those files are the same. With a redirection (>), it is even worse: the shell opens and empties the output file before the command starts, so the command reads an already empty file and writes back what it got (i.e. nothing).


Example: sed 's/e/a/g' test > test will simply truncate your file test
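To see the truncation for yourself without risking a real file, here is a small sketch that plays the example above in a scratch directory. The variable name dir and the file content are only assumptions made for the demonstration.

```shell
# Scratch directory so that nothing real gets clobbered (dir is a made-up name).
dir="$(mktemp -d)"
printf 'hello\nthere\n' > "$dir/test"

# The shell empties the target of ">" before sed gets to read it.
sed 's/e/a/g' "$dir/test" > "$dir/test"

# Reports a size of 0 bytes: the content is gone.
wc -c < "$dir/test"
```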

Use temporary files

You can of course use temporary files: sed 's/e/a/g' test > tmp_test && mv tmp_test test (assuming tmp_test did not exist before, otherwise oops... The && makes sure test is only replaced if sed succeeded.)
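If you are worried about the "oops" case, one possible variant is to let mktemp pick a name that is guaranteed to be unused. This is only a sketch: the scratch directory, the variable names dir and tmp, and the file content are assumptions for the demonstration.

```shell
# Scratch directory and sample file (made-up names and content).
dir="$(mktemp -d)"
printf 'hello\nthere\n' > "$dir/test"

# mktemp creates a new, unique file next to the original,
# so no existing file can be overwritten by accident.
tmp="$(mktemp "$dir/test.XXXXXX")"

# Replace the original only if sed succeeded.
sed 's/e/a/g' "$dir/test" > "$tmp" && mv "$tmp" "$dir/test"

cat "$dir/test"
```

Creating the temporary file in the same directory as the target keeps the final mv a simple rename. Note that mktemp creates the file with restrictive permissions, so the replaced file may not keep the permissions of the original.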

Trick of sed

If your command is a single sed command, you can use a trick.

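Simply add the -i option to sed to allow "in-place" modifications. (Behind the scenes, GNU sed writes to a temporary file and renames it for you. The option is not POSIX: BSD/macOS sed expects a backup suffix after it, e.g. -i ''.)

Our previous example becomes: sed -i 's/e/a/g' test which now performs as meant.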
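If you want a safety net, -i also accepts a backup suffix glued to the option. A short sketch, again in a scratch directory (dir and the file content are assumptions for the demonstration):

```shell
# Scratch directory and sample file (made-up names and content).
dir="$(mktemp -d)"
printf 'hello\nthere\n' > "$dir/test"

# Edit in place, keeping a copy of the original as test.bak.
sed -i.bak 's/e/a/g' "$dir/test"

cat "$dir/test"       # the modified file
cat "$dir/test.bak"   # the untouched original
```

The attached form -i.bak should be accepted by both GNU and BSD sed, which makes it a convenient portable spelling.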

Otherwise

However, sometimes a single sed command is not convenient for your problem (well, theoretically, you could do about anything with a single sed command since it is Turing-complete, if I am not mistaken...). So what can you do?

Fortunately, you are not the only one with this problem, so a small utility was developed for that very purpose: sponge.

An example is worth more than a long description: awk '{print $1}' test|sponge test will not simply truncate your file, but work as intended.

How does it work? Well, let's say it soaks up all the data before writing it, basically breaking the streaming workflow.

The only problem is that it is rarely installed by default (it is part of the moreutils package)... So here is a replacement function, stee, to add to your .bashrc
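The soaking idea can be sketched in a few lines of shell. This is not how sponge is actually implemented, only an illustration: the function name soak and the file soak_demo.txt are made up, and since the data is held in a shell variable, it only suits small text inputs (NUL bytes would be lost).

```shell
# Hypothetical helper: read ALL of stdin first, and only then open the target.
soak() {
    local data
    # The extra "x" keeps the shell from stripping the final newlines.
    data="$(cat; printf x)"
    printf '%s' "${data%x}" > "$1"
}

printf 'a 1\nb 2\n' > soak_demo.txt
awk '{print $1}' soak_demo.txt | soak soak_demo.txt
cat soak_demo.txt
```

Because the target file is opened only after cat has seen the end of its input, awk has finished reading soak_demo.txt by the time it gets overwritten.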

Source code

# The body runs in a subshell ( ... ), so the traps never leak into your interactive shell.
stee () ( tmp_file="$(mktemp)" || exit 1; trap 'rm -f "$tmp_file"' EXIT; trap 'exit 1' INT TERM; cat - > "$tmp_file" && mv "$tmp_file" "$1" )

Enjoy!

References

man bash