Holistic Engineering

A random assortment of shit with sprinkles.

Stop Using Bourne Shell


Specifically, for scripts. Feel free to use the shell all you want for your command-line goodness.

In this article I’m going to assert that not only are shell scripts harder to read and write (which is not very hard to prove), but that they don’t really win you anything in the performance department, either. Frankly, this is a side-effect of more powerful machines, and more complicated shells and shell subsystems than anything; don’t throw your interactive bash functions away just yet. Similar criticisms could be made about make, but that’s a rant that’s been driven into the ground already.

And I’m just going to get it out right now, for those of you reading who still think a pair of suspenders and a giant beard are in fashion: C Shell doesn’t solve the problem either.

Been Dazed and Confused For So Long

Some of the earliest criticisms I ran into about the Bourne shell were in the Unix-Haters Handbook, which I strongly recommend you read if you haven’t. It has no shortage of examples.

Bringing in our own examples, let’s start small. Here’s a great way to copy foo to bar while being completely confused as to how you managed to do so:

cp.sh

```sh
#!/bin/sh
cat bar foo >bar
```

Which works because bar is overwritten (to a zero-size file) before cat executes, which then concatenates both files, bar and foo, but because bar is now empty, only the contents of foo end up in stdout, which then get sent to bar. Voila, cp!
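If you want to watch the truncation order happen for yourself, here’s a minimal sketch you can run in a scratch directory (GNU cat notices that bar is both input and output and complains on stderr, which is why the noise is discarded here; the result is the same either way):

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"
printf 'hello\n' > foo
printf 'world\n' > bar

# The shell truncates bar *before* cat runs, so cat reads an empty bar,
# then foo, and the combined output lands back in bar.
cat bar foo > bar 2>/dev/null || true

cat bar   # prints: hello -- the original contents of foo
```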

Let’s do something more “complex”. Maybe you’d like to take all the files in a directory, loop over them and do something magical:

loop.sh

```sh
#!/bin/sh

for i in $(find . -type f)
do
  cat $i
done
```

And then someone does this in your directory:

trollface.sh

```sh
#!/bin/sh

echo "HA HA" >"my cat's breath smells like cat food"

And suddenly your script breaks: the unquoted expansion gets word-split, so cat is handed seven bogus filenames instead of one real one, and the loop shrugs and keeps going. set -e to the rescue, but we haven’t really solved the problem. find with -exec would work here, which is fine until we need to do something more than run one command, which we could solve with shell functions in a completely, wonderfully indirect and ugly way. Let’s do something “straightforward” that lets us do more than one thing.
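For what it’s worth, the find-based alternatives for the one-command case look something like this (a sketch; -print0 and -0 are common GNU/BSD extensions, not strict POSIX):

```shell
# A throwaway directory with a hostile filename in it.
cd "$(mktemp -d)"
printf 'hi' > "my cat's breath smells like cat food"

# -exec with {} + hands filenames to cat directly; no word splitting happens.
find . -type f -exec cat {} +     # prints: hi

# For pipelines, NUL-delimit the names instead; this survives spaces
# and even newlines in filenames.
find . -type f -print0 | xargs -0 cat    # prints: hi
```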

director_of_shell_engineering_at_bell_labs.sh

```sh
#!/bin/sh
set -e

filelist=$(find . -type f)

while [ "$filelist" != "" ]
do
  line=$(echo "$filelist" | head -1)
  filelist=$(echo "$filelist" | tail +2)
  ls "$line"
  cat "$line"
done
```

True story: I spent about 10 minutes getting this code right in a test directory. And the great thing is that now, depending on the system you’re on, this doesn’t work either. Why, you might ask?

Let’s talk about what #!/bin/sh means to different systems. First off, be forewarned that most systems today use bash as the replacement for /bin/sh. Most of the commands used here are builtins in bash, which means they run directly in the interpreter. In Bourne Shell, though, things like echo and [ (yes, look in your /bin directory) are not, and are actually separate binaries on your system that run as separate processes.

Why does this matter? Two major reasons: one, you are writing against bash (we’ll get into why that’s not really solving problems later), and two, if you run into a proper Bourne Shell where echo and [ are invoked as real programs, the quoting in the script behaves completely differently.
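You can ask your own shell which of the two it’s doing; on a modern bash, dash, or zsh, both names resolve to builtins, while the standalone binaries still sit on disk for programs that exec them directly (a sketch; the /bin vs /usr/bin location varies by system):

```shell
# Ask the shell how it resolves these names; modern sh implementations
# answer "shell builtin" for both.
type echo
type [

# The real binaries still exist alongside the builtins.
ls -l /bin/[ 2>/dev/null || ls -l /usr/bin/[
```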

To understand this, let’s look at what happens in this line:

iterators_in_sh_are_hard_lets_go_shopping.sh

```sh
line=$(echo "$filelist" | head -1)
```

First, $() is equivalent to backticks (which markdown is unfortunately failing at letting me express), so the innards of it get evaluated first. The variables are expanded into their contents and handed to a subshell that runs echo; do not pass go, do not collect $200. This is partially why the quotes are there: if we stripped them, the multiple lines would be split into multiple arguments as the variable is interpolated into the command line.
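The difference the quotes make is easy to demonstrate with the shell’s own argument counter ($v here is a stand-in for $filelist):

```shell
# A stand-in for $filelist: two "filenames" separated by a newline.
v='file one
file two'

# Unquoted: the value is split on whitespace (spaces AND newlines),
# so these two lines become four separate arguments.
set -- $v
echo "$#"    # prints: 4

# Quoted: the value passes through as a single argument, newline intact.
set -- "$v"
echo "$#"    # prints: 1
```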

The real fun is in this line though:

omg_a_real_shell_mom.sh

```sh
while [ "$filelist" != "" ]
```

What happens here? Can you tell why it’s broken?

Here, I’ll give you a hint: is this a syntax error?

yep.sh

```sh
[ != ]
```

Obviously so, since there are no operands on either side of the conditional. And that is exactly what [ can end up receiving here: if the quotes are dropped (or you’re on an ancient shell that throws away a quoted empty string), an empty $filelist vanishes from the command line entirely, and an empty $filelist is precisely the condition that’s supposed to terminate the while loop. A $filelist with spaces in it is no better; it splatters into several arguments. if statements and other Bourne shell conditionals have the same problem.
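On the shells you’re likely to have installed today, you can at least watch the unquoted version fall apart (a sketch; the || capture is just there so it behaves under set -e):

```shell
# With the quotes stripped, an empty variable expands to nothing at all,
# so test never sees its left-hand operand.
filelist=""
rc=0
[ $filelist != "" ] 2>/dev/null || rc=$?   # becomes: [ != "" ]
echo "$rc"   # prints 2 on bash/dash: a test usage error, not a "false" result
```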

This is because [ (which is a synonym for test, which is where the manual lives) is a command that runs as a separate process, but the whole command line is expanded first, then handed to /bin/[ as arguments.

Feel free to try this yourself: /bin/[ "" != "" ]. This way you don’t hit the builtin, but your shell still handles the variable/quote system. You may need to escape the [ and ] if you’re on zsh.

Here’s the solution to the problem, which you might have encountered in scripts (it’s a fairly common pattern) that need to be portable:

fixed_yo.sh

```sh
while [ "x$filelist" != "x" ]
```

Why does this matter?

Remember folks, we just walked a list of files in an error-free and portable way. That’s it. We could talk about how globs are going to save the world, and how every globbing system works differently in every shell, and even between versions of shells or settings within the shell. We could talk about how convoluted it gets when you try to emulate find’s predicates with globs. We could talk about how putting spaces in filenames is a terrible idea and you should feel bad for letting them onto your filesystem. We could talk about this all day and probably for the rest of our lives, but the reality is I’m about to show you an easier way.

Here’s an example in ruby:

better.rb

```ruby
#!/usr/bin/env ruby

# Note: %w[] does not interpolate, so pass the filename as a separate
# argument to system; spaces in names survive intact.
Dir["**/*"].each { |file| system("ls", file) if File.file?(file) }
```

Here’s an example in perl:

better.pl

```perl
#!/usr/bin/env perl
use File::Find;

find(\&wanted, ".");

# Note: qw() does not interpolate, so pass $_ as a separate argument
# to system; spaces in names survive intact.
sub wanted { system("ls", $_) if -f $_ }
```

Neither example is very great, to be frank, but they have none of the problems your shell script will. They will not break on files with spaces in them. They’re also shorter, and arguably easier to read; they don’t even try to reinvent ls, although both are more than capable of doing so, and there are lots of good reasons to do it the ruby/perl way rather than calling system. They’re also portable: it’s trivial to adjust these programs to run on Windows, for example, if you want. Good luck doing that in bash.

BERT ERMAGHERD PERL AND RERBY ER SER SLER AND YERS TER MUCH MERMERY

Let’s examine that for a minute. I’ve been in this profession long enough to remember working on servers where the mere suggestion of running all your init scripts against bash would put you in a position where you were considering touching up that old resume, “just in case”. bash is now the shell du jour on most systems and is frequently used for the init system, which is where 99% of this argument lives. It does solve some of the aforementioned problems with more traditional Bourne shell. bash has arrays, locals, even extended predicates for the builtin test, and a million other things that don’t really make shell any easier to read, but certainly make it more functional. In fact, I’d posit it makes the problem worse, but going through bash and zsh’s feature sets is beyond the scope of this article.

But really, and this is the real meat-and-potatoes of this argument, bash scripts don’t solve a problem in our modern environment that other tools don’t solve better for an acceptable trade of resources.

irb, which is close to a realistic baseline for a ruby program, comes in at about 10k resident. perl -de1 comes in at around 5k resident on my machine. python with enough to run os.system() comes in at 4.5k. bash comes in at 1.2k. This machine has 16GB of ram and fits on my lap. Any server that has memory issues these days is not going to have it because its supporting init tools were written in a non-shell language.

To address runtime speed with extreme hyperbole, this argument doesn’t even hold water if you were booting services on your phone. Your phone, if it was made in the last year, probably has multiple cores and a clock speed of greater than 1GHz per core. It’ll deal just fine with some small perl scripts that launch programs.

As for support, go on, find me a popular linux distribution that doesn’t have a copy of perl or python installed already. Don’t worry, I’ll wait. Emphasis on popular, as anyone can build a useless linux distribution.

Statically linked /bin/sh is somewhat hard to find by default these days as well, so that argument can go do something to itself too. Besides, there’s nothing keeping you from statically linking a modern scripting language.

To put it another way, shell for init at this point is the technical embodiment of the Peter Principle. The machinery required (go on, look if you want) to support shell as an init system is staggeringly complicated on any unix system these days, and hard to read and modify. Tools like FreeBSD’s rc.subr and ports are notably gory mixes of shell and make that force even the most hardened shell programmer to run away screaming in terror, which is largely evidenced by all the modern supporting tools being written in something that’s not make and not shell. Linux systems are no better: look into sysconfig and Debian’s init orchestration and you’ll find it’s not that much different, and we haven’t even started on how the init scripts themselves look. Tools like autoconf are finally seeing real competition, where a key feature is abstracting the shell away from the end user.

The existing solutions are so complicated largely because these systems are compensating for limitations in shell; for example, a simple export can completely alter the way your shell program works, whether or not it’s actually expecting that to happen. Getting quoting right is a notorious pain in the ass. Parsing is a pain in the ass. This shows in supporting tooling too: tools like /usr/bin/stat exist solely for the purpose of driving shell scripts and are considerably harder to use than the calls they emulate. This is long before we get into the portability of any of this, as our aforementioned savior find is one of the biggest offenders when it comes to cross-unix feature sets. Don’t even get me started on getopt.
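The stat example is a good microcosm of the portability problem: even asking for a file’s size means probing for which flag dialect you have, since GNU stat and BSD stat speak completely different option languages (a sketch):

```shell
# GNU coreutils stat uses -c with a format string; BSD/macOS stat uses -f.
# A "portable" script ends up trying one, then falling back to the other.
f=$(mktemp)
printf 'abc' > "$f"

size=$(stat -c %s "$f" 2>/dev/null || stat -f %z "$f")
echo "$size"   # prints: 3

rm -f "$f"
```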

So, in conclusion, unix peoples: build a better system. I don’t care if it’s in perl, python, ruby, lua… I mean, I just want to solve problems, and I’m sure someone else has the time and interest for these arguments. I’m just tired of building “works for me” tools, or portable ones that are fragile and incomprehensible, simply because of the language choice that was forced upon me to remain compliant with the system. As configuration management takes a more prominent seat in driving how we build systems, having a portable base will get more and more important, and shell has simply risen to its level of incompetence.
