Some Approaches To Single-Process, Durable Queueing
Get in Line
As computer professionals, queueing is something that’s close to most of us. After all, from the bottom up – from CPU instructions to network packets to async reactor calls – things get in line.
There are lots of queueing approaches in computer science and in life in general, so we're only going to talk about a specific subclass of queues: durable ones that are specifically not distributed amongst multiple hosts. These tend to be great models for smaller systems that do not need the overhead and complexity of distributed queues, or the reliability problems that come with them.
Some options before you dive in
So, this is an exploratory mission; while many of these patterns are in use in production scenarios, you may be better served by existing, off-the-shelf queueing software if you need to solve this problem.
However, if you specifically need to solve the problem covered in this article, you will find there are no good solutions out there that easily resolve your issue. I could release boltqueue (below) as a library, but I feel it's more useful as a pattern than as an actual thing you code against.
Explain Queues to Me Like I’m Five
Ok, you didn’t ask for this but that’s what the table of contents (lower left opens the sidebar) is for. :)
A queue is a line: at the end of the line things are inserted, and at the beginning of the line things are removed. These are called the "tail" and "head" respectively. Inserting is called an "enqueue", and removing is called a "dequeue".
The most basic thread-safe queue
In golang, a basic queue can be modeled as an array with a mutex (a lock that only allows one thing to run at a time), like so:

```go
package main

import (
	"fmt"
	"sync"
)

type Queue struct {
	mutex sync.Mutex
	items []int
}

// Enqueue inserts an item at the tail of the queue.
func (q *Queue) Enqueue(item int) {
	q.mutex.Lock()
	defer q.mutex.Unlock()
	q.items = append(q.items, item)
}

// Dequeue removes and returns the item at the head of the queue.
func (q *Queue) Dequeue() (int, bool) {
	q.mutex.Lock()
	defer q.mutex.Unlock()
	if len(q.items) == 0 {
		return 0, false
	}
	item := q.items[0]
	q.items = q.items[1:]
	return item, true
}

func main() {
	q := &Queue{}

	done := make(chan struct{})
	go func() {
		for i := 0; i < 10000; i++ {
			q.Enqueue(i)
		}
		close(done)
	}()
	<-done

	for {
		item, ok := q.Dequeue()
		if !ok {
			break
		}
		fmt.Println(item)
	}
}
```
Channels
An even easier solution in Golang specifically is channels. Channels are the language's built-in construct for a queue that can be shared without copying its data.
Here's a solution that uses channels:

```go
package main

import "fmt"

func main() {
	queue := make(chan int, 10000)

	// The producer enqueues by sending on the channel.
	go func() {
		for i := 0; i < 10000; i++ {
			queue <- i
		}
		close(queue)
	}()

	// The consumer dequeues by receiving; range stops once the
	// channel is closed and drained.
	for i := range queue {
		fmt.Println(i)
	}
}
```
Simpler, and much faster given the runtime’s direct involvement.
So, wait, you said something about durable?
Yes, I did! You may have noticed that your queue always starts at 0 and counts up to 9999. Durable queueing enables us to start at 10000 and work upwards as we encounter new data, between process restarts. Naturally, this involves a database of some form.
The filesystem is a database, right?
Yes, it is, but not without its shortcomings in this regard. Here’s an approach that appears to work, but probably won’t if the disk fills or becomes unavailable. POSIX filesystem architecture was not very well-suited to fulfill this need. That said, many local mail clients interact with filesystem queues via the “Maildir” format, which is essentially a queue.
Here's a contender for such a queue:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strconv"
)

type DirQueue struct {
	dir string
	seq uint64
}

func NewDirQueue(dir string) (*DirQueue, error) {
	if err := os.MkdirAll(dir, 0755); err != nil {
		return nil, err
	}
	q := &DirQueue{dir: dir}
	// Resume: the next sequence number is one past the largest on disk.
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	for _, e := range entries {
		if n, err := strconv.ParseUint(e.Name(), 10, 64); err == nil && n >= q.seq {
			q.seq = n + 1
		}
	}
	return q, nil
}

// Enqueue writes the next value into a file named by its sequence number,
// zero-padded so the names sort correctly.
func (q *DirQueue) Enqueue() error {
	name := filepath.Join(q.dir, fmt.Sprintf("%020d", q.seq))
	data := []byte(strconv.FormatUint(q.seq, 10))
	q.seq++
	return os.WriteFile(name, data, 0644)
}

// Dequeue removes the lowest-numbered file and returns its content.
func (q *DirQueue) Dequeue() (string, bool, error) {
	entries, err := os.ReadDir(q.dir)
	if err != nil || len(entries) == 0 {
		return "", false, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	sort.Strings(names)
	path := filepath.Join(q.dir, names[0])
	b, err := os.ReadFile(path)
	if err != nil {
		return "", false, err
	}
	return string(b), true, os.Remove(path)
}

func main() {
	q, err := NewDirQueue(os.Args[1])
	if err != nil {
		panic(err)
	}

	go func() {
		for {
			if err := q.Enqueue(); err != nil {
				panic(err)
			}
		}
	}()

	for {
		v, ok, err := q.Dequeue()
		if err != nil {
			panic(err)
		}
		if ok {
			fmt.Println(v)
		}
	}
}
```
So, when you start this queue, stop it, and start it again, it will resume where it left off. Try it! Run it like so:
```shell
go run main.go /tmp/foo
```
And press Control+C to exit it. Then ls /tmp/foo. Notice how it's full of files? Those are your queue elements; each one will contain its value as the content. Working as intended!
If you want to see it resume from its last state:

```shell
go run main.go /tmp/foo | head -10
```
Run that a few times. It will yield the first ten items of the queue.
When you're done, just rm -r the directory.
So you said this wasn’t kosher. What gives?
On POSIX-compliant systems (pretty much everyone), there's a thing called EINTR; it's a constant that's short for "Interrupted System Call". While Golang handles this very elegantly in spots, what it means is that your transactions (the writes to the filesystem) can be interrupted, so they have to be retried. This most commonly occurs when a signal is fired, like the "interrupt" (SIGINT) signal you sent when you last pressed Control+C, or the head -10 signalling the exit by way of SIGPIPE, another signal, used to indicate to a process writing to a pipe that its reader has died.
Again, Golang will handle this well and terminate early or persist the write regardless, but we're still losing data. If you watch carefully, the number jumps a few points from time to time when you do either of those things; that's de-sync between the queue's internal state and the filesystem.
You can read more about EINTR, and about signals generally, in man 7 signal.
The solution here is not especially complicated, but it is deceptively delicate: a signal handler, plus some more detailed file management around reading and writing, would be in order to correct some of these edge cases. DB safety outside of the program is a non-starter.
The problem here is not necessarily the performance or overhead of the solution, but the effort required to maintain it and verify it.
So what’s next? More databases?
Yes, let's try a SQLite database next. SQLite is nice because we can keep the database either entirely in memory or on disk, allowing for some more interesting deployment strategies in complicated scenarios. It's also made to run alongside our program, ensuring that signal handling is simple (just close the db).
SQL is its own beast though, and comes with its own, unique woes.
Let's see what that approach might look like. Here is a basic queue that just fills a database and retrieves from it in lockstep; for now we gate the transaction with a mutex (more on this later, though), using the database as backing storage and leveraging very few of SQL's concurrency management tools.
We’ll be using mattn/go-sqlite3 to manage our database, although you may want to consider using a higher level component such as GORM for a larger application.
```go
package main

import (
	"database/sql"
	"fmt"
	"os"
	"sync"

	_ "github.com/mattn/go-sqlite3"
)

var mutex sync.Mutex // gates every transaction; see below

func enqueue(db *sql.DB) error {
	mutex.Lock()
	defer mutex.Unlock()
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	var next uint64
	if err := tx.QueryRow(`select next from counter`).Scan(&next); err != nil {
		tx.Rollback()
		return err
	}
	if _, err := tx.Exec(`insert into queue (idx, value) values (?, ?)`, next, next); err != nil {
		tx.Rollback()
		return err
	}
	if _, err := tx.Exec(`update counter set next = next + 1`); err != nil {
		tx.Rollback()
		return err
	}
	return tx.Commit()
}

func dequeue(db *sql.DB) (uint64, bool, error) {
	mutex.Lock()
	defer mutex.Unlock()
	tx, err := db.Begin()
	if err != nil {
		return 0, false, err
	}
	var idx, value uint64
	err = tx.QueryRow(`select idx, value from queue order by idx limit 1`).Scan(&idx, &value)
	if err == sql.ErrNoRows {
		tx.Rollback()
		return 0, false, nil
	}
	if err != nil {
		tx.Rollback()
		return 0, false, err
	}
	if _, err := tx.Exec(`delete from queue where idx = ?`, idx); err != nil {
		tx.Rollback()
		return 0, false, err
	}
	return value, true, tx.Commit()
}

func main() {
	db, err := sql.Open("sqlite3", os.Args[1])
	if err != nil {
		panic(err)
	}
	defer db.Close()

	setup := []string{
		`create table if not exists queue (idx integer primary key, value integer)`,
		`create table if not exists counter (next integer)`,
		`insert into counter (next) select 0 where not exists (select 1 from counter)`,
	}
	for _, stmt := range setup {
		if _, err := db.Exec(stmt); err != nil {
			panic(err)
		}
	}

	go func() {
		for {
			if err := enqueue(db); err != nil {
				panic(err)
			}
		}
	}()

	for {
		value, ok, err := dequeue(db)
		if err != nil {
			panic(err)
		}
		if ok {
			fmt.Println(value)
		}
	}
}
```
You can run this with the following:
```shell
go run main.go foo.db
```
Run it multiple times to see the restart. It should never be off.
So, what’s wrong with it?
So, there are a number of issues with this approach, although unlike the filesystem example, nothing is inherently wrong with it. You can use this, if all you want to queue are unsigned integers.
SQL is a lot of overhead
SQL has to be parsed, the query is optimized, and then, in almost every database, a second layer of optimization works against indexes and other structures within the database itself, just to execute a single query. While all of that definitely has its place, it's a lot of overhead for us.
SQL transactions have weird semantics
They’re also sluggish and tend to vary between database platforms.
AKA, this is why we're using a mutex. I have never personally found SQL's transactional systems to be anything but "clear as mud", especially when working with the more stringent levels of isolation. In this case, a simple mutex mitigates the kind of contention in the database that transactions typically handle miserably.
Which brings me to my last point…
It’s slow
It’s just slow. It does the job and if you were on a phone or something and were limited on tooling, sure. Maybe. But there are better options for this problem.
A wild BoltDB appears
BoltDB is a key/value storage engine for golang that is in the spirit of LMDB, another popular key/value store developed in C.
We’ll be using the bbolt variant, for our purposes, since it seems to be better maintained.
```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"

	bolt "go.etcd.io/bbolt"
)

var bucketName = []byte("queue")

// itob packs a uint64 into an 8-byte, big-endian key.
func itob(v uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, v)
	return b
}

func enqueue(db *bolt.DB, value uint64) error {
	return db.Update(func(tx *bolt.Tx) error {
		bucket, err := tx.CreateBucketIfNotExists(bucketName)
		if err != nil {
			return err
		}
		seq, err := bucket.NextSequence()
		if err != nil {
			return err
		}
		return bucket.Put(itob(seq), itob(value))
	})
}

func dequeue(db *bolt.DB) (uint64, uint64, bool, error) {
	var idx, value uint64
	var ok bool
	err := db.Update(func(tx *bolt.Tx) error {
		bucket, err := tx.CreateBucketIfNotExists(bucketName)
		if err != nil {
			return err
		}
		// Cursors return keys in sorted byte order, so First() is the head.
		k, v := bucket.Cursor().First()
		if k == nil {
			return nil
		}
		idx = binary.BigEndian.Uint64(k)
		value = binary.BigEndian.Uint64(v)
		ok = true
		return bucket.Delete(k)
	})
	return idx, value, ok, err
}

func main() {
	db, err := bolt.Open(os.Args[1], 0600, nil)
	if err != nil {
		panic(err)
	}
	defer db.Close()

	go func() {
		// This counter restarts at 0 each run; the index does not.
		for value := uint64(0); ; value++ {
			if err := enqueue(db, value); err != nil {
				panic(err)
			}
		}
	}()

	for {
		idx, value, ok, err := dequeue(db)
		if err != nil {
			panic(err)
		}
		if ok {
			fmt.Println(idx, value)
		}
	}
}
```
You run this one just like the last one:
```shell
go run main.go bar.db   # not the same db! ;)
```
And you can restart it, pipe it to head, and so on. It outputs two numbers this time: one corresponding to the index, and another that counts from 0 the items entered during that process run. The second number resets as you restart the process, but depending on how fast you drain, you will not see the repeats immediately. The index will never repeat.
Because the signal interrupts the fmt.Println, most of the time we do not see the last value cleared, so there will be a one-unit gap.
How does it work?
The NextSequence() calls yield an integer which is then packed into an 8-byte array, which is how large a uint64 is. This key is stored along with your own value, packed in similar fashion as another uint64. To dequeue, it simply pulls the first item the cursor yields and deletes it.
BoltDB's cursors are nice in that they always return keys in sorted byte order, so this works without ever consulting the tail at all, or worrying about sorting. The db is happily persisted and all is good in the world; if you're feeling paranoid, a signal handler will finish the job here.
Advantages
This pattern has a lot of advantages, and it's one I'm fairly comfortable using and talking about, so here is how it improves on the others:
Locking is a solved problem
The Update() calls automatically enter a transaction when the inner function is called; if an error is returned, the transaction is automatically rolled back. You may notice I opted for a similar pattern in the SQL code, just doing it by hand. The Update() call here is done under a database lock, so no writers can interact with the database at all until the transaction completes. This is functionally a mutex for us.
It’s fast
Try it! Compare it to the SQL version. The bbolt version does much, much less despite being a similar amount of code. It also creates smaller data structures that need less work to iterate. This is a better solution on performance overall.
It’s pretty damn easy
The code above is not that complicated, contains no real "gotchas", and isn't that hard to read (at least, I don't think so). The bbolt solution wins on the maintainability front as well. If you need multiple queues in a single database, use multiple buckets. You can even relate non-queue k/v data to the queue in a separate bucket and handle it all safely under View() or Update() calls. You just need to understand how bbolt manages its keys and the relationship between keys and cursors.
Thanks for Reading
I hope you’ve enjoyed this content! Come back for more, eventually, when I get the will to write!