Holistic Engineering

A random assortment of shit with sprinkles.

Integration Test Your Systems With Chef Workflow

At the Opscode Summit in October, there was a lot of talk of testing. One thing I wanted to discuss at length was the concept of Integration Testing. A lot of energy and words have been spent on the testing of individual components, or Unit Testing, which is great — these conversations needed to happen. Another discussion has been about Acceptance Testing and its relation to Integration Testing.

Altogether these were healthy discussions, but I do feel that quite a bit of focus was spent on “omg! testing!” without actually asking “what is testing accomplishing for us?” There are a lot of words out there on what kind of testing is useful and what is not.

While these things were briefly discussed, I don’t think they were adequately explored; after all, as Chef users we’re happy to have anything right now, and as tools mature and our expectations refine I think we’ll have a better idea of what fits in the general case. That’s not a knock on any tool, anyone, or anything. I’d say as a group we’re a lot better off than other groups of a similar nature. I just think more exploration, especially with regard to how we view testing, is important.

I started working on a project at the time which implemented a workflow (which will be the subject of another post) and an integration testing system. You can find all the products here, but this article is largely about why integration testing is more important than we give it credit for.

I like to know why I’m picking my tools and what problems they solve. Therefore, I’m going to spend a little time explaining how I see these three testing methods and how they relate to operations, and then go over some current solutions and find the holes in them.

Unit Testing

The nice thing about Chef is that, due to its framework-y nature, we have our units spelled out for us already. Namely, the cookbook. Test Kitchen is great for this in the Open Source context — it’s designed from the start to run your tests on numerous platforms, which is exactly what you want when writing cookbooks you plan for others outside your organization to use. This is pretty much a solved problem thanks to that. For your internal organization, things like minitest-chef-handler go a long way towards helping you test cookbooks.
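
If you haven’t seen one, here’s a rough sketch of what a minitest-chef-handler spec can look like. The cookbook, path, and service name are made up for illustration, and the exact helpers vary a bit between versions:

myapp_test.rb

# A hypothetical spec for an imaginary "myapp" cookbook, run on the node
# by the handler after convergence.
describe_recipe "myapp::default" do
  it "writes the app config" do
    file("/etc/myapp/app.conf").must_exist
  end

  it "keeps the app service running" do
    service("myapp").must_be_running
  end
end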

Acceptance Testing

Acceptance Testing is asking the question “does this work?” from an end-user perspective. I view this as the equivalent of your superiors, say the head of the company or maybe your division, asking that question. It’s an awfully important question to ask, which is why there are so many solutions for it already.

Nagios is an Acceptance Testing system, as are Sensu and Zabbix and other monitoring systems. They ask this question sometimes hundreds of times a minute. From an operations perspective, “does this work?” is functionally equivalent to “is it running?” — acceptance testing outside of that is probably best left to the people who developed the software you administer.
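
In practice, that question boils down to a check: a tiny program that speaks the standard plugin exit-code convention (0 for OK, 1 for warning, 2 for critical). Here’s a hypothetical example of the shape of one; it isn’t a plugin that ships with any of those systems, and the URL is made up:

check_app_http.rb

# Hypothetical Nagios/Sensu-style check using the standard plugin
# exit-code convention: 0 = OK, 1 = WARNING, 2 = CRITICAL.
require 'net/http'
require 'uri'

begin
  response = Net::HTTP.get_response(URI("http://localhost:8080/health"))
  if response.is_a?(Net::HTTPSuccess)
    puts "OK: service answered with #{response.code}"
    exit 0
  else
    puts "CRITICAL: service answered with #{response.code}"
    exit 2
  end
rescue StandardError => e
  puts "CRITICAL: #{e.message}"
  exit 2
end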

Integration Testing

So, Unit Testing is the cookbooks and Acceptance Testing covers the externals of what you maintain. What’s left?

Here’s a few things you might have done with Chef and watched blow up in your face:

  • Configured any kind of replication with Chef, really any kind of replication at all
  • Made assumptions about how networked machines interoperate with each other in the wild
  • Made assumptions about how machines that are working properly interact with machines that aren’t

What do all these things have in common? They all are things we might automate with Chef, but they are things that are not necessarily external and they certainly aren’t functions of the unit. In the real world, nobody runs a single recipe on a machine.

The Real World

In the real world, we:

  • Run multiple recipes on a single server to configure it cohesively
  • Expect the functions of a server to play nicely with other servers on the network
  • Expect the network to be a composite of servers that work together to provide a set of services
  • Expect the services to work

All I’m really saying in this entire article is that the second and third expectations above aren’t really accounted for.

Unit Testing isn’t always a solution

Unit Testing is great and solves some real problems, but unfortunately we spend too little time on determining what problems Unit Testing solves that Chef doesn’t solve already. Chef, by its very nature as a configuration enforcement framework, is really just a big old fat test suite for server installs.

Consider this example. While you may think it’s contrived, more than a few Unit Tests in the wild do things like this — in fact, I assert most of them do at some level.

Here’s an example of a file resource being created:

recipe.rb

file "/tmp/some_file" do
  content "woot"
  owner "erikh"
  group "erikh"
end

And here’s our unit test:

test.rb

def test_file
  assert(File.exist?("/tmp/some_file"))
end

What happens if Chef can’t create /tmp/some_file? I think we all know — ironically enough, Chef aborts. The test suite itself never actually runs because Chef didn’t finish.

This is duplicating effort and I really, really, really, really hate duplicating effort. Computers are supposed to save time, not waste it. And if this were a shell script that ran (set -e aside) and then we ran this test suite afterwards, it might make sense. But we’re using Chef, which actually tests that it succeeded and explodes if it doesn’t.

It’s also why I think it’s great for open source cookbooks — it allows me, the end-user, to assert that you, the open source author, at least made some attempt to define what your cookbook should do. When something changes in your cookbook, whether from patches you’ve taken or changes you’ve made, you’re aware that you’ve broken the contract, and I can be a responsible person and verify that before deploying the latest and greatest and dealing with the consequences.

For example: what happens if the contents of the file aren’t woot? Well, Chef overwrites the file with woot. Determining whether or not you intended that to change is probably a better use of your time.

However, is it good for internal systems? I’m not so sure. We’re not bound by responsibility to anyone but our stakeholders, and the fact is that most of our stakeholders in internal systems don’t really care how the systems are set up, because that’s why they hired us in the first place. Changing things is what we do, and changing things in two places to satisfy some notion of “testability”, when the only thing we’re accomplishing is ensuring Chef does what it says on the label, is probably not a full solution.

To put it another way, in a system where change is free and constant, unit tests have less purpose when an already-validating system like Chef sits underneath them. Either your shit was set up properly or it wasn’t, and you can verify almost all of that in a Chef run without writing any additional unit tests. I’m just suggesting the usefulness is diminished, not that it’s a bad idea.

Running everything on one box isn’t a solution

Other testing tools have you doing the equivalent of integration testing by converging tons of cookbooks on a single system and then running a test suite against the result. This just doesn’t reflect reality. Unless your entire network is a single server in a closet somewhere, your reality never, ever, ever works like this.

I think it’s safe to say that the reason most people start using Chef in the first place is because they have more than one server to manage. There are tons of problems you can find just by testing network interoperation that aren’t even possible to test this way, or, if possible, require severe convolution (like moving a standard service to a different port so you can run two of them on the same box), which doesn’t reflect reality either.

While this is a callous way to describe it, I’m going to call this “Bullshit Testing”, because in many ways you’re testing bullshit you’d never do in production, and bullshitting yourself with the results you get back from your test suite. Bullshit simultaneously does not solve the problem and largely exists to instill confidence, as a Princeton Professor once said. Good book, btw.

Integration Testing finds lots of things

And here are some concrete examples! These are all actual bugs I fixed while developing the testing suite that I don’t think would have been possible to find with the systems that already exist. Our monitoring system would have found all of these, though, long after they had become a problem.

Our BIND cookbook was misconfiguring the slaves’ ability to accept updates from the master, which accepts most of its updates from nsupdate and similar tooling. A test which checks both the master and the slave for the DNS record failed, exposing this issue.
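
The shape of that test is simple enough to show. This sketch uses plain Ruby’s Resolv against made-up master and slave addresses rather than the suite’s actual helpers, but the idea is the same: the record has to be visible from every server, not just the one nsupdate talked to.

dns_replication_test.rb

require 'resolv'
require 'minitest/autorun'

class TestDNSReplication < MiniTest::Unit::TestCase
  # Hypothetical master and slave addresses, and a record added via nsupdate.
  SERVERS = %w[10.0.0.2 10.0.0.3]
  RECORD  = "app01.example.internal"

  def test_record_is_visible_on_every_server
    SERVERS.each do |server|
      resolver = Resolv::DNS.new(:nameserver => [server])
      begin
        refute_nil resolver.getaddress(RECORD), "#{RECORD} resolved to nothing on #{server}"
      rescue Resolv::ResolvError
        flunk "#{RECORD} did not resolve on #{server}"
      end
    end
  end
end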

Our collectd tooling was using a very chatty plugin that sent metrics to graphite, which resulted in the creation of thousands of whisper databases over very short periods of time. The cookbooks that we were using for graphite throttled whisper database creation to 50/minute. A test which verified that our collectd tooling was getting the metrics to graphite exposed the issue. After some debugging we realized the data was getting there, but it was taking upwards of 15 minutes to get out of the cache and onto disk; unacceptable for an environment that uses autoscaling. Disabling the plugin (which wasn’t necessary for us) got everything working acceptably again.
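
The essence of that test is a round trip: push a datapoint in, then insist graphite can serve it back within a sane window. The host, metric name, and timings below are hypothetical, and the real test drives metrics through collectd rather than talking to graphite directly, but the skeleton looks like this:

metric_delivery_test.rb

require 'socket'
require 'net/http'
require 'json'
require 'minitest/autorun'

class TestMetricDelivery < MiniTest::Unit::TestCase
  GRAPHITE_HOST = "graphite.example.internal"   # hypothetical host
  METRIC        = "integration.test.heartbeat"  # hypothetical metric name

  def test_metric_becomes_queryable_quickly
    # Push one datapoint over graphite's plaintext protocol (port 2003).
    TCPSocket.open(GRAPHITE_HOST, 2003) do |sock|
      sock.puts "#{METRIC} 1 #{Time.now.to_i}"
    end

    # Poll the render API until the datapoint comes back, giving up after a
    # minute; 15 minutes stuck in the cache is exactly what we want to catch.
    deadline = Time.now + 60
    until Time.now > deadline
      body = Net::HTTP.get(GRAPHITE_HOST, "/render?target=#{METRIC}&from=-5min&format=json")
      series = JSON.parse(body)
      return if series.any? { |s| s["datapoints"].any? { |value, _ts| value } }
      sleep 5
    end
    flunk "#{METRIC} never became queryable on #{GRAPHITE_HOST}"
  end
end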

The same collectd tooling came with a Python graphite shipper — when the graphite server was unavailable in certain scenarios, the Python plugin would spin endlessly and write its inability to connect to syslog. Our syslog tooling writes both to the server and to remote centralized stores — which would have meant that while we were fixing a broken graphite server, there’s a high likelihood we would have ended up filling the disk on pretty much every machine on the network. A test that determines whether or not syslog is working broke and exposed the issue, because the oncoming flood of messages left the receiver so far behind that it didn’t write the test’s message to disk in time for the test to check it.
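
The syslog test is the same round-trip idea: log a unique marker, then insist it shows up in the centralized store before a deadline. The path and timing here are hypothetical and the real test is wired up differently, but stripped down it amounts to this:

syslog_delivery_test.rb

require 'syslog'
require 'minitest/autorun'

class TestSyslogDelivery < MiniTest::Unit::TestCase
  CENTRAL_LOG = "/srv/log/central/messages" # hypothetical centralized store

  def test_marker_reaches_central_store
    # Log a unique marker via the local syslog socket.
    marker = "integration-test-#{Time.now.to_i}-#{rand(10_000)}"
    Syslog.open("integration-test") { |log| log.info("%s", marker) }

    # The marker has to show up in the centralized store within 30 seconds,
    # or we consider syslog delivery broken (or hopelessly backlogged).
    deadline = Time.now + 30
    until Time.now > deadline
      return if File.read(CENTRAL_LOG).include?(marker)
      sleep 2
    end
    flunk "marker #{marker} never reached #{CENTRAL_LOG}"
  end
end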

I don’t know about you folks, but any time I can find something that’s going to break and find out before my monitoring system tells me is a win.

Did you like these ideas?

If you did, there’s great news! I’m working on a workflow and testing system which, while beta quality at the moment, attempts to meet your needs. If you don’t like the system, come help me make it better. If you don’t like how I’m doing it, or just don’t like me, but see the value of this kind of testing, please do something! We need more alternatives and approaches to this problem.

Anyhow, chef-workflow is here. It’s big, it’s vast, and it’s an advanced tool for people with advanced needs. I’ll be writing more articles about different aspects of the system as I get time, so watch this space.
