We've been using Twitter's kestrel queue server for a while now at work, but only from our service layer, which is written in Python. Now that we have some queueing needs from our application layer, written in PHP, I spent a few days this week adding queue support to our web application. I thought I'd share what I learned, and how I implemented it.
The kestrel server itself was pretty straightforward to get up and running. The only thing I would point out is that I recommend sticking to release branches, as master was fairly unstable when I tried to use it. Regarding implementing the client, there were a few goals I had in mind when I started:
- Since kestrel is built on the memcache protocol, try and leverage an existing memcache client rather than build one from scratch
- Utilize our existing batch job infrastructure, which I covered previously here, and make sure our multi-tenant needs are met
- Keep the queue interface generic in case we change queue servers later
- Utilize existing kestrel management tools, and only build out the functionality we need
With these goals in mind, I ended up with four components: a kestrel client, a producer, a consumer, and a very small CLI harness for running the consumer. But before I even coded anything, I set up kestrel web, a web UI for kestrel written by my co-worker Matt Erkkila. Kestrel web allows you to view statistics on kestrel, manage queues, as well as sort and filter queues based on manual inputs. Having this tool up and running from the get-go made it easy to watch jobs get added and consumed from my test queue, and also easily flush out the queues as needed.
The Kestrel Client
I couldn't find any existing kestrel clients for PHP, so I started looking at the two memcache extensions: the older memcache, and Andrei Zmievski's memcached, the latter of which is based on the libmemcached library. I started with memcache, and while it worked fine initially, I quickly found that I could not modify its timeouts. This interfered with the way kestrel recommends you poll it for new jobs: whenever I set the poll timeout to 1 second or higher (1 second being the memcache default), I would see timeout errors from the memcache extension. The memcached extension does not have these issues, so I went with it.
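To make the timeout interaction concrete, here is a minimal sketch of a blocking poll with the memcached extension. Kestrel lets you append `/t=<milliseconds>` to the queue name in a GET to wait for a job. The function names are hypothetical, and raising `Memcached::OPT_POLL_TIMEOUT` above the kestrel wait is my assumption about which client-side limit matters, so treat it as a starting point rather than a verified configuration.

```php
<?php
// Hypothetical helper: build the key for a kestrel blocking GET.
// "jobs/t=1000" asks kestrel to wait up to 1000 ms for a job to arrive.
function kestrelPollKey(string $queue, int $waitMs): string
{
    return $waitMs > 0 ? sprintf('%s/t=%d', $queue, $waitMs) : $queue;
}

// Hypothetical helper: poll a queue once and return the raw payload, or null.
function pollOnce(Memcached $client, string $queue, int $waitMs)
{
    // Assumption: the client must be willing to wait longer than kestrel
    // does, otherwise the extension gives up before kestrel answers.
    $client->setOption(Memcached::OPT_POLL_TIMEOUT, $waitMs + 500);

    $payload = $client->get(kestrelPollKey($queue, $waitMs));

    return $client->getResultCode() === Memcached::RES_SUCCESS ? $payload : null;
}
```

The extra 500 ms of headroom is arbitrary; the point is only that the client timeout should exceed the wait requested from kestrel.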
The first gotcha I ran into was serialization. You can use memcached's serializer for writing to kestrel, but when it reads the data back, it doesn't recognize that it is serialized. So I just serialize the data manually in my client, and things work fine. One other thing to note is that you'll want to disable compression, or do it manually, as the memcached extension will automatically compress anything over 100 bytes by default, and will not decompress it when reading from kestrel.
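A rough sketch of that workaround, assuming jobs are plain arrays: serialize to a string before writing, unserialize after reading, and turn off the extension's compression. The function names are made up for illustration, and 22133 is kestrel's default memcache port.

```php
<?php
// Serialize manually so the payload is stored as a plain string.
function encodeJob(array $job): string
{
    return serialize($job);
}

// Decode a payload read back from kestrel; returns null if it is not
// a serialized array (for example, when the queue was empty).
function decodeJob($payload): ?array
{
    if (!is_string($payload)) {
        return null;
    }
    // allowed_classes => false avoids instantiating objects from queue data.
    $job = @unserialize($payload, ['allowed_classes' => false]);

    return is_array($job) ? $job : null;
}

// Create a connection with compression disabled, since compressed
// payloads are not decompressed when they come back from kestrel.
function newKestrelConnection(string $host = '127.0.0.1', int $port = 22133): Memcached
{
    $client = new Memcached();
    $client->setOption(Memcached::OPT_COMPRESSION, false);
    $client->addServer($host, $port);

    return $client;
}
```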
The other issue is that if you want to use any custom kestrel commands, you can't. Since the application layer doesn't need anything fancy, the memcached extension will work fine for it. Once we need support for the upcoming monitor (batching) in kestrel 2, we may need to implement a kestrel client from scratch. Kestrel web supplies everything else we need right now.
Once the decision was made to use memcached, I wrote a light decorator for it, EC_KestrelClient. This handles instantiation of the memcached client, serialization, and helpers for some kestrel specific options to the GET command. It also has support for passing memcached specific options through it. The class ended up looking like this:
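As an illustration of the shape of such a decorator (a simplified sketch under my own assumptions, not the original EC_KestrelClient listing), something along these lines covers instantiation, manual serialization, option pass-through, and the kestrel GET modifiers (`t=<ms>`, `open`, `close`, `abort`, `peek`). The class and method names are hypothetical.

```php
<?php
class KestrelClientSketch
{
    private $host;
    private $port;
    private $options;
    private $memcached = null;

    // $options is passed straight through to Memcached::setOption().
    public function __construct(string $host = '127.0.0.1', int $port = 22133, array $options = [])
    {
        $this->host = $host;
        $this->port = $port;
        $this->options = $options;
    }

    // Create the underlying Memcached client on first use.
    private function connection()
    {
        if ($this->memcached === null) {
            $client = new Memcached();
            $client->setOption(Memcached::OPT_COMPRESSION, false);
            foreach ($this->options as $option => $value) {
                $client->setOption($option, $value);
            }
            $client->addServer($this->host, $this->port);
            $this->memcached = $client;
        }
        return $this->memcached;
    }

    // Turn ['t' => 1000, 'open' => true] into "queue/t=1000/open".
    public static function buildGetKey(string $queue, array $getOptions = []): string
    {
        $parts = [$queue];
        foreach ($getOptions as $name => $value) {
            if ($value === true) {
                $parts[] = $name;
            } elseif ($value !== false && $value !== null) {
                $parts[] = $name . '=' . $value;
            }
        }
        return implode('/', $parts);
    }

    // Add an item to a queue, serializing it manually.
    public function set(string $queue, $data, int $expiration = 0): bool
    {
        return $this->connection()->set($queue, serialize($data), $expiration);
    }

    // Fetch the next item from a queue, or null if nothing came back.
    public function get(string $queue, array $getOptions = [])
    {
        $raw = $this->connection()->get(self::buildGetKey($queue, $getOptions));
        return is_string($raw) ? unserialize($raw) : null;
    }
}
```

Keeping the key building in a static method makes the kestrel-specific part easy to test without a running server.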
The producer is very simple. It just formats the data into a standard structure, including current tenant information, namespaces the queue so it doesn't collide with other projects, and adds it to the queue. The producer looks like this:
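A minimal sketch of those three steps, with the caveat that the envelope field names, the underscore namespace separator, and the function names are all assumptions made for illustration rather than our actual format:

```php
<?php
// Prefix the queue name so it does not collide with other projects.
function namespacedQueue(string $namespace, string $queue): string
{
    return $namespace . '_' . $queue;
}

// Wrap the job in a standard structure that carries tenant information.
function buildJobEnvelope(string $jobName, array $params, string $tenantId): array
{
    return [
        'job'       => $jobName,
        'params'    => $params,
        'tenant'    => $tenantId,
        'queued_at' => time(),
    ];
}

// Add the envelope to the namespaced queue (kestrel treats a memcache
// SET as "append to queue"). Serialization is done manually.
function enqueueJob(Memcached $kestrel, string $namespace, string $queue, array $envelope): bool
{
    return $kestrel->set(namespacedQueue($namespace, $queue), serialize($envelope));
}
```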
The consumer has a bit more to it, though it is still pretty straightforward. It's intended to be run from a monitoring tool like daemontools or supervisord, so there is a very small CLI harness that just passes the CLI arguments into EC_Consumer and runs it. After parsing the CLI arguments, EC_Consumer polls kestrel for new jobs, and runs them through our standard batch job infrastructure. Until we have more confidence in PHP's long running process ability, I added an optional maximum jobs argument, which makes the consumer terminate after processing X jobs. The monitoring service (supervisord) will then just restart it in a matter of seconds. I also added an optional debug argument for testing, so you can see every action as it happens. The CLI harness looks like this:
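The argument handling in a harness like this can be sketched in a few lines. The flag names (`--max-jobs=N`, `--debug`) and the positional queue name are hypothetical choices for this example, not the options the real harness accepts.

```php
<?php
// Parse CLI arguments of the form:
//   consumer.php <queue> [--max-jobs=N] [--debug]
function parseConsumerArgs(array $argv): array
{
    $args = ['queue' => null, 'max_jobs' => null, 'debug' => false];

    // Skip $argv[0], which is the script name.
    foreach (array_slice($argv, 1) as $arg) {
        if ($arg === '--debug') {
            $args['debug'] = true;
        } elseif (strpos($arg, '--max-jobs=') === 0) {
            $args['max_jobs'] = (int) substr($arg, strlen('--max-jobs='));
        } elseif ($args['queue'] === null && strpos($arg, '--') !== 0) {
            // First non-flag argument is taken as the queue name.
            $args['queue'] = $arg;
        }
    }

    return $args;
}
```

The harness itself would then call `parseConsumerArgs($argv)`, exit with a usage message if no queue was given, and hand the result to the consumer.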
And the main consumer class, EC_Consumer, looks something like this:
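The core of a consumer like this is a poll loop with an optional job limit. Here is a stripped-down sketch of that loop, not the EC_Consumer class itself. It assumes the kestrel fetch and the batch job runner are passed in as callables (both hypothetical stand-ins), which keeps the loop independent of the queue server.

```php
<?php
// Poll for jobs until $maxJobs have been processed (or forever if null).
// $fetchJob should block for up to the poll timeout and return a job or null.
// $runJob hands the job to whatever executes it.
function runConsumer(callable $fetchJob, callable $runJob, ?int $maxJobs = null, bool $debug = false): int
{
    $processed = 0;

    while ($maxJobs === null || $processed < $maxJobs) {
        $job = $fetchJob();

        if ($job === null) {
            // Nothing arrived within the poll timeout; ask again.
            if ($debug) {
                echo "No job available, polling again\n";
            }
            continue;
        }

        try {
            $runJob($job);
            if ($debug) {
                echo "Job finished\n";
            }
        } catch (\Throwable $e) {
            // A failed job still counts toward the limit in this sketch.
            fwrite(STDERR, 'Job failed: ' . $e->getMessage() . "\n");
        }

        $processed++;
    }

    // Returning lets the process exit so the supervisor can restart it.
    return $processed;
}
```

Whether a failed job should count toward the limit, be retried, or be re-queued is a design decision this sketch does not try to settle.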
Putting it together
Now that all the pieces are put together, let's take a look at it in action. Adding the example job "HelloWorld" to the queue "hello_world" from within our application looks something like this:
And finally, here's an example of running the consumer from the CLI harness, along with some example debug output of processing the job:
That's it! I'd be interested to hear how other folks are interfacing with kestrel from PHP.