Bill Shupp Software engineer, photographer, musician, space geek


Organizing PHP Batch Jobs

This week at work I got the chance to address the growing number of batch-oriented CLI scripts for our main web application.  While they weren't quite unmanageable yet, they were heading in that direction.  There was too much duplicated code, especially for bootstrapping the application and parsing options.  Also, the location of the scripts didn't really make sense: ./bin/bar.php, ./cron/foo.php, etc.  So I decided to carve out some time and clean it up.

The goals were pretty straightforward:

  • Everything must use the application's model layer.  This is mostly so that the built-in caching will be consistent, but also to enforce that all data access goes through the same code.
  • Centralize all CLI option parsing, application bootstrapping, error handling, and multi-tenant logic (this is a multi-tenant SaaS application).
  • Keep the jobs themselves very simple.

With the above in mind, I ended up splitting things up into three parts:

  • ./bin/ecli.php, the only PHP CLI script, which does minimal bootstrapping, collects the options/arguments, and runs EC_CLI.
  • EC_CLI, a class that does the actual option parsing (via Zend_Console_Getopt), application bootstrapping, and error handling.  It also handles multi-tenant runs (by recursively calling ./bin/ecli.php for each tenant) and runs the actual job.
  • EC_Job_Abstract, the job interface that collects job-specific arguments and runs the job.
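
To make the third piece concrete, here is a minimal sketch of what a job base class and one job could look like.  Only the names EC_Job_Abstract, EC_Job_Foo and run() come from this post; the constructor, the "limit" argument and the ec_create_job() helper are assumptions made up for illustration, not the real code.

```php
<?php
// Hypothetical sketch: only EC_Job_Abstract, EC_Job_Foo and run() are names
// from the post; everything else is assumed for illustration.

abstract class EC_Job_Abstract
{
    /** @var array Job-specific arguments passed through from the CLI. */
    protected $args;

    public function __construct(array $args = array())
    {
        $this->args = $args;
    }

    /** Each concrete job puts its work here. */
    abstract public function run();
}

class EC_Job_Foo extends EC_Job_Abstract
{
    public function run()
    {
        // A real job would go through the application's model layer here.
        $limit = isset($this->args['limit']) ? (int) $this->args['limit'] : 10;
        return "Foo processed up to {$limit} records";
    }
}

/**
 * Map the value of --job (e.g. "Foo") to a job instance.
 * Hypothetical helper, not part of the original code.
 */
function ec_create_job($name, array $args = array())
{
    // Accept only simple names so arbitrary classes cannot be requested.
    if (!preg_match('/^[A-Za-z][A-Za-z0-9]*$/', $name)) {
        throw new InvalidArgumentException("Invalid job name: {$name}");
    }
    $class = 'EC_Job_' . $name;
    if (!class_exists($class) || !is_subclass_of($class, 'EC_Job_Abstract')) {
        throw new InvalidArgumentException("Unknown job: {$name}");
    }
    return new $class($args);
}

echo ec_create_job('Foo', array('limit' => 5))->run(), PHP_EOL;
```

Because each job only has to implement run(), it can be instantiated and exercised in a unit test without going through the CLI at all.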

Now, when I need to create a new batch job EC_Job_Foo, I just put the contents in EC_Job_Foo::run(), and execute it like so:

./bin/ecli.php --environment production --instance clienta,clientb --job Foo
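
As a rough illustration of the multi-tenant part, the sketch below turns a command line like the one above into one single-tenant command per instance.  It is not the real EC_CLI: the post uses Zend_Console_Getopt for parsing, and a tiny hand-rolled parser stands in here so the example has no dependencies.  The function names ec_parse_options() and ec_tenant_commands() are hypothetical.

```php
<?php
// Hypothetical sketch; a stand-in for the Zend_Console_Getopt based parsing
// and the per-tenant recursion described in the post.

/**
 * Parse "--name value" pairs from an argv-style array (script name first).
 */
function ec_parse_options(array $argv)
{
    $options = array();
    $count = count($argv);
    for ($i = 1; $i < $count; $i++) {
        if (strpos($argv[$i], '--') !== 0) {
            throw new InvalidArgumentException("Unexpected argument: {$argv[$i]}");
        }
        $name = substr($argv[$i], 2);
        if ($name === '' || !isset($argv[$i + 1])) {
            throw new InvalidArgumentException("Missing value for: {$argv[$i]}");
        }
        $options[$name] = $argv[++$i]; // consume the value as well
    }
    return $options;
}

/**
 * Build one single-tenant command line per entry in --instance.
 */
function ec_tenant_commands($script, array $options)
{
    if (!isset($options['instance'])) {
        throw new InvalidArgumentException('Missing --instance');
    }
    $instances = array_filter(array_map('trim', explode(',', $options['instance'])), 'strlen');
    $commands = array();
    foreach ($instances as $instance) {
        $parts = array(escapeshellarg($script));
        // Same options as the parent run, but narrowed to a single tenant.
        foreach (array_merge($options, array('instance' => $instance)) as $name => $value) {
            $parts[] = '--' . $name;
            $parts[] = escapeshellarg($value);
        }
        $commands[] = implode(' ', $parts);
    }
    return $commands;
}

$example = array('./bin/ecli.php', '--environment', 'production',
    '--instance', 'clienta,clientb', '--job', 'Foo');
foreach (ec_tenant_commands($example[0], ec_parse_options($example)) as $command) {
    echo $command, PHP_EOL;
}
```

Each resulting command could then be handed to something like passthru() or proc_open(), so every tenant runs in its own PHP process.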

The ecli.php script ends up looking like this:

Pretty easy, huh?  It makes testing all the components really easy as well.

Finally, we made the decision to use Jenkins instead of the traditional cron scheduler for recurring jobs.  This has a few advantages: a simple web interface for managing jobs, integration with your existing Jenkins notifications (email, Jabber, etc.), console tailing, and job dependencies.  It works really well.

Filed under: Code, PHP
Comments (2)
  1. Nice post Bill.

    Any issues with memory leaks by running everything through one process? I suppose it would be easy to add garbage collection to the ecli.

  2. We have not had any memory problems. But note that when running a job on multiple tenants of our application, ecli is called recursively, once for each tenant, so each run is its own PHP process. Further, you can pass additional arguments to the job itself, which could break its work into individual processes as well, or even run them in parallel if you want.
