Obviously cron jobs are abundantly useful for so many things, all the way from basic housekeeping up to big application functionality.
They’re also the source of plenty of flail. What do I mean?
- They are neither code nor data, so often get overlooked, or shonkily installed, by application deployment tools
- They run with a minimal environment that can catch out the unwary: scripts that work in an interactive shell sometimes don’t from cron
- The default behaviour of mailing output to the cronjob owner generates large amounts of mail that gets ignored, filtered or bounced
- Jobs can fail silently and no-one notices until, say, you need to restore that backup that hasn’t run for the last six months
- Jobs that helpfully append their output to a log commonly don’t rotate that log
- It’s easy to have jobs overlapping if they get stuck or take longer than expected to complete. This is a splendid way of wedging a machine.
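The minimal-environment point is easy to reproduce before a job ever reaches a crontab. Here is a rough sketch that runs a command with only a handful of variables set. The exact values cron supplies vary between implementations and systems, so `CRON_LIKE_ENV` is an assumed approximation and `run_like_cron` is a name made up for this illustration.

```python
import subprocess

# Assumed approximation of what cron provides; real values differ between
# cron implementations, so compare against a cron job that simply runs `env`.
CRON_LIKE_ENV = {
    "HOME": "/tmp",
    "LOGNAME": "nobody",
    "PATH": "/usr/bin:/bin",
    "SHELL": "/bin/sh",
}


def run_like_cron(command):
    """Run a shell command via /bin/sh with only the cron-like variables set."""
    return subprocess.run(
        ["/bin/sh", "-c", command],
        env=CRON_LIKE_ENV,  # replaces the caller's environment entirely
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Show what the job would actually see, including the shortened PATH.
    print(run_like_cron("env").stdout, end="")
```

A script that relies on something exported from a login profile, or on a binary outside the short `PATH`, will usually fail here in the same way it would fail from cron.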
The mail aspect is a particular peeve. In some jobs I’ve held, my mailbox has enjoyed several thousand cron-generated mails a day, and there’s no way I can accurately look at each one and react to it. Mostly they contain expected output from successful job execution, so they’re easy to skip. But I don’t trust my eyes to get that right all the time.
One approach to this is to arrange for jobs to only send mail on error. This is an improvement, but can lead you into thinking that a job is happily succeeding when in fact it’s either not running or the only-on-error logic is bust. Since cron jobs often cover essential system tasks like backing up, syncing data around and reporting, it’s vital that they don’t fail silently.
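One way to catch a job that has quietly stopped running is to have it leave evidence of each success and to check, from somewhere else, that the evidence is fresh. The sketch below uses a stamp file's modification time. The function names and the stamp-file convention are assumptions for illustration, not taken from any particular tool.

```python
import os
import time
from typing import Optional


def record_success(stamp_path: str) -> None:
    """Create the stamp file if needed and set its mtime to now."""
    with open(stamp_path, "a"):
        pass
    os.utime(stamp_path, None)


def is_stale(stamp_path: str, max_age_seconds: float,
             now: Optional[float] = None) -> bool:
    """True if the job has never succeeded, or last succeeded too long ago."""
    if now is None:
        now = time.time()
    try:
        last_success = os.path.getmtime(stamp_path)
    except FileNotFoundError:
        return True  # no recorded success at all counts as stale
    return (now - last_success) > max_age_seconds
```

The job calls `record_success` only when it exits cleanly. A monitoring check calls `is_stale` with a threshold somewhat longer than the job's schedule, so a job that never runs gets flagged just like one that fails.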
I’ve worked somewhere that tackled this by collating cron-generated mails from diverse systems into a system mailbox and pattern matching them for failure signs. This seems slightly dubious — it’s fragile and labour intensive — but at least the system also flagged if expected jobs failed to arrive and got our inboxes tamed.
To tackle these problems I find myself writing wrappers for cronjobs. I’ve written several variants to meet the needs of different situations. Unhelpfully I call them all cronwrap. These wrappers set out to:
- Engage the amazingly useful lockrun utility to guard against multiple execution of stuck crons
- Place cron output into timestamped logs that can be both aged out and made available to interested parties
- Hook into local monitoring systems:
  - On execution, update a run counter (SNMP data or some simple text file)
  - On failure, send a SNMP trap or leave some bait for Nagios. Also, update a fail counter
  - If lockrun has prevented a job running owing to overlap, send a SNMP trap or similarly bait Nagios
- If required, send output by mail somewhere (sometimes this is necessary, even with the concerns listed above)
So, nothing surprising there. Using such wrappers helps keep cron jobs tamed and reliable, and it’s monitoring them near to where the action occurs, rather than mediating via SMTP.
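As a rough illustration of the behaviours listed above, here is a minimal sketch of such a wrapper. It is not one of my actual cronwrap variants. It swaps lockrun for an in-process `fcntl.flock` lock, keeps counters in plain text files, and leaves the SNMP and Nagios hooks as comments. The file naming scheme and the `LOCKED_OUT` status are assumptions made up for the example.

```python
import fcntl
import os
import subprocess
import time

LOCKED_OUT = 75  # arbitrary status chosen here to mean "previous run still active"


def bump_counter(path):
    """Increment an integer kept in a small text file, creating it if absent."""
    try:
        with open(path) as f:
            count = int(f.read().strip() or 0)
    except FileNotFoundError:
        count = 0
    with open(path, "w") as f:
        f.write(f"{count + 1}\n")


def cronwrap(name, command, state_dir):
    """Run command (an argv list) under a lock, logging output and counting outcomes."""
    os.makedirs(state_dir, exist_ok=True)

    def counter(kind):
        return os.path.join(state_dir, f"{name}.{kind}_count")

    lock_file = open(os.path.join(state_dir, f"{name}.lock"), "w")
    try:
        # Non-blocking exclusive lock: fails at once if an earlier run holds it.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        bump_counter(counter("overlap"))  # a real wrapper might also alert here
        lock_file.close()
        return LOCKED_OUT

    try:
        bump_counter(counter("run"))
        # Second-resolution timestamp; runs within the same second share a log.
        stamp = time.strftime("%Y%m%dT%H%M%S")
        log_path = os.path.join(state_dir, f"{name}.{stamp}.log")
        with open(log_path, "w") as log:
            status = subprocess.run(
                command, stdout=log, stderr=subprocess.STDOUT
            ).returncode
        if status != 0:
            bump_counter(counter("fail"))  # hook for a trap or monitoring flag
        return status
    finally:
        lock_file.close()  # closing the file releases the flock
```

A crontab entry would then call a small script that passes the job name, the real command and a state directory to `cronwrap`, and exits with the returned status. Ageing out old logs is left to whatever log cleanup the host already has.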
This is hardly invention either: there’s plenty of prior art with different nuances in behaviour to meet the needs of different environments. Perhaps I’ll merge the variants of my efforts and publish too.
What’s curious is that this functionality isn’t available inside the cron daemon itself. (To be clear, I’m talking about the BSD cron written by Paul Vixie. None of the variants I’ve seen address these concerns either. I’d love to know if there are any I’ve missed.) The daemon is perfectly placed to catch exit status, divert output and know if a job has overrun, and building this in would remove the need for all this additional monkeying to make jobs reliable and well behaved. If my C wasn’t just read-only I’d have a crack at it!
There, I’ve finally condensed all my cron rant into one sustained piece.
Update: I posted a cron wrapper at https://github.com/zomo/cronwrap.
6 Responses to “cron”
February 25th, 2010 at 4:41 am
I realize that no one uses it outside of Mac OS X, but I like launchd for a lot of the reasons you don’t like cron.
February 25th, 2010 at 5:03 am
Weird, I was just on the same vibe earlier. I’m working up a blog post on a sort of cron dashboard. It won’t be anything close to production quality, but hopefully it will be usable.
Can I pick your brain for an example cron wrapper script?
February 25th, 2010 at 9:53 am
Sure, one variant has a diddy web interface that a generous onlooker might call a dashboard at a stretch! It was written on someone else’s dime though, so I’ll check if they’re cool for me to publish.
February 25th, 2010 at 9:23 pm
As a suggestion for a follow-up post, could you post some example cronwraps you’ve written? I would find those very useful and informative. I currently have problems with one job occasionally overrunning past its next invocation, so I’m going to give lockrun a go, but I’d also like to see how you handle logging and output direction.
February 26th, 2010 at 7:25 pm
Cron follows UNIX’s philosophy for programs: do one thing and do it well. Cron simply schedules jobs/scripts/programs. Whatever runs or doesn’t run is your problem. (.. and it does seem to be a problem to you.) I far prefer being able to do my own monitoring and management of failed jobs than be constrained to one method built into cron.
Your problems all seem to fit into a scripting category. So maybe it’s that you should be looking at.
February 26th, 2010 at 10:54 pm
Absolutely. Using such wrappers is my monitoring and management solution to failing and misconstructed cron jobs. I don’t get to vet every cron that is installed on platforms I admin, but I do get to encourage the use of these wrappers to sidestep common pain points.