How to Monitor Cron Jobs (Without Learning the Hard Way)
Every team has a story: the backup job “ran” for six months without backing anything up, the report cron died when disk filled, or a deploy removed the crontab line. Cron does not call you when it breaks. This guide walks through practical ways to monitor cron jobs and why heartbeat pings became the default pattern for serious teams.
Why cron fails silently
Traditional cron emails stdout/stderr to MAILTO, if anyone still reads that mailbox. Many images ship with mail disabled, jobs redirect output to /dev/null, and some failures happen before your script prints anything. systemd timers and cloud schedulers have the same class of problem: the platform reports a successful launch even when the business logic failed.
So the first lesson: launch success ≠ job success. Monitoring has to observe outcomes, not just process start.
Approach 1: Log files and grep
Write structured logs and scan them with an agent or a log platform. This works when you already run centralized logging. Downsides: you must define what “healthy” looks like in log lines, handle clock skew, and pay for retention. It is powerful but heavy for a single nightly script.
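As a rough illustration of what "define healthy in log lines" can mean, the sketch below looks for the newest success line for a job and checks its age. The log format (`<ISO timestamp> job=<name> status=<ok|error>`) and the function names are assumptions made for this example, not a format any particular log platform requires.

```python
from datetime import datetime, timedelta, timezone

def last_success(lines, job):
    """Return the timestamp of the newest 'status=ok' line for job, or None."""
    newest = None
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip lines that do not match the assumed format
        if parts[1] != f"job={job}" or parts[2] != "status=ok":
            continue
        ts = datetime.fromisoformat(parts[0])
        if newest is None or ts > newest:
            newest = ts
    return newest

def is_healthy(lines, job, now, max_age):
    """Healthy means: a success line exists and is no older than max_age."""
    ts = last_success(lines, job)
    return ts is not None and now - ts <= max_age
```

Using timezone-aware timestamps on both sides keeps the age comparison honest when hosts log in different zones; it does not fix clocks that are simply wrong.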
Approach 2: Exit codes and mail
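Remove redirects so cron can mail you. Cron typically mails whatever the job prints rather than reacting to the exit code, so the script has to print something when it fails. Fine for one server and a human who reads mail. Breaks when mail is misconfigured, spam-filtered, or when multiple jobs interleave in one inbox.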
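One way to make mail-based alerting less noisy is a small wrapper that stays silent on success and prints only when the command exits non-zero. This is a minimal sketch; the function name `run_and_report` is made up for this example.

```python
import subprocess
import sys

def run_and_report(cmd):
    """Run cmd; print nothing on success, print details on failure.

    Cron mails whatever a job prints, so silence means no mail.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Anything written here ends up in the message sent to MAILTO.
        print(f"FAILED ({result.returncode}): {' '.join(cmd)}", file=sys.stderr)
        print(result.stderr, file=sys.stderr, end="")
    return result.returncode
```

A crontab entry would call a script that passes the real command to `run_and_report` and exits with its return value.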
Approach 3: Heartbeat / dead man's switch
At the end of a successful run, your job calls an HTTPS URL. An external service records the ping and compares it to the schedule. If the ping is missing past a grace period, you get paged. The pattern is platform-neutral: bash, Python, a Kubernetes CronJob, Windows Task Scheduler; anything with outbound HTTPS works.
CronCraft is one implementation of that pattern; the conceptual write-up is in dead man's switch pattern explained.
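The heartbeat pattern can be sketched in a few lines of Python. The URL below is a placeholder, not a real CronCraft address, and `send_ping` and `run_with_heartbeat` are hypothetical helper names; the only assumption about the service is that a plain GET to the ping URL counts as a heartbeat.

```python
import urllib.request

# Placeholder URL: substitute the ping URL your monitoring service issues.
PING_URL = "https://monitor.example/ping/your-token"

def send_ping(url, timeout=10, opener=urllib.request.urlopen):
    """Send one GET to the heartbeat URL; return True on HTTP 2xx."""
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError and socket timeouts are OSError subclasses. A monitoring
        # outage should not turn a successful job into a failed one.
        return False

def run_with_heartbeat(job, url=PING_URL, ping=send_ping):
    """Run job(); ping only if it returns without raising."""
    job()
    return ping(url)
```

Because the ping happens after `job()` returns, a crash or an unhandled exception means no ping, and the missing heartbeat is what triggers the alert.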
Choosing thresholds
Match expected frequency to reality. A job that “should finish daily by 4am” needs a window that tolerates slow runs — that is what grace periods are for. Too tight and you get alert fatigue; too loose and real outages hide inside the buffer.
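To make the trade-off concrete, here is a simplified model of how a monitor might decide a job is late: the last ping plus the expected period plus the grace period gives a deadline. This is an assumption for illustration, not a description of how CronCraft evaluates schedules internally.

```python
from datetime import datetime, timedelta

def is_overdue(last_ping, period, grace, now):
    """True once now is past last_ping + period + grace."""
    deadline = last_ping + period + grace
    return now > deadline

# A daily job that last pinged at 04:00, with a one-hour grace period.
last = datetime(2024, 5, 1, 4, 0)
daily = timedelta(days=1)
grace = timedelta(hours=1)

slow_run = is_overdue(last, daily, grace, datetime(2024, 5, 2, 4, 30))  # inside the window
missed = is_overdue(last, daily, grace, datetime(2024, 5, 2, 5, 1))     # past the deadline
```

With these numbers a run that finishes 30 minutes late stays quiet, while a run that has not reported by 05:01 the next day raises an alert.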
Explicit failure signals
Sometimes your script knows it failed before it exits: an exception handler fires, a write is only partial, or validation fails. Hitting a /fail endpoint (CronCraft supports this) records the failure immediately instead of waiting for silence. For the narrative version, see preventing silent cron failures.
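A sketch of an explicit failure signal follows. It assumes the failure endpoint is the ping URL with `/fail` appended; check your service's documentation for the exact shape. `http_get` and `run_and_signal` are hypothetical names.

```python
import urllib.request

def http_get(url, timeout=10):
    """Best-effort GET; True on HTTP 2xx, False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        return False

def run_and_signal(job, base_url, ping=http_get):
    """Ping base_url on success, base_url + '/fail' on an exception."""
    try:
        job()
    except Exception:
        # Report the failure right away, then re-raise so the exit code
        # and traceback still reflect the real error.
        ping(base_url + "/fail")
        raise
    ping(base_url)
```

Re-raising after the failure ping keeps the process exit code non-zero, so anything else that watches exit status still sees the failure.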
Dependencies between jobs
When job B must not run unless job A succeeded, ad-hoc file locks get brittle. CronCraft models chains so downstream work does not pretend it is healthy when upstream stopped pinging — see cron job dependency management.
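The idea of a chain can be illustrated with a small status calculation: a job counts as blocked if anything upstream of it, directly or transitively, is not healthy. The status strings and the `depends_on` mapping are invented for this sketch and do not reflect CronCraft's data model.

```python
def effective_status(job, status, depends_on, _path=()):
    """Return the job's own status, or 'blocked' if any upstream job is not 'ok'.

    status:     dict of job name -> 'ok', 'late', 'failed', ...
    depends_on: dict of job name -> list of upstream job names
    """
    if job in _path:
        raise ValueError(f"dependency cycle at {job}")
    for parent in depends_on.get(job, ()):
        if effective_status(parent, status, depends_on, _path + (job,)) != "ok":
            return "blocked"
    return status[job]

# Hypothetical three-step pipeline: extract -> transform -> load.
depends_on = {"transform": ["extract"], "load": ["transform"]}
status = {"extract": "late", "transform": "ok", "load": "ok"}
load_status = effective_status("load", status, depends_on)
```

Here `load` last pinged successfully, but it is reported as blocked because `extract` is late, which is the behaviour ad-hoc lock files tend to get wrong.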
Operational checklist
- One ping URL per logical job, stored next to the script in your runbook.
- Schedules in CronCraft match the crontab or platform scheduler.
- Test-fire a run after each deploy; confirm the dashboard moves.
- On-call rotation receives alerts through a channel people actually read (email on the free tier; Slack on Pro).
Security and token handling
Treat ping URLs like passwords. Anyone who can guess or intercept the token could spam fake successes or failures. Store tokens in your secret manager, pass them via environment variables in production, and rotate after incidents. CronCraft’s dashboard lets you regenerate URLs if a token leaks.
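A minimal sketch of that handling: read the ping URL from an environment variable and redact the token before it reaches any log. The variable name `CRON_PING_URL` is made up for this example, and `redact` assumes the token is the last path segment of the URL.

```python
import os

def ping_url_from_env(var="CRON_PING_URL", environ=os.environ):
    """Read the ping URL from the environment; refuse anything but https."""
    url = environ.get(var, "").strip()
    if not url.startswith("https://"):
        raise RuntimeError(f"{var} must be set to an https:// URL")
    return url

def redact(url):
    """Hide the last path segment (assumed to be the token) for logging."""
    head, _, _token = url.rpartition("/")
    return f"{head}/***"
```

Failing fast when the variable is missing avoids a job that runs for months without ever pinging because its URL was never configured.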
CI and one-off jobs
Scheduled monitoring also applies to pipelines that “should run daily” but are implemented as CI workflows. If the workflow does not execute, no heartbeat fires — same signal as cron silence. You can curl from the final step of a GitHub Action, GitLab job, or Jenkins stage.
Multi-environment hygiene
Give production, staging, and dev separate CronCraft jobs. Mixing schedules across environments creates alert noise (“staging did not ping production’s window”). Name jobs so on-call knows which fleet broke.
When not to use heartbeats alone
Heartbeats prove liveness on a schedule, not correctness of output. If the job runs but writes wrong data, you still need assertions, reconciliation checks, or downstream validation. Combine external pings with internal invariants.
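An internal invariant can be as small as checking that the output exists and is not suspiciously small before the success ping is sent. The function name and the size threshold below are illustrative assumptions; real checks depend on what the job produces.

```python
import os

def validate_output(path, min_bytes=1):
    """Return a list of problems; an empty list means the output looks sane."""
    if not os.path.exists(path):
        return [f"{path} is missing"]
    problems = []
    size = os.path.getsize(path)
    if size < min_bytes:
        problems.append(f"{path} is {size} bytes, expected at least {min_bytes}")
    return problems
```

A job would send its success ping only when the list is empty, and otherwise report a failure with the collected messages.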
Where CronCraft fits
CronCraft focuses on heartbeat monitoring, history, and alerts — hosted or self-hosted. It is not a log aggregator or APM. If you want the smallest change to an existing bash job, add one curl line and register the schedule; see bash examples.
Free tier: 10 jobs, email alerts.