Docker is very popular these days and used almost everywhere in deployments. The idea is pretty straightforward: build, deploy, observe it running. This works well for long-running services, but what if you want to build a recurring task?

Note: If you are only interested in the technical solution for including system variables in your cron tasks, skip to the last section.

I know that most people do not rely on system tools alone. Most modern programming languages are deployed on some sort of middleware platform, and most of these platforms support recurring tasks (similar to what cron does under Linux). While this is fine in most situations, I would not start an Apache server (in any flavour) just to perform a DB backup or move some files from one environment to another.

Cron

My preferred solution, for a long time, has been cron. It can be a bit confusing and clunky at first, but as long as you follow a few very simple rules, you'll probably be fine. If you don't use cron on a daily basis, read the manual before writing any rules. I won't talk about the order of the fields or the fact that they are whitespace-separated.

Lesson 1: The first thing to remember is the difference between a number and * in an execution pattern: a number means the task runs only at that specific value, while * means it runs at every possible value:

  # run five minutes past every hour -> once an hour
  5 * * * * task

If you want to execute a task every 5 minutes, you can use the / step notation instead:

  # run every five minutes -> twelve times per hour
  */5 * * * * task

Lesson 2: Use placeholders when you can. Instead of spelling out the execution pattern, cron provides a set of shortcuts for very commonly used schedules (@reboot is my favourite option):

@reboot    :    Run once after reboot.
@yearly    :    Run once a year, i.e.  "0 0 1 1 *".
@annually  :    Run once a year, i.e.  "0 0 1 1 *".
@monthly   :    Run once a month, i.e. "0 0 1 * *".
@weekly    :    Run once a week, i.e.  "0 0 * * 0".
@daily     :    Run once a day, i.e.   "0 0 * * *".
@hourly    :    Run once an hour, i.e. "0 * * * *".

Note: @reboot is executed when the cron daemon starts, not when the system reboots.
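With a placeholder, a daily task entry shortens nicely; both entries below are equivalent (the script path is just an example):

```shell
# long form: run once a day at midnight
0 0 * * * /code/backup.sh
# same schedule with a placeholder
@daily /code/backup.sh
```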

Lesson 3: Always (and I really mean it) redirect the console output of your task. By default the cron daemon sends you the task output via e-mail (using a local mail daemon). While this may be useful to some people, tracking execution logs via e-mail is painful. To make sure that everything ends up in the log file (errors as well as standard output), use the error redirect too:

  task >> log.log 2>&1
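The order of the redirects matters: 2>&1 must come after the file redirect, otherwise stderr still goes to the old stdout. A quick way to convince yourself both streams land in the file (the log path is just an example):

```shell
# both streams end up in the same file thanks to the trailing 2>&1
rm -f /tmp/task.log
{ echo "stdout line"; echo "stderr line" >&2; } >> /tmp/task.log 2>&1
cat /tmp/task.log
# prints "stdout line" then "stderr line"
```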

Lesson 4: This is where the Docker problem comes in. Do not rely on system variables in your tasks and task setup. Cron runs jobs with a minimal environment, so use full paths and avoid environment variables wherever you can.
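In crontab terms, the lesson looks like this (the paths and script names are my own placeholders):

```shell
# fragile: relies on $PATH and $HOME, which cron may not set as you expect
5 0 * * * backup.sh >> logs/backup.log 2>&1
# robust: absolute paths only
5 0 * * * /code/backup.sh >> /var/log/backup.log 2>&1
```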

Docker + cron

Adding cron to Docker is pretty straightforward: you set up your task in a crontab file and start the cron daemon at the end of your Docker image.

This is the crontab-script file, which is added to the Docker image to set up the crontab.

@reboot root /code/cron-execution.sh >> /logs/cron.log 2>&1
31 5 * * * root /code/cron-execution.sh >> /logs/cron.log 2>&1

This is what the Dockerfile looks like:

FROM ubuntu:16.04
# install cron
RUN apt-get update && \
    apt-get -y upgrade && \
    apt-get install -y cron && \
    apt-get clean
# setup crontab
COPY scripts/crontab-script /etc/cron.d/task-cron
RUN chmod 0644 /etc/cron.d/task-cron

# copy your code
# ...

WORKDIR /code
# run cron in the foreground as the container's main process
CMD ["cron", "-f"]
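A quick way to try the image locally (the image name is my own placeholder; the log path matches the crontab above):

```shell
# build the image and start the container in the background
docker build -t cron-task .
docker run -d --name cron-task cron-task
# follow the task log from the host
docker exec cron-task tail -f /logs/cron.log
```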

Easy … right? If you don’t use system variables and follow the previous rules, you will be able to execute your cron tasks properly.

Adding system variables

By default, the cron daemon does not pass the container's environment variables to the jobs it runs. This is usually not a problem if you configure your images statically. But if you use Docker Compose, with variables bound at runtime, it becomes very annoying. Below is an example of a docker-compose file with runtime variable binding; with the previous version of the setup, these variables (e.g. DATA_SINK) are not visible to the cron-execution.sh script.

Sample docker-compose.yml:

version: '2'
services:
  sample-pipeline:
    environment:
      PIPELINE_VERSION: ${version}
      DATA_SINK: "some postgresql url"
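To see why the variables disappear, you can simulate cron's near-empty environment with env -i (the variable name comes from the compose file above):

```shell
# set the variable in the parent shell...
export DATA_SINK="some postgresql url"
# ...then run a child with a cleared environment, roughly the way cron does
env -i /bin/sh -c 'echo "DATA_SINK=${DATA_SINK:-<unset>}"'
# prints: DATA_SINK=<unset>
```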

My workaround was to replace the Docker entry point with a script that writes all environment variables into an environment file:

FROM ubuntu:16.04
# install cron
RUN apt-get update && \
    apt-get -y upgrade && \
    apt-get install -y cron && \
    apt-get clean
# setup crontab
COPY scripts/crontab-script /etc/cron.d/task-cron
RUN chmod 0644 /etc/cron.d/task-cron

# copy your code
# ...

WORKDIR /code
# custom entry point: dump the environment, then start cron
# (assumes start.sh was copied into /code and is executable)
CMD ["/code/start.sh"]

start.sh:

#!/bin/bash

scriptPath=$(dirname "$(readlink -f "$0")")

# write one "export NAME=value" line per variable; printf %q quotes the
# values so spaces and special characters survive being sourced later
# (multi-line values are not handled by this line-based loop)
while IFS='=' read -r name value; do
    printf 'export %s=%q\n' "$name" "$value"
done < <(printenv) > "${scriptPath}/.env.sh"
chmod +x "${scriptPath}/.env.sh"

# run cron in the foreground so it stays the container's main process
exec cron -f
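Whatever quoting mechanism you use, a generated export line should survive a round trip through a shell with a clean environment. A quick sanity check (the path and value are illustrative):

```shell
# write a quoted export line the way the generated .env.sh would contain it
echo 'export DATA_SINK="some postgresql url"' > /tmp/env-check.sh
# source it in a child shell with a cleared environment
env -i /bin/sh -c '. /tmp/env-check.sh; echo "$DATA_SINK"'
# prints: some postgresql url
```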

cron-execution.sh needs to source the .env script to get access to all the system variables:

#!/bin/bash

scriptPath=$(dirname "$(readlink -f "$0")")
source "${scriptPath}/.env.sh"

# the docker-compose variables should be available here
echo "DATA_SINK = ${DATA_SINK}"

I know this looks rather complex, but it is a very flexible mechanism for managing dynamic Docker containers.