Easy Zero-downtime Docker Compose deployment

For a long time, Hyvor Talk ran directly on VPS servers. All the dependencies (PHP, PHP extensions, Node, etc.) were installed directly on the server. However, this became quite cumbersome. Upgrading was hard, and the application was fragile due to inconsistencies between the environments it ran on. There are solutions like Ansible to automate that, but we decided to use Docker to containerize the application. I’ll write a separate tutorial on how I containerized our Laravel application. Here, I will focus on a specific problem: zero-downtime docker-compose deployments.

After dockerizing the application, I looked for ways to deploy it. The easiest one was docker-compose, which comes with Docker. It ran our application without any issues… until we needed to deploy a new version.

The major problem with docker-compose is that when we update a service, the old container is stopped first, and only then is the new one started. This causes downtime: in our case, 10-20 seconds on every deployment.

Maybe it’s time to use a container orchestration tool like Docker Swarm or Kubernetes? I kept them as a last resort and started looking for a way to make it work with just docker-compose. I also didn’t want to write a 100-line shell script to make this work.

Google and ChatGPT gave very complex solutions, and most of them did not work. I found a better solution on Hacker News (specifically, through the hn.algolia.com search).

It’s docker-rollout, a Docker CLI plugin for zero-downtime docker-compose deployments.

It’s pretty easy to install on your server (check the package README for up-to-date instructions).

    # Create directory for Docker cli plugins
    mkdir -p ~/.docker/cli-plugins

    # Download docker-rollout script to Docker cli plugins directory
    curl https://raw.githubusercontent.com/wowu/docker-rollout/master/docker-rollout -o ~/.docker/cli-plugins/docker-rollout

    # Make the script executable
    chmod +x ~/.docker/cli-plugins/docker-rollout

Now, you can run the docker rollout <service> command to start a new version of the service without downtime.
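Before the first rollout, it can help to confirm that the plugin file is where the Docker CLI looks for it and that it is executable. The following is only a minimal sketch: the function name plugin_installed is made up for this example, and the default path assumes the per-user install location used in the commands above.

```shell
#!/usr/bin/env bash
# Returns 0 if an executable docker-rollout file exists in the given
# plugin directory (defaults to the per-user Docker CLI plugin directory).
# plugin_installed is a hypothetical helper, not part of docker-rollout.
plugin_installed() {
  local dir="${1:-$HOME/.docker/cli-plugins}"
  [ -x "$dir/docker-rollout" ]
}

if plugin_installed; then
  echo "docker-rollout is installed"
else
  echo "docker-rollout is missing or not executable"
fi
```

If the check fails, re-running the chmod +x step is usually the first thing to try.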

To make this work, you need a proxy server like NGINX or Traefik.

I could not get the NGINX proxy to work on the first try, so I tried Traefik, partly out of the urge to learn a new tool.

Before docker-rollout, the application was exposed on port 80, so all HTTP traffic went directly to the application container. Now, we bind port 80 to the Traefik service instead. Traefik decides which container receives each request, and docker-rollout takes care of swapping the containers behind it.

Setting up Traefik

Create a new docker-compose.traefik.yaml file:

    version: "3.7"

    services:
      traefik:
        image: traefik:v2.9
        container_name: traefik
        command:
          - "--api.insecure=true"
          - "--providers.docker=true"
          - "--providers.docker.exposedbydefault=false"
          - "--entrypoints.web.address=:80"
        ports:
          - "80:80"
          # - "8080:8080"
        volumes:
          - "/var/run/docker.sock:/var/run/docker.sock:ro"

I borrowed the docker-compose.traefik.yaml code from the Traefik example in the docker-rollout repository.

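  • --api.insecure=true - Enables the Traefik API and dashboard without authentication on port 8080. It is not required for routing; it only matters if you expose port 8080 for testing.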

  • --providers.docker=true - Sets the provider to Docker. You can learn more about the Docker provider in the Traefik documentation.

  • --providers.docker.exposedbydefault=false - Services are only exposed if they have the label traefik.enable=true (which we are going to add to our application service)

  • --entrypoints.web.address=:80 - Defines an entrypoint named web that listens on port 80

For testing, you may expose port 8080 as well, which serves the Traefik dashboard.

Then, start the Traefik service.

    docker-compose -f docker-compose.traefik.yaml up -d

Updating your service

Then, update your service docker-compose file:

    version: "3.9"

    services:
      myservice:
        image: mycompany/myservice:latest
        labels:
          - "traefik.enable=true"
          - "traefik.http.routers.<service>.entrypoints=web"
          - "traefik.http.routers.<service>.rule=PathPrefix(`/`)"
        deploy:
          update_config:
            order: start-first
            failure_action: rollback
            delay: 5s
        healthcheck:
          test: 'curl -f http://localhost || exit 1'

  • Replace <service> with your service name.

  • healthcheck.test is super important. That’s how Docker knows if your service is online and healthy.

  • traefik.enable=true - Enables Traefik for this service. This is needed because we added exposedbydefault=false to Traefik config

  • I spent more than an hour finding the best traefik.http.routers.<service>.rule value. By default, the Docker provider in Traefik generates a Host() rule from the container name, so Traefik only routes requests to that service if the Host header matches that value. In my case, I wanted to route all traffic. I tried Host(*) and HostRegexp({domain: .*}), but neither worked. I couldn’t find a way to disable the default rule either. Finally, PathPrefix(`/`) did the job and matched everything. Depending on your use case, you will have to choose the best option here.
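Since the rollout depends on the health check, it may be worth setting its timing explicitly instead of relying on Docker's defaults. The fragment below is only a sketch: the interval, timeout, retries, and start_period values are assumptions to tune for your own application, and the check only works if curl is available inside the image.

```yaml
services:
  myservice:
    healthcheck:
      # Requires curl inside the image; all timing values are illustrative.
      test: ["CMD-SHELL", "curl -f http://localhost || exit 1"]
      interval: 5s       # how often Docker runs the check
      timeout: 3s        # a check that takes longer counts as a failure
      retries: 3         # consecutive failures before the container is "unhealthy"
      start_period: 10s  # grace period while the application boots
```

A shorter interval lets a new container be reported healthy sooner, at the cost of running the check more often.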

Zero downtime deployments

Finally, use docker rollout:

    docker rollout myservice

What happens here is that instead of stopping the old container and then starting a new one, docker rollout scales the service to two containers and waits until the new container is healthy. Traefik, which watches Docker through its provider, starts routing traffic to the new container. Finally, the old container is stopped, so all traffic goes to the new one.
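The steps above can be sketched in a few lines of shell. This is not docker-rollout's actual code, only an illustration of the idea under some assumptions: the service currently runs a single container, it defines a healthcheck, and the Compose v2 CLI (docker compose) is available. The function names wait_until_healthy and rollout are made up for this example.

```shell
#!/usr/bin/env bash
# Illustrative sketch of a rollout; NOT the plugin's real implementation.

# Poll a container's health status until it reports "healthy"
# or the timeout (in seconds, default 60) expires.
wait_until_healthy() {
  local container="$1" timeout="${2:-60}" waited=0 status
  while [ "$waited" -lt "$timeout" ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container")
    [ "$status" = "healthy" ] && return 0
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

rollout() {
  local service="$1" old new
  # ID of the container currently serving traffic (the old version).
  old=$(docker compose ps --quiet "$service")
  # Start a second container from the new image, leaving the old one running.
  docker compose up --detach --no-deps --no-recreate \
    --scale "$service=2" "$service" || return 1
  # The new container is the ID that was not in the old list.
  new=$(docker compose ps --quiet "$service" | grep -v -F -x "$old")
  if wait_until_healthy "$new"; then
    # New version is healthy: remove the old container.
    docker stop "$old" && docker rm "$old"
  else
    # New version never became healthy: remove it and keep the old one.
    docker stop "$new"
    docker rm "$new"
    return 1
  fi
}
```

With these functions loaded, rollout myservice would go through the same sequence. In practice the plugin handles more cases, so prefer it over a hand-rolled script.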

Feel free to leave a comment below 👋