Backup strategy for this blog while it runs on Docker

As you can see, I had a backup issue recently and lost a couple of posts. Honestly, I still have not figured out what the problem was, so I’m going to lay out my backup stack. If you can spot anything wrong, leave a comment below.

First, this blog runs in 2 Docker containers. One is for WordPress. It just has the PHP code running, with some credentials held in memory as environment variables, so nothing there needs to be backed up. The other container runs MariaDB (basically MySQL), which holds all the data for the posts, comments, users, etc. That is what I want to back up.


If you read my previous posts about Docker, you know that an image is read-only and that anything a container writes on top of it is thrown away with the container. So if you run a database in a container, you lose the data whenever the container is recreated, unless you use a data volume. My MariaDB container therefore has volumes for the folders /etc/mysql and /var/lib/mysql, where MariaDB saves its configuration and data. They stay the same when I stop and start the container.


However, I did not map the data volume to a physical folder on the host. I’m not 100% sure where the data volume actually lives, but I think that if I destroy the container, the data volume will be gone. I did not map it to a physical folder because the machine is just a VM on Microsoft Azure. I don’t think there is much value in mapping it and saving the data there, since the machine can easily be gone.
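
To make the difference between the two kinds of volume concrete, here is a minimal sketch in Python that only builds the `docker run` command line, without running it. The container name `blog-db` and the host folder `/srv/mysql` are made up for illustration, and this is not the exact command used for this blog.

```python
def docker_run_args(name, image, volumes, host_dirs=None):
    """Build an argv list for `docker run` with one -v flag per volume.

    volumes:   container paths to persist, e.g. ["/var/lib/mysql"].
    host_dirs: optional {container_path: host_path} for paths that should
               be mapped to a folder on the host (hypothetical paths here).
    """
    host_dirs = host_dirs or {}
    args = ["docker", "run", "-d", "--name", name]
    for container_path in volumes:
        host_path = host_dirs.get(container_path)
        if host_path:
            # Mapped volume: the data lives in a known folder on the host.
            spec = f"{host_path}:{container_path}"
        else:
            # Unmapped volume: Docker chooses where to store the data.
            spec = container_path
        args += ["-v", spec]
    args.append(image)
    return args
```

Calling `docker_run_args("blog-db", "mariadb", ["/etc/mysql", "/var/lib/mysql"])` gives the unmapped setup described above. Passing `{"/var/lib/mysql": "/srv/mysql"}` as `host_dirs` would instead map the database folder to the host.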

What I did instead was back up the data volume to Amazon S3. I originally used the dockup project and its Docker image, but it had some problems with restore, so I switched to a fork of it. The fork did not have a Docker image, so I built one myself. I created 2 services on tutum.co and told them to mount the data volume from my MariaDB container. With one click of a button, I can back up my MariaDB data volume as a zip file to Amazon S3, or restore it from there.
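
The backup step itself boils down to packing the volume folders into one timestamped archive and shipping it off the machine. The sketch below shows only the packing half, using Python's standard library. It is not dockup's actual code, the function name `archive_volume` and the `backup` prefix are my own, and the upload to S3 is left out on purpose.

```python
import tarfile
import time
from pathlib import Path


def archive_volume(paths, dest_dir, prefix="backup"):
    """Pack the given folders into one timestamped .tar.gz and return its path."""
    stamp = time.strftime("%Y-%m-%dT%H-%M-%SZ", time.gmtime())
    dest = Path(dest_dir) / f"{prefix}.{stamp}.tar.gz"
    with tarfile.open(dest, "w:gz") as tar:
        for path in paths:
            # Keep the full path (minus the leading slash) inside the archive,
            # so /etc/mysql and /var/lib/mysql do not collide.
            tar.add(path, arcname=str(path).lstrip("/"))
    return dest
```

In a container that mounts the MariaDB volumes, this would be called as `archive_volume(["/etc/mysql", "/var/lib/mysql"], "/tmp")`, and the returned file is what would get uploaded. Because the timestamp is part of the file name, each run produces a separate file instead of overwriting the previous one.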


But I did not want to do this manually every time I post something. (Or maybe I should have.) So I set up a cron-like job to run the backup every day. I used https://github.com/sunshineo/tutum-schedule , which is my fork of https://github.com/alexdebrie/tutum-schedule . What I changed was to make the project run a single non-stop Python process directly instead of going through supervisord. I discussed this with alexdebrie in the Tutum Slack channel and we agreed that this is a good idea.
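
For readers wondering what a "non-stop Python process" that acts like cron looks like, here is a minimal sketch using only the standard library. It is not the code from tutum-schedule. The names `seconds_until` and `run_daily` are my own, and the `max_runs`, `sleep` and `clock` parameters exist only so the loop can be tried out without waiting a day.

```python
import datetime as dt
import time


def seconds_until(hour, minute, now):
    """Seconds from `now` until the next occurrence of hour:minute."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed, so aim for tomorrow.
        target += dt.timedelta(days=1)
    return (target - now).total_seconds()


def run_daily(job, hour, minute, max_runs=None, sleep=time.sleep, clock=dt.datetime.now):
    """Call `job` once a day at hour:minute; loops forever unless max_runs is set."""
    runs = 0
    while max_runs is None or runs < max_runs:
        sleep(seconds_until(hour, minute, clock()))
        job()
        runs += 1
```

A real service would pass in a function that triggers the backup, for example `run_daily(start_backup, 3, 0)` with a hypothetical `start_backup`. One thing a loop like this does not do on its own is check that the job actually succeeded, so a backup that silently does nothing would go unnoticed.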

So there you are: a setup that worked pretty well when I tested it at the end of May. I did not post much in the month of June, but the backup did run every day. However, the backed-up file never changed, even after I published some posts. This is a mystery to me.
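
One way to check a symptom like this is to compare what is inside two backups rather than the backup files themselves, since compressed archives can differ in their headers even when the data is the same. The sketch below assumes the backups have been downloaded from S3 and are tar archives (for real zip files, Python's `zipfile` module would be the analogous tool). The function names are my own.

```python
import hashlib
import tarfile


def archive_fingerprint(path):
    """Map each regular file in a tar archive to the SHA-256 of its content."""
    digests = {}
    with tarfile.open(path) as tar:
        for member in tar:
            if not member.isfile():
                continue
            digest = hashlib.sha256()
            stream = tar.extractfile(member)
            # Read in 1 MiB chunks so large database files are not loaded whole.
            for chunk in iter(lambda: stream.read(1 << 20), b""):
                digest.update(chunk)
            digests[member.name] = digest.hexdigest()
    return digests


def changed_files(old_archive, new_archive):
    """Names of files that were added, removed or modified between two backups."""
    old = archive_fingerprint(old_archive)
    new = archive_fingerprint(new_archive)
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))
```

If `changed_files` returns an empty list for a backup taken before a new post and one taken after it, the backup is not reading the live database files, which would narrow the problem down to how the volume is mounted into the backup service.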