Setting Up Automatic Daily Database Backup on Kubernetes (Updated: 2024-03-03)
Kubernetes has supported StatefulSets for a while now, so it's a no-brainer to deploy a database app in a Kubernetes cluster. A StatefulSet gives your pods stable, persistent identities as well as persistent storage, making it ideal for database apps. Still, even with persistent storage, you should always configure automatic offsite backups when deploying a database app on a Kubernetes cluster. This ensures your data survives any disaster that might befall your shiny new cluster.
Automatic offsite backups can be a hassle, but they don't have to be. In this post, I'm going to set up an automatic daily backup of a MySQL pod into a Backblaze B2 bucket.
Step 1: Automatically Dump All Databases into a Persistent Volume
I want to dump all databases inside a MySQL pod into a persistent volume. Each database will be dumped into a separate file (so I can restore them individually as needed later) on a persistent volume, which will be uploaded to a Backblaze B2 bucket afterwards. I'll use Kubernetes' CronJob workload to schedule a daily database dump job.
It's quite annoying to set this up over and over again, so I packed a small MySQL backup script into a container image (arifwn/container-backup:mysql) suitable for use in a Kubernetes CronJob workload. This Docker image can back up a specific database or all databases. The source code is available on my GitHub repo here: https://github.com/arifwn/container-backup.
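The original manifest didn't survive here, so below is a minimal sketch of what a daily MySQL dump CronJob could look like. The environment variable names, the secret name (mysql-credentials), the service host, and the PVC name (backup-pvc) are my assumptions for illustration — check the image's README for the actual configuration it expects.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mysql-backup
spec:
  schedule: "0 2 * * *"        # every day at 2:00 AM
  concurrencyPolicy: Forbid    # never run two dumps at once
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: mysql-backup
              image: arifwn/container-backup:mysql
              env:
                - name: DB_HOST          # assumed variable names
                  value: mysql.default.svc.cluster.local
                - name: DB_USER
                  value: root
                - name: DB_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: mysql-credentials
                      key: password
              volumeMounts:
                - name: backup-storage
                  mountPath: /backup     # .sql dumps land here
          volumes:
            - name: backup-storage
              persistentVolumeClaim:
                claimName: backup-pvc    # created beforehand
```

Apply it with `kubectl apply -f mysql-backup.yaml`.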
This will create a new CronJob workload entry on the Kubernetes cluster that runs at 2:00 AM every day. I also mounted a persistent volume (created beforehand) that will be used to store the resulting .sql dumps. We'll mount the same volume in another CronJob entry to upload its contents to Backblaze B2 later.
Using PostgreSQL? No problem!
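The PostgreSQL manifest was lost as well; the sketch below shows the same CronJob adapted for PostgreSQL. The image tag (:postgres) and the environment variable names are my assumptions — verify them against the repo before using this.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 2 * * *"        # every day at 2:00 AM
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: postgres-backup
              image: arifwn/container-backup:postgres  # assumed tag
              env:
                - name: DB_HOST          # assumed variable names
                  value: postgres.default.svc.cluster.local
                - name: DB_USER
                  value: postgres
                - name: DB_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: password
              volumeMounts:
                - name: backup-storage
                  mountPath: /backup
          volumes:
            - name: backup-storage
              persistentVolumeClaim:
                claimName: backup-pvc
```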
Step 2: Upload to a Backblaze B2 Bucket
Next is uploading the .sql files generated in the last step to a Backblaze B2 bucket. Backblaze B2 is a (very) cheap cloud storage service ($0.006/GB/month) from Backblaze with free ingress bandwidth, perfect for storing large backups offsite. Egress is also cheap (free up to 3x your monthly storage, then $0.01/GB) and very fast, unlike some other cheap cloud backup solutions (e.g. AWS Glacier).
I also packed my trusty B2 backup script into a Docker image (arifwn/container-backup:b2). This image compresses each top-level folder in the mounted volume into a .tar.gz archive and uploads it to the specified B2 bucket. The image also performs a daily cleanup on the target bucket so that only the last 7 daily backups, the last 8 weekly backups, and the last 12 monthly backups are retained. The source code is available on my GitHub repo here: https://github.com/arifwn/container-backup.
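The upload manifest is also missing, so here's a minimal sketch of the B2 upload CronJob, mounting the same volume the dump job writes to. The credential variable names and secret name (b2-credentials) are assumptions on my part — the actual names the image reads are in its README.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: b2-upload
spec:
  schedule: "30 2 * * *"       # 2:30 AM, after the dump job finishes
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: b2-upload
              image: arifwn/container-backup:b2
              env:
                - name: B2_BUCKET        # assumed variable names
                  value: my-db-backups
                - name: B2_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: b2-credentials
                      key: key-id
                - name: B2_APP_KEY
                  valueFrom:
                    secretKeyRef:
                      name: b2-credentials
                      key: app-key
              volumeMounts:
                - name: backup-storage
                  mountPath: /backup     # same volume the dump job wrote to
          volumes:
            - name: backup-storage
              persistentVolumeClaim:
                claimName: backup-pvc
```

Keeping the B2 application key in a Secret (rather than inline in the manifest) means the manifest itself can be committed to version control safely.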
This will create a new CronJob workload entry on the Kubernetes cluster that runs at 2:30 AM every day (plenty of time for the database dump job to complete). The job will compress all data found inside the mounted volume (each top-level folder becomes a separate .tar.gz archive), upload the archives to a B2 bucket, and remove old backup archives, keeping only the last 7 daily, 8 weekly, and 12 monthly backups.
And that's it! The database is now backed up automatically every night to a B2 bucket. It only takes minutes to set up, and you can sleep soundly at night knowing your data is safe from random cluster failures. Don't forget to set up a CronJob failure alert so you'll get notified when an upload fails (the B2 backup image exits with a non-zero exit code on failure).
Step 3 (Optional): Upload to an Offsite Server with RSync
You can't have too many backups these days. While having copies of your data on B2 is great for your peace of mind, keeping yet another copy with a second backup provider adds redundancy. RSync.net is a great backup provider and supports automatic daily snapshots, so you can go back in time to retrieve previous versions of your files. All you need to do is generate a new SSH key, upload it to your rsync.net server, and reference the key in the config below.
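The rsync manifest didn't make it into this copy of the post either. Below is a sketch of what it might look like: the image tag (:rsync), the environment variable names, and the secret holding the private key (rsync-ssh-key) are all assumptions for illustration — consult the container-backup repo for the real configuration.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: rsync-upload
spec:
  schedule: "0 3 * * *"        # 3:00 AM, after the B2 upload
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: rsync-upload
              image: arifwn/container-backup:rsync  # assumed tag
              env:
                - name: RSYNC_TARGET     # assumed variable names
                  value: user@user.rsync.net:backups/
              volumeMounts:
                - name: backup-storage
                  mountPath: /backup
                - name: ssh-key
                  mountPath: /root/.ssh  # private key for the remote host
                  readOnly: true
          volumes:
            - name: backup-storage
              persistentVolumeClaim:
                claimName: backup-pvc
            - name: ssh-key
              secret:
                secretName: rsync-ssh-key
                defaultMode: 0400        # ssh refuses world-readable keys
```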
Note that you don't have to use RSync.net; any host with SSH access will do.