Update 2024-04-28:
The resulting script from this blog post can be found on my git page: rsync_encrypted_backup
Backups have been somewhat of a pain for me for quite a while, as I could never find a suitable, easy-to-manage and easy-to-recover option for my private computer.
My goal was to create a simple off-site backup routine (i.e. “the cloud”), which would be easy to recover from, suitably fast - ideally with atomic/delta updates - and reasonably secure, i.e. with strong default encryption on the level of AES-256.
I tried several options like `7z` with encryption and low (or no) compression, sending a whole archive to a remote storage or even updating existing archives in place. However, this of course turned out to be rather cumbersome, prone to write errors / connection issues and extremely slow.
The next approach did work reasonably well, and is what I want to present here. I am sure there is still room for improvement, so if you have any suggestions, feel free to send me a DM in the Fediverse or an e-mail.
The well-known `rsync` tool is a natural candidate for atomic backups in the Linux world. It can sync directories with all sorts of remote end-points, be it over SSH, an rsync daemon, or anything you can mount locally ((S)FTP, WebDAV, etc.). It keeps ACLs, modes and ownership of files intact, is relatively fast, light on system resources and can sync both ways (i.e. it may also be used to restore your files). However, `rsync` does not support encryption while syncing your files.
So in order to encrypt backups made with `rsync`, the files have to be encrypted before transmission, ideally in real-time and without impacting read-speed all that much.
File Encryption
A close-to-perfect solution for this task is `gocryptfs`, the spiritual successor of EncFS. It is an encrypted overlay file-system that (crucially) supports a “reverse mode”, is extremely fast and utilizes strong encryption methods.
What this means in the context of backups is that we can mount the directory we want to back up (e.g. our home-directory) in an encrypted, real-time updated form, and sync the encrypted versions of all files rather than the original unencrypted ones. The aforementioned “reverse mode” is useful because it mounts a pre-existing, unencrypted directory as an encrypted volume.
So first, let’s start with creating a setup for the encrypted file-system. This has to be done only once and creates the metadata and encryption headers for the volume. Once this is done, we only need to mount the encrypted volume in the future:
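A minimal sketch of that one-time setup, assuming `/home/user` is the directory you want to back up (adjust the path to your setup):

```bash
# One-time initialization of the reverse-mode volume.
# --plaintextnames keeps file and folder names readable in the encrypted view.
gocryptfs --init --reverse --plaintextnames /home/user
```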
This process will ask you to set an encryption passphrase as well as provide you with a master restore key. BACK THIS KEY UP SOMEWHERE SAFE AND IN SEVERAL PLACES, BOTH DIGITALLY AS WELL AS PHYSICALLY!
The metadata file will be stored in your unencrypted directory as `.gocryptfs.reverse.conf` and in the encrypted storage as `gocryptfs.conf` (unencrypted). Make sure to store this somewhere secure too, as it is required to decrypt the storage in case you need to restore your backups.
From now on, we may mount the directory in its encrypted form:
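Something along these lines, with an illustrative mount point:

```bash
# Mount the home directory as a read-only, encrypted view of itself.
mkdir -p /tmp/encrypted_home
gocryptfs --reverse --ro /home/user /tmp/encrypted_home
```

When you are done, the view can be unmounted again with `fusermount -u /tmp/encrypted_home`.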
In your encrypted directory, you will now find your entire home-directory in encrypted form. The reason we used `--plaintextnames` before is that it makes the recovery process a lot easier if you can actually identify the files and folders by their names (their contents are of course still encrypted). If you do not need this feature, because you would recover the entire directory rather than only parts of it, you may consider removing that parameter when creating the volume.
The `--ro` parameter sets read-only permissions for the encrypted mount, meaning that you cannot write new files to the encrypted volume. Importantly, writing to the unencrypted directory is still possible, and doing so will also update the encrypted directory in real-time. The parameter protects our directory from technical or user mistakes, e.g. if we accidentally swap target and source in `rsync`…
If we want to recover a backup later on, we of course do need write permissions in the encrypted volume. This is covered again later in this blog post.
File Transmission
Next, we can finally back up our encrypted directory via `rsync`. Let’s first talk about the parameters that might be useful for backups. Personally, I want to exclude several directories from the backup, like “Downloads”, “.cache” and similar folders. `rsync` can even use wild-cards here, so you can exclude every `.git` folder or specific file-types (if they have the appropriate file-ending). This is of course not possible (or a lot harder…) if you skipped `--plaintextnames` when creating the encrypted volume, as all file- and directory-names are obfuscated without it.
We want to keep file-permissions and file-owners, so the `--archive` parameter is handy here. Since we want to see what is happening during the procedure, the `--verbose` and `--progress` parameters are useful as well. Additionally, files that we have deleted from our system should also disappear from the backup the next time we sync. Ideally, this should happen only after the new files have been transferred, which is what `--delete-delay` does.
Because I back up multiple devices to the same network storage, it is a good idea to name the target folder after the hostname of the device. Furthermore, even though I do atomic backups, I want to keep several versions of my backup-files. So I back up to different folders on the remote storage, based on time. To be more exact, I append the current month to the target directory’s name, such that I always have the past 12 versions of my backups (considering that I run backups once every month): `${HOSTNAME}_$(date +%m)/`.
However, if you want to keep fewer past versions, there is a little trick via the modulo of the current month. Say you want to keep only the past 3 versions: you can use `$(($(date +%m) % 3))`, which divides the number of the current month (i.e. 5 for May) by 3 and gives you the remainder of 2. So over the course of a year, this calculation would give you 1 in January, 2 in February, 0 in March, 1 in April, 2 in May, 0 in June and so on. This in turn means that you’ll always keep the past 3 months as different versions of your backup. Adjust this value to your needs and the size of your remote storage.
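Put together, the target directory name could be built roughly like this (the variable names are just illustrative; the `10#` prefix keeps bash from treating “08” and “09” as invalid octal numbers):

```bash
# Rotating backup slot: 0, 1 or 2, depending on the current month.
SLOT=$((10#$(date +%m) % 3))
TARGET_DIR="${HOSTNAME}_${SLOT}/"
```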
The whole transfer procedure looks like this:
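A sketch of what such an `rsync` call could look like; the remote host and paths are placeholders for your own setup, and the transfer is assumed to run over SSH:

```bash
# Sync the encrypted view to the remote storage:
#   --archive       keep permissions, owners, timestamps and symlinks
#   --delete-delay  remove deleted files on the remote side, after new files are transferred
rsync --archive --verbose --progress --delete-delay \
    --exclude "Downloads/" \
    --exclude ".cache/" \
    --exclude ".git/" \
    /tmp/encrypted_home/ \
    "backup@nas.example.org:backups/${HOSTNAME}_$(date +%m)/"
```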
Final touches
For an easy, semi-automated backup routine, a few additional quality-of-life improvements come in handy, such as mounting the reverse file-system before the backup and unmounting it afterwards.
Additionally, I like to send desktop notifications whenever I am using the script in a desktop environment. In order to detect this, I use the `$DISPLAY` environment variable for X11 desktops and the `$WAYLAND_DISPLAY` variable for Wayland environments. I typically use `gdbus` to send notifications, wrapped in a shell-function:
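A sketch of such a wrapper; the application name, icon and timeout are of course up to you:

```bash
# Send a desktop notification if a graphical session is available.
notify() {
    if [ -n "$DISPLAY" ] || [ -n "$WAYLAND_DISPLAY" ]; then
        gdbus call --session \
            --dest org.freedesktop.Notifications \
            --object-path /org/freedesktop/Notifications \
            --method org.freedesktop.Notifications.Notify \
            "backup" 0 "drive-harddisk" "Backup" "$1" "[]" "{}" 5000 > /dev/null
    fi
}

# Usage: notify "Backup finished."
```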
All in all the final script looks like this:
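What follows is a condensed sketch of such a script; paths, the remote host and the excludes are placeholders, and the actual script lives in the repository linked at the top of this post:

```bash
#!/usr/bin/env bash
set -euo pipefail

SOURCE_DIR="/home/user"                      # directory to back up
ENC_MOUNT="/tmp/encrypted_home"              # encrypted reverse mount
REMOTE="backup@nas.example.org:backups"      # remote storage (over SSH)
TARGET="${REMOTE}/${HOSTNAME}_$(date +%m)/"  # one folder per host and month

# Send a desktop notification if a graphical session is available.
notify() {
    if [ -n "${DISPLAY:-}" ] || [ -n "${WAYLAND_DISPLAY:-}" ]; then
        gdbus call --session \
            --dest org.freedesktop.Notifications \
            --object-path /org/freedesktop/Notifications \
            --method org.freedesktop.Notifications.Notify \
            "backup" 0 "drive-harddisk" "Backup" "$1" "[]" "{}" 5000 > /dev/null
    fi
}

# Mount the read-only, encrypted view of the source directory.
mkdir -p "$ENC_MOUNT"
gocryptfs --reverse --ro "$SOURCE_DIR" "$ENC_MOUNT"
notify "Backup started."

# Transfer the encrypted files to the remote storage.
rsync --archive --verbose --progress --delete-delay \
    --exclude "Downloads/" --exclude ".cache/" --exclude ".git/" \
    "$ENC_MOUNT/" "$TARGET"

# Unmount the encrypted view again.
fusermount -u "$ENC_MOUNT"
notify "Backup finished."
```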
Restoring a backup
Restoring the backup is relatively easy as well. For simplicity, I’ll assume that the entire backup should be restored. Take a look at `rsync`’s options if that is not what you want. Of course, you can also recover only specific directories or files.
First off, we need the `.gocryptfs.reverse.conf` file that the encryption tool created when we initialized the file system for the first time. That file contains meta information about the encrypted storage, but crucially not the decryption password. When mounting the file system, it was placed in plain text into the encrypted storage as `gocryptfs.conf` and transferred to the remote storage.
In case you lost your entire local file system and want to restore it from the backup, you first need to fetch this configuration file:
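For instance like this; the remote path is a placeholder, and since the encrypted view exposes the very same file under the name `gocryptfs.conf`, putting it back under its reverse-mode name should be enough:

```bash
# Fetch the configuration file from the remote backup and put it back in place.
rsync --archive \
    "backup@nas.example.org:backups/${HOSTNAME}_$(date +%m)/gocryptfs.conf" \
    /home/user/.gocryptfs.reverse.conf
```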
Once this is done, we can mount the encrypted file storage again, this time with write permissions, so we can restore the files from the remote storage into the encrypted file system:
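A sketch following the description above: the same reverse mount as before, just without the `--ro` flag (paths are again illustrative):

```bash
# Mount the home directory in reverse mode again, this time without --ro,
# so that encrypted files written to the mount end up decrypted in the home directory.
mkdir -p /tmp/encrypted_home
gocryptfs --reverse /home/user /tmp/encrypted_home
```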
Now we can finally start to transfer the files. They will simultaneously show up as decrypted files in the home directory:
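Note the reversed order of source and target compared to the backup run (host and paths are placeholders again):

```bash
# Pull the encrypted backup from the remote storage into the writable encrypted mount.
rsync --archive --verbose --progress \
    "backup@nas.example.org:backups/${HOSTNAME}_$(date +%m)/" \
    /tmp/encrypted_home/
```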