Space-saving backups with sparse files or squashfs

By hambier On November 4th, 2015 In Linux

Recently I had some serious trouble with my Windows-based laptop (yeah, I know...). That’s when I decided it was time to take a snapshot of the whole SSD. After all, I don’t want to reinstall everything from scratch if (or should I say when?) my Windows installation decides to self-destruct.

After some head-scratching I made a list of my personal requirements for a backup strategy:

Disk image-based. I want to back up my SSD’s complete contents, including partition table, MBR (if applicable), recovery partition, etc.
Only standard GNU/Linux tools (i.e. no Clonezilla). I want to be able to write any (live) Linux distribution to a USB stick, boot the laptop and restore the disk image without bothering with special tools.
Possibility to “loopmount”. I want to be able to loopmount the backup image in order to recover single files.
“Small” backups. The notebook’s SSD has a capacity of 256GB, but it has quite some free space left. I don’t want a full-size 256GB dd-image. The backup should either be compressed or make use of sparse files.
NFS and sshfs compatibility. The backup will be saved on my local fileserver. I’m not going to meddle with USB HDDs (which I don’t have).

There are actually (at least) two different methods that fulfill my requirements perfectly.

[Warning: dd can be as dangerous as it is powerful! A single typo and you may lose all your data. The methods described below are for the seasoned Linux user.]

1. Sparse files

Files are “sparse” if unused sections (large zero-filled blocks) are not actually stored on the disk, but skipped over. This requires support by the involved filesystems, but some testing revealed that it’s working just fine on my local QNAP fileserver, which uses NFS over Ext4 (both of which support sparse files). It also requires support by the tool used to dump the disk image, in my case dd (from GNU coreutils). I had no problem using its sparse conversion option in Ubuntu 14.04 (booted from a USB key).
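The comparison between apparent size and allocated blocks can also be scripted. The following is a minimal sketch, assuming GNU coreutils (truncate and stat -c); the function name supports_sparse and the probe file name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if files created in directory $1 can be sparse.
supports_sparse() {
    dir=$1
    probe=$(mktemp "$dir/sparse-probe.XXXXXX") || return 2
    # Extend the empty probe file to 64 MiB without writing any data.
    truncate -s 64M "$probe"
    apparent=$(stat -c %s "$probe")      # apparent size in bytes
    blocks=$(stat -c %b "$probe")        # number of allocated blocks
    blocksize=$(stat -c %B "$probe")     # bytes per block reported by %b
    rm -f "$probe"
    # Sparse if less space is allocated than the file claims to contain.
    [ $((blocks * blocksize)) -lt "$apparent" ]
}

if supports_sparse "${1:-.}"; then
    echo "sparse files supported"
else
    echo "sparse files NOT supported (or the probe could not be created)"
fi
```

Run it with the backup directory as its argument, e.g. the NFS mountpoint, before starting a long dd run.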

Here’s the whole procedure:

Backing up

Make sure unused disk space is actually zero-filled. For HDDs this can be achieved by writing a file of zeros until the disk is full and then deleting it immediately. (On a Linux system you may use “dd if=/dev/zero of=zerofile bs=10M; rm zerofile”; I don’t know about Windows.) In case of an SSD, it should be sufficient to have TRIM enabled, provided the drive returns zeros when trimmed blocks are read. (For recent versions of Windows, TRIM should be activated by default.)
Boot from a USB drive with a “live” Linux system. The drive to be backed up must not be mounted. (With the possible exceptions of LVM or btrfs snapshots.)
Mount the network storage (e.g. mount -t nfs server:/backupshare /mnt/nfsbackup)
Check if the mounted filesystem actually supports sparse files:
```
truncate -s 1G /mnt/nfsbackup/1Gtestfile
du -h --apparent-size /mnt/nfsbackup/1Gtestfile
du -h /mnt/nfsbackup/1Gtestfile
```
If all is well, these commands illustrate the usefulness of sparse files: the test file has an (apparent) size of 1GB, but uses 0 (!) bytes of actual storage space. You may now remove the test file.
Backing up (still on the live Linux system) and checking the backup size:
```
dd if=/dev/sdx of=/mnt/nfsbackup/notebook-sdx-sparse.img iflag=direct oflag=direct bs=64K conv=sparse
du -h --apparent-size /mnt/nfsbackup/notebook-sdx-sparse.img
du -h /mnt/nfsbackup/notebook-sdx-sparse.img
```
The dd command will take a long time, obviously. Replace “sdx” by the actual device name.
The value ‘64K’ specifies the blocksize used by dd: larger blocksizes lead to higher read/write speeds, but reduce the usefulness of its sparse-writing code (which operates on blocks of the specified size). As for the precise value of 64K, I trusted the information I found here.
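The effect of the block size on conv=sparse can be tried out on a small scratch file before touching a real disk. This is only a sketch, assuming GNU dd, head and du; the file names and the sizes of the fake "disk" are arbitrary choices for the demonstration.

```shell
#!/bin/sh
work=$(mktemp -d)

# Fake "disk": 1 MiB of data, 8 MiB of zeros, 1 MiB of data.
{ head -c 1M /dev/urandom; head -c 8M /dev/zero; head -c 1M /dev/urandom; } \
    > "$work/disk.img"

# Copy it twice: once with small blocks, once with a block larger than the file.
for bs in 64K 16M; do
    dd if="$work/disk.img" of="$work/copy-$bs.img" bs="$bs" conv=sparse status=none
    cmp "$work/disk.img" "$work/copy-$bs.img" && echo "bs=$bs: copy is identical"
    du -k --apparent-size "$work/copy-$bs.img"   # apparent size in KiB
    du -k "$work/copy-$bs.img"                   # space actually used in KiB
done
# Remove the scratch directory when done: rm -rf "$work"
```

Both copies are byte-identical to the source. On a filesystem with sparse-file support, the 64K copy should occupy noticeably less space, because with bs=16M the whole file fits into a single block that is not entirely zero and is therefore written in full.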

The resulting image file (containing multiple partitions!) can now be mounted from any Linux computer! Here’s how:

Mounting and inspecting

This can be done using any Linux device: a Linux installation on the machine that was backed up, a Linux setup in a virtual machine, a Linux system running from a live USB key, etc.

(Optional) List the partitions inside the raw image using kpartx:

sudo kpartx -l /mnt/nfsbackup/notebook-sdx-sparse.img

Here’s some sample output:

loop1p1 : 0 2048000 /dev/loop1 2048
loop1p2 : 0 532480 /dev/loop1 2050048
loop1p3 : 0 262144 /dev/loop1 2582528
loop1p4 : 0 470673408 /dev/loop1 2844672
loop1p5 : 0 26599424 /dev/loop1 473518080
loop deleted : /dev/loop1
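Each line of this output gives the partition name, the start within the mapping, the length, the backing loop device and the offset of the partition inside the image. The sketch below converts the length and offset into more readable units, assuming they are counted in 512-byte sectors (the unit device-mapper tables use); the function name describe_partitions is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `kpartx -l` output on stdin and prints
# the size and byte offset of each partition.
describe_partitions() {
    while read -r name colon start length dev offset; do
        # Skip lines such as "loop deleted : /dev/loop1".
        [ "$colon" = ":" ] || continue
        echo "$name: $((length * 512 / 1048576)) MiB at byte offset $((offset * 512))"
    done
}

describe_partitions <<'EOF'
loop1p1 : 0 2048000 /dev/loop1 2048
loop1p2 : 0 532480 /dev/loop1 2050048
loop1p3 : 0 262144 /dev/loop1 2582528
loop1p4 : 0 470673408 /dev/loop1 2844672
loop1p5 : 0 26599424 /dev/loop1 473518080
loop deleted : /dev/loop1
EOF
```

For the sample output this reports, for instance, that loop1p1 is 1000 MiB large and starts at byte offset 1048576. Such a byte offset can also be handed to mount via its loop and offset= options if kpartx is not available.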

Actually create the loop devices in /dev/mapper (one per partition):

$ sudo kpartx -a -v /mnt/nfsbackup/notebook-sdx-sparse.img
add map loop1p1 (252:1): 0 2048000 linear /dev/loop1 2048
add map loop1p2 (252:2): 0 532480 linear /dev/loop1 2050048
add map loop1p3 (252:3): 0 262144 linear /dev/loop1 2582528
add map loop1p4 (252:4): 0 470673408 linear /dev/loop1 2844672
add map loop1p5 (252:5): 0 26599424 linear /dev/loop1 473518080

Mount the partition you’re interested in:
```
sudo mount /dev/mapper/loop1p2 /mnt/tmp
```

When you’re done: unmount and remove the mappings:

sudo umount /mnt/tmp
sudo kpartx -d -v /mnt/nfsbackup/notebook-sdx-sparse.img

Restoring

You have 3 options:

Restore single files by mounting the image file’s contents. (See above.)

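Restore the whole disk image (using a live Linux distribution):

dd if=/mnt/nfsbackup/notebook-sdx-sparse.img of=/dev/sdx bs=10M

Adding conv=sparse makes dd skip the all-zero blocks instead of writing them. That is only safe if the target already reads back as zeros, e.g. after an fstrim or a secure erase of the whole SSD. In that case use conv=sparse bs=64K instead of bs=10M.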
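Why conv=sparse on restore needs a zeroed target can be shown with ordinary files standing in for the SSD. This is a sketch under that assumption, using GNU dd, head and tr; the "dirty" and "clean" files are stand-ins for an SSD with old data and a freshly erased one.

```shell
#!/bin/sh
work=$(mktemp -d)

# "Image": 64 KiB of data followed by 64 KiB of zeros.
{ head -c 64K /dev/urandom; head -c 64K /dev/zero; } > "$work/image"

# "Dirty" target full of old data (0xFF bytes), "clean" target full of zeros.
head -c 128K /dev/zero | tr '\0' '\377' > "$work/dirty"
head -c 128K /dev/zero > "$work/clean"

for target in dirty clean; do
    # notrunc keeps the existing target contents, as a block device would.
    dd if="$work/image" of="$work/$target" bs=64K conv=sparse,notrunc status=none
    if cmp -s "$work/image" "$work/$target"; then
        echo "$target target: restore matches the image"
    else
        echo "$target target: old data survived in the skipped blocks"
    fi
done
# Remove the scratch directory when done: rm -rf "$work"
```

The zero block of the image is skipped rather than written, so on the dirty target the old 0xFF bytes stay in place and the restored copy differs from the image.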

Restore a partition or two: create the mappings using kpartx (see above), then restore only the partition(s) you want. (For this to work properly, the partition table must be the same as before the backup, or at least have partitions at the correct offsets. Be careful; I haven’t actually tried this particular command.)
```
dd if=/dev/mapper/loop1p3 of=/dev/sdx9 bs=10M
```
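A single partition can also be cut out of the image by its sector offset and length, without creating any mappings. The following sketch tries this on a small scratch image, assuming 512-byte sectors; the function name extract_partition and the fake image layout are made up for the demonstration, and like the command above it has not been tried against a real disk here.

```shell
#!/bin/sh
# Hypothetical helper: copy COUNT sectors starting at sector START
# from IMAGE into OUTPUT.
# Usage: extract_partition IMAGE START COUNT OUTPUT
extract_partition() {
    dd if="$1" of="$4" bs=512 skip="$2" count="$3" status=none
}

work=$(mktemp -d)

# Fake image: 2048 empty sectors, a 4096-sector "partition", 2048 empty sectors.
head -c $((4096 * 512)) /dev/urandom > "$work/part.orig"
{ head -c $((2048 * 512)) /dev/zero
  cat "$work/part.orig"
  head -c $((2048 * 512)) /dev/zero; } > "$work/disk.img"

extract_partition "$work/disk.img" 2048 4096 "$work/part.img"
cmp "$work/part.orig" "$work/part.img" && echo "extracted partition matches"
# Remove the scratch directory when done: rm -rf "$work"
```

The start and count values correspond to the offset and length columns printed by kpartx -l. With bs=512 the copy is slow on large partitions, so this is meant to illustrate the offsets rather than to replace the kpartx approach.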

2. SquashFS

A nice alternative to the method above is to store a dd-image inside a squashfs container. SquashFS is a read-only compressed filesystem that performs quite well. The obvious advantage is that compression saves more space than simply skipping over zero-filled blocks. The downside is greater complexity. But still, it retains all the properties I’m looking for! Normally squashfs images are created based on an existing folder hierarchy. In other words, I would first have to create a regular dd-image (hundreds of GB) and then squash it and write it to a second huge file. Fortunately it’s also possible to create squashfs images without writing all of the uncompressed contents to disk first. This can be achieved through a judicious use of “pseudo files”.

Backing up

Follow steps 1-3 of the sparse file instructions (i.e. zero-fill, USB boot & NFS mount)
Install squashfs-tools: sudo apt-get install squashfs-tools
Create a squashfs filesystem containing a single regular file (‘f’): sdx_backup.img with permissions ‘444’ and owned by root (gid: root). This file will contain the output of the dd command, i.e. a backup similar to the one from the first method, except for the sparse part.
```
mkdir empty-dir
mksquashfs empty-dir /mnt/nfsbackup/notebook-sdx-squashfs.img -p 'sdx_backup.img f 444 root root dd if=/dev/sdx bs=10M'
```
Wait...
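Since the real run takes hours, it may be worth rehearsing the pseudo-file trick on a small scratch file first. The sketch below does that without root privileges, assuming squashfs-tools (mksquashfs and unsquashfs) is installed and that the temporary directory path contains no spaces; the scratch file stands in for /dev/sdx.

```shell
#!/bin/sh
if ! command -v mksquashfs >/dev/null || ! command -v unsquashfs >/dev/null; then
    echo "squashfs-tools not installed, skipping"
else
    work=$(mktemp -d)
    mkdir "$work/empty-dir"

    # Stand-in for the disk: 1 MiB of data followed by 8 MiB of zeros.
    { head -c 1M /dev/urandom; head -c 8M /dev/zero; } > "$work/disk.img"

    # Same pseudo-file definition as above, reading the scratch file instead.
    mksquashfs "$work/empty-dir" "$work/test.squashfs" \
        -p "sdx_backup.img f 444 root root dd if=$work/disk.img bs=1M status=none" \
        >/dev/null

    # Unpack again and compare with the original.
    unsquashfs -d "$work/out" "$work/test.squashfs" >/dev/null
    cmp "$work/disk.img" "$work/out/sdx_backup.img" && echo "round trip OK"
    ls -l "$work/disk.img" "$work/test.squashfs"
    # Remove the scratch directory when done: rm -rf "$work"
fi
```

The final ls shows the size of the scratch "disk" next to the size of the squashfs image, which gives a first impression of how well zero-filled areas compress.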

Mounting and inspecting

Mount the squashfs image file to a mountpoint. (As always, the mountpoint must exist...)
```
sudo mount -t squashfs /mnt/nfsbackup/notebook-sdx-squashfs.img /mnt/squashfs
```

Actually create the loop devices in /dev/mapper (one per partition):

$ sudo kpartx -a -v /mnt/squashfs/sdx_backup.img
add map loop1p1 (252:1): 0 2048000 linear /dev/loop1 2048
add map loop1p2 (252:2): 0 532480 linear /dev/loop1 2050048
add map loop1p3 (252:3): 0 262144 linear /dev/loop1 2582528
add map loop1p4 (252:4): 0 470673408 linear /dev/loop1 2844672
add map loop1p5 (252:5): 0 26599424 linear /dev/loop1 473518080

Mount the partition you’re interested in:
```
sudo mount /dev/mapper/loop1p2 /mnt/tmp
```

When you’re done: unmount and remove the mappings:

sudo umount /mnt/tmp
sudo kpartx -d -v /mnt/squashfs/sdx_backup.img
sudo umount /mnt/squashfs

Restoring

Use a combination of the methods described above: mount the squashfs filesystem, then restore with commands analogous to those used to restore the sparse image file.

3. Encryption?

It should also be possible to encrypt the backups on-the-fly. Two ideas come to mind, but I haven’t yet tried to put them into practice:

Pipe the dd-output through gpg (like this but using dd instead of tar; no need for conv=sparse). Unfortunately this would create a backup which is not directly (loop-)mountable.
Estimate the maximum size of the squashfs image, e.g. 70% of the used non-zero disk space and create a loopback-mounted LUKS-encrypted image-file in which the on-the-fly-created squashfs image will be stored. But that’s a dd-image as a file inside a squashfs image inside a LUKS-encrypted image file...

Image source: „Backup Backup Backup - And Test Restores“ by John (CC BY 2.0)

Tags: backup, dd, sparsefile, squashfs, compression, kpartx