Reliable Storage of Critical Data at Home with ZFS on FreeBSD

In this article we will use ZFS, the Zettabyte File System, and FreeBSD to create reliable storage for your data on a network attached storage (NAS) server. For managing ZFS, refer to our post here. For a guide on how to build your datacenter in the attic, click here.
The major steps in our reliable storage server project are:

[ ] Select hardware and build server
[ ] Install and configure FreeBSD
[ ] Understand and configure ZFS
[ ] Create ZFS pools and datasets
[ ] Manage and maintain FreeBSD
[ ] Manage and maintain ZFS

IMPORTANCE OF A DATA STORAGE STRATEGY

We live in an increasingly digital world, and printed photographs, paper statements, pay stubs, and tax forms are becoming things of the past. We entrust this information to digital storage and expect it to be there, on servers or in the cloud, whenever we need to access it. But reality can be quite different.

Who has not accidentally lost data while moving it between drives, or because of a dropped hard drive? It is getting harder to access older statements from banks, databases get hacked, and computers fall victim to ransomware. And we commonly renew our commitment to making regular backups only after suffering data loss.

We need reliable storage for our critical data, plus automated backups. Let's start with the first part, creating a reliable storage system; backups will be dealt with later.

ZFS AND FREEBSD

The two key software choices for our reliable storage system are the operating system and the file system. Both need to be open and can be expected to remain available for decades to come. They also need to be mature, secure, and easy to maintain. FreeBSD and ZFS meet those requirements, and ZFS comes standard as part of FreeBSD.

ZFS is a modern file system that, most importantly, was designed with data integrity as one of its primary goals. Further, it combines the file system and volume management, and integrates deduplication, compression, encryption, and migration services, reducing the number of loosely coupled parts. ZFS was originally designed by Sun Microsystems for the Solaris OS. In 2005 it was released as open source under the CDDL license as part of OpenSolaris, and in 2013 it became freely available as OpenZFS on multiple platforms including FreeBSD and Linux. The ZFS trademark is currently owned by Oracle Corp.

FreeBSD is a free, open source, Unix-like operating system first released in 1993, descending from the Berkeley Software Distribution (BSD) version of Research Unix. It focuses on performance, stability, and security, and supports the AMD64 (x86-64) architecture, among others.

ECC MEMORY

The type of memory used in the server is truly important in the case of ZFS: ZFS is very strong at guaranteeing data integrity on disk, but non-ECC memory would let all that go to waste. ECC (Error-Correcting Code) memory contains extra check bits that allow memory errors to be detected and corrected. ZFS relies on the integrity of data in memory and has no additional layer of error detection or correction there. Unlike other file systems it has no offline repair tool such as fsck, but it has far better facilities for maintaining data integrity on disk.

ECC functionality increases the cost of memory and of the server's motherboard, which must support it. It also reduces the maximum total amount of memory the server can support. But it is all worth it, because memory errors do happen and can lead to serious data loss. Therefore our server needs a motherboard that supports ECC memory, and all RAM must be of the ECC type.

Another known issue with ZFS is that it loves memory. Keep in mind that it was designed to run on large industrial servers with 16GB of memory or more. The rule of thumb is to install 1GB of RAM per 1TB of raw storage, with an absolute minimum of 1GB, but you may soon find yourself increasing that to 4GB or more to obtain stable operation when continuously copying large amounts of data over extended periods.

The reason for these “issues” is that ZFS was designed for high-end servers, so they can hardly be considered issues at all.

Note that for ECC to work it needs to be supported by the CPU, the motherboard, BIOS and OS.
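One way to double-check that ECC is actually active is to inspect the memory information the BIOS reports; a sketch, assuming the dmidecode package has been installed from pkg/ports:

$ sudo pkg install dmidecode
$ sudo dmidecode -t memory | grep -i "error correction"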

OTHER HARDWARE

If you have multiple controllers, it is strongly preferable to use disks on different controllers to form a redundant set. It is also helpful to use drives of different manufacturing dates, to avoid simultaneous failures. There may be additional hardware considerations, especially if you are trying to use older hardware.

For our basic system we use a single SATA controller with up to four built-in hard drives, supplemented by two additional SATA drives and, optionally, multiple USB drives. The use of USB drives is not recommended, but we could use them for additional redundancy or backup.

BASIC FREEBSD MANAGEMENT

FreeBSD's documentation, in the form of the FreeBSD Handbook, is excellent, so we will not detail the installation here. If you like, the installer can set up the OS on ZFS automatically.

Installing FreeBSD (Handbook Chapter 2)
Updating and Upgrading FreeBSD (Handbook Chapter 23)
Keeping the OS up to date is important for maintaining security. To upgrade to a new release:

[ ] Make a backup of the OS
[ ] Enter the following (or similar, where 11.0 is the target release number)
$ sudo freebsd-update -r 11.0-RELEASE upgrade
$ sudo freebsd-update install
$ sudo reboot
$ sudo freebsd-update install

For routine security patches within a release, fetch and install them with:

$ sudo freebsd-update fetch
$ sudo freebsd-update install

After updating the OS, also update the applications (packages) installed on it:

$ sudo pkg update
$ sudo pkg upgrade
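It is also worth checking installed packages against the known-vulnerability database from time to time:

$ sudo pkg audit -F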

To stay up to date, subscribe to the FreeBSD-Current mailing list.

UNDERSTANDING ZFS

To make sense of the ZFS documentation it really helps to have a high-level overview, even if you are familiar with other file systems: there are no configuration files to edit, mounting is handled automatically by ZFS (but can be done manually if needed), NFS sharing is built in, and so on. ZFS is different, but in the end much simpler to use.

The storage system consists of three levels of abstraction: the lowest is the Storage Pool Allocator (SPA), which organizes physical disks into storage pools; the middle is the Data Management Unit (DMU), which reads and writes transactionally, in an atomic manner; and the highest is the dataset layer, which translates operations on the file systems and block devices (zvols) provided by the pool into operations on the DMU.

In practice each host system will have one or more pools bound to it, and each pool will contain one or more datasets (file systems). Pools are managed with the zpool command, and datasets are managed with the zfs command.
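A quick way to see what is configured on a system is to list the pools and the datasets they contain:

$ zpool list
$ zfs list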

ZFS POOLS

Disks are organized into the basic unit of storage, a pool, by the SPA as a tree of virtual devices (vdevs). Each vdev can consist of one or more other vdevs. The types of vdevs are:

  • mirror: an n-way mirror, consisting of other vdevs
  • raidz (or raidz1): single-parity, RAID-5 like, consisting of other vdevs
  • raidz2: double-parity, RAID-6 like, consisting of other vdevs
  • raidz3: triple-parity, consisting of other vdevs
  • disk: a physical disk drive, preferably a whole drive rather than a slice
  • file: a file-backed vdev (not recommended)
  • spare: a special vdev kept on stand-by to replace e.g. a failed disk
  • cache: a special vdev extending the non-persistent in-memory read cache, typically a flash disk
  • log: a special vdev for separate storage of intent log records, to increase speed, typically a flash disk

Pools can be moved between systems by performing an export on the original host and an import on the new host. If needed, a pool import can be forced on the new host without a prior export on the old system. This is possible because the pool configuration is also stored in a label on each member disk.
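A minimal sketch of the operation, using the same <poolname> placeholder as elsewhere in this article:

$ sudo zpool export <poolname>     # on the original host
$ sudo zpool import <poolname>     # on the new host
$ sudo zpool import -f <poolname>  # force the import if no export was done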

On FreeBSD hosts, the list of pools is cached in the file /boot/zfs/zpool.cache and can be displayed with the zdb command. Each pool has a name and a vdev_tree property with details such as the path to each physical drive under /dev, /dev/diskid/, or /dev/gpt/. You can obtain a compact view using:

$ zpool list -v

Additional disks or vdevs can be added to a pool to increase capacity, attached to an existing disk to increase redundancy, replaced to swap a bad drive for a new one, taken offline, removed, or split off from the pool.
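For illustration, the typical forms of these commands are (placeholders as before):

$ sudo zpool attach <poolname> <existing-disk> <new-disk>
$ sudo zpool replace <poolname> <old-disk> <new-disk>
$ sudo zpool offline <poolname> <disk>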

Finally, we must mention the key concept of scrubbing ZFS pools: reading back all data to verify checksums and to detect and correct any inconsistencies among the disks of a redundant set. Regular scrubbing must be scheduled and set up on your system.
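A scrub can also be started by hand, and its progress checked afterwards:

$ sudo zpool scrub <poolname>
$ zpool status <poolname>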

ZFS DATASETS

Within a pool we can have second-level units of storage called datasets, which can be either mountable file systems or, less commonly, block devices (zvols); on FreeBSD a zvol appears as a device under /dev/zvol/<poolname>/. There is always a root dataset, named after the pool, which is mountable as soon as the pool is created.
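As a sketch, a 10 GB zvol (with the hypothetical name <volname>) would be created with:

$ sudo zfs create -V 10G <poolname>/<volname>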

There are powerful options that can be applied to datasets, such as encryption and deduplication. The latter reduces storage space requirements by identifying blocks of identical data shared by multiple files and keeping only one physical copy of each block (or, for more important data, a small fixed number of copies).

Maintenance operations for datasets include creating snapshots, making backups, removing older snapshots, and using advanced ZFS capabilities for moving data. This is a topic to be discussed later, when we cover the management of backups, which is another big subject in itself.

With this overview in mind, the documentation should make much more sense on a first pass.

The Z File System (Handbook Chapter 12)

BASIC ZFS SETUP

We recommend using a pool with a 2-way, or better a 3-way, mirrored set of whole drives and, if availability is important, an additional spare drive on hot stand-by. We use one 3-way mirror pool for the OS itself and another 3-way mirror pool for data storage.

[ ] Attach new drives and identify drive IDs corresponding to their serial numbers
[ ] Create or add to a pool
[ ] Create and configure datasets
[ ] Schedule regular scrubbing

After attaching a new drive and rebooting your server, the drive should show up in /dev/ and in /dev/diskid/. Entries for the same disk take forms like /dev/ada1 or /dev/diskid/DISK-WD-WCC123456789. To see all your drives you can inspect the dmesg.boot file for messages related to storage:

$ sudo more /var/run/dmesg.boot

For a more compact but still complete and sufficient list of installed drives (if all is working correctly) use:

$ sudo camcontrol devlist

These methods list even bare drives without a partition table or label. To add a new drive to a pool there is no need for partitioning. To create a new pool with three mirrored bare drives use:

$ sudo zpool create <poolname> mirror <vdev1> <vdev2> <vdev3>

with each <vdev> replaced by an entry from /dev; there is no need to enter the full path.
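For example, assuming the three new drives showed up as ada1, ada2, and ada3, and choosing the (hypothetical) pool name tank:

$ sudo zpool create tank mirror ada1 ada2 ada3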

Once a new drive has been added to a pool by device name, for example /dev/ada1, the entry corresponding to the same drive in /dev/diskid/ will disappear after a reboot.

Compression

To enable compression (note that this will not compress files written before the change) and select lz4 as the compression algorithm:

$ sudo zfs set compression=lz4 <poolname>
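To verify the setting and later check the achieved compression ratio, you can use:

$ zfs get compression,compressratio <poolname>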

Sector size

You can check the logical and physical block (sector) sizes a drive reports. On Linux this is done with blockdev --getss --getpbsz /dev/sda; on FreeBSD use diskinfo, for /dev/ada1 for instance:

$ sudo diskinfo -v /dev/ada1

Many 4K-sector ("Advanced Format") drives report a logical block size of 512 bytes rather than 4K. In such a case make sure the pool is created with ashift=12 (see the sketch below), otherwise you may end up with mediocre performance. You can find the ashift value of existing pools with:

$ zdb -C
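On FreeBSD, one way to ensure newly created pools use 4K alignment is to raise the minimum automatic ashift before creating the pool; a minimal sketch, using the stock FreeBSD tunable:

$ sudo sysctl vfs.zfs.min_auto_ashift=12
$ sudo zpool create <poolname> mirror <vdev1> <vdev2> <vdev3>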

Export Import Trick

To have the pool refer to drives by diskid instead of by device name, you can use the export/import trick: export the pool, reboot, and import it by diskid like this:

$ sudo zpool export <poolname>
$ sudo reboot
$ sudo zpool import -d /dev/diskid <poolname>

If some datasets are mounted, you can force unmount and export by using the -f option with export.

Creating and Mounting Datasets

Once the pool exists, you can create datasets and set mount points for them:

$ sudo zfs create <poolname>/<datasetname>
$ sudo zfs create -o mountpoint=<mount-point> <poolname>/<datasetname>

The -o switch allows setting options at creation time. Alternatively, set the mount point afterwards:

$ sudo zfs set mountpoint=<mount-point> <poolname>/<datasetname>

The set and get subcommands can be used to change or display properties in general:

$ sudo zfs set atime=off <poolname>/<datasetname>
$ sudo zfs set copies=<numberofcopies> <poolname>/<datasetname>

Disabling the updating of access time stamps (atime) avoids a disk write on every file read, which reduces unnecessary I/O.

For extremely important and valuable data you can set the number of copies to 2 or even 3. This doubles or triples the space usage, but it adds extra redundancy on top of, for instance, disk mirroring.
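To read properties back, zfs get works the same way, for example:

$ zfs get atime,copies,mountpoint <poolname>/<datasetname>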

ZFS supports data deduplication, which is useful when you hold two or more identical copies of certain files. This can save huge amounts of disk space. However, it only applies to data written after the feature is enabled, and it requires lots of RAM, so use it sparingly or be prepared to upgrade your RAM.
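As a sketch, deduplication is enabled per dataset, and the pool-wide dedup ratio can then be checked with zpool list:

$ sudo zfs set dedup=on <poolname>/<datasetname>
$ zpool list <poolname>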

Other features you may be interested in are encryption and the primarycache property. For more see here.
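For example, the primarycache property controls what the in-memory cache (ARC) holds for a dataset; restricting it to metadata can make sense for data that is read only once (a sketch):

$ sudo zfs set primarycache=metadata <poolname>/<datasetname>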

Scrubbing

To maintain data integrity (and assuming you have ECC RAM), scrubbing your pools periodically is a good idea. We can create a cron entry to do it monthly, for instance. This example starts a job at 2:00 AM on the fifth of each month:

$ sudo crontab -l
SHELL=/bin/sh
PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin
# m h mday month wday command
0 2 5 * * /sbin/zpool scrub mypoolname

You can edit crontab and add your own jobs with:

$ sudo crontab -e
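Alternatively, FreeBSD's periodic(8) framework includes a daily ZFS scrub script; enabling it in /etc/periodic.conf avoids a hand-written cron entry (a sketch; the threshold sets the number of days between scrubs of a pool):

daily_scrub_zfs_enable="YES"
daily_scrub_zfs_default_threshold="30"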

BASIC ZFS MANAGEMENT

For a summary reference of ZFS commands and operations refer to Summary Reference on Managing ZFS or to ZFS documentation linked in it.

Some useful commands to inspect your storage system:  df, mount, zfs list -t all, zpool status, zpool history.

After creating your storage system, and every time you make a significant change, document it. The zpool history command is handy for reminding yourself of what was done.

For more articles on ZFS see here.

REFERENCES

About FreeBSD (freebsd)
All about ZFS (freebsd)
Oracle Solaris ZFS Data Management (Oracle; pdf)
Solaris 11.3 ZFS (Oracle)
System Administration (OpenZFS)
Hardware (OpenZFS)
ZFS Deduplication (Oracle)
To dedupe or not to dedupe

Disclaimer: We are not responsible for any data loss that results from reading our article. We stress that you must confirm the accuracy of the information in this article yourself and test your server thoroughly before placing any critical data on it.
updated: 20170128; 20170129; 20170130; 20170201; 20170309; 20170705; 20170708; 20170723
photo: CC0 Stocksnap.io Piotr Lohunko
