Sometimes I just need to actually accomplish something. Even if that something is esoteric and not particularly necessary. That is how I spent the last few weeks contemplating computer file systems.
Unless you’ve been using computers since the early days of the PC, “file system” likely sounds like a foreign word. A file system is the code that keeps track of our data on a computer. Think of it as both the library catalog and the shelving holding the books of our personal collection of documents, photos, movies and whatever else we store on our devices.
In the days when MS-DOS systems ruled the earth, most users had to intentionally interact with these concepts at least to a degree. Keeping our stuff safe is still important, but the time when the actual process needed to be at the forefront of our minds passed long ago.
These days, even if you have to do a clean wipe of a system — Windows, Mac, iOS or Android — the computer can make all the necessary arrangements automatically. Even Linux, where tinkering with drive partitioning remained common for longer, will now happily handle all the details in the background (unless you have a strong preference for a particular configuration).
The last time I remember being somewhat excited about file systems was nearly a decade ago, when Apple announced APFS as a replacement for its aging HFS+ system, which had been continuously retrofitted with new features since the 1980s. But while the new system Apple introduced made a few improvements to using the Mac, its release was almost invisible: existing iPhones and Macs were upgraded during a mid-year point release update, and people were blissfully unaware anything had even changed.
That’s perfectly fine. Just not super exciting.
What changed that for me was a summer project to set up an old server I purchased on eBay to back up the video projects I do in ministry. I loaded it with cheap, refurbished hard disks in what is known as a “RAID,” which combines multiple inexpensive drives into one larger storage “drive.” The variant I used, RAID 5, also protects data through redundancy: if any one drive fails, no data is lost, as it would be if everything were stored on a single device that died.
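The capacity math behind RAID 5 is simple enough to do on one line: one drive's worth of space goes to parity, and the rest is usable. A back-of-the-envelope sketch (the drive count and sizes below are made-up examples, not my actual hardware):

```shell
# RAID 5 usable capacity: (number of drives - 1) x per-drive size,
# because one drive's worth of space holds the parity data.
# Example figures only: five 4 TB drives.
drives=5
size_tb=4
echo "$(( (drives - 1) * size_tb )) TB usable"   # prints "16 TB usable"
```

The same parity that costs one drive of space is what lets the array survive any single drive dying.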
With Debian’s much improved installer, I didn’t plan to think a whole lot more about the arrangement beyond what physical drives I put in the system. But, an idiosyncrasy in the system gave me pause and sent me on a spree of tests concerning the speed of my hard drive configuration.
While the cheap, small solid state drive I’d placed in the system for the operating system was easily transferring data at 500-800 MB/second (in some cases even two or three times more than that), which was fantastic, the hard drives were struggling to hit 15-30 MB/s. Hard drives are slower, but in a RAID 5 configuration they should have been a bit faster — fast enough, at least, that the computer could save things faster than I could send them over my home Wi-Fi.
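For anyone curious how such numbers get measured, a crude sequential-write test with dd is enough to spot this kind of problem (the path below is just an example; conv=fdatasync makes dd flush to disk before reporting, so RAM caching doesn't inflate the figure). Dedicated benchmarking tools like fio give more rigorous numbers, but this is the quick version:

```shell
# Write 256 MiB of zeros and let dd report a sustained write speed.
# Point the output path at the storage you actually want to measure.
dd if=/dev/zero of=/tmp/speedtest.bin bs=1M count=256 conv=fdatasync
rm /tmp/speedtest.bin   # remove the scratch file afterward
```

Run against the drive in question, dd's final line reports the effective MB/s, which is where numbers like my dismal 15-30 MB/s come from.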
If I detailed all the tests I have performed since then, you would think I had gone insane. I became determined to isolate why things were so slow and speed them up. Each step felt like a page-turning mystery novel, because I knew there was a bottleneck to find. Find it and I could glory in speed.
RAID can be controlled by the computer’s software or by a hardware controller. My used server was originally pretty high end, so it had a hardware controller included.
There are philosophical debates on software versus hardware RAID, but with the system’s aged processor, offloading work to a dedicated card sounded the wiser path. The card inside the computer could hold “cache” to speed things up, so I found an upgrade chip for $12 on eBay and tried that. Better, but not best.
Then, I pulled that upgrade out, disabled all the extra features on the controller that provided RAID and just gave the system the five hard drives to do with as it willed. That opened the opportunity to use the ZFS file system, which all but insists on software controlled RAID.
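For the curious, handing the whole drives to ZFS boils down to a single command. This is a sketch, not my exact invocation: the pool name “tank” and the device paths are illustrative (on a real system, stable /dev/disk/by-id/ names are preferable), and the command destroys whatever is already on the disks.

```shell
# Build one RAIDZ pool, "tank," out of five whole disks (example names).
zpool create tank raidz /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Verify the pool assembled with all five drives healthy.
zpool status tank
```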
I had used ZFS for years on a previous server and loved its resiliency. A few years ago, when one disk in a four-disk RAID died, replacing it took little fuss and everything came back online. Using that same mechanism, I later removed the drives one by one to put bigger ones in, and ZFS migrated to them without a single lost file.
ZFS is a storage system, not just a file system. It insists on software RAID because it has its own version, called “RAIDZ” — a tweaked take on RAID — with cache mechanisms meant to boost both reading and writing speeds. Meanwhile, ZFS catalogs the details of every single piece of data it stores, so if a drive quietly loses even a tiny bit, that data can be automatically restored before one even realizes it is missing.
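That integrity checking can also be run on demand through ZFS's “scrub” operation, which walks every stored checksum and repairs mismatches from the redundant parity. A sketch, again assuming a pool named “tank”:

```shell
# Read back everything in the pool, verifying each checksum and
# repairing any silent corruption from the RAIDZ parity.
zpool scrub tank

# Report scrub progress and any errors found or fixed.
zpool status tank
```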
ZFS is really smart and my experience backed that up. But what about my current quest for speed?
There was great rejoicing.
On many counts, ZFS massively outperformed my other tests, moving from 30 MB/s up to 100 or even 300 MB/s. But, a plot twist: on transfers of individual large files — like long video recordings — it still quickly dropped to dismal speeds.
Then I remembered the usual warning about ZFS: it is heavily dependent on RAM for its performance features. The rule of thumb is that ZFS should be given 1 GB of RAM for every terabyte of storage it manages, beyond whatever the system needs for other tasks.
I found another deal on eBay and added 64GB of RAM inexpensively. Bingo.
With that and the slightest tuning of the ZFS configuration, suddenly the system was able to sustain speeds faster than the hard drives were technically capable of by queuing work into memory and then distributing it in an orderly fashion.
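On Linux, the main knob of that kind is the size cap on the ARC, ZFS's in-memory cache, set as a kernel module parameter. A minimal sketch, assuming OpenZFS on Debian and a 16 GiB cap (the right value depends on how much RAM everything else on the machine needs):

```shell
# /etc/modprobe.d/zfs.conf — cap the ARC at 16 GiB, specified in bytes
# (16 x 1024^3 = 17179869184). Takes effect when the module loads at boot.
options zfs zfs_arc_max=17179869184
```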
Given that the primary purpose of the project was to have a good, secure, cheap place to store my backup materials, whether the performance gains I’ve made even matter is debatable. I may never notice the difference in practice.
Yet, when I finally figured out the combination that worked best, I felt a sense of satisfaction at completion.
For me, as I suspect for many of OFB’s readers, much of what I do day in and day out is ongoing and recurs without my ever feeling like I finish it. When working on a project like this one, I’ve realized part of the urgency in succeeding is that I simply want to have accomplished something measurable.
I hope I accomplish much more meaningful things in life, but when a lot of what we accomplish can be seen only in retrospect — if at all — many years later, it’s easy to get discouraged in the moment. Sometimes I need to accomplish something now.
And so I have a ridiculously fast backup drive system and a story to tell about it.
Timothy R. Butler is Editor-in-Chief of Open for Business. He also serves as a pastor at Little Hills Church and FaithTree Christian Fellowship.