Several years ago we embarked on a journey to build a file system optimized for the deployment and scale of hyperconvergence software. In this two-part series, we’ll reveal technical aspects of the Springpath Hardware Agnostic Log-structured Objects (HALO) File System (FS). If you’re interested in learning why a Log-Structured FS (LSFS) is the right approach for new generation file systems, we invite you to read the articles listed at the end of this post [1, 2].
Understanding Log-Structured File Systems
An LSFS treats its backing devices (e.g., HDD, flash or NVRAM) as a logical circular buffer. It writes new content to the head of the buffer, and old content stays in the log, where it remains available for snapshotting until it is reclaimed. This append-only architecture, with immutability on write, is valuable on multiple dimensions. Each file or set of blocks usually has a single writer, which eliminates contention and simplifies correctness semantics. Immutable content allows seamless replication, enabling distributed reliability. And the underlying hardware (SSD and/or HDD) performs best with large sequential writes.
The FS therefore needs to understand the physical properties of each device before it uses it. For instance, each block on an SSD can be written only a limited number of times before it becomes unreliable. An FS designed around this limitation writes large immutable chunks, which increases the lifespan of the SSD. Note that random reads from the SSD carry no such limitation. Similarly, the HDD provides excellent throughput on large sequential writes. Through careful tiering of the HDD and SSD, a file system can be built to optimally service varied workloads.
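The append-only idea can be illustrated with a toy in-memory log. This is only a minimal sketch, not HALO-FS code: the class `AppendOnlyLog` and its methods are hypothetical names, and a Python list stands in for the circular buffer on a real device.

```python
class AppendOnlyLog:
    """Toy log: every write appends a new immutable record at the head."""

    def __init__(self):
        self._records = []  # the log itself; entries are never modified in place
        self._index = {}    # name -> position of the newest record for that name

    def write(self, name, data):
        # An overwrite is just another append; the older record becomes dead.
        self._records.append((name, bytes(data)))
        self._index[name] = len(self._records) - 1
        return self._index[name]

    def read(self, name):
        # Reads follow the index to the newest record for the name.
        return self._records[self._index[name]][1]

    def dead_positions(self):
        # Positions no longer referenced by the index are candidates for reclamation.
        live = set(self._index.values())
        return [pos for pos in range(len(self._records)) if pos not in live]


log = AppendOnlyLog()
log.write("a", b"v1")
log.write("b", b"x")
log.write("a", b"v2")        # supersedes position 0 without touching it
print(log.read("a"))         # b'v2'
print(log.dead_positions())  # [0]
```

Because nothing is updated in place, the device only ever sees sequential appends, and the superseded record at position 0 stays readable until a cleaner reclaims it.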
With these qualities in mind, it becomes important to preserve this log-structured, immutable property from the ground up, across all layers. Otherwise, we lose the efficiencies provided by the hardware. The HALO-FS node is composed of four layers (as shown in the figure below): Block, Distributed Key Value Store (DKVS), File System (FS) and Protocol.
The block layer manages the raw storage devices, such as SSDs and HDDs. It divides them into addressable segments and provides the upper layer with a crash-consistent, log-structured block device to write to. It expects the higher layers to clean up dead objects and segments.
Effective Data Placement with DKVS
DKVS is built on top of the block layer and reliably and uniformly distributes key-value objects across multiple nodes. The FS generates the keys from the data using a standard cryptographic hash function, such as SHA-1. As DKVS receives key-value pairs from the FS layer, it combines them into segment-sized chunks and replicates them to one or more nodes. These segments are immutable, and checksums are generated for both the individual key-value pairs and the segment as a whole. This segment- and key-level data integrity allows the HALO-FS to replicate these segments across large distances without fear of data loss.
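The key generation and two-level checksumming described above can be sketched as follows. This is an illustration under assumptions, not the DKVS implementation: SHA-1 comes from the text, but the use of CRC32 for checksums, the segment layout, and the function names `make_key`, `build_segment` and `verify_segment` are all hypothetical.

```python
import hashlib
import zlib


def make_key(data: bytes) -> str:
    # Content-derived key: identical data always yields the same key.
    return hashlib.sha1(data).hexdigest()


def _payload(entries) -> bytes:
    # Serialized form of the segment used for the segment-level checksum.
    return b"".join(key.encode() + data for key, data, _ in entries)


def build_segment(blocks):
    # Each entry carries its own checksum (CRC32 is an assumption here).
    entries = tuple((make_key(d), d, zlib.crc32(d)) for d in blocks)
    return {"entries": entries, "checksum": zlib.crc32(_payload(entries))}


def verify_segment(segment) -> bool:
    entries = segment["entries"]
    for key, data, crc in entries:
        # Key-level check: the data must still hash to its key and checksum.
        if make_key(data) != key or zlib.crc32(data) != crc:
            return False
    # Segment-level check: the segment as a whole must be intact.
    return zlib.crc32(_payload(entries)) == segment["checksum"]


segment = build_segment([b"block one", b"block two"])
print(verify_segment(segment))  # True
```

A replica that receives a segment can run the same verification before accepting it, so corruption in transit is detected at either the key-value or the segment level.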
To achieve scalability, DKVS allows the HALO-FS to shard the data based on a unique identifier, which we call a vnode number. The file system marks its keys with this vnode number, and DKVS routes them to the appropriate physical node(s) and disk(s). DKVS also provides a mechanism to place data based on attributes such as device type (e.g., SSD or HDD), reliability factors and other data properties. To expose these attributes to the upper layers, DKVS splits the vnode numbers into bands/regions, and different regions can map to different file attributes. Because a single identifier is the only parameter needed to pick these attributes, the HALO-FS simplifies the upper layers' API for sharding, placing and managing data on DKVS.
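To make the band idea concrete, here is a small sketch of vnode-based routing. Everything specific in it is an assumption made for illustration: the band boundaries, the attribute names, the node list, and the helpers `vnode_for`, `attributes_for` and `route` are hypothetical, and the post does not say how DKVS actually assigns vnodes to nodes.

```python
import hashlib

# Hypothetical band layout: each range of vnode numbers carries placement attributes.
BANDS = [
    (range(0, 1024), {"device": "ssd", "replicas": 2}),
    (range(1024, 2048), {"device": "hdd", "replicas": 3}),
]
NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster


def vnode_for(key: str, band: range) -> int:
    # Spread keys uniformly over the vnode numbers of the chosen band.
    digest = hashlib.sha1(key.encode()).digest()
    return band.start + int.from_bytes(digest[:8], "big") % len(band)


def attributes_for(vnode: int) -> dict:
    # The band containing the vnode number determines the placement attributes.
    for band, attrs in BANDS:
        if vnode in band:
            return attrs
    raise ValueError(f"vnode {vnode} is outside every band")


def route(vnode: int):
    # Pick consecutive nodes, one per replica, starting from the vnode's home node.
    attrs = attributes_for(vnode)
    first = vnode % len(NODES)
    nodes = [NODES[(first + i) % len(NODES)] for i in range(attrs["replicas"])]
    return nodes, attrs["device"]


print(route(5))     # (['node-b', 'node-c'], 'ssd')
print(route(1030))  # (['node-c', 'node-d', 'node-a'], 'hdd')
```

The caller only chooses a band when it derives the vnode number; device type and replica count then follow from that single identifier, which is the simplification the paragraph above describes.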
We now understand how building the file system layer on top of DKVS allows HALO-FS to optimally utilize both flash (SSD) and disk (HDD) resources in a hyperconverged environment. In the next post we will explain how HALO-FS achieves optimal, consistent and reliable performance for common enterprise workloads.
- [1] LFS, http://highscalability.com/blog/2015/5/4/elements-of-scale-composing-and-scaling-data-platforms.html
- [2] Immutability, http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
Figure: Single-node view of the HALO file system