In the first of this two-part series, we introduced Springpath’s Hardware Agnostic Log-structured Objects (HALO) File System (FS) and discussed effective utilization of SSDs and HDDs. In this post, we will explore how we achieve low latency and handle space reclamation.
Low Latency Read/Writes and Space Saving with HALO-FS
The FS layer translates incoming protocol-layer IO requests (reads/writes) into reads and writes of nodes in a structure called the File Tree (FT). The FT is implemented as a modified Merkle tree: unlike a traditional Merkle tree, where every non-leaf node’s key is a hash of its children’s keys, the key of every node in the FT is a hash of the content stored on the DKVS. A leaf node’s key is the hash of the data content, and a non-leaf node combines its children’s keys with the specific file attributes needed to traverse the tree for protocol-layer requests. The FT uses SHA-1 to generate the digital signature of each block written to DKVS for reliability. Since a Merkle tree validates all its children by generating a digest of their hashes, the HALO-FS continually provides end-to-end data integrity and validation. To improve cluster performance, the HALO-FS cluster instantiates multiple FTs to shard the incoming IO across different nodes and/or storage devices. Based on the performance requirements, these FTs are either read from and written to the SSD or later de-staged to the HDD. The underlying device, and with it the performance characteristics, is chosen by picking the vnode number from the appropriate range.
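As a rough illustration of the content-addressed tree described above, here is a minimal Python sketch. It is not Springpath's implementation: the `KVStore` class (a toy in-memory stand-in for DKVS), the `write_file` and `read_file` helpers, the JSON node encoding, and the two-level tree are all assumptions made for the example. It only shows how keying every node by the SHA-1 of its stored content lets a read verify data end to end.

```python
import hashlib
import json


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


class KVStore:
    """Toy in-memory stand-in for DKVS (hypothetical): key = SHA-1 of content."""

    def __init__(self):
        self.blocks = {}

    def put(self, content: bytes) -> str:
        key = sha1_hex(content)
        self.blocks[key] = content
        return key

    def get(self, key: str) -> bytes:
        content = self.blocks[key]
        # Re-hash on every read so corruption is detected before data is returned.
        if sha1_hex(content) != key:
            raise ValueError(f"integrity check failed for key {key}")
        return content


def write_file(store: KVStore, data: bytes, block_size: int = 4, attrs=None) -> str:
    """Store data as leaf blocks plus one non-leaf node; return the root key."""
    leaf_keys = [
        store.put(data[i:i + block_size]) for i in range(0, len(data), block_size)
    ]
    # The non-leaf node's content is its children's keys plus file attributes;
    # its own key is the hash of that content.
    node = json.dumps({"attrs": attrs or {}, "children": leaf_keys}, sort_keys=True)
    return store.put(node.encode())


def read_file(store: KVStore, root_key: str) -> bytes:
    """Walk from the root, verifying every node and block on the way."""
    node = json.loads(store.get(root_key))
    return b"".join(store.get(key) for key in node["children"])
```

Because keys are derived from content, writing the same data with the same attributes yields the same root key, and any change to a block changes the keys on the path up to the root.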
- Low Latency Writes (Random or Sequential): The HALO-FS packs incoming write IOs sequentially onto the faster SSD tier to guarantee consistently low response times for all writes. Furthermore, the HALO-FS intelligently shards data across multiple nodes, mitigating any hotspots that could cause variance in perceived latency.
- Low Latency Working Set Reads: The HALO-FS intelligently caches read requests on SSD and services most of the read requests directly from the flash tier. It accelerates reads by detecting read-ahead streams and pre-fetching the data into the flash tier. This drastically reduces latency on the read requests coming to the HALO-FS cluster.
- Large Sequential Reads: DKVS shards the data on multiple HDDs to allow parallel fetch of data, enabling a faster sequential read of data. This is beneficial for large batch type user workloads, such as a virus scan or backup.
- Inline Space Saving: Inline deduplication and compression of data before writing to the capacity tier enables the HALO-FS to reduce space consumption on the fly. It also reduces wear and tear of the underlying hardware. Efficient packing by DKVS allows the HALO-FS to compress each key individually, which permits quick, targeted extraction of data on read. The in-memory inline-dedupe algorithm can detect highly dedupable objects within a fixed memory footprint by smartly tagging and tracking such objects. Because index lookups carry no IO penalty, the HALO-FS suffers no write performance penalty while it saves the writes to those objects.
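The inline space saving idea in the last bullet can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the HALO-FS algorithm: the `CapacityTier` class, the `max_tracked` parameter, and the LRU policy for tracking fingerprints are hypothetical choices standing in for the tagging and tracking scheme the post mentions but does not describe. The sketch shows a fixed-size in-memory fingerprint set that skips writes for recently seen content, plus per-object compression so a single key can be extracted without touching its neighbors.

```python
import hashlib
import zlib
from collections import OrderedDict


class CapacityTier:
    """Toy capacity tier with inline dedupe and per-object compression."""

    def __init__(self, max_tracked: int = 1024):
        self.objects = {}            # key -> individually compressed bytes
        self.tracked = OrderedDict()  # bounded in-memory set of hot fingerprints
        self.max_tracked = max_tracked
        self.bytes_written = 0       # physical bytes sent to the tier

    def write(self, content: bytes) -> str:
        key = hashlib.sha1(content).hexdigest()
        if key in self.tracked:
            # In-memory hit: no index IO and no physical write.
            self.tracked.move_to_end(key)
            return key
        compressed = zlib.compress(content)
        self.objects[key] = compressed
        self.bytes_written += len(compressed)
        self.tracked[key] = True
        if len(self.tracked) > self.max_tracked:
            # Assumed LRU eviction keeps the memory footprint fixed.
            self.tracked.popitem(last=False)
        return key

    def read(self, key: str) -> bytes:
        # Each object is compressed on its own, so only this key is decompressed.
        return zlib.decompress(self.objects[key])
```

With a bounded tracker, a fingerprint that has been evicted is simply written again, so the trade-off is some missed dedupe opportunities in exchange for a constant memory cost and no lookup IO on the write path.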
Avoiding Performance Degradation with Space Reclamation
Space reclamation is a very important aspect of any LSFS, as it is easy to consume all the underlying space, which would prevent us from reaping all these great benefits perpetually. In this post I’m focusing on content occupying storage space generated by an immutable LSFS itself. Whenever there is an update to the FT, all the intermediate nodes up to the root of the Merkle tree have to be updated and written out, and the versions they replace become dead content. We eliminate frequent re-writes of the FT by coalescing large writes. The problem is exacerbated when users delete data: the root of the FT is updated by chopping off a single node, yet that one change can invalidate terabytes of data, as that content is no longer reachable.
The algorithm to detect and clean dead key values is fairly simple: Mark (the live objects) and Sweep (clean up the rest). The challenge lies in running this algorithm at scale. Even with sufficient sharding, the FT of a large enterprise system can get very large. To efficiently sweep and reclaim space, the HALO-FS builds space-reclamation efficacy stats during the mark phase for different vnode numbers (shards). Using these stats, the HALO-FS sweeps only the vnode numbers (in parallel, if required) that would yield large space reclamation, carrying forward and compacting their live keys. With an efficient space reclamation algorithm and a smart reclamation policy, the HALO-FS is able to provide an effective storage solution with minimum LSFS space overhead.
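The mark-and-sweep policy described above can be sketched as follows. This is a simplified illustration, not the HALO-FS code: the function names, the `threshold` parameter, and the data layout (a dict mapping each vnode number to its keys and sizes) are assumptions, and the stats here are computed in a separate pass after marking rather than during it.

```python
def mark(roots, children_of):
    """Return the set of keys reachable from the given root keys."""
    live = set()
    stack = list(roots)
    while stack:
        key = stack.pop()
        if key in live:
            continue
        live.add(key)
        stack.extend(children_of.get(key, ()))
    return live


def reclaim_stats(store, live):
    """Per vnode, the fraction of stored bytes held by dead keys."""
    stats = {}
    for vnode, objs in store.items():
        total = sum(objs.values())
        dead = sum(size for key, size in objs.items() if key not in live)
        stats[vnode] = dead / total if total else 0.0
    return stats


def sweep(store, live, stats, threshold=0.5):
    """Compact only vnodes whose dead fraction meets the threshold."""
    reclaimed = 0
    for vnode, ratio in stats.items():
        if ratio < threshold:
            continue  # not worth the IO of rewriting the live keys
        objs = store[vnode]
        reclaimed += sum(size for key, size in objs.items() if key not in live)
        # Carry the live keys forward into the compacted vnode.
        store[vnode] = {key: size for key, size in objs.items() if key in live}
    return reclaimed
```

Skipping vnodes with little dead content leaves some garbage in place, but it avoids rewriting mostly live shards for a small gain, which is the trade-off the reclamation policy is tuning.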
In our next post, we are going to discuss Springpath’s native data services such as snapshots and clones.