Today’s hyperconvergence addresses utilization and management issues by bringing DAS (Direct Attached Storage) into compute servers, offering key advantages over traditional NAS or SAN. Distributed storage software runs on all converged nodes and manages DAS devices across the cluster to provide a unified pool of resources. This unification is made possible by high-density CPUs and storage devices (e.g., flash and hard disk drives), which allow the distributed storage controller software to manage these devices while still leaving enough memory and CPU cores in the servers to run applications, containers and virtual machines (VMs). Management complexity is thereby significantly reduced, because there is only one converged infrastructure to manage.
A typical hyperconverged system consists of a homogeneous set of virtualized servers, each of which is capable of hosting users’ VMs and/or containers. These VMs access storage via distributed storage software that manages SSDs, HDDs and other storage devices attached to each of the servers. The storage software provides all traditional networked storage functionalities like global namespace, replication, data availability, etc., allowing the VMs to access storage from any hyperconverged node, migrate from one node to another and remain available in the event of failure. As enterprise requirements change, IT admins can add or remove one or more of these hyperconverged nodes to or from the existing cluster to meet fluctuating demands.
This cookie-cutter approach that many hyperconvergence vendors take to scaling cluster capacity is fundamentally flawed. It may work for some use cases, but it comes at a cost that deserves more attention. Most real-life enterprise applications don’t consume compute and storage proportionally. Some typically require more compute (e.g., Virtual Desktop Infrastructure), while others require more storage. In addition to data storage capacity, applications have differing storage IOPS (Input/Output Operations Per Second) requirements, which may not be proportionate to their capacity usage. There are other dimensions to scale out, but compute, storage IOPS and data storage capacity drive the majority of the scale-out decisions that IT admins have to make. With a cookie-cutter approach, every new node adds all three resources in equal proportion, which often leaves some of them under-utilized and raises the overall cost.
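The under-utilization argument can be made concrete with a little arithmetic. The sketch below is purely illustrative: the per-node resource figures and the VDI-like demand profile are hypothetical numbers, not Springpath sizing data. It shows how a uniform node forces the cluster to be sized by its scarcest dimension, stranding the other two.

```python
# Hypothetical illustration: a uniform ("cookie-cutter") node adds fixed
# amounts of compute, IOPS and capacity, so a workload that is heavy in
# one dimension forces over-provisioning of the others.

NODE = {"cpu_cores": 32, "iops": 50_000, "capacity_tb": 20}  # per added node

def nodes_needed(demand):
    """Smallest node count that satisfies every resource dimension."""
    return max(-(-demand[k] // NODE[k]) for k in NODE)  # ceiling division

def utilization(demand):
    """Fraction of each purchased resource the demand actually uses."""
    n = nodes_needed(demand)
    return {k: demand[k] / (n * NODE[k]) for k in NODE}

# A VDI-like, compute-heavy demand: many cores, modest storage needs.
vdi = {"cpu_cores": 256, "iops": 60_000, "capacity_tb": 30}
print(nodes_needed(vdi))   # 8 nodes are needed just to cover the CPU demand
print(utilization(vdi))    # the purchased IOPS and capacity sit mostly idle
```

Here eight identical nodes are bought to satisfy the CPU requirement, yet only 15% of the cluster’s IOPS and under 19% of its capacity are used.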
Three dimensions of scale-out hyperconvergence:
Flexible Scale-out
In talking with IT admins, we learned of their need to scale compute, storage IOPS and capacity independently to meet the ever-changing demands of their enterprise applications. To address this need, we designed the HALO architecture, which abstracts out and separates three common distributed storage functions: the data access layer, the distributed caching tier and the distributed capacity tier.
A separate data access layer provides the standard storage access interfaces (e.g., file, block) and routes application IOs or user-VM IOs from the hypervisor to the appropriate node(s) in the caching tier. The caching tier is composed of direct-attached, high-performance flash devices spread across the cluster nodes, providing distributed, fault-tolerant, consistent storage with high random IO performance. The capacity tier is composed of high-density storage devices that provide longer-term fault-tolerant, consistent storage. Data moves automatically between the tiers based on access patterns, ensuring that ‘hot’ data is available in the caching tier.
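The promote-on-access idea behind the tier movement can be sketched in a few lines. This is a minimal, hypothetical model of access-pattern-driven tiering; the class names, the single hot-access threshold, and the write path are assumptions for illustration, not the HALO implementation.

```python
# Hypothetical sketch of access-pattern-driven tiering: frequently read
# blocks are promoted to the flash caching tier, while cold blocks stay
# in the HDD capacity tier. Thresholds and structures are illustrative
# only, not Springpath's actual HALO algorithm.

from collections import Counter

HOT_THRESHOLD = 3  # promote a block after this many reads

class TieredStore:
    def __init__(self):
        self.cache = {}          # flash tier:  block_id -> data
        self.capacity = {}       # HDD tier:    block_id -> data
        self.accesses = Counter()

    def write(self, block_id, data):
        self.capacity[block_id] = data   # land new writes in the capacity tier

    def read(self, block_id):
        self.accesses[block_id] += 1
        if block_id in self.cache:
            return self.cache[block_id]               # fast path from flash
        data = self.capacity[block_id]
        if self.accesses[block_id] >= HOT_THRESHOLD:  # the block became 'hot'
            self.cache[block_id] = data               # promote to flash tier
        return data

store = TieredStore()
store.write("b1", b"payload")
for _ in range(3):
    store.read("b1")
print("b1" in store.cache)   # True: promoted after repeated reads
```

A production tier manager would also demote cold blocks under flash pressure and track access recency, not just counts; the sketch keeps only the promotion half of that loop.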
Because of this separation, each cluster node can be configured independently to provide the services of one or more of these layers while still contributing to a common global resource pool for efficient utilization of resources. The Springpath data platform automatically discovers the available resources (flash, HDDs, etc.) and configures each cluster node for compute, caching, capacity or a combination thereof. This lets IT administrators scale out their hyperconverged infrastructure based on their specific requirements and avoid the unnecessary cost of adding redundant resources.
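A device-driven role assignment of this kind might look like the following sketch. The rules and device fields here are hypothetical stand-ins, assumed for illustration; the actual discovery logic of the Springpath platform is not shown in this post.

```python
# Hypothetical sketch of role assignment from discovered devices: a node
# with flash can serve the caching tier, one with high-density disks the
# capacity tier, and one with spare cores the compute role. Field names
# and rules are illustrative, not the Springpath discovery logic.

def assign_roles(node):
    roles = set()
    if node.get("cpu_cores", 0) > 0:
        roles.add("compute")
    if node.get("flash_devices", 0) > 0:
        roles.add("caching")
    if node.get("hdd_devices", 0) > 0:
        roles.add("capacity")
    return roles

cluster = [
    {"name": "node1", "cpu_cores": 32, "flash_devices": 2, "hdd_devices": 8},
    {"name": "node2", "cpu_cores": 48, "flash_devices": 0, "hdd_devices": 0},
    {"name": "node3", "cpu_cores": 8,  "flash_devices": 0, "hdd_devices": 12},
]
for n in cluster:
    print(n["name"], sorted(assign_roles(n)))
# node1 serves all three tiers; node2 is compute-only; node3 mainly
# contributes capacity alongside its few cores.
```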
Scale-out Modes via Springpath’s Data Platform
Initial three-node Springpath hyperconverged cluster configuration (example):
Scale out compute, storage IOPS and capacity together (traditional hyperconvergence approach):
Scale out compute only:
Scale out compute and storage IOPS:
Scale out storage capacity only:
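The four modes above can be sketched as different node configurations added to the same starting cluster. All resource figures below are hypothetical examples chosen for illustration, not Springpath sizing data; the point is that each mode grows only the dimensions the workload actually needs.

```python
# Hypothetical arithmetic for the four scale-out modes: starting from a
# three-node cluster, each mode adds a differently configured node.
# Resource figures are illustrative, not Springpath sizing data.

BASE_NODE = {"cpu_cores": 32, "iops": 50_000, "capacity_tb": 20}

def totals(nodes):
    """Aggregate each resource dimension across the cluster."""
    return {k: sum(n.get(k, 0) for n in nodes) for k in BASE_NODE}

cluster = [dict(BASE_NODE) for _ in range(3)]   # initial three nodes

modes = {
    "all three (traditional)": dict(BASE_NODE),
    "compute only":            {"cpu_cores": 32},
    "compute + storage IOPS":  {"cpu_cores": 32, "iops": 50_000},
    "storage capacity only":   {"capacity_tb": 40},
}
for mode, new_node in modes.items():
    print(mode, totals(cluster + [new_node]))
```

Only the traditional mode grows all three totals at once; the other three leave the unneeded dimensions, and their cost, untouched.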
This flexible scale-out approach is just one of the ways Springpath accommodates the specific needs of enterprises. In our previous blog, Flexibility of a 100% Software Data Platform, we described the benefits of a hardware-agnostic software solution.
In upcoming blogs we’ll discuss the key architectural decisions that shaped our platform.