Keys to building really scalable storage
- Published: Thursday, 20 September 2018 07:21
Quobyte offers tips for building and managing high-capacity infrastructures capable of handling billions of files and hundreds of petabytes.
Hyperscale data centers need enormous capacities for high-performance computing, artificial intelligence, big data analytics, containerized infrastructures, and other demanding modern workloads. Environments that deliver 24/7 applications across billions of files and hundreds of petabytes all require large, complex, future-capable storage installations.
Quobyte Inc., a developer of modern storage system software, describes the following characteristics of massive storage infrastructures that maintain performance and manageability under demanding workloads. Companies of all sizes can benefit from these tips for building scalable storage.
Keep it on file
Block- and object-based platforms can’t match the flexibility of file-based systems, especially when performance is a consideration. Block storage, as used in traditional systems, worked well when only a handful of machines shared a common resource, but it performs poorly at scale and becomes too complex to administer. Object storage is attractive for its ability to grow to millions, if not billions, of objects; however, today’s file systems offer this scalability too. While object systems succeed for archival data, they struggle with primary workloads that need high IOPS and low latency, particularly small-file workloads. Some applications cannot interface with the specialized protocol of object storage without a performance-robbing gateway, which is impractical in large environments.
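To make that protocol overhead concrete, the sketch below times the same small-file workload two ways: through a POSIX mount and as individual GETs against an S3-compatible endpoint. The mount point, endpoint URL, and bucket name are illustrative assumptions, not anything from the article, and credentials are expected to come from the environment.

```python
# Hedged sketch: timing many small reads via a POSIX mount vs. an
# S3-compatible object endpoint. All paths and names are placeholders.
import time
from pathlib import Path

import boto3  # pip install boto3

N_FILES = 1000

def read_posix(mount="/mnt/fs/smallfiles"):
    """Small-file reads through the kernel file system client."""
    start = time.perf_counter()
    for i in range(N_FILES):
        Path(f"{mount}/file-{i}.dat").read_bytes()
    return time.perf_counter() - start

def read_s3(endpoint="https://objects.example.com", bucket="smallfiles"):
    """The same reads issued as individual HTTP GETs to an object store."""
    s3 = boto3.client("s3", endpoint_url=endpoint)
    start = time.perf_counter()
    for i in range(N_FILES):
        s3.get_object(Bucket=bucket, Key=f"file-{i}.dat")["Body"].read()
    return time.perf_counter() - start

# Each GET pays HTTP and request-signing overhead per object, which is
# why small-file, high-IOPS workloads tend to favor file semantics.
```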
United we scale
Conventional methods of scaling create storage server sprawl and silos, hindering access to resources and data by users and applications. In contrast, a ‘unified’ approach consolidates heterogeneous communication protocols into a single pool where data is accessible from and between Linux, Windows, and Mac systems via NFS, SMB, or S3. Unified storage platforms serve legacy applications as well as new ones, and are effective for environments with both traditional and modern workloads. For example, a Windows user can edit a large file at the same time a Mac user is reading it, without copying or moving it to another system, and users can easily share a data set across the globe via S3.
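As a rough illustration of what ‘unified’ means in practice, the following sketch reads the same data set once through a native file mount and once over S3. The mount point, endpoint, bucket, and key are hypothetical, not tied to any specific product.

```python
# Hedged sketch: one data set, two access paths on a unified platform.
# The mount point, endpoint, bucket, and key below are placeholders.
from pathlib import Path

import boto3

# A Linux, Mac, or Windows client sees the file through its native mount:
text = Path("/mnt/unified/projects/report.csv").read_text()

# A collaborator elsewhere fetches the very same object over S3,
# with no copy or export step in between:
s3 = boto3.client("s3", endpoint_url="https://storage.example.com")
remote = s3.get_object(Bucket="projects", Key="report.csv")["Body"].read()

assert remote.decode() == text  # one namespace, two protocols
```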
Open up
Open-source, open-computing ecosystems are the preferred choice of hyperscale environments because they offer innovation, affordability, and easy integration. The largest data centers in the world use OpenStack to manage compute, storage, and networking economically, so storage platforms should be fully functional with OpenStack, and vice versa. The storage should also support important open-source projects, interfaces, and components such as Cinder for incorporating block-based devices, Manila for shared file systems, Glance for images, and Keystone for authentication.
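As a minimal sketch of that integration, assuming the openstacksdk client library and placeholder credentials, the following authenticates against Keystone and provisions a block volume through Cinder:

```python
# Hedged sketch using openstacksdk: Keystone handles authentication,
# Cinder provisions the block volume. All credentials and names are
# placeholders for a real deployment's values.
import openstack  # pip install openstacksdk

conn = openstack.connect(
    auth_url="https://keystone.example.com:5000/v3",  # Keystone endpoint
    project_name="storage-demo",
    username="demo",
    password="secret",
    user_domain_name="Default",
    project_domain_name="Default",
)

# Create a 100 GB volume on the platform's Cinder backend and wait
# until it becomes available.
volume = conn.create_volume(size=100, name="demo-volume", wait=True)
print(volume.id, volume.status)
```

A storage platform that plugs in at this layer lets the same tooling that manages compute and networking also carve out block devices, shares, and image stores without vendor-specific workflows.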
No hidden SPOFs
Fault tolerance is a must in a large infrastructure with vast numbers of hardware and software systems. In addition to resiliency and redundancy with no single points of failure (SPOFs), it must be extremely simple to locate and swap out any failed component: broken or misbehaving switches, faulty NICs, even bad network cables that cause packet loss or corruption. Hidden SPOFs also include partial outages that cause ‘split-brain’ scenarios, where it is unclear which version of a data set is correct and up to date. Preventing hidden SPOFs usually requires software that performs verification and consistency checks as part of its data-protection features and automatically handles disk and node failures.
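One common technique for avoiding split-brain, shown below as a general illustration rather than any particular vendor’s implementation, is majority quorum: a replica version only counts as current when more than half of the replicas agree on it, so a partitioned minority can never serve stale data as authoritative.

```python
# Hedged sketch of majority-quorum reads, one standard way to avoid
# split-brain: data is only trusted when a majority of replicas agree.
from collections import Counter

REPLICAS = 3
QUORUM = REPLICAS // 2 + 1  # 2 of 3 must agree

def read_with_quorum(replica_versions):
    """replica_versions: version numbers reported by reachable replicas.

    Returns the agreed version, or raises if no majority exists,
    e.g. during a network partition that splits the replica set."""
    votes = Counter(replica_versions)
    version, count = votes.most_common(1)[0]
    if count < QUORUM:
        raise RuntimeError("no quorum: refusing possibly stale data")
    return version

print(read_with_quorum([7, 7, 6]))  # -> 7 (two of three replicas agree)
# read_with_quorum([7, 6])          # partition: raises instead of guessing
```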
Keep it low-touch
IT staff time is a costly resource in hyperscale infrastructures. Data loads may double or triple from one year to the next, but staff and budget rarely will. High-capacity storage must be ‘low-touch’ to keep management and maintenance to a minimum. Highly automated systems with self-monitoring and self-healing features unburden the administrators who run large-scale installations, allowing even small teams to manage tens to hundreds of petabytes.
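As a toy illustration of what a self-healing loop does (the Device and Cluster types below are invented stand-ins, not a real product API), the sketch detects a failed disk and re-replicates its files to healthy peers without operator action:

```python
# Hedged sketch of a self-healing control loop: detect a failed disk,
# re-replicate its data to healthy devices, and drain it so the
# hardware can be swapped at leisure. Types here are illustrative only.
from dataclasses import dataclass

@dataclass
class Device:
    id: str
    healthy: bool
    files: list

class Cluster:
    def __init__(self, devices):
        self.devices = devices

    def heal_once(self):
        """One pass of the repair loop, as run periodically by automation."""
        for dev in self.devices:
            if dev.healthy:
                continue
            print(f"{dev.id}: failed, re-replicating {len(dev.files)} files")
            targets = [d for d in self.devices if d.healthy]
            for i, f in enumerate(dev.files):
                targets[i % len(targets)].files.append(f)  # restore redundancy
            dev.files.clear()  # drained; safe to swap the hardware out

cluster = Cluster([
    Device("disk-a", True, ["f1"]),
    Device("disk-b", False, ["f2", "f3"]),  # simulated failure
    Device("disk-c", True, []),
])
cluster.heal_once()
```

The point of automation at this level is that a failed disk becomes a routine background repair rather than a page to an on-call engineer, which is what lets a small team keep pace with petabyte-scale growth.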