Clustered storage is often associated with high-performance computing, where the primary focus is on access to large files or data sets. Consequently, there is a perception that clustered storage systems can only support I/O operations for specific vertical markets.
Another myth is that clustered
A clustered storage system scales to meet growth demands or high availability, including business continuance and disaster recovery requirements, beyond the capabilities of a single storage system.
In a clustered NAS, a clustered file system can span all active nodes, with any node able to handle file requests concurrently. Some clustered file systems support parallel access as well as concurrent access: multiple nodes can work together to handle a large parallel file access, multiple nodes can each respond to different file requests, or the two can be combined.
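The difference between concurrent and parallel access can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Request`, `assign_concurrent` and `assign_parallel` are hypothetical, and round-robin placement with a fixed stripe size is an assumed policy, not how any particular product distributes work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A hypothetical file request: which file, and how large it is."""
    file_name: str
    size_bytes: int


def assign_concurrent(requests, nodes):
    """Concurrent access: each whole-file request is served by one node.

    Requests are handed out round-robin (an assumed policy).
    """
    plan = {node: [] for node in nodes}
    for i, req in enumerate(requests):
        plan[nodes[i % len(nodes)]].append(req.file_name)
    return plan


def assign_parallel(request, nodes, stripe_bytes):
    """Parallel access: one large file is split into stripes across nodes.

    Returns, per node, the (offset, length) ranges that node would serve.
    """
    if stripe_bytes <= 0:
        raise ValueError("stripe_bytes must be positive")
    plan = {node: [] for node in nodes}
    offset = 0
    index = 0
    while offset < request.size_bytes:
        length = min(stripe_bytes, request.size_bytes - offset)
        plan[nodes[index % len(nodes)]].append((offset, length))
        offset += length
        index += 1
    return plan


if __name__ == "__main__":
    nodes = ["node1", "node2"]
    small = [Request("a.txt", 10), Request("b.txt", 20), Request("c.txt", 30)]
    print(assign_concurrent(small, nodes))
    print(assign_parallel(Request("big.dat", 10), nodes, stripe_bytes=4))
```

In the concurrent case every node answers different requests independently. In the parallel case all nodes cooperate on one file, which is where large sequential workloads tend to benefit.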
Beyond performance, clustered storage also scales in connectivity and capacity. Some solutions can vary the number and type of network and storage attachment interfaces, along with the amount and type of tiered storage devices. Scalable clustered NAS and file serving solutions may also let you vary the amount of main memory and the number of processing cores or CPUs, and some can make use of existing or new off-the-shelf servers.
Clustered NAS file serving solutions include vendor-proprietary hardware with integrated software, like those from Isilon and Panasas. The software runs on the vendor's proprietary hardware, such as processors or servers, and proprietary storage packaging and enclosures. Software-based solutions such as HP Polyserve, IBRIX Fusion and Sun Lustre can be packaged by vendors or solution providers using various off-the-shelf processors
An advantage of the software-based model is that vendors can provide technology and economic benefits by using commodity components to reduce cost, boost performance and tailor storage to meet various needs.
When considering clustered NAS file serving solutions, check with the vendor to determine whether its offerings scale across performance, capacity, availability and data management. When evaluating a solution, consider whether it provides the following:
• Local and remote replication, snapshots and load-balancing
• Application integration and tiered storage and access support
• File system size, number of files and size of files in addition to total storage capacity
• Clustered file system or clustered nodes and parallel access or concurrent access
• Optimization for low cost bulk storage or high performance storage
• Support for large and small random I/Os, including metadata lookups
Keep in mind that more nodes, processors, ports and devices do not guarantee more performance. How the solution's software is architected to use the hardware resources and avoid bottlenecks while providing transparent data access directly affects performance.
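A very simplified way to see why extra nodes may not help is to model aggregate throughput as the smaller of two numbers: what the nodes could deliver together, and what some shared resource allows. The function below is a minimal sketch under that assumption. The name `aggregate_throughput`, its parameters and the figures in the demo are hypothetical, and real systems usually have several interacting limits rather than one.

```python
def aggregate_throughput(node_count, per_node_mb_s, shared_limit_mb_s):
    """Return deliverable MB/s when node capacity is capped by one shared limit.

    shared_limit_mb_s stands in for any shared bottleneck (an assumption),
    such as a network uplink or a metadata service.
    """
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    return min(node_count * per_node_mb_s, shared_limit_mb_s)


if __name__ == "__main__":
    # Illustrative numbers only: 100 MB/s per node, 1,000 MB/s shared limit.
    for nodes in (2, 4, 8, 16, 32):
        print(nodes, aggregate_throughput(nodes, 100, 1000))
```

With these illustrative numbers, throughput grows until ten nodes and then stays flat, no matter how many more nodes are added.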
For performance-sensitive applications, you should verify vendor claims using published industry-standard benchmarks that simulate your own applications and workload levels.
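Alongside published benchmarks, a rough feel for how a workload mix behaves can come from timing the two access patterns that matter most for file serving: large sequential reads and small random reads. The sketch below is not a benchmark and does not replace one. The helper names `sequential_read`, `random_read` and `timed` are hypothetical, and results on a local file will be distorted by operating system caching.

```python
import os
import random
import time


def sequential_read(path, block_bytes):
    """Read the whole file front to back in large blocks; return bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_bytes)
            if not chunk:
                break
            total += len(chunk)
    return total


def random_read(path, block_bytes, count, seed=0):
    """Read `count` small blocks at random offsets; return bytes read."""
    size = os.path.getsize(path)
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0
    with open(path, "rb") as f:
        for _ in range(count):
            # Keep the offset low enough that a full block can be read.
            f.seek(rng.randrange(0, max(1, size - block_bytes + 1)))
            total += len(f.read(block_bytes))
    return total


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Pointing `timed(sequential_read, path, 1024 * 1024)` and `timed(random_read, path, 4096, 1000)` at a file on the storage under evaluation gives two numbers to compare. Block sizes and counts should reflect your own applications rather than these placeholder values.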
Some products support scalable performance for large sequential reads or writes, including parallel access to large files, or for small random reads and writes across a large number of files. Other products support both small and large I/O access to files, concurrently and in parallel, without introducing performance bottlenecks. To learn more about the myths and realities behind the confusion around clustered NAS and file serving, check out the "Dispelling myths about clustering NAS and file servers" tip.
Greg Schulz is founder and senior analyst with the IT infrastructure analyst and consulting firm StorageIO.
This was first published in May 2008