Tutorial

Small business storage and midrange storage system buying guide

Storage system innovation over the past two decades has outpaced most other IT sectors and shows no signs of slowing down. This guide provides an overview of the operational and financial requirements organizations place on midrange storage, and examines how current and emerging midrange storage technologies meet or exceed those requirements.

What is midrange storage?


Midrange storage generally refers to storage systems that offer greater performance and reliability than direct-attached storage (DAS), but less than enterprise-class storage. Historically, midrange storage systems have been sold to small- to medium-sized businesses (SMBs) as their main storage and to larger enterprises for branch- or department-level operations. They typically offer features such as deduplication and replication, along with multiple connection options, including iSCSI, REST and Fibre Channel. Performance is typically measured in IOPS and throughput, although measurement methodologies remain somewhat controversial. Common industry benchmarks include SPC-1 and SPC-2 for storage area networks (SANs) or DAS, and SPECsfs for network-attached storage (NAS).

Organizational midrange storage requirements

The general midrange storage requirements can be broken down into six categories: capacity, performance, manageability, data protection, technology refresh and total cost of ownership (see chart below for details).

MIDRANGE STORAGE REQUIREMENTS AND RECOMMENDATIONS

Capacity

The chosen midrange storage system must be able to meet or exceed the organization’s usable capacity requirements within the life of the storage system.

Performance

The key in the selection process is to match application and user performance requirements with the most cost-effective technologies.

Manageability

It is critical to match the midrange storage expertise requirements with organizational capabilities, both now and in the future. A mismatch will lead to configuration errors, administrator frustration and a poor overall experience.

Data protection

When selecting a midrange storage system, pick the product that provides all of the required data protection within the set budget. A layered approach to midrange storage data protection makes the most sense.

Technology refresh

Technology refresh will become an issue at some point. How big an issue depends on the technology selected within the midrange storage system.

Total cost of ownership (TCO)

TCO is a critical aspect of buying midrange storage. When comparing different systems, make sure the TCO comparison is ultimately apples-to-apples, includes all costs and holds no surprises. TCO is measured in capital expenditures (CapEx) and operating expenditures (OpEx), and is commonly expressed on a per-terabyte basis.

Capacity

Hard disk drives (HDDs) are the primary technology used for capacity in midrange storage systems. High-capacity HDDs today come primarily with SATA or nearline SAS (NL-SAS) interfaces; both operate at 7,200 rpm. Historically, SATA and FATA have been the midrange high-capacity HDDs of choice. That is changing as FATA is phased out and NL-SAS overtakes SATA as the dominant high-capacity midrange storage HDD. This stems from NL-SAS' built-in dual porting, lower rates of silent data corruption and price points very close to SATA's.

Midrange storage systems can scale capacity three different ways: scale-up, scale-out or a hybrid combination of the two. Scale-up means the storage system supports a large number of HDDs behind a single- or dual-controller system. This has become easier with high-capacity HDD drawers that fit in a small space and the implementation of SAS in the backplane. Scale-out allows capacity to be scaled by adding more controllers (typically called nodes) in a cluster or grid architecture. Hybrid means both approaches are used in the same midrange storage system.

Performance

There are many myths surrounding midrange storage system performance. Performance is an end-to-end ecosystem and should be managed in that manner. The first variables that come to mind are the speed and number of drives. HDD speeds come primarily in 15K rpm, 10K rpm and 7.2K rpm; a faster spindle speed translates into more IOPS and greater throughput. More drives mean more aggregate IOPS and throughput, up to the backplane's limits. Short-stroking HDDs (using only the outer portion of the drive platters) increases IOPS and throughput while dramatically reducing usable capacity. Solid-state drives (SSDs) utilizing NAND flash come next: they typically provide greater IOPS and throughput than HDDs, with lower capacities and much higher costs.
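The relationship between spindle speed and random IOPS can be sketched with a standard back-of-the-envelope service-time formula. The average seek times below are illustrative assumptions for this sketch, not vendor specifications:

```python
# Back-of-envelope estimate of a single HDD's random IOPS.
# Seek times are illustrative assumptions, not vendor specs.

def hdd_iops(rpm, avg_seek_ms):
    """Estimate random IOPS as 1 / (avg seek + avg rotational latency)."""
    # Average rotational latency is half of one full revolution.
    rotational_latency_ms = (60_000 / rpm) / 2
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000 / service_time_ms

for rpm, seek_ms in [(15_000, 3.5), (10_000, 4.5), (7_200, 8.5)]:
    print(f"{rpm:>6} rpm: ~{hdd_iops(rpm, seek_ms):.0f} IOPS")
```

The math shows why a 15K rpm drive delivers roughly double the random IOPS of a 7.2K rpm drive, and why short-stroking (which cuts average seek time) helps.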

Performance is also affected by storage network type: SAN, NAS and cloud. SAN typically has the lowest latency, which should translate into greater IOPS but does not always do so. Highest-bandwidth, lowest-latency honors go to InfiniBand (40 Gbps today, going to 56 Gbps and 100 Gbps within the next 12 months), followed by FC at 16 Gbps, then FCoE/iSCSI/AoE at 10 Gbps, going to 40 Gbps later this year. NAS performance is also accelerating with the arrival of pNFS, or parallel NFS, which bonds multiple ports as if they were a single virtual pipe. Latency will still be an issue, but bandwidth won't, because pNFS runs on the same Ethernet as FCoE, iSCSI and AoE. Even cloud storage networks utilizing REST over HTTP are capable of providing latencies comparable to current SANs from some vendors.

Other midrange storage performance variable factors include:

  • Storage network end-to-end bandwidth and latency;
  • Storage controller limitations in IOPS and throughput (most midrange storage systems today use x86 architectures that benefit from Moore's law, while some add custom ASICs or FPGAs);
  • Amount of DRAM or flash cache, which can accelerate writes, reads or both;
  • Controller backend bandwidth to the HDDs or SSDs;
  • Automated storage tiering software;
  • Deduplication and/or compression;
  • Number of controllers or nodes (many scale-out midrange storage systems place severe limits on the number of nodes per system image because performance begins to suffer).

Manageability

The days of midrange storage as a science project are now the exception rather than the rule. The operational value proposition of midrange storage has been shifting toward building the expertise into the storage system instead of relying on the administrator. NAS systems have been doing this for a while, and Dell EqualLogic pioneered this type of manageability for SANs. It has become table stakes for higher-end midrange storage. Even Fibre Channel midrange storage, notoriously more manually intensive than any other storage network, has become increasingly automated.

It is important to note that numerous midrange storage systems still lack this level of sophistication and automation. These systems usually cost far less than automated systems and can be performance-tuned in many more ways, although doing so requires administrator expertise.

Data protection

There are multiple ways to protect against HDD failures and unrecoverable read errors, both of which occur far more frequently than manufacturers claim. RAID is the most common form of HDD data protection, but it is starting to show issues with rebuild times, data loss and performance. This has led to RAID innovations such as triple-parity RAID; wide-stripe RAID, which speeds rebuilds; and RAID with self-healing of the HDDs or SSDs, which dramatically reduces rebuild requirements. There are also more innovative RAIDless alternatives, such as multiple-copy mirroring, which eliminates parity and allows data to be read from multiple locations and HDDs. The latest RAIDless innovation is erasure code-based storage, which largely eliminates traditional data rebuilds. Erasure codes break data into chunks and store those chunks not as raw blocks, but as the coefficients of linear equations. This reduces the number of full data copies, increases usable storage, reduces cost and greatly increases data resilience.
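The capacity trade-offs among mirroring, parity RAID and erasure coding can be illustrated with simple arithmetic. The layouts and chunk counts below are hypothetical examples, not any vendor's defaults:

```python
# Usable-capacity fraction and failure tolerance for a few data
# protection schemes. Parameters are illustrative, not vendor defaults.

def usable_fraction(data_chunks, redundancy_chunks):
    """Fraction of raw capacity left for data in a k-of-(k+m) layout."""
    return data_chunks / (data_chunks + redundancy_chunks)

schemes = {
    "3-way mirroring":     (1, 2),   # 1 data copy + 2 extra full copies
    "RAID 6 (8+2)":        (8, 2),   # survives any 2 drive failures
    "Erasure code (10+6)": (10, 6),  # survives any 6 lost chunks
}

for name, (k, m) in schemes.items():
    print(f"{name}: {usable_fraction(k, m):.0%} usable, tolerates {m} failures")
```

The point of the comparison: a 10+6 erasure code tolerates three times as many simultaneous failures as RAID 6 while still leaving far more usable capacity than three-way mirroring.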

Protection against malware, corruption, accidental deletions and human errors is typically provided by copy-on-write (COW) or redirect-on-write (ROW) snapshots. Both are virtually instantaneous because they initially copy only the metadata, or pointers, to the data. COW incurs a double-write performance penalty that limits the number of snapshots per day, but saves money by copying snapshot data to lower-cost, lower-performing tiers. ROW has no double-write penalty and can take a nearly unlimited number of snapshots, but keeps the data on the same expensive storage tier as the original.
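As a rough sketch of the COW vs. ROW distinction, the toy model below counts physical block writes to show where COW's double-write penalty comes from. It is a conceptual illustration only, not how any real array implements snapshots:

```python
# Toy model contrasting COW and ROW snapshots. Block maps are plain
# dicts; "writes" counts physical block writes to expose COW's penalty.

class Volume:
    def __init__(self, blocks):
        self.store = dict(blocks)   # block id -> data
        self.snapshot = None        # frozen view of the volume
        self.writes = 0             # physical write counter

    def take_snapshot(self):
        # Both COW and ROW copy only metadata (pointers) at snap time.
        self.snapshot = dict(self.store)

    def write_cow(self, block, data):
        # Copy-on-write: first preserve the old block for the snapshot,
        # then overwrite in place -- two physical writes per change.
        if self.snapshot is not None and block in self.snapshot:
            self.writes += 1        # copy old block to snapshot area
        self.store[block] = data
        self.writes += 1            # overwrite the original location

    def write_row(self, block, data):
        # Redirect-on-write: new data goes to a fresh location and the
        # snapshot keeps pointing at the old block -- one physical write.
        self.store[block] = data
        self.writes += 1

vol = Volume({0: "a", 1: "b"})
vol.take_snapshot()
vol.write_cow(0, "a2")   # 2 physical writes
vol.write_row(1, "b2")   # 1 physical write
```

The snapshot view stays frozen in both cases; the difference is purely in how many physical writes each overwrite of already-snapshotted data costs.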

Protection against system or site disasters can be accomplished by system-to-system mirroring/replication, or by migrating older data, data to be distributed, backup data or any type of tertiary data to cloud storage.

Technology refresh

Midrange storage systems have historically used single-controller or dual active-active (clustered) controllers. They have a set technology lifespan, requiring a technology refresh every three to five years as usable capacity is consumed, newer HDDs and SSDs are no longer supported, and performance no longer meets requirements. This refresh normally requires a data migration: a manually intensive, application-disruptive, time-consuming process that is also quite expensive.

Newer midrange storage technologies have eliminated or mitigated much of the pain of technology refresh. Scale-out midrange storage systems, storage virtualization (file and/or block) and especially object-based midrange storage systems make tech refreshes easy. New nodes are added to the system and discovered, data is migrated, and obsolete nodes are removed, all online with no user or application disruption.

Total cost of ownership

TCO and how it is measured can vary significantly by vendor. The key is making sure every midrange system under consideration is measured the same way. Midrange storage has traditionally been purchased up front at maximum or near-maximum capacity to negotiate the best price from the vendor. That purchase constitutes a significant part of the upfront price of the system, but ultimately only a small part of the TCO. TCO consists of the CapEx of the midrange storage system, software and supporting infrastructure, plus the OpEx, including floor space, rack space, administrators, hardware maintenance, software maintenance, subscription costs, professional services and, most importantly, power and cooling. Most analysts peg midrange storage OpEx at approximately 400% to 600% of CapEx, which is a good rule of thumb; actual OpEx can also be calculated.
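Applying that 400% to 600% rule of thumb, a per-terabyte TCO comparison might look like the following sketch. The CapEx figures and multipliers are hypothetical, not vendor quotes:

```python
# Sketch of a per-terabyte TCO comparison using the rule of thumb that
# OpEx runs roughly 400%-600% of CapEx. All figures are hypothetical.

def tco_per_usable_tb(capex, opex_multiplier, usable_tb):
    """Lifetime TCO per usable TB: CapEx plus estimated OpEx."""
    opex = capex * opex_multiplier
    return (capex + opex) / usable_tb

# Two hypothetical systems quoted on the same usable capacity.
system_a = tco_per_usable_tb(capex=200_000, opex_multiplier=4.0, usable_tb=100)
system_b = tco_per_usable_tb(capex=150_000, opex_multiplier=6.0, usable_tb=100)

print(f"System A: ${system_a:,.0f}/usable TB")
print(f"System B: ${system_b:,.0f}/usable TB")
```

In this made-up comparison, the system with the higher purchase price ends up cheaper per usable terabyte over its life, which is exactly why upfront price alone is a poor proxy for TCO.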

TCO tends to be calculated on “raw” rather than “usable” capacity. To truly compare apples-to-apples, it is much better to compare on usable capacity. This means calculating the overhead of the data protection features (the vendor can help here) and estimating the positive impact of data reduction features such as deduplication and compression. Dedupe and/or compression increase effective usable capacity by varying amounts depending on the data (structured, unstructured, already compressed or encrypted). A good rule of thumb is to use a usable capacity multiplier of approximately two to three.
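Converting raw to effective usable capacity can be sketched as follows. The 20% protection overhead and 2x reduction multiplier are illustrative assumptions in line with the rules of thumb above:

```python
# Raw-to-usable capacity conversion: subtract data protection overhead,
# then apply a data-reduction multiplier. Both figures are illustrative
# rule-of-thumb assumptions, not measured values.

def effective_usable_tb(raw_tb, protection_overhead, reduction_multiplier):
    """Effective capacity after protection overhead and data reduction."""
    usable = raw_tb * (1 - protection_overhead)
    return usable * reduction_multiplier

# 100 TB raw, RAID 6 (8+2) style 20% overhead, 2x dedupe/compression.
print(effective_usable_tb(100, 0.20, 2.0))
```

So 100 TB raw becomes 80 TB usable after protection overhead, and roughly 160 TB effective with a 2x data-reduction multiplier, which is the figure a fair TCO comparison should divide by.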

Conventional wisdom holds that flash-based SSDs are too expensive to replace HDDs for high-capacity storage. That wisdom may be incorrect. A new crop of midrange storage vendors has focused on MLC or eMLC NAND flash to provide cost-effective flash-only midrange storage systems. One vendor is currently delivering midrange storage systems that scale to 250 TB at price points equivalent to HDD-based systems, with the same or better software and orders-of-magnitude better performance. Two more vendors are coming out of stealth this year with pure flash-based midrange systems that scale to petabytes. These may be viable alternatives when both performance and cost are issues.

One other factor changing midrange storage TCO is cloud storage, as cloud service vendors may charge only for what is consumed on a monthly basis. Public cloud storage makes this a pure per-use model. Private cloud storage combines a per-gigabyte charge for usable storage in the software with upfront and ongoing costs for the supporting physical infrastructure. This changing cost paradigm greatly reduces midrange storage TCO by eliminating or mitigating unusable storage while eliminating or reducing upfront costs.

About the author: Marc Staimer is the founder, senior analyst, and CDS of Dragon Slayer Consulting in Beaverton, OR. The consulting practice of 13 years has focused in the areas of strategic planning, product development, and market development. With more than 31 years of marketing, sales and business experience in infrastructure, storage, server, software and virtualization, he’s considered one of the industry’s leading experts. Marc can be reached at marcstaimer@comcast.net.

This was first published in June 2011
