It takes a well-architected collection of components to build a production data storage system. The server (or cluster of servers), CPUs, memory, internal buses, host bus adapters (HBAs), interconnects, storage area network (SAN) components, and internal and external disk and tape storage all play a part in developing a well-balanced data storage system. Failing to provide adequate bandwidth and alternate paths to the data can seriously affect performance and can result in catastrophic failures at the most inopportune time. Without proper planning, you may even be putting the survival of your business at risk.
In this tip, learn about the various components in your data storage system, including server architecture, SAN components and external disks and disk controllers.
Regardless of the vendor, standard servers come with a CPU complex connected to some memory, and one (or more) buses bridging off to provide access to your peripherals and external devices.
The bus can be thought of as a highway; it has a number of "lanes," and depending on the vintage of your server, the number of lanes, or the speed of those lanes, may be limited. If your bus has several lanes (PCIe x8, for example), and if the achievable speed on those lanes is maximized, you have a good chance of moving data to or from the CPU complex fast enough to keep the CPU working, rather than sitting idle while it waits for data. Be sure to discuss your server's architecture with your vendor to ensure you understand how all of the internal components work together.
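To put rough numbers on the highway analogy, the sketch below multiplies the lane count by the throughput of one lane. The per-lane figures are approximate, per-direction values for PCIe generations 1 through 3 after encoding overhead. They do not come from any vendor's data sheet, and the names `PCIE_MB_PER_LANE` and `bus_bandwidth_mb` are made up for this illustration.

```python
# Approximate usable throughput per lane, per direction, in MB/s.
# Gen 1 and Gen 2 use 8b/10b encoding; Gen 3 uses 128b/130b.
# These are theoretical ceilings, not measured values.
PCIE_MB_PER_LANE = {1: 250, 2: 500, 3: 985}


def bus_bandwidth_mb(generation: int, lanes: int) -> int:
    """Theoretical one-direction bandwidth of a PCIe slot in MB/s."""
    return PCIE_MB_PER_LANE[generation] * lanes


for gen in (1, 2, 3):
    print(f"PCIe Gen{gen} x8: ~{bus_bandwidth_mb(gen, 8)} MB/s per direction")
```

Real throughput will be lower once protocol overhead and the behavior of the attached adapter are taken into account, so treat these numbers as an upper bound when talking to your vendor.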
You should also ask your vendor about limitations in the components that join the PCI buses to your CPU complex. For example, the interconnect design can create a bottleneck, reducing the combined speed of all the highways to the limit imposed by the interconnect.
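A minimal sketch of that bottleneck, assuming the simplest possible model: the data reaching the CPU complex can never exceed the smaller of the combined bus bandwidth and the interconnect's own limit. The function name and every figure below are hypothetical, chosen only to show the arithmetic.

```python
def effective_bandwidth_mb(bus_limits_mb, bridge_limit_mb):
    """Aggregate MB/s that can reach the CPU complex.

    The buses add up, but the total is capped by the bridge
    (interconnect) that joins them to the CPU complex.
    """
    return min(sum(bus_limits_mb), bridge_limit_mb)


# Hypothetical server: three slots behind one bridge.
buses = [2000, 2000, 1000]   # MB/s per bus (assumed values)
bridge = 3200                # MB/s bridge limit (assumed value)

print(f"Sum of buses: {sum(buses)} MB/s")
print(f"Usable:       {effective_bandwidth_mb(buses, bridge)} MB/s")
```

In this made-up case the buses could carry 5,000 MB/s between them, but only 3,200 MB/s is usable, which is why the bridge specification matters as much as the slot specification.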
This condition has been seen in the bus-bridging and interconnect components of several families of servers. Be sure to ask your vendor for specifications for each component used in your server to ensure that you don't run into this problem. Figure A illustrates the CPU complex and buses in the server architecture.
Figure A: CPU complex and buses in server architecture
Ultimately, the objective of the server architecture is to achieve full-duplex operation (reading and writing at the same time) on a bus. Many vendors will provide performance numbers for reading, or for writing, but never for reading and writing at the same time. With a tape archiving system, you will be migrating data from older tapes to newer tapes. At the same time, your applications will be reading and writing to disk, and your backup will be reading data from disk, possibly compressing it, and writing it to tape. Don't be afraid to discuss this level of I/O activity with your vendor.
Host/Channel adapters
Host bus adapters and/or host channel adapters (HCAs) are used to connect external devices to your servers. Network interface cards (NICs) may also be used to transfer your data over a network to network-attached storage (NAS) devices. In either case, the configuration parameters for these devices are set to factory defaults, which are almost never suitable for production environments. You need to understand the parameters and choose settings that suit the specific I/O patterns in your environment. Your vendor should provide best practice documents to help you understand the settings, and should assist with the tuning required for your data storage environment.
SAN infrastructure and bandwidth
Providing adequate paths to the disk or tape storage is important in your environment. If you need to transfer 800 MB/sec, but only provide a single 400 MB/sec pipe, you will not achieve your goal. Keep in mind that you can't test the bandwidth of the path with a single operation. I've heard several customers say they tried testing the bandwidth of a device by copying a single file from one of the internal disks to the storage array. This is not a trustworthy test because the transfer is limited to the speed of that single internal disk, while the target device is normally a group of RAID disks in the external array. In other words, a single disk operation cannot supply enough data flow to produce a meaningful test result. There are several benchmarking tools available to help test the data flow and bandwidth of your environment. You may be surprised at how well your systems can perform.
Alternate paths
Alternate paths are also important in SAN design. If you set up your servers properly with adequate bandwidth and alternate paths, your SAN will perform well. Alternate paths provide a detour when one of your primary paths to the data is interrupted for some reason. It could be a bad cable, an HBA that has reset itself, or a component in the path that has actually failed. Not providing alternate paths will cause your operation to stop dead in its tracks when any single problem occurs.
SAN switches
It is also important that you understand your SAN switch architecture. Vendors typically build switches using multi-port blades or daughter cards. The concepts you need to understand are latency and the reduction of single points of failure. If, for example, you are migrating data from one set of tapes in your tape library to a new set of tapes in another library, having the two tape drives connected to the same blade or daughter card will reduce latency. The amount of time needed to transfer the data from one port to another port on the same blade is less than the time it takes to transfer the same data from a port on one blade across an internal bus to a port on another blade. You will also want your server connected to the same blade as the tape drives. Figure B illustrates SAN switches.
Figure B: SAN switches
Server 1 will have reduced latency reading from Tape Drive 1 and writing to Tape Drive 2, while Server 3 will experience slightly higher latency when reading from Tape Drive 3 and writing to Tape Drive 4.
External disks and disk controllers
External disks come with a range of interfaces, such as Serial ATA (SATA), SAS and Fibre Channel (FC), and rotational speeds of 5,400, 7,200, 10,000 and 15,000 RPM, with data storage capacities ranging from 36 GB to 2 TB. Solid-state drives (SSDs), or flash drives, are now also available in some configurations that mix disks with high-performance front-end components based on flash technology. With all these options available today, how do you choose which one is right for your data storage environment?
The right selection and mixture of devices for your specific environment depends on many variables and should be carefully planned before the purchase order is sent to the vendor. To start, focus on the external disk arrays you already have in your environment. Make sure that your RAID sets are evenly balanced across the back-end channels of the external disk array, and make sure each RAID set is made up of disk drives of the same type, capacity and rotational speed. Watch out for channel conflicts and channel blocking on the back-end (drive-side) channels. Your external storage array has tools available that allow you to monitor performance, detect errors, and proactively solve problems before they cause serious outages. Learn how to use them, and keep a close eye out for warning messages. Make sure your vendor explains any and all warning messages. The controllers would not be issuing warnings unless they were detecting something unusual.
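The two RAID set checks above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Drive` record, the helper names and the sample drives are all hypothetical, and "balance" is interpreted here as the number of drives sitting on each back-end channel. Your array's own management tools remain the authoritative source for this information.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    kind: str         # interface type, e.g. "SATA", "SAS", "FC"
    capacity_gb: int
    rpm: int
    channel: int      # back-end (drive-side) channel the drive sits on


def is_uniform(raid_set):
    """True when every drive shares type, capacity and rotational speed."""
    return len({(d.kind, d.capacity_gb, d.rpm) for d in raid_set}) <= 1


def channel_spread(raid_sets, channels):
    """Drive-count gap between the busiest and the quietest channel.

    0 means the drives are spread perfectly evenly across the channels.
    """
    counts = Counter(d.channel for rs in raid_sets for d in rs)
    per_channel = [counts[c] for c in channels]
    return max(per_channel) - min(per_channel)


# Hypothetical array with four back-end channels (0..3).
set_a = [Drive("SAS", 300, 15000, ch) for ch in range(4)]
set_b = [Drive("SAS", 300, 15000, 0), Drive("SAS", 300, 15000, 0),
         Drive("SATA", 1000, 7200, 1), Drive("SAS", 300, 15000, 0)]

print("set_a uniform:", is_uniform(set_a))
print("set_b uniform:", is_uniform(set_b))
print("channel spread:", channel_spread([set_a, set_b], range(4)))
```

Here `set_b` fails both checks: it mixes a SATA drive into a SAS set, and it piles three drives onto channel 0.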
Make sure you understand the architecture of the storage controllers that are managing your disks. As previously described, there are typically several PCI buses coming into the storage controllers from the host side, and many more going out to the disks on the back end. Somewhere in the middle there is a bridge that ties all these buses together. Understand how this bridge works, and whether the design of the bridging components imposes any limits on the theoretical bandwidth.
The file system
Finally, make sure you understand the layering of the file system blocks over the RAID sets. Alignment issues can cause serious performance problems for some external storage controllers. They occur when the blocking mechanisms are not in sync with each other. If the RAID set is blocked at 16K, and your file system (or application) is reading or writing in 17K chunks, there will be overlap and boundaries will be crossed. Boundary crossings cause the storage controllers to do much more work than necessary, normally seen in the form of read-modify-write (RMW) operations that steal bandwidth from the applications.
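The 16K versus 17K case can be checked with simple arithmetic. The sketch below counts how many RAID blocks each I/O overlaps when requests are issued back to back. It is a simplified model that ignores stripe width, parity and caching, and the function name `chunks_touched` is made up for this example.

```python
KIB = 1024


def chunks_touched(offset, length, chunk_size):
    """Number of RAID blocks an I/O of `length` bytes at `offset` overlaps."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return last - first + 1


CHUNK = 16 * KIB  # RAID set blocked at 16K, as in the text

for io_kib in (16, 17):
    io_size = io_kib * KIB
    # Sixteen sequential I/Os, each starting where the previous one ended.
    touched = [chunks_touched(i * io_size, io_size, CHUNK) for i in range(16)]
    print(f"{io_kib}K I/Os: {sum(touched)} block accesses for 16 requests")
```

In this model every 16K request lands on exactly one block, while every 17K request straddles a boundary and touches two, each only partially. Those partial accesses are the ones that tend to turn into RMW operations on writes.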
A well-architected SMB data storage infrastructure is key to achieving optimal performance in your data backup systems and servers. Each component in the infrastructure typically comes with its own set of tunable configuration parameters. As a manager, you must ensure that each component is tuned properly to achieve the desired performance. From time to time you will find a component that, despite your best efforts, fails to achieve the performance desired. This is where you must dig down into the design of the architecture to find the bottleneck.
About this author: Dave Ellis is the Principal Technologist at Instrumental Inc. Dave has more than 32 years of high tech experience. As a member of Instrumental Inc.'s CTO staff, Dave tracks, evaluates and implements high-performance computing solutions that anticipate and meet changing customer requirements.