Troubleshooting SAN performance issues


Brian Peterson
07.11.2008

Storage area networks (SANs) can be complicated and temperamental beasts. This is especially true when they're poorly managed. Troubleshooting is tough because a good design is not always obvious and Fibre Channel (FC) standards are just loose enough to make interoperability a concern. This technical tip will review some common problem areas with SANs, describe how to diagnose problems and offer some suggestions about how to prevent the problems in the first place.

Common problems

A million things can go wrong in a complex storage network, but each failure type can be grouped into one of the areas below. Using the symptoms to narrow a problem down to a probable cause in one of these areas should speed troubleshooting and resolution.

Compatibility issues

Although FC SANs have been around for 15 or more years, not all devices interoperate well, and many SAN problems result from non-interoperable components. All storage vendors publish some form of support matrix that documents tested and supported configurations of storage array microcode, SAN switch firmware and host hardware/software.

Exceeding the capacity limits

It is probably obvious that saturating SAN ports will cause bottlenecks, and those bottlenecks can turn into elusive application problems. It is usually pretty easy to look at a host or storage port on the SAN and determine if it is 100% busy, but it is tougher to determine if an overloaded inter-switch link (ISL) is the culprit. Sometimes the I/O itself isn't the bottleneck. Instead, limits like fan ratios (the number of HBAs zoned to a storage port) and the number of switches in a fabric are exceeded, causing connectivity issues.
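As a rough illustration of checking the fan ratio, the sketch below counts how many distinct HBAs are zoned to each storage port and reports the ports that exceed a limit. The function names, the input format (pairs of HBA and storage port WWNs) and the limit itself are assumptions made for this example; the real limit comes from your storage vendor's documentation.

```python
from collections import defaultdict


def hbas_per_storage_port(zoned_pairs):
    """Count the distinct HBAs zoned to each storage port.

    zoned_pairs: iterable of (hba_wwn, storage_port_wwn) tuples,
    a hypothetical export of the zoning database.
    """
    hbas = defaultdict(set)
    for hba_wwn, storage_port_wwn in zoned_pairs:
        hbas[storage_port_wwn].add(hba_wwn)
    return {port: len(members) for port, members in hbas.items()}


def ports_over_limit(zoned_pairs, max_hbas):
    """Return the storage ports with more than max_hbas HBAs zoned to them."""
    counts = hbas_per_storage_port(zoned_pairs)
    return sorted(port for port, count in counts.items() if count > max_hbas)


# Made-up WWNs, shortened for readability.
example_pairs = [
    ("hba-a", "array-port-1"),
    ("hba-b", "array-port-1"),
    ("hba-c", "array-port-1"),
    ("hba-a", "array-port-2"),
]
over = ports_over_limit(example_pairs, max_hbas=2)
```

Using a set per storage port means an HBA that appears in several zones with the same port is counted only once.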

Incorrect configuration or zoning

Bad or incorrect zoning is one of the most common causes of SAN problems. Maybe that is because zoning is the part of the SAN we change most often. It may also be because zones contain those tricky 16-digit hexadecimal world wide names (WWNs).
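One way to catch a mistyped WWN before it reaches a zone is to validate it when it is entered. The following is a minimal sketch that assumes a WWN is written as 16 hexadecimal digits, optionally separated by colons, hyphens or spaces; the function name and the colon-separated output format are choices made for this example, not a vendor requirement.

```python
import re

_SEPARATORS = re.compile(r"[:\-\s]")
_SIXTEEN_HEX = re.compile(r"[0-9a-f]{16}")


def normalize_wwn(text):
    """Return the WWN as lowercase colon-separated byte pairs.

    Raises ValueError if the input is not exactly 16 hexadecimal digits
    once separators are removed.
    """
    digits = _SEPARATORS.sub("", text).lower()
    if not _SIXTEEN_HEX.fullmatch(digits):
        raise ValueError(f"not a 16-digit hexadecimal WWN: {text!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


# The WWN below is made up for illustration.
print(normalize_wwn("10000000C9ABCDEF"))  # prints 10:00:00:00:c9:ab:cd:ef
```

Normalizing every WWN to one format before comparing it against the documentation also avoids false mismatches caused by case or separator differences.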

Flaky connections and cables

It seems that when fiber cables fail, they rarely fail completely. Instead, they die a slow, painful and intermittent death. On the way to the grave, they often give applications and administrators fits.

Storage array configuration issues

Each brand of storage array is managed a little differently, but all share some basic concepts. Logical unit numbers (LUNs) must be created and assigned to a host HBA through a front-end SAN port. Problems often arise when the storage administrator makes a typo while configuring the array.

Host configuration issues

A lot can go wrong on a server. Servers account for a large proportion of the SAN component stack, including the volume manager, operating system, multipathing software, HBA driver, HBA firmware and HBA hardware. Each of these components must be configured per the storage vendor's specifications, or you're asking for trouble.

SAN hardware failures

I purposely listed hardware failures last on the list of common SAN problems because, while hardware is usually the first place we look, it's rarely the problem. Today's SAN hardware is very reliable, but it does fail occasionally. Common failures that can affect host access are SFP port failures, port card failures and entire switch failures.

Problem determination

SAN troubleshooting requires an intimate knowledge of the desired configuration and the expected behavior of a particular system. When a problem occurs, it's helpful to narrow it down by eliminating the properly functioning components in three basic areas: SAN, hosts and storage. Ask yourself these questions:

Is it the SAN?

Have any SAN changes occurred recently? Ask around, check the SAN logs and compare the running configuration to the documentation. Is the SAN reporting events or errors that may be related? Look for failed ports, recent port logouts or fabric rebuilds.

Is it the host?

Can other hosts see the storage in question? Can this host see other storage? Is the HBA logged into the fabric? Have any recent host changes occurred? Are there any SAN-related messages in the host's system message logs?

Is it the storage?

Can other hosts see the storage in question? Is the storage port logged into the fabric? Have any changes occurred on the storage array recently? Are the storage array logs reporting errors?
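The two visibility questions above ("Can other hosts see the storage in question?" and "Can this host see other storage?") can be combined into a simple first guess about where to look. The sketch below is one interpretation of that elimination process, not a definitive rule; the function name and the wording of the suggestions are assumptions for this example, and the result is only a starting point for the remaining questions.

```python
def first_place_to_look(other_hosts_see_storage, host_sees_other_storage):
    """Suggest a starting area from two visibility checks.

    other_hosts_see_storage: True if other hosts can see the storage in question.
    host_sees_other_storage: True if the affected host can see other storage.
    """
    if other_hosts_see_storage and host_sees_other_storage:
        # Host and storage both work elsewhere, so suspect what joins them.
        return "zoning or LUN assignment for this host and storage"
    if other_hosts_see_storage:
        # The storage is reachable, but this host sees nothing at all.
        return "host"
    if host_sees_other_storage:
        # The host works, but nobody can reach this storage.
        return "storage"
    # Nothing sees anything: a shared component is the likelier cause.
    return "SAN"


print(first_place_to_look(True, False))  # prints host
```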

Check the support matrices

Make a regular practice of reviewing support matrices and checking your configuration against what is currently supported. Manufacturers are constantly finding new bugs that get fixed in new code. Keep your software versions current and supported and you'll avoid a lot of problems.
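If the relevant rows of a support matrix are transcribed into a simple lookup, the check against your configuration can be automated. This is a minimal sketch under that assumption; the component names and version strings are invented for illustration and do not come from any vendor's matrix.

```python
def unsupported_components(installed, support_matrix):
    """Return {component: installed_version} for versions not in the matrix.

    installed: {component_name: installed_version}
    support_matrix: {component_name: set of supported versions}
    A component missing from the matrix is reported as unsupported.
    """
    return {
        name: version
        for name, version in installed.items()
        if version not in support_matrix.get(name, set())
    }


# Hypothetical data for illustration only.
matrix = {
    "switch_firmware": {"6.1.0", "6.1.1"},
    "hba_driver": {"2.70", "2.72"},
}
installed = {
    "switch_firmware": "6.0.2",
    "hba_driver": "2.72",
    "multipathing": "5.3",
}
print(unsupported_components(installed, matrix))
# prints {'switch_firmware': '6.0.2', 'multipathing': '5.3'}
```

Treating a component that is absent from the matrix as unsupported is a deliberately cautious choice: it forces someone to look the component up rather than assume it is fine.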

Document the SAN

This one is huge. It is so important when troubleshooting a problem to understand what the design intent was. Make sure the documentation records hosts, HBAs, WWNs and where they connect. It should include the storage, storage ports and their WWNs. Finally, the SAN documentation should describe the fabrics, ISLs, zone sets, zones and zone members.
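Documentation kept in a structured form can also be compared against the running configuration during troubleshooting. The sketch below assumes both the documented and the running zoning are available as a mapping from zone name to a set of member WWNs; how the running zoning is exported depends on the switch vendor and is not shown. The function name and the shortened WWNs are hypothetical.

```python
def zoning_drift(documented, running):
    """Compare two {zone_name: set_of_member_wwns} mappings.

    Returns {zone_name: {"missing": [...], "unexpected": [...]}} for every
    zone whose running members differ from the documented members.
    """
    drift = {}
    for zone in sorted(set(documented) | set(running)):
        expected = documented.get(zone, set())
        actual = running.get(zone, set())
        if expected != actual:
            drift[zone] = {
                "missing": sorted(expected - actual),
                "unexpected": sorted(actual - expected),
            }
    return drift


# Made-up zone names and shortened WWNs.
documented = {"zone_db01": {"hba-a", "array-1"}, "zone_web01": {"hba-b", "array-1"}}
running = {"zone_db01": {"hba-a", "array-2"}, "zone_web01": {"hba-b", "array-1"}}
drift = zoning_drift(documented, running)
```

In this example only zone_db01 is reported, with array-1 missing and array-2 unexpected.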

Baseline the SAN performance

Unless you record what is happening on an average day, it will be tough to determine whether a busy port is normal or the culprit during a problem. At a minimum, record the average port utilization for every port in the SAN.
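A baseline is only useful if current readings can be compared against it. The following sketch records the mean and standard deviation of utilization per port, then flags ports that are well above their own normal level. The function names, the sample data and the three-standard-deviation threshold are assumptions for this example; collecting the samples from the switches is not shown.

```python
from statistics import mean, pstdev


def build_baseline(samples):
    """Summarize {port: [utilization percentages]} as {port: (mean, stdev)}."""
    return {
        port: (mean(values), pstdev(values))
        for port, values in samples.items()
        if values
    }


def unusually_busy(baseline, current, sigmas=3.0):
    """Return ports whose current utilization exceeds mean + sigmas * stdev."""
    flagged = []
    for port, utilization in current.items():
        if port not in baseline:
            continue  # no history for this port, so nothing to compare against
        average, deviation = baseline[port]
        if utilization > average + sigmas * deviation:
            flagged.append(port)
    return sorted(flagged)


# Made-up utilization samples, in percent.
history = {"port-1": [10, 12, 11, 9], "port-2": [80, 82, 81, 79]}
baseline = build_baseline(history)
busy = unusually_busy(baseline, {"port-1": 60, "port-2": 82})
```

Here port-2 is the busier port in absolute terms, but only port-1 is flagged, because 60% is far outside its normal range while 82% is ordinary for port-2.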

Plan your changes

To avoid administrator-induced outages, use the SAN documentation to define changes before they happen. If you are still deciding what to do while you're executing the change, you're doing it wrong. Defining changes in the documentation up front also helps, because it is too easy to forget to document a change after it has occurred.

Back up the configurations

After every day of SAN changes, back up and safely store the switch configuration. This will ensure that you can roll back changes quickly from a backup if a switch fails or gets totally messed up during a change. Believe me, it happens a lot and you'll be glad you have a backup when it hits you.
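How a configuration is exported differs from switch to switch, so the sketch below covers only the storing half: it assumes the configuration text has already been retrieved and writes it to a timestamped file, returning a checksum that can be recorded alongside it. The function name, file naming scheme and switch name are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def store_config_backup(switch_name, config_text, backup_dir):
    """Write config_text to a timestamped file and return (path, sha256 digest)."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # A UTC timestamp in the file name keeps each day's backup distinct.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = backup_dir / f"{switch_name}-{stamp}.cfg"

    data = config_text.encode("utf-8")
    path.write_bytes(data)
    return path, hashlib.sha256(data).hexdigest()
```

Recording the digest makes it possible to confirm later that a backup file has not been altered or truncated before it is used to roll back a change.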

Troubleshooting SANs can be a non-issue when certain things are under control. Consider these best practices day to day to prevent a huge issue when something does go wrong.

If you have further SAN troubleshooting questions, check out our ITKnowledge Exchange and Ask the Expert sections.

About the author: Brian Peterson is an independent IT infrastructure analyst. He has a deep background in enterprise storage and open-systems computing platforms. He has consulted with hundreds of enterprise customers who struggled with the challenges of disaster recovery, scalability, technology refreshes and controlling costs.

All Rights Reserved, Copyright 2008, TechTarget