Tuesday, September 29, 2009

The Hidden Costs of Unreliable Disk Drives

The Dirty Little Secret of Managing RAID Storage
*By Jerome Wendt, President and Lead Analyst at DCIG Inc.

Organizations are storing more data than ever on disk. Archives, backups, disaster recovery (DR) copies, video surveillance footage and unstructured data account for much of this explosive growth. Yet one of the dirty little secrets of managing the large disk farms needed to store all of this data is managing the replacement of failed SATA hard disk drives (HDDs). While current RAID technologies do an adequate job of protecting against data loss in most of these environments, when a SATA HDD fails, someone still has to replace it.

Replacing failed SATA HDDs may be no big deal in smaller environments. But when you start to consider how potentially unreliable some SATA HDDs are and the time involved with managing their replacement in large disk farms, the process becomes much more complicated. Here are just some of the steps that I had to follow when I worked at a Fortune 500 data center and had to replace a failed HDD (SATA or otherwise):

• Open a trouble ticket in my organization's change control system.
• Open a trouble ticket with the vendor to replace the disk drive.
• Determine the urgency of replacing the failed disk drive.
• Schedule a time for the HDD replacement.
• Notify the affected application, server, change control and security teams.
• Verify the new drive was successfully installed and close out the open trouble tickets.

While not every organization has to go through all of these steps to replace failed SATA HDDs, regularly replacing failed HDDs becomes a cost and a risk to any company.
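To make the cost argument concrete, here is a minimal back-of-the-envelope sketch in Python. It is not drawn from any vendor data: the drive count, the annualized failure rates and the hours of staff time per replacement are all hypothetical placeholders, and the function names are invented for illustration. Substitute figures from your own environment.

```python
def expected_annual_replacements(drive_count: int, annual_failure_rate: float) -> float:
    """Expected number of drive replacements per year.

    annual_failure_rate is a fraction (0.03 means 3% of drives fail per year).
    """
    if drive_count < 0:
        raise ValueError("drive_count must be non-negative")
    if not 0.0 <= annual_failure_rate <= 1.0:
        raise ValueError("annual_failure_rate must be between 0 and 1")
    return drive_count * annual_failure_rate


def annual_staff_hours(drive_count: int,
                       annual_failure_rate: float,
                       hours_per_replacement: float) -> float:
    """Staff hours per year spent on tickets, scheduling, notification and verification."""
    if hours_per_replacement < 0:
        raise ValueError("hours_per_replacement must be non-negative")
    replacements = expected_annual_replacements(drive_count, annual_failure_rate)
    return replacements * hours_per_replacement


if __name__ == "__main__":
    # All inputs below are hypothetical, chosen only to show how the totals scale.
    drives = 1000
    hours_each = 4.0
    for rate in (0.01, 0.03, 0.05):
        hours = annual_staff_hours(drives, rate, hours_each)
        print(f"failure rate {rate:.0%}: about {hours:.0f} staff hours per year")
```

The point of the sketch is only that the administrative overhead grows in direct proportion to both the size of the disk farm and the failure rate of the drives in it, so a more reliable drive population reduces the workload by the same factor.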

Organizations obviously do not want to lose data on their SATA storage systems, but they also do not want to dedicate a full-time person to the task of replacing failed HDDs. It makes sense to think about this issue ahead of time and to buy storage systems that mitigate the problem.

Here are some features that organizations should look for in SATA storage systems to ensure high reliability of the SATA HDDs:

• Manufacturers that have a history (5+ years) of working with SATA.
• Manufacturers that only use enterprise SATA HDDs.
• Manufacturers that stress test the HDDs before deploying them in the system.
• Manufacturers that can manage HDDs when they are spun down.

Once organizations know which of these steps SATA storage system manufacturers take (or do not take) to ensure the reliability of the SATA HDDs within their systems, it becomes easier to justify choosing one manufacturer over another on the basis of these hardware benefits. For instance, Nexsan Technologies is a prime example of an organization that has a long history of working with SATA HDDs (10+ years) and has taken all of these steps and more to ensure the reliability of the SATA HDDs in its products, which include SATABoy and SATABeast.

Most organizations say that their primary concern when contemplating the use of SATA HDDs is reliability. In truth, most are initially more concerned about the protection and recoverability of their data, a fear that most SATA storage system manufacturers address through the use of RAID. But RAID only addresses concerns about data reliability, not hardware reliability. As customers can find out after the fact, reliable SATA HDDs have a value that organizations may only appreciate and understand after they purchase an unreliable storage system.

*Jerome Wendt, President and Lead Analyst at DCIG Inc., writes extensively about data storage, including white papers, product analysis, and blogging at www.dciginc.com. SANDirect thanks him for his contribution today!

1 comment:

  1. Nexsan really is a great choice--in ten years working with them, we've never known one of their disks to fail. There are a few other manufacturers that pass the test, too. All of the manufacturers offered at SANDirect are carefully selected to these and additional exacting standards.