
Why RAID-5 must die?



Recently, quite a few articles on the topic "Why RAID-5 is bad" have begun to appear in the global computer press (here are some examples: one, two, and others).

We will try to explain why RAID-5 has worked well so far and why it has now suddenly stopped working.

Over the past few years the capacity of hard drives has kept growing, with no sign of stopping. Capacity almost doubles every year, but the data transfer rate barely increases over the same period. Yes, drives do get new interfaces, such as SATA and SATA-II, and SATA-III is on its way, but the question is whether the drives themselves actually work faster or merely receive new interfaces with higher theoretical figures.
Practice tells us that capacity grows and speed does not.
When RAID-5 was introduced in 1987, a typical hard drive had a capacity of 21MB and a rotational speed of 3600 RPM. Today a typical SATA drive holds 1TB: its capacity has increased roughly 50 thousand times, while its rotational speed has only doubled.
If the data transfer rate had grown at the same pace as capacity over the years, today's drives would deliver about 30 gigabytes per second.
Let's talk about RAID and one of its implementations, RAID-5.
RAID (Redundant Array of Independent Disks) is a storage technology that combines multiple disk drives into one logical unit. Data is distributed across the drives in one of several ways called "RAID levels", depending on the level of redundancy and performance required.
Of the many RAID types described in theory, three forms are typically used for desktop computer systems:
RAID-0 (block-level striping without parity or mirroring) has no (or zero) redundancy. It provides improved performance and additional storage but no fault tolerance. Hence simple stripe sets are normally referred to as RAID-0.
RAID-5 (block-level striping with distributed parity) distributes parity along with the data and requires all drives but one to be present to operate; the array is not destroyed by a single drive failure.
RAID-1 (mirroring without parity or striping) writes data identically to multiple drives, thereby producing a "mirrored set". RAID-1 is almost never used alone, because its speed is limited, so in high-performance arrays it is combined with RAID-0. Such a combination is typically called RAID-0+1 or RAID-10.

RAID-10 is good at almost everything: it is reliable and performs well. Its one drawback is that it consumes 50% of the total capacity of the drives. That fact often leads users of servers and storage systems to choose RAID-5 as an alternative.

Indeed, the usable capacity of RAID-5 is (n-1)*hddsize, where n is the number of drives and hddsize is the size of each drive.
The data is spread across all the drives in the RAID group, and its blocks are supplemented with service information (parity) that makes it possible to recover lost data up to the size of any one drive. This service information does not occupy a dedicated drive; it takes up a share of the group equal to the capacity of one drive, but it too is spread across all the drives.
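The capacity trade-off between the three levels can be sketched in a few lines of Python. This is only an illustration: the function name usable_capacity and its parameters are made up for this example, and it assumes n identical drives.

```python
def usable_capacity(level: str, n: int, hddsize: float) -> float:
    """Usable capacity of an array of n identical drives of size hddsize."""
    if level == "RAID-0":
        # Pure striping: every byte of every drive holds user data.
        return n * hddsize
    if level == "RAID-5":
        if n < 3:
            raise ValueError("RAID-5 needs at least 3 drives")
        # One drive's worth of space goes to distributed parity.
        return (n - 1) * hddsize
    if level == "RAID-10":
        if n < 4 or n % 2:
            raise ValueError("RAID-10 needs an even number of drives, 4 or more")
        # Every block is mirrored, so half of the raw space is usable.
        return n * hddsize / 2
    raise ValueError(f"unknown RAID level: {level}")


# Six 1000GB drives under each level.
for level in ("RAID-0", "RAID-5", "RAID-10"):
    print(level, usable_capacity(level, 6, 1000), "GB")
```

With six drives, RAID-5 gives up one sixth of the raw space while RAID-10 gives up half, which is exactly why RAID-5 looks so attractive.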

When one of the drives in a RAID-5 group fails (completely or partially), the RAID group becomes degraded, but the data remains available, because the missing part can be restored from the redundant information in that "additional volume the size of one drive". When a new drive is inserted in place of the failed one, the RAID controller starts the rebuild process and fills the new drive with the required data.

Here is an overview of the RAID-5 rebuilding process: In a redundant array of independent disks (RAID) configuration, data are stored in arrays of drives to provide fault tolerance and improved data access performance. In a RAID 5 array configuration, the user data and parity data (encoded redundant information) are distributed across all the drives in the array (data striping). By striping the user data and distributing the parity data across all drives in the array, optimum performance is achieved by preventing the slowdown (bottleneck) caused by constant hits on a single drive.
If a drive fails in a RAID 5 array configuration, the data can be reconstructed (or rebuilt) from the parity data on the remaining drives. If the array is configured with an online spare drive, the automatic data recovery process (or rebuild process) begins immediately when a failed drive is detected; otherwise, the rebuild process begins when the failed drive is replaced. To rebuild lost data on a failed drive, the segments corresponding to each lost segment are read from the remaining drives in the array (where “N” is the total number of drives in the array, “N-1” is the number of remaining drives). The segment data is restored through exclusive-OR (XOR) operations that occur in the array controller XOR engine.
After the XOR engine restores the lost segment, the restored segment data is written to the replacement or online spare drive. The rebuild process involves (N-1) reads (R) from the operational drives in the array and a single write (W) to the replacement or online spare drive.
When a segment is fully restored, the rebuild process proceeds to restore the next lost segment.
During the rebuild process, the array remains accessible to users; however, performance of data access is degraded. An array with a failed drive operates in “degraded mode.” During the rebuild process, the array operates in “rebuild mode.” If more than one drive fails at any given time, or any other drive fails during the rebuild process, the array becomes inaccessible.
Upon completion of the rebuild process, the rebuilt drive contains the data it would have contained had the original drive never failed. In configurations using an online spare drive, the status of the online spare configuration is restored when the failed drive is replaced. After the failed drive is replaced, the content of the online spare drive will be copied to the replaced drive. After the completion of disk copy, the online spare configuration is restored.
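The XOR step of the rebuild described above can be illustrated with a minimal Python sketch. It models a single stripe of a four-drive array as byte strings; the helper names xor_blocks and rebuild_segment and the sample data are hypothetical, and a real controller does this in hardware, stripe after stripe.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR several equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_segment(stripe, failed_index):
    """Restore the segment of the failed drive from the N-1 surviving segments."""
    survivors = [seg for i, seg in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)  # (N-1) reads, then one write of the result


# One stripe: three data segments plus their parity segment.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# Pretend drive 1 has failed and rebuild its segment from the others.
restored = rebuild_segment(stripe, 1)
print(restored == stripe[1])
```

Because XOR is its own inverse, the same operation restores any single missing segment, whether it held user data or parity. It cannot restore two missing segments at once, which is why a second failure during the rebuild is fatal.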

The RAID-5 rebuild process can take a very long time, and its duration depends on many factors. It is easy to find stories on the internet in which a relatively small RAID-5 of 4-6 drives needed a day or so to rebuild the data from 500GB disks onto a new drive.

[Image. Source: Adaptec]
“A RAID 5 array with five 500 GB SATA drives took approximately 24 hours to rebuild.” Source:
“The testing used a 3.5TB array composed of 16 250GB SATA disks configured as RAID 5 ... 3ware took ... over a day to repair a RAID 5 array when under a file server workload.” Source:
“I'm now at 80% of rebuilding my RAID-5 array with 3x 1TB harddrives, I've calculated that the total time needed to rebuild the array will be 66 hours!” Source:
“On my filer I run a software raid 5 across eight 500 GB sata drives, which works great ... Recovery time is about 20 hours. Athlon X2 4200 + and nvidia chipset.” Source:

With 1TB and 2TB drives, these figures can easily be multiplied by 2-4 times!
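The scaling is simple arithmetic. The sketch below assumes, as a rough approximation only, that rebuild time grows linearly with drive capacity while controller and drive throughput stay the same; the function name scaled_rebuild_hours is made up for this illustration, and the baseline is the Adaptec report quoted above.

```python
def scaled_rebuild_hours(reported_hours, reported_drive_gb, new_drive_gb):
    """Scale a reported rebuild time linearly with drive capacity (an assumption)."""
    return reported_hours * new_drive_gb / reported_drive_gb


# Baseline from the quote above: about 24 hours with 500GB drives.
for new_gb in (1000, 2000):
    hours = scaled_rebuild_hours(24, 500, new_gb)
    print(f"{new_gb}GB drives: about {hours:.0f} hours")
```

Real rebuild times also depend on the controller, the number of drives, and the workload running on the array during the rebuild, so these are orders of magnitude rather than predictions.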

Here is what vendors say about the reliability of disks.
(Here is a summary table of the main series of drives)

Today, almost all manufacturers offer hard drives in two main classes: so-called Desktop drives for desktop systems, and Enterprise drives designed for servers and other critical applications. Enterprise-class drives are further divided into SATA disks (rotational speed of 7200 RPM) and SAS or FC disks (10K and 15K RPM).

The reliability of reading data is usually measured by a parameter called BER, the Bit Error Rate (Ratio). It is the probability of a read failure per given number of bits read by the disk heads.
As a rule, manufacturers specify a BER of one error per 10^14 bits for Desktop-class drives; gradually, for larger drives and especially for newer series, they have begun to specify 10^15. This number is the manufacturer's prediction of how often a read will fail: one error per that many bits read. A one followed by 14 zeros is one hundred thousand billion bits.
This number seems overwhelming. But is it really that large?

Simple math at the level of calc.exe tells us that 10^14 bits is only 12.5 trillion bytes, or about 11TB when counted in binary units. In other words, the manufacturer is telling us that when about 11TB of data is read from a disk with a BER of 10^14 (a regular desktop-class drive), we should expect a failed bit somewhere along the way.
A failed bit means a failed 512-byte block.
Is 11TB that much?

This does not mean that exactly 11TB has to be read before an error appears. BER is only a probability: on average one error is expected per 11TB read, and the chance of hitting an error is proportionally smaller for smaller volumes.
Drives with a BER of 10^15 have a 10 times lower probability of error. But remember that drive capacity doubles with each new generation, roughly every 1.5 to 2 years, and RAID capacity grows along with it, whereas a BER of 10^15 for SATA drives has been reached only in the last year and a half.

For example, the probability of failure due to BER for a 6-disk RAID-5 of 1TB disks is estimated at 4-5%, and for 4TB disks it rises to 16-20%.

[Image. Source: Hitachi Data Systems: Why growing business need RAID-6.]
This cold figure means that with a probability of 16-20% we will hit a read failure during the rebuild (and thus lose all the data on the RAID). After all, in order to rebuild, the RAID controller has to read all the disks in the RAID group.
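A back-of-the-envelope version of this estimate can be written in Python. It is a sketch under stated assumptions, not the vendor's model: every bit is assumed to fail independently at the specified rate, a terabyte is taken as 10^12 bytes, and the rebuild is assumed to read every bit of the n-1 surviving drives. The function name p_read_error and its parameters are invented for this example.

```python
import math


def p_read_error(n_drives, drive_tb, bits_per_error):
    """Chance of at least one unrecoverable read error during a RAID-5 rebuild.

    Assumes independent bit errors at a rate of 1 per bits_per_error bits
    and a full read of the n_drives - 1 surviving drives.
    """
    bits_read = (n_drives - 1) * drive_tb * 1e12 * 8
    # 1 - (1 - 1/bits_per_error) ** bits_read, computed in a numerically stable way.
    return -math.expm1(bits_read * math.log1p(-1.0 / bits_per_error))


for drive_tb in (1, 4):
    for bits_per_error in (1e14, 1e15):
        p = p_read_error(6, drive_tb, bits_per_error)
        print(f"6 x {drive_tb}TB, 1 error per {bits_per_error:.0e} bits: {p:.1%}")
```

Under these assumptions, a BER of 10^15 gives roughly 4% for six 1TB disks and roughly 15% for six 4TB disks, the same order as the figures quoted above, which presumably come from a somewhat different model. With a desktop-class BER of 10^14 the same calculation gives roughly 33% and 80%.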

But even that's not all.
Practice shows that approximately 70-80% of the data stored on disks is so-called cold data: files that are accessed relatively rarely. As total disk capacity grows, the volume of cold data grows too. A huge amount of data often remains untouched by anything, even antivirus software (why would it check gigabytes of rips and mp3s?), for months and possibly years.

An error in cold data will be discovered only when the disk contents are read in full, that is, during the rebuild process.
Smart storage systems usually perform disk scrubbing continuously, reading the entire disk and checking that all of it is still readable. But we are sure that a low-cost home RAID controller does not do this. Therefore, you will find out about a bad block hiding somewhere in the cold data only at rebuild time.

This is the unpleasant truth hiding behind the scandalous "Why RAID-5 is bad" articles.

Conclusion:

There are several disadvantages. RAID-5 costs the volume storage capacity equal to one hard drive. For example, three 500GB hard drives add up to 1500GB (about 1.5 terabytes) of storage. If the three 500GB drives are set up as a RAID-0 (striped) configuration, total data storage equals 1500GB. If the same three drives are configured as a RAID-5 volume (striped with parity), the usable capacity is 1000GB rather than 1500GB, since 500GB (the equivalent of one drive's capacity) is used for parity. In addition, if two or more drives fail or become corrupted at the same time, all data on the volume becomes inaccessible to the user.

All of the above shows that RAID-5 should no longer be used as a fault-tolerant solution for storing important and critical data.

Some advice:

1. Fully document your storage array configuration during initial setup, including the physical arrangement and order of connection for all component devices, especially the disk drives.
2. Maintain sequence of array members by tagging the physical disk drive units (not simply the tray the drive is mounted in) while everything is fully functional, and certainly, if trouble does arise, BEFORE any trouble-shooting begins.
3. Test the subsystem's ability to recover from a drive failure. With all data backed up, remove one of the drives from the subsystem while it's running, and bring it back to full, undegraded operation using a blank hard drive replacement.
4. Understand the fundamental concepts behind RAID-5 functioning, so that when any anomaly occurs remedies are always focused on preserving the data.
5. Understand the distinctions between RAID-5 and other common levels of implementation, i.e., RAID-0, RAID-1, and RAID-10; knowing what your subsystem is NOT will help you understand better what it really is.
6. Immediately replace a failed hard disk drive element upon fault detection; the replacement must be a pre-tested, known working device, fully conforming to subsystem designer specifications.
7. Fully understand what "fault tolerance" really means in terms applicable to your specific subsystem, and always take action without delay to protect its full, complete functioning.
8. Know that rebuilding a RAID reporting a corrupted file system will not repair it, but further corrupt your data, possibly making it unrecoverable.
9. As with all data storage, back up regularly!
Skull 23 january 2012, 15:55