The basic idea behind RAID is to combine multiple small, inexpensive disk drives into an array whose performance exceeds that of one large, expensive drive. The array appears to the computer as a single logical storage unit or drive.
RAID is a method of spreading information across several disks, using techniques such as disk striping (RAID level 0) and disk mirroring (RAID level 1) to achieve redundancy, lower latency, higher bandwidth for reading and/or writing, and recoverability from hard-disk crashes.
Fundamental to RAID is "striping," a method of combining the space on multiple hard drives into a single logical drive for the operating system. Striping involves breaking down the total space on each drive into small "chunks." These chunks can be as small as 4 KB or as large as several megabytes (although testing shows that a 32 KB or 64 KB chunk size is often optimal). The chunks are then interleaved across the constituent disks to create "stripes." For example, the first chunk on each hard disk is combined into one stripe, the second chunk on each into another, and so on. In this way, the total size of the logical drive is the size of all the constituent drives added together.
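The chunk interleaving described above can be sketched in a few lines of Python. This is an illustration only, not the code of any real RAID driver: the function names locate_chunk and locate_byte are hypothetical, and the 64 KB default is simply one of the chunk sizes mentioned above.

```python
def locate_chunk(logical_chunk, num_disks):
    """Map a logical chunk number to (disk index, chunk index on that disk)."""
    if num_disks < 1:
        raise ValueError("need at least one disk")
    if logical_chunk < 0:
        raise ValueError("chunk number must be non-negative")
    # Consecutive logical chunks go to consecutive disks; every
    # num_disks chunks, the array moves on to the next stripe.
    stripe, disk = divmod(logical_chunk, num_disks)
    return disk, stripe


def locate_byte(offset, num_disks, chunk_size=64 * 1024):
    """Map a byte offset on the logical drive to (disk index, byte offset on that disk)."""
    logical_chunk, within_chunk = divmod(offset, chunk_size)
    disk, stripe = locate_chunk(logical_chunk, num_disks)
    return disk, stripe * chunk_size + within_chunk


# With three disks, logical chunks 0, 1 and 2 make up the first stripe
# and logical chunk 3 starts the second stripe back on the first disk.
for chunk in range(6):
    print(chunk, locate_chunk(chunk, num_disks=3))
```

Because neighbouring chunks live on different disks, a large sequential transfer keeps several drives busy at once, which is where the bandwidth gain of striping comes from.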
There are two possible approaches to RAID: Hardware RAID and Software RAID.
The hardware based system manages the RAID subsystem independently from the host and presents to the host only a single disk per RAID array.
An example of a hardware RAID device would be one that connects to a SCSI controller and presents each RAID array as a single SCSI drive. An external RAID box moves all RAID handling "intelligence" into a controller that sits in the external disk subsystem. The whole subsystem is connected to the host via a normal SCSI controller and appears to the host as a single disk.
RAID controllers also come in the form of cards that act like a SCSI controller to the operating system, but handle all of the actual drive communications themselves. In these cases, you plug the drives into the RAID controller just like you would a SCSI controller, but then you add them to the RAID controller's configuration and the operating system never knows the difference.
Software RAID implements the various RAID levels in the kernel disk (block device) code. It also offers the cheapest possible solution: not only are expensive disk controller cards or hot-swap chassis not required, but software RAID works with cheaper IDE disks as well as SCSI disks. With today's fast CPUs, software RAID performance can match or exceed that of hardware RAID.
The MD driver in the Linux kernel is an example of a RAID solution that is completely hardware independent. The performance of a software-based array depends heavily on the server's CPU performance and load.
The primary reasons to use RAID are:
Speed
Increased storage capacity
Increased efficiency in recovering from a disk failure
RAID support was rewritten as part of the 2.2 kernel, and many changes were made.
There are too many changes to list them all. Briefly, some of them are:
Threaded rebuild process
Fully kernel-based configuration
Arrays can be moved between Linux boxes without reconstruction
Array reconstruction is backgrounded using idle system resources
Hot-swappable drive support
Automatic CPU detection to take advantage of certain CPU optimizations
Among the more notable changes to RAID is the addition of support for levels 0, 1, 4, 5 and linear mode. These RAID types act as follows:
Level 0 -- RAID level 0, often called "striping," is a performance-oriented striped data mapping technique. That means the data being written to the array is broken down into strips and striped across the member disks of the array. This allows high I/O performance at low inherent cost but provides no redundancy.
Level 1 -- RAID level 1, or "mirroring," has been used longer than any other form of RAID. Level 1 provides redundancy by writing identical data to each member disk of the array, leaving a "mirrored" copy on each disk. Mirroring remains popular due to its simplicity and high level of data availability. Level 1 operates with two or more disks that may use parallel access for high data-transfer rates when reading, but more commonly operate independently to provide high I/O transaction rates. Level 1 provides very good data reliability and improves performance for read-intensive applications but at relatively high cost.
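As a rough illustration of mirroring, the toy model below writes every block to all working members and serves reads from the first member that is still working. The Mirror class and its methods are hypothetical names invented for this sketch; disks are modelled as in-memory dictionaries, and nothing here reflects how a real driver schedules I/O.

```python
class Mirror:
    """Toy RAID level 1 array: every member holds a full copy of the data."""

    def __init__(self, num_disks=2):
        if num_disks < 2:
            raise ValueError("mirroring needs at least two disks")
        # Each "disk" is a dict mapping block number -> block contents.
        self.disks = [dict() for _ in range(num_disks)]
        self.failed = set()

    def fail(self, index):
        """Mark one member disk as failed."""
        self.failed.add(index)

    def write(self, block, data):
        """Write identical data to every member that is still working."""
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block] = data

    def read(self, block):
        """Read from the first working member; any copy will do."""
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block]
        raise OSError("all mirror members have failed")


array = Mirror(num_disks=2)
array.write(0, b"payroll")
array.fail(0)          # lose one disk
print(array.read(0))   # the data is still served from the surviving copy
```

The sketch also makes the cost visible: two disks store only one disk's worth of data.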
Level 4 -- Level 4 uses parity concentrated on a single disk drive to protect data. It is better suited to transaction I/O than to large file transfers. Because the dedicated parity disk represents an inherent bottleneck, level 4 is seldom used without accompanying technologies such as write-back caching.
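Parity protection is conventionally computed as a byte-wise XOR of the data chunks in a stripe, which is what lets a single lost chunk be rebuilt from the survivors. The sketch below shows the general technique under that assumption; xor_chunks is a hypothetical helper name, and the four-byte "chunks" are made-up sample data.

```python
from functools import reduce


def xor_chunks(chunks):
    """Return the byte-wise XOR of a list of equal-length chunks."""
    chunks = list(chunks)
    if not chunks:
        raise ValueError("need at least one chunk")
    length = len(chunks[0])
    if any(len(chunk) != length for chunk in chunks):
        raise ValueError("chunks must all be the same length")
    # zip(*chunks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# One stripe: three data chunks plus the parity chunk for the parity disk.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_chunks(data)

# Suppose the disk holding data[1] fails. XOR-ing the surviving data
# chunks with the parity chunk yields the missing chunk again.
rebuilt = xor_chunks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Every write to any data disk also changes the parity chunk, so in level 4 all writes funnel through the one parity disk. That is the bottleneck described above.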
Level 5 -- The most common type of RAID. By distributing parity across some or all of an array's member disk drives, RAID level 5 eliminates the write bottleneck inherent to level 4. The only bottleneck it has is the parity calculation process. With modern CPUs and software RAID, that isn't even a very big bottleneck. As with level 4, the result is asymmetrical performance, with reads substantially outperforming writes. Level 5 is often used with caching to reduce the asymmetry.
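To see how distributing parity removes the single-disk bottleneck, the sketch below rotates the parity chunk to a different disk on each stripe. The rotation rule is an assumption chosen for illustration (real implementations offer several layouts), and parity_disk and stripe_layout are hypothetical names.

```python
def parity_disk(stripe, num_disks):
    """Pick the disk that holds parity for a stripe (simple rotation, assumed)."""
    if num_disks < 3:
        raise ValueError("RAID level 5 needs at least three disks")
    return (num_disks - 1 - stripe) % num_disks


def stripe_layout(stripe, num_disks):
    """Describe what each disk stores in a stripe: 'P' for parity, 'D<n>' for data."""
    parity = parity_disk(stripe, num_disks)
    layout = []
    data_index = 0
    for disk in range(num_disks):
        if disk == parity:
            layout.append("P")
        else:
            layout.append(f"D{data_index}")
            data_index += 1
    return layout


# Print the first four stripes of a four-disk array.
for stripe in range(4):
    print(stripe, stripe_layout(stripe, num_disks=4))
```

Over four consecutive stripes each of the four disks holds parity exactly once, so parity updates are spread evenly instead of landing on one drive.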