How to replace a failed hard drive in a RAID array

When a hard drive fails in a RAID array and needs to be replaced, you have only one chance to correctly identify the failed drive and remove it.  If you make a mistake and remove a drive that the array still considers good, the RAID array will fail and all data on it can be lost.


Before replacing the hard drive, make sure you have a good, full bare-metal backup of everything on that RAID array.

Ways to determine which drive has failed:

  1. Look for a red light on the hard drive.
  2. Some RAID controllers can flash the light of the failed hard drive.  This is done from inside the RAID firmware or management software.
  3. Gather information from inside the RAID firmware as to which drive has failed, for example the hard drive model number, serial number, or port #.  Note that the port # may not be the same as the hard drive slot in the server case.  The port # refers to the connector on the cable coming off the RAID card.
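For method 3, the check amounts to comparing what the RAID firmware reports against the label on the physical drive before pulling it.  The following is only a minimal sketch of that comparison in Python; the DriveIdentity class, the is_same_drive function, and the model and serial values are all made up for illustration and do not come from any RAID vendor's tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriveIdentity:
    """Identifying details for one drive (field names are illustrative)."""
    model: str
    serial: str


def normalize(value: str) -> str:
    """Ignore case and surrounding whitespace when comparing label text."""
    return value.strip().upper()


def is_same_drive(reported: DriveIdentity, label: DriveIdentity) -> bool:
    """Return True only if both the model and the serial number match."""
    return (
        normalize(reported.model) == normalize(label.model)
        and normalize(reported.serial) == normalize(label.serial)
    )


# Hypothetical values: what the firmware reported vs. what the label says.
failed = DriveIdentity(model="EXAMPLE-2000", serial="SN12345")
on_label = DriveIdentity(model="example-2000", serial=" sn12345 ")
print(is_same_drive(failed, on_label))
```

The port # is deliberately left out of the comparison, since it may not match the slot in the server case.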

For methods 1 and 2, you can swap the failed hard drive for a new one while the server is powered on.  For method 3, you must power off the server before locating or verifying the failed hard drive and replacing it with a new drive.  If you do not power off the server (even if you are only in the RAID firmware), the RAID array will be marked as failed and all data might be lost.
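The rule above can be written down as a small lookup, which may be handy as part of a checklist script.  This is a sketch only; the IdentificationMethod enum and the must_power_off function are hypothetical names chosen for this example, and the numbering simply mirrors the three methods listed earlier.

```python
from enum import Enum


class IdentificationMethod(Enum):
    """The three ways of identifying the failed drive, numbered as in the list."""
    RED_LIGHT = 1         # a red light on the drive itself
    FLASHED_LIGHT = 2     # light flashed by the RAID firmware or software
    FIRMWARE_DETAILS = 3  # model number, serial number, or port # from firmware


def must_power_off(method: IdentificationMethod) -> bool:
    """Return True when the server must be powered off before the swap."""
    # Only method 3 requires powering off; methods 1 and 2 allow a live swap.
    return method is IdentificationMethod.FIRMWARE_DETAILS


for method in IdentificationMethod:
    action = "power off first" if must_power_off(method) else "swap while powered on"
    print(f"Method {method.value}: {action}")
```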

Hard drive replacements should be done near the end of the business day, because server performance will be greatly affected while the RAID array rebuilds.