mdadm replace failed hard drive RAID1

I had a few hard drives fail in software RAID1 arrays, so I wrote this article to avoid searching for the procedure the next time it happens.


Detect failed hard drive

If a drive has failed, you will see a lot of error messages in /var/log/messages and will probably get a mail like this from mdadm monitoring:

This is an automatically generated mail message from mdadm running on <host>

A DegradedArray event had been detected on md device /dev/md3.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] 
md3 : active raid1 sdb4[1]
 1847608639 blocks super 1.2 [2/1] [_U]
 
md2 : active raid1 sdb3[1]
 1073740664 blocks super 1.2 [2/1] [_U]
 
md1 : active raid1 sdb2[1]
 524276 blocks super 1.2 [2/1] [_U]
 
md0 : active (auto-read-only) raid1 sdb1[1]
 8387572 blocks super 1.2 [2/1] [_U]
 
unused devices: <none>

The main thing to look for in the output of cat /proc/mdstat is [_U] or [U_], depending on which hard drive has failed. If everything is OK you will see [UU].
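If you have many arrays, checking the status field by eye gets tedious. Here is a minimal sketch that prints every array whose status contains an underscore. The function name degraded_arrays and the optional file argument are my own additions for illustration, and the parsing assumes the mdstat layout shown above.

```shell
#!/bin/sh
# Print the name of every md array whose status field (the last field
# on the "blocks" line, e.g. [_U]) contains an underscore.
# Reads /proc/mdstat by default; a saved copy can be passed as $1.
degraded_arrays() {
    awk '
        /^md[^ ]* :/             { name = $1 }    # remember the current array
        $NF ~ /^\[[U_]*_[U_]*\]$/ { print name }   # status like [_U] or [U_]
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays with no arguments on the degraded system above should list md3, md2, md1 and md0, and print nothing once everything is back to [UU].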


Removing the failed hard drive

In my case /dev/sda had failed, so I had to mark /dev/sda1, /dev/sda2, /dev/sda3 and /dev/sda4 as failed and remove them from their respective RAID arrays:

mdadm --manage /dev/md0 --fail /dev/sda1
mdadm --manage /dev/md0 --remove /dev/sda1

mdadm --manage /dev/md1 --fail /dev/sda2
mdadm --manage /dev/md1 --remove /dev/sda2

mdadm --manage /dev/md2 --fail /dev/sda3
mdadm --manage /dev/md2 --remove /dev/sda3

mdadm --manage /dev/md3 --fail /dev/sda4
mdadm --manage /dev/md3 --remove /dev/sda4
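The eight commands above follow one pattern, so they can be generated in a loop. This is only a sketch: drop_disk, run and the DRY_RUN variable are names I made up, and the array:partition pairs have to match your own layout. By default it only prints the commands so you can review them first.

```shell
#!/bin/sh
# Print the command when DRY_RUN is 1 (the default), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Fail and then remove every listed partition of one disk.
# $1 is the disk (e.g. /dev/sda); the remaining arguments are
# array:partition-number pairs (e.g. md0:1).
drop_disk() {
    disk=$1
    shift
    for pair in "$@"; do
        md=${pair%%:*}      # part before the colon, e.g. md0
        part=${pair##*:}    # part after the colon, e.g. 1
        run mdadm --manage "/dev/$md" --fail "$disk$part"
        run mdadm --manage "/dev/$md" --remove "$disk$part"
    done
}
```

With my layout, drop_disk /dev/sda md0:1 md1:2 md2:3 md3:4 prints the same eight commands as above. Setting DRY_RUN=0 before the call runs them for real.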

Now shut down the server and replace the hard drive:

shutdown -h now


Add the new hard drive

After replacing the failed /dev/sda disk, boot the system and copy the partition table from the surviving /dev/sdb drive, which has the data, to the new disk.

The simple command is:

sfdisk -d /dev/sdb | sfdisk /dev/sda

Then use fdisk -l to check it.

If you get a message like this: WARNING: GPT (GUID Partition Table) detected on ‘/dev/sdb’! The util fdisk doesn’t support GPT. Use GNU Parted. then the disk uses GPT and you need to install gdisk:

apt-get install gdisk

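Now copy the partition table from disk /dev/sdb to /dev/sda (note that the target disk comes first, as the argument of -R), then give the new disk its own random GUIDs with -G: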

sgdisk -R /dev/sda /dev/sdb
sgdisk -G /dev/sda

You should get "The operation has completed successfully." as the output of both commands.

Now add the new partitions to the RAID1 arrays:
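Both copy commands overwrite the partition table of the target disk, so mixing up the two device names would wipe the table on the only disk that still has your data. As a guard, here is a rough sketch that checks whether a disk still has partitions in an active array before you write to it. The function name disk_in_md is hypothetical, and the pattern assumes device names like sda1 or nvme0n1p1 as they appear in /proc/mdstat.

```shell
#!/bin/sh
# Succeeds (exit status 0) if any partition of the given disk, named
# without /dev/ (e.g. sda), is listed as an array member in mdstat.
# The optional second argument lets you check a saved copy of
# /proc/mdstat instead of the live file.
disk_in_md() {
    grep -Eq "(^|[[:space:]])${1}p?[0-9]+\[" "${2:-/proc/mdstat}"
}

# Intended use before overwriting the partition table of sda:
#   if disk_in_md sda; then
#       echo "sda is still part of an array, not touching it" >&2
#   else
#       sfdisk -d /dev/sdb | sfdisk /dev/sda
#   fi
```

On the degraded system in this article the check succeeds for sdb and fails for sda, which is what you want to see before writing to /dev/sda.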

mdadm --manage /dev/md0 --add /dev/sda1
mdadm --manage /dev/md1 --add /dev/sda2
mdadm --manage /dev/md2 --add /dev/sda3
mdadm --manage /dev/md3 --add /dev/sda4

You can monitor the rebuild with cat /proc/mdstat:

cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sda4[2] sdb4[1]
 1847608639 blocks super 1.2 [2/1] [_U]
 [>....................] recovery = 0.0% (1575168/1847608639) finish=1119.2min speed=27488K/sec

md2 : active raid1 sda3[2] sdb3[1]
 1073740664 blocks super 1.2 [2/1] [_U]
 resync=DELAYED

md1 : active raid1 sda2[2] sdb2[1]
 524276 blocks super 1.2 [2/1] [_U]
 resync=DELAYED

md0 : active raid1 sda1[2] sdb1[1]
 8387572 blocks super 1.2 [2/1] [_U]
 resync=DELAYED

unused devices: <none>
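If you only care about the progress figure, it can be pulled out of that output. A small sketch, assuming the recovery line looks like the one above; rebuild_progress is a name I picked, and the optional file argument is only there for testing against a saved copy.

```shell
#!/bin/sh
# Print "array percentage" for every array that is rebuilding right now.
# Arrays that are waiting (resync=DELAYED) have no percentage yet and
# are skipped. Reads /proc/mdstat by default, or the file given as $1.
rebuild_progress() {
    awk '
        /^md[^ ]* :/ { name = $1 }          # remember the current array
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print name, $i
        }
    ' "${1:-/proc/mdstat}"
}
```

Wrapping the call in something like watch -n 60 would refresh the figure every minute, though the function has to be saved in a script first because watch starts a new shell.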

After a few hours you should have [UU] at the end of each array:

cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sda4[2] sdb4[1]
 1847608639 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sda3[2] sdb3[1]
 1073740664 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sda2[2] sdb2[1]
 524276 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda1[2] sdb1[1]
 8387572 blocks super 1.2 [2/2] [UU]

unused devices: <none>


I hope this article will help you not to lose your data.

mdadm: https://en.wikipedia.org/wiki/Mdadm

Nikola Stojanoski

System Administrator and Developer. Giving back to the community by blogging about my problems, solutions and practical howto's.