mdadm replace failed hard drive RAID1

After a few hard drives failed in my software RAID1 arrays, I decided to write this article so I don't have to search for the procedure the next time it happens.

Detect failed hard drive

A failed drive usually shows up as a lot of error messages in /var/log/messages, and you will probably also get a mail like this one from mdadm monitoring:

This is an automatically generated mail message from mdadm running on <host>

A DegradedArray event had been detected on md device /dev/md3.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] 
md3 : active raid1 sdb4[1]
 1847608639 blocks super 1.2 [2/1] [_U]
 
md2 : active raid1 sdb3[1]
 1073740664 blocks super 1.2 [2/1] [_U]
 
md1 : active raid1 sdb2[1]
 524276 blocks super 1.2 [2/1] [_U]
 
md0 : active (auto-read-only) raid1 sdb1[1]
 8387572 blocks super 1.2 [2/1] [_U]
 
unused devices: <none>

The main thing to look for when you cat /proc/mdstat is [_U] or [U_], depending on which hard drive has failed. If everything is OK you will see [UU].
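With several arrays it can be handy to list only the degraded ones. The following is a minimal sketch, not part of mdadm itself: degraded_arrays is a hypothetical helper name, and it assumes the mdstat layout shown above, where the status field such as [_U] sits on the line after the array name.

```shell
# List md arrays whose member-status field (e.g. [_U]) contains an underscore.
# $1 = path to an mdstat-formatted file (defaults to /proc/mdstat).
# degraded_arrays is a hypothetical helper name, not an mdadm command.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember the current array
        name != "" && /\[[U_]+\]/ {          # status line of that array
            if (match($0, /\[[U_]*_[U_]*\]/)) print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays without arguments reads /proc/mdstat and prints nothing when every array shows [UU].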

Removing the failed hard drive

In my case /dev/sda had failed, so I had to mark /dev/sda1, /dev/sda2, /dev/sda3 and /dev/sda4 as failed and remove them from their respective RAID arrays:

mdadm --manage /dev/md0 --fail /dev/sda1
mdadm --manage /dev/md0 --remove /dev/sda1

mdadm --manage /dev/md1 --fail /dev/sda2
mdadm --manage /dev/md1 --remove /dev/sda2

mdadm --manage /dev/md2 --fail /dev/sda3
mdadm --manage /dev/md2 --remove /dev/sda3

mdadm --manage /dev/md3 --fail /dev/sda4
mdadm --manage /dev/md3 --remove /dev/sda4
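The eight commands above follow one pattern, so they can be generated instead of typed. This is only a sketch under the assumption that your layout matches mine, with /dev/mdN holding partition N+1 of each disk. The function name fail_remove_cmds is made up for this example, and it only prints the commands so you can review them first.

```shell
# Print (do not run) the fail/remove commands for every partition of one disk.
# Assumes the layout from this article: /dev/mdN contains partition N+1.
# $1 = disk name without /dev/, e.g. sda. fail_remove_cmds is a hypothetical name.
fail_remove_cmds() {
    disk=$1
    for n in 0 1 2 3; do
        part="/dev/${disk}$((n + 1))"
        echo "mdadm --manage /dev/md$n --fail $part"
        echo "mdadm --manage /dev/md$n --remove $part"
    done
}
```

Run fail_remove_cmds sda to see the list, and pipe it to sh as root only once the output matches what you expect.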

Now shut down the server and replace the hard drive:

shutdown -h now

Add the new hard drive

After replacing the failed /dev/sda disk, boot the system and copy the partition table from the old /dev/sdb drive, which has the data, to the new disk.

The simple command is:

sfdisk -d /dev/sdb | sfdisk /dev/sda

Then use fdisk -l to check it.
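The sfdisk command overwrites the partition table of the target disk, so mixing up source and target would wipe the drive that still has the data. As a hedged sketch of a sanity check, the helper below (disk_in_md is a name I made up) reports whether a disk still appears as an array member in /proc/mdstat. It assumes sdX style device names like the ones in this article.

```shell
# Succeed (exit 0) if any partition of the given disk is listed as an array
# member in the mdstat file. The new, blank disk should NOT be listed.
# $1 = disk name, e.g. sda; $2 = mdstat file (defaults to /proc/mdstat).
# disk_in_md is a hypothetical helper name; it assumes sdX-style device names.
disk_in_md() {
    disk=$1
    grep -Eq "^md[0-9]+ :.* ${disk}[0-9]+\[" "${2:-/proc/mdstat}"
}

# Example use before copying the partition table onto /dev/sda:
#   if disk_in_md sda; then
#       echo "sda is still part of an array, not touching it" >&2
#   else
#       sfdisk -d /dev/sdb | sfdisk /dev/sda
#   fi
```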

If you get a message like this: WARNING: GPT (GUID Partition Table) detected on ‘/dev/sdb’! The util fdisk doesn’t support GPT. Use GNU Parted. then the disk uses GPT and you need to install gdisk, which provides the sgdisk command:

apt-get install gdisk

And now copy the partition table from disk /dev/sdb to /dev/sda, then give the copy on /dev/sda new random GUIDs:

sgdisk -R /dev/sda /dev/sdb
sgdisk -G /dev/sda

You should get The operation has completed successfully. as output from both commands.
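The choice between the two procedures above depends only on the partition table type. As an illustration, here is a small sketch that prints the matching copy command for a given type. The function name copy_table_cmd is hypothetical, and the type strings "dos" and "gpt" are an assumption based on what blkid -s PTTYPE -o value /dev/sdb reports on my systems, so check what your tools print.

```shell
# Print the partition-table copy command that matches the table type.
# $1 = table type ("dos" for MBR, or "gpt"), $2 = source disk, $3 = target disk
# copy_table_cmd is a hypothetical helper; it prints commands and runs nothing.
copy_table_cmd() {
    case $1 in
        dos) echo "sfdisk -d /dev/$2 | sfdisk /dev/$3" ;;
        gpt) echo "sgdisk -R /dev/$3 /dev/$2 && sgdisk -G /dev/$3" ;;
        *)   echo "unknown partition table type: $1" >&2; return 1 ;;
    esac
}
```

Note the argument order of sgdisk -R: the target disk comes first and the source disk last, which is the opposite of the sfdisk pipe.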

Now add the new partitions to the RAID1 arrays:

mdadm --manage /dev/md0 --add /dev/sda1
mdadm --manage /dev/md1 --add /dev/sda2
mdadm --manage /dev/md2 --add /dev/sda3
mdadm --manage /dev/md3 --add /dev/sda4
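If you are not sure which partition belongs to which array, the surviving members listed in /proc/mdstat tell you. The sketch below derives the add commands from them, assuming the new disk has the same partition numbers as the good one (which is the case after copying the partition table). add_cmds is a hypothetical helper name and it only prints the commands.

```shell
# Print an "mdadm --add" command for each array, pairing it with the new
# disk's partition that has the same number as the surviving member.
# $1 = good disk (e.g. sdb), $2 = new disk (e.g. sda),
# $3 = mdstat file (defaults to /proc/mdstat). add_cmds is a hypothetical name.
add_cmds() {
    awk -v good="$1" -v new="$2" '
        /^md[0-9]+ :/ {
            for (i = 1; i <= NF; i++) {
                # member fields look like sdb4[1]; keep only the "4"
                if ($i ~ ("^" good "[0-9]+\\[")) {
                    num = $i
                    sub("^" good, "", num)
                    sub(/\[.*/, "", num)
                    printf "mdadm --manage /dev/%s --add /dev/%s%s\n", $1, new, num
                }
            }
        }
    ' "${3:-/proc/mdstat}"
}
```

With the degraded mdstat from the beginning of this article, add_cmds sdb sda prints the same four commands as above, in the order the arrays appear in the file.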

You can monitor the process with cat /proc/mdstat:

cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sda4[2] sdb4[1]
 1847608639 blocks super 1.2 [2/1] [_U]
 [>....................] recovery = 0.0% (1575168/1847608639) finish=1119.2min speed=27488K/sec

md2 : active raid1 sda3[2] sdb3[1]
 1073740664 blocks super 1.2 [2/1] [_U]
 resync=DELAYED

md1 : active raid1 sda2[2] sdb2[1]
 524276 blocks super 1.2 [2/1] [_U]
 resync=DELAYED

md0 : active raid1 sda1[2] sdb1[1]
 8387572 blocks super 1.2 [2/1] [_U]
 resync=DELAYED

unused devices: <none>
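To follow the rebuild without reading the whole file each time, the progress line can be pulled out of the output above. This is a minimal sketch that assumes the "recovery = 0.0%" format shown here. recovery_progress is a made-up name, and arrays that are only DELAYED are skipped because they have no percentage yet.

```shell
# Print "array percent" for each array that is currently rebuilding.
# $1 = mdstat file (defaults to /proc/mdstat).
# recovery_progress is a hypothetical helper name.
recovery_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember the current array
        /(recovery|resync) *=/ && match($0, /[0-9.]+%/) {
            print name, substr($0, RSTART, RLENGTH)
        }
    ' "${1:-/proc/mdstat}"
}
```

Something like watch -n 10 cat /proc/mdstat also works if you just want the raw file refreshed every few seconds.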

After a few hours you should have [UU] at the end of each array:

cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sda4[2] sdb4[1]
 1847608639 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sda3[2] sdb3[1]
 1073740664 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sda2[2] sdb2[1]
 524276 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda1[2] sdb1[1]
 8387572 blocks super 1.2 [2/2] [UU]

unused devices: <none>

I hope this article will help you not to lose your data.

mdadm: https://en.wikipedia.org/wiki/Mdadm

About Nikola Stojanoski

System Administrator and Developer. Giving back to the community by blogging about my problems, solutions and practical howto's.