A few years ago I decided it was a good idea to have a dedicated file server in my home. After a bit of looking around, I set my mind on a
Maxtor Shared Storage II - 1TB. It has two 500GB drives inside and can be set up as a RAID 0 or RAID 1 device. It is configured via a simple web interface.
I bought one and configured it as a RAID 1 device. After a short while, I also decided to update the firmware with a version based on
OpenMSS.
Shortly after the warranty expired, one of the drives failed badly. The clicking coming out of it was pretty loud but, in a twisted way, also quite pleasant, somehow clicking along with Bob Marley's "Redemption Song". Anyway, I managed to replace the faulty drive and rebuild the array, and my file server has been living happily ever since... until yesterday.
It was either a power failure or a loose PSU connector, or both. As a result, the power light started flashing alternately green (once) and amber (once). I went to the diagnostics page only to find that my device was functioning "within normal parameters". Hmmm... that can't be right.
~ # mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.01
Creation Time : Sat May 5 06:30:50 2007
Raid Level : raid1
Array Size : 487106752 (464.54 GiB 498.80 GB)
Device Size : 487106752 (464.54 GiB 498.80 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Tue May 5 11:18:29 2009
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
Events : 0.515034
Number Major Minor RaidDevice State
0 8 22 0 active sync /dev/sdb6
1 0 0 - removed
What??? Removed??? How???
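The clue is in the counters: the array expects 2 RAID devices but only 1 is active. As a rough sketch, the gap can be pulled out of the `mdadm --detail` output with awk. The function name `md_missing_devices` is made up for illustration, and it assumes the `Raid Devices` and `Active Devices` lines look like the ones above.

```sh
# Hypothetical helper: reads `mdadm --detail` output on stdin and prints
# how many member devices are missing (Raid Devices minus Active Devices).
# Assumes the "Label : value" layout shown in the output above.
md_missing_devices() {
    awk -F' *: *' '
        $1 ~ /Raid Devices$/   { want = $2 }
        $1 ~ /Active Devices$/ { have = $2 }
        END { print want - have }
    '
}

# Possible usage on the box itself (not run here):
#   mdadm --detail /dev/md0 | md_missing_devices
```

Anything other than 0 means the mirror is running without one of its members, whatever the diagnostics page says.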
~ # mdadm --examine /dev/sda6
mdadm: cannot open /dev/sda6: No such file or directory
mdadm: cannot find device size for /dev/sda6: No such file or directory
Hmmm...
~ # ls /dev/sd*
/dev/sda /dev/sda3 /dev/sda6 /dev/sdb1 /dev/sdb4 /dev/sdb7
/dev/sda1 /dev/sda4 /dev/sda7 /dev/sdb2 /dev/sdb5
/dev/sda2 /dev/sda5 /dev/sdb /dev/sdb3 /dev/sdb6
~ # cat /proc/partitions
major minor #blocks name
8 16 488386584 sdb
8 17 257008 sdb1
8 18 257040 sdb2
8 19 257040 sdb3
8 20 1 sdb4
8 21 506016 sdb5
8 22 487106833 sdb6
8 0 488386584 sdc
8 1 257008 sdc1
8 2 257040 sdc2
8 3 257040 sdc3
8 4 1 sdc4
8 5 506016 sdc5
8 6 487106833 sdc6
31 0 256 mtdblock0
9 0 487106752 md0
How exactly did my sda partitions become sdc? Reboot? Yes, reboot!
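One detail worth noting in that listing: sdc carries minors 0 to 6, which under the usual SCSI disk numbering (major 8, 16 minors per disk, so 0 = sda, 16 = sdb, 32 = sdc) is the range that belongs to sda. I can't tell from the output why the kernel renamed the disk, but the mismatch itself is easy to spot. A small sketch, with a made-up function name and assuming the four-column `/proc/partitions` layout shown above:

```sh
# Hypothetical helper: reads /proc/partitions-style lines on stdin and flags
# SCSI disk entries (major 8) whose name differs from the conventional one
# for their minor number (16 minors per disk: 0 = sda, 16 = sdb, 32 = sdc).
sd_name_mismatches() {
    awk '
        $1 == 8 && $4 ~ /^sd/ {
            disk = int($2 / 16)            # which disk the minor belongs to
            part = $2 % 16                 # 0 = whole disk, 1..15 = partition
            want = "sd" substr("abcdefghijklmnop", disk + 1, 1)
            if (part > 0) want = want part
            if (want != $4) print $4 " has the device numbers of " want
        }
    '
}

# Possible usage on the box itself (not run here):
#   sd_name_mismatches < /proc/partitions
```

Fed the listing above, it would report sdc and each of its partitions as holding the numbers of sda, sda1 and so on, while the sdb entries pass silently.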
... [reboot] ...
~ # cat /proc/partitions
major minor #blocks name
8 0 488386584 sda
8 1 257008 sda1
8 2 257040 sda2
8 3 257040 sda3
8 4 1 sda4
8 5 506016 sda5
8 6 487106833 sda6
8 16 488386584 sdb
8 17 257008 sdb1
8 18 257040 sdb2
8 19 257040 sdb3
8 20 1 sdb4
8 21 506016 sdb5
8 22 487106833 sdb6
31 0 256 mtdblock0
9 0 487106752 md0
That's better, but how... ??? Anyway, let's check sda6.
~ # mdadm --query /dev/sda6
/dev/sda6: is not an md array
/dev/sda6: device 1 in 2 device mismatch raid1 md0. Use mdadm --examine for more detail.
~ # mdadm --examine /dev/sda6
/dev/sda6:
Magic : a92b4efc
Version : 00.90.01
UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
Creation Time : Sat May 5 06:30:50 2007
Raid Level : raid1
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Update Time : Fri May 1 20:10:03 2009
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : 34c79134 - correct
Events : 0.513042
Number Major Minor RaidDevice State
this 1 8 6 1 active sync /dev/sda6
0 0 8 22 0 active sync /dev/sdb6
1 1 8 6 1 active sync /dev/sda6
Mismatched, as I would expect, but it's clean. Good.
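The mismatch shows up in two fields. The superblock on sda6 was last updated on Fri May 1 with Events at 0.513042, while the running array reported 0.515034 on Tue May 5, which suggests sda6 had stopped receiving updates days before I noticed. A minimal sketch for comparing the counters follows. The helper name is hypothetical, and it assumes the `hi.lo` form of the Events field shown above with identical high parts.

```sh
# Hypothetical helper: reads `mdadm --detail` or `mdadm --examine` output on
# stdin and prints the low part of the "Events : hi.lo" counter as a number.
# Assumes the high part is the same on both sides (0 in the output above).
md_events_lo() {
    awk -F' *: *' '$1 ~ /Events$/ { split($2, e, "[.]"); print e[2] + 0 }'
}

# Possible usage on the box itself (not run here):
#   cur=$(mdadm --detail /dev/md0 | md_events_lo)
#   old=$(mdadm --examine /dev/sda6 | md_events_lo)
#   echo $((cur - old))    # how many events the stale member is behind
```

With the values above, the difference is 515034 - 513042 = 1992 events.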
~ # mdadm --manage --add /dev/md0 /dev/sda6
mdadm: hot added /dev/sda6
~ # mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.01
Creation Time : Sat May 5 06:30:50 2007
Raid Level : raid1
Array Size : 487106752 (464.54 GiB 498.80 GB)
Device Size : 487106752 (464.54 GiB 498.80 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Tue May 5 11:22:02 2009
State : clean, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Rebuild Status : 0% complete
UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
Events : 0.515210
Number Major Minor RaidDevice State
0 8 22 0 active sync /dev/sdb6
1 0 0 - removed
2 8 6 1 spare rebuilding /dev/sda6
Rebuilding. Good sign, but why do I still have device 1 - removed - in the list?
~ # cat /proc/mdstat
Personalities : [linear] [raid1]
md0 : active raid1 sda6[2] sdb6[0]
487106752 blocks [2/1] [U_]
[=>...................] recovery = 9.8% (47870464/487106752) finish=114.8min speed=63713K/sec
unused devices: <none>
Under 2 hours to sync up. Time for coffee.
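The estimate is easy to check by hand: (487106752 - 47870464) blocks of 1K left at 63713K/sec is about 6894 seconds, or roughly 115 minutes, in line with the reported finish=114.8min. The same arithmetic as a sketch, with a made-up function name and assuming the recovery line has the `(done/total)` and `speed=...K/sec` fields shown above:

```sh
# Hypothetical helper: reads /proc/mdstat-style text on stdin and prints the
# remaining recovery time in minutes, computed as (total - done) / speed / 60.
mdstat_eta_minutes() {
    awk '
        /recovery|resync/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^\([0-9]+\/[0-9]+\)$/) {
                    f = $i; gsub(/[()]/, "", f); split(f, p, "/")
                }
                if ($i ~ /^speed=/) {
                    s = $i; sub(/^speed=/, "", s); sub(/K\/sec$/, "", s)
                }
            }
            if (s + 0 > 0) printf "%.1f\n", (p[2] - p[1]) / s / 60
        }
    '
}

# Possible usage on the box itself (not run here):
#   mdstat_eta_minutes < /proc/mdstat
```

The result can differ from the kernel's figure by a tenth of a minute or so, since the kernel does its own rounding.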
... [coffee] ...
~ # cat /proc/mdstat
Personalities : [linear] [raid1]
md0 : active raid1 sda6[1] sdb6[0]
487106752 blocks [2/2] [UU]
unused devices: <none>
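The member map has gone from [2/1] [U_] to [2/2] [UU], so both halves of the mirror are in sync again. Since the diagnostics page was happy to report a degraded array as normal, a check against /proc/mdstat is the more trustworthy signal. A minimal sketch, assuming the mdstat layout shown above (the function name is hypothetical):

```sh
# Hypothetical helper: reads /proc/mdstat-style text on stdin and prints the
# name of every array whose member map (e.g. [U_]) contains a missing slot.
mdstat_degraded() {
    awk '
        /^md[0-9]+ *:/ { name = $1 }
        /blocks/ && $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }
    '
}

# Possible usage on the box itself (not run here):
#   mdstat_degraded < /proc/mdstat
```

It prints nothing when every array shows only U in its map.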
~ # mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.01
Creation Time : Sat May 5 06:30:50 2007
Raid Level : raid1
Array Size : 487106752 (464.54 GiB 498.80 GB)
Device Size : 487106752 (464.54 GiB 498.80 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Tue May 5 14:05:25 2009
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
Events : 0.515939
Number Major Minor RaidDevice State
0 8 22 0 active sync /dev/sdb6
1 8 6 1 active sync /dev/sda6
One last reboot and we're back on track.
Sorted.