19 March 2010

Software Engineering Wishlist #1

This is just the beginning of a wishlist for my software engineering and development world: things I would like to be able to do, technology trends and so on. I expect this wishlist to keep growing, so this is only part 1.

I want to be able to...
...check my continuous integration status and logs from Facebook
...get my continuous integration results via SMS
...update Bugzilla/Jira/whatever discussion threads via Twitter
...do some pair programming with something like Google Docs
...do some pair programming with some kind of real-time plugin for Eclipse
...bring online and shut down continuous integration nodes with Google App Engine
...bring online and shut down full QA stacks with Amazon EC2
...manage user stories with my smartphone
...write actual code with my smartphone on the train and upload/synchronise it later
...use my smartphone as a code repository for small projects
...use my smartphone's voice recognition capabilities to actually dictate code to it
...perform code reviews with some kind of real-time plugin for Bugzilla/Jira/whatever

More to come.
M.

15 March 2010

RAIDers of the Lost Disk

A few years ago I decided it was a good idea to have a dedicated file server in my home. After a bit of looking around, I set my mind on a Maxtor Shared Storage II - 1TB. This has two 500GB drives inside, and it can be set up as a RAID-0 or RAID-1 device. It is configured via a simple web interface.
I bought one and configured it as a RAID-1 device. After a short while, I also decided to update the firmware with a version based on OpenMSS.

Shortly after the warranty expired, one of the drives failed badly. The clicking that was coming out of it was pretty loud but, in a twisted way, also quite pleasant, somehow clicking along with Bob Marley's "Redemption Song". Anyway, I managed to replace the faulty drive and rebuild the array, and my file server has been living happily ever since... until yesterday.

It was either a power failure or a loose PSU connector, or both. As a result, the power light started flashing alternately green (once) and amber (once). I went to the diagnostics page only to find that my device was functioning "within normal parameters". Hmmm... that can't be right.

~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Sat May  5 06:30:50 2007
     Raid Level : raid1
     Array Size : 487106752 (464.54 GiB 498.80 GB)
    Device Size : 487106752 (464.54 GiB 498.80 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue May  5 11:18:29 2009
          State : active, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
         Events : 0.515034

    Number   Major   Minor   RaidDevice State
       0       8       22        0      active sync   /dev/sdb6
       1       0        0        -      removed

What??? Removed??? How???
~ # mdadm --examine /dev/sda6
mdadm: cannot open /dev/sda6: No such file or directory
mdadm: cannot find device size for /dev/sda6: No such file or directory

Hmmm...
~ # ls /dev/sd*
/dev/sda   /dev/sda3  /dev/sda6  /dev/sdb1  /dev/sdb4  /dev/sdb7
/dev/sda1  /dev/sda4  /dev/sda7  /dev/sdb2  /dev/sdb5
/dev/sda2  /dev/sda5  /dev/sdb   /dev/sdb3  /dev/sdb6

~ # cat /proc/partitions
major minor  #blocks  name

   8    16  488386584 sdb
   8    17     257008 sdb1
   8    18     257040 sdb2
   8    19     257040 sdb3
   8    20          1 sdb4
   8    21     506016 sdb5
   8    22  487106833 sdb6
   8     0  488386584 sdc
   8     1     257008 sdc1
   8     2     257040 sdc2
   8     3     257040 sdc3
   8     4          1 sdc4
   8     5     506016 sdc5
   8     6  487106833 sdc6
  31     0        256 mtdblock0
   9     0  487106752 md0
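
Comparing the two listings by eye works, but the same check can be sketched in a few lines of shell. This is only a rough sketch: stale_nodes is a made-up helper name, and it assumes the kernel's device name is the fourth column of /proc/partitions, as in the output above. It prints every device node that exists under /dev but that the kernel does not currently report.

```shell
#!/bin/sh
# stale_nodes (hypothetical helper): print device nodes that have no
# matching entry in a /proc/partitions-style file.
# Arguments: the partitions file, followed by the node paths to check.
stale_nodes() {
    partitions_file=$1
    shift
    for node in "$@"; do
        name=${node##*/}    # strip the directory part: /dev/sda6 -> sda6
        # Column 4 of /proc/partitions holds the kernel's device name.
        if ! awk -v n="$name" '$4 == n { found = 1 } END { exit !found }' "$partitions_file"; then
            echo "$node"
        fi
    done
}

# Usage on a live system:
#   stale_nodes /proc/partitions /dev/sd*
```

With the listings above, this would flag /dev/sda and all of its partitions, since the kernel was calling that disk sdc at the time.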

How exactly did my sda partitions become sdc? Reboot? Yes, reboot!
... [reboot] ...
~ # cat /proc/partitions
major minor  #blocks  name

   8     0  488386584 sda
   8     1     257008 sda1
   8     2     257040 sda2
   8     3     257040 sda3
   8     4          1 sda4
   8     5     506016 sda5
   8     6  487106833 sda6
   8    16  488386584 sdb
   8    17     257008 sdb1
   8    18     257040 sdb2
   8    19     257040 sdb3
   8    20          1 sdb4
   8    21     506016 sdb5
   8    22  487106833 sdb6
  31     0        256 mtdblock0
   9     0  487106752 md0

That's better, but how... ??? Anyway, let's check sda6.
~ # mdadm --query /dev/sda6
/dev/sda6: is not an md array
/dev/sda6: device 1 in 2 device mismatch raid1 md0.  Use mdadm --examine for more detail.

~ # mdadm --examine /dev/sda6
/dev/sda6:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
  Creation Time : Sat May  5 06:30:50 2007
     Raid Level : raid1
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Fri May  1 20:10:03 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 34c79134 - correct
         Events : 0.513042


      Number   Major   Minor   RaidDevice State
this     1       8        6        1      active sync   /dev/sda6

   0     0       8       22        0      active sync   /dev/sdb6
   1     1       8        6        1      active sync   /dev/sda6
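
The field worth comparing here is Events: 0.513042 on /dev/sda6 against 0.515034 reported earlier for /dev/md0. A minimal sketch for pulling that value out of mdadm's output follows. events_of is a made-up helper name, and it assumes the "Events : value" layout shown above, which may differ in other mdadm versions.

```shell
#!/bin/sh
# events_of (hypothetical helper): read `mdadm --detail` or
# `mdadm --examine` output on stdin and print the Events value.
events_of() {
    awk -F' : ' '/^ *Events/ { print $2 }'
}

# Usage on a live system (assumes the device names used in this post):
#   array=$(mdadm --detail /dev/md0 | events_of)
#   member=$(mdadm --examine /dev/sda6 | events_of)
#   [ "$array" = "$member" ] || echo "/dev/sda6 is out of date"
```

The sketch only tests the two values for equality. It does not try to work out which side is newer.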

Mismatched, as I would expect, but it's clean. Good.
~ # mdadm --manage --add /dev/md0 /dev/sda6
mdadm: hot added /dev/sda6

~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Sat May  5 06:30:50 2007
     Raid Level : raid1
     Array Size : 487106752 (464.54 GiB 498.80 GB)
    Device Size : 487106752 (464.54 GiB 498.80 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue May  5 11:22:02 2009
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 0% complete

           UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
         Events : 0.515210

    Number   Major   Minor   RaidDevice State
       0       8       22        0      active sync   /dev/sdb6
       1       0        0        -      removed

       2       8        6        1      spare rebuilding   /dev/sda6

Rebuilding. Good sign, but why do I still have device 1 - removed - in the list?
~ # cat /proc/mdstat
Personalities : [linear] [raid1]
md0 : active raid1 sda6[2] sdb6[0]
      487106752 blocks [2/1] [U_]
      [=>...................]  recovery =  9.8% (47870464/487106752) finish=114.8min speed=63713K/sec
unused devices: none
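
The finish estimate can be roughly reproduced from the other numbers on the recovery line: blocks still to copy, divided by the current speed. The sketch below assumes the block counts are in 1K units and the speed is in K/sec, as the output above suggests. remaining_minutes is a made-up helper name.

```shell
#!/bin/sh
# remaining_minutes (hypothetical helper): read /proc/mdstat-style text on
# stdin and print the estimated minutes left for a running recovery/resync.
remaining_minutes() {
    awk '(/recovery/ || /resync/) && /speed=/ {
        for (i = 1; i <= NF; i++) {
            # Progress field looks like (done/total), counted in 1K blocks.
            if ($i ~ /^\([0-9]+\/[0-9]+\)$/) {
                split(substr($i, 2, length($i) - 2), blocks, "/")
            }
            # Speed field looks like speed=63713K/sec.
            if ($i ~ /^speed=/) {
                speed = $i
                sub(/^speed=/, "", speed)
                sub(/K\/sec$/, "", speed)
            }
        }
        if (speed + 0 > 0) {
            printf "%.1f\n", (blocks[2] - blocks[1]) / speed / 60
        }
    }'
}

# Usage on a live system:
#   remaining_minutes < /proc/mdstat
```

For the line above that is (487106752 - 47870464) / 63713 / 60, or about 115 minutes, in line with the kernel's own finish=114.8min.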

Under 2 hours to sync up. Time for coffee.
... [coffee] ...
~ # cat /proc/mdstat
Personalities : [linear] [raid1]
md0 : active raid1 sda6[1] sdb6[0]
      487106752 blocks [2/2] [UU]
unused devices: none

~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Sat May  5 06:30:50 2007
     Raid Level : raid1
     Array Size : 487106752 (464.54 GiB 498.80 GB)
    Device Size : 487106752 (464.54 GiB 498.80 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue May  5 14:05:25 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
         Events : 0.515939

    Number   Major   Minor   RaidDevice State
       0       8       22        0      active sync   /dev/sdb6
       1       8        6        1      active sync   /dev/sda6

One last reboot and we're back on track.
Sorted.