02 April 2010

RAIDers of the Lost Disk (again)

Here we go again. The NAS is not happy.
Just out of curiosity, I check whether there is a hacked firmware image based on a more recent release, and I find one based on version 3.4.90, with SSH of course.
So here we go. Let's check the RAID device...

~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Sat May  5 06:30:50 2007
     Raid Level : raid1
     Array Size : 487106752 (464.54 GiB 498.80 GB)
    Device Size : 487106752 (464.54 GiB 498.80 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri May 22 15:20:30 2009
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
         Events : 0.525126

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8        6        1      active sync   /dev/sda6

Again only one drive out of two.
Let's see what happened to /dev/sdb.

~ # /usr/sbin/smartctl -l selftest /dev/sdb
smartctl version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log, version number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended off-line   Completed                     00%      7474         -
# 2  Off-line            Interrupted (host reset)      50%      7466         -
# 3  Off-line            Interrupted (host reset)      50%      7379         -
# 4  Short off-line      Completed: read failure       50%      7334         0x00032141
# 5  Off-line            Interrupted (host reset)      00%      7334         -
# 6  Short off-line      Completed                     00%      7330         -
# 7  Off-line            Interrupted (host reset)      00%      7330         -
# 8  Off-line            Interrupted (host reset)      00%      5973         -
# 9  Off-line            Interrupted (host reset)      00%      5396         -
#10  Off-line            Interrupted (host reset)      00%      5393         -
#11  Off-line            Interrupted (host reset)      00%      5376         -
#12  Short off-line      Completed                     00%      4687         -
#13  Off-line            Interrupted (host reset)      00%      4687         -
#14  Off-line            Interrupted (host reset)      00%      4003         -
#15  Off-line            Interrupted (host reset)      00%      3819         -
#16  Short off-line      Completed                     00%      3659         -
#17  Short off-line      Completed                     00%      3659         -
#18  Short off-line      Completed                     00%      3655         -
#19  Off-line            Interrupted (host reset)      70%      3652         -
#20  Short off-line      Aborted by host               70%      3652         -
#21  Off-line            Interrupted (host reset)      00%      3651         -

Ouch! LBA_of_first_error = 0x32141 (= 205121 in base 10)
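That conversion is easy to double-check (a throwaway Python one-liner):

```python
# SMART logs report the LBA of the first error in hex; convert to decimal.
print(int("0x32141", 16))  # -> 205121
```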
Let's check also the SMART attributes.

~ # /usr/sbin/smartctl -A /dev/sdb
smartctl version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 32
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   168   162   063    Pre-fail  Always       -       18676
  4 Start_Stop_Count        0x0032   210   210   000    Old_age   Always       -       20884
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   247   243   187    Pre-fail  Always       -       41160
  9 Power_On_Hours          0x0032   232   232   000    Old_age   Always       -       7559
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       76
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Unknown_Attribute       0x0022   056   039   000    Old_age   Always       -       959119404
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   046   253   000    Old_age   Always       -       44
195 Hardware_ECC_Recovered  0x000a   252   210   000    Old_age   Always       -       37129
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       0
202 Unknown_Attribute       0x000a   253   252   000    Old_age   Always       -       0
203 Unknown_Attribute       0x000b   253   252   180    Pre-fail  Always       -       11
204 Unknown_Attribute       0x000a   253   252   000    Old_age   Always       -       0
205 Unknown_Attribute       0x000a   253   252   000    Old_age   Always       -       0
207 Unknown_Attribute       0x002a   253   252   000    Old_age   Always       -       0
208 Unknown_Attribute       0x002a   253   252   000    Old_age   Always       -       0
210 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0
211 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0
212 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0

Not too bad after all, since
Current_Pending_Sector = 0
Offline_Uncorrectable = 0

Now let's find which partition has the problem.
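In principle, the SMART error LBA could also be mapped straight onto the partition table, by checking which partition's sector range contains it. A minimal Python sketch with made-up sector boundaries (the real ones would come from "fdisk -lu /dev/sdb"):

```python
# Hypothetical partition layout as (name, first_sector, last_sector).
# These numbers are made up for illustration; the real boundaries
# would come from the actual partition table.
partitions = [
    ("/dev/sdb1", 63, 514078),
    ("/dev/sdb2", 514079, 1028158),
]

def partition_for_lba(lba, parts):
    """Return the partition whose sector range contains the given LBA."""
    for name, first, last in parts:
        if first <= lba <= last:
            return name
    return None

print(partition_for_lba(0x32141, partitions))  # -> /dev/sdb1 (with these made-up numbers)
```

On this box, though, just running fsck on each filesystem is more direct, so let's do that.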

~ # fsck.ext3 -nv /dev/sdb1
e2fsck 1.38 (30-Jun-2005)
/dev/sdb1: clean, 3045/64256 files, 24511/64252 blocks

~ # fsck.ext3 -nv /dev/sdb2
e2fsck 1.38 (30-Jun-2005)
Warning!  /dev/sdb2 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/sdb2 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found.  Create? no

Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (39452, counted=39449).
Fix? no

Free inodes count wrong (61113, counted=61112).
Fix? no


/dev/sdb2: ********** WARNING: Filesystem still has errors **********


    2887 inodes used (4%)
      13 non-contiguous inodes (0.5%)
         # of inodes with ind/dind/tind blocks: 156/0/0
   24548 blocks used (38%)
       0 bad blocks
       1 large file

    2309 regular files
     175 directories
      47 character device files
      40 block device files
       0 fifos
       8 links
     308 symbolic links (308 fast symbolic links)
       0 sockets
--------
    2887 files


/dev/sdb3 is a swap partition, so we can skip that.
/dev/sdb4 is an extended partition, so we can skip that too.

~ # fsck.ext3 -nv /dev/sdb5
e2fsck 1.38 (30-Jun-2005)
Warning!  /dev/sdb5 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/sdb5 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found.  Create? no

Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (118269, counted=118286).
Fix? no

Free inodes count wrong (126523, counted=126537).
Fix? no


/dev/sdb5: ********** WARNING: Filesystem still has errors **********


      69 inodes used (0%)
       4 non-contiguous inodes (5.8%)
         # of inodes with ind/dind/tind blocks: 0/0/0
    8235 blocks used (6%)
       0 bad blocks
       1 large file

      32 regular files
      14 directories
       0 character device files
       0 block device files
       0 fifos
       0 links
       0 symbolic links (0 fast symbolic links)
       0 sockets
--------
      46 files

~ # fsck.ext3 -nv /dev/sdb6
e2fsck 1.38 (30-Jun-2005)
/dev/sdb6: clean, 93575/60899328 files, 56875141/121776688 blocks

So the errors are on /dev/sdb2 and /dev/sdb5.
Let's see where they are mounted.

~ # mount | grep /sdb
/dev/sdb1 on /mnt/__mxo_sdb1 type ext3 (rw)

~ # cat /proc/mounts | grep /sdb
/dev/sdb5 /tmp ext3 rw 0 0

~ # cat /proc/cmdline
console=ttyS0,115200 root=/dev/sdb2 rw

Are we booting from /dev/sdb2?

~ # mxoparam -h

Maxtor mxoparam version 1.0
-a         show all maxtor params
-b         get wait for button status
-c [0-1]        set wait for button 0 = Off 1 = On
-d         show max number of drives
-e         enable watchdog in uboot
-f         disable watchdog in uboot
-g         set led solid green
-h         show help
-k         kick watchdog
-p         get boot partition
-q [part]  set boot partition
           0 = drive 0 partition 1
           1 = drive 0 partition 2
           2 = drive 1 partition 1
           3 = drive 1 partition 2
-r         reset partion fail count
-s         get serial number
-t [sn]    set serial number
-v         show version
-x         disable watchdog now
-w         enable watchdog now
-y         set led solid yellow

~ # mxoparam -p
Boot partition is 3

Looks like the system is booting from the second disk, second partition (/dev/sdb2).
This means we can't unmount it, but we need to unmount it before we can fix it.
Therefore, we need to make the system boot from /dev/sda2, otherwise we won't be able to fix /dev/sdb*.

First of all, let's make sure /dev/sda2 is exactly the same as /dev/sdb2.

~ # dd if=/dev/sdb2 of=/dev/sda2

~ # mount -n /dev/sda2 /mnt/__mxo_sda2 -t ext3

~ # cp -a /mnt/__mxo_sdb2 /mnt/__mxo_sda2


Now let's set the new boot partition.

~ # mxoparam -q 1
REBOOT!
...
Let's check it's booting up from the right place now.

~ # cat /proc/cmdline
console=ttyS0,115200 root=/dev/sda2 rw
Right! We're ready to fix /dev/sdb2 and /dev/sdb5 now!
~ # fsck -v /dev/sdb2
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
/dev/sdb2 has gone 384 days without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found.  Create? yes

Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/sdb2: ***** FILE SYSTEM WAS MODIFIED *****

    3157 inodes used (4%)
      14 non-contiguous inodes (0.4%)
         # of inodes with ind/dind/tind blocks: 159/0/0
   24622 blocks used (38%)
       0 bad blocks
       1 large file

    2317 regular files
     179 directories
     246 character device files
      84 block device files
       0 fifos
       7 links
     321 symbolic links (321 fast symbolic links)
       0 sockets
--------
    3154 files


~ # fsck -v /dev/sdb5
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
/dev/sdb5: recovering journal
/dev/sdb5 has been mounted 35 times without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found.  Create? yes

Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/sdb5: ***** FILE SYSTEM WAS MODIFIED *****

    2939 inodes used (2%)
      28 non-contiguous inodes (1.0%)
         # of inodes with ind/dind/tind blocks: 159/0/0
   26690 blocks used (21%)
       0 bad blocks
       1 large file

    2345 regular files
     189 directories
      47 character device files
      40 block device files
       0 fifos
       8 links
     308 symbolic links (308 fast symbolic links)
       0 sockets
--------
    2937 files
Now we can rebuild the RAID array.
~ # mdadm --manage --add /dev/md0 /dev/sdb6
mdadm: hot added /dev/sdb6
NAS:~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Sat May  5 06:30:50 2007
     Raid Level : raid1
     Array Size : 487106752 (464.54 GiB 498.80 GB)
    Device Size : 487106752 (464.54 GiB 498.80 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat May 23 14:39:50 2009
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 0% complete

           UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
         Events : 0.541445

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8        6        1      active sync   /dev/sda6

       2       8       22        0      spare rebuilding   /dev/sdb6

~ # cat /proc/mdstat
Personalities : [linear] [raid1]
md0 : active raid1 sdb6[2] sda6[1]
      487106752 blocks [2/1] [_U]
      [>....................]  recovery =  1.5% (7396736/487106752) finish=145.1min speed=55092K/sec
unused devices: <none>
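As a sanity check, mdstat's finish estimate is just the remaining blocks divided by the current speed (both are in KiB here):

```python
# Numbers copied from the mdstat output above (1 KiB blocks, KiB/s speed).
total = 487106752
done = 7396736
speed = 55092

minutes_left = (total - done) / speed / 60
print(round(minutes_left, 1))  # -> 145.1, matching mdstat's own estimate
```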
Good... 2.5 hours later...
~ # cat /proc/mdstat
Personalities : [linear] [raid1]
md0 : active raid1 sdb6[0] sda6[1]
      487106752 blocks [2/2] [UU]

unused devices: [none]
Reboot again and we're done!

Maxtor Shared Storage II - LED Codes

I just want to write this for future reference, since the relevant link on the Seagate support website has already changed a couple of times and I don't want to keep wasting time looking for it next time it changes.
Back Panel LED Codes

Power LED (located in the center of the power button):
   - Illuminated: Power On
   - Not Illuminated: Power Off

Ethernet LED (located on the bottom side of the Ethernet connector) shows if the drive is connected through a 10/100 or a 1 Gb Ethernet connection:
   - Left Green: 10/100 Mbps Ethernet connectivity
   - Left Amber: 1 Gbps Ethernet connectivity
   - Illuminated: Power On
   - Blinking: Network communication is occurring
   - Not Illuminated: Power Off

Activity LED (located on the top side of the Ethernet connector): a flashing Activity LED indicates that the network connection is functional and that packets are being transmitted or received.
Front Panel LED Codes

Top LED (power activity):
   - Illuminated: Power On
   - Blinking: Drive is either powering up or shutting down
   - Not Illuminated: Power Off

Center LED (hard disk activity):
   - Illuminated: Power On
   - Blinking: Data is being transferred to/from the drive
   - Not Illuminated: Power Off

Bottom LED (network activity):
   - Illuminated: Power On
   - Blinking: Network communication is occurring
   - Not Illuminated: Power Off
Front Panel LED Error Codes

Green LED blinks   Amber LED blinks   Status
      1                  4            /share file system error
      1                  3            Boot Error - Attempting to boot from disk 0
      2                  3            Boot Error - Attempting to boot from disk 1
      1                  2            HDD S.M.A.R.T. Error - Attempting to boot from disk 0
      2                  2            HDD S.M.A.R.T. Error - Attempting to boot from disk 1
      1                  1            RAID Error

19 March 2010

Software Engineering Wishlist #1

This is just the beginning of a wishlist for my software engineering and development world. Things I would like to be able to do, technology trends etc. I expect this wishlist to be continuously expanding, so this is only part 1.

I want to be able to...
...check my continuous integration status and logs from Facebook.
...get my continuous integration results via SMS.
...update Bugzilla/Jira/whatever discussion threads via Twitter
...do some pair programming with something like Google Docs
...do some pair programming with some kind of real-time plugin for Eclipse
...bring online and shut down continuous integration nodes with Google App Engine
...bring online and shut down full QA stacks with Amazon EC2
...manage user stories with my smartphone
...write actual code with my smartphone on the train and upload/synchronise it later
...use my smartphone as a code repository for small projects
...use my smartphone's voice recognition capabilities to actually dictate code to it
...perform code reviews with some kind of real-time plugin for Bugzilla/Jira/whatever

More to come.
M.

15 March 2010

RAIDers of the Lost Disk

A few years ago I decided it was a good idea to have a dedicated file server in my home. After a bit of looking around, I set my mind on a Maxtor Shared Storage II - 1TB. This has 2 drives of 500GB each inside, and it can be set up as a Raid-0 or Raid-1 device. It is configured via a simple web interface.
I bought one and configured it as a Raid-1 device. After a short while, I also decided to update the firmware with a version based on OpenMSS.

Shortly after the warranty expired, one of the drives failed badly. The clicking that was coming out of it was pretty loud but, in a twisted way, also quite pleasant, somehow clicking along with Bob Marley's "Redemption Song". Anyway, I managed to replace the faulty drive and rebuild the array, and my file server has been living happily ever since... until yesterday.

It was either a power failure or a loose PSU connector, or both. As a result, the power light started flashing alternately green (once) and amber (once). I went to the diagnostics page only to find that my device was functioning "within normal parameters". Hmmm... that can't be right.

~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Sat May  5 06:30:50 2007
     Raid Level : raid1
     Array Size : 487106752 (464.54 GiB 498.80 GB)
    Device Size : 487106752 (464.54 GiB 498.80 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue May  5 11:18:29 2009
          State : active, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
         Events : 0.515034

    Number   Major   Minor   RaidDevice State
       0       8       22        0      active sync   /dev/sdb6
       1       0        0        -      removed

What??? Removed??? How???
~ # mdadm --examine /dev/sda6
mdadm: cannot open /dev/sda6: No such file or directory
mdadm: cannot find device size for /dev/sda6: No such file or directory

Hmmm...
~ # ls /dev/sd*
/dev/sda   /dev/sda3  /dev/sda6  /dev/sdb1  /dev/sdb4  /dev/sdb7
/dev/sda1  /dev/sda4  /dev/sda7  /dev/sdb2  /dev/sdb5
/dev/sda2  /dev/sda5  /dev/sdb   /dev/sdb3  /dev/sdb6

~ # cat /proc/partitions
major minor  #blocks  name

   8    16  488386584 sdb
   8    17     257008 sdb1
   8    18     257040 sdb2
   8    19     257040 sdb3
   8    20          1 sdb4
   8    21     506016 sdb5
   8    22  487106833 sdb6
   8     0  488386584 sdc
   8     1     257008 sdc1
   8     2     257040 sdc2
   8     3     257040 sdc3
   8     4          1 sdc4
   8     5     506016 sdc5
   8     6  487106833 sdc6
  31     0        256 mtdblock0
   9     0  487106752 md0

How exactly did my sda partitions become sdc? Reboot? Yes, reboot!
... [reboot] ...
~ # cat /proc/partitions
major minor  #blocks  name

   8     0  488386584 sda
   8     1     257008 sda1
   8     2     257040 sda2
   8     3     257040 sda3
   8     4          1 sda4
   8     5     506016 sda5
   8     6  487106833 sda6
   8    16  488386584 sdb
   8    17     257008 sdb1
   8    18     257040 sdb2
   8    19     257040 sdb3
   8    20          1 sdb4
   8    21     506016 sdb5
   8    22  487106833 sdb6
  31     0        256 mtdblock0
   9     0  487106752 md0

That's better, but how... ??? Anyway, let's check sda6.
~ # mdadm --query /dev/sda6
/dev/sda6: is not an md array
/dev/sda6: device 1 in 2 device mismatch raid1 md0.  Use mdadm --examine for more detail.

~ # mdadm --examine /dev/sda6
/dev/sda6:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
  Creation Time : Sat May  5 06:30:50 2007
     Raid Level : raid1
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Fri May  1 20:10:03 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 34c79134 - correct
         Events : 0.513042


      Number   Major   Minor   RaidDevice State
this     1       8        6        1      active sync   /dev/sda6

   0     0       8       22        0      active sync   /dev/sdb6
   1     1       8        6        1      active sync   /dev/sda6

Mismatched, as I would expect, but it's clean. Good.
~ # mdadm --manage --add /dev/md0 /dev/sda6
mdadm: hot added /dev/sda6

~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Sat May  5 06:30:50 2007
     Raid Level : raid1
     Array Size : 487106752 (464.54 GiB 498.80 GB)
    Device Size : 487106752 (464.54 GiB 498.80 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue May  5 11:22:02 2009
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 0% complete

           UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
         Events : 0.515210

    Number   Major   Minor   RaidDevice State
       0       8       22        0      active sync   /dev/sdb6
       1       0        0        -      removed

       2       8        6        1      spare rebuilding   /dev/sda6

Rebuilding. Good sign, but why do I still have device 1 - removed - in the list?
~ # cat /proc/mdstat
Personalities : [linear] [raid1]
md0 : active raid1 sda6[2] sdb6[0]
      487106752 blocks [2/1] [U_]
      [=>...................]  recovery =  9.8% (47870464/487106752) finish=114.8min speed=63713K/sec
unused devices: <none>

Under 2 hours to sync up. Time for coffee.
... [coffee] ...
~ # cat /proc/mdstat
Personalities : [linear] [raid1]
md0 : active raid1 sda6[1] sdb6[0]
      487106752 blocks [2/2] [UU]
unused devices: <none>

~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Sat May  5 06:30:50 2007
     Raid Level : raid1
     Array Size : 487106752 (464.54 GiB 498.80 GB)
    Device Size : 487106752 (464.54 GiB 498.80 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue May  5 14:05:25 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
         Events : 0.515939

    Number   Major   Minor   RaidDevice State
       0       8       22        0      active sync   /dev/sdb6
       1       8        6        1      active sync   /dev/sda6

One last reboot and we're back on track.
Sorted.

18 February 2010

Eclipsed by an Ant

I've been stuck on a web project for a while. Well, not really on the project itself; I've actually been stuck on running it with the GlassFish Tools Bundle for Eclipse v1.2.

The bundle comes with two versions of GlassFish: v2.1 and v3 Prelude. I never had problems running the project on v3, but every time I tried to use v2.1 my Eclipse would get stuck during the publishing phase with this message: "Publishing to Bundled GlassFish, waiting for virtual machine to exit..."

It turns out that the difference in the publishing phase between the bundled GlassFish v2.1 and v3 is that the former uses Apache Ant. So I checked the Ant runtime home entries in Eclipse and found out they were messed up, pointing at the Axis2 plugin! It must have happened during the installation of the Axis2 Code Generator Wizard. Fixing the Ant home entries finally allowed me to publish the project to the bundled GlassFish v2.1.

Here are the steps in Eclipse 3.4 (Ganymede):
  1. Click on Window -> Preferences
  2. Expand the Ant node and select the Runtime child node
  3. In the Classpath tab, select Ant Home Entries from the list
  4. Click the Ant Home button to bring up the folder browser
  5. In the folder browser, navigate to the folder where Ant was installed
  6. Click OK to go back to the Eclipse preferences dialog
  7. Click Apply, then click OK
Done.
M.

27 October 2009

Pythagoras On Screen

Euclid of Alexandria "invented" geometry. Before him, there were no squares or circles. Everything was just a collection of messy shapes. One even wonders how they managed to build houses without straight lines and triangles.

Ok, just kidding...

Pythagoras of Samos knew a thing or two about triangles long before Euclid did: he discovered the mathematical relationship between the sides of right-angled triangles.
How are these two gentlemen related? Well, they lived a couple of centuries apart, so they are not directly related to each other as such, but in terms of mathematics they are very closely related: the Pythagorean theorem only "works properly" in Euclidean spaces.

What are Euclidean spaces? They are those spaces that satisfy the axioms of Euclidean geometry. Most of us have studied Euclidean geometry at school: we learn that the shortest path between two points is a straight line, that the sum of the angles of a triangle is 180 degrees, and other wonderful rules.
A lot of these rules, however, start falling apart when we try to apply them to non-Euclidean geometries. In general, though, if we stick to flat 2-D surfaces (like a piece of paper), Euclid's rules will survive just fine.

Let's test this.

If we draw a right-angled triangle on a piece of paper, with one leg 300mm long and the other 400mm long, Pythagoras taught us that the hypotenuse should be 500mm long, and indeed it is. In general, the hypotenuse will always be longer than either of the two legs.
What else do we have, apart from a piece of paper resting on a desk, that is a flat 2-D surface? Well, there's the flat LCD screen on which I'm writing this article, of course. So if I draw a segment that is 300 pixels long, at a right angle with another segment that is 400 pixels long, then I should be able to join the two open ends of each segment with a third segment that is 500 pixels long, right?

Wrong.

Why? The reason is not difficult to find: the computer screen is not a continuous surface. It's made of "tiles" called pixels, and each pixel is either turned on or off. There is no such thing as a fractional pixel, so you can't draw half a pixel or two-thirds of a pixel: you either draw it entirely or you don't. In other words, the surface of my LCD screen is quantised. So when the two legs of my right-angled triangle on screen are 12 and 8 pixels long, the hypotenuse cannot possibly be 14.42 pixels. So how long is it? 14 pixels? 15 pixels? Does it round up or down?

Well... Neither.

It turns out that it depends on how you want to draw the line so that it appears continuous to the average person looking at the screen. I think two pixels can be considered contiguous if they touch each other one way or another, even if only by one corner. Sticking to this constraint, the minimum number of pixels required to draw the hypotenuse of the above triangle would be 12, which is the same number of pixels required to draw the longer leg of the triangle.
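This is easy to verify with the classic Bresenham line-drawing algorithm, which rasterises exactly this kind of 8-connected line. A small Python sketch (endpoints chosen so the legs are 12 and 8 pixels, inclusive):

```python
def bresenham(x0, y0, x1, y1):
    """Return all pixels of an 8-connected line (corner contact allowed)."""
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx - dy
    pixels = []
    while True:
        pixels.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x0 += sx
        if e2 < dx:
            err += dx
            y0 += sy
    return pixels

# Legs of 12 and 8 pixels: endpoints (0,0) and (11,7), inclusive.
line = bresenham(0, 0, 11, 7)
print(len(line))  # -> 12, the same as the longer leg
```

With corner contact allowed, the rasterised hypotenuse always takes max(|dx|, |dy|) + 1 pixels, i.e. the same number as the longer leg.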

In my head, this means two very important things:
  1. quantised surfaces are not Euclidean in nature, even though they may intuitively appear to be so
  2. the concept of "distance" between two points on a quantised surface is very, very funny
M.

10 September 2009

To Infinity and Beyond

Here's a thought...

Let's take two series, call them A and B.
Let's say each term in A is less than or equal to the corresponding term in B.
Logic would suggest that, if B is convergent, then A is convergent too.


So how about this...

Let B be the geometric series 1 + 2/3 + 4/9 + 8/27 + ... + (2/3)^N + ...
This series converges to 3.

Let A be the series 1 + 1/2 + 1/3 + 1/4 + ... + (1/N) + ...
This series is not convergent.

However, each term of A is less than or equal to the corresponding term in B, i.e.
1 <= 1,  1/2 <= 2/3,  1/3 <= 4/9,  1/4 <= 8/27,  ...,  1/N <= (2/3)^(N-1),  ...

If B is convergent, doesn't this mean that A should be convergent too?
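One way to probe this numerically (taking the N-th term of B to be (2/3)^(N-1), to match the comparisons listed above):

```python
# Check the claimed term-by-term inequality 1/N <= (2/3)**(N-1) for small N.
checks = [(n, 1 / n <= (2 / 3) ** (n - 1)) for n in range(1, 11)]
for n, holds in checks:
    print(n, holds)
# Holds for N = 1..4, fails from N = 5 onwards: the geometric terms
# shrink much faster than the harmonic ones.
```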
Hmmmm...
M.

21 August 2009

Ushiro Irimi Nage

No, there's no such thing as ushiro irimi nage, as far as I know... or is there?

I was training as usual on Thursday, starting with one hour of aiki-jo (time to brush up the 13 jo kata), then off to tai jutsu. Without fail, we start with some tai no henko and morote dori kokkyu nage, and that's when I had the idea.

After years of kokkyu nage, I never cease to find inspiration from this technique. It's difficult to explain the technique in words, as most aikidokas will know, but I'll try, so here are the basic concepts (in my opinion) of the morote dori kokkyu nage basic form:

  1. Engage kokkyu.
  2. Slight step on the side, just enough to misalign the direction of attack.
  3. Lower the centre while keeping an upright posture, to break a potential arm lock.
  4. Turn and change hanmi while cutting upwards as if with aiki-ken.
  5. Enter uke's space while extending kokkyu up and to the rear (this breaks uke's balance).
  6. Rotate upper body (this starts the nage as uke starts falling backwards).
  7. Settle extending kokkyu down and to the rear (this completes the nage).
...and there was my revelation: steps 5, 6, and 7 are also the steps that complete all irimi nage techniques (at least in my head they are). Here's what I see happening in the irimi nage basic form, independently of the attack:
  1. Enter uke's space while extending kokkyu up and at the front (this breaks uke's balance).
  2. Rotate upper body as in the fourth jo suburi (this starts the nage as uke starts falling backwards).
  3. Settle extending kokkyu downwards (this completes the nage).
There are technical differences, of course. For example, the nage in irimi nage is always performed to the front, and it uses aiki-jo movements instead of aiki-ken, but both techniques are about entering uke's space strongly, extending kokkyu through uke's space, rotating the upper body and completing the nage by settling the body weight and extending kokkyu downwards.

Now, every time I think of the basic form of kokkyu nage I think of "ushiro irimi nage", or "irimi nage to the rear", and that changes the whole perspective of this technique. Is it right? Is it wrong? It doesn't matter. Every so often in aikido I find something that turns all my understandings upside down, or even throws them out of the window so I can start again from scratch, and that's why I love it.
M.

31 July 2009

Lost in Reflection

The time for a new project had come. I was, as usual, excited about the idea of sowing the seed of a new technological creature and bearing it through the initial gestation period, until it was mature enough for someone else to take over the incubation and see it through to the beginning of its autonomous digital life.

This time it was about web services. We had already decided to write them in Java, and we had a rough idea of the interfaces we were going to expose, but the rest was "carte blanche". This was particularly exciting because, even though I had worked with web services on other projects, this was the first time I was actually going to design web services from the start, so there I was, rolling up my sleeves and reading the Axis2 reference pages.

The other exciting part was that someone else was going to develop an application that would consume my web services, and due to certain timing requirements in their development lifecycle they would have to do their work in parallel with mine. This introduced an interesting challenge: the interfaces would have to be pretty much rock-solid long before any actual web services code was written.

No problem, I thought. We already knew more or less what interfaces we were going to expose, so I can literally code them in Java, build the WSDL from the compiled interfaces and ship the WSDL to the developers on the "consumer project". This way, we can then work out the web services implementation and the consumer implementation independently.

First things first: the contract is not just about the information that a specific method takes as arguments and the information it will spit out as a response. There is a bit more to it. For example, we knew that in our implementation of the web services every method should have a set of "standard" arguments in its request, like the id of the application that originated the request, or the id of the specific language requested by the consumer (e.g. EN, DE, FR, etc). We also knew every method response must include some sort of status code and status message. Finally, we wanted to also include support for digitally signing both the request and the response.

Easy, I thought, we would obviously stick all of these in the relevant super-classes in our implementation, so I created the related super-interfaces in the package that I would use to create the WSDL.

Just for clarity, it worked out to be something like this:

// Base interface for all requests and responses
public interface WebServiceMessage {
    public String getSignature();
    public void setSignature(String signature);
}

// Base interface for all requests
public interface WebServiceRequest extends WebServiceMessage {
    public String getLanguageId();
    public void setLanguageId(String languageId);
}

// Base interface for all responses
public interface WebServiceResponse extends WebServiceMessage {
    public int getStatusCode();
    public String getStatusMsg();
}

// The request for the method doSomething() in the web service
public interface SomeRequest extends WebServiceRequest {
    public int getSomeArg();
    public void setSomeArg(int someArg);
}

// The response for the method doSomething() in the web service
public interface SomeResponse extends WebServiceResponse {
    public int getSomeResult();
    public void setSomeResult(int someResult);
}

// This is the actual web service
public interface MyWebService {
    public SomeResponse doSomething(SomeRequest request);
}

Just to prove the concept, I carried on and mocked up a basic implementation for a couple more methods, then packaged and published the mock service and proceeded to give it a simple test through SoapUI. I gave SoapUI the address of my service endpoint and it quickly digested the web service's WSDL, presenting all the right methods, which was good... until I asked SoapUI to generate a sample request for one of the methods.

The request was incomplete. There were some elements completely missing. So I checked the WSDL and... shock, horror... the schema for SomeRequest only showed some members but not others!
<complexType name="SomeRequest">
  <complexContent>
    <extension base="impl:SomeRequest">
      <sequence>
        <element name="someArg" nillable="true" type="xsd:int"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

Where were the inherited members, like the language id? Lost. Gone. No trace.

Did I use the wrong arguments in java2WSDL? Nope. It turned out that java2WSDL did not support interface inheritance. Digging around online forums and googling up "java2WSDL support inheritance" unfortunately confirmed that. Apparently this is a limitation of the reflection engine they use in Axis2, to be rectified in a future release.

This exercise has now resulted in a couple of learning points for me.

  1. even when you think you have identified your assumptions, think again: there are a lot of implicit assumptions (the ones that are never actually spelled out) in any piece of architecture, and "support for inheritance" is one of them
  2. after you have identified your assumptions, check them out: it turned out that java2WSDL supported class inheritance, but not interface inheritance
  3. hold off on writing your mocks until you've been through the above: I could have saved a whole day of refactoring if I had waited to mock up the web service until I had checked the WSDL generation
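With hindsight, one workaround suggests itself: since java2WSDL reportedly handled class inheritance but not interface inheritance, the shared members could live in abstract base classes instead of super-interfaces. A minimal sketch of that idea (the class names mirror the interfaces above; this is my illustration, not the project's actual code):

```java
// Workaround sketch: shared members in abstract base classes, because
// java2WSDL handled class inheritance but not interface inheritance.
abstract class WebServiceMessage {
    private String signature;
    public String getSignature() { return signature; }
    public void setSignature(String signature) { this.signature = signature; }
}

abstract class WebServiceRequest extends WebServiceMessage {
    private String languageId;
    public String getLanguageId() { return languageId; }
    public void setLanguageId(String languageId) { this.languageId = languageId; }
}

// A concrete request now inherits signature and languageId through the
// class hierarchy, so the generated schema should include them.
class SomeRequest extends WebServiceRequest {
    private int someArg;
    public int getSomeArg() { return someArg; }
    public void setSomeArg(int someArg) { this.someArg = someArg; }
}
```

The price is losing the clean interface-based contract, but the inherited members survive the trip into the WSDL.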

Mea culpa.
M.

01 July 2009

Cloud Sweet Cloud

What's this cloud thing? Where is it? What does it do? Why should I bother? Why should I care?
I care because I'm a geek, and any new-tech stuff simply gets my unconditional attention, but aside from that I have recently found more reasons to care about it.
I am working on a fairly complex project whose main characteristics are:
  • zero client entry points (all web-based)
  • multi-tier architecture
  • inter-layer messaging based on HTTP or HTTPS
  • scalability is achieved by simply adding more modules
  • pretty much any kind of load balancing solution can be implemented at the client entry point
When we implement it, we figure out how many concurrent users we might expect, and size the implementation accordingly, in terms of number of processors needed, amount of memory needed, dedicated bandwidth, etc.

The key to any implementation is, indeed, the number of concurrent users, but guess what... in many cases getting the projected number of concurrent users is like choosing the numbers to play at the lottery: a wild guess. I've seen the same thing happen many times through my career. It's just the way it goes.

You can call the country or regional manager, ask about the market projections for the first year, then talk with the marketing guys, see if there have been pre-registrations etc, and finally they all agree with a ballpark estimate of 'N' concurrent users. Perfect! What's the confidence level on this? Hmmm... they say pretty good, with a 20% floor and a 50% ceiling. Excellent!

So what do you do? You size for N x 2 concurrent users and proceed with the implementation. Then, of course, demand will be way over the projections and your business unit will have to rush in new equipment to take care of it, but until it does it will effectively lose business, and inevitably people will start pointing fingers...
Maybe sizing for N x 2 is not enough, so what do you do? Size for N x 3? N x 5? And what if the actual demand ends up sticking to the projections, or comes in lower? You are then left with a lot of spare capacity that pulls down the return on investment. So what's the magic multiplier for sizing a projection of N concurrent users?

Of course, there is no such thing. But wait a minute...

Enter the cloud.

There has been a lot of discussion about what 'the cloud' actually is. To me, there is no such thing as 'the cloud'. There are service providers. Some of the solutions are geared towards running web applications, like Google App Engine or Windows Azure. Other solutions are geared towards storage and backup, like Nirvanix. Some other solutions are geared towards data centre resources outsourcing, like Amazon EC2.
Amazon EC2 is the stuff I've been playing with, and it's brilliant. The cost structure is somewhat exotic: you pay per CPU/hour, per GB/month of storage, per TB/month of transfer, etc. However, once you get to grips with it, I find this model a lot more deterministic than traditional costing. A systems architect will have a good idea of these figures for system usage, so budgeting should not be a serious problem. But budgeting and costing is not why I'm a big fan of this type of cloud solution.
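To see why this model is more deterministic, here is a back-of-envelope monthly cost estimate. All the rates below are made-up illustrative figures (not real EC2 prices); the point is that each term maps directly onto a quantity the architect can measure or project:

```java
// Back-of-envelope monthly cost for a usage-based cloud deployment.
// Rates are hypothetical placeholders, not actual provider prices.
class CloudCostEstimate {
    static double monthlyCost(int instances, double hourlyRate,
                              double gbStored, double storageRatePerGb,
                              double gbTransferred, double transferRatePerGb) {
        double compute  = instances * hourlyRate * 24 * 30; // ~720 hours/month
        double storage  = gbStored * storageRatePerGb;
        double transfer = gbTransferred * transferRatePerGb;
        return compute + storage + transfer;
    }
}
```

For example, three instances at a notional $0.10/hour, 100 GB stored at $0.15/GB and 500 GB transferred at $0.17/GB come out at roughly $316/month, and doubling capacity for a month simply doubles the compute term.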

The reason why I like it is that in the cloud there are no such things as spare capacity, forecast errors, starved resources, supplier delays, or last-minute server re-allocation traumatic stress disorder.
Mind you, I'm not saying this is the silver bullet of all scalable implementations. It has its pros and cons, and things to consider include the loss of control over parts of your data centre, the (un)reliability of the internet, remote administration costs, staff training, securing the data channels to the cloud, encryption, compression, and so on.

Yes, nobody said it doesn't need any homework, and each project is different, but for me and some of my projects this is a godsend. Need more capacity for the Christmas rush online sales? No problem, let me open my ElasticFox plugin and I'll double your capacity in less than an hour, already configured, already load balanced, and fully operational... oh, and with no downtime of course. Then what? Need to downsize after the Christmas rush? No problem, let me open my ElasticFox plugin and remove some spare capacity.

Hold on, that's too simple. Let's try this... I am going to release an updated version of the system, and I am planning a good load test and a stress test to see how far I can push it, so I want a single-stack system for the stress test and a 3-stack load-balanced system for the load test, then I want the same again but pointing at a different back-end (one has standard test data, the other has localised data in different languages). No problem, let me open my ElasticFox plugin and... well... you know the story.

M.

15 April 2009

Silly Application!

Note to self: always define in detail what parts of what file or resource in what module needs to be localised.
We have this JSF-based web application where the UI is a bunch of .xhtml files, and we sent it off for localisation through third parties.

Sure enough, they did a good job with the strings. We switched the locale, and everything came up in a different language. Perfect. Then we noticed a bunch of forms would not submit any more, often throwing javascript errors.

It didn't take long to find the problem: they had localised the names of the controls in the .xhtml files too!
Silly application! If it expects data from a control named "user" on an English locale, why can't it work out that on a French locale the data should be picked up from a control named "utilisateur" instead? :-)
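The failure mode is easy to demonstrate: the server-side handler looks up form parameters by the exact name baked into the page, so translating the control name makes the lookup come back empty. A minimal sketch, using a plain Map to stand in for the HTTP request parameters (names are illustrative, not from our actual application):

```java
import java.util.Map;

// The handler expects the control to be named "user" in every locale;
// only the label shown to the human should ever be translated.
class FormHandler {
    static String extractUser(Map<String, String> params) {
        return params.get("user"); // null if the name was localised away
    }
}
```

With the English page the lookup finds "user"; with the French page, where the control was renamed "utilisateur", it silently gets null, and everything downstream falls over.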

M.

07 April 2009

Architectural Inertia

Inertia: indisposition to motion, exertion, or change (source: Merriam-Webster Online Dictionary)


What happens when you subscribe to some web-based service and then you forget your password? No big deal, you can usually click on a convenient hyperlink next to the logon box labelled "Forgot your password?". That usually takes you to some sort of password-retrieval process involving the service provider sending you an email with a specially crafted URL; you then open that email, click on that URL, and enter a new password in the web form that has just opened up for you. Job done.

There are many variations on the theme, some of which are more involved than others, for example asking a whole set of personal questions to confirm your identity, but the general idea is always the same, and in many cases you can also retrieve your username through a similar procedure. These are good examples of helping users help themselves when a problem arises, or automated technical support.

Web applications, however, are moving away from usernames in favour of some other identifier that is more directly linked with the user, like the user's primary email address.

A number of web application designers, therefore, thought it would be a good idea to keep the same theme for automated tech support by offering right next to the logon box a convenient hyperlink labelled... "Forgot your email address?"

:-)

31 March 2009

The Cloakroom Pattern

I am writing this in the context of a web application.

In the "good old days", browsers were single-viewed: there was no such thing as tabbed browsing. If you wanted to browse multiple web pages at the same time, you had to open multiple browser windows. Web architectures, application servers and frameworks have evolved to satisfy this single view of the world by managing context at three different levels: application, session and request.

A request exists when the client sends a message to the server. The moment I open my browser and google up some stuff, a request springs to life. The request can store transient (short-lived) information, like the terms I want to search for, which are unlikely to be re-used after the request has been satisfied.
At the application level, a context exists for the entire life of the application, until the server is restarted. For example, I might need to audit certain user activities to a database, so I might want to load the database connection details when my application starts up and store them in the application context.

Somewhere between the application context and the request context, a web application might also want to manage a user session. For example, when I am browsing my web mail, there must be some kind of information that persists between one request and the other, so that for example I don't need to login again if I move from one mail item to the next.

How does caching take place, and why?

Let's say I sell pizza, and that part of my web application has a search function that goes through my catalog and returns to the user all the pizza styles that match the user's choices. Let's also say that the application stores the search criteria in an object that can be cached, let's call it searchObj, so if the user refreshes the page (without changing the search criteria), the application saves time and resources by simply re-using the same data instead of making a new round trip to the database.

According to what we said above, if searchObj needs to be persisted across requests, it makes sense to cache it at the session level.
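The naive session-level cache described above can be sketched in a few lines. The names here are illustrative (a Map standing in for the session's attribute store), not taken from any particular framework:

```java
import java.util.HashMap;
import java.util.Map;

// Naive session-level cache: one "searchObj" slot per session.
// Every tab of the same browser shares this same session, so a
// second tab silently overwrites the first tab's cached search.
class PizzaSession {
    private final Map<String, Object> attributes = new HashMap<>();

    void cacheSearch(Object searchObj) {
        attributes.put("searchObj", searchObj);
    }

    Object cachedSearch() {
        return attributes.get("searchObj");
    }
}
```

Keep this single-slot design in mind: it is exactly what goes wrong in the two-tab scenario that follows.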

So here I am as a potential customer using this pizza web application, searching for pizza that contains ham, so I type "ham" in the input box, click the submit button and look at the resulting list. All the listed pizzas have ham in the ingredients. If I happen to refresh the browser, the application simply re-uses the same list without making a new round trip to the database.

Now let's say I open a new browser tab (not a new browser window) to display the results for a different search. This time I want to search for pizza that contains olives, so I type "olives" in the input box, click the submit button and look at the resulting list. All the listed pizzas have olives in the ingredients. Great.
Now I go back to the previous browser tab, the one with ham-based pizzas, and hit the refresh button. All the listed pizzas now have olives in the ingredients.


What happened?

It happened that searchObj was overwritten by the second search, but how?
Let's think of this scenario in a different way. Let's say I need some milk, and that I suffer from particularly bad memory, so before I forget I decide to write "Milk" on a post-it note and stick it to the door, then I go to get your jacket, car keys, etc. Now let's say my lodger, the sentient invisible pet chimp Iain, needs some fruit juice but instead of writing "Fruit Juice" alongside "Milk" on my post-it note, he decides to replace my post-it note with another one saying "Fruit Juice". Now I'm ready to go out, but of course I have forgotten what I needed to buy, so I pick up the post-it note from the door and happily go on to buy... fruit juice!

In this example, the post-it note is searchObj, Iain and I are the request-scope beans activated from two different tabs of the same browser, and the door is the session. Assume my house only has one door, the entrance/exit door (multiple tabs on the same browser share the same session).


How can we solve the problem?

In terms of "post-it note on the door", it's fairly easy: we draw two squares on the door and label them "Marco" and "Iain". Now we can use our own post-it notes, as long as we stick them in our own designated area on the door.


How does that translate into a web application?

We need to think of this type of context as sitting somewhere between the request scope and the session scope. If we think of each browser tab as a different "view" of our user session, then we can talk of view-level context and view-scoped objects. However, this is not a built-in functionality in the well-known web application frameworks or containers, so we need to simulate it, but how?

In the above example, we said the door represents the session, so we need to stick into the session some kind of container that can hold labelled compartments. How about a Map, for example a Hashtable? Yep, could do with one of those, but how do we actually generate the keys? In other words, how do we make sure that each tabbed view of the same browser unequivocally identifies itself when posting information to the session and retrieving information from the session?

I'm not sure we can handle the "unequivocally" part, but here's what I would do: I would use the coat check pattern, also referred to as the cloakroom pattern. I don't think you'll find that in reputable software engineering books, so don't bother looking.

This is a snippet from Wikipedia's definition of "Cloakroom".
"Attended cloakrooms, or coat checks, are staffed rooms where coats and bags can be stored securely. Typically, a ticket is given to the customer, with a corresponding ticket attached to the garment or item."

In particular, you'll see that tickets are generally given away in sequential order, and that you don't actually need to show personal ID documents when picking up your coat: you simply produce the ticket that was given to you when you gave them your coat. For our web application, the tickets are issued by some sort of decorator class that intercepts HTTP requests and does something like this...
  1. check if there is a querystring variable called "coatNum" (or whatever you fancy)
  2. if there is one, do nothing, otherwise increment an internal counter and use it to decorate the querystring by appending the "coatNum" variable name and value
For a JSF application, this might be a phase listener (maybe for the RESTORE_VIEW event?). For an IceFaces application, things have already been worked out in the form of the "rvn" querystring variable.
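The ticket-issuing step above can be sketched in plain Java, independently of any particular framework. The "coatNum" name follows the description above; the counter field here stands in for a session-bound attribute:

```java
// Ticket issuer for the cloakroom pattern: decorate the query string
// with a "coatNum" variable if one is not already present.
class CoatCheck {
    private int counter = 0; // would live in the user's session

    String decorate(String queryString) {
        if (queryString != null && queryString.contains("coatNum=")) {
            return queryString; // ticket already issued, do nothing
        }
        counter++;
        if (queryString == null || queryString.isEmpty()) {
            return "coatNum=" + counter;
        }
        return queryString + "&coatNum=" + counter;
    }
}
```

A first request with no ticket gets "coatNum=1" appended; subsequent requests carrying that ticket pass through untouched, so each tab keeps the number it was first given.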

For added security, some might argue that the view number should not be easily guessed, so sequential numbering is actually discouraged, but remember that we are talking about different tabs or views of the same browser window. In any case, just for clarity, I will stick to a counter that gets incremented every time it's used.

There is a second part to this: once we know what view number the request wants to use, how do we use that view number to organise our view-scoped objects? We said we could use a Map, but where does that live? In the real-life coat check scenario, let's say at a classical concert, there can be multiple coat rooms, with each coat room used by multiple punters. This might suggest that the Map holding view-scoped objects should be application-scoped. Even though it is possible to do so, it would require additional overhead in terms of resources, because it would hold *all* view-scoped objects for *all* users in the entire application. Also, we would have to write additional code to manage object expiry and cleanup, otherwise we would see the Map growing to infinity and beyond. There are also some security and privacy concerns, since every request would have access to *all* view-scoped objects.

One solution is therefore to stick the Map in the session, or a session-bound bean. As a result, the internal counter that identifies the view number must also be session-bound, so that it starts at zero every time a new session is generated.

In summary, here is what I would do every time a new request comes in:
  1. use an internal session-bound variable to generate view identifiers (e.g. a counter) and a session-bound Map to cache view-level objects
  2. intercept requests and check for a query string variable that identifies the view number
  3. if it's not present, then decorate the query string with a variable that identifies the view number
  4. if the session-bound Map already contains an object for the given view number, then discard the object received from the request and re-use the cached object instead, otherwise take the object from the request and cache it
  5. process the request and return a response to the client
Hold on a minute... what if the request actually uses more than one object?


In that case we don't simply have a session-bound Map, but rather a Map of Maps. In other words, the session-bound Map will still be keyed by view ID, but the value for a given view ID will be a Map of objects, keyed by object ID. We can therefore talk about a "view map" being a collection of "object maps".
This is the revised workflow:
  1. use an internal session-bound variable to generate view identifiers (e.g. a counter) and a session-bound Map (the view map) to cache view-level object maps
  2. intercept requests and check for a query string variable that identifies the view number
  3. if it's not present, then decorate the query string with a variable that identifies the view number
  4. for the view-level object that should be cached, find its identifier (might well be a simple obj.toString() call)
  5. if the view map contains an object map for the given view ID, then retrieve it, otherwise create a new object map and put it in the view map for the given view ID
  6. if the object map contains an object with the same object ID, then discard the object received from the request and re-use the cached object, otherwise take the object from the request and put it in the object map
  7. process the request and send a response to the client
The mechanism can also be extended with an expiration algorithm that kicks in on steps 5 and 6, so that cached objects are refreshed from time to time if needed, but that is another matter altogether.
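The "view map of object maps" at the heart of the revised workflow can be sketched as follows. In a real application the outer Map would be stored as a session attribute; the class and method names here are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Session-bound "view map": keyed by view ID, with each value being a
// per-view "object map" keyed by object ID (steps 5 and 6 above).
class ViewScopeCache {
    private final Map<String, Map<String, Object>> viewMap = new HashMap<>();

    // Return the cached object for (viewId, objectId) if present;
    // otherwise cache and return the one received from the request.
    Object resolve(String viewId, String objectId, Object fromRequest) {
        Map<String, Object> objectMap =
                viewMap.computeIfAbsent(viewId, k -> new HashMap<>());
        Object cached = objectMap.get(objectId);
        if (cached != null) {
            return cached; // discard the request's copy, re-use the cache
        }
        objectMap.put(objectId, fromRequest);
        return fromRequest;
    }
}
```

In the pizza scenario, the ham tab and the olives tab carry different view IDs, so each one resolves to its own cached searchObj and a refresh in one tab can no longer clobber the other.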

29 December 2008

Regulating the Internet

The Italian prime minister Silvio Berlusconi reportedly said (in Italian) that he wants to put forward an international effort to regulate the internet when leading the next presidency of the G8. Just google up the terms "berlusconi g8 internet" if you want to know more.

Wow, the guy is a genius. This level of enlightenment is almost super-human. Nobody ever thought of it before, or - if they did - they didn't have the intellectual acumen necessary to make it happen.

I don't precisely know what the reactions from the G8 benches will be when this item is brought up for actual discussion, but I have a feeling they might be along the lines of:
  • big fat hysterical general laughter
  • various exclamations like WTF, N00B, ROTFL rising from all sides
  • Berlusconi accusing everyone of being an antidemocratic communist, as he usually does when people start disagreeing with him
  • more hysterical general laughter
  • meeting coming to a closure leaving everyone but him in tears (of laughter)

Well, at least he would brighten up the day.

23 December 2008

Timely Mess

Is it mass that bends space-time, or is it the folds in the space-time fabric that "create" mass? Which one comes first? Far from being an expert in special and general relativity, I am fascinated by the concept of time. Can we actually think of what comes first and what comes second in relativistic terms? Is there such a thing as causality when we think of time as "just" another dimension? Causality is the principle that makes us say "B happens because of A". This should imply that A happens "before" B - how else could B exist? This is because in simple human terms there is only one direction in time.

Causality has already been defined in relativistic contexts, but I still can't get my head around it. If we think that time is not such a "special" dimension, but rather an "ordinary" dimension, like length or width, then saying that A happens before B is no different than saying A is to the left/right/top/bottom/in-front-of/behind B. In other words, there would be no particular meaning to the words "before" and "after", at least no more than "top" and "bottom".

Can there be no such thing as causality? Can children exist independently from their parents? Can trees exist independently from the seeds they sprouted from? Not in our everyday experience, sure, but think about it from the perspective of an entity that moves precisely at the speed of light: for example a photon.

Time is, effectively, standing still for a ray of light. Were the photon bestowed with the gifts of sight and intellect, it would "see" the world in a very interesting way. For example, let's take my current state of "being". I am currently sitting on a double-decker London bus, going to work. I hopped on the bus about 30 minutes ago and I expect to get off it in another 15-20 minutes - traffic permitting. Imagine the route this bus takes in its hour-long journey, then take a mental snapshot of this bus and imagine that snapshot filling every point of this route. The result is a very long and flexible bus-like shape that precisely traces the entire bus route. It's a bit like the starship Enterprise when it goes warp and we see it suddenly taking this elongated shape before it enters hyperspace.

Now back to our bus. Imagine tracing this bus-like shape not only through this hour-long bus route, but through all the places this bus has ever been and will be in its working life. Looks pretty messy. If a photon came into existence the moment this very bus came into existence, that's how that photon would see this bus: a messy, fuzzy, zig-zagging bus-like shape tracing a very complex pattern that goes through all the places where this bus has ever been and where this bus will ever be in the future.

Now imagine not only this bus, but everything else, in the same way: people on the streets, buildings and trees following the rotation of the Earth, the planets in the solar system tracing their paths, our sun tracing its path across the galaxy, and so on. That's how a photon would see the world. Nothing comes "first" or "second" or "third": everything just "is", in a big, fuzzy, dark-brown-ish mess with no concept of causality. Why dark-brown-ish? Because of innumerable objects of different colours intersecting at innumerable different points: the more different colours you mix, the darker and more brown-ish the resulting colour will be.

We just said "with no concept of causality". But is that true? After all, we still know from everyday experience that children come from parents, that trees come from seeds, and so on. There must be something that relates a "cause" with an "effect". How is that going to work when time stands still? I haven't the foggiest idea, but that is a pretty good indication that, in these terms, the definition of causality must be very different from what we are used to.

28 October 2008

Random Thought On The Internet

This is one of the random thoughts that usually pop right into my head when I least expect it.

I was on a work trip. One of those where you get to travel to exotic destinations but all you get to see is the office, the company cafeteria and the hotel. I was having dinner at the hotel and I started thinking about how people came to have this strange need to aggregate in one area. How did people start living with other people, and how did they manage to eventually take this concept to such gigantic proportions as to create today's biggest cities, with millions and millions of individuals living close to each other?

The first reason I thought of was procreation and the continuation of the species: a certain gene pool needs a minimum degree of diversity in order to avoid extinction. Did humans in the upper palaeolithic or in the mesolithic really think along those lines? Probably not. So here I am, I'm not very good at running, hiding, or throwing spears. I can't hunt for food, but I'm good at growing legumes. If I find someone who's good at hunting, maybe we can trade food. Now, wouldn't it be easier if the two of us lived relatively close to each other instead of a half-day walking distance? I haven't studied anthropology, archaeology or any related branches, but that kind of makes sense to me.

Without going that far back in time, just think that a hundred years ago a grain of black pepper needed to travel for months to reach Europe, from south-west India to Italy, but people could still find black pepper in grocery shops, because they lived in human settlements where specialisation of roles could support a complex social infrastructure: different people can do different things and provide for different needs, but they need to be in relatively close proximity in order to maximise their reciprocal advantage.

In physics, this is equivalent to a system trying to find its equilibrium by minimising its energy. An electron in an excited atom "prefers" to drop to a lower energy level if that level is not already full. This is what the electron thinks: what's the point in doing so much work just to keep running like mad in a "higher orbit", when I could just cruise casually around a "lower orbit"? So the electron "drops down" an energy level and sheds the excess energy in a flash of light. And no, I haven't studied much physics either, but that kind of makes sense to me.

Extrapolating the principle of minimising a system's energy, here's what a human might think instead: what's the point in spending half a day travelling to buy some food, load up enough food to last me a while (because it's not like I'll do this again tomorrow, or the day after), then carry all that food back home and manage its storage, when I could just move downtown and simply pop down to the corner shop whenever I need to?
The real achievement in moving downtown, in really fancy terms, is not just that I have found a low-energy equilibrium. My half-day walk has become a two-minute stroll. The 20 kilometres between the shop and my house have become less than 100 metres. By moving downtown I have effectively compressed space and time.

Today, as a human living in the remote countryside, I don't actually need to move at all, but I can still compress space and time. How? With the internet, of course. I don't need to live in relatively close proximity to other humans and human structures: I just need to be wherever there is an internet connection and a postal service, and these might well exist in the remote countryside as well as downtown. So will the internet reverse the process of urbanisation, or at least make such a reversal possible? I think so. Virtual offices, internet videoconferencing, voice-over-IP, news streaming, bla bla bla... it's all there already. The recent hype of "going green" even encourages, to a great extent, staying where you are: avoid travelling, avoid lighting up an entire office or unnecessarily loading the public transport system if you can work from home. The fascination of "going downtown", perhaps, will eventually be confined to tourist attractions. I'm not saying people will have no more reasons for sticking together in organised physical conglomerates: I'm just saying, IMHO, that there will be a lot less motivation to do so in the future.

M.

23 October 2008

Expressions in IceFaces Navigation Targets

I must write this lest I forget.

I've been working on an exception handling mechanism for a JSF-based application using IceFaces, and I was thinking... "why oh why can't we dynamically navigate to error pages using a navigation rule?"
Navigation rules in the faces-config.xml look like this:
<navigation-rule>
    <from-view-id>/somePage.xhtml</from-view-id>
    <navigation-case>
        <from-outcome>error</from-outcome>
        <to-view-id>/errorPage.xhtml</to-view-id>
    </navigation-case>
</navigation-rule>
That is fine but very limiting for my navigation purposes. What if I want to navigate to a different page according to the actual error code? In other words, I want to be able to do this:
<navigation-rule>
    <from-view-id>/somePage.xhtml</from-view-id>
    <navigation-case>
        <from-outcome>error</from-outcome>
        <to-view-id>/#{errorBean.errorCode}.xhtml</to-view-id>
    </navigation-case>
</navigation-rule>
Well, it seems that I can't do that with the Sun JSF RI or IceFaces, so I decided to make it happen. I figured that if I wanted expression evaluation in a navigation target URL I needed to write a view handler. With IceFaces, the application view handler is com.icesoft.faces.facelets.D2DFaceletViewHandler, which is responsible for setting up the direct-to-DOM rendering and so on, so I needed to extend that class and find out which methods to override. After a bit of experimentation I found there are two scenarios:
Scenario #1: Navigation Rule With Redirection
This is where the navigation rule has a <redirect/> tag. The method in D2DFaceletViewHandler that handles this is
public String getActionURL(FacesContext context, String viewId)
Scenario #2: Navigation Rule Without Redirection
This is where the navigation rule does not have a <redirect/> tag. The method in D2DFaceletViewHandler that handles this is
public void renderView(FacesContext context, UIViewRoot viewToRender)
The way I decided to process the expression is very simple, almost elementary:

  1. Parse the URL/ViewId looking for a sub-string that begins with '#{' or '${' and ends with '}'
  2. Capture the sub-string and create a value binding to evaluate the expression
  3. Replace the expression in the URL/ViewId with the actual value
  4. Process the newly evaluated URL/ViewId

Before I get any comments on step 1... no, I don't like regex because my brain just doesn't get it, and it takes me considerably longer to figure out a regex pattern to capture such a simple substring than to actually write a few lines of code to do the parsing.
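That said, for anyone who does get along with regex, the same sub-string capture from step 1 fits in one pattern via java.util.regex. A minimal standalone sketch (plain strings, no JSF involved; the pattern assumes expressions contain no nested braces):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExpressionCapture {

    // an expression starts with '#' or '$', then '{', then anything up to the first '}'
    private static final Pattern EXPR = Pattern.compile("[#$]\\{[^}]*\\}");

    /** Returns the first #{...} or ${...} expression in the view id, or null if none. */
    public static String firstExpression(String viewId) {
        Matcher m = EXPR.matcher(viewId);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstExpression("/#{errorBean.errorCode}.xhtml"));
        // prints: #{errorBean.errorCode}
    }
}
```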

So here's my (edited) code.
/**
* Constructor
*/
public MyViewHandler(ViewHandler delegate) {
    super(delegate);
}

/**
* Processes a view id that may contain an expression, by evaluating the
* expression and replacing the expression tag in the original view id with
* the expression result.
*
* @param context The faces context.
* @param viewId The view id to process.
* @return The processed view id.
*/
private String processViewId(FacesContext context, String viewId) {
    String processedViewId = viewId;

    // the character immediately before '{' should be '#' or '$'
    int startExpression = processedViewId.indexOf("{") - 1;
    if (startExpression >= 0) {
        char expChar = processedViewId.charAt(startExpression);

        // expressions start with # or $
        if ((expChar == '#') || (expChar == '$')) {
            int endExpression = processedViewId.indexOf("}", startExpression);

            if (endExpression > startExpression) {
                // viewId contains an expression
                String expression = processedViewId.substring(startExpression, endExpression + 1);

                try {
                    ValueBinding vb = context.getApplication().createValueBinding(expression);

                    if (vb != null) {
                        Object value = vb.getValue(context);

                        if (value != null) {
                            // replace the expression tag in the view id
                            // with the expression's actual value
                            processedViewId = processedViewId.replace(expression, value.toString());
                        }
                    }
                } catch (ReferenceSyntaxException ex) {
                    // malformed expression: fall through and
                    // return the view id unchanged
                }
            }
        }
    }

    return processedViewId;
}

/**
* Used to process a URL that may contain an expression. If a navigation
* rule in the faces configuration file has a <redirect> tag, this
* method will be used to process the URL specified in the
* <to-view-id> tag
*
* @see javax.faces.application.ViewHandler#getActionURL(FacesContext, String)
*/
@Override
public String getActionURL(FacesContext context, String viewId) {

    String processedViewId = super.getActionURL(context, viewId);
    processedViewId = this.processViewId(context, processedViewId);

    return processedViewId;
}

/**
* If a navigation rule in the faces configuration file does not have a
* <redirect> tag, this method will be used to process the URL
* specified in the <to-view-id> tag
*
* @see com.icesoft.faces.application.D2DViewHandler#renderView(FacesContext,
* UIViewRoot)
*/
@Override
public void renderView(FacesContext context, UIViewRoot viewToRender)
throws IOException, NullPointerException {

    String viewId = this.processViewId(context, viewToRender.getViewId());
    viewToRender.setViewId(viewId);

    super.renderView(context, viewToRender);
}
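Stripped of the JSF plumbing, the indexOf/substring/replace mechanics can be exercised on a plain string. A standalone sketch, with a hard-coded result standing in for the value binding (evaluating errorBean.errorCode to "404" is an assumption, purely for illustration):

```java
public class ViewIdParseDemo {

    // the same parsing approach, with a fixed value in place of a ValueBinding lookup
    static String process(String viewId, String evaluated) {
        int start = viewId.indexOf("{") - 1;
        if (start < 0) return viewId;                       // no brace, or nothing before it
        char c = viewId.charAt(start);
        if (c != '#' && c != '$') return viewId;            // not an expression
        int end = viewId.indexOf("}", start);
        if (end <= start) return viewId;                    // unterminated expression
        String expression = viewId.substring(start, end + 1);
        return viewId.replace(expression, evaluated);
    }

    public static void main(String[] args) {
        // pretend #{errorBean.errorCode} evaluates to "404"
        System.out.println(process("/#{errorBean.errorCode}.xhtml", "404"));
        // prints: /404.xhtml
    }
}
```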
To use my spanking new view handler I just have to change the application section of the faces-config.xml file, registering the fully-qualified class name of the new handler:
<faces-config>
    <application>
        ...
        ...
        <view-handler>
            <!--
            com.icesoft.faces.facelets.D2DFaceletViewHandler
            -->

            MyViewHandler
        </view-handler>
    </application>
</faces-config>
M.

21 October 2008

Finding Kokkyu

This is about one of those amazing discoveries in my world of Aikido.

I'm walking fairly fast to reach home before the Simpsons start on TV. I get to an underpass to get to the other side of a large busy road. I have a 5-year-old sitting on my shoulders, let's say about 20 kilograms. I have my techno-backpack on my back, let's say another 10 kg, for a total of 30 kg concentrated on my neck and shoulders. I am also pushing a buggy with a 1-year-old, for a total of, say, another 10 kg. The underpass, by definition, routes me downhill at first, then across, then uphill, and it's during the uphill phase that I start struggling.

Instinctively, I simply stretch out my arms to push the buggy, and compensate for the additional weight on my neck and shoulders by tensing the muscles in my upper body.

WRONG! That's hard work!

Then it just dawns on me: I relax my shoulders, drop my weight towards my hips, energise my forearms, put just a tiny bit of zanshin into each step... and there I am, cruising uphill without noticeable effort!

Aikido will never cease to amaze me.

M.

16 October 2008

2309

sorridi
come una volta
al chiaro del mare
ai sogni di luna
su un volto solare

      

smile
like you used to
in the brightness of the sea
to your moondreams
on your sun-kissed face

canta
ai colori del mondo
alle amate colline
alle palme incantate
con la tua voce eterna
di brezza d'estate

sing
to the colors of the world
to your loved highland
to the enchanted palms
with the eternal voice
of a summer breeze

vola
sereno
oltre le mura
di tutti i colori
senza paura
senza dolori

fly
unworried
over these walls
of all colors
without fear
without pain

ecco un artista!
un autore
un sole
un padre

here is an artist!
an author
a sun
a father

adesso
solo un pianto
e un sussurro
un ultimo ciao

and now
only a cry
and a whisper
one last hello

luceran
ancora
le stelle

the stars
will shine
again

M.

20 September 2008

Architectural Agility - Part 1

Software Engineering is a wonderful world full of surprises and challenges. We learn early on in life that any activity that floods your system with adrenaline is an addictive activity.
In the life of a ten year old, this might take the form of climbing a tree for the first time, ascending to seemingly unreachable altitudes with a non-zero probability of following that up with weeks of hospitalisation.
In the life of a programmer, this might be the result of producing thousands of lines of code under deadlines that you never considered even remotely realistic.

Let's face it: the feeling of impending doom is just great.

In the life of an architect, the main perverse sense of doom and destruction originates from the fact that you are supposed to shape the system in your head, then somehow implant that picture in the heads of over 50 developers, and pray that you've been clear enough with your specifications.
The uninitiated might rightfully ask "Why? Just write the darn specs and pass them on: if they're good developers, they'll work them out..." Alternatively, "Ever heard of agile?".

Well, I don't know about the rest of the world, but in my projects I've never managed to do either. Usually, the situation involves very vague or even yet-to-be-discovered requirements, and only one or two agile developers in a team of 50-plus. So how have I managed so far? Well, it's surely a continuous learning process for me, and I still discover something new almost daily, but here are some points that I have picked up along the way and found very valuable.


#1 : Learn how to produce on-the-fly specs
Why? Because on one hand I have this very hazy and almost monochromatic picture of the requirements, on the other hand I have 50 developers expecting fairly detailed specifications of what they are going to produce, and somewhere in the middle I have a bunch of reasons for being unable to cultivate an agile team.

#2 : Learn how to be ALWAYS available for the team
Why? Perhaps it's just me, but on-the-fly specs are NEVER good enough.


#3 : Learn how to monitor progress all the time
This doesn't mean becoming a control freak, but rather understanding the dynamics of the project and minimising the risk of producing the wrong thing due to sub-optimal specifications. Why? Because 50 developers, over (officially) 8 working hours a day, make 400 hours/day of code writing and testing time. In one week, that's at least 2000 hours, or about one developer-year's worth of effort. That means that failing to convey an architectural concept for the system and leaving 50 developers alone for a whole week translates into seriously refactoring one developer-year's worth of code, which is not something you usually do in your spare time. It's like steering the proverbial oil tanker: one mistake in plotting the route, and it takes considerable effort to get back on track if you don't spot the mistake right away.


#4 : Functional refactoring is inevitable
Why? Well, due to all of the above. However, I tend to view refactoring as falling into one of three categories: architectural, functional, and schematic.
Architectural refactoring is serious: that is what happens when we change parts of the architecture, for example when, halfway through the project, we realise we need a caching framework for our web application.
Functional refactoring is somewhat less serious and could be considered a minor version of architectural refactoring: that happens when we change some of the application's behavior, for example when we move hard-coded values to some configuration file, and surely when we are bugfixing.
Schematic refactoring is standard routine: that is when the application's functionality is unaffected while we change its internal structure, for example when we abstract common functionality from several classes into one parent class, or formalise common class contracts into interfaces. I'm now learning to shape functional refactoring into an agile project in its own right, and will probably write up some considerations on that in another post.

M.

14 September 2008

The Eternal Beginner

Recently, I've been thinking about my learning journeys in the wonderful world of Takemusu Iwama Aikido. In particular, I've been trying to understand the changes in my "learning methodology", for want of a better term. In other words, how do I actually manage to learn a technique, and how has that approach changed over time?

I could go on for days trying to explain this, but I'll try to summarise the main ideas here.

When I first started practising Aikido, I would try to learn a technique by thinking only of my own body movements. I would look at my sensei offering his right wrist, moving his right foot a little to the side, settling his hips, then turning 180 degrees to face the other way, before adjusting his left foot into hanmi. I would then try to replicate the exact movements, step by step, with clumsy results at first, but progressing to make my moves smoother and more efficient. The whole process would be based on the mindset of "how do I move my body so that I end up in this or that position?".

I later figured that analysing my own movements would only be stage one (out of many) of learning a technique, and nowadays I try to learn a technique by going through as many stages as possible, as I'll shortly explain.

Stage two is when I start to look at my training partner's movements instead of mine. Different people have different body shapes, different statures, different degrees of mobility, and so on. It wouldn't make much sense to me to learn a technique through one sequence of my own body movement, and expect the same sequence of movements to work without fail on any training partner. Stage two would therefore be based on the mindset of "how do I move my body so that my training partner ends up in this or that position?".

Stage three is when I start considering the communion of the situation. It's not about my own body, or my training partner's. It's about the combination of both: what shapes they make, what balance they achieve, and so on. This is where my mindset goes somewhere along the lines of "how do I move my body to induce my training partner to move in such a way that we both end up in this or that position?".

The next stage is about direction and redirection of energy. This is when I usually focus on the "path of least resistance": if it's hard to move around my training partner in a particular way, then chances are it's the wrong way. For example, if my training partner is pushing, why should I push back? It's a lot easier to follow his/her lead and pivot around that energy vector, modifying (even if only slightly) the vector's components. This then leads my training partner to follow the new vector, so my mindset would then be "how do I move my body to redirect my training partner's energy in such a way that we end up in this or that position?".

I have noted that there is a very clear similarity between the last two stages of training awareness, but there is also a very important difference: the former is static, whilst the latter is dynamic. I have also noted that all of the above stages are still focused on my own body movements, albeit with very different mindsets, and I find this extremely interesting because, at the end of the day, that's what I think is the actual nature of a martial art: mind over body.

I am sure there are many more stages of training awareness, but that's pretty much where I am at the moment. For example, for anyone familiar with the terminology, all of the above are my considerations on ki-hon forms. I haven't even started considering the kind of mindset involved in ki-no-nagare forms. Who knows, maybe one day...

M.

Hi There

Call this a warning, a disclaimer, or whatever you like.
I'm starting this blog to share my personal views on different subjects.
In other words, these are my experiences, and these are the ways I've travelled my own path.
It's just a bunch of information that someone might or might not find useful.

In the end, if only one person finds any of this stuff useful in any way, then I will have done something good with my time.

Enjoy.

M.