Just out of curiosity, I check whether there is a hacked firmware based on a more recent image, and I find one based on version 3.4.90, with SSH of course.
So here we go. Let's check the RAID device...
~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Sat May 5 06:30:50 2007
     Raid Level : raid1
     Array Size : 487106752 (464.54 GiB 498.80 GB)
    Device Size : 487106752 (464.54 GiB 498.80 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri May 22 15:20:30 2009
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
         Events : 0.525126

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8        6        1      active sync   /dev/sda6
Again only one drive out of two.
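The same check can be scripted rather than read by eye. Here is a minimal sketch, assuming a POSIX shell with awk available; `is_degraded` is a made-up helper name, not part of mdadm. It simply compares the "Raid Devices" and "Active Devices" fields of the `mdadm --detail` output.

```shell
# is_degraded is a hypothetical helper (not an mdadm feature).
# It reads `mdadm --detail` output on stdin and exits 0 when fewer
# devices are active than the array is configured for.
is_degraded() {
    awk -F' : ' '
        /Raid Devices/   { want = $2 + 0 }   # devices the array expects
        /Active Devices/ { have = $2 + 0 }   # devices currently in sync
        END { exit !(have < want) }          # exit 0 means "degraded"
    '
}

# Fed with the two relevant lines from the output above:
printf '   Raid Devices : 2\n Active Devices : 1\n' | is_degraded \
    && echo "array is degraded"
```

On the NAS itself this would presumably be used as `mdadm --detail /dev/md0 | is_degraded && echo "md0 is degraded"`.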
Let's see what happened to /dev/sdb.
~ # /usr/sbin/smartctl -l selftest /dev/sdb
smartctl version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log, version number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended off-line   Completed                 00%        7474             -
# 2  Off-line            Interrupted (host reset)  50%        7466             -
# 3  Off-line            Interrupted (host reset)  50%        7379             -
# 4  Short off-line      Completed: read failure   50%        7334             0x00032141
# 5  Off-line            Interrupted (host reset)  00%        7334             -
# 6  Short off-line      Completed                 00%        7330             -
# 7  Off-line            Interrupted (host reset)  00%        7330             -
# 8  Off-line            Interrupted (host reset)  00%        5973             -
# 9  Off-line            Interrupted (host reset)  00%        5396             -
#10  Off-line            Interrupted (host reset)  00%        5393             -
#11  Off-line            Interrupted (host reset)  00%        5376             -
#12  Short off-line      Completed                 00%        4687             -
#13  Off-line            Interrupted (host reset)  00%        4687             -
#14  Off-line            Interrupted (host reset)  00%        4003             -
#15  Off-line            Interrupted (host reset)  00%        3819             -
#16  Short off-line      Completed                 00%        3659             -
#17  Short off-line      Completed                 00%        3659             -
#18  Short off-line      Completed                 00%        3655             -
#19  Off-line            Interrupted (host reset)  70%        3652             -
#20  Short off-line      Aborted by host           70%        3652             -
#21  Off-line            Interrupted (host reset)  00%        3651             -
Ouch! LBA_of_first_error = 0x32141 (= 205121 in base 10)
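The conversion is easy to reproduce in the shell. A minimal sketch, assuming a POSIX shell; `hex_to_dec` and `lba_to_bytes` are made-up helper names, and the 512-byte sector size is my assumption about this drive, not something smartctl reports here.

```shell
# Hypothetical helpers, not part of smartmontools.

# POSIX arithmetic expansion understands the 0x prefix.
hex_to_dec() {
    echo "$(( $1 ))"
}

# Byte offset of a sector, ASSUMING 512-byte sectors.
lba_to_bytes() {
    echo "$(( $1 * 512 ))"
}

hex_to_dec 0x00032141      # prints 205121
lba_to_bytes 0x00032141    # prints 105021952
```

Under that sector-size assumption, the failing sector sits roughly 100 MiB into the disk.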
Let's also check the SMART attributes.
~ # /usr/sbin/smartctl -A /dev/sdb
smartctl version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 32
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   168   162   063    Pre-fail  Always   -           18676
  4 Start_Stop_Count        0x0032   210   210   000    Old_age   Always   -           20884
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always   -           0
  8 Seek_Time_Performance   0x0027   247   243   187    Pre-fail  Always   -           41160
  9 Power_On_Hours          0x0032   232   232   000    Old_age   Always   -           7559
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always   -           0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always   -           76
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always   -           0
190 Unknown_Attribute       0x0022   056   039   000    Old_age   Always   -           959119404
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always   -           0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always   -           0
194 Temperature_Celsius     0x0032   046   253   000    Old_age   Always   -           44
195 Hardware_ECC_Recovered  0x000a   252   210   000    Old_age   Always   -           37129
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline  -           0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline  -           0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline  -           0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always   -           0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always   -           0
202 Unknown_Attribute       0x000a   253   252   000    Old_age   Always   -           0
203 Unknown_Attribute       0x000b   253   252   180    Pre-fail  Always   -           11
204 Unknown_Attribute       0x000a   253   252   000    Old_age   Always   -           0
205 Unknown_Attribute       0x000a   253   252   000    Old_age   Always   -           0
207 Unknown_Attribute       0x002a   253   252   000    Old_age   Always   -           0
208 Unknown_Attribute       0x002a   253   252   000    Old_age   Always   -           0
210 Unknown_Attribute       0x0032   253   252   000    Old_age   Always   -           0
211 Unknown_Attribute       0x0032   253   252   000    Old_age   Always   -           0
212 Unknown_Attribute       0x0032   253   252   000    Old_age   Always   -           0
Not too bad after all, since
Current_Pending_Sector = 0
Offline_Uncorrectable = 0
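Rather than scanning the whole table for those two rows, the raw values can be pulled out with a one-line awk filter. A rough sketch, assuming the column layout shown above (attribute name in the second column, RAW_VALUE last); `smart_raw` is a made-up helper name.

```shell
# smart_raw is a hypothetical helper (not part of smartmontools).
# Given an attribute name, it reads `smartctl -A` output on stdin and
# prints the last column (RAW_VALUE) of the matching row.
smart_raw() {
    awk -v name="$1" '$2 == name { print $NF }'
}

# Fed with one row copied from the table above:
printf '%s\n' '197 Current_Pending_Sector 0x0008 253 253 000 Old_age Offline - 0' \
    | smart_raw Current_Pending_Sector    # prints 0
```

On the NAS this would presumably be run as `/usr/sbin/smartctl -A /dev/sdb | smart_raw Offline_Uncorrectable`.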
Now let's find which partition has the problem.
~ # fsck.ext3 -nv /dev/sdb1
e2fsck 1.38 (30-Jun-2005)
/dev/sdb1: clean, 3045/64256 files, 24511/64252 blocks
~ # fsck.ext3 -nv /dev/sdb2
e2fsck 1.38 (30-Jun-2005)
Warning! /dev/sdb2 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/sdb2 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found. Create? no
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (39452, counted=39449).
Fix? no
Free inodes count wrong (61113, counted=61112).
Fix? no

/dev/sdb2: ********** WARNING: Filesystem still has errors **********

    2887 inodes used (4%)
      13 non-contiguous inodes (0.5%)
         # of inodes with ind/dind/tind blocks: 156/0/0
   24548 blocks used (38%)
       0 bad blocks
       1 large file

    2309 regular files
     175 directories
      47 character device files
      40 block device files
       0 fifos
       8 links
     308 symbolic links (308 fast symbolic links)
       0 sockets
--------
    2887 files
/dev/sdb3 is a swap partition, so we can skip that.
/dev/sdb4 is an extended partition, so we can skip that too.
~ # fsck.ext3 -nv /dev/sdb5
e2fsck 1.38 (30-Jun-2005)
Warning! /dev/sdb5 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/sdb5 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found. Create? no
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (118269, counted=118286).
Fix? no
Free inodes count wrong (126523, counted=126537).
Fix? no

/dev/sdb5: ********** WARNING: Filesystem still has errors **********

      69 inodes used (0%)
       4 non-contiguous inodes (5.8%)
         # of inodes with ind/dind/tind blocks: 0/0/0
    8235 blocks used (6%)
       0 bad blocks
       1 large file

      32 regular files
      14 directories
       0 character device files
       0 block device files
       0 fifos
       0 links
       0 symbolic links (0 fast symbolic links)
       0 sockets
--------
      46 files
~ # fsck.ext3 -nv /dev/sdb6
e2fsck 1.38 (30-Jun-2005)
/dev/sdb6: clean, 93575/60899328 files, 56875141/121776688 blocks
So the errors are on /dev/sdb2 and /dev/sdb5
Let's see where they mount to.
~ # mount | grep /sdb
/dev/sdb1 on /mnt/__mxo_sdb1 type ext3 (rw)
~ # cat /proc/mounts | grep /sdb
/dev/sdb5 /tmp ext3 rw 0 0
~ # cat /proc/cmdline
console=ttyS0,115200 root=/dev/sdb2 rw
Are we booting from /dev/sdb2?
~ # mxoparam -h
Maxtor mxoparam version 1.0
-a          show all maxtor params
-b          get wait for button status
-c [0-1]    set wait for button
            0 = Off
            1 = On
-d          show max number of drives
-e          enable watchdog in uboot
-f          disable watchdog in uboot
-g          set led solid green
-h          show help
-k          kick watchdog
-p          get boot partition
-q [part]   set boot partition
            0 = drive 0 partition 1
            1 = drive 0 partition 2
            2 = drive 1 partition 1
            3 = drive 1 partition 2
-r          reset partion fail count
-s          get serial number
-t [sn]     set serial number
-v          show version
-x          disable watchdog now
-w          enable watchdog now
-y          set led solid yellow
~ # mxoparam -p
Boot partition is 3

Looks like the system is booting from the second disk, second partition (/dev/sdb2).
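The index-to-partition mapping from the help text can be written down as a tiny lookup. This is only a sketch: `boot_index_to_dev` is a made-up helper name, and it assumes that "drive 0" is /dev/sda and "drive 1" is /dev/sdb, which matches what /proc/cmdline reports on this box but is not stated by mxoparam itself.

```shell
# boot_index_to_dev is a hypothetical helper (not part of mxoparam).
# ASSUMPTION: drive 0 = /dev/sda, drive 1 = /dev/sdb.
boot_index_to_dev() {
    case "$1" in
        0) echo /dev/sda1 ;;   # drive 0 partition 1
        1) echo /dev/sda2 ;;   # drive 0 partition 2
        2) echo /dev/sdb1 ;;   # drive 1 partition 1
        3) echo /dev/sdb2 ;;   # drive 1 partition 2
        *) echo "unknown boot index: $1" >&2; return 1 ;;
    esac
}

boot_index_to_dev 3    # prints /dev/sdb2
```

By the same table, index 1 is the value to pass to `mxoparam -q` to boot from /dev/sda2.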
This means we can't unmount it, and we need to unmount it before we can fix it.
Therefore, we need to make the system boot from /dev/sda2, otherwise we won't be able to fix /dev/sdb*.
First of all, let's make sure /dev/sda2 is exactly the same as /dev/sdb2
~ # dd if=/dev/sdb2 of=/dev/sda2
~ # mount -n /dev/sda2 /mnt/__mxo_sda2 -t ext3
~ # cp -a /mnt/__mxo_sdb2 /mnt/__mxo_sda2

Now let's set the new boot partition.

~ # mxoparam -q 1

REBOOT! ... Let's check it's booting up from the right place now.

~ # cat /proc/cmdline
console=ttyS0,115200 root=/dev/sda2 rw

Right! We're ready to fix /dev/sdb2 and /dev/sdb5 now!

~ # fsck -v /dev/sdb2
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
/dev/sdb2 has gone 384 days without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found. Create? yes
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/sdb2: ***** FILE SYSTEM WAS MODIFIED *****

    3157 inodes used (4%)
      14 non-contiguous inodes (0.4%)
         # of inodes with ind/dind/tind blocks: 159/0/0
   24622 blocks used (38%)
       0 bad blocks
       1 large file

    2317 regular files
     179 directories
     246 character device files
      84 block device files
       0 fifos
       7 links
     321 symbolic links (321 fast symbolic links)
       0 sockets
--------
    3154 files
~ # fsck -v /dev/sdb5
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
/dev/sdb5: recovering journal
/dev/sdb5 has been mounted 35 times without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found. Create? yes
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/sdb5: ***** FILE SYSTEM WAS MODIFIED *****

    2939 inodes used (2%)
      28 non-contiguous inodes (1.0%)
         # of inodes with ind/dind/tind blocks: 159/0/0
   26690 blocks used (21%)
       0 bad blocks
       1 large file

    2345 regular files
     189 directories
      47 character device files
      40 block device files
       0 fifos
       8 links
     308 symbolic links (308 fast symbolic links)
       0 sockets
--------
    2937 files

Now we can rebuild the RAID array.

~ # mdadm --manage --add /dev/md0 /dev/sdb6
mdadm: hot added /dev/sdb6
NAS:~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Sat May 5 06:30:50 2007
     Raid Level : raid1
     Array Size : 487106752 (464.54 GiB 498.80 GB)
    Device Size : 487106752 (464.54 GiB 498.80 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat May 23 14:39:50 2009
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 0% complete

           UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
         Events : 0.541445

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8        6        1      active sync   /dev/sda6
       2       8       22        0      spare rebuilding   /dev/sdb6
~ # cat /proc/mdstat
Personalities : [linear] [raid1]
md0 : active raid1 sdb6[2] sda6[1]
      487106752 blocks [2/1] [_U]
      [>....................]  recovery = 1.5% (7396736/487106752) finish=145.1min speed=55092K/sec
unused devices: [none]

Good... 2.5 hours later...

~ # cat /proc/mdstat
Personalities : [linear] [raid1]
md0 : active raid1 sdb6[0] sda6[1]
      487106752 blocks [2/2] [UU]
unused devices: [none]

Reboot again and we're done!
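As a footnote, the finish=145.1min estimate in the first /proc/mdstat output can be checked by hand: it is just the remaining blocks divided by the current speed. A quick sketch, assuming the block counts are in 1 KiB units and the speed is in KiB per second (my reading of the mdstat line); `eta_minutes` is a made-up helper name.

```shell
# eta_minutes is a hypothetical helper (not part of mdadm).
# Usage: eta_minutes TOTAL_BLOCKS SYNCED_BLOCKS SPEED_K_PER_SEC
# ASSUMPTION: blocks are 1 KiB and speed is KiB/s, so the units cancel.
eta_minutes() {
    awk -v total="$1" -v synced="$2" -v speed="$3" \
        'BEGIN { printf "%.1f\n", (total - synced) / speed / 60 }'
}

# Numbers taken from the recovery line above:
eta_minutes 487106752 7396736 55092    # prints 145.1
```

That is (487106752 - 7396736) / 55092, about 8707 seconds, which agrees with the kernel's own estimate and with the roughly 2.5 hours the rebuild actually took.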