Here we go again. The NAS is not happy.
Just out of curiosity, I check whether there is a hacked firmware based on a more recent image, and I find one based on version 3.4.90, with SSH of course.
So here we go. Let's check the RAID device...
~ # mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.01
Creation Time : Sat May 5 06:30:50 2007
Raid Level : raid1
Array Size : 487106752 (464.54 GiB 498.80 GB)
Device Size : 487106752 (464.54 GiB 498.80 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Fri May 22 15:20:30 2009
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
Events : 0.525126
Number Major Minor RaidDevice State
0 0 0 - removed
1 8 6 1 active sync /dev/sda6
Again only one drive out of two.
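The same "one drive out of two" condition also shows up in /proc/mdstat, where a missing member is marked with `_` in the status flags (for example `[2/1] [_U]`). Below is a minimal sketch of a check that lists degraded arrays. It assumes the usual /proc/mdstat layout, with an `mdN :` line followed by a status line containing the `[U_]` flags. The function name `md_degraded` is hypothetical and is not part of the NAS firmware.

```shell
# md_degraded [MDSTAT_FILE]  (hypothetical helper)
# Print the name of every md array whose member flags contain "_"
# (a removed or failed member). Reads /proc/mdstat unless a file is given.
md_degraded() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                 # remember the current array
        /\[[U_]+\]/ && name != "" {                 # its status line, e.g. [_U]
            if ($0 ~ /\[U*_[U_]*\]/) print name     # at least one "_" flag
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

On the array above, `md_degraded` would print `md0`.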
Let's see what happened to /dev/sdb.
~ # /usr/sbin/smartctl -l selftest /dev/sdb
smartctl version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log, version number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended off-line Completed 00% 7474 -
# 2 Off-line Interrupted (host reset) 50% 7466 -
# 3 Off-line Interrupted (host reset) 50% 7379 -
# 4 Short off-line Completed: read failure 50% 7334 0x00032141
# 5 Off-line Interrupted (host reset) 00% 7334 -
# 6 Short off-line Completed 00% 7330 -
# 7 Off-line Interrupted (host reset) 00% 7330 -
# 8 Off-line Interrupted (host reset) 00% 5973 -
# 9 Off-line Interrupted (host reset) 00% 5396 -
#10 Off-line Interrupted (host reset) 00% 5393 -
#11 Off-line Interrupted (host reset) 00% 5376 -
#12 Short off-line Completed 00% 4687 -
#13 Off-line Interrupted (host reset) 00% 4687 -
#14 Off-line Interrupted (host reset) 00% 4003 -
#15 Off-line Interrupted (host reset) 00% 3819 -
#16 Short off-line Completed 00% 3659 -
#17 Short off-line Completed 00% 3659 -
#18 Short off-line Completed 00% 3655 -
#19 Off-line Interrupted (host reset) 70% 3652 -
#20 Short off-line Aborted by host 70% 3652 -
#21 Off-line Interrupted (host reset) 00% 3651 -
Ouch! Self-test #4 hit a read failure at LBA_of_first_error = 0x32141 (= 205121 in decimal).
Let's also check the SMART attributes.
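The failing LBA can also be matched against the partition table directly, without running fsck on every partition. Below is a sketch that assumes the kernel exposes `/sys/block/sdb/sdbN/start` and `.../size`. Both values are in 512-byte sectors, which is the same unit as the SMART LBA. The function name `lba_to_partition` is hypothetical. The sysfs directory is passed as a parameter only so the function can be pointed at a copy.

```shell
# lba_to_partition LBA [SYSFS_DISK_DIR]  (hypothetical helper)
# Print the partition whose [start, start+size) sector range contains LBA.
# LBA may be decimal or 0x-prefixed hex; printf '%d' converts hex to decimal.
lba_to_partition() {
    lba=$(printf '%d' "$1")
    dir=${2:-/sys/block/sdb}
    for startfile in "$dir"/*/start; do
        [ -e "$startfile" ] || continue        # no partitions found
        pdir=${startfile%/start}
        start=$(cat "$pdir/start")
        size=$(cat "$pdir/size")
        if [ "$lba" -ge "$start" ] && [ "$lba" -lt $((start + size)) ]; then
            echo "${pdir##*/}"                 # e.g. sdb2
        fi
    done
}
```

Usage: `lba_to_partition 0x32141` checks sector 205121 against each partition of /dev/sdb.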
~ # /usr/sbin/smartctl -A /dev/sdb
smartctl version 5.1-14 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 32
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
3 Spin_Up_Time 0x0027 168 162 063 Pre-fail Always - 18676
4 Start_Stop_Count 0x0032 210 210 000 Old_age Always - 20884
5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 0
7 Seek_Error_Rate 0x000a 253 252 000 Old_age Always - 0
8 Seek_Time_Performance 0x0027 247 243 187 Pre-fail Always - 41160
9 Power_On_Hours 0x0032 232 232 000 Old_age Always - 7559
10 Spin_Retry_Count 0x002b 253 252 157 Pre-fail Always - 0
11 Calibration_Retry_Count 0x002b 253 252 223 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 253 253 000 Old_age Always - 76
189 Unknown_Attribute 0x003a 100 100 000 Old_age Always - 0
190 Unknown_Attribute 0x0022 056 039 000 Old_age Always - 959119404
192 Power-Off_Retract_Count 0x0032 253 253 000 Old_age Always - 0
193 Load_Cycle_Count 0x0032 253 253 000 Old_age Always - 0
194 Temperature_Celsius 0x0032 046 253 000 Old_age Always - 44
195 Hardware_ECC_Recovered 0x000a 252 210 000 Old_age Always - 37129
196 Reallocated_Event_Count 0x0008 253 253 000 Old_age Offline - 0
197 Current_Pending_Sector 0x0008 253 253 000 Old_age Offline - 0
198 Offline_Uncorrectable 0x0008 253 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0008 199 199 000 Old_age Offline - 0
200 Multi_Zone_Error_Rate 0x000a 253 252 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x000a 253 252 000 Old_age Always - 0
202 Unknown_Attribute 0x000a 253 252 000 Old_age Always - 0
203 Unknown_Attribute 0x000b 253 252 180 Pre-fail Always - 11
204 Unknown_Attribute 0x000a 253 252 000 Old_age Always - 0
205 Unknown_Attribute 0x000a 253 252 000 Old_age Always - 0
207 Unknown_Attribute 0x002a 253 252 000 Old_age Always - 0
208 Unknown_Attribute 0x002a 253 252 000 Old_age Always - 0
210 Unknown_Attribute 0x0032 253 252 000 Old_age Always - 0
211 Unknown_Attribute 0x0032 253 252 000 Old_age Always - 0
212 Unknown_Attribute 0x0032 253 252 000 Old_age Always - 0
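Not too bad after all, since Current_Pending_Sector = 0 and Offline_Uncorrectable = 0: no sectors are waiting to be remapped, and none were found unreadable during offline scans (Reallocated_Sector_Ct is 0 as well).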
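The three counters that matter here can be extracted from the `smartctl -A` table automatically. Below is a sketch that assumes the column layout shown above, with the attribute ID in the first column and RAW_VALUE in the last. The function name `smart_bad_sectors` is hypothetical.

```shell
# smart_bad_sectors  (hypothetical helper)
# Read `smartctl -A` output on stdin and print the raw values of
# Reallocated_Sector_Ct (5), Current_Pending_Sector (197) and
# Offline_Uncorrectable (198). Exit status is 1 if any of them is non-zero.
smart_bad_sectors() {
    awk '
        $1 == 5 || $1 == 197 || $1 == 198 {
            print $2 "=" $NF
            if ($NF + 0 != 0) bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

Usage: `/usr/sbin/smartctl -A /dev/sdb | smart_bad_sectors`. With the table above, all three values are 0, so the function exits with status 0.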
Now let's find which partition has the problem.
~ # fsck.ext3 -nv /dev/sdb1
e2fsck 1.38 (30-Jun-2005)
/dev/sdb1: clean, 3045/64256 files, 24511/64252 blocks
~ # fsck.ext3 -nv /dev/sdb2
e2fsck 1.38 (30-Jun-2005)
Warning! /dev/sdb2 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/sdb2 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found. Create? no
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (39452, counted=39449).
Fix? no
Free inodes count wrong (61113, counted=61112).
Fix? no
/dev/sdb2: ********** WARNING: Filesystem still has errors **********
2887 inodes used (4%)
13 non-contiguous inodes (0.5%)
# of inodes with ind/dind/tind blocks: 156/0/0
24548 blocks used (38%)
0 bad blocks
1 large file
2309 regular files
175 directories
47 character device files
40 block device files
0 fifos
8 links
308 symbolic links (308 fast symbolic links)
0 sockets
--------
2887 files
/dev/sdb3 is a swap partition, so we can skip that.
/dev/sdb4 is an extended partition, so we can skip that too.
~ # fsck.ext3 -nv /dev/sdb5
e2fsck 1.38 (30-Jun-2005)
Warning! /dev/sdb5 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/sdb5 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found. Create? no
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (118269, counted=118286).
Fix? no
Free inodes count wrong (126523, counted=126537).
Fix? no
/dev/sdb5: ********** WARNING: Filesystem still has errors **********
69 inodes used (0%)
4 non-contiguous inodes (5.8%)
# of inodes with ind/dind/tind blocks: 0/0/0
8235 blocks used (6%)
0 bad blocks
1 large file
32 regular files
14 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
--------
46 files
~ # fsck.ext3 -nv /dev/sdb6
e2fsck 1.38 (30-Jun-2005)
/dev/sdb6: clean, 93575/60899328 files, 56875141/121776688 blocks
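The read-only checks above can also be looped over every ext3 partition, relying on e2fsck's exit status instead of reading its output. Below is a sketch that assumes the documented e2fsck exit codes, which are bit flags: 4 means errors were left uncorrected and 8 means an operational error. The function name `check_ro` is hypothetical, and the partition list is hard-coded for this disk. `FSCK` can be overridden, for example with a stub for testing.

```shell
# check_ro DEV...  (hypothetical helper)
# Run a read-only (-n) check on each device and report the ones whose
# filesystem still has errors. Nothing on disk is modified.
check_ro() {
    for dev in "$@"; do
        ${FSCK:-fsck.ext3} -n "$dev" >/dev/null 2>&1
        rc=$?
        if [ $((rc & 4)) -ne 0 ]; then
            echo "$dev: errors found"
        elif [ $((rc & 8)) -ne 0 ]; then
            echo "$dev: check failed (exit $rc)"
        fi
    done
}

# Swap (sdb3) and the extended partition (sdb4) are left out:
# check_ro /dev/sdb1 /dev/sdb2 /dev/sdb5 /dev/sdb6
```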
So the errors are on /dev/sdb2 and /dev/sdb5.
Let's see where they are mounted.
~ # mount | grep /sdb
/dev/sdb1 on /mnt/__mxo_sdb1 type ext3 (rw)
~ # cat /proc/mounts | grep /sdb
/dev/sdb5 /tmp ext3 rw 0 0
~ # cat /proc/cmdline
console=ttyS0,115200 root=/dev/sdb2 rw
Are we booting from /dev/sdb2?
~ # mxoparam -h
Maxtor mxoparam version 1.0
-a show all maxtor params
-b get wait for button status
-c [0-1] set wait for button 0 = Off 1 = On
-d show max number of drives
-e enable watchdog in uboot
-f disable watchdog in uboot
-g set led solid green
-h show help
-k kick watchdog
-p get boot partition
-q [part] set boot partition
0 = drive 0 partition 1
1 = drive 0 partition 2
2 = drive 1 partition 1
3 = drive 1 partition 2
-r reset partion fail count
-s get serial number
-t [sn] set serial number
-v show version
-x disable watchdog now
-w enable watchdog now
-y set led solid yellow
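The help text defines boot partition numbers 0 to 3 as drive/partition pairs, so drive = n / 2 and partition = n % 2 + 1. Below is a sketch that translates between the numbers and device names. It assumes drive 0 is /dev/sda and drive 1 is /dev/sdb, which is consistent with `root=/dev/sdb2` in /proc/cmdline above. Both function names are hypothetical.

```shell
# boot_part_to_dev N  (hypothetical helper)
# Translate an mxoparam boot partition number (0-3) into a device name,
# assuming drive 0 = sda and drive 1 = sdb.
boot_part_to_dev() {
    case "$1" in
        [0-3]) ;;
        *) echo "invalid boot partition: $1" >&2; return 1 ;;
    esac
    drive=$(( $1 / 2 ))
    part=$(( $1 % 2 + 1 ))
    if [ "$drive" -eq 0 ]; then disk=sda; else disk=sdb; fi
    echo "/dev/$disk$part"
}

# dev_to_boot_part DEV  (hypothetical helper): the inverse, for mxoparam -q.
dev_to_boot_part() {
    case "$1" in
        /dev/sda1) echo 0 ;;
        /dev/sda2) echo 1 ;;
        /dev/sdb1) echo 2 ;;
        /dev/sdb2) echo 3 ;;
        *) echo "not a boot partition: $1" >&2; return 1 ;;
    esac
}
```

For example, `boot_part_to_dev 3` prints `/dev/sdb2`, and `dev_to_boot_part /dev/sda2` prints `1`.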
~ # mxoparam -p
Boot partition is 3
Looks like the system is booting from the second disk, second partition (/dev/sdb2).
Since /dev/sdb2 is the root filesystem, we can't unmount it, but it has to be unmounted before fsck can repair it.
Therefore, we need to make the system boot from /dev/sda2; otherwise we won't be able to fix /dev/sdb*.
First of all, let's make sure /dev/sda2 is exactly the same as /dev/sdb2.
~ # dd if=/dev/sdb2 of=/dev/sda2
~ # mount -n /dev/sda2 /mnt/__mxo_sda2 -t ext3
~ # cp -a /mnt/__mxo_sdb2 /mnt/__mxo_sda2
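One way to confirm that the two partitions hold the same bytes is `cmp`. Below is a sketch that assumes both partitions are the same size. If one is larger, `cmp` reports EOF on the shorter one. Because /dev/sdb2 is still mounted read-write as the root filesystem, it can change between the copy and the comparison, so a mismatch does not necessarily mean the copy failed. The function name `same_contents` is hypothetical.

```shell
# same_contents A B  (hypothetical helper)
# Succeed if the two files or block devices are byte-for-byte identical;
# otherwise print cmp's first-difference report and return 1.
same_contents() {
    if cmp -s "$1" "$2"; then
        echo "$1 and $2 are identical"
    else
        cmp "$1" "$2" 2>&1 | head -n 1
        return 1
    fi
}
```

Usage: `same_contents /dev/sdb2 /dev/sda2`.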
Now let's set the new boot partition to 1 (drive 0, partition 2, i.e. /dev/sda2).
~ # mxoparam -q 1
REBOOT!
...
Let's check that it's booting from the right place now.
~ # cat /proc/cmdline
console=ttyS0,115200 root=/dev/sda2 rw
Right! We're ready to fix /dev/sdb2 and /dev/sdb5 now!
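Because e2fsck must not repair a mounted filesystem, a small guard can refuse to proceed when the target is either the `root=` device on the kernel command line or listed in /proc/mounts. Below is a sketch of such a guard. The function name `safe_to_fsck` is hypothetical, and the file paths are parameters only so the check can be run against copies.

```shell
# safe_to_fsck DEV [CMDLINE_FILE] [MOUNTS_FILE]  (hypothetical helper)
# Exit 0 only if DEV is neither the root= device nor listed as mounted.
safe_to_fsck() {
    dev=$1
    cmdline=${2:-/proc/cmdline}
    mounts=${3:-/proc/mounts}
    # Extract the value of root= (e.g. /dev/sda2) from the kernel command line.
    root=$(tr ' ' '\n' < "$cmdline" | sed -n 's/^root=//p')
    if [ "$dev" = "$root" ]; then
        echo "$dev is the root filesystem" >&2
        return 1
    fi
    # The first field of each mounts line is the mounted device.
    if awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"; then
        echo "$dev is mounted" >&2
        return 1
    fi
    return 0
}
```

Usage: `safe_to_fsck /dev/sdb2 && fsck -v /dev/sdb2`.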
~ # fsck -v /dev/sdb2
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
/dev/sdb2 has gone 384 days without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found. Create? yes
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sdb2: ***** FILE SYSTEM WAS MODIFIED *****
3157 inodes used (4%)
14 non-contiguous inodes (0.4%)
# of inodes with ind/dind/tind blocks: 159/0/0
24622 blocks used (38%)
0 bad blocks
1 large file
2317 regular files
179 directories
246 character device files
84 block device files
0 fifos
7 links
321 symbolic links (321 fast symbolic links)
0 sockets
--------
3154 files
~ # fsck -v /dev/sdb5
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
/dev/sdb5: recovering journal
/dev/sdb5 has been mounted 35 times without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found. Create? yes
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sdb5: ***** FILE SYSTEM WAS MODIFIED *****
2939 inodes used (2%)
28 non-contiguous inodes (1.0%)
# of inodes with ind/dind/tind blocks: 159/0/0
26690 blocks used (21%)
0 bad blocks
1 large file
2345 regular files
189 directories
47 character device files
40 block device files
0 fifos
8 links
308 symbolic links (308 fast symbolic links)
0 sockets
--------
2937 files
Now we can rebuild the RAID array.
~ # mdadm --manage --add /dev/md0 /dev/sdb6
mdadm: hot added /dev/sdb6
~ # mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.01
Creation Time : Sat May 5 06:30:50 2007
Raid Level : raid1
Array Size : 487106752 (464.54 GiB 498.80 GB)
Device Size : 487106752 (464.54 GiB 498.80 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Sat May 23 14:39:50 2009
State : clean, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Rebuild Status : 0% complete
UUID : 7ff7415e:4719112d:d63dd33d:40ff685f
Events : 0.541445
Number Major Minor RaidDevice State
0 0 0 - removed
1 8 6 1 active sync /dev/sda6
2 8 22 0 spare rebuilding /dev/sdb6
~ # cat /proc/mdstat
Personalities : [linear] [raid1]
md0 : active raid1 sdb6[2] sda6[1]
487106752 blocks [2/1] [_U]
[>....................] recovery = 1.5% (7396736/487106752) finish=145.1min speed=55092K/sec
unused devices: <none>
Good...
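The finish estimate in /proc/mdstat is the number of remaining blocks divided by the current speed: (487106752 - 7396736) KB / 55092 KB/s ≈ 8707 s ≈ 145.1 min, which matches `finish=145.1min`. Below is a sketch that recomputes the estimate and polls until the rebuild ends. It assumes the `recovery = N% (done/total) ... speed=NK/sec` format shown above. The function names and the 60-second interval are arbitrary, hypothetical choices.

```shell
# rebuild_eta [MDSTAT_FILE]  (hypothetical helper)
# Print the minutes remaining for a running recovery, computed from the
# "(done/total)" block counts and the speed=...K/sec field.
rebuild_eta() {
    sed -n '/recovery/s/.*(\([0-9]*\)\/\([0-9]*\)).*speed=\([0-9]*\)K\/sec.*/\1 \2 \3/p' \
        "${1:-/proc/mdstat}" |
    awk '{ printf "%.1f\n", ($2 - $1) / $3 / 60 }'
}

# wait_for_rebuild  (hypothetical helper): poll until no recovery line remains.
wait_for_rebuild() {
    while grep -q 'recovery' /proc/mdstat; do
        echo "about $(rebuild_eta) min left"
        sleep 60
    done
    echo "rebuild finished"
}
```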
2.5 hours later...
~ # cat /proc/mdstat
Personalities : [linear] [raid1]
md0 : active raid1 sdb6[0] sda6[1]
487106752 blocks [2/2] [UU]
unused devices: <none>
Reboot again and we're done!