linux - How to determine which partition has badblocks?

Wednesday, June 13, 2018

I have the following device:

Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD15EARS-00MVWB0
Serial Number:    WD-WCAZA3607921
LU WWN Device Id: 5 0014ee 2b01eac3e
Firmware Version: 51.0AB51
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Thu Nov 21 00:08:20 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

and recently I got an error while reading the surface of this disk. This is the error:

Complete error log:
SMART Error Log Version: 1
ATA Error Count: 25 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 25 occurred at disk power-on lifetime: 18798 hours (783 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 00 40 37 e6  Error: UNC 8 sectors at LBA = 0x06374000 = 104284160
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 00 40 37 e6 08      08:54:35.771  READ DMA
  ec 00 00 00 00 00 a0 08      08:54:35.763  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08      08:54:35.763  SET FEATURES [Set transfer mode]

This is the 25th error; the previous ones are identical.

Here's the SMART attribute report:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0027   253   189   021    Pre-fail  Always       -       2066
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1118
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   075   075   000    Old_age   Always       -       18833
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1101
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       277
193 Load_Cycle_Count        0x0032   085   085   000    Old_age   Always       -       346753
194 Temperature_Celsius     0x0022   122   109   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       2
199 UDMA_CRC_Error_Count    0x0032   200   196   000    Old_age   Always       -       11
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

So the sector hasn't been reallocated yet (Reallocated_Sector_Ct is 0 and Current_Pending_Sector is 1), but I think it will be.

I have 7 partitions on that drive, and the problem is that I don't know where the affected sectors are: which partition they belong to, and how far (in MiB, KiB, etc.) they are from the beginning of the disk. Is there a way to figure that out?

Answer

I found out how to do it. The following line in the SMART error log gives the LBA:

40 51 08 00 40 37 e6  Error: UNC 8 sectors at LBA = 0x06374000 = 104284160

So the LBA is 104284160. Knowing that, we can tell which partition is involved by comparing it with the start and end sectors that fdisk reports:

root:~# fdisk -lu /dev/sda
Device Boot      Start         End      Blocks   Id  System
...
/dev/sda3        99610624  1466798079   683593728   83  Linux
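The lookup above can be sketched in a few lines of Python. This is only an illustration: the helper names `parse_lba` and `find_partition` are made up for this example, and the partition table contains just the `/dev/sda3` row, because the other rows are elided in the fdisk output.

```python
import re

# The error line copied from the SMART error log.
ERROR_LINE = "40 51 08 00 40 37 e6  Error: UNC 8 sectors at LBA = 0x06374000 = 104284160"

# Only /dev/sda3 is shown in the fdisk output; the elided rows are not
# reconstructed here. Each entry is (device, start sector, end sector).
PARTITIONS = [
    ("/dev/sda3", 99610624, 1466798079),
]


def parse_lba(line):
    """Extract the LBA from an 'Error: ... at LBA = 0x... = N' line."""
    match = re.search(r"LBA = 0x([0-9a-fA-F]+) = (\d+)", line)
    if match is None:
        raise ValueError("no LBA found in line")
    lba = int(match.group(2))
    # Sanity check: the hex and decimal renderings must agree.
    if int(match.group(1), 16) != lba:
        raise ValueError("hex and decimal LBA values differ")
    return lba


def find_partition(lba, partitions):
    """Return the device whose [start, end] sector range contains lba."""
    for device, start, end in partitions:
        if start <= lba <= end:
            return device
    return None


lba = parse_lba(ERROR_LINE)
print(lba, find_partition(lba, PARTITIONS))
```

With the full `fdisk -lu` table filled in, the same function would pick the right partition for any LBA, or return `None` if the sector lies in unpartitioned space.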

To determine how many sectors into the third partition that is, subtract the partition's starting sector from the LBA:

104284160 - 99610624 = 4673536
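Since the question also asks how far the sector is from the beginning of the disk, the same numbers can be converted to MiB. A small sketch, assuming the 512-byte sector size that smartctl reports for this drive (`sector_to_mib` is a name invented for this example):

```python
SECTOR_SIZE = 512  # bytes, from "Sector Size: 512 bytes logical/physical"


def sector_to_mib(sector, sector_size=SECTOR_SIZE):
    """Convert a sector count (or sector offset) to MiB."""
    return sector * sector_size / (1024 * 1024)


lba = 104284160             # bad sector, counted from the start of the disk
partition_start = 99610624  # first sector of /dev/sda3

offset_in_partition = lba - partition_start
print(offset_in_partition)                 # 4673536 sectors into /dev/sda3
print(sector_to_mib(lba))                  # 50920.0 MiB from the start of the disk
print(sector_to_mib(offset_in_partition))  # 2282.0 MiB from the start of the partition
```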

We also need to know the file system block size:

# tune2fs -l /dev/mapper/crypt_data  | grep Block
Block count:              170897920
Block size:               4096
Blocks per group:         32768

Now we can determine which file system block contains this LBA using the following formula:

   b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk -lu
and (int) denotes the integer part.

In my case that would be:

b = (int)((104284160-99610624)*512/4096)
b = 584192
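The formula translates directly into code. The sketch below assumes the file system starts at the first sector of the partition; if an encryption or LVM layer sits in between, its data offset would have to be subtracted as well. The function name `fs_block` is made up for this example.

```python
def fs_block(lba, partition_start, block_size, sector_size=512):
    """Return the file system block containing the given disk LBA.

    Implements b = (int)((L - S) * 512 / B), using integer division.
    """
    if lba < partition_start:
        raise ValueError("LBA lies before the start of the partition")
    return (lba - partition_start) * sector_size // block_size


L = 104284160  # LBA of the bad sector
S = 99610624   # starting sector of /dev/sda3
B = 4096       # file system block size from tune2fs

print(fs_block(L, S, B))      # 584192

# The log reports "UNC 8 sectors", so check the last of them too:
# all eight sectors fall into the same 4096-byte block.
print(fs_block(L + 7, S, B))  # 584192
```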

Now we have to check whether a file occupies that block:

# debugfs
debugfs 1.42.8 (20-Jun-2013)
debugfs:  open /dev/mapper/crypt_data
debugfs:  testb 584192
Block 584192 marked in use
debugfs:  icheck 584192
Block   Inode number
584192  37486656
debugfs:  ncheck 37486656
Inode   Pathname
37486656    /some/file
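The interactive session above can also be run non-interactively, since debugfs accepts a single request through its `-R` option. The following is a rough sketch, not a tested tool: `debugfs_command` and `run_debugfs` are hypothetical helpers, and the block and inode numbers are the ones from this case.

```python
import shlex
import subprocess


def debugfs_command(device, request):
    """Build the argument list for one non-interactive debugfs request."""
    return ["debugfs", "-R", request, device]


def run_debugfs(device, request):
    """Run the request and return debugfs's standard output (needs root)."""
    result = subprocess.run(
        debugfs_command(device, request),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


device = "/dev/mapper/crypt_data"
# Same three requests as in the session: is the block in use, which inode
# owns it, and which path belongs to that inode.
for request in ("testb 584192", "icheck 584192", "ncheck 37486656"):
    print(shlex.join(debugfs_command(device, request)))
```

The loop only prints the commands; calling `run_debugfs(device, request)` instead would execute them against the real device.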

And that's basically it. Now I have to manually reallocate the sector. More information on how to do that can be found here.
