• Alexey Asemov's avatar
    libata: EH should handle AMNF error condition as a media error · eec7e1c1
    Alexey Asemov authored
    libata-eh.c should handle AMNF error condition (error byte bit 0,
    usually code 0x01) in libata-eh.c along with UNC as a media error so
    SCSI stack can handle it properly (translation code 0x01 is already
    present in libata-scsi.c) but was never passed down due to lack of
    handling in EH.
    
    While using linux-based machine (AMD 6550M-based notebook, PCI IDs for the
    controller are 1022:7801 subsys 1025:059d) and ddrescue to salvage data
    from failing hard drive (WD7500BPVT 2.5" 750G SATA2), I've found that pure
    AMNF 0x01 error code generates generic "device error" that is retried
    several times by SCSI stack instead of "media error" that is passed up to
    software.
    
    So we may assume deprecated AMNF error code is surely not dead yet, and
    it's better for it to be handled properly. As we may see it is used by
    modern enough devices, and used properly: drive returned AMNF only when IDs
    for track cannot be read completely due to dying head or positioning,
    otherwise it returned UNC(orrectables).
    
    Not handling it causes wrong generic error code ("device error") reporting
    down the stack, can damage failing drives further because of excessive
    retries, and slows salvaging down a lot. Also, there is handling code in
    libata-scsi.c for 0x01 AMNF error already.
    
    https://bugzilla.kernel.org/show_bug.cgi?id=80031
    
    tj: Shortened $SUBJ and moved its content to the first paragraph.
    Signed-off-by: default avatarAlexey Asemov <alex@alex-at.ru>
    Signed-off-by: default avatarTejun Heo <tj@kernel.org>
    eec7e1c1
libata-eh.c 108 KB