    drivers/edac: pci: broken parity regression · 6b09ff9d
    Bryan Boatright authored
    Using the EDAC code in kernel.org kernel version 2.6.23.8 I am seeing the
    following problem:
    
        In the kernel there is a pci device attribute located in sysfs that is
        checked by the EDAC PCI scanning code. If that attribute is set,
        PCI parity/error scanning is skipped for that device. The attribute
        is:
    
                broken_parity_status
    
        and is located in the /sys/devices/pci<XXX>/0000:XX:YY.Z directories for
        PCI devices.
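
To confirm what the attribute currently holds for a given device, a small userspace helper can read it back. This is only a minimal sketch: the function name `read_broken_parity_status` is hypothetical, and it assumes the attribute file contains a single decimal integer (0 or 1). The caller supplies the full path to the attribute.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the kernel or EDAC):
 * read a broken_parity_status sysfs attribute.
 *
 * Assumption: the file holds one decimal integer.
 * Returns 1 if the flag is set, 0 if it is clear, -1 on any error.
 */
int read_broken_parity_status(const char *attr_path)
{
    FILE *f = fopen(attr_path, "r");
    int value;

    if (!f)
        return -1;              /* attribute missing or not readable */

    if (fscanf(f, "%d", &value) != 1) {
        fclose(f);
        return -1;              /* content was not an integer */
    }

    fclose(f);
    return value != 0;          /* normalize any nonzero value to 1 */
}
```

For example, calling it with /sys/bus/pci/devices/0000:05:01.0/broken_parity_status would report whether the flag is set for the card discussed in this message.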
    
    I don't think this check was actually implemented.  I have a misbehaving card
    that reports a parity error every 1000 ms:
    
    Nov 25 07:28:43 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0
    Nov 25 07:28:44 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0
    Nov 25 07:28:45 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0
    
    Setting that card's broken_parity_status bit did not mask the error:
    
    echo "1" > /sys/bus/pci/devices/0000:05:01.0/broken_parity_status
    
    I looked through the EDAC code and did not readily see any reference to
    broken_parity_status at all (which makes sense based on the behavior I am
    seeing).  I applied the following patch as a proof-of-concept and now EDAC's
    PCI parity error reporting behaves as documented:
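
The documented behavior can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the patch itself and not the EDAC source: `struct model_pci_dev`, the `MODEL_STATUS_*` names, and `model_count_reportable_parity_errors` are all hypothetical. The two bit values follow the PCI status register layout (bit 8, Master Data Parity Error; bit 15, Detected Parity Error).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; values mirror the PCI status register bits. */
#define MODEL_STATUS_MASTER_PARITY   0x0100  /* Master Data Parity Error */
#define MODEL_STATUS_DETECTED_PARITY 0x8000  /* Detected Parity Error */

/* Hypothetical stand-in for the per-device state a scanner would see. */
struct model_pci_dev {
    const char *name;            /* e.g. "0000:05:01.0" */
    uint16_t status;             /* latched PCI status register value */
    bool broken_parity_status;   /* mirrors the sysfs attribute */
};

/*
 * Count the parity errors that should be reported for one device.
 * A device flagged as having broken parity reporting contributes
 * nothing, no matter what its status register says.
 */
int model_count_reportable_parity_errors(const struct model_pci_dev *dev)
{
    int count = 0;

    if (dev->broken_parity_status)
        return 0;                /* skip parity reporting for this device */

    if (dev->status & MODEL_STATUS_MASTER_PARITY)
        count++;
    if (dev->status & MODEL_STATUS_DETECTED_PARITY)
        count++;

    return count;
}
```

In this model, a device whose status has the master parity bit set yields one reportable error while the flag is clear and none once the flag is set, which is the behavior the attribute is documented to provide.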
    
    bryan
    
    Good regression find, bryan. It used to work. sigh.
    I added more logic to your patch, for more coverage of the error.
    
    Doug T
    Signed-off-by: Bryan Boatright <b1@omega71.com>
    Signed-off-by: Doug Thompson <dougthompson@xmisson.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
edac_pci_sysfs.c 19.4 KB