• Martin Peschke's avatar
    zfcp: auto port scan resiliency · 18f87a67
    Martin Peschke authored
    This patch improves the Fibre Channel port scan behaviour of the zfcp lldd.
    Without it the zfcp device driver may churn up the storage area network by
    excessive scanning and scan bursts, particularly in big virtual server
    environments, potentially resulting in interference of virtual servers and
    reduced availability of storage connectivity.
    
    The two main issues as to the zfcp device drivers automatic port scan in
    virtual server environments are frequency and simultaneity.
    On the one hand, there is no point in allowing lots of ports scans
    in a row. It makes sense, though, to make sure that a scan is conducted
    eventually if there has been any indication for potential SAN changes.
    On the other hand, lots of virtual servers receiving the same indication
    for a SAN change had better not attempt to conduct a scan instantly,
    that is, at the same time.
    
    Hence this patch has a two-fold approach for better port scanning:
    the introduction of a rate limit to amend frequency issues, and the
    introduction of a short random backoff to amend simultaneity issues.
    Both approaches boil down to deferred port scans, with delays
    comprising parts for both approaches.
    
    The new port scan behaviour is summarised best by:
    
                                                   NEW:    NEW:
                              no_auto_port_rescan  random  rate    flush
                                                   backoff limit   =wait
    
    adapter resume/thaw       yes                  yes     no      yes*
    adapter online (user)     no                   yes     no      yes*
    port rescan (user)        no                   no      no      yes
    adapter recovery (user)   yes                  yes     yes     no
    adapter recovery (other)  yes                  yes     yes     no
    incoming ELS              yes                  yes     yes     no
    incoming ELS lost         yes                  yes     yes     no
    
    Implementation is straight-forward by converting an existing worker to
    a delayed worker. But care is needed whenever that worker is going to be
    flushed (in order to make sure work has been completed), since a flush
    operation cancels the timer set up for deferred execution (see * above).
    
    There is a small race window whenever a port scan work starts
    running up to the point in time of storing the time stamp for that port
    scan. The impact is negligible. Closing that gap isn't trivial, though, and
    would the destroy the beauty of a simple work-to-delayed-work conversion.
    Signed-off-by: default avatarMartin Peschke <mpeschke@linux.vnet.ibm.com>
    Signed-off-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
    Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
    Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
    18f87a67
zfcp_fc.c 27.9 KB