1. 07 Oct, 2012 1 commit
    • netlink: add reference of module in netlink_dump_start · 6dc878a8
      Gao feng authored
      I get a panic when I use ss -a and rmmod inet_diag at the
      same time.
      
      It's because netlink_dump uses inet_diag_dump which belongs to module
      inet_diag.
      
      I searched the code and found that many modules have the same problem.
      We need to take a reference on the module that cb->dump belongs to.
      
      Thanks for all help from Stephen,Jan,Eric,Steffen and Pablo.
      
      Change From v3:
      change netlink_dump_start to inline,suggestion from Pablo and
      Eric.
      
      Change From v2:
      delete netlink_dump_done,and call module_put in netlink_dump
      and netlink_sock_destruct.
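      The idea of pinning the owning module for the lifetime of a dump can be
      modelled in userspace. The sketch below is not the kernel code: the
      Module and Dump classes and their method names are hypothetical, and
      they only loosely mirror what try_module_get() and module_put() do.

```python
class Module:
    """Toy stand-in for a loadable kernel module with a reference count."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.loaded = True

    def try_get(self):
        # Loosely mirrors try_module_get(): fails once the module is gone.
        if not self.loaded:
            return False
        self.refcount += 1
        return True

    def put(self):
        # Loosely mirrors module_put().
        assert self.refcount > 0
        self.refcount -= 1

    def unload(self):
        # Unloading is refused while anyone still holds a reference.
        if self.refcount > 0:
            return False
        self.loaded = False
        return True


class Dump:
    """A multi-part dump whose callback lives in `module` (hypothetical)."""

    def __init__(self, module, chunks):
        self.module = module
        self.chunks = list(chunks)
        self.started = False

    def start(self):
        # Pin the owning module before the first callback can run.
        if not self.module.try_get():
            raise RuntimeError("module is being unloaded")
        self.started = True

    def next_chunk(self):
        # Return the next chunk; on exhaustion, drop the reference.
        if self.chunks:
            return self.chunks.pop(0)
        self.finish()
        return None

    def finish(self):
        # Safe to call more than once: the reference is dropped only once.
        if self.started:
            self.started = False
            self.module.put()
```

      While a dump is in flight, unload() returns False; once the dump has
      finished (or the socket is torn down and finish() is called), the
      module can go away.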
      Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 06 Oct, 2012 3 commits
    • Merge branch 'uapi-prep' of git://git.infradead.org/users/dhowells/linux-headers · ed5062dd
      Linus Torvalds authored
      Pull UAPI disintegration fixes from David Howells:
       "There are three main parts:
      
       (1) I found I needed some more fixups in the wake of testing Arm64
           (some asm/unistd.h files had weird guards that caused problems -
           mostly in arches for which I don't have a compiler) and some
           __KERNEL__ splitting needed to take place in Arm64.
      
       (2) I found that c6x was missing some __KERNEL__ guards in its
           asm/signal.h.  Mark Salter pointed me at a tree with a patch to
           remove that file entirely and use the asm-generic variant instead.
      
       (3) Lastly, m68k turned out to have a header installation problem due
           to it lacking a kvm_para.h file.
      
           The conditional installation bits for linux/kvm_para.h, linux/kvm.h
           and linux/a.out.h weren't very well specified - and didn't work if
           an arch didn't have the asm/ version of that file, but there *was*
           an asm-generic/ version.
      
           It seems the "ifneq $((wildcard ...),)" for each of those three
           headers in include/kernel/Kbuild is invoked twice during header
           installation, and the second time it matches on the just installed
           asm-generic/kvm_para.h file and thus incorrectly installs
           linux/kvm_para.h as well.
      
           Most arches actually have an asm/kvm_para.h, so this wasn't
           detectable in those."
      
      * 'uapi-prep' of git://git.infradead.org/users/dhowells/linux-headers:
        UAPI: Fix conditional header installation handling (notably kvm_para.h on m68k)
        c6x: remove c6x signal.h
        UAPI: Split compound conditionals containing __KERNEL__ in Arm64
        UAPI: Fix the guards on various asm/unistd.h files
        c6x: make dsk6455 the default config
    • Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux · 125b79d7
      Linus Torvalds authored
      Pull SLAB changes from Pekka Enberg:
       "New and noteworthy:
      
        * More SLAB allocator unification patches from Christoph Lameter and
          others.  This paves the way for slab memcg patches that hopefully
          will land in v3.8.
      
        * SLAB tracing improvements from Ezequiel Garcia.
      
        * Kernel tainting upon SLAB corruption from Dave Jones.
      
        * Miscellaneous SLAB allocator bug fixes and improvements from various
          people."
      
      * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (43 commits)
        slab: Fix build failure in __kmem_cache_create()
        slub: init_kmem_cache_cpus() and put_cpu_partial() can be static
        mm/slab: Fix kmem_cache_alloc_node_trace() declaration
        Revert "mm/slab: Fix kmem_cache_alloc_node_trace() declaration"
        mm, slob: fix build breakage in __kmalloc_node_track_caller
        mm/slab: Fix kmem_cache_alloc_node_trace() declaration
        mm/slab: Fix typo _RET_IP -> _RET_IP_
        mm, slub: Rename slab_alloc() -> slab_alloc_node() to match SLAB
        mm, slab: Rename __cache_alloc() -> slab_alloc()
        mm, slab: Match SLAB and SLUB kmem_cache_alloc_xxx_trace() prototype
        mm, slab: Replace 'caller' type, void* -> unsigned long
        mm, slob: Add support for kmalloc_track_caller()
        mm, slab: Remove silly function slab_buffer_size()
        mm, slob: Use NUMA_NO_NODE instead of -1
        mm, sl[au]b: Taint kernel when we detect a corrupted slab
        slab: Only define slab_error for DEBUG
        slab: fix the DEADLOCK issue on l3 alien lock
        slub: Zero initial memory segment for kmem_cache and kmem_cache_node
        Revert "mm/sl[aou]b: Move sysfs_slab_add to common"
        mm/sl[aou]b: Move kmem_cache refcounting to common code
        ...
    • Merge tag 'stable/for-linus-3.7-arm-tag' of... · f1c6872e
      Linus Torvalds authored
      Merge tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
      
      Pull ARM Xen support from Konrad Rzeszutek Wilk:
      
        Features:
         * Allow a Linux guest to boot as initial domain and as normal guests
           on Xen on ARM (specifically ARMv7 with virtualized extensions).  PV
           console, block and network frontend/backends are working.
        Bug-fixes:
         * Fix compile linux-next fallout.
         * Fix PVHVM bootup crashing.
      
        The Xen-unstable hypervisor (which will become 4.3 in ~6 months) supports
        ARMv7 platforms.
      
        The goal in implementing this architecture is to exploit the hardware
        as much as possible.  That means use as little as possible of PV
        operations (so no PV MMU) - and use existing PV drivers for I/Os
        (network, block, console, etc).  This is similar to how PVHVM guests
        operate in X86 platform nowadays - except that on ARM there is no need
        for QEMU.  The end result is that we share a lot of the generic Xen
        drivers and infrastructure.
      
        Details on how to compile/boot/etc are available at this Wiki:
      
          http://wiki.xen.org/wiki/Xen_ARMv7_with_Virtualization_Extensions
      
        and this blog has links to a technical discussion/presentations on the
        overall architecture:
      
          http://blog.xen.org/index.php/2012/09/21/xensummit-sessions-new-pvh-virtualisation-mode-for-arm-cortex-a15arm-servers-and-x86/
      
      * tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (21 commits)
        xen/xen_initial_domain: check that xen_start_info is initialized
        xen: mark xen_init_IRQ __init
        xen/Makefile: fix dom-y build
        arm: introduce a DTS for Xen unprivileged virtual machines
        MAINTAINERS: add myself as Xen ARM maintainer
        xen/arm: compile netback
        xen/arm: compile blkfront and blkback
        xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree
        xen/arm: receive Xen events on ARM
        xen/arm: initialize grant_table on ARM
        xen/arm: get privilege status
        xen/arm: introduce CONFIG_XEN on ARM
        xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM
        xen/arm: Introduce xen_ulong_t for unsigned long
        xen/arm: Xen detection and shared_info page mapping
        docs: Xen ARM DT bindings
        xen/arm: empty implementation of grant_table arch specific functions
        xen/arm: sync_bitops
        xen/arm: page.h definitions
        xen/arm: hypercalls
        ...
  3. 05 Oct, 2012 36 commits
    • Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · 5f3d2f2e
      Linus Torvalds authored
      Pull powerpc updates from Benjamin Herrenschmidt:
       "Some highlights in addition to the usual batch of fixes:
      
         - 64TB address space support for 64-bit processes by Aneesh Kumar
      
         - Gavin Shan did a major cleanup & re-organization of our EEH support
           code (IBM fancy PCI error handling & recovery infrastructure) which
           paves the way for supporting different platform backends, along
           with some rework of the PCIe code for the PowerNV platform in order
           to remove home made resource allocations and instead use the
           generic code (which is possible after some small improvements to it
           done by Gavin).
      
         - Uprobes support by Ananth N Mavinakayanahalli
      
         - A pile of embedded updates from Freescale folks, including new SoC
           and board supports, more KVM stuff including preparing for 64-bit
           BookE KVM support, ePAPR 1.1 updates, etc..."
      
      Fixup trivial conflicts in drivers/scsi/ipr.c
      
      * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (146 commits)
        powerpc/iommu: Fix multiple issues with IOMMU pools code
        powerpc: Fix VMX fix for memcpy case
        driver/mtd:IFC NAND:Initialise internal SRAM before any write
        powerpc/fsl-pci: use 'Header Type' to identify PCIE mode
        powerpc/eeh: Don't release eeh_mutex in eeh_phb_pe_get
        powerpc: Remove tlb batching hack for nighthawk
        powerpc: Set paca->data_offset = 0 for boot cpu
        powerpc/perf: Sample only if SIAR-Valid bit is set in P7+
        powerpc/fsl-pci: fix warning when CONFIG_SWIOTLB is disabled
        powerpc/mpc85xx: Update interrupt handling for IFC controller
        powerpc/85xx: Enable USB support in p1023rds_defconfig
        powerpc/smp: Do not disable IPI interrupts during suspend
        powerpc/eeh: Fix crash on converting OF node to edev
        powerpc/eeh: Lock module while handling EEH event
        powerpc/kprobe: Don't emulate store when kprobe stwu r1
        powerpc/kprobe: Complete kprobe and migrate exception frame
        powerpc/kprobe: Introduce a new thread flag
        powerpc: Remove unused __get_user64() and __put_user64()
        powerpc/eeh: Global mutex to protect PE tree
        powerpc/eeh: Remove EEH PE for normal PCI hotplug
        ...
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 283dbd82
      Linus Torvalds authored
      Pull networking changes from David Miller:
       "The most important bit in here is the fix for input route caching from
        Eric Dumazet, it's a shame we couldn't fully analyze this in time for
        3.6 as it's a 3.6 regression introduced by the routing cache removal.
      
        Anyways, will send quickly to -stable after you pull this in.
      
        Other changes of note:
      
         1) Fix lockdep splats in team and bonding, from Eric Dumazet.
      
         2) IPV6 adds link local route even when there is no link local
            address, from Nicolas Dichtel.
      
         3) Fix ixgbe PTP implementation, from Jacob Keller.
      
         4) Fix excessive stack usage in cxgb4 driver, from Vipul Pandya.
      
         5) MAC length computed improperly in VLAN demux, from Antonio
            Quartulli."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
        ipv6: release reference of ip6_null_entry's dst entry in __ip6_del_rt
        Remove noisy printks from llcp_sock_connect
        tipc: prevent dropped connections due to rcvbuf overflow
        silence some noisy printks in irda
        team: set qdisc_tx_busylock to avoid LOCKDEP splat
        bonding: set qdisc_tx_busylock to avoid LOCKDEP splat
        sctp: check src addr when processing SACK to update transport state
        sctp: fix a typo in prototype of __sctp_rcv_lookup()
        ipv4: add a fib_type to fib_info
        can: mpc5xxx_can: fix section type conflict
        can: peak_pcmcia: fix error return code
        can: peak_pci: fix error return code
        cxgb4: Fix build error due to missing linux/vmalloc.h include.
        bnx2x: fix ring size for 10G functions
        cxgb4: Dynamically allocate memory in t4_memory_rw() and get_vpd_params()
        ixgbe: add support for X540-AT1
        ixgbe: fix poll loop for FDIRCTRL.INIT_DONE bit
        ixgbe: fix PTP ethtool timestamping function
        ixgbe: (PTP) Fix PPS interrupt code
        ixgbe: Fix PTP X540 SDP alignment code for PPS signal
        ...
    • Merge branch 'akpm' (Andrew's patch-bomb) · 11126c61
      Linus Torvalds authored
      Merge misc patches from Andrew Morton:
       "The MM tree is rather stuck while I wait to find out what the heck is
        happening with sched/numa.  Probably I'll need to route around all the
        code which was added to -next, sigh.
      
        So this is "everything else", or at least most of it - other small
        bits are still awaiting resolutions of various kinds."
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (180 commits)
        lib/decompress.c add __init to decompress_method and data
        kernel/resource.c: fix stack overflow in __reserve_region_with_split()
        omfs: convert to use beXX_add_cpu()
        taskstats: cgroupstats_user_cmd() may leak on error
        aoe: update aoe-internal version number to 50
        aoe: update documentation to better reflect aoe-plus-udev usage
        aoe: remove unused code
        aoe: make dynamic block minor numbers the default
        aoe: update and specify AoE address guards and error messages
        aoe: retain static block device numbers for backwards compatibility
        aoe: support more AoE addresses with dynamic block device minor numbers
        aoe: update documentation with new URL and VM settings reference
        aoe: update copyright year in touched files
        aoe: update internal version number to 49
        aoe: remove unused code and add cosmetic improvements
        aoe: increase net_device reference count while using it
        aoe: associate frames with the AoE storage target
        aoe: disallow unsupported AoE minor addresses
        aoe: do revalidation steps in order
        aoe: failover remote interface based on aoe_deadsecs parameter
        ...
    • lib/decompress.c add __init to decompress_method and data · 33e2a422
      Hein Tibosch authored
      Fix the warning:
      
        WARNING: vmlinux.o(.text+0x14cfd8): Section mismatch in reference from the variable compressed_formats to the function .init.text:gunzip()
        The function compressed_formats() references
        the function __init gunzip().
        etc..
      
      Within decompress.c, compressed_formats[] needs a '__initdata' annotation,
      because some of its data members refer to functions which will be
      unloaded after init.
      
      Consequently, its user decompress_method() will get the __init prefix.
      Signed-off-by: Hein Tibosch <hein_tibosch@yahoo.es>
      Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
      Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/resource.c: fix stack overflow in __reserve_region_with_split() · 4965f566
      T Makphaibulchoke authored
      Using recursive calls to add a non-conflicting region in
      __reserve_region_with_split() could result in a stack overflow in the case
      that the recursive calls are too deep.  Convert the recursive calls to an
      iterative loop to avoid the problem.
      
      Tested on a machine containing 135 regions.  The kernel no longer panicked
      with stack overflow.
      
      Also tested with code arbitrarily adding regions with no conflict,
      embedding two consecutive conflicts and embedding two non-consecutive
      conflicts.
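      The recursion-to-loop conversion described above can be illustrated with
      a small userspace model. This is only a sketch under assumptions, not
      the kernel's implementation: reserve_with_split() and its list-of-tuples
      representation are hypothetical, and the busy ranges are assumed not to
      overlap each other.

```python
def reserve_with_split(existing, start, end):
    """Reserve [start, end] around `existing` busy ranges, without recursion.

    `existing` is a list of inclusive, non-overlapping (start, end) tuples.
    Returns the newly reserved inclusive ranges in ascending order.
    """
    reserved = []
    busy = sorted(existing)
    while start <= end:
        # Lowest busy range that overlaps what is left of the request.
        conflict = next(
            ((s, e) for s, e in busy if s <= end and e >= start), None
        )
        if conflict is None:
            # Nothing in the way: take the whole remainder and stop.
            reserved.append((start, end))
            break
        c_start, c_end = conflict
        if c_start > start:
            # The gap below the conflict is free: take it.
            reserved.append((start, c_start - 1))
        # Continue above the conflict instead of making a recursive call.
        start = c_end + 1
    return reserved
```

      Because each conflict only advances `start`, the stack depth stays
      constant no matter how many conflicting regions the request spans.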
      Signed-off-by: T Makphaibulchoke <tmac@hp.com>
      Reviewed-by: Ram Pai <linuxram@us.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@gmail.com>
      Cc: Wei Yang <weiyang@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • omfs: convert to use beXX_add_cpu() · c99b6841
      Wei Yongjun authored
      Convert cpu_to_beXX(beXX_to_cpu(E1) + E2) to use beXX_add_cpu().
      
      dpatch engine is used to auto generate this patch.
      (https://github.com/weiyj/dpatch)
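      What the conversion expresses can be sketched outside the kernel: read
      a big-endian field, add a CPU-order value, and store the result back in
      big-endian form. The be32_add_cpu() below is a hypothetical Python
      model operating on a byte buffer, not the kernel helper itself.

```python
import struct


def be32_add_cpu(buf, offset, val):
    """Add `val` to the big-endian 32-bit field at buf[offset:offset + 4].

    Equivalent in spirit to cpu_to_be32(be32_to_cpu(field) + val); the
    result wraps modulo 2**32 like an unsigned 32-bit integer.
    """
    (cur,) = struct.unpack_from(">I", buf, offset)
    struct.pack_into(">I", buf, offset, (cur + val) & 0xFFFFFFFF)
```

      For example, adding 1 to the bytes 00 00 00 ff leaves 00 00 01 00 in the
      buffer regardless of the host's byte order.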
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Acked-by: Bob Copeland <me@bobcopeland.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • taskstats: cgroupstats_user_cmd() may leak on error · 0324b5a4
      Jesper Juhl authored
      If prepare_reply() succeeds we have allocated memory for 'rep_skb'.  If
      nla_reserve() then subsequently fails and returns NULL we fail to release
      the memory we allocated, thus causing a leak.
      Signed-off-by: Jesper Juhl <jj@chaosbits.net>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: remove unused code · 1ac9e602
      Ed Cashin authored
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: make dynamic block minor numbers the default · 08b60623
      Ed Cashin authored
      Because udev use is so widespread, making the old static mapping the
      default is too conservative, given the severe limitations it places on
      usable AoE addresses.  Storage virtualization and larger shelves have made
      the old limitations too confining.
      
      These changes make the dynamic block device minor numbers the default,
      removing the limitations on usable AoE addresses.
      
      The static arrangement is still available with aoe_dyndevs=0, and the
      aoe-stat tool from the userland aoetools package, the user space
      counterpart to the aoe driver, recognizes the case where there is a
      mismatch between the minor number in sysfs and the minor number in a
      special device file.
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: update and specify AoE address guards and error messages · 7159e969
      Ed Cashin authored
      In general, specific is better when it comes to messages about AoE usage
      problems.  Also, explicit checks for the AoE broadcast addresses are
      added.
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: retain static block device numbers for backwards compatibility · 4bcce1a3
      Ed Cashin authored
      The old mapping between AoE target shelf and slot addresses and the block
      device minor number is retained as a backwards-compatible feature, with a
      new "aoe_dyndevs" module parameter available for enabling dynamic block
      device minor numbers.
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: support more AoE addresses with dynamic block device minor numbers · 0c966214
      Ed Cashin authored
      The ATA over Ethernet protocol uses a major (shelf) and minor (slot)
      address to identify a particular storage target.  These changes remove an
      artificial limitation the aoe driver imposes on the use of AoE addresses.
      For example, without these changes, the slot address has a maximum of 15,
      but users commonly use slot numbers much greater than that.
      
      The AoE shelf and slot address space is often used sparsely.  Instead of
      using a static mapping between AoE addresses and the block device minor
      number, the block device minor numbers are now allocated on demand.
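      The difference between the two schemes can be sketched as follows. This
      is a model under stated assumptions rather than the driver's code:
      SLOTS_PER_SHELF, static_minor() and DynamicMinors are hypothetical
      names, and the 16-slots-per-shelf figure is inferred from the maximum
      slot address of 15 mentioned above.

```python
SLOTS_PER_SHELF = 16  # assumption: matches the old maximum slot address of 15


def static_minor(shelf, slot):
    """Old-style fixed mapping; slot addresses above 15 cannot be used."""
    if not 0 <= slot < SLOTS_PER_SHELF:
        raise ValueError("slot address not representable in static mapping")
    return shelf * SLOTS_PER_SHELF + slot


class DynamicMinors:
    """Hand out minor numbers on demand, one per AoE {shelf, slot} address."""

    def __init__(self, n_minors):
        self.free = set(range(n_minors))
        self.by_addr = {}

    def get(self, shelf, slot):
        addr = (shelf, slot)
        if addr not in self.by_addr:
            if not self.free:
                raise RuntimeError("no free minor numbers")
            minor = min(self.free)  # lowest free minor, for determinism
            self.free.remove(minor)
            self.by_addr[addr] = minor
        return self.by_addr[addr]

    def put(self, shelf, slot):
        # Return the minor to the pool when the device goes away.
        self.free.add(self.by_addr.pop((shelf, slot)))
```

      With on-demand allocation, a sparse address such as shelf 4000, slot 200
      consumes a single minor number instead of requiring the whole address
      space to be mapped up front.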
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: update documentation with new URL and VM settings reference · eecdf226
      Ed Cashin authored
      The old area has a new URL.  Also, now that the driver can perform better,
      it is worth mentioning the VM settings that help aoe to sink dirty pages
      out early, avoiding unnecessary memory pressure when much I/O is going on.
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: update internal version number to 49 · 7392fbe5
      Ed Cashin authored
      The internal version number of the aoe driver appears in a console message
      when the driver loads and is usually obtained by the user with the
      userland aoe-version tool, part of the aoetools.[1]
      
      Although this patchset includes bugfixes backported from higher-numbered
      versions published on the coraid.com website, it is a form of version 49.
      
      1. http://aoetools.sourceforge.net/
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: remove unused code and add cosmetic improvements · b21faa25
      Ed Cashin authored
      This change removes some unused code and attempts to increase code
      consistency.
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: increase net_device reference count while using it · 1b86fda9
      Ed Cashin authored
      This change eliminates the danger that the user could rmmod the driver for
      a network interface that is being used for AoE by the aoe driver.
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: associate frames with the AoE storage target · 64a80f5a
      Ed Cashin authored
      In the driver code, "target" and aoetgt refer to a particular remote
      interface on the AoE storage target.  The latter is identified by its AoE
      major and minor addresses.  Commands that are being sent to an AoE storage
      target {major, minor} can be sent or retransmitted to any of the remote
      MAC addresses associated with the AoE storage target.
      
      That is, frames are naturally associated with not an aoetgt (AoE major,
      AoE minor, remote MAC address) but an aoedev (AoE major, AoE minor).
      Making the code reflect that reality simplifies the driver, especially
      when the path to a remote MAC address becomes unusable.
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: disallow unsupported AoE minor addresses · 6583303c
      Ed Cashin authored
      A guard is inserted to prevent AoE minor addresses (slot addresses) higher
      than 15 from being used, as they are not yet supported by the driver.
      
      There is a change coming that will allow the aoe driver to overcome this
      limit by using system device minor numbers dynamically, but until then,
      this guard prevents unexpected targets from being used by the driver when
      AoE targets with high minor numbers are on the AoE network.
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: do revalidation steps in order · 25f4d75e
      Ed Cashin authored
      The discovery process begins with an optional AoE config query command and
      an AoE config query response.  Normally when an aoe device is already
      open, the config query response does not trigger an ATA identify device
      command to be sent out, since the response contains storage capacity
      information that, if changed, could surprise the user of the device.
      
      The userland "aoe-revalidate" tool uses a character device to trigger an
      AoE config query for a particular AoE storage target and an ATA device
      identify command, even when the device is open.
      
      This change causes the config query to go out first, reflecting the normal
      discovery sequence.  The responses could come back in any order, so this
      change is fairly cosmetic.
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: failover remote interface based on aoe_deadsecs parameter · d54d35ac
      Ed Cashin authored
      The aoe_deadsecs module parameter allows the user to specify a hard limit
      on the number of seconds an AoE command can be retransmitted before the
      AoE block device is considered to have failed.
      
      Using aoe_deadsecs to determine the time we try using a different remote
      interface helps to ensure that the hard limit is not reached before we've
      tried to recover by sending to a different remote port.
      
      As a data storage target, the AoE target is unambiguously identified by
      its {major, minor} AoE address tuple, and an AoE target can have multiple
      MAC addresses.  However, note that "target" in the driver code and
      comments means a {major, minor, MAC address} tuple, as in "somewhere to
      send packets".
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: use packets that work with the smallest-MTU local interface · 3f0f0133
      Ed Cashin authored
      Users with several network interfaces dedicated to AoE generally do not
      configure them to support different-sized AoE data payloads on purpose.
      
      For a given AoE target, there will be a set of local network interfaces
      that can reach it.  Using only the payload that will fit in the
      smallest-sized MTU of all those local interfaces greatly simplifies the
      driver, especially in failure scenarios.
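      As a rough illustration of the smallest-MTU rule, the payload size can
      be derived from the minimum MTU across all local interfaces that reach
      a target. The header lengths below are assumptions taken from the AoE
      protocol layout as commonly described, and sectors_per_frame() is a
      hypothetical helper, not a function in the driver.

```python
AOE_HDR_LEN = 10   # assumption: AoE header bytes after the Ethernet header
ATA_HDR_LEN = 12   # assumption: AoE ATA command header bytes
SECTOR_SIZE = 512


def sectors_per_frame(mtus):
    """Whole sectors that fit in one frame on every interface in `mtus`."""
    if not mtus:
        raise ValueError("target has no reachable local interface")
    # Size the payload for the most restrictive interface only.
    room = min(mtus) - AOE_HDR_LEN - ATA_HDR_LEN
    return max(room // SECTOR_SIZE, 0)
```

      A target reachable over one standard and one jumbo-frame interface is
      therefore treated as if both were standard, which costs some throughput
      but means any frame can be retransmitted over any interface.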
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: use a kernel thread for transmissions · eb086ec5
      Ed Cashin authored
      The dev_queue_xmit function needs to have interrupts enabled, so the
      simplest way to get the locking right but still fulfill that requirement is
      to use a process that can call dev_queue_xmit serially over queued
      transmissions.
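      The pattern, a single dedicated context draining a queue of pending
      transmissions in order, can be modelled in userspace with a thread and
      a queue. This is only an analogy under assumptions: TxThread and the
      `xmit` callback are hypothetical, and Python threads say nothing about
      interrupt state.

```python
import queue
import threading

_STOP = object()  # private sentinel used to shut the worker down


class TxThread:
    """Serialize calls to `xmit` through one worker thread."""

    def __init__(self, xmit):
        self._xmit = xmit
        self._q = queue.Queue()
        self._t = threading.Thread(target=self._run, daemon=True)
        self._t.start()

    def submit(self, frame):
        # Producers only enqueue; they never call xmit themselves.
        self._q.put(frame)

    def _run(self):
        # Drain the queue in FIFO order until told to stop.
        while True:
            frame = self._q.get()
            if frame is _STOP:
                break
            self._xmit(frame)

    def stop(self):
        # Frames queued before stop() are still transmitted first.
        self._q.put(_STOP)
        self._t.join()
```

      Producers holding their own locks only touch the queue, so the
      transmit call always runs from one well-defined context.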
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: become I/O request queue handler for increased user control · 69cf2d85
      Ed Cashin authored
      To allow users to choose an elevator algorithm for their particular
      workloads, change from a make_request-style driver to an
      I/O-request-queue-handler-style driver.
      
      We have to do a couple of things that might be surprising.  We manipulate
      the page _count directly on the assumption that we still have no guarantee
      that users of the block layer are prohibited from submitting bios
      containing pages with zero reference counts.[1] If such a prohibition now
      exists, I can get rid of the _count manipulation.
      
      Just as before this patch, we still keep track of the sk_buffs that the
      network layer still hasn't finished yet and cap the resources we use with
      a "pool" of skbs.[2]
      
      Now that the block layer maintains the disk stats, the aoe driver's
      diskstats function can go away.
      
      1. https://lkml.org/lkml/2007/3/1/374
      2. https://lkml.org/lkml/2007/7/6/241
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: kernel thread handles I/O completions for simple locking · 896831f5
      Ed Cashin authored
      Make the frames the aoe driver uses to track the relationship between bios
      and packets more flexible and detached, so that they can be passed to an
      "aoe_ktio" thread for completion of I/O.
      
      The frames are handled much like skbs, with a capped amount of
      preallocation so that real-world use cases are likely to run smoothly and
      degenerate gracefully even under memory pressure.
      
      Decoupling I/O completion from the receive path and serializing it in a
      process makes it easier to think about the correctness of the locking in
      the driver, especially in the case of a remote MAC address becoming
      unusable.
      
      [dan.carpenter@oracle.com: cleanup an allocation a bit]
      Signed-off-by: default avatarEd Cashin <ecashin@coraid.com>
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      896831f5
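The capped preallocation described for the aoe frames can be pictured with a small user-space sketch. This is only an illustration under stated assumptions: `struct frame`, `struct frame_pool`, `pool_get` and `pool_put` are hypothetical names, not the driver's own, and the real driver also has to deal with locking and the completion thread, which are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical frame record; the driver's real structure is not shown in the log. */
struct frame {
    struct frame *next;   /* free-list link */
    int tag;              /* placeholder payload */
};

struct frame_pool {
    struct frame *free_list; /* frames ready for reuse */
    size_t allocated;        /* frames created so far */
    size_t cap;              /* upper bound on frames ever created */
};

void pool_init(struct frame_pool *p, size_t cap)
{
    p->free_list = NULL;
    p->allocated = 0;
    p->cap = cap;
}

/* Return a frame, or NULL once the cap is reached so the caller can back off. */
struct frame *pool_get(struct frame_pool *p)
{
    struct frame *f = p->free_list;

    if (f) {                       /* reuse before allocating */
        p->free_list = f->next;
        f->next = NULL;
        return f;
    }
    if (p->allocated >= p->cap)    /* cap reached: degrade instead of growing */
        return NULL;
    f = calloc(1, sizeof(*f));
    if (f)
        p->allocated++;
    return f;
}

/* Hand a finished frame back to the pool for reuse. */
void pool_put(struct frame_pool *p, struct frame *f)
{
    assert(f != NULL);
    f->next = p->free_list;
    p->free_list = f;
}
```

A caller that receives NULL from `pool_get` would wait for outstanding frames to complete rather than allocate more, which is one way to keep memory use bounded under pressure.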
    • Ed Cashin's avatar
      aoe: for performance support larger packet payloads · 3d5b0605
      Ed Cashin authored
      Add the ability to work with large packets composed of a number of
      segments, using the scatter-gather feature of the block layer (biovecs)
      and the network layer (skb frag array).  The motivation is the performance
      gained by using a packet data payload larger than a page and by using the
      network card's scatter-gather feature.
      
      Users of the out-of-tree aoe driver already had these changes, but since
      early 2011, they have complained of increased memory utilization and
      higher CPU utilization during heavy writes.[1] The commit below appears
      related, as it disables scatter gather on non-IP protocols inside the
      harmonize_features function, even when the NIC supports sg.
      
        commit f01a5236
        Author: Jesse Gross <jesse@nicira.com>
        Date:   Sun Jan 9 06:23:31 2011 +0000
      
            net offloading: Generalize netif_get_vlan_features().
      
      With that regression in place, transmits always linearize sg AoE packets,
      but in-kernel users did not have this patch.  Before 2.6.38, though, these
      changes were working to allow sg to increase performance.
      
      1. http://www.spinics.net/lists/linux-mm/msg15184.html
      
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3d5b0605
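To make the idea of a payload "composed of a number of segments" concrete, the sketch below splits a byte range into per-page pieces, roughly the shape that a biovec or an skb fragment describes (page, offset, length). It is a user-space illustration only: `struct seg`, `split_payload` and the 4096-byte `SEG_PAGE_SIZE` are assumptions made for the example, not code or values from the driver.

```c
#include <assert.h>
#include <stddef.h>

#define SEG_PAGE_SIZE 4096u /* assumed page size for this illustration */

/* One scatter-gather element: which page, where in it, and how many bytes. */
struct seg {
    size_t page;
    size_t offset;
    size_t len;
};

/*
 * Describe the byte range [start, start + len) as per-page segments.
 * Writes at most max entries to out and returns the number written.
 */
size_t split_payload(size_t start, size_t len, struct seg *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    while (len > 0 && n < max) {
        size_t off = start % SEG_PAGE_SIZE;
        size_t chunk = SEG_PAGE_SIZE - off;  /* bytes left in this page */

        if (chunk > len)
            chunk = len;
        out[n].page = start / SEG_PAGE_SIZE;
        out[n].offset = off;
        out[n].len = chunk;
        start += chunk;
        len -= chunk;
        n++;
    }
    return n;
}
```

With scatter-gather available, each such segment can be handed to the NIC in place; without it, the segments have to be copied into one linear buffer first, which is the extra work the commit message attributes to the regression.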
    • Paul Clements's avatar
      nbd: handle discard requests · a336d298
      Paul Clements authored
      Add discard support to nbd.  If the nbd-server supports discard, it will
      send NBD_FLAG_SEND_TRIM to the client.  The client will then set the flag
      in the kernel via NBD_SET_FLAGS, which tells the kernel to enable discards
      for the device (QUEUE_FLAG_DISCARD).
      
      If discard support is enabled, then when the nbd client system receives a
      discard request, this will be passed along to the nbd-server.  When the
      discard request is received by the nbd-server, it will perform:
      
      	fallocate(.. FALLOC_FL_PUNCH_HOLE ..)
      
      to punch a hole in the backend storage for the range that is no longer
      needed.
      Signed-off-by: Paul Clements <paul.clements@steeleye.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a336d298
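The server-side `fallocate(.. FALLOC_FL_PUNCH_HOLE ..)` call mentioned in the nbd discard commit could look roughly like the sketch below. The wrapper name `punch_hole` is hypothetical, and how nbd-server actually maps a discard request to an offset and length is not shown in the log. One detail that is part of the Linux `fallocate(2)` interface: `FALLOC_FL_PUNCH_HOLE` must be combined with `FALLOC_FL_KEEP_SIZE`.

```c
#define _GNU_SOURCE            /* exposes fallocate() in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Deallocate the byte range [offset, offset + length) of fd.
 * The file size is unchanged and the range reads back as zeros.
 * Returns 0 on success, -1 with errno set on failure
 * (for example EOPNOTSUPP when the filesystem cannot punch holes).
 */
int punch_hole(int fd, off_t offset, off_t length)
{
    assert(length > 0);
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, length) == -1) {
        int saved = errno;     /* keep errno intact across the log call */

        fprintf(stderr, "punch_hole: %s\n", strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

Whether the call succeeds depends on the filesystem backing the export, so a server would likely need a fallback (or simply ignore the discard) when it fails.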
    • Paul Clements's avatar
      nbd: add set flags ioctl · 2f012508
      Paul Clements authored
      Add a set-flags ioctl, allowing various option flags to be set on an nbd
      device.  This allows the nbd-client to set the device flags (to enable
      read-only mode, or enable discard support, etc.).
      
      Flags are typically specified by the nbd-server.  During the negotiation
      phase of the nbd connection, the server sends its flags to the client.
      The client then uses NBD_SET_FLAGS to inform the kernel of the options.
      
      Also included is a one-line fix to debug output for the set-timeout ioctl.
      Signed-off-by: Paul Clements <paul.clements@steeleye.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2f012508
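On the client side, passing the negotiated flags to the kernel with NBD_SET_FLAGS might look like the following sketch. `NBD_SET_FLAGS` and the `NBD_FLAG_*` constants come from `<linux/nbd.h>`; the two helper functions are hypothetical names for illustration, and the real nbd-client performs the negotiation and other setup ioctls that are omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/nbd.h>

/* Nonzero if the flags sent by the server advertise discard (trim) support. */
int nbd_server_supports_trim(unsigned int server_flags)
{
    return (server_flags & NBD_FLAG_SEND_TRIM) != 0;
}

/*
 * Pass the server's flags to the kernel for the nbd device open on nbd_fd.
 * Returns the ioctl result: 0 on success, -1 with errno set on failure.
 */
int nbd_push_flags(int nbd_fd, unsigned int server_flags)
{
    /* The NBD protocol carries these flags in a 16-bit field. */
    assert((server_flags >> 16) == 0);
    return ioctl(nbd_fd, NBD_SET_FLAGS, (unsigned long)server_flags);
}
```

A client would typically call `nbd_push_flags(fd, flags)` on an open `/dev/nbdN` descriptor after negotiation and before starting the transfer loop; the exact ordering relative to the other setup ioctls is an assumption here.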
    • Alexandre Bounine's avatar
      rapidio: add destination ID allocation mechanism · de74e00a
      Alexandre Bounine authored
      Replace the single global destination ID counter with a per-net
      allocation mechanism to allow independent destID management for each
      available RapidIO network.  Using a bitmap-based mechanism instead of
      counters allows destination IDs to be released and reused in systems that
      support hot-swap.
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      de74e00a
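The difference between a counter and a bitmap for destination ID allocation can be sketched as follows. This is a user-space illustration only: `struct net_idtab`, `destid_alloc`, `destid_free` and the 256-entry `MAX_DESTID` are hypothetical, and the kernel patch presumably uses its own bitmap helpers and locking.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define MAX_DESTID    256  /* assumed ID space size for the example */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS  ((MAX_DESTID + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One table per RapidIO net, so each net manages its IDs independently. */
struct net_idtab {
    unsigned long used[BITMAP_WORDS];
};

void idtab_init(struct net_idtab *t)
{
    memset(t, 0, sizeof(*t));
}

/* Claim the lowest free destination ID, or return -1 if none is left. */
int destid_alloc(struct net_idtab *t)
{
    for (int id = 0; id < MAX_DESTID; id++) {
        unsigned long bit = 1UL << (id % BITS_PER_WORD);

        if (!(t->used[id / BITS_PER_WORD] & bit)) {
            t->used[id / BITS_PER_WORD] |= bit;
            return id;
        }
    }
    return -1;
}

/* Release an ID so it can be handed out again, e.g. after hot removal. */
void destid_free(struct net_idtab *t, int id)
{
    assert(id >= 0 && id < MAX_DESTID);
    t->used[id / BITS_PER_WORD] &= ~(1UL << (id % BITS_PER_WORD));
}
```

A plain counter can only move forward, so an ID freed by a removed device is lost; with the bitmap, the next `destid_alloc` call can return it again.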
    • Alexandre Bounine's avatar
      rapidio/rionet: rework to support multiple RIO master ports · 2fb717ec
      Alexandre Bounine authored
      Make the RIONET driver multi-net safe/capable by introducing per-net
      lists of RapidIO network peers.  Rework registration of network adapters
      to support all available RIO master port devices.
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2fb717ec
    • Alexandre Bounine's avatar
      rapidio: run discovery as an asynchronous process · 005842ef
      Alexandre Bounine authored
      Modify the mport initialization routine to run the RapidIO discovery
      process asynchronously.  This allows an arbitrary order of enumerating
      and discovering ports in systems with multiple RapidIO controllers
      without creating a deadlock if the enumerator port is registered after a
      discovering one.
      
      Making the netID match the mportID ensures consistent netID assignment in
      multiport RapidIO systems with an asynchronous discovery process (the
      global counter implementation is affected by a race between threads).
      
      [akpm@linux-foundation.org: tweak code layout]
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      005842ef
    • Alexandre Bounine's avatar
      rapidio: use device lists handling on per-net basis · a7071efc
      Alexandre Bounine authored
      Modify handling of device lists to resolve issues caused by using a
      single global list of RIO devices during enumeration/discovery.  The most
      common symptom of the existing issue is incorrect contents of switch
      routing tables in systems with multiple mport controllers, while
      single-port configurations perform as expected.
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a7071efc
    • Alexandre Bounine's avatar
      rapidio: fix blocking wait for discovery ready · fa3dbaa0
      Alexandre Bounine authored
      The following set of patches provides modifications targeting support of
      multiple RapidIO master port (mport) devices on the CPU side of a
      RapidIO-capable board.  While the RapidIO subsystem code has definitions
      suitable for multi-controller/multi-net support, the existing
      implementation cannot be considered ready for multiple mport
      configurations.
      
      =========== NOTES: =============
      
      a) The patches below do not address the RapidIO-side view of multiport
         processing elements defined in Part 6 of RapidIO spec Rev.2.1 (section
         6.4.1).  These devices have Base Device ID CSR (0x60) and Component Tag
         CSR (0x6C) shared by all SRIO ports.  For example, Freescale's P4080,
         P3041 and P5020 have a dual-port SRIO controller implemented according
         to the specification.  Enumeration/discovery of such devices from the
         RapidIO side may require device-specific fixups.
      
      b) Devices referenced above may also require implementation-specific
         code to set up a host device ID for the mport device.  These operations
         are not addressed by patches in this package.
      
      =================================
      
      Details about provided patches:
      
      1. Fix blocking wait for discovery ready
      
         While it does not happen on PowerPC-based platforms, x86-based
         platforms that run the RapidIO discovery process may dump a stalled-CPU
         warning if they wait too long to be enumerated.
      
         Currently users can avoid it by disabling the soft-lockup detector
         using the "nosoftlockup" kernel parameter OR by ensuring that
         enumeration is completed before a soft lockup is detected.
      
         This patch eliminates the blocking wait and keeps the scheduler
         running.  It is also required for patch 3 below, which introduces the
         asynchronous discovery process.
      
      2. Use device lists handling on per-net basis
      
         This patch adds correct support for multiple RapidIO nets and
         resolves possible issues caused by using a single global list of
         devices during RapidIO system enumeration/discovery.  The most common
         symptom of the existing issue is incorrect contents of switch routing
         tables in systems with multiple mport controllers, while single-port
         configurations perform as expected.
      
         The patch does not eliminate the global RapidIO device list but
         changes some routines in enumeration/discovery to use per-net device
         lists instead.  This way compatibility with upper layer RIO routines is
         preserved.
      
      3. Run discovery as an asynchronous process
      
         This patch modifies the RapidIO initialization routine to
         asynchronously run the discovery process for each corresponding mport.
         This allows an arbitrary order of enumerating and discovering mports
         without creating a deadlock if an enumerator port is registered after
         a discovering one.
      
         On boards with multiple discovering mports it also eliminates order
         dependency between mports and may reduce total time of RapidIO
         subsystem initialization.
      
         Making the netID match the mportID ensures consistent netID assignment
         in multiport RapidIO systems with an asynchronous discovery process
         (the global counter implementation is affected by a race between
         threads).
      
      4. Rework RIONET to support multiple RIO master ports
      
         In the current version of the driver, rionet_probe() has the comment
         "XXX Make multi-net safe".  Now is a good time to address it.
      
         This patch makes the RIONET driver multi-net safe/capable by
         introducing per-net lists of RapidIO network peers.  It also enables
         registration of network adapters for all available mport devices.
      
      5. Add destination ID allocation mechanism
      
         The patch replaces a single global destination ID counter with a
         per-net allocation mechanism to allow independent destID management
         for each available RapidIO network.  Using a bitmap-based mechanism
         instead of counters allows destination IDs to be released and reused
         in systems that support hot-swap.
      
      This patch:
      
      Fix blocking wait loop in the RapidIO discovery routine to avoid warning
      dumps about stalled CPU on x86 platforms.
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fa3dbaa0
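The general shape of replacing a blocking wait with a sleeping, bounded poll can be sketched in user space as below. This is an analogue under stated assumptions, not the kernel change itself: `wait_until_ready`, `countdown_ready` and the millisecond parameters are hypothetical, and in the kernel the sleep would be done with kernel sleep primitives rather than `nanosleep`.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

typedef bool (*ready_fn)(void *ctx);

/*
 * Poll ready() until it reports true, sleeping poll_ms between checks so
 * other work can be scheduled.  Gives up after roughly timeout_ms.
 * Returns 0 when ready, -1 on timeout.
 */
int wait_until_ready(ready_fn ready, void *ctx, long poll_ms, long timeout_ms)
{
    struct timespec nap = { poll_ms / 1000, (poll_ms % 1000) * 1000000L };
    long waited = 0;

    assert(ready != NULL && poll_ms > 0);
    while (!ready(ctx)) {
        if (waited >= timeout_ms)
            return -1;
        nanosleep(&nap, NULL);   /* yield the CPU instead of spinning */
        waited += poll_ms;
    }
    return 0;
}

/* Hypothetical stand-in for "has the enumerator finished?": true after N polls. */
bool countdown_ready(void *ctx)
{
    int *left = ctx;

    if (*left > 0) {
        (*left)--;
        return false;
    }
    return true;
}
```

Because the waiter sleeps between checks, a long wait for enumeration no longer monopolizes a CPU, which is the condition a soft-lockup detector reports.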
    • Alexandre Bounine's avatar
      rapidio: apply RX/TX enable to active switch ports only · 8d4630dc
      Alexandre Bounine authored
      Apply port RX/TX enable operations only to active switch ports.
      
      The RapidIO specification (Part 6: LP-Serial Physical Layer) recommends
      keeping the Output Port Enable (TX) and Input Port Enable (RX) control
      bits in the disabled state (0b0) after device reset.  It also allows an
      implementation-specific reset state for these bits.
      
      This patch ensures that the TX/RX enable action is applied only to a
      switch's active ports while preserving the initial state of inactive
      ones.
      
      It is intended to keep inbound and outbound packet transfers disabled
      on inactive switch ports, blocking unexpected packets during a hot
      insertion event.  While it does not fix any visible malfunction, it is
      intended to prevent such malfunctions in the future.
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8d4630dc
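The "enable only active ports" rule can be illustrated with a small loop over a hypothetical in-memory port table. Everything named here is an assumption for the example: `struct sw_port`, the `PORT_STS_OK` and `PORT_CTL_EN_*` bit positions, and `enable_active_ports` are made up, and the real code reads and writes switch registers through RapidIO maintenance transactions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this illustration. */
#define PORT_STS_OK    (1u << 1)   /* link is up, port is active */
#define PORT_CTL_EN_RX (1u << 21)  /* Input Port Enable */
#define PORT_CTL_EN_TX (1u << 22)  /* Output Port Enable */

struct sw_port {
    uint32_t err_sts;  /* status register snapshot */
    uint32_t ctl;      /* control register value */
};

/*
 * Set the RX/TX enable bits on active ports only; inactive ports keep
 * whatever state they had after reset.  Returns the number of ports enabled.
 */
size_t enable_active_ports(struct sw_port *ports, size_t nports)
{
    size_t enabled = 0;

    assert(ports != NULL || nports == 0);
    for (size_t i = 0; i < nports; i++) {
        if (!(ports[i].err_sts & PORT_STS_OK))
            continue;              /* inactive: leave control bits untouched */
        ports[i].ctl |= PORT_CTL_EN_RX | PORT_CTL_EN_TX;
        enabled++;
    }
    return enabled;
}
```

Leaving inactive ports untouched means a device inserted later cannot pass traffic through the switch until software explicitly enables its port.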