1. 01 Dec, 2017 4 commits
    • Sowmini Varadhan's avatar
      rds: tcp: atomically purge entries from rds_tcp_conn_list during netns delete · f10b4cff
      Sowmini Varadhan authored
      The rds_tcp_kill_sock() function parses the rds_tcp_conn_list
      to find the rds_connection entries marked for deletion as part
      of the netns deletion under the protection of the rds_tcp_conn_lock.
      Since the rds_tcp_conn_list tracks rds_tcp_connections (which
      have a 1:1 mapping with rds_conn_path), multiple tc entries in
      the rds_tcp_conn_list will map to a single rds_connection, and will
      be deleted as part of the rds_conn_destroy() operation that is
      done outside the rds_tcp_conn_lock.
      
      The rds_tcp_conn_list traversal done under the protection of
      rds_tcp_conn_lock should not leave any doomed tc entries in
      the list after the rds_tcp_conn_lock is released, else another
      concurrently executiong netns delete (for a differnt netns) thread
      may trip on these entries.
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f10b4cff
    • Sowmini Varadhan's avatar
      rds: tcp: correctly sequence cleanup on netns deletion. · 681648e6
      Sowmini Varadhan authored
      Commit 8edc3aff ("rds: tcp: Take explicit refcounts on struct net")
      introduces a regression in rds-tcp netns cleanup. The cleanup_net(),
      (and thus rds_tcp_dev_event notification) is only called from put_net()
      when all netns refcounts go to 0, but this cannot happen if the
      rds_connection itself is holding a c_net ref that it expects to
      release in rds_tcp_kill_sock.
      
      Instead, the rds_tcp_kill_sock callback should make sure to
      tear down state carefully, ensuring that the socket teardown
      is only done after all data-structures and workqs that depend
      on it are quiesced.
      
      The original motivation for commit 8edc3aff ("rds: tcp: Take explicit
      refcounts on struct net") was to resolve a race condition reported by
      syzkaller where workqs for tx/rx/connect were triggered after the
      namespace was deleted. Those worker threads should have been
      cancelled/flushed before socket tear-down and indeed,
      rds_conn_path_destroy() does try to sequence this by doing
           /* cancel cp_send_w */
           /* cancel cp_recv_w */
           /* flush cp_down_w */
           /* free data structures */
      Here the "flush cp_down_w" will trigger rds_conn_shutdown and thus
      invoke rds_tcp_conn_path_shutdown() to close the tcp socket, so that
      we ought to have satisfied the requirement that "socket-close is
      done after all other dependent state is quiesced". However,
      rds_conn_shutdown has a bug in that it *always* triggers the reconnect
      workq (and if connection is successful, we always restart tx/rx
      workqs so with the right timing, we risk the race conditions reported
      by syzkaller).
      
      Netns deletion is like module teardown- no need to restart a
      reconnect in this case. We can use the c_destroy_in_prog bit
      to avoid restarting the reconnect.
      
      Fixes: 8edc3aff ("rds: tcp: Take explicit refcounts on struct net")
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      681648e6
    • Sowmini Varadhan's avatar
      rds: tcp: remove redundant function rds_tcp_conn_paths_destroy() · 2d746c93
      Sowmini Varadhan authored
      A side-effect of Commit c14b0366 ("rds: tcp: set linger to 1
      when unloading a rds-tcp") is that we always send a RST on the tcp
      connection for rds_conn_destroy(), so rds_tcp_conn_paths_destroy()
      is not needed any more and is removed in this patch.
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2d746c93
    • Jon Maloy's avatar
      tipc: fall back to smaller MTU if allocation of local send skb fails · 4c94cc2d
      Jon Maloy authored
      When sending node local messages the code is using an 'mtu' of 66060
      bytes to avoid unnecessary fragmentation. During situations of low
      memory tipc_msg_build() may sometimes fail to allocate such large
      buffers, resulting in unnecessary send failures. This can easily be
      remedied by falling back to a smaller MTU, and then reassemble the
      buffer chain as if the message were arriving from a remote node.
      
      At the same time, we change the initial MTU setting of the broadcast
      link to a lower value, so that large messages always are fragmented
      into smaller buffers even when we run in single node mode. Apart from
      obtaining the same advantage as for the 'fallback' solution above, this
      turns out to give a significant performance improvement. This can
      probably be explained with the __pskb_copy() operation performed on the
      buffer for each recipient during reception. We found the optimal value
      for this, considering the most relevant skb pool, to be 3744 bytes.
      Acked-by: default avatarYing Xue <ying.xue@ericsson.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4c94cc2d
  2. 30 Nov, 2017 29 commits
  3. 29 Nov, 2017 7 commits
    • Linus Torvalds's avatar
      Merge tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux · b9151761
      Linus Torvalds authored
      Pull nfsd fixes from Bruce Fields:
       "I screwed up my merge window pull request; I only sent half of what I
        meant to.
      
        There were no new features, just bugfixes of various importance and
        some very minor cleanup, so I think it's all still appropriate for
        -rc2.
      
        Highlights:
      
         - Fixes from Trond for some races in the NFSv4 state code.
      
         - Fix from Naofumi Honda for a typo in the blocked lock notificiation
           code
      
         - Fixes from Vasily Averin for some problems starting and stopping
           lockd especially in network namespaces"
      
      * tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux: (23 commits)
        lockd: fix "list_add double add" caused by legacy signal interface
        nlm_shutdown_hosts_net() cleanup
        race of nfsd inetaddr notifiers vs nn->nfsd_serv change
        race of lockd inetaddr notifiers vs nlmsvc_rqst change
        SUNRPC: make cache_detail structures const
        NFSD: make cache_detail structures const
        sunrpc: make the function arg as const
        nfsd: check for use of the closed special stateid
        nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat
        lockd: lost rollback of set_grace_period() in lockd_down_net()
        lockd: added cleanup checks in exit_net hook
        grace: replace BUG_ON by WARN_ONCE in exit_net hook
        nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class
        lockd: remove net pointer from messages
        nfsd: remove net pointer from debug messages
        nfsd: Fix races with check_stateid_generation()
        nfsd: Ensure we check stateid validity in the seqid operation checks
        nfsd: Fix race in lock stateid creation
        nfsd4: move find_lock_stateid
        nfsd: Ensure we don't recognise lock stateids after freeing them
        ...
      b9151761
    • Linus Torvalds's avatar
      Merge tag 'for-4.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 26cd9474
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "We've collected some fixes in since the pre-merge window freeze.
      
        There's technically only one regression fix for 4.15, but the rest
        seems important and candidates for stable.
      
         - fix missing flush bio puts in error cases (is serious, but rarely
           happens)
      
         - fix reporting stat::st_blocks for buffered append writes
      
         - fix space cache invalidation
      
         - fix out of bound memory access when setting zlib level
      
         - fix potential memory corruption when fsync fails in the middle
      
         - fix crash in integrity checker
      
         - incremetnal send fix, path mixup for certain unlink/rename
           combination
      
         - pass flags to writeback so compressed writes can be throttled
           properly
      
         - error handling fixes"
      
      * tag 'for-4.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: incremental send, fix wrong unlink path after renaming file
        btrfs: tree-checker: Fix false panic for sanity test
        Btrfs: fix list_add corruption and soft lockups in fsync
        btrfs: Fix wild memory access in compression level parser
        btrfs: fix deadlock when writing out space cache
        btrfs: clear space cache inode generation always
        Btrfs: fix reported number of inode blocks after buffered append writes
        Btrfs: move definition of the function btrfs_find_new_delalloc_bytes
        Btrfs: bail out gracefully rather than BUG_ON
        btrfs: dev_alloc_list is not protected by RCU, use normal list_del
        btrfs: add missing device::flush_bio puts
        btrfs: Fix transaction abort during failure in btrfs_rm_dev_item
        Btrfs: add write_flags for compression bio
      26cd9474
    • Linus Torvalds's avatar
      Merge tag 'microblaze-4.15-rc2' of git://git.monstr.eu/linux-2.6-microblaze · 198e0c0c
      Linus Torvalds authored
      Pull Microblaze fix from Michal Simek:
       "Add missing header to mmu_context_mm.h"
      
      * tag 'microblaze-4.15-rc2' of git://git.monstr.eu/linux-2.6-microblaze:
        microblaze: add missing include to mmu_context_mm.h
      198e0c0c
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · fccfde44
      Linus Torvalds authored
      Pull sparc fix from David Miller:
       "Sparc T4 and later cpu bootup regression fix"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: Fix boot on T4 and later.
      fccfde44
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 96c22a49
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) The forcedeth conversion from pci_*() DMA interfaces to dma_*() ones
          missed one spot. From Zhu Yanjun.
      
       2) Missing CRYPTO_SHA256 Kconfig dep in cfg80211, from Johannes Berg.
      
       3) Fix checksum offloading in thunderx driver, from Sunil Goutham.
      
       4) Add SPDX to vm_sockets_diag.h, from Stephen Hemminger.
      
       5) Fix use after free of packet headers in TIPC, from Jon Maloy.
      
       6) "sizeof(ptr)" vs "sizeof(*ptr)" bug in i40e, from Gustavo A R Silva.
      
       7) Tunneling fixes in mlxsw driver, from Petr Machata.
      
       8) Fix crash in fanout_demux_rollover() of AF_PACKET, from Mike
          Maloney.
      
       9) Fix race in AF_PACKET bind() vs. NETDEV_UP notifier, from Eric
          Dumazet.
      
      10) Fix regression in sch_sfq.c due to one of the timer_setup()
          conversions. From Paolo Abeni.
      
      11) SCTP does list_for_each_entry() using wrong struct member, fix from
          Xin Long.
      
      12) Don't use big endian netlink attribute read for
          IFLA_BOND_AD_ACTOR_SYSTEM, it is in cpu endianness. Also from Xin
          Long.
      
      13) Fix mis-initialization of q->link.clock in CBQ scheduler, preventing
          adding filters there. From Jiri Pirko.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
        ethernet: dwmac-stm32: Fix copyright
        net: via: via-rhine: use %p to format void * address instead of %x
        net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit
        myri10ge: Update MAINTAINERS
        net: sched: cbq: create block for q->link.block
        atm: suni: remove extraneous space to fix indentation
        atm: lanai: use %p to format kernel addresses instead of %x
        VSOCK: Don't set sk_state to TCP_CLOSE before testing it
        atm: fore200e: use %pK to format kernel addresses instead of %x
        ambassador: fix incorrect indentation of assignment statement
        vxlan: use __be32 type for the param vni in __vxlan_fdb_delete
        bonding: use nla_get_u64 to extract the value for IFLA_BOND_AD_ACTOR_SYSTEM
        sctp: use right member as the param of list_for_each_entry
        sch_sfq: fix null pointer dereference at timer expiration
        cls_bpf: don't decrement net's refcount when offload fails
        net/packet: fix a race in packet_bind() and packet_notifier()
        packet: fix crash in fanout_demux_rollover()
        sctp: remove extern from stream sched
        sctp: force the params with right types for sctp csum apis
        sctp: force SCTP_ERROR_INV_STRM with __u32 when calling sctp_chunk_fail
        ...
      96c22a49
    • David S. Miller's avatar
      sparc64: Fix boot on T4 and later. · e5372cd5
      David S. Miller authored
      If we don't put the NG4fls.o object into the same part of
      the link as the generic sparc64 objects for fls() and __fls()
      then the relocation in the branch we use for patching will
      not fit.
      
      Move NG4fls.o into lib-y to fix this problem.
      
      Fixes: 46ad8d2d ("sparc64: Use sparc optimized fls and __fls for T4 and above")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Reported-by: default avatarAnatoly Pugachev <matorola@gmail.com>
      Tested-by: default avatarAnatoly Pugachev <matorola@gmail.com>
      e5372cd5
    • Linus Torvalds's avatar
      vsprintf: don't use 'restricted_pointer()' when not restricting · ef0010a3
      Linus Torvalds authored
      Instead, just fall back on the new '%p' behavior which hashes the
      pointer.
      
      Otherwise, '%pK' - that was intended to mark a pointer as restricted -
      just ends up leaking pointers that a normal '%p' wouldn't leak.  Which
      just make the whole thing pointless.
      
      I suspect we should actually get rid of '%pK' entirely, and make it just
      work as '%p' regardless, but this is the minimal obvious fix.  People
      who actually use 'kptr_restrict' should weigh in on which behavior they
      want.
      
      Cc: Tobin Harding <me@tobin.cc>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ef0010a3