1. 26 Jun, 2018 21 commits
    • Omar Sandoval's avatar
      Btrfs: fix memory and mount leak in btrfs_ioctl_rm_dev_v2() · 5f7e3b5b
      Omar Sandoval authored
      commit fd4e994b upstream.
      
      If we have invalid flags set, when we error out we must drop our writer
      counter and free the buffer we allocated for the arguments. This bug is
      trivially reproduced with the following program on 4.7+:
      
      	#include <fcntl.h>
      	#include <stdint.h>
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <unistd.h>
      	#include <sys/ioctl.h>
      	#include <sys/stat.h>
      	#include <sys/types.h>
      	#include <linux/btrfs.h>
      	#include <linux/btrfs_tree.h>
      
      	int main(int argc, char **argv)
      	{
      		struct btrfs_ioctl_vol_args_v2 vol_args = {
      			.flags = UINT64_MAX,
      		};
      		int ret;
      		int fd;
      
      		if (argc != 2) {
      			fprintf(stderr, "usage: %s PATH\n", argv[0]);
      			return EXIT_FAILURE;
      		}
      
      		fd = open(argv[1], O_WRONLY);
      		if (fd == -1) {
      			perror("open");
      			return EXIT_FAILURE;
      		}
      
      		ret = ioctl(fd, BTRFS_IOC_RM_DEV_V2, &vol_args);
      		if (ret == -1)
      			perror("ioctl");
      
      		close(fd);
      		return EXIT_SUCCESS;
      	}
      
      When unmounting the filesystem, we'll hit the
      WARN_ON(mnt_get_writers(mnt)) in cleanup_mnt() and also may prevent the
      filesystem to be remounted read-only as the writer count will stay
      lifted.
      
      Fixes: 6b526ed7 ("btrfs: introduce device delete by devid")
      CC: stable@vger.kernel.org # 4.9+
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarSu Yue <suy.fnst@cn.fujitsu.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f7e3b5b
    • Omar Sandoval's avatar
      Btrfs: fix clone vs chattr NODATASUM race · 55d29ff4
      Omar Sandoval authored
      commit b5c40d59 upstream.
      
      In btrfs_clone_files(), we must check the NODATASUM flag while the
      inodes are locked. Otherwise, it's possible that btrfs_ioctl_setflags()
      will change the flags after we check and we can end up with a party
      checksummed file.
      
      The race window is only a few instructions in size, between the if and
      the locks which is:
      
      3834         if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode))
      3835                 return -EISDIR;
      
      where the setflags must be run and toggle the NODATASUM flag (provided
      the file size is 0).  The clone will block on the inode lock, segflags
      takes the inode lock, changes flags, releases log and clone continues.
      
      Not impossible but still needs a lot of bad luck to hit unintentionally.
      
      Fixes: 0e7b824c ("Btrfs: don't make a file partly checksummed through file clone")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      [ update changelog ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      55d29ff4
    • Tetsuo Handa's avatar
      driver core: Don't ignore class_dir_create_and_add() failure. · c81a6be9
      Tetsuo Handa authored
      commit 84d0c27d upstream.
      
      syzbot is hitting WARN() at kernfs_add_one() [1].
      This is because kernfs_create_link() is confused by previous device_add()
      call which continued without setting dev->kobj.parent field when
      get_device_parent() failed by memory allocation fault injection.
      Fix this by propagating the error from class_dir_create_and_add() to
      the calllers of get_device_parent().
      
      [1] https://syzkaller.appspot.com/bug?id=fae0fb607989ea744526d1c082a5b8de6529116fSigned-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: default avatarsyzbot <syzbot+df47f81c226b31d89fb1@syzkaller.appspotmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c81a6be9
    • Jan Kara's avatar
      ext4: fix fencepost error in check for inode count overflow during resize · f3233cb2
      Jan Kara authored
      commit 4f2f76f7 upstream.
      
      ext4_resize_fs() has an off-by-one bug when checking whether growing of
      a filesystem will not overflow inode count. As a result it allows a
      filesystem with 8192 inodes per group to grow to 64TB which overflows
      inode count to 0 and makes filesystem unusable. Fix it.
      
      Cc: stable@vger.kernel.org
      Fixes: 3f8a6411Reported-by: default avatarJaco Kroon <jaco@uls.co.za>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f3233cb2
    • Theodore Ts'o's avatar
      ext4: correctly handle a zero-length xattr with a non-zero e_value_offs · 21542545
      Theodore Ts'o authored
      commit 8a2b307c upstream.
      
      Ext4 will always create ext4 extended attributes which do not have a
      value (where e_value_size is zero) with e_value_offs set to zero.  In
      most places e_value_offs will not be used in a substantive way if
      e_value_size is zero.
      
      There was one exception to this, which is in ext4_xattr_set_entry(),
      where if there is a maliciously crafted file system where there is an
      extended attribute with e_value_offs is non-zero and e_value_size is
      0, the attempt to remove this xattr will result in a negative value
      getting passed to memmove, leading to the following sadness:
      
      [   41.225365] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
      [   44.538641] BUG: unable to handle kernel paging request at ffff9ec9a3000000
      [   44.538733] IP: __memmove+0x81/0x1a0
      [   44.538755] PGD 1249bd067 P4D 1249bd067 PUD 1249c1067 PMD 80000001230000e1
      [   44.538793] Oops: 0003 [#1] SMP PTI
      [   44.539074] CPU: 0 PID: 1470 Comm: poc Not tainted 4.16.0-rc1+ #1
          ...
      [   44.539475] Call Trace:
      [   44.539832]  ext4_xattr_set_entry+0x9e7/0xf80
          ...
      [   44.539972]  ext4_xattr_block_set+0x212/0xea0
          ...
      [   44.540041]  ext4_xattr_set_handle+0x514/0x610
      [   44.540065]  ext4_xattr_set+0x7f/0x120
      [   44.540090]  __vfs_removexattr+0x4d/0x60
      [   44.540112]  vfs_removexattr+0x75/0xe0
      [   44.540132]  removexattr+0x4d/0x80
          ...
      [   44.540279]  path_removexattr+0x91/0xb0
      [   44.540300]  SyS_removexattr+0xf/0x20
      [   44.540322]  do_syscall_64+0x71/0x120
      [   44.540344]  entry_SYSCALL_64_after_hwframe+0x21/0x86
      
      https://bugzilla.kernel.org/show_bug.cgi?id=199347
      
      This addresses CVE-2018-10840.
      Reported-by: default avatar"Xu, Wen" <wen.xu@gatech.edu>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Cc: stable@kernel.org
      Fixes: dec214d0 ("ext4: xattr inode deduplication")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21542545
    • Theodore Ts'o's avatar
      ext4: bubble errors from ext4_find_inline_data_nolock() up to ext4_iget() · 02d45ec6
      Theodore Ts'o authored
      commit eb9b5f01 upstream.
      
      If ext4_find_inline_data_nolock() returns an error it needs to get
      reflected up to ext4_iget().  In order to fix this,
      ext4_iget_extra_inode() needs to return an error (and not return
      void).
      
      This is related to "ext4: do not allow external inodes for inline
      data" (which fixes CVE-2018-11412) in that in the errors=continue
      case, it would be useful to for userspace to receive an error
      indicating that file system is corrupted.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      02d45ec6
    • Theodore Ts'o's avatar
      ext4: do not allow external inodes for inline data · e81d371d
      Theodore Ts'o authored
      commit 117166ef upstream.
      
      The inline data feature was implemented before we added support for
      external inodes for xattrs.  It makes no sense to support that
      combination, but the problem is that there are a number of extended
      attribute checks that are skipped if e_value_inum is non-zero.
      
      Unfortunately, the inline data code is completely e_value_inum
      unaware, and attempts to interpret the xattr fields as if it were an
      inline xattr --- at which point, Hilarty Ensues.
      
      This addresses CVE-2018-11412.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=199803Reported-by: default avatarJann Horn <jannh@google.com>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Fixes: e50e5129 ("ext4: xattr-in-inode support")
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e81d371d
    • Lukas Czerner's avatar
      ext4: update mtime in ext4_punch_hole even if no blocks are released · bd713edf
      Lukas Czerner authored
      commit eee597ac upstream.
      
      Currently in ext4_punch_hole we're going to skip the mtime update if
      there are no actual blocks to release. However we've actually modified
      the file by zeroing the partial block so the mtime should be updated.
      
      Moreover the sync and datasync handling is skipped as well, which is
      also wrong. Fix it.
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reported-by: default avatarJoe Habermann <joe.habermann@quantum.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd713edf
    • Jan Kara's avatar
      ext4: fix hole length detection in ext4_ind_map_blocks() · f70af46a
      Jan Kara authored
      commit 2ee3ee06 upstream.
      
      When ext4_ind_map_blocks() computes a length of a hole, it doesn't count
      with the fact that mapped offset may be somewhere in the middle of the
      completely empty subtree. In such case it will return too large length
      of the hole which then results in lseek(SEEK_DATA) to end up returning
      an incorrect offset beyond the end of the hole.
      
      Fix the problem by correctly taking offset within a subtree into account
      when computing a length of a hole.
      
      Fixes: facab4d9
      CC: stable@vger.kernel.org
      Reported-by: default avatarJeff Mahoney <jeffm@suse.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f70af46a
    • Trond Myklebust's avatar
      NFSv4.1: Fix up replays of interrupted requests · 84f4d2c6
      Trond Myklebust authored
      commit 3be0f80b upstream.
      
      If the previous request on a slot was interrupted before it was
      processed by the server, then our slot sequence number may be out of whack,
      and so we try the next operation using the old sequence number.
      
      The problem with this, is that not all servers check to see that the
      client is replaying the same operations as previously when they decide
      to go to the replay cache, and so instead of the expected error of
      NFS4ERR_SEQ_FALSE_RETRY, we get a replay of the old reply, which could
      (if the operations match up) be mistaken by the client for a new reply.
      
      To fix this, we attempt to send a COMPOUND containing only the SEQUENCE op
      in order to resync our slot sequence number.
      
      Cc: Olga Kornievskaia <olga.kornievskaia@gmail.com>
      [olga.kornievskaia@gmail.com: fix an Oops]
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84f4d2c6
    • Daniel Borkmann's avatar
      tls: fix use-after-free in tls_push_record · 5e8a5c30
      Daniel Borkmann authored
      [ Upstream commit a447da7d ]
      
      syzkaller managed to trigger a use-after-free in tls like the
      following:
      
        BUG: KASAN: use-after-free in tls_push_record.constprop.15+0x6a2/0x810 [tls]
        Write of size 1 at addr ffff88037aa08000 by task a.out/2317
      
        CPU: 3 PID: 2317 Comm: a.out Not tainted 4.17.0+ #144
        Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
        Call Trace:
         dump_stack+0x71/0xab
         print_address_description+0x6a/0x280
         kasan_report+0x258/0x380
         ? tls_push_record.constprop.15+0x6a2/0x810 [tls]
         tls_push_record.constprop.15+0x6a2/0x810 [tls]
         tls_sw_push_pending_record+0x2e/0x40 [tls]
         tls_sk_proto_close+0x3fe/0x710 [tls]
         ? tcp_check_oom+0x4c0/0x4c0
         ? tls_write_space+0x260/0x260 [tls]
         ? kmem_cache_free+0x88/0x1f0
         inet_release+0xd6/0x1b0
         __sock_release+0xc0/0x240
         sock_close+0x11/0x20
         __fput+0x22d/0x660
         task_work_run+0x114/0x1a0
         do_exit+0x71a/0x2780
         ? mm_update_next_owner+0x650/0x650
         ? handle_mm_fault+0x2f5/0x5f0
         ? __do_page_fault+0x44f/0xa50
         ? mm_fault_error+0x2d0/0x2d0
         do_group_exit+0xde/0x300
         __x64_sys_exit_group+0x3a/0x50
         do_syscall_64+0x9a/0x300
         ? page_fault+0x8/0x30
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This happened through fault injection where aead_req allocation in
      tls_do_encryption() eventually failed and we returned -ENOMEM from
      the function. Turns out that the use-after-free is triggered from
      tls_sw_sendmsg() in the second tls_push_record(). The error then
      triggers a jump to waiting for memory in sk_stream_wait_memory()
      resp. returning immediately in case of MSG_DONTWAIT. What follows is
      the trim_both_sgl(sk, orig_size), which drops elements from the sg
      list added via tls_sw_sendmsg(). Now the use-after-free gets triggered
      when the socket is being closed, where tls_sk_proto_close() callback
      is invoked. The tls_complete_pending_work() will figure that there's
      a pending closed tls record to be flushed and thus calls into the
      tls_push_pending_closed_record() from there. ctx->push_pending_record()
      is called from the latter, which is the tls_sw_push_pending_record()
      from sw path. This again calls into tls_push_record(). And here the
      tls_fill_prepend() will panic since the buffer address has been freed
      earlier via trim_both_sgl(). One way to fix it is to move the aead
      request allocation out of tls_do_encryption() early into tls_push_record().
      This means we don't prep the tls header and advance state to the
      TLS_PENDING_CLOSED_RECORD before allocation which could potentially
      fail happened. That fixes the issue on my side.
      
      Fixes: 3c4d7559 ("tls: kernel TLS support")
      Reported-by: syzbot+5c74af81c547738e1684@syzkaller.appspotmail.com
      Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarDave Watson <davejwatson@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e8a5c30
    • Dexuan Cui's avatar
      hv_netvsc: Fix a network regression after ifdown/ifup · 244c10f9
      Dexuan Cui authored
      [ Upstream commit 52acf73b ]
      
      Recently people reported the NIC stops working after
      "ifdown eth0; ifup eth0". It turns out in this case the TX queues are not
      enabled, after the refactoring of the common detach logic: when the NIC
      has sub-channels, usually we enable all the TX queues after all
      sub-channels are set up: see rndis_set_subchannel() ->
      netif_device_attach(), but in the case of "ifdown eth0; ifup eth0" where
      the number of channels doesn't change, we also must make sure the TX queues
      are enabled. The patch fixes the regression.
      
      Fixes: 7b2ee50c ("hv_netvsc: common detach logic")
      Signed-off-by: default avatarDexuan Cui <decui@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      244c10f9
    • Willem de Bruijn's avatar
      net: in virtio_net_hdr only add VLAN_HLEN to csum_start if payload holds vlan · 5320e035
      Willem de Bruijn authored
      [ Upstream commit fd3a8862 ]
      
      Tun, tap, virtio, packet and uml vector all use struct virtio_net_hdr
      to communicate packet metadata to userspace.
      
      For skbuffs with vlan, the first two return the packet as it may have
      existed on the wire, inserting the VLAN tag in the user buffer.  Then
      virtio_net_hdr.csum_start needs to be adjusted by VLAN_HLEN bytes.
      
      Commit f09e2249 ("macvtap: restore vlan header on user read")
      added this feature to macvtap. Commit 3ce9b20f ("macvtap: Fix
      csum_start when VLAN tags are present") then fixed up csum_start.
      
      Virtio, packet and uml do not insert the vlan header in the user
      buffer.
      
      When introducing virtio_net_hdr_from_skb to deduplicate filling in
      the virtio_net_hdr, the variant from macvtap which adds VLAN_HLEN was
      applied uniformly, breaking csum offset for packets with vlan on
      virtio and packet.
      
      Make insertion of VLAN_HLEN optional. Convert the callers to pass it
      when needed.
      
      Fixes: e858fae2 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
      Fixes: 1276f24e ("packet: use common code for virtio_net_hdr and skb GSO conversion")
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5320e035
    • Paolo Abeni's avatar
      udp: fix rx queue len reported by diag and proc interface · 2e5d3168
      Paolo Abeni authored
      [ Upstream commit 6c206b20 ]
      
      After commit 6b229cf7 ("udp: add batching to udp_rmem_release()")
      the sk_rmem_alloc field does not measure exactly anymore the
      receive queue length, because we batch the rmem release. The issue
      is really apparent only after commit 0d4a6608 ("udp: do rmem bulk
      free even if the rx sk queue is empty"): the user space can easily
      check for an empty socket with not-0 queue length reported by the 'ss'
      tool or the procfs interface.
      
      We need to use a custom UDP helper to report the correct queue length,
      taking into account the forward allocation deficit.
      
      Reported-by: trevor.francis@46labs.com
      Fixes: 6b229cf7 ("UDP: add batching to udp_rmem_release()")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2e5d3168
    • Cong Wang's avatar
      socket: close race condition between sock_close() and sockfs_setattr() · 91717ffc
      Cong Wang authored
      [ Upstream commit 6d8c50dc ]
      
      fchownat() doesn't even hold refcnt of fd until it figures out
      fd is really needed (otherwise is ignored) and releases it after
      it resolves the path. This means sock_close() could race with
      sockfs_setattr(), which leads to a NULL pointer dereference
      since typically we set sock->sk to NULL in ->release().
      
      As pointed out by Al, this is unique to sockfs. So we can fix this
      in socket layer by acquiring inode_lock in sock_close() and
      checking against NULL in sockfs_setattr().
      
      sock_release() is called in many places, only the sock_close()
      path matters here. And fortunately, this should not affect normal
      sock_close() as it is only called when the last fd refcnt is gone.
      It only affects sock_close() with a parallel sockfs_setattr() in
      progress, which is not common.
      
      Fixes: 86741ec2 ("net: core: Add a UID field to struct sock.")
      Reported-by: default avatarshankarapailoor <shankarapailoor@gmail.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      91717ffc
    • Frank van der Linden's avatar
      tcp: verify the checksum of the first data segment in a new connection · 39f4ae01
      Frank van der Linden authored
      [ Upstream commit 4fd44a98 ]
      
      commit 079096f1 ("tcp/dccp: install syn_recv requests into ehash
      table") introduced an optimization for the handling of child sockets
      created for a new TCP connection.
      
      But this optimization passes any data associated with the last ACK of the
      connection handshake up the stack without verifying its checksum, because it
      calls tcp_child_process(), which in turn calls tcp_rcv_state_process()
      directly.  These lower-level processing functions do not do any checksum
      verification.
      
      Insert a tcp_checksum_complete call in the TCP_NEW_SYN_RECEIVE path to
      fix this.
      
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: default avatarFrank van der Linden <fllinden@amazon.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Tested-by: default avatarBalbir Singh <bsingharora@gmail.com>
      Reviewed-by: default avatarBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      39f4ae01
    • Davide Caratti's avatar
      net/sched: act_simple: fix parsing of TCA_DEF_DATA · 81d15944
      Davide Caratti authored
      [ Upstream commit 8d499533 ]
      
      use nla_strlcpy() to avoid copying data beyond the length of TCA_DEF_DATA
      netlink attribute, in case it is less than SIMP_MAX_DATA and it does not
      end with '\0' character.
      
      v2: fix errors in the commit message, thanks Hangbin Liu
      
      Fixes: fa1b1cff ("net_cls_act: Make act_simple use of netlink policy.")
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      81d15944
    • Zhouyang Jia's avatar
      net: dsa: add error handling for pskb_trim_rcsum · 73c0eab8
      Zhouyang Jia authored
      [ Upstream commit 349b71d6 ]
      
      When pskb_trim_rcsum fails, the lack of error-handling code may
      cause unexpected results.
      
      This patch adds error-handling code after calling pskb_trim_rcsum.
      Signed-off-by: default avatarZhouyang Jia <jiazhouyang09@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      73c0eab8
    • Julian Anastasov's avatar
      ipv6: allow PMTU exceptions to local routes · 6bcc27ab
      Julian Anastasov authored
      [ Upstream commit 09757646 ]
      
      IPVS setups with local client and remote tunnel server need
      to create exception for the local virtual IP. What we do is to
      change PMTU from 64KB (on "lo") to 1460 in the common case.
      Suggested-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
      Fixes: 7343ff31 ("ipv6: Don't create clones of host routes.")
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6bcc27ab
    • Bjørn Mork's avatar
      cdc_ncm: avoid padding beyond end of skb · 6e48ee02
      Bjørn Mork authored
      [ Upstream commit 49c2c3f2 ]
      
      Commit 4a0e3e98 ("cdc_ncm: Add support for moving NDP to end
      of NCM frame") added logic to reserve space for the NDP at the
      end of the NTB/skb.  This reservation did not take the final
      alignment of the NDP into account, causing us to reserve too
      little space. Additionally the padding prior to NDP addition did
      not ensure there was enough space for the NDP.
      
      The NTB/skb with the NDP appended would then exceed the configured
      max size. This caused the final padding of the NTB to use a
      negative count, padding to almost INT_MAX, and resulting in:
      
      [60103.825970] BUG: unable to handle kernel paging request at ffff9641f2004000
      [60103.825998] IP: __memset+0x24/0x30
      [60103.826001] PGD a6a06067 P4D a6a06067 PUD 4f65a063 PMD 72003063 PTE 0
      [60103.826013] Oops: 0002 [#1] SMP NOPTI
      [60103.826018] Modules linked in: (removed(
      [60103.826158] CPU: 0 PID: 5990 Comm: Chrome_DevTools Tainted: G           O 4.14.0-3-amd64 #1 Debian 4.14.17-1
      [60103.826162] Hardware name: LENOVO 20081 BIOS 41CN28WW(V2.04) 05/03/2012
      [60103.826166] task: ffff964193484fc0 task.stack: ffffb2890137c000
      [60103.826171] RIP: 0010:__memset+0x24/0x30
      [60103.826174] RSP: 0000:ffff964316c03b68 EFLAGS: 00010216
      [60103.826178] RAX: 0000000000000000 RBX: 00000000fffffffd RCX: 000000001ffa5000
      [60103.826181] RDX: 0000000000000005 RSI: 0000000000000000 RDI: ffff9641f2003ffc
      [60103.826184] RBP: ffff964192f6c800 R08: 00000000304d434e R09: ffff9641f1d2c004
      [60103.826187] R10: 0000000000000002 R11: 00000000000005ae R12: ffff9642e6957a80
      [60103.826190] R13: ffff964282ff2ee8 R14: 000000000000000d R15: ffff9642e4843900
      [60103.826194] FS:  00007f395aaf6700(0000) GS:ffff964316c00000(0000) knlGS:0000000000000000
      [60103.826197] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [60103.826200] CR2: ffff9641f2004000 CR3: 0000000013b0c000 CR4: 00000000000006f0
      [60103.826204] Call Trace:
      [60103.826212]  <IRQ>
      [60103.826225]  cdc_ncm_fill_tx_frame+0x5e3/0x740 [cdc_ncm]
      [60103.826236]  cdc_ncm_tx_fixup+0x57/0x70 [cdc_ncm]
      [60103.826246]  usbnet_start_xmit+0x5d/0x710 [usbnet]
      [60103.826254]  ? netif_skb_features+0x119/0x250
      [60103.826259]  dev_hard_start_xmit+0xa1/0x200
      [60103.826267]  sch_direct_xmit+0xf2/0x1b0
      [60103.826273]  __dev_queue_xmit+0x5e3/0x7c0
      [60103.826280]  ? ip_finish_output2+0x263/0x3c0
      [60103.826284]  ip_finish_output2+0x263/0x3c0
      [60103.826289]  ? ip_output+0x6c/0xe0
      [60103.826293]  ip_output+0x6c/0xe0
      [60103.826298]  ? ip_forward_options+0x1a0/0x1a0
      [60103.826303]  tcp_transmit_skb+0x516/0x9b0
      [60103.826309]  tcp_write_xmit+0x1aa/0xee0
      [60103.826313]  ? sch_direct_xmit+0x71/0x1b0
      [60103.826318]  tcp_tasklet_func+0x177/0x180
      [60103.826325]  tasklet_action+0x5f/0x110
      [60103.826332]  __do_softirq+0xde/0x2b3
      [60103.826337]  irq_exit+0xae/0xb0
      [60103.826342]  do_IRQ+0x81/0xd0
      [60103.826347]  common_interrupt+0x98/0x98
      [60103.826351]  </IRQ>
      [60103.826355] RIP: 0033:0x7f397bdf2282
      [60103.826358] RSP: 002b:00007f395aaf57d8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff6e
      [60103.826362] RAX: 0000000000000000 RBX: 00002f07bc6d0900 RCX: 00007f39752d7fe7
      [60103.826365] RDX: 0000000000000022 RSI: 0000000000000147 RDI: 00002f07baea02c0
      [60103.826368] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
      [60103.826371] R10: 00000000ffffffff R11: 0000000000000000 R12: 00002f07baea02c0
      [60103.826373] R13: 00002f07bba227a0 R14: 00002f07bc6d090c R15: 0000000000000000
      [60103.826377] Code: 90 90 90 90 90 90 90 0f 1f 44 00 00 49 89 f9 48 89 d1 83
      e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48
      ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1
      [60103.826442] RIP: __memset+0x24/0x30 RSP: ffff964316c03b68
      [60103.826444] CR2: ffff9641f2004000
      
      Commit e1069bbf ("net: cdc_ncm: Reduce memory use when kernel
      memory low") made this bug much more likely to trigger by reducing
      the NTB size under memory pressure.
      
      Link: https://bugs.debian.org/893393Reported-by: default avatarГорбешко Богдан <bodqhrohro@gmail.com>
      Reported-and-tested-by: default avatarDennis Wassenberg <dennis.wassenberg@secunet.com>
      Cc: Enrico Mioso <mrkiko.rs@gmail.com>
      Fixes: 4a0e3e98 ("cdc_ncm: Add support for moving NDP to end of NCM frame")
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6e48ee02
    • Xiangning Yu's avatar
      bonding: re-evaluate force_primary when the primary slave name changes · 584b975a
      Xiangning Yu authored
      [ Upstream commit eb55bbf8 ]
      
      There is a timing issue under active-standy mode, when bond_enslave() is
      called, bond->params.primary might not be initialized yet.
      
      Any time the primary slave string changes, bond->force_primary should be
      set to true to make sure the primary becomes the active slave.
      Signed-off-by: default avatarXiangning Yu <yuxiangning@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      584b975a
  2. 20 Jun, 2018 19 commits