1. 16 Nov, 2015 5 commits
  2. 30 Oct, 2015 23 commits
  3. 28 Oct, 2015 12 commits
    • Filipe Manana's avatar
      Btrfs: update fix for read corruption of compressed and shared extents · 5e7dd6c7
      Filipe Manana authored
      commit 808f80b4 upstream.
      
      My previous fix in commit 005efedf ("Btrfs: fix read corruption of
      compressed and shared extents") was effective only if the compressed
      extents cover a file range with a length that is not a multiple of 16
      pages. That's because the detection of when we reached a different range
      of the file that shares the same compressed extent as the previously
      processed range was done at extent_io.c:__do_contiguous_readpages(),
      which covers subranges with a length up to 16 pages, because
      extent_readpages() groups the pages in clusters no larger than 16 pages.
      So fix this by tracking the start of the previously processed file
      range's extent map at extent_readpages().
      
      The following test case for fstests reproduces the issue:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # real QA test starts here
        _need_to_be_root
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _require_cloner
      
        rm -f $seqres.full
      
        test_clone_and_read_compressed_extent()
        {
            local mount_opts=$1
      
            _scratch_mkfs >>$seqres.full 2>&1
            _scratch_mount $mount_opts
      
            # Create our test file with a single extent of 64Kb that is going to
            # be compressed no matter which compression algo is used (zlib/lzo).
            $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 64K" \
                $SCRATCH_MNT/foo | _filter_xfs_io
      
            # Now clone the compressed extent into an adjacent file offset.
            $CLONER_PROG -s 0 -d $((64 * 1024)) -l $((64 * 1024)) \
                $SCRATCH_MNT/foo $SCRATCH_MNT/foo
      
            echo "File digest before unmount:"
            md5sum $SCRATCH_MNT/foo | _filter_scratch
      
            # Remount the fs or clear the page cache to trigger the bug in
            # btrfs. Because the extent has an uncompressed length that is a
            # multiple of 16 pages, all the pages belonging to the second range
            # of the file (64K to 128K), which points to the same extent as the
            # first range (0K to 64K), had their contents full of zeroes instead
            # of the byte 0xaa. This was a bug exclusively in the read path of
            # compressed extents, the correct data was stored on disk, btrfs
            # just failed to fill in the pages correctly.
            _scratch_remount
      
            echo "File digest after remount:"
            # Must match the digest we got before.
            md5sum $SCRATCH_MNT/foo | _filter_scratch
        }
      
        echo -e "\nTesting with zlib compression..."
        test_clone_and_read_compressed_extent "-o compress=zlib"
      
        _scratch_unmount
      
        echo -e "\nTesting with lzo compression..."
        test_clone_and_read_compressed_extent "-o compress=lzo"
      
        status=0
        exit
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Tested-by: default avatarTimofey Titovets <nefelim4ag@gmail.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      5e7dd6c7
    • David Howells's avatar
      KEYS: Don't permit request_key() to construct a new keyring · ca564ff2
      David Howells authored
      commit 911b79cd upstream.
      
      If request_key() is used to find a keyring, only do the search part - don't
      do the construction part if the keyring was not found by the search.  We
      don't really want keyrings in the negative instantiated state since the
      rejected/negative instantiation error value in the payload is unioned with
      keyring metadata.
      
      Now the kernel gives an error:
      
      	request_key("keyring", "#selinux,bdekeyring", "keyring", KEY_SPEC_USER_SESSION_KEYRING) = -1 EPERM (Operation not permitted)
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Cc: Kamal Mostafa <kamal@canonical.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      ca564ff2
    • David Howells's avatar
      KEYS: Fix crash when attempt to garbage collect an uninstantiated keyring · 4944cea7
      David Howells authored
      commit f05819df upstream.
      
      The following sequence of commands:
      
          i=`keyctl add user a a @s`
          keyctl request2 keyring foo bar @t
          keyctl unlink $i @s
      
      tries to invoke an upcall to instantiate a keyring if one doesn't already
      exist by that name within the user's keyring set.  However, if the upcall
      fails, the code sets keyring->type_data.reject_error to -ENOKEY or some
      other error code.  When the key is garbage collected, the key destroy
      function is called unconditionally and keyring_destroy() uses list_empty()
      on keyring->type_data.link - which is in a union with reject_error.
      Subsequently, the kernel tries to unlink the keyring from the keyring names
      list - which oopses like this:
      
      	BUG: unable to handle kernel paging request at 00000000ffffff8a
      	IP: [<ffffffff8126e051>] keyring_destroy+0x3d/0x88
      	...
      	Workqueue: events key_garbage_collector
      	...
      	RIP: 0010:[<ffffffff8126e051>] keyring_destroy+0x3d/0x88
      	RSP: 0018:ffff88003e2f3d30  EFLAGS: 00010203
      	RAX: 00000000ffffff82 RBX: ffff88003bf1a900 RCX: 0000000000000000
      	RDX: 0000000000000000 RSI: 000000003bfc6901 RDI: ffffffff81a73a40
      	RBP: ffff88003e2f3d38 R08: 0000000000000152 R09: 0000000000000000
      	R10: ffff88003e2f3c18 R11: 000000000000865b R12: ffff88003bf1a900
      	R13: 0000000000000000 R14: ffff88003bf1a908 R15: ffff88003e2f4000
      	...
      	CR2: 00000000ffffff8a CR3: 000000003e3ec000 CR4: 00000000000006f0
      	...
      	Call Trace:
      	 [<ffffffff8126c756>] key_gc_unused_keys.constprop.1+0x5d/0x10f
      	 [<ffffffff8126ca71>] key_garbage_collector+0x1fa/0x351
      	 [<ffffffff8105ec9b>] process_one_work+0x28e/0x547
      	 [<ffffffff8105fd17>] worker_thread+0x26e/0x361
      	 [<ffffffff8105faa9>] ? rescuer_thread+0x2a8/0x2a8
      	 [<ffffffff810648ad>] kthread+0xf3/0xfb
      	 [<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2
      	 [<ffffffff815f2ccf>] ret_from_fork+0x3f/0x70
      	 [<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2
      
      Note the value in RAX.  This is a 32-bit representation of -ENOKEY.
      
      The solution is to only call ->destroy() if the key was successfully
      instantiated.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Kamal Mostafa <kamal@canonical.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      4944cea7
    • David Howells's avatar
      KEYS: Fix race between key destruction and finding a keyring by name · 35694b04
      David Howells authored
      commit 94c4554b upstream.
      
      There appears to be a race between:
      
       (1) key_gc_unused_keys() which frees key->security and then calls
           keyring_destroy() to unlink the name from the name list
      
       (2) find_keyring_by_name() which calls key_permission(), thus accessing
           key->security, on a key before checking to see whether the key usage is 0
           (ie. the key is dead and might be cleaned up).
      
      Fix this by calling ->destroy() before cleaning up the core key data -
      including key->security.
      Reported-by: default avatarPetr Matousek <pmatouse@redhat.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Cc: Kamal Mostafa <kamal@canonical.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      35694b04
    • Sabrina Dubroca's avatar
      [stable-only] net: add length argument to skb_copy_and_csum_datagram_iovec · fa89ae55
      Sabrina Dubroca authored
      Without this length argument, we can read past the end of the iovec in
      memcpy_toiovec because we have no way of knowing the total length of the
      iovec's buffers.
      
      This is needed for stable kernels where 89c22d8c ("net: Fix skb
      csum races when peeking") has been backported but that don't have the
      ioviter conversion, which is almost all the stable trees <= 3.18.
      
      This also fixes a kernel crash for NFS servers when the client uses
       -onfsvers=3,proto=udp to mount the export.
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      [ luis: backported to 3.16:
        - dropped changes to net/rxrpc/ar-recvmsg.c ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      fa89ae55
    • Arad, Ronen's avatar
      netlink: Trim skb to alloc size to avoid MSG_TRUNC · 9a11693d
      Arad, Ronen authored
      commit db65a3aa upstream.
      
      netlink_dump() allocates skb based on the calculated min_dump_alloc or
      a per socket max_recvmsg_len.
      min_alloc_size is maximum space required for any single netdev
      attributes as calculated by rtnl_calcit().
      max_recvmsg_len tracks the user provided buffer to netlink_recvmsg.
      It is capped at 16KiB.
      The intention is to avoid small allocations and to minimize the number
      of calls required to obtain dump information for all net devices.
      
      netlink_dump packs as many small messages as could fit within an skb
      that was sized for the largest single netdev information. The actual
      space available within an skb is larger than what is requested. It could
      be much larger and up to near 2x with align to next power of 2 approach.
      
      Allowing netlink_dump to use all the space available within the
      allocated skb increases the buffer size a user has to provide to avoid
      truncaion (i.e. MSG_TRUNG flag set).
      
      It was observed that with many VLANs configured on at least one netdev,
      a larger buffer of near 64KiB was necessary to avoid "Message truncated"
      error in "ip link" or "bridge [-c[ompressvlans]] vlan show" when
      min_alloc_size was only little over 32KiB.
      
      This patch trims skb to allocated size in order to allow the user to
      avoid truncation with more reasonable buffer size.
      Signed-off-by: default avatarRonen Arad <ronen.arad@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      9a11693d
    • Konstantin Khlebnikov's avatar
      ovs: do not allocate memory from offline numa node · 67f4c396
      Konstantin Khlebnikov authored
      commit 598c12d0 upstream.
      
      When openvswitch tries allocate memory from offline numa node 0:
      stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO, 0)
      It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
      [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h
      This patch disables numa affinity in this case.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: default avatarPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      67f4c396
    • Joe Perches's avatar
      ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings · d1a84d87
      Joe Perches authored
      commit 077cb37f upstream.
      
      It seems that kernel memory can leak into userspace by a
      kmalloc, ethtool_get_strings, then copy_to_user sequence.
      
      Avoid this by using kcalloc to zero fill the copied buffer.
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Acked-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      d1a84d87
    • Guillaume Nault's avatar
      ppp: don't override sk->sk_state in pppoe_flush_dev() · 48af18da
      Guillaume Nault authored
      commit e6740165 upstream.
      
      Since commit 2b018d57 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release"),
      pppoe_release() calls dev_put(po->pppoe_dev) if sk is in the
      PPPOX_ZOMBIE state. But pppoe_flush_dev() can set sk->sk_state to
      PPPOX_ZOMBIE _and_ reset po->pppoe_dev to NULL. This leads to the
      following oops:
      
      [  570.140800] BUG: unable to handle kernel NULL pointer dereference at 00000000000004e0
      [  570.142931] IP: [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
      [  570.144601] PGD 3d119067 PUD 3dbc1067 PMD 0
      [  570.144601] Oops: 0000 [#1] SMP
      [  570.144601] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppoe pppox ppp_generic slhc loop crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper acpi_cpufreq evdev serio_raw processor button ext4 crc16 mbcache jbd2 virtio_net virtio_blk virtio_pci virtio_ring virtio
      [  570.144601] CPU: 1 PID: 15738 Comm: ppp-apitest Not tainted 4.2.0 #1
      [  570.144601] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
      [  570.144601] task: ffff88003d30d600 ti: ffff880036b60000 task.ti: ffff880036b60000
      [  570.144601] RIP: 0010:[<ffffffffa018c701>]  [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
      [  570.144601] RSP: 0018:ffff880036b63e08  EFLAGS: 00010202
      [  570.144601] RAX: 0000000000000000 RBX: ffff880034340000 RCX: 0000000000000206
      [  570.144601] RDX: 0000000000000006 RSI: ffff88003d30dd20 RDI: ffff88003d30dd20
      [  570.144601] RBP: ffff880036b63e28 R08: 0000000000000001 R09: 0000000000000000
      [  570.144601] R10: 00007ffee9b50420 R11: ffff880034340078 R12: ffff8800387ec780
      [  570.144601] R13: ffff8800387ec7b0 R14: ffff88003e222aa0 R15: ffff8800387ec7b0
      [  570.144601] FS:  00007f5672f48700(0000) GS:ffff88003fc80000(0000) knlGS:0000000000000000
      [  570.144601] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  570.144601] CR2: 00000000000004e0 CR3: 0000000037f7e000 CR4: 00000000000406a0
      [  570.144601] Stack:
      [  570.144601]  ffffffffa018f240 ffff8800387ec780 ffffffffa018f240 ffff8800387ec7b0
      [  570.144601]  ffff880036b63e48 ffffffff812caabe ffff880039e4e000 0000000000000008
      [  570.144601]  ffff880036b63e58 ffffffff812cabad ffff880036b63ea8 ffffffff811347f5
      [  570.144601] Call Trace:
      [  570.144601]  [<ffffffff812caabe>] sock_release+0x1a/0x75
      [  570.144601]  [<ffffffff812cabad>] sock_close+0xd/0x11
      [  570.144601]  [<ffffffff811347f5>] __fput+0xff/0x1a5
      [  570.144601]  [<ffffffff811348cb>] ____fput+0x9/0xb
      [  570.144601]  [<ffffffff81056682>] task_work_run+0x66/0x90
      [  570.144601]  [<ffffffff8100189e>] prepare_exit_to_usermode+0x8c/0xa7
      [  570.144601]  [<ffffffff81001a26>] syscall_return_slowpath+0x16d/0x19b
      [  570.144601]  [<ffffffff813babb1>] int_ret_from_sys_call+0x25/0x9f
      [  570.144601] Code: 48 8b 83 c8 01 00 00 a8 01 74 12 48 89 df e8 8b 27 14 e1 b8 f7 ff ff ff e9 b7 00 00 00 8a 43 12 a8 0b 74 1c 48 8b 83 a8 04 00 00 <48> 8b 80 e0 04 00 00 65 ff 08 48 c7 83 a8 04 00 00 00 00 00 00
      [  570.144601] RIP  [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
      [  570.144601]  RSP <ffff880036b63e08>
      [  570.144601] CR2: 00000000000004e0
      [  570.200518] ---[ end trace 46956baf17349563 ]---
      
      pppoe_flush_dev() has no reason to override sk->sk_state with
      PPPOX_ZOMBIE. pppox_unbind_sock() already sets sk->sk_state to
      PPPOX_DEAD, which is the correct state given that sk is unbound and
      po->pppoe_dev is NULL.
      
      Fixes: 2b018d57 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
      Tested-by: default avatarOleksii Berezhniak <core@irc.lg.ua>
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      48af18da
    • Eric Dumazet's avatar
      net: add pfmemalloc check in sk_add_backlog() · 9913ff9c
      Eric Dumazet authored
      commit c7c49b8f upstream.
      
      Greg reported crashes hitting the following check in __sk_backlog_rcv()
      
      	BUG_ON(!sock_flag(sk, SOCK_MEMALLOC));
      
      The pfmemalloc bit is currently checked in sk_filter().
      
      This works correctly for TCP, because sk_filter() is ran in
      tcp_v[46]_rcv() before hitting the prequeue or backlog checks.
      
      For UDP or other protocols, this does not work, because the sk_filter()
      is ran from sock_queue_rcv_skb(), which might be called _after_ backlog
      queuing if socket is owned by user by the time packet is processed by
      softirq handler.
      
      Fixes: b4b9e355 ("netvm: set PF_MEMALLOC as appropriate during SKB processing")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarGreg Thelen <gthelen@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      9913ff9c
    • Pravin B Shelar's avatar
      skbuff: Fix skb checksum partial check. · fc3befaa
      Pravin B Shelar authored
      commit 31b33dfb upstream.
      
      Earlier patch 6ae459bd tried to detect void ckecksum partial
      skb by comparing pull length to checksum offset. But it does
      not work for all cases since checksum-offset depends on
      updates to skb->data.
      
      Following patch fixes it by validating checksum start offset
      after skb-data pointer is updated. Negative value of checksum
      offset start means there is no need to checksum.
      
      Fixes: 6ae459bd ("skbuff: Fix skb checksum flag on skb pull")
      Reported-by: default avatarAndrew Vagin <avagin@odin.com>
      Signed-off-by: default avatarPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      fc3befaa
    • Pravin B Shelar's avatar
      skbuff: Fix skb checksum flag on skb pull · a5162a04
      Pravin B Shelar authored
      commit 6ae459bd upstream.
      
      VXLAN device can receive skb with checksum partial. But the checksum
      offset could be in outer header which is pulled on receive. This results
      in negative checksum offset for the skb. Such skb can cause the assert
      failure in skb_checksum_help(). Following patch fixes the bug by setting
      checksum-none while pulling outer header.
      
      Following is the kernel panic msg from old kernel hitting the bug.
      
      ------------[ cut here ]------------
      kernel BUG at net/core/dev.c:1906!
      RIP: 0010:[<ffffffff81518034>] skb_checksum_help+0x144/0x150
      Call Trace:
      <IRQ>
      [<ffffffffa0164c28>] queue_userspace_packet+0x408/0x470 [openvswitch]
      [<ffffffffa016614d>] ovs_dp_upcall+0x5d/0x60 [openvswitch]
      [<ffffffffa0166236>] ovs_dp_process_packet_with_key+0xe6/0x100 [openvswitch]
      [<ffffffffa016629b>] ovs_dp_process_received_packet+0x4b/0x80 [openvswitch]
      [<ffffffffa016c51a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
      [<ffffffffa0171383>] vxlan_rcv+0x53/0x60 [openvswitch]
      [<ffffffffa01734cb>] vxlan_udp_encap_recv+0x8b/0xf0 [openvswitch]
      [<ffffffff8157addc>] udp_queue_rcv_skb+0x2dc/0x3b0
      [<ffffffff8157b56f>] __udp4_lib_rcv+0x1cf/0x6c0
      [<ffffffff8157ba7a>] udp_rcv+0x1a/0x20
      [<ffffffff8154fdbd>] ip_local_deliver_finish+0xdd/0x280
      [<ffffffff81550128>] ip_local_deliver+0x88/0x90
      [<ffffffff8154fa7d>] ip_rcv_finish+0x10d/0x370
      [<ffffffff81550365>] ip_rcv+0x235/0x300
      [<ffffffff8151ba1d>] __netif_receive_skb+0x55d/0x620
      [<ffffffff8151c360>] netif_receive_skb+0x80/0x90
      [<ffffffff81459935>] virtnet_poll+0x555/0x6f0
      [<ffffffff8151cd04>] net_rx_action+0x134/0x290
      [<ffffffff810683d8>] __do_softirq+0xa8/0x210
      [<ffffffff8162fe6c>] call_softirq+0x1c/0x30
      [<ffffffff810161a5>] do_softirq+0x65/0xa0
      [<ffffffff810687be>] irq_exit+0x8e/0xb0
      [<ffffffff81630733>] do_IRQ+0x63/0xe0
      [<ffffffff81625f2e>] common_interrupt+0x6e/0x6e
      Reported-by: default avatarAnupam Chanda <achanda@vmware.com>
      Signed-off-by: default avatarPravin B Shelar <pshelar@nicira.com>
      Acked-by: default avatarTom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      a5162a04