1. 18 Sep, 2015 1 commit
  2. 14 Sep, 2015 2 commits
    • Pablo Neira Ayuso's avatar
      netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt · a7775d15
      Pablo Neira Ayuso authored
      [ Upstream commit e53376be ]
      
      With this patch, the conntrack refcount is initially set to zero and
      it is bumped once it is added to any of the list, so we fulfill
      Eric's golden rule which is that all released objects always have a
      refcount that equals zero.
      
      Andrey Vagin reports that nf_conntrack_free can't be called for a
      conntrack with non-zero ref-counter, because it can race with
      nf_conntrack_find_get().
      
      A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
      ref-counter says that this conntrack is used. So when we release
      a conntrack with non-zero counter, we break this assumption.
      
      CPU1                                    CPU2
      ____nf_conntrack_find()
                                              nf_ct_put()
                                               destroy_conntrack()
                                              ...
                                              init_conntrack
                                               __nf_conntrack_alloc (set use = 1)
      atomic_inc_not_zero(&ct->use) (use = 2)
                                               if (!l4proto->new(ct, skb, dataoff, timeouts))
                                                nf_conntrack_free(ct); (use = 2 !!!)
                                              ...
                                              __nf_conntrack_alloc (set use = 1)
       if (!nf_ct_key_equal(h, tuple, zone))
        nf_ct_put(ct); (use = 0)
         destroy_conntrack()
                                              /* continue to work with CT */
      
      After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU
      race in nf_conntrack_find_get" another bug was triggered in
      destroy_conntrack():
      
      <4>[67096.759334] ------------[ cut here ]------------
      <2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
      ...
      <4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G         C ---------------    2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
      <4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>]  [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
      <4>[67096.760255] Call Trace:
      <4>[67096.760255]  [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30
      <4>[67096.760255]  [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
      <4>[67096.760255]  [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
      <4>[67096.760255]  [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
      <4>[67096.760255]  [<ffffffff81484419>] nf_iterate+0x69/0xb0
      <4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
      <4>[67096.760255]  [<ffffffff814845d4>] nf_hook_slow+0x74/0x110
      <4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
      <4>[67096.760255]  [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910
      <4>[67096.760255]  [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0
      <4>[67096.760255]  [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140
      <4>[67096.760255]  [<ffffffff81444f97>] sock_sendmsg+0x117/0x140
      <4>[67096.760255]  [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60
      <4>[67096.760255]  [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20
      <4>[67096.760255]  [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40
      <4>[67096.760255]  [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff814457c9>] sys_sendto+0x139/0x190
      <4>[67096.760255]  [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200
      <4>[67096.760255]  [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290
      <4>[67096.760255]  [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210
      <4>[67096.760255]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
      
      I have reused the original title for the RFC patch that Andrey posted and
      most of the original patch description.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Andrew Vagin <avagin@parallels.com>
      Cc: Florian Westphal <fw@strlen.de>
      Reported-by: default avatarAndrew Vagin <avagin@parallels.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarAndrew Vagin <avagin@parallels.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a7775d15
    • Andrey Vagin's avatar
      netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get · 336323b7
      Andrey Vagin authored
      [ Upstream commit c6825c09 ]
      
      Lets look at destroy_conntrack:
      
      hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
      ...
      nf_conntrack_free(ct)
      	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
      
      net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.
      
      The hash is protected by rcu, so readers look up conntracks without
      locks.
      A conntrack is removed from the hash, but in this moment a few readers
      still can use the conntrack. Then this conntrack is released and another
      thread creates conntrack with the same address and the equal tuple.
      After this a reader starts to validate the conntrack:
      * It's not dying, because a new conntrack was created
      * nf_ct_tuple_equal() returns true.
      
      But this conntrack is not initialized yet, so it can not be used by two
      threads concurrently. In this case BUG_ON may be triggered from
      nf_nat_setup_info().
      
      Florian Westphal suggested to check the confirm bit too. I think it's
      right.
      
      task 1			task 2			task 3
      			nf_conntrack_find_get
      			 ____nf_conntrack_find
      destroy_conntrack
       hlist_nulls_del_rcu
       nf_conntrack_free
       kmem_cache_free
      						__nf_conntrack_alloc
      						 kmem_cache_alloc
      						 memset(&ct->tuplehash[IP_CT_DIR_MAX],
      			 if (nf_ct_is_dying(ct))
      			 if (!nf_ct_tuple_equal()
      
      I'm not sure, that I have ever seen this race condition in a real life.
      Currently we are investigating a bug, which is reproduced on a few nodes.
      In our case one conntrack is initialized from a few tasks concurrently,
      we don't have any other explanation for this.
      
      <2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
      ...
      <4>[46267.083951] RIP: 0010:[<ffffffffa01e00a4>]  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 [nf_nat]
      ...
      <4>[46267.085549] Call Trace:
      <4>[46267.085622]  [<ffffffffa023421b>] alloc_null_binding+0x5b/0xa0 [iptable_nat]
      <4>[46267.085697]  [<ffffffffa02342bc>] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
      <4>[46267.085770]  [<ffffffffa0234521>] nf_nat_fn+0x111/0x260 [iptable_nat]
      <4>[46267.085843]  [<ffffffffa0234798>] nf_nat_out+0x48/0xd0 [iptable_nat]
      <4>[46267.085919]  [<ffffffff814841b9>] nf_iterate+0x69/0xb0
      <4>[46267.085991]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
      <4>[46267.086063]  [<ffffffff81484374>] nf_hook_slow+0x74/0x110
      <4>[46267.086133]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
      <4>[46267.086207]  [<ffffffff814b5890>] ? dst_output+0x0/0x20
      <4>[46267.086277]  [<ffffffff81495204>] ip_output+0xa4/0xc0
      <4>[46267.086346]  [<ffffffff814b65a4>] raw_sendmsg+0x8b4/0x910
      <4>[46267.086419]  [<ffffffff814c10fa>] inet_sendmsg+0x4a/0xb0
      <4>[46267.086491]  [<ffffffff814459aa>] ? sock_update_classid+0x3a/0x50
      <4>[46267.086562]  [<ffffffff81444d67>] sock_sendmsg+0x117/0x140
      <4>[46267.086638]  [<ffffffff8151997b>] ? _spin_unlock_bh+0x1b/0x20
      <4>[46267.086712]  [<ffffffff8109d370>] ? autoremove_wake_function+0x0/0x40
      <4>[46267.086785]  [<ffffffff81495e80>] ? do_ip_setsockopt+0x90/0xd80
      <4>[46267.086858]  [<ffffffff8100be0e>] ? call_function_interrupt+0xe/0x20
      <4>[46267.086936]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
      <4>[46267.087006]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
      <4>[46267.087081]  [<ffffffff8118f2e8>] ? kmem_cache_alloc+0xd8/0x1e0
      <4>[46267.087151]  [<ffffffff81445599>] sys_sendto+0x139/0x190
      <4>[46267.087229]  [<ffffffff81448c0d>] ? sock_setsockopt+0x16d/0x6f0
      <4>[46267.087303]  [<ffffffff810efa47>] ? audit_syscall_entry+0x1d7/0x200
      <4>[46267.087378]  [<ffffffff810ef795>] ? __audit_syscall_exit+0x265/0x290
      <4>[46267.087454]  [<ffffffff81474885>] ? compat_sys_setsockopt+0x75/0x210
      <4>[46267.087531]  [<ffffffff81474b5f>] compat_sys_socketcall+0x13f/0x210
      <4>[46267.087607]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
      <4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
      <1>[46267.088023] RIP  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      336323b7
  3. 02 Sep, 2015 5 commits
    • Benjamin LaHaise's avatar
      aio: fix reqs_available handling · 09c59fc8
      Benjamin LaHaise authored
      commit d856f32a upstream.
      
      As reported by Dan Aloni, commit f8567a38 ("aio: fix aio request
      leak when events are reaped by userspace") introduces a regression when
      user code attempts to perform io_submit() with more events than are
      available in the ring buffer.  Reverting that commit would reintroduce a
      regression when user space event reaping is used.
      
      Fixing this bug is a bit more involved than the previous attempts to fix
      this regression.  Since we do not have a single point at which we can
      count events as being reaped by user space and io_getevents(), we have
      to track event completion by looking at the number of events left in the
      event ring.  So long as there are as many events in the ring buffer as
      there have been completion events generate, we cannot call
      put_reqs_available().  The code to check for this is now placed in
      refill_reqs_available().
      
      A test program from Dan and modified by me for verifying this bug is available
      at http://www.kvack.org/~bcrl/20140824-aio_bug.c .
      Reported-by: default avatarDan Aloni <dan@kernelim.com>
      Signed-off-by: default avatarBenjamin LaHaise <bcrl@kvack.org>
      Acked-by: default avatarDan Aloni <dan@kernelim.com>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      09c59fc8
    • Heinz Mauelshagen's avatar
      dm cache mq: fix memory allocation failure for large cache devices · ca6292e4
      Heinz Mauelshagen authored
      commit 14f398ca upstream.
      
      The memory allocated for the multiqueue policy's hash table doesn't need
      to be physically contiguous.  Use vzalloc() instead of kzalloc().
      Fedora has been carrying this fix since 10/10/2013.
      
      Failure seen during creation of a 10TB cached device with a 2048 sector
      block size and 411GB cache size:
      
       dmsetup: page allocation failure: order:9, mode:0x10c0d0
       CPU: 11 PID: 29235 Comm: dmsetup Not tainted 3.10.4 #3
       Hardware name: Supermicro X8DTL/X8DTL, BIOS 2.1a       12/30/2011
        000000000010c0d0 ffff880090941898 ffffffff81387ab4 ffff880090941928
        ffffffff810bb26f 0000000000000009 000000000010c0d0 ffff880090941928
        ffffffff81385dbc ffffffff815f3840 ffffffff00000000 000002000010c0d0
       Call Trace:
        [<ffffffff81387ab4>] dump_stack+0x19/0x1b
        [<ffffffff810bb26f>] warn_alloc_failed+0x110/0x124
        [<ffffffff81385dbc>] ? __alloc_pages_direct_compact+0x17c/0x18e
        [<ffffffff810bda2e>] __alloc_pages_nodemask+0x6c7/0x75e
        [<ffffffff810bdad7>] __get_free_pages+0x12/0x3f
        [<ffffffff810ea148>] kmalloc_order_trace+0x29/0x88
        [<ffffffff810ec1fd>] __kmalloc+0x36/0x11b
        [<ffffffffa031eeed>] ? mq_create+0x1dc/0x2cf [dm_cache_mq]
        [<ffffffffa031efc0>] mq_create+0x2af/0x2cf [dm_cache_mq]
        [<ffffffffa0314605>] dm_cache_policy_create+0xa7/0xd2 [dm_cache]
        [<ffffffffa0312530>] ? cache_ctr+0x245/0xa13 [dm_cache]
        [<ffffffffa031263e>] cache_ctr+0x353/0xa13 [dm_cache]
        [<ffffffffa012b916>] dm_table_add_target+0x227/0x2ce [dm_mod]
        [<ffffffffa012e8e4>] table_load+0x286/0x2ac [dm_mod]
        [<ffffffffa012e65e>] ? dev_wait+0x8a/0x8a [dm_mod]
        [<ffffffffa012e324>] ctl_ioctl+0x39a/0x3c2 [dm_mod]
        [<ffffffffa012e35a>] dm_ctl_ioctl+0xe/0x12 [dm_mod]
        [<ffffffff81101181>] vfs_ioctl+0x21/0x34
        [<ffffffff811019d3>] do_vfs_ioctl+0x3b1/0x3f4
        [<ffffffff810f4d2e>] ? ____fput+0x9/0xb
        [<ffffffff81050b6c>] ? task_work_run+0x7e/0x92
        [<ffffffff81101a68>] SyS_ioctl+0x52/0x82
        [<ffffffff81391d92>] system_call_fastpath+0x16/0x1b
      Signed-off-by: default avatarHeinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ca6292e4
    • Akinobu Mita's avatar
      bio: fix argument of __bio_add_page() for max_sectors > 0xffff · 52c31210
      Akinobu Mita authored
      commit 34f2fd8d upstream.
      
      The data type of max_sectors and max_hw_sectors in queue settings are
      unsigned int.  But these values are passed to __bio_add_page() as an
      argument whose data type is unsigned short.  In the worst case such as
      max_sectors is 0x10000, bio_add_page() can't add a page and IOs can't
      proceed.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAkinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      52c31210
    • James Smart's avatar
      lpfc: Fix scsi prep dma buf error. · 62d80e38
      James Smart authored
      commit 5116fbf1 upstream.
      
      Didn't check for less-than-or-equal zero. Means we may later call
      scsi_dma_unmap() even though we don't have valid mappings.
      Signed-off-by: default avatarDick Kennedy <dick.kennedy@avagotech.com>
      Signed-off-by: default avatarJames Smart <james.smart@avagotech.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      62d80e38
    • Shirish Pargaonkar's avatar
      cifs: Send a logoff request before removing a smb session · a67172a0
      Shirish Pargaonkar authored
      commit 7f48558e upstream.
      
      Send a smb session logoff request before removing smb session off of the list.
      On a signed smb session, remvoing a session off of the list before sending
      a logoff request results in server returning an error for lack of
      smb signature.
      
      Never seen an error during smb logoff, so as per MS-SMB2 3.2.5.1,
      not sure how an error during logoff should be retried. So for now,
      if a server returns an error to a logoff request, log the error and
      remove the session off of the list.
      Signed-off-by: default avatarShirish Pargaonkar <shirishpargaonkar@gmail.com>
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a67172a0
  4. 31 Aug, 2015 1 commit
  5. 27 Aug, 2015 21 commits
    • Dan Carpenter's avatar
      rds: fix an integer overflow test in rds_info_getsockopt() · e19d1363
      Dan Carpenter authored
      [ Upstream commit 468b732b ]
      
      "len" is a signed integer.  We check that len is not negative, so it
      goes from zero to INT_MAX.  PAGE_SIZE is unsigned long so the comparison
      is type promoted to unsigned long.  ULONG_MAX - 4095 is a higher than
      INT_MAX so the condition can never be true.
      
      I don't know if this is harmful but it seems safe to limit "len" to
      INT_MAX - 4095.
      
      Fixes: a8c879a7 ('RDS: Info and stats')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e19d1363
    • Jack Morgenstein's avatar
      net/mlx4_core: Fix wrong index in propagating port change event to VFs · 80bcc280
      Jack Morgenstein authored
      [ Upstream commit 1c1bf349 ]
      
      The port-change event processing in procedure mlx4_eq_int() uses "slave"
      as the vf_oper array index. Since the value of "slave" is the PF function
      index, the result is that the PF link state is used for deciding to
      propagate the event for all the VFs. The VF link state should be used,
      so the VF function index should be used here.
      
      Fixes: 948e306d ('net/mlx4: Add VF link state support')
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      80bcc280
    • Florian Westphal's avatar
      netlink: don't hold mutex in rcu callback when releasing mmapd ring · 6056cd78
      Florian Westphal authored
      [ Upstream commit 0470eb99 ]
      
      Kirill A. Shutemov says:
      
      This simple test-case trigers few locking asserts in kernel:
      
      int main(int argc, char **argv)
      {
              unsigned int block_size = 16 * 4096;
              struct nl_mmap_req req = {
                      .nm_block_size          = block_size,
                      .nm_block_nr            = 64,
                      .nm_frame_size          = 16384,
                      .nm_frame_nr            = 64 * block_size / 16384,
              };
              unsigned int ring_size;
      	int fd;
      
      	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
              if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
                      exit(1);
              if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
                      exit(1);
      
      	ring_size = req.nm_block_nr * req.nm_block_size;
      	mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
      	return 0;
      }
      
      +++ exited with 0 +++
      BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
      in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
      3 locks held by init/1:
       #0:  (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220
       #1:  ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70
       #2:  (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0
      Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20
      
      CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
       ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102
       0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002
       ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98
      Call Trace:
       <IRQ>  [<ffffffff81929ceb>] dump_stack+0x4f/0x7b
       [<ffffffff81085a9d>] ___might_sleep+0x16d/0x270
       [<ffffffff81085bed>] __might_sleep+0x4d/0x90
       [<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430
       [<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
       [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20
       [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350
       [<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70
       [<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150
       [<ffffffff817e484d>] __sk_free+0x1d/0x160
       [<ffffffff817e49a9>] sk_free+0x19/0x20
      [..]
      
      Cong Wang says:
      
      We can't hold mutex lock in a rcu callback, [..]
      
      Thomas Graf says:
      
      The socket should be dead at this point. It might be simpler to
      add a netlink_release_ring() function which doesn't require
      locking at all.
      Reported-by: default avatar"Kirill A. Shutemov" <kirill@shutemov.name>
      Diagnosed-by: default avatarCong Wang <cwang@twopensource.com>
      Suggested-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6056cd78
    • Edward Hyunkoo Jee's avatar
      inet: frags: fix defragmented packet's IP header for af_packet · 76c4d538
      Edward Hyunkoo Jee authored
      [ Upstream commit 0848f642 ]
      
      When ip_frag_queue() computes positions, it assumes that the passed
      sk_buff does not contain L2 headers.
      
      However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
      functions can be called on outgoing packets that contain L2 headers.
      
      Also, IPv4 checksum is not corrected after reassembly.
      
      Fixes: 7736d33f ("packet: Add pre-defragmentation support for ipv4 fanouts.")
      Signed-off-by: default avatarEdward Hyunkoo Jee <edjee@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Jerry Chu <hkchu@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      76c4d538
    • dingtianhong's avatar
      bonding: correct the MAC address for "follow" fail_over_mac policy · 0068a2e0
      dingtianhong authored
      [ Upstream commit a951bc1e ]
      
      The "follow" fail_over_mac policy is useful for multiport devices that
      either become confused or incur a performance penalty when multiple
      ports are programmed with the same MAC address, but the same MAC
      address still may happened by this steps for this policy:
      
      1) echo +eth0 > /sys/class/net/bond0/bonding/slaves
         bond0 has the same mac address with eth0, it is MAC1.
      
      2) echo +eth1 > /sys/class/net/bond0/bonding/slaves
         eth1 is backup, eth1 has MAC2.
      
      3) ifconfig eth0 down
         eth1 became active slave, bond will swap MAC for eth0 and eth1,
         so eth1 has MAC1, and eth0 has MAC2.
      
      4) ifconfig eth1 down
         there is no active slave, and eth1 still has MAC1, eth2 has MAC2.
      
      5) ifconfig eth0 up
         the eth0 became active slave again, the bond set eth0 to MAC1.
      
      Something wrong here, then if you set eth1 up, the eth0 and eth1 will have the same
      MAC address, it will break this policy for ACTIVE_BACKUP mode.
      
      This patch will fix this problem by finding the old active slave and
      swap them MAC address before change active slave.
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Tested-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0068a2e0
    • Nikolay Aleksandrov's avatar
      bonding: fix destruction of bond with devices different from arphrd_ether · 78325611
      Nikolay Aleksandrov authored
      [ Upstream commit 06f6d109 ]
      
      When the bonding is being unloaded and the netdevice notifier is
      unregistered it executes NETDEV_UNREGISTER for each device which should
      remove the bond's proc entry but if the device enslaved is not of
      ARPHRD_ETHER type and is in front of the bonding, it may execute
      bond_release_and_destroy() first which would release the last slave and
      destroy the bond device leaving the proc entry and thus we will get the
      following error (with dynamic debug on for bond_netdev_event to see the
      events order):
      [  908.963051] eql: event: 9
      [  908.963052] eql: IFF_SLAVE
      [  908.963054] eql: event: 2
      [  908.963056] eql: IFF_SLAVE
      [  908.963058] eql: event: 6
      [  908.963059] eql: IFF_SLAVE
      [  908.963110] bond0: Releasing active interface eql
      [  908.976168] bond0: Destroying bond bond0
      [  908.976266] bond0 (unregistering): Released all slaves
      [  908.984097] ------------[ cut here ]------------
      [  908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
      remove_proc_entry+0x112/0x160()
      [  908.984110] remove_proc_entry: removing non-empty directory
      'net/bonding', leaking at least 'bond0'
      [  908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
      oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
      crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
      snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
      gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
      psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
      drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
      pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
      autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
      ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
      virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
      usb_common [last unloaded: bonding]
      
      [  908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G        W  O
      4.2.0-rc2+ #8
      [  908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  908.984172]  0000000000000000 ffffffff81732d41 ffffffff81525b34
      ffff8800358dfda8
      [  908.984175]  ffffffff8106c521 ffff88003595af78 ffff88003595af40
      ffff88003e3a4280
      [  908.984178]  ffffffffa058d040 0000000000000000 ffffffff8106c59a
      ffffffff8172ebd0
      [  908.984181] Call Trace:
      [  908.984188]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
      [  908.984193]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
      [  908.984196]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
      [  908.984199]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
      [  908.984205]  [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30
      [bonding]
      [  908.984208]  [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding]
      [  908.984217]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
      [  908.984225]  [<ffffffff8142f52d>] ?
      unregister_pernet_operations+0x8d/0xd0
      [  908.984228]  [<ffffffff8142f58d>] ?
      unregister_pernet_subsys+0x1d/0x30
      [  908.984232]  [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding]
      [  908.984236]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
      [  908.984241]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
      [  908.984244]  [<ffffffff8152b732>] ?
      entry_SYSCALL_64_fastpath+0x16/0x75
      [  908.984247] ---[ end trace 7c006ed4abbef24b ]---
      
      Thus remove the proc entry manually if bond_release_and_destroy() is
      used. Because of the checks in bond_remove_proc_entry() it's not a
      problem for a bond device to change namespaces (the bug fixed by the
      Fixes commit) but since commit
      f9399814 ("bonding: Don't allow bond devices to change network
      namespaces.") that can't happen anyway.
      Reported-by: default avatarCarol Soto <clsoto@linux.vnet.ibm.com>
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Fixes: a64d49c3 ("bonding: Manage /proc/net/bonding/ entries from
                            the netdev events")
      Tested-by: default avatarCarol L Soto <clsoto@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      78325611
    • Eric Dumazet's avatar
      ipv6: lock socket in ip6_datagram_connect() · b29321b9
      Eric Dumazet authored
      [ Upstream commit 03645a11 ]
      
      ip6_datagram_connect() is doing a lot of socket changes without
      socket being locked.
      
      This looks wrong, at least for udp_lib_rehash() which could corrupt
      lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b29321b9
    • Tilman Schmidt's avatar
      isdn/gigaset: reset tty->receive_room when attaching ser_gigaset · 4a187391
      Tilman Schmidt authored
      [ Upstream commit fd98e941 ]
      
      Commit 79901317 ("n_tty: Don't flush buffer when closing ldisc"),
      first merged in kernel release 3.10, caused the following regression
      in the Gigaset M101 driver:
      
      Before that commit, when closing the N_TTY line discipline in
      preparation to switching to N_GIGASET_M101, receive_room would be
      reset to a non-zero value by the call to n_tty_flush_buffer() in
      n_tty's close method. With the removal of that call, receive_room
      might be left at zero, blocking data reception on the serial line.
      
      The present patch fixes that regression by setting receive_room
      to an appropriate value in the ldisc open method.
      
      Fixes: 79901317 ("n_tty: Don't flush buffer when closing ldisc")
      Signed-off-by: default avatarTilman Schmidt <tilman@imap.cc>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4a187391
    • Nikolay Aleksandrov's avatar
      bridge: mdb: fix double add notification · eefc4b88
      Nikolay Aleksandrov authored
      [ Upstream commit 5ebc7846 ]
      
      Since the mdb add/del code was introduced there have been 2 br_mdb_notify
      calls when doing br_mdb_add() resulting in 2 notifications on each add.
      
      Example:
       Command: bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
       Before patch:
       root@debian:~# bridge monitor all
       [MDB]dev br0 port eth1 grp 239.0.0.1 permanent
       [MDB]dev br0 port eth1 grp 239.0.0.1 permanent
      
       After patch:
       root@debian:~# bridge monitor all
       [MDB]dev br0 port eth1 grp 239.0.0.1 permanent
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Fixes: cfd56754 ("bridge: add support of adding and deleting mdb entries")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      eefc4b88
    • Herbert Xu's avatar
      net: Fix skb_set_peeked use-after-free bug · 9b87cdee
      Herbert Xu authored
      [ Upstream commit a0a2a660 ]
      
      The commit 738ac1eb ("net: Clone
      skb before setting peeked flag") introduced a use-after-free bug
      in skb_recv_datagram.  This is because skb_set_peeked may create
      a new skb and free the existing one.  As it stands the caller will
      continue to use the old freed skb.
      
      This patch fixes it by making skb_set_peeked return the new skb
      (or the old one if unchanged).
      
      Fixes: 738ac1eb ("net: Clone skb before setting peeked flag")
      Reported-by: default avatarBrenden Blanco <bblanco@plumgrid.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Tested-by: default avatarBrenden Blanco <bblanco@plumgrid.com>
      Reviewed-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9b87cdee
    • Herbert Xu's avatar
      net: Fix skb csum races when peeking · 75ebad1b
      Herbert Xu authored
      [ Upstream commit 89c22d8c ]
      
      When we calculate the checksum on the recv path, we store the
      result in the skb as an optimisation in case we need the checksum
      again down the line.
      
      This is in fact bogus for the MSG_PEEK case as this is done without
      any locking.  So multiple threads can peek and then store the result
      to the same skb, potentially resulting in bogus skb states.
      
      This patch fixes this by only storing the result if the skb is not
      shared.  This preserves the optimisations for the few cases where
      it can be done safely due to locking or other reasons, e.g., SIOCINQ.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      75ebad1b
    • Herbert Xu's avatar
      net: Clone skb before setting peeked flag · 339b0b16
      Herbert Xu authored
      [ Upstream commit 738ac1eb ]
      
      Shared skbs must not be modified and this is crucial for broadcast
      and/or multicast paths where we use it as an optimisation to avoid
      unnecessary cloning.
      
      The function skb_recv_datagram breaks this rule by setting peeked
      without cloning the skb first.  This causes funky races which leads
      to double-free.
      
      This patch fixes this by cloning the skb and replacing the skb
      in the list when setting skb->peeked.
      
      Fixes: a59322be ("[UDP]: Only increment counter on first peek/recv")
      Reported-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      339b0b16
    • Julian Anastasov's avatar
      net: call rcu_read_lock early in process_backlog · d5f8ff93
      Julian Anastasov authored
      [ Upstream commit 2c17d27c ]
      
      Incoming packet should be either in backlog queue or
      in RCU read-side section. Otherwise, the final sequence of
      flush_backlog() and synchronize_net() may miss packets
      that can run without device reference:
      
      CPU 1                  CPU 2
                             skb->dev: no reference
                             process_backlog:__skb_dequeue
                             process_backlog:local_irq_enable
      
      on_each_cpu for
      flush_backlog =>       IPI(hardirq): flush_backlog
                             - packet not found in backlog
      
                             CPU delayed ...
      synchronize_net
      - no ongoing RCU
      read-side sections
      
      netdev_run_todo,
      rcu_barrier: no
      ongoing callbacks
                             __netif_receive_skb_core:rcu_read_lock
                             - too late
      free dev
                             process packet for freed dev
      
      Fixes: 6e583ce5 ("net: eliminate refcounting in backlog queue")
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d5f8ff93
    • Julian Anastasov's avatar
      net: do not process device backlog during unregistration · f6533fed
      Julian Anastasov authored
      [ Upstream commit e9e4dd32 ]
      
      commit 381c759d ("ipv4: Avoid crashing in ip_error")
      fixes a problem where processed packet comes from device
      with destroyed inetdev (dev->ip_ptr). This is not expected
      because inetdev_destroy is called in NETDEV_UNREGISTER
      phase and packets should not be processed after
      dev_close_many() and synchronize_net(). Above fix is still
      required because inetdev_destroy can be called for other
      reasons. But it shows the real problem: backlog can keep
      packets for long time and they do not hold reference to
      device. Such packets are then delivered to upper levels
      at the same time when device is unregistered.
      Calling flush_backlog after NETDEV_UNREGISTER_FINAL still
      accounts all packets from backlog but before that some packets
      continue to be delivered to upper levels long after the
      synchronize_net call which is supposed to wait the last
      ones. Also, as Eric pointed out, processed packets, mostly
      from other devices, can continue to add new packets to backlog.
      
      Fix the problem by moving flush_backlog early, after the
      device driver is stopped and before the synchronize_net() call.
      Then use netif_running check to make sure we do not add more
      packets to backlog. We have to do it in enqueue_to_backlog
      context when the local IRQ is disabled. As result, after the
      flush_backlog and synchronize_net sequence all packets
      should be accounted.
      
      Thanks to Eric W. Biederman for the test script and his
      valuable feedback!
      Reported-by: default avatarVittorio Gambaletta <linuxbugs@vittgam.net>
      Fixes: 6e583ce5 ("net: eliminate refcounting in backlog queue")
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f6533fed
    • Oleg Nesterov's avatar
      net: pktgen: fix race between pktgen_thread_worker() and kthread_stop() · 6f137d15
      Oleg Nesterov authored
      [ Upstream commit fecdf8be ]
      
      pktgen_thread_worker() is obviously racy, kthread_stop() can come
      between the kthread_should_stop() check and set_current_state().
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reported-by: default avatarJan Stancek <jstancek@redhat.com>
      Reported-by: default avatarMarcelo Leitner <mleitner@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6f137d15
    • Nikolay Aleksandrov's avatar
      bridge: mdb: zero out the local br_ip variable before use · 10851921
      Nikolay Aleksandrov authored
      [ Upstream commit f1158b74 ]
      
      Since commit b0e9a30d ("bridge: Add vlan id to multicast groups")
      there's a check in br_ip_equal() for a matching vlan id, but the mdb
      functions were not modified to use (or at least zero it) so when an
      entry was added it would have a garbage vlan id (from the local br_ip
      variable in __br_mdb_add/del) and this would prevent it from being
      matched and also deleted. So zero out the whole local ip var to protect
      ourselves from future changes and also to fix the current bug, since
      there's no vlan id support in the mdb uapi - use always vlan id 0.
      Example before patch:
      root@debian:~# bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
      root@debian:~# bridge mdb
      dev br0 port eth1 grp 239.0.0.1 permanent
      root@debian:~# bridge mdb del dev br0 port eth1 grp 239.0.0.1 permanent
      RTNETLINK answers: Invalid argument
      
      After patch:
      root@debian:~# bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
      root@debian:~# bridge mdb
      dev br0 port eth1 grp 239.0.0.1 permanent
      root@debian:~# bridge mdb del dev br0 port eth1 grp 239.0.0.1 permanent
      root@debian:~# bridge mdb
      Signed-off-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Fixes: b0e9a30d ("bridge: Add vlan id to multicast groups")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      10851921
    • Stephen Smalley's avatar
      net/tipc: initialize security state for new connection socket · f5aeec43
      Stephen Smalley authored
      [ Upstream commit fdd75ea8 ]
      
      Calling connect() with an AF_TIPC socket would trigger a series
      of error messages from SELinux along the lines of:
      SELinux: Invalid class 0
      type=AVC msg=audit(1434126658.487:34500): avc:  denied  { <unprintable> }
        for pid=292 comm="kworker/u16:5" scontext=system_u:system_r:kernel_t:s0
        tcontext=system_u:object_r:unlabeled_t:s0 tclass=<unprintable>
        permissive=0
      
      This was due to a failure to initialize the security state of the new
      connection sock by the tipc code, leaving it with junk in the security
      class field and an unlabeled secid.  Add a call to security_sk_clone()
      to inherit the security state from the parent socket.
      Reported-by: default avatarTim Shearer <tim.shearer@overturenetworks.com>
      Signed-off-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Acked-by: default avatarPaul Moore <paul@paul-moore.com>
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f5aeec43
    • Timo Teräs's avatar
      ip_tunnel: fix ipv4 pmtu check to honor inner ip header df · 18213234
      Timo Teräs authored
      [ Upstream commit fc24f2b2 ]
      
      Frag needed should be sent only if the inner header asked
      to not fragment. Currently fragmentation is broken if the
      tunnel has df set, but df was not asked in the original
      packet. The tunnel's df needs to be still checked to update
      internally the pmtu cache.
      
      Commit 23a3647b broke it, and this commit fixes
      the ipv4 df check back to the way it was.
      
      Fixes: 23a3647b ("ip_tunnels: Use skb-len to PMTU check.")
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: default avatarTimo Teräs <timo.teras@iki.fi>
      Acked-by: default avatarPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      18213234
    • Daniel Borkmann's avatar
      rtnetlink: verify IFLA_VF_INFO attributes before passing them to driver · 9e7281c3
      Daniel Borkmann authored
      [ Upstream commit 4f7d2cdf ]
      
      Jason Gunthorpe reported that since commit c02db8c6 ("rtnetlink: make
      SR-IOV VF interface symmetric"), we don't verify IFLA_VF_INFO attributes
      anymore with respect to their policy, that is, ifla_vfinfo_policy[].
      
      Before, they were part of ifla_policy[], but they have been nested since
      placed under IFLA_VFINFO_LIST, that contains the attribute IFLA_VF_INFO,
      which is another nested attribute for the actual VF attributes such as
      IFLA_VF_MAC, IFLA_VF_VLAN, etc.
      
      Despite the policy being split out from ifla_policy[] in this commit,
      it's never applied anywhere. nla_for_each_nested() only does basic nla_ok()
      testing for struct nlattr, but it doesn't know about the data context and
      their requirements.
      
      Fix, on top of Jason's initial work, does 1) parsing of the attributes
      with the right policy, and 2) using the resulting parsed attribute table
      from 1) instead of the nla_for_each_nested() loop (just like we used to
      do when still part of ifla_policy[]).
      
      Reference: http://thread.gmane.org/gmane.linux.network/368913
      Fixes: c02db8c6 ("rtnetlink: make SR-IOV VF interface symmetric")
      Reported-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
      Cc: Greg Rose <gregory.v.rose@intel.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Rony Efraim <ronye@mellanox.com>
      Cc: Vlad Zolotarov <vladz@cloudius-systems.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarVlad Zolotarov <vladz@cloudius-systems.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9e7281c3
    • Eric Dumazet's avatar
      net: graceful exit from netif_alloc_netdev_queues() · 161a5358
      Eric Dumazet authored
      [ Upstream commit d339727c ]
      
      User space can crash kernel with
      
      ip link add ifb10 numtxqueues 100000 type ifb
      
      We must replace a BUG_ON() by proper test and return -EINVAL for
      crazy values.
      
      Fixes: 60877a32 ("net: allow large number of tx queues")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      161a5358
    • Angga's avatar
      ipv6: Make MLD packets to only be processed locally · e27e696c
      Angga authored
      [ Upstream commit 4c938d22 ]
      
      Before commit daad1512 ("ipv6: Make ipv6_is_mld() inline and use it
      from ip6_mc_input().") MLD packets were only processed locally. After the
      change, a copy of MLD packet goes through ip6_mr_input, causing
      MRT6MSG_NOCACHE message to be generated to user space.
      
      Make MLD packet only processed locally.
      
      Fixes: daad1512 ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().")
      Signed-off-by: default avatarHermin Anggawijaya <hermin.anggawijaya@alliedtelesis.co.nz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e27e696c
  6. 25 Aug, 2015 10 commits
    • Dave Airlie's avatar
      drm/radeon: fix hotplug race at startup · 76c74a6e
      Dave Airlie authored
      commit 7f98ca45 upstream.
      
      We apparantly get a hotplug irq before we've initialised
      modesetting,
      
      [drm] Loading R100 Microcode
      BUG: unable to handle kernel NULL pointer dereference at   (null)
      IP: [<c125f56f>] __mutex_lock_slowpath+0x23/0x91
      *pde = 00000000
      Oops: 0002 [#1]
      Modules linked in: radeon(+) drm_kms_helper ttm drm i2c_algo_bit backlight pcspkr psmouse evdev sr_mod input_leds led_class cdrom sg parport_pc parport floppy intel_agp intel_gtt lpc_ich acpi_cpufreq processor button mfd_core agpgart uhci_hcd ehci_hcd rng_core snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm usbcore usb_common i2c_i801 i2c_core snd_timer snd soundcore thermal_sys
      CPU: 0 PID: 15 Comm: kworker/0:1 Not tainted 4.2.0-rc7-00015-gbf674028 #111
      Hardware name: MicroLink                               /D850MV                         , BIOS MV85010A.86A.0067.P24.0304081124 04/08/2003
      Workqueue: events radeon_hotplug_work_func [radeon]
      task: f6ca5900 ti: f6d3e000 task.ti: f6d3e000
      EIP: 0060:[<c125f56f>] EFLAGS: 00010282 CPU: 0
      EIP is at __mutex_lock_slowpath+0x23/0x91
      EAX: 00000000 EBX: f5e900fc ECX: 00000000 EDX: fffffffe
      ESI: f6ca5900 EDI: f5e90100 EBP: f5e90000 ESP: f6d3ff0c
       DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
      CR0: 8005003b CR2: 00000000 CR3: 36f61000 CR4: 000006d0
      Stack:
       f5e90100 00000000 c103c4c1 f6d2a5a0 f5e900fc f6df394c c125f162 f8b0faca
       f6d2a5a0 c138ca00 f6df394c f7395600 c1034741 00d40000 00000000 f6d2a5a0
       c138ca00 f6d2a5b8 c138ca10 c1034b58 00000001 f6d40000 f6ca5900 f6d0c940
      Call Trace:
       [<c103c4c1>] ? dequeue_task_fair+0xa4/0xb7
       [<c125f162>] ? mutex_lock+0x9/0xa
       [<f8b0faca>] ? radeon_hotplug_work_func+0x17/0x57 [radeon]
       [<c1034741>] ? process_one_work+0xfc/0x194
       [<c1034b58>] ? worker_thread+0x18d/0x218
       [<c10349cb>] ? rescuer_thread+0x1d5/0x1d5
       [<c103742a>] ? kthread+0x7b/0x80
       [<c12601c0>] ? ret_from_kernel_thread+0x20/0x30
       [<c10373af>] ? init_completion+0x18/0x18
      Code: 42 08 e8 8e a6 dd ff c3 57 56 53 83 ec 0c 8b 35 48 f7 37 c1 8b 10 4a 74 1a 89 c3 8d 78 04 8b 40 08 89 63
      Reported-and-Tested-by: default avatarMeelis Roos <mroos@linux.ee>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      76c74a6e
    • Mika Westerberg's avatar
      mfd: lpc_ich: Assign subdevice ids automatically · 350624fa
      Mika Westerberg authored
      commit 1abf25a2 upstream.
      
      Using -1 as platform device id means that the platform driver core will not
      assign any id to the device (the device name will not have id at all). This
      results problems on systems that have multiple PCHs (Platform Controller
      HUBs) because all of them also include their own copy of LPC device.
      
      All the subsequent device creations will fail because there already exists
      platform device with the same name.
      
      Fix this by passing PLATFORM_DEVID_AUTO as platform device id. This makes
      the platform device core to allocate new ids automatically.
      Signed-off-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      350624fa
    • Jiri Slaby's avatar
      Linux 3.12.47 · 5f6177d6
      Jiri Slaby authored
      5f6177d6
    • Ilya Dryomov's avatar
      rbd: fix copyup completion race · eca3a901
      Ilya Dryomov authored
      commit 2761713d upstream.
      
      For write/discard obj_requests that involved a copyup method call, the
      opcode of the first op is CEPH_OSD_OP_CALL and the ->callback is
      rbd_img_obj_copyup_callback().  The latter frees copyup pages, sets
      ->xferred and delegates to rbd_img_obj_callback(), the "normal" image
      object callback, for reporting to block layer and putting refs.
      
      rbd_osd_req_callback() however treats CEPH_OSD_OP_CALL as a trivial op,
      which means obj_request is marked done in rbd_osd_trivial_callback(),
      *before* ->callback is invoked and rbd_img_obj_copyup_callback() has
      a chance to run.  Marking obj_request done essentially means giving
      rbd_img_obj_callback() a license to end it at any moment, so if another
      obj_request from the same img_request is being completed concurrently,
      rbd_img_obj_end_request() may very well be called on such prematurally
      marked done request:
      
      <obj_request-1/2 reply>
      handle_reply()
        rbd_osd_req_callback()
          rbd_osd_trivial_callback()
          rbd_obj_request_complete()
          rbd_img_obj_copyup_callback()
          rbd_img_obj_callback()
                                          <obj_request-2/2 reply>
                                          handle_reply()
                                            rbd_osd_req_callback()
                                              rbd_osd_trivial_callback()
            for_each_obj_request(obj_request->img_request) {
              rbd_img_obj_end_request(obj_request-1/2)
              rbd_img_obj_end_request(obj_request-2/2) <--
            }
      
      Calling rbd_img_obj_end_request() on such a request leads to trouble,
      in particular because its ->xfferred is 0.  We report 0 to the block
      layer with blk_update_request(), get back 1 for "this request has more
      data in flight" and then trip on
      
          rbd_assert(more ^ (which == img_request->obj_request_count));
      
      with rhs (which == ...) being 1 because rbd_img_obj_end_request() has
      been called for both requests and lhs (more) being 1 because we haven't
      got a chance to set ->xfferred in rbd_img_obj_copyup_callback() yet.
      
      To fix this, leverage that rbd wants to call class methods in only two
      cases: one is a generic method call wrapper (obj_request is standalone)
      and the other is a copyup (obj_request is part of an img_request).  So
      make a dedicated handler for CEPH_OSD_OP_CALL and directly invoke
      rbd_img_obj_copyup_callback() from it if obj_request is part of an
      img_request, similar to how CEPH_OSD_OP_READ handler invokes
      rbd_img_obj_request_read_callback().
      
      Since rbd_img_obj_copyup_callback() is now being called from the OSD
      request callback (only), it is renamed to rbd_osd_copyup_callback().
      
      Cc: Alex Elder <elder@linaro.org>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarAlex Elder <elder@linaro.org>
      [idryomov@gmail.com: backport to < 3.18: context]
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      eca3a901
    • Alex Deucher's avatar
      drm/radeon: add new OLAND pci id · f81a12e9
      Alex Deucher authored
      commit e037239e upstream.
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f81a12e9
    • Michael Walle's avatar
      EDAC, ppc4xx: Access mci->csrows array elements properly · 2a1fd7df
      Michael Walle authored
      commit 5c16179b upstream.
      
      The commit
      
        de3910eb ("edac: change the mem allocation scheme to
      		 make Documentation/kobject.txt happy")
      
      changed the memory allocation for the csrows member. But ppc4xx_edac was
      forgotten in the patch. Fix it.
      Signed-off-by: default avatarMichael Walle <michael@walle.cc>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Link: http://lkml.kernel.org/r/1437469253-8611-1-git-send-email-michael@walle.ccSigned-off-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2a1fd7df
    • Richard Weinberger's avatar
      localmodconfig: Use Kbuild files too · 6713bcb7
      Richard Weinberger authored
      commit c0ddc8c7 upstream.
      
      In kbuild it is allowed to define objects in files named "Makefile"
      and "Kbuild".
      Currently localmodconfig reads objects only from "Makefile"s and misses
      modules like nouveau.
      
      Link: http://lkml.kernel.org/r/1437948415-16290-1-git-send-email-richard@nod.atReported-and-tested-by: default avatarLeonidas Spyropoulos <artafinde@gmail.com>
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6713bcb7
    • Joe Thornber's avatar
      dm thin metadata: delete btrees when releasing metadata snapshot · 1d124d81
      Joe Thornber authored
      commit 7f518ad0 upstream.
      
      The device details and mapping trees were just being decremented
      before.  Now btree_del() is called to do a deep delete.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1d124d81
    • Peter Zijlstra's avatar
      perf: Fix fasync handling on inherited events · 78010f59
      Peter Zijlstra authored
      commit fed66e2c upstream.
      
      Vince reported that the fasync signal stuff doesn't work proper for
      inherited events. So fix that.
      
      Installing fasync allocates memory and sets filp->f_flags |= FASYNC,
      which upon the demise of the file descriptor ensures the allocation is
      freed and state is updated.
      
      Now for perf, we can have the events stick around for a while after the
      original FD is dead because of references from child events. So we
      cannot copy the fasync pointer around. We can however consistently use
      the parent's fasync, as that will be updated.
      Reported-and-Tested-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho deMelo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1434011521.1495.71.camel@twinsSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      78010f59
    • Bob Liu's avatar
      xen-blkfront: don't add indirect pages to list when !feature_persistent · 78e034d3
      Bob Liu authored
      commit 7b076750 upstream.
      
      We should consider info->feature_persistent when adding indirect page to list
      info->indirect_pages, else the BUG_ON() in blkif_free() would be triggered.
      
      When we are using persistent grants the indirect_pages list
      should always be empty because blkfront has pre-allocated enough
      persistent pages to fill all requests on the ring.
      Acked-by: default avatarRoger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: default avatarBob Liu <bob.liu@oracle.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      78e034d3