1. 21 Nov, 2017 12 commits
  2. 18 Nov, 2017 28 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.9.63 · ea88d5c5
      Greg Kroah-Hartman authored
      ea88d5c5
    • Willy Tarreau's avatar
      misc: panel: properly restore atomic counter on error path · e81b96ca
      Willy Tarreau authored
      commit 93dc1774 upstream.
      
      Commit f4757af8 ("staging: panel: Fix single-open policy race condition")
      introduced in 3.19-rc1 attempted to fix a race condition on the open, but
      failed to properly do it and used to exit without restoring the semaphore.
      
      This results in -EBUSY being returned after the first open error until
      the module is reloaded or the system restarted (ie: consecutive to a
      dual open resulting in -EBUSY or to a permission error).
      
      Fixes: f4757af8 # 3.19-rc1
      Cc: Mariusz Gorski <marius.gorski@gmail.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      [wt: driver is in misc/panel in 4.9]
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e81b96ca
    • Nicholas Bellinger's avatar
      qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT (v2) · b2dbcb7c
      Nicholas Bellinger authored
      commit 6bcbb317 upstream.
      
      This patch drops two incorrect usages of tcm_qla2xxx_free_cmd()
      during TMR ABORT within tcm_qla2xxx_handle_data_work() and
      tcm_qla2xxx_aborted_task(), which where attempting to dispatch
      into workqueue context to do tcm_qla2xxx_complete_free() and
      subsequently invoke transport_generic_free_cmd().
      
      This is incorrect because during TMR ABORT target-core will
      drop the outstanding se_cmd->cmd_kref references once it has
      quiesced the se_cmd via transport_wait_for_tasks(), and in
      the case of qla2xxx it should not attempt to do it's own
      transport_generic_free_cmd() once the abort has occured.
      
      As reported by Pascal, this was originally manifesting as a
      BUG_ON(cmd->cmd_in_wq) in qlt_free_cmd() during TMR ABORT,
      with a LIO backend that had sufficently high enough WRITE
      latency to trigger a host side TMR ABORT_TASK.
      
      (v2: Drop the qla_tgt_cmd->write_pending_abort_comp changes,
           as they will be addressed in a seperate series)
      Reported-by: default avatarPascal de Bruijn <p.debruijn@unilogic.nl>
      Tested-by: default avatarPascal de Bruijn <p.debruijn@unilogic.nl>
      Cc: Pascal de Bruijn <p.debruijn@unilogic.nl>
      Reported-by: default avatarLukasz Engel <lukasz.engel@softax.pl>
      Cc: Lukasz Engel <lukasz.engel@softax.pl>
      Acked-by: default avatarHimanshu Madhani <himanshu.madhani@cavium.com>
      Cc: Quinn Tran <quinn.tran@cavium.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      b2dbcb7c
    • Bart Van Assche's avatar
      target/iscsi: Fix iSCSI task reassignment handling · ff492718
      Bart Van Assche authored
      commit 59b6986d upstream.
      
      Allocate a task management request structure for all task management
      requests, including task reassignment. This change avoids that the
      se_tmr->response assignment dereferences an uninitialized se_tmr
      pointer.
      Reported-by: default avatarMoshe David <mdavid@infinidat.com>
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: Moshe David <mdavid@infinidat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff492718
    • Chi-hsien Lin's avatar
      brcmfmac: remove setting IBSS mode when stopping AP · e7c9ca5a
      Chi-hsien Lin authored
      commit 9029679f upstream.
      
      Upon stopping an AP interface the driver disable INFRA mode effectively
      setting the interface in IBSS mode. However, this may affect other
      interfaces running in INFRA mode. For instance, if user creates and stops
      hostap daemon on virtual interface, then association cannot work on
      primary interface because default BSS has been set to IBSS mode in
      firmware side. The IBSS mode should be set when cfg80211 changes the
      interface.
      Reviewed-by: default avatarWright Feng <wright.feng@cypress.com>
      Signed-off-by: default avatarChi-hsien Lin <Chi-Hsien.Lin@cypress.com>
      [kvalo@codeaurora.org: rephased commit log based on discussion]
      Signed-off-by: default avatarWright Feng <wright.feng@cypress.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Cc: Philipp Rosenberger <p.rosenberger@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e7c9ca5a
    • Bilal Amarni's avatar
      security/keys: add CONFIG_KEYS_COMPAT to Kconfig · 31c8c494
      Bilal Amarni authored
      commit 47b2c3ff upstream.
      
      CONFIG_KEYS_COMPAT is defined in arch-specific Kconfigs and is missing for
      several 64-bit architectures : mips, parisc, tile.
      
      At the moment and for those architectures, calling in 32-bit userspace the
      keyctl syscall would return an ENOSYS error.
      
      This patch moves the CONFIG_KEYS_COMPAT option to security/keys/Kconfig, to
      make sure the compatibility wrapper is registered by default for any 64-bit
      architecture as long as it is configured with CONFIG_COMPAT.
      
      [DH: Modified to remove arm64 compat enablement also as requested by Eric
       Biggers]
      Signed-off-by: default avatarBilal Amarni <bilal.amarni@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Reviewed-by: default avatarArnd Bergmann <arnd@arndb.de>
      cc: Eric Biggers <ebiggers3@gmail.com>
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Cc: James Cowgill <james.cowgill@mips.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      31c8c494
    • Florian Westphal's avatar
      netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable" · a23349bb
      Florian Westphal authored
      commit e1bf1687 upstream.
      
      This reverts commit 870190a9.
      
      It was not a good idea. The custom hash table was a much better
      fit for this purpose.
      
      A fast lookup is not essential, in fact for most cases there is no lookup
      at all because original tuple is not taken and can be used as-is.
      What needs to be fast is insertion and deletion.
      
      rhlist removal however requires a rhlist walk.
      We can have thousands of entries in such a list if source port/addresses
      are reused for multiple flows, if this happens removal requests are so
      expensive that deletions of a few thousand flows can take several
      seconds(!).
      
      The advantages that we got from rhashtable are:
      1) table auto-sizing
      2) multiple locks
      
      1) would be nice to have, but it is not essential as we have at
      most one lookup per new flow, so even a million flows in the bysource
      table are not a problem compared to current deletion cost.
      2) is easy to add to custom hash table.
      
      I tried to add hlist_node to rhlist to speed up rhltable_remove but this
      isn't doable without changing semantics.  rhltable_remove_fast will
      check that the to-be-deleted object is part of the table and that
      requires a list walk that we want to avoid.
      
      Furthermore, using hlist_node increases size of struct rhlist_head, which
      in turn increases nf_conn size.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=196821Reported-by: default avatarIvan Babrou <ibobrik@gmail.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a23349bb
    • Florian Westphal's avatar
      netfilter: nat: avoid use of nf_conn_nat extension · 25db12f1
      Florian Westphal authored
      commit 6e699867 upstream.
      
      successful insert into the bysource hash sets IPS_SRC_NAT_DONE status bit
      so we can check that instead of presence of nat extension which requires
      extra deref.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      25db12f1
    • Greg Kroah-Hartman's avatar
      Revert "ARM: dts: imx53-qsb-common: fix FEC pinmux config" · fd1ca9fe
      Greg Kroah-Hartman authored
      This reverts commit 62b9fa2c which is
      commit 8b649e42 upstream.
      
      Turns out not to be a good idea in the stable kernels for now as Patrick
      writes:
      	As discussed for 4.4 stable queue this patch might break
      	existing machines, if they use a different pinmux configuration
      	with their own bootloader.
      Reported-by: default avatarPatrick Brünn <P.Bruenn@beckhoff.com>
      Cc: Shawn Guo <shawnguo@kernel.org>
      Cc: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fd1ca9fe
    • Takashi Iwai's avatar
      ALSA: seq: Cancel pending autoload work at unbinding device · 1862eca9
      Takashi Iwai authored
      commit fc27fe7e upstream.
      
      ALSA sequencer core has a mechanism to load the enumerated devices
      automatically, and it's performed in an off-load work.  This seems
      causing some race when a sequencer is removed while the pending
      autoload work is running.  As syzkaller spotted, it may lead to some
      use-after-free:
        BUG: KASAN: use-after-free in snd_rawmidi_dev_seq_free+0x69/0x70
        sound/core/rawmidi.c:1617
        Write of size 8 at addr ffff88006c611d90 by task kworker/2:1/567
      
        CPU: 2 PID: 567 Comm: kworker/2:1 Not tainted 4.13.0+ #29
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        Workqueue: events autoload_drivers
        Call Trace:
         __dump_stack lib/dump_stack.c:16 [inline]
         dump_stack+0x192/0x22c lib/dump_stack.c:52
         print_address_description+0x78/0x280 mm/kasan/report.c:252
         kasan_report_error mm/kasan/report.c:351 [inline]
         kasan_report+0x230/0x340 mm/kasan/report.c:409
         __asan_report_store8_noabort+0x1c/0x20 mm/kasan/report.c:435
         snd_rawmidi_dev_seq_free+0x69/0x70 sound/core/rawmidi.c:1617
         snd_seq_dev_release+0x4f/0x70 sound/core/seq_device.c:192
         device_release+0x13f/0x210 drivers/base/core.c:814
         kobject_cleanup lib/kobject.c:648 [inline]
         kobject_release lib/kobject.c:677 [inline]
         kref_put include/linux/kref.h:70 [inline]
         kobject_put+0x145/0x240 lib/kobject.c:694
         put_device+0x25/0x30 drivers/base/core.c:1799
         klist_devices_put+0x36/0x40 drivers/base/bus.c:827
         klist_next+0x264/0x4a0 lib/klist.c:403
         next_device drivers/base/bus.c:270 [inline]
         bus_for_each_dev+0x17e/0x210 drivers/base/bus.c:312
         autoload_drivers+0x3b/0x50 sound/core/seq_device.c:117
         process_one_work+0x9fb/0x1570 kernel/workqueue.c:2097
         worker_thread+0x1e4/0x1350 kernel/workqueue.c:2231
         kthread+0x324/0x3f0 kernel/kthread.c:231
         ret_from_fork+0x25/0x30 arch/x86/entry/entry_64.S:425
      
      The fix is simply to assure canceling the autoload work at removing
      the device.
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1862eca9
    • Dmitry Torokhov's avatar
      Input: ims-psu - check if CDC union descriptor is sane · 9d65d0ea
      Dmitry Torokhov authored
      commit ea04efee upstream.
      
      Before trying to use CDC union descriptor, try to validate whether that it
      is sane by checking that intf->altsetting->extra is big enough and that
      descriptor bLength is not too big and not too small.
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d65d0ea
    • Alan Stern's avatar
      usb: usbtest: fix NULL pointer dereference · 8cf061d9
      Alan Stern authored
      commit 7c80f9e4 upstream.
      
      If the usbtest driver encounters a device with an IN bulk endpoint but
      no OUT bulk endpoint, it will try to dereference a NULL pointer
      (out->desc.bEndpointAddress).  The problem can be solved by adding a
      missing test.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8cf061d9
    • Johannes Berg's avatar
      mac80211: don't compare TKIP TX MIC key in reinstall prevention · ddd95bc9
      Johannes Berg authored
      commit cfbb0d90 upstream.
      
      For the reinstall prevention, the code I had added compares the
      whole key. It turns out though that iwlwifi firmware doesn't
      provide the TKIP TX MIC key as it's not needed in client mode,
      and thus the comparison will always return false.
      
      For client mode, thus always zero out the TX MIC key part before
      doing the comparison in order to avoid accepting the reinstall
      of the key with identical encryption and RX MIC key, but not the
      same TX MIC key (since the supplicant provides the real one.)
      
      Fixes: fdf7cb41 ("mac80211: accept key reinstall without changing anything")
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ddd95bc9
    • Jason A. Donenfeld's avatar
      mac80211: use constant time comparison with keys · 38762a51
      Jason A. Donenfeld authored
      commit 2bdd713b upstream.
      
      Otherwise we risk leaking information via timing side channel.
      
      Fixes: fdf7cb41 ("mac80211: accept key reinstall without changing anything")
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      38762a51
    • Johannes Berg's avatar
      mac80211: accept key reinstall without changing anything · 2586fa00
      Johannes Berg authored
      commit fdf7cb41 upstream.
      
      When a key is reinstalled we can reset the replay counters
      etc. which can lead to nonce reuse and/or replay detection
      being impossible, breaking security properties, as described
      in the "KRACK attacks".
      
      In particular, CVE-2017-13080 applies to GTK rekeying that
      happened in firmware while the host is in D3, with the second
      part of the attack being done after the host wakes up. In
      this case, the wpa_supplicant mitigation isn't sufficient
      since wpa_supplicant doesn't know the GTK material.
      
      In case this happens, simply silently accept the new key
      coming from userspace but don't take any action on it since
      it's the same key; this keeps the PN replay counters intact.
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2586fa00
    • Guillaume Nault's avatar
      ppp: fix race in ppp device destruction · ac4cfc73
      Guillaume Nault authored
      
      [ Upstream commit 6151b8b3 ]
      
      ppp_release() tries to ensure that netdevices are unregistered before
      decrementing the unit refcount and running ppp_destroy_interface().
      
      This is all fine as long as the the device is unregistered by
      ppp_release(): the unregister_netdevice() call, followed by
      rtnl_unlock(), guarantee that the unregistration process completes
      before rtnl_unlock() returns.
      
      However, the device may be unregistered by other means (like
      ppp_nl_dellink()). If this happens right before ppp_release() calling
      rtnl_lock(), then ppp_release() has to wait for the concurrent
      unregistration code to release the lock.
      But rtnl_unlock() releases the lock before completing the device
      unregistration process. This allows ppp_release() to proceed and
      eventually call ppp_destroy_interface() before the unregistration
      process completes. Calling free_netdev() on this partially unregistered
      device will BUG():
      
       ------------[ cut here ]------------
       kernel BUG at net/core/dev.c:8141!
       invalid opcode: 0000 [#1] SMP
      
       CPU: 1 PID: 1557 Comm: pppd Not tainted 4.14.0-rc2+ #4
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
      
       Call Trace:
        ppp_destroy_interface+0xd8/0xe0 [ppp_generic]
        ppp_disconnect_channel+0xda/0x110 [ppp_generic]
        ppp_unregister_channel+0x5e/0x110 [ppp_generic]
        pppox_unbind_sock+0x23/0x30 [pppox]
        pppoe_connect+0x130/0x440 [pppoe]
        SYSC_connect+0x98/0x110
        ? do_fcntl+0x2c0/0x5d0
        SyS_connect+0xe/0x10
        entry_SYSCALL_64_fastpath+0x1a/0xa5
      
       RIP: free_netdev+0x107/0x110 RSP: ffffc28a40573d88
       ---[ end trace ed294ff0cc40eeff ]---
      
      We could set the ->needs_free_netdev flag on PPP devices and move the
      ppp_destroy_interface() logic in the ->priv_destructor() callback. But
      that'd be quite intrusive as we'd first need to unlink from the other
      channels and units that depend on the device (the ones that used the
      PPPIOCCONNECT and PPPIOCATTACH ioctls).
      
      Instead, we can just let the netdevice hold a reference on its
      ppp_file. This reference is dropped in ->priv_destructor(), at the very
      end of the unregistration process, so that neither ppp_release() nor
      ppp_disconnect_channel() can call ppp_destroy_interface() in the interim.
      Reported-by: default avatarBeniamino Galvani <bgalvani@redhat.com>
      Fixes: 8cb775bc ("ppp: fix device unregistration upon netns deletion")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ac4cfc73
    • Cong Wang's avatar
      net_sched: avoid matching qdisc with zero handle · 7b9870f0
      Cong Wang authored
      
      [ Upstream commit 50317fce ]
      
      Davide found the following script triggers a NULL pointer
      dereference:
      
      ip l a name eth0 type dummy
      tc q a dev eth0 parent :1 handle 1: htb
      
      This is because for a freshly created netdevice noop_qdisc
      is attached and when passing 'parent :1', kernel actually
      tries to match the major handle which is 0 and noop_qdisc
      has handle 0 so is matched by mistake. Commit 69012ae4
      tries to fix a similar bug but still misses this case.
      
      Handle 0 is not a valid one, should be just skipped. In
      fact, kernel uses it as TC_H_UNSPEC.
      
      Fixes: 69012ae4 ("net: sched: fix handling of singleton qdiscs with qdisc_hash")
      Fixes: 59cc1f61 ("net: sched:convert qdisc linked list to hashtable")
      Reported-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b9870f0
    • Xin Long's avatar
      sctp: reset owner sk for data chunks on out queues when migrating a sock · b89fc6a5
      Xin Long authored
      
      [ Upstream commit d04adf1b ]
      
      Now when migrating sock to another one in sctp_sock_migrate(), it only
      resets owner sk for the data in receive queues, not the chunks on out
      queues.
      
      It would cause that data chunks length on the sock is not consistent
      with sk sk_wmem_alloc. When closing the sock or freeing these chunks,
      the old sk would never be freed, and the new sock may crash due to
      the overflow sk_wmem_alloc.
      
      syzbot found this issue with this series:
      
        r0 = socket$inet_sctp()
        sendto$inet(r0)
        listen(r0)
        accept4(r0)
        close(r0)
      
      Although listen() should have returned error when one TCP-style socket
      is in connecting (I may fix this one in another patch), it could also
      be reproduced by peeling off an assoc.
      
      This issue is there since very beginning.
      
      This patch is to reset owner sk for the chunks on out queues so that
      sk sk_wmem_alloc has correct value after accept one sock or peeloff
      an assoc to one sock.
      
      Note that when resetting owner sk for chunks on outqueue, it has to
      sctp_clear_owner_w/skb_orphan chunks before changing assoc->base.sk
      first and then sctp_set_owner_w them after changing assoc->base.sk,
      due to that sctp_wfree and it's callees are using assoc->base.sk.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b89fc6a5
    • Julien Gomes's avatar
      tun: allow positive return values on dev_get_valid_name() call · 210a6418
      Julien Gomes authored
      
      [ Upstream commit 5c25f65f ]
      
      If the name argument of dev_get_valid_name() contains "%d", it will try
      to assign it a unit number in __dev__alloc_name() and return either the
      unit number (>= 0) or an error code (< 0).
      Considering positive values as error values prevent tun device creations
      relying this mechanism, therefor we should only consider negative values
      as errors here.
      Signed-off-by: default avatarJulien Gomes <julien@arista.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      210a6418
    • Xin Long's avatar
      ip6_gre: update dst pmtu if dev mtu has been updated by toobig in __gre6_xmit · d6b1aebc
      Xin Long authored
      
      [ Upstream commit 8aec4959 ]
      
      When receiving a Toobig icmpv6 packet, ip6gre_err would just set
      tunnel dev's mtu, that's not enough. For skb_dst(skb)'s pmtu may
      still be using the old value, it has no chance to be updated with
      tunnel dev's mtu.
      
      Jianlin found this issue by reducing route's mtu while running
      netperf, the performance went to 0.
      
      ip6ip6 and ip4ip6 tunnel can work well with this, as they lookup
      the upper dst and update_pmtu it's pmtu or icmpv6_send a Toobig
      to upper socket after setting tunnel dev's mtu.
      
      We couldn't do that for ip6_gre, as gre's inner packet could be
      any protocol, it's difficult to handle them (like lookup upper
      dst) in a good way.
      
      So this patch is to fix it by updating skb_dst(skb)'s pmtu when
      dev->mtu < skb_dst(skb)'s pmtu in tx path. It's safe to do this
      update there, as usually dev->mtu <= skb_dst(skb)'s pmtu and no
      performance regression can be caused by this.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6b1aebc
    • Xin Long's avatar
      ip6_gre: only increase err_count for some certain type icmpv6 in ip6gre_err · 6d428bc4
      Xin Long authored
      
      [ Upstream commit f8d20b46 ]
      
      The similar fix in patch 'ipip: only increase err_count for some
      certain type icmp in ipip_err' is needed for ip6gre_err.
      
      In Jianlin's case, udp netperf broke even when receiving a TooBig
      icmpv6 packet.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d428bc4
    • Xin Long's avatar
      ipip: only increase err_count for some certain type icmp in ipip_err · df0eebce
      Xin Long authored
      
      [ Upstream commit f3594f0a ]
      
      t->err_count is used to count the link failure on tunnel and an err
      will be reported to user socket in tx path if t->err_count is not 0.
      udp socket could even return EHOSTUNREACH to users.
      
      Since commit fd58156e ("IPIP: Use ip-tunneling code.") removed
      the 'switch check' for icmp type in ipip_err(), err_count would be
      increased by the icmp packet with ICMP_EXC_FRAGTIME code. an link
      failure would be reported out due to this.
      
      In Jianlin's case, when receiving ICMP_EXC_FRAGTIME a icmp packet,
      udp netperf failed with the err:
        send_data: data send error: No route to host (errno 113)
      
      We expect this error reported from tunnel to socket when receiving
      some certain type icmp, but not ICMP_EXC_FRAGTIME, ICMP_SR_FAILED
      or ICMP_PARAMETERPROB ones.
      
      This patch is to bring 'switch check' for icmp type back to ipip_err
      so that it only reports link failure for the right type icmp, just as
      in ipgre_err() and ipip6_err().
      
      Fixes: fd58156e ("IPIP: Use ip-tunneling code.")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      df0eebce
    • Girish Moodalbail's avatar
      tap: double-free in error path in tap_open() · fbf92277
      Girish Moodalbail authored
      
      [ Upstream commit 78e0ea67 ]
      
      Double free of skb_array in tap module is causing kernel panic. When
      tap_set_queue() fails we free skb_array right away by calling
      skb_array_cleanup(). However, later on skb_array_cleanup() is called
      again by tap_sock_destruct through sock_put(). This patch fixes that
      issue.
      
      Fixes: 362899b8 (macvtap: switch to use skb array)
      Signed-off-by: default avatarGirish Moodalbail <girish.moodalbail@oracle.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fbf92277
    • Andrei Vagin's avatar
      net/unix: don't show information about sockets from other namespaces · 62de3fe4
      Andrei Vagin authored
      
      [ Upstream commit 0f5da659 ]
      
      socket_diag shows information only about sockets from a namespace where
      a diag socket lives.
      
      But if we request information about one unix socket, the kernel don't
      check that its netns is matched with a diag socket namespace, so any
      user can get information about any unix socket in a system. This looks
      like a bug.
      
      v2: add a Fixes tag
      
      Fixes: 51d7cccf ("net: make sock diag per-namespace")
      Signed-off-by: default avatarAndrei Vagin <avagin@openvz.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      62de3fe4
    • Eric Dumazet's avatar
      tcp/dccp: fix other lockdep splats accessing ireq_opt · 2af59c65
      Eric Dumazet authored
      
      [ Upstream commit 06f877d6 ]
      
      In my first attempt to fix the lockdep splat, I forgot we could
      enter inet_csk_route_req() with a freshly allocated request socket,
      for which refcount has not yet been elevated, due to complex
      SLAB_TYPESAFE_BY_RCU rules.
      
      We either are in rcu_read_lock() section _or_ we own a refcount on the
      request.
      
      Correct RCU verb to use here is rcu_dereference_check(), although it is
      not possible to prove we actually own a reference on a shared
      refcount :/
      
      In v2, I added ireq_opt_deref() helper and use in three places, to fix other
      possible splats.
      
      [   49.844590]  lockdep_rcu_suspicious+0xea/0xf3
      [   49.846487]  inet_csk_route_req+0x53/0x14d
      [   49.848334]  tcp_v4_route_req+0xe/0x10
      [   49.850174]  tcp_conn_request+0x31c/0x6a0
      [   49.851992]  ? __lock_acquire+0x614/0x822
      [   49.854015]  tcp_v4_conn_request+0x5a/0x79
      [   49.855957]  ? tcp_v4_conn_request+0x5a/0x79
      [   49.858052]  tcp_rcv_state_process+0x98/0xdcc
      [   49.859990]  ? sk_filter_trim_cap+0x2f6/0x307
      [   49.862085]  tcp_v4_do_rcv+0xfc/0x145
      [   49.864055]  ? tcp_v4_do_rcv+0xfc/0x145
      [   49.866173]  tcp_v4_rcv+0x5ab/0xaf9
      [   49.868029]  ip_local_deliver_finish+0x1af/0x2e7
      [   49.870064]  ip_local_deliver+0x1b2/0x1c5
      [   49.871775]  ? inet_del_offload+0x45/0x45
      [   49.873916]  ip_rcv_finish+0x3f7/0x471
      [   49.875476]  ip_rcv+0x3f1/0x42f
      [   49.876991]  ? ip_local_deliver_finish+0x2e7/0x2e7
      [   49.878791]  __netif_receive_skb_core+0x6d3/0x950
      [   49.880701]  ? process_backlog+0x7e/0x216
      [   49.882589]  __netif_receive_skb+0x1d/0x5e
      [   49.884122]  process_backlog+0x10c/0x216
      [   49.885812]  net_rx_action+0x147/0x3df
      
      Fixes: a6ca7abe ("tcp/dccp: fix lockdep splat in inet_csk_route_req()")
      Fixes: c92e8c02 ("tcp/dccp: fix ireq->opt races")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarkernel test robot <fengguang.wu@intel.com>
      Reported-by: default avatarMaciej Żenczykowski <maze@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2af59c65
    • Eric Dumazet's avatar
      tcp/dccp: fix lockdep splat in inet_csk_route_req() · 3107d4dc
      Eric Dumazet authored
      
      [ Upstream commit a6ca7abe ]
      
      This patch fixes the following lockdep splat in inet_csk_route_req()
      
        lockdep_rcu_suspicious
        inet_csk_route_req
        tcp_v4_send_synack
        tcp_rtx_synack
        inet_rtx_syn_ack
        tcp_fastopen_synack_time
        tcp_retransmit_timer
        tcp_write_timer_handler
        tcp_write_timer
        call_timer_fn
      
      Thread running inet_csk_route_req() owns a reference on the request
      socket, so we have the guarantee ireq->ireq_opt wont be changed or
      freed.
      
      lockdep can enforce this invariant for us.
      
      Fixes: c92e8c02 ("tcp/dccp: fix ireq->opt races")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3107d4dc
    • Laszlo Toth's avatar
      sctp: full support for ipv6 ip_nonlocal_bind & IP_FREEBIND · ec5caf54
      Laszlo Toth authored
      
      [ Upstream commit b71d21c2 ]
      
      Commit 9b974202 ("sctp: support ipv6 nonlocal bind")
      introduced support for the above options as v4 sctp did,
      so patched sctp_v6_available().
      
      In the v4 implementation it's enough, because
      sctp_inet_bind_verify() just returns with sctp_v4_available().
      However sctp_inet6_bind_verify() has an extra check before that
      for link-local scope_id, which won't respect the above options.
      
      Added the checks before calling ipv6_chk_addr(), but
      not before the validation of scope_id.
      
      before (w/ both options):
       ./v6test fe80::10 sctp
       bind failed, errno: 99 (Cannot assign requested address)
       ./v6test fe80::10 tcp
       bind success, errno: 0 (Success)
      
      after (w/ both options):
       ./v6test fe80::10 sctp
       bind success, errno: 0 (Success)
      Signed-off-by: default avatarLaszlo Toth <laszlth@gmail.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ec5caf54
    • Eric Dumazet's avatar
      ipv6: flowlabel: do not leave opt->tot_len with garbage · 28fa583f
      Eric Dumazet authored
      
      [ Upstream commit 864e2a1f ]
      
      When syzkaller team brought us a C repro for the crash [1] that
      had been reported many times in the past, I finally could find
      the root cause.
      
      If FlowLabel info is merged by fl6_merge_options(), we leave
      part of the opt_space storage provided by udp/raw/l2tp with random value
      in opt_space.tot_len, unless a control message was provided at sendmsg()
      time.
      
      Then ip6_setup_cork() would use this random value to perform a kzalloc()
      call. Undefined behavior and crashes.
      
      Fix is to properly set tot_len in fl6_merge_options()
      
      At the same time, we can also avoid consuming memory and cpu cycles
      to clear it, if every option is copied via a kmemdup(). This is the
      change in ip6_setup_cork().
      
      [1]
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 6613 Comm: syz-executor0 Not tainted 4.14.0-rc4+ #127
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801cb64a100 task.stack: ffff8801cc350000
      RIP: 0010:ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168
      RSP: 0018:ffff8801cc357550 EFLAGS: 00010203
      RAX: dffffc0000000000 RBX: ffff8801cc357748 RCX: 0000000000000010
      RDX: 0000000000000002 RSI: ffffffff842bd1d9 RDI: 0000000000000014
      RBP: ffff8801cc357620 R08: ffff8801cb17f380 R09: ffff8801cc357b10
      R10: ffff8801cb64a100 R11: 0000000000000000 R12: ffff8801cc357ab0
      R13: ffff8801cc357b10 R14: 0000000000000000 R15: ffff8801c3bbf0c0
      FS:  00007f9c5c459700(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020324000 CR3: 00000001d1cf2000 CR4: 00000000001406f0
      DR0: 0000000020001010 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Call Trace:
       ip6_make_skb+0x282/0x530 net/ipv6/ip6_output.c:1729
       udpv6_sendmsg+0x2769/0x3380 net/ipv6/udp.c:1340
       inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       SYSC_sendto+0x358/0x5a0 net/socket.c:1750
       SyS_sendto+0x40/0x50 net/socket.c:1718
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x4520a9
      RSP: 002b:00007f9c5c458c08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004520a9
      RDX: 0000000000000001 RSI: 0000000020fd1000 RDI: 0000000000000016
      RBP: 0000000000000086 R08: 0000000020e0afe4 R09: 000000000000001c
      R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004bb1ee
      R13: 00000000ffffffff R14: 0000000000000016 R15: 0000000000000029
      Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 0f 00 00 48 8d 79 04 48 b8 00 00 00 00 00 fc ff df 45 8b 74 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
      RIP: ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168 RSP: ffff8801cc357550
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      28fa583f