1. 03 Mar, 2017 11 commits
    • Mathias Nyman's avatar
      xhci: free xhci virtual devices with leaf nodes first · 4b6ac340
      Mathias Nyman authored
      [ Upstream commit ee8665e2 ]
      
      the tt_info provided by a HS hub might be in use to by a child device
      Make sure we free the devices in the correct order.
      
      This is needed in special cases such as when xhci controller is
      reset when resuming from hibernate, and all virt_devices are freed.
      
      Also free the virt_devices starting from max slot_id as children
      more commonly have higher slot_id than parent.
      
      CC: <stable@vger.kernel.org>
      Reported-by: default avatarGuenter Roeck <groeck@chromium.org>
      Tested-by: default avatarGuenter Roeck <groeck@chromium.org>
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      4b6ac340
    • Bartosz Golaszewski's avatar
      ARM: davinci: da850: don't add emac clock to lookup table twice · 3e88ba68
      Bartosz Golaszewski authored
      [ Upstream commit ef37427a ]
      
      Similarly to the aemif clock - this screws up the linked list of clock
      children. Create a separate clock for mdio inheriting the rate from
      emac_clk.
      
      Cc: <stable@vger.kernel.org> # 3.12.x-
      Signed-off-by: default avatarBartosz Golaszewski <bgolaszewski@baylibre.com>
      [nsekhar@ti.com: add a comment over mdio_clk to explaing its existence +
      		 commit headline updates]
      Signed-off-by: default avatarSekhar Nori <nsekhar@ti.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      3e88ba68
    • Alan Stern's avatar
      USB: gadgetfs: fix checks of wTotalLength in config descriptors · 4848f01b
      Alan Stern authored
      [ Upstream commit 1c069b05 ]
      
      Andrey Konovalov's fuzz testing of gadgetfs showed that we should
      improve the driver's checks for valid configuration descriptors passed
      in by the user.  In particular, the driver needs to verify that the
      wTotalLength value in the descriptor is not too short (smaller
      than USB_DT_CONFIG_SIZE).  And the check for whether wTotalLength is
      too large has to be changed, because the driver assumes there is
      always enough room remaining in the buffer to hold a device descriptor
      (at least USB_DT_DEVICE_SIZE bytes).
      
      This patch adds the additional check and fixes the existing check.  It
      may do a little more than strictly necessary, but one extra check
      won't hurt.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      CC: Andrey Konovalov <andreyknvl@google.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      4848f01b
    • Alan Stern's avatar
      USB: gadgetfs: fix use-after-free bug · fdd21c63
      Alan Stern authored
      [ Upstream commit add333a8 ]
      
      Andrey Konovalov reports that fuzz testing with syzkaller causes a
      KASAN use-after-free bug report in gadgetfs:
      
      BUG: KASAN: use-after-free in gadgetfs_setup+0x208a/0x20e0 at addr ffff88003dfe5bf2
      Read of size 2 by task syz-executor0/22994
      CPU: 3 PID: 22994 Comm: syz-executor0 Not tainted 4.9.0-rc7+ #16
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       ffff88006df06a18 ffffffff81f96aba ffffffffe0528500 1ffff1000dbe0cd6
       ffffed000dbe0cce ffff88006df068f0 0000000041b58ab3 ffffffff8598b4c8
       ffffffff81f96828 1ffff1000dbe0ccd ffff88006df06708 ffff88006df06748
      Call Trace:
       <IRQ> [  201.343209]  [<     inline     >] __dump_stack lib/dump_stack.c:15
       <IRQ> [  201.343209]  [<ffffffff81f96aba>] dump_stack+0x292/0x398 lib/dump_stack.c:51
       [<ffffffff817e4dec>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:159
       [<     inline     >] print_address_description mm/kasan/report.c:197
       [<ffffffff817e5080>] kasan_report_error+0x1f0/0x4e0 mm/kasan/report.c:286
       [<     inline     >] kasan_report mm/kasan/report.c:306
       [<ffffffff817e562a>] __asan_report_load_n_noabort+0x3a/0x40 mm/kasan/report.c:337
       [<     inline     >] config_buf drivers/usb/gadget/legacy/inode.c:1298
       [<ffffffff8322c8fa>] gadgetfs_setup+0x208a/0x20e0 drivers/usb/gadget/legacy/inode.c:1368
       [<ffffffff830fdcd0>] dummy_timer+0x11f0/0x36d0 drivers/usb/gadget/udc/dummy_hcd.c:1858
       [<ffffffff814807c1>] call_timer_fn+0x241/0x800 kernel/time/timer.c:1308
       [<     inline     >] expire_timers kernel/time/timer.c:1348
       [<ffffffff81482de6>] __run_timers+0xa06/0xec0 kernel/time/timer.c:1641
       [<ffffffff814832c1>] run_timer_softirq+0x21/0x80 kernel/time/timer.c:1654
       [<ffffffff84f4af8b>] __do_softirq+0x2fb/0xb63 kernel/softirq.c:284
      
      The cause of the bug is subtle.  The dev_config() routine gets called
      twice by the fuzzer.  The first time, the user data contains both a
      full-speed configuration descriptor and a high-speed config
      descriptor, causing dev->hs_config to be set.  But it also contains an
      invalid device descriptor, so the buffer containing the descriptors is
      deallocated and dev_config() returns an error.
      
      The second time dev_config() is called, the user data contains only a
      full-speed config descriptor.  But dev->hs_config still has the stale
      pointer remaining from the first call, causing the routine to think
      that there is a valid high-speed config.  Later on, when the driver
      dereferences the stale pointer to copy that descriptor, we get a
      use-after-free access.
      
      The fix is simple: Clear dev->hs_config if the passed-in data does not
      contain a high-speed config descriptor.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      fdd21c63
    • Alan Stern's avatar
      USB: gadgetfs: fix unbounded memory allocation bug · 6cd527a3
      Alan Stern authored
      [ Upstream commit faab5098 ]
      
      Andrey Konovalov reports that fuzz testing with syzkaller causes a
      KASAN warning in gadgetfs:
      
      BUG: KASAN: slab-out-of-bounds in dev_config+0x86f/0x1190 at addr ffff88003c47e160
      Write of size 65537 by task syz-executor0/6356
      CPU: 3 PID: 6356 Comm: syz-executor0 Not tainted 4.9.0-rc7+ #19
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       ffff88003c107ad8 ffffffff81f96aba ffffffff3dc11ef0 1ffff10007820eee
       ffffed0007820ee6 ffff88003dc11f00 0000000041b58ab3 ffffffff8598b4c8
       ffffffff81f96828 ffffffff813fb4a0 ffff88003b6eadc0 ffff88003c107738
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff81f96aba>] dump_stack+0x292/0x398 lib/dump_stack.c:51
       [<ffffffff817e4dec>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:159
       [<     inline     >] print_address_description mm/kasan/report.c:197
       [<ffffffff817e5080>] kasan_report_error+0x1f0/0x4e0 mm/kasan/report.c:286
       [<ffffffff817e5705>] kasan_report+0x35/0x40 mm/kasan/report.c:306
       [<     inline     >] check_memory_region_inline mm/kasan/kasan.c:308
       [<ffffffff817e3fb9>] check_memory_region+0x139/0x190 mm/kasan/kasan.c:315
       [<ffffffff817e4044>] kasan_check_write+0x14/0x20 mm/kasan/kasan.c:326
       [<     inline     >] copy_from_user arch/x86/include/asm/uaccess.h:689
       [<     inline     >] ep0_write drivers/usb/gadget/legacy/inode.c:1135
       [<ffffffff83228caf>] dev_config+0x86f/0x1190 drivers/usb/gadget/legacy/inode.c:1759
       [<ffffffff817fdd55>] __vfs_write+0x5d5/0x760 fs/read_write.c:510
       [<ffffffff817ff650>] vfs_write+0x170/0x4e0 fs/read_write.c:560
       [<     inline     >] SYSC_write fs/read_write.c:607
       [<ffffffff81803a5b>] SyS_write+0xfb/0x230 fs/read_write.c:599
       [<ffffffff84f47ec1>] entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      Indeed, there is a comment saying that the value of len is restricted
      to a 16-bit integer, but the code doesn't actually do this.
      
      This patch fixes the warning.  It replaces the comment with a
      computation that forces the amount of data copied from the user in
      ep0_write() to be no larger than the wLength size for the control
      transfer, which is a 16-bit quantity.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      6cd527a3
    • Greg Kroah-Hartman's avatar
      usb: gadgetfs: restrict upper bound on device configuration size · 709c4986
      Greg Kroah-Hartman authored
      [ Upstream commit 0994b0a2 ]
      
      Andrey Konovalov reported that we were not properly checking the upper
      limit before of a device configuration size before calling
      memdup_user(), which could cause some problems.
      
      So set the upper limit to PAGE_SIZE * 4, which should be good enough for
      all devices.
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      709c4986
    • Alan Stern's avatar
      USB: dummy-hcd: fix bug in stop_activity (handle ep0) · dde808bf
      Alan Stern authored
      [ Upstream commit bcdbeb84 ]
      
      The stop_activity() routine in dummy-hcd is supposed to unlink all
      active requests for every endpoint, among other things.  But it
      doesn't handle ep0.  As a result, fuzz testing can generate a WARNING
      like the following:
      
      WARNING: CPU: 0 PID: 4410 at drivers/usb/gadget/udc/dummy_hcd.c:672 dummy_free_request+0x153/0x170
      Modules linked in:
      CPU: 0 PID: 4410 Comm: syz-executor Not tainted 4.9.0-rc7+ #32
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       ffff88006a64ed10 ffffffff81f96b8a ffffffff41b58ab3 1ffff1000d4c9d35
       ffffed000d4c9d2d ffff880065f8ac00 0000000041b58ab3 ffffffff8598b510
       ffffffff81f968f8 0000000041b58ab3 ffffffff859410e0 ffffffff813f0590
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff81f96b8a>] dump_stack+0x292/0x398 lib/dump_stack.c:51
       [<ffffffff812b808f>] __warn+0x19f/0x1e0 kernel/panic.c:550
       [<ffffffff812b831c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
       [<ffffffff830fcb13>] dummy_free_request+0x153/0x170 drivers/usb/gadget/udc/dummy_hcd.c:672
       [<ffffffff830ed1b0>] usb_ep_free_request+0xc0/0x420 drivers/usb/gadget/udc/core.c:195
       [<ffffffff83225031>] gadgetfs_unbind+0x131/0x190 drivers/usb/gadget/legacy/inode.c:1612
       [<ffffffff830ebd8f>] usb_gadget_remove_driver+0x10f/0x2b0 drivers/usb/gadget/udc/core.c:1228
       [<ffffffff830ec084>] usb_gadget_unregister_driver+0x154/0x240 drivers/usb/gadget/udc/core.c:1357
      
      This patch fixes the problem by iterating over all the endpoints in
      the driver's ep array instead of iterating over the gadget's ep_list,
      which explicitly leaves out ep0.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      dde808bf
    • Krzysztof Opasiak's avatar
      usb: gadget: composite: Test get_alt() presence instead of set_alt() · 6c0b4029
      Krzysztof Opasiak authored
      [ Upstream commit 7e4da3fc ]
      
      By convention (according to doc) if function does not provide
      get_alt() callback composite framework should assume that it has only
      altsetting 0 and should respond with error if host tries to set
      other one.
      
      After commit dd4dff8b ("USB: composite: Fix bug: should test
      set_alt function pointer before use it")
      we started checking set_alt() callback instead of get_alt().
      This check is useless as we check if set_alt() is set inside
      usb_add_function() and fail if it's NULL.
      
      Let's fix this check and move comment about why we check the get
      method instead of set a little bit closer to prevent future false
      fixes.
      
      Fixes: dd4dff8b ("USB: composite: Fix bug: should test set_alt function pointer before use it")
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarKrzysztof Opasiak <k.opasiak@samsung.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      6c0b4029
    • Felipe Balbi's avatar
      usb: dwc3: gadget: always unmap EP0 requests · df6a1fb2
      Felipe Balbi authored
      [ Upstream commit d6214592 ]
      
      commit 0416e494 ("usb: dwc3: ep0: correct cache
      sync issue in case of ep0_bounced") introduced a bug
      where we would leak DMA resources which would cause
      us to starve the system of them resulting in failing
      DMA transfers.
      
      Fix the bug by making sure that we always unmap EP0
      requests since those are *always* mapped.
      
      Fixes: 0416e494 ("usb: dwc3: ep0: correct cache
      	sync issue in case of ep0_bounced")
      Cc: <stable@vger.kernel.org>
      Tested-by: default avatarTomasz Medrek <tomaszx.medrek@intel.com>
      Reported-by: default avatarJanusz Dziedzic <januszx.dziedzic@linux.intel.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      df6a1fb2
    • Hauke Mehrtens's avatar
      mtd: nand: xway: disable module support · d4000ff4
      Hauke Mehrtens authored
      [ Upstream commit 73529c87 ]
      
      The xway_nand driver accesses the ltq_ebu_membase symbol which is not
      exported. This also should not get exported and we should handle the
      EBU interface in a better way later. This quick fix just deactivated
      support for building as module.
      
      Fixes: 99f2b107 ("mtd: lantiq: Add NAND support on Lantiq XWAY SoC.")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarHauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      d4000ff4
    • Eric W. Biederman's avatar
      ptrace: Capture the ptracer's creds not PT_PTRACE_CAP · 6d237451
      Eric W. Biederman authored
      [ Upstream commit 64b875f7 ]
      
      When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was
      overlooked.  This can result in incorrect behavior when an application
      like strace traces an exec of a setuid executable.
      
      Further PT_PTRACE_CAP does not have enough information for making good
      security decisions as it does not report which user namespace the
      capability is in.  This has already allowed one mistake through
      insufficient granulariy.
      
      I found this issue when I was testing another corner case of exec and
      discovered that I could not get strace to set PT_PTRACE_CAP even when
      running strace as root with a full set of caps.
      
      This change fixes the above issue with strace allowing stracing as
      root a setuid executable without disabling setuid.  More fundamentaly
      this change allows what is allowable at all times, by using the correct
      information in it's decision.
      
      Cc: stable@vger.kernel.org
      Fixes: 4214e42f96d4 ("v2.4.9.11 -> v2.4.9.12")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      6d237451
  2. 18 Jan, 2017 1 commit
  3. 13 Jan, 2017 28 commits
    • Alexander Duyck's avatar
      gro: Allow tunnel stacking in the case of FOU/GUE · d8435bb5
      Alexander Duyck authored
      [ Upstream commit c3483384 ]
      
      This patch should fix the issues seen with a recent fix to prevent
      tunnel-in-tunnel frames from being generated with GRO.  The fix itself is
      correct for now as long as we do not add any devices that support
      NETIF_F_GSO_GRE_CSUM.  When such a device is added it could have the
      potential to mess things up due to the fact that the outer transport header
      points to the outer UDP header and not the GRE header as would be expected.
      
      Fixes: fac8e0f5 ("tunnels: Don't apply GRO to multiple layers of encapsulation.")
      Signed-off-by: default avatarAlexander Duyck <aduyck@mirantis.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      d8435bb5
    • Jesse Gross's avatar
      tunnels: Don't apply GRO to multiple layers of encapsulation. · 066b300e
      Jesse Gross authored
      [ Upstream commit fac8e0f5 ]
      
      When drivers express support for TSO of encapsulated packets, they
      only mean that they can do it for one layer of encapsulation.
      Supporting additional levels would mean updating, at a minimum,
      more IP length fields and they are unaware of this.
      
      No encapsulation device expresses support for handling offloaded
      encapsulated packets, so we won't generate these types of frames
      in the transmit path. However, GRO doesn't have a check for
      multiple levels of encapsulation and will attempt to build them.
      
      UDP tunnel GRO actually does prevent this situation but it only
      handles multiple UDP tunnels stacked on top of each other. This
      generalizes that solution to prevent any kind of tunnel stacking
      that would cause problems.
      
      Fixes: bf5a755f ("net-gre-gro: Add GRE support to the GRO stack")
      Signed-off-by: default avatarJesse Gross <jesse@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      066b300e
    • Deepa Dinamani's avatar
      net: ipv4: Convert IP network timestamps to be y2038 safe · b11e1542
      Deepa Dinamani authored
      [ Upstream commit 822c8685 ]
      
      ICMP timestamp messages and IP source route options require
      timestamps to be in milliseconds modulo 24 hours from
      midnight UT format.
      
      Add inet_current_timestamp() function to support this. The function
      returns the required timestamp in network byte order.
      
      Timestamp calculation is also changed to call ktime_get_real_ts64()
      which uses struct timespec64. struct timespec64 is y2038 safe.
      Previously it called getnstimeofday() which uses struct timespec.
      struct timespec is not y2038 safe.
      Signed-off-by: default avatarDeepa Dinamani <deepa.kernel@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Acked-by: default avatarYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      b11e1542
    • Jesse Gross's avatar
      ipip: Properly mark ipip GRO packets as encapsulated. · 5023ae27
      Jesse Gross authored
      [ Upstream commit b8cba75b ]
      
      ipip encapsulated packets can be merged together by GRO but the result
      does not have the proper GSO type set or even marked as being
      encapsulated at all. Later retransmission of these packets will likely
      fail if the device does not support ipip offloads. This is similar to
      the issue resolved in IPv6 sit in feec0cb3
      ("ipv6: gro: support sit protocol").
      Reported-by: default avatarPatrick Boutilier <boutilpj@ednet.ns.ca>
      Fixes: 9667e9bb ("ipip: Add gro callbacks to ipip offload")
      Tested-by: default avatarPatrick Boutilier <boutilpj@ednet.ns.ca>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarJesse Gross <jesse@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      5023ae27
    • Al Viro's avatar
      sg_write()/bsg_write() is not fit to be called under KERNEL_DS · 3e326731
      Al Viro authored
      [ Upstream commit 128394ef ]
      
      Both damn things interpret userland pointers embedded into the payload;
      worse, they are actually traversing those.  Leaving aside the bad
      API design, this is very much _not_ safe to call with KERNEL_DS.
      Bail out early if that happens.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      3e326731
    • Aleksa Sarai's avatar
      fs: exec: apply CLOEXEC before changing dumpable task flags · 363f1a90
      Aleksa Sarai authored
      [ Upstream commit 613cc2b6 ]
      
      If you have a process that has set itself to be non-dumpable, and it
      then undergoes exec(2), any CLOEXEC file descriptors it has open are
      "exposed" during a race window between the dumpable flags of the process
      being reset for exec(2) and CLOEXEC being applied to the file
      descriptors. This can be exploited by a process by attempting to access
      /proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE.
      
      The race in question is after set_dumpable has been (for get_link,
      though the trace is basically the same for readlink):
      
      [vfs]
      -> proc_pid_link_inode_operations.get_link
         -> proc_pid_get_link
            -> proc_fd_access_allowed
               -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
      
      Which will return 0, during the race window and CLOEXEC file descriptors
      will still be open during this window because do_close_on_exec has not
      been called yet. As a result, the ordering of these calls should be
      reversed to avoid this race window.
      
      This is of particular concern to container runtimes, where joining a
      PID namespace with file descriptors referring to the host filesystem
      can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect
      against access of CLOEXEC file descriptors -- file descriptors which may
      reference filesystem objects the container shouldn't have access to).
      
      Cc: dev@opencontainers.org
      Cc: <stable@vger.kernel.org> # v3.2+
      Reported-by: default avatarMichael Crosby <crosbymichael@gmail.com>
      Signed-off-by: default avatarAleksa Sarai <asarai@suse.de>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      363f1a90
    • Bart Van Assche's avatar
      IB/cma: Fix a race condition in iboe_addr_get_sgid() · c63e64ae
      Bart Van Assche authored
      [ Upstream commit fba332b0 ]
      
      Code that dereferences the struct net_device ip_ptr member must be
      protected with an in_dev_get() / in_dev_put() pair. Hence insert
      calls to these functions.
      
      Fixes: commit 7b85627b ("IB/cma: IBoE (RoCE) IP-based GID addressing")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: default avatarMoni Shoua <monis@mellanox.com>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: Roland Dreier <roland@purestorage.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      c63e64ae
    • Takashi Iwai's avatar
      Revert "ALSA: usb-audio: Fix race at stopping the stream" · 6242668e
      Takashi Iwai authored
      [ Upstream commit f8114f85 ]
      
      This reverts commit 16200948.
      
      The commit was intended to cover the race condition, but it introduced
      yet another regression for devices with the implicit feedback, leading
      to a kernel panic due to NULL-dereference in an irq context.
      
      As the race condition that was addressed by the commit is very rare
      and the regression is much worse, let's revert the commit for rc1, and
      fix the issue properly in a later patch.
      
      Fixes: 16200948 ("ALSA: usb-audio: Fix race at stopping the stream")
      Reported-by: default avatarIoan-Adrian Ratiu <adi@adirat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      6242668e
    • Jim Mattson's avatar
      kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF) · dd503527
      Jim Mattson authored
      [ Upstream commit ef85b673 ]
      
      When L2 exits to L0 due to "exception or NMI", software exceptions
      (#BP and #OF) for which L1 has requested an intercept should be
      handled by L1 rather than L0. Previously, only hardware exceptions
      were forwarded to L1.
      Signed-off-by: default avatarJim Mattson <jmattson@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      dd503527
    • Russell Currey's avatar
      drivers/gpu/drm/ast: Fix infinite loop if read fails · 6fdb4af1
      Russell Currey authored
      [ Upstream commit 298360af ]
      
      ast_get_dram_info() configures a window in order to access BMC memory.
      A BMC register can be configured to disallow this, and if so, causes
      an infinite loop in the ast driver which renders the system unusable.
      
      Fix this by erroring out if an error is detected.  On powerpc systems with
      EEH, this leads to the device being fenced and the system continuing to
      operate.
      
      Cc: <stable@vger.kernel.org> # 3.10+
      Signed-off-by: default avatarRussell Currey <ruscur@russell.cc>
      Reviewed-by: default avatarJoel Stanley <joel@jms.id.au>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161215051241.20815-1-ruscur@russell.ccSigned-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      6fdb4af1
    • Andy Grover's avatar
      target/user: Fix use-after-free of tcmu_cmds if they are expired · d1b9fb84
      Andy Grover authored
      [ Upstream commit d0905ca7 ]
      
      Don't free the cmd in tcmu_check_expired_cmd, it's still referenced by
      an entry in our cmd_id->cmd idr. If userspace ever resumes processing,
      tcmu_handle_completions() will use the now-invalid cmd pointer.
      
      Instead, don't free cmd. It will be freed by tcmu_handle_completion() if
      userspace ever recovers, or tcmu_free_device if not.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarBryant G Ly <bgly@us.ibm.com>
      Tested-by: default avatarBryant G Ly <bgly@us.ibm.com>
      Signed-off-by: default avatarAndy Grover <agrover@redhat.com>
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      d1b9fb84
    • Douglas Anderson's avatar
      kernel/debug/debug_core.c: more properly delay for secondary CPUs · bbe48c2f
      Douglas Anderson authored
      [ Upstream commit 2d13bb64 ]
      
      We've got a delay loop waiting for secondary CPUs.  That loop uses
      loops_per_jiffy.  However, loops_per_jiffy doesn't actually mean how
      many tight loops make up a jiffy on all architectures.  It is quite
      common to see things like this in the boot log:
      
        Calibrating delay loop (skipped), value calculated using timer
        frequency.. 48.00 BogoMIPS (lpj=24000)
      
      In my case I was seeing lots of cases where other CPUs timed out
      entering the debugger only to print their stack crawls shortly after the
      kdb> prompt was written.
      
      Elsewhere in kgdb we already use udelay(), so that should be safe enough
      to use to implement our timeout.  We'll delay 1 ms for 1000 times, which
      should give us a full second of delay (just like the old code wanted)
      but allow us to notice that we're done every 1 ms.
      
      [akpm@linux-foundation.org: simplifications, per Daniel]
      Link: http://lkml.kernel.org/r/1477091361-2039-1-git-send-email-dianders@chromium.orgSigned-off-by: default avatarDouglas Anderson <dianders@chromium.org>
      Reviewed-by: default avatarDaniel Thompson <daniel.thompson@linaro.org>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Brian Norris <briannorris@chromium.org>
      Cc: <stable@vger.kernel.org>	[4.0+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      bbe48c2f
    • Wei Fang's avatar
      scsi: avoid a permanent stop of the scsi device's request queue · 1b51fce8
      Wei Fang authored
      [ Upstream commit d2a14525 ]
      
      A race between scanning and fc_remote_port_delete() may result in a
      permanent stop if the device gets blocked before scsi_sysfs_add_sdev()
      and unblocked after.  The reason is that blocking a device sets both the
      SDEV_BLOCKED state and the QUEUE_FLAG_STOPPED.  However,
      scsi_sysfs_add_sdev() unconditionally sets SDEV_RUNNING which causes the
      device to be ignored by scsi_target_unblock() and thus never have its
      QUEUE_FLAG_STOPPED cleared leading to a device which is apparently
      running but has a stopped queue.
      
      We actually have two places where SDEV_RUNNING is set: once in
      scsi_add_lun() which respects the blocked flag and once in
      scsi_sysfs_add_sdev() which doesn't.  Since the second set is entirely
      spurious, simply remove it to fix the problem.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: default avatarZengxi Chen <chenzengxi@huawei.com>
      Signed-off-by: default avatarWei Fang <fangwei1@huawei.com>
      Reviewed-by: default avatarEwan D. Milne <emilne@redhat.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      1b51fce8
    • Bart Van Assche's avatar
      IB/multicast: Check ib_find_pkey() return value · f413092e
      Bart Van Assche authored
      [ Upstream commit d3a2418e ]
      
      This patch avoids that Coverity complains about not checking the
      ib_find_pkey() return value.
      
      Fixes: commit 547af765 ("IB/multicast: Report errors on multicast groups if P_key changes")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      f413092e
    • Bart Van Assche's avatar
      IPoIB: Avoid reading an uninitialized member variable · 8c84816c
      Bart Van Assche authored
      [ Upstream commit 11b642b8 ]
      
      This patch avoids that Coverity reports the following:
      
          Using uninitialized value port_attr.state when calling printk
      
      Fixes: commit 94232d9c ("IPoIB: Start multicast join process only on active ports")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Erez Shitrit <erezsh@mellanox.com>
      Cc: <stable@vger.kernel.org>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      8c84816c
    • NeilBrown's avatar
      block_dev: don't test bdev->bd_contains when it is not stable · 5eba6129
      NeilBrown authored
      [ Upstream commit bcc7f5b4 ]
      
      bdev->bd_contains is not stable before calling __blkdev_get().
      When __blkdev_get() is called on a parition with ->bd_openers == 0
      it sets
        bdev->bd_contains = bdev;
      which is not correct for a partition.
      After a call to __blkdev_get() succeeds, ->bd_openers will be > 0
      and then ->bd_contains is stable.
      
      When FMODE_EXCL is used, blkdev_get() calls
         bd_start_claiming() ->  bd_prepare_to_claim() -> bd_may_claim()
      
      This call happens before __blkdev_get() is called, so ->bd_contains
      is not stable.  So bd_may_claim() cannot safely use ->bd_contains.
      It currently tries to use it, and this can lead to a BUG_ON().
      
      This happens when a whole device is already open with a bd_holder (in
      use by dm in my particular example) and two threads race to open a
      partition of that device for the first time, one opening with O_EXCL and
      one without.
      
      The thread that doesn't use O_EXCL gets through blkdev_get() to
      __blkdev_get(), gains the ->bd_mutex, and sets bdev->bd_contains = bdev;
      
      Immediately thereafter the other thread, using FMODE_EXCL, calls
      bd_start_claiming() from blkdev_get().  This should fail because the
      whole device has a holder, but because bdev->bd_contains == bdev
      bd_may_claim() incorrectly reports success.
      This thread continues and blocks on bd_mutex.
      
      The first thread then sets bdev->bd_contains correctly and drops the mutex.
      The thread using FMODE_EXCL then continues and when it calls bd_may_claim()
      again in:
      			BUG_ON(!bd_may_claim(bdev, whole, holder));
      The BUG_ON fires.
      
      Fix this by removing the dependency on ->bd_contains in
      bd_may_claim().  As bd_may_claim() has direct access to the whole
      device, it can simply test if the target bdev is the whole device.
      
      Fixes: 6b4517a7 ("block: implement bd_claiming and claiming block")
      Cc: stable@vger.kernel.org (v2.6.35+)
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      5eba6129
    • Maxim Patlasov's avatar
      btrfs: limit async_work allocation and worker func duration · 25e9e236
      Maxim Patlasov authored
      [ Upstream commit 2939e1a8 ]
      
      Problem statement: unprivileged user who has read-write access to more than
      one btrfs subvolume may easily consume all kernel memory (eventually
      triggering oom-killer).
      
      Reproducer (./mkrmdir below essentially loops over mkdir/rmdir):
      
      [root@kteam1 ~]# cat prep.sh
      
      DEV=/dev/sdb
      mkfs.btrfs -f $DEV
      mount $DEV /mnt
      for i in `seq 1 16`
      do
      	mkdir /mnt/$i
      	btrfs subvolume create /mnt/SV_$i
      	ID=`btrfs subvolume list /mnt |grep "SV_$i$" |cut -d ' ' -f 2`
      	mount -t btrfs -o subvolid=$ID $DEV /mnt/$i
      	chmod a+rwx /mnt/$i
      done
      
      [root@kteam1 ~]# sh prep.sh
      
      [maxim@kteam1 ~]$ for i in `seq 1 16`; do ./mkrmdir /mnt/$i 2000 2000 & done
      
      [root@kteam1 ~]# for i in `seq 1 4`; do grep "kmalloc-128" /proc/slabinfo | grep -v dma; sleep 60; done
      kmalloc-128        10144  10144    128   32    1 : tunables    0    0    0 : slabdata    317    317      0
      kmalloc-128       9992352 9992352    128   32    1 : tunables    0    0    0 : slabdata 312261 312261      0
      kmalloc-128       24226752 24226752    128   32    1 : tunables    0    0    0 : slabdata 757086 757086      0
      kmalloc-128       42754240 42754240    128   32    1 : tunables    0    0    0 : slabdata 1336070 1336070      0
      
      The huge numbers above come from insane number of async_work-s allocated
      and queued by btrfs_wq_run_delayed_node.
      
      The problem is caused by btrfs_wq_run_delayed_node() queuing more and more
      works if the number of delayed items is above BTRFS_DELAYED_BACKGROUND. The
      worker func (btrfs_async_run_delayed_root) processes at least
      BTRFS_DELAYED_BATCH items (if they are present in the list). So, the machinery
      works as expected while the list is almost empty. As soon as it is getting
      bigger, worker func starts to process more than one item at a time, it takes
      longer, and the chances to have async_works queued more than needed is getting
      higher.
      
      The problem above is worsened by another flaw of delayed-inode implementation:
      if async_work was queued in a throttling branch (number of items >=
      BTRFS_DELAYED_WRITEBACK), corresponding worker func won't quit until
      the number of items < BTRFS_DELAYED_BACKGROUND / 2. So, it is possible that
      the func occupies CPU infinitely (up to 30sec in my experiments): while the
      func is trying to drain the list, the user activity may add more and more
      items to the list.
      
      The patch fixes both problems in straightforward way: refuse queuing too
      many works in btrfs_wq_run_delayed_node and bail out of worker func if
      at least BTRFS_DELAYED_WRITEBACK items are processed.
      
      Changed in v2: remove support of thresh == NO_THRESHOLD.
      Signed-off-by: default avatarMaxim Patlasov <mpatlasov@virtuozzo.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      25e9e236
    • Shaohua Li's avatar
      mm/vmscan.c: set correct defer count for shrinker · ced9b7a6
      Shaohua Li authored
      [ Upstream commit 5f33a080 ]
      
      Our system uses significantly more slab memory with memcg enabled with
      the latest kernel.  With 3.10 kernel, slab uses 2G memory, while with
      4.6 kernel, 6G memory is used.  The shrinker has problem.  Let's see we
      have two memcg for one shrinker.  In do_shrink_slab:
      
      1. Check cg1.  nr_deferred = 0, assume total_scan = 700.  batch size
         is 1024, then no memory is freed.  nr_deferred = 700
      
      2. Check cg2.  nr_deferred = 700.  Assume freeable = 20, then
         total_scan = 10 or 40.  Let's assume it's 10.  No memory is freed.
         nr_deferred = 10.
      
      The deferred share of cg1 is lost in this case.  kswapd will free no
      memory even run above steps again and again.
      
      The fix makes sure one memcg's deferred share isn't lost.
      
      Link: http://lkml.kernel.org/r/2414be961b5d25892060315fbb56bb19d81d0c07.1476227351.git.shli@fb.comSigned-off-by: default avatarShaohua Li <shli@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: <stable@vger.kernel.org>	[4.0+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      ced9b7a6
    • Jingkui Wang's avatar
      Input: drv260x - fix input device's parent assignment · a62e4587
      Jingkui Wang authored
      [ Upstream commit 5a8a6b89 ]
      
      We were assigning I2C bus controller instead of client as parent device.
      Besides being logically wrong, it messed up with devm handling of input
      device. As a result we were leaving input device and event node behind
      after rmmod-ing the driver, which lead to a kernel oops if one were to
      access the event node later.
      
      Let's remove the assignment and rely on devm_input_allocate_device() to
      set it up properly for us.
      Signed-off-by: default avatarJingkui Wang <jkwang@google.com>
      Fixes: 7132fe4f ("Input: drv260x - add TI drv260x haptics driver")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      a62e4587
    • Ilya Dryomov's avatar
      libceph: verify authorize reply on connect · e1589b27
      Ilya Dryomov authored
      [ Upstream commit 5c056fdc ]
      
      After sending an authorizer (ceph_x_authorize_a + ceph_x_authorize_b),
      the client gets back a ceph_x_authorize_reply, which it is supposed to
      verify to ensure the authenticity and protect against replay attacks.
      The code for doing this is there (ceph_x_verify_authorizer_reply(),
      ceph_auth_verify_authorizer_reply() + plumbing), but it is never
      invoked by the the messenger.
      
      AFAICT this goes back to 2009, when ceph authentication protocols
      support was added to the kernel client in 4e7a5dcd ("ceph:
      negotiate authentication protocol; implement AUTH_NONE protocol").
      
      The second param of ceph_connection_operations::verify_authorizer_reply
      is unused all the way down.  Pass 0 to facilitate backporting, and kill
      it in the next commit.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarSage Weil <sage@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      e1589b27
    • Jussi Laako's avatar
      ALSA: hiface: Fix M2Tech hiFace driver sampling rate change · ed21b94e
      Jussi Laako authored
      [ Upstream commit 995c6a7f ]
      
      Sampling rate changes after first set one are not reflected to the
      hardware, while driver and ALSA think the rate has been changed.
      
      Fix the problem by properly stopping the interface at the beginning of
      prepare call, allowing new rate to be set to the hardware. This keeps
      the hardware in sync with the driver.
      Signed-off-by: default avatarJussi Laako <jussi@sonarnerd.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      ed21b94e
    • Gerald Schaefer's avatar
      s390/vmlogrdr: fix IUCV buffer allocation · f8c36e2b
      Gerald Schaefer authored
      [ Upstream commit 5457e03d ]
      
      The buffer for iucv_message_receive() needs to be below 2 GB. In
      __iucv_message_receive(), the buffer address is casted to an u32, which
      would result in either memory corruption or an addressing exception when
      using addresses >= 2 GB.
      
      Fix this by using GFP_DMA for the buffer allocation.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      f8c36e2b
    • Ben Hutchings's avatar
      kconfig/nconf: Fix hang when editing symbol with a long prompt · 0848a267
      Ben Hutchings authored
      [ Upstream commit 79e51b5c ]
      
      Currently it is impossible to edit the value of a config symbol with a
      prompt longer than (terminal width - 2) characters.  dialog_inputbox()
      calculates a negative x-offset for the input window and newwin() fails
      as this is invalid.  It also doesn't check for this failure, so it
      busy-loops calling wgetch(NULL) which immediately returns -1.
      
      The additions in the offset calculations also don't match the intended
      size of the window.
      
      Limit the window size and calculate the offset similarly to
      show_scroll_win().
      
      Cc: stable <stable@vger.kernel.org>
      Fixes: 692d97c3 ("kconfig: new configuration interface (nconfig)")
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      0848a267
    • NeilBrown's avatar
      SUNRPC: fix refcounting problems with auth_gss messages. · 7b53fb41
      NeilBrown authored
      [ Upstream commit 1cded9d2 ]
      
      There are two problems with refcounting of auth_gss messages.
      
      First, the reference on the pipe->pipe list (taken by a call
      to rpc_queue_upcall()) is not counted.  It seems to be
      assumed that a message in pipe->pipe will always also be in
      pipe->in_downcall, where it is correctly reference counted.
      
      However there is no guaranty of this.  I have a report of a
      NULL dereferences in rpc_pipe_read() which suggests a msg
      that has been freed is still on the pipe->pipe list.
      
      One way I imagine this might happen is:
      - message is queued for uid=U and auth->service=S1
      - rpc.gssd reads this message and starts processing.
        This removes the message from pipe->pipe
      - message is queued for uid=U and auth->service=S2
      - rpc.gssd replies to the first message. gss_pipe_downcall()
        calls __gss_find_upcall(pipe, U, NULL) and it finds the
        *second* message, as new messages are placed at the head
        of ->in_downcall, and the service type is not checked.
      - This second message is removed from ->in_downcall and freed
        by gss_release_msg() (even though it is still on pipe->pipe)
      - rpc.gssd tries to read another message, and dereferences a pointer
        to this message that has just been freed.
      
      I fix this by incrementing the reference count before calling
      rpc_queue_upcall(), and decrementing it if that fails, or normally in
      gss_pipe_destroy_msg().
      
      It seems strange that the reply doesn't target the message more
      precisely, but I don't know all the details.  In any case, I think the
      reference counting irregularity became a measureable bug when the
      extra arg was added to __gss_find_upcall(), hence the Fixes: line
      below.
      
      The second problem is that if rpc_queue_upcall() fails, the new
      message is not freed. gss_alloc_msg() set the ->count to 1,
      gss_add_msg() increments this to 2, gss_unhash_msg() decrements to 1,
      then the pointer is discarded so the memory never gets freed.
      
      Fixes: 9130b8db ("SUNRPC: allow for upcalls for same uid but different gss service")
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1011250Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      7b53fb41
    • Dan Carpenter's avatar
      ext4: return -ENOMEM instead of success · 14927595
      Dan Carpenter authored
      [ Upstream commit 578620f4 ]
      
      We should set the error code if kzalloc() fails.
      
      Fixes: 67cf5b09 ("ext4: add the basic function for inline data support")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      14927595
    • Al Viro's avatar
      nfs_write_end(): fix handling of short copies · 576cfe61
      Al Viro authored
      [ Upstream commit c0cf3ef5 ]
      
      What matters when deciding if we should make a page uptodate is
      not how much we _wanted_ to copy, but how much we actually have
      copied.  As it is, on architectures that do not zero tail on
      short copy we can leave uninitialized data in page marked uptodate.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      576cfe61
    • Steven Rostedt (Red Hat)'s avatar
      fgraph: Handle a case where a tracer ignores set_graph_notrace · 77f3c11c
      Steven Rostedt (Red Hat) authored
      [ Upstream commit 794de08a ]
      
      Both the wakeup and irqsoff tracers can use the function graph tracer when
      the display-graph option is set. The problem is that they ignore the notrace
      file, and record the entry of functions that would be ignored by the
      function_graph tracer. This causes the trace->depth to be recorded into the
      ring buffer. The set_graph_notrace uses a trick by adding a large negative
      number to the trace->depth when a graph function is to be ignored.
      
      On trace output, the graph function uses the depth to record a stack of
      functions. But since the depth is negative, it accesses the array with a
      negative number and causes an out of bounds access that can cause a kernel
      oops or corrupt data.
      
      Have the print functions handle cases where a tracer still records functions
      even when they are in set_graph_notrace.
      
      Also add warnings if the depth is below zero before accessing the array.
      
      Note, the function graph logic will still prevent the return of these
      functions from being recorded, which means that they will be left hanging
      without a return. For example:
      
         # echo '*spin*' > set_graph_notrace
         # echo 1 > options/display-graph
         # echo wakeup > current_tracer
         # cat trace
         [...]
            _raw_spin_lock() {
              preempt_count_add() {
              do_raw_spin_lock() {
            update_rq_clock();
      
      Where it should look like:
      
            _raw_spin_lock() {
              preempt_count_add();
              do_raw_spin_lock();
            }
            update_rq_clock();
      
      Cc: stable@vger.kernel.org
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Fixes: 29ad23b0 ("ftrace: Add set_graph_notrace filter")
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      77f3c11c
    • Thomas Gleixner's avatar
      timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion · 29955c9a
      Thomas Gleixner authored
      [ Upstream commit 9c164572 ]
      
      The clocksource delta to nanoseconds conversion is using signed math, but
      the delta is unsigned. This makes the conversion space smaller than
      necessary and in case of a multiplication overflow the conversion can
      become negative. The conversion is done with scaled math:
      
          s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;
      
      Shifting a signed integer right obvioulsy preserves the sign, which has
      interesting consequences:
      
       - Time jumps backwards
      
       - __iter_div_u64_rem() which is used in one of the calling code pathes
         will take forever to piecewise calculate the seconds/nanoseconds part.
      
      This has been reported by several people with different scenarios:
      
      David observed that when stopping a VM with a debugger:
      
       "It was essentially the stopped by debugger case.  I forget exactly why,
        but the guest was being explicitly stopped from outside, it wasn't just
        scheduling lag.  I think it was something in the vicinity of 10 minutes
        stopped."
      
       When lifting the stop the machine went dead.
      
      The stopped by debugger case is not really interesting, but nevertheless it
      would be a good thing not to die completely.
      
      But this was also observed on a live system by Liav:
      
       "When the OS is too overloaded, delta will get a high enough value for the
        msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
        after the shift the nsec variable will gain a value similar to
        0xffffffffff000000."
      
      Unfortunately this has been reintroduced recently with commit 6bd58f09
      ("time: Add cycles to nanoseconds translation"). It had been fixed a year
      ago already in commit 35a4933a ("time: Avoid signed overflow in
      timekeeping_get_ns()").
      
      Though it's not surprising that the issue has been reintroduced because the
      function itself and the whole call chain uses s64 for the result and the
      propagation of it. The change in this recent commit is subtle:
      
         s64 nsec;
      
      -  nsec = (d * m + n) >> s:
      +  nsec = d * m + n;
      +  nsec >>= s;
      
      d being type of cycle_t adds another level of obfuscation.
      
      This wouldn't have happened if the previous change to unsigned computation
      would have made the 'nsec' variable u64 right away and a follow up patch
      had cleaned up the whole call chain.
      
      There have been patches submitted which basically did a revert of the above
      patch leaving everything else unchanged as signed. Back to square one. This
      spawned a admittedly pointless discussion about potential users which rely
      on the unsigned behaviour until someone pointed out that it had been fixed
      before. The changelogs of said patches added further confusion as they made
      finally false claims about the consequences for eventual users which expect
      signed results.
      
      Despite delta being cycle_t, aka. u64, it's very well possible to hand in
      a signed negative value and the signed computation will happily return the
      correct result. But nobody actually sat down and analyzed the code which
      was added as user after the propably unintended signed conversion.
      
      Though in sensitive code like this it's better to analyze it proper and
      make sure that nothing relies on this than hunting the subtle wreckage half
      a year later. After analyzing all call chains it stands that no caller can
      hand in a negative value (which actually would work due to the s64 cast)
      and rely on the signed math to do the right thing.
      
      Change the conversion function to unsigned math. The conversion of all call
      chains is done in a follow up patch.
      
      This solves the starvation issue, which was caused by the negative result,
      but it does not solve the underlying problem. It merily procrastinates
      it. When the timekeeper update is deferred long enough that the unsigned
      multiplication overflows, then time going backwards is observable again.
      
      It does neither solve the issue of clocksources with a small counter width
      which will wrap around possibly several times and cause random time stamps
      to be generated. But those are usually not found on systems used for
      virtualization, so this is likely a non issue.
      
      I took the liberty to claim authorship for this simply because
      analyzing all callsites and writing the changelog took substantially
      more time than just making the simple s/s64/u64/ change and ignore the
      rest.
      
      Fixes: 6bd58f09 ("time: Add cycles to nanoseconds translation")
      Reported-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Reported-by: default avatarLiav Rehana <liavr@mellanox.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Parit Bhargava <prarit@redhat.com>
      Cc: Laurent Vivier <lvivier@redhat.com>
      Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      29955c9a