1. 27 Jul, 2018 3 commits
    • Dave Jiang's avatar
      mm: disallow mappings that conflict for devm_memremap_pages() · 15d36fec
      Dave Jiang authored
      When pmem namespaces created are smaller than section size, this can
      cause an issue during removal and gpf was observed:
      
        general protection fault: 0000 1 SMP PTI
        CPU: 36 PID: 3941 Comm: ndctl Tainted: G W 4.14.28-1.el7uek.x86_64 #2
        task: ffff88acda150000 task.stack: ffffc900233a4000
        RIP: 0010:__put_page+0x56/0x79
        Call Trace:
          devm_memremap_pages_release+0x155/0x23a
          release_nodes+0x21e/0x260
          devres_release_all+0x3c/0x48
          device_release_driver_internal+0x15c/0x207
          device_release_driver+0x12/0x14
          unbind_store+0xba/0xd8
          drv_attr_store+0x27/0x31
          sysfs_kf_write+0x3f/0x46
          kernfs_fop_write+0x10f/0x18b
          __vfs_write+0x3a/0x16d
          vfs_write+0xb2/0x1a1
          SyS_write+0x55/0xb9
          do_syscall_64+0x79/0x1ae
          entry_SYSCALL_64_after_hwframe+0x3d/0x0
      
      Add code to check whether we have a mapping already in the same section
      and prevent additional mappings from being created if that is the case.
      
      Link: http://lkml.kernel.org/r/152909478401.50143.312364396244072931.stgit@djiang5-desk3.ch.intel.comSigned-off-by: default avatarDave Jiang <dave.jiang@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Robert Elliott <elliott@hpe.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      15d36fec
    • Arnd Bergmann's avatar
      kasan: only select SLUB_DEBUG with SYSFS=y · 03758dbb
      Arnd Bergmann authored
      Building with KASAN and SLUB but without sysfs now results in a
      build-time error:
      
        WARNING: unmet direct dependencies detected for SLUB_DEBUG
          Depends on [n]: SLUB [=y] && SYSFS [=n]
          Selected by [y]:
          - KASAN [=y] && HAVE_ARCH_KASAN [=y] && (SLUB [=y] || SLAB [=n] && !DEBUG_SLAB [=n]) && SLUB [=y]
        mm/slub.c:4565:12: error: 'list_locations' defined but not used [-Werror=unused-function]
         static int list_locations(struct kmem_cache *s, char *buf,
                    ^~~~~~~~~~~~~~
        mm/slub.c:4406:13: error: 'validate_slab_cache' defined but not used [-Werror=unused-function]
         static long validate_slab_cache(struct kmem_cache *s)
      
      This disallows that broken configuration in Kconfig.
      
      Link: http://lkml.kernel.org/r/20180709154019.1693026-1-arnd@arndb.de
      Fixes: dd275caf ("kasan: depend on CONFIG_SLUB_DEBUG")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      03758dbb
    • Tejun Heo's avatar
      delayacct: fix crash in delayacct_blkio_end() after delayacct init failure · b512719f
      Tejun Heo authored
      While forking, if delayacct init fails due to memory shortage, it
      continues expecting all delayacct users to check task->delays pointer
      against NULL before dereferencing it, which all of them used to do.
      
      Commit c96f5471 ("delayacct: Account blkio completion on the correct
      task"), while updating delayacct_blkio_end() to take the target task
      instead of always using %current, made the function test NULL on
      %current->delays and then continue to operated on @p->delays.  If
      %current succeeded init while @p didn't, it leads to the following
      crash.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
       IP: __delayacct_blkio_end+0xc/0x40
       PGD 8000001fd07e1067 P4D 8000001fd07e1067 PUD 1fcffbb067 PMD 0
       Oops: 0000 [#1] SMP PTI
       CPU: 4 PID: 25774 Comm: QIOThread0 Not tainted 4.16.0-9_fbk1_rc2_1180_g6b593215b4d7 #9
       RIP: 0010:__delayacct_blkio_end+0xc/0x40
       Call Trace:
        try_to_wake_up+0x2c0/0x600
        autoremove_wake_function+0xe/0x30
        __wake_up_common+0x74/0x120
        wake_up_page_bit+0x9c/0xe0
        mpage_end_io+0x27/0x70
        blk_update_request+0x78/0x2c0
        scsi_end_request+0x2c/0x1e0
        scsi_io_completion+0x20b/0x5f0
        blk_mq_complete_request+0xa2/0x100
        ata_scsi_qc_complete+0x79/0x400
        ata_qc_complete_multiple+0x86/0xd0
        ahci_handle_port_interrupt+0xc9/0x5c0
        ahci_handle_port_intr+0x54/0xb0
        ahci_single_level_irq_intr+0x3b/0x60
        __handle_irq_event_percpu+0x43/0x190
        handle_irq_event_percpu+0x20/0x50
        handle_irq_event+0x2a/0x50
        handle_edge_irq+0x80/0x1c0
        handle_irq+0xaf/0x120
        do_IRQ+0x41/0xc0
        common_interrupt+0xf/0xf
      
      Fix it by updating delayacct_blkio_end() check @p->delays instead.
      
      Link: http://lkml.kernel.org/r/20180724175542.GP1934745@devbig577.frc2.facebook.com
      Fixes: c96f5471 ("delayacct: Account blkio completion on the correct task")
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarDave Jones <dsj@fb.com>
      Debugged-by: default avatarDave Jones <dsj@fb.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Josh Snyder <joshs@netflix.com>
      Cc: <stable@vger.kernel.org>	[4.15+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b512719f
  2. 26 Jul, 2018 5 commits
    • Linus Torvalds's avatar
      Merge tag 'usb-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · cd3f77d7
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are a number of USB fixes and new device ids for 4.18-rc7.
      
        The largest number are a bunch of gadget driver fixes that got delayed
        in being submitted earlier due to vacation schedules, but nothing
        really huge is present in them. There are some new device ids and some
        PHY driver fixes that were connected to some USB ones. Full details
        are in the shortlog.
      
        All have been in linux-next for a while with no reported issues"
      
      * tag 'usb-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (28 commits)
        usb: core: handle hub C_PORT_OVER_CURRENT condition
        usb: xhci: Fix memory leak in xhci_endpoint_reset()
        usb: typec: tcpm: Fix sink PDO starting index for PPS APDO selection
        usb: gadget: f_fs: Only return delayed status when len is 0
        usb: gadget: f_uac2: fix endianness of 'struct cntrl_*_lay3'
        usb: dwc2: Fix inefficient copy of unaligned buffers
        usb: dwc2: Fix DMA alignment to start at allocated boundary
        usb: dwc3: rockchip: Fix PHY documentation links.
        tools: usb: ffs-test: Fix build on big endian systems
        usb: gadget: aspeed: Workaround memory ordering issue
        usb: dwc3: gadget: remove redundant variable maxpacket
        usb: dwc2: avoid NULL dereferences
        usb/phy: fix PPC64 build errors in phy-fsl-usb.c
        usb: dwc2: host: do not delay retries for CONTROL IN transfers
        usb: gadget: u_audio: protect stream runtime fields with stream spinlock
        usb: gadget: u_audio: remove cached period bytes value
        usb: gadget: u_audio: remove caching of stream buffer parameters
        usb: gadget: u_audio: update hw_ptr in iso_complete after data copied
        usb: gadget: u_audio: fix pcm/card naming in g_audio_setup()
        usb: gadget: f_uac2: fix error handling in afunc_bind (again)
        ...
      cd3f77d7
    • Linus Torvalds's avatar
      Merge tag 'staging-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · fd4f84fa
      Linus Torvalds authored
      Pull staging driver fixes from Greg KH:
       "Here are three small staging driver fixes for 4.18-rc7.
      
        One is a revert of an earlier patch that turned out to be incorrect,
        one is a fix for the speakup drivers, and the last a fix for the
        ks7010 driver to resolve a regression.
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'staging-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: speakup: fix wraparound in uaccess length check
        staging: ks7010: call 'hostif_mib_set_request_int' instead of 'hostif_mib_set_request_bool'
        Revert "staging:r8188eu: Use lib80211 to support TKIP"
      fd4f84fa
    • Linus Torvalds's avatar
      Merge tag 'driver-core-4.18-rc7' of... · a5f9e5da
      Linus Torvalds authored
      Merge tag 'driver-core-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core fix from Greg KH:
       "This is a single driver core fix for 4.18-rc7. It partially reverts a
        previous commit to resolve some reported issues.
      
        It has been in linux-next for a while now with no reported issues"
      
      * tag 'driver-core-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        driver core: Partially revert "driver core: correct device's shutdown order"
      a5f9e5da
    • Linus Torvalds's avatar
      Merge tag 'acpi-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 9bd59183
      Linus Torvalds authored
      Pull ACPI fix from Rafael Wysocki:
       "Fix a recent ACPICA regression causing the AML parser to get confused
        and fail in some situations involving incorrect AML in an ACPI table
        (Erik Schmauss)"
      
      * tag 'acpi-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPICA: AML Parser: ignore dispatcher error status during table load
      9bd59183
    • Linus Torvalds's avatar
      Merge tag 'pm-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 99015e94
      Linus Torvalds authored
      Pull power management fix from Rafael Wysocki:
       "Fix up the recently introduced cpufreq driver for Qualcomm Kryo
        processors by adding a terminating NULL entry to its table of device
        IDs (YueHaibing)"
      
      * tag 'pm-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: qcom-kryo: add NULL entry to the end of_device_id array
      99015e94
  3. 25 Jul, 2018 9 commits
    • Linus Torvalds's avatar
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 6e77b267
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "One more round of updates for problems seen this -rc series. Drivers
        fixes are:
      
         - Amlogic Meson audio divider fix and CPU clk critical marking
      
         - Qualcomm multimedia GDSC marked as 'always on' to keep display
           working
      
         - Aspeed fixes for critical clks, resets causing clks to stay
           disabled, and an incorrect HPLL frequency calculation
      
         - Marvell Armada 3700 cpu clks would undervolt when switching from
           low frequencies to high frequencies because the voltage didn't
           stabilize in time so now we switch to an intermediate frequency
      
        Plus we have a core framework thinko that messed up the debugfs flag
        printing logic to make it not very useful"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: aspeed: Support HPLL strapping on ast2400
        clk: mvebu: armada-37xx-periph: Fix switching CPU rate from 300Mhz to 1.2GHz
        clk: aspeed: Mark bclk (PCIe) and dclk (VGA) as critical
        clk/mmcc-msm8996: Make mmagic_bimc_gdsc ALWAYS_ON
        clk: aspeed: Treat a gate in reset as disabled
        clk: Really show symbolic clock flags in debugfs
        clk: qcom: gcc-msm8996: Disable halt check on UFS tx clock
        clk: meson: audio-divider is one based
        clk: meson-gxbb: set fclk_div2 as CLK_IS_CRITICAL
      6e77b267
    • Linus Torvalds's avatar
      Merge tag 'fscache-fixes-20180725' of... · 5c61ef1b
      Linus Torvalds authored
      Merge tag 'fscache-fixes-20180725' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
      
      Pull fscache/cachefiles fixes from David Howells:
      
       - Allow cancelled operations to be queued so they can be cleaned up.
      
       - Fix a refcounting bug in the monitoring of reads on backend files
         whereby a race can occur between monitor objects being listed for
         work, the work processing being queued and the work processor running
         and destroying the monitor objects.
      
       - Fix a ref overput in object attachment, whereby a tentatively
         considered object is put in error handling without first being 'got'.
      
       - Fix a missing clear of the CACHEFILES_OBJECT_ACTIVE flag whereby an
         assertion occurs when we retry because it seems the object is now
         active.
      
       - Wait rather BUG'ing on an object collision in the depths of
         cachefiles as the active object should be being cleaned up - also
         depends on the one above.
      
      * tag 'fscache-fixes-20180725' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        cachefiles: Wait rather than BUG'ing on "Unexpected object collision"
        cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag
        fscache: Fix reference overput in fscache_attach_object() error handling
        cachefiles: Fix refcounting bug in backing-file read monitoring
        fscache: Allow cancelled operations to be enqueued
      5c61ef1b
    • Kiran Kumar Modukuri's avatar
      cachefiles: Wait rather than BUG'ing on "Unexpected object collision" · c2412ac4
      Kiran Kumar Modukuri authored
      If we meet a conflicting object that is marked FSCACHE_OBJECT_IS_LIVE in
      the active object tree, we have been emitting a BUG after logging
      information about it and the new object.
      
      Instead, we should wait for the CACHEFILES_OBJECT_ACTIVE flag to be cleared
      on the old object (or return an error).  The ACTIVE flag should be cleared
      after it has been removed from the active object tree.  A timeout of 60s is
      used in the wait, so we shouldn't be able to get stuck there.
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Signed-off-by: default avatarKiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      c2412ac4
    • Kiran Kumar Modukuri's avatar
      cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag · 5ce83d4b
      Kiran Kumar Modukuri authored
      In cachefiles_mark_object_active(), the new object is marked active and
      then we try to add it to the active object tree.  If a conflicting object
      is already present, we want to wait for that to go away.  After the wait,
      we go round again and try to re-mark the object as being active - but it's
      already marked active from the first time we went through and a BUG is
      issued.
      
      Fix this by clearing the CACHEFILES_OBJECT_ACTIVE flag before we try again.
      
      Analysis from Kiran Kumar Modukuri:
      
      [Impact]
      Oops during heavy NFS + FSCache + Cachefiles
      
      CacheFiles: Error: Overlong wait for old active object to go away.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
      
      CacheFiles: Error: Object already active kernel BUG at
      fs/cachefiles/namei.c:163!
      
      [Cause]
      In a heavily loaded system with big files being read and truncated, an
      fscache object for a cookie is being dropped and a new object being
      looked. The new object being looked for has to wait for the old object
      to go away before the new object is moved to active state.
      
      [Fix]
      Clear the flag 'CACHEFILES_OBJECT_ACTIVE' for the new object when
      retrying the object lookup.
      
      [Testcase]
      Have run ~100 hours of NFS stress tests and have not seen this bug recur.
      
      [Regression Potential]
       - Limited to fscache/cachefiles.
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Signed-off-by: default avatarKiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      5ce83d4b
    • Kiran Kumar Modukuri's avatar
      fscache: Fix reference overput in fscache_attach_object() error handling · f29507ce
      Kiran Kumar Modukuri authored
      When a cookie is allocated that causes fscache_object structs to be
      allocated, those objects are initialised with the cookie pointer, but
      aren't blessed with a ref on that cookie unless the attachment is
      successfully completed in fscache_attach_object().
      
      If attachment fails because the parent object was dying or there was a
      collision, fscache_attach_object() returns without incrementing the cookie
      counter - but upon failure of this function, the object is released which
      then puts the cookie, whether or not a ref was taken on the cookie.
      
      Fix this by taking a ref on the cookie when it is assigned in
      fscache_object_init(), even when we're creating a root object.
      
      
      Analysis from Kiran Kumar:
      
      This bug has been seen in 4.4.0-124-generic #148-Ubuntu kernel
      
      BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277
      
      fscache cookie ref count updated incorrectly during fscache object
      allocation resulting in following Oops.
      
      kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321!
      kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
      
      [Cause]
      Two threads are trying to do operate on a cookie and two objects.
      
      (1) One thread tries to unmount the filesystem and in process goes over a
          huge list of objects marking them dead and deleting the objects.
          cookie->usage is also decremented in following path:
      
            nfs_fscache_release_super_cookie
             -> __fscache_relinquish_cookie
              ->__fscache_cookie_put
              ->BUG_ON(atomic_read(&cookie->usage) <= 0);
      
      (2) A second thread tries to lookup an object for reading data in following
          path:
      
          fscache_alloc_object
          1) cachefiles_alloc_object
              -> fscache_object_init
                 -> assign cookie, but usage not bumped.
          2) fscache_attach_object -> fails in cant_attach_object because the
               cookie's backing object or cookie's->parent object are going away
          3) fscache_put_object
              -> cachefiles_put_object
                ->fscache_object_destroy
                  ->fscache_cookie_put
                     ->BUG_ON(atomic_read(&cookie->usage) <= 0);
      
      [NOTE from dhowells] It's unclear as to the circumstances in which (2) can
      take place, given that thread (1) is in nfs_kill_super(), however a
      conflicting NFS mount with slightly different parameters that creates a
      different superblock would do it.  A backtrace from Kiran seems to show
      that this is a possibility:
      
          kernel BUG at/build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
          ...
          RIP: __fscache_cookie_put+0x3a/0x40 [fscache]
          Call Trace:
           __fscache_relinquish_cookie+0x87/0x120 [fscache]
           nfs_fscache_release_super_cookie+0x2d/0xb0 [nfs]
           nfs_kill_super+0x29/0x40 [nfs]
           deactivate_locked_super+0x48/0x80
           deactivate_super+0x5c/0x60
           cleanup_mnt+0x3f/0x90
           __cleanup_mnt+0x12/0x20
           task_work_run+0x86/0xb0
           exit_to_usermode_loop+0xc2/0xd0
           syscall_return_slowpath+0x4e/0x60
           int_ret_from_sys_call+0x25/0x9f
      
      [Fix] Bump up the cookie usage in fscache_object_init, when it is first
      being assigned a cookie atomically such that the cookie is added and bumped
      up if its refcount is not zero.  Remove the assignment in
      fscache_attach_object().
      
      [Testcase]
      I have run ~100 hours of NFS stress tests and not seen this bug recur.
      
      [Regression Potential]
       - Limited to fscache/cachefiles.
      
      Fixes: ccc4fc3d ("FS-Cache: Implement the cookie management part of the netfs API")
      Signed-off-by: default avatarKiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      f29507ce
    • Kiran Kumar Modukuri's avatar
      cachefiles: Fix refcounting bug in backing-file read monitoring · 934140ab
      Kiran Kumar Modukuri authored
      cachefiles_read_waiter() has the right to access a 'monitor' object by
      virtue of being called under the waitqueue lock for one of the pages in its
      purview.  However, it has no ref on that monitor object or on the
      associated operation.
      
      What it is allowed to do is to move the monitor object to the operation's
      to_do list, but once it drops the work_lock, it's actually no longer
      permitted to access that object.  However, it is trying to enqueue the
      retrieval operation for processing - but it can only do this via a pointer
      in the monitor object, something it shouldn't be doing.
      
      If it doesn't enqueue the operation, the operation may not get processed.
      If the order is flipped so that the enqueue is first, then it's possible
      for the work processor to look at the to_do list before the monitor is
      enqueued upon it.
      
      Fix this by getting a ref on the operation so that we can trust that it
      will still be there once we've added the monitor to the to_do list and
      dropped the work_lock.  The op can then be enqueued after the lock is
      dropped.
      
      The bug can manifest in one of a couple of ways.  The first manifestation
      looks like:
      
       FS-Cache:
       FS-Cache: Assertion failed
       FS-Cache: 6 == 5 is false
       ------------[ cut here ]------------
       kernel BUG at fs/fscache/operation.c:494!
       RIP: 0010:fscache_put_operation+0x1e3/0x1f0
       ...
       fscache_op_work_func+0x26/0x50
       process_one_work+0x131/0x290
       worker_thread+0x45/0x360
       kthread+0xf8/0x130
       ? create_worker+0x190/0x190
       ? kthread_cancel_work_sync+0x10/0x10
       ret_from_fork+0x1f/0x30
      
      This is due to the operation being in the DEAD state (6) rather than
      INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through
      fscache_put_operation().
      
      The bug can also manifest like the following:
      
       kernel BUG at fs/fscache/operation.c:69!
       ...
          [exception RIP: fscache_enqueue_operation+246]
       ...
       #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6
       #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48
       #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028
      
      I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not
      entirely clear which assertion failed.
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Reported-by: default avatarLei Xue <carmark.dlut@gmail.com>
      Reported-by: default avatarVegard Nossum <vegard.nossum@gmail.com>
      Reported-by: default avatarAnthony DeRobertis <aderobertis@metrics.net>
      Reported-by: default avatarNeilBrown <neilb@suse.com>
      Reported-by: default avatarDaniel Axtens <dja@axtens.net>
      Reported-by: default avatarKiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Reviewed-by: default avatarDaniel Axtens <dja@axtens.net>
      934140ab
    • Kiran Kumar Modukuri's avatar
      fscache: Allow cancelled operations to be enqueued · d0eb06af
      Kiran Kumar Modukuri authored
      Alter the state-check assertion in fscache_enqueue_operation() to allow
      cancelled operations to be given processing time so they can be cleaned up.
      
      Also fix a debugging statement that was requiring such operations to have
      an object assigned.
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Reported-by: default avatarKiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      d0eb06af
    • Linus Torvalds's avatar
      Merge tag 'mips_fixes_4.18_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 9981b4fb
      Linus Torvalds authored
      Pull MIPS fixes from Paul Burton:
       "A couple more MIPS fixes for 4.18:
      
         - Fix an off-by-one in reporting PCI resource sizes to userland which
           regressed in v3.12.
      
         - Fix writes to DDR controller registers used to flush write buffers,
           which regressed with some refactoring in v4.2"
      
      * tag 'mips_fixes_4.18_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        MIPS: ath79: fix register address in ath79_ddr_wb_flush()
        MIPS: Fix off-by-one in pci_resource_to_user()
      9981b4fb
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 07230906
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Handle stations tied to AP_VLANs properly during mac80211 hw
          reconfig. From Manikanta Pubbisetty.
      
       2) Fix jump stack depth validation in nf_tables, from Taehee Yoo.
      
       3) Fix quota handling in aRFS flow expiration of mlx5 driver, from Eran
          Ben Elisha.
      
       4) Exit path handling fix in powerpc64 BPF JIT, from Daniel Borkmann.
      
       5) Use ptr_ring_consume_bh() in page pool code, from Tariq Toukan.
      
       6) Fix cached netdev name leak in nf_tables, from Florian Westphal.
      
       7) Fix memory leaks on chain rename, also from Florian Westphal.
      
       8) Several fixes to DCTCP congestion control ACK handling, from Yuchunk
          Cheng.
      
       9) Missing rcu_read_unlock() in CAIF protocol code, from Yue Haibing.
      
      10) Fix link local address handling with VRF, from David Ahern.
      
      11) Don't clobber 'err' on a successful call to __skb_linearize() in
          skb_segment(). From Eric Dumazet.
      
      12) Fix vxlan fdb notification races, from Roopa Prabhu.
      
      13) Hash UDP fragments consistently, from Paolo Abeni.
      
      14) If TCP receives lots of out of order tiny packets, we do really
          silly stuff. Make the out-of-order queue ending more robust to this
          kind of behavior, from Eric Dumazet.
      
      15) Don't leak netlink dump state in nf_tables, from Florian Westphal.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
        net: axienet: Fix double deregister of mdio
        qmi_wwan: fix interface number for DW5821e production firmware
        ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull
        bnx2x: Fix invalid memory access in rss hash config path.
        net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
        r8169: restore previous behavior to accept BIOS WoL settings
        cfg80211: never ignore user regulatory hint
        sock: fix sg page frag coalescing in sk_alloc_sg
        netfilter: nf_tables: move dumper state allocation into ->start
        tcp: add tcp_ooo_try_coalesce() helper
        tcp: call tcp_drop() from tcp_data_queue_ofo()
        tcp: detect malicious patterns in tcp_collapse_ofo_queue()
        tcp: avoid collapses in tcp_prune_queue() if possible
        tcp: free batches of packets in tcp_prune_ofo_queue()
        ip: hash fragments consistently
        ipv6: use fib6_info_hold_safe() when necessary
        can: xilinx_can: fix power management handling
        can: xilinx_can: fix incorrect clear of non-processed interrupts
        can: xilinx_can: fix RX overflow interrupt not being enabled
        can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
        ...
      07230906
  4. 24 Jul, 2018 14 commits
    • Shubhrajyoti Datta's avatar
      net: axienet: Fix double deregister of mdio · 03bc7cab
      Shubhrajyoti Datta authored
      If the registration fails then mdio_unregister is called.
      However at unbind the unregister ia attempted again resulting
      in the below crash
      
      [   73.544038] kernel BUG at drivers/net/phy/mdio_bus.c:415!
      [   73.549362] Internal error: Oops - BUG: 0 [#1] SMP
      [   73.554127] Modules linked in:
      [   73.557168] CPU: 0 PID: 2249 Comm: sh Not tainted 4.14.0 #183
      [   73.562895] Hardware name: xlnx,zynqmp (DT)
      [   73.567062] task: ffffffc879e41180 task.stack: ffffff800cbe0000
      [   73.572973] PC is at mdiobus_unregister+0x84/0x88
      [   73.577656] LR is at axienet_mdio_teardown+0x18/0x30
      [   73.582601] pc : [<ffffff80085fa4cc>] lr : [<ffffff8008616858>]
      pstate: 20000145
      [   73.589981] sp : ffffff800cbe3c30
      [   73.593277] x29: ffffff800cbe3c30 x28: ffffffc879e41180
      [   73.598573] x27: ffffff8008a21000 x26: 0000000000000040
      [   73.603868] x25: 0000000000000124 x24: ffffffc879efe920
      [   73.609164] x23: 0000000000000060 x22: ffffffc879e02000
      [   73.614459] x21: ffffffc879e02800 x20: ffffffc87b0b8870
      [   73.619754] x19: ffffffc879e02800 x18: 000000000000025d
      [   73.625050] x17: 0000007f9a719ad0 x16: ffffff8008195bd8
      [   73.630345] x15: 0000007f9a6b3d00 x14: 0000000000000010
      [   73.635640] x13: 74656e7265687465 x12: 0000000000000030
      [   73.640935] x11: 0000000000000030 x10: 0101010101010101
      [   73.646231] x9 : 241f394f42533300 x8 : ffffffc8799f6e98
      [   73.651526] x7 : ffffffc8799f6f18 x6 : ffffffc87b0ba318
      [   73.656822] x5 : ffffffc87b0ba498 x4 : 0000000000000000
      [   73.662117] x3 : 0000000000000000 x2 : 0000000000000008
      [   73.667412] x1 : 0000000000000004 x0 : ffffffc8799f4000
      [   73.672708] Process sh (pid: 2249, stack limit = 0xffffff800cbe0000)
      
      Fix the same by making the bus NULL on unregister.
      Signed-off-by: default avatarShubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      03bc7cab
    • Aleksander Morgado's avatar
      qmi_wwan: fix interface number for DW5821e production firmware · f25e1392
      Aleksander Morgado authored
      The original mapping for the DW5821e was done using a development
      version of the firmware. Confirmed with the vendor that the final
      USB layout ends up exposing the QMI control/data ports in USB
      config #1, interface #0, not in interface #1 (which is now a HID
      interface).
      
      T:  Bus=01 Lev=03 Prnt=04 Port=00 Cnt=01 Dev#= 16 Spd=480 MxCh= 0
      D:  Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  2
      P:  Vendor=413c ProdID=81d7 Rev=03.18
      S:  Manufacturer=DELL
      S:  Product=DW5821e Snapdragon X20 LTE
      S:  SerialNumber=0123456789ABCDEF
      C:  #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=500mA
      I:  If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      I:  If#= 1 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=00 Prot=00 Driver=usbhid
      I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      I:  If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      I:  If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      
      Fixes: e7e197ed ("qmi_wwan: add support for the Dell Wireless 5821e module")
      Signed-off-by: default avatarAleksander Morgado <aleksander@aleksander.es>
      Acked-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f25e1392
    • Willem de Bruijn's avatar
      ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull · 2efd4fca
      Willem de Bruijn authored
      Syzbot reported a read beyond the end of the skb head when returning
      IPV6_ORIGDSTADDR:
      
        BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242
        CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
        Google 01/01/2011
        Call Trace:
          __dump_stack lib/dump_stack.c:77 [inline]
          dump_stack+0x185/0x1d0 lib/dump_stack.c:113
          kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
          kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219
          kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261
          copy_to_user include/linux/uaccess.h:184 [inline]
          put_cmsg+0x5ef/0x860 net/core/scm.c:242
          ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719
          ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733
          rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521
          [..]
      
      This logic and its ipv4 counterpart read the destination port from
      the packet at skb_transport_offset(skb) + 4.
      
      With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a
      packet that stores headers exactly up to skb_transport_offset(skb) in
      the head and the remainder in a frag.
      
      Call pskb_may_pull before accessing the pointer to ensure that it lies
      in skb head.
      
      Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com
      Reported-by: syzbot+9adb4b567003cac781f0@syzkaller.appspotmail.com
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2efd4fca
    • Sudarsana Reddy Kalluru's avatar
      bnx2x: Fix invalid memory access in rss hash config path. · ae2dcb28
      Sudarsana Reddy Kalluru authored
      Rx hash/filter table configuration uses rss_conf_obj to configure filters
      in the hardware. This object is initialized only when the interface is
      brought up.
      This patch adds driver changes to configure rss params only when the device
      is in opened state. In port disabled case, the config will be cached in the
      driver structure which will be applied in the successive load path.
      
      Please consider applying it to 'net' branch.
      Signed-off-by: default avatarSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ae2dcb28
    • Jack Morgenstein's avatar
      net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper · 958c696f
      Jack Morgenstein authored
      Function mlx4_RST2INIT_QP_wrapper saved the qp number passed in the qp
      context, rather than the one passed in the input modifier.
      
      However, the qp number in the qp context is not defined as a
      required parameter by the FW. Therefore, drivers may choose to not
      specify the qp number in the qp context for the reset-to-init transition.
      
      Thus, we must save the qp number passed in the command input modifier --
      which is always present. (This saved qp number is used as the input
      modifier for command 2RST_QP when a slave's qp's are destroyed).
      
      Fixes: c82e9aa0 ("mlx4_core: resource tracking for HCA resources used by guests")
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      958c696f
    • Heiner Kallweit's avatar
      r8169: restore previous behavior to accept BIOS WoL settings · 18041b52
      Heiner Kallweit authored
      Commit 7edf6d31 tried to resolve an inconsistency (BIOS WoL
      settings are accepted, but device isn't wakeup-enabled) resulting
      from a previous broken-BIOS workaround by making disabled WoL the
      default.
      This however had some side effects, most likely due to a broken BIOS
      some systems don't properly resume from suspend when the MagicPacket
      WoL bit isn't set in the chip, see
      https://bugzilla.kernel.org/show_bug.cgi?id=200195
      Therefore restore the WoL behavior from 4.16.
      Reported-by: default avatarAlbert Astals Cid <aacid@kde.org>
      Fixes: 7edf6d31 ("r8169: disable WOL per default")
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      18041b52
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · f89ed2f8
      Linus Torvalds authored
      Pull s390 fix from Martin Schwidefsky.
      
      Guenter Roeck reports that the s390 allmodconfig build fails because of
      a gcc plugin problem.  The fix won't be in-tree until 4.19, so for now
      disable the gcc plugins on s390.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390: disable gcc plugins
      f89ed2f8
    • Guenter Roeck's avatar
      media: staging: omap4iss: Include asm/cacheflush.h after generic includes · 0894da84
      Guenter Roeck authored
      Including asm/cacheflush.h first results in the following build error
      when trying to build sparc32:allmodconfig, because 'struct page' has not
      been declared, and the function declaration ends up creating a separate
      (private) declaration of struct page (as a result of function arguments
      being in the scope of the function declaration and definition, not in
      global scope).
      
      The C scoping rules do not just affect variable visibility, they also
      affect type declaration visibility.
      
      The end result is that when the actual call site is seen in
      <linux/highmem.h>, the 'struct page' type in the caller is not the same
      'struct page' that the function was declared with, resulting in:
      
        In file included from arch/sparc/include/asm/page.h:10:0,
                         ...
                         from drivers/staging/media/omap4iss/iss_video.c:15:
        include/linux/highmem.h: In function 'clear_user_highpage':
        include/linux/highmem.h:137:31: error:
      	passing argument 1 of 'sparc_flush_page_to_ram' from incompatible
      	pointer type
      
      Include generic includes files first to fix the problem.
      
      Fixes: fc96d58c ("[media] v4l: omap4iss: Add support for OMAP4 camera interface - Video devices")
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      [ Added explanation of C scope rules - Linus ]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0894da84
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 049f5604
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Make sure we don't go over the maximum jump stack boundary,
         from Taehee Yoo.
      
      2) Missing rcu_barrier() in hash and rbtree sets, also from Taehee.
      
      3) Missing check to nul-node in rbtree timeout routine, from Taehee.
      
      4) Use dev->name from flowtable to fix a memleak, from Florian.
      
      5) Oneliner to free flowtable object on removal, from Florian.
      
      6) Memleak in chain rename transaction, again from Florian.
      
      7) Don't allow two chains to use the same name in the same
         transaction, from Florian.
      
      8) handle DCCP SYNC/SYNCACK as invalid, this triggers an
         uninitialized timer in conntrack reported by syzbot, from Florian.
      
      9) Fix leak in case netlink_dump_start() fails, from Florian.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      049f5604
    • David S. Miller's avatar
      Merge tag 'mac80211-for-davem-2018-07-24' of... · e1adf314
      David S. Miller authored
      Merge tag 'mac80211-for-davem-2018-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      Only a few fixes:
       * always keep regulatory user hint
       * add missing break statement in station flags parsing
       * fix non-linear SKBs in port-control-over-nl80211
       * reconfigure VLAN stations during HW restart
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e1adf314
    • YueHaibing's avatar
      cpufreq: qcom-kryo: add NULL entry to the end of_device_id array · bafaf056
      YueHaibing authored
      Make sure of_device_id tables are NULL terminated.
      
      Found by coccinelle spatch "misc/of_table.cocci"
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Acked-by: default avatarIlia Lin <ilia.lin@kernel.org>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      bafaf056
    • Amar Singhal's avatar
      cfg80211: never ignore user regulatory hint · e31f6456
      Amar Singhal authored
      Currently user regulatory hint is ignored if all wiphys
      in the system are self managed. But the hint is not ignored
      if there is no wiphy in the system. This affects the global
      regulatory setting. Global regulatory setting needs to be
      maintained so that it can be applied to a new wiphy entering
      the system. Therefore, do not ignore user regulatory setting
      even if all wiphys in the system are self managed.
      Signed-off-by: default avatarAmar Singhal <asinghal@codeaurora.org>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      e31f6456
    • Martin Schwidefsky's avatar
      s390: disable gcc plugins · 2fba3573
      Martin Schwidefsky authored
      The s390 build currently fails with the latent entropy plugin:
      
      arch/s390/kernel/als.o: In function `verify_facilities':
      als.c:(.init.text+0x24): undefined reference to `latent_entropy'
      als.c:(.init.text+0xae): undefined reference to `latent_entropy'
      make[3]: *** [arch/s390/boot/compressed/vmlinux] Error 1
      make[2]: *** [arch/s390/boot/compressed/vmlinux] Error 2
      make[1]: *** [bzImage] Error 2
      
      This will be fixed with the early boot rework from Vasily, which
      is planned for the 4.19 merge window.
      
      For 4.18 the simplest solution is to disable the gcc plugins and
      reenable them after the early boot rework is upstream.
      Reported-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      2fba3573
    • Daniel Borkmann's avatar
      sock: fix sg page frag coalescing in sk_alloc_sg · 144fe2bf
      Daniel Borkmann authored
      Current sg coalescing logic in sk_alloc_sg() (latter is used by tls and
      sockmap) is not quite correct in that we do fetch the previous sg entry,
      however the subsequent check whether the refilled page frag from the
      socket is still the same as from the last entry with prior offset and
      length matching the start of the current buffer is comparing always the
      first sg list entry instead of the prior one.
      
      Fixes: 3c4d7559 ("tls: kernel TLS support")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarDave Watson <davejwatson@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      144fe2bf
  5. 23 Jul, 2018 9 commits
    • Florian Westphal's avatar
      netfilter: nf_tables: move dumper state allocation into ->start · 90fd131a
      Florian Westphal authored
      Shaochun Chen points out we leak dumper filter state allocations
      stored in dump_control->data in case there is an error before netlink sets
      cb_running (after which ->done will be called at some point).
      
      In order to fix this, add .start functions and do the allocations
      there.
      
      ->done is going to clean up, and in case error occurs before
      ->start invocation no cleanups need to be done anymore.
      Reported-by: default avatarshaochun chen <cscnull@gmail.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      90fd131a
    • David S. Miller's avatar
      Merge branch 'tcp-robust-ooo' · 1a4f14ba
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      Juha-Matti Tilli reported that malicious peers could inject tiny
      packets in out_of_order_queue, forcing very expensive calls
      to tcp_collapse_ofo_queue() and tcp_prune_ofo_queue() for
      every incoming packet.
      
      With tcp_rmem[2] default of 6MB, the ooo queue could
      contain ~7000 nodes.
      
      This patch series makes sure we cut cpu cycles enough to
      render the attack not critical.
      
      We might in the future go further, like disconnecting
      or black-holing proven malicious flows.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1a4f14ba
    • Eric Dumazet's avatar
      tcp: add tcp_ooo_try_coalesce() helper · 58152ecb
      Eric Dumazet authored
      In case skb in out_or_order_queue is the result of
      multiple skbs coalescing, we would like to get a proper gso_segs
      counter tracking, so that future tcp_drop() can report an accurate
      number.
      
      I chose to not implement this tracking for skbs in receive queue,
      since they are not dropped, unless socket is disconnected.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      58152ecb
    • Eric Dumazet's avatar
      tcp: call tcp_drop() from tcp_data_queue_ofo() · 8541b21e
      Eric Dumazet authored
      In order to be able to give better diagnostics and detect
      malicious traffic, we need to have better sk->sk_drops tracking.
      
      Fixes: 9f5afeae ("tcp: use an RB tree for ooo receive queue")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8541b21e
    • Eric Dumazet's avatar
      tcp: detect malicious patterns in tcp_collapse_ofo_queue() · 3d4bf93a
      Eric Dumazet authored
      In case an attacker feeds tiny packets completely out of order,
      tcp_collapse_ofo_queue() might scan the whole rb-tree, performing
      expensive copies, but not changing socket memory usage at all.
      
      1) Do not attempt to collapse tiny skbs.
      2) Add logic to exit early when too many tiny skbs are detected.
      
      We prefer not doing aggressive collapsing (which copies packets)
      for pathological flows, and revert to tcp_prune_ofo_queue() which
      will be less expensive.
      
      In the future, we might add the possibility of terminating flows
      that are proven to be malicious.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3d4bf93a
    • Eric Dumazet's avatar
      tcp: avoid collapses in tcp_prune_queue() if possible · f4a3313d
      Eric Dumazet authored
      Right after a TCP flow is created, receiving tiny out of order
      packets allways hit the condition :
      
      if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf)
      	tcp_clamp_window(sk);
      
      tcp_clamp_window() increases sk_rcvbuf to match sk_rmem_alloc
      (guarded by tcp_rmem[2])
      
      Calling tcp_collapse_ofo_queue() in this case is not useful,
      and offers a O(N^2) surface attack to malicious peers.
      
      Better not attempt anything before full queue capacity is reached,
      forcing attacker to spend lots of resource and allow us to more
      easily detect the abuse.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f4a3313d
    • Eric Dumazet's avatar
      tcp: free batches of packets in tcp_prune_ofo_queue() · 72cd43ba
      Eric Dumazet authored
      Juha-Matti Tilli reported that malicious peers could inject tiny
      packets in out_of_order_queue, forcing very expensive calls
      to tcp_collapse_ofo_queue() and tcp_prune_ofo_queue() for
      every incoming packet. out_of_order_queue rb-tree can contain
      thousands of nodes, iterating over all of them is not nice.
      
      Before linux-4.9, we would have pruned all packets in ofo_queue
      in one go, every XXXX packets. XXXX depends on sk_rcvbuf and skbs
      truesize, but is about 7000 packets with tcp_rmem[2] default of 6 MB.
      
      Since we plan to increase tcp_rmem[2] in the future to cope with
      modern BDP, can not revert to the old behavior, without great pain.
      
      Strategy taken in this patch is to purge ~12.5 % of the queue capacity.
      
      Fixes: 36a6503f ("tcp: refine tcp_prune_ofo_queue() to not drop all packets")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarJuha-Matti Tilli <juha-matti.tilli@iki.fi>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      72cd43ba
    • Paolo Abeni's avatar
      ip: hash fragments consistently · 3dd1c9a1
      Paolo Abeni authored
      The skb hash for locally generated ip[v6] fragments belonging
      to the same datagram can vary in several circumstances:
      * for connected UDP[v6] sockets, the first fragment get its hash
        via set_owner_w()/skb_set_hash_from_sk()
      * for unconnected IPv6 UDPv6 sockets, the first fragment can get
        its hash via ip6_make_flowlabel()/skb_get_hash_flowi6(), if
        auto_flowlabel is enabled
      
      For the following frags the hash is usually computed via
      skb_get_hash().
      The above can cause OoO for unconnected IPv6 UDPv6 socket: in that
      scenario the egress tx queue can be selected on a per packet basis
      via the skb hash.
      It may also fool flow-oriented schedulers to place fragments belonging
      to the same datagram in different flows.
      
      Fix the issue by copying the skb hash from the head frag into
      the others at fragmentation time.
      
      Before this commit:
      perf probe -a "dev_queue_xmit skb skb->hash skb->l4_hash:b1@0/8 skb->sw_hash:b1@1/8"
      netperf -H $IPV4 -t UDP_STREAM -l 5 -- -m 2000 -n &
      perf record -e probe:dev_queue_xmit -e probe:skb_set_owner_w -a sleep 0.1
      perf script
      probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=3713014309 l4_hash=1 sw_hash=0
      probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=0 l4_hash=0 sw_hash=0
      
      After this commit:
      probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0
      probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0
      
      Fixes: b73c3d0e ("net: Save TX flow hash in sock and set in skbuf on xmit")
      Fixes: 67800f9b ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3dd1c9a1
    • Wei Wang's avatar
      ipv6: use fib6_info_hold_safe() when necessary · e873e4b9
      Wei Wang authored
      In the code path where only rcu read lock is held, e.g. in the route
      lookup code path, it is not safe to directly call fib6_info_hold()
      because the fib6_info may already have been deleted but still exists
      in the rcu grace period. Holding reference to it could cause double
      free and crash the kernel.
      
      This patch adds a new function fib6_info_hold_safe() and replace
      fib6_info_hold() in all necessary places.
      
      Syzbot reported 3 crash traces because of this. One of them is:
      8021q: adding VLAN 0 to HW filter on device team0
      IPv6: ADDRCONF(NETDEV_CHANGE): team0: link becomes ready
      dst_release: dst:(____ptrval____) refcnt:-1
      dst_release: dst:(____ptrval____) refcnt:-2
      WARNING: CPU: 1 PID: 4845 at include/net/dst.h:239 dst_hold include/net/dst.h:239 [inline]
      WARNING: CPU: 1 PID: 4845 at include/net/dst.h:239 ip6_setup_cork+0xd66/0x1830 net/ipv6/ip6_output.c:1204
      dst_release: dst:(____ptrval____) refcnt:-1
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 1 PID: 4845 Comm: syz-executor493 Not tainted 4.18.0-rc3+ #10
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
       panic+0x238/0x4e7 kernel/panic.c:184
      dst_release: dst:(____ptrval____) refcnt:-2
      dst_release: dst:(____ptrval____) refcnt:-3
       __warn.cold.8+0x163/0x1ba kernel/panic.c:536
      dst_release: dst:(____ptrval____) refcnt:-4
       report_bug+0x252/0x2d0 lib/bug.c:186
       fixup_bug arch/x86/kernel/traps.c:178 [inline]
       do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296
      dst_release: dst:(____ptrval____) refcnt:-5
       do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
       invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
      RIP: 0010:dst_hold include/net/dst.h:239 [inline]
      RIP: 0010:ip6_setup_cork+0xd66/0x1830 net/ipv6/ip6_output.c:1204
      Code: c1 ed 03 89 9d 18 ff ff ff 48 b8 00 00 00 00 00 fc ff df 41 c6 44 05 00 f8 e9 2d 01 00 00 4c 8b a5 c8 fe ff ff e8 1a f6 e6 fa <0f> 0b e9 6a fc ff ff e8 0e f6 e6 fa 48 8b 85 d0 fe ff ff 48 8d 78
      RSP: 0018:ffff8801a8fcf178 EFLAGS: 00010293
      RAX: ffff8801a8eba5c0 RBX: 0000000000000000 RCX: ffffffff869511e6
      RDX: 0000000000000000 RSI: ffffffff869515b6 RDI: 0000000000000005
      RBP: ffff8801a8fcf2c8 R08: ffff8801a8eba5c0 R09: ffffed0035ac8338
      R10: ffffed0035ac8338 R11: ffff8801ad6419c3 R12: ffff8801a8fcf720
      R13: ffff8801a8fcf6a0 R14: ffff8801ad6419c0 R15: ffff8801ad641980
       ip6_make_skb+0x2c8/0x600 net/ipv6/ip6_output.c:1768
       udpv6_sendmsg+0x2c90/0x35f0 net/ipv6/udp.c:1376
       inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:641 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:651
       ___sys_sendmsg+0x51d/0x930 net/socket.c:2125
       __sys_sendmmsg+0x240/0x6f0 net/socket.c:2220
       __do_sys_sendmmsg net/socket.c:2249 [inline]
       __se_sys_sendmmsg net/socket.c:2246 [inline]
       __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2246
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x446ba9
      Code: e8 cc bb 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fb39a469da8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00000000006dcc54 RCX: 0000000000446ba9
      RDX: 00000000000000b8 RSI: 0000000020001b00 RDI: 0000000000000003
      RBP: 00000000006dcc50 R08: 00007fb39a46a700 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 45c828efc7a64843
      R13: e6eeb815b9d8a477 R14: 5068caf6f713c6fc R15: 0000000000000001
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Kernel Offset: disabled
      Rebooting in 86400 seconds..
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Reported-by: syzbot+902e2a1bcd4f7808cef5@syzkaller.appspotmail.com
      Reported-by: syzbot+8ae62d67f647abeeceb9@syzkaller.appspotmail.com
      Reported-by: syzbot+3f08feb14086930677d0@syzkaller.appspotmail.com
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e873e4b9