1. 15 Jun, 2019 40 commits
    • Mika Westerberg's avatar
      net: thunderbolt: Unregister ThunderboltIP protocol handler when suspending · 18d4decc
      Mika Westerberg authored
      [ Upstream commit 9872760e ]
      
      The XDomain protocol messages may start as soon as Thunderbolt control
      channel is started. This means that if the other host starts sending
      ThunderboltIP packets early enough they will be passed to the network
      driver which then gets confused because its resume hook is not called
      yet.
      
      Fix this by unregistering the ThunderboltIP protocol handler when
      suspending and registering it back on resume.
      Signed-off-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      18d4decc
    • Wesley Sheng's avatar
      switchtec: Fix unintended mask of MRPC event · e86a0475
      Wesley Sheng authored
      [ Upstream commit 083c1b5e ]
      
      When running application tool switchtec-user's `firmware update` and `event
      wait` commands concurrently, sometimes the firmware update speed reduced
      significantly.
      
      It is because when the MRPC event happened after MRPC event occurrence
      check but before the event mask loop reaches its header register in event
      ISR, the MRPC event would be masked unintentionally.  Since there's no
      chance to enable it again except for a module reload, all the following
      MRPC execution completion checks time out.
      
      Fix this bug by skipping the mask operation for MRPC event in event ISR,
      same as what we already do for LINK event.
      
      Fixes: 52eabba5 ("switchtec: Add IOCTLs to the Switchtec driver")
      Signed-off-by: default avatarWesley Sheng <wesley.sheng@microchip.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e86a0475
    • Will Deacon's avatar
      iommu/arm-smmu-v3: Don't disable SMMU in kdump kernel · ddfffa59
      Will Deacon authored
      [ Upstream commit 3f54c447 ]
      
      Disabling the SMMU when probing from within a kdump kernel so that all
      incoming transactions are terminated can prevent the core of the crashed
      kernel from being transferred off the machine if all I/O devices are
      behind the SMMU.
      
      Instead, continue to probe the SMMU after it is disabled so that we can
      reinitialise it entirely and re-attach the DMA masters as they are reset.
      Since the kdump kernel may not have drivers for all of the active DMA
      masters, we suppress fault reporting to avoid spamming the console and
      swamping the IRQ threads.
      Reported-by: default avatar"Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
      Tested-by: default avatar"Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
      Tested-by: default avatarBhupesh Sharma <bhsharma@redhat.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ddfffa59
    • Farhan Ali's avatar
      vfio: Fix WARNING "do not call blocking ops when !TASK_RUNNING" · 6a3bde70
      Farhan Ali authored
      [ Upstream commit 41be3e26 ]
      
      vfio_dev_present() which is the condition to
      wait_event_interruptible_timeout(), will call vfio_group_get_device
      and try to acquire the mutex group->device_lock.
      
      wait_event_interruptible_timeout() will set the state of the current
      task to TASK_INTERRUPTIBLE, before doing the condition check. This
      means that we will try to acquire the mutex while already in a
      sleeping state. The scheduler warns us by giving the following
      warning:
      
      [ 4050.264464] ------------[ cut here ]------------
      [ 4050.264508] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b33c00e2>] prepare_to_wait_event+0x14a/0x188
      [ 4050.264529] WARNING: CPU: 12 PID: 35924 at kernel/sched/core.c:6112 __might_sleep+0x76/0x90
      ....
      
       4050.264756] Call Trace:
      [ 4050.264765] ([<000000000017bbaa>] __might_sleep+0x72/0x90)
      [ 4050.264774]  [<0000000000b97edc>] __mutex_lock+0x44/0x8c0
      [ 4050.264782]  [<0000000000b9878a>] mutex_lock_nested+0x32/0x40
      [ 4050.264793]  [<000003ff800d7abe>] vfio_group_get_device+0x36/0xa8 [vfio]
      [ 4050.264803]  [<000003ff800d87c0>] vfio_del_group_dev+0x238/0x378 [vfio]
      [ 4050.264813]  [<000003ff8015f67c>] mdev_remove+0x3c/0x68 [mdev]
      [ 4050.264825]  [<00000000008e01b0>] device_release_driver_internal+0x168/0x268
      [ 4050.264834]  [<00000000008de692>] bus_remove_device+0x162/0x190
      [ 4050.264843]  [<00000000008daf42>] device_del+0x1e2/0x368
      [ 4050.264851]  [<00000000008db12c>] device_unregister+0x64/0x88
      [ 4050.264862]  [<000003ff8015ed84>] mdev_device_remove+0xec/0x130 [mdev]
      [ 4050.264872]  [<000003ff8015f074>] remove_store+0x6c/0xa8 [mdev]
      [ 4050.264881]  [<000000000046f494>] kernfs_fop_write+0x14c/0x1f8
      [ 4050.264890]  [<00000000003c1530>] __vfs_write+0x38/0x1a8
      [ 4050.264899]  [<00000000003c187c>] vfs_write+0xb4/0x198
      [ 4050.264908]  [<00000000003c1af2>] ksys_write+0x5a/0xb0
      [ 4050.264916]  [<0000000000b9e270>] system_call+0xdc/0x2d8
      [ 4050.264925] 4 locks held by sh/35924:
      [ 4050.264933]  #0: 000000001ef90325 (sb_writers#4){.+.+}, at: vfs_write+0x9e/0x198
      [ 4050.264948]  #1: 000000005c1ab0b3 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cc/0x1f8
      [ 4050.264963]  #2: 0000000034831ab8 (kn->count#297){++++}, at: kernfs_remove_self+0x12e/0x150
      [ 4050.264979]  #3: 00000000e152484f (&dev->mutex){....}, at: device_release_driver_internal+0x5c/0x268
      [ 4050.264993] Last Breaking-Event-Address:
      [ 4050.265002]  [<000000000017bbaa>] __might_sleep+0x72/0x90
      [ 4050.265010] irq event stamp: 7039
      [ 4050.265020] hardirqs last  enabled at (7047): [<00000000001cee7a>] console_unlock+0x6d2/0x740
      [ 4050.265029] hardirqs last disabled at (7054): [<00000000001ce87e>] console_unlock+0xd6/0x740
      [ 4050.265040] softirqs last  enabled at (6416): [<0000000000b8fe26>] __udelay+0xb6/0x100
      [ 4050.265049] softirqs last disabled at (6415): [<0000000000b8fe06>] __udelay+0x96/0x100
      [ 4050.265057] ---[ end trace d04a07d39d99a9f9 ]---
      
      Let's fix this as described in the article
      https://lwn.net/Articles/628628/.
      Signed-off-by: default avatarFarhan Ali <alifm@linux.ibm.com>
      [remove now redundant vfio_dev_present()]
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6a3bde70
    • Arnd Bergmann's avatar
      nfsd: avoid uninitialized variable warning · 00cf52ed
      Arnd Bergmann authored
      [ Upstream commit 0ab88ca4 ]
      
      clang warns that 'contextlen' may be accessed without an initialization:
      
      fs/nfsd/nfs4xdr.c:2911:9: error: variable 'contextlen' is uninitialized when used here [-Werror,-Wuninitialized]
                                                                      contextlen);
                                                                      ^~~~~~~~~~
      fs/nfsd/nfs4xdr.c:2424:16: note: initialize the variable 'contextlen' to silence this warning
              int contextlen;
                            ^
                             = 0
      
      Presumably this cannot happen, as FATTR4_WORD2_SECURITY_LABEL is
      set if CONFIG_NFSD_V4_SECURITY_LABEL is enabled.
      Adding another #ifdef like the other two in this function
      avoids the warning.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      00cf52ed
    • J. Bruce Fields's avatar
      nfsd: allow fh_want_write to be called twice · 82ade407
      J. Bruce Fields authored
      [ Upstream commit 0b8f6262 ]
      
      A fuzzer recently triggered lockdep warnings about potential sb_writers
      deadlocks caused by fh_want_write().
      
      Looks like we aren't careful to pair each fh_want_write() with an
      fh_drop_write().
      
      It's not normally a problem since fh_put() will call fh_drop_write() for
      us.  And was OK for NFSv3 where we'd do one operation that might call
      fh_want_write(), and then put the filehandle.
      
      But an NFSv4 protocol fuzzer can do weird things like call unlink twice
      in a compound, and then we get into trouble.
      
      I'm a little worried about this approach of just leaving everything to
      fh_put().  But I think there are probably a lot of
      fh_want_write()/fh_drop_write() imbalances so for now I think we need it
      to be more forgiving.
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      82ade407
    • Kirill Smelkov's avatar
      fuse: retrieve: cap requested size to negotiated max_write · a5ee124e
      Kirill Smelkov authored
      [ Upstream commit 7640682e ]
      
      FUSE filesystem server and kernel client negotiate during initialization
      phase, what should be the maximum write size the client will ever issue.
      Correspondingly the filesystem server then queues sys_read calls to read
      requests with buffer capacity large enough to carry request header + that
      max_write bytes. A filesystem server is free to set its max_write in
      anywhere in the range between [1*page, fc->max_pages*page]. In particular
      go-fuse[2] sets max_write by default as 64K, wheres default fc->max_pages
      corresponds to 128K. Libfuse also allows users to configure max_write, but
      by default presets it to possible maximum.
      
      If max_write is < fc->max_pages*page, and in NOTIFY_RETRIEVE handler we
      allow to retrieve more than max_write bytes, corresponding prepared
      NOTIFY_REPLY will be thrown away by fuse_dev_do_read, because the
      filesystem server, in full correspondence with server/client contract, will
      be only queuing sys_read with ~max_write buffer capacity, and
      fuse_dev_do_read throws away requests that cannot fit into server request
      buffer. In turn the filesystem server could get stuck waiting indefinitely
      for NOTIFY_REPLY since NOTIFY_RETRIEVE handler returned OK which is
      understood by clients as that NOTIFY_REPLY was queued and will be sent
      back.
      
      Cap requested size to negotiate max_write to avoid the problem.  This
      aligns with the way NOTIFY_RETRIEVE handler works, which already
      unconditionally caps requested retrieve size to fuse_conn->max_pages.  This
      way it should not hurt NOTIFY_RETRIEVE semantic if we return less data than
      was originally requested.
      
      Please see [1] for context where the problem of stuck filesystem was hit
      for real, how the situation was traced and for more involving patch that
      did not make it into the tree.
      
      [1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2
      [2] https://github.com/hanwen/go-fuseSigned-off-by: Kirill Smelkov's avatarKirill Smelkov <kirr@nexedi.com>
      Cc: Han-Wen Nienhuys <hanwen@google.com>
      Cc: Jakob Unterwurzacher <jakobunt@gmail.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      a5ee124e
    • Chen-Yu Tsai's avatar
      nvmem: sunxi_sid: Support SID on A83T and H5 · 4c55b1f8
      Chen-Yu Tsai authored
      [ Upstream commit da75b890 ]
      
      The device tree binding already lists compatible strings for these two
      SoCs. They don't have the defect as seen on the H3, and the size and
      register layout is the same as the A64. Furthermore, the driver does
      not include nvmem cell definitions.
      
      Add support for these two compatible strings, re-using the config for
      the A64.
      Signed-off-by: default avatarChen-Yu Tsai <wens@csie.org>
      Acked-by: default avatarMaxime Ripard <maxime.ripard@bootlin.com>
      Signed-off-by: default avatarSrinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4c55b1f8
    • Jorge Ramirez-Ortiz's avatar
      nvmem: core: fix read buffer in place · f6fdf2d1
      Jorge Ramirez-Ortiz authored
      [ Upstream commit 2fe518fe ]
      
      When the bit_offset in the cell is zero, the pointer to the msb will
      not be properly initialized (ie, will still be pointing to the first
      byte in the buffer).
      
      This being the case, if there are bits to clear in the msb, those will
      be left untouched while the mask will incorrectly clear bit positions
      on the first byte.
      
      This commit also makes sure that any byte unused in the cell is
      cleared.
      Signed-off-by: default avatarJorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org>
      Signed-off-by: default avatarSrinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f6fdf2d1
    • Lu Baolu's avatar
      iommu/vt-d: Don't request page request irq under dmar_global_lock · 2a5fd4fa
      Lu Baolu authored
      [ Upstream commit a7755c3c ]
      
      Requesting page reqest irq under dmar_global_lock could cause
      potential lock race condition (caught by lockdep).
      
      [    4.100055] ======================================================
      [    4.100063] WARNING: possible circular locking dependency detected
      [    4.100072] 5.1.0-rc4+ #2169 Not tainted
      [    4.100078] ------------------------------------------------------
      [    4.100086] swapper/0/1 is trying to acquire lock:
      [    4.100094] 000000007dcbe3c3 (dmar_lock){+.+.}, at: dmar_alloc_hwirq+0x35/0x140
      [    4.100112] but task is already holding lock:
      [    4.100120] 0000000060bbe946 (dmar_global_lock){++++}, at: intel_iommu_init+0x191/0x1438
      [    4.100136] which lock already depends on the new lock.
      [    4.100146] the existing dependency chain (in reverse order) is:
      [    4.100155]
                     -> #2 (dmar_global_lock){++++}:
      [    4.100169]        down_read+0x44/0xa0
      [    4.100178]        intel_irq_remapping_alloc+0xb2/0x7b0
      [    4.100186]        mp_irqdomain_alloc+0x9e/0x2e0
      [    4.100195]        __irq_domain_alloc_irqs+0x131/0x330
      [    4.100203]        alloc_isa_irq_from_domain.isra.4+0x9a/0xd0
      [    4.100212]        mp_map_pin_to_irq+0x244/0x310
      [    4.100221]        setup_IO_APIC+0x757/0x7ed
      [    4.100229]        x86_late_time_init+0x17/0x1c
      [    4.100238]        start_kernel+0x425/0x4e3
      [    4.100247]        secondary_startup_64+0xa4/0xb0
      [    4.100254]
                     -> #1 (irq_domain_mutex){+.+.}:
      [    4.100265]        __mutex_lock+0x7f/0x9d0
      [    4.100273]        __irq_domain_add+0x195/0x2b0
      [    4.100280]        irq_domain_create_hierarchy+0x3d/0x40
      [    4.100289]        msi_create_irq_domain+0x32/0x110
      [    4.100297]        dmar_alloc_hwirq+0x111/0x140
      [    4.100305]        dmar_set_interrupt.part.14+0x1a/0x70
      [    4.100314]        enable_drhd_fault_handling+0x2c/0x6c
      [    4.100323]        apic_bsp_setup+0x75/0x7a
      [    4.100330]        x86_late_time_init+0x17/0x1c
      [    4.100338]        start_kernel+0x425/0x4e3
      [    4.100346]        secondary_startup_64+0xa4/0xb0
      [    4.100352]
                     -> #0 (dmar_lock){+.+.}:
      [    4.100364]        lock_acquire+0xb4/0x1c0
      [    4.100372]        __mutex_lock+0x7f/0x9d0
      [    4.100379]        dmar_alloc_hwirq+0x35/0x140
      [    4.100389]        intel_svm_enable_prq+0x61/0x180
      [    4.100397]        intel_iommu_init+0x1128/0x1438
      [    4.100406]        pci_iommu_init+0x16/0x3f
      [    4.100414]        do_one_initcall+0x5d/0x2be
      [    4.100422]        kernel_init_freeable+0x1f0/0x27c
      [    4.100431]        kernel_init+0xa/0x110
      [    4.100438]        ret_from_fork+0x3a/0x50
      [    4.100444]
                     other info that might help us debug this:
      
      [    4.100454] Chain exists of:
                       dmar_lock --> irq_domain_mutex --> dmar_global_lock
      [    4.100469]  Possible unsafe locking scenario:
      
      [    4.100476]        CPU0                    CPU1
      [    4.100483]        ----                    ----
      [    4.100488]   lock(dmar_global_lock);
      [    4.100495]                                lock(irq_domain_mutex);
      [    4.100503]                                lock(dmar_global_lock);
      [    4.100512]   lock(dmar_lock);
      [    4.100518]
                      *** DEADLOCK ***
      
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Kevin Tian <kevin.tian@intel.com>
      Reported-by: default avatarDave Jiang <dave.jiang@intel.com>
      Fixes: a222a7f0 ("iommu/vt-d: Implement page request handling")
      Signed-off-by: default avatarLu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2a5fd4fa
    • Valentin Schneider's avatar
      arm64: defconfig: Update UFSHCD for Hi3660 soc · 3b837ba0
      Valentin Schneider authored
      [ Upstream commit 7b3320e6 ]
      
      Commit 7ee7ef24 ("scsi: arm64: defconfig: enable configs for Hisilicon ufs")
      set 'CONFIG_SCSI_UFS_HISI=y', but the configs it depends
      on
      
        (CONFIG_SCSI_HFSHCD_PLATFORM && CONFIG_SCSI_UFSHCD)
      
      were left to being built as modules.
      
      Commit 1f4fa50d ("arm64: defconfig: Regenerate for v4.20") "fixed"
      that by reverting to 'CONFIG_SCSI_UFS_HISI=m'.
      
      Thing is, if the rootfs is stored in the on-board flash (which
      is the "canonical" way of doing things), we either need these drivers
      to be built-in, or we need to fiddle with an initramfs to access that
      flash and eventually load the modules installed over there.
      
      The former is the easiest, do that.
      Signed-off-by: default avatarValentin Schneider <valentin.schneider@arm.com>
      Reviewed-by: default avatarLeo Yan <leo.yan@linaro.org>
      Signed-off-by: default avatarOlof Johansson <olof@lixom.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      3b837ba0
    • Nathan Fontenot's avatar
      powerpc/pseries: Track LMB nid instead of using device tree · 7bd4b91a
      Nathan Fontenot authored
      [ Upstream commit b2d3b5ee ]
      
      When removing memory we need to remove the memory from the node
      it was added to instead of looking up the node it should be in
      in the device tree.
      
      During testing we have seen scenarios where the affinity for a
      LMB changes due to a partition migration or PRRN event. In these
      cases the node the LMB exists in may not match the node the device
      tree indicates it belongs in. This can lead to a system crash
      when trying to DLPAR remove the LMB after a migration or PRRN
      event. The current code looks up the node in the device tree to
      remove the LMB from, the crash occurs when we try to offline this
      node and it does not have any data, i.e. node_data[nid] == NULL.
      
      36:mon> e
      cpu 0x36: Vector: 300 (Data Access) at [c0000001828b7810]
          pc: c00000000036d08c: try_offline_node+0x2c/0x1b0
          lr: c0000000003a14ec: remove_memory+0xbc/0x110
          sp: c0000001828b7a90
         msr: 800000000280b033
         dar: 9a28
       dsisr: 40000000
        current = 0xc0000006329c4c80
        paca    = 0xc000000007a55200   softe: 0        irq_happened: 0x01
          pid   = 76926, comm = kworker/u320:3
      
      36:mon> t
      [link register   ] c0000000003a14ec remove_memory+0xbc/0x110
      [c0000001828b7a90] c00000000006a1cc arch_remove_memory+0x9c/0xd0 (unreliable)
      [c0000001828b7ad0] c0000000003a14e0 remove_memory+0xb0/0x110
      [c0000001828b7b20] c0000000000c7db4 dlpar_remove_lmb+0x94/0x160
      [c0000001828b7b60] c0000000000c8ef8 dlpar_memory+0x7e8/0xd10
      [c0000001828b7bf0] c0000000000bf828 handle_dlpar_errorlog+0xf8/0x160
      [c0000001828b7c60] c0000000000bf8cc pseries_hp_work_fn+0x3c/0xa0
      [c0000001828b7c90] c000000000128cd8 process_one_work+0x298/0x5a0
      [c0000001828b7d20] c000000000129068 worker_thread+0x88/0x620
      [c0000001828b7dc0] c00000000013223c kthread+0x1ac/0x1c0
      [c0000001828b7e30] c00000000000b45c ret_from_kernel_thread+0x5c/0x80
      
      To resolve this we need to track the node a LMB belongs to when
      it is added to the system so we can remove it from that node instead
      of the node that the device tree indicates it should belong to.
      Signed-off-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      7bd4b91a
    • Takashi Iwai's avatar
      ALSA: hda - Register irq handler after the chip initialization · 996fffea
      Takashi Iwai authored
      [ Upstream commit f495222e ]
      
      Currently the IRQ handler in HD-audio controller driver is registered
      before the chip initialization.  That is, we have some window opened
      between the azx_acquire_irq() call and the CORB/RIRB setup.  If an
      interrupt is triggered in this small window, the IRQ handler may
      access to the uninitialized RIRB buffer, which leads to a NULL
      dereference Oops.
      
      This is usually no big problem since most of Intel chips do register
      the IRQ via MSI, and we've already fixed the order of the IRQ
      enablement and the CORB/RIRB setup in the former commit b61749a8
      ("sound: enable interrupt after dma buffer initialization"), hence the
      IRQ won't be triggered in that room.  However, some platforms use a
      shared IRQ, and this may allow the IRQ trigger by another source.
      
      Another possibility is the kdump environment: a stale interrupt might
      be present in there, the IRQ handler can be falsely triggered as well.
      
      For covering this small race, let's move the azx_acquire_irq() call
      after hda_intel_init_chip() call.  Although this is a bit radical
      change, it can cover more widely than checking the CORB/RIRB setup
      locally in the callee side.
      Reported-by: default avatarLiwei Song <liwei.song@windriver.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      996fffea
    • Taehee Yoo's avatar
      netfilter: nf_flow_table: fix netdev refcnt leak · 11ae693f
      Taehee Yoo authored
      [ Upstream commit 26a302af ]
      
      flow_offload_alloc() calls nf_route() to get a dst_entry. Internally,
      nf_route() calls ip_route_output_key() that allocates a dst_entry and
      holds it. So, a dst_entry should be released by dst_release() if
      nf_route() is successful.
      
      Otherwise, netns exit routine cannot be finished and the following
      message is printed:
      
      [  257.490952] unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      Fixes: ac2a6666 ("netfilter: add generic flow table infrastructure")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      11ae693f
    • Taehee Yoo's avatar
      netfilter: nf_flow_table: check ttl value in flow offload data path · c155f374
      Taehee Yoo authored
      [ Upstream commit 33cc3c0c ]
      
      nf_flow_offload_ip_hook() and nf_flow_offload_ipv6_hook() do not check
      ttl value. So, ttl value overflow may occur.
      
      Fixes: 97add9f0 ("netfilter: flow table support for IPv4")
      Fixes: 09952107 ("netfilter: flow table support for IPv6")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c155f374
    • Keith Busch's avatar
      nvme-pci: shutdown on timeout during deletion · c6508f86
      Keith Busch authored
      [ Upstream commit 9dc1a38e ]
      
      We do not restart a controller in a deleting state for timeout errors.
      When in this state, unblock potential request dispatchers with failed
      completions by shutting down the controller on timeout detection.
      Reported-by: default avatarYufen Yu <yuyufen@huawei.com>
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c6508f86
    • Keith Busch's avatar
      nvme-pci: unquiesce admin queue on shutdown · ca6bcc03
      Keith Busch authored
      [ Upstream commit c8e9e9b7 ]
      
      Just like IO queues, the admin queue also will not be restarted after a
      controller shutdown. Unquiesce this queue so that we do not block
      request dispatch on a permanently disabled controller.
      Reported-by: default avatarYufen Yu <yuyufen@huawei.com>
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ca6bcc03
    • Kishon Vijay Abraham I's avatar
      PCI: designware-ep: Use aligned ATU window for raising MSI interrupts · d6f90fae
      Kishon Vijay Abraham I authored
      [ Upstream commit 6b733030 ]
      
      Certain platforms like K2G reguires the outbound ATU window to be
      aligned. The alignment size is already present in mem->page_size.
      Use the alignment size present in mem->page_size to configure an
      aligned ATU window. In order to raise an interrupt, CPU has to write
      to address offset from the start of the window unlike before where
      writes were always to the beginning of the ATU window.
      Signed-off-by: default avatarKishon Vijay Abraham I <kishon@ti.com>
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d6f90fae
    • Kishon Vijay Abraham I's avatar
      misc: pci_endpoint_test: Fix test_reg_bar to be updated in pci_endpoint_test · 449b9fd4
      Kishon Vijay Abraham I authored
      [ Upstream commit 8f220664 ]
      
      commit 834b9051 ("misc: pci_endpoint_test: Add support for
      PCI_ENDPOINT_TEST regs to be mapped to any BAR") while adding
      test_reg_bar in order to map PCI_ENDPOINT_TEST regs to be mapped to any
      BAR failed to update test_reg_bar in pci_endpoint_test, resulting in
      test_reg_bar having invalid value when used outside probe.
      
      Fix it.
      
      Fixes: 834b9051 ("misc: pci_endpoint_test: Add support for PCI_ENDPOINT_TEST regs to be mapped to any BAR")
      Signed-off-by: default avatarKishon Vijay Abraham I <kishon@ti.com>
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      449b9fd4
    • Greg Kurz's avatar
      vfio-pci/nvlink2: Fix potential VMA leak · dcaa5e20
      Greg Kurz authored
      [ Upstream commit 2c85f2bd ]
      
      If vfio_pci_register_dev_region() fails then we should rollback
      previous changes, ie. unmap the ATSD registers.
      
      Fixes: 7f928917 ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver")
      Signed-off-by: default avatarGreg Kurz <groug@kaod.org>
      Reviewed-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      dcaa5e20
    • Lu Baolu's avatar
      iommu/vt-d: Set intel_iommu_gfx_mapped correctly · 47e80978
      Lu Baolu authored
      [ Upstream commit cf1ec453 ]
      
      The intel_iommu_gfx_mapped flag is exported by the Intel
      IOMMU driver to indicate whether an IOMMU is used for the
      graphic device. In a virtualized IOMMU environment (e.g.
      QEMU), an include-all IOMMU is used for graphic device.
      This flag is found to be clear even the IOMMU is used.
      
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Kevin Tian <kevin.tian@intel.com>
      Reported-by: default avatarZhenyu Wang <zhenyuw@linux.intel.com>
      Fixes: c0771df8 ("intel-iommu: Export a flag indicating that the IOMMU is used for iGFX.")
      Suggested-by: default avatarKevin Tian <kevin.tian@intel.com>
      Signed-off-by: default avatarLu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      47e80978
    • Ming Lei's avatar
      blk-mq: move cancel of requeue_work into blk_mq_release · 9e2c1b3d
      Ming Lei authored
      [ Upstream commit fbc2a15e ]
      
      With holding queue's kobject refcount, it is safe for driver
      to schedule requeue. However, blk_mq_kick_requeue_list() may
      be called after blk_sync_queue() is done because of concurrent
      requeue activities, then requeue work may not be completed when
      freeing queue, and kernel oops is triggered.
      
      So moving the cancel of requeue_work into blk_mq_release() for
      avoiding race between requeue and freeing queue.
      
      Cc: Dongli Zhang <dongli.zhang@oracle.com>
      Cc: James Smart <james.smart@broadcom.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: linux-scsi@vger.kernel.org,
      Cc: Martin K . Petersen <martin.petersen@oracle.com>,
      Cc: Christoph Hellwig <hch@lst.de>,
      Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: default avatarJames Smart <james.smart@broadcom.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9e2c1b3d
    • Vladimir Zapolskiy's avatar
      watchdog: fix compile time error of pretimeout governors · 1b6349c8
      Vladimir Zapolskiy authored
      [ Upstream commit a223770b ]
      
      CONFIG_WATCHDOG_PRETIMEOUT_GOV build symbol adds watchdog_pretimeout.o
      object to watchdog.o, the latter is compiled only if CONFIG_WATCHDOG_CORE
      is selected, so it rightfully makes sense to add it as a dependency.
      
      The change fixes the next compilation errors, if CONFIG_WATCHDOG_CORE=n
      and CONFIG_WATCHDOG_PRETIMEOUT_GOV=y are selected:
      
        drivers/watchdog/pretimeout_noop.o: In function `watchdog_gov_noop_register':
        drivers/watchdog/pretimeout_noop.c:35: undefined reference to `watchdog_register_governor'
        drivers/watchdog/pretimeout_noop.o: In function `watchdog_gov_noop_unregister':
        drivers/watchdog/pretimeout_noop.c:40: undefined reference to `watchdog_unregister_governor'
      
        drivers/watchdog/pretimeout_panic.o: In function `watchdog_gov_panic_register':
        drivers/watchdog/pretimeout_panic.c:35: undefined reference to `watchdog_register_governor'
        drivers/watchdog/pretimeout_panic.o: In function `watchdog_gov_panic_unregister':
        drivers/watchdog/pretimeout_panic.c:40: undefined reference to `watchdog_unregister_governor'
      Reported-by: default avatarKuo, Hsuan-Chi <hckuo2@illinois.edu>
      Fixes: ff84136c ("watchdog: add watchdog pretimeout governor framework")
      Signed-off-by: default avatarVladimir Zapolskiy <vz@mleia.com>
      Reviewed-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarWim Van Sebroeck <wim@linux-watchdog.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1b6349c8
    • Georg Hofmann's avatar
      watchdog: imx2_wdt: Fix set_timeout for big timeout values · 81a0eaaa
      Georg Hofmann authored
      [ Upstream commit b07e228e ]
      
      The documentated behavior is: if max_hw_heartbeat_ms is implemented, the
      minimum of the set_timeout argument and max_hw_heartbeat_ms should be used.
      This patch implements this behavior.
      Previously only the first 7bits were used and the input argument was
      returned.
      Signed-off-by: default avatarGeorg Hofmann <georg@hofmannsweb.com>
      Reviewed-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarWim Van Sebroeck <wim@linux-watchdog.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      81a0eaaa
    • Florian Westphal's avatar
      netfilter: nf_tables: fix base chain stat rcu_dereference usage · 1f468942
      Florian Westphal authored
      [ Upstream commit edbd82c5 ]
      
      Following splat gets triggered when nfnetlink monitor is running while
      xtables-nft selftests are running:
      
      net/netfilter/nf_tables_api.c:1272 suspicious rcu_dereference_check() usage!
      other info that might help us debug this:
      
      1 lock held by xtables-nft-mul/27006:
       #0: 00000000e0f85be9 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x1a/0x50
      Call Trace:
       nf_tables_fill_chain_info.isra.45+0x6cc/0x6e0
       nf_tables_chain_notify+0xf8/0x1a0
       nf_tables_commit+0x165c/0x1740
      
      nf_tables_fill_chain_info() can be called both from dumps (rcu read locked)
      or from the transaction path if a userspace process subscribed to nftables
      notifications.
      
      In the 'table dump' case, rcu_access_pointer() cannot be used: We do not
      hold transaction mutex so the pointer can be NULLed right after the check.
      Just unconditionally fetch the value, then have the helper return
      immediately if its NULL.
      
      In the notification case we don't hold the rcu read lock, but updates are
      prevented due to transaction mutex. Use rcu_dereference_check() to make lockdep
      aware of this.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1f468942
    • Serge Semin's avatar
      mips: Make sure dt memory regions are valid · b0e7ae14
      Serge Semin authored
      [ Upstream commit 93fa5b28 ]
      
      There are situations when memory regions coming from dts may be
      too big for the platform physical address space. This especially
      concerns XPA-capable systems. Bootloader may determine more than 4GB
      memory available and pass it to the kernel over dts memory node, while
      kernel is built without XPA/64BIT support. In this case the region
      may either simply be truncated by add_memory_region() method
      or by u64->phys_addr_t type casting. But in worst case the method
      can even drop the memory region if it exceeds PHYS_ADDR_MAX size.
      So lets make sure the retrieved from dts memory regions are valid,
      and if some of them aren't, just manually truncate them with a warning
      printed out.
      Signed-off-by: default avatarSerge Semin <fancer.lancer@gmail.com>
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Thomas Bogendoerfer <tbogendoerfer@suse.de>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: Stefan Agner <stefan@agner.ch>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Serge Semin <Sergey.Semin@t-platforms.ru>
      Cc: linux-mips@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b0e7ae14
    • Jakub Jankowski's avatar
      netfilter: nf_conntrack_h323: restore boundary check correctness · 58c8298c
      Jakub Jankowski authored
      [ Upstream commit f5e85ce8 ]
      
      Since commit bc7d811a ("netfilter: nf_ct_h323: Convert
      CHECK_BOUND macro to function"), NAT traversal for H.323
      doesn't work, failing to parse H323-UserInformation.
      nf_h323_error_boundary() compares contents of the bitstring,
      not the addresses, preventing valid H.323 packets from being
      conntrack'd.
      
      This looks like an oversight from when CHECK_BOUND macro was
      converted to a function.
      
      To fix it, stop dereferencing bs->cur and bs->end.
      
      Fixes: bc7d811a ("netfilter: nf_ct_h323: Convert CHECK_BOUND macro to function")
      Signed-off-by: default avatarJakub Jankowski <shasta@toxcorp.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      58c8298c
    • Taehee Yoo's avatar
      netfilter: nf_flow_table: fix missing error check for rhashtable_insert_fast · d34b034d
      Taehee Yoo authored
      [ Upstream commit 43c8f131 ]
      
      rhashtable_insert_fast() may return an error value when memory
      allocation fails, but flow_offload_add() does not check for errors.
      This patch just adds missing error checking.
      
      Fixes: ac2a6666 ("netfilter: add generic flow table infrastructure")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d34b034d
    • Ludovic Barre's avatar
      mmc: mmci: Prevent polling for busy detection in IRQ context · db8ca7fd
      Ludovic Barre authored
      [ Upstream commit 8520ce1e ]
      
      The IRQ handler, mmci_irq(), loops until all status bits have been cleared.
      However, the status bit signaling busy in variant->busy_detect_flag, may be
      set even if busy detection isn't monitored for the current request.
      
      This may be the case for the CMD11 when switching the I/O voltage, which
      leads to that mmci_irq() busy loops in IRQ context. Fix this problem, by
      clearing the status bit for busy, before continuing to validate the
      condition for the loop. This is safe, because the busy status detection has
      already been taken care of by mmci_cmd_irq().
      Signed-off-by: default avatarLudovic Barre <ludovic.barre@st.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      db8ca7fd
    • Amir Goldstein's avatar
      ovl: do not generate duplicate fsnotify events for "fake" path · f2ab446f
      Amir Goldstein authored
      [ Upstream commit d9899030 ]
      
      Overlayfs "fake" path is used for stacked file operations on underlying
      files.  Operations on files with "fake" path must not generate fsnotify
      events with path data, because those events have already been generated at
      overlayfs layer and because the reported event->fd for fanotify marks on
      underlying inode/filesystem will have the wrong path (the overlayfs path).
      
      Link: https://lore.kernel.org/linux-fsdevel/20190423065024.12695-1-jencce.kernel@gmail.com/Reported-by: default avatarMurphy Zhou <jencce.kernel@gmail.com>
      Fixes: d1d04ef8 ("ovl: stack file ops")
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f2ab446f
    • Andreas Schwab's avatar
      fbcon: Don't reset logo_shown when logo is currently shown · 6aaaa535
      Andreas Schwab authored
      [ Upstream commit 3c5a1b11 ]
      
      When the logo is currently drawn on a virtual console, and the console
      loglevel is reduced to quiet, logo_shown must be left alone, so that it
      the scrolling region on that virtual console is properly reset.
      
      Fixes: 10993504 ("fbcon: Silence fbcon logo on 'quiet' boots")
      Signed-off-by: default avatarAndreas Schwab <schwab@linux-m68k.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Yisheng Xie <ysxie@foxmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Marko Myllynen <myllynen@redhat.com>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Thierry Reding <treding@nvidia.com>
      Signed-off-by: default avatarBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6aaaa535
    • Jisheng Zhang's avatar
      PCI: dwc: Free MSI IRQ page in dw_pcie_free_msi() · 62729638
      Jisheng Zhang authored
      [ Upstream commit dc69a3d5 ]
      
      To avoid a memory leak, free the page allocated for MSI IRQ in
      dw_pcie_free_msi().
      Signed-off-by: default avatarJisheng Zhang <Jisheng.Zhang@synaptics.com>
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarGustavo Pimentel <gustavo.pimentel@synopsys.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      62729638
    • Jisheng Zhang's avatar
      PCI: dwc: Free MSI in dw_pcie_host_init() error path · 2507c7c9
      Jisheng Zhang authored
      [ Upstream commit 9e2b5de5 ]
      
      If we ever did MSI-related initializations, we need to call
      dw_pcie_free_msi() in the error code path.
      
      Remove the IS_ENABLED(CONFIG_PCI_MSI) check for MSI init because
      pci_msi_enabled() already has a stub for !CONFIG_PCI_MSI.
      Signed-off-by: default avatarJisheng Zhang <Jisheng.Zhang@synaptics.com>
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarGustavo Pimentel <gustavo.pimentel@synopsys.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2507c7c9
    • Maciej Żenczykowski's avatar
      uml: fix a boot splat wrt use of cpu_all_mask · 84e5ca83
      Maciej Żenczykowski authored
      [ Upstream commit 689a5860 ]
      
      Memory: 509108K/542612K available (3835K kernel code, 919K rwdata, 1028K rodata, 129K init, 211K bss, 33504K reserved, 0K cma-reserved)
      NR_IRQS: 15
      clocksource: timer: mask: 0xffffffffffffffff max_cycles: 0x1cd42e205, max_idle_ns: 881590404426 ns
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 0 at kernel/time/clockevents.c:458 clockevents_register_device+0x72/0x140
      posix-timer cpumask == cpu_all_mask, using cpu_possible_mask instead
      Modules linked in:
      CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc4-00048-ged79cc87 #4
      Stack:
       604ebda0 603c5370 604ebe20 6046fd17
       00000000 6006fcbb 604ebdb0 603c53b5
       604ebe10 6003bfc4 604ebdd0 9000001ca
      Call Trace:
       [<6006fcbb>] ? printk+0x0/0x94
       [<60083160>] ? clockevents_register_device+0x72/0x140
       [<6001f16e>] show_stack+0x13b/0x155
       [<603c5370>] ? dump_stack_print_info+0xe2/0xeb
       [<6006fcbb>] ? printk+0x0/0x94
       [<603c53b5>] dump_stack+0x2a/0x2c
       [<6003bfc4>] __warn+0x10e/0x13e
       [<60070320>] ? vprintk_func+0xc8/0xcf
       [<60030fd6>] ? block_signals+0x0/0x16
       [<6006fcbb>] ? printk+0x0/0x94
       [<6003c08b>] warn_slowpath_fmt+0x97/0x99
       [<600311a1>] ? set_signals+0x0/0x3f
       [<6003bff4>] ? warn_slowpath_fmt+0x0/0x99
       [<600842cb>] ? tick_oneshot_mode_active+0x44/0x4f
       [<60030fd6>] ? block_signals+0x0/0x16
       [<6006fcbb>] ? printk+0x0/0x94
       [<6007d2d5>] ? __clocksource_select+0x20/0x1b1
       [<60030fd6>] ? block_signals+0x0/0x16
       [<6006fcbb>] ? printk+0x0/0x94
       [<60083160>] clockevents_register_device+0x72/0x140
       [<60031192>] ? get_signals+0x0/0xf
       [<60030fd6>] ? block_signals+0x0/0x16
       [<6006fcbb>] ? printk+0x0/0x94
       [<60002eec>] um_timer_setup+0xc8/0xca
       [<60001b59>] start_kernel+0x47f/0x57e
       [<600035bc>] start_kernel_proc+0x49/0x4d
       [<6006c483>] ? kmsg_dump_register+0x82/0x8a
       [<6001de62>] new_thread_handler+0x81/0xb2
       [<60003571>] ? kmsg_dumper_stdout_init+0x1a/0x1c
       [<60020c75>] uml_finishsetup+0x54/0x59
      
      random: get_random_bytes called from init_oops_id+0x27/0x34 with crng_init=0
      ---[ end trace 00173d0117a88acb ]---
      Calibrating delay loop... 6941.90 BogoMIPS (lpj=34709504)
      Signed-off-by: default avatarMaciej Żenczykowski <maze@google.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: linux-um@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      84e5ca83
    • YueHaibing's avatar
      configfs: fix possible use-after-free in configfs_register_group · 93e0a666
      YueHaibing authored
      [ Upstream commit 35399f87 ]
      
      In configfs_register_group(), if create_default_group() failed, we
      forget to unlink the group. It will left a invalid item in the parent list,
      which may trigger the use-after-free issue seen below:
      
      BUG: KASAN: use-after-free in __list_add_valid+0xd4/0xe0 lib/list_debug.c:26
      Read of size 8 at addr ffff8881ef61ae20 by task syz-executor.0/5996
      
      CPU: 1 PID: 5996 Comm: syz-executor.0 Tainted: G         C        5.0.0+ #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xa9/0x10e lib/dump_stack.c:113
       print_address_description+0x65/0x270 mm/kasan/report.c:187
       kasan_report+0x149/0x18d mm/kasan/report.c:317
       __list_add_valid+0xd4/0xe0 lib/list_debug.c:26
       __list_add include/linux/list.h:60 [inline]
       list_add_tail include/linux/list.h:93 [inline]
       link_obj+0xb0/0x190 fs/configfs/dir.c:759
       link_group+0x1c/0x130 fs/configfs/dir.c:784
       configfs_register_group+0x56/0x1e0 fs/configfs/dir.c:1751
       configfs_register_default_group+0x72/0xc0 fs/configfs/dir.c:1834
       ? 0xffffffffc1be0000
       iio_sw_trigger_init+0x23/0x1000 [industrialio_sw_trigger]
       do_one_initcall+0xbc/0x47d init/main.c:887
       do_init_module+0x1b5/0x547 kernel/module.c:3456
       load_module+0x6405/0x8c10 kernel/module.c:3804
       __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
       do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f494ecbcc58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
      RBP: 00007f494ecbcc70 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f494ecbd6bc
      R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
      
      Allocated by task 5987:
       set_track mm/kasan/common.c:87 [inline]
       __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:497
       kmalloc include/linux/slab.h:545 [inline]
       kzalloc include/linux/slab.h:740 [inline]
       configfs_register_default_group+0x4c/0xc0 fs/configfs/dir.c:1829
       0xffffffffc1bd0023
       do_one_initcall+0xbc/0x47d init/main.c:887
       do_init_module+0x1b5/0x547 kernel/module.c:3456
       load_module+0x6405/0x8c10 kernel/module.c:3804
       __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
       do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 5987:
       set_track mm/kasan/common.c:87 [inline]
       __kasan_slab_free+0x130/0x180 mm/kasan/common.c:459
       slab_free_hook mm/slub.c:1429 [inline]
       slab_free_freelist_hook mm/slub.c:1456 [inline]
       slab_free mm/slub.c:3003 [inline]
       kfree+0xe1/0x270 mm/slub.c:3955
       configfs_register_default_group+0x9a/0xc0 fs/configfs/dir.c:1836
       0xffffffffc1bd0023
       do_one_initcall+0xbc/0x47d init/main.c:887
       do_init_module+0x1b5/0x547 kernel/module.c:3456
       load_module+0x6405/0x8c10 kernel/module.c:3804
       __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
       do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8881ef61ae00
       which belongs to the cache kmalloc-192 of size 192
      The buggy address is located 32 bytes inside of
       192-byte region [ffff8881ef61ae00, ffff8881ef61aec0)
      The buggy address belongs to the page:
      page:ffffea0007bd8680 count:1 mapcount:0 mapping:ffff8881f6c03000 index:0xffff8881ef61a700
      flags: 0x2fffc0000000200(slab)
      raw: 02fffc0000000200 ffffea0007ca4740 0000000500000005 ffff8881f6c03000
      raw: ffff8881ef61a700 000000008010000c 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881ef61ad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8881ef61ad80: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
      >ffff8881ef61ae00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                     ^
       ffff8881ef61ae80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff8881ef61af00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 5cf6a51e ("configfs: allow dynamic group creation")
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      93e0a666
    • John Sperbeck's avatar
      percpu: remove spurious lock dependency between percpu and sched · da0abba5
      John Sperbeck authored
      [ Upstream commit 198790d9 ]
      
      In free_percpu() we sometimes call pcpu_schedule_balance_work() to
      queue a work item (which does a wakeup) while holding pcpu_lock.
      This creates an unnecessary lock dependency between pcpu_lock and
      the scheduler's pi_lock.  There are other places where we call
      pcpu_schedule_balance_work() without hold pcpu_lock, and this case
      doesn't need to be different.
      
      Moving the call outside the lock prevents the following lockdep splat
      when running tools/testing/selftests/bpf/{test_maps,test_progs} in
      sequence with lockdep enabled:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      5.1.0-dbg-DEV #1 Not tainted
      ------------------------------------------------------
      kworker/23:255/18872 is trying to acquire lock:
      000000000bc79290 (&(&pool->lock)->rlock){-.-.}, at: __queue_work+0xb2/0x520
      
      but task is already holding lock:
      00000000e3e7a6aa (pcpu_lock){..-.}, at: free_percpu+0x36/0x260
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #4 (pcpu_lock){..-.}:
             lock_acquire+0x9e/0x180
             _raw_spin_lock_irqsave+0x3a/0x50
             pcpu_alloc+0xfa/0x780
             __alloc_percpu_gfp+0x12/0x20
             alloc_htab_elem+0x184/0x2b0
             __htab_percpu_map_update_elem+0x252/0x290
             bpf_percpu_hash_update+0x7c/0x130
             __do_sys_bpf+0x1912/0x1be0
             __x64_sys_bpf+0x1a/0x20
             do_syscall_64+0x59/0x400
             entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      -> #3 (&htab->buckets[i].lock){....}:
             lock_acquire+0x9e/0x180
             _raw_spin_lock_irqsave+0x3a/0x50
             htab_map_update_elem+0x1af/0x3a0
      
      -> #2 (&rq->lock){-.-.}:
             lock_acquire+0x9e/0x180
             _raw_spin_lock+0x2f/0x40
             task_fork_fair+0x37/0x160
             sched_fork+0x211/0x310
             copy_process.part.43+0x7b1/0x2160
             _do_fork+0xda/0x6b0
             kernel_thread+0x29/0x30
             rest_init+0x22/0x260
             arch_call_rest_init+0xe/0x10
             start_kernel+0x4fd/0x520
             x86_64_start_reservations+0x24/0x26
             x86_64_start_kernel+0x6f/0x72
             secondary_startup_64+0xa4/0xb0
      
      -> #1 (&p->pi_lock){-.-.}:
             lock_acquire+0x9e/0x180
             _raw_spin_lock_irqsave+0x3a/0x50
             try_to_wake_up+0x41/0x600
             wake_up_process+0x15/0x20
             create_worker+0x16b/0x1e0
             workqueue_init+0x279/0x2ee
             kernel_init_freeable+0xf7/0x288
             kernel_init+0xf/0x180
             ret_from_fork+0x24/0x30
      
      -> #0 (&(&pool->lock)->rlock){-.-.}:
             __lock_acquire+0x101f/0x12a0
             lock_acquire+0x9e/0x180
             _raw_spin_lock+0x2f/0x40
             __queue_work+0xb2/0x520
             queue_work_on+0x38/0x80
             free_percpu+0x221/0x260
             pcpu_freelist_destroy+0x11/0x20
             stack_map_free+0x2a/0x40
             bpf_map_free_deferred+0x3c/0x50
             process_one_work+0x1f7/0x580
             worker_thread+0x54/0x410
             kthread+0x10f/0x150
             ret_from_fork+0x24/0x30
      
      other info that might help us debug this:
      
      Chain exists of:
        &(&pool->lock)->rlock --> &htab->buckets[i].lock --> pcpu_lock
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(pcpu_lock);
                                     lock(&htab->buckets[i].lock);
                                     lock(pcpu_lock);
        lock(&(&pool->lock)->rlock);
      
       *** DEADLOCK ***
      
      3 locks held by kworker/23:255/18872:
       #0: 00000000b36a6e16 ((wq_completion)events){+.+.},
           at: process_one_work+0x17a/0x580
       #1: 00000000dfd966f0 ((work_completion)(&map->work)){+.+.},
           at: process_one_work+0x17a/0x580
       #2: 00000000e3e7a6aa (pcpu_lock){..-.},
           at: free_percpu+0x36/0x260
      
      stack backtrace:
      CPU: 23 PID: 18872 Comm: kworker/23:255 Not tainted 5.1.0-dbg-DEV #1
      Hardware name: ...
      Workqueue: events bpf_map_free_deferred
      Call Trace:
       dump_stack+0x67/0x95
       print_circular_bug.isra.38+0x1c6/0x220
       check_prev_add.constprop.50+0x9f6/0xd20
       __lock_acquire+0x101f/0x12a0
       lock_acquire+0x9e/0x180
       _raw_spin_lock+0x2f/0x40
       __queue_work+0xb2/0x520
       queue_work_on+0x38/0x80
       free_percpu+0x221/0x260
       pcpu_freelist_destroy+0x11/0x20
       stack_map_free+0x2a/0x40
       bpf_map_free_deferred+0x3c/0x50
       process_one_work+0x1f7/0x580
       worker_thread+0x54/0x410
       kthread+0x10f/0x150
       ret_from_fork+0x24/0x30
      Signed-off-by: default avatarJohn Sperbeck <jsperbeck@google.com>
      Signed-off-by: default avatarDennis Zhou <dennis@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      da0abba5
    • Eugen Hristev's avatar
      media: atmel: atmel-isc: fix asd memory allocation · 6f9ca872
      Eugen Hristev authored
      [ Upstream commit 1e4e25c4 ]
      
      The subsystem will free the asd memory on notifier cleanup, if the asd is
      added to the notifier.
      However the memory is freed using kfree.
      Thus, we cannot allocate the asd using devm_*
      This can lead to crashes and problems.
      To test this issue, just return an error at probe, but cleanup the
      notifier beforehand.
      
      Fixes: 10626744 ("[media] atmel-isc: add the Image Sensor Controller code")
      Signed-off-by: default avatarEugen Hristev <eugen.hristev@microchip.com>
      Signed-off-by: default avatarHans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6f9ca872
    • Chao Yu's avatar
      f2fs: fix to do checksum even if inode page is uptodate · b0395364
      Chao Yu authored
      [ Upstream commit b42b179b ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203221
      
      - Overview
      When mounting the attached crafted image and running program, this error is reported.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_07.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      
      - Messages
       kernel BUG at fs/f2fs/node.c:1279!
       RIP: 0010:read_node_page+0xcf/0xf0
       Call Trace:
        __get_node_page+0x6b/0x2f0
        f2fs_iget+0x8f/0xdf0
        f2fs_lookup+0x136/0x320
        __lookup_slow+0x92/0x140
        lookup_slow+0x30/0x50
        walk_component+0x1c1/0x350
        path_lookupat+0x62/0x200
        filename_lookup+0xb3/0x1a0
        do_fchmodat+0x3e/0xa0
        __x64_sys_chmod+0x12/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      On below paths, we can have opportunity to readahead inode page
      - gc_node_segment -> f2fs_ra_node_page
      - gc_data_segment -> f2fs_ra_node_page
      - f2fs_fill_dentries -> f2fs_ra_node_page
      
      Unlike synchronized read, on readahead path, we can set page uptodate
      before verifying page's checksum, then read_node_page() will trigger
      kernel panic once it encounters a uptodated page w/ incorrect checksum.
      
      So considering readahead scenario, we have to do checksum each time
      when loading inode page even if it is uptodated.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b0395364
    • Chao Yu's avatar
      f2fs: fix to retrieve inline xattr space · 0fd306ae
      Chao Yu authored
      [ Upstream commit 45a74688 ]
      
      With below mkfs and mount option, generic/339 of fstest will report that
      scratch image becomes corrupted.
      
      MKFS_OPTIONS  -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f /dev/zram1
      MOUNT_OPTIONS -- -o acl,user_xattr -o discard,noinline_xattr /dev/zram1 /mnt/scratch_f2fs
      
      [ASSERT] (f2fs_check_dirent_position:1315)  --> Wrong position of dirent pino:1970, name: (...)
      level:8, dir_level:0, pgofs:951, correct range:[900, 901]
      
      In old kernel, inline data and directory always reserved 200 bytes in
      inode layout, even if inline_xattr is disabled, then new kernel tries
      to retrieve that space for non-inline xattr inode, but for inline dentry,
      its layout size should be fixed, so we just keep that reserved space.
      
      But the problem here is that, after inline dentry conversion, inline
      dentry layout no longer exists, if we still reserve inline xattr space,
      after dents updates, there will be a hole in inline xattr space, which
      can break hierarchy hash directory structure.
      
      This patch fixes this issue by retrieving inline xattr space after
      inline dentry conversion.
      
      Fixes: 6afc662e ("f2fs: support flexible inline xattr size")
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      0fd306ae
    • Chao Yu's avatar
      f2fs: fix to avoid deadloop in foreground GC · e2877ff9
      Chao Yu authored
      [ Upstream commit 793ab1c8 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203211
      
      - Overview
      When mounting the attached crafted image and making a new file, I got this error and the error messages keep repeating.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I run with option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      cd test
      touch t
      
      - Messages
      [   58.820451] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.821485] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.822530] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.823571] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.824616] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.825640] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.826663] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.827698] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.828719] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.829759] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.830783] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.831828] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.832869] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.833888] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.834945] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.835996] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.837028] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.838051] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.839072] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.840100] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.841147] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.842186] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.843214] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.844267] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.845282] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.846305] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.847341] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      ... (repeating)
      
      During GC, if segment type stored in SSA and SIT is inconsistent, we just
      skip migrating current segment directly, since we need to know the exact
      type to decide the migration function we use.
      
      So in foreground GC, we will easily run into a infinite loop as we may
      select the same victim segment which has inconsistent type due to greedy
      policy. In order to end up this, we choose to shutdown filesystem. For
      backgrond GC, we need to do that as well, so that we can avoid latter
      potential infinite looped foreground GC.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e2877ff9