Commits · 2b40c5db73e239531ea54991087f4edc07fbb08e · Kirill Smelkov / linux

13 May, 2020 2 commits

selftests/pidfd: add pidfd setns tests · 2b40c5db

Christian Brauner authored May 05, 2020

This is basically a test-suite for setns() and as of now contains:
- test that we can't pass garbage flags
- test that we can't attach to the namespaces of  task that has already exited
- test that we can incrementally setns into all namespaces of a target task
  using a pidfd
- test that we can setns atomically into all namespaces of a target task
- test that we can't cross setns into a user namespace outside of our user
  namespace hierarchy
- test that we can't setns into namespaces owned by user namespaces over which
  we are not privileged
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200505140432.181565-4-christian.brauner@ubuntu.com

2b40c5db

nsproxy: attach to namespaces via pidfds · 303cc571

Christian Brauner authored May 05, 2020

For quite a while we have been thinking about using pidfds to attach to
namespaces. This patchset has existed for about a year already but we've
wanted to wait to see how the general api would be received and adopted.
Now that more and more programs in userspace have started using pidfds
for process management it's time to send this one out.

This patch makes it possible to use pidfds to attach to the namespaces
of another process, i.e. they can be passed as the first argument to the
setns() syscall. When only a single namespace type is specified the
semantics are equivalent to passing an nsfd. That means
setns(nsfd, CLONE_NEWNET) equals setns(pidfd, CLONE_NEWNET). However,
when a pidfd is passed, multiple namespace flags can be specified in the
second setns() argument and setns() will attach the caller to all the
specified namespaces all at once or to none of them. Specifying 0 is not
valid together with a pidfd.

Here are just two obvious examples:
setns(pidfd, CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET);
setns(pidfd, CLONE_NEWUSER);
Allowing to also attach subsets of namespaces supports various use-cases
where callers setns to a subset of namespaces to retain privilege, perform
an action and then re-attach another subset of namespaces.

If the need arises, as Eric suggested, we can extend this patchset to
assume even more context than just attaching all namespaces. His suggestion
specifically was about assuming the process' root directory when
setns(pidfd, 0) or setns(pidfd, SETNS_PIDFD) is specified. For now, just
keep it flexible in terms of supporting subsets of namespaces but let's
wait until we have users asking for even more context to be assumed. At
that point we can add an extension.

The obvious example where this is useful is a standard container
manager interacting with a running container: pushing and pulling files
or directories, injecting mounts, attaching/execing any kind of process,
managing network devices all these operations require attaching to all
or at least multiple namespaces at the same time. Given that nowadays
most containers are spawned with all namespaces enabled we're currently
looking at at least 14 syscalls, 7 to open the /proc/<pid>/ns/<ns>
nsfds, another 7 to actually perform the namespace switch. With time
namespaces we're looking at about 16 syscalls.
(We could amortize the first 7 or 8 syscalls for opening the nsfds by
stashing them in each container's monitor process but that would mean
we need to send around those file descriptors through unix sockets
everytime we want to interact with the container or keep on-disk
state. Even in scenarios where a caller wants to join a particular
namespace in a particular order callers still profit from batching
other namespaces. That mostly applies to the user namespace but
all container runtimes I found join the user namespace first no matter
if it privileges or deprivileges the container similar to how unshare
behaves.)
With pidfds this becomes a single syscall no matter how many namespaces
are supposed to be attached to.

A decently designed, large-scale container manager usually isn't the
parent of any of the containers it spawns so the containers don't die
when it crashes or needs to update or reinitialize. This means that
for the manager to interact with containers through pids is inherently
racy especially on systems where the maximum pid number is not
significicantly bumped. This is even more problematic since we often spawn
and manage thousands or ten-thousands of containers. Interacting with a
container through a pid thus can become risky quite quickly. Especially
since we allow for an administrator to enable advanced features such as
syscall interception where we're performing syscalls in lieu of the
container. In all of those cases we use pidfds if they are available and
we pass them around as stable references. Using them to setns() to the
target process' namespaces is as reliable as using nsfds. Either the
target process is already dead and we get ESRCH or we manage to attach
to its namespaces but we can't accidently attach to another process'
namespaces. So pidfds lend themselves to be used with this api.
The other main advantage is that with this change the pidfd becomes the
only relevant token for most container interactions and it's the only
token we need to create and send around.

Apart from significiantly reducing the number of syscalls from double
digit to single digit which is a decent reason post-spectre/meltdown
this also allows to switch to a set of namespaces atomically, i.e.
either attaching to all the specified namespaces succeeds or we fail. If
we fail we haven't changed a single namespace. There are currently three
namespaces that can fail (other than for ENOMEM which really is not
very interesting since we then have other problems anyway) for
non-trivial reasons, user, mount, and pid namespaces. We can fail to
attach to a pid namespace if it is not our current active pid namespace
or a descendant of it. We can fail to attach to a user namespace because
we are multi-threaded or because our current mount namespace shares
filesystem state with other tasks, or because we're trying to setns()
to the same user namespace, i.e. the target task has the same user
namespace as we do. We can fail to attach to a mount namespace because
it shares filesystem state with other tasks or because we fail to lookup
the new root for the new mount namespace. In most non-pathological
scenarios these issues can be somewhat mitigated. But there are cases where
we're half-attached to some namespace and failing to attach to another one.
I've talked about some of these problem during the hallway track (something
only the pre-COVID-19 generation will remember) of Plumbers in Los Angeles
in 2018(?). Even if all these issues could be avoided with super careful
userspace coding it would be nicer to have this done in-kernel. Pidfds seem
to lend themselves nicely for this.

The other neat thing about this is that setns() becomes an actual
counterpart to the namespace bits of unshare().
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Jann Horn <jannh@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Link: https://lore.kernel.org/r/20200505140432.181565-3-christian.brauner@ubuntu.com

303cc571

09 May, 2020 1 commit

nsproxy: add struct nsset · f2a8d52e

Christian Brauner authored May 05, 2020

Add a simple struct nsset. It holds all necessary pieces to switch to a new
set of namespaces without leaving a task in a half-switched state which we
will make use of in the next patch. This patch switches the existing setns
logic over without causing a change in setns() behavior. This brings
setns() closer to how unshare() works(). The prepare_ns() function is
responsible to prepare all necessary information. This has two reasons.
First it minimizes dependencies between individual namespaces, i.e. all
install handler can expect that all fields are properly initialized
independent in what order they are called in. Second, this makes the code
easier to maintain and easier to follow if it needs to be changed.

The prepare_ns() helper will only be switched over to use a flags argument
in the next patch. Here it will still use nstype as a simple integer
argument which was argued would be clearer. I'm not particularly
opinionated about this if it really helps or not. The struct nsset itself
already contains the flags field since its name already indicates that it
can contain information required by different namespaces. None of this
should have functional consequences.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Jann Horn <jannh@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Link: https://lore.kernel.org/r/20200505140432.181565-2-christian.brauner@ubuntu.com

f2a8d52e

03 May, 2020 4 commits

Linux 5.7-rc4 · 0e698dfa
Linus Torvalds authored May 03, 2020

0e698dfa

Merge tag 'for-5.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 262f7a6b

Linus Torvalds authored May 03, 2020

Pull more btrfs fixes from David Sterba:
 "A few more stability fixes, minor build warning fixes and git url
  fixup:

   - fix partial loss of prealloc extent past i_size after fsync

   - fix potential deadlock due to wrong transaction handle passing via
     journal_info

   - fix gcc 4.8 struct intialization warning

   - update git URL in MAINTAINERS entry"

* tag 'for-5.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  MAINTAINERS: btrfs: fix git repo URL
  btrfs: fix gcc-4.8 build warning for struct initializer
  btrfs: transaction: Avoid deadlock due to bad initialization timing of fs_info::journal_info
  btrfs: fix partial loss of prealloc extent past i_size after fsync

262f7a6b

Merge tag 'iommu-fixes-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · ea915933

Linus Torvalds authored May 03, 2020

Pull IOMMU fixes from Joerg Roedel:

 - Fix a memory leak when dev_iommu gets freed and a sub-pointer does
   not

 - Build dependency fixes for Mediatek, spapr_tce, and Intel IOMMU
   driver

 - Export iommu_group_get_for_dev() only for GPLed modules

 - Fix AMD IOMMU interrupt remapping when x2apic is enabled

 - Fix error path in the QCOM IOMMU driver probe function

* tag 'iommu-fixes-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/qcom: Fix local_base status check
  iommu: Properly export iommu_group_get_for_dev()
  iommu/vt-d: Use right Kconfig option name
  iommu/amd: Fix legacy interrupt remapping for x2APIC-enabled system
  iommu: spapr_tce: Disable compile testing to fix build on book3s_32 config
  iommu/mediatek: Fix MTK_IOMMU dependencies
  iommu: Fix the memory leak in dev_iommu_free()

ea915933

MAINTAINERS: btrfs: fix git repo URL · eb91db63

Eric Biggers authored May 01, 2020

The git repo listed for btrfs hasn't been updated in over a year.
List the current one instead.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

eb91db63

02 May, 2020 8 commits

Merge tag 'pm-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 743f0573

Linus Torvalds authored May 02, 2020

Pull power management fixes from Rafael Wysocki:

 - prevent the intel_pstate driver from printing excessive diagnostic
   messages in some cases (Chris Wilson)

 - make the hibernation restore kernel freeze kernel threads as well as
   user space tasks (Dexuan Cui)

 - fix the ACPI device PM disagnostic messages to include the correct
   power state name (Kai-Heng Feng).

* tag 'pm-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: ACPI: Output correct message on target power state
  PM: hibernate: Freeze kernel threads in software_resume()
  cpufreq: intel_pstate: Only mention the BIOS disabling turbo mode once

743f0573

Merge branches 'pm-cpufreq' and 'pm-sleep' · a5383996

Rafael J. Wysocki authored May 02, 2020

* pm-cpufreq:
  cpufreq: intel_pstate: Only mention the BIOS disabling turbo mode once

* pm-sleep:
  PM: hibernate: Freeze kernel threads in software_resume()

a5383996

Merge tag 'iomap-5.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · f66ed1eb

Linus Torvalds authored May 02, 2020

Pull iomap fix from Darrick Wong:
 "Hoist the check for an unrepresentable FIBMAP return value into
  ioctl_fibmap.

  The internal kernel function can handle 64-bit values (and is needed
  to fix a regression on ext4 + jbd2). It is only the userspace ioctl
  that is so old that it cannot deal"

* tag 'iomap-5.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  fibmap: Warn and return an error in case of block > INT_MAX

f66ed1eb

Merge tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 29a47f45

Linus Torvalds authored May 02, 2020

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

  Stable fixes:
   - fix handling of backchannel binding in BIND_CONN_TO_SESSION

  Bugfixes:
   - Fix a credential use-after-free issue in pnfs_roc()
   - Fix potential posix_acl refcnt leak in nfs3_set_acl
   - defer slow parts of rpc_free_client() to a workqueue
   - Fix an Oopsable race in __nfs_list_for_each_server()
   - Fix trace point use-after-free race
   - Regression: the RDMA client no longer responds to server disconnect
     requests
   - Fix return values of xdr_stream_encode_item_{present, absent}
   - _pnfs_return_layout() must always wait for layoutreturn completion

  Cleanups:
   - Remove unreachable error conditions"

* tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Fix a race in __nfs_list_for_each_server()
  NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION
  SUNRPC: defer slow parts of rpc_free_client() to a workqueue.
  NFSv4: Remove unreachable error condition due to rpc_run_task()
  SUNRPC: Remove unreachable error condition
  xprtrdma: Fix use of xdr_stream_encode_item_{present, absent}
  xprtrdma: Fix trace point use-after-free race
  xprtrdma: Restore wake-up-all to rpcrdma_cm_event_handler()
  nfs: Fix potential posix_acl refcnt leak in nfs3_set_acl
  NFS/pnfs: Fix a credential use-after-free issue in pnfs_roc()
  NFS/pnfs: Ensure that _pnfs_return_layout() waits for layoutreturn completion

29a47f45

Merge tag 'dmaengine-fix-5.7-rc4' of git://git.infradead.org/users/vkoul/slave-dma · ed6889db

Linus Torvalds authored May 02, 2020

Pull dmaengine fixes from Vinod Koul:
 "Core:
   - Documentation typo fixes
   - fix the channel indexes
   - dmatest: fixes for process hang and iterations

  Drivers:
   - hisilicon: build error fix without PCI_MSI
   - ti-k3: deadlock fix
   - uniphier-xdmac: fix for reg region
   - pch: fix data race
   - tegra: fix clock state"

* tag 'dmaengine-fix-5.7-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: dmatest: Fix process hang when reading 'wait' parameter
  dmaengine: dmatest: Fix iteration non-stop logic
  dmaengine: tegra-apb: Ensure that clock is enabled during of DMA synchronization
  dmaengine: fix channel index enumeration
  dmaengine: mmp_tdma: Reset channel error on release
  dmaengine: mmp_tdma: Do not ignore slave config validation errors
  dmaengine: pch_dma.c: Avoid data race between probe and irq handler
  dt-bindings: dma: uniphier-xdmac: switch to single reg region
  include/linux/dmaengine: Typos fixes in API documentation
  dmaengine: xilinx_dma: Add missing check for empty list
  dmaengine: ti: k3-psil: fix deadlock on error path
  dmaengine: hisilicon: Fix build error without PCI_MSI

ed6889db

Merge tag 'vfio-v5.7-rc4' of git://github.com/awilliam/linux-vfio · 690e2aba

Linus Torvalds authored May 01, 2020

Pull VFIO fixes from Alex Williamson:

 - copy_*_user validity check for new vfio_dma_rw interface (Yan Zhao)

 - Fix a potential math overflow (Yan Zhao)

 - Use follow_pfn() for calculating PFNMAPs (Sean Christopherson)

* tag 'vfio-v5.7-rc4' of git://github.com/awilliam/linux-vfio:
  vfio/type1: Fix VA->PA translation for PFNMAP VMAs in vaddr_get_pfn()
  vfio: avoid possible overflow in vfio_iommu_type1_pin_pages
  vfio: checking of validity of user vaddr in vfio_dma_rw

690e2aba

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 42eb62d4

Linus Torvalds authored May 01, 2020

Pull arm64 fix from Catalin Marinas:
 "Add -fasynchronous-unwind-tables to the vDSO CFLAGS"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: vdso: Add -fasynchronous-unwind-tables to cflags

42eb62d4

Merge tag 'io_uring-5.7-2020-05-01' of git://git.kernel.dk/linux-block · cf018530

Linus Torvalds authored May 01, 2020

Pull io_uring fixes from Jens Axboe:

 - Fix for statx not grabbing the file table, making AT_EMPTY_PATH fail

 - Cover a few cases where async poll can handle retry, eliminating the
   need for an async thread

 - fallback request busy/free fix (Bijan)

 - syzbot reported SQPOLL thread exit fix for non-preempt (Xiaoguang)

 - Fix extra put of req for sync_file_range (Pavel)

 - Always punt splice async. We'll improve this for 5.8, but wanted to
   eliminate the inode mutex lock from the non-blocking path for 5.7
   (Pavel)

* tag 'io_uring-5.7-2020-05-01' of git://git.kernel.dk/linux-block:
  io_uring: punt splice async because of inode mutex
  io_uring: check non-sync defer_list carefully
  io_uring: fix extra put in sync_file_range()
  io_uring: use cond_resched() in io_ring_ctx_wait_and_kill()
  io_uring: use proper references for fallback_req locking
  io_uring: only force async punt if poll based retry can't handle it
  io_uring: enable poll retry for any file with ->read_iter / ->write_iter
  io_uring: statx must grab the file table for valid fd

cf018530

01 May, 2020 19 commits

Merge tag 'block-5.7-2020-05-01' of git://git.kernel.dk/linux-block · 052c467c

Linus Torvalds authored May 01, 2020

Pull block fixes from Jens Axboe:
 "A few fixes for this release:

   - NVMe pull request from Christoph, with a single fix for a double
     free in the namespace error handling.

   - Kill the bd_openers check in blk_drop_partitions(), fixing a
     regression in this merge window (Christoph)"

* tag 'block-5.7-2020-05-01' of git://git.kernel.dk/linux-block:
  block: remove the bd_openers checks in blk_drop_partitions
  nvme: prevent double free in nvme_alloc_ns() error handling

052c467c

Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · ab386c46

Linus Torvalds authored May 01, 2020

Pull i2c fixes from Wolfram Sang:
 "Three driver bugfixes, and two reverts because the original patches
  revealed underlying problems which the Tegra guys are now working on"

* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: aspeed: Avoid i2c interrupt status clear race condition.
  i2c: amd-mp2-pci: Fix Oops in amd_mp2_pci_init() error handling
  Revert "i2c: tegra: Better handle case where CPU0 is busy for a long time"
  Revert "i2c: tegra: Synchronize DMA before termination"
  i2c: iproc: generate stop event for slave writes

ab386c46

Merge tag 'sound-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · c5364190

Linus Torvalds authored May 01, 2020

Pull sound fixes from Takashi Iwai:
 "Just a collection of small fixes around this time:

   - One more try for fixing PCM OSS regression

   - HD-audio: a new quirk for Lenovo, the improved driver blacklisting,
     a lock fix in the minor error path, and a fix for the possible race
     at monitor notifiaction

   - USB-audio: a quirk ID fix, a fix for POD HD500 workaround"

* tag 'sound-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usb-audio: Correct a typo of NuPrime DAC-10 USB ID
  ALSA: opti9xx: shut up gcc-10 range warning
  ALSA: hda/hdmi: fix without unlocked before return
  ALSA: hda/hdmi: fix race in monitor detection during probe
  ALSA: hda/realtek - Two front mics on a Lenovo ThinkCenter
  ALSA: line6: Fix POD HD500 audio playback
  ALSA: pcm: oss: Place the plugin buffer overflow checks correctly (for 5.7)
  ALSA: pcm: oss: Place the plugin buffer overflow checks correctly
  ALSA: hda: Match both PCI ID and SSID for driver blacklist

c5364190

Merge tag 'drm-fixes-2020-05-01' of git://anongit.freedesktop.org/drm/drm · 477bfeb9

Linus Torvalds authored May 01, 2020

Pull drm fixes from Dave Airlie:
 "Regular scheduled fixes for graphics. Nothing to extreme bunch of
  amdgpu fixes, i915 and qxl fixes, along with some misc ones.

  All seems to be progressing normally.

  core:
   - EDID off by one DTD fix
   - DP mst write return code fix

  dma-buf:
   - fix SET_NAME ioctl uapi
   - doc fixes

  amdgpu:
   - Fix a green screen on resume issue
   - PM fixes for SR-IOV SDMA fix for navi
   - Renoir display fixes
   - Cursor and pageflip stuttering fixes
   - Misc additional display fixes
   - (uapi) Add additional DCC tiling flags for navi1x

  i915:
   - Fix selftest refcnt leak (Xiyu)
   - Fix gem vma lock (Chris)
   - Fix gt's i915_request.timeline acquire by checking if cacheline is
     valid (Chris)
   - Fix IRQ postinistall fault masks (Matt)

  qxl:
   - use after gree fix
   - fix lost kunmap
   - release leak fix

  virtio:
   - context destruction fix"

* tag 'drm-fixes-2020-05-01' of git://anongit.freedesktop.org/drm/drm: (26 commits)
  dma-buf: fix documentation build warnings
  drm/qxl: qxl_release use after free
  drm/qxl: lost qxl_bo_kunmap_atomic_page in qxl_image_init_helper()
  drm/i915: Use proper fault mask in interrupt postinstall too
  drm/amd/display: Use cursor locking to prevent flip delays
  drm/amd/display: Update downspread percent to match spreadsheet for DCN2.1
  drm/amd/display: Defer cursor update around VUPDATE for all ASIC
  drm/amd/display: fix rn soc bb update
  drm/amd/display: check if REFCLK_CNTL register is present
  drm/amdgpu: bump version for invalidate L2 before SDMA IBs
  drm/amdgpu: invalidate L2 before SDMA IBs (v2)
  drm/amdgpu: add tiling flags from Mesa
  drm/amd/powerplay: avoid using pm_en before it is initialized revised
  Revert "drm/amd/powerplay: avoid using pm_en before it is initialized"
  drm/qxl: qxl_release leak in qxl_hw_surface_alloc()
  drm/qxl: qxl_release leak in qxl_draw_dirty_fb()
  drm/virtio: only destroy created contexts
  drm/dp_mst: Fix drm_dp_send_dpcd_write() return code
  drm/i915/gt: Check cacheline is valid before acquiring
  drm/i915/gem: Hold obj->vma.lock over for_each_ggtt_vma()
  ...

477bfeb9

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · cebcff3a

Linus Torvalds authored May 01, 2020

Pull SCSI fixes from James Bottomley:
 "Four minor fixes: three in drivers and one in the core.

  The core one allows an additional state change that fixes a regression
  introduced by an update to the aacraid driver in the previous merge
  window"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: target/iblock: fix WRITE SAME zeroing
  scsi: qla2xxx: check UNLOADING before posting async work
  scsi: qla2xxx: set UNLOADING before waiting for session deletion
  scsi: core: Allow the state change from SDEV_QUIESCE to SDEV_BLOCK

cebcff3a

io_uring: punt splice async because of inode mutex · 2fb3e822

Pavel Begunkov authored May 01, 2020

Nonblocking do_splice() still may wait for some time on an inode mutex.
Let's play safe and always punt it async.
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2fb3e822

io_uring: check non-sync defer_list carefully · 4ee36314

Pavel Begunkov authored May 01, 2020

io_req_defer() do double-checked locking. Use proper helpers for that,
i.e. list_empty_careful().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4ee36314

io_uring: fix extra put in sync_file_range() · 7759a0bf

Pavel Begunkov authored May 01, 2020

[   40.179474] refcount_t: underflow; use-after-free.
[   40.179499] WARNING: CPU: 6 PID: 1848 at lib/refcount.c:28 refcount_warn_saturate+0xae/0xf0
...
[   40.179612] RIP: 0010:refcount_warn_saturate+0xae/0xf0
[   40.179617] Code: 28 44 0a 01 01 e8 d7 01 c2 ff 0f 0b 5d c3 80 3d 15 44 0a 01 00 75 91 48 c7 c7 b8 f5 75 be c6 05 05 44 0a 01 01 e8 b7 01 c2 ff <0f> 0b 5d c3 80 3d f3 43 0a 01 00 0f 85 6d ff ff ff 48 c7 c7 10 f6
[   40.179619] RSP: 0018:ffffb252423ebe18 EFLAGS: 00010286
[   40.179623] RAX: 0000000000000000 RBX: ffff98d65e929400 RCX: 0000000000000000
[   40.179625] RDX: 0000000000000001 RSI: 0000000000000086 RDI: 00000000ffffffff
[   40.179627] RBP: ffffb252423ebe18 R08: 0000000000000001 R09: 000000000000055d
[   40.179629] R10: 0000000000000c8c R11: 0000000000000001 R12: 0000000000000000
[   40.179631] R13: ffff98d68c434400 R14: ffff98d6a9cbaa20 R15: ffff98d6a609ccb8
[   40.179634] FS:  0000000000000000(0000) GS:ffff98d6af580000(0000) knlGS:0000000000000000
[   40.179636] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   40.179638] CR2: 00000000033e3194 CR3: 000000006480a003 CR4: 00000000003606e0
[   40.179641] Call Trace:
[   40.179652]  io_put_req+0x36/0x40
[   40.179657]  io_free_work+0x15/0x20
[   40.179661]  io_worker_handle_work+0x2f5/0x480
[   40.179667]  io_wqe_worker+0x2a9/0x360
[   40.179674]  ? _raw_spin_unlock_irqrestore+0x24/0x40
[   40.179681]  kthread+0x12c/0x170
[   40.179685]  ? io_worker_handle_work+0x480/0x480
[   40.179690]  ? kthread_park+0x90/0x90
[   40.179695]  ret_from_fork+0x35/0x40
[   40.179702] ---[ end trace 85027405f00110aa ]---

Opcode handler must never put submission ref, but that's what
io_sync_file_range_finish() do. use io_steal_work() there.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7759a0bf

iommu/qcom: Fix local_base status check · b52649ae

Tang Bin authored Apr 18, 2020

The function qcom_iommu_device_probe() does not perform sufficient
error checking after executing devm_ioremap_resource(), which can
result in crashes if a critical error path is encountered.

Fixes: 0ae349a0 ("iommu/qcom: Add qcom_iommu")
Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20200418134703.1760-1-tangbin@cmss.chinamobile.comSigned-off-by: Joerg Roedel <jroedel@suse.de>

b52649ae

iommu: Properly export iommu_group_get_for_dev() · ae74c19f

Greg Kroah-Hartman authored Apr 30, 2020

In commit a7ba5c3d ("drivers/iommu: Export core IOMMU API symbols to
permit modular drivers") a bunch of iommu symbols were exported, all
with _GPL markings except iommu_group_get_for_dev().  That export should
also be _GPL like the others.

Fixes: a7ba5c3d ("drivers/iommu: Export core IOMMU API symbols to permit modular drivers")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: John Garry <john.garry@huawei.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200430120120.2948448-1-gregkh@linuxfoundation.orgSigned-off-by: Joerg Roedel <jroedel@suse.de>

ae74c19f

iommu/vt-d: Use right Kconfig option name · ba61c3da

Lu Baolu authored May 01, 2020

The CONFIG_ prefix should be added in the code.

Fixes: 04618252 ("iommu/vt-d: Add Kconfig option to enable/disable scalable mode")
Reported-and-tested-by: Kumar, Sanjay K <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Link: https://lore.kernel.org/r/20200501072427.14265-1-baolu.lu@linux.intel.comSigned-off-by: Joerg Roedel <jroedel@suse.de>

ba61c3da

iommu/amd: Fix legacy interrupt remapping for x2APIC-enabled system · b74aa02d

Suravee Suthikulpanit authored Apr 22, 2020

Currently, system fails to boot because the legacy interrupt remapping
mode does not enable 128-bit IRTE (GA), which is required for x2APIC
support.

Fix by using AMD_IOMMU_GUEST_IR_LEGACY_GA mode when booting with
kernel option amd_iommu_intr=legacy instead. The initialization
logic will check GASup and automatically fallback to using
AMD_IOMMU_GUEST_IR_LEGACY if GA mode is not supported.

Fixes: 3928aa3f ("iommu/amd: Detect and enable guest vAPIC support")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/1587562202-14183-1-git-send-email-suravee.suthikulpanit@amd.comSigned-off-by: Joerg Roedel <jroedel@suse.de>

b74aa02d

io_uring: use cond_resched() in io_ring_ctx_wait_and_kill() · 3fd44c86

Xiaoguang Wang authored May 01, 2020

While working on to make io_uring sqpoll mode support syscalls that need
struct files_struct, I got cpu soft lockup in io_ring_ctx_wait_and_kill(),

    while (ctx->sqo_thread && !wq_has_sleeper(&ctx->sqo_wait))
        cpu_relax();

above loop never has an chance to exit, it's because preempt isn't enabled
in the kernel, and the context calling io_ring_ctx_wait_and_kill() and
io_sq_thread() run in the same cpu, if io_sq_thread calls a cond_resched()
yield cpu and another context enters above loop, then io_sq_thread() will
always in runqueue and never exit.

Use cond_resched() can fix this issue.

 Reported-by: syzbot+66243bb7126c410cefe6@syzkaller.appspotmail.com
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3fd44c86

io_uring: use proper references for fallback_req locking · dd461af6

Bijan Mottahedeh authored Apr 29, 2020

Use ctx->fallback_req address for test_and_set_bit_lock() and
clear_bit_unlock().
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dd461af6

io_uring: only force async punt if poll based retry can't handle it · 490e8967

Jens Axboe authored Apr 28, 2020

We do blocking retry from our poll handler, if the file supports polled
notifications. Only mark the request as needing an async worker if we
can't poll for it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

490e8967

io_uring: enable poll retry for any file with ->read_iter / ->write_iter · af197f50

Jens Axboe authored Apr 28, 2020

We can have files like eventfd where it's perfectly fine to do poll
based retry on them, right now io_file_supports_async() doesn't take
that into account.

Pass in data direction and check the f_op instead of just always needing
an async worker.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

af197f50

Merge tag 'amd-drm-fixes-5.7-2020-04-29' of... · e3dcd86b

Dave Airlie authored May 01, 2020

Merge tag 'amd-drm-fixes-5.7-2020-04-29' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

amd-drm-fixes-5.7-2020-04-29:

amdgpu:
- Fix a green screen on resume issue
- PM fixes for SR-IOV
- SDMA fix for navi
- Renoir display fixes
- Cursor and pageflip stuttering fixes
- Misc additional display fixes

UAPI:
- Add additional DCC tiling flags for navi1x
  Used by: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429212008.4306-1-alexander.deucher@amd.com

e3dcd86b

Merge tag 'drm-intel-fixes-2020-04-30' of... · a979bb70

Dave Airlie authored May 01, 2020

Merge tag 'drm-intel-fixes-2020-04-30' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Fix selftest refcnt leak (Xiyu)
- Fix gem vma lock (Chris)
- Fix gt's i915_request.timeline acquire by checking if cacheline is valid (Chris)
- Fix IRQ postinistall fault masks (Matt)
Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200430140042.GA270140@intel.com

a979bb70

Merge tag 'drm-misc-fixes-2020-04-30' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes · c62098c9

Dave Airlie authored May 01, 2020

A few resources-related fixes for qxl, some doc build warnings and ioctl
fixes for dma-buf, an off-by-one fix in edid, and a return code fix in
DP-MST
Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200430153201.wx6of2b2gsoip7bk@gilmour.lan

c62098c9

30 Apr, 2020 6 commits

Merge tag 'for-5.7/dm-fixes-2' of... · c45e8bcc

Linus Torvalds authored Apr 30, 2020

Merge tag 'for-5.7/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Document DM integrity allow_discard feature that was added during 5.7
   merge window.

 - Fix potential for DM writecache data corruption during DM table
   reloads.

 - Fix DM verity's FEC support's hash block number calculation in
   verity_fec_decode().

 - Fix bio-based DM multipath crash due to use of stale copy of
   MPATHF_QUEUE_IO flag state in __map_bio().

* tag 'for-5.7/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm multipath: use updated MPATHF_QUEUE_IO on mapping for bio-based mpath
  dm verity fec: fix hash block number in verity_fec_decode
  dm writecache: fix data corruption when reloading the target
  dm integrity: document allow_discard option

c45e8bcc

Merge tag 'selinux-pr-20200430' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 39e16d93

Linus Torvalds authored Apr 30, 2020

Pull SELinux fixes from Paul Moore:
 "Two more SELinux patches to fix problems in the v5.7-rcX releases.

  Wei Yongjun's patch fixes a return code in an error path, and my patch
  fixes a problem where we were not correctly applying access controls
  to all of the netlink messages in the netlink_send LSM hook"

* tag 'selinux-pr-20200430' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: properly handle multiple messages in selinux_netlink_send()
  selinux: fix error return code in cond_read_list()

39e16d93

Merge tag 'linux-kselftest-kunit-5.7-rc4' of... · 0468915b

Linus Torvalds authored Apr 30, 2020

Merge tag 'linux-kselftest-kunit-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kunit fix from Shuah Khan:
 "A single fix to flush the test summary to the console log without
  delay"

* tag 'linux-kselftest-kunit-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: Add missing newline in summary message

0468915b

Merge tag 'linux-kselftest-5.7-rc4' of... · 75ec0ba2

Linus Torvalds authored Apr 30, 2020

Merge tag 'linux-kselftest-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest updates from Shuah Khan:

 - ftrace test fixes to check for required filter files and kprobe args.

 - Kselftest build/cross-build dependency check script to make it easier
   for test ring admins/users to configure build systems correctly for
   build/cross-build kselftests. Currently checks library dependencies.

    - Checks if Kselftests can be built/cross-built on a system running
      compile test on a trivial C file with LDLIBS specified for each
      individual test in their Makefiles.

    - Prints suggested target list for a system filtering out tests
      failed the build dependency check from the TARGETS in Selftests
      the main Makefile when optional -p is specified.

    - Prints pass/fail dependency check for each tests/sub-test.

    - Prints pass/fail targets and libraries.

    - Default: runs dependency checks on all tests.

    - Optional test name can be specified to check dependencies for it.

* tag 'linux-kselftest-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/ftrace: Check the first record for kprobe_args_type.tc
  selftests: add build/cross-build dependency check script
  selftests/ftrace: Check required filter files before running test

75ec0ba2

selinux: properly handle multiple messages in selinux_netlink_send() · fb739741

Paul Moore authored Apr 28, 2020

Fix the SELinux netlink_send hook to properly handle multiple netlink
messages in a single sk_buff; each message is parsed and subject to
SELinux access control.  Prior to this patch, SELinux only inspected
the first message in the sk_buff.

Cc: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

fb739741

NFS: Fix a race in __nfs_list_for_each_server() · 9c07b75b

Trond Myklebust authored Apr 30, 2020

The struct nfs_server gets put on the cl_superblocks list before
the server->super field has been initialised, in which case the
call to nfs_sb_active() will Oops. Add a check to ensure that
we skip such a list entry.

Fixes: 3c9e502b ("NFS: Add a helper nfs_client_for_each_server()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

9c07b75b