1. 10 Jun, 2021 22 commits
    • pull handling of ->iov_offset into iterate_{iovec,bvec,xarray} · a6e4ec7b
      Al Viro authored
      fewer arguments (by one, but still...) for iterate_...() macros
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • iov_iter: make iterator callbacks use base and len instead of iovec · 7baa5099
      Al Viro authored
      Iterator macros used to provide the arguments for step callbacks in
      a structure matching the flavour - iovec for ITER_IOVEC, kvec for
      ITER_KVEC and bio_vec for ITER_BVEC.  That already broke down for
      ITER_XARRAY (bio_vec there); now that we are using kvec callback
      for bvec and xarray cases, we are always passing a pointer + length
      (void __user * + size_t for ITER_IOVEC callback, void * + size_t
      for everything else).
      
      Note that the original reason for bio_vec (page + offset + len) in
      case of ITER_BVEC used to be that we did *not* want to kmap a
      page when all we wanted was e.g. to find the alignment of its
      subrange.  Now all such users are gone and the ones that are left
      want the page mapped anyway for actually copying the data.
      
      So in all cases we have pointer + length, and there's no good
      reason for keeping those in struct iovec or struct kvec - we
      can just pass them to callback separately.
      
      Again, less boilerplate in callbacks...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • iov_iter: make the amount already copied available to iterator callbacks · 622838f3
      Al Viro authored
      Making iterator macros keep track of the amount of data copied is pretty
      easy and it has several benefits:
      	1) we no longer need the mess like (from += v.iov_len) - v.iov_len
      in the callbacks - initial value + total amount copied so far would do
      just fine.
      	2) less obviously, we no longer need to remember the initial amount
      of data we wanted to copy; the loops in iterator macros are along the lines
      of
      	wanted = bytes;
      	while (bytes) {
      		copy some
      		bytes -= copied
      		if short copy
      			break
      	}
      	bytes = wanted - bytes;
      Replacement is
      	offs = 0;
      	while (bytes) {
      		copy some
      		offs += copied
      		bytes -= copied
      		if short copy
      			break
      	}
      	bytes = offs;
      That wouldn't be a win per se, but unlike the initial value of bytes, the amount
      copied so far *is* useful in callbacks.
      	3) in some cases (csum_and_copy_..._iter()) we already had offs manually
      maintained by the callbacks.  With that change we can drop that.
      
      	Less boilerplate and more readable code...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
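The replacement loop described in this commit message can be sketched in userspace C. This is only an illustration under stated assumptions: `copy_chunked`, `step_fn`, `step_full` and `step_faulty` are hypothetical names invented here, not the kernel's iterator macros, and the step callback is assumed to return the number of bytes it could not copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical step callback: copies up to len bytes from src + off to
 * dst + off and returns the number of bytes it could NOT copy. */
typedef size_t (*step_fn)(char *dst, const char *src, size_t off, size_t len);

/* Walk `bytes` bytes in chunks of at most `chunk`, tracking the running
 * offset instead of remembering the initial request. */
static size_t copy_chunked(char *dst, const char *src, size_t bytes,
                           size_t chunk, step_fn step)
{
    size_t offs = 0;

    assert(chunk > 0);
    while (bytes) {
        size_t len = bytes < chunk ? bytes : chunk;
        size_t left = step(dst, src, offs, len);
        size_t copied = len - left;

        offs += copied;
        bytes -= copied;
        if (left)           /* short copy: stop here */
            break;
    }
    return offs;            /* total amount actually copied */
}

/* Step that always succeeds. */
static size_t step_full(char *dst, const char *src, size_t off, size_t len)
{
    memcpy(dst + off, src + off, len);
    return 0;
}

/* Step that refuses to copy past offset 5, to simulate a fault. */
static size_t step_faulty(char *dst, const char *src, size_t off, size_t len)
{
    size_t limit = 5;
    size_t ok = off >= limit ? 0 : (off + len > limit ? limit - off : len);

    memcpy(dst + off, src + off, ok);
    return len - ok;
}
```

The callback receives `offs` directly, so it can compute `dst + offs` without the `(from += len) - len` style bookkeeping that the message calls a mess.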
    • iov_iter: get rid of separate bvec and xarray callbacks · 21b56c84
      Al Viro authored
      After the previous commit we have
      	* xarray and bvec callbacks identical in all cases
      	* both equivalent to kvec callback wrapped into
      kmap_local_page()/kunmap_local() pair.
      
      So we can pass only two (iovec and kvec) callbacks to
      iterate_and_advance() and let iterate_{bvec,xarray} wrap
      it into kmap_local_page()/kunmap_local().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • iov_iter: teach iterate_{bvec,xarray}() about possible short copies · 1b4fb5ff
      Al Viro authored
      ... and now we finally can sort out the mess in _copy_mc_to_iter().
      Provide a variant of iterate_and_advance() that does *NOT* ignore
      the return values of bvec, xarray and kvec callbacks, use that in
      _copy_mc_to_iter().  That gets rid of magic in those callbacks -
      we used to need it so we'd get at least the right return value in
      case of failure halfway through.
      
      As a bonus, now iterator is advanced by the amount actually copied
      for all flavours.  That's what the callers expect and it used to do that
      correctly in iovec and xarray cases.  However, in kvec and bvec cases
      the iterator had not been advanced on such failures, breaking the users.
      Fixed now...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • iterate_bvec(): expand bvec.h macro forest, massage a bit · 7491a2bf
      Al Viro authored
      ... incidentally, using pointer instead of index in an array
      (the only change here) trims half-kilobyte of .text...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • iov_iter: unify iterate_iovec and iterate_kvec · 5c67aa90
      Al Viro authored
      The differences between iterate_iovec and iterate_kvec are minor:
      	* kvec callback is treated as if it returned 0
      	* initialization of __p is with i->iov and i->kvec resp.
      which is trivially dealt with.
      
      No code generation changes - compiler is quite capable of turning
      	left = ((void)(STEP), 0);
      	__v.iov_len -= left;
      (with no accesses to left downstream) and
      	(void)(STEP);
      into the same code.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
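The comma-expression trick mentioned in this commit message can be shown in isolation. The sketch below is a userspace illustration only: `STEP_AS_ZERO`, `step_noreturn` and `run_segment` are hypothetical names, not the kernel macros. It shows how a step with no meaningful return value can be used where a "bytes left" value is expected.

```c
#include <assert.h>
#include <stddef.h>

static int calls;   /* counts how often the step ran */

/* Hypothetical kvec-style step: does its work and reports nothing. */
static void step_noreturn(size_t len)
{
    (void)len;
    calls++;
}

/* Evaluate STEP for its side effects, discard any value, and yield 0
 * through the comma operator, so one loop body can serve both flavours. */
#define STEP_AS_ZERO(STEP) ((void)(STEP), 0)

static size_t run_segment(size_t len)
{
    size_t left = STEP_AS_ZERO(step_noreturn(len));

    assert(left == 0);  /* always true by construction */
    len -= left;        /* subtracts a compile-time constant 0 */
    return len;
}
```

Because `left` is a constant 0 that is not used further, an optimizing compiler is free to drop the subtraction, which is the "no code generation changes" point the message makes.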
    • iov_iter: massage iterate_iovec and iterate_kvec to logics similar to iterate_bvec · 7a1bcb5d
      Al Viro authored
      Premature optimization is the root of all evil...  Trying
      to unroll the first pass through the loop makes it harder
      to follow and not just for readers - compiler ends up
      generating worse code than it would on a "non-optimized"
      loop.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • iterate_and_advance(): get rid of magic in case when n is 0 · f5da8354
      Al Viro authored
      iov_iter_advance() needs to do some non-trivial work when it's given
      0 as argument (skip all empty iovecs, mostly).  We used to implement
      it via iterate_and_advance(); we no longer do so and for all other
      users of iterate_and_advance() zero length is a no-op.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • csum_and_copy_to_iter(): massage into form closer to csum_and_copy_from_iter() · 594e450b
      Al Viro authored
      Namely, have off counted starting from 0 rather than from csstate->off.
      To compensate we need to shift the initial value (csstate->sum) (rotate
      by 8 bits, as usual for csum) and do the same after we are finished adding
      the pieces up.
      
      What we get out of that is a bit more redundancy in our variables - from
      is always equal to addr + off, which will be useful several commits down
      the road.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
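The "rotate by 8 bits" remark in this commit message refers to a property of ones-complement checksums: a piece summed on its own but located at an odd byte offset has its byte lanes swapped relative to the whole. The sketch below illustrates that property in userspace C. All function names are hypothetical, and the big-endian word layout is an assumption chosen to keep the example host-independent; it is not the kernel's csum code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two 32-bit partial sums with end-around carry. */
static uint32_t csum_add32(uint32_t a, uint32_t b)
{
    uint32_t s = a + b;

    return s + (s < a);
}

/* Rotate a 32-bit partial sum right by 8 bits. */
static uint32_t ror32_8(uint32_t x)
{
    return (x >> 8) | (x << 24);
}

/* Fold a 32-bit partial sum down to 16 bits. */
static uint16_t csum_fold16(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Sum buf as big-endian 16-bit words, zero-padding an odd tail. */
static uint32_t csum_partial_be(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum = csum_add32(sum, ((uint32_t)buf[i] << 8) | buf[i + 1]);
    if (i < len)
        sum = csum_add32(sum, (uint32_t)buf[i] << 8);
    return sum;
}

/* Add a piece that starts at byte offset `off` to a running sum;
 * pieces at odd offsets are rotated by 8 bits first. */
static uint32_t csum_add_piece_at(uint32_t sum, uint32_t piece, size_t off)
{
    if (off & 1)
        piece = ror32_8(piece);
    return csum_add32(sum, piece);
}
```

Summing a buffer in one go and summing it as two pieces split at an odd offset give the same folded result once the second piece is rotated.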
    • iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant · f0b65f39
      Al Viro authored
      Replacement is called copy_page_from_iter_atomic(); unlike the old primitive the
      callers do *not* need to do iov_iter_advance() after it.  In case when they end
      up consuming less than they'd been given they need to do iov_iter_revert() on
      everything they had not consumed.  That, however, needs to be done only on slow
      paths.
      
      All in-tree callers converted.  And that kills the last user of iterate_all_kinds().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • [xarray] iov_iter_npages(): just use DIV_ROUND_UP() · e4f8df86
      Al Viro authored
      Compiler is capable of recognizing division by power of 2 and turning
      it into shifts.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
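As a rough illustration of the page-count computation this commit refers to, here is a userspace sketch. `TOY_PAGE_SIZE` and `toy_npages` are hypothetical names and the 4096-byte page size is an assumption; `DIV_ROUND_UP` is written out in its usual form.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL    /* assumed page size for the example */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of pages touched by `count` bytes that start `offset` bytes
 * into the first page. */
static unsigned long toy_npages(unsigned long offset, unsigned long count)
{
    assert(offset < TOY_PAGE_SIZE);
    return DIV_ROUND_UP(offset + count, TOY_PAGE_SIZE);
}
```

Since the divisor is a constant power of two, the compiler can turn the division into a shift, so there is little reason to hand-code the shifts.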
    • iov_iter_npages(): don't bother with iterate_all_kinds() · 66531c65
      Al Viro authored
      note that in bvec case pages can be compound ones - we can't just assume
      that each segment is covered by one (sub)page
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • get rid of iterate_all_kinds() in iov_iter_get_pages()/iov_iter_get_pages_alloc() · 3d671ca6
      Al Viro authored
      Here iterate_all_kinds() is used just to find the first (non-empty, in
      case of iovec) segment.  Which can be easily done explicitly.
      Note that in bvec case we now can get more than PAGE_SIZE worth of them,
      in case when we have a compound page in bvec and a range that crosses
      a subpage boundary.  Older behaviour had been to stop on that boundary;
      we used to get the right first page (for_each_bvec() took care of that),
      but that was all we'd got.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • iov_iter_gap_alignment(): get rid of iterate_all_kinds() · 610c7a71
      Al Viro authored
      For one thing, it's only used for iovec (and makes sense only for those).
      For another, here we don't care about iov_offset, since the beginning of
      the first segment and the end of the last one are ignored.  So it makes
      a lot more sense to just walk through the iovec array...
      
      We need to deal with the case of truncated iov_iter, but unlike the
      situation with iov_iter_alignment() we don't care where the last
      segment ends - just which segment is the last one.
      
      [fixed a braino spotted by Qian Cai <quic_qiancai@quicinc.com>]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • iov_iter_alignment(): don't bother with iterate_all_kinds() · 9221d2e3
      Al Viro authored
      It's easier to go over the array manually.  We need to watch out
      for truncated iov_iter, though - iovec array might cover more
      than i->count.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
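Walking the array manually while respecting a truncated iterator might look roughly like the following userspace sketch. `toy_iov_alignment` and its parameters are hypothetical and stand in for the iterator's segment array, `iov_offset` and `count`; this is not the kernel function itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* OR together the start address and effective length of every segment,
 * stopping once `count` bytes have been covered.  `skip` is the offset
 * into the first segment. */
static uintptr_t toy_iov_alignment(const struct iovec *iov, size_t nr_segs,
                                   size_t skip, size_t count)
{
    uintptr_t res = 0;
    size_t k;

    for (k = 0; k < nr_segs && count; k++, skip = 0) {
        size_t len;

        assert(skip <= iov[k].iov_len);
        len = iov[k].iov_len - skip;
        if (!len)
            continue;           /* empty segments contribute nothing */
        res |= (uintptr_t)iov[k].iov_base + skip;
        if (len > count)
            len = count;        /* array covers more than the iterator */
        res |= len;
        count -= len;
    }
    return res;
}
```

Segments beyond `count` are never looked at, so a misaligned trailing segment that the iterator does not cover leaves the result unaffected.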
    • sanitize iov_iter_fault_in_readable() · 8409a0d2
      Al Viro authored
      1) constify iov_iter argument; we are not advancing it in this primitive.
      
      2) cap the amount requested by the amount of data in iov_iter.  All
      existing callers should've been safe, but the check is really cheap and
      doing it here makes for easier analysis, as well as more consistent
      semantics among the primitives.
      
      3) don't bother with iterate_iovec().  Explicit loop is not any harder
      to follow, and we get rid of standalone iterate_iovec() users - it's
      only used by iterate_and_advance() and (soon to be gone) iterate_all_kinds().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • iov_iter: optimize iov_iter_advance() for iovec and kvec · 185ac4d4
      Al Viro authored
      We can do better than generic iterate_and_advance() for this one;
      inspired by bvec_iter_advance() (and massaged into that form by
      equivalent transformations).
      
      [fixed a braino caught by kernel test robot <oliver.sang@intel.com>]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • iov_iter: separate direction from flavour · 8cd54c1c
      Al Viro authored
      Instead of having them mixed in iter->type, use separate ->iter_type
      and ->data_source (u8 and bool resp.)  And don't bother with (pseudo-)
      bitmap for the former - microoptimizations from being able to check
      if the flavour is one of two values are not worth the confusion for
      optimizer.  It can't prove that we never get e.g. ITER_IOVEC | ITER_PIPE,
      so we end up with extra headache.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • iov_iter_advance(): don't modify ->iov_offset for ITER_DISCARD · 556351c1
      Al Viro authored
      the field is not used for that flavour
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • iov_iter: reorder handling of flavours in primitives · 28f38db7
      Al Viro authored
      iovec is the most common one; test it first and test explicitly,
      rather than "not anything else".  Replace all flavour checks with
      use of iov_iter_is_...() helpers.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • iov_iter: switch ..._full() variants of primitives to use of iov_iter_revert() · 4b6c132b
      Al Viro authored
      Use corresponding plain variants, revert on short copy.  That's the way it
      should've been done from the very beginning, except that we didn't have
      iov_iter_revert() back then...
      
      [fixed another braino caught by Qian Cai <quic_qiancai@quicinc.com>]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
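The "plain variant, revert on short copy" pattern from this commit can be modelled with a toy iterator. Everything below is a hypothetical userspace stand-in (`struct toy_iter`, `toy_copy_from_iter`, `toy_iter_revert`, `toy_copy_from_iter_full`), not the kernel's `struct iov_iter` API; the `avail` field exists only to simulate a fault partway through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy single-segment iterator; a stand-in, not struct iov_iter. */
struct toy_iter {
    const char *base;
    size_t pos;     /* bytes consumed so far */
    size_t avail;   /* bytes readable before a simulated fault */
    size_t count;   /* bytes remaining in the iterator */
};

/* Plain variant: copy what we can, advance by that much, report it. */
static size_t toy_copy_from_iter(void *dst, size_t bytes, struct toy_iter *i)
{
    size_t can = i->avail > i->pos ? i->avail - i->pos : 0;

    if (bytes > i->count)
        bytes = i->count;
    if (bytes > can)
        bytes = can;
    memcpy(dst, i->base + i->pos, bytes);
    i->pos += bytes;
    i->count -= bytes;
    return bytes;
}

/* Undo the last `unroll` bytes of advancement. */
static void toy_iter_revert(struct toy_iter *i, size_t unroll)
{
    assert(unroll <= i->pos);
    i->pos -= unroll;
    i->count += unroll;
}

/* All-or-nothing variant built on the plain one. */
static bool toy_copy_from_iter_full(void *dst, size_t bytes, struct toy_iter *i)
{
    size_t copied = toy_copy_from_iter(dst, bytes, i);

    if (copied == bytes)
        return true;
    toy_iter_revert(i, copied);     /* leave the iterator where it was */
    return false;
}
```

On failure the iterator ends up exactly where it started, so the all-or-nothing variant needs no separate copy loop of its own.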
  2. 03 Jun, 2021 6 commits
  3. 02 Jun, 2021 2 commits
  4. 09 May, 2021 10 commits
    • Linux 5.13-rc1 · 6efb943b
      Linus Torvalds authored
    • fbmem: fix horribly incorrect placement of __maybe_unused · 6dae40ae
      Linus Torvalds authored
      Commit b9d79e4c ("fbmem: Mark proc_fb_seq_ops as __maybe_unused")
      places the '__maybe_unused' in an entirely incorrect location between
      the "struct" keyword and the structure name.
      
      It's a wonder that gcc accepts that silently, but clang quite reasonably
      warns about it:
      
          drivers/video/fbdev/core/fbmem.c:736:21: warning: attribute declaration must precede definition [-Wignored-attributes]
          static const struct __maybe_unused seq_operations proc_fb_seq_ops = {
                              ^
      
      Fix it.
      
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
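For readers unfamiliar with attribute placement, the following userspace sketch shows a declaration with the attribute after the complete type rather than between `struct` and the tag name. `TOY_MAYBE_UNUSED`, `struct toy_ops` and the other names are hypothetical stand-ins for the kernel's `__maybe_unused` and `struct seq_operations`, and the GCC/Clang `__attribute__` syntax is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's __maybe_unused (GCC/Clang syntax). */
#define TOY_MAYBE_UNUSED __attribute__((__unused__))

struct toy_ops {
    int (*show)(int value);
};

static int toy_show(int value)
{
    return value + 1;
}

/* The attribute follows the complete type "struct toy_ops" and applies
 * to the variable, instead of sitting between "struct" and the tag. */
static const struct toy_ops TOY_MAYBE_UNUSED toy_seq_ops = {
    .show = toy_show,
};

/* Small helper so the example has something to call. */
static int toy_call_show(const struct toy_ops *ops, int value)
{
    assert(ops != NULL && ops->show != NULL);
    return ops->show(value);
}
```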
    • Merge tag 'drm-next-2021-05-10' of git://anongit.freedesktop.org/drm/drm · efc58a96
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Bit later than usual, I queued them all up on Friday then promptly
        forgot to write the pull request email. This is mainly amdgpu fixes,
        with some radeon/msm/fbdev and one i915 gvt fix thrown in.
      
        amdgpu:
         - MPO hang workaround
         - Fix for concurrent VM flushes on vega/navi
         - dcefclk is not adjustable on navi1x and newer
         - MST HPD debugfs fix
         - Suspend/resume fixes
         - Register VGA clients late in case driver fails to load
         - Fix GEM leak in user framebuffer create
         - Add support for polaris12 with 32 bit memory interface
         - Fix duplicate cursor issue when using overlay
         - Fix corruption with tiled surfaces on VCN3
         - Add BO size and stride check to fix BO size verification
      
        radeon:
         - Fix off-by-one in power state parsing
         - Fix possible memory leak in power state parsing
      
        msm:
         - NULL ptr dereference fix
      
        fbdev:
         - procfs disabled warning fix
      
        i915:
         - gvt: Fix a possible division by zero in vgpu display rate
           calculation"
      
      * tag 'drm-next-2021-05-10' of git://anongit.freedesktop.org/drm/drm:
        drm/amdgpu: Use device specific BO size & stride check.
        drm/amdgpu: Init GFX10_ADDR_CONFIG for VCN v3 in DPG mode.
        drm/amd/pm: initialize variable
        drm/radeon: Avoid power table parsing memory leaks
        drm/radeon: Fix off-by-one power_state index heap overwrite
        drm/amd/display: Fix two cursor duplication when using overlay
        drm/amdgpu: add new MC firmware for Polaris12 32bit ASIC
        fbmem: Mark proc_fb_seq_ops as __maybe_unused
        drm/msm/dpu: Delete bonkers code
        drm/i915/gvt: Prevent divided by zero when calculating refresh rate
        amdgpu: fix GEM obj leak in amdgpu_display_user_framebuffer_create
        drm/amdgpu: Register VGA clients after init can no longer fail
        drm/amdgpu: Handling of amdgpu_device_resume return value for graceful teardown
        drm/amdgpu: fix r initial values
        drm/amd/display: fix wrong statement in mst hpd debugfs
        amdgpu/pm: set pp_dpm_dcefclk to readonly on NAVI10 and newer gpus
        amdgpu/pm: Prevent force of DCEFCLK on NAVI10 and SIENNA_CICHLID
        drm/amdgpu: fix concurrent VM flushes on Vega/Navi v2
        drm/amd/display: Reject non-zero src_y and src_x for video planes
    • Merge tag 'block-5.13-2021-05-09' of git://git.kernel.dk/linux-block · 506c3079
      Linus Torvalds authored
      Pull block fix from Jens Axboe:
       "Turns out the bio max size change still has issues, so let's get it
        reverted for 5.13-rc1. We'll shake out the issues there and defer it
        to 5.14 instead"
      
      * tag 'block-5.13-2021-05-09' of git://git.kernel.dk/linux-block:
        Revert "bio: limit bio max size"
    • Merge tag '5.13-rc-smb3-part3' of git://git.samba.org/sfrench/cifs-2.6 · 0a55a1fb
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Three small SMB3 multichannel related changesets (also for stable)
        from the SMB3 test event this week.
      
        The other fixes are still in review/testing"
      
      * tag '5.13-rc-smb3-part3' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: if max_channels set to more than one channel request multichannel
        smb3: do not attempt multichannel to server which does not support it
        smb3: when mounting with multichannel include it in requested capabilities
    • Merge tag 'sched-urgent-2021-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9819f682
      Linus Torvalds authored
      Pull scheduler fixes from Thomas Gleixner:
       "A set of scheduler updates:
      
         - Prevent PSI state corruption when schedule() races with cgroup
           move.
      
           A recent commit combined two PSI callbacks to reduce the number of
           cgroup tree updates, but missed that schedule() can drop rq::lock
           for load balancing, which opens the race window for
           cgroup_move_task() which then observes half updated state.
      
           The fix is to solely use task::psi_flags instead of looking at the
           potentially mismatching scheduler state
      
         - Prevent an out-of-bounds access in uclamp caused by a rounding
           division which can lead to an off-by-one error exceeding the
           buckets array size.
      
         - Prevent unfairness caused by missing load decay when a task is
           attached to a cfs runqueue.
      
           The old load of the task was attached to the runqueue and never
           removed. Fix it by enforcing the load update through the hierarchy
           for unthrottled run queue instances.
      
         - A documentation fix for the 'sched_verbose' command line option"
      
      * tag 'sched-urgent-2021-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/fair: Fix unfairness caused by missing load decay
        sched: Fix out-of-bound access in uclamp
        psi: Fix psi state corruption when schedule() races with cgroup move
        sched,doc: sched_debug_verbose cmdline should be sched_verbose
    • Merge tag 'locking-urgent-2021-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 732a27a0
      Linus Torvalds authored
      Pull locking fixes from Thomas Gleixner:
       "A set of locking related fixes and updates:
      
         - Two fixes for the futex syscall related to the timeout handling.
      
           FUTEX_LOCK_PI does not support the FUTEX_CLOCK_REALTIME bit and
           because it's not set the time namespace adjustment for clock
           MONOTONIC is applied wrongly.
      
           FUTEX_WAIT cannot support the FUTEX_CLOCK_REALTIME bit because it's
           always a relative timeout.
      
         - Cleanups in the futex syscall entry points which became obvious
           when the two timeout handling bugs were fixed.
      
         - Cleanup of queued_write_lock_slowpath() as suggested by Linus
      
         - Fixup of the smp_call_function_single_async() prototype"
      
      * tag 'locking-urgent-2021-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        futex: Make syscall entry points less convoluted
        futex: Get rid of the val2 conditional dance
        futex: Do not apply time namespace adjustment on FUTEX_LOCK_PI
        Revert 337f1304 ("futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op")
        locking/qrwlock: Cleanup queued_write_lock_slowpath()
        smp: Fix smp_call_function_single_async prototype
    • Merge tag 'perf_urgent_for_v5.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 85bbba1c
      Linus Torvalds authored
      Pull x86 perf fix from Borislav Petkov:
       "Handle power-gating of AMD IOMMU perf counters properly when they are
        used"
      
      * tag 'perf_urgent_for_v5.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
    • Merge tag 'x86_urgent_for_v5.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · dd3e4012
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
       "A bunch of things accumulated for x86 in the last two weeks:
      
         - Fix guest vtime accounting so that ticks happening while the guest
           is running can also be accounted to it. Along with a consolidation
           to the guest-specific context tracking helpers.
      
         - Provide for the host NMI handler running after a VMX VMEXIT to be
           able to run on the kernel stack correctly.
      
         - Initialize MSR_TSC_AUX when RDPID is supported and not RDTSCP (virt
           relevant - real hw supports both)
      
         - A code generation improvement to TASK_SIZE_MAX through the use of
           alternatives
      
         - The usual misc and related cleanups and improvements"
      
      * tag 'x86_urgent_for_v5.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        KVM: x86: Consolidate guest enter/exit logic to common helpers
        context_tracking: KVM: Move guest enter/exit wrappers to KVM's domain
        context_tracking: Consolidate guest enter/exit wrappers
        sched/vtime: Move guest enter/exit vtime accounting to vtime.h
        sched/vtime: Move vtime accounting external declarations above inlines
        KVM: x86: Defer vtime accounting 'til after IRQ handling
        context_tracking: Move guest exit vtime accounting to separate helpers
        context_tracking: Move guest exit context tracking to separate helpers
        KVM/VMX: Invoke NMI non-IST entry instead of IST entry
        x86/cpu: Remove write_tsc() and write_rdtscp_aux() wrappers
        x86/cpu: Initialize MSR_TSC_AUX if RDTSCP *or* RDPID is supported
        x86/resctrl: Fix init const confusion
        x86: Delete UD0, UD1 traces
        x86/smpboot: Remove duplicate includes
        x86/cpu: Use alternative to generate the TASK_SIZE_MAX constant
    • Revert "bio: limit bio max size" · 35c820e7
      Jens Axboe authored
      This reverts commit cd2c7545.
      
      Alex reports that the commit causes corruption with LUKS on ext4. Revert
      it for now so that this can be investigated properly.
      
      Link: https://lore.kernel.org/linux-block/1620493841.bxdq8r5haw.none@localhost/
      Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>