1. 20 Sep, 2022 2 commits
    • x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions · fdb6649a
      Vincent Mailhol authored
      If x is not 0, __ffs(x) is equivalent to:
        (unsigned long)__builtin_ctzl(x)
      And if x is not ~0UL, ffz(x) is equivalent to:
        (unsigned long)__builtin_ctzl(~x)
      Because __builtin_ctzl() returns an int, a cast to (unsigned long) is
      necessary to avoid potential warnings on implicit conversions.
      
      Concerning the edge cases, __builtin_ctzl(0) is always undefined,
      whereas __ffs(0) and ffz(~0UL) may or may not be defined, depending on
      the processor. Regardless, for both functions, developers are asked to
      check against 0 or ~0UL first, so replacing __ffs() or ffz() with
      __builtin_ctzl() is safe.
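
      The equivalences and the edge-case rule above can be checked in user
      space. The following is a minimal sketch, not kernel code: the names
      naive_ffs_index(), checked_ffs_index() and checked_ffz_index() are
      hypothetical, and the plain loop only stands in for the kernel's asm
      __ffs() as a reference.

```c
#include <assert.h>

/*
 * Reference implementation (hypothetical helper): index of the least
 * significant set bit, found with a plain loop. The caller must pass
 * x != 0, mirroring the contract described for __ffs().
 */
unsigned long naive_ffs_index(unsigned long x)
{
	unsigned long i = 0;

	while (!(x & 1UL)) {
		x >>= 1;
		i++;
	}
	return i;
}

/* __builtin_ctzl(0) is undefined, so check against 0 before calling it. */
unsigned long checked_ffs_index(unsigned long x)
{
	assert(x != 0);
	return (unsigned long)__builtin_ctzl(x);
}

/*
 * The first zero bit of x is the first set bit of ~x.
 * ~x is 0 when x == ~0UL, so that input must be rejected.
 */
unsigned long checked_ffz_index(unsigned long x)
{
	assert(x != ~0UL);
	return (unsigned long)__builtin_ctzl(~x);
}
```

      For instance, checked_ffs_index(0x01000000UL) and
      naive_ffs_index(0x01000000UL) both return 24, and
      checked_ffz_index(0x0fUL) returns 4.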
      
      For x86_64, the current __ffs() and ffz() implementations do not
      produce optimized code when called with a constant expression. In
      contrast, __builtin_ctzl() folds into a single instruction.
      
      However, for non-constant expressions, the kernel's asm versions of
      __ffs() and ffz() remain slightly better than the code produced by
      GCC (which emits a useless instruction to clear eax).
      
      Use __builtin_constant_p() to select between the kernel's
      __ffs()/ffz() and the __builtin_ctzl() depending on whether the
      argument is constant or not.
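
      The selection described above can be sketched as a pair of macros.
      This is an illustration under stated assumptions rather than the
      patch itself: sketch_ffs_index(), sketch_ffz_index() and
      variable_ffs_index() are hypothetical names, and the portable loop
      below replaces the kernel's inline asm so that the example builds
      in user space with GCC or clang.

```c
#include <assert.h>

/*
 * Stand-in for the kernel's asm implementation (assumption: a portable
 * loop, NOT the real code). Undefined for word == 0, as for __ffs().
 */
static inline unsigned long variable_ffs_index(unsigned long word)
{
	unsigned long i = 0;

	assert(word != 0);
	while (!(word & 1UL)) {
		word >>= 1;
		i++;
	}
	return i;
}

/*
 * Constant argument: let the compiler fold __builtin_ctzl() at build
 * time. Non-constant argument: fall back to the "variable" version.
 */
#define sketch_ffs_index(word)					\
	(__builtin_constant_p(word) ?				\
	 (unsigned long)__builtin_ctzl(word) :			\
	 variable_ffs_index(word))

/* Same selection for the first zero bit, applied to the complement. */
#define sketch_ffz_index(word)					\
	(__builtin_constant_p(word) ?				\
	 (unsigned long)__builtin_ctzl(~(word)) :		\
	 variable_ffs_index(~(word)))
```

      With a literal such as sketch_ffs_index(0x01000000UL) the compiler
      can evaluate the builtin at compile time, whereas a run-time value
      goes through variable_ffs_index().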
      
      ** Statistics **
      
      On an allyesconfig, before...:
      
        $ objdump -d vmlinux.o | grep tzcnt | wc -l
        3607
      
      ...and after:
      
        $ objdump -d vmlinux.o | grep tzcnt | wc -l
        2600
      
      So, roughly 27.9% of the calls to either __ffs() or ffz()
      ((3607 - 2600) / 3607) were using constant expressions and could be
      optimized out.
      
      (tests done on linux v5.18-rc5 x86_64 using GCC 11.2.1)
      
      Note: on x86_64, the BSF instruction produces TZCNT when used with the
      REP prefix (which explains the use of `grep tzcnt' instead of `grep bsf'
      in the above benchmark). cf. [1]
      
      [1] e26a44a2 ("x86: Use REP BSF unconditionally")
      
        [ bp: Massage commit message. ]
      Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Link: https://lore.kernel.org/r/20220511160319.1045812-1-mailhol.vincent@wanadoo.fr
    • x86/asm/bitops: Use __builtin_ffs() to evaluate constant expressions · 146034fe
      Vincent Mailhol authored
      For x86_64, the current ffs() implementation does not produce optimized
      code when called with a constant expression. In contrast, the
      __builtin_ffs() functions of both GCC and clang are able to fold the
      expression into a single instruction.
      
      ** Example **
      
      Consider two dummy functions foo() and bar() as below:
      
        #include <linux/bitops.h>
        #define CONST 0x01000000
      
        unsigned int foo(void)
        {
        	return ffs(CONST);
        }
      
        unsigned int bar(void)
        {
        	return __builtin_ffs(CONST);
        }
      
      GCC would produce the following assembly code:
      
        0000000000000000 <foo>:
           0:	ba 00 00 00 01       	mov    $0x1000000,%edx
           5:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
           a:	0f bc c2             	bsf    %edx,%eax
           d:	83 c0 01             	add    $0x1,%eax
          10:	c3                   	ret
        <Instructions after ret and before next function were redacted>
      
        0000000000000020 <bar>:
          20:	b8 19 00 00 00       	mov    $0x19,%eax
          25:	c3                   	ret
      
      And clang would produce:
      
        0000000000000000 <foo>:
           0:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
           5:	0f bc 05 00 00 00 00 	bsf    0x0(%rip),%eax        # c <foo+0xc>
           c:	83 c0 01             	add    $0x1,%eax
           f:	c3                   	ret
      
        0000000000000010 <bar>:
          10:	b8 19 00 00 00       	mov    $0x19,%eax
          15:	c3                   	ret
      
      Both examples clearly demonstrate the benefit of using __builtin_ffs()
      instead of the kernel's asm implementation for constant expressions.
      
      However, for non-constant expressions, the kernel's ffs() asm version
      remains better for x86_64 because, contrary to GCC, it doesn't emit the
      CMOV assembly instruction, cf. [1] (notably, clang is able to optimize
      out the CMOV instruction).
      
      Use __builtin_constant_p() to select between the kernel's ffs() and
      the __builtin_ffs() depending on whether the argument is constant or
      not.
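
      A minimal user-space sketch of this selection follows. It is an
      illustration only: sketch_ffs(), variable_ffs_sketch() and
      sketch_ffs_selfcheck() are hypothetical names, and the loop is an
      assumed portable replacement for the kernel's asm ffs(). Like
      ffs(), it returns a 1-based bit position, or 0 when no bit is set.

```c
#include <assert.h>

/*
 * Stand-in for the kernel's asm ffs() (assumption: portable loop, not
 * the real code). Returns the 1-based index of the lowest set bit, or
 * 0 if x == 0.
 */
static inline int variable_ffs_sketch(int x)
{
	unsigned int u = (unsigned int)x;
	int pos = 1;

	if (!u)
		return 0;
	while (!(u & 1U)) {
		u >>= 1;
		pos++;
	}
	return pos;
}

/*
 * A macro rather than a function named ffs, so no built-in function is
 * shadowed. Constant arguments are folded through __builtin_ffs().
 */
#define sketch_ffs(x) \
	(__builtin_constant_p(x) ? __builtin_ffs(x) : variable_ffs_sketch(x))

/* Cross-check both paths on the constant used in the example above. */
void sketch_ffs_selfcheck(void)
{
	volatile int v = 0x01000000;	/* volatile forces the non-constant path */

	assert(sketch_ffs(0x01000000) == 25);	/* 25 == 0x19, as in bar() */
	assert(sketch_ffs(v) == 25);
	assert(sketch_ffs(0) == 0);
}
```

      Calling sketch_ffs_selfcheck() exercises both branches: the literal
      takes the __builtin_ffs() path and the volatile variable takes the
      fallback path.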
      
      As a side benefit, replacing the ffs() function declaration with a macro
      also removes the following -Wshadow warning:
      
        ./arch/x86/include/asm/bitops.h:283:28: warning: declaration of 'ffs' shadows a built-in function [-Wshadow]
          283 | static __always_inline int ffs(int x)
      
      ** Statistics **
      
      On an allyesconfig, before...:
      
        $ objdump -d vmlinux.o | grep bsf | wc -l
        1081
      
      ...and after:
      
        $ objdump -d vmlinux.o | grep bsf | wc -l
        792
      
      So, roughly 26.7% of the calls to ffs() ((1081 - 792) / 1081) were
      using constant expressions and could be optimized out.
      
      (tests done on linux v5.18-rc5 x86_64 using GCC 11.2.1)
      
      [1] commit ca3d30cc ("x86_64, asm: Optimise fls(), ffs() and fls64()")
      
        [ bp: Massage commit message. ]
      Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Link: https://lore.kernel.org/r/20220511160319.1045812-1-mailhol.vincent@wanadoo.fr
  2. 18 Sep, 2022 5 commits
  3. 16 Sep, 2022 9 commits
    • Merge tag 'gpio-fixes-for-v6.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · a335366b
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
      
       - fix the level-low interrupt type support in gpio-mpc8xxx
      
       - convert another two drivers to using immutable irq chips
      
       - MAINTAINERS update
      
      * tag 'gpio-fixes-for-v6.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio: mt7621: Make the irqchip immutable
        gpio: ixp4xx: Make irqchip immutable
        MAINTAINERS: Update HiSilicon GPIO Driver maintainer
        gpio: mpc8xxx: Fix support for IRQ_TYPE_LEVEL_LOW flow_type in mpc85xx
    • Merge tag 'pinctrl-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 6879c2d3
      Linus Torvalds authored
      Pull pin control fixes from Linus Walleij:
       "Nothing special, just driver fixes:
      
         - Fix IRQ wakeup and pins for UFS and SDC2 issues on the Qualcomm
           SC8180x
      
         - Fix the Rockchip driver to support interrupt on both rising and
           falling edges.
      
         - Name the Allwinner A100 R_PIO properly
      
         - Fix several issues with the Ocelot interrupts"
      
      * tag 'pinctrl-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: ocelot: Fix interrupt controller
        pinctrl: sunxi: Fix name for A100 R_PIO
        pinctrl: rockchip: Enhance support for IRQ_TYPE_EDGE_BOTH
        pinctrl: qcom: sc8180x: Fix wrong pin numbers
        pinctrl: qcom: sc8180x: Fix gpio_wakeirq_map
    • Merge tag 'block-6.0-2022-09-16' of git://git.kernel.dk/linux-block · 68e777e4
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Two fixes for -rc6:
      
         - Fix a mixup of sectors and bytes in the secure erase ioctl
           (Mikulas)
      
         - Fix for a bad return value for a non-blocking bio/blk queue enter
           call (me)"
      
      * tag 'block-6.0-2022-09-16' of git://git.kernel.dk/linux-block:
        blk-lib: fix blkdev_issue_secure_erase
        block: blk_queue_enter() / __bio_queue_enter() must return -EAGAIN for nowait
    • Merge tag 'io_uring-6.0-2022-09-16' of git://git.kernel.dk/linux-block · 0158137d
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "Two small patches:
      
         - Fix using an unsigned type for the return value, introduced in this
           release (Pavel)
      
         - Stable fix for a missing check for a fixed file on put (me)"
      
      * tag 'io_uring-6.0-2022-09-16' of git://git.kernel.dk/linux-block:
        io_uring/msg_ring: check file type before putting
        io_uring/rw: fix error'ed retry return values
    • Merge tag 'drm-fixes-2022-09-16' of git://anongit.freedesktop.org/drm/drm · 5763d7f2
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "This is the regular drm fixes pull.
      
        The i915 and misc fixes are fairly regular, but the amdgpu contains
        fixes for new hw blocks, the dcn314 specific path hookups and also has
        a bunch of fixes for clang stack size warnings which are a bit churny
        but fairly straightforward. This means it looks a little larger than
        usual.
      
        amdgpu:
         - BACO fixes for some RDNA2 boards
         - PCI AER fixes uncovered by a core PCI change
         - Properly hook up dirtyfb helper
         - RAS fixes for GC 11.x
         - TMR fix
         - DCN 3.2.x fixes
         - DCN 3.1.4 fixes
         - LLVM DML stack size fixes
      
        i915:
         - Revert a display patch around max DP source rate now that the
           proper WaEdpLinkRateDataReload is in place
         - Fix perf limit reasons bit position
         - Fix unclaimed mmio registers on suspend flow with GuC
         - A vma_move_to_active fix for a regression with video decoding
         - DP DSP fix
      
        gma500:
         - Locking and IRQ fixes
      
        meson:
         - OSD1 display fixes
      
        panel-edp:
         - Fix Innolux timings
      
        rockchip:
         - DP/HDMI fixes"
      
      * tag 'drm-fixes-2022-09-16' of git://anongit.freedesktop.org/drm/drm: (42 commits)
        drm/amdgpu: make sure to init common IP before gmc
        drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega
        drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega
        drm/rockchip: Fix return type of cdn_dp_connector_mode_valid
        drm/amd/display: Mark dml30's UseMinimumDCFCLK() as noinline for stack usage
        drm/amd/display: Reduce number of arguments of dml31's CalculateFlipSchedule()
        drm/amd/display: Reduce number of arguments of dml31's CalculateWatermarksAndDRAMSpeedChangeSupport()
        drm/amd/display: Reduce number of arguments of dml32_CalculatePrefetchSchedule()
        drm/amd/display: Reduce number of arguments of dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport()
        drm/amd/display: Refactor SubVP calculation to remove FPU
        drm/amd/display: Limit user regamma to a valid value
        drm/amd/display: add workaround for subvp cursor corruption for DCN32/321
        drm/amd/display: SW cursor fallback for SubVP
        drm/amd/display: Round cursor width up for MALL allocation
        drm/amd/display: Correct dram channel width for dcn314
        drm/amd/display: Relax swizzle checks for video non-RGB formats on DCN314
        drm/amd/display: Hook up DCN314 specific dml implementation
        drm/amd/display: Enable dlg and vba compilation for dcn314
        drm/amd/display: Fix compilation errors on DCN314
        drm/amd/display: Fix divide by zero in DML
        ...
    • Merge tag '6.0-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 714820c6
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Four smb3 fixes for stable:
      
         - important fix to revalidate mapping when doing direct writes
      
         - missing spinlock
      
         - two fixes to socket handling
      
         - trivial change to update internal version number for cifs.ko"
      
      * tag '6.0-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: update internal module number
        cifs: add missing spinlock around tcon refcount
        cifs: always initialize struct msghdr smb_msg completely
        cifs: don't send down the destination address to sendmsg for a SOCK_STREAM
        cifs: revalidate mapping when doing direct writes
    • Merge tag 'drm-intel-fixes-2022-09-15' of... · 25100377
      Dave Airlie authored
      Merge tag 'drm-intel-fixes-2022-09-15' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
      
      - Revert a display patch around max DP source rate now
        that the proper WaEdpLinkRateDataReload is in place. (Ville)
      - Fix perf limit reasons bit position. (Ashutosh)
      - Fix unclaimed mmio registers on suspend flow with GuC. (Umesh)
      - A vma_move_to_active fix for a regression with video decoding. (Nirmoy)
      - DP DSP fix. (Ankit)
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/YyMtmGMXRLsURoM5@intel.com
    • Merge tag 'drm-misc-fixes-2022-09-15' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes · 87d9862b
      Dave Airlie authored
      Short summary of fixes pull:
      
       * gma500: Locking and IRQ fixes
       * meson: OSD1 display fixes
       * panel-edp: Fix Innolux timings
       * rockchip: DP/HDMI fixes
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Thomas Zimmermann <tzimmermann@suse.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/YyMUpP1w21CPXq+I@linux-uq9g
    • Merge tag 'amd-drm-fixes-6.0-2022-09-14' of... · e2111ae2
      Dave Airlie authored
      Merge tag 'amd-drm-fixes-6.0-2022-09-14' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
      
      amd-drm-fixes-6.0-2022-09-14:
      
      amdgpu:
      - BACO fixes for some RDNA2 boards
      - PCI AER fixes uncovered by a core PCI change
      - Properly hook up dirtyfb helper
      - RAS fixes for GC 11.x
      - TMR fix
      - DCN 3.2.x fixes
      - DCN 3.1.4 fixes
      - LLVM DML stack size fixes
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Alex Deucher <alexander.deucher@amd.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220914184030.6145-1-alexander.deucher@amd.com
  4. 15 Sep, 2022 4 commits
  5. 14 Sep, 2022 11 commits
  6. 13 Sep, 2022 9 commits