Commits · 7e89a37c477c4caacde7b511c64720e20104945f · Kirill Smelkov / linux

06 Mar, 2019 1 commit

ipc: Fix building compat mode without sysvipc · 7e89a37c

Arnd Bergmann authored Feb 28, 2019

As John Stultz noticed, my y2038 syscall series caused a link
failure when CONFIG_SYSVIPC is disabled but CONFIG_COMPAT is
enabled:

arch/arm64/kernel/sys32.o:(.rodata+0x960): undefined reference to `__arm64_compat_sys_old_semctl'
arch/arm64/kernel/sys32.o:(.rodata+0x980): undefined reference to `__arm64_compat_sys_old_msgctl'
arch/arm64/kernel/sys32.o:(.rodata+0x9a0): undefined reference to `__arm64_compat_sys_old_shmctl'

Add the missing entries in kernel/sys_ni.c for the new system
calls.

Cc: Laura Abbott <labbott@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

7e89a37c

27 Feb, 2019 1 commit

Merge tag 'y2038-syscall-abi' of... · cfbe2716

Thomas Gleixner authored Feb 27, 2019

Merge tag 'y2038-syscall-abi' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground into timers/2038

Pull additional syscall ABI cleanup for y2038 from Arnd Bergmann:

This is a follow-up to the y2038 syscall patches already merged in the tip
tree.  As the final 32-bit RISC-V syscall ABI is still being decided on,
this is the last chance to make a few corrections to leave out interfaces
based on 32-bit time_t along with the old off_t and rlimit types.

The series achieves this in a few steps:

- A couple of bug fixes for minor regressions I introduced
  in the original series

- A couple of older patches from Yury Norov that I had never
  merged in the past, these fix up the openat/open_by_handle_at and
  getrlimit/setrlimit syscalls to disallow the old versions of off_t
  and rlimit.

- Hiding the deprecated system calls behind an #ifdef in
  include/uapi/asm-generic/unistd.h

- Change arch/riscv to drop all these ABIs.

Originally, the plan was to also leave these out on C-Sky, but that now
has a glibc port that uses the older interfaces, so we need to leave
them in place.

cfbe2716

25 Feb, 2019 1 commit

riscv: Use latest system call ABI · d4c08b97

Arnd Bergmann authored Feb 18, 2019

We don't yet have an upstream glibc port for riscv, so there is no user
space for the existing ABI, and we can remove the definitions for 32-bit
time_t, off_t and struct resource and system calls based on them,
including the vdso.
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

d4c08b97

19 Feb, 2019 5 commits

checksyscalls: fix up mq_timedreceive and stat exceptions · 1d5b8233

Arnd Bergmann authored Feb 18, 2019

mq_timedreceive was spelled incorrectly, and we need exceptions
for new architectures that leave out newstat or stat64, implementing
only statx() now.

Fixes: 48166e6e ("y2038: add 64-bit time_t syscalls to all 32-bit architectures")
Fixes: bf4b6a7d ("y2038: Remove stat64 family from default syscall set")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

1d5b8233

unicore32: Fix __ARCH_WANT_STAT64 definition · 8e9f51a8

Arnd Bergmann authored Feb 18, 2019

The __ARCH_WANT_STAT64 macro must be defined before including
asm-generic/unistd.h. I got this right for everything except
unicore32.

Fixes: bf4b6a7d ("y2038: Remove stat64 family from default syscall set")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

8e9f51a8

asm-generic: Make time32 syscall numbers optional · c8ce48f0

Arnd Bergmann authored Feb 18, 2019

We don't want new architectures to even provide the old 32-bit time_t
based system calls any more, or define the syscall number macros.

Add a new __ARCH_WANT_TIME32_SYSCALLS macro that gets enabled for all
existing 32-bit architectures using the generic system call table,
so we don't change any current behavior.
Since this symbol is evaluated in user space as well, we cannot use
a Kconfig CONFIG_* macro but have to define it in uapi/asm/unistd.h.

On 64-bit architectures, the same system call numbers mostly refer to
the system calls we want to keep, as they already pass 64-bit time_t.

As new architectures no longer provide these, we need new exceptions
in checksyscalls.sh.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

c8ce48f0

asm-generic: Drop getrlimit and setrlimit syscalls from default list · 80d7da1c

Yury Norov authored May 16, 2018

The newer prlimit64 syscall provides all the functionality of getrlimit
and setrlimit syscalls and adds the pid of target process, so future
architectures won't need to include getrlimit and setrlimit.

Therefore drop getrlimit and setrlimit syscalls from the generic syscall
list unless __ARCH_WANT_SET_GET_RLIMIT is defined by the architecture's
unistd.h prior to including asm-generic/unistd.h, and adjust all
architectures using the generic syscall list to define it so that no
in-tree architectures are affected.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-hexagon@vger.kernel.org
Cc: uclinux-h8-devel@lists.sourceforge.jp
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: James Hogan <james.hogan@imgtec.com> [metag]
Acked-by: Ley Foon Tan <lftan@altera.com> [nios2]
Acked-by: Stafford Horne <shorne@gmail.com> [openrisc]
Acked-by: Will Deacon <will.deacon@arm.com> [arm64]
Acked-by: Vineet Gupta <vgupta@synopsys.com> #arch/arc bits
Signed-off-by: Yury Norov <ynorov@marvell.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

80d7da1c

32-bit userspace ABI: introduce ARCH_32BIT_OFF_T config option · 942fa985

Yury Norov authored May 16, 2018

All new 32-bit architectures should have 64-bit userspace off_t type, but
existing architectures has 32-bit ones.

To enforce the rule, new config option is added to arch/Kconfig that defaults
ARCH_32BIT_OFF_T to be disabled for new 32-bit architectures. All existing
32-bit architectures enable it explicitly.

New option affects force_o_largefile() behaviour. Namely, if userspace
off_t is 64-bits long, we have no reason to reject user to open big files.

Note that even if architectures has only 64-bit off_t in the kernel
(arc, c6x, h8300, hexagon, nios2, openrisc, and unicore32),
a libc may use 32-bit off_t, and therefore want to limit the file size
to 4GB unless specified differently in the open flags.
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Yury Norov <ynorov@marvell.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

942fa985

18 Feb, 2019 1 commit

compat ABI: use non-compat openat and open_by_handle_at variants · 0d0216c0

Yury Norov authored May 16, 2018

The only difference between native and compat openat and open_by_handle_at
is that non-compat version forces O_LARGEFILE, and it should be the
default behaviour for all architectures, as we are going to drop the
support of 32-bit userspace off_t.
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Yury Norov <ynorov@marvell.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

0d0216c0

10 Feb, 2019 2 commits

Merge tag 'y2038-new-syscalls' of... · 41ea3910

Thomas Gleixner authored Feb 10, 2019

Merge tag 'y2038-new-syscalls' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground into timers/2038

Pull y2038 - time64 system calls from Arnd Bergmann:

This series finally gets us to the point of having system calls with 64-bit
time_t on all architectures, after a long time of incremental preparation
patches.

There was actually one conversion that I missed during the summer,
i.e. Deepa's timex series, which I now updated based the 5.0-rc1 changes
and review comments.

The following system calls are now added on all 32-bit architectures using
the same system call numbers:

403 clock_gettime64
404 clock_settime64
405 clock_adjtime64
406 clock_getres_time64
407 clock_nanosleep_time64
408 timer_gettime64
409 timer_settime64
410 timerfd_gettime64
411 timerfd_settime64
412 utimensat_time64
413 pselect6_time64
414 ppoll_time64
416 io_pgetevents_time64
417 recvmmsg_time64
418 mq_timedsend_time64
419 mq_timedreceiv_time64
420 semtimedop_time64
421 rt_sigtimedwait_time64
422 futex_time64
423 sched_rr_get_interval_time64

Each one of these corresponds directly to an existing system call that
includes a 'struct timespec' argument, or a structure containing a timespec
or (in case of clock_adjtime) timeval. Not included here are new versions
of getitimer/setitimer and getrusage/waitid, which are planned for the
future but only needed to make a consistent API rather than for correct
operation beyond y2038. These four system calls are based on 'timeval', and
it has not been finally decided what the replacement kernel interface will
use instead.

So far, I have done a lot of build testing across most architectures, which
has found a number of bugs. Runtime testing so far included testing LTP on
32-bit ARM with the existing system calls, to ensure we do not regress for
existing binaries, and a test with a 32-bit x86 build of LTP against a
modified version of the musl C library that has been adapted to the new
system call interface [3].  This library can be used for testing on all
architectures supported by musl-1.1.21, but it is not how the support is
getting integrated into the official musl release. Official musl support is
planned but will require more invasive changes to the library.

Link: https://lore.kernel.org/lkml/20190110162435.309262-1-arnd@arndb.de/T/
Link: https://lore.kernel.org/lkml/20190118161835.2259170-1-arnd@arndb.de/
Link: https://git.linaro.org/people/arnd/musl-y2038.git/ [2]

41ea3910

Merge tag 'y2038-syscall-cleanup' of... · fd659cc0

Thomas Gleixner authored Feb 10, 2019

Merge tag 'y2038-syscall-cleanup' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground into timers/2038

Pull preparatory work for y2038 changes from Arnd Bergmann:

System call unification and cleanup

The system call tables have diverged a bit over the years, and a number of
the recent additions never made it into all architectures, for one reason
or another.

This is an attempt to clean it up as far as we can without breaking
compatibility, doing a number of steps:

- Add system calls that have not yet been integrated into all architectures
but that we definitely want there. This includes {,f}statfs64() and
get{eg,eu,g,p,u,pp}id() on alpha, which have been missing traditionally.

- The s390 compat syscall handling is cleaned up to be more like what we
do on other architectures, while keeping the 31-bit pointer
extension. This was merged as a shared branch by the s390 maintainers
and is included here in order to base the other patches on top.

- Add the separate ipc syscalls on all architectures that traditionally
only had sys_ipc(). This version is done without support for IPC_OLD
that is we have in sys_ipc. The new semtimedop_time64 syscall will only
be added here, not in sys_ipc

- Add syscall numbers for a couple of syscalls that we probably don't need
everywhere, in particular pkey_* and rseq, for the purpose of symmetry:
if it's in asm-generic/unistd.h, it makes sense to have it everywhere. I
expect that any future system calls will get assigned on all platforms
together, even when they appear to be specific to a single architecture.

- Prepare for having the same system call numbers for any future calls. In
combination with the generated tables, this hopefully makes it easier to
add new calls across all architectures together.

All of the above are technically separate from the y2038 work, but are done
as preparation before we add the new 64-bit time_t system calls everywhere,
providing a common baseline set of system calls.

I expect that glibc and other libraries that want to use 64-bit time_t will
require linux-5.1 kernel headers for building in the future, and at a much
later point may also require linux-5.1 or a later version as the minimum
kernel at runtime. Having a common baseline then allows the removal of many
architecture or kernel version specific workarounds.

fd659cc0

07 Feb, 2019 13 commits

Merge tag 'platform-drivers-x86-v5.0-2' of git://git.infradead.org/linux-platform-drivers-x86 · 74e96711

Linus Torvalds authored Feb 07, 2019

Pull x86 platform driver fixlet from Darren Hart:
 "Correct Documentation/ABI 4.21 KernelVersion to 5.0"

* tag 'platform-drivers-x86-v5.0-2' of git://git.infradead.org/linux-platform-drivers-x86:
  Documentation/ABI: Correct mlxreg-io KernelVersion for 5.0

74e96711

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · e303a067

Linus Torvalds authored Feb 07, 2019

Pull KVM fixes from Paolo Bonzini:
 "Three security fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221)
  KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222)
  kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974)

e303a067

Merge tag 'nfsd-5.0-1' of git://linux-nfs.org/~bfields/linux · ee6c0737

Linus Torvalds authored Feb 07, 2019

Pull nfsd fixes from Bruce Fields:
 "Two small nfsd bugfixes for 5.0, for an RDMA bug and a file clone bug"

* tag 'nfsd-5.0-1' of git://linux-nfs.org/~bfields/linux:
  svcrdma: Remove max_sge check at connect time
  nfsd: Fix error return values for nfsd4_clone_file_range()

ee6c0737

Merge tag 'for-5.0/dm-fixes-2' of... · 8b5cdbe5

Linus Torvalds authored Feb 07, 2019

Merge tag 'for-5.0/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:
 "Both of these fixes address issues in changes merged for 5.0-rc4:

   - Fix DM core's missing memory barrier before waitqueue_active()
     calls.

   - Fix DM core's clone_bio() to work when cloning a subset of a bio
     with an integrity payload; bio_integrity_trim() wasn't getting
     called due to bio_trim()'s early return"

* tag 'for-5.0/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: don't use bio_trim() afterall
  dm: add memory barrier before waitqueue_active

8b5cdbe5

KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221) · ecec7688

Peter Shier authored Oct 11, 2018

Bugzilla: 1671904

There are multiple code paths where an hrtimer may have been started to
emulate an L1 VMX preemption timer that can result in a call to free_nested
without an intervening L2 exit where the hrtimer is normally
cancelled. Unconditionally cancel in free_nested to cover all cases.

Embargoed until Feb 7th 2019.
Signed-off-by: Peter Shier <pshier@google.com>
Reported-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reported-by: Felix Wilhelm <fwilhelm@google.com>
Cc: stable@kernel.org
Message-Id: <20181011184646.154065-1-pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ecec7688

KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222) · 353c0956

Paolo Bonzini authored Jan 29, 2019

Bugzilla: 1671930

Emulation of certain instructions (VMXON, VMCLEAR, VMPTRLD, VMWRITE with
memory operand, INVEPT, INVVPID) can incorrectly inject a page fault
when passed an operand that points to an MMIO address.  The page fault
will use uninitialized kernel stack memory as the CR2 and error code.

The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR
exit to userspace; however, it is not an easy fix, so for now just
ensure that the error code and CR2 are zero.

Embargoed until Feb 7th 2019.
Reported-by: Felix Wilhelm <fwilhelm@google.com>
Cc: stable@kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

353c0956

kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974) · cfa39381

Jann Horn authored Jan 26, 2019

kvm_ioctl_create_device() does the following:

1. creates a device that holds a reference to the VM object (with a borrowed
   reference, the VM's refcount has not been bumped yet)
2. initializes the device
3. transfers the reference to the device to the caller's file descriptor table
4. calls kvm_get_kvm() to turn the borrowed reference to the VM into a real
   reference

The ownership transfer in step 3 must not happen before the reference to the VM
becomes a proper, non-borrowed reference, which only happens in step 4.
After step 3, an attacker can close the file descriptor and drop the borrowed
reference, which can cause the refcount of the kvm object to drop to zero.

This means that we need to grab a reference for the device before
anon_inode_getfd(), otherwise the VM can disappear from under us.

Fixes: 852b6d57 ("kvm: add device control API")
Cc: stable@kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cfa39381

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · d47e3da1

Linus Torvalds authored Feb 07, 2019

Pull HID fix from Jiri Kosina:
 "A fix for a bug in hid-debug that can lock up the kernel in infinite
  loop (CVE-2019-3819), from Vladis Dronov"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: debug: fix the ring buffer implementation

d47e3da1

Merge tag 'sound-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 6f64e3a4

Linus Torvalds authored Feb 07, 2019

Pull sound fixes from Takashi Iwai:
 "A collection of a few small fixes.

  The most significant one is the fix for the possible race at loading
  HD-audio drivers. This has been present for long time and surfaced
  only in a rare occasion, but finally spotted out"

* tag 'sound-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/ca0132 - Fix build error without CONFIG_PCI
  ALSA: compress: Fix stop handling on compressed capture streams
  ALSA: usb-audio: Add support for new T+A USB DAC
  ALSA: hda - Serialize codec registrations
  ALSA: hda/realtek - Use a common helper for hp pin reference
  ALSA: hda/realtek - Fix lose hp_pins for disable auto mute
  ALSA: hda/realtek - Headset microphone support for System76 darp5

6f64e3a4

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · b0314565

Linus Torvalds authored Feb 07, 2019

Pull virtio fixes from Michael Tsirkin:
 "A small fix for a uapi header, and a fix for VDPA for non-x86 guests"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio: drop internal struct from UAPI
  virtio: support VIRTIO_F_ORDER_PLATFORM

b0314565

Merge tag 'trace-v5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 4879f116

Linus Torvalds authored Feb 07, 2019

Pull tracing fixes from Steven Rostedt:
 "This has two fixes for uprobe code.

   - Cut and paste fix to have uprobe printks say "uprobe" and not
     "kprobe"

   - Add terminating '\0' byte when copying function arguments"

* tag 'trace-v5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/uprobes: Fix output for multiple string arguments
  tracing: uprobes: Fix typo in pr_fmt string

4879f116

Merge tag 'fuse-fixes-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · 076a3f55

Linus Torvalds authored Feb 07, 2019

Pull fuse fixes from Miklos Szeredi:
 "A fix for a CUSE regression introduced in v4.20, as well as fixes for
  a couple of old bugs"

* tag 'fuse-fixes-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: decrement NR_WRITEBACK_TEMP on the right page
  fuse: call pipe_buf_release() under pipe lock
  cuse: fix ioctl
  fuse: handle zero sized retrieve correctly

076a3f55

Merge tag 'pinctrl-v5.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · b66bc777

Linus Torvalds authored Feb 07, 2019

Pull pin control fixes from Linus Walleij:

 - Mediatek Kconfig fix

 - Sunxi regulator, IRQ banks and pin base fixup

 - Intel Cherryview Strago DMI workaround

 - Potential regmap problem on mcp23s08

* tag 'pinctrl-v5.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: sunxi: Correct number of IRQ banks on H6 main pin controller
  pinctrl: mcp23s08: spi: Fix regmap allocation for mcp23s18
  pinctrl: cherryview: fix Strago DMI workaround
  pinctrl: sunxi: Consider pin_base when calculating regulator array index
  pinctrl: sunxi: Fix and simplify pin bank regulator handling
  pinctrl: mediatek: fix Kconfig build errors for moore core

b66bc777

06 Feb, 2019 16 commits

y2038: add 64-bit time_t syscalls to all 32-bit architectures · 48166e6e

Arnd Bergmann authored Jan 10, 2019

This adds 21 new system calls on each ABI that has 32-bit time_t
today. All of these have the exact same semantics as their existing
counterparts, and the new ones all have macro names that end in 'time64'
for clarification.

This gets us to the point of being able to safely use a C library
that has 64-bit time_t in user space. There are still a couple of
loose ends to tie up in various areas of the code, but this is the
big one, and should be entirely uncontroversial at this point.

In particular, there are four system calls (getitimer, setitimer,
waitid, and getrusage) that don't have a 64-bit counterpart yet,
but these can all be safely implemented in the C library by wrapping
around the existing system calls because the 32-bit time_t they
pass only counts elapsed time, not time since the epoch. They
will be dealt with later.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>

48166e6e

y2038: rename old time and utime syscalls · d33c577c

Arnd Bergmann authored Jan 06, 2019

The time, stime, utime, utimes, and futimesat system calls are only
used on older architectures, and we do not provide y2038 safe variants
of them, as they are replaced by clock_gettime64, clock_settime64,
and utimensat_time64.

However, for consistency it seems better to have the 32-bit architectures
that still use them call the "time32" entry points (leaving the
traditional handlers for the 64-bit architectures), like we do for system
calls that now require two versions.

Note: We used to always define __ARCH_WANT_SYS_TIME and
__ARCH_WANT_SYS_UTIME and only set __ARCH_WANT_COMPAT_SYS_TIME and
__ARCH_WANT_SYS_UTIME32 for compat mode on 64-bit kernels. Now this is
reversed: only 64-bit architectures set __ARCH_WANT_SYS_TIME/UTIME, while
we need __ARCH_WANT_SYS_TIME32/UTIME32 for 32-bit architectures and compat
mode. The resulting asm/unistd.h changes look a bit counterintuitive.

This is only a cleanup patch and it should not change any behavior.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>

d33c577c

y2038: remove struct definition redirects · c70a772f

Arnd Bergmann authored Jan 07, 2019

We now use 64-bit time_t on all architectures, so the __kernel_timex,
__kernel_timeval and __kernel_timespec redirects can be removed
after having served their purpose.

This makes it all much less confusing, as the __kernel_* types
now always refer to the same layout based on 64-bit time_t across
all 32-bit and 64-bit architectures.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

c70a772f

y2038: use time32 syscall names on 32-bit · 00bf25d6

Arnd Bergmann authored Jan 01, 2019

This is the big flip, where all 32-bit architectures set COMPAT_32BIT_TIME
and use the _time32 system calls from the former compat layer instead
of the system calls that take __kernel_timespec and similar arguments.

The temporary redirects for __kernel_timespec, __kernel_itimerspec
and __kernel_timex can get removed with this.

It would be easy to split this commit by architecture, but with the new
generated system call tables, it's easy enough to do it all at once,
which makes it a little easier to check that the changes are the same
in each table.
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

00bf25d6

syscalls: remove obsolete __IGNORE_ macros · 805089c2

Arnd Bergmann authored Jan 11, 2019

These are all for ignoring the lack of obsolete system calls,
which have been marked the same way in scripts/checksyscall.sh,
so these can be removed.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>

805089c2

y2038: syscalls: rename y2038 compat syscalls · 8dabe724

Arnd Bergmann authored Jan 07, 2019

A lot of system calls that pass a time_t somewhere have an implementation
using a COMPAT_SYSCALL_DEFINEx() on 64-bit architectures, and have
been reworked so that this implementation can now be used on 32-bit
architectures as well.

The missing step is to redefine them using the regular SYSCALL_DEFINEx()
to get them out of the compat namespace and make it possible to build them
on 32-bit architectures.

Any system call that ends in 'time' gets a '32' suffix on its name for
that version, while the others get a '_time32' suffix, to distinguish
them from the normal version, which takes a 64-bit time argument in the
future.

In this step, only 64-bit architectures are changed, doing this rename
first lets us avoid touching the 32-bit architectures twice.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

8dabe724

x86/x32: use time64 versions of sigtimedwait and recvmmsg · 7948450d

Arnd Bergmann authored Jan 11, 2019

x32 has always followed the time64 calling conventions of these
syscalls, which required a special hack in compat_get_timespec
aka get_old_timespec32 to continue working.

Since we now have the time64 syscalls, use those explicitly.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

7948450d

timex: change syscalls to use struct __kernel_timex · 3876ced4

Deepa Dinamani authored Jul 02, 2018

struct timex is not y2038 safe.
Switch all the syscall apis to use y2038 safe __kernel_timex.

Note that sys_adjtimex() does not have a y2038 safe solution.  C libraries
can implement it by calling clock_adjtime(CLOCK_REALTIME, ...).
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

3876ced4

timex: use __kernel_timex internally · ead25417

Deepa Dinamani authored Jul 02, 2018

struct timex is not y2038 safe.
Replace all uses of timex with y2038 safe __kernel_timex.

Note that struct __kernel_timex is an ABI interface definition.
We could define a new structure based on __kernel_timex that
is only available internally instead. Right now, there isn't
a strong motivation for this as the structure is isolated to
a few defined struct timex interfaces and such a structure would
be exactly the same as struct timex.

The patch was generated by the following coccinelle script:

virtual patch

@depends on patch forall@
identifier ts;
expression e;
@@
(
- struct timex ts;
+ struct __kernel_timex ts;
|
- struct timex ts = {};
+ struct __kernel_timex ts = {};
|
- struct timex ts = e;
+ struct __kernel_timex ts = e;
|
- struct timex *ts;
+ struct __kernel_timex *ts;
|
(memset \| copy_from_user \| copy_to_user \)(...,
- sizeof(struct timex))
+ sizeof(struct __kernel_timex))
)

@depends on patch forall@
identifier ts;
identifier fn;
@@
fn(...,
- struct timex *ts,
+ struct __kernel_timex *ts,
...) {
...
}

@depends on patch forall@
identifier ts;
identifier fn;
@@
fn(...,
- struct timex *ts) {
+ struct __kernel_timex *ts) {
...
}
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: linux-alpha@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

ead25417

sparc64: add custom adjtimex/clock_adjtime functions · 1a596398

Arnd Bergmann authored Jan 03, 2019

sparc64 is the only architecture on Linux that has a 'timeval'
definition with a 32-bit tv_usec but a 64-bit tv_sec. This causes
problems for sparc32 compat mode when we convert it to use the
new __kernel_timex type that has the same layout as all other
64-bit architectures.

To avoid adding sparc64 specific code into the generic adjtimex
implementation, this adds a wrapper in the sparc64 system call handling
that converts the sparc64 'timex' into the new '__kernel_timex'.

At this point, the two structures are defined to be identical,
but that will change in the next step once we convert sparc32.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

1a596398

time: fix sys_timer_settime prototype · 50b93f30

Arnd Bergmann authored Jan 01, 2019

A small typo has crept into the y2038 conversion of the timer_settime
system call. So far this was completely harmless, but once we start
using the new version, this has to be fixed.

Fixes: 6ff84735 ("time: Change types to new y2038 safe __kernel_itimerspec")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

50b93f30

time: Add struct __kernel_timex · 2c620ff9

Deepa Dinamani authored Jul 02, 2018

struct timex uses struct timeval internally.
struct timeval is not y2038 safe.
Introduce a new UAPI type struct __kernel_timex
that is y2038 safe.

struct __kernel_timex uses a timeval type that is
similar to struct __kernel_timespec which preserves the
same structure size across 32 bit and 64 bit ABIs.
struct __kernel_timex also restructures other members of the
structure to make the structure the same on 64 bit and 32 bit
architectures.
Note that struct __kernel_timex is the same as struct timex
on a 64 bit architecture.

The above solution is similar to other new y2038 syscalls
that are being introduced: both 32 bit and 64 bit ABIs
have a common entry, and the compat entry supports the old 32 bit
syscall interface.

Alternatives considered were:
1. Add new time type to struct timex that makes use of padded
   bits. This time type could be based on the struct __kernel_timespec.
   modes will use a flag to notify which time structure should be
   used internally.
   This needs some application level changes on both 64 bit and 32 bit
   architectures. Although 64 bit machines could continue to use the
   older timeval structure without any changes.

2. Add a new u8 type to struct timex that makes use of padded bits. This
   can be used to save higher order tv_sec bits. modes will use a flag to
   notify presence of such a type.
   This will need some application level changes on 32 bit architectures.

3. Add a new compat_timex structure that differs in only the size of the
   time type; keep rest of struct timex the same.
   This requires extra syscalls to manage all 3 cases on 64 bit
   architectures. This will not need any application level changes but will
   add more complexity from kernel side.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

2c620ff9

time: make adjtime compat handling available for 32 bit · 4d5f007e

Arnd Bergmann authored Jan 02, 2019

We want to reuse the compat_timex handling on 32-bit architectures the
same way we are using the compat handling for timespec when moving to
64-bit time_t.

Move all definitions related to compat_timex out of the compat code
into the normal timekeeping code, along with a rename to old_timex32,
corresponding to the timespec/timeval structures, and make it controlled
by CONFIG_COMPAT_32BIT_TIME, which 32-bit architectures will then select.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

4d5f007e

dm: don't use bio_trim() afterall · fa8db494

Mike Snitzer authored Feb 05, 2019

bio_trim() has an early return, which makes it _not_ idempotent, if the
offset is 0 and the bio's bi_size already matches the requested size.
Prior to DM, all users of bio_trim() were fine with this.  But DM has
exposed the fact that bio_trim()'s early return is incompatible with a
cloned bio whose integrity payload must be trimmed via
bio_integrity_trim().

Fix this by reverting DM back to doing the equivalent of bio_trim() but
in an idempotent manner (so bio_integrity_trim is always performed).

Follow-on work is needed to assess what benefit bio_trim()'s early
return is providing to its existing callers.
Reported-by: Milan Broz <gmazyland@gmail.com>
Fixes: 57c36519 ("dm: fix clone_bio() to trigger blk_recount_segments()")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

fa8db494

dm: add memory barrier before waitqueue_active · 645efa84

Mikulas Patocka authored Feb 05, 2019

Block core changes to switch bio-based IO accounting to be percpu had a
side-effect of altering DM core to now rely on calling waitqueue_active
(in both bio-based and request-based) to check if another task is in
dm_wait_for_completion().

A memory barrier is needed before calling waitqueue_active().  DM core
doesn't piggyback on a preceding memory barrier so it must explicitly
use its own.

For more details on why using waitqueue_active() without a preceding
barrier is unsafe, please see the comment before the waitqueue_active()
definition in include/linux/wait.h.

Add the missing memory barrier by switching to using wq_has_sleeper().

Fixes: 6f757231 ("dm: remove the pending IO accounting")
Fixes: c4576aed ("dm: fix request-based dm's use of dm_wait_for_completion")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

645efa84

svcrdma: Remove max_sge check at connect time · e248aa7b

Chuck Lever authored Jan 25, 2019

Two and a half years ago, the client was changed to use gathered
Send for larger inline messages, in commit 655fec69 ("xprtrdma:
Use gathered Send for large inline messages"). Several fixes were
required because there are a few in-kernel device drivers whose
max_sge is 3, and these were broken by the change.

Apparently my memory is going, because some time later, I submitted
commit 25fd86ec ("svcrdma: Don't overrun the SGE array in
svc_rdma_send_ctxt"), and after that, commit f3c1fd0e ("svcrdma:
Reduce max_send_sges"). These too incorrectly assumed in-kernel
device drivers would have more than a few Send SGEs available.

The fix for the server side is not the same. This is because the
fundamental problem on the server is that, whether or not the client
has provisioned a chunk for the RPC reply, the server must squeeze
even the most complex RPC replies into a single RDMA Send. Failing
in the send path because of Send SGE exhaustion should never be an
option.

Therefore, instead of failing when the send path runs out of SGEs,
switch to using a bounce buffer mechanism to handle RPC replies that
are too complex for the device to send directly. That allows us to
remove the max_sge check to enable drivers with small max_sge to
work again.
Reported-by: Don Dutile <ddutile@redhat.com>
Fixes: 25fd86ec ("svcrdma: Don't overrun the SGE array in ...")
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

e248aa7b