Commits · e91313591b29ce724fe2f1bdf29f2482878fc275 · Kirill Smelkov / linux

16 Apr, 2024 8 commits

dlm: use rwlock for rsb hash table · e9131359

Alexander Aring authored Apr 15, 2024

The conversion to rhashtable introduced a hash table lock per lockspace,
in place of per bucket locks.  To make this more scalable, switch to
using a rwlock for hash table access.  The common case fast path uses
it as a read lock.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

e9131359

dlm: drop dlm_scand kthread and use timers · b1f2381c

Alexander Aring authored Apr 15, 2024

Currently the scand kthread acts like a garbage collection for expired
rsbs on toss list, to clean them up after a certain timeout. It triggers
every couple of seconds and iterates over the toss list while holding
ls_rsbtbl_lock for the whole hash bucket iteration.

To reduce the amount of time holding ls_rsbtbl_lock, we now handle the
disposal of expired rsbs using a per-lockspace timer that expires for the
earliest tossed rsb on the lockspace toss queue. This toss queue is
ordered according to the rsb res_toss_time with the earliest tossed rsb
as the first entry. The toss timer will only trylock() necessary locks,
since it is low priority garbage collection, and will rearm the timer
if trylock() fails. If the timer function does not find any expired
rsb's, it rearms the timer with the next earliest expired rsb.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

b1f2381c

dlm: do not use ref counts for rsb in the toss state · 6644925a

Alexander Aring authored Apr 15, 2024

In the past we had problems when an rsb had a reference counter greater
than one while in the toss state. An rsb in the toss state is not
actively used for locking, and should not have any other references
apart from the single ref keeping it on the rsb hash. Shift to freeing
rsb's directly rather than using kref_put to free them, since the ref
counting is not meant to be used in this state. Add warnings if ref
counting is seen while an rsb is in the toss state.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

6644925a

dlm: switch to use rhashtable for rsbs · 6c648035

Alexander Aring authored Apr 15, 2024

Replace our own hash table with the more advanced rhashtable
for keeping rsb structs.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

6c648035

dlm: add rsb lists for iteration · 93a693d1

Alexander Aring authored Apr 15, 2024

To prepare for using rhashtable, add two rsb lists for iterating
through rsb's in two uncommon cases where this is necesssary:
- when dumping rsb state from debugfs, now using seq_list.
- when looking at all rsb's during recovery.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

93a693d1

dlm: merge toss and keep hash table lists into one list · 2d903540

Alexander Aring authored Apr 15, 2024

There are several places where lock processing can perform two hash table
lookups, first in the "keep" list, and if not found, in the "toss" list.
This patch introduces a new rsb state flag "RSB_TOSS" to represent the
difference between the state of being on keep vs toss list, so that the
two lists can be combined. This avoids cases of two lookups.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

2d903540

dlm: change to single hashtable lock · dcdaad05

Alexander Aring authored Apr 15, 2024

Prepare to replace our own hash table with rhashtable by replacing
the per-bucket locks in our own hash table with a single lock.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

dcdaad05

dlm: increment ls_count for dlm_scand · 700b0480

Alexander Aring authored Apr 15, 2024

Increment the ls_count value while dlm_scand is processing a
lockspace so that release_lockspace()/remove_lockspace() will
wait for dlm_scand to finish.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

700b0480

09 Apr, 2024 14 commits

dlm: do message processing in softirq context · 92d59adf

Alexander Aring authored Apr 02, 2024

Move dlm message processing from an ordered workqueue context to an
ordered softirq context. Handling dlm messages in softirq will allow
requests to be cleared more quickly and efficiently, and should avoid
longer queues of incomplete requests. Later patches are expected to
run completion/blocking callbacks directly from this message processing
context, further reducing context switches required to complete a request.
In the longer term, concurrent message processing could be implemented.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

92d59adf

dlm: use spin_lock_bh for message processing · 578acf9a

Alexander Aring authored Apr 02, 2024

Use spin_lock_bh for all spinlocks involved in message processing,
in preparation for softirq message processing.  DLM lock requests
from user space involve dlm processing in user context, in addition
to the standard kernel context, necessitating bh variants.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

578acf9a

dlm: remove schedule in receive path · 308533b4

Alexander Aring authored Apr 02, 2024

Remove an explicit schedule() call in the message processing path,
in preparation for softirq message processing.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

308533b4

dlm: convert ls_recv_active from rw_semaphore to rwlock · d52c9b8f

Alexander Aring authored Apr 02, 2024

Convert ls_recv_active rw_semaphore to an rwlock to avoid
sleeping, in preparation for softirq message processing.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

d52c9b8f

dlm: avoid blocking receive at the end of recovery · c288745f

Alexander Aring authored Apr 02, 2024

The end of the recovery process transitioned to normal message
processing by temporarily blocking the receiving context,
processing saved messages, then unblocking the receiving
context.  To avoid blocking the receiving context, the old
wait_queue and mutex are replaced by a new rwlock and the new
RECV_MSG_BLOCKED flag.  Received messages are added to the
list of saved messages, protected by the rwlock, until the
flag is cleared, which happens when all saved messages have
been processed.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

c288745f

dlm: convert res_lock to spinlock · cc396e23

Alexander Aring authored Apr 02, 2024

Convert the rsb struct res_lock from a mutex to a spinlock
in preparation for processing messages in softirq context.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

cc396e23

dlm: convert ls_waiters_mutex to spinlock · 097691db

Alexander Aring authored Apr 02, 2024

Convert the waiters mutex to a spinlock in prepration for
processing messages in softirq context.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

097691db

dlm: drop mutex use in waiters recovery · 6b52ea79

Alexander Aring authored Apr 02, 2024

The waiters_mutex no longer needs to be used in the waiters recovery
functions dlm_recover_waiters_pre() and dlm_recover_waiters_pre().
During recovery, ordinary locking operations are paused, and the
recovery thread is the only context accessing the waiters list,
so the lock is not needed.

Access to the waiters list from debugfs functions is avoided by
taking the top level recovery lock in the debugfs dump function.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

6b52ea79

dlm: add new struct to save position in dlm_copy_master_names · 3ae67760

Alexander Aring authored Apr 02, 2024

Add a new struct to save the current position in the rsb masters_list
while sending the rsb names to other nodes. The rsb names are sent in
multiple chunks, and for each new chunk, the new "dlm_dir_dump" struct
saves the last position in the masters_list. The new struct is also
used to save more information to sanity check the recovery process.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

3ae67760

dlm: move rsb root_list to ls_recover() stack · 3a747f4a

Alexander Aring authored Apr 02, 2024

Move the rsb root_list from the lockspace to a stack variable since
it is now only used by the ls_recover() function.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

3a747f4a

dlm: use a new list for recovery of master rsb names · aff46e0f

Alexander Aring authored Apr 02, 2024

Add a new "masters_list" for master rsb structs, with a new
rwlock. The new list is created and used during the recovery
process to send the master rsb names to new nodes. With this
change, the current "root_list" can be used without locking.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

aff46e0f

dlm: move root_list functionality to recover.c · 29e345f3

Alexander Aring authored Apr 02, 2024

Move dlm_create_root_list() and dlm_release_root_list() to
recover.c and declare them static because they are only used
there.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

29e345f3

dlm: switch to GFP_ATOMIC in dlm allocations · 455597a5

Alexander Aring authored Apr 02, 2024

Replace GFP_NOFS with GFP_ATOMIC.  Also stop using idr_preload which
uses a non-bh spin_lock.  This is further preparation for softirq
message processing.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

455597a5

dlm: remove allocation parameter in msg allocation · 98808644

Alexander Aring authored Apr 02, 2024

Remove the context parameter for message allocations and
always use GFP_ATOMIC. This prepares for softirq message
processing.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

98808644

02 Apr, 2024 1 commit

dlm: Simplify the allocation of slab caches in dlm_lowcomms_msg_cache_create · ca0dcef7

Kunwu Chan authored Apr 02, 2024

Use the new KMEM_CACHE() macro instead of direct kmem_cache_create
to simplify the creation of SLAB caches.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Acked-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

ca0dcef7

01 Apr, 2024 8 commits

dlm: remove callback reference counting · 2bec1bbd

Alexander Aring authored Mar 28, 2024

Get rid of the unnecessary refcounting on callback structs.
Copy interesting callback info into the lkb struct rather
than maintaining pointers to callback structs from the lkb.
This goes back to the way things were done prior to
commit 61bed0ba ("fs: dlm: use a non-static queue for callbacks").
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

2bec1bbd

dlm: fix race between final callback and remove · 986ae3c2

Alexander Aring authored Mar 28, 2024

This patch fixes the following issue:

node 1 is dir
node 2 is master
node 3 is other

1->2: unlock
2: put final lkb, rsb moved to toss
2->1: unlock_reply
1: queue lkb callback with EUNLOCK
2->1: remove
1: receive_remove ignored (rsb on keep because of queued lkb callback)
1: complete lkb callback, put_lkb, move rsb to toss
3->1: lookup
1->3: lookup_reply master=2
3->2: request
2->3: request_reply EBADR

In summary:
An unexpected lkb reference causes the rsb to remain on the wrong list.
The rsb being on the wrong list causes receive_remove to be ignored.
An ignored receive_remove causes inconsistent dir and master state.

This sequence requires an unusually long delay in delivering the unlock
callback, because the remove message from 2->1 usually happens after
some seconds.  So, it's not known exactly how frequently this sequence
occurs in pratice.  It's possible that the same end result could also
have another unknown cause.

The solution for this issue is to further separate callback state
from the lkb, so that an lkb reference (and from that, an rsb ref)
are not held while a callback remains queued.  Then, within the
unlock_reply, the lkb will be freed and the rsb moved to the toss
list. So, the receive_remove will not be ignored.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

986ae3c2

dlm: combine switch case fail and default statements · 0175e51b

Alexander Aring authored Mar 28, 2024

This patch combines the failure and default cases for enqueue and
dequeue a callback to the lkb callback queue that should end in both
cases as it should never happen.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

0175e51b

dlm: save callback debug info earlier · 4ed42428

Alexander Aring authored Mar 28, 2024

Save lkb callback info when queueing the callback so that the
lkb struct is not needed in the callback workqueue processing.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

4ed42428

dlm: remove callback queue debugfs functionality · 16e98462

Alexander Aring authored Mar 28, 2024

Remove the ability to dump pending lkb callbacks from debugfs.
The prepares for separating lkb structs from callbacks.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

16e98462

dlm: remove lkb from callback tracepoints · 1131f339

Alexander Aring authored Mar 28, 2024

Stop using lkb structs in the callback tracepoints so that lkb
references are not needed. This prepares for separating lkb
structs from callbacks.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

1131f339

dlm: Simplify the allocation of slab caches in dlm_midcomms_cache_create · 609ed5bd

Kunwu Chan authored Mar 28, 2024

Use the new KMEM_CACHE() macro instead of direct kmem_cache_create
to simplify the creation of SLAB caches.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

609ed5bd

dlm: fix user space lock decision to copy lvb · ad191e0e

Alexander Aring authored Mar 28, 2024

This patch fixes the copy lvb decision for user space lock requests.
Checking dlm_lvb_operations is done earlier, where granted/requested
lock modes are available to use in the matrix.

The decision had been moved to the wrong location, where granted mode
and requested mode where the same, which causes the dlm_lvb_operations
matix to produce the wrong copy decision. For PW or EX requests, the
caller could get invalid lvb data.

Fixes: 61bed0ba ("fs: dlm: use a non-static queue for callbacks")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

ad191e0e

31 Mar, 2024 9 commits

Linux 6.9-rc2 · 39cd87c4
Linus Torvalds authored Mar 31, 2024

39cd87c4

Merge tag 'kbuild-fixes-v6.9' of... · 7e40c210

Linus Torvalds authored Mar 31, 2024

Merge tag 'kbuild-fixes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Deduplicate Kconfig entries for CONFIG_CXL_PMU

 - Fix unselectable choice entry in MIPS Kconfig, and forbid this
   structure

 - Remove unused include/asm-generic/export.h

 - Fix a NULL pointer dereference bug in modpost

 - Enable -Woverride-init warning consistently with W=1

 - Drop KCSAN flags from *.mod.c files

* tag 'kbuild-fixes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: Fix typo HEIGTH to HEIGHT
  Documentation/llvm: Note s390 LLVM=1 support with LLVM 18.1.0 and newer
  kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries
  kbuild: make -Woverride-init warnings more consistent
  modpost: do not make find_tosym() return NULL
  export.h: remove include/asm-generic/export.h
  kconfig: do not reparent the menu inside a choice block
  MIPS: move unselectable FIT_IMAGE_FDT_EPM5 out of the "System type" choice
  cxl: remove CONFIG_CXL_PMU entry in drivers/cxl/Kconfig

7e40c210

Merge tag 'edac_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · 18737353

Linus Torvalds authored Mar 31, 2024

Pull EDAC fixes from Borislav Petkov:

 - Fix more issues in the AMD FMPM driver

* tag 'edac_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  RAS: Avoid build errors when CONFIG_DEBUG_FS=n
  RAS/AMD/FMPM: Safely handle saved records of various sizes
  RAS/AMD/FMPM: Avoid NULL ptr deref in get_saved_records()

18737353

Merge tag 'irq_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5dad2623

Linus Torvalds authored Mar 31, 2024

Pull irq fixes from Borislav Petkov:

 - Fix an unused function warning on irqchip/irq-armada-370-xp

 - Fix the IRQ sharing with pinctrl-amd and ACPI OSL

* tag 'irq_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/armada-370-xp: Suppress unused-function warning
  genirq: Introduce IRQF_COND_ONESHOT and use it in pinctrl-amd

5dad2623

Merge tag 'perf_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 448f828f

Linus Torvalds authored Mar 31, 2024

Pull x86 perf fixes from Borislav Petkov:

 - Define the correct set of default hw events on AMD Zen4

 - Use the correct stalled cycles PMCs on AMD Zen2 and newer

 - Fix detection of the LBR freeze feature on AMD

* tag 'perf_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/amd/core: Define a proper ref-cycles event for Zen 4 and later
  perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later
  perf/x86/amd/lbr: Use freeze based on availability
  x86/cpufeatures: Add new word for scattered features

448f828f

Merge tag 'timers_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8d338df7

Linus Torvalds authored Mar 31, 2024

Pull timers update from Borislav Petkov:

 - Volunteer in Anna-Maria and Frederic as timers co-maintainers so that
   tglx can relax more :-P

* tag 'timers_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Add co-maintainers for time[rs]

8d338df7

Merge tag 'objtool_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8d7e7684

Linus Torvalds authored Mar 31, 2024

Pull objtool fix from Borislav Petkov:

 - Fix a format specifier build error in objtool during an x32 build

* tag 'objtool_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix compile failure when using the x32 compiler

8d7e7684

Merge tag 'x86_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1aac9cb7

Linus Torvalds authored Mar 31, 2024

Pull x86 fixes from Borislav Petkov:

 - Make sure single object builds in arch/x86/virt/ ala
      make ... arch/x86/virt/vmx/tdx/seamcall.o
   work again

 - Do not do ROM range scans and memory validation when the kernel is
   running as a SEV-SNP guest as those can get problematic and, before
   that, are not really needed in such a guest

 - Exclude the build-time generated vdso-image-x32.o object from objtool
   validation and in particular the return sites in there due to a
   warning which fires when an unpatched return thunk is being used

 - Improve the NMI CPUs stall message to show additional information
   about the state of each CPU wrt the NMI handler

 - Enable gcc named address spaces support only on !KCSAN configs due to
   compiler options incompatibility

 - Revert a change which was trying to use GB pages for mapping regions
   only when the regions would be large enough but that change lead to
   kexec failing

 - A documentation fixlet

* tag 'x86_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Use obj-y to descend into arch/x86/virt/
  x86/sev: Skip ROM range scans and validation for SEV-SNP guests
  x86/vdso: Fix rethunk patching for vdso-image-x32.o too
  x86/nmi: Upgrade NMI backtrace stall checks & messages
  x86/percpu: Disable named address spaces for KCSAN
  Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
  Documentation/x86: Fix title underline length

1aac9cb7

kconfig: Fix typo HEIGTH to HEIGHT · 89e5462b

Isak Ellmer authored Mar 30, 2024

Fixed a typo in some variables where height was misspelled as heigth.
Signed-off-by: Isak Ellmer <isak01@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

89e5462b