Commit 61387b8d authored by Linus Torvalds

Merge tag 'for-6.9/dm-vdo' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper VDO target from Mike Snitzer:
 "Introduce the DM vdo target which provides block-level deduplication,
  compression, and thin provisioning. Please see:

      Documentation/admin-guide/device-mapper/vdo.rst
      Documentation/admin-guide/device-mapper/vdo-design.rst

  The DM vdo target handles its concurrency by pinning an IO, and
  subsequent stages of handling that IO, to a particular VDO thread.
  This aspect of VDO is "unique" but its overall implementation is very
  tightly coupled to its mostly lockless threading model. As such, VDO
  is not easily changed to use more traditional finer-grained locking
  and Linux workqueues. Please see the "Zones and Threading" section of
  vdo-design.rst

  The DM vdo target has been used in production for many years but has
  seen significant changes over the past ~6 years to prepare it for
  upstream inclusion. The codebase is still large but it is isolated to
  drivers/md/dm-vdo/ and has been made considerably more approachable
  and maintainable.

  Matt Sakai has been added to the MAINTAINERS file to reflect that he
  will send VDO changes upstream through the DM subsystem maintainers"

* tag 'for-6.9/dm-vdo' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (142 commits)
  dm vdo: document minimum metadata size requirements
  dm vdo: remove meaningless version number constant
  dm vdo: remove vdo_perform_once
  dm vdo block-map: Remove stray semicolon
  dm vdo string-utils: change from uds_ to vdo_ namespace
  dm vdo logger: change from uds_ to vdo_ namespace
  dm vdo funnel-queue: change from uds_ to vdo_ namespace
  dm vdo indexer: fix use after free
  dm vdo logger: remove log level to string conversion code
  dm vdo: document log_level parameter
  dm vdo: add 'log_level' module parameter
  dm vdo: remove all sysfs interfaces
  dm vdo target: eliminate inappropriate uses of UDS_SUCCESS
  dm vdo indexer: update ASSERT and ASSERT_LOG_ONLY usage
  dm vdo encodings: update some stale comments
  dm vdo permassert: audit all of ASSERT to test for VDO_SUCCESS
  dm-vdo funnel-workqueue: return VDO_SUCCESS from make_simple_work_queue
  dm vdo thread-utils: return VDO_SUCCESS on vdo_create_thread success
  dm vdo int-map: return VDO_SUCCESS on success
  dm vdo: check for VDO_SUCCESS return value from memory-alloc functions
  ...
parents c0499a08 cb824724
@@ -34,6 +34,8 @@ Device Mapper
switch
thin-provisioning
unstriped
vdo-design
vdo
verity
writecache
zero
@@ -6134,6 +6134,14 @@ F: include/linux/device-mapper.h
F: include/linux/dm-*.h
F: include/uapi/linux/dm-*.h
DEVICE-MAPPER VDO TARGET
M: Matthew Sakai <msakai@redhat.com>
M: dm-devel@lists.linux.dev
L: dm-devel@lists.linux.dev
S: Maintained
F: Documentation/admin-guide/device-mapper/vdo*.rst
F: drivers/md/dm-vdo/
DEVLINK
M: Jiri Pirko <jiri@resnulli.us>
L: netdev@vger.kernel.org
@@ -634,4 +634,6 @@ config DM_AUDIT
Enables audit logging of several security relevant events in the
particular device-mapper targets, especially the integrity target.
source "drivers/md/dm-vdo/Kconfig"
endif # MD
@@ -68,6 +68,7 @@ obj-$(CONFIG_DM_ZERO) += dm-zero.o
obj-$(CONFIG_DM_RAID) += dm-raid.o
obj-$(CONFIG_DM_THIN_PROVISIONING) += dm-thin-pool.o
obj-$(CONFIG_DM_VERITY) += dm-verity.o
obj-$(CONFIG_DM_VDO) += dm-vdo/
obj-$(CONFIG_DM_CACHE) += dm-cache.o
obj-$(CONFIG_DM_CACHE_SMQ) += dm-cache-smq.o
obj-$(CONFIG_DM_EBS) += dm-ebs.o
# SPDX-License-Identifier: GPL-2.0-only

config DM_VDO
        tristate "VDO: deduplication and compression target"
        depends on 64BIT
        depends on BLK_DEV_DM
        select DM_BUFIO
        select LZ4_COMPRESS
        select LZ4_DECOMPRESS
        help
          This device mapper target presents a block device with
          deduplication, compression and thin-provisioning.

          To compile this code as a module, choose M here: the module will
          be called dm-vdo.

          If unsure, say N.
# SPDX-License-Identifier: GPL-2.0-only
ccflags-y := -I$(srctree)/$(src) -I$(srctree)/$(src)/indexer
obj-$(CONFIG_DM_VDO) += dm-vdo.o
dm-vdo-objs := \
action-manager.o \
admin-state.o \
block-map.o \
completion.o \
data-vio.o \
dedupe.o \
dm-vdo-target.o \
dump.o \
encodings.o \
errors.o \
flush.o \
funnel-queue.o \
funnel-workqueue.o \
int-map.o \
io-submitter.o \
logger.o \
logical-zone.o \
memory-alloc.o \
message-stats.o \
murmurhash3.o \
packer.o \
permassert.o \
physical-zone.o \
priority-table.o \
recovery-journal.o \
repair.o \
slab-depot.o \
status-codes.o \
string-utils.o \
thread-device.o \
thread-registry.o \
thread-utils.o \
vdo.o \
vio.o \
wait-queue.o \
indexer/chapter-index.o \
indexer/config.o \
indexer/delta-index.o \
indexer/funnel-requestqueue.o \
indexer/geometry.o \
indexer/index.o \
indexer/index-layout.o \
indexer/index-page-map.o \
indexer/index-session.o \
indexer/io-factory.o \
indexer/open-chapter.o \
indexer/radix-sort.o \
indexer/sparse-cache.o \
indexer/volume.o \
indexer/volume-index.o
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright 2023 Red Hat
*/
#ifndef VDO_ACTION_MANAGER_H
#define VDO_ACTION_MANAGER_H
#include "admin-state.h"
#include "types.h"
/*
* An action_manager provides a generic mechanism for applying actions to multi-zone entities (such
* as the block map or slab depot). Each action manager is tied to a specific context for which it
* manages actions. The manager ensures that only one action is active on that context at a time,
* and supports at most one pending action. Calls to schedule an action when there is already a
* pending action will result in VDO_COMPONENT_BUSY errors. Actions may only be submitted to the
* action manager from a single thread (which thread is determined when the action manager is
* constructed).
*
* A scheduled action consists of four components:
*
* preamble
* an optional method to be run on the initiator thread before applying the action to all zones
* zone_action
* an optional method to be applied to each of the zones
* conclusion
* an optional method to be run on the initiator thread once the per-zone method has been
* applied to all zones
* parent
* an optional completion to be finished once the conclusion is done
*
* At least one of the three methods must be provided.
*/
/*
* A function which is to be applied asynchronously to a set of zones.
* @context: The object which holds the per-zone context for the action
* @zone_number: The number of zone to which the action is being applied
* @parent: The object to notify when the action is complete
*/
typedef void (*vdo_zone_action_fn)(void *context, zone_count_t zone_number,
struct vdo_completion *parent);
/*
* A function which is to be applied asynchronously on an action manager's initiator thread as the
* preamble of an action.
* @context: The object which holds the per-zone context for the action
* @parent: The object to notify when the action is complete
*/
typedef void (*vdo_action_preamble_fn)(void *context, struct vdo_completion *parent);
/*
* A function which will run on the action manager's initiator thread as the conclusion of an
* action.
* @context: The object which holds the per-zone context for the action
*
* Return: VDO_SUCCESS or an error
*/
typedef int (*vdo_action_conclusion_fn)(void *context);
/*
* A function to schedule an action.
* @context: The object which holds the per-zone context for the action
*
* Return: true if an action was scheduled
*/
typedef bool (*vdo_action_scheduler_fn)(void *context);
/*
* A function to get the id of the thread associated with a given zone.
* @context: The action context
* @zone_number: The number of the zone for which the thread ID is desired
*/
typedef thread_id_t (*vdo_zone_thread_getter_fn)(void *context, zone_count_t zone_number);
struct action_manager;
int __must_check vdo_make_action_manager(zone_count_t zones,
vdo_zone_thread_getter_fn get_zone_thread_id,
thread_id_t initiator_thread_id, void *context,
vdo_action_scheduler_fn scheduler,
struct vdo *vdo,
struct action_manager **manager_ptr);
const struct admin_state_code *__must_check
vdo_get_current_manager_operation(struct action_manager *manager);
void * __must_check vdo_get_current_action_context(struct action_manager *manager);
bool vdo_schedule_default_action(struct action_manager *manager);
bool vdo_schedule_action(struct action_manager *manager, vdo_action_preamble_fn preamble,
vdo_zone_action_fn action, vdo_action_conclusion_fn conclusion,
struct vdo_completion *parent);
bool vdo_schedule_operation(struct action_manager *manager,
const struct admin_state_code *operation,
vdo_action_preamble_fn preamble, vdo_zone_action_fn action,
vdo_action_conclusion_fn conclusion,
struct vdo_completion *parent);
bool vdo_schedule_operation_with_context(struct action_manager *manager,
const struct admin_state_code *operation,
vdo_action_preamble_fn preamble,
vdo_zone_action_fn action,
vdo_action_conclusion_fn conclusion,
void *context, struct vdo_completion *parent);
#endif /* VDO_ACTION_MANAGER_H */
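A rough usage sketch of the interface above (illustrative only: struct demo_depot and its fields are placeholders, not symbols from this tree; the admin-state code and completion helpers come from the headers this file includes):

/* Hypothetical per-zone step: runs on each zone's thread, then notifies
 * the parent completion supplied by the action manager. */
static void drain_zone(void *context, zone_count_t zone_number,
                       struct vdo_completion *parent)
{
        struct demo_depot *depot = context;

        /* ... quiesce depot->zones[zone_number] here ... */
        vdo_finish_completion(parent);
}

static void demo_start_drain(struct demo_depot *depot,
                             struct vdo_completion *parent)
{
        /* No preamble or conclusion in this sketch; once drain_zone has run
         * in every zone, the caller's parent completion is finished. */
        vdo_schedule_operation(depot->action_manager, VDO_ADMIN_STATE_SUSPENDING,
                               NULL, drain_zone, NULL, parent);
}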
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright 2023 Red Hat
*/
#ifndef VDO_ADMIN_STATE_H
#define VDO_ADMIN_STATE_H
#include "completion.h"
#include "types.h"
struct admin_state_code {
const char *name;
/* Normal operation, data_vios may be active */
bool normal;
/* I/O is draining, new requests should not start */
bool draining;
/* This is a startup time operation */
bool loading;
/* The next state will be quiescent */
bool quiescing;
/* The VDO is quiescent, there should be no I/O */
bool quiescent;
/* Whether an operation is in progress and so no other operation may be started */
bool operating;
};
extern const struct admin_state_code *VDO_ADMIN_STATE_NORMAL_OPERATION;
extern const struct admin_state_code *VDO_ADMIN_STATE_OPERATING;
extern const struct admin_state_code *VDO_ADMIN_STATE_FORMATTING;
extern const struct admin_state_code *VDO_ADMIN_STATE_PRE_LOADING;
extern const struct admin_state_code *VDO_ADMIN_STATE_PRE_LOADED;
extern const struct admin_state_code *VDO_ADMIN_STATE_LOADING;
extern const struct admin_state_code *VDO_ADMIN_STATE_LOADING_FOR_RECOVERY;
extern const struct admin_state_code *VDO_ADMIN_STATE_LOADING_FOR_REBUILD;
extern const struct admin_state_code *VDO_ADMIN_STATE_WAITING_FOR_RECOVERY;
extern const struct admin_state_code *VDO_ADMIN_STATE_NEW;
extern const struct admin_state_code *VDO_ADMIN_STATE_INITIALIZED;
extern const struct admin_state_code *VDO_ADMIN_STATE_RECOVERING;
extern const struct admin_state_code *VDO_ADMIN_STATE_REBUILDING;
extern const struct admin_state_code *VDO_ADMIN_STATE_SAVING;
extern const struct admin_state_code *VDO_ADMIN_STATE_SAVED;
extern const struct admin_state_code *VDO_ADMIN_STATE_SCRUBBING;
extern const struct admin_state_code *VDO_ADMIN_STATE_SAVE_FOR_SCRUBBING;
extern const struct admin_state_code *VDO_ADMIN_STATE_STOPPING;
extern const struct admin_state_code *VDO_ADMIN_STATE_STOPPED;
extern const struct admin_state_code *VDO_ADMIN_STATE_SUSPENDING;
extern const struct admin_state_code *VDO_ADMIN_STATE_SUSPENDED;
extern const struct admin_state_code *VDO_ADMIN_STATE_SUSPENDED_OPERATION;
extern const struct admin_state_code *VDO_ADMIN_STATE_RESUMING;
struct admin_state {
const struct admin_state_code *current_state;
/* The next administrative state (when the current operation finishes) */
const struct admin_state_code *next_state;
/* A completion waiting on a state change */
struct vdo_completion *waiter;
/* Whether an operation is being initiated */
bool starting;
/* Whether an operation has completed in the initiator */
bool complete;
};
/**
* typedef vdo_admin_initiator_fn - A method to be called once an admin operation may be initiated.
*/
typedef void (*vdo_admin_initiator_fn)(struct admin_state *state);
static inline const struct admin_state_code * __must_check
vdo_get_admin_state_code(const struct admin_state *state)
{
return READ_ONCE(state->current_state);
}
/**
* vdo_set_admin_state_code() - Set the current admin state code.
*
* This function should be used primarily for initialization and by adminState internals. Most uses
* should go through the operation interfaces.
*/
static inline void vdo_set_admin_state_code(struct admin_state *state,
const struct admin_state_code *code)
{
WRITE_ONCE(state->current_state, code);
}
static inline bool __must_check vdo_is_state_normal(const struct admin_state *state)
{
return vdo_get_admin_state_code(state)->normal;
}
static inline bool __must_check vdo_is_state_suspending(const struct admin_state *state)
{
return (vdo_get_admin_state_code(state) == VDO_ADMIN_STATE_SUSPENDING);
}
static inline bool __must_check vdo_is_state_saving(const struct admin_state *state)
{
return (vdo_get_admin_state_code(state) == VDO_ADMIN_STATE_SAVING);
}
static inline bool __must_check vdo_is_state_saved(const struct admin_state *state)
{
return (vdo_get_admin_state_code(state) == VDO_ADMIN_STATE_SAVED);
}
static inline bool __must_check vdo_is_state_draining(const struct admin_state *state)
{
return vdo_get_admin_state_code(state)->draining;
}
static inline bool __must_check vdo_is_state_loading(const struct admin_state *state)
{
return vdo_get_admin_state_code(state)->loading;
}
static inline bool __must_check vdo_is_state_resuming(const struct admin_state *state)
{
return (vdo_get_admin_state_code(state) == VDO_ADMIN_STATE_RESUMING);
}
static inline bool __must_check vdo_is_state_clean_load(const struct admin_state *state)
{
const struct admin_state_code *code = vdo_get_admin_state_code(state);
return ((code == VDO_ADMIN_STATE_FORMATTING) || (code == VDO_ADMIN_STATE_LOADING));
}
static inline bool __must_check vdo_is_state_quiescing(const struct admin_state *state)
{
return vdo_get_admin_state_code(state)->quiescing;
}
static inline bool __must_check vdo_is_state_quiescent(const struct admin_state *state)
{
return vdo_get_admin_state_code(state)->quiescent;
}
bool __must_check vdo_assert_load_operation(const struct admin_state_code *operation,
struct vdo_completion *waiter);
bool vdo_start_loading(struct admin_state *state,
const struct admin_state_code *operation,
struct vdo_completion *waiter, vdo_admin_initiator_fn initiator);
bool vdo_finish_loading(struct admin_state *state);
bool vdo_finish_loading_with_result(struct admin_state *state, int result);
bool vdo_start_resuming(struct admin_state *state,
const struct admin_state_code *operation,
struct vdo_completion *waiter, vdo_admin_initiator_fn initiator);
bool vdo_finish_resuming(struct admin_state *state);
bool vdo_finish_resuming_with_result(struct admin_state *state, int result);
int vdo_resume_if_quiescent(struct admin_state *state);
bool vdo_start_draining(struct admin_state *state,
const struct admin_state_code *operation,
struct vdo_completion *waiter, vdo_admin_initiator_fn initiator);
bool vdo_finish_draining(struct admin_state *state);
bool vdo_finish_draining_with_result(struct admin_state *state, int result);
int vdo_start_operation(struct admin_state *state,
const struct admin_state_code *operation);
int vdo_start_operation_with_waiter(struct admin_state *state,
const struct admin_state_code *operation,
struct vdo_completion *waiter,
vdo_admin_initiator_fn initiator);
bool vdo_finish_operation(struct admin_state *state, int result);
#endif /* VDO_ADMIN_STATE_H */
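A minimal sketch of driving the drain helpers above when suspending a component (illustrative only: struct demo_component, demo_flush_outstanding_io(), and the callers of these functions are placeholders):

/* Runs once the state machine accepts the suspend operation. */
static void initiate_drain(struct admin_state *state)
{
        struct demo_component *component =
                container_of(state, struct demo_component, state);

        /* Kick off whatever work must finish before the drain completes. */
        demo_flush_outstanding_io(component);
}

static void demo_suspend(struct demo_component *component,
                         struct vdo_completion *waiter)
{
        /* The waiter completion is notified once the drain is marked done. */
        vdo_start_draining(&component->state, VDO_ADMIN_STATE_SUSPENDING,
                           waiter, initiate_drain);
}

/* Called by the component once its outstanding I/O has actually drained. */
static void demo_drain_done(struct demo_component *component, int result)
{
        vdo_finish_draining_with_result(&component->state, result);
}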
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright 2023 Red Hat
*/
#include "completion.h"
#include <linux/kernel.h>
#include "logger.h"
#include "permassert.h"
#include "status-codes.h"
#include "types.h"
#include "vio.h"
#include "vdo.h"
/**
* DOC: vdo completions.
*
* Most of vdo's data structures are lock free, each either belonging to a single "zone," or
* divided into a number of zones whose accesses to the structure do not overlap. During normal
* operation, at most one thread will be operating in any given zone. Each zone has a
* vdo_work_queue which holds vdo_completions that are to be run in that zone. A completion may
* only be enqueued on one queue or operating in a single zone at a time.
*
* At each step of a multi-threaded operation, the completion performing the operation is given a
* callback, error handler, and thread id for the next step. A completion is "run" when it is
* operating on the correct thread (as specified by its callback_thread_id). If the value of its
* "result" field is an error (i.e. not VDO_SUCCESS), the function in its "error_handler" will be
* invoked. If the error_handler is NULL, or there is no error, the function set as its "callback"
* will be invoked. Generally, a completion will not be run directly, but rather will be
* "launched." In this case, it will check whether it is operating on the correct thread. If it is,
* it will run immediately. Otherwise, it will be enqueue on the vdo_work_queue associated with the
* completion's "callback_thread_id". When it is dequeued, it will be on the correct thread, and
* will get run. In some cases, the completion should get queued instead of running immediately,
* even if it is being launched from the correct thread. This is usually in cases where there is a
* long chain of callbacks, all on the same thread, which could overflow the stack. In such cases,
* the completion's "requeue" field should be set to true. Doing so will skip the current thread
* check and simply enqueue the completion.
*
* A completion may be "finished," in which case its "complete" field will be set to true before it
* is next run. It is a bug to attempt to set the result or re-finish a finished completion.
* Because a completion's fields are not safe to examine from any thread other than the one on
* which the completion is currently operating, this field is used only to aid in detecting
* programming errors. It can not be used for cross-thread checking on the status of an operation.
* A completion must be "reset" before it can be reused after it has been finished. Resetting will
* also clear any error from the result field.
**/
void vdo_initialize_completion(struct vdo_completion *completion,
struct vdo *vdo,
enum vdo_completion_type type)
{
memset(completion, 0, sizeof(*completion));
completion->vdo = vdo;
completion->type = type;
vdo_reset_completion(completion);
}
static inline void assert_incomplete(struct vdo_completion *completion)
{
VDO_ASSERT_LOG_ONLY(!completion->complete, "completion is not complete");
}
/**
* vdo_set_completion_result() - Set the result of a completion.
*
* Older errors will not be masked.
*/
void vdo_set_completion_result(struct vdo_completion *completion, int result)
{
assert_incomplete(completion);
if (completion->result == VDO_SUCCESS)
completion->result = result;
}
/**
* vdo_launch_completion_with_priority() - Run or enqueue a completion.
* @priority: The priority at which to enqueue the completion.
*
* If called on the correct thread (i.e. the one specified in the completion's callback_thread_id
* field) and not marked for requeue, the completion will be run immediately. Otherwise, the
* completion will be enqueued on the specified thread.
*/
void vdo_launch_completion_with_priority(struct vdo_completion *completion,
enum vdo_completion_priority priority)
{
thread_id_t callback_thread = completion->callback_thread_id;
if (completion->requeue || (callback_thread != vdo_get_callback_thread_id())) {
vdo_enqueue_completion(completion, priority);
return;
}
vdo_run_completion(completion);
}
/** vdo_finish_completion() - Mark a completion as complete and then launch it. */
void vdo_finish_completion(struct vdo_completion *completion)
{
assert_incomplete(completion);
completion->complete = true;
if (completion->callback != NULL)
vdo_launch_completion(completion);
}
void vdo_enqueue_completion(struct vdo_completion *completion,
enum vdo_completion_priority priority)
{
struct vdo *vdo = completion->vdo;
thread_id_t thread_id = completion->callback_thread_id;
if (VDO_ASSERT(thread_id < vdo->thread_config.thread_count,
"thread_id %u (completion type %d) is less than thread count %u",
thread_id, completion->type,
vdo->thread_config.thread_count) != VDO_SUCCESS)
BUG();
completion->requeue = false;
completion->priority = priority;
completion->my_queue = NULL;
vdo_enqueue_work_queue(vdo->threads[thread_id].queue, completion);
}
/**
* vdo_requeue_completion_if_needed() - Requeue a completion if not called on the specified thread.
*
* Return: True if the completion was requeued; callers may not access the completion in this case.
*/
bool vdo_requeue_completion_if_needed(struct vdo_completion *completion,
thread_id_t callback_thread_id)
{
if (vdo_get_callback_thread_id() == callback_thread_id)
return false;
completion->callback_thread_id = callback_thread_id;
vdo_enqueue_completion(completion, VDO_WORK_Q_DEFAULT_PRIORITY);
return true;
}
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright 2023 Red Hat
*/
#ifndef VDO_COMPLETION_H
#define VDO_COMPLETION_H
#include "permassert.h"
#include "status-codes.h"
#include "types.h"
/**
* vdo_run_completion() - Run a completion's callback or error handler on the current thread.
*
* Context: This function must be called from the correct callback thread.
*/
static inline void vdo_run_completion(struct vdo_completion *completion)
{
if ((completion->result != VDO_SUCCESS) && (completion->error_handler != NULL)) {
completion->error_handler(completion);
return;
}
completion->callback(completion);
}
void vdo_set_completion_result(struct vdo_completion *completion, int result);
void vdo_initialize_completion(struct vdo_completion *completion, struct vdo *vdo,
enum vdo_completion_type type);
/**
* vdo_reset_completion() - Reset a completion to a clean state, while keeping the type, vdo and
* parent information.
*/
static inline void vdo_reset_completion(struct vdo_completion *completion)
{
completion->result = VDO_SUCCESS;
completion->complete = false;
}
void vdo_launch_completion_with_priority(struct vdo_completion *completion,
enum vdo_completion_priority priority);
/**
* vdo_launch_completion() - Launch a completion with default priority.
*/
static inline void vdo_launch_completion(struct vdo_completion *completion)
{
vdo_launch_completion_with_priority(completion, VDO_WORK_Q_DEFAULT_PRIORITY);
}
/**
* vdo_continue_completion() - Continue processing a completion.
* @result: The current result (will not mask older errors).
*
* Continue processing a completion by setting the current result and calling
* vdo_launch_completion().
*/
static inline void vdo_continue_completion(struct vdo_completion *completion, int result)
{
vdo_set_completion_result(completion, result);
vdo_launch_completion(completion);
}
void vdo_finish_completion(struct vdo_completion *completion);
/**
* vdo_fail_completion() - Set the result of a completion if it does not already have an error,
* then finish it.
*/
static inline void vdo_fail_completion(struct vdo_completion *completion, int result)
{
vdo_set_completion_result(completion, result);
vdo_finish_completion(completion);
}
/**
* vdo_assert_completion_type() - Assert that a completion is of the correct type.
*
* Return: VDO_SUCCESS or an error
*/
static inline int vdo_assert_completion_type(struct vdo_completion *completion,
enum vdo_completion_type expected)
{
return VDO_ASSERT(expected == completion->type,
"completion type should be %u, not %u", expected,
completion->type);
}
static inline void vdo_set_completion_callback(struct vdo_completion *completion,
vdo_action_fn callback,
thread_id_t callback_thread_id)
{
completion->callback = callback;
completion->callback_thread_id = callback_thread_id;
}
/**
* vdo_launch_completion_callback() - Set the callback for a completion and launch it immediately.
*/
static inline void vdo_launch_completion_callback(struct vdo_completion *completion,
vdo_action_fn callback,
thread_id_t callback_thread_id)
{
vdo_set_completion_callback(completion, callback, callback_thread_id);
vdo_launch_completion(completion);
}
/**
* vdo_prepare_completion() - Prepare a completion for launch.
*
* Resets the completion, and then sets its callback, error handler, callback thread, and parent.
*/
static inline void vdo_prepare_completion(struct vdo_completion *completion,
vdo_action_fn callback,
vdo_action_fn error_handler,
thread_id_t callback_thread_id, void *parent)
{
vdo_reset_completion(completion);
vdo_set_completion_callback(completion, callback, callback_thread_id);
completion->error_handler = error_handler;
completion->parent = parent;
}
/**
* vdo_prepare_completion_for_requeue() - Prepare a completion for launch ensuring that it will
* always be requeued.
*
* Resets the completion, and then sets its callback, error handler, callback thread, and parent.
*/
static inline void vdo_prepare_completion_for_requeue(struct vdo_completion *completion,
vdo_action_fn callback,
vdo_action_fn error_handler,
thread_id_t callback_thread_id,
void *parent)
{
vdo_prepare_completion(completion, callback, error_handler,
callback_thread_id, parent);
completion->requeue = true;
}
void vdo_enqueue_completion(struct vdo_completion *completion,
enum vdo_completion_priority priority);
bool vdo_requeue_completion_if_needed(struct vdo_completion *completion,
thread_id_t callback_thread_id);
#endif /* VDO_COMPLETION_H */
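To make the launch/finish flow described above concrete, here is a hedged sketch (the chain functions, the demo_second_thread variable, and the choice of completion type are all illustrative; callers supply the error handler, threads, and parent):

/* Illustrative thread id for the second step; set elsewhere in real code. */
static thread_id_t demo_second_thread;

static void step_two(struct vdo_completion *completion)
{
        /* ... second half of the work, now on demo_second_thread ... */
        /* Finishing the parent wakes whoever prepared it. */
        vdo_finish_completion(completion->parent);
}

static void step_one(struct vdo_completion *completion)
{
        /* ... first half of the work ... then hop threads for step two. */
        vdo_launch_completion_callback(completion, step_two, demo_second_thread);
}

static void start_demo_chain(struct vdo *vdo, struct vdo_completion *completion,
                             enum vdo_completion_type type,
                             vdo_action_fn error_handler,
                             thread_id_t first_thread,
                             struct vdo_completion *parent)
{
        vdo_initialize_completion(completion, vdo, type);
        vdo_prepare_completion(completion, step_one, error_handler,
                               first_thread, parent);
        vdo_launch_completion(completion);
}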
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright 2023 Red Hat
*/
#ifndef VDO_CONSTANTS_H
#define VDO_CONSTANTS_H
#include <linux/blkdev.h>
#include "types.h"
enum {
/*
* The maximum number of contiguous PBNs which will go to a single bio submission queue,
* assuming there is more than one queue.
*/
VDO_BIO_ROTATION_INTERVAL_LIMIT = 1024,
/* The number of entries on a block map page */
VDO_BLOCK_MAP_ENTRIES_PER_PAGE = 812,
/* The origin of the flat portion of the block map */
VDO_BLOCK_MAP_FLAT_PAGE_ORIGIN = 1,
/*
* The height of a block map tree. Assuming a root count of 60 and 812 entries per page,
* this is big enough to represent almost 95 PB of logical space.
*/
VDO_BLOCK_MAP_TREE_HEIGHT = 5,
/* The default number of bio submission queues. */
DEFAULT_VDO_BIO_SUBMIT_QUEUE_COUNT = 4,
/* The number of contiguous PBNs to be submitted to a single bio queue. */
DEFAULT_VDO_BIO_SUBMIT_QUEUE_ROTATE_INTERVAL = 64,
/* The number of trees in the arboreal block map */
DEFAULT_VDO_BLOCK_MAP_TREE_ROOT_COUNT = 60,
/* The default size of the recovery journal, in blocks */
DEFAULT_VDO_RECOVERY_JOURNAL_SIZE = 32 * 1024,
/* The default size of each slab journal, in blocks */
DEFAULT_VDO_SLAB_JOURNAL_SIZE = 224,
/* Unit test minimum */
MINIMUM_VDO_SLAB_JOURNAL_BLOCKS = 2,
/*
* The initial size of lbn_operations and pbn_operations, which is based upon the expected
* maximum number of outstanding VIOs. This value was chosen to make it highly unlikely
* that the maps would need to be resized.
*/
VDO_LOCK_MAP_CAPACITY = 10000,
/* The maximum number of logical zones */
MAX_VDO_LOGICAL_ZONES = 60,
/* The maximum number of physical zones */
MAX_VDO_PHYSICAL_ZONES = 16,
/* The base-2 logarithm of the maximum blocks in one slab */
MAX_VDO_SLAB_BITS = 23,
/* The maximum number of slabs the slab depot supports */
MAX_VDO_SLABS = 8192,
/*
* The maximum number of block map pages to load simultaneously during recovery or rebuild.
*/
MAXIMUM_SIMULTANEOUS_VDO_BLOCK_MAP_RESTORATION_READS = 1024,
/* The maximum number of entries in the slab summary */
MAXIMUM_VDO_SLAB_SUMMARY_ENTRIES = MAX_VDO_SLABS * MAX_VDO_PHYSICAL_ZONES,
/* The maximum number of total threads in a VDO thread configuration. */
MAXIMUM_VDO_THREADS = 100,
/* The maximum number of VIOs in the system at once */
MAXIMUM_VDO_USER_VIOS = 2048,
/* The only physical block size supported by VDO */
VDO_BLOCK_SIZE = 4096,
/* The number of sectors per block */
VDO_SECTORS_PER_BLOCK = (VDO_BLOCK_SIZE >> SECTOR_SHIFT),
/* The size of a sector that will not be torn */
VDO_SECTOR_SIZE = 512,
/* The physical block number reserved for storing the zero block */
VDO_ZERO_BLOCK = 0,
};
#endif /* VDO_CONSTANTS_H */
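As a quick sanity check of the VDO_BLOCK_MAP_TREE_HEIGHT comment above (reading the fifth level of the tree as the data blocks themselves, so four levels of 812-entry pages sit under the 60 roots): 60 × 812⁴ ≈ 2.6 × 10¹³ mapped blocks, and 2.6 × 10¹³ × 4096 bytes ≈ 1.07 × 10¹⁷ bytes, i.e. roughly 95 PiB, which matches the "almost 95 PB" figure.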
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright 2023 Red Hat
*/
#ifndef UDS_CPU_H
#define UDS_CPU_H
#include <linux/cache.h>
/**
* uds_prefetch_address() - Minimize cache-miss latency by attempting to move data into a CPU cache
* before it is accessed.
*
* @address: the address to fetch (may be invalid)
* @for_write: must be constant at compile time--false if for reading, true if for writing
*/
static inline void uds_prefetch_address(const void *address, bool for_write)
{
/*
* for_write won't be a constant if we are compiled with optimization turned off, in which
* case prefetching really doesn't matter. clang can't figure out that if for_write is a
* constant, it can be passed as the second, mandatorily constant argument to prefetch(),
* at least currently on llvm 12.
*/
if (__builtin_constant_p(for_write)) {
if (for_write)
__builtin_prefetch(address, true);
else
__builtin_prefetch(address, false);
}
}
/**
* uds_prefetch_range() - Minimize cache-miss latency by attempting to move a range of addresses
* into a CPU cache before they are accessed.
*
* @start: the starting address to fetch (may be invalid)
* @size: the number of bytes in the address range
* @for_write: must be constant at compile time--false if for reading, true if for writing
*/
static inline void uds_prefetch_range(const void *start, unsigned int size,
bool for_write)
{
/*
* Count the number of cache lines to fetch, allowing for the address range to span an
* extra cache line boundary due to address alignment.
*/
const char *address = (const char *) start;
unsigned int offset = ((uintptr_t) address % L1_CACHE_BYTES);
unsigned int cache_lines = (1 + ((size + offset) / L1_CACHE_BYTES));
while (cache_lines-- > 0) {
uds_prefetch_address(address, for_write);
address += L1_CACHE_BYTES;
}
}
#endif /* UDS_CPU_H */
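A trivial hedged example of the helper above (the function is illustrative; 4096 is just VDO_BLOCK_SIZE from constants.h):

/* Hypothetical: warm the cache for one 4 KB block before it is scanned. */
static void demo_warm_block(const void *block)
{
        uds_prefetch_range(block, 4096, false);
}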
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright 2023 Red Hat
*/
#ifndef VDO_DEDUPE_H
#define VDO_DEDUPE_H
#include <linux/list.h>
#include <linux/timer.h>
#include "indexer.h"
#include "admin-state.h"
#include "constants.h"
#include "statistics.h"
#include "types.h"
#include "wait-queue.h"
struct dedupe_context {
struct hash_zone *zone;
struct uds_request request;
struct list_head list_entry;
struct funnel_queue_entry queue_entry;
u64 submission_jiffies;
struct data_vio *requestor;
atomic_t state;
};
struct hash_lock;
struct hash_zone {
/* Which hash zone this is */
zone_count_t zone_number;
/* The administrative state of the zone */
struct admin_state state;
/* The thread ID for this zone */
thread_id_t thread_id;
/* Mapping from record name fields to hash_locks */
struct int_map *hash_lock_map;
/* List containing all unused hash_locks */
struct list_head lock_pool;
/*
* Statistics shared by all hash locks in this zone. Only modified on the hash zone thread,
* but queried by other threads.
*/
struct hash_lock_statistics statistics;
/* Array of all hash_locks */
struct hash_lock *lock_array;
/* These fields are used to manage the dedupe contexts */
struct list_head available;
struct list_head pending;
struct funnel_queue *timed_out_complete;
struct timer_list timer;
struct vdo_completion completion;
unsigned int active;
atomic_t timer_state;
/* The dedupe contexts for querying the index from this zone */
struct dedupe_context contexts[MAXIMUM_VDO_USER_VIOS];
};
struct hash_zones;
struct pbn_lock * __must_check vdo_get_duplicate_lock(struct data_vio *data_vio);
void vdo_acquire_hash_lock(struct vdo_completion *completion);
void vdo_continue_hash_lock(struct vdo_completion *completion);
void vdo_release_hash_lock(struct data_vio *data_vio);
void vdo_clean_failed_hash_lock(struct data_vio *data_vio);
void vdo_share_compressed_write_lock(struct data_vio *data_vio,
struct pbn_lock *pbn_lock);
int __must_check vdo_make_hash_zones(struct vdo *vdo, struct hash_zones **zones_ptr);
void vdo_free_hash_zones(struct hash_zones *zones);
void vdo_drain_hash_zones(struct hash_zones *zones, struct vdo_completion *parent);
void vdo_get_dedupe_statistics(struct hash_zones *zones, struct vdo_statistics *stats);
struct hash_zone * __must_check vdo_select_hash_zone(struct hash_zones *zones,
const struct uds_record_name *name);
void vdo_dump_hash_zones(struct hash_zones *zones);
const char *vdo_get_dedupe_index_state_name(struct hash_zones *zones);
u64 vdo_get_dedupe_index_timeout_count(struct hash_zones *zones);
int vdo_message_dedupe_index(struct hash_zones *zones, const char *name);
void vdo_set_dedupe_state_normal(struct hash_zones *zones);
void vdo_start_dedupe_index(struct hash_zones *zones, bool create_flag);
void vdo_resume_hash_zones(struct hash_zones *zones, struct vdo_completion *parent);
void vdo_finish_dedupe_index(struct hash_zones *zones);
/* Interval (in milliseconds) from submission until switching to fast path and skipping UDS. */
extern unsigned int vdo_dedupe_index_timeout_interval;
/*
* Minimum time interval (in milliseconds) between timer invocations to check for requests waiting
* for UDS that should now time out.
*/
extern unsigned int vdo_dedupe_index_min_timer_interval;
void vdo_set_dedupe_index_timeout_interval(unsigned int value);
void vdo_set_dedupe_index_min_timer_interval(unsigned int value);
#endif /* VDO_DEDUPE_H */
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright 2023 Red Hat
*/
#include "dump.h"
#include <linux/module.h>
#include "memory-alloc.h"
#include "string-utils.h"
#include "constants.h"
#include "data-vio.h"
#include "dedupe.h"
#include "funnel-workqueue.h"
#include "io-submitter.h"
#include "logger.h"
#include "types.h"
#include "vdo.h"
enum dump_options {
/* Work queues */
SHOW_QUEUES,
/* Memory pools */
SHOW_VIO_POOL,
/* Others */
SHOW_VDO_STATUS,
/* This one means an option overrides the "default" choices, instead of altering them. */
SKIP_DEFAULT
};
enum dump_option_flags {
/* Work queues */
FLAG_SHOW_QUEUES = (1 << SHOW_QUEUES),
/* Memory pools */
FLAG_SHOW_VIO_POOL = (1 << SHOW_VIO_POOL),
/* Others */
FLAG_SHOW_VDO_STATUS = (1 << SHOW_VDO_STATUS),
/* Special */
FLAG_SKIP_DEFAULT = (1 << SKIP_DEFAULT)
};
#define FLAGS_ALL_POOLS (FLAG_SHOW_VIO_POOL)
#define DEFAULT_DUMP_FLAGS (FLAG_SHOW_QUEUES | FLAG_SHOW_VDO_STATUS)
/* Another static buffer... log10(256) = 2.408+, round up: */
#define DIGITS_PER_U64 (1 + sizeof(u64) * 2409 / 1000)
static inline bool is_arg_string(const char *arg, const char *this_option)
{
/* convention seems to be case-independent options */
return strncasecmp(arg, this_option, strlen(this_option)) == 0;
}
static void do_dump(struct vdo *vdo, unsigned int dump_options_requested,
const char *why)
{
u32 active, maximum;
s64 outstanding;
vdo_log_info("%s dump triggered via %s", VDO_LOGGING_MODULE_NAME, why);
active = get_data_vio_pool_active_requests(vdo->data_vio_pool);
maximum = get_data_vio_pool_maximum_requests(vdo->data_vio_pool);
outstanding = (atomic64_read(&vdo->stats.bios_submitted) -
atomic64_read(&vdo->stats.bios_completed));
vdo_log_info("%u device requests outstanding (max %u), %lld bio requests outstanding, device '%s'",
active, maximum, outstanding,
vdo_get_device_name(vdo->device_config->owning_target));
if (((dump_options_requested & FLAG_SHOW_QUEUES) != 0) && (vdo->threads != NULL)) {
thread_id_t id;
for (id = 0; id < vdo->thread_config.thread_count; id++)
vdo_dump_work_queue(vdo->threads[id].queue);
}
vdo_dump_hash_zones(vdo->hash_zones);
dump_data_vio_pool(vdo->data_vio_pool,
(dump_options_requested & FLAG_SHOW_VIO_POOL) != 0);
if ((dump_options_requested & FLAG_SHOW_VDO_STATUS) != 0)
vdo_dump_status(vdo);
vdo_report_memory_usage();
vdo_log_info("end of %s dump", VDO_LOGGING_MODULE_NAME);
}
static int parse_dump_options(unsigned int argc, char *const *argv,
unsigned int *dump_options_requested_ptr)
{
unsigned int dump_options_requested = 0;
static const struct {
const char *name;
unsigned int flags;
} option_names[] = {
{ "viopool", FLAG_SKIP_DEFAULT | FLAG_SHOW_VIO_POOL },
{ "vdo", FLAG_SKIP_DEFAULT | FLAG_SHOW_VDO_STATUS },
{ "pools", FLAG_SKIP_DEFAULT | FLAGS_ALL_POOLS },
{ "queues", FLAG_SKIP_DEFAULT | FLAG_SHOW_QUEUES },
{ "threads", FLAG_SKIP_DEFAULT | FLAG_SHOW_QUEUES },
{ "default", FLAG_SKIP_DEFAULT | DEFAULT_DUMP_FLAGS },
{ "all", ~0 },
};
bool options_okay = true;
unsigned int i;
for (i = 1; i < argc; i++) {
unsigned int j;
for (j = 0; j < ARRAY_SIZE(option_names); j++) {
if (is_arg_string(argv[i], option_names[j].name)) {
dump_options_requested |= option_names[j].flags;
break;
}
}
if (j == ARRAY_SIZE(option_names)) {
vdo_log_warning("dump option name '%s' unknown", argv[i]);
options_okay = false;
}
}
if (!options_okay)
return -EINVAL;
if ((dump_options_requested & FLAG_SKIP_DEFAULT) == 0)
dump_options_requested |= DEFAULT_DUMP_FLAGS;
*dump_options_requested_ptr = dump_options_requested;
return 0;
}
/* Dump as specified by zero or more string arguments. */
int vdo_dump(struct vdo *vdo, unsigned int argc, char *const *argv, const char *why)
{
unsigned int dump_options_requested = 0;
int result = parse_dump_options(argc, argv, &dump_options_requested);
if (result != 0)
return result;
do_dump(vdo, dump_options_requested, why);
return 0;
}
/* Dump everything we know how to dump */
void vdo_dump_all(struct vdo *vdo, const char *why)
{
do_dump(vdo, ~0, why);
}
/*
* Dump out the data_vio waiters on a waitq.
* wait_on should be the label to print for queue (e.g. logical or physical)
*/
static void dump_vio_waiters(struct vdo_wait_queue *waitq, char *wait_on)
{
struct vdo_waiter *waiter, *first = vdo_waitq_get_first_waiter(waitq);
struct data_vio *data_vio;
if (first == NULL)
return;
data_vio = vdo_waiter_as_data_vio(first);
vdo_log_info(" %s is locked. Waited on by: vio %px pbn %llu lbn %llu d-pbn %llu lastOp %s",
wait_on, data_vio, data_vio->allocation.pbn, data_vio->logical.lbn,
data_vio->duplicate.pbn, get_data_vio_operation_name(data_vio));
for (waiter = first->next_waiter; waiter != first; waiter = waiter->next_waiter) {
data_vio = vdo_waiter_as_data_vio(waiter);
vdo_log_info(" ... and : vio %px pbn %llu lbn %llu d-pbn %llu lastOp %s",
data_vio, data_vio->allocation.pbn, data_vio->logical.lbn,
data_vio->duplicate.pbn,
get_data_vio_operation_name(data_vio));
}
}
/*
* Encode various attributes of a data_vio as a string of one-character flags. This encoding is for
* logging brevity:
*
* R => vio completion result not VDO_SUCCESS
* W => vio is on a waitq
* D => vio is a duplicate
* p => vio is a partial block operation
* z => vio is a zero block
* d => vio is a discard
*
* The common case of no flags set will result in an empty, null-terminated buffer. If any flags
* are encoded, the first character in the string will be a space character.
*/
static void encode_vio_dump_flags(struct data_vio *data_vio, char buffer[8])
{
char *p_flag = buffer;
*p_flag++ = ' ';
if (data_vio->vio.completion.result != VDO_SUCCESS)
*p_flag++ = 'R';
if (data_vio->waiter.next_waiter != NULL)
*p_flag++ = 'W';
if (data_vio->is_duplicate)
*p_flag++ = 'D';
if (data_vio->is_partial)
*p_flag++ = 'p';
if (data_vio->is_zero)
*p_flag++ = 'z';
if (data_vio->remaining_discard > 0)
*p_flag++ = 'd';
if (p_flag == &buffer[1]) {
/* No flags, so remove the blank space. */
p_flag = buffer;
}
*p_flag = '\0';
}
/* Implements buffer_dump_function. */
void dump_data_vio(void *data)
{
struct data_vio *data_vio = data;
/*
* This just needs to be big enough to hold a queue (thread) name and a function name (plus
* a separator character and NUL). The latter is limited only by taste.
*
* In making this static, we're assuming only one "dump" will run at a time. If more than
* one does run, the log output will be garbled anyway.
*/
static char vio_completion_dump_buffer[100 + MAX_VDO_WORK_QUEUE_NAME_LEN];
static char vio_block_number_dump_buffer[sizeof("P L D") + 3 * DIGITS_PER_U64];
static char vio_flush_generation_buffer[sizeof(" FG") + DIGITS_PER_U64];
static char flags_dump_buffer[8];
/*
* We're likely to be logging a couple thousand of these lines, and in some circumstances
* syslogd may have trouble keeping up, so keep it BRIEF rather than user-friendly.
*/
vdo_dump_completion_to_buffer(&data_vio->vio.completion,
vio_completion_dump_buffer,
sizeof(vio_completion_dump_buffer));
if (data_vio->is_duplicate) {
snprintf(vio_block_number_dump_buffer,
sizeof(vio_block_number_dump_buffer), "P%llu L%llu D%llu",
data_vio->allocation.pbn, data_vio->logical.lbn,
data_vio->duplicate.pbn);
} else if (data_vio_has_allocation(data_vio)) {
snprintf(vio_block_number_dump_buffer,
sizeof(vio_block_number_dump_buffer), "P%llu L%llu",
data_vio->allocation.pbn, data_vio->logical.lbn);
} else {
snprintf(vio_block_number_dump_buffer,
sizeof(vio_block_number_dump_buffer), "L%llu",
data_vio->logical.lbn);
}
if (data_vio->flush_generation != 0) {
snprintf(vio_flush_generation_buffer,
sizeof(vio_flush_generation_buffer), " FG%llu",
data_vio->flush_generation);
} else {
vio_flush_generation_buffer[0] = 0;
}
encode_vio_dump_flags(data_vio, flags_dump_buffer);
vdo_log_info(" vio %px %s%s %s %s%s", data_vio,
vio_block_number_dump_buffer,
vio_flush_generation_buffer,
get_data_vio_operation_name(data_vio),
vio_completion_dump_buffer,
flags_dump_buffer);
/*
* might want info on: wantUDSAnswer / operation / status
* might want info on: bio / bios_merged
*/
dump_vio_waiters(&data_vio->logical.waiters, "lbn");
/* might want to dump more info from vio here */
}
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright 2023 Red Hat
*/
#ifndef VDO_DUMP_H
#define VDO_DUMP_H
#include "types.h"
int vdo_dump(struct vdo *vdo, unsigned int argc, char *const *argv, const char *why);
void vdo_dump_all(struct vdo *vdo, const char *why);
void dump_data_vio(void *data);
#endif /* VDO_DUMP_H */
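A hedged sketch of driving the dump entry point above (the wrapper is illustrative; argv[0] is skipped because parse_dump_options() starts scanning at index 1, so it would normally carry the message keyword):

/* Hypothetical: dump only the work queues and the overall VDO status. */
static int demo_dump(struct vdo *vdo)
{
        static char *const argv[] = { "dump", "queues", "vdo" };

        /* Unrecognized option words cause vdo_dump() to return -EINVAL. */
        return vdo_dump(vdo, ARRAY_SIZE(argv), argv, "demo");
}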
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright 2023 Red Hat
*/
#include "errors.h"
#include <linux/compiler.h>
#include <linux/errno.h>
#include "logger.h"
#include "permassert.h"
#include "string-utils.h"
static const struct error_info successful = { "UDS_SUCCESS", "Success" };
static const char *const message_table[] = {
[EPERM] = "Operation not permitted",
[ENOENT] = "No such file or directory",
[ESRCH] = "No such process",
[EINTR] = "Interrupted system call",
[EIO] = "Input/output error",
[ENXIO] = "No such device or address",
[E2BIG] = "Argument list too long",
[ENOEXEC] = "Exec format error",
[EBADF] = "Bad file descriptor",
[ECHILD] = "No child processes",
[EAGAIN] = "Resource temporarily unavailable",
[ENOMEM] = "Cannot allocate memory",
[EACCES] = "Permission denied",
[EFAULT] = "Bad address",
[ENOTBLK] = "Block device required",
[EBUSY] = "Device or resource busy",
[EEXIST] = "File exists",
[EXDEV] = "Invalid cross-device link",
[ENODEV] = "No such device",
[ENOTDIR] = "Not a directory",
[EISDIR] = "Is a directory",
[EINVAL] = "Invalid argument",
[ENFILE] = "Too many open files in system",
[EMFILE] = "Too many open files",
[ENOTTY] = "Inappropriate ioctl for device",
[ETXTBSY] = "Text file busy",
[EFBIG] = "File too large",
[ENOSPC] = "No space left on device",
[ESPIPE] = "Illegal seek",
[EROFS] = "Read-only file system",
[EMLINK] = "Too many links",
[EPIPE] = "Broken pipe",
[EDOM] = "Numerical argument out of domain",
[ERANGE] = "Numerical result out of range"
};
static const struct error_info error_list[] = {
{ "UDS_OVERFLOW", "Index overflow" },
{ "UDS_INVALID_ARGUMENT", "Invalid argument passed to internal routine" },
{ "UDS_BAD_STATE", "UDS data structures are in an invalid state" },
{ "UDS_DUPLICATE_NAME", "Attempt to enter the same name into a delta index twice" },
{ "UDS_ASSERTION_FAILED", "Assertion failed" },
{ "UDS_QUEUED", "Request queued" },
{ "UDS_ALREADY_REGISTERED", "Error range already registered" },
{ "UDS_OUT_OF_RANGE", "Cannot access data outside specified limits" },
{ "UDS_DISABLED", "UDS library context is disabled" },
{ "UDS_UNSUPPORTED_VERSION", "Unsupported version" },
{ "UDS_CORRUPT_DATA", "Some index structure is corrupt" },
{ "UDS_NO_INDEX", "No index found" },
{ "UDS_INDEX_NOT_SAVED_CLEANLY", "Index not saved cleanly" },
};
struct error_block {
const char *name;
int base;
int last;
int max;
const struct error_info *infos;
};
#define MAX_ERROR_BLOCKS 6
static struct {
int allocated;
int count;
struct error_block blocks[MAX_ERROR_BLOCKS];
} registered_errors = {
.allocated = MAX_ERROR_BLOCKS,
.count = 1,
.blocks = { {
.name = "UDS Error",
.base = UDS_ERROR_CODE_BASE,
.last = UDS_ERROR_CODE_LAST,
.max = UDS_ERROR_CODE_BLOCK_END,
.infos = error_list,
} },
};
/* Get the error info for an error number. Also returns the name of the error block, if known. */
static const char *get_error_info(int errnum, const struct error_info **info_ptr)
{
struct error_block *block;
if (errnum == UDS_SUCCESS) {
*info_ptr = &successful;
return NULL;
}
for (block = registered_errors.blocks;
block < registered_errors.blocks + registered_errors.count;
block++) {
if ((errnum >= block->base) && (errnum < block->last)) {
*info_ptr = block->infos + (errnum - block->base);
return block->name;
} else if ((errnum >= block->last) && (errnum < block->max)) {
*info_ptr = NULL;
return block->name;
}
}
return NULL;
}
/* Return a string describing a system error message. */
static const char *system_string_error(int errnum, char *buf, size_t buflen)
{
size_t len;
const char *error_string = NULL;
if ((errnum > 0) && (errnum < ARRAY_SIZE(message_table)))
error_string = message_table[errnum];
len = ((error_string == NULL) ?
snprintf(buf, buflen, "Unknown error %d", errnum) :
snprintf(buf, buflen, "%s", error_string));
if (len < buflen)
return buf;
buf[0] = '\0';
return "System error";
}
/* Convert an error code to a descriptive string. */
const char *uds_string_error(int errnum, char *buf, size_t buflen)
{
char *buffer = buf;
char *buf_end = buf + buflen;
const struct error_info *info = NULL;
const char *block_name;
if (buf == NULL)
return NULL;
if (errnum < 0)
errnum = -errnum;
block_name = get_error_info(errnum, &info);
if (block_name != NULL) {
if (info != NULL) {
buffer = vdo_append_to_buffer(buffer, buf_end, "%s: %s",
block_name, info->message);
} else {
buffer = vdo_append_to_buffer(buffer, buf_end, "Unknown %s %d",
block_name, errnum);
}
} else if (info != NULL) {
buffer = vdo_append_to_buffer(buffer, buf_end, "%s", info->message);
} else {
const char *tmp = system_string_error(errnum, buffer, buf_end - buffer);
if (tmp != buffer)
buffer = vdo_append_to_buffer(buffer, buf_end, "%s", tmp);
else
buffer += strlen(tmp);
}
return buf;
}
/* Convert an error code to its name. */
const char *uds_string_error_name(int errnum, char *buf, size_t buflen)
{
char *buffer = buf;
char *buf_end = buf + buflen;
const struct error_info *info = NULL;
const char *block_name;
if (errnum < 0)
errnum = -errnum;
block_name = get_error_info(errnum, &info);
if (block_name != NULL) {
if (info != NULL) {
buffer = vdo_append_to_buffer(buffer, buf_end, "%s", info->name);
} else {
buffer = vdo_append_to_buffer(buffer, buf_end, "%s %d",
block_name, errnum);
}
} else if (info != NULL) {
buffer = vdo_append_to_buffer(buffer, buf_end, "%s", info->name);
} else {
const char *tmp;
tmp = system_string_error(errnum, buffer, buf_end - buffer);
if (tmp != buffer)
buffer = vdo_append_to_buffer(buffer, buf_end, "%s", tmp);
else
buffer += strlen(tmp);
}
return buf;
}
/*
* Translate an error code into a value acceptable to the kernel. The input error code may be a
* system-generated value (such as -EIO), or an internal UDS status code. The result will be a
* negative errno value.
*/
int uds_status_to_errno(int error)
{
char error_name[VDO_MAX_ERROR_NAME_SIZE];
char error_message[VDO_MAX_ERROR_MESSAGE_SIZE];
/* 0 is success, and negative values are already system error codes. */
if (likely(error <= 0))
return error;
if (error < 1024) {
/* This is probably an errno from userspace. */
return -error;
}
/* Internal UDS errors */
switch (error) {
case UDS_NO_INDEX:
case UDS_CORRUPT_DATA:
/* The index doesn't exist or can't be recovered. */
return -ENOENT;
case UDS_INDEX_NOT_SAVED_CLEANLY:
case UDS_UNSUPPORTED_VERSION:
/*
* The index exists, but can't be loaded. Tell the client it exists so they don't
* destroy it inadvertently.
*/
return -EEXIST;
case UDS_DISABLED:
/* The session is unusable; only returned by requests. */
return -EIO;
default:
/* Translate an unexpected error into something generic. */
vdo_log_info("%s: mapping status code %d (%s: %s) to -EIO",
__func__, error,
uds_string_error_name(error, error_name,
sizeof(error_name)),
uds_string_error(error, error_message,
sizeof(error_message)));
return -EIO;
}
}
/*
* Register a block of error codes.
*
* @block_name: the name of the block of error codes
* @first_error: the first error code in the block
* @next_free_error: one past the highest possible error in the block
* @infos: a pointer to the error info array for the block
* @info_size: the size of the error info array
*/
int uds_register_error_block(const char *block_name, int first_error,
int next_free_error, const struct error_info *infos,
size_t info_size)
{
int result;
struct error_block *block;
struct error_block new_block = {
.name = block_name,
.base = first_error,
.last = first_error + (info_size / sizeof(struct error_info)),
.max = next_free_error,
.infos = infos,
};
result = VDO_ASSERT(first_error < next_free_error,
"well-defined error block range");
if (result != VDO_SUCCESS)
return result;
if (registered_errors.count == registered_errors.allocated) {
/* This should never happen. */
return UDS_OVERFLOW;
}
for (block = registered_errors.blocks;
block < registered_errors.blocks + registered_errors.count;
block++) {
if (strcmp(block_name, block->name) == 0)
return UDS_DUPLICATE_NAME;
/* Ensure error ranges do not overlap. */
if ((first_error < block->max) && (next_free_error > block->base))
return UDS_ALREADY_REGISTERED;
}
registered_errors.blocks[registered_errors.count++] = new_block;
return UDS_SUCCESS;
}
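For illustration only (the "Demo Error" block, its base constant, and the info array are made-up placeholders, and the chosen range is assumed not to collide with any registered block), a caller would register an additional error block like this:

/* Hypothetical error block occupying its own numeric range. */
enum { DEMO_ERROR_BASE = 2048, DEMO_ERROR_BLOCK_END = DEMO_ERROR_BASE + 100 };

static const struct error_info demo_error_list[] = {
        { "DEMO_FIRST_ERROR", "First demonstration error" },
        { "DEMO_SECOND_ERROR", "Second demonstration error" },
};

static int register_demo_errors(void)
{
        /* The first two codes get names; the rest of the range up to
         * DEMO_ERROR_BLOCK_END is reserved but reported as unknown. */
        return uds_register_error_block("Demo Error", DEMO_ERROR_BASE,
                                        DEMO_ERROR_BLOCK_END, demo_error_list,
                                        sizeof(demo_error_list));
}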
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright 2023 Red Hat
*/
#ifndef UDS_ERRORS_H
#define UDS_ERRORS_H
#include <linux/compiler.h>
#include <linux/types.h>
/* Custom error codes and error-related utilities */
#define VDO_SUCCESS 0
/* Valid status codes for internal UDS functions. */
enum uds_status_codes {
/* Successful return */
UDS_SUCCESS = VDO_SUCCESS,
/* Used as a base value for reporting internal errors */
UDS_ERROR_CODE_BASE = 1024,
/* Index overflow */
UDS_OVERFLOW = UDS_ERROR_CODE_BASE,
/* Invalid argument passed to internal routine */
UDS_INVALID_ARGUMENT,
/* UDS data structures are in an invalid state */
UDS_BAD_STATE,
/* Attempt to enter the same name into an internal structure twice */
UDS_DUPLICATE_NAME,
/* An assertion failed */
UDS_ASSERTION_FAILED,
/* A request has been queued for later processing (not an error) */
UDS_QUEUED,
/* This error range has already been registered */
UDS_ALREADY_REGISTERED,
/* Attempt to read or write data outside the valid range */
UDS_OUT_OF_RANGE,
/* The index session is disabled */
UDS_DISABLED,
/* The index configuration or volume format is no longer supported */
UDS_UNSUPPORTED_VERSION,
/* Some index structure is corrupt */
UDS_CORRUPT_DATA,
/* No index state found */
UDS_NO_INDEX,
/* Attempt to access incomplete index save data */
UDS_INDEX_NOT_SAVED_CLEANLY,
/* One more than the last UDS_INTERNAL error code */
UDS_ERROR_CODE_LAST,
/* One more than the last error this block will ever use */
UDS_ERROR_CODE_BLOCK_END = UDS_ERROR_CODE_BASE + 440,
};
enum {
VDO_MAX_ERROR_NAME_SIZE = 80,
VDO_MAX_ERROR_MESSAGE_SIZE = 128,
};
struct error_info {
const char *name;
const char *message;
};
const char * __must_check uds_string_error(int errnum, char *buf, size_t buflen);
const char *uds_string_error_name(int errnum, char *buf, size_t buflen);
int uds_status_to_errno(int error);
int uds_register_error_block(const char *block_name, int first_error,
int last_reserved_error, const struct error_info *infos,
size_t info_size);
#endif /* UDS_ERRORS_H */
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright 2023 Red Hat
*/
#ifndef VDO_FLUSH_H
#define VDO_FLUSH_H
#include "funnel-workqueue.h"
#include "types.h"
#include "vio.h"
#include "wait-queue.h"
/* A marker for tracking which journal entries are affected by a flush request. */
struct vdo_flush {
/* The completion for enqueueing this flush request. */
struct vdo_completion completion;
/* The flush bios covered by this request */
struct bio_list bios;
/* The wait queue entry for this flush */
struct vdo_waiter waiter;
/* Which flush this struct represents */
sequence_number_t flush_generation;
};
struct flusher;
int __must_check vdo_make_flusher(struct vdo *vdo);
void vdo_free_flusher(struct flusher *flusher);
thread_id_t __must_check vdo_get_flusher_thread_id(struct flusher *flusher);
void vdo_complete_flushes(struct flusher *flusher);
void vdo_dump_flusher(const struct flusher *flusher);
void vdo_launch_flush(struct vdo *vdo, struct bio *bio);
void vdo_drain_flusher(struct flusher *flusher, struct vdo_completion *completion);
void vdo_resume_flusher(struct flusher *flusher, struct vdo_completion *parent);
#endif /* VDO_FLUSH_H */
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright 2023 Red Hat
*/
#include "funnel-queue.h"
#include "cpu.h"
#include "memory-alloc.h"
#include "permassert.h"
int vdo_make_funnel_queue(struct funnel_queue **queue_ptr)
{
int result;
struct funnel_queue *queue;
result = vdo_allocate(1, struct funnel_queue, "funnel queue", &queue);
if (result != VDO_SUCCESS)
return result;
/*
* Initialize the stub entry and put it in the queue, establishing the invariant that
* queue->newest and queue->oldest are never null.
*/
queue->stub.next = NULL;
queue->newest = &queue->stub;
queue->oldest = &queue->stub;
*queue_ptr = queue;
return VDO_SUCCESS;
}
void vdo_free_funnel_queue(struct funnel_queue *queue)
{
vdo_free(queue);
}
static struct funnel_queue_entry *get_oldest(struct funnel_queue *queue)
{
/*
* Barrier requirements: We need a read barrier between reading a "next" field pointer
* value and reading anything it points to. There's an accompanying barrier in
* vdo_funnel_queue_put() between its caller setting up the entry and making it visible.
*/
struct funnel_queue_entry *oldest = queue->oldest;
struct funnel_queue_entry *next = READ_ONCE(oldest->next);
if (oldest == &queue->stub) {
/*
* When the oldest entry is the stub and it has no successor, the queue is
* logically empty.
*/
if (next == NULL)
return NULL;
/*
* The stub entry has a successor, so the stub can be dequeued and ignored without
* breaking the queue invariants.
*/
oldest = next;
queue->oldest = oldest;
next = READ_ONCE(oldest->next);
}
/*
* We have a non-stub candidate to dequeue. If it lacks a successor, we'll need to put the
* stub entry back on the queue first.
*/
if (next == NULL) {
struct funnel_queue_entry *newest = READ_ONCE(queue->newest);
if (oldest != newest) {
/*
* Another thread has already swung queue->newest atomically, but not yet
* assigned previous->next. The queue is really still empty.
*/
return NULL;
}
/*
* Put the stub entry back on the queue, ensuring a successor will eventually be
* seen.
*/
vdo_funnel_queue_put(queue, &queue->stub);
/* Check again for a successor. */
next = READ_ONCE(oldest->next);
if (next == NULL) {
/*
* We lost a race with a producer who swapped queue->newest before we did,
* but who hasn't yet updated previous->next. Try again later.
*/
return NULL;
}
}
return oldest;
}
/*
* Poll a queue, removing the oldest entry if the queue is not empty. This function must only be
* called from a single consumer thread.
*/
struct funnel_queue_entry *vdo_funnel_queue_poll(struct funnel_queue *queue)
{
struct funnel_queue_entry *oldest = get_oldest(queue);
if (oldest == NULL)
return oldest;
/*
* Dequeue the oldest entry and return it. Only one consumer thread may call this function,
* so no locking, atomic operations, or fences are needed; queue->oldest is owned by the
* consumer and oldest->next is never used by a producer thread after it is swung from NULL
* to non-NULL.
*/
queue->oldest = READ_ONCE(oldest->next);
/*
* Make sure the caller sees the proper stored data for this entry. Since we've already
* fetched the entry pointer we stored in "queue->oldest", this also ensures that on entry
* to the next call we'll properly see the dependent data.
*/
smp_rmb();
/*
* If "oldest" is a very light-weight work item, we'll be looking for the next one very
* soon, so prefetch it now.
*/
uds_prefetch_address(queue->oldest, true);
WRITE_ONCE(oldest->next, NULL);
return oldest;
}
/*
* Check whether the funnel queue is empty or not. If the queue is in a transition state with one
* or more entries being added such that the list view is incomplete, this function will report the
* queue as empty.
*/
bool vdo_is_funnel_queue_empty(struct funnel_queue *queue)
{
return get_oldest(queue) == NULL;
}
/*
* Check whether the funnel queue is idle or not. If the queue has entries available to be
* retrieved, it is not idle. If the queue is in a transition state with one or more entries being
* added such that the list view is incomplete, it may not be possible to retrieve an entry with
* the vdo_funnel_queue_poll() function, but the queue will not be considered idle.
*/
bool vdo_is_funnel_queue_idle(struct funnel_queue *queue)
{
/*
* Oldest is not the stub, so there's another entry, though if next is NULL we can't
* retrieve it yet.
*/
if (queue->oldest != &queue->stub)
return false;
/*
* Oldest is the stub, but newest has been updated by _put(); either there's another,
* retrievable entry in the list, or the list is officially empty but in the intermediate
* state of having an entry added.
*
* Whether anything is retrievable depends on whether stub.next has been updated and become
* visible to us, but for idleness we don't care. And due to memory ordering in _put(), the
* update to newest would be visible to us at the same time or sooner.
*/
if (READ_ONCE(queue->newest) != &queue->stub)
return false;
return true;
}
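As a consumer-side sketch (illustration only, not part of the source), the loop below drains the queue and then uses the idle check to distinguish a truly empty queue from one where a preempted producer has published an entry that is not yet retrievable. The function name and the processing step are hypothetical.

static bool example_drain(struct funnel_queue *queue)
{
	struct funnel_queue_entry *entry;

	/* Only the single consumer thread may poll the queue. */
	while ((entry = vdo_funnel_queue_poll(queue)) != NULL) {
		/* Process 'entry'; it may be reused or freed as soon as it is returned. */
	}

	/*
	 * The queue can poll as empty while a preempted producer has swung queue->newest
	 * but has not yet linked its entry; report whether the queue is fully idle.
	 */
	return vdo_is_funnel_queue_idle(queue);
}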
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright 2023 Red Hat
*/
#ifndef VDO_FUNNEL_QUEUE_H
#define VDO_FUNNEL_QUEUE_H
#include <linux/atomic.h>
#include <linux/cache.h>
/*
* A funnel queue is a simple (almost) lock-free queue that accepts entries from multiple threads
* (multi-producer) and delivers them to a single thread (single-consumer). "Funnel" is an attempt
* to evoke the image of requests from more than one producer being "funneled down" to a single
* consumer.
*
* This is an unsynchronized but thread-safe data structure when used as intended. There is no
* mechanism to ensure that only one thread is consuming from the queue. If more than one thread
* attempts to consume from the queue, the resulting behavior is undefined. Clients must not
* directly access or manipulate the internals of the queue, which are only exposed for the purpose
* of allowing the very simple enqueue operation to be inlined.
*
* The implementation requires that a funnel_queue_entry structure (a link pointer) is embedded in
* the queue entries, and pointers to those structures are used exclusively by the queue. No macros
 * are defined to template the queue, so the offset of the funnel_queue_entry must be the same in
 * all records placed in the queue so that the client can derive its structure pointer from the
 * entry pointer returned by vdo_funnel_queue_poll().
*
* Callers are wholly responsible for allocating and freeing the entries. Entries may be freed as
* soon as they are returned since this queue is not susceptible to the "ABA problem" present in
* many lock-free data structures. The queue is dynamically allocated to ensure cache-line
* alignment, but no other dynamic allocation is used.
*
* The algorithm is not actually 100% lock-free. There is a single point in vdo_funnel_queue_put()
* at which a preempted producer will prevent the consumers from seeing items added to the queue by
* later producers, and only if the queue is short enough or the consumer fast enough for it to
* reach what was the end of the queue at the time of the preemption.
*
* The consumer function, vdo_funnel_queue_poll(), will return NULL when the queue is empty. To
* wait for data to consume, spin (if safe) or combine the queue with a struct event_count to
* signal the presence of new entries.
*/
/* This queue link structure must be embedded in client entries. */
struct funnel_queue_entry {
/* The next (newer) entry in the queue. */
struct funnel_queue_entry *next;
};
/*
* The dynamically allocated queue structure, which is allocated on a cache line boundary so the
* producer and consumer fields in the structure will land on separate cache lines. This should be
 * considered opaque, but it is exposed here so vdo_funnel_queue_put() can be inlined.
*/
struct __aligned(L1_CACHE_BYTES) funnel_queue {
/*
* The producers' end of the queue, an atomically exchanged pointer that will never be
* NULL.
*/
struct funnel_queue_entry *newest;
/* The consumer's end of the queue, which is owned by the consumer and never NULL. */
struct funnel_queue_entry *oldest __aligned(L1_CACHE_BYTES);
/* A dummy entry used to provide the non-NULL invariants above. */
struct funnel_queue_entry stub;
};
int __must_check vdo_make_funnel_queue(struct funnel_queue **queue_ptr);
void vdo_free_funnel_queue(struct funnel_queue *queue);
/*
* Put an entry on the end of the queue.
*
* The entry pointer must be to the struct funnel_queue_entry embedded in the caller's data
* structure. The caller must be able to derive the address of the start of their data structure
 * from the pointer that is passed in here, so every entry in the queue must have the struct
* funnel_queue_entry at the same offset within the client's structure.
*/
static inline void vdo_funnel_queue_put(struct funnel_queue *queue,
struct funnel_queue_entry *entry)
{
struct funnel_queue_entry *previous;
/*
* Barrier requirements: All stores relating to the entry ("next" pointer, containing data
* structure fields) must happen before the previous->next store making it visible to the
* consumer. Also, the entry's "next" field initialization to NULL must happen before any
* other producer threads can see the entry (the xchg) and try to update the "next" field.
*
* xchg implements a full barrier.
*/
WRITE_ONCE(entry->next, NULL);
previous = xchg(&queue->newest, entry);
/*
* Preemptions between these two statements hide the rest of the queue from the consumer,
* preventing consumption until the following assignment runs.
*/
WRITE_ONCE(previous->next, entry);
}
struct funnel_queue_entry *__must_check vdo_funnel_queue_poll(struct funnel_queue *queue);
bool __must_check vdo_is_funnel_queue_empty(struct funnel_queue *queue);
bool __must_check vdo_is_funnel_queue_idle(struct funnel_queue *queue);
#endif /* VDO_FUNNEL_QUEUE_H */
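As a usage sketch under the contract described in the header comment (not taken from the source), a client embeds a struct funnel_queue_entry at a fixed offset, producers enqueue with vdo_funnel_queue_put(), and the single consumer recovers its containing structure with container_of(). The example_task type and helper names are hypothetical.

#include <linux/container_of.h>

#include "funnel-queue.h"

/* A hypothetical client record; the entry must sit at the same offset in every record. */
struct example_task {
	struct funnel_queue_entry entry;
	int payload;
};

/* Producers may call this from any thread. */
static void example_submit(struct funnel_queue *queue, struct example_task *task)
{
	vdo_funnel_queue_put(queue, &task->entry);
}

/* Only the single consumer thread may call this. */
static struct example_task *example_next_task(struct funnel_queue *queue)
{
	struct funnel_queue_entry *entry = vdo_funnel_queue_poll(queue);

	return (entry == NULL) ? NULL : container_of(entry, struct example_task, entry);
}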
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright 2023 Red Hat
*/
#ifndef UDS_REQUEST_QUEUE_H
#define UDS_REQUEST_QUEUE_H
#include "indexer.h"
/*
* A simple request queue which will handle new requests in the order in which they are received,
* and will attempt to handle requeued requests before new ones. However, the nature of the
* implementation means that it cannot guarantee this ordering; the prioritization is merely a
* hint.
*/
struct uds_request_queue;
typedef void (*uds_request_queue_processor_fn)(struct uds_request *);
int __must_check uds_make_request_queue(const char *queue_name,
uds_request_queue_processor_fn processor,
struct uds_request_queue **queue_ptr);
void uds_request_queue_enqueue(struct uds_request_queue *queue,
struct uds_request *request);
void uds_request_queue_finish(struct uds_request_queue *queue);
#endif /* UDS_REQUEST_QUEUE_H */
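A minimal lifecycle sketch for this interface (illustration only, not from the source): create the queue with a processing callback, enqueue requests from any thread, and finish the queue once no further requests will arrive. The callback body and function names are hypothetical, and treating UDS_SUCCESS as the success code of uds_make_request_queue() is an assumption.

#include "indexer.h"
/* The request queue declarations above are assumed to be in scope. */

/* Hypothetical processor: invoked by the queue's worker for each dequeued request. */
static void example_process(struct uds_request *request)
{
	/* Handle or forward the request; the details depend on the indexer. */
}

static int example_request_queue_lifecycle(struct uds_request *request)
{
	struct uds_request_queue *queue;
	int result;

	result = uds_make_request_queue("example-queue", example_process, &queue);
	if (result != UDS_SUCCESS)
		return result;

	/* New requests are handled in arrival order; requeued requests are merely preferred. */
	uds_request_queue_enqueue(queue, request);

	/* Shut down once no further requests will be enqueued. */
	uds_request_queue_finish(queue);
	return UDS_SUCCESS;
}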