Commit e90477dd authored by Patrick Mochel's avatar Patrick Mochel

Update driver model documentation.

parent 68239661
The (New) Linux Kernel Driver Model
Version 0.04
Patrick Mochel <mochel@osdl.org>
03 December 2001
Overview
~~~~~~~~
This driver model is a unification of all the current, disparate driver models
that are currently in the kernel. It is intended is to augment the
bus-specific drivers for bridges and devices by consolidating a set of data
and operations into globally accessible data structures.
Current driver models implement some sort of tree-like structure (sometimes
just a list) for the devices they control. But, there is no linkage between
the different bus types.
A common data structure can provide this linkage with little overhead: when a
bus driver discovers a particular device, it can insert it into the global
tree as well as its local tree. In fact, the local tree becomes just a subset
of the global tree.
Common data fields can also be moved out of the local bus models into the
global model. Some of the manipulation of these fields can also be
consolidated. Most likely, manipulation functions will become a set
of helper functions, which the bus drivers wrap around to include any
bus-specific items.
The common device and bridge interface currently reflects the goals of the
modern PC: namely the ability to do seamless Plug and Play, power management,
and hot plug. (The model dictated by Intel and Microsoft (read: ACPI) ensures
us that any device in the system may fit any of these criteria.)
In reality, not every bus will be able to support such operations. But, most
buses will support a majority of those operations, and all future buses will.
In other words, a bus that doesn't support an operation is the exception,
instead of the other way around.
Drivers
~~~~~~~
The callbacks for bridges and devices are intended to be singular for a
particular type of bus. For each type of bus that has support compiled in the
kernel, there should be one statically allocated structure with the
appropriate callbacks that each device (or bridge) of that type share.
Each bus layer should implement the callbacks for these drivers. It then
forwards the calls on to the device-specific callbacks. This means that
device-specific drivers must still implement callbacks for each operation.
But, they are not called from the top level driver layer. [So for example
PCI devices will not call device_register but pci_device_register.]
This does add another layer of indirection for calling one of these functions,
but there are benefits that are believed to outweigh this slowdown.
First, it prevents device-specific drivers from having to know about the
global device layer. This speeds up integration time incredibly. It also
allows drivers to be more portable across kernel versions. Note that the
former was intentional, the latter is an added bonus.
Second, this added indirection allows the bus to perform any additional logic
necessary for its child devices. A bus layer may add additional information to
the call, or translate it into something meaningful for its children.
This could be done in the driver, but if it happens for every object of a
particular type, it is best done at a higher level.
Recap
~~~~~
Instances of devices and bridges are allocated dynamically as the system
discovers their existence. Their fields describe the individual object.
Drivers - in the global sense - are statically allocated and singular for a
particular type of bus. They describe a set of operations that every type of
bus could implement, the implementation following the bus's semantics.
Downstream Access
~~~~~~~~~~~~~~~~~
Common data fields have been moved out of individual bus layers into a common
data structure. But, these fields must still be accessed by the bus layers,
and sometimes by the device-specific drivers.
Other bus layers are encouraged to do what has been done for the PCI layer.
struct pci_dev now looks like this:
struct pci_dev {
...
struct device device;
};
Note first that it is statically allocated. This means only one allocation on
device discovery. Note also that it is at the _end_ of struct pci_dev. This is
to make people think about what they're doing when switching between the bus
driver and the global driver; and to prevent against mindless casts between
the two.
The PCI bus layer freely accesses the fields of struct device. It knows about
the structure of struct pci_dev, and it should know the structure of struct
device. PCI devices that have been converted generally do not touch the fields
of struct device. More precisely, device-specific drivers should not touch
fields of struct device unless there is a strong compelling reason to do so.
This abstraction is prevention of unnecessary pain during transitional phases.
If the name of the field changes or is removed, then every downstream driver
will break. On the other hand, if only the bus layer (and not the device
layer) accesses struct device, it is only those that need to change.
User Interface
~~~~~~~~~~~~~~
By virtue of having a complete hierarchical view of all the devices in the
system, exporting a complete hierarchical view to userspace becomes relatively
easy. This has been accomplished by implementing a special purpose virtual
file system named driverfs. It is hence possible for the user to mount the
whole driverfs on a particular mount point in the unified UNIX file hierarchy.
This can be done permanently by providing the following entry into the
/dev/fstab (under the provision that the mount point does exist, of course):
none /devices driverfs defaults 0 0
Or by hand on the command line:
~: mount -t driverfs none /devices
Whenever a device is inserted into the tree, a directory is created for it.
This directory may be populated at each layer of discovery - the global layer,
the bus layer, or the device layer.
The global layer currently creates two files - 'status' and 'power'. The
former only reports the name of the device and its bus ID. The latter reports
the current power state of the device. It also be used to set the current
power state.
The bus layer may also create files for the devices it finds while probing the
bus. For example, the PCI layer currently creates 'wake' and 'resource' files
for each PCI device.
A device-specific driver may also export files in its directory to expose
device-specific data or tunable interfaces.
These features were initially implemented using procfs. However, after one
conversation with Linus, a new filesystem - driverfs - was created to
implement these features. It is an in-memory filesystem, based heavily off of
ramfs, though it uses procfs as inspiration for its callback functionality.
Each struct device has a 'struct driver_dir_entry' which encapsulates the
device's directory and the files within.
Device Structures
~~~~~~~~~~~~~~~~~
struct device {
struct list_head bus_list;
struct iobus *parent;
struct iobus *subordinate;
char name[DEVICE_NAME_SIZE];
char bus_id[BUS_ID_SIZE];
struct driver_dir_entry * dir;
spinlock_t lock;
atomic_t refcount;
struct device_driver *driver;
void *driver_data;
void *platform_data;
u32 current_state;
unsigned char *saved_state;
};
bus_list:
List of all devices on a particular bus; i.e. the device's siblings
parent:
The parent bridge for the device.
subordinate:
If the device is a bridge itself, this points to the struct io_bus that is
created for it.
name:
Human readable (descriptive) name of device. E.g. "Intel EEPro 100"
bus_id:
Parsable (yet ASCII) bus id. E.g. "00:04.00" (PCI Bus 0, Device 4, Function
0). It is necessary to have a searchable bus id for each device; making it
ASCII allows us to use it for its directory name without translating it.
dir:
Driver's driverfs directory.
lock:
Driver specific lock.
refcount:
Driver's usage count.
When this goes to 0, the device is assumed to be removed. It will be removed
from its parent's list of children. It's remove() callback will be called to
inform the driver to clean up after itself.
driver:
Pointer to a struct device_driver, the common operations for each device. See
next section.
driver_data:
Private data for the driver.
Much like the PCI implementation of this field, this allows device-specific
drivers to keep a pointer to a device-specific data.
platform_data:
Data that the platform (firmware) provides about the device.
For example, the ACPI BIOS or EFI may have additional information about the
device that is not directly mappable to any existing kernel data structure.
It also allows the platform driver (e.g. ACPI) to a driver without the driver
having to have explicit knowledge of (atrocities like) ACPI.
current_state:
Current power state of the device. For PCI and other modern devices, this is
0-3, though it's not necessarily limited to those values.
saved_state:
Pointer to driver-specific set of saved state.
Having it here allows modules to be unloaded on system suspend and reloaded
on resume and maintain state across transitions.
It also allows generic drivers to maintain state across system state
transitions.
(I've implemented a generic PCI driver for devices that don't have a
device-specific driver. Instead of managing some vector of saved state
for each device the generic driver supports, it can simply store it here.)
struct device_driver {
int (*probe) (struct device *dev);
int (*remove) (struct device *dev);
int (*suspend) (struct device *dev, u32 state, u32 level);
int (*resume) (struct device *dev, u32 level);
}
probe:
Check for device existence and associate driver with it. In case of device
insertion, *all* drivers are called. Struct device has parent and bus_id
valid at this point. probe() may only be called from process context. Returns
0 if it handles that device, -ESRCH if this driver does not know how to handle
this device, valid error otherwise.
remove:
Dissociate driver with device. Releases device so that it could be used by
another driver. Also, if it is a hotplug device (hotplug PCI, Cardbus), an
ejection event could take place here. remove() can be called from interrupt
context. [Fixme: Is that good?] Returns 0 on success. [Can we recover from
failed remove or should I define that remove() never fails?]
suspend:
Perform one step of the device suspend process. Returns 0 on success.
resume:
Perform one step of the device resume process. Returns 0 on success.
The probe() and remove() callbacks are intended to be much simpler than the
current PCI correspondents.
probe() should do the following only:
- Check if hardware is present
- Register device interface
- Disable DMA/interrupts, etc, just in case.
Some device initialisation was done in probe(). This should not be the case
anymore. All initialisation should take place in the open() call for the
device. [FIXME: How do you "open" uhci?]
Breaking initialisation code out must also be done for the resume() callback,
as most devices will have to be completely reinitialised when coming back from
a suspend state.
remove() should simply unregister the device interface.
Device power management can be quite complicated, based exactly what is
desired to be done. Four operations sum up most of it:
- OS directed power management.
The OS takes care of notifying all drivers that a suspend is requested,
saving device state, and powering devices down.
- Firmware controlled power management.
The OS only wants to notify devices that a suspend is requested.
- Device power management.
A user wants to place only one device in a low power state, and maybe save
state.
- System reboot.
The system wants to place devices in a quiescent state before the system is
reset.
In an attempt to please all of these scenarios, the power management
transition for any device is broken up into several stages - notify, save
state, and power down. The disable stage, which should happen after notify and
before save state has been considered and may be implemented in the future.
Depending on what the system-wide policy is (usually dictated by the power
management scheme present), each driver's suspend callback may be called
multiple times, each with a different stage.
On all power management transitions, the stages should be called sequentially
(notify before save state; save state before power down). However, drivers
should not assume that any stage was called before hand. (If a driver gets a
power down call, it shouldn't assume notify or save state was called first.)
This allows the framework to be used seamlessly by all power management
actions. Hopefully.
Resume transitions happen in a similar manner. They are broken up into two
stages currently (power on and restore state), though a third stage (enable)
may be added later.
For suspend and resume transitions, the following values are defined to denote
the stage:
enum{
SUSPEND_NOTIFY,
SUSPEND_DISABLE,
SUSPEND_SAVE_STATE,
SUSPEND_POWER_DOWN,
};
enum {
RESUME_POWER_ON,
RESUME_RESTORE_STATE,
RESUME_ENABLE,
};
During a system power transition, the device tree must be walked in order,
calling the suspend() or resume() callback for each node. This may happen
several times.
Initially, this was done in kernel space. However, it has occurred to me that
doing recursion to a non-bounded depth is dangerous, and that there are a lot
of inherent race conditions in such an operation.
Non-recursive walking of the device tree is possible. However, this makes for
convoluted code.
No matter what, if the transition happens in kernel space, it is difficult to
gracefully recover from errors or to implement a policy that prevents one from
shutting down the device(s) you want to save state to.
Instead, the walking of the device tree has been moved to userspace. When a
user requests the system to suspend, it will walk the device tree, as exported
via driverfs, and tell each device to go to sleep. It will do this multiple
times based on what the system policy is. [Not possible. Take ACPI enabled
system, with battery critically low. In such state, you want to suspend-to-disk,
*fast*. User maybe is not even running powerd (think system startup)!]
Device resume should happen in the same manner when the system awakens.
Each suspend stage is described below:
SUSPEND_NOTIFY:
This level to notify the driver that it is going to sleep. If it knows that it
cannot resume the hardware from the requested level, or it feels that it is
too important to be put to sleep, it should return an error from this function.
It does not have to stop I/O requests or actually save state at this point. Called
from process context.
SUSPEND_DISABLE:
The driver should stop taking I/O requests at this stage. Because the save
state stage happens afterwards, the driver may not want to physically disable
the device; only mark itself unavailable if possible. Called from process
context.
SUSPEND_SAVE_STATE:
The driver should allocate memory and save any device state that is relevant
for the state it is going to enter. Called from process context.
SUSPEND_POWER_DOWN:
The driver should place the device in the power state requested. May be called
from interrupt context.
For resume, the stages are defined as follows:
RESUME_POWER_ON:
Devices should be powered on and reinitialised to some known working state.
Called from process context.
RESUME_RESTORE_STATE:
The driver should restore device state to its pre-suspend state and free any
memory allocated for its saved state. Called from process context.
RESUME_ENABLE:
The device should start taking I/O requests again. Called from process context.
Each driver does not have to implement each stage. But, it if it does
implement a stage, it should do what is described above. It should not assume
that it performed any stage previously, or that it will perform any stage
later. [Really? It makes sense to support SAVE_STATE only after DISABLE].
It is quite possible that a driver can fail during the suspend process, for
whatever reason. In this event, the calling process must gracefully recover
and restore everything to their states before the suspend transition began.
[Suspend may not fail, think battery low.]
If a driver knows that it cannot suspend or resume properly, it should fail
during the notify stage. Properly implemented power management schemes should
make sure that this is the first stage that is called.
If a driver gets a power down request, it should obey it, as it may very
likely be during a reboot.
Bus Structures
~~~~~~~~~~~~~~
struct iobus {
struct list_head node;
struct iobus *parent;
struct list_head children;
struct list_head devices;
struct list_head bus_list;
spinlock_t lock;
atomic_t refcount;
struct device *self;
struct driver_dir_entry * dir;
char name[DEVICE_NAME_SIZE];
char bus_id[BUS_ID_SIZE];
struct bus_driver *driver;
};
node:
Bus's node in sibling list (its parent's list of child buses).
parent:
Pointer to parent bridge.
children:
List of subordinate buses.
In the children, this correlates to their 'node' field.
devices:
List of devices on the bus this bridge controls.
This field corresponds to the 'bus_list' field in each child device.
bus_list:
Each type of bus keeps a list of all bridges that it finds. This is the
bridges entry in that list.
self:
Pointer to the struct device for this bridge.
lock:
Lock for the bus.
refcount:
Usage count for the bus.
dir:
Driverfs directory.
name:
Human readable ASCII name of bus.
bus_id:
Machine readable (though ASCII) description of position on parent bus.
driver:
Pointer to operations for bus.
struct iobus_driver {
char name[16];
struct list_head node;
int (*scan) (struct io_bus*);
int (*add_device) (struct io_bus*, char*);
};
name:
ASCII name of bus.
node:
List of buses of this type in system.
scan:
Search the bus for new devices. This may happen either at boot - where every
device discovered will be new - or later on - in which there may only be a few
(or no) new devices.
add_device:
Trigger a device insertion at a particular location.
The API
~~~~~~~
There are several functions exported by the global device layer, including
several optional helper functions, written solely to try and make your life
easier.
void device_init_dev(struct device * dev);
Initialise a device structure. It first zeros the device, the initialises all
of the lists. (Note that this would have been called device_init(), but that
name was already taken. :/)
struct device * device_alloc(void)
Allocate memory for a device structure and initialise it.
First, allocates memory, then calls device_init_dev() with the new pointer.
int device_register(struct device * dev);
Register a device with the global device layer.
The bus layer should call this function upon device discovery, e.g. when
probing the bus.
dev should be fully initialised when this is called.
If dev->parent is not set, it sets its parent to be the device root.
It then does the following:
- inserts it into its parent's list of children
- creates a driverfs directory for it
- creates a set of default files for the device in its directory
- calls platform_notify() to notify the firmware driver of its existence.
void get_device(struct device * dev);
Increment the refcount for a device.
int valid_device(struct device * dev);
Check if reference count is positive for a device (it's not waiting to be
freed). If it is positive, it increments the reference count for the device.
It returns whether or not the device is usable.
void put_device(struct device * dev);
Decrement the reference count for the device. If it hits 0, it removes the
device from its parent's list of children and calls the remove() callback for
the device.
void lock_device(struct device * dev);
Take the spinlock for the device.
void unlock_device(struct device * dev);
Release the spinlock for the device.
void iobus_init(struct iobus * iobus);
struct iobus * iobus_alloc(void);
int iobus_register(struct iobus * iobus);
void get_iobus(struct iobus * iobus);
int valid_iobus(struct iobus * iobus);
void put_iobus(struct iobus * iobus);
void lock_iobus(struct iobus * iobus);
void unlock_iobus(struct iobus * iobus);
These functions provide the same functionality as the device_*
counterparts, only operating on a struct iobus. One important thing to note,
though is that iobus_register() and iobus_unregister() operate recursively. It
is possible to add an entire tree in one call.
int device_driver_init(void);
Main initialisation routine.
This makes sure driverfs is up and running and initialises the device tree.
void device_driver_exit(void);
This frees up the device tree.
Credits
~~~~~~~
The following people have been extremely helpful in solidifying this document
and the driver model.
Randy Dunlap rddunlap@osdl.org
Jeff Garzik jgarzik@mandrakesoft.com
Ben Herrenschmidt benh@kernel.crashing.org
Driver Binding
Driver binding is the process of associating a device with a device
driver that can control it. Bus drivers have typically handled this
because there have been bus-specific structures to represent the
devices and the drivers. With generic device and device driver
structures, most of the binding can take place using common code.
Bus
~~~
The bus type structure contains a list of all devices that on that bus
type in the system. When device_register is called for a device, it is
inserted into the end of this list. The bus object also contains a
list of all drivers of that bus type. When driver_register is called
for a driver, it is inserted into the end of this list. These are the
two events which trigger driver binding.
device_register
~~~~~~~~~~~~~~~
When a new device is added, the bus's list of drivers is iterated over
to find one that supports it. In order to determine that, the device
ID of the device must match one of the device IDs that the driver
supports. The format and semantics for comparing IDs is bus-specific.
Instead of trying to derive a complex state machine and matching
algorithm, it is up to the bus driver to provide a callback to compare
a device against the IDs of a driver. The bus returns 1 if a match was
found; 0 otherwise.
int match(struct device * dev, struct device_driver * drv);
If a match is found, the device's driver field is set to the driver
and the driver's probe callback is called. This gives the driver a
chance to verify that it really does support the hardware, and that
it's in a working state.
Device Class
~~~~~~~~~~~~
Upon the successful completion of probe, the device is registered with
the class to which it belongs. Device drivers belong to one and only
class, and that is set in the driver's devclass field.
devclass_add_device is called to enumerate the device within the class
and actually register it with the class, which happens with the
class's register_dev callback.
NOTE: The device class structures and core routines to manipulate them
are not in the mainline kernel, so the discussion is still a bit
speculative.
Driver
~~~~~~
When a driver is attached to a device, the device is inserted into the
driver's list of devices.
driverfs
~~~~~~~~
A symlink is created in the bus's 'devices' directory that points to
the device's directory in the physical hierarchy.
A symlink is created in the driver's 'devices' directory that points
to the device's directory in the physical hierarchy.
A directory for the device is created in the class's directory. A
symlink is created in that directory that points to the device's
physical location in the driverfs tree.
A symlink can be created (though this isn't done yet) in the device's
physical directory to either its class directory, or the class's
top-level directory. One can also be created to point to its driver's
directory also.
driver_register
~~~~~~~~~~~~~~~
The process is almost identical for when a new driver is added.
The bus's list of devices is iterated over to find a match. Devices
that already have a driver are skipped. All the devices are iterated
over, to bind as many devices as possible to the driver.
Removal
~~~~~~~
When a device is removed, the reference count for it will eventually
go to 0. When it does, the remove callback of the driver is called. It
is removed from the driver's list of devices and the reference count
of the driver is decremented. All symlinks between the two are removed.
When a driver is removed, the list of devices that it supports is
iterated over, and the driver's remove callback is called for each
one. The device is removed from that list and the symlinks removed.
Bus Types
Definition
~~~~~~~~~~
struct bus_type {
char * name;
rwlock_t lock;
atomic_t refcount;
struct list_head node;
struct list_head devices;
struct list_head drivers;
struct driver_dir_entry dir;
struct driver_dir_entry device_dir;
struct driver_dir_entry driver_dir;
int (*match) (struct device * dev, struct device_driver * drv);
struct device (*add) (struct device * parent, char * bus_id);
};
int bus_register(struct bus_type * bus);
Declaration
~~~~~~~~~~~
Each bus type in the kernel (PCI, USB, etc) should declare one static
object of this type. They must initialize the name field, and may
optionally initialize the match callback.
struct bus_type pci_bus_type = {
name: "pci",
match: pci_bus_match,
};
The structure should be exported to drivers in a header file:
extern struct bus_type pci_bus_type;
Registration
~~~~~~~~~~~~
When a bus driver is initialized, it calls bus_register. This
initializes the rest of the fields in the bus object and inserts it
into a global list of bus types. Once the bus object is registered,
the fields in it (e.g. the rwlock_t) are usable by the bus driver.
Callbacks
~~~~~~~~~
match(): Attaching Drivers to Devices
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The format of device ID structures and the semantics for comparing
them are inherently bus-specific. Drivers typically declare an array
of device IDs of device they support that reside in a bus-specific
driver structure.
The purpose of the match callback is provide the bus an opportunity to
determine if a particular driver supports a particular device by
comparing the device IDs the driver supports with the device ID of a
particular device, without sacrificing bus-specific functionality or
type-safety.
When a driver is registered with the bus, the bus's list of devices is
iterated over, and the match callback is called for each device that
does not have a driver associated with it.
add(): Adding a child device
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The add callback is available to notify the bus about a child device
at a particular location.
The parent parameter is the parent device of the child to be added. If
parent == NULL, the bus should add the device as a child of a default
parent device or as a child of the root. This policy decision is up to
the bus driver.
The format of the bus_id field should be consistent with the format of
the bus_id field of the rest of the devices on the bus. This requires
the caller to know the format.
On return, the bus driver should return a pointer to the device that
was created. If the device was not created, the bus driver should
return an appropriate error code. Refer to include/linux/err.h for
helper functions to encode errors. Some sample code:
struct device * pci_bus_add(struct device * parent, char * bus_id)
{
...
/* the device already exists */
return ERR_PTR(-EEXIST);
...
}
The caller can check the return value using IS_ERR():
struct device * newdev = pci_bus_type.add(parent,bus_id);
if (IS_ERR(newdev)) {
...
}
Device and Driver Lists
~~~~~~~~~~~~~~~~~~~~~~~
The lists of devices and drivers are intended to replace the local
lists that many buses keep. They are lists of struct devices and
struct device_drivers, respectively. Bus drivers are free to use the
lists as they please, but conversion to the bus-specific type may be
necessary.
The LDM core provides helper functions for iterating over each list.
int bus_for_each_dev(struct bus_type * bus, void * data,
int (*callback)(struct device * dev, void * data));
int bus_for_each_drv(struct bus_type * bus, void * data,
int (*callback)(struct device_driver * drv, void * data));
These helpers iterate over the respective list, and call the callback
for each device or driver in the list. All list accesses are
synchronized by taking the bus's lock (read currently). The reference
count on each object in the list is incremented before the callback is
called; it is decremented after the next object has been obtained. The
lock is not held when calling the callback.
driverfs
~~~~~~~~
There is a top-level directory named 'bus'.
Each bus gets a directory in the bus directory, along with two default
directories:
/sys/bus/pci/
|-- devices
`-- drivers
Drivers registered with the bus get a directory in the bus's drivers
directory:
/sys/bus/pci/
|-- devices
`-- drivers
|-- Intel ICH
|-- Intel ICH Joystick
|-- agpgart
`-- e100
Each device that is discovered a bus of that type gets a symlink in
the bus's devices directory to the device's directory in the physical
hierarchy:
/sys/bus/pci/
|-- devices
| |-- 00:00.0 -> ../../../root/pci0/00:00.0
| |-- 00:01.0 -> ../../../root/pci0/00:01.0
| `-- 00:02.0 -> ../../../root/pci0/00:02.0
`-- drivers
Exporting Attributes
~~~~~~~~~~~~~~~~~~~~
struct bus_attribute {
struct attribute attr;
ssize_t (*show)(struct bus_type *, char * buf, size_t count, loff_t off);
ssize_t (*store)(struct bus_type *, const char * buf, size_t count, loff_t off);
};
Bus drivers can export attributes using the BUS_ATTR macro that works
similarly to the DEVICE_ATTR macro for devices. For example, a definition
like this:
static BUS_ATTR(debug,0644,show_debug,store_debug);
is equivalent to declaring:
static bus_attribute bus_attr_debug;
This can then be used to add and remove the attribute from the bus's
driverfs directory using:
int bus_create_file(struct bus_type *, struct bus_attribute *);
void bus_remove_file(struct bus_type *, struct bus_attribute *);
Device Classes
Introduction
~~~~~~~~~~~~
A device class describes a type of device, like an audio or network
device. The following device classes have been identified:
<Insert List of Device Classes Here>
Each device class defines a set of semantics and a programming interface
that devices of that class adhere to. Device drivers are the
implemention of that programming interface for a particular device on
a particular bus.
Device classes are agnostic with respect to what bus a device resides
on.
Programming Interface
~~~~~~~~~~~~~~~~~~~~~
The device class structure looks like:
typedef int (*devclass_add)(struct device *);
typedef void (*devclass_remove)(struct device *);
struct device_class {
char * name;
rwlock_t lock;
u32 devnum;
struct list_head node;
struct list_head drivers;
struct list_head intf_list;
struct driver_dir_entry dir;
struct driver_dir_entry device_dir;
struct driver_dir_entry driver_dir;
devclass_add add_device;
devclass_remove remove_device;
};
A typical device class definition would look like:
struct device_class input_devclass = {
.name = "input",
.add_device = input_add_device,
.remove_device = input_remove_device,
};
Each device class structure should be exported in a header file so it
can be used by drivers, extensions and interfaces.
Device classes are registered and unregistered with the core using:
int devclass_register(struct device_class * cls);
void devclass_unregister(struct device_class * cls);
Devices
~~~~~~~
As devices are bound to drivers, they are added to the device class
that the driver belongs to. Before the driver model core, this would
typically happen during the driver's probe() callback, once the device
has been initialized. It now happens after the probe() callback
finishes from the core.
The device is enumerated in the class. Each time a device is added to
the class, the class's devnum field is incremented and assigned to the
device. The field is never decremented, so if the device is removed
from the class and re-added, it will receive a different enumerated
value.
The class is allowed to create a class-specific structure for the
device and store it in the device's class_data pointer.
There is no list of devices in the device class. Each driver has a
list of devices that it supports. The device class has a list of
drivers of that particular class. To access all of the devices in the
class, iterate over the device lists of each driver in the class.
Device Drivers
~~~~~~~~~~~~~~
Device drivers are added to device classes when they are registered
with the core. A driver specifies the class it belongs to by setting
the struct device_driver::devclass field.
driverfs directory structure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There is a top-level driverfs directory named 'class'.
Each class gets a directory in the class directory, along with two
default subdirectories:
class/
`-- input
|-- devices
`-- drivers
Drivers registered with the class get a symlink in the drivers/ directory
that points the driver's directory (under its bus directory):
class/
`-- input
|-- devices
`-- drivers
`-- usb:usb_mouse -> ../../../bus/drivers/usb_mouse/
Each device gets a symlink in the devices/ directory that points to the
device's directory in the physical hierarchy:
class/
`-- input
|-- devices
| `-- 1 -> ../../../root/pci0/00:1f.0/usb_bus/00:1f.2-1:0/
`-- drivers
Exporting Attributes
~~~~~~~~~~~~~~~~~~~~
struct devclass_attribute {
struct attribute attr;
ssize_t (*show)(struct device_class *, char * buf, size_t count, loff_t off);
ssize_t (*store)(struct device_class *, const char * buf, size_t count, loff_t off);
};
Class drivers can export attributes using the DEVCLASS_ATTR macro that works
similarly to the DEVICE_ATTR macro for devices. For example, a definition
like this:
static DEVCLASS_ATTR(debug,0644,show_debug,store_debug);
is equivalent to declaring:
static devclass_attribute devclass_attr_debug;
The bus driver can add and remove the attribute from the class's
driverfs directory using:
int devclass_create_file(struct device_class *, struct devclass_attribute *);
void devclass_remove_file(struct device_class *, struct devclass_attribute *);
In the example above, the file will be named 'debug' in placed in the
class's directory in driverfs.
Interfaces
~~~~~~~~~~
There may exist multiple mechanisms for accessing the same device of a
particular class type. Device interfaces describe these mechanisms.
When a device is added to a device class, the core attempts to add it
to every interface that is registered with the device class.
The Basic Device Structure
~~~~~~~~~~~~~~~~~~~~~~~~~~
struct device {
struct list_head g_list;
struct list_head node;
struct list_head bus_list;
struct list_head driver_list;
struct list_head intf_list;
struct list_head children;
struct device * parent;
char name[DEVICE_NAME_SIZE];
char bus_id[BUS_ID_SIZE];
spinlock_t lock;
atomic_t refcount;
struct bus_type * bus;
struct driver_dir_entry dir;
u32 class_num;
struct device_driver *driver;
void *driver_data;
void *platform_data;
u32 current_state;
unsigned char *saved_state;
void (*release)(struct device * dev);
};
Fields
~~~~~~
g_list: Node in the global device list.
node: Node in device's parent's children list.
bus_list: Node in device's bus's devices list.
driver_list: Node in device's driver's devices list.
intf_list: List of intf_data. There is one structure allocated for
each interface that the device supports.
children: List of child devices.
name: ASCII description of device.
Example: " 3Com Corporation 3c905 100BaseTX [Boomerang]"
bus_id: ASCII representation of device's bus position. This
field should a name unique across all devices on the
bus type the device belongs to.
Example: PCI bus_ids are in the form of
<bus number>:<slot number>.<function number>
This name is unique across all PCI devices in the system.
lock: Spinlock for the device.
refcount: Reference count on the device.
bus: Pointer to struct bus_type that device belongs to.
dir: Device's driverfs directory.
driver: Pointer to struct device_driver that controls the device.
driver_data: Driver-specific data.
class_num: Class-enumerated value of the device.
platform_data: Platform data specific to the device.
current_state: Current power state of the device.
saved_state: Pointer to saved state of the device. This is usable by
the device driver controlling the device.
release: Callback to free the device after all references have
gone away. This should be set by the allocator of the
device (i.e. the bus driver that discovered the device).
Programming Interface
~~~~~~~~~~~~~~~~~~~~~
The bus driver that discovers the device uses this to register the
device with the core:
int device_register(struct device * dev);
The bus should initialize the following fields:
- parent
- name
- bus_id
- bus
A device is removed from the core when its reference count goes to
0. The reference count can be adjusted using:
struct device * get_device(struct device * dev);
void put_device(struct device * dev);
get_device() will return a pointer to the struct device passed to it
if the reference is not already 0 (if it's in the process of being
removed already).
A driver can take use the lock in the device structure using:
void lock_device(struct device * dev);
void unlock_device(struct device * dev);
Attributes
~~~~~~~~~~
struct device_attribute {
struct attribute attr;
ssize_t (*show)(struct device * dev, char * buf, size_t count, loff_t off);
ssize_t (*store)(struct device * dev, const char * buf, size_t count, loff_t off);
};
Attributes of devices can be exported via drivers using a simple
procfs-like interface.
Please see Documentation/filesystems/driverfs.txt for more information
on how driverfs works.
Attributes are declared using a macro called DEVICE_ATTR:
#define DEVICE_ATTR(name,mode,show,store)
Example:
DEVICE_ATTR(power,0644,show_power,store_power);
This declares a structure of type struct device_attribute named
'dev_attr_power'. This can then be added and removed to the device's
directory using:
int device_create_file(struct device *device, struct device_attribute * entry);
void device_remove_file(struct device * dev, struct device_attribute * attr);
Example:
device_create_file(dev,&dev_attr_power);
device_remove_file(dev,&dev_attr_power);
The file name will be 'power' with a mode of 0644 (-rw-r--r--).
Device Drivers
struct device_driver {
char * name;
struct bus_type * bus;
rwlock_t lock;
atomic_t refcount;
list_t bus_list;
list_t devices;
struct driver_dir_entry dir;
int (*probe) (struct device * dev);
int (*remove) (struct device * dev);
int (*suspend) (struct device * dev, u32 state, u32 level);
int (*resume) (struct device * dev, u32 level);
void (*release) (struct device_driver * drv);
};
Allocation
~~~~~~~~~~
Device drivers are statically allocated structures. Though there may
be multiple devices in a system that a driver supports, struct
device_driver represents the driver as a whole (not a particular
device instance).
Initialization
~~~~~~~~~~~~~~
The driver must initialize at least the name and bus fields. It should
also initalize the devclass field (when it arrives), so it may obtain
the proper linkage internally. It should also initialize as many of
the callbacks as possible, though each is optional.
Declaration
~~~~~~~~~~~
As stated above, struct device_driver objects are statically
allocated. Below is an example declaration of the eepro100
driver. This declaration is hypothetical only; it relies on the driver
being converted completely to the new model.
static struct device_driver eepro100_driver = {
name: "eepro100",
bus: &pci_bus_type,
devclass: &ethernet_devclass, /* when it's implemented */
probe: eepro100_probe,
remove: eepro100_remove,
suspend: eepro100_suspend,
resume: eepro100_resume,
};
Most drivers will not be able to be converted completely to the new
model because the bus they belong to has a bus-specific structure with
bus-specific fields that cannot be generalized.
The most common example this are device ID structures. A driver
typically defines an array of device IDs that it supports. The format
of this structure and the semantics for comparing device IDs is
completely bus-specific. Defining them as bus-specific entities would
sacrifice type-safety, so we keep bus-specific structures around.
Bus-specific drivers should include a generic struct device_driver in
the definition of the bus-specific driver. Like this:
struct pci_driver {
const struct pci_device_id *id_table;
struct device_driver driver;
};
A definition that included bus-specific fields would look something
like (using the eepro100 driver again):
static struct pci_driver eepro100_driver = {
id_table: eepro100_pci_tbl,
driver: {
name: "eepro100",
bus: &pci_bus_type,
devclass: &ethernet_devclass, /* when it's implemented */
probe: eepro100_probe,
remove: eepro100_remove,
suspend: eepro100_suspend,
resume: eepro100_resume,
},
};
Some may find the syntax of embedded struct intialization awkward or
even a bit ugly. So far, it's the best way we've found to do what we want...
Registration
~~~~~~~~~~~~
int driver_register(struct device_driver * drv);
The driver registers the structure on startup. For drivers that have
no bus-specific fields (i.e. don't have a bus-specific driver
structure), they would use driver_register and pass a pointer to their
struct device_driver object.
Most drivers, however, will have a bus-specific structure and will
need to register with the bus using something like pci_driver_register.
It is important that drivers register their drivers as early as
possible. Registration with the core initializes several fields in the
struct device_driver object, including the reference count and the
lock. These fields are assumed to be valid at all times and may be
used by the device model core or the bus driver.
Transition Bus Drivers
~~~~~~~~~~~~~~~~~~~~~~
By defining wrapper functions, the transition to the new model can be
made easier. Drivers can ignore the generic structure altogether and
let the bus wrapper fill in the fields. For the callbacks, the bus can
define generic callbacks that forward the call to the bus-specific
callbacks of the drivers.
This solution is intended to be only temporary. In order to get class
information in the driver, the drivers must be modified anyway. Since
converting drivers to the new model should reduce some infrastructural
complexity and code size, it is recommended that they are converted as
class information is added.
Access
~~~~~~
Once the object has been registered, it may access the common fields of
the object, like the lock and the list of devices.
int driver_for_each_dev(struct device_driver * drv, void * data,
int (*callback)(struct device * dev, void * data));
The devices field is a list of all the devices that have been bound to
the driver. The LDM core provides a helper function to operate on all
the devices a driver controls. This helper locks the driver on each
node access, and does proper reference counting on each device as it
accesses it.
driverfs
~~~~~~~~
When a driver is registered, a driverfs directory is created in its
bus's directory. In this directory, the driver can export an interface
to userspace to control operation of the driver on a global basis;
e.g. toggling debugging output in the driver.
A future feature of this directory will be a 'devices' directory. This
directory will contain symlinks to the directories of devices it
supports.
Callbacks
~~~~~~~~~
int (*probe) (struct device * dev);
probe is called to verify the existence of a certain type of
hardware. This is called during the driver binding process, after the
bus has verified that the device ID of a device matches one of the
device IDs supported by the driver.
This callback only verifies that there actually is supported hardware
present. It may allocate a driver-specific structure, but it should
not do any initialization of the hardware itself. The device-specific
structure may be stored in the device's driver_data field.
int (*init) (struct device * dev);
init is called during the binding stage. It is called after probe has
successfully returned and the device has been registered with its
class. It is responsible for initializing the hardware.
int (*remove) (struct device * dev);
remove is called to dissociate a driver with a device. This may be
called if a device is physically removed from the system, if the
driver module is being unloaded, or during a reboot sequence.
It is up to the driver to determine if the device is present or
not. It should free any resources allocated specifically for the
device; i.e. anything in the device's driver_data field.
If the device is still present, it should quiesce the device and place
it into a supported low-power state.
int (*suspend) (struct device * dev, u32 state, u32 level);
suspend is called to put the device in a low power state. There are
several stages to sucessfully suspending a device, which is denoted in
the @level parameter. Breaking the suspend transition into several
stages affords the platform flexibility in performing device power
management based on the requirements of the system and the
user-defined policy.
SUSPEND_NOTIFY notifies the device that a suspend transition is about
to happen. This happens on system power state transition to verify
that all devices can sucessfully suspend.
A driver may choose to fail on this call, which should cause the
entire suspend transition to fail. A driver should fail only if it
knows that the device will not be able to be resumed properly when the
system wakes up again. It could also fail if it somehow determines it
is in the middle of an operation too important to stop.
SUSPEND_DISABLE tells the device to stop I/O transactions. When it
stops transactions, or what it should do with unfinished transactions
is a policy of the driver. After this call, the driver should not
accept any other I/O requests.
SUSPEND_SAVE_STATE tells the device to save the context of the
hardware. This includes any bus-specific hardware state and
device-specific hardware state. A pointer to this saved state can be
stored in the device's saved_state field.
SUSPEND_POWER_DOWN tells the driver to place the device in the low
power state requested.
Whether suspend is called with a given level is a policy of the
platform. Some levels may be omitted; drivers must not assume the
reception of any level. However, all levels must be called in the
order above; i.e. notification will always come before disabling;
disabling the device will come before suspending the device.
All calls are made with interrupts enabled, except for the
SUSPEND_POWER_DOWN level.
int (*resume) (struct device * dev, u32 level);
Resume is used to bring a device back from a low power state. Like the
suspend transition, it happens in several stages.
RESUME_POWER_ON tells the driver to set the power state to the state
before the suspend call (The device could have already been in a low
power state before the suspend call to put in a lower power state).
RESUME_RESTORE_STATE tells the driver to restore the state saved by
the SUSPEND_SAVE_STATE suspend call.
RESUME_ENABLE tells the driver to start accepting I/O transactions
again. Depending on driver policy, the device may already have pending
I/O requests.
RESUME_POWER_ON is called with interrupts disabled. The other resume
levels are called with interrupts enabled.
As with the various suspend stages, the driver must not assume that
any other resume calls have been or will be made. Each call should be
self-contained and not dependent on any external state.
Attributes
~~~~~~~~~~
struct driver_attribute {
struct attribute attr;
ssize_t (*show)(struct device_driver *, char * buf, size_t count, loff_t off);
ssize_t (*store)(struct device_driver *, const char * buf, size_t count, loff_t off);
};
Device drivers can export attributes via their driverfs directories.
Drivers can declare attributes using a DRIVER_ATTR macro that works
identically to the DEVICE_ATTR macro.
Example:
DRIVER_ATTR(debug,0644,show_debug,store_debug);
This is equivalent to declaring:
struct driver_attribute driver_attr_debug;
This can then be used to add and remove the attribute from the
driver's directory using:
int driver_create_file(struct device_driver *, struct driver_attribute *);
void driver_remove_file(struct device_driver *, struct driver_attribute *);
Device Interfaces
Introduction
~~~~~~~~~~~~
Device interfaces are the logical interfaces of device classes that correlate
directly to userspace interfaces, like device nodes.
Each device class may have multiple interfaces through which you can
access the same device. An input device may support the mouse interface,
the 'evdev' interface, and the touchscreen interface. A SCSI disk would
support the disk interface, the SCSI generic interface, and possibly a raw
device interface.
Device interfaces are registered with the class they belong to. As devices
are added to the class, they are added to each interface registered with
the class. The interface is responsible for determining whether the device
supports the interface or not.
Programming Interface
~~~~~~~~~~~~~~~~~~~~~
struct device_interface {
char * name;
rwlock_t lock;
u32 devnum;
struct device_class * devclass;
struct list_head node;
struct driver_dir_entry dir;
int (*add_device)(struct device *);
int (*add_device)(struct intf_data *);
};
int interface_register(struct device_interface *);
void interface_unregister(struct device_interface *);
An interface must specify the device class it belongs to. It is added
to that class's list of interfaces on registration.
Interfaces can be added to a device class at any time. Whenever it is
added, each device in the class is passed to the interface's
add_device callback. When an interface is removed, each device is
removed from the interface.
Devices
~~~~~~~
Once a device is added to a device class, it is added to each
interface that is registered with the device class. The class
is expected to place a class-specific data structure in
struct device::class_data. The interface can use that (along with
other fields of struct device) to determine whether or not the driver
and/or device support that particular interface.
Data
~~~~
struct intf_data {
struct list_head node;
struct device_interface * intf;
struct device * dev;
u32 intf_num;
};
int interface_add_data(struct interface_data *);
The interface is responsible for allocating and initializing a struct
intf_data and calling interface_add_data() to add it to the device's list
of interfaces it belongs to. This list will be iterated over when the device
is removed from the class (instead of all possible interfaces for a class).
This structure should probably be embedded in whatever per-device data
structure the interface is allocating anyway.
Devices are enumerated within the interface. This happens in interface_add_data()
and the enumerated value is stored in the struct intf_data for that device.
driverfs
~~~~~~~~
Each interface is given a directory in the directory of the device
class it belongs to:
Interfaces get a directory in the class's directory as well:
class/
`-- input
|-- devices
|-- drivers
|-- mouse
`-- evdev
When a device is added to the interface, a symlink is created that points
to the device's directory in the physical hierarchy:
class/
`-- input
|-- devices
| `-- 1 -> ../../../root/pci0/00:1f.0/usb_bus/00:1f.2-1:0/
|-- drivers
| `-- usb:usb_mouse -> ../../../bus/drivers/usb_mouse/
|-- mouse
| `-- 1 -> ../../../root/pci0/00:1f.0/usb_bus/00:1f.2-1:0/
`-- evdev
`-- 1 -> ../../../root/pci0/00:1f.0/usb_bus/00:1f.2-1:0/
Future Plans
~~~~~~~~~~~~
A device interface is correlated directly with a userspace interface
for a device, specifically a device node. For instance, a SCSI disk
exposes at least two interfaces to userspace: the standard SCSI disk
interface and the SCSI generic interface. It might also export a raw
device interface.
Many interfaces have a major number associated with them and each
device gets a minor number. Or, multiple interfaces might share one
major number, and each get receive a range of minor numbers (like in
the case of input devices).
These major and minor numbers could be stored in the interface
structure. Major and minor allocation could happen when the interface
is registered with the class, or via a helper function.
The Linux Kernel Device Model
Patrick Mochel <mochel@osdl.org>
26 August 2002
Overview
~~~~~~~~
This driver model is a unification of all the current, disparate driver models
that are currently in the kernel. It is intended is to augment the
bus-specific drivers for bridges and devices by consolidating a set of data
and operations into globally accessible data structures.
Current driver models implement some sort of tree-like structure (sometimes
just a list) for the devices they control. But, there is no linkage between
the different bus types.
A common data structure can provide this linkage with little overhead: when a
bus driver discovers a particular device, it can insert it into the global
tree as well as its local tree. In fact, the local tree becomes just a subset
of the global tree.
Common data fields can also be moved out of the local bus models into the
global model. Some of the manipulation of these fields can also be
consolidated. Most likely, manipulation functions will become a set
of helper functions, which the bus drivers wrap around to include any
bus-specific items.
The common device and bridge interface currently reflects the goals of the
modern PC: namely the ability to do seamless Plug and Play, power management,
and hot plug. (The model dictated by Intel and Microsoft (read: ACPI) ensures
us that any device in the system may fit any of these criteria.)
In reality, not every bus will be able to support such operations. But, most
buses will support a majority of those operations, and all future buses will.
In other words, a bus that doesn't support an operation is the exception,
instead of the other way around.
Downstream Access
~~~~~~~~~~~~~~~~~
Common data fields have been moved out of individual bus layers into a common
data structure. But, these fields must still be accessed by the bus layers,
and sometimes by the device-specific drivers.
Other bus layers are encouraged to do what has been done for the PCI layer.
struct pci_dev now looks like this:
struct pci_dev {
...
struct device device;
};
Note first that it is statically allocated. This means only one allocation on
device discovery. Note also that it is at the _end_ of struct pci_dev. This is
to make people think about what they're doing when switching between the bus
driver and the global driver; and to prevent against mindless casts between
the two.
The PCI bus layer freely accesses the fields of struct device. It knows about
the structure of struct pci_dev, and it should know the structure of struct
device. PCI devices that have been converted generally do not touch the fields
of struct device. More precisely, device-specific drivers should not touch
fields of struct device unless there is a strong compelling reason to do so.
This abstraction is prevention of unnecessary pain during transitional phases.
If the name of the field changes or is removed, then every downstream driver
will break. On the other hand, if only the bus layer (and not the device
layer) accesses struct device, it is only those that need to change.
User Interface
~~~~~~~~~~~~~~
By virtue of having a complete hierarchical view of all the devices in the
system, exporting a complete hierarchical view to userspace becomes relatively
easy. This has been accomplished by implementing a special purpose virtual
file system named driverfs. It is hence possible for the user to mount the
whole driverfs filesystem anywhere in userspace.
This can be done permanently by providing the following entry into the
/etc/fstab (under the provision that the mount point does exist, of course):
none /devices driverfs defaults 0 0
Or by hand on the command line:
~: mount -t driverfs none /devices
Whenever a device is inserted into the tree, a directory is created for it.
This directory may be populated at each layer of discovery - the global layer,
the bus layer, or the device layer.
The global layer currently creates two files - name and 'power'. The
former only reports the name of the device. The latter reports the
current power state of the device. It also be used to set the current
power state.
The bus layer may also create files for the devices it finds while probing the
bus. For example, the PCI layer currently creates 'irq' and 'resource' files
for each PCI device.
A device-specific driver may also export files in its directory to expose
device-specific data or tunable interfaces.
More information about the driverfs directory layout can be found in
the other documents in this directory and in the file
Documentation/filesystems/driverfs.txt.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment