Commits · e9356da889878f12ef8be47d291e78affbc71750 · Kirill Smelkov / linux

19 Jul, 2002 28 commits

Martin Dalecki authored Jul 18, 2002

Most noticeable in the patch:

1. we now handle IRQ sharing better than ever

2. survives quite a lot of testing by a few people. For example
cat /dev/hdb > /dev/null, where /dev/hdb contains a CD-ROM
with a big scratch on the surface, making sure it's broken :-).
BTW, it's amazing how wide the scratch had to be before errors
occurred.

3. Doesn't play with rq_rdev and friends

Fri Jul 12 05:04:32 CEST 2002 ide-clean-99

- Push nIEN disabling down at the place where we are finished with a particular
   request.

- First round of command line parser cleanups by Gerald Champagne.

- Unfold the drive eviction functions in do_request(). This allowed us to
   realize that we don't have to re-get the major/minor numbers of the device we
   are acting on from the raw device field of the currently running request. One
   significant place less in the kernel where major/minor data gets manipulated.

- Move the big IDE_BUSY loop out of do_request to do_ide_request().  This makes
   us realize that we don't have to clear the IDE_BUSY bit just before
   reentering do_request to look for more requests still pending on the queue
   and set it immediately again.

   This fixes a tiny race on the code path from the IRQ or timer function,
   where we had a tiny window between clearing the IDE_BUSY bit and
   reentering the request queue in which completely unrelated requests could
   get in our way.

- Don't return any value in do_reset1(). It's always ATA_OP_CONTINUES. Split it
   up into two functions, one for disks (well, in fact channels) and one for
   ATAPI devices. It turns out that they can be moved to the places where they
   are used to clarify the code flow. The only function remaining now is
   do_reset_channel().

- Duplicate code from ide_do_drive_code explicitly in ide_raw_taskfile().
   Simplify ide_raw_taskfile() thereafter. Realize that ide_do_drive_cmd()
   is now only used by ATAPI devices. Move it therefore to atapi.c.

- Do busy polling for ATAPI reset operations. This is much safer than the
   previous timer games played there. It simply doesn't make sense to give the
   bus up during such a subtle operation. We don't have to disable IRQs here
   either, since we are already under the protection of the do_request
   mechanisms. (Well hopefully...)

- Remove the no longer used reset_poll() function. poll_timeout and friends are
   now used only in pdc4030 code. Those functions were not called from IRQ
   context, but they were set as handlers and not as expiry functions.

- Return ATA_OP_CONTINUES instead of ATA_OP_FINISHED in ata_error(), to signal
   that we are willing to retry the operation until the maximal number of retry
   attempts is exceeded. Returning ATA_OP_FINISHED without prior end_request()
   hangs the system.

- Apply trivia from DJ patch set.

- Apply small configuration fix to ide-pci.c from Muli Ben-Yehuda.

- Feed add_blkdev_randomness with information we already have in struct
   ata_channel *ch->major, instead of using the major(macro) on the request in
   question.

- Make ide_raw_taskfile use the same request submission mechanism as
   tcq_invalidate_queue(). Something similar would be ideal for ioctl() code as
   well.

- Implement actual device reset. Realize that the recalibration procedure is
   doomed by the standard. Therefore don't try to recover by recalibrating
   devices; just our retry mechanism should work in those cases. And suddenly
   the error handling code is IRQ safe.

- Reinvent the ATA reset operation, since it is apparently needed. We still
   have to do the whole transfer timing reconfiguration there.

- Move drive_is_ready(), which is in reality an attempt to check for IRQ
   requesters without clearing the IRQ line, over to the place where it belongs:
   device.c, which is the direct device access abstraction place.  Rename it to
   ata_status_irq() to prevent global name space pollution.

- Updates to the pdc202xxx host chip controller setup code by Bartłomiej
   Żołnierkiewicz:

   Forward port 2.4 patch by Hank Yang from Promise:

	- Add PDC20271 support
	- Disable LBA48 support on PDC20262
	- Fix ATAPI UDMA port value
	- Add new quirk drive
	- Adjust timings for all drives when using ATA133
	- Update pdc202xx_reset() waiting time

- Mark TCQ as dangerous and add some bits about it to the help.

- Add some missing exports.

- Some small ide-scsi.c host allocation fixes by sullivan.

e9356da8

[PATCH] MD - Get rid of dev in rdev and use bdev exclusively. · 3ec59360

Neil Brown authored Jul 18, 2002

Get rid of dev in rdev and use bdev exclusively.

There is an awkwardness here in that userspace sometimes
passes down a dev_t (e.g. hot_add_disk) and sometimes
a major and a minor (e.g. add_new_disk).  Should we convert
both to kdev_t as the uniform standard?
That is what was being done, but it seemed very clumsy and
things were getting converted back and forth a lot.

As bdget used a dev_t, I felt safe in staying with dev_t once I
had one rather than converting to kdev_t and back.

3ec59360

[PATCH] MD - Change partition_name calls to bdev_partition_name where possible. · c4909782

Neil Brown authored Jul 18, 2002

Change partition_name calls to bdev_partition_name where possible.

All part of decreasing reliance on device numbers... at least in
appearance.

c4909782

[PATCH] MD - Remove the sb from the mddev · 43fb3e86

Neil Brown authored Jul 18, 2002

Remove the sb from the mddev

Now that all the important information is in mddev, we don't need
to have an sb off the mddev.  We only keep the per-device ones.

Previously we determined if "set_array_info" had been run by checking
mddev->sb.  Now we check mddev->raid_disks on the assumption that
any valid array MUST have a non-zero number of devices.

43fb3e86

[PATCH] MD - Remove dependence on superblock · bab5d712

Neil Brown authored Jul 18, 2002

Remove dependence on superblock

All the remaining fields of interest in the superblock
get duplicated in the mddev structure and this is treated as
authoritative.  The superblock gets completely generated at
write time, and all useful information extracted at read time.

This means that we can slot in different superblock formats
without affecting the bulk of the code.
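
The separation described above can be sketched roughly as follows. This is a
simplified illustration, not the md code: `struct array_info`,
`struct sb_format` and the `v0_*` functions are hypothetical names, and the
8-byte on-disk layout is invented for the example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory state; treated as authoritative at run time. */
struct array_info {
    int level;
    int raid_disks;
};

/* Hypothetical per-format operations: one pair per superblock layout. */
struct sb_format {
    void (*generate)(const struct array_info *info, uint8_t *buf); /* at write time */
    void (*extract)(const uint8_t *buf, struct array_info *info);  /* at read time */
};

/* An invented 8-byte layout: level, then raid_disks, both as int32. */
static void v0_generate(const struct array_info *info, uint8_t *buf)
{
    int32_t level = info->level, disks = info->raid_disks;

    memcpy(buf, &level, sizeof(level));
    memcpy(buf + sizeof(level), &disks, sizeof(disks));
}

static void v0_extract(const uint8_t *buf, struct array_info *info)
{
    int32_t level, disks;

    memcpy(&level, buf, sizeof(level));
    memcpy(&disks, buf + sizeof(level), sizeof(disks));
    info->level = level;
    info->raid_disks = disks;
}

static const struct sb_format sb_format_v0 = { v0_generate, v0_extract };
```

Code that only touches `struct array_info` would not need to change if a
second `struct sb_format` with a different layout were added.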

bab5d712

[PATCH] MD - Move persistent from superblock to mddev · 5e601b35

Neil Brown authored Jul 18, 2002

Move persistent from superblock to mddev

Tidy up calc_dev_sboffset and calc_dev_size on the way

5e601b35

[PATCH] MD - Remove number and raid_disk from personality arrays · 9f3b0380

Neil Brown authored Jul 18, 2002

Remove number and raid_disk from personality arrays

These are redundant: number is not needed any more, and
raid_disk never was, as it is the index.

9f3b0380

[PATCH] MD - nr_disks is gone from multipath/raid1 · 4395b447

Neil Brown authored Jul 18, 2002

nr_disks is gone from multipath/raid1

Never used.

4395b447

[PATCH] MD - Remove old_dev field. · f2421da3

Neil Brown authored Jul 18, 2002

Remove old_dev field.

We used to monitor the previous device number of a
component device for superblock maintenance.  This is
not needed any more.

f2421da3

[PATCH] MD - Don't maintain disc status in superblock. · d109d34c

Neil Brown authored Jul 18, 2002

Don't maintain disc status in superblock.

The state is now in rdev, so we don't maintain it
in the superblock any more.
We also no longer test the content of the superblock for
disk status.
mddev->spare is now an rdev and not a superblock fragment.

d109d34c

[PATCH] MD - when writing superblock, generate from mddev/rdev info. · 1b114450

Neil Brown authored Jul 18, 2002

when writing superblock, generate from mddev/rdev info.

Rather than relying on the superblock info being kept up-to-date,
we regenerate the superblock from mddev/rdev info before
each write.

1b114450

[PATCH] MD - Add "degraded" field to md device · d58aa811

Neil Brown authored Jul 18, 2002

Add "degraded" field to md device

This is used to determine if a spare should be added
without relying on the superblock.

d58aa811

[PATCH] MD - Add in_sync flag to each rdev · 8ee83145

Neil Brown authored Jul 18, 2002

Add in_sync flag to each rdev

This currently mirrors the MD_DISK_SYNC superblock flag,
but soon it will be authoritative and the superblock will
only be consulted at start time.

8ee83145

[PATCH] MD - Add raid_disk field to rdev · 9347ddf5

Neil Brown authored Jul 18, 2002

Add raid_disk field to rdev

Also change find_rdev_nr to find based on position
in array (raid_disk) not position in superblock (number).

9347ddf5

[PATCH] MD - Improve handling of spares in md · 82081640

Neil Brown authored Jul 18, 2002

Improve handling of spares in md

- hot_remove_disk is given the raid_disk rather than descriptor number
  so that it can find the device in internal array directly, no search.
- spare_inactive now uses mddev->spare->raid_disk instead of
  mddev->spare->number so it can find the device directly without searching
- spare_write does not need number.  It can use mddev->spare->raid_disk as above.
- spare_active does not need &mddev->spare.  It finds the descriptor directly
  and fixes it without this pointer

82081640

[PATCH] MD - Remove concept of 'spare' drive for multipath. · 03aa5c1c

Neil Brown authored Jul 18, 2002

Remove concept of 'spare' drive for multipath.

Multipath now treats all working devices as
active and does I/O to the first working one.

03aa5c1c

[PATCH] MD - Set desc_nr more sanely. · 999a2029

Neil Brown authored Jul 18, 2002

Set desc_nr more sanely.

Currently rdev->desc_nr is set in sync_sbs, which is typically
called just before writing out the superblocks, which is an
odd place to set it.
It is also called when a new disk is added (which is sane) and
when an old disc is imported ... which is questionable.

With this patch it is set when a new disk is added, and when
the superblocks are being analysed, which makes lots of sense.

MULTIPATH is particularly an issue here.  The old code tried
to figure the desc_nr for an rdev by matching device numbers in
the superblock.  This doesn't make a lot of sense as
device numbers can change.  Now MULTIPATH components
get sequential desc_nrs.

999a2029

[PATCH] MD - Move md_update_sb calls · 6f42312c

Neil Brown authored Jul 18, 2002

Move md_update_sb calls

When a change which requires a superblock update happens
at interrupt time, we currently set a flag (sb_dirty) and
wake up the per-array thread (raid1/raid5d/multipathd) to
do the actual update.

This patch centralises this.  The sb_update is now done
by the mdrecoveryd thread.  As this is always woken up after
the error handler is called, we don't need the call to wakeup
the local thread any more.

With this, we don't need "md_update_sb" to lock the array
any more and only use __md_update_sb which is local to md.c
So we rename __md_update_sb back to md_update_sb and stop
exporting it.
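
The deferral pattern described above (set a flag at interrupt time, let one
central thread do the write) can be sketched in plain C. This is an
illustration only: `struct array_sketch`, `note_sb_change()` and
`recovery_pass()` are hypothetical names, and real kernel code would need
proper locking or atomics rather than a bare flag.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an md array; only sb_dirty is named in the text. */
struct array_sketch {
    volatile bool sb_dirty;  /* set when the on-disk superblock is stale */
    int sb_writes;           /* counts simulated superblock writes */
};

/* Called from the error path, possibly in interrupt context: no I/O here. */
static void note_sb_change(struct array_sketch *a)
{
    a->sb_dirty = true;
}

/* One pass of the central recovery thread: write out only if needed. */
static void recovery_pass(struct array_sketch *a)
{
    if (a->sb_dirty) {
        a->sb_dirty = false;  /* clear first so a later change is not lost */
        a->sb_writes++;       /* stands in for the real superblock write */
    }
}
```

Several changes noted before the thread runs collapse into a single
superblock write on the next pass.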

6f42312c

[PATCH] MD - Pass the correct bdev to md_error · a15b60a2

Neil Brown authored Jul 18, 2002

Pass the correct bdev to md_error

After a call to generic_make_request, bio->bi_bdev can have changed
(e.g. by a re-mapper like raid0).  So we cannot trust it for reporting
the source of an error.  This patch takes care to find the correct
bdev.

a15b60a2

[PATCH] MD - Rdev list cleanups. · 2a9400e9

Neil Brown authored Jul 18, 2002

Rdev list cleanups.

An "rdev" can be on three different lists.
 - the list of all rdevs
 - the list of pending rdevs
 - the list of rdevs for a given mddev

The first list is now only used to list "unused" devices in
/proc/mdstat, and only pending rdevs can be unused, so this list
isn't necessary.
An rdev cannot be both pending and in an mddev, so we know an rdev will
only be on one list at a time.

This patch discards the all_raid_disks list, and changes the
pending list to use "same_set" in the rdev.  It also changes
/proc/mdstat to iterate through pending devices, rather than through
all devices.

So now an rdev is only on one list, either the pending list
or the list of rdevs for a given mddev.  This means that
ITERATE_RDEV_GENERIC doesn't need to be told which field
to walk down: there is only one.
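
A rough sketch of why a single link field is enough, assuming a minimal
list implementation in the style of the kernel's list_head. Only the field
name `same_set` comes from the text above; `struct rdev_sketch` and the
helper functions are hypothetical.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of the kernel's list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *h) { h->prev = h->next = h; }

static void list_add_tail_node(struct list_node *n, struct list_node *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del_node(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

/* Hypothetical rdev: one link field serves whichever list it is on. */
struct rdev_sketch {
    int id;
    struct list_node same_set;
};

/* Move an rdev from its current list (e.g. pending) to another (e.g. an array). */
static void rdev_move(struct rdev_sketch *r, struct list_node *to)
{
    list_del_node(&r->same_set);
    list_add_tail_node(&r->same_set, to);
}

/* Count the entries on a list; the walk needs no "which field" argument. */
static int list_count(const struct list_node *h)
{
    int n = 0;

    for (const struct list_node *p = h->next; p != h; p = p->next)
        n++;
    return n;
}
```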

2a9400e9

[PATCH] MD - Get rid of find_rdev_all · 70e96bef

Neil Brown authored Jul 18, 2002

Get rid of find_rdev_all

find_rdev_all is now only used to check if a device is already
used in an md array.

We change lock_rdev so that it claims the bdev for
the specific rdev rather than for rdevs in general.
Now lock_rdev will check if the bdev is in use by another array
or not, so the find_rdev_all check isn't needed and is removed,
along with find_rdev_all itself.

We also make sure that the error code from lock_rdev is
propagated up properly.

70e96bef

[PATCH] MD - Use symbolic names for multipath (-4) and linear (-1) · a0f86742

Neil Brown authored Jul 18, 2002

Use symbolic names for multipath (-4) and linear (-1)

Also, a variable called "level" was being used to store a
"level" and a "personality" number.  This is potentially
confusing, so it is now two variables.
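
As an illustration of the idea, the two magic numbers can be given names
and the level can be mapped to a separate personality value. The macro
names, the `enum pers_sketch` values and the mapping itself are assumptions
made for this sketch; only the values -4 (multipath) and -1 (linear) come
from the commit title.

```c
/* Symbolic names for the negative "level" values named in the commit title. */
#define LEVEL_MULTIPATH (-4)
#define LEVEL_LINEAR    (-1)

/* Hypothetical personality numbers; the real values are not given here. */
enum pers_sketch {
    PERS_NONE = 0,
    PERS_LINEAR,
    PERS_RAID0,
    PERS_RAID1,
    PERS_RAID5,
    PERS_MULTIPATH
};

/* Keep "level" and "personality" as two distinct values (assumed mapping). */
static enum pers_sketch level_to_pers_sketch(int level)
{
    switch (level) {
    case LEVEL_MULTIPATH: return PERS_MULTIPATH;
    case LEVEL_LINEAR:    return PERS_LINEAR;
    case 0:               return PERS_RAID0;
    case 1:               return PERS_RAID1;
    case 4:
    case 5:               return PERS_RAID5;
    default:              return PERS_NONE;
    }
}
```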

a0f86742

[PATCH] MD - Don't "analyze_sb" when creating new array. · 376163df

Neil Brown authored Jul 18, 2002

Don't "analyze_sb" when creating new array.

When creating a new array (and we have an mddev->sb),
don't bother to analyze the superblocks.  There is no point.
Also, this means we always allocate the array sb in
analyze_sbs, rather than conditionally.

376163df

[PATCH] MD - Embed bio in mp_bh rather than separate allocation. · e3de153e

Neil Brown authored Jul 18, 2002

Embed bio in mp_bh rather than separate allocation.

multipath currently allocates an mp_bh and a bio for each
request.  With this patch, the bio is made to be part of the
mp_bh so there is only one allocation, and it comes from a private
pool (the bio was allocated from a shared pool).
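
A minimal sketch of the embedding, using hypothetical cut-down structures
(`struct bio_sketch`, `struct mp_bh_sketch`) in place of the real ones, and
calloc() as a stand-in for the private pool:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real struct bio and mp_bh have many more fields. */
struct bio_sketch { void *bi_private; };

struct mp_bh_sketch {
    int path;               /* which path the request was sent down */
    struct bio_sketch bio;  /* embedded: allocated together with the mp_bh */
};

/* One allocation yields both the bookkeeping struct and its bio. */
static struct mp_bh_sketch *mp_bh_alloc(void)
{
    return calloc(1, sizeof(struct mp_bh_sketch));
}

/* Recover the containing mp_bh from a pointer to its embedded bio. */
static struct mp_bh_sketch *bio_to_mp_bh(struct bio_sketch *bio)
{
    return (struct mp_bh_sketch *)((char *)bio - offsetof(struct mp_bh_sketch, bio));
}
```

Because the bio lives inside the mp_bh, a completion handler that is only
handed the bio can still find its mp_bh by pointer arithmetic.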

Also remove "remaining" and "cmd" from mp_bh which aren't used.
And remove spare (unused) from multipath_private_data.

e3de153e

[PATCH] MD - 27 - Remove state field from multipath mp_bh structure. · 8e2a19e7

Neil Brown authored Jul 18, 2002

Remove state field from multipath mp_bh structure.

The MPBH_Uptodate flag is set but never used.
The MPBH_SyncPhase flag was never used.
These are both legacy from the copying of raid1.c.

MPBH_PreAlloc is no longer needed due to the use of
mempools, so the state field can go...

8e2a19e7

[PATCH] MD - Get multipath to use mempool · e18a7e5c

Neil Brown authored Jul 18, 2002

Get multipath to use mempool

... rather than maintaining its own mempool

e18a7e5c

[PATCH] MD - Remove dead consistency checking code from multipath. · 663c6269

Neil Brown authored Jul 18, 2002

Remove dead consistency checking code from multipath.

This "consistancy_check" is carried over from raid1, on which multipath
was based. It was not used in raid1 and has since been removed there.  Now
it gets removed from multipath too.

663c6269

[PATCH] MD - Remove bdput calls from raid personalities. · 82b0fad1

Neil Brown authored Jul 18, 2002

Remove bdput calls from raid personalities.

Some of the md personalities currently hold a counted reference
on a bdev.  This is not necessary as the main md module will always
hold a counted reference in the rdev.
This patch removes the code to take and drop these unnecessary
references.

82b0fad1

18 Jul, 2002 2 commits

[PATCH] Fix typo in net/sunrpc/xprt.c · 389a5884

Trond Myklebust authored Jul 18, 2002

The appended patch fixes a typo in net/sunrpc/xprt.c: We want to
ensure that we play safe, and only increment the UDP congestion window
when we have successfully transmitted a full frame of data.

In addition, we should perhaps still 'slow start' the UDP congestion
code rather than assuming that we can immediately fire off 8
requests. IOW revert the value of RPC_INITCWND.

389a5884

[PATCH] Fix NFS locking bug · df458c00

Trond Myklebust authored Jul 17, 2002

Here's one bugfix which might help to explain the GRANTED failure. The
bug has been there all along (so I'll probably want to send this to
Marcelo too).

The code in question is supposed to ensure that we don't wait on a
reply if the RPC call doesn't expect one. However, if the socket
transmission failed for some reason, we do actually want to loop and
try again...

This bug will hit the RPC call in nlmsvc_grant_blocked().

df458c00

16 Jul, 2002 9 commits

Kernel version 2.5.26 · 0d84f0ac
Linus Torvalds authored Jul 16, 2002

0d84f0ac

[PATCH] RPC over UDP congestion control updates [8/8] · fefe89f4

Trond Myklebust authored Jul 16, 2002

When determining who gets access to the socket, give priority to
requests that are being resent. Despite the fact that congestion
control now applies to resends, we still want to ensure that resends
get ACKed as soon as possible (and before we start sending off new
requests).

fefe89f4

[PATCH] RPC over UDP congestion control updates [7/8] · 0b51abc8

Trond Myklebust authored Jul 16, 2002

  - Divorce the allocation of free request slots and the congestion
    control. Make the congestion control apply only to when we
    actually send data over the wire. This means that we *do* apply
    congestion control to resent requests: if a timeout has occurred,
    and there are too many requests on the wire, delay resending until
    the congestion algorithm allows it.

  - Improve spinlocking by putting the congestion avoidance algorithm
    under xprt->sock_lock. This lock has to be taken *anyway* in
    (almost) all cases where we are updating the congestion control
    data.

0b51abc8

[PATCH] RPC over UDP congestion control updates [6/8] · 4edf0555

Trond Myklebust authored Jul 16, 2002

Eliminate the arbitrary timeouts in xprt_adjust_cwnd(). Strictly
enforce the congestion avoidance algorithm as detailed in Van
Jacobson's 1988 paper "Congestion Avoidance and Control"
(see http://www-nrg.ee.lbl.gov/nrg-papers.html).
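
For reference, the congestion avoidance rule from that paper (additive
increase on each reply, multiplicative decrease on a timeout) can be
sketched as below. This is not the xprt_adjust_cwnd() code: the names
`struct cong_sketch`, `cong_on_reply()` and `cong_on_timeout()` and the
fixed-point scale of 256 are assumptions made for the example.

```c
/* Hypothetical fixed-point scale: CWND_SCALE units represent one request. */
#define CWND_SCALE 256UL

struct cong_sketch {
    unsigned long cwnd;      /* congestion window, scaled by CWND_SCALE */
    unsigned long max_cwnd;  /* upper bound, same scale */
};

/* Additive increase: each successful reply grows the window by roughly
 * 1/cwnd requests, i.e. about one request per window's worth of replies. */
static void cong_on_reply(struct cong_sketch *c)
{
    c->cwnd += (CWND_SCALE * CWND_SCALE + (c->cwnd >> 1)) / c->cwnd;
    if (c->cwnd > c->max_cwnd)
        c->cwnd = c->max_cwnd;
}

/* Multiplicative decrease: a timeout halves the window, floor of one request. */
static void cong_on_timeout(struct cong_sketch *c)
{
    c->cwnd >>= 1;
    if (c->cwnd < CWND_SCALE)
        c->cwnd = CWND_SCALE;
}
```

A sender would compare the number of requests on the wire, scaled the same
way, against `cwnd` before transmitting.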

4edf0555

[PATCH] RPC over UDP congestion control updates [5/8] · 514349dc

Neil Brown authored Jul 18, 2002

[PATCH] RPC over UDP congestion control updates [4/8] · c6b43f23

Trond Myklebust authored Jul 16, 2002

Cleanups for the socket locking mechanism.

Improve RPC request ordering by ensuring that RPC tasks that are
already queued on xprt->sending get sent before tasks that happen to
get scheduled just when there is a free slot.

In case the socket send buffer is full, queue the tasks on
xprt->pending rather than xprt->sending in order to eliminate the risk
of accidental wakeups from xprt_release_write() and xprt_write_space().

c6b43f23

[PATCH] RPC over UDP congestion control updates [3/8] · 9ba7d221

Trond Myklebust authored Jul 16, 2002

Improve the response to timeouts. As requests time out, we delay
timing out the remaining requests (in fact we follow exponential
backoff). This is done because we assume either that the round trip
time has been underestimated, or that the network/server is congested,
and we need to back off the resending of new requests.
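
A sketch of the backoff idea, assuming a timeout count shared by all
requests on a transport so that one timeout also lengthens the wait for the
others. The structure and function names are hypothetical, and times are in
arbitrary ticks.

```c
/* Hypothetical per-transport timeout state. */
struct timeout_sketch {
    unsigned long base;     /* initial timeout */
    unsigned long max;      /* upper bound on the timeout */
    unsigned int  retries;  /* timeouts since the last successful reply */
};

/* Current timeout: base doubled once per recorded timeout, capped at max. */
static unsigned long current_timeout(const struct timeout_sketch *t)
{
    unsigned long to = t->base;

    for (unsigned int i = 0; i < t->retries && to < t->max; i++)
        to <<= 1;
    return to > t->max ? t->max : to;
}

/* A request timed out: every remaining request now waits longer. */
static void note_timeout(struct timeout_sketch *t) { t->retries++; }

/* A reply arrived: the server is responding, so drop back to the base. */
static void note_reply(struct timeout_sketch *t) { t->retries = 0; }
```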

9ba7d221

[PATCH] RPC over UDP congestion control updates [2/8] · fa7b279e

Trond Myklebust authored Jul 16, 2002

Implement a count of the number of timeouts that have occurred since
we last recorded a successful reply from the server.

For the moment this information is merely used in order to improve the
estimate of whether or not the server is down. It will be used in
patch 3/8 in order to improve the timeout backoff algorithm.

fa7b279e

[PATCH] RPC over UDP congestion control updates [1/8] · 77d79030

Trond Myklebust authored Jul 16, 2002

Implement the basic round trip timing algorithm in order to adapt the
timeout values for the most common NFS operations to the server's
rate of response.
The algorithm is described in Van Jacobson's 1988 paper, available
at http://www-nrg.ee.lbl.gov/nrg-papers.html, and is the same as the one
used in most TCP stacks.

Following the *BSD code, we implement separate rtt timers for GETATTR,
LOOKUP, READ/READDIR/READLINK, and WRITE. In addition to this, there
is one extra timer for the COMMIT operation.
All the remaining RPC calls use the current system in which a fixed
timeout value gets set by the 'timeo' mount option.

In case of a timeout, the current exponential backoff algorithm is
still used. Subsequent patches will improve this...
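
The round trip estimator can be sketched as follows. This is an
illustration under stated assumptions rather than the patch itself: it uses
doubles for readability where kernel code would use scaled integers, the
type and function names are hypothetical, seeding the deviation with half
of the first sample is a common convention and not taken from the commit,
and "mean plus four deviations" is one commonly used timeout rule.

```c
/* Hypothetical timer classes, following the operations listed above. */
enum rtt_class {
    RTT_GETATTR, RTT_LOOKUP, RTT_READ, RTT_WRITE, RTT_COMMIT, RTT_NCLASSES
};

struct rtt_est {
    double srtt;    /* smoothed round trip time */
    double rttvar;  /* smoothed mean deviation */
    int    seeded;  /* non-zero once a first sample has been seen */
};

/* Feed one measured round trip time m into the estimator. */
static void rtt_update(struct rtt_est *e, double m)
{
    if (!e->seeded) {
        e->srtt = m;
        e->rttvar = m / 2;  /* assumed seed value */
        e->seeded = 1;
        return;
    }

    double err = m - e->srtt;
    double abs_err = err < 0 ? -err : err;

    e->srtt += err / 8;                      /* gain 1/8 on the mean */
    e->rttvar += (abs_err - e->rttvar) / 4;  /* gain 1/4 on the deviation */
}

/* Retransmit timeout: mean plus four deviations. */
static double rtt_timeout(const struct rtt_est *e)
{
    return e->srtt + 4 * e->rttvar;
}
```

One estimator per class, for example an array `struct rtt_est
timers[RTT_NCLASSES]`, keeps slow operations such as WRITE from inflating
the timeout used for cheap ones such as GETATTR.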

77d79030

15 Jul, 2002 1 commit

[PATCH] Fix bug in xdr_kunmap() · ad4d2648

Trond Myklebust authored Jul 15, 2002

The following patch fixes a bug in xdr_kunmap() that has been known to
deadlock TCP mounts on highmem systems.  It also removes an unnecessary
call to flush_page_to_ram().

ad4d2648