Commits · a8a0a133b03c6863d0f77229d19befca4de905fa · nexedi / linux

03 Jul, 2011 40 commits

isci: pare back error messsages · a8a0a133

Dan Williams authored Jul 01, 2011

The messages emitted from task.c and some from request.c likely
duplicate (in a less undertandable way) what is reported by the
midlayer.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

a8a0a133

isci: cleanup silicon revision detection · dc00c8b6

Dan Williams authored Jul 01, 2011

Perform checking per-pci device (even though all systems will only have
1 pci device in this generation), and delete support for silicon that
does not report a proper revision (i.e. A0).
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

dc00c8b6

isci: merge scu_unsolicited_frame.h into unsolicited_frame_control.h · 4e4dca3d
Dan Williams authored Jul 01, 2011
```
Does not need its own file.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
```
4e4dca3d

isci: merge sata.[ch] into request.c · 16ba7709

Dan Williams authored Jul 01, 2011

Undo some needless separation.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

16ba7709

isci: kill 'get/set' macros · 34a99158

Dan Williams authored Jul 01, 2011

Most of these simple dereference macros are longer than their open coded
equivalent.  Deleting enum sci_controller_mode is thrown in for good
measure.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

34a99158

isci: retire scic_sds_ and scic_ prefixes · 89a7301f

Dan Williams authored Jun 30, 2011

The distinction between scic_sds_ scic_ and sci_ are no longer relevant
so just unify the prefixes on sci_.  The distinction between isci_ and
sci_ is historically significant, and useful for comparing the old
'core' to the current Linux driver. 'sci_' represents the former core as
well as the routines that are closer to the hardware and protocol than
their 'isci_' brethren. sci == sas controller interface.

Also unwind the 'sds1' out of the parameter structs.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

89a7301f

isci: unify isci_host and scic_sds_controller · d9dcb4ba

Dan Williams authored Jun 30, 2011

Remove the distinction between these two implementations and unify on
isci_host (local instances named ihost).  Hmmm, we had two
'oem_parameters' instances, one was unused... nice.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

d9dcb4ba

isci: unify isci_remote_device and scic_sds_remote_device · 78a6f06e

Dan Williams authored Jun 30, 2011

Remove the distinction between these two implementations and unify on
isci_remote_device (local instances named idev).
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

78a6f06e

isci: unify isci_port and scic_sds_port · ffe191c9

Dan Williams authored Jun 29, 2011

Remove the distinction between these two implementations and unify on
isci_port (local instances named iport).  The duplicate '->owning_port' and
'->isci_port' in both isci_phy and isci_remote_device will be fixed in a later
patch... this is just the straightforward rename/unification.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

ffe191c9

isci: fix scic_sds_remote_device_terminate_requests · 76802ce6

Dan Williams authored Jun 29, 2011

Commit 0815632 "isci: unify remote_device stop_handlers" introduced the
possibility that not all requests get terminated if we reach the
request_count.  Now that we properly reference count devices we don't
need this self-defense and can do the straightforward scan of all active
requests.
Reported-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Acked-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

76802ce6

isci: unify isci_phy and scic_sds_phy · 85280955

Dan Williams authored Jun 28, 2011

They are one in the same object so remove the distinction.  The near
duplicate fields (owning_port, and isci_port) will be cleaned up
after the scic_sds_port isci_port unification.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

85280955

isci: unify isci_request and scic_sds_request · 5076a1a9

Dan Williams authored Jun 27, 2011

They are one in the same object so remove the distinction.  The near
duplicate fields (owning_controller, and isci_host) will be cleaned up
after the scic_sds_contoller isci_host unification.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

5076a1a9

isci: rename / clean up scic_sds_stp_request · ba7cb223

Dan Williams authored Jun 27, 2011

* Rename scic_sds_stp_request to isci_stp_request
* Remove the unused fields and union indirection
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

ba7cb223

isci: preallocate requests · db056250

Dan Williams authored Jun 17, 2011

the dma_pool interface is optimized for object_size << page_size which
is not the case with isci_request objects and the dma_pool routines show
up in the top of the profile.

The old io_request_table which tracked whether tci slots were in-flight
or not is replaced with an IREQ_ACTIVE flag per request.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

db056250

isci: combine request flags · 38d8879b

Dan Williams authored Jun 23, 2011

Combine three bools into one unsigned long 'flags'. Doesn't increase the
request size due to packing. (to do: optimize the structure layout).
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

38d8879b

isci: unify can_queue tracking on the tci_pool, uplevel tag assignment · 312e0c24

Dan Williams authored Jun 28, 2011

The tci_pool tracks our outstanding command slots which are also the 'index'
portion of our tags.  Grabbing the tag early in ->lldd_execute_task let's us
drop the isci_host_can_queue() and ->was_tag_assigned_by_user infrastructure.
->was_tag_assigned_by_user required the task context to be duplicated in
request-local buffer.  With the tci established early we can build the
task_context directly into its final location and skip a memcpy.

With the task context buffer at a known address at request construction we
have the opportunity/obligation to also fix sgl handling.  This rework feels
like it belongs in another patch but the sgl handling and task_context are too
intertwined.
1/ fix the 'ab' pair embedded in the task context to point to the 'cd' pair in
   the task context (previously we were prematurely linking to the staging
   buffer).
2/ fix the broken iteration of pio sgls that assumes all sgls are relative to
   the request, and does a dangerous looking reverse lookup of physical
   address to virtual address.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

312e0c24

isci: Terminate dev requests on FIS err bit rx in NCQ · 9274f45e

Jeff Skirvin authored Jun 23, 2011

When the remote device transitions to a not-ready state because of
an NCQ error condition, all outstanding requests to that device
are terminated and completed to libsas on the normal path.  The
device then waits for a READ LOG EXT command to issue on the task
management path.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

9274f45e

isci: fix frame received locking · 4cffe13e

Dan Williams authored Jun 23, 2011

Updates to the frame_rcvd before need to be atomic with respect to when
they are evaluated by libsas.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

4cffe13e

isci: possible buffer overflow in isci_parse_oem_parameters fixed · 7cafbf1b

Maciej Patelczyk authored Jun 21, 2011

scu_index is a parameter of isci_parse_eom_parameters and is an index
in controller table. There is a check: scu_index > SCI_MAX_CONTROLLERS
which is insufficient and should be: scu_index >= SCI_MAX_CONTROLLERS.
scu_index is used as an index in the table which size is
SCI_MAX_CONTROLLERS.
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

7cafbf1b

isci: fix isci_task_execute_tmf completion · 086a0dab

Dan Williams authored Jun 21, 2011

1/ fix the timeout for wait_for_completion_timeout
2/ In the tmf timeout case we need to wait for our termination callback
3/ Once the request is successfully started it will be freed according to the
   normal lifetime for requests.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

086a0dab

isci: fix support for arbitrarily large smp requests · e9bf7095

Dan Williams authored Jun 16, 2011

Instead of duplicating the smp request buffer reuse the one provided by
libsas. This future proofs the driver to support arbitrarily large smp
requests, and shrinks the request structure size by ~700 bytes.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

e9bf7095

isci: fix dma_unmap_sg usage · ddcc7e34

Dan Williams authored Jun 17, 2011

One bug and a cleanup:
1/ Fix cases where we were unmapping invalid addresses (smp requests were
   being unmapped)

[  604.662770] ------------[ cut here ]------------
[  604.668026] WARNING: at lib/dma-debug.c:800 check_unmap+0x418/0x740()
[  604.675315] Hardware name: SandyBridge Platform
[  604.680465] isci 0000:03:00.0: DMA-API: device driver tries to free an invalid DMA memory address

2/ The unmap routine is too large to be an inline function, and
   isci_request_io_request_get_next_sge is unused.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

ddcc7e34

isci: fix smp response frame overrun · 5edc3348

Dan Williams authored Jun 16, 2011

Due to a typo we currently copy way too much when copying over the
response data, but since a request is likely backed by a full page
allocation we don't corrupt live data.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

5edc3348

isci: kill device_sequence · ff60639d

Dan Williams authored Jun 17, 2011

Now that we have upleveled device reassignment protection to the
isci_remote_device reference count we no longer need this level of
self-defense.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

ff60639d

isci: kill isci_remote_device_change_state() · f2088267

Dan Williams authored Jun 16, 2011

Now that "stopping/stopped" are one in the same and signalled by a NULL device
pointer the rest of the device status infrastructure can be removed (->status
and ->state_lock). The "not ready for i/o state" is replaced with a state
flag, and is evaluated under scic_lock so that we don't see transients from
taking the device reference to submitting the i/o.

This also fixes a potential leakage of can_queue slots in the rare case that
SAS_TASK_ABORTED is set at submission.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

f2088267

isci: atomic device lookup and reference counting · 209fae14

Dan Williams authored Jun 13, 2011

We have unsafe references to remote devices that are notified to
disappear at lldd_dev_gone.  In order to clean this up we need a single
canonical source for device lookups and stable references once a lookup
succeeds.  Towards that end guarantee that domain_device.lldd_dev is
NULL as soon as we start the process of stopping a device.  Any code
path that wants to safely lookup a remote device must do so through
task->dev->lldd_dev (isci_lookup_device()).

For in-flight references outside of scic_lock we need reference counting
to ensure that the device is not recycled before we are done with it.
Simplify device back references to just scic_sds_request.target_device
which is now the only permissible internal reference that is maintained
relative to the reference count.

There were two occasions where we wanted new i/o's to be treated as
SAS_TASK_UNDELIVERED but where the domain_dev->lldd_dev link is still
intact.  Introduce a 'gone' flag to prevent i/o while waiting for libsas
to take action on the port down event.

One 'core' leftover is that we currently call
scic_remote_device_destruct() from isci_remote_device_deconstruct()
which is called when the 'core' says the device is stopped.  It would be
more natural for the final put to trigger
isci_remote_device_deconstruct() but this implementation is deferred as
it requires other changes.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

209fae14

isci: fix ssp response iu buffer size in isci_tmf · 360b03ed

Dan Williams authored Jun 15, 2011

In isci_task_request_complete() we save the response/sense data from the
command.  Make sure isci_tmf has enough space to hold the full response.

[ it does not look like we actually use this data, and
  response_data_len/sense_data_len should be specifying the byte count,
  in any event do the simple fix first so we don't corrupt memory ]
Reported-by: Adam Gruchala <adam.gruchala@intel.com>
Tested-by: Edmund Nadolski <edmund.nadolski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

360b03ed

isci: cleanup request allocation · 0d0cf14c

Dan Williams authored Jun 13, 2011

Rather than return an error code and update a pointer that was passed by
reference just return the request object directly (or null if allocation
failed).
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

0d0cf14c

isci: cleanup/optimize queue increment macros · 994a9303

Dan Williams authored Jun 09, 2011

Every single i/o or event completion incurs a test and branch to see if
the cycle bit changed.  For power-of-2 queue sizes the cycle bit can be
read directly from the rollover of the queue pointer.

Likely premature optimization, but the hidden if() and hidden
assignments / side-effects in the macros were already asking to be
cleaned up.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

994a9303

isci: cleanup tag macros · dd047c8e

Dan Williams authored Jun 09, 2011

A tag is a 16 bit number where the upper four bits is a sequence number
and the remainder is the task context index (tci).  Sanitize the macro
names and shave 256-bytes out of scic_sds_controller by reducing the size of
io_request_sequence.

scic_sds_io_tag_construct --> ISCI_TAG
scic_sds_io_tag_get_sequence --> ISCI_TAG_SEQ
scic_sds_io_tag_get_index() --> ISCI_TAG_TCI
scic_sds_io_sequence_increment() [delete / open code]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

dd047c8e

isci: cleanup/optimize pool implementation · ac668c69

Dan Williams authored Jun 07, 2011

The circ_buf macros are ~6% faster, as measured by perf, because they take
advantage of power-of-two math assumptions i.e. no test and branch for
rollover. Their semantics are clearer than the hidden side effects in pool.h
(like sci_pool_get() which hides an assignment).
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

ac668c69

isci: Disable link layer hang detection · 9b917987

Jeff Skirvin authored Jun 20, 2011

Some targets exceed the hang detect timer.  Use the OS timeout to
catch hung tasks.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

9b917987

isci: Hard reset failure will link reset all phys in the port · fd0527ab

Jeff Skirvin authored Jun 20, 2011

In the case where the hard reset process fails, each link in
the port is put through a link reset sequence.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

fd0527ab

isci: Explicitly decode remote node ready and suspended states · fd536601

Jeff Skirvin authored Jun 20, 2011

The remote node context should only signal a device reset condition
in a suspended state.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

fd536601

isci: fix isci_terminate_pending() list management · 980d3aeb

Dan Williams authored Jun 20, 2011

Walk through the list of pending requests being careful to consider that
multiple requests can be terminated when the lock is dropped (i.e.
invalidating the 'next' reference established by
list_for_each_entry_safe).

Also noticed that all callers to isci_terminate_pending_requests()
specifying terminating, so just drop the parameter.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

980d3aeb

isci: Handle timed-out request terminations correctly · 77c852f3

Jeff Skirvin authored Jun 20, 2011

In the situation where a termination of an I/O times-out,
make sure that the linkage from the request to the task
is severed completely.  Also make sure that the selection
of tasks to terminate occurs under scic_lock.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

77c852f3

isci: Requests that do not start must be set to "complete" · f53a3a32

Jeff Skirvin authored Jun 20, 2011

Requests that fail at start because of a reset pending condition
must be set to complete in order to allow for later cleanup.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

f53a3a32

isci: Add decode for SMP request retry error condition · cde76fbf

Jeff Skirvin authored Jun 20, 2011

There are situations with slow expanders in which a first attempt
to execute an SMP request will fail with a timeout.  Immediate
subsequent retries will generally succeed.  This change makes sure
SMP I/O failures are immediately failed to libsas so that retries
happen with no discovery process timeout delay.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

cde76fbf

isci: filter broadcast change notifications during SMP phy resets · 61aaff49

Jeff Skirvin authored Jun 21, 2011

When resetting a sata device in the domain we have seen occasions where
libsas prematurely marks a device gone in the time it takes for the
device to re-establish the link.  This plays badly with software raid
arrays.  Other libsas drivers have non-uniform delays in their reset
handlers to try to cover this condition, but not sufficient to close the
hole.  Given that a sata device can take many seconds to recover we
filter bcns and poll for the device reattach state before notifying
libsas that the port needs the domain to be rediscovered.  Once this has
been proven out at the lldd level we can think about uplevelling this
feature to a common implementation in libsas.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
[ use kzalloc instead of kmem_cache ]
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
[ use eventq and time macros ]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

61aaff49

isci: Move the reset delay after the remote node resumption. · ff717ab0

Jeff Skirvin authored Jun 20, 2011

Delay after bringing up the RNC to allow for resumption latency.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

ff717ab0