- 29 Feb, 2012 25 commits
-
-
Dan Williams authored
libsas ata error handling is already async but this does not help the scan case. Move initial link recovery out from under host->scan_mutex, and delay synchronization with eh until after all port probe/recovery work has been queued. Device ordering is maintained with scan order by still calling sas_rphy_add() in order of domain discovery. Since we now scan the domain list when invoking libata-eh we need to be careful to check for fully initialized ata ports. Acked-by: Jack Wang <jack_wang@usish.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
ata devices are always scanned after ssp. Prior to the ata error handling reworks libsas would tend to scan devices in ascending expander phy order. Restore this ordering by deferring ssp discovery to a DISCE_PROBE event, and keep the probe order consistent with the discovery order, not the placement of sata devices. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
If the phy is attached to a new sas address unregister the first address before processing the new attachment. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
libsas fails to discover all sata devices in the domain. If a device fails negotiation and does not transmit a signature fis the link needs recovery. libata already understands how to manage slow to come up links, so treat these conditions as ata device attach events for the purposes of creating an ata_port. This allows libata to manage retrying link bring up. Rediscovery is modified to be careful about checking changes in dev_type. It looks like libsas leaks old devices if the sas address changes, but that's a fix for another patch. Acked-by: Jack Wang <jack_wang@usish.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
Make sas-port naming consistent with the expander-attached case whereby the phy-id is the last digit in the port name. Otherwise we get the random behavior of the allocation order. Reported-by: Patrick Thomson <patrick.s.thomson@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
It's difficult to determine which domain_device is triggering error recovery, so convert messages like: sas: ex 5001b4da000e703f phy08:T attached: 5001b4da000e7028 sas: ex 5001b4da000e703f phy09:T attached: 5001b4da000e7029 ... ata7: sas eh calling libata port error handler ata8: sas eh calling libata port error handler ...into: sas: ex 5001517e85cfefff phy05:T:9 attached: 5001517e85cfefe5 (stp) sas: ex 5001517e3b0af0bf phy11:T:8 attached: 5001517e3b0af0ab (stp) ... sas: ata7: end_device-21:1: dev error handler sas: ata8: end_device-20:0:5: dev error handler which shows attached link rate, device type, and associates a domain_device with its ata_port id to correlate messages emitted from libata-eh. As Doug notes, we can also take the opportunity to clarify expander phy routing capabilities. [dgilbert@interlog.com: clarify table2table with 'U'] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Maciej Trela authored
Holdover from a patch rework, prior to the addition of SAS_DEV_DESTROY we were holding a reference while the destruct was pending in case the domain was torn down before the desctruct event ran. That case is covered by SAS_DEV_DESTROY, and the sas_put_device() just corrupts freed memory, or worse frees the memory while another agent holds a reference. Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
We need to hold drain_mutex across the unregistration as port down events queue device removal as chained events, so we need to make sure no other drainers are active. [ 1118.673968] WARNING: at kernel/workqueue.c:996 __queue_work+0x11a/0x326() [ 1118.681982] Hardware name: S2600CP [ 1118.686193] Modules linked in: isci(-) libsas scsi_transport_sas nls_utf8 ipv6 uinput sg iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core ioatdma dca sd_mod sr_mod cdrom ahci libahci libata [last unloaded: scsi_transport_sas] [ 1118.709893] Pid: 6831, comm: rmmod Not tainted 3.2.0-isci+ #1 [ 1118.716727] Call Trace: [ 1118.719867] [<ffffffff8103e9f5>] warn_slowpath_common+0x85/0x9d [ 1118.727000] [<ffffffff8103ea27>] warn_slowpath_null+0x1a/0x1c [ 1118.733942] [<ffffffff81056d44>] __queue_work+0x11a/0x326 [ 1118.740481] [<ffffffff81056f99>] queue_work_on+0x1b/0x22 [ 1118.746925] [<ffffffff81057106>] queue_work+0x37/0x3e [ 1118.753105] [<ffffffffa0120e05>] ? sas_discover_event+0x55/0x82 [libsas] [ 1118.761094] [<ffffffff813217c3>] scsi_queue_work+0x42/0x44 [ 1118.767717] [<ffffffffa0120e19>] sas_discover_event+0x69/0x82 [libsas] [ 1118.775509] [<ffffffffa0120f5b>] sas_unregister_dev+0xc3/0xcc [libsas] [ 1118.783319] [<ffffffffa0120fae>] sas_unregister_domain_devices+0x4a/0xc8 [libsas] [ 1118.792731] [<ffffffffa0120071>] sas_deform_port+0x60/0x1a6 [libsas] [ 1118.800339] [<ffffffffa01201ea>] sas_unregister_ports+0x33/0x44 [libsas] [ 1118.808342] [<ffffffffa011f7e5>] sas_unregister_ha+0x41/0x6b [libsas] [ 1118.816055] [<ffffffffa0134055>] isci_unregister+0x22/0x4d [isci] [ 1118.823384] [<ffffffffa0143040>] isci_pci_remove+0x2e/0x60 [isci] Reported-by: Jacek Danecki <jacek.danecki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
Similar to the conversion of the transport-class reset we want bsg initiated resets to be managed by libata. Reported-by: Jacek Danecki <jacek.danecki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
If we have a domain with sas and sata devices there may still be sas recovery actions to take after peeling off the commands to send to libata. Reported-by: Andrzej Jakowski <andrzej.jakowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
ata_port lifetime in libata follows the host. In libsas it follows the scsi_target. Once scsi_remove_device() has caused all commands to be completed it allows scsi_remove_target() to immediately proceed to freeing the ata_port causing bug reports like: [ 848.393333] BUG: spinlock bad magic on CPU#4, kworker/u:2/5107 [ 848.400262] general protection fault: 0000 [#1] SMP [ 848.406244] CPU 4 [ 848.408310] Modules linked in: nls_utf8 ipv6 uinput i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma dca sg sd_mod sr_mod cdrom ahci libahci isci libsas libata scsi_transport_sas [last unloaded: scsi_wait_scan] [ 848.432060] [ 848.434137] Pid: 5107, comm: kworker/u:2 Not tainted 3.2.0-isci+ #8 Intel Corporation S2600CP/S2600CP [ 848.445310] RIP: 0010:[<ffffffff8126a68c>] [<ffffffff8126a68c>] spin_dump+0x5e/0x8c [ 848.454787] RSP: 0018:ffff8807f868dca0 EFLAGS: 00010002 [ 848.461137] RAX: 0000000000000048 RBX: ffff8807fe86a630 RCX: ffffffff817d0be0 [ 848.469520] RDX: 0000000000000000 RSI: ffffffff814af1cf RDI: 0000000000000002 [ 848.477959] RBP: ffff8807f868dcb0 R08: 00000000ffffffff R09: 000000006b6b6b6b [ 848.486327] R10: 000000000003fb8c R11: ffffffff81a19448 R12: 6b6b6b6b6b6b6b6b [ 848.494699] R13: ffff8808027dc520 R14: 0000000000000000 R15: 000000000000001e [ 848.503067] FS: 0000000000000000(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000 [ 848.512899] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 848.519710] CR2: 00007ff77d001000 CR3: 00000007f7a5d000 CR4: 00000000000406e0 [ 848.528072] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 848.536446] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 848.544831] Process kworker/u:2 (pid: 5107, threadinfo ffff8807f868c000, task ffff8807ff348000) [ 848.555327] Stack: [ 848.557959] ffff8807fe86a630 ffff8807fe86a630 ffff8807f868dcd0 ffffffff8126a6e0 [ 848.567072] ffffffff817c142f ffff8807fe86a630 ffff8807f868dcf0 ffffffff8126a703 [ 848.576190] ffff8808027dc520 0000000000000286 ffff8807f868dd10 ffffffff814af1bb [ 848.585281] Call Trace: [ 848.588409] [<ffffffff8126a6e0>] spin_bug+0x26/0x28 [ 848.594357] [<ffffffff8126a703>] do_raw_spin_unlock+0x21/0x88 [ 848.601283] [<ffffffff814af1bb>] _raw_spin_unlock_irqrestore+0x2c/0x65 [ 848.609089] [<ffffffffa001c103>] ata_scsi_port_error_handler+0x548/0x557 [libata] [ 848.618331] [<ffffffff81061813>] ? async_schedule+0x17/0x17 [ 848.625060] [<ffffffffa004f30f>] async_sas_ata_eh+0x45/0x69 [libsas] [ 848.632655] [<ffffffff810618aa>] async_run_entry_fn+0x97/0x125 [ 848.639670] [<ffffffff81057439>] process_one_work+0x207/0x38d [ 848.646577] [<ffffffff8105738c>] ? process_one_work+0x15a/0x38d [ 848.653681] [<ffffffff810576f7>] worker_thread+0x138/0x21c [ 848.660305] [<ffffffff810575bf>] ? process_one_work+0x38d/0x38d [ 848.667493] [<ffffffff8105b098>] kthread+0x9d/0xa5 [ 848.673382] [<ffffffff8106e1bd>] ? trace_hardirqs_on_caller+0x12f/0x166 [ 848.681304] [<ffffffff814b7704>] kernel_thread_helper+0x4/0x10 [ 848.688324] [<ffffffff814af534>] ? retint_restore_args+0x13/0x13 [ 848.695530] [<ffffffff8105affb>] ? __init_kthread_worker+0x5b/0x5b [ 848.702929] [<ffffffff814b7700>] ? gs_change+0x13/0x13 [ 848.709155] Code: 00 00 48 8d 88 38 04 00 00 44 8b 80 84 02 00 00 31 c0 e8 cf 1b 24 00 41 83 c8 ff 44 8b 4b 08 48 c7 c1 e0 0b 7d 81 4d 85 e4 74 10 <45> 8b 84 24 84 02 00 00 49 8d 8c 24 38 04 00 00 8b 53 04 48 89 [ 848.732467] RIP [<ffffffff8126a68c>] spin_dump+0x5e/0x8c [ 848.738905] RSP <ffff8807f868dca0> [ 848.743743] ---[ end trace 143161646eee8caa ]--- ...so arrange for the ata_port to have the same end of life as the domain device. Reported-by: Marcin Tomczak <marcin.tomczak@intel.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
If the top level expander is hot removed, mark all child devices as gone before unregistration to short circuit futile recovery. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
When scrolling forward through the eh list (in a clear_q scenario) it is possible to encounter commands that won the completion vs eh race. Rather than sprinkle more "if (!task)" throughout the handler just make a pass through the list and delete the race winners before handling the rest. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
Prior to commit 61aaff49 "isci: filter broadcast change notifications during SMP phy resets" we borrowed the MVS_DEV_EH approach from the mvsas driver for preventing ->lldd_I_T_nexus_reset() events during ata discovery. This hack was protecting against the old ->phy_reset() in ata_bus_probe(), but since the conversion to the new error handling this hack is preventing resets from reaching ata devices. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
Remove ->eh_device_reset_handler() and ->eh_bus_reset_handler() for the same reason they are not implemented for libata hosts, they cannot be implemented reliably with ata-eh. ATA error recovery wants to divert all resets to the eh thread and wait for completion, these handlers may be invoked from a non-blocking ioctl. The other path they are called from is libsas-eh, and if we escalate past I_T_nexus reset we have larger problems i.e. tear down all in-flight commands in the domain potentially without notification to the lldd if it has chosen not to implement ->lldd_clear_nexus_port() / ->lldd_clear_nexus_ha(). Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
Report to libata whether the link to the given domain_device is up and the signature fis has been received. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
Driving resets from libsas-eh is pre-mature as libata will make a decision about performing a softreset. Currently libata determines whether to perform a softreset based on ata_eh_followup_srst_needed(), and none of those conditions apply to isci. Remove the srst implementation and translate ->lldd_lu_reset() for ata devices as a request to drive a reset via libata-eh. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
A hard reset to isci in the direct-attached case is one where the driver internally manages debouncing the link. In the sas-expander-attached case a hard reset is one that clears affiliations. The driver should not be prematurely dropping affiliations at run time, that decision (to force expander hard resets to ata devices) is left to userspace to manage. So, arrange for I_T_nexus resets to be sas-link-resets in the expander-attached case and isci-hard-resets in the direct-attached case. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
It only tracks whether the port is stopping in order to gate new devices being discovered while the port is stopping. However, since the check and subsequent handling is unlocked there is nothing to stop the port from going down immediately after the check. Driver is already prepared to handle devices arriving on stale ports, and those will be cleaned up by an eventual ->lldd_dev_gone() notification. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
This field is a holdover from the OS abstraction conversion. The stable phy to port lookups are done via iphy->ownining_port under scic_lock. After this conversion to use port->lldd_port the only volatile lookup is the initial lookup in isci_port_formed(). After that point any lookup via a successfully notified domain_device is guaranteed to be valid until the domain_device is destroyed. Delete ->start_complete as it is only set once and is set as a consequence of the port going link up, by definition of getting a port formed event the port is "ready". While we are correcting port lookups also move the asd_sas_port table out from under the isci_port. This is to preclude any temptation to use container_of() to convert an asd_sas_port to an isci_port, the association is dynamic and under libsas control. Tested-by: Maciej Trela <maciej.trela@intel.com> [dmilburn@redhat.com: fix i686 compile error] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
The commands that timeout when a disk is forcibly removed may trigger libata to attempt recovery of the device. If libsas has decided to remove the device don't permit ata to continue to issue resets to its last known phy. The primary motivation for this patch is hotplug testing by writing 0 to /sys/class/sas_phy/phyX/enable. Without this check this test leads to libata issuing a reset and re-enabling the device that wants to be torn down. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
In the direct-attached case this routine returns the phy on which this device was first discovered. Which is broken if we want to support wide-targets, as this phy reference can become stale even though the port is still active. In the expander-attached case this routine tries to lookup the phy by scanning the attached sas addresses of the parent expander, and BUG_ONs if it can't find it. However since eh and the libsas workqueue run independently we can still be attempting device recovery via eh after libsas has recorded the device as detached. This is even easier to hit now that eh is blocked while device domain rediscovery takes place, and that libata is fed more timed out commands increasing the chances that it will try to recover the ata device. Arrange for dev->phy to always point to a last known good phy, it may be stale after the port is torn down, but it will catch up for wide port reconfigurations, and never be NULL. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
No sense in issuing or retrying commands to an expander that has been removed. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
Commit 56dd2c06 "[SCSI] libsas: Don't issue commands to devices that have been hot-removed" marked the parent device of an end-device as gone when all the phys to the end device have been deleted. The expander device is still present until its parent is removed. This is a benign change until the smp_execute_task() path is taught to check ->gone. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
Use ata_wait_after_reset() to poll for link recovery after a reset. This combined with sas_ha->eh_mutex prevents expander rediscovery from probing phys in an intermediate state. Local discovery does not have a mechanism to filter link status changes during this timeout, so it remains the responsibility of lldds to prevent premature port teardown. Although once all lldd's support ->lldd_ata_check_ready() that could be used as a gate to local port teardown. The signature fis is re-transmitted when the link comes back so we should be revalidating the ata device class, but that is left to a future patch. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
- 19 Feb, 2012 15 commits
-
-
Dan Williams authored
Once sas_ata_hard_reset() starts honoring the 'deadline' parameter a pathological configuration could take 25 seconds per ata device (serialized) to recover. Run per-port recoveries in parallel. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Jeff Skirvin authored
SAS does not tag SMP requests, and at least one lldd (isci) does not permit more than one in-flight request at a time. [jejb: fix sas_init_dev tab issues while we're at it] Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Jeff Skirvin authored
In the case of an explicit sas_phy_enable call to disable a phy, the LLDD provides the calls to sas_phy_disconnected and the PHYE_LOSS_OF_SIGNAL event. NOTE: This assumes that the lldd(s) generate the notification, which appears to be the case, but only verfied on isci. Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
Execute the link-reset triggered by sas_phy_enable via transport_sas_phy_reset so that it can be managed by libata. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
Link resets leave ata affiliations intact, so arrange for libsas to make an effort to avoid dropping the device due to a slow-to-recover link. Towards this end carry out reset in the host workqueue so that it can check for ata devices and kick the reset request to libata. Hard resets, in contrast, bypass libata since they are meant for associating an ata device with another initiator in the domain (tears down affiliations). Need to add a new transport_sas_phy_reset() since the current sas_phy_reset() is a utility function to libsas lldds. They are not prepared for it to loop back into eh. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
Extend the sas transport class to allow transport users to attach extra data to a sas_phy (->hostdata). Use this area in libsas to move resets to workq context in preparation for scheduling ata device resets through libata-eh. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
Since sata devices can take several seconds to recover the link on reset the 0.5 seconds that libsas currently waits may not be enough. Instead if we are rediscovering a phy that was previously attached to a sata device let libata handle any resets to encourage the device to transmit the initial fis. Once sas_ata_hard_reset() and lldds learn how to honor 'deadline' libsas should stop encountering phys in an intermediate state, until then this will loop until the fis is transmitted or ->attached_sas_addr gets cleared, but in the more likely initial discovery case we keep existing behavior. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
lldds use the SAS_TASK_NEED_DEV_RESET interface to request that eh perform a reset. In the sata device case defer the commands that triggered the reset to libata-eh context so it can perform its pre and post reset management. In the sas_ata_post_internal() case the reset request is falling on deaf ears as the sas_task is immediately destroyed without any reset action. Since it is currently a nop, and likely superfluous given the conversion to new-style libata-eh, just drop the request. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
libsas-eh if it successfully aborts an ata command will hide the timeout condition (AC_ERR_TIMEOUT) from libata. The command likely completes with the all-zero task->task_status it started with. Instead, interpret a TMF_RESP_FUNC_COMPLETE as the end of the sas_task but keep the scmd around for libata-eh to handle. Tested-by: Andrzej Jakowski <andrzej.jakowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
Until we have told the lldd to forget a task a timed out operation can return from the hardware at any time. Since completion frees the task we need to make sure that no tasks run their normal completion handler once eh has decided to manage the task. Similar to ata_scsi_cmd_error_handler() freeze completions to let eh judge the outcome of the race. Task collector mode is problematic because it presents a situation where a task can be timed out and aborted before the lldd has even seen it. For this case we need to guarantee that a task that an lldd has been told to forget does not get queued after the lldd says "never seen it". With sas_scsi_timed_out we achieve this with the ->task_queue_flush mutex, rather than adding more time. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
We invoke task->task_done() to free the task in the eh case, but at this point we are prepared for scsi_eh_flush_done_q() to finish off the scmd. Introduce sas_end_task() to capture the final response status from the lldd and free the task. Also take the opportunity to kill this warning. drivers/scsi/libsas/sas_scsi_host.c: In function ‘sas_end_task’: drivers/scsi/libsas/sas_scsi_host.c:102:3: warning: case value ‘2’ not in enumerated type ‘enum exec_status’ [-Wswitch] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
Since sas_ata does not implement ->freeze(), completions for scmds and internal commands can still arrive concurrent with ata_scsi_cmd_error_handler() and sas_ata_post_internal() respectively. By the time either of those is called libata has committed to completing the qc, and the ATA_PFLAG_FROZEN flag tells sas_ata_task_done() it has lost the race. In the sas_ata_post_internal() case we take on the additional responsibility of freeing the sas_task to close the race with sas_ata_task_done() freeing the the task while sas_ata_post_internal() is in the process of invoking ->lldd_abort_task(). Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
Prior to the conversion to the new-style libata-eh sas_ata_task_done() may have been the last opportunity to clean up the scmd, but now libata-eh explicitly handles this case. It also races against sas-eh. If a lldd completes a task after SAS_TASK_STATE_ABORTED is set it could trigger a spurious decrement of shost->host_failed. Current lldds have the band-aid of checking SAS_TASK_STATE_ABORTED before calling ->task_done(), but better to just let the scmds escalate to libata for race free cleanup. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
sas_discover_sata() notifies lldds of sata devices twice. Once to allow the 'identify' to be sent, and a second time to allow aic94xx (the only libsas driver that cares about sata_dev.identify) to setup NCQ parameters before the device becomes known to the midlayer. Replace this double notification and intervening 'identify' with an explicit ->lldd_ata_set_dmamode notification. With this change all ata internal commands are issued by libata, so we no longer need sas_issue_ata_cmd(). The data from the identify command only needs to be cached in one location so ata_device.id replaces domain_device.sata_dev.identify. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
Dan Williams authored
libata error handling provides for a timeout for link recovery. libsas must not rescan for previously known devices in this interval otherwise it may remove a device that is simply waiting for its link to recover. Let libata-eh make the determination of when the link is stable and prevent libsas (host workqueue) from taking action while this determination is pending. Using a mutex (ha->disco_mutex) to flush and disable revalidation while eh is running requires any discovery action that may block on eh be moved to its own context outside the lock. Probing ATA devices explicitly waits on ata-eh and the cache-flush-io issued during device removal may also pend awaiting eh completion. Essentially any rphy add/remove activity needs to run outside the lock. This adds two new cleanup states for sas_unregister_domain_devices() 'allocated-but-not-probed', and 'flagged-for-destruction'. In the 'allocated-but-not-probed' state dev->rphy points to a rphy that is known to have not been through a sas_rphy_add() event. At domain teardown check if this device is still pending probe and cleanup accordingly. Similarly if a device has already been queued for removal then sas_unregister_domain_devices has nothing to do. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-