- 16 May, 2010 2 commits
-
-
Roland Dreier authored
Merge branches 'amso1100', 'bkl', 'cma', 'cxgb3', 'cxgb4', 'ipoib', 'iser', 'masked-atomics', 'misc', 'mthca' and 'nes' into for-next
-
Julia Lawall authored
Use kmemdup when some other buffer is immediately copied into the allocated region. A simplified version of the semantic patch that makes this change is as follows (http://coccinelle.lip6.fr/):

// <smpl>
@@
expression from,to,size,flag;
statement S;
@@

-  to = \(kmalloc\|kzalloc\)(size,flag);
+  to = kmemdup(from,size,flag);
   if (to==NULL || ...) S
-  memcpy(to, from, size);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
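For illustration, the before/after shape of the transformation looks like this (a minimal sketch; dup_config() is a made-up name, not code from any of the patched drivers):

    #include <linux/slab.h>
    #include <linux/string.h>

    /* dup_config() is hypothetical -- it only shows the rewrite shape */
    static void *dup_config(const void *src, size_t len)
    {
            /* before: dst = kmalloc(len, GFP_KERNEL);
             *         if (!dst)
             *                 return NULL;
             *         memcpy(dst, src, len);
             */
            return kmemdup(src, len, GFP_KERNEL);   /* after: one call */
    }

kmemdup() keeps the same allocation-failure behavior (it returns NULL), so the surrounding error handling is unchanged.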
-
- 12 May, 2010 4 commits
-
-
Dan Carpenter authored
We shouldn't free things here because we free them later. The call tree looks like this:

    iser_connect()          /* initiates connection establishment */
        ... and later ...
    iser_cma_handler() => iser_route_handler() => iser_create_ib_conn_res()

If we fail in iser_create_ib_conn_res(), iser_conn_release() is eventually called, resulting in a double free.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
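The bug shape is a classic ownership confusion; a hypothetical sketch (none of these names are iser's actual functions):

    #include <linux/errno.h>
    #include <linux/slab.h>

    /* Hypothetical names -- not iser's actual code */
    struct conn_sketch {
            void *login_buf;
    };

    static int register_with_cm(struct conn_sketch *c);     /* stand-in */

    static int create_conn_res(struct conn_sketch *c)
    {
            c->login_buf = kmalloc(512, GFP_KERNEL);
            if (!c->login_buf)
                    return -ENOMEM;

            if (register_with_cm(c)) {
                    kfree(c->login_buf);    /* BUG: conn_release() will
                                             * kfree() this again */
                    return -EIO;
            }
            return 0;
    }

    /* Common teardown path, reached even after a failed setup */
    static void conn_release(struct conn_sketch *c)
    {
            kfree(c->login_buf);
    }

The fix is to drop the kfree() in the failure branch and let the single release path own the buffer.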
-
Or Gerlitz authored
The iser connection teardown flow isn't over until the underlying Connection Manager (e.g. the IB CM) delivers a disconnected or timeout event through the RDMA-CM. When the remote (target) side isn't reachable, e.g. when some hardware along the path (port/HCA/switch) isn't functioning or has been taken down administratively, the CM timeout flow is used and the event may be generated only after a relatively long time -- on the order of tens of seconds. The current iser code exposes this possibly long delay to higher layers, specifically to the iscsid daemon and the iscsi kernel stack. As a result, the iscsi stack doesn't respond well: this low-level CM delay is added to the fail-over time under HA schemes such as the one provided by DM multipath through the multipathd(8) service.

This patch enhances the reference counting scheme on iser's IB connections so that the disconnect flow initiated by iscsid from user space (ep_disconnect) doesn't wait for the CM to deliver the disconnect/timeout event. (From iser's viewpoint, the connection teardown isn't done until the event is delivered.) The iser IB (RDMA) connection object is destroyed when its reference count reaches zero. When this happens in RDMA-CM callback context, extra care is taken so that the RDMA-CM itself destroys the associated ID, since destroying it from within the callback is prohibited.

The reference count of an iser IB connection normally reaches three, with the following <ref, deref> relations:

 1. conn   <init, terminate>
 2. conn   <bind, stop/destroy>
 3. cma id <create, disconnect/error/timeout callbacks>

With this patch, multipath fail-over time is about 30 seconds; without it, fail-over time is about 130 seconds.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
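A hedged sketch of the get/put discipline described above (the helper names and the on_cma_callback flag are illustrative, not necessarily the patch's exact interface):

    #include <linux/atomic.h>
    #include <linux/slab.h>
    #include <rdma/rdma_cm.h>

    /* Illustrative sketch, not iser's actual structures */
    struct ib_conn_sketch {
            atomic_t refcount;
            struct rdma_cm_id *cma_id;
            /* ... QP, CQs, buffers ... */
    };

    static void conn_get(struct ib_conn_sketch *c)
    {
            atomic_inc(&c->refcount);
    }

    /* Returns true when this put destroyed the connection object. */
    static bool conn_put(struct ib_conn_sketch *c, bool on_cma_callback)
    {
            if (!atomic_dec_and_test(&c->refcount))
                    return false;

            /* ... release QP, CQs, buffers ... */

            /*
             * Destroying a CM ID from inside its own event handler is
             * prohibited.  In that case the handler returns nonzero
             * instead, which tells the RDMA-CM to destroy the ID itself.
             */
            if (!on_cma_callback)
                    rdma_destroy_id(c->cma_id);
            kfree(c);
            return true;
    }

With each of the three <ref, deref> pairs balanced this way, ep_disconnect can drop its references and return immediately, while the last put (often from the CM callback) performs the actual destruction.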
-
Or Gerlitz authored
The iscsi connection object life cycle includes binding to and unbinding (conn_stop) from the iscsi transport connection object. Since iscsi connection objects are recycled, by the time the transport connection (e.g. iser's IB connection) is released, it is not valid to touch the iscsi connection through the transport's back-pointer, since it may already point to a different transport connection. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
Or Gerlitz authored
Add an event handler for events such as port up and port down. This is useful when testing high-availability schemes such as multipathing. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
- 05 May, 2010 1 commit
-
-
Roland Dreier authored
Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
- 28 Apr, 2010 1 commit
-
-
Roland Dreier authored
Using compile-time designated initializers for the handler arrays instead of open-coding the initialization in iwch_cm_init() is (IMHO) cleaner, and leads to substantially smaller code: on my x86-64 build, bloat-o-meter shows:

    add/remove: 0/1 grow/shrink: 4/3 up/down: 4/-1682 (-1678)
    function                     old     new   delta
    tx_ack                       167     168      +1
    state_set                     55      56      +1
    start_ep_timer                99     100      +1
    pass_establish               177     178      +1
    act_open_req_arp_failure      39      38      -1
    sched                         84      82      -2
    iwch_cm_init                 442      91    -351
    work_handlers               1328       -   -1328

Signed-off-by: Roland Dreier <rolandd@cisco.com>
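As a standalone illustration of the technique (generic names, not the iw_cxgb3 table itself):

    #include <stdio.h>

    enum cpl_event { EV_ESTABLISH, EV_CLOSE, EV_ABORT, NUM_EVENTS };

    typedef void (*ev_handler_t)(void);

    static void handle_establish(void) { puts("establish"); }
    static void handle_close(void)     { puts("close"); }

    /* Designated initializers build the table at compile time; slots not
     * named here (EV_ABORT) are implicitly NULL, so the open-coded
     * assignments that used to live in an init function disappear. */
    static ev_handler_t handlers[NUM_EVENTS] = {
            [EV_ESTABLISH] = handle_establish,
            [EV_CLOSE]     = handle_close,
    };

    int main(void)
    {
            for (int i = 0; i < NUM_EVENTS; i++)
                    if (handlers[i])
                            handlers[i]();
            return 0;
    }

Because the table lives in static data rather than being filled in at runtime, the whole init function can vanish, which is where the -1328 bytes for work_handlers above comes from.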
-
- 22 Apr, 2010 1 commit
-
-
Or Gerlitz authored
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
- 21 Apr, 2010 11 commits
-
-
Vladimir Sokolovsky authored
Add support for masked atomic operations (masked compare and swap, masked fetch and add). Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
Vladimir Sokolovsky authored
- Add new IB_WR_MASKED_ATOMIC_CMP_AND_SWP and IB_WR_MASKED_ATOMIC_FETCH_AND_ADD send opcodes that can be used to post "masked atomic compare and swap" and "masked atomic fetch and add" work requests respectively.
- Add masked_atomic_cap capability.
- Add mask fields to the atomic struct of ib_send_wr.
- Add new opcodes to ib_wc_opcode.

The new operations are described more precisely below:

* Masked Compare and Swap (MskCmpSwap)

The MskCmpSwap atomic operation is an extension to the CmpSwap operation defined in the IB spec. MskCmpSwap allows the user to select a portion of the 64 bit target data for the "compare" check, as well as to restrict the swap to a (possibly different) portion. The pseudo code below describes the operation:

| atomic_response = *va
| if (!((compare_add ^ *va) & compare_add_mask)) then
|         *va = (*va & ~(swap_mask)) | (swap & swap_mask)
|
| return atomic_response

The additional operands are carried in the Extended Transport Header. Atomic response generation and packet format for MskCmpSwap is as for standard IB Atomic operations.

* Masked Fetch and Add (MFetchAdd)

The MFetchAdd atomic operation extends the functionality of the standard IB FetchAdd by allowing the user to split the target into multiple fields of selectable length. The atomic add is done independently on each of these fields. A bit set in the field_boundary parameter specifies the field boundaries. The pseudo code below describes the operation:

| bit_adder(ci, b1, b2, *co)
| {
|         value = ci + b1 + b2
|         *co = !!(value & 2)
|
|         return value & 1
| }
|
| #define MASK_IS_SET(mask, attr)      (!!((mask)&(attr)))
| bit_position = 1
| carry = 0
| atomic_response = 0
|
| for i = 0 to 63
| {
|         if ( i != 0 )
|                 bit_position = bit_position << 1
|
|         bit_add_res = bit_adder(carry, MASK_IS_SET(*va, bit_position),
|                                 MASK_IS_SET(compare_add, bit_position), &new_carry)
|         if (bit_add_res)
|                 atomic_response |= bit_position
|
|         carry = ((new_carry) && (!MASK_IS_SET(compare_add_mask, bit_position)))
| }
|
| return atomic_response

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
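For readers who prefer C to pseudo code, the MskCmpSwap semantics translate directly; this is only a userspace model of the semantics (the real operation executes atomically in the HCA):

    #include <stdint.h>

    /* Userspace model of MskCmpSwap: compare only the bits selected by
     * compare_add_mask, and swap only the bits selected by swap_mask.
     * Not atomic -- an illustration of the semantics only. */
    static uint64_t msk_cmp_swap(uint64_t *va, uint64_t compare_add,
                                 uint64_t compare_add_mask,
                                 uint64_t swap, uint64_t swap_mask)
    {
            uint64_t atomic_response = *va;

            if (!((compare_add ^ *va) & compare_add_mask))
                    *va = (*va & ~swap_mask) | (swap & swap_mask);

            return atomic_response;
    }

With compare_add_mask and swap_mask both ~0ULL this degenerates to the standard IB CmpSwap, which is why it is described as an extension of the spec-defined operation.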
-
Tetsuo Handa authored
Randomize local port allocation in the way sctp_get_port_local() does. Update the rover at the end of the loop, since we're likely to pick a valid port on the first try. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
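A hedged userspace model of the rover scheme (the real RDMA-CM code allocates out of an idr; all names here are invented):

    #include <stdlib.h>

    #define PORT_LOW  49152
    #define PORT_HIGH 65535

    static int rover;       /* remembered across calls */

    /* Returns a free port, or -1 if the range is exhausted.  try_bind()
     * stands in for the real availability check and returns nonzero on
     * success. */
    static int alloc_local_port(int (*try_bind)(int))
    {
            int range = PORT_HIGH - PORT_LOW + 1;
            int start = rover ? rover : PORT_LOW + rand() % range;

            for (int i = 0, port = start; i < range; i++) {
                    if (try_bind(port)) {
                            /* update the rover only on success: the
                             * first try usually wins, so doing it per
                             * iteration would be wasted work */
                            rover = port;
                            return port;
                    }
                    if (++port > PORT_HIGH)
                            port = PORT_LOW;
            }
            return -1;
    }

Seeding rand() and real bind-conflict handling are left out; only the wrap-around scan and the late rover update matter here.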
-
Roland Dreier authored
This allows the compiler to do a bit better; on my x86-64 build:

    add/remove: 0/2 grow/shrink: 1/0 up/down: 2288/-2365 (-77)
    function              old     new   delta
    nes_init_phy          273    2561   +2288
    nes_init_1g_phy       469       -    -469
    nes_init_2025_phy    1896       -   -1896

Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
Chien Tung authored
nes_{read,write}_1G_phy_reg() take phy_lock internally, while nes_{read,write}_10G_phy_reg() leave that to the caller. Remove phy_lock from the 1G routines and leave the locking to the caller, adding phy_lock calls around the existing 1G read/write call sites. Signed-off-by: Chien Tung <chien.tin.tung@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
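A generic sketch of the locking change (illustrative names and signatures, not the nes driver's own):

    #include <linux/spinlock.h>
    #include <linux/types.h>

    static DEFINE_SPINLOCK(phy_lock);

    /* Stand-ins for the register accessors */
    static u32 mmio_read(u32 reg);
    static void mmio_write(u32 reg, u32 val);

    /* After the change: the helpers touch hardware without locking... */
    static u32 read_1g_reg(u32 reg)
    {
            return mmio_read(reg);
    }

    static void write_1g_reg(u32 reg, u32 val)
    {
            mmio_write(reg, val);
    }

    /* ...and every caller brackets the whole read-modify-write sequence,
     * exactly as the 10G paths already did. */
    static void set_1g_phy_bit(u32 reg, u32 bit)
    {
            unsigned long flags;
            u32 val;

            spin_lock_irqsave(&phy_lock, flags);
            val = read_1g_reg(reg);
            write_1g_reg(reg, val | bit);
            spin_unlock_irqrestore(&phy_lock, flags);
    }

Putting the lock at the call site makes multi-register sequences atomic as a whole, which locking inside each helper cannot guarantee.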
-
Steve Wise authored
Add an RDMA/iWARP driver for Chelsio T4 Ethernet adapters. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
FUJITA Tomonori authored
The DMA API is preferred; no functional change. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
FUJITA Tomonori authored
The DMA API is preferred; no functional change. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
Steve Wise authored
The low level cxgb3 driver can return NET_XMIT_CN and friends. The iw_cxgb3 driver should _not_ treat these as errors. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
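The distinction matters because NET_XMIT_* values are positive status codes, not negative errnos; a hypothetical sketch of the check (low_level_send() stands in for the cxgb3 send path):

    #include <linux/netdevice.h>    /* NET_XMIT_CN et al. */

    static int low_level_send(struct sk_buff *skb);     /* stand-in */

    static int iwch_send_sketch(struct sk_buff *skb)
    {
            int ret = low_level_send(skb);

            if (ret < 0)
                    return ret;     /* real error: propagate it */
            return 0;               /* NET_XMIT_CN and friends mean "sent,
                                     * but the path is congested" -- not
                                     * a failure */
    }

Treating NET_XMIT_CN as an error would tear down a healthy connection on nothing more than transient congestion.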
-
FUJITA Tomonori authored
The DMA API is preferred; no functional change. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
Roland Dreier authored
Several RDMA user-access drivers have file_operations structures with no .llseek method set. None of the drivers actually do anything with f_pos, so this means llseek is essentially a NOP, instead of returning an error as leaving other file_operations methods unimplemented would do. This is mostly harmless, except that a NULL .llseek means that default_llseek() is used, and this function grabs the BKL, which we would like to avoid. Since llseek does nothing useful on these files, we would like it to return an error to userspace instead of silently grabbing the BKL and succeeding. For nearly all of the file types, we take the belt-and-suspenders approach of setting the .llseek method to no_llseek and also calling nonseekable_open(); the exception is the uverbs_event files, which are created with anon_inode_getfile(), which already sets f_mode the same way as nonseekable_open() would. This work is motivated by Arnd Bergmann's bkl-removal tree. Signed-off-by: Roland Dreier <rolandd@cisco.com>
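The belt-and-suspenders pattern described above looks roughly like this in a hypothetical character driver (names invented):

    #include <linux/fs.h>
    #include <linux/module.h>

    static int example_open(struct inode *inode, struct file *filp)
    {
            /* nonseekable_open() clears FMODE_LSEEK/FMODE_PREAD/
             * FMODE_PWRITE, marking the file non-seekable */
            return nonseekable_open(inode, filp);
    }

    static const struct file_operations example_fops = {
            .owner  = THIS_MODULE,
            .open   = example_open,
            .llseek = no_llseek,    /* fail seeks with -ESPIPE instead of
                                     * falling back to default_llseek(),
                                     * which takes the BKL */
    };

Either half alone would suffice; setting both makes the intent explicit and survives future refactoring of either path.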
-
- 09 Apr, 2010 20 commits
-
-
Linus Torvalds authored
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mlx4: Check correct variable for allocation failure
  RDMA/nes: Correct cap.max_inline_data assignment in nes_query_qp()
  RDMA/cm: Set num_paths when manually assigning path records
  IB/cm: Fix device_create() return value check
-
Linus Torvalds authored
Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6

* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] Update default configuration.
  [S390] nss: add missing .previous statement to asm function
  [S390] increase default size of vmalloc area
  [S390] s390: disable change bit override
  [S390] fix io_return critical section cleanup
  [S390] sclp_async: potential buffer overflow
  [S390] arch/s390/kernel: Add missing unlock
-
Linus Torvalds authored
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (34 commits)
  cfq-iosched: Fix the incorrect timeslice accounting with forced_dispatch loop
  loop: Update mtime when writing using aops
  block: expose the statistics in blkio.time and blkio.sectors for the root cgroup
  backing-dev: Handle class_create() failure
  Block: Fix block/elevator.c elevator_get() off-by-one error
  drbd: lc_element_by_index() never returns NULL
  cciss: unlock on error path
  cfq-iosched: Do not merge queues of BE and IDLE classes
  cfq-iosched: Add additional blktrace log messages in CFQ for easier debugging
  i2o: Remove the dangerous kobj_to_i2o_device macro
  block: remove 16 bytes of padding from struct request on 64bits
  cfq-iosched: fix a kbuild regression
  block: make CONFIG_BLK_CGROUP visible
  Remove GENHD_FL_DRIVERFS
  block: Export max number of segments and max segment size in sysfs
  block: Finalize conversion of block limits functions
  block: Fix overrun in lcm() and move it to lib
  vfs: improve writeback_inodes_wb()
  paride: fix off-by-one test
  drbd: fix al-to-on-disk-bitmap for 4k logical_block_size
  ...
-
Linus Torvalds authored
Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6

* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (29 commits)
  drm/nouveau: bail out of auxch transaction if we repeatedly recieve defers
  drm/nv50: implement gpio set/get routines
  drm/nv50: parse/use some more de-magiced parts of gpio table entries
  drm/nouveau: store raw gpio table entry in bios gpio structs
  drm/nv40: Init some tiling-related PGRAPH state.
  drm/nv50: Add NVA3 support in ctxprog/ctxvals generator.
  drm/nv50: another dodgy DP hack
  drm/nv50: punt hotplug irq handling out to workqueue
  drm/nv50: preserve an unknown SOR_MODECTRL value for DP encoders
  drm/nv50: Allow using the NVA3 new compute class.
  drm/nv50: cleanup properly if PDISPLAY init fails
  drm/nouveau: fixup the init failure paths some more
  drm/nv50: fix instmem init on IGPs if stolen mem crosses 4GiB mark
  drm/nv40: add LVDS table quirk for Dell Latitude D620
  drm/nv40: rework lvds table parsing
  drm/nouveau: detect vram amount once, and save the value
  drm/nouveau: remove some unused members from drm_nouveau_private
  drm/nouveau: Make use of TTM busy_placements.
  drm/nv50: add more 0x100c80 flushy magic
  drm/nv50: fix fbcon when framebuffer above 4GiB mark
  ...
-
David Howells authored
radix_tree_tag_get() is not safe to use concurrently with radix_tree_tag_set() or radix_tree_tag_clear(). The problem is that the double tag_get() in radix_tree_tag_get():

		if (!tag_get(node, tag, offset))
			saw_unset_tag = 1;
		if (height == 1) {
			int ret = tag_get(node, tag, offset);

may see the value change due to the action of set/clear. RCU is no protection against this as no pointers are being changed, no nodes are being replaced according to a COW protocol - set/clear alter the node directly. The documentation in linux/radix-tree.h, however, says that radix_tree_tag_get() is an exception to the rule that "any function modifying the tree or tags (...) must exclude other modifications, and exclude any functions reading the tree". The problem is that the next statement in radix_tree_tag_get() checks that the tag doesn't vary over time:

			BUG_ON(ret && saw_unset_tag);

This has been seen happening in FS-Cache:

	https://www.redhat.com/archives/linux-cachefs/2010-April/msg00013.html

To this end, remove the BUG_ON() from radix_tree_tag_get() and note in various comments that the value of the tag may change whilst the RCU read lock is held, and thus that the return value of radix_tree_tag_get() may not be relied upon unless radix_tree_tag_set/clear() and radix_tree_delete() are excluded from running concurrently with it.

Reported-by: Romain DEGEZ <romain.degez@smartjog.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Pekka Enberg authored
As suggested by Linus, fix up kmem_ptr_validate() to handle non-kernel pointers more gracefully. The patch changes kmem_ptr_validate() to use the newly introduced kern_ptr_validate() helper to check that a pointer is a valid kernel pointer before we attempt to convert it into a 'struct page'. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matt Mackall <mpm@selenic.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Pekka Enberg authored
As suggested by Linus, introduce a kern_ptr_validate() helper that does some sanity checks to make sure a pointer is a valid kernel pointer. This is a preparational step for fixing SLUB kmem_ptr_validate(). Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matt Mackall <mpm@selenic.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
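A hedged sketch of the kind of checks such a helper performs (an approximation, not necessarily the committed mm/util.c code):

    #include <linux/mm.h>

    /* Approximation only: a pointer is plausibly a valid kernel pointer
     * if it lies within the kernel's direct mapping, is suitably
     * aligned, and the page tables say the address is backed. */
    static int kern_ptr_validate_sketch(const void *ptr, unsigned long size)
    {
            unsigned long addr = (unsigned long)ptr;

            if (addr < PAGE_OFFSET)
                    return 0;       /* userspace or NULL pointer */
            if (addr > (unsigned long)high_memory - size)
                    return 0;       /* object would run past lowmem */
            if (addr & (sizeof(void *) - 1))
                    return 0;       /* misaligned */
            if (!kern_addr_valid(addr))
                    return 0;       /* not mapped */
            return 1;
    }

The point is that each check is cheap and purely defensive: the helper rejects garbage before virt_to_page() or similar conversions can fault on it.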
-
Linus Torvalds authored
This reverts commit ba168fc3. It changes user-visible sysfs interfaces, and breaks some existing user space applications which apparently rely on the fact that the output does not contain the "0x" prefix. Requested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Roland Dreier authored
-
Martin Schwidefsky authored
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Heiko Carstens authored
The savesys_ipl_nss asm function is put into the .init.text section; however, it is missing a ".previous" directive that would restore the previously active section. Luckily, all functions in early.c are init functions, so it doesn't matter currently. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Martin Schwidefsky authored
The default size of the vmalloc area is currently 1 GB. The memory resource controller uses about 10 MB of vmalloc space per gigabyte of memory, which makes a system with more than ~100 GB of memory unbootable with the default vmalloc size. It costs us nothing to increase the default size to some more adequate value, e.g. 128 GB. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Christian Borntraeger authored
commit 6a985c61 ([S390] s390: use change recording override for kernel mapping) deactivated the change bit recording for the kernel mapping to improve performance. This works most of the time, but there are cases (e.g. the kernel runs in home space, or futex atomic compare-xchg) where we modify user memory with the kernel mapping instead of the user mapping. Instead of fixing these cases, this patch just deactivates the change bit override to avoid future problems with other kernel code that might use the kernel mapping for user memory. CC: stable@kernel.org Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Martin Schwidefsky authored
If a machine check interrupts the io interrupt handler on one of the instructions between io_return and io_leave, the critical section cleanup code will move the return psw to io_work_loop. By doing that, the switch from the asynchronous interrupt stack to the process stack is skipped. If e.g. TIF_NEED_RESCHED is set, things break because the scheduler is called on the asynchronous interrupt stack. Moving the psw back to io_return instead fixes the problem. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Dan Carpenter authored
"len" hasn't been properly range checked so we shouldn't use it as an array offset. This can only be written to by root but it would still be annoying to accidentally write more than 3 characters and corrupt your memory. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Julia Lawall authored
In the default case the lock is not unlocked. The return is converted to a goto, to share the unlock at the end of the function. A simplified version of the semantic patch that finds this problem is as follows (http://coccinelle.lip6.fr/):

// <smpl>
@r exists@
expression E1;
identifier f;
@@

f (...) {
<+...
* spin_lock_irq (E1,...);
... when != E1
* return ...;
...+>
}
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Divyesh Shah authored
When CFQ dispatches requests forcefully due to a barrier or changing iosched, it runs through all cfqq's dispatching requests and then expires each queue. However, it does not activate a cfqq before flushing its IOs, resulting in using stale values for computing slice_used. This patch fixes it by activating the queue before flushing requests from each queue. This is useful mostly for barrier requests, because when the iosched is changing it really doesn't matter if we have incorrect accounting since we're going to break down all structures anyway. We also now expire the current timeslice before moving on with the dispatch, to accurately account slice used for that cfqq. Signed-off-by: Divyesh Shah <dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Dave Airlie authored
* 'nouveau/for-airlied' of ../drm-nouveau-next: (21 commits)
  drm/nouveau: bail out of auxch transaction if we repeatedly recieve defers
  drm/nv50: implement gpio set/get routines
  drm/nv50: parse/use some more de-magiced parts of gpio table entries
  drm/nouveau: store raw gpio table entry in bios gpio structs
  drm/nv40: Init some tiling-related PGRAPH state.
  drm/nv50: Add NVA3 support in ctxprog/ctxvals generator.
  drm/nv50: another dodgy DP hack
  drm/nv50: punt hotplug irq handling out to workqueue
  drm/nv50: preserve an unknown SOR_MODECTRL value for DP encoders
  drm/nv50: Allow using the NVA3 new compute class.
  drm/nv50: cleanup properly if PDISPLAY init fails
  drm/nouveau: fixup the init failure paths some more
  drm/nv50: fix instmem init on IGPs if stolen mem crosses 4GiB mark
  drm/nv40: add LVDS table quirk for Dell Latitude D620
  drm/nv40: rework lvds table parsing
  drm/nouveau: detect vram amount once, and save the value
  drm/nouveau: remove some unused members from drm_nouveau_private
  drm/nouveau: Make use of TTM busy_placements.
  drm/nv50: add more 0x100c80 flushy magic
  drm/nv50: fix fbcon when framebuffer above 4GiB mark
  ...
-
Ben Skeggs authored
There's one known case where we never stop receiving DEFER and loop here forever. Let's not do that. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
-
Ben Skeggs authored
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
-