- 20 Nov, 2013 23 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32Linus Torvalds authored
Pull AVR32 updates from Hans-Christian Egtvedt. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32: avr32: uapi: be sure of "_UAPI" prefix for all guard macros avr32: add kprobe_ctlblk memory struct avr32: fix out-of-range jump in large kernels avr32: setup crt for early panic()
-
git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-nextLinus Torvalds authored
Pull squashfs updates from Phillip Lougher: "These patches optionally improve the multi-threading peformance of Squashfs by adding parallel decompression, and direct decompression into the page cache, eliminating an intermediate buffer (removing memcpy overhead and lock contention)" * tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next: Squashfs: Check stream is not NULL in decompressor_multi.c Squashfs: Directly decompress into the page cache for file data Squashfs: Restructure squashfs_readpage() Squashfs: Generalise paging handling in the decompressors Squashfs: add multi-threaded decompression using percpu variable squashfs: Enhance parallel I/O Squashfs: Refactor decompressor interface and code
-
Linus Torvalds authored
This reverts commit ea1e7ed3. Al points out that while the commit *does* actually create a separate slab for the page->ptl allocation, that slab is never actually used, and the code continues to use kmalloc/kfree. Damien Wyart points out that the original patch did have the conversion to use kmem_cache_alloc/free, so it got lost somewhere on its way to me. Revert the half-arsed attempt that didn't do anything. If we really do want the special slab (remember: this is all relevant just for debug builds, so it's not necessarily all that critical) we might as well redo the patch fully. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Andrew Morton <akpm@linux-foundation.org> Cc: Kirill A Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull vfs bits and pieces from Al Viro: "Assorted bits that got missed in the first pull request + fixes for a couple of coredump regressions" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fold try_to_ascend() into the sole remaining caller dcache.c: get rid of pointless macros take read_seqbegin_or_lock() and friends to seqlock.h consolidate simple ->d_delete() instances gfs2: endianness misannotations dump_emit(): use __kernel_write(), not vfs_write() dump_align(): fix the dumb braino
-
Al Viro authored
Note that pmds[i] is simply uninitialized at that point... Granted, it's very hard to hit (you need split page locks *and* kmalloc(sizeof(spinlock_t), GFP_KERNEL) failing), but the code is obviously bogus. Introduced by commit 09ef4939 ("x86: add missed pgtable_pmd_page_ctor/dtor calls for preallocated pmds") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull more ACPI and power management updates from Rafael Wysocki: - ACPI-based device hotplug fixes for issues introduced recently and a fix for an older error code path bug in the ACPI PCI host bridge driver - Fix for recently broken OMAP cpufreq build from Viresh Kumar - Fix for a recent hibernation regression related to s2disk - Fix for a locking-related regression in the ACPI EC driver from Puneet Kumar - System suspend error code path fix related to runtime PM and runtime PM documentation update from Ulf Hansson - cpufreq's conservative governor fix from Xiaoguang Chen - New processor IDs for intel_idle and turbostat and removal of an obsolete Kconfig option from Len Brown - New device IDs for the ACPI LPSS (Low-Power Subsystem) driver and ACPI-based PCI hotplug (ACPIPHP) cleanup from Mika Westerberg - Removal of several ACPI video DMI blacklist entries that are not necessary any more from Aaron Lu - Rework of the ACPI companion representation in struct device and code cleanup related to that change from Rafael J Wysocki, Lan Tianyu and Jarkko Nikula - Fixes for assigning names to ACPI-enumerated I2C and SPI devices from Jarkko Nikula * tag 'pm+acpi-2-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (24 commits) PCI / hotplug / ACPI: Drop unused acpiphp_debug declaration ACPI / scan: Set flags.match_driver in acpi_bus_scan_fixed() ACPI / PCI root: Clear driver_data before failing enumeration ACPI / hotplug: Fix PCI host bridge hot removal ACPI / hotplug: Fix acpi_bus_get_device() return value check cpufreq: governor: Remove fossil comment in the cpufreq_governor_dbs() ACPI / video: clean up DMI table for initial black screen problem ACPI / EC: Ensure lock is acquired before accessing ec struct members PM / Hibernate: Do not crash kernel in free_basic_memory_bitmaps() ACPI / AC: Remove struct acpi_device pointer from struct acpi_ac spi: Use stable dev_name for ACPI enumerated SPI slaves i2c: Use stable dev_name for ACPI enumerated I2C slaves ACPI: Provide acpi_dev_name accessor for struct acpi_device device name ACPI / bind: Use (put|get)_device() on ACPI device objects too ACPI: Eliminate the DEVICE_ACPI_HANDLE() macro ACPI / driver core: Store an ACPI device pointer in struct acpi_dev_node cpufreq: OMAP: Fix compilation error 'r & ret undeclared' PM / Runtime: Fix error path for prepare PM / Runtime: Update documentation around probe|remove|suspend cpufreq: conservative: set requested_freq to policy max when it is over policy max ...
-
git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds authored
Pull slave-dmaengine changes from Vinod Koul: "This brings for slave dmaengine: - Change dma notification flag to DMA_COMPLETE from DMA_SUCCESS as dmaengine can only transfer and not verify validaty of dma transfers - Bunch of fixes across drivers: - cppi41 driver fixes from Daniel - 8 channel freescale dma engine support and updated bindings from Hongbo - msx-dma fixes and cleanup by Markus - DMAengine updates from Dan: - Bartlomiej and Dan finalized a rework of the dma address unmap implementation. - In the course of testing 1/ a collection of enhancements to dmatest fell out. Notably basic performance statistics, and fixed / enhanced test control through new module parameters 'run', 'wait', 'noverify', and 'verbose'. Thanks to Andriy and Linus [Walleij] for their review. - Testing the raid related corner cases of 1/ triggered bugs in the recently added 16-source operation support in the ioatdma driver. - Some minor fixes / cleanups to mv_xor and ioatdma" * 'next' of git://git.infradead.org/users/vkoul/slave-dma: (99 commits) dma: mv_xor: Fix mis-usage of mmio 'base' and 'high_base' registers dma: mv_xor: Remove unneeded NULL address check ioat: fix ioat3_irq_reinit ioat: kill msix_single_vector support raid6test: add new corner case for ioatdma driver ioatdma: clean up sed pool kmem_cache ioatdma: fix selection of 16 vs 8 source path ioatdma: fix sed pool selection ioatdma: Fix bug in selftest after removal of DMA_MEMSET. dmatest: verbose mode dmatest: convert to dmaengine_unmap_data dmatest: add a 'wait' parameter dmatest: add basic performance metrics dmatest: add support for skipping verification and random data setup dmatest: use pseudo random numbers dmatest: support xor-only, or pq-only channels in tests dmatest: restore ability to start test at module load and init dmatest: cleanup redundant "dmatest: " prefixes dmatest: replace stored results mechanism, with uniform messages Revert "dmatest: append verify result to results" ...
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull block IO fixes from Jens Axboe: "Normally I'd defer my initial for-linus pull request until after the merge window, but a race was uncovered in the virtio-blk conversion to blk-mq that could cause hangs. So here's a small collection of fixes for you to pull: - The fix for the virtio-blk IO hang reported by Dave Chinner, from Shaohua and myself. - Add the Insert blktrace event for blk-mq. This makes 'btt' happy when it is doing it's state transition analysis. - Ensure that blk-mq has disk/partition stats enabled by default, instead of making it opt-in. - A fix for __bio_add_page() and large sector counts" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: add blktrace insert event trace virtio-blk: virtqueue_kick() must be ordered with other virtqueue operations blk-mq: ensure that we set REQ_IO_STAT so diskstats work bio: fix argument of __bio_add_page() for max_sectors > 0xffff
-
git://neil.brown.name/mdLinus Torvalds authored
Pull md update from Neil Brown: "Mostly optimisations and obscure bug fixes. - raid5 gets less lock contention - raid1 gets less contention between normal-io and resync-io during resync" * tag 'md/3.13' of git://neil.brown.name/md: md/raid5: Use conf->device_lock protect changing of multi-thread resources. md/raid5: Before freeing old multi-thread worker, it should flush them. md/raid5: For stripe with R5_ReadNoMerge, we replace REQ_FLUSH with REQ_NOMERGE. UAPI: include <asm/byteorder.h> in linux/raid/md_p.h raid1: Rewrite the implementation of iobarrier. raid1: Add some macros to make code clearly. raid1: Replace raise_barrier/lower_barrier with freeze_array/unfreeze_array when reconfiguring the array. raid1: Add a field array_frozen to indicate whether raid in freeze state. md: Convert use of typedef ctl_table to struct ctl_table md/raid5: avoid deadlock when raid5 array has unack badblocks during md_stop_writes. md: use MD_RECOVERY_INTR instead of kthread_should_stop in resync thread. md: fix some places where mddev_lock return value is not checked. raid5: Retry R5_ReadNoMerge flag when hit a read error. raid5: relieve lock contention in get_active_stripe() raid5: relieve lock contention in get_active_stripe() wait: add wait_event_cmd() md/raid5.c: add proper locking to error path of raid5_start_reshape. md: fix calculation of stacking limits on level change. raid5: Use slow_path to release stripe when mddev->thread is null
-
Chen Gang authored
For all uapi headers, need use "_UAPI" prefix for its guard macro (which will be stripped by "scripts/headers_installer.sh"). Also remove redundant files (bitsperlong.h, errno.h, fcntl.h, ioctl.h, ioctls.h, ipcbuf.h, kvm_para.h, mman.h, poll.h, resource.h, siginfo.h, statfs.h, and unistd.h) which are already in Kbuild. Also be sure that all "#endif" only have one empty line above, and each file has guard macro. Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Hans-Christian Egtvedt <hegtvedt@cisco.com>
-
Eirik Aanonsen authored
This re-enables kprobes on AVR32 architecture. Signed-off-by: Eirik Aanonsen <eaa@wprmedical.com> Signed-off-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
-
Andreas Bießmann authored
This patch fixes following error (for big kernels): ---8<--- arch/avr32/boot/u-boot/head.o: In function `no_tag_table': (.init.text+0x44): relocation truncated to fit: R_AVR32_22H_PCREL against symbol `panic' defined in .text.unlikely section in kernel/built-in.o arch/avr32/kernel/built-in.o: In function `bad_return': (.ex.text+0x236): relocation truncated to fit: R_AVR32_22H_PCREL against symbol `panic' defined in .text.unlikely section in kernel/built-in.o --->8--- It comes up when the kernel increases and 'panic()' is too far away to fit in the +/- 2MiB range. Which in turn issues from the 21-bit displacement in 'br{cond4}' mnemonic which is one of the two ways to do jumps (rjmp has just 10-bit displacement and therefore a way smaller range). This fact was stated before in 8d29b7b9. One solution to solve this is to add a local storage for the symbol address and just load the $pc with that value. Signed-off-by: Andreas Bießmann <andreas@biessmann.de> Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: stable@vger.kernel.org
-
Andreas Bießmann authored
Before the CRT was (fully) set up in kernel_entry (bss cleared before in _start, but also not before jump to panic() in no_tag_table case). This patch fixes this up to have a fully working CRT when branching to panic() in no_tag_table. Signed-off-by: Andreas Bießmann <andreas@biessmann.de> Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: stable@vger.kernel.org
-
Phillip Lougher authored
Fix static checker complaint that stream is not checked in squashfs_decompressor_destroy(). Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
-
Phillip Lougher authored
This introduces an implementation of squashfs_readpage_block() that directly decompresses into the page cache. This uses the previously added page handler abstraction to push down the necessary kmap_atomic/kunmap_atomic operations on the page cache buffers into the decompressors. This enables direct copying into the page cache without using the slow kmap/kunmap calls. The code detects when multiple threads are racing in squashfs_readpage() to decompress the same block, and avoids this regression by falling back to using an intermediate buffer. This patch enhances the performance of Squashfs significantly when multiple processes are accessing the filesystem simultaneously because it not only reduces memcopying, but it more importantly eliminates the lock contention on the intermediate buffer. Using single-thread decompression. dd if=file1 of=/dev/null bs=4096 & dd if=file2 of=/dev/null bs=4096 & dd if=file3 of=/dev/null bs=4096 & dd if=file4 of=/dev/null bs=4096 Before: 629145600 bytes (629 MB) copied, 45.8046 s, 13.7 MB/s After: 629145600 bytes (629 MB) copied, 9.29414 s, 67.7 MB/s Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
-
Phillip Lougher authored
Restructure squashfs_readpage() splitting it into separate functions for datablocks, fragments and sparse blocks. Move the memcpying (from squashfs cache entry) implementation of squashfs_readpage_block into file_cache.c This allows different implementations to be supported. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
-
Phillip Lougher authored
Further generalise the decompressors by adding a page handler abstraction. This adds helpers to allow the decompressors to access and process the output buffers in an implementation independant manner. This allows different types of output buffer to be passed to the decompressors, with the implementation specific aspects handled at decompression time, but without the knowledge being held in the decompressor wrapper code. This will allow the decompressors to handle Squashfs cache buffers, and page cache pages. This patch adds the abstraction and an implementation for the caches. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
-
Phillip Lougher authored
Add a multi-threaded decompression implementation which uses percpu variables. Using percpu variables has advantages and disadvantages over implementations which do not use percpu variables. Advantages: * the nature of percpu variables ensures decompression is load-balanced across the multiple cores. * simplicity. Disadvantages: it limits decompression to one thread per core. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
-
Minchan Kim authored
Now squashfs have used for only one stream buffer for decompression so it hurts parallel read performance so this patch supports multiple decompressor to enhance performance parallel I/O. Four 1G file dd read on KVM machine which has 2 CPU and 4G memory. dd if=test/test1.dat of=/dev/null & dd if=test/test2.dat of=/dev/null & dd if=test/test3.dat of=/dev/null & dd if=test/test4.dat of=/dev/null & old : 1m39s -> new : 9s * From v1 * Change comp_strm with decomp_strm - Phillip * Change/add comments - Phillip Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
-
Phillip Lougher authored
The decompressor interface and code was written from the point of view of single-threaded operation. In doing so it mixed a lot of single-threaded implementation specific aspects into the decompressor code and elsewhere which makes it difficult to seamlessly support multiple different decompressor implementations. This patch does the following: 1. It removes compressor_options parsing from the decompressor init() function. This allows the decompressor init() function to be dynamically called to instantiate multiple decompressors, without the compressor options needing to be read and parsed each time. 2. It moves threading and all sleeping operations out of the decompressors. In doing so, it makes the decompressors non-blocking wrappers which only deal with interfacing with the decompressor implementation. 3. It splits decompressor.[ch] into decompressor generic functions in decompressor.[ch], and moves the single threaded decompressor implementation into decompressor_single.c. The result of this patch is Squashfs should now be able to support multiple decompressors by adding new decompressor_xxx.c files with specialised implementations of the functions in decompressor_single.c Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
-
Jens Axboe authored
We need it to make 'btt' from blktrace happy, otherwise we are missing one state transition. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Shaohua Li authored
It isn't safe to call it without holding the vblk->vq_lock. Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Shaohua Li <shli@fusionio.com> Fixed another condition of virtqueue_kick() not holding the lock. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Mahesh Rajashekhara authored
It appears that driver runs into a problem here if fibsize is too small because we allocate user_srbcmd with fibsize size only but later we access it until user_srbcmd->sg.count to copy it over to srbcmd. It is not correct to test (fibsize < sizeof(*user_srbcmd)) because this structure already includes one sg element and this is not needed for commands without data. So, we would recommend to add the following (instead of test for fibsize == 0). Signed-off-by: Mahesh Rajashekhara <Mahesh.Rajashekhara@pmcs.com> Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 19 Nov, 2013 17 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds authored
Pull networking fixes from David Miller: "Mostly these are fixes for fallout due to merge window changes, as well as cures for problems that have been with us for a much longer period of time" 1) Johannes Berg noticed two major deficiencies in our genetlink registration. Some genetlink protocols we passing in constant counts for their ops array rather than something like ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were using fixed IDs for their multicast groups. We have to retain these fixed IDs to keep existing userland tools working, but reserve them so that other multicast groups used by other protocols can not possibly conflict. In dealing with these two problems, we actually now use less state management for genetlink operations and multicast groups. 2) When configuring interface hardware timestamping, fix several drivers that simply do not validate that the hwtstamp_config value is one the driver actually supports. From Ben Hutchings. 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar. 4) In dev_forward_skb(), set the skb->protocol in the right order relative to skb_scrub_packet(). From Alexei Starovoitov. 5) Bridge erroneously fails to use the proper wrapper functions to make calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki Makita. 6) When detaching a bridge port, make sure to flush all VLAN IDs to prevent them from leaking, also from Toshiaki Makita. 7) Put in a compromise for TCP Small Queues so that deep queued devices that delay TX reclaim non-trivially don't have such a performance decrease. One particularly problematic area is 802.11 AMPDU in wireless. From Eric Dumazet. 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts here. Fix from Eric Dumzaet, reported by Dave Jones. 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn. 10) When computing mergeable buffer sizes, virtio-net fails to take the virtio-net header into account. From Michael Dalton. 11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic bumping, this one has been with us for a while. From Eric Dumazet. 12) Fix NULL deref in the new TIPC fragmentation handling, from Erik Hugne. 13) 6lowpan bit used for traffic classification was wrong, from Jukka Rissanen. 14) macvlan has the same issue as normal vlans did wrt. propagating LRO disabling down to the real device, fix it the same way. From Michal Kubecek. 15) CPSW driver needs to soft reset all slaves during suspend, from Daniel Mack. 16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet. 17) The xen-netfront RX buffer refill timer isn't properly scheduled on partial RX allocation success, from Ma JieYue. 18) When ipv6 ping protocol support was added, the AF_INET6 protocol initialization cleanup path on failure was borked a little. Fix from Vlad Yasevich. 19) If a socket disconnects during a read/recvmsg/recvfrom/etc that blocks we can do the wrong thing with the msg_name we write back to userspace. From Hannes Frederic Sowa. There is another fix in the works from Hannes which will prevent future problems of this nature. 20) Fix route leak in VTI tunnel transmit, from Fan Du. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits) genetlink: make multicast groups const, prevent abuse genetlink: pass family to functions using groups genetlink: add and use genl_set_err() genetlink: remove family pointer from genl_multicast_group genetlink: remove genl_unregister_mc_group() hsr: don't call genl_unregister_mc_group() quota/genetlink: use proper genetlink multicast APIs drop_monitor/genetlink: use proper genetlink multicast APIs genetlink: only pass array to genl_register_family_with_ops() tcp: don't update snd_nxt, when a socket is switched from repair mode atm: idt77252: fix dev refcnt leak xfrm: Release dst if this dst is improper for vti tunnel netlink: fix documentation typo in netlink_set_err() be2net: Delete secondary unicast MAC addresses during be_close be2net: Fix unconditional enabling of Rx interface options net, virtio_net: replace the magic value ping: prevent NULL pointer dereference on write to msg_name bnx2x: Prevent "timeout waiting for state X" bnx2x: prevent CFC attention bnx2x: Prevent panic during DMAE timeout ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds authored
Pull sparc fixes from David Miller: "Two merge window fallout build fixes" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: merge fix sparc64: fix build regession
-
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linuxLinus Torvalds authored
Pull ia64 fix from Tony Luck: "Unbreak ia64 build by avoiding circular dependency" * tag 'please-pull-fixia64' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: kernel/bounds: avoid circular dependencies in generated headers
-
Kirill A. Shutemov authored
<linux/spinlock.h> has heavy dependencies on other header files. It triggers circular dependencies in generated headers on IA64, at least: CC kernel/bounds.s In file included from /home/space/kas/git/public/linux/arch/ia64/include/asm/thread_info.h:9:0, from include/linux/thread_info.h:54, from include/asm-generic/preempt.h:4, from arch/ia64/include/generated/asm/preempt.h:1, from include/linux/preempt.h:18, from include/linux/spinlock.h:50, from kernel/bounds.c:14: /home/space/kas/git/public/linux/arch/ia64/include/asm/asm-offsets.h:1:35: fatal error: generated/asm-offsets.h: No such file or directory compilation terminated. Let's replace <linux/spinlock.h> with <linux/spinlock_types.h>, it's enough to find out size of spinlock_t. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-and-Tested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
-
David S. Miller authored
Johannes Berg says: ==================== genetlink: clean up multicast group APIs The generic netlink multicast group registration doesn't have to be dynamic, and can thus be simplified just like I did with the ops. This removes some complexity in registration code. Additionally, two users of generic netlink already use multicast groups in a wrong way, add workarounds for those two to keep the userspace API working, but at the same time make them not clash with other users of multicast groups as might happen now. While making it all a bit easier, also prevent such abuse by adding checks to the APIs so each family can only use the groups it owns. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
Register generic netlink multicast groups as an array with the family and give them contiguous group IDs. Then instead of passing the global group ID to the various functions that send messages, pass the ID relative to the family - for most families that's just 0 because the only have one group. This avoids the list_head and ID in each group, adding a new field for the mcast group ID offset to the family. At the same time, this allows us to prevent abusing groups again like the quota and dropmon code did, since we can now check that a family only uses a group it owns. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
This doesn't really change anything, but prepares for the next patch that will change the APIs to pass the group ID within the family, rather than the global group ID. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
Add a static inline to generic netlink to wrap netlink_set_err() to make it easier to use here - use it in openvswitch (the only generic netlink user of netlink_set_err()). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
There's no reason to have the family pointer there since it can just be passed internally where needed, so remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
There are no users of this API remaining, and we'll soon change group registration to be static (like ops are now) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
There's no need to unregister the multicast group if the generic netlink family is registered immediately after. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
The quota code is abusing the genetlink API and is using its family ID as the multicast group ID, which is invalid and may belong to somebody else (and likely will.) Make the quota code use the correct API, but since this is already used as-is by userspace, reserve a family ID for this code and also reserve that group ID to not break userspace assumptions. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
The drop monitor code is abusing the genetlink API and is statically using the generic netlink multicast group 1, even if that group belongs to somebody else (which it invariably will, since it's not reserved.) Make the drop monitor code use the proper APIs to reserve a group ID, but also reserve the group id 1 in generic netlink code to preserve the userspace API. Since drop monitor can be a module, don't clear the bit for it on unregistration. Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
As suggested by David Miller, make genl_register_family_with_ops() a macro and pass only the array, evaluating ARRAY_SIZE() in the macro, this is a little safer. The openvswitch has some indirection, assing ops/n_ops directly in that code. This might ultimately just assign the pointers in the family initializations, saving the struct genl_family_and_ops and code (once mcast groups are handled differently.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrey Vagin authored
snd_nxt must be updated synchronously with sk_send_head. Otherwise tp->packets_out may be updated incorrectly, what may bring a kernel panic. Here is a kernel panic from my host. [ 103.043194] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 [ 103.044025] IP: [<ffffffff815aaaaf>] tcp_rearm_rto+0xcf/0x150 ... [ 146.301158] Call Trace: [ 146.301158] [<ffffffff815ab7f0>] tcp_ack+0xcc0/0x12c0 Before this panic a tcp socket was restored. This socket had sent and unsent data in the write queue. Sent data was restored in repair mode, then the socket was switched from reapair mode and unsent data was restored. After that the socket was switched back into repair mode. In that moment we had a socket where write queue looks like this: snd_una snd_nxt write_seq |_________|________| | sk_send_head After a second switching from repair mode the state of socket was changed: snd_una snd_nxt, write_seq |_________ ________| | sk_send_head This state is inconsistent, because snd_nxt and sk_send_head are not synchronized. Bellow you can find a call trace, how packets_out can be incremented twice for one skb, if snd_nxt and sk_send_head are not synchronized. In this case packets_out will be always positive, even when sk_write_queue is empty. tcp_write_wakeup skb = tcp_send_head(sk); tcp_fragment if (!before(tp->snd_nxt, TCP_SKB_CB(buff)->end_seq)) tcp_adjust_pcount(sk, skb, diff); tcp_event_new_data_sent tp->packets_out += tcp_skb_pcount(skb); I think update of snd_nxt isn't required, when a socket is switched from repair mode. Because it's initialized in tcp_connect_init. Then when a write queue is restored, snd_nxt is incremented in tcp_event_new_data_sent, so it's always is in consistent state. I have checked, that the bug is not reproduced with this patch and all tests about restoring tcp connections work fine. Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ying Xue authored
init_card() calls dev_get_by_name() to get a network deceive. But it doesn't decrease network device reference count after the device is used. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
fan.du authored
After searching rt by the vti tunnel dst/src parameter, if this rt has neither attached to any transformation nor the transformation is not tunnel oriented, this rt should be released back to ip layer. otherwise causing dst memory leakage. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-