- 12 Jan, 2017 9 commits
-
-
Vadim Lomovtsev authored
While probing BGX we requesting appropriate QLM for it's configuration and get LMAC count by that request. Then, while reading configured MAC values from SSDT table we need to save them in proper mapping: BGX[i]->lmac[j].mac = <MAC value> to later provide for initialization stuff. In order to fill such mapping properly we need to add lmac index to be used while acpi initialization since at this moment bgx->lmac_count already contains actual value. Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
rtm_table is an 8-bit field while table ids are allowed up to u32. Commit 709772e6 ("net: Fix routing tables with id > 255 for legacy software") added the preference to set rtm_table in dumps to RT_TABLE_COMPAT if the table id is > 255. The table id returned on get route requests should do the same. Fixes: c36ba660 ("net: Allow user to get table id from route lookup") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Timur Tabi authored
Commit 6ffe1c4c ("net: qcom/emac: fix of_node and phydev leaks") fixed the problem with reference leaks on phydev, but the fix is device-tree specific. When the driver unloads, the reference is dropped only on DT systems. Instead, it's cleaner if up grab an reference on ACPI systems. When the driver unloads, we can drop the reference without having to check whether we're on a DT system. Signed-off-by: Timur Tabi <timur@codeaurora.org> Reviewed-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
Handle failure in lwtunnel_fill_encap adding attributes to skb. Fixes: 571e7226 ("ipv4: support for fib route lwtunnel encap attributes") Fixes: 19e42e45 ("ipv6: support for fib route lwtunnel encap attributes") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kazuya Mizuguchi authored
Remove Rx overflow log messages as in an environment where logging results in network traffic logging may cause further overflows. Fixes: c156633f ("Renesas Ethernet AVB driver proper") Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com> [simon: reworked changelog] Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Pirko says: ==================== mlxsw: Couple of fixes Couple of simple fixes from Arkadi and Elad. Please queue these up for stable. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Elad Raz authored
The event_data starts from address 0x00-0x0C and not from 0x08-0x014. This leads to duplication with other fields in the Event Queue Element such as sub-type, cqn and owner. Fixes: eda6500a ("mlxsw: Add PCI bus implementation") Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arkadi Sharshevsky authored
During transmission the skb is checked for headroom in order to add vendor specific header. In case the skb needs to be re-allocated, skb_realloc_headroom() is called to make a private copy of the original, but doesn't release it. Current code assumes that the original skb is released during reallocation and only releases it at the error path which causes a memory leak. Fix this by adding the original skb release to the main path. Fixes: d003462a ("mlxsw: Simplify mlxsw_sx_port_xmit function") Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arkadi Sharshevsky authored
During transmission the skb is checked for headroom in order to add vendor specific header. In case the skb needs to be re-allocated, skb_realloc_headroom() is called to make a private copy of the original, but doesn't release it. Current code assumes that the original skb is released during reallocation and only releases it at the error path which causes a memory leak. Fix this by adding the original skb release to the main path. Fixes: 56ade8fe ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 11 Jan, 2017 31 commits
-
-
stephen hemminger authored
The receive callback (in tasklet context) is using RCU to get reference to associated VF network device but this is not safe. RCU read lock needs to be held. Found by running with full lockdep debugging enabled. Fixes: f207c10d ("hv_netvsc: use RCU to protect vf_netdev") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Martynas Pumputis authored
Otherwise, a xfrm policy with sport/dport being set cannot be matched. Signed-off-by: Martynas Pumputis <martynas@weave.works> Signed-off-by: David S. Miller <davem@davemloft.net>
-
hayeswang authored
Fix the hw rx checksum is always enabled, and the user couldn't switch it to sw rx checksum. Note that the RTL_VER_01 only support sw rx checksum only. Besides, the hw rx checksum for RTL_VER_02 is disabled after commit b9a321b4 ("r8152: Fix broken RX checksums."). Re-enable it. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Linus Torvalds authored
Merge fixes from Andrew Morton: "27 fixes. There are three patches that aren't actually fixes. They're simple function renamings which are nice-to-have in mainline as ongoing net development depends on them." * akpm: (27 commits) timerfd: export defines to userspace mm/hugetlb.c: fix reservation race when freeing surplus pages mm/slab.c: fix SLAB freelist randomization duplicate entries zram: support BDI_CAP_STABLE_WRITES zram: revalidate disk under init_lock mm: support anonymous stable page mm: add documentation for page fragment APIs mm: rename __page_frag functions to __page_frag_cache, drop order from drain mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free mm, memcg: fix the active list aging for lowmem requests when memcg is enabled mm: don't dereference struct page fields of invalid pages mailmap: add codeaurora.org names for nameless email commits signal: protect SIGNAL_UNKILLABLE from unintentional clearing. mm: pmd dirty emulation in page fault handler ipc/sem.c: fix incorrect sem_lock pairing lib/Kconfig.debug: fix frv build failure mm: get rid of __GFP_OTHER_NODE mm: fix remote numa hits statistics mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done} ocfs2: fix crash caused by stale lvb with fsdlm plugin ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds authored
Pull networking fixes from David Miller: 1) Fix rtlwifi crash, from Larry Finger. 2) Memory disclosure in appletalk ipddp routing code, from Vlad Tsyrklevich. 3) r8152 can erroneously split an RX packet into multiple URBs if the Rx FIFO is not empty when we suspend. Fix this by waiting for the FIFO to empty before suspending. From Hayes Wang. 4) Two GRO fixes (enter slow path when not enough SKB tail room exists, disable frag0 optimizations when there are IPV6 extension headers) from Eric Dumazet and Herbert Xu. 5) A series of mlx5e bug fixes (do source udp port offloading for tunnels properly, Ip fragment matching fixes, handling firmware errors properly when installing TC rules, etc.) from Saeed Mahameed, Or Gerlitz, Roi Dayan, Hadar Hen Zion, Gil Rockah, and Daniel Jurgens. 6) Two VRF fixes from David Ahern (don't skip multipath selection for VRF paths, disallow VRF to be configured with table ID 0). * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits) net: vrf: do not allow table id 0 net: phy: marvell: fix Marvell 88E1512 used in SGMII mode sctp: Fix spelling mistake: "Atempt" -> "Attempt" net: ipv4: Fix multipath selection with vrf cgroup: move CONFIG_SOCK_CGROUP_DATA to init/Kconfig gro: use min_t() in skb_gro_reset_offset() net/mlx5: Only cancel recovery work when cleaning up device net/mlx5e: Remove WARN_ONCE from adaptive moderation code net/mlx5e: Un-register uplink representor on nic_disable net/mlx5e: Properly handle FW errors while adding TC rules net/mlx5e: Fix kbuild warnings for uninitialized parameters net/mlx5e: Set inline mode requirements for matching on IP fragments net/mlx5e: Properly get address type of encapsulation IP headers net/mlx5e: TC ipv4 tunnel encap offload error flow fixes net/mlx5e: Warn when rejecting offload attempts of IP tunnels net/mlx5e: Properly handle offloading of source udp port for IP tunnels gro: Disable frag0 optimization on IPv6 ext headers gro: Enter slow-path if there is no tailroom mlx4: Return EOPNOTSUPP instead of ENOTSUPP net/af_iucv: don't use paged skbs for TX on HiperSockets ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds authored
Pull crypto fix from Herbert Xu: "This fixes a regression in aesni that renders it useless if it's built-in with a modular pcbc configuration" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: aesni - Fix failure when built-in with modular pcbc
-
David Ahern authored
Frank reported that vrf devices can be created with a table id of 0. This breaks many of the run time table id checks and should not be allowed. Detect this condition at create time and fail with EINVAL. Fixes: 193125db ("net: Introduce VRF device driver") Reported-by: Frank Kellermann <frank.kellermann@atos.net> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Russell King authored
When an Marvell 88E1512 PHY is connected to a nic in SGMII mode, the fiber page is used for the SGMII host-side connection. The PHY driver notices that SUPPORTED_FIBRE is set, so it tries reading the fiber page for the link status, and ends up reading the MAC-side status instead of the outgoing (copper) link. This leads to incorrect results reported via ethtool. If the PHY is connected via SGMII to the host, ignore the fiber page. However, continue to allow the existing power management code to suspend and resume the fiber page. Fixes: 6cfb3bcc ("Marvell phy: check link status in case of fiber link.") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
Trivial fix to spelling mistake in WARN_ONCE message Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
fib_select_path does not call fib_select_multipath if oif is set in the flow struct. For VRF use cases oif is always set, so multipath route selection is bypassed. Use the FLOWI_FLAG_SKIP_NH_OIF to skip the oif check similar to what is done in fib_table_lookup. Add saddr and proto to the flow struct for the fib lookup done by the VRF driver to better match hash computation for a flow. Fixes: 613d09b3 ("net: Use VRF device index for lookups on TX") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
We now 'select SOCK_CGROUP_DATA' but Kconfig complains that this is not right when CONFIG_NET is disabled and there is no socket interface: warning: (CGROUP_BPF) selects SOCK_CGROUP_DATA which has unmet direct dependencies (NET) I don't know what the correct solution for this is, but simply removing the dependency on NET from SOCK_CGROUP_DATA by moving it out of the 'if NET' section avoids the warning and does not produce other build errors. Fixes: 483c4933 ("cgroup: Fix CGROUP_BPF config") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
On 32bit arches, (skb->end - skb->data) is not 'unsigned int', so we shall use min_t() instead of min() to avoid a compiler error. Fixes: 1272ce87 ("gro: Enter slow-path if there is no tailroom") Reported-by: kernel test robot <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Saeed Mahameed says: ==================== Mellanox mlx5 fixes and cleanups 2017-01-10 This series includes some mlx5e general cleanups from Daniel, Gil, Hadar and myself. Also it includes some critical mlx5e TC offloads fixes from Or Gerlitz. For -stable: - net/mlx5e: Remove WARN_ONCE from adaptive moderation code Although this fix doesn't affect any functionality, I thought it is better to clean this -WARN_ONCE- up for -stable in case someone hits such corner case. Please apply and let me know if there's any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Jurgens authored
Do not attempt to drain the health workqueue when unloading the device in the recovery flow, this can cause a deadlock when the recovery work tries to cancel itself with sync. Because the work is no longer unconditionally canceled when unloading, it must be explicitly canceled in the AER flow. fixes: 689a248d ("net/mlx5: Cancel recovery work in remove flow") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gil Rockah authored
When trying to do interface down or changing interface configuration under heavy traffic, some of the adaptive moderation corner cases can occur and leave a WARN_ONCE call trace in the kernel log. Those WARN_ONCE are meant for debug only, and should have been inserted only under debug. We avoid such call traces by removing those WARN_ONCE. Fixes: cb3c7fd4 ("net/mlx5e: Support adaptive RX coalescing") Signed-off-by: Gil Rockah <gilr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Saeed Mahameed authored
The code before this patch registered uplink e-Switch representor on nic_enable and unregistered on nic_cleanup, the right place for this unregister is in nic_disable. Fixes: 127ea380 ("net/mlx5: Add Representors registration API") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Or Gerlitz authored
When the firmware returns an error (common example is an attempt to add twice the same rule which is refused by the some FWs), we are not properly derefing/cleaning few resources allocated on the way. Examples are vport vlan deref under eswitch vlan offloads, and encap entry/neighbour deref under eswitch encapsulation offloads, fix that. Fixes: a54e20b4 ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads') Fixes: 8b32580d ('net/mlx5e: Add TC vlan action for SRIOV offloads') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hadar Hen Zion authored
kbuild warn about parameters that may be used uninitialized, fix it. Fixes: a54e20b4 ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads') Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Or Gerlitz authored
For e-switch level matching on packets being an IP fragment, we need to make sure the source vport inline mode is L3, fix that. Fixes: 3f7d0eb4 ('net/mlx5e: Offload TC matching on packets being IP fragments') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Or Gerlitz authored
As done elsewhere in our TC/flower offload code, the address type of the encapsulation IP headers should be realized accroding to the addr_type field of the encapsulation control dissector key, do that. Fixes: bbd00f7e ('net/mlx5e: Add TC tunnel release action for SRIOV offloads') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Or Gerlitz authored
When the route lookup fails we should return the actual error. When the neigh isn't valid, we should return -EOPNOTSUPP as done in similar cases along the code. When the offload can't take place as of invalid neigh etc, we must release the neigh. Fixes: a54e20b4 ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Or Gerlitz authored
We silently reject offloading of IPv6 tunnels, non vxlan tunnels, vxlan tunnels where the dst port to match is not provided, etc. Be a bit more verbose and print a warning so the user better realizes what went wrong here and can fix it. Fixes: a54e20b4 ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads') Fixes: bbd00f7e ('net/mlx5e: Add TC tunnel release action for SRIOV offloads') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Or Gerlitz authored
We can offload the matching on source udp port of ip tunnels for decapsulation. We can not offload setting source udp port for tunnels as part of encapsulation. Fix both the code that deals with matching offload (decap) and the code that deal with encap offload to align with that. Fixes: a54e20b4 ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads') Fixes: bbd00f7e ('net/mlx5e: Add TC tunnel release action for SRIOV offloads') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mike Frysinger authored
Since userspace is expected to call timerfd syscalls directly with these flags/ioctls, make sure we export them so they don't have to duplicate the values themselves. Link: http://lkml.kernel.org/r/20161219064052.7196-1-vapier@gentoo.orgSigned-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mike Kravetz authored
return_unused_surplus_pages() decrements the global reservation count, and frees any unused surplus pages that were backing the reservation. Commit 7848a4bf ("mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()") added a call to cond_resched_lock in the loop freeing the pages. As a result, the hugetlb_lock could be dropped, and someone else could use the pages that will be freed in subsequent iterations of the loop. This could result in inconsistent global hugetlb page state, application api failures (such as mmap) failures or application crashes. When dropping the lock in return_unused_surplus_pages, make sure that the global reservation count (resv_huge_pages) remains sufficiently large to prevent someone else from claiming pages about to be freed. Analyzed by Paul Cassella. Fixes: 7848a4bf ("mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()") Link: http://lkml.kernel.org/r/1483991767-6879-1-git-send-email-mike.kravetz@oracle.comSigned-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Paul Cassella <cassella@cray.com> Suggested-by: Michal Hocko <mhocko@kernel.org> Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> [3.15+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
John Sperbeck authored
This patch fixes a bug in the freelist randomization code. When a high random number is used, the freelist will contain duplicate entries. It will result in different allocations sharing the same chunk. It will result in odd behaviours and crashes. It should be uncommon but it depends on the machines. We saw it happening more often on some machines (every few hours of running tests). Fixes: c7ce4f60 ("mm: SLAB freelist randomization") Link: http://lkml.kernel.org/r/20170103181908.143178-1-thgarnie@google.comSigned-off-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Minchan Kim authored
zram has used per-cpu stream feature from v4.7. It aims for increasing cache hit ratio of scratch buffer for compressing. Downside of that approach is that zram should ask memory space for compressed page in per-cpu context which requires stricted gfp flag which could be failed. If so, it retries to allocate memory space out of per-cpu context so it could get memory this time and compress the data again, copies it to the memory space. In this scenario, zram assumes the data should never be changed but it is not true without stable page support. So, If the data is changed under us, zram can make buffer overrun so that zsmalloc free object chain is broken so system goes crash like below https://bugzilla.suse.com/show_bug.cgi?id=997574 This patch adds BDI_CAP_STABLE_WRITES to zram for declaring "I am block device needing *stable write*". Fixes: da9556a2 ("zram: user per-cpu compression streams") Link: http://lkml.kernel.org/r/1482366980-3782-4-git-send-email-minchan@kernel.orgSigned-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Hyeoncheol Lee <cheol.lee@lge.com> Cc: <yjay.kim@lge.com> Cc: Sangseok Lee <sangseok.lee@lge.com> Cc: Hugh Dickins <hughd@google.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: <stable@vger.kernel.org> [4.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Minchan Kim authored
Commit b4c5c609 ("zram: avoid lockdep splat by revalidate_disk") moved revalidate_disk call out of init_lock to avoid lockdep false-positive splat. However, commit 08eee69f ("zram: remove init_lock in zram_make_request") removed init_lock in IO path so there is no worry about lockdep splat. So, let's restore it. This patch is needed to set BDI_CAP_STABLE_WRITES atomically in next patch. Fixes: da9556a2 ("zram: user per-cpu compression streams") Link: http://lkml.kernel.org/r/1482366980-3782-3-git-send-email-minchan@kernel.orgSigned-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Hyeoncheol Lee <cheol.lee@lge.com> Cc: <yjay.kim@lge.com> Cc: Sangseok Lee <sangseok.lee@lge.com> Cc: Hugh Dickins <hughd@google.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: <stable@vger.kernel.org> [4.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Minchan Kim authored
During developemnt for zram-swap asynchronous writeback, I found strange corruption of compressed page, resulting in: Modules linked in: zram(E) CPU: 3 PID: 1520 Comm: zramd-1 Tainted: G E 4.8.0-mm1-00320-ge0d4894c9c38-dirty #3274 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 task: ffff88007620b840 task.stack: ffff880078090000 RIP: set_freeobj.part.43+0x1c/0x1f RSP: 0018:ffff880078093ca8 EFLAGS: 00010246 RAX: 0000000000000018 RBX: ffff880076798d88 RCX: ffffffff81c408c8 RDX: 0000000000000018 RSI: 0000000000000000 RDI: 0000000000000246 RBP: ffff880078093cb0 R08: 0000000000000000 R09: 0000000000000000 R10: ffff88005bc43030 R11: 0000000000001df3 R12: ffff880076798d88 R13: 000000000005bc43 R14: ffff88007819d1b8 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff88007e380000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc934048f20 CR3: 0000000077b01000 CR4: 00000000000406e0 Call Trace: obj_malloc+0x22b/0x260 zs_malloc+0x1e4/0x580 zram_bvec_rw+0x4cd/0x830 [zram] page_requests_rw+0x9c/0x130 [zram] zram_thread+0xe6/0x173 [zram] kthread+0xca/0xe0 ret_from_fork+0x25/0x30 With investigation, it reveals currently stable page doesn't support anonymous page. IOW, reuse_swap_page can reuse the page without waiting writeback completion so it can overwrite page zram is compressing. Unfortunately, zram has used per-cpu stream feature from v4.7. It aims for increasing cache hit ratio of scratch buffer for compressing. Downside of that approach is that zram should ask memory space for compressed page in per-cpu context which requires stricted gfp flag which could be failed. If so, it retries to allocate memory space out of per-cpu context so it could get memory this time and compress the data again, copies it to the memory space. In this scenario, zram assumes the data should never be changed but it is not true unless stable page supports. So, If the data is changed under us, zram can make buffer overrun because second compression size could be bigger than one we got in previous trial and blindly, copy bigger size object to smaller buffer which is buffer overrun. The overrun breaks zsmalloc free object chaining so system goes crash like above. I think below is same problem. https://bugzilla.suse.com/show_bug.cgi?id=997574 Unfortunately, reuse_swap_page should be atomic so that we cannot wait on writeback in there so the approach in this patch is simply return false if we found it needs stable page. Although it increases memory footprint temporarily, it happens rarely and it should be reclaimed easily althoug it happened. Also, It would be better than waiting of IO completion, which is critial path for application latency. Fixes: da9556a2 ("zram: user per-cpu compression streams") Link: http://lkml.kernel.org/r/20161120233015.GA14113@bbox Link: http://lkml.kernel.org/r/1482366980-3782-2-git-send-email-minchan@kernel.orgSigned-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Hugh Dickins <hughd@google.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Hyeoncheol Lee <cheol.lee@lge.com> Cc: <yjay.kim@lge.com> Cc: Sangseok Lee <sangseok.lee@lge.com> Cc: <stable@vger.kernel.org> [4.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexander Duyck authored
This is a first pass at trying to add documentation for the page_frag APIs. They may still change over time but for now I thought I would try to get these documented so that as more network drivers and stack calls make use of them we have one central spot to document how they are meant to be used. Link: http://lkml.kernel.org/r/20170104024157.13451.6758.stgit@localhost.localdomainSigned-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexander Duyck authored
This patch does two things. First it goes through and renames the __page_frag prefixed functions to __page_frag_cache so that we can be clear that we are draining or refilling the cache, not the frags themselves. Second we drop the order parameter from __page_frag_cache_drain since we don't actually need to pass it since all fragments are either order 0 or must be a compound page. Link: http://lkml.kernel.org/r/20170104023954.13451.5678.stgit@localhost.localdomainSigned-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-