- 18 Nov, 2014 1 commit
-
-
Joe Stringer authored
Suggested-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 Nov, 2014 9 commits
-
-
Daniel Borkmann authored
It has been reported that generating an MLD listener report on devices with large MTUs (e.g. 9000) and a high number of IPv6 addresses can trigger a skb_over_panic(): skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20 head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0 dev:port1 ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:100! invalid opcode: 0000 [#1] SMP Modules linked in: ixgbe(O) CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4 [...] Call Trace: <IRQ> [<ffffffff80578226>] ? skb_put+0x3a/0x3b [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4 [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45 [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68 [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182 [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3 [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46 [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70 mld_newpack() skb allocations are usually requested with dev->mtu in size, since commit 72e09ad1 ("ipv6: avoid high order allocations") we have changed the limit in order to be less likely to fail. However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb) macros, which determine if we may end up doing an skb_put() for adding another record. To avoid possible fragmentation, we check the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong assumption as the actual max allocation size can be much smaller. The IGMP case doesn't have this issue as commit 57e1ab6e ("igmp: refine skb allocations") stores the allocation size in the cb[]. Set a reserved_tailroom to make it fit into the MTU and use skb_availroom() helper instead. This also allows to get rid of igmp_skb_size(). 
Reported-by: Wei Liu <lw1a2.jing@gmail.com> Fixes: 72e09ad1 ("ipv6: avoid high order allocations") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: David L Stevens <david.stevens@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
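The tailroom mismatch described above can be modelled in user space. This is a minimal sketch, not the kernel code: `struct buf_model`, `avail_by_mtu()` and `avail_by_alloc()` are hypothetical names, and the numbers in the usage note are illustrative. It only shows why "MTU minus length" overestimates the free space when the buffer was allocated smaller than the MTU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: only the fields needed for the arithmetic. */
struct buf_model {
    size_t alloc_size;        /* bytes actually allocated for data */
    size_t len;               /* bytes already written */
    size_t reserved_tailroom; /* bytes at the end that must stay unused */
};

/* Old-style check: assumes the buffer is as large as the MTU. */
static size_t avail_by_mtu(const struct buf_model *b, size_t mtu)
{
    return mtu > b->len ? mtu - b->len : 0;
}

/* Check based on what was really allocated, minus the reserved part. */
static size_t avail_by_alloc(const struct buf_model *b)
{
    size_t tailroom;

    assert(b->len <= b->alloc_size);
    tailroom = b->alloc_size - b->len;
    return tailroom > b->reserved_tailroom ? tailroom - b->reserved_tailroom : 0;
}
```

With a 9000-byte MTU, a 3776-byte allocation and 3760 bytes already used, the MTU-based check reports thousands of free bytes while only 16 are really available, so appending a 20-byte record would run past the end of the buffer.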
-
Martin Hauke authored
Added the USB VID/PID for the HP lt4112 LTE/HSPA+ Gobi 4G Modem (Huawei me906e) Signed-off-by: Martin Hauke <mardnh@gmx.de> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch
David S. Miller authored
Pravin B Shelar says: ==================== Open vSwitch Following fixes are accumulated in ovs-repo. Three of them are related to protocol processing, one is related to memory leak in case of error and one is to fix race. Patch "Validate IPv6 flow key and mask values" has conflicts with net-next, Let me know if you want me to send the patch for net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Anish Bhatt authored
Solves possible lockup issues that can be seen from firmware DCB agents calling into the DCB app API. DCB firmware event queues can be tied in with NAPI so that DCB events are generated in softIRQ context. This can result in calls to dcb_*app() functions which try to take the dcb_lock. If the event triggers while we also have the dcb_lock because lldpad or some other agent happened to be issuing a get/set command we could see a CPU lockup. This code was not originally written with firmware agents in mind, hence grabbing dcb_lock from softIRQ context was not considered. Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexey Khoroshilov authored
In case of any failure ieee802154fake_probe() just calls unregister_netdev(). But it does not look safe to unregister netdevice before it was registered. The patch implements straightforward resource deallocation in case of failure in ieee802154fake_probe(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
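The "straightforward resource deallocation" mentioned above usually means unwinding in reverse order with goto labels, so that only what was actually set up gets torn down. The following is a user-space sketch of that pattern under assumed names (`struct fake_dev`, `register_dev()`, `unregister_dev()`, `probe_model()` are all hypothetical); it is not the driver's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a network device. */
struct fake_dev { int registered; };

static int register_dev(struct fake_dev *d, int fail)
{
    if (fail)
        return -1;
    d->registered = 1;
    return 0;
}

static void unregister_dev(struct fake_dev *d)
{
    /* Unregistering something that was never registered is the bug. */
    assert(d->registered);
    d->registered = 0;
}

/* fail_at: 0 = no failure, 1 = first register fails, 2 = second fails. */
static int probe_model(int fail_at)
{
    struct fake_dev *a, *b;
    int err = -1;

    a = calloc(1, sizeof(*a));
    if (!a)
        return -1;
    b = calloc(1, sizeof(*b));
    if (!b)
        goto free_a;

    if (register_dev(a, fail_at == 1))
        goto free_b;            /* nothing registered yet */
    if (register_dev(b, fail_at == 2))
        goto unregister_a;      /* only a is registered */

    /* Success. A real driver would keep the devices; the model tears down. */
    err = 0;
    unregister_dev(b);
unregister_a:
    unregister_dev(a);
free_b:
    free(b);
free_a:
    free(a);
    return err;
}
```

Each label undoes exactly one earlier step, so a failure at any point jumps to the label matching the last step that succeeded.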
-
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller authored
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter updates for your net tree, they are: 1) Fix missing initialization of the range structure (allocated in the stack) in nft_masq_{ipv4, ipv6}_eval, from Daniel Borkmann. 2) Make sure the data we receive from userspace contains the req_version structure, otherwise return an error incomplete on truncated input. From Dan Carpenter. 3) Fix handling of skb->sk which may cause incorrect handling of connections from a local process. Via Simon Horman, patch from Calvin Owens. 4) Fix wrong netns in nft_compat when setting target and match params structure. 5) Relax chain type validation in nft_compat that was recently included, this broke the matches that need to be run from the route chain type. Now iptables-test.py automated regression tests report success again and we avoid the only possible problematic case, which is the use of nat targets out of nat chain type. 6) Use match->table to validate the tablename, instead of the match->name. Again patch for nft_compat. 7) Restore the synchronous release of objects from the commit and abort path in nf_tables. This is causing two major problems: splats when using nft_compat, given that matches and targets may sleep and call_rcu is invoked from softirq context. Moreover Patrick reported possible event notification reordering when rules refer to anonymous sets. 8) Fix race condition in between packets that are being confirmed by conntrack and the ctnetlink flush operation. This happens since the removal of the central spinlock. Thanks to Jesper D. Brouer for looking into this. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Ogness authored
The TX_IN_SEL offset for the CPSW_PORT/TX_IN_CTL register was incorrect. This caused the Dual MAC mode to never get set when it should. It also caused possible unintentional setting of a bit in the CPSW_PORT/TX_BLKS_REM register. The purpose of setting the Dual MAC mode for this register is to: "... allow packets from both ethernet ports to be written into the FIFO without one port starving the other port." - AM335x ARM TRM Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hannes Frederic Sowa authored
Otherwise the exported symbols might be discarded because of no users in vmlinux. Reported-by: Jim Davis <jim.epost@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Panu Matilainen authored
Trying to add an unreachable route incorrectly returns -ESRCH if custom FIB rules are present: [root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4 RTNETLINK answers: Network is unreachable [root@localhost ~]# ip rule add to 55.66.77.88 table 200 [root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4 RTNETLINK answers: No such process [root@localhost ~]# Commit 83886b6b ("[NET]: Change "not found" return value for rule lookup") changed fib_rules_lookup() to use -ESRCH as a "not found" code internally, but for user space it should be translated into -ENETUNREACH. Handle the translation centrally in ipv4-specific fib_lookup(), leaving the DECnet case alone. On a related note, commit b7a71b51 ("ipv4: removed redundant conditional") removed a similar translation from ip_route_input_slow() prematurely AIUI. Fixes: b7a71b51 ("ipv4: removed redundant conditional") Signed-off-by: Panu Matilainen <pmatilai@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
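The translation described above amounts to mapping one internal error code to the one user space expects, in a single place. A minimal sketch, assuming a hypothetical helper name `translate_lookup_err()` (the actual change lives in fib_lookup() and is not reproduced here):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical wrapper: maps the internal "no rule matched" code to the
 * error user space expects. All other results pass through unchanged. */
static int translate_lookup_err(int err)
{
    assert(err <= 0);           /* kernel style: 0 or a negative errno */
    if (err == -ESRCH)
        return -ENETUNREACH;
    return err;
}
```

Doing this once at the lookup boundary means individual callers do not each need their own translation.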
-
- 14 Nov, 2014 30 commits
-
-
Jarno Rajahalme authored
Reject flow label key and mask values with invalid bits set. Introduced by commit 3fdbd1ce ("openvswitch: add ipv6 'set' action"). Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
-
Pravin B Shelar authored
dp read operations depend on ovs_dp_cmd_fill_info(). This API needs to look up the vport to find the dp name, but the vport lookup can fail. Therefore, to keep the vport reference alive, we need to take the ovs lock. Introduced by commit 6093ae9a ("openvswitch: Minimize dp and vport critical sections"). Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
-
Daniele Di Proietto authored
match_validate() enforces that a mask matching on NDP attributes also has an exact match on ICMPv6 type. The ICMPv6 type, which is 8-bit wide, is stored in the 'tp.src' field of 'struct sw_flow_key', which is 16-bit wide. Therefore, an exact match on ICMPv6 type should only check the first 8 bits. This commit fixes a bug that prevented flows with an exact match on NDP field from being installed. Introduced by commit 03f0d916 ("openvswitch: Mega flow implementation"). Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
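To illustrate the width mismatch described above: when an 8-bit value is stored in a 16-bit network-order field, an "exact match" mask covers only 8 of the 16 bits. The sketch below is a user-space model under that assumption; `store_icmp_type()` and `icmp_type_mask_is_exact()` are hypothetical helpers, not functions from the Open vSwitch code.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htons */

/* Hypothetical helper: store an 8-bit ICMPv6 type in a 16-bit
 * network-order field, as the message above describes for tp.src. */
static uint16_t store_icmp_type(unsigned int type)
{
    assert(type <= 0xff);
    return htons((uint16_t)type);
}

/* Hypothetical check: an exact match on the type needs only the 8 bits
 * that can actually carry it, not all 16 bits of the field. */
static int icmp_type_mask_is_exact(uint16_t mask_be)
{
    return mask_be == htons(0x00ff);
}
```

A check that demanded an all-ones 16-bit mask would reject a mask that covers exactly the 8 bits the type can occupy.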
-
Jesse Gross authored
The checksum of ICMPv6 packets uses the IP pseudoheader as part of the calculation, unlike ICMP in IPv4. This was not implemented, which means that modifying the IP addresses of an ICMPv6 packet would cause the checksum to no longer be correct as the pseudoheader did not match. Introduced by commit 3fdbd1ce ("openvswitch: add ipv6 'set' action"). Reported-by: Neal Shrader <icosahedral@gmail.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
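The dependency on the pseudoheader can be shown with a plain user-space checksum routine. This is a sketch for illustration only: `sum_words()` and `icmp6_checksum()` are hypothetical helpers that recompute the checksum from scratch, whereas kernel code would normally update it incrementally. The pseudoheader layout (source, destination, 32-bit length, three zero bytes, next header 58) follows the IPv6 specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add the big-endian 16-bit words of buf to a running sum. */
static uint32_t sum_words(uint32_t sum, const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    return sum;
}

/* Checksum over the IPv6 pseudoheader plus the ICMPv6 message.
 * The checksum field inside msg must be zero when computing, and the
 * result is 0 when verifying a message that carries a valid checksum. */
static uint16_t icmp6_checksum(const uint8_t src[16], const uint8_t dst[16],
                               const uint8_t *msg, size_t len)
{
    uint8_t tail[8] = { 0 };
    uint32_t sum = 0;

    assert(len <= 0xffff);
    tail[2] = (uint8_t)(len >> 8);   /* upper-layer length, low 16 bits */
    tail[3] = (uint8_t)(len & 0xff);
    tail[7] = 58;                    /* next header: ICMPv6 */

    sum = sum_words(sum, src, 16);
    sum = sum_words(sum, dst, 16);
    sum = sum_words(sum, tail, 8);
    sum = sum_words(sum, msg, len);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Because the addresses are part of the summed data, a checksum that was valid for one source address no longer verifies after the address is rewritten unless the checksum is updated as well.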
-
Pravin B Shelar authored
Need to free memory in case of sample action error. Introduced by commit 651887b0 ("openvswitch: Sample action without side effects"). Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
-
David S. Miller authored
Joe Stringer says: ==================== Implement ndo_gso_check() for vxlan nics Most NICs that report NETIF_F_GSO_UDP_TUNNEL support VXLAN, and not other UDP-based encapsulation protocols where the format and size of the header may differ. This patch series implements a generic ndo_gso_check() for detecting VXLAN, then reuses it for these NICs. Implementation shamelessly stolen from Tom Herbert (with minor fixups): http://thread.gmane.org/gmane.linux.network/332428/focus=333111 v2: Drop i40e/fm10k patches (code diverged; handling separately). Refactor common code into vxlan_gso_check() helper. Minor style fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Stringer authored
Use vxlan_gso_check() to advertise offload support for this NIC. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Stringer authored
Use vxlan_gso_check() to advertise offload support for this NIC. Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Stringer authored
Use vxlan_gso_check() to advertise offload support for this NIC. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Sathya Perla <sperla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Stringer authored
Most NICs that report NETIF_F_GSO_UDP_TUNNEL support VXLAN, and not other UDP-based encapsulation protocols where the format and size of the header differ. This patch implements a generic ndo_gso_check() for VXLAN which will only advertise GSO support when the skb looks like it contains VXLAN (or no UDP tunnelling at all). Implementation shamelessly stolen from Tom Herbert: http://thread.gmane.org/gmane.linux.network/332428/focus=333111 Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
David S. Miller authored
John W. Linville says: ==================== pull request: wireless 2014-11-13 Please pull this set of a few more wireless fixes intended for the 3.18 stream... For the mac80211 bits, Johannes says: "This has just one fix, for an issue with the CCMP decryption that can cause a kernel crash. I'm not sure it's remotely exploitable, but it's an important fix nonetheless." For the iwlwifi bits, Emmanuel says: "Two fixes here - we weren't updating mac80211 if a scan was cut short by RFKILL which confused cfg80211. As a result, the latter wouldn't allow to run another scan. Liad fixes a small bug in the firmware dump." On top of that... Arend van Spriel corrects a channel width conversion that caused a WARNING in brcmfmac. Hauke Mehrtens avoids a NULL pointer dereference in b43. Larry Finger hits a trio of rtlwifi bugs left over from recent backporting from the Realtek vendor driver. Miaoqing Pan fixes a clocking problem in ath9k that could affect packet timestamps and such. Stanislaw Gruszka addresses a payload alignment issue that has been plaguing rt2x00. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vincent BENAYOUN authored
There could be a signed overflow in the following code. The expression (32-logmask) lies between 0 and 31 inclusive. When it equals 31, the left shift of the signed int 1 produces a signed integer overflow. According to the C99 Standard, this is undefined behavior. A simple fix is to replace the signed int 1 with the unsigned int 1U. Signed-off-by: Vincent BENAYOUN <vincent.benayoun@trust-in-soft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
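The shift issue described above can be reproduced in a small stand-alone function. This sketch assumes a hypothetical helper `make_mask_host_order()` that builds a netmask from a prefix length in host byte order; it is not the patched kernel function, only an instance of the same 1 versus 1U pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: netmask for a prefix of logmask bits (1..32),
 * in host byte order. The shift count 32 - logmask is in 0..31. */
static uint32_t make_mask_host_order(unsigned int logmask)
{
    assert(logmask >= 1 && logmask <= 32);
    /* 1U keeps the shift in unsigned arithmetic. With a signed 1, a
     * shift by 31 would overflow int, which C99 leaves undefined. */
    return ~((1U << (32 - logmask)) - 1);
}
```

The only difference from the overflowing form is the `U` suffix on the constant; the resulting masks are the same wherever the signed version happened to work.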
-
bill bonaparte authored
After removal of the central spinlock nf_conntrack_lock, in commit 93bb0ceb ("netfilter: conntrack: remove central spinlock nf_conntrack_lock"), it is possible to race against get_next_corpse(). The race is against the get_next_corpse() cleanup on the "unconfirmed" list (a per-cpu list with separate locking), which sets the DYING bit. Fix this race, in __nf_conntrack_confirm(), by removing the CT from unconfirmed list before checking the DYING bit. In case the race occurred, re-add the CT to the dying list. While at this, fix coding style of the comment that has been updated. Fixes: 93bb0ceb ("netfilter: conntrack: remove central spinlock nf_conntrack_lock") Reported-by: bill bonaparte <programme110@gmail.com> Signed-off-by: bill bonaparte <programme110@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds authored
Pull virtio bugfix from Michael S Tsirkin: "This fixes a crash in virtio console multi-channel mode that got introduced in -rc1" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_console: move early VQ enablement
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds authored
Pull networking fixes from David Miller: 1) sunhme driver lacks DMA mapping error checks, based upon a report by Meelis Roos. 2) Fix memory leak in mvpp2 driver, from Sudip Mukherjee. 3) DMA memory allocation sizes are wrong in systemport ethernet driver, fix from Florian Fainelli. 4) Fix use after free in mac80211 defragmentation code, from Johannes Berg. 5) Some networking uapi headers missing from Kbuild file, from Stephen Hemminger. 6) TUN driver gets csum_start offset wrong when VLAN accel is enabled, and macvtap has a similar bug, from Herbert Xu. 7) Adjust several tunneling drivers to set dev->iflink after registry, because registry sets that to -1 overwriting whatever we did. From Steffen Klassert. 8) Geneve forgets to set inner tunneling type, causing GSO segmentation to fail on some NICs. From Jesse Gross. 9) Fix several locking bugs in stmmac driver, from Fabrice Gasnier and Giuseppe CAVALLARO. 10) Fix spurious timeouts with NewReno on low traffic connections, from Marcelo Leitner. 11) Fix descriptor updates in enic driver, from Govindarajulu Varadarajan. 12) PPP calls bpf_prog_create() with locks held, which isn't kosher. Fix from Takashi Iwai. 13) Fix NULL deref in SCTP with malformed INIT packets, from Daniel Borkmann. 14) psock_fanout selftest accesses past the end of the mmap ring, fix from Shuah Khan. 15) Fix PTP timestamping for VLAN packets, from Richard Cochran. 16) netlink_unbind() calls in netlink pass wrong initial argument, from Hiroaki SHIMODA. 17) vxlan socket reuse accidentally reuses a socket when the address family is different, so we have to explicitly check this, from Marcelo Leitner. 18) Fix missing include in nft_reject_bridge.c breaking the build on ppc and other architectures, from Guenter Roeck. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits) vxlan: Do not reuse sockets for a different address family smsc911x: power-up phydev before doing a software reset.
lib: rhashtable - Remove weird non-ASCII characters from comments net/smsc911x: Fix delays in the PHY enable/disable routines net/smsc911x: Fix rare soft reset timeout issue due to PHY power-down mode netlink: Properly unbind in error conditions. net: ptp: fix time stamp matching logic for VLAN packets. cxgb4 : dcb open-lldp interop fixes selftests/net: psock_fanout seg faults in sock_fanout_read_ring() net: bcmgenet: apply MII configuration in bcmgenet_open() net: bcmgenet: connect and disconnect from the PHY state machine net: qualcomm: Fix dependency ixgbe: phy: fix uninitialized status in ixgbe_setup_phy_link_tnx net: phy: Correctly handle MII ioctl which changes autonegotiation. ipv6: fix IPV6_PKTINFO with v4 mapped net: sctp: fix memory leak in auth key management net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet net: ppp: Don't call bpf_prog_create() in ppp_lock net/mlx4_en: Advertize encapsulation offloads features only when VXLAN tunnel is set cxgb4 : Fix bug in DCB app deletion ...
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "15 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: MAINTAINERS: add IIO include files kernel/panic.c: update comments for print_tainted mem-hotplug: reset node present pages when hot-adding a new pgdat mem-hotplug: reset node managed pages when hot-adding a new pgdat mm/debug-pagealloc: correct freepage accounting and order resetting fanotify: fix notification of groups with inode & mount marks mm, compaction: prevent infinite loop in compact_zone mm: alloc_contig_range: demote pages busy message from warn to info mm/slab: fix unalignment problem on Malta with EVA due to slab merge mm/page_alloc: restrict max order of merging on isolated pageblock mm/page_alloc: move freepage counting logic to __free_one_page() mm/page_alloc: add freepage on isolate pageblock to correct buddy list mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype mm/compaction: skip the range until proper target pageblock is met zram: avoid kunmap_atomic() of a NULL pointer
-
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Linus Torvalds authored
Pull Ceph fixes from Sage Weil: "There is an overflow bug fix for cephfs from Zheng, a fix for handling large authentication ticket buffers in libceph from Ilya, and a few fixes for the request handling code from Ilya that affect RBD volumes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: change from BUG to WARN for __remove_osd() asserts libceph: clear r_req_lru_item in __unregister_linger_request() libceph: unlink from o_linger_requests when clearing r_osd libceph: do not crash on large auth tickets ceph: fix flush tid comparision
-
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Linus Torvalds authored
Pull HID fixes from Jiri Kosina: - fix for an oops in HID core upon repeated subdriver insertion/removal under certain circumstances, by Benjamin Tissoires - quirk for another Elan Touchscreen device, by Adel Gadllah * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: core: cleanup .claimed field on disconnect HID: usbhid: enable always-poll quirk for Elan Touchscreen 0103
-
Daniel Baluta authored
Files under include/linux/iio were not reported as part of the IIO subsystem. Signed-off-by: Daniel Baluta <daniel.baluta@intel.com> Reported-by: Cristina Ciocan <cristina.ciocan@intel.com> Reviewed-by: Jingoo Han <jg1.han@samsung.com> Cc: Hartmut Knaack <knaack.h@gmx.de> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Peter Meerwald <pmeerw@pmeerw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Xie XiuQi authored
Commit 69361eef ("panic: add TAINT_SOFTLOCKUP") added the 'L' flag, but failed to update the comments for print_tainted(). So, update the comments. Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Tang Chen authored
When memory is hot-added, all the memory is in offline state. So clear all zones' present_pages because they will be updated in online_pages() and offline_pages(). Otherwise, /proc/zoneinfo will report wrong values: When the memory of node2 is offline: # cat /proc/zoneinfo ...... Node 2, zone Movable ...... spanned 8388608 present 8388608 managed 0 When we online memory on node2: # cat /proc/zoneinfo ...... Node 2, zone Movable ...... spanned 8388608 present 16777216 managed 8388608 Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [3.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Tang Chen authored
In free_area_init_core(), zone->managed_pages is set to an approximate value for lowmem, and will be adjusted when the bootmem allocator frees pages into the buddy system. But free_area_init_core() is also called by hotadd_new_pgdat() when hot-adding memory. As a result, zone->managed_pages of the newly added node's pgdat is set to an approximate value in the very beginning. Even if the memory on that node has not been onlined, /sys/devices/system/node/nodeXXX/meminfo has a wrong value: hot-add node2 (memory not onlined) cat /sys/devices/system/node/node2/meminfo Node 2 MemTotal: 33554432 kB Node 2 MemFree: 0 kB Node 2 MemUsed: 33554432 kB Node 2 Active: 0 kB This patch fixes this problem by resetting node managed pages to 0 after hot-adding a new node. 1. Move reset_managed_pages_done from reset_node_managed_pages() to reset_all_zones_managed_pages() 2. Make reset_node_managed_pages() non-static 3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat is initialized Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [3.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joonsoo Kim authored
One thing I did in this patch is fixing freepage accounting. If we clear guard page and link it onto isolate buddy list, we should not increase freepage count. This patch adds conditional branch to skip counting in this case. Without this patch, this overcounting happens frequently if guard order is set and CMA is used. Another thing fixed in this patch is the target to reset order. In __free_one_page(), we check the buddy page whether it is a guard page or not. And, if so, we should clear guard attribute on the buddy page and reset order of it to 0. But, current code resets original page's order rather than buddy one's. Maybe, this doesn't have any problem, because whole merged page's order will be re-assigned soon. But, it is better to correct code. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Kara authored
fsnotify() needs to merge inode and mount marks lists when notifying groups about events so that ignore masks from inode marks are reflected in mount mark notifications and groups are notified in proper order (according to priorities). Currently the sorting of the lists done by fsnotify_add_inode_mark() / fsnotify_add_vfsmount_mark() and fsnotify() differed, which resulted in ignore masks not being used in some cases. Fix the problem by always using the same comparison function when sorting / merging the mark lists. Thanks to Heinrich Schuchardt for improvements of my patch. Link: https://bugzilla.kernel.org/show_bug.cgi?id=87721 Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vlastimil Babka authored
Several people have reported occasionally seeing processes stuck in compact_zone(), even triggering soft lockups, in 3.18-rc2+. Testing a revert of commit e14c720e ("mm, compaction: remember position within pageblock in free pages scanner") fixed the issue, although the stuck processes do not appear to involve the free scanner. Finally, by code inspection, the bug was found in isolate_migratepages() which uses a slightly different condition to detect if the migration and free scanners have met, than compact_finished(). That has not been a problem until commit e14c720e allowed the free scanner position between individual invocations to be in the middle of a pageblock. In a relatively rare case, the migration scanner position can end up at the beginning of a pageblock, with the free scanner position in the middle of the same pageblock. If it's the migration scanner's turn, isolate_migratepages() exits immediately (without updating the position), while compact_finished() decides to continue compaction, resulting in a potentially infinite loop. The system can recover only if another process creates enough high-order pages to make the watermark checks in compact_finished() pass. This patch fixes the immediate problem by bumping the migration scanner's position to meet the free scanner in isolate_migratepages(), when both are within the same pageblock. This causes compact_finished() to terminate properly. A more robust check in compact_finished() is planned as a cleanup for better future maintainability.
Fixes: e14c720e ("mm, compaction: remember position within pageblock in free pages scanner") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: P. Christeas <xrg@linux.gr> Tested-by: P. Christeas <xrg@linux.gr> Link: http://marc.info/?l=linux-mm&m=141508604232522&w=2 Reported-by: Norbert Preining <preining@logic.at> Tested-by: Norbert Preining <preining@logic.at> Link: https://lkml.org/lkml/2014/11/4/904 Reported-by: Pavel Machek <pavel@ucw.cz> Link: https://lkml.org/lkml/2014/11/7/164 Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Nazarewicz authored
Having test_pages_isolated failure message as a warning confuses users into thinking that it is more serious than it really is. In reality, if called via CMA, allocation will be retried so a single test_pages_isolated failure does not prevent allocation from succeeding. Demote the warning message to an info message and reformat it such that the text "failed" does not appear and instead a less worrying "PFNS busy" is used. This message is trivially reproducible on a 10GB x86 machine on 3.16.y kernels configured with CONFIG_DMA_CMA. Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joonsoo Kim authored
Unlike SLUB, sometimes, object isn't started at the beginning of the slab in SLAB. This causes the unalignment problem after slab merging is supported by commit 12220dea ("mm/slab: support slab merge"). Following is the report from Markos that fail to boot on Malta with EVA. Calibrating delay loop... 19.86 BogoMIPS (lpj=99328) pid_max: default: 32768 minimum: 301 Mount-cache hash table entries: 4096 (order: 0, 16384 bytes) Mountpoint-cache hash table entries: 4096 (order: 0, 16384 bytes) Kernel bug detected[#1]: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0-05639-g12220dea #1631 task: 1f04f5d8 ti: 1f050000 task.ti: 1f050000 epc : 80141190 alloc_unbound_pwq+0x234/0x304 Not tainted ra : 80141184 alloc_unbound_pwq+0x228/0x304 Process swapper/0 (pid: 1, threadinfo=1f050000, task=1f04f5d8, tls=00000000) Call Trace: alloc_unbound_pwq+0x234/0x304 apply_workqueue_attrs+0x11c/0x294 __alloc_workqueue_key+0x23c/0x470 init_workqueues+0x320/0x400 do_one_initcall+0xe8/0x23c kernel_init_freeable+0x9c/0x224 kernel_init+0x10/0x100 ret_from_kernel_thread+0x14/0x1c [ end trace cb88537fdc8fa200 ] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b alloc_unbound_pwq() allocates slab object from pool_workqueue. This kmem_cache requires 256 bytes alignment, but, current merging code doesn't honor that, and merge it with kmalloc-256. kmalloc-256 requires only cacheline size alignment so that above failure occurs. However, in x86, kmalloc-256 is luckily aligned in 256 bytes, so the problem didn't happen on it. To fix this problem, this patch introduces alignment mismatch check in find_mergeable(). This will fix the problem. 
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reported-by: Markos Chandras <Markos.Chandras@imgtec.com> Tested-by: Markos Chandras <Markos.Chandras@imgtec.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
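The alignment mismatch described above comes down to one question: can objects from an existing cache satisfy the alignment a new cache asks for? A minimal sketch of such a test, assuming a hypothetical helper `align_compatible()`; the exact condition used in find_mergeable() is not shown in this log and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check: a cache whose objects are aligned to have_align
 * can serve a request needing want_align only if have_align is a
 * multiple of want_align. */
static int align_compatible(size_t have_align, size_t want_align)
{
    assert(have_align > 0 && want_align > 0);
    return have_align % want_align == 0;
}
```

With a cacheline-aligned cache (64 bytes is used in the test as an illustrative value) and a request for 256-byte alignment, the check fails, which is the case the message says must not be merged.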
-
Joonsoo Kim authored
Current pageblock isolation logic could isolate each pageblock individually. This causes a freepage accounting problem if a freepage with pageblock order on an isolate pageblock is merged with another freepage on a normal pageblock. We can prevent merging by restricting max order of merging to pageblock order if freepage is on isolate pageblock. A side-effect of this change is that there could be non-merged buddy freepage even after finishing pageblock isolation, because undoing pageblock isolation just moves freepage from isolate buddy list to normal buddy list rather than considering merging. So, the patch also makes undoing pageblock isolation consider freepage merge. On un-isolation, a freepage with more than pageblock order and its buddy are checked. If they are on normal pageblock, instead of just moving, we isolate the freepage and free it in order to get merged. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joonsoo Kim authored
All callers of __free_one_page() have similar freepage counting logic, so we can move it into __free_one_page(). This reduces lines of code and helps future maintenance. It is also a preparation step for "mm/page_alloc: restrict max order of merging on isolated pageblock", which fixes the freepage counting problem for freepages of more than pageblock order. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
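The refactoring in the commit above, where the counting moves from each caller into the freeing function, might look roughly like the following userspace model. The `acct_` types and functions are hypothetical stand-ins and do not reproduce the kernel's structures; list insertion and buddy merging are left out on purpose.

```c
#include <assert.h>

/* Hypothetical, reduced set of migratetypes for this sketch. */
enum acct_migratetype { ACCT_MOVABLE, ACCT_CMA, ACCT_ISOLATE };

/* Stand-in for the per-zone free page counters. */
struct acct_zone {
    long nr_free_pages;
    long nr_free_cma_pages;
};

/* Free one block of 2^order pages. The accounting lives here, so callers
 * no longer each carry their own copy of it. */
static void acct_free_one_page(struct acct_zone *zone, unsigned int order,
                               enum acct_migratetype mt)
{
    /* Pages on an isolated pageblock are not counted as free. */
    if (mt != ACCT_ISOLATE) {
        zone->nr_free_pages += 1L << order;
        if (mt == ACCT_CMA)
            zone->nr_free_cma_pages += 1L << order;
    }
    /* Buddy list insertion and merging are omitted in this sketch. */
}

/* Frees one movable page, one order-2 CMA block and one order-3 block on
 * an isolated pageblock, then returns the resulting counters. */
static struct acct_zone acct_demo(void)
{
    struct acct_zone zone = { 0, 0 };

    acct_free_one_page(&zone, 0, ACCT_MOVABLE);
    acct_free_one_page(&zone, 2, ACCT_CMA);
    acct_free_one_page(&zone, 3, ACCT_ISOLATE);
    return zone;
}
```

Keeping the check in one place means a later change to the rule, such as the max-order restriction this commit prepares for, only has to be made once.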
-
Joonsoo Kim authored
In free_pcppages_bulk(), we use the cached migratetype of a freepage to determine which buddy list the freepage will be added to. This information is stored when the freepage is added to the pcp list, so if isolation of the freepage's pageblock begins after it is stored, the cached information can be stale. In other words, it holds the original migratetype rather than MIGRATE_ISOLATE.
This stale information causes two problems. The first is that we can't keep these freepages from being allocated: although the pageblock is isolated, the freepage is added to a normal buddy list, so it can be allocated without any restriction. The second is incorrect freepage accounting: freepages on an isolated pageblock should not be counted in the number of freepages.
The following is the code snippet in free_pcppages_bulk():
    /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
    __free_one_page(page, page_to_pfn(page), zone, 0, mt);
    trace_mm_page_pcpu_drain(page, 0, mt);
    if (likely(!is_migrate_isolate_page(page))) {
        __mod_zone_page_state(zone, NR_FREE_PAGES, 1);
        if (is_migrate_cma(mt))
            __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
    }
As the snippet shows, the current code already handles the second problem, incorrect freepage accounting, by re-fetching the pageblock migratetype through is_migrate_isolate_page(page). But because this re-fetched information isn't used for __free_one_page(), the first problem is not solved. This patch solves it by re-fetching the pageblock migratetype before __free_one_page() and passing it to __free_one_page().
In addition to moving the re-fetch up, this patch applies an optimization: the migratetype is re-fetched only if there is an isolated pageblock. Pageblock isolation is a rare event, so this avoids the re-fetch in the common case.
This patch also corrects the migratetype in the tracepoint output.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
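The re-fetch described in the commit above can be modelled in userspace as follows. This is a sketch under stated assumptions: the `pcp_` types, the per-zone counter of isolated pageblocks, and the four-pageblock zone are all hypothetical and only mirror the idea of trusting the cached value unless isolation is in progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of migratetypes for this sketch. */
enum pcp_migratetype { PCP_MOVABLE, PCP_ISOLATE };

#define PCP_NR_PAGEBLOCKS 4

struct pcp_zone {
    unsigned long nr_isolate_pageblock;                   /* 0 in the common case */
    enum pcp_migratetype pageblock_mt[PCP_NR_PAGEBLOCKS]; /* current, authoritative */
};

struct pcp_page {
    unsigned int pageblock;          /* index of the pageblock holding the page */
    enum pcp_migratetype cached_mt;  /* stored when the page entered the pcp list */
};

/* Decide which buddy list a page drained from the pcp list should go to. */
static enum pcp_migratetype pcp_list_for_free(const struct pcp_zone *zone,
                                              const struct pcp_page *page)
{
    enum pcp_migratetype mt = page->cached_mt;

    /* Re-fetch only when some pageblock is isolated; otherwise the cached
     * value cannot be stale with respect to isolation. */
    if (zone->nr_isolate_pageblock > 0)
        mt = zone->pageblock_mt[page->pageblock];
    return mt;
}

/* Caches a page as movable, optionally isolates its pageblock afterwards,
 * and returns the migratetype chosen when the page is finally freed. */
static enum pcp_migratetype pcp_demo(bool isolate_after_caching)
{
    struct pcp_zone zone = { 0, { PCP_MOVABLE, PCP_MOVABLE,
                                  PCP_MOVABLE, PCP_MOVABLE } };
    struct pcp_page page = { 2, PCP_MOVABLE };

    if (isolate_after_caching) {
        zone.pageblock_mt[page.pageblock] = PCP_ISOLATE;
        zone.nr_isolate_pageblock++;
    }
    return pcp_list_for_free(&zone, &page);
}
```

With no isolation in progress the cached value is used directly; once the pageblock is isolated after caching, the re-fetch routes the page to the isolate list instead of a normal buddy list.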
-