- 21 May, 2017 5 commits
-
-
Ihar Hrachyshka authored
The addr_type retrieval can be costly, so it's worth trying to avoid its calculation as much as possible. This patch makes it calculated only for gratuitous ARP packets. This is especially important since later we may want to move is_garp calculation outside of arp_accept block, at which point the costly operation will be executed for all setups. The patch is the result of a discussion in net-dev: http://marc.info/?l=linux-netdev&m=149506354216994Suggested-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ihar Hrachyshka authored
The code is quite involving already to earn a separate function for itself. If anything, it helps arp_process readability. Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ihar Hrachyshka authored
the is_garp code deals just with gratuitous ARP packets, not every unsolicited packet. This patch is a result of a discussion in netdev: http://marc.info/?l=linux-netdev&m=149506354216994Suggested-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Wang authored
When tcp_disconnect() is called, inet_csk_delack_init() sets icsk->icsk_ack.rcv_mss to 0. This could potentially cause tcp_recvmsg() => tcp_cleanup_rbuf() => __tcp_select_window() call path to have division by 0 issue. So this patch initializes rcv_mss to TCP_MIN_MSS instead of 0. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller authored
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree, they are: 1) When using IPVS in direct-routing mode, normal traffic from the LVS host to a back-end server is sometimes incorrectly NATed on the way back into the LVS host. Patch to fix this from Julian Anastasov. 2) Calm down clang compilation warning in ctnetlink due to type mismatch, from Matthias Kaehlcke. 3) Do not re-setup NAT for conntracks that are already confirmed, this is fixing a problem that was introduced in the previous nf-next batch. Patch from Liping Zhang. 4) Do not allow conntrack helper removal from userspace cthelper infrastructure if already in used. This comes with an initial patch to introduce nf_conntrack_helper_put() that is required by this fix. From Liping Zhang. 5) Zero the pad when copying data to userspace, otherwise iptables fails to remove rules. This is a follow up on the patchset that sorts out the internal match/target structure pointer leak to userspace. Patch from the same author, Willem de Bruijn. This also comes with a build failure when CONFIG_COMPAT is not on, coming in the last patch of this series. 6) SYNPROXY crashes with conntrack entries that are created via ctnetlink, more specifically via conntrackd state sync. Patch from Eric Leblond. 7) RCU safe iteration on set element dumping in nf_tables, from Liping Zhang. 8) Missing sanitization of immediate date for the bitwise and cmp expressions in nf_tables. 9) Refcounting logic for chain and objects from set elements does not integrate into the nf_tables 2-phase commit protocol. 10) Missing sanitization of target verdict in ebtables arpreply target, from Gao Feng. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 18 May, 2017 22 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/shli/mdLinus Torvalds authored
Pull MD fixes from Shaohua Li: - Several bug fixes for raid5-cache from Song Liu, mainly handle journal disk error - Fix bad block handling in choosing raid1 disk from Tomasz Majchrzak - Simplify external metadata array sysfs handling from Artur Paszkiewicz - Optimize raid0 discard handling from me, now raid0 will dispatch large discard IO directly to underlayer disks. * tag 'md/4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: raid1: prefer disk without bad blocks md/r5cache: handle sync with data in write back cache md/r5cache: gracefully handle journal device errors for writeback mode md/raid1/10: avoid unnecessary locking md/raid5-cache: in r5l_do_submit_io(), submit io->split_bio first md/md0: optimize raid0 discard handling md: don't return -EAGAIN in md_allow_write for external metadata arrays md/raid5: make use of spin_lock_irq over local_irq_disable + spin_lock
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds authored
Pull networking fixes from David Miller: 1) Don't allow negative TCP reordering values, from Soheil Hassas Yeganeh. 2) Don't overflow while parsing ipv6 header options, from Craig Gallek. 3) Handle more cleanly the case where an individual route entry during a dump will not fit into the allocated netlink SKB, from David Ahern. 4) Add missing CONFIG_INET dependency for mlx5e, from Arnd Bergmann. 5) Allow neighbour updates to converge more quickly via gratuitous ARPs, from Ihar Hrachyshka. 6) Fix compile error from CONFIG_INET is disabled, from Eric Dumazet. 7) Fix use after free in x25 protocol init, from Lin Zhang. 8) Valid VLAN pvid ranges passed into br_validate(), from Tobias Jungel. 9) NULL out address lists in child sockets in SCTP, this is similar to the fix we made for inet connection sockets last week. From Eric Dumazet. 10) Fix NULL deref in mlxsw driver, from Ido Schimmel. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits) mlxsw: spectrum: Avoid possible NULL pointer dereference sh_eth: Do not print an error message for probe deferral sh_eth: Use platform device for printing before register_netdev() mlxsw: spectrum_router: Fix rif counter freeing routine mlxsw: spectrum_dpipe: Fix incorrect entry index cxgb4: update latest firmware version supported qmi_wwan: add another Lenovo EM74xx device ID sctp: do not inherit ipv6_{mc|ac|fl}_list from parent udp: make *udp*_queue_rcv_skb() functions static bridge: netlink: check vlan_default_pvid range net: ethernet: faraday: To support device tree usage. net: x25: fix one potential use-after-free issue bpf: adjust verifier heuristics ipv6: Check ip6_find_1stfragopt() return value properly. selftests/bpf: fix broken build due to types.h bnxt_en: Check status of firmware DCBX agent before setting DCB_CAP_DCBX_HOST. bnxt_en: Call bnxt_dcb_init() after getting firmware DCBX configuration. net: fix compile error in skb_orphan_partial() ipv6: Prevent overrun when parsing v6 header options neighbour: update neigh timestamps iff update is effective ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds authored
Pull sparc fixes from David Miller: "Three sparc bug fixes" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc/ftrace: Fix ftrace graph time measurement sparc: Fix -Wstringop-overflow warning sparc64: Fix mapping of 64k pages with MAP_FIXED
-
Linus Torvalds authored
Merge tag 'kbuild-fixes-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fix from Masahiro Yamada: "Fix headers_install to not delete pre-existing headers in the install destination" * tag 'kbuild-fixes-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: skip install/check of headers right under uapi directories
-
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds authored
Pull pid namespace fixes from Eric Biederman: "These are two bugs that turn out to have simple fixes that were reported during the merge window. Both of these issues have existed for a while and it just happens that they both were reported at almost the same time" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes() pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes
-
Linus Torvalds authored
Merge tag 'hwmon-for-linus-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fix from Guenter Roeck: "Fix problem with hotplug state machine in coretemp driver" * tag 'hwmon-for-linus-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (coretemp) Handle frozen hotplug state correctly
-
Ido Schimmel authored
In case we got an FDB notification for a port that doesn't exist we execute an FDB entry delete to prevent it from re-appearing the next time we poll for notifications. If the operation failed we would trigger a NULL pointer dereference as 'mlxsw_sp_port' is NULL. Fix it by reporting the error using the underlying bus device instead. Fixes: 12f1501e ("mlxsw: spectrum: remove FDB entry in case we get unknown object notification") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geert Uytterhoeven authored
EPROBE_DEFER is not an error, hence printing an error message like sh-eth ee700000.ethernet: failed to initialise MDIO may confuse the user. To fix this, suppress the error message in case of probe deferral. While at it, shorten the message, and add the actual error code. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geert Uytterhoeven authored
The MDIO initialization failure message is printed using the network device, before it has been registered, leading to: (null): failed to initialise MDIO Use the platform device instead to fix this: sh-eth ee700000.ethernet: failed to initialise MDIO Fixes: daacf03f ("sh_eth: Register MDIO bus before registering the network device") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Pirko says: ==================== mlxsw: couple of fixes Couple of fixes from Arkadi ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arkadi Sharshevsky authored
During rif counter freeing the counter index can be invalid. Add check of validity before freeing the counter. Fixes: e0c0afd8 ("mlxsw: spectrum: Support for counters on router interfaces") Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arkadi Sharshevsky authored
In case of disabled counters the entry index will be incorrect. Fix this by moving the entry index set before the counter status check. Fixes: 2ba5999f ("mlxsw: spectrum: Add Support for erif table entries access") Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ganesh Goudar authored
Change t4fw_version.h to update latest firmware version number to 1.16.43.0. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bjørn Mork authored
In their infinite wisdom, and never ending quest for end user frustration, Lenovo has decided to use a new USB device ID for the wwan modules in their 2017 laptops. The actual hardware is still the Sierra Wireless EM7455 or EM7430, depending on region. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
SCTP needs fixes similar to 83eaddab ("ipv6/dccp: do not inherit ipv6_mc_list from parent"), otherwise bad things can happen. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Since the udp memory accounting refactor, we don't need any more to export the *udp*_queue_rcv_skb(). Make them static and fix a couple of sparse warnings: net/ipv4/udp.c:1615:5: warning: symbol 'udp_queue_rcv_skb' was not declared. Should it be static? net/ipv6/udp.c:572:5: warning: symbol 'udpv6_queue_rcv_skb' was not declared. Should it be static? Fixes: 850cbadd ("udp: use it's own memory accounting schema") Fixes: c915fe13 ("udplite: fix NULL pointer dereference") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tobias Jungel authored
Currently it is allowed to set the default pvid of a bridge to a value above VLAN_VID_MASK (0xfff). This patch adds a check to br_validate and returns -EINVAL in case the pvid is out of bounds. Reproduce by calling: [root@test ~]# ip l a type bridge [root@test ~]# ip l a type dummy [root@test ~]# ip l s bridge0 type bridge vlan_filtering 1 [root@test ~]# ip l s bridge0 type bridge vlan_default_pvid 9999 [root@test ~]# ip l s dummy0 master bridge0 [root@test ~]# bridge vlan port vlan ids bridge0 9999 PVID Egress Untagged dummy0 9999 PVID Egress Untagged Fixes: 0f963b75 ("bridge: netlink: add support for default_pvid") Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Tobias Jungel <tobias.jungel@bisdn.de> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Greentime Hu authored
To support device tree usage for ftmac100. Signed-off-by: Greentime Hu <green.hu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
linzhang authored
The function x25_init is not properly unregister related resources on error handler.It is will result in kernel oops if x25_init init failed, so add properly unregister call on error handler. Also, i adjust the coding style and make x25_register_sysctl properly return failure. Signed-off-by: linzhang <xiaolou4617@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Willem de Bruijn authored
The patch in the Fixes references COMPAT_XT_ALIGN in the definition of XT_DATA_TO_USER, outside an #ifdef CONFIG_COMPAT block. Split XT_DATA_TO_USER into separate compat and non compat variants and define the first inside an CONFIG_COMPAT block. This simplifies both variants by removing branches inside the macro. Fixes: 324318f0 ("netfilter: xtables: zero padding in data_to_user") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Daniel Borkmann authored
Current limits with regards to processing program paths do not really reflect today's needs anymore due to programs becoming more complex and verifier smarter, keeping track of more data such as const ALU operations, alignment tracking, spilling of PTR_TO_MAP_VALUE_ADJ registers, and other features allowing for smarter matching of what LLVM generates. This also comes with the side-effect that we result in fewer opportunities to prune search states and thus often need to do more work to prove safety than in the past due to different register states and stack layout where we mismatch. Generally, it's quite hard to determine what caused a sudden increase in complexity, it could be caused by something as trivial as a single branch somewhere at the beginning of the program where LLVM assigned a stack slot that is marked differently throughout other branches and thus causing a mismatch, where verifier then needs to prove safety for the whole rest of the program. Subsequently, programs with even less than half the insn size limit can get rejected. We noticed that while some programs load fine under pre 4.11, they get rejected due to hitting limits on more recent kernels. We saw that in the vast majority of cases (90+%) pruning failed due to register mismatches. In case of stack mismatches, majority of cases failed due to different stack slot types (invalid, spill, misc) rather than differences in spilled registers. This patch makes pruning more aggressive by also adding markers that sit at conditional jumps as well. Currently, we only mark jump targets for pruning. For example in direct packet access, these are usually error paths where we bail out. We found that adding these markers, it can reduce number of processed insns by up to 30%. Another option is to ignore reg->id in probing PTR_TO_MAP_VALUE_OR_NULL registers, which can help pruning slightly as well by up to 7% observed complexity reduction as stand-alone. Meaning, if a previous path with register type PTR_TO_MAP_VALUE_OR_NULL for map X was found to be safe, then in the current state a PTR_TO_MAP_VALUE_OR_NULL register for the same map X must be safe as well. Last but not least the patch also adds a scheduling point and bumps the current limit for instructions to be processed to a more adequate value. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Do not use unsigned variables to see if it returns a negative error or not. Fixes: 2423496a ("ipv6: Prevent overrun when parsing v6 header options") Reported-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 17 May, 2017 13 commits
-
-
Yonghong Song authored
Commit 0a5539f6 ("bpf: Provide a linux/types.h override for bpf selftests.") caused a build failure for tools/testing/selftest/bpf because of some missing types: $ make -C tools/testing/selftests/bpf/ ... In file included from /home/yhs/work/net-next/tools/testing/selftests/bpf/test_pkt_access.c:8: ../../../include/uapi/linux/bpf.h:170:3: error: unknown type name '__aligned_u64' __aligned_u64 key; ... /usr/include/linux/swab.h:160:8: error: unknown type name '__always_inline' static __always_inline __u16 __swab16p(const __u16 *p) ... The type __aligned_u64 is defined in linux:include/uapi/linux/types.h. The fix is to copy missing type definition into tools/testing/selftests/bpf/include/uapi/linux/types.h. Adding additional include "string.h" resolves __always_inline issue. Fixes: 0a5539f6 ("bpf: Provide a linux/types.h override for bpf selftests.") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Linus Torvalds authored
Merge tag 'for-4.12/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - a couple DM thin provisioning fixes - a few request-based DM and DM multipath fixes for issues that were made when merging Christoph's changes with Bart's changes for 4.12 - a DM bufio unsigned overflow fix - a couple pure fixes for the DM cache target. - various very small tweaks to the DM cache target that enable considerable speed improvements in the face of continuous IO. Given that the cache target was significantly reworked for 4.12 I see no reason to sit on these advances until 4.13 considering the favorable results associated with such minimalist tweaks. * tag 'for-4.12/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm cache: handle kmalloc failure allocating background_tracker struct dm bufio: make the parameter "retain_bytes" unsigned long dm mpath: multipath_clone_and_map must not return -EIO dm mpath: don't return -EIO from dm_report_EIO dm rq: add a missing break to map_request dm space map disk: fix some book keeping in the disk space map dm thin metadata: call precommit before saving the roots dm cache policy smq: don't do any writebacks unless IDLE dm cache: simplify the IDLE vs BUSY state calculation dm cache: track all IO to the cache rather than just the origin device's IO dm cache policy smq: stop preemptively demoting blocks dm cache policy smq: put newly promoted entries at the top of the multiqueue dm cache policy smq: be more aggressive about triggering a writeback dm cache policy smq: only demote entries in bottom half of the clean multiqueue dm cache: fix incorrect 'idle_time' reset in IO tracker
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linuxLinus Torvalds authored
Pull i2c fixes from Wolfram Sang: "Here are some bugfixes from I2C, especially removing a wrongly displayed error message for all i2c muxes" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: xgene: Set ACPI_COMPANION_I2C i2c: mv64xxx: don't override deferred probing when getting irq i2c: mux: only print failure message on error i2c: mux: reg: rename label to indicate what it does i2c: mux: reg: put away the parent i2c adapter on probe failure
-
David S. Miller authored
Michael Chan says: ==================== bnxt_en: DCBX fixes. 2 bug fixes for the case where the NIC's firmware DCBX agent is enabled. With these fixes, we will return the proper information to lldpad. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
Otherwise, all the host based DCBX settings from lldpad will fail if the firmware DCBX agent is running. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
In the current code, bnxt_dcb_init() is called too early before we determine if the firmware DCBX agent is running or not. As a result, we are not setting the DCB_CAP_DCBX_HOST and DCB_CAP_DCBX_LLD_MANAGED flags properly to report to DCBNL. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
If CONFIG_INET is not set, net/core/sock.c can not compile : net/core/sock.c: In function ‘skb_orphan_partial’: net/core/sock.c:1810:2: error: implicit declaration of function ‘skb_is_tcp_pure_ack’ [-Werror=implicit-function-declaration] if (skb_is_tcp_pure_ack(skb)) ^ Fix this by always including <net/tcp.h> Fixes: f6ba8d33 ("netem: fix skb_orphan_partial()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Liam R. Howlett authored
The ftrace function_graph time measurements of a given function is not accurate according to those recorded by ftrace using the function filters. This change pulls the x86_64 fix from 'commit 722b3c74 ("ftrace/graph: Trace function entry before updating index")' into the sparc specific prepare_ftrace_return which stops ftrace from counting interrupted tasks in the time measurement. Example measurements for select_task_rq_fair running "hackbench 100 process 1000": | tracing/trace_stat/function0 | function_graph Before patch | 2.802 us | 4.255 us After patch | 2.749 us | 3.094 us Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Orlando Arias authored
Greetings, GCC 7 introduced the -Wstringop-overflow flag to detect buffer overflows in calls to string handling functions [1][2]. Due to the way ``empty_zero_page'' is declared in arch/sparc/include/setup.h, this causes a warning to trigger at compile time in the function mem_init(), which is subsequently converted to an error. The ensuing patch fixes this issue and aligns the declaration of empty_zero_page to that of other architectures. Thank you. Cheers, Orlando. [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02308.html [2] https://gcc.gnu.org/gcc-7/changes.htmlSigned-off-by: Orlando Arias <oarias@knights.ucf.edu> -------------------------------------------------------------------------------- Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nitin Gupta authored
An incorrect huge page alignment check caused mmap failure for 64K pages when MAP_FIXED is used with address not aligned to HPAGE_SIZE. Orabug: 25885991 Fixes: dcd1912d ("sparc64: Add 64K page size support") Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Craig Gallek authored
The KASAN warning repoted below was discovered with a syzkaller program. The reproducer is basically: int s = socket(AF_INET6, SOCK_RAW, NEXTHDR_HOP); send(s, &one_byte_of_data, 1, MSG_MORE); send(s, &more_than_mtu_bytes_data, 2000, 0); The socket() call sets the nexthdr field of the v6 header to NEXTHDR_HOP, the first send call primes the payload with a non zero byte of data, and the second send call triggers the fragmentation path. The fragmentation code tries to parse the header options in order to figure out where to insert the fragment option. Since nexthdr points to an invalid option, the calculation of the size of the network header can made to be much larger than the linear section of the skb and data is read outside of it. This fix makes ip6_find_1stfrag return an error if it detects running out-of-bounds. [ 42.361487] ================================================================== [ 42.364412] BUG: KASAN: slab-out-of-bounds in ip6_fragment+0x11c8/0x3730 [ 42.365471] Read of size 840 at addr ffff88000969e798 by task ip6_fragment-oo/3789 [ 42.366469] [ 42.366696] CPU: 1 PID: 3789 Comm: ip6_fragment-oo Not tainted 4.11.0+ #41 [ 42.367628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014 [ 42.368824] Call Trace: [ 42.369183] dump_stack+0xb3/0x10b [ 42.369664] print_address_description+0x73/0x290 [ 42.370325] kasan_report+0x252/0x370 [ 42.370839] ? ip6_fragment+0x11c8/0x3730 [ 42.371396] check_memory_region+0x13c/0x1a0 [ 42.371978] memcpy+0x23/0x50 [ 42.372395] ip6_fragment+0x11c8/0x3730 [ 42.372920] ? nf_ct_expect_unregister_notifier+0x110/0x110 [ 42.373681] ? ip6_copy_metadata+0x7f0/0x7f0 [ 42.374263] ? ip6_forward+0x2e30/0x2e30 [ 42.374803] ip6_finish_output+0x584/0x990 [ 42.375350] ip6_output+0x1b7/0x690 [ 42.375836] ? ip6_finish_output+0x990/0x990 [ 42.376411] ? ip6_fragment+0x3730/0x3730 [ 42.376968] ip6_local_out+0x95/0x160 [ 42.377471] ip6_send_skb+0xa1/0x330 [ 42.377969] ip6_push_pending_frames+0xb3/0xe0 [ 42.378589] rawv6_sendmsg+0x2051/0x2db0 [ 42.379129] ? rawv6_bind+0x8b0/0x8b0 [ 42.379633] ? _copy_from_user+0x84/0xe0 [ 42.380193] ? debug_check_no_locks_freed+0x290/0x290 [ 42.380878] ? ___sys_sendmsg+0x162/0x930 [ 42.381427] ? rcu_read_lock_sched_held+0xa3/0x120 [ 42.382074] ? sock_has_perm+0x1f6/0x290 [ 42.382614] ? ___sys_sendmsg+0x167/0x930 [ 42.383173] ? lock_downgrade+0x660/0x660 [ 42.383727] inet_sendmsg+0x123/0x500 [ 42.384226] ? inet_sendmsg+0x123/0x500 [ 42.384748] ? inet_recvmsg+0x540/0x540 [ 42.385263] sock_sendmsg+0xca/0x110 [ 42.385758] SYSC_sendto+0x217/0x380 [ 42.386249] ? SYSC_connect+0x310/0x310 [ 42.386783] ? __might_fault+0x110/0x1d0 [ 42.387324] ? lock_downgrade+0x660/0x660 [ 42.387880] ? __fget_light+0xa1/0x1f0 [ 42.388403] ? __fdget+0x18/0x20 [ 42.388851] ? sock_common_setsockopt+0x95/0xd0 [ 42.389472] ? SyS_setsockopt+0x17f/0x260 [ 42.390021] ? entry_SYSCALL_64_fastpath+0x5/0xbe [ 42.390650] SyS_sendto+0x40/0x50 [ 42.391103] entry_SYSCALL_64_fastpath+0x1f/0xbe [ 42.391731] RIP: 0033:0x7fbbb711e383 [ 42.392217] RSP: 002b:00007ffff4d34f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 42.393235] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbbb711e383 [ 42.394195] RDX: 0000000000001000 RSI: 00007ffff4d34f60 RDI: 0000000000000003 [ 42.395145] RBP: 0000000000000046 R08: 00007ffff4d34f40 R09: 0000000000000018 [ 42.396056] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400aad [ 42.396598] R13: 0000000000000066 R14: 00007ffff4d34ee0 R15: 00007fbbb717af00 [ 42.397257] [ 42.397411] Allocated by task 3789: [ 42.397702] save_stack_trace+0x16/0x20 [ 42.398005] save_stack+0x46/0xd0 [ 42.398267] kasan_kmalloc+0xad/0xe0 [ 42.398548] kasan_slab_alloc+0x12/0x20 [ 42.398848] __kmalloc_node_track_caller+0xcb/0x380 [ 42.399224] __kmalloc_reserve.isra.32+0x41/0xe0 [ 42.399654] __alloc_skb+0xf8/0x580 [ 42.400003] sock_wmalloc+0xab/0xf0 [ 42.400346] __ip6_append_data.isra.41+0x2472/0x33d0 [ 42.400813] ip6_append_data+0x1a8/0x2f0 [ 42.401122] rawv6_sendmsg+0x11ee/0x2db0 [ 42.401505] inet_sendmsg+0x123/0x500 [ 42.401860] sock_sendmsg+0xca/0x110 [ 42.402209] ___sys_sendmsg+0x7cb/0x930 [ 42.402582] __sys_sendmsg+0xd9/0x190 [ 42.402941] SyS_sendmsg+0x2d/0x50 [ 42.403273] entry_SYSCALL_64_fastpath+0x1f/0xbe [ 42.403718] [ 42.403871] Freed by task 1794: [ 42.404146] save_stack_trace+0x16/0x20 [ 42.404515] save_stack+0x46/0xd0 [ 42.404827] kasan_slab_free+0x72/0xc0 [ 42.405167] kfree+0xe8/0x2b0 [ 42.405462] skb_free_head+0x74/0xb0 [ 42.405806] skb_release_data+0x30e/0x3a0 [ 42.406198] skb_release_all+0x4a/0x60 [ 42.406563] consume_skb+0x113/0x2e0 [ 42.406910] skb_free_datagram+0x1a/0xe0 [ 42.407288] netlink_recvmsg+0x60d/0xe40 [ 42.407667] sock_recvmsg+0xd7/0x110 [ 42.408022] ___sys_recvmsg+0x25c/0x580 [ 42.408395] __sys_recvmsg+0xd6/0x190 [ 42.408753] SyS_recvmsg+0x2d/0x50 [ 42.409086] entry_SYSCALL_64_fastpath+0x1f/0xbe [ 42.409513] [ 42.409665] The buggy address belongs to the object at ffff88000969e780 [ 42.409665] which belongs to the cache kmalloc-512 of size 512 [ 42.410846] The buggy address is located 24 bytes inside of [ 42.410846] 512-byte region [ffff88000969e780, ffff88000969e980) [ 42.411941] The buggy address belongs to the page: [ 42.412405] page:ffffea000025a780 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 [ 42.413298] flags: 0x100000000008100(slab|head) [ 42.413729] raw: 0100000000008100 0000000000000000 0000000000000000 00000001800c000c [ 42.414387] raw: ffffea00002a9500 0000000900000007 ffff88000c401280 0000000000000000 [ 42.415074] page dumped because: kasan: bad access detected [ 42.415604] [ 42.415757] Memory state around the buggy address: [ 42.416222] ffff88000969e880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 42.416904] ffff88000969e900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 42.417591] >ffff88000969e980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 42.418273] ^ [ 42.418588] ffff88000969ea00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 42.419273] ffff88000969ea80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 42.419882] ================================================================== Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Masahiro Yamada authored
Since commit 61562f98 ("uapi: export all arch specifics directories"), "make INSTALL_HDR_PATH=$root/usr headers_install" deletes standard glibc headers and others in $(root)/usr/include. The cause of the issue is that headers_install now starts descending from arch/$(hdr-arch)/include/uapi with $(root)/usr/include for its destination when installing asm headers. So, headers already there are assumed to be unwanted. When headers_install starts descending from include/uapi with $(root)/usr/include for its destination, it works around the problem by creating an dummy destination $(root)/usr/include/uapi, but this is tricky. To fix the problem in a clean way is to skip headers install/check in include/uapi and arch/$(hdr-arch)/include/uapi because we know there are only sub-directories in uapi directories. A good side effect is the empty destination $(root)/usr/include/uapi will go away. I am also removing the trailing slash in the headers_check target to skip checking in arch/$(hdr-arch)/include/uapi. Fixes: 61562f98 ("uapi: export all arch specifics directories") Reported-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
-
Ihar Hrachyshka authored
It's a common practice to send gratuitous ARPs after moving an IP address to another device to speed up healing of a service. To fulfill service availability constraints, the timing of network peers updating their caches to point to a new location of an IP address can be particularly important. Sometimes neigh_update calls won't touch neither lladdr nor state, for example if an update arrives in locktime interval. The neigh->updated value is tested by the protocol specific neigh code, which in turn will influence whether NEIGH_UPDATE_F_OVERRIDE gets set in the call to neigh_update() or not. As a result, we may effectively ignore the update request, bailing out of touching the neigh entry, except that we still bump its timestamps inside neigh_update. This may be a problem for updates arriving in quick succession. For example, consider the following scenario: A service is moved to another device with its IP address. The new device sends three gratuitous ARP requests into the network with ~1 seconds interval between them. Just before the first request arrives to one of network peer nodes, its neigh entry for the IP address transitions from STALE to DELAY. This transition, among other things, updates neigh->updated. Once the kernel receives the first gratuitous ARP, it ignores it because its arrival time is inside the locktime interval. The kernel still bumps neigh->updated. Then the second gratuitous ARP request arrives, and it's also ignored because it's still in the (new) locktime interval. Same happens for the third request. The node eventually heals itself (after delay_first_probe_time seconds since the initial transition to DELAY state), but it just wasted some time and require a new ARP request/reply round trip. This unfortunate behaviour both puts more load on the network, as well as reduces service availability. This patch changes neigh_update so that it bumps neigh->updated (as well as neigh->confirmed) only once we are sure that either lladdr or entry state will change). In the scenario described above, it means that the second gratuitous ARP request will actually update the entry lladdr. Ideally, we would update the neigh entry on the very first gratuitous ARP request. The locktime mechanism is designed to ignore ARP updates in a short timeframe after a previous ARP update was honoured by the kernel layer. This would require tracking timestamps for state transitions separately from timestamps when actual updates are received. This would probably involve changes in neighbour struct. Therefore, the patch doesn't tackle the issue of the first gratuitous APR ignored, leaving it for a follow-up. Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-