- 20 Jul, 2017 5 commits
-
-
Daniel Borkmann authored
Edward reported that there's an issue in min/max value bounds tracking when signed and unsigned compares both provide hints on limits when having unknown variables. E.g. a program such as the following should have been rejected: 0: (7a) *(u64 *)(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (18) r1 = 0xffff8a94cda93400 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+7 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp 7: (7a) *(u64 *)(r10 -16) = -8 8: (79) r1 = *(u64 *)(r10 -16) 9: (b7) r2 = -1 10: (2d) if r1 > r2 goto pc+3 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0 R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 11: (65) if r1 s> 0x1 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=1 R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 12: (0f) r0 += r1 13: (72) *(u8 *)(r0 +0) = 0 R0=map_value_adj(ks=8,vs=8,id=0),min_value=0,max_value=1 R1=inv,min_value=0,max_value=1 R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 14: (b7) r0 = 0 15: (95) exit What happens is that in the first part ... 8: (79) r1 = *(u64 *)(r10 -16) 9: (b7) r2 = -1 10: (2d) if r1 > r2 goto pc+3 ... r1 carries an unsigned value, and is compared as unsigned against a register carrying an immediate. Verifier deduces in reg_set_min_max() that since the compare is unsigned and operation is greater than (>), that in the fall-through/false case, r1's minimum bound must be 0 and maximum bound must be r2. Latter is larger than the bound and thus max value is reset back to being 'invalid' aka BPF_REGISTER_MAX_RANGE. Thus, r1 state is now 'R1=inv,min_value=0'. The subsequent test ... 11: (65) if r1 s> 0x1 goto pc+2 ... is a signed compare of r1 with immediate value 1. Here, verifier deduces in reg_set_min_max() that since the compare is signed this time and operation is greater than (>), that in the fall-through/false case, we can deduce that r1's maximum bound must be 1, meaning with prior test, we result in r1 having the following state: R1=inv,min_value=0,max_value=1. Given that the actual value this holds is -8, the bounds are wrongly deduced. When this is being added to r0 which holds the map_value(_adj) type, then subsequent store access in above case will go through check_mem_access() which invokes check_map_access_adj(), that will then probe whether the map memory is in bounds based on the min_value and max_value as well as access size since the actual unknown value is min_value <= x <= max_value; commit fce366a9 ("bpf, verifier: fix alu ops against map_value{, _adj} register types") provides some more explanation on the semantics. It's worth to note in this context that in the current code, min_value and max_value tracking are used for two things, i) dynamic map value access via check_map_access_adj() and since commit 06c1c049 ("bpf: allow helpers access to variable memory") ii) also enforced at check_helper_mem_access() when passing a memory address (pointer to packet, map value, stack) and length pair to a helper and the length in this case is an unknown value defining an access range through min_value/max_value in that case. The min_value/max_value tracking is /not/ used in the direct packet access case to track ranges. However, the issue also affects case ii), for example, the following crafted program based on the same principle must be rejected as well: 0: (b7) r2 = 0 1: (bf) r3 = r10 2: (07) r3 += -512 3: (7a) *(u64 *)(r10 -16) = -8 4: (79) r4 = *(u64 *)(r10 -16) 5: (b7) r6 = -1 6: (2d) if r4 > r6 goto pc+5 R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512 R4=inv,min_value=0 R6=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 7: (65) if r4 s> 0x1 goto pc+4 R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512 R4=inv,min_value=0,max_value=1 R6=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 8: (07) r4 += 1 9: (b7) r5 = 0 10: (6a) *(u16 *)(r10 -512) = 0 11: (85) call bpf_skb_load_bytes#26 12: (b7) r0 = 0 13: (95) exit Meaning, while we initialize the max_value stack slot that the verifier thinks we access in the [1,2] range, in reality we pass -7 as length which is interpreted as u32 in the helper. Thus, this issue is relevant also for the case of helper ranges. Resetting both bounds in check_reg_overflow() in case only one of them exceeds limits is also not enough as similar test can be created that uses values which are within range, thus also here learned min value in r1 is incorrect when mixed with later signed test to create a range: 0: (7a) *(u64 *)(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (18) r1 = 0xffff880ad081fa00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+7 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp 7: (7a) *(u64 *)(r10 -16) = -8 8: (79) r1 = *(u64 *)(r10 -16) 9: (b7) r2 = 2 10: (3d) if r2 >= r1 goto pc+3 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp 11: (65) if r1 s> 0x4 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp 12: (0f) r0 += r1 13: (72) *(u8 *)(r0 +0) = 0 R0=map_value_adj(ks=8,vs=8,id=0),min_value=3,max_value=4 R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp 14: (b7) r0 = 0 15: (95) exit This leaves us with two options for fixing this: i) to invalidate all prior learned information once we switch signed context, ii) to track min/max signed and unsigned boundaries separately as done in [0]. (Given latter introduces major changes throughout the whole verifier, it's rather net-next material, thus this patch follows option i), meaning we can derive bounds either from only signed tests or only unsigned tests.) There is still the case of adjust_reg_min_max_vals(), where we adjust bounds on ALU operations, meaning programs like the following where boundaries on the reg get mixed in context later on when bounds are merged on the dst reg must get rejected, too: 0: (7a) *(u64 *)(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (18) r1 = 0xffff89b2bf87ce00 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+6 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp 7: (7a) *(u64 *)(r10 -16) = -8 8: (79) r1 = *(u64 *)(r10 -16) 9: (b7) r2 = 2 10: (3d) if r2 >= r1 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp 11: (b7) r7 = 1 12: (65) if r7 s> 0x0 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,max_value=0 R10=fp 13: (b7) r0 = 0 14: (95) exit from 12 to 15: R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,min_value=1 R10=fp 15: (0f) r7 += r1 16: (65) if r7 s> 0x4 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp 17: (0f) r0 += r7 18: (72) *(u8 *)(r0 +0) = 0 R0=map_value_adj(ks=8,vs=8,id=0),min_value=4,max_value=4 R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp 19: (b7) r0 = 0 20: (95) exit Meaning, in adjust_reg_min_max_vals() we must also reset range values on the dst when src/dst registers have mixed signed/ unsigned derived min/max value bounds with one unbounded value as otherwise they can be added together deducing false boundaries. Once both boundaries are established from either ALU ops or compare operations w/o mixing signed/unsigned insns, then they can safely be added to other regs also having both boundaries established. Adding regs with one unbounded side to a map value where the bounded side has been learned w/o mixing ops is possible, but the resulting map value won't recover from that, meaning such op is considered invalid on the time of actual access. Invalid bounds are set on the dst reg in case i) src reg, or ii) in case dst reg already had them. The only way to recover would be to perform i) ALU ops but only 'add' is allowed on map value types or ii) comparisons, but these are disallowed on pointers in case they span a range. This is fine as only BPF_JEQ and BPF_JNE may be performed on PTR_TO_MAP_VALUE_OR_NULL registers which potentially turn them into PTR_TO_MAP_VALUE type depending on the branch, so only here min/max value cannot be invalidated for them. In terms of state pruning, value_from_signed is considered as well in states_equal() when dealing with adjusted map values. With regards to breaking existing programs, there is a small risk, but use-cases are rather quite narrow where this could occur and mixing compares probably unlikely. Joint work with Josef and Edward. [0] https://lists.iovisor.org/pipermail/iovisor-dev/2017-June/000822.html Fixes: 48461135 ("bpf: allow access into map value arrays") Reported-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sabrina Dubroca authored
In some cases, offset can overflow and can cause an infinite loop in ip6_find_1stfragopt(). Make it unsigned int to prevent the overflow, and cap it at IPV6_MAXPLEN, since packets larger than that should be invalid. This problem has been here since before the beginning of git history. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
The array data is only populated with valid information from userspace if cmd != SIOCDEVPRIVATE, other cases the array contains garbage on the stack. The subsequent switch statement acts on a subcommand in data[0] which could be any garbage value if cmd is SIOCDEVPRIVATE which seems incorrect to me. Instead, just return EOPNOTSUPP for the case where cmd == SIOCDEVPRIVATE to avoid this issue. As a side note, I suspect that the original intention of the code was for this ioctl to work just for cmd == SIOCDEVPRIVATE (and the current logic is reversed). However, I don't wont to change the current semantics in case any userspace code relies on this existing behaviour. Detected by CoverityScan, CID#139647 ("Uninitialized scalar variable") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
This reverts commit cd8966e7. The duplicate CHANGEADDR event message is sent regardless of link status whereas the setlink changes only generate a notification when the link is up. Not sending a notification when the link is down breaks dhcpcd which only processes hwaddr changes when the link is down. Fixes reported regression: https://bugzilla.kernel.org/show_bug.cgi?id=196355Reported-by: Yaroslav Isakov <yaroslav.isakov@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Martin Hundebøll authored
Commit f39908d3 ('net: dsa: mv88e6xxx: Set the CMODE for mv88e6390 ports 9 & 10') added support for setting the CMODE for the 6390X family, but only enabled it for 9290 and 6390 - and left out 6390X. Fix support for setting the CMODE on 6390X also by assigning mv88e6390x_port_set_cmode() to the .port_set_cmode function pointer in mv88e6390x_ops too. Fixes: f39908d3 ("net: dsa: mv88e6xxx: Set the CMODE for mv88e6390 ports 9 & 10") Signed-off-by: Martin Hundebøll <mnhu@prevas.dk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 19 Jul, 2017 4 commits
-
-
Arun Parameswaran authored
Add SoC specific compatibility strings to the Broadcom DTE based PTP clock binding document. Fixed the document heading and node name. Fixes: 80d60761 ("dt-binding: ptp: add bindings document for dte based ptp clock") Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eugeniy Paltsev authored
Unconditional reset dwmac before HW init if reset controller is present. In existing implementation we reset dwmac only after second module probing: (module load -> unload -> load again [reset happens]) Now we reset dwmac at every module load: (module load [reset happens] -> unload -> load again [reset happens]) Also some reset controllers have only reset callback instead of assert + deassert callbacks pair, so handle this case. Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
The ifr.ifr_name is passed around and assumed to be NULL terminated. Signed-off-by: David S. Miller <davem@davemloft.net>
-
Levin, Alexander authored
ifr name is assumed to be a valid string by the kernel, but nothing was forcing username to pass a valid string. In turn, this would cause panics as we tried to access the string past it's valid memory. Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 18 Jul, 2017 6 commits
-
-
Dan Carpenter authored
We accidentally return an uninitialized variable. Fixes: cf56c2f8 ("netfilter: remove old pre-netns era hook api") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller authored
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Missing netlink message sanity check in nfnetlink, patch from Mateusz Jurczyk. 2) We now have netfilter per-netns hooks, so let's kill global hook infrastructure, this infrastructure is known to be racy with netns. We don't care about out of tree modules. Patch from Florian Westphal. 3) find_appropriate_src() is buggy when colissions happens after the conversion of the nat bysource to rhashtable. Also from Florian. 4) Remove forward chain in nf_tables arp family, it's useless and it is causing quite a bit of confusion, from Florian Westphal. 5) nf_ct_remove_expect() is called with the wrong parameter, causing kernel oops, patch from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Eric noticed that in udp_recvmsg() we still need to access skb->dst while processing the IP options. Since commit 0a463c78 ("udp: avoid a cache miss on dequeue") skb->dst is no more available at recvmsg() time and bad things will happen if we enter the relevant code path. This commit address the issue, avoid clearing skb->dst if any IP options are present into the relevant skb. Since the IP CB is contained in the first skb cacheline, we can test it to decide to leverage the consume_stateless_skb() optimization, without measurable additional cost in the faster path. v1 -> v2: updated commit message tags Fixes: 0a463c78 ("udp: avoid a cache miss on dequeue") Reported-by: Andrey Konovalov <andreyknvl@google.com> Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christophe Jaillet authored
If 'dma_set_mask_and_coherent()' fails, we must undo the previous 'pci_request_regions()' call. Adjust corresponding 'goto' to jump at the right place of the error handling path. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Potapenko authored
KMSAN reported use of uninitialized memory in skb_set_hash_from_sk(), which originated from the TCP request socket created in cookie_v6_check(): ================================================================== BUG: KMSAN: use of uninitialized memory in tcp_transmit_skb+0xf77/0x3ec0 CPU: 1 PID: 2949 Comm: syz-execprog Not tainted 4.11.0-rc5+ #2931 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 TCP: request_sock_TCPv6: Possible SYN flooding on port 20028. Sending cookies. Check SNMP counters. Call Trace: <IRQ> __dump_stack lib/dump_stack.c:16 dump_stack+0x172/0x1c0 lib/dump_stack.c:52 kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927 __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469 skb_set_hash_from_sk ./include/net/sock.h:2011 tcp_transmit_skb+0xf77/0x3ec0 net/ipv4/tcp_output.c:983 tcp_send_ack+0x75b/0x830 net/ipv4/tcp_output.c:3493 tcp_delack_timer_handler+0x9a6/0xb90 net/ipv4/tcp_timer.c:284 tcp_delack_timer+0x1b0/0x310 net/ipv4/tcp_timer.c:309 call_timer_fn+0x240/0x520 kernel/time/timer.c:1268 expire_timers kernel/time/timer.c:1307 __run_timers+0xc13/0xf10 kernel/time/timer.c:1601 run_timer_softirq+0x36/0xa0 kernel/time/timer.c:1614 __do_softirq+0x485/0x942 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 irq_exit+0x1fa/0x230 kernel/softirq.c:405 exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:657 smp_apic_timer_interrupt+0x5a/0x80 arch/x86/kernel/apic/apic.c:966 apic_timer_interrupt+0x86/0x90 arch/x86/entry/entry_64.S:489 RIP: 0010:native_restore_fl ./arch/x86/include/asm/irqflags.h:36 RIP: 0010:arch_local_irq_restore ./arch/x86/include/asm/irqflags.h:77 RIP: 0010:__msan_poison_alloca+0xed/0x120 mm/kmsan/kmsan_instr.c:440 RSP: 0018:ffff880024917cd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10 RAX: 0000000000000246 RBX: ffff8800224c0000 RCX: 0000000000000005 RDX: 0000000000000004 RSI: ffff880000000000 RDI: ffffea0000b6d770 RBP: ffff880024917d58 R08: 0000000000000dd8 R09: 0000000000000004 R10: 0000160000000000 R11: 0000000000000000 R12: ffffffff85abf810 R13: ffff880024917dd8 R14: 0000000000000010 R15: ffffffff81cabde4 </IRQ> poll_select_copy_remaining+0xac/0x6b0 fs/select.c:293 SYSC_select+0x4b4/0x4e0 fs/select.c:653 SyS_select+0x76/0xa0 fs/select.c:634 entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204 RIP: 0033:0x4597e7 RSP: 002b:000000c420037ee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000017 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004597e7 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 000000c420037ef0 R08: 000000c420037ee0 R09: 0000000000000059 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000042dc20 R13: 00000000000000f3 R14: 0000000000000030 R15: 0000000000000003 chained origin: save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302 kmsan_save_stack mm/kmsan/kmsan.c:317 kmsan_internal_chain_origin+0x12a/0x1f0 mm/kmsan/kmsan.c:547 __msan_store_shadow_origin_4+0xac/0x110 mm/kmsan/kmsan_instr.c:259 tcp_create_openreq_child+0x709/0x1ae0 net/ipv4/tcp_minisocks.c:472 tcp_v6_syn_recv_sock+0x7eb/0x2a30 net/ipv6/tcp_ipv6.c:1103 tcp_get_cookie_sock+0x136/0x5f0 net/ipv4/syncookies.c:212 cookie_v6_check+0x17a9/0x1b50 net/ipv6/syncookies.c:245 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989 tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298 tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487 ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279 NF_HOOK ./include/linux/netfilter.h:257 ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322 dst_input ./include/net/dst.h:492 ip6_rcv_finish net/ipv6/ip6_input.c:69 NF_HOOK ./include/linux/netfilter.h:257 ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208 __netif_receive_skb net/core/dev.c:4246 process_backlog+0x667/0xba0 net/core/dev.c:4866 napi_poll net/core/dev.c:5268 net_rx_action+0xc95/0x1590 net/core/dev.c:5333 __do_softirq+0x485/0x942 kernel/softirq.c:284 origin: save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302 kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198 kmsan_kmalloc+0x7f/0xe0 mm/kmsan/kmsan.c:337 kmem_cache_alloc+0x1c2/0x1e0 mm/slub.c:2766 reqsk_alloc ./include/net/request_sock.h:87 inet_reqsk_alloc+0xa4/0x5b0 net/ipv4/tcp_input.c:6200 cookie_v6_check+0x4f4/0x1b50 net/ipv6/syncookies.c:169 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989 tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298 tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487 ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279 NF_HOOK ./include/linux/netfilter.h:257 ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322 dst_input ./include/net/dst.h:492 ip6_rcv_finish net/ipv6/ip6_input.c:69 NF_HOOK ./include/linux/netfilter.h:257 ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208 __netif_receive_skb net/core/dev.c:4246 process_backlog+0x667/0xba0 net/core/dev.c:4866 napi_poll net/core/dev.c:5268 net_rx_action+0xc95/0x1590 net/core/dev.c:5333 __do_softirq+0x485/0x942 kernel/softirq.c:284 ================================================================== Similar error is reported for cookie_v4_check(). Fixes: 58d607d3 ("tcp: provide skb->hash to synack packets") Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gao Feng authored
The global percpu variable ppp_xmit_recursion is used to detect the ppp xmit recursion to avoid the deadlock, which is caused by one CPU tries to lock the xmit lock twice. But it would report false recursion when one CPU wants to send the skb from two different PPP devices, like one L2TP on the PPPoE. It is a normal case actually. Now use one percpu member of struct ppp instead of the gloable variable to detect the xmit recursion of one ppp device. Fixes: 55454a56 ("ppp: avoid dealock on recursive xmit") Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: Liu Jianying <jianying.liu@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 17 Jul, 2017 7 commits
-
-
WANG Cong authored
When req->rsk_listener is NULL, sk_to_full_sk() returns NULL too, so we have to check its return value against NULL here. Fixes: 40304b2a ("bpf: BPF support for sock_ops") Reported-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Cc: Lawrence Brakmo <brakmo@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
GCC 7 added a new -Wimplicit-fallthrough warning. It's only enabled with W=1, but since linux/jhash.h is included in over hundred places (including other global headers) it seems worthwhile fixing this warning. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
We crash in __nf_ct_expect_check, it calls nf_ct_remove_expect on the uninitialised expectation instead of existing one, so del_timer chokes on random memory address. Fixes: ec0e3f01 ("netfilter: nf_ct_expect: Add nf_ct_remove_expect()") Reported-by: Sergey Kvachonok <ravenexp@gmail.com> Tested-by: Sergey Kvachonok <ravenexp@gmail.com> Cc: Gao Feng <fgao@ikuai8.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
arp packets cannot be forwarded. They can be bridged, but then they can be filtered using either ebtables or nftables bridge family. The bridge netfilter exposes a "call-arptables" switch which pushes packets into arptables, but lets not expose this for nftables, so better close this asap. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
When doing initial conversion to rhashtable I replaced the bucket walk with a single rhashtable_lookup_fast(). When moving to rhlist I failed to properly walk the list of identical tuples, but that is what is needed for this to work correctly. The table contains the original tuples, so the reply tuples are all distinct. We currently decide that mapping is (not) in range only based on the first entry, but in case its not we need to try the reply tuple of the next entry until we either find an in-range mapping or we checked all the entries. This bug makes nat core attempt collision resolution while it might be able to use the mapping as-is. Fixes: 870190a9 ("netfilter: nat: convert nat bysrc hash to rhashtable") Reported-by: Jaco Kroon <jaco@uls.co.za> Tested-by: Jaco Kroon <jaco@uls.co.za> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
no more users in the tree, remove this. The old api is racy wrt. module removal, all users have been converted to the netns-aware api. The old api pretended we still have global hooks but that has not been true for a long time. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Mateusz Jurczyk authored
Verify that the length of the socket buffer is sufficient to cover the nlmsghdr structure before accessing the nlh->nlmsg_len field for further input sanitization. If the client only supplies 1-3 bytes of data in sk_buff, then nlh->nlmsg_len remains partially uninitialized and contains leftover memory from the corresponding kernel allocation. Operating on such data may result in indeterminate evaluation of the nlmsg_len < NLMSG_HDRLEN expression. The bug was discovered by a runtime instrumentation designed to detect use of uninitialized memory in the kernel. The patch prevents this and other similar tools (e.g. KMSAN) from flagging this behavior in the future. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 16 Jul, 2017 15 commits
-
-
David S. Miller authored
Doug Berger says: ==================== bcmgenet: Fragmented SKB corrections Two issues were observed in a review of the bcmgenet driver support for fragmented SKBs which are addressed by this patch set. The first addresses a problem that could occur if the driver is not able to DMA map a fragment of the SKB. This would be a highly unusual event but it would leave the hardware descriptors in an invalid state which should be prevented. The second is a hazard that could occur if the driver is able to reclaim the first control block of a fragmented SKB before all of its fragments have completed processing by the hardware. In this case the SKB could be freed leading to reuse of memory that is still in use by hardware. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Doug Berger authored
Since the skb is attached to the first control block of a fragmented skb it is possible that the skb could be freed when reclaiming that control block before all fragments of the skb have been consumed by the hardware and unmapped. This commit introduces first_cb and last_cb pointers to the skb control block used by the driver to keep track of which transmit control blocks within a transmit ring are the first and last ones associated with the skb. It then splits the bcmgenet_free_cb() function into transmit (bcmgenet_free_tx_cb) and receive (bcmgenet_free_rx_cb) versions that can handle the unmapping of dma mapped memory and cleaning up the corresponding control block structure so that the skb is only freed after the last associated transmit control block is reclaimed. Fixes: 1c1008c7 ("net: bcmgenet: add main driver file") Signed-off-by: Doug Berger <opendmb@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Doug Berger authored
In case we fail to map a single fragment, we would be leaving the transmit ring populated with stale entries. This commit introduces the helper function bcmgenet_put_txcb() which takes care of rewinding the per-ring write pointer back to where we left. It also consolidates the functionality of bcmgenet_xmit_single() and bcmgenet_xmit_frag() into the bcmgenet_xmit() function to make the unmapping of control blocks cleaner. Fixes: 1c1008c7 ("net: bcmgenet: add main driver file") Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Doug Berger <opendmb@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Commit 07d4510f ("dt-bindings: net: bgmac: add bindings documentation for bgmac") added both brcm,amac-nsp.txt and brcm,bgmac-nsp.txt. The former is actually the one that got updated and is in use by the bgmac driver while the latter is duplicating the former and is not used nor updated. Fixes: 07d4510f ("dt-bindings: net: bgmac: add bindings documentation for bgmac") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Arvind Yadav says: ==================== Constify isdn pci_device_id's. pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arvind Yadav authored
pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. File size before: text data bss dec hex filename 11803 544 1 12348 303c isdn/hardware/avm/c4.o File size After adding 'const': text data bss dec hex filename 11931 416 1 12348 303c isdn/hardware/avm/c4.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arvind Yadav authored
pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. File size before: text data bss dec hex filename 21656 1024 96 22776 58f8 isdn/hardware/mISDN/hfcpci.o File size After adding 'const': text data bss dec hex filename 22424 256 96 22776 58f8 isdn/hardware/mISDN/hfcpci.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arvind Yadav authored
pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. File size before: text data bss dec hex filename 9963 1936 16 11915 2e8b isdn/hardware/mISDN/avmfritz.o File size After adding 'const': text data bss dec hex filename 10091 1808 16 11915 2e8b isdn/hardware/mISDN/avmfritz.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arvind Yadav authored
pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. File size before: text data bss dec hex filename 13959 4080 24 18063 468f isdn/hardware/mISDN/w6692.o File size After adding 'const': text data bss dec hex filename 14087 3952 24 18063 468f isdn/hardware/mISDN/w6692.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arvind Yadav authored
pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. File size before: text data bss dec hex filename 63450 1536 1492 66478 103ae isdn/hardware/mISDN/hfcmulti.o File size After adding 'const': text data bss dec hex filename 64698 288 1492 66478 103ae isdn/hardware/mISDN/hfcmulti.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arvind Yadav authored
pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. File size before: text data bss dec hex filename 10941 1776 16 12733 31bd isdn/hardware/mISDN/netjet.o File size After adding 'const': text data bss dec hex filename 11005 1712 16 12733 31bd isdn/hardware/mISDN/netjet.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arvind Yadav authored
pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. File size before: text data bss dec hex filename 6224 655 8 6887 1ae7 isdn/hardware/eicon/divasmain.o File size After adding 'const': text data bss dec hex filename 6608 271 8 6887 1ae7 isdn/hardware/eicon/divasmain.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arvind Yadav authored
pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. File size before: text data bss dec hex filename 5989 576 0 6565 19a5 isdn/hisax/hisax_fcpcipnp.o File size After adding 'const': text data bss dec hex filename 6085 480 0 6565 19a5 isdn/hisax/hisax_fcpcipnp.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arvind Yadav authored
pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. File size before: text data bss dec hex filename 10512 536 4 11052 2b2c drivers/isdn/hisax/hfc4s8s_l1.o File size After adding 'const': text data bss dec hex filename 10672 376 4 11052 2b2c drivers/isdn/hisax/hfc4s8s_l1.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arvind Yadav authored
pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. File size before: text data bss dec hex filename 13686 2064 4416 20166 4ec6 drivers/isdn/hisax/config.o File size After adding 'const': text data bss dec hex filename 15030 720 4416 20166 4ec6 drivers/isdn/hisax/config.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 15 Jul, 2017 3 commits
-
-
Neal Cardwell authored
Fixes the following behavior: for connections that had no RTT sample at the time of initializing congestion control, BBR was initializing the pacing rate to a high nominal rate (based an a guess of RTT=1ms, in case this is LAN traffic). Then BBR never adjusted the pacing rate downward upon obtaining an actual RTT sample, if the connection never filled the pipe (e.g. all sends were small app-limited writes()). This fix adjusts the pacing rate upon obtaining the first RTT sample. Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Neal Cardwell authored
Fix a corner case noticed by Eric Dumazet, where BBR's setting sk->sk_pacing_rate to 0 during initialization could theoretically cause packets in the sending host to hang if there were packets "in flight" in the pacing infrastructure at the time the BBR congestion control state is initialized. This could occur if the pacing infrastructure happened to race with bbr_init() in a way such that the pacer read the 0 rather than the immediately following non-zero pacing rate. Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Neal Cardwell authored
Introduce a helper to initialize the BBR pacing rate unconditionally, based on the current cwnd and RTT estimate. This is a pure refactor, but is needed for two following fixes. Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-