1. 13 Apr, 2019 4 commits
    • netfilter: conntrack: initialize ct->timeout · 8176c833
      Alexander Potapenko authored
      KMSAN started reporting an error when accessing ct->timeout for the
      first time without initialization:
      
       BUG: KMSAN: uninit-value in __nf_ct_refresh_acct+0x1ae/0x470 net/netfilter/nf_conntrack_core.c:1765
       ...
       dump_stack+0x173/0x1d0 lib/dump_stack.c:113
       kmsan_report+0x131/0x2a0 mm/kmsan/kmsan.c:624
       __msan_warning+0x7a/0xf0 mm/kmsan/kmsan_instr.c:310
       __nf_ct_refresh_acct+0x1ae/0x470 net/netfilter/nf_conntrack_core.c:1765
       nf_ct_refresh_acct ./include/net/netfilter/nf_conntrack.h:201
       nf_conntrack_udp_packet+0xb44/0x1040 net/netfilter/nf_conntrack_proto_udp.c:122
       nf_conntrack_handle_packet net/netfilter/nf_conntrack_core.c:1605
       nf_conntrack_in+0x1250/0x26c9 net/netfilter/nf_conntrack_core.c:1696
       ...
       Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:205
       kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:159
       kmsan_kmalloc+0xa9/0x130 mm/kmsan/kmsan_hooks.c:173
       kmem_cache_alloc+0x554/0xb10 mm/slub.c:2789
       __nf_conntrack_alloc+0x16f/0x690 net/netfilter/nf_conntrack_core.c:1342
       init_conntrack+0x6cb/0x2490 net/netfilter/nf_conntrack_core.c:1421
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Fixes: cc169213 ("netfilter: conntrack: avoid same-timeout update")
      Cc: Florian Westphal <fw@strlen.de>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: don't set related state for different outer address · 1025ce75
      Florian Westphal authored
      Luca Moro says:
       ------
      The issue lies in the filtering of ICMP and ICMPv6 errors that include an
      inner IP datagram.
      For these packets, icmp_error_message() extracts the ICMP error and inner
      layer to search for a known state.
      If a state is found the packet is tagged as related (IP_CT_RELATED).
      
      The problem is that there is no correlation check between the inner and
      outer layer of the packet.
      So one can encapsulate an error with an inner layer matching a known state,
      while its outer layer is directed to a filtered host.
      In this case the whole packet will be tagged as related.
      This has various implications, from a rule bypass (if a rule allows related
      traffic) to a known-state oracle.
      
      Unfortunately, we could not find a real statement in an RFC on how this case
      should be filtered.
      The closest we found is RFC5927 (Section 4.3) but it is not very clear.
      
      A possible fix would be to check that the inner IP source is the same as
      the outer destination.
      
      We believed this kind of attack was not documented yet, so we started to
      write a blog post about it.
      You can find it attached to this mail (sorry for the extract quality).
      It contains more technical details, PoC and discussion about the identified
      behavior.
      We discovered later that
      https://www.gont.com.ar/papers/filtering-of-icmp-error-messages.pdf
      described a similar attack concept in 2004 but without the stateful
      filtering in mind.
       -----
      
      This implements the fix suggested above:
      In the icmp(v6) error handler, take the outer destination address, then
      pass that into the common function that does the "related" association.
      
      After obtaining the nf_conn of the matching inner-headers connection,
      check that the destination address of the opposite direction tuple
      is the same as the outer address and only set RELATED if that's the case.
      Reported-by: Luca Moro <luca.moro@synacktiv.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
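The correlation check described in this commit can be modelled outside the kernel. The Python sketch below is an illustration only, not the kernel code: `Tuple`, `Conn` and `is_related` are hypothetical names and the addresses are made up. It treats an ICMP error as related only when the outer destination address equals the destination address of the opposite-direction tuple of the connection matched by the inner header. What the kernel does with a packet that fails the check is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tuple:
    """Simplified connection tuple: addresses only (hypothetical model)."""
    src: str
    dst: str


@dataclass(frozen=True)
class Conn:
    """Simplified tracked connection with its two direction tuples."""
    original: Tuple
    reply: Tuple


def is_related(conns, outer_dst, inner):
    """Return True if an ICMP error should be tagged as related.

    outer_dst: destination address of the outer (ICMP) packet.
    inner:     tuple taken from the IP header embedded in the ICMP error.
    """
    for conn in conns:
        if inner == conn.original:
            opposite = conn.reply
        elif inner == conn.reply:
            opposite = conn.original
        else:
            continue  # inner header does not belong to this connection
        # The added check: the outer destination must match the
        # destination of the opposite-direction tuple.
        return opposite.dst == outer_dst
    return False  # no known state for the inner header


conn = Conn(original=Tuple("10.0.0.1", "192.0.2.7"),
            reply=Tuple("192.0.2.7", "10.0.0.1"))
inner = Tuple("10.0.0.1", "192.0.2.7")

genuine = is_related([conn], "10.0.0.1", inner)   # error sent back to the sender
forged = is_related([conn], "10.0.0.99", inner)   # error aimed at another host
```

In this model the genuine error is accepted as related, while the forged one, whose outer layer targets a different host, is not.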
    • selftests: netfilter: check icmp pkttoobig errors are set as related · becf2319
      Florian Westphal authored
      When an icmp error such as pkttoobig is received, conntrack checks
      if the "inner" header (header of packet that did not fit link mtu)
      matches an existing connection, and, if so, sets that packet as
      being related to the conntrack entry it found.
      
      It was recently reported that this "related" setting also works
      if the inner header is from another, different connection (i.e.,
      artificial/forged icmp error).
      
      Add a test; a follow-up patch will add an additional "inner dst matches
      outer dst in reverse direction" check before setting related state.
      
      Link: https://www.synacktiv.com/posts/systems/icmp-reachable.html
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • ipv4: recompile ip options in ipv4_link_failure · ed0de45a
      Stephen Suryaputra authored
      Recompile IP options since IPCB may not be valid anymore when
      ipv4_link_failure is called from arp_error_report.
      
      Refer to the commit 3da1ed7a ("net: avoid use IPCB in cipso_v4_error")
      and the commit before that (9ef6b42a) for a similar issue.
      Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 12 Apr, 2019 19 commits
  3. 11 Apr, 2019 17 commits
    • selftests: fib_tests: Fix 'Command line is not complete' errors · a5f62298
      David Ahern authored
      A couple of tests are verifying a route has been removed. The helper
      expects the prefix as the first part of the expected output. When
      checking that a route has been deleted the prefix is empty leading
      to an invalid ip command:
      
        $ ip ro ls match
        Command line is not complete. Try option "help"
      
      Fix by moving the comparison of expected output and output to a new
      function that is used by both check_route and check_route6. Use the
      new helper for the 2 checks on route removal.
      
      Also, remove the reset of 'set -x' in route_setup which overrides the
      user managed setting.
      
      Fixes: d69faad7 ("selftests: fib_tests: Add prefix route tests with metric")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: netrom: Fix error cleanup path of nr_proto_init · d3706566
      YueHaibing authored
      Syzkaller reports this:
      
      BUG: unable to handle kernel paging request at fffffbfff830524b
      PGD 237fe8067 P4D 237fe8067 PUD 237e64067 PMD 1c9716067 PTE 0
      Oops: 0000 [#1] SMP KASAN PTI
      CPU: 1 PID: 4465 Comm: syz-executor.0 Not tainted 5.0.0+ #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      RIP: 0010:__list_add_valid+0x21/0xe0 lib/list_debug.c:23
      Code: 8b 0c 24 e9 17 fd ff ff 90 55 48 89 fd 48 8d 7a 08 53 48 89 d3 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 48 83 ec 08 <80> 3c 02 00 0f 85 8b 00 00 00 48 8b 53 08 48 39 f2 75 35 48 89 f2
      RSP: 0018:ffff8881ea2278d0 EFLAGS: 00010282
      RAX: dffffc0000000000 RBX: ffffffffc1829250 RCX: 1ffff1103d444ef4
      RDX: 1ffffffff830524b RSI: ffffffff85659300 RDI: ffffffffc1829258
      RBP: ffffffffc1879250 R08: fffffbfff0acb269 R09: fffffbfff0acb269
      R10: ffff8881ea2278f0 R11: fffffbfff0acb268 R12: ffffffffc1829250
      R13: dffffc0000000000 R14: 0000000000000008 R15: ffffffffc187c830
      FS:  00007fe0361df700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: fffffbfff830524b CR3: 00000001eb39a001 CR4: 00000000007606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       __list_add include/linux/list.h:60 [inline]
       list_add include/linux/list.h:79 [inline]
       proto_register+0x444/0x8f0 net/core/sock.c:3375
       nr_proto_init+0x73/0x4b3 [netrom]
       ? 0xffffffffc1628000
       ? 0xffffffffc1628000
       do_one_initcall+0xbc/0x47d init/main.c:887
       do_init_module+0x1b5/0x547 kernel/module.c:3456
       load_module+0x6405/0x8c10 kernel/module.c:3804
       __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
       do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fe0361dec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
      RBP: 00007fe0361dec70 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe0361df6bc
      R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
      Modules linked in: netrom(+) ax25 fcrypt pcbc af_alg arizona_ldo1 v4l2_common videodev media v4l2_dv_timings hdlc ide_cd_mod snd_soc_sigmadsp_regmap snd_soc_sigmadsp intel_spi_platform intel_spi mtd spi_nor snd_usbmidi_lib usbcore lcd ti_ads7950 hi6421_regulator snd_soc_kbl_rt5663_max98927 snd_soc_hdac_hdmi snd_hda_ext_core snd_hda_core snd_soc_rt5663 snd_soc_core snd_pcm_dmaengine snd_compress snd_soc_rl6231 mac80211 rtc_rc5t583 spi_slave_time leds_pwm hid_gt683r hid industrialio_triggered_buffer kfifo_buf industrialio ir_kbd_i2c rc_core led_class_flash dwc_xlgmac snd_ymfpci gameport snd_mpu401_uart snd_rawmidi snd_ac97_codec snd_pcm ac97_bus snd_opl3_lib snd_timer snd_seq_device snd_hwdep snd soundcore iptable_security iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan
       bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ide_pci_generic piix aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ide_core psmouse input_leds i2c_piix4 serio_raw intel_agp intel_gtt ata_generic agpgart pata_acpi parport_pc rtc_cmos parport floppy sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: rxrpc]
      Dumping ftrace buffer:
         (ftrace buffer empty)
      CR2: fffffbfff830524b
      ---[ end trace 039ab24b305c4b19 ]---
      
      If nr_proto_init fails, it may forget to call proto_unregister,
      triggering this issue. This patch rearranges the code of nr_proto_init
      to avoid such issues.
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fec: manage ahb clock in runtime pm · d7c3a206
      Andy Duan authored
      Some SoCs, such as i.MX6SX, have clock constraints:
      - ahb clock should be disabled before ipg.
      - ahb and ipg clocks are required for MAC MII bus.
      So, move the ahb clock to runtime management together with
      ipg clock.
      Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: multicast: use rcu to access port list from br_multicast_start_querier · c5b493ce
      Nikolay Aleksandrov authored
      br_multicast_start_querier() walks over the port list but it can be
      called from a timer with only multicast_lock held which doesn't protect
      the port list, so use RCU to walk over it.
      
      Fixes: c83b8fab ("bridge: Restart queries when last querier expires")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'thunderx-xdp-mtu' · 9a4dda81
      David S. Miller authored
      Matteo Croce says:
      
      ====================
      Fix thunderx MTU with XDP
      
      The thunderx driver can't use XDP with all MTU values.
      This series sets the right MTU values and adds a check to avoid setting
      a wrong value that will not work.
      
      v3: Fix a copy-paste from two functions, tested on proper hardware:
      
      2: enP2p1s0v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
          link/ether 1c:1b:0d:0d:52:a4 brd ff:ff:ff:ff:ff:ff
      [  787.019730] nicvf 0002:01:00.1 enP2p1s0v0: Jumbo frames not yet supported with XDP, current MTU 1800.
      RTNETLINK answers: Operation not supported
      [  800.574568] nicvf 0002:01:00.1 enP2p1s0v0: Link is Up 10000 Mbps Full duplex
      [  807.248321] nicvf 0002:01:00.1 enP2p1s0v0: Jumbo frames not yet supported with XDP, current MTU 1500.
      RTNETLINK answers: Invalid argument
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunderx: don't allow jumbo frames with XDP · 1f227d16
      Matteo Croce authored
      The thunderx driver forbids loading an eBPF program if the MTU is too high,
      but this can be circumvented by loading the eBPF program, then raising the MTU.
      
      Fix this by limiting the MTU if an eBPF program is already loaded.
      
      Fixes: 05c773f5 ("net: thunderx: Add basic XDP support")
      Signed-off-by: Matteo Croce <mcroce@redhat.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunderx: raise XDP MTU to 1508 · 5ee15c10
      Matteo Croce authored
      The thunderx driver splits frames bigger than 1530 bytes into multiple
      pages, making it impossible to run an eBPF program on them.
      This leads to a maximum MTU of 1508 if QinQ is in use.
      
      The thunderx driver forbids loading an eBPF program if the MTU is higher
      than 1500 bytes. Raise the limit to 1508 so it is possible to use L2
      protocols which need some more headroom.
      
      Fixes: 05c773f5 ("net: thunderx: Add basic XDP support")
      Signed-off-by: Matteo Croce <mcroce@redhat.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
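The 1508 figure in the two thunderx commits can be reproduced with a little arithmetic. The Python sketch below is a model under stated assumptions, not the driver code: it assumes the 1530-byte limit covers the payload plus a 14-byte Ethernet header and two 4-byte VLAN tags (QinQ), with no frame check sequence counted. The function names are hypothetical.

```python
ETH_HLEN = 14                 # Ethernet header: two MAC addresses plus EtherType
VLAN_HLEN = 4                 # one 802.1Q tag
MAX_SINGLE_PAGE_FRAME = 1530  # largest frame kept in one page, per the commit message


def max_xdp_mtu(vlan_tags=2):
    """Largest MTU whose frame still fits in a single page (assumed accounting)."""
    return MAX_SINGLE_PAGE_FRAME - ETH_HLEN - vlan_tags * VLAN_HLEN


def mtu_change_allowed(new_mtu, xdp_loaded):
    """Model of the added check: limit the MTU only while an XDP program is loaded."""
    return (not xdp_loaded) or new_mtu <= max_xdp_mtu()


limit = max_xdp_mtu()                                  # 1530 - 14 - 2 * 4
ok_without_xdp = mtu_change_allowed(9000, xdp_loaded=False)
ok_with_xdp = mtu_change_allowed(1508, xdp_loaded=True)
rejected_with_xdp = mtu_change_allowed(1800, xdp_loaded=True)
```

Under these assumptions the limit works out to 1508, and raising the MTU to 1800 after loading a program is refused, which matches the behaviour shown in the cover letter log.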
    • Merge branch 'smc-fixes' · 796fff0c
      David S. Miller authored
      Ursula Braun says:
      
      ====================
      net/smc: fixes 2019-04-11
      
      here are some fixes in different areas of the smc code for the net
      tree.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: move unhash before release of clcsock · f61bca58
      Ursula Braun authored
      Commit 26d92e95
      ("net/smc: move unhash as early as possible in smc_release()")
      fixes one occurrence in the smc code, but the same pattern exists
      in other places. This patch covers the remaining occurrences and
      makes sure the unhash operation is done before the smc->clcsock is
      released. This avoids a potential use-after-free in smc_diag_dump().
      Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: fix return code from FLUSH command · 8ef659f1
      Karsten Graul authored
      The FLUSH command is used to empty the pnet table. No return code is
      expected from the command. Commit a9d8b0b1e3d6 added namespace support
      for the pnet table and changed the FLUSH command processing to call
      smc_pnet_remove_by_pnetid() to remove the pnet entries. This function
      returns -ENOENT when no entry was deleted, which is now the return code
      of the FLUSH command. As a result the FLUSH command will return an error
      when the pnet table is already empty.
      Restore the expected behavior and let FLUSH always return 0.
      
      Fixes: a9d8b0b1e3d6 ("net/smc: add pnet table namespace support")
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
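The return-code regression described in this commit is easy to reproduce in a toy model. The Python sketch below is not the SMC code: the pnet table is a plain list, and `remove_by_pnetid`, `flush_before_fix` and `flush_after_fix` are hypothetical stand-ins for the behaviour the message describes.

```python
import errno


def remove_by_pnetid(table, pnetid=None):
    """Remove matching entries (all entries when pnetid is None).

    Returns 0 if something was removed, -ENOENT otherwise, mirroring the
    behaviour described for smc_pnet_remove_by_pnetid().
    """
    before = len(table)
    if pnetid is None:
        table.clear()
    else:
        table[:] = [entry for entry in table if entry != pnetid]
    return 0 if len(table) < before else -errno.ENOENT


def flush_before_fix(table):
    """FLUSH that leaks the helper's return code to the caller."""
    return remove_by_pnetid(table)


def flush_after_fix(table):
    """FLUSH that empties the table and always reports success."""
    remove_by_pnetid(table)
    return 0


rc_buggy_empty = flush_before_fix([])   # error although nothing went wrong
rc_fixed_empty = flush_after_fix([])    # expected behaviour restored
```

Flushing an already empty table is the only case where the two variants differ, which is why the problem only shows up on a second FLUSH.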
    • net/smc: propagate file from SMC to TCP socket · 07603b23
      Ursula Braun authored
      fcntl(fd, F_SETOWN, getpid()) selects the recipient of SIGURG signals
      that are delivered when out-of-band data arrives on socket fd.
      If an SMC socket program makes use of such an fcntl() call, it fails
      in case of fallback to TCP-mode. In case of fallback the traffic is
      processed with the internal TCP socket. Propagating field "file" from the
      SMC socket to the internal TCP socket fixes the issue.
      Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
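For readers unfamiliar with the call mentioned in this commit, the Python sketch below shows the same `fcntl(fd, F_SETOWN, getpid())` pattern from user space. It uses an ordinary TCP socket rather than an SMC socket, so it only illustrates the API an SMC program would use, not the fallback path that the patch fixes. The helper name `set_sigurg_owner` is hypothetical, and the snippet assumes a Linux host.

```python
import fcntl
import os
import socket


def set_sigurg_owner(sock):
    """Make the calling process the recipient of SIGURG for this socket.

    Returns the owner reported back by F_GETOWN so the caller can verify it.
    """
    fcntl.fcntl(sock.fileno(), fcntl.F_SETOWN, os.getpid())
    return fcntl.fcntl(sock.fileno(), fcntl.F_GETOWN)


# Plain TCP socket used for illustration; no connection is made.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    owner = set_sigurg_owner(s)
```

After the call, the kernel records the current process as the owner of the socket, which is the state that has to carry over to the internal TCP socket when SMC falls back.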
    • net/smc: fix a NULL pointer dereference · e183d4e4
      Kangjie Lu authored
      In case alloc_ordered_workqueue fails, the fix returns NULL
      to avoid NULL pointer dereference.
      Signed-off-by: Kangjie Lu <kjlu@umn.edu>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: wait for pending work before clcsock release_sock · fd57770d
      Karsten Graul authored
      When the clcsock is already released using sock_release() and a pending
      smc_listen_work accesses the clcsock, that access will fail. Solve this
      by canceling and waiting for the work to complete first. Because the
      work holds the sock_lock, it must be ensured that the lock is not held
      before the new helper smc_clcsock_release() is invoked. And before the
      smc_listen_work starts working, check if the parent listen socket is
      still valid; otherwise stop the work early.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recv · 988dc4a9
      Lorenzo Bianconi authored
      gue tunnels run iptunnel_pull_offloads on received skbs. This can
      cause a use-after-free when accessing the guehdr pointer, since
      the packet will be 'uncloned' by running pskb_expand_head if it is a
      cloned gso skb (e.g. if the packet has been sent through a veth device).
      
      Fixes: a09a4c8d ("tunnels: Remove encapsulation offloads on decap")
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: missing entries in name table of publications · d1841533
      Hoang Le authored
      When binding multiple services with specific types (1Ki, 2Ki, ...),
      some entries in the name table of publications are missing
      when listed via 'tipc name show'.

      The problem lies in how a zero last_type value provided via netlink
      is interpreted. It can mean two things. The first is the initial 'type'
      when starting a name table dump. The second is a continuation from type
      zero (the node state service type). In the second case, the lookup
      function fails to find the node state service type in the next iteration.

      To solve this, add a further condition that marks the type as dirty and
      looks up the correct service type for the next iteration, instead of
      selecting the first service as for an initial 'type' of zero.
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vhost: reject zero size iova range · 813dbeb6
      Jason Wang authored
      We used to accept a zero size iova range, which leads to an infinite loop
      in translate_desc(). Fix this by failing the request in this case.
      
      Reported-by: syzbot+d21e6e297322a900c128@syzkaller.appspotmail.com
      Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
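Why a zero size range can stall a translation loop is easiest to see in a toy model. The Python sketch below is not translate_desc(): it assumes a loop that repeatedly consumes as many bytes as the matched range still covers, and uses a step cap so that the stall is observable instead of hanging. The names `walk` and `add_range`, and the use of `ValueError` for the rejection, are all hypothetical.

```python
def walk(start, size, addr, length, max_steps=16):
    """Consume `length` bytes from a single range beginning at `start`.

    Returns the number of loop iterations, or None if the loop stopped
    making progress within `max_steps` (the stand-in for an endless loop).
    """
    done = 0
    steps = 0
    while done < length:
        if steps == max_steps:
            return None
        chunk = min(start + size - addr, length - done)  # bytes left in the range
        done += chunk
        addr += chunk
        steps += 1
    return steps


def add_range(ranges, start, size):
    """Model of the fix: refuse a zero size range at insertion time."""
    if size == 0:
        raise ValueError("zero size iova range rejected")
    ranges.append((start, size))


normal = walk(0x1000, 0x100, 0x1000, 0x40)   # finishes in one iteration
stalled = walk(0x1000, 0, 0x1000, 0x40)      # chunk is always 0, no progress
```

Rejecting the range when it is added, as `add_range` does here, means the loop never has to cope with a range that covers no bytes.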
    • net/tls: prevent bad memory access in tls_is_sk_tx_device_offloaded() · b4f47f38
      Jakub Kicinski authored
      Unlike the '&&' operator, '&' does not have short-circuit
      evaluation semantics.  IOW both sides of the operator always
      get evaluated.  Fix the wrong operator in
      tls_is_sk_tx_device_offloaded(), which would lead to
      out-of-bounds access for non-full sockets.
      
      Fixes: 4799ac81 ("tls: Add rx inline crypto offload")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
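The difference between the two operators can be demonstrated in Python, where `and` short-circuits like C's `&&` and `&` on booleans evaluates both operands like C's `&`. The sketch below is an analogy, not the TLS code: the dictionary stands in for a socket, a missing "offload" key stands in for a field that only exists on full sockets, and `evaluate` is a hypothetical helper.

```python
def evaluate(sock, short_circuit):
    """Evaluate 'is full AND uses offload' and record which checks ran."""
    evaluated = []

    def is_full():
        evaluated.append("is_full")
        return sock["full"]

    def uses_offload():
        evaluated.append("uses_offload")
        return sock["offload"]  # KeyError models the invalid access

    if short_circuit:
        result = is_full() and uses_offload()   # right side skipped when left is False
    else:
        result = is_full() & uses_offload()     # both sides always evaluated
    return result, evaluated


non_full = {"full": False}                      # has no "offload" field
safe = evaluate(non_full, short_circuit=True)

try:
    evaluate(non_full, short_circuit=False)
    bad_access = False
except KeyError:
    bad_access = True                           # second operand touched missing data
```

With the short-circuiting form the second check never runs for a non-full socket, whereas the non-short-circuiting form reaches into data that is not there.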