1. 22 Oct, 2017 15 commits
    • ipv6: flowlabel: do not leave opt->tot_len with garbage · 864e2a1f
      Eric Dumazet authored
      When syzkaller team brought us a C repro for the crash [1] that
      had been reported many times in the past, I finally could find
      the root cause.
      
      If FlowLabel info is merged by fl6_merge_options(), we leave
      part of the opt_space storage provided by udp/raw/l2tp with random value
      in opt_space.tot_len, unless a control message was provided at sendmsg()
      time.
      
      Then ip6_setup_cork() would use this random value to perform a kzalloc()
      call. Undefined behavior and crashes.
      
      Fix is to properly set tot_len in fl6_merge_options()
      
      At the same time, we can also avoid consuming memory and cpu cycles
      to clear it, if every option is copied via a kmemdup(). This is the
      change in ip6_setup_cork().
      
      [1]
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 6613 Comm: syz-executor0 Not tainted 4.14.0-rc4+ #127
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801cb64a100 task.stack: ffff8801cc350000
      RIP: 0010:ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168
      RSP: 0018:ffff8801cc357550 EFLAGS: 00010203
      RAX: dffffc0000000000 RBX: ffff8801cc357748 RCX: 0000000000000010
      RDX: 0000000000000002 RSI: ffffffff842bd1d9 RDI: 0000000000000014
      RBP: ffff8801cc357620 R08: ffff8801cb17f380 R09: ffff8801cc357b10
      R10: ffff8801cb64a100 R11: 0000000000000000 R12: ffff8801cc357ab0
      R13: ffff8801cc357b10 R14: 0000000000000000 R15: ffff8801c3bbf0c0
      FS:  00007f9c5c459700(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020324000 CR3: 00000001d1cf2000 CR4: 00000000001406f0
      DR0: 0000000020001010 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Call Trace:
       ip6_make_skb+0x282/0x530 net/ipv6/ip6_output.c:1729
       udpv6_sendmsg+0x2769/0x3380 net/ipv6/udp.c:1340
       inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       SYSC_sendto+0x358/0x5a0 net/socket.c:1750
       SyS_sendto+0x40/0x50 net/socket.c:1718
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x4520a9
      RSP: 002b:00007f9c5c458c08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004520a9
      RDX: 0000000000000001 RSI: 0000000020fd1000 RDI: 0000000000000016
      RBP: 0000000000000086 R08: 0000000020e0afe4 R09: 000000000000001c
      R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004bb1ee
      R13: 00000000ffffffff R14: 0000000000000016 R15: 0000000000000029
      Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 0f 00 00 48 8d 79 04 48 b8 00 00 00 00 00 fc ff df 45 8b 74 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
      RIP: ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168 RSP: ffff8801cc357550
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • of_mdio: Fix broken PHY IRQ in case of probe deferral · 66bdede4
      Geert Uytterhoeven authored
      If an Ethernet PHY is initialized before the interrupt controller it is
      connected to, a message like the following is printed:
      
          irq: no irq domain found for /interrupt-controller@e61c0000 !
      
      However, the actual error is ignored, leading to a non-functional (POLL)
      PHY interrupt later:
      
          Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=POLL)
      
      Depending on whether the PHY driver will fall back to polling, Ethernet
      may or may not work.
      
      To fix this:
        1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to
           of_irq_get().
           Unlike the former, the latter returns -EPROBE_DEFER if the
           interrupt controller is not yet available, so this condition can be
           detected.
           Other errors are handled the same as before, i.e. use the passed
           mdio->irq[addr] as interrupt.
        2. Propagate and handle errors from of_mdiobus_register_phy() and
           of_mdiobus_register_device().
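
The two steps above can be sketched as a small Python model. This is only an illustration, not the kernel code: `lookup_irq`, `register_phy` and `register_bus` are hypothetical stand-ins for of_irq_get(), of_mdiobus_register_phy() and the bus registration loop, and the value 517 for EPROBE_DEFER is the kernel-internal errno, which Python's errno module does not expose.

```python
EPROBE_DEFER = 517  # kernel-internal errno value (assumption: not in Python's errno)


def lookup_irq(controller_ready, irq):
    """Hypothetical stand-in for of_irq_get(): negative errno or an IRQ number."""
    if not controller_ready:
        return -EPROBE_DEFER
    return irq


def register_phy(bus_irqs, addr, controller_ready, dt_irq):
    """Model of step 1: defer on -EPROBE_DEFER, otherwise pick an IRQ."""
    rc = lookup_irq(controller_ready, dt_irq)
    if rc == -EPROBE_DEFER:
        return rc              # propagate so the probe can be retried later
    if rc > 0:
        bus_irqs[addr] = rc    # a valid mapping was found
    # any other result: keep the IRQ already stored in bus_irqs[addr]
    return 0


def register_bus(bus_irqs, phys, controller_ready):
    """Model of step 2: stop and report the first error from a PHY."""
    for addr, dt_irq in phys.items():
        rc = register_phy(bus_irqs, addr, controller_ready, dt_irq)
        if rc < 0:
            return rc
    return 0
```

With the interrupt controller not ready, `register_bus` returns -EPROBE_DEFER and leaves the stored IRQ untouched; a later retry with the controller ready stores the mapped IRQ.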
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • textsearch: fix typos in library helpers · 7433a8d6
      Randy Dunlap authored
      Fix spellos (typos) in textsearch library helpers.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Don't release call mutex on error pointer · 6cb3ece9
      David Howells authored
      Don't release call mutex at the end of rxrpc_kernel_begin_call() if the
      call pointer actually holds an error value.
      
      Fixes: 540b1c48 ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'stmmac-hw-tstamp-fixes' · 748759d5
      David S. Miller authored
      Jose Abreu says:
      
      ====================
      net: stmmac: Fix HW timestamping
      
      Three fixes for HW timestamping feature, all of them for RX side.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Prevent infinite loop in get_rx_timestamp_status() · 9454360d
      Jose Abreu authored
      Prevent infinite loop by correctly setting the loop condition to
      break when i == 10.
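
A generic sketch of a bounded polling loop, in Python, may help show why the exit condition matters. It is not the driver's code: `poll_until_ready` and `read_status` are hypothetical names, and only the limit of 10 attempts comes from the message above.

```python
def poll_until_ready(read_status, max_tries=10):
    """Poll read_status() until it returns True, giving up after max_tries calls."""
    i = 0
    while i < max_tries:
        if read_status():
            return True
        i += 1
    return False  # i == max_tries: stop instead of spinning forever
```

If the loop condition never becomes false when the status stays "not ready", the loop cannot terminate; bounding it on the attempt counter guarantees an exit.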
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Fix stmmac_get_rx_hwtstamp() · 98870943
      Jose Abreu authored
      When using GMAC4 the valid timestamp is from CTX next desc but
      we are passing the previous desc to get_rx_timestamp_status()
      callback.
      
      Fix this and, while at it, rework the function logic a little.
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Add missing call to dev_kfree_skb() · 9c8080d0
      Jose Abreu authored
      When RX HW timestamp is enabled and a frame is discarded, we are
      not freeing the skb but instead only setting the entry to NULL.
      
      Add a call to dev_kfree_skb_any() so that skb entry is correctly
      freed.
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-fixes' · 0247880a
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: spectrum: Configure TTL of "inherit" for offloaded tunnels
      
      Petr says:
      
      Currently mlxsw only offloads tunnels that are configured with TTL of "inherit"
      (which is the default). However, Spectrum defaults to 255 and the driver
      neglects to change the configuration. Thus the tunnel packets from offloaded
      tunnels always have TTL of 255, even though tunnels with explicit TTL of 255 are
      never actually offloaded.
      
      To fix this, introduce support for TIGCR, the register that keeps the related
      bits of global tunnel configuration, and use it on first offload to properly
      configure inheritance of TTL of tunnel packets from overlay packets.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Configure TIGCR on init · dcbda282
      Petr Machata authored
      Spectrum tunnels do not default to ttl of "inherit" like the Linux ones
      do. Configure TIGCR on router init so that the TTL of tunnel packets is
      copied from the overlay packets.
      
      Fixes: ee954d1a ("mlxsw: spectrum_router: Support GRE tunnels")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add Tunneling IPinIP General Configuration Register · 14aefd90
      Petr Machata authored
      The TIGCR register is used for setting up the IPinIP Tunnel
      configuration.
      
      Fixes: ee954d1a ("mlxsw: spectrum_router: Support GRE tunnels")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethtool: remove error check for legacy setting transceiver type · 95491e3c
      Niklas Söderlund authored
      Commit 19cab88726929605 ("net: ethtool: Add back transceiver type")
      restores the transceiver type to struct ethtool_link_settings and
      convert_link_ksettings_to_legacy_settings() but forgets to remove the
      error check for the same in convert_legacy_settings_to_link_ksettings().
      This prevents older versions of ethtool from changing link settings.
      
          # ethtool --version
          ethtool version 3.16
      
          # ethtool -s eth0 autoneg on speed 100 duplex full
          Cannot set new settings: Invalid argument
            not setting speed
            not setting duplex
            not setting autoneg
      
      Newer versions of ethtool work:
      
          # ethtool --version
          ethtool version 4.10
      
          # ethtool -s eth0 autoneg on speed 100 duplex full
          [   57.703268] sh-eth ee700000.ethernet eth0: Link is Down
          [   59.618227] sh-eth ee700000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
      
      Fixes: 19cab887 ("net: ethtool: Add back transceiver type")
      Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
      Reported-by: Renjith R V <renjith.rv@quest-global.com>
      Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soreuseport: fix initialization race · 1b5f962e
      Craig Gallek authored
      Syzkaller stumbled upon a way to trigger
      WARNING: CPU: 1 PID: 13881 at net/core/sock_reuseport.c:41
      reuseport_alloc+0x306/0x3b0 net/core/sock_reuseport.c:39
      
      There are two initialization paths for the sock_reuseport structure in a
      socket: Through the udp/tcp bind paths of SO_REUSEPORT sockets or through
      SO_ATTACH_REUSEPORT_[CE]BPF before bind.  The existing implementation
      assumed that the socket lock protected both of these paths when it actually
      only protects the SO_ATTACH_REUSEPORT path.  Syzkaller triggered this
      double allocation by running these paths concurrently.
      
      This patch moves the check for double allocation into the reuseport_alloc
      function which is protected by a global spin lock.
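
The idea of moving the double-allocation check under the lock can be sketched in Python. This is a model under stated assumptions, not the kernel implementation: `Sock`, its `reuse` and `allocations` fields, and the `threading.Lock` standing in for the global spin lock are all hypothetical.

```python
import threading

_reuseport_lock = threading.Lock()  # plays the role of the global spin lock


class Sock:
    def __init__(self):
        self.reuse = None       # stands in for the socket's sock_reuseport pointer
        self.allocations = 0    # counts how many times an allocation happened


def reuseport_alloc(sk):
    """Allocate sk.reuse at most once; the check sits inside the locked region."""
    with _reuseport_lock:
        if sk.reuse is not None:   # another path already allocated it
            return 0
        sk.reuse = object()
        sk.allocations += 1
        return 0
```

Because the check and the allocation happen under the same lock, two paths racing into `reuseport_alloc` cannot both observe an empty pointer.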
      
      Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection")
      Fixes: c125e80b ("soreuseport: fast reuseport TCP socket selection")
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: fix returning of vlan range op errors · 66c54517
      Nikolay Aleksandrov authored
      When vlan tunnels were introduced, vlan range errors got silently
      dropped and instead 0 was returned always. Restore the previous
      behaviour and return errors to user-space.
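
As a rough illustration of the restored behaviour, the following Python sketch applies an operation to each VLAN in a range and returns the first error instead of always returning 0. The names `apply_vlan_range` and `op` are hypothetical and do not correspond to bridge functions.

```python
import errno


def apply_vlan_range(vid_begin, vid_end, op):
    """Apply op(vid) to every VLAN in the range; return the first error, else 0."""
    for vid in range(vid_begin, vid_end + 1):
        err = op(vid)
        if err:
            return err  # propagate instead of silently returning 0
    return 0


def demo_op(vid):
    """Hypothetical per-VLAN operation that rejects VLAN 12."""
    return -errno.EINVAL if vid == 12 else 0
```

Calling `apply_vlan_range(10, 20, demo_op)` stops at VLAN 12 and hands `-errno.EINVAL` back to the caller.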
      
      Fixes: efa5356b ("bridge: per vlan dst_metadata netlink support")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sock: correct sk_wmem_queued accounting on efault in tcp zerocopy · 54d43117
      Willem de Bruijn authored
      Syzkaller hits WARN_ON(sk->sk_wmem_queued) in sk_stream_kill_queues
      after triggering an EFAULT in __zerocopy_sg_from_iter.
      
      On this error, skb_zerocopy_stream_iter resets the skb to its state
      before the operation with __pskb_trim. It cannot kfree_skb like
      datagram callers, as the skb may have data from a previous send call.
      
      __pskb_trim calls skb_condense for unowned skbs, which adjusts their
      truesize. These tcp skbuffs are owned and their truesize must add up
      to sk_wmem_queued. But they are treated as unowned because their
      skb->sk is NULL until tcp_transmit_skb.
      
      Temporarily set skb->sk when calling __pskb_trim to signal that the
      skbuffs are owned and avoid the skb_condense path.
      
      Fixes: 52267790 ("sock: add MSG_ZEROCOPY")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 21 Oct, 2017 24 commits
    • Merge branch 'bpf-range-marking-fixes' · d2b27624
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      Two BPF fixes for range marking
      
      The set contains two fixes for direct packet access range
      markings and test cases for all direct packet access patterns
      that the verifier matches on.
      
      They are targeted for net tree, note that once net gets merged
      into net-next, there will be a minor merge conflict due to
      signature change of the function find_good_pkt_pointers() as
      well as data_meta patterns present in net-next tree. You can
      just add bool false to the data_meta patterns and I will
      follow-up with properly converting the patterns for data_meta
      in a similar way.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add test cases to bpf selftests to cover all access tests · b37242c7
      Daniel Borkmann authored
      Let's add test cases to cover really all possible direct packet
      access tests for good/bad access cases so we keep tracking them.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: fix pattern matches for direct packet access · 0fd4759c
      Daniel Borkmann authored
      Alexander had a test program with direct packet access, where
      the access test was in the form of data + X > data_end. In an
      unrelated change to the program LLVM decided to swap the branches
      and emitted code for the test in form of data + X <= data_end.
      We hadn't seen these being generated previously, thus verifier
      would reject the program. Therefore, fix up the verifier to
      detect all test cases, so we don't run into such issues in the
      future.
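
To illustrate why both forms must be accepted, here is a small Python sketch of the two branch layouts described above. It only models the comparison logic with plain integers; the function names are hypothetical and nothing here reflects the verifier's internals.

```python
def access_ok_gt_form(data, x, data_end):
    """Form 1: `if (data + X > data_end) goto error;` then fall through to the access."""
    if data + x > data_end:
        return False
    return True


def access_ok_le_form(data, x, data_end):
    """Form 2: `if (data + X <= data_end) goto access;` then fall through to the error."""
    if data + x <= data_end:
        return True
    return False
```

Both functions agree for every input, so a compiler is free to emit either layout for the same source-level bounds check.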
      
      Fixes: b4e432f1 ("bpf: enable BPF_J{LT, LE, SLT, SLE} opcodes in verifier")
      Reported-by: Alexander Alemayhu <alexander@alemayhu.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: fix off by one for range markings with L{T, E} patterns · fb2a311a
      Daniel Borkmann authored
      During review I noticed that the current logic for direct packet
      access marking in check_cond_jmp_op() has an off by one for the
      upper right range border when marking in find_good_pkt_pointers()
      with BPF_JLT and BPF_JLE. It's not really harmful given access
      up to pkt_end is always safe, but we should nevertheless correct
      the range marking before it becomes ABI. If pkt_data' denotes a
      pkt_data derived pointer (pkt_data + X), then for pkt_data' < pkt_end
      in the true branch as well as for pkt_end <= pkt_data' in the false
      branch we mark the range with X although it should really be X - 1
      in these cases. For example, X could be pkt_end - pkt_data, then
      when testing for pkt_data' < pkt_end the verifier simulation cannot
      deduce that a byte load of pkt_data' - 1 would succeed in this
      branch.
      
      Fixes: b4e432f1 ("bpf: enable BPF_J{LT, LE, SLT, SLE} opcodes in verifier")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: devmap fix arithmetic overflow in bitmap_size calculation · 8695a539
      John Fastabend authored
      An integer overflow is possible in dev_map_bitmap_size() when
      calculating the BITS_TO_LONGS logic which becomes, after macro
      replacement,
      
      	(((n) + (d) - 1)/ (d))
      
      where 'n' is a __u32 and 'd' is (8 * sizeof(long)). To avoid
      overflow cast to u64 before arithmetic.
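
The wraparound can be reproduced with a short Python model that truncates intermediates explicitly. It assumes the sum is evaluated in 32 bits with d = 32 (a 32-bit long); the helper names are hypothetical.

```python
U32_MASK = 0xFFFFFFFF
U64_MASK = 0xFFFFFFFFFFFFFFFF


def bits_to_longs_u32(n, bits_per_long=32):
    """(n + d - 1) / d with the sum truncated to 32 bits, as in the overflow case."""
    return ((n + bits_per_long - 1) & U32_MASK) // bits_per_long


def bits_to_longs_u64(n, bits_per_long=32):
    """Same formula after widening n to 64 bits, so a u32 n cannot wrap the sum."""
    return ((n + bits_per_long - 1) & U64_MASK) // bits_per_long
```

For n = 0xFFFFFFFF the 32-bit version wraps to 30 before the division and yields 0 longs, while the widened version yields 0x8000000.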
      Reported-by: Richard Weinberger <richard@nod.at>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'aquantia-fixes' · 43ebf97f
      David S. Miller authored
      Igor Russkikh says:
      
      ====================
      net: aquantia: Atlantic driver 10/2017 updates
      
      This patchset fixes various issues in driver,
      improves parameters for better performance on 10Gbit link
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: aquantia: Bad udp rate on default interrupt coalescing · 417a3ae4
      Igor Russkikh authored
      Default Tx rates cause very long ISR delays on Tx.
      0xff is 510us delay, giving only ~ 2000 interrupts per second for
      Tx rings cleanup. With these settings udp tx rate was never higher than
      ~800Mbps on a single stream. Changing min delay to 0xF makes it
      way better with ~6Gbps
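
The interrupt-rate arithmetic can be checked with a few lines of Python. The 2us-per-count granularity is an inference from "0xff is 510us" (510 / 255), not a documented register unit, and `max_irq_rate` is a hypothetical helper.

```python
US_PER_COUNT = 510 / 0xFF  # 2.0 us per register count (inferred, see above)


def max_irq_rate(reg_value):
    """Upper bound on interrupts per second for a given minimum-delay setting."""
    delay_us = reg_value * US_PER_COUNT
    return 1_000_000 / delay_us
```

Under that assumption, 0xFF allows roughly 1961 interrupts per second, and 0xF (30us) allows roughly 33333.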
      
      TCP stream performance is almost unaffected by this change, since LSO
      optimizations play important role.
      
      CPU load is affected insignificantly by this change.
      Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
      Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: aquantia: Enable coalescing management via ethtool interface · b82ee71a
      Igor Russkikh authored
      Aquantia NIC allows both TX and RX interrupt throttle rate (ITR)
      management, but this was used in a very limited way via predefined
      values. This patch allows setting up ITR default values via module
      command line arguments and via standard ethtool coalescing settings.
      Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
      Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: aquantia: mmio unmap was not performed on driver removal · 6849540a
      Igor Russkikh authored
      That may lead to mmio resource leakage.
      Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
      Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: aquantia: Limit number of MSIX irqs to the number of cpus · 4c8bb609
      Igor Russkikh authored
      There is not much practical use in having more MSIX vectors than the number
      of cpus, thus cap this first with preconfigured limit, then with number
      of cpus online.
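
The capping order can be written as a one-line Python helper. The function and parameter names are hypothetical; the sketch only shows that the result is bounded by the hardware vector count, the preconfigured limit, and the number of online cpus.

```python
def msix_vector_count(hw_vectors, preconfigured_limit, online_cpus):
    """Cap the vector count with the preconfigured limit, then with online cpus."""
    capped = min(hw_vectors, preconfigured_limit)
    return min(capped, online_cpus)
```

For example, 32 hardware vectors with a limit of 8 on a 4-cpu machine gives 4 vectors; on a 16-cpu machine the preconfigured limit of 8 wins.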
      Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
      Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: aquantia: Fixed transient link up/down/up notification · 93d87b8f
      Igor Russkikh authored
      When doing ifconfig down/up, the driver reported carrier_off neither
      in nic_stop nor in nic_start. That caused the link to be visible as "up"
      for a couple of seconds immediately after "ifconfig up".
      Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
      Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: aquantia: Add queue restarts stats counter · 5d8d84e9
      Igor Russkikh authored
      Queue stat strings are cleaned up, duplicate stat name strings removed,
      queue restarts counter added
      Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
      Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: aquantia: Reset nic statistics on interface up/down · 65e665e6
      Igor Russkikh authored
      Internal statistics system on chip never gets reset until hardware
      reboot. This is quite inconvenient in terms of ethtool statistics usage.
      
      This patch implements incremental statistics update inside of
      service callback.
      
      Upon nic initialization, first request is done to fetch
      initial stat data, current collected stat data gets cleared.
      Internal statistics mailbox readout is improved to save space and
      increase readability
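
The incremental update scheme described above can be modelled in Python roughly as follows. This is a sketch under assumptions: `NicStats`, `read_hw` and the counter names are hypothetical, and counter wraparound is ignored.

```python
class NicStats:
    def __init__(self, read_hw):
        self._read_hw = read_hw  # hypothetical callable returning the chip's raw counters
        self._last = {}
        self.totals = {}

    def init(self):
        """On nic initialization: take a baseline snapshot and clear collected totals."""
        self._last = dict(self._read_hw())
        self.totals = {name: 0 for name in self._last}

    def service(self):
        """Periodic service callback: add the change since the previous snapshot."""
        now = dict(self._read_hw())
        for name, value in now.items():
            delta = value - self._last.get(name, 0)
            self.totals[name] = self.totals.get(name, 0) + delta
        self._last = now
```

Since the chip's counters never reset, reporting only the accumulated deltas lets the visible statistics restart from zero on each interface up.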
      Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
      Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: make some messages more descriptive · 197df02c
      Matteo Croce authored
      In the UDP code there are two leftover error messages with very little meaning.
      Replace them with a more descriptive error message as some users
      reported them as "strange network error".
      Signed-off-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • geneve: Fix function matching VNI and tunnel ID on big-endian · 772e97b5
      Stefano Brivio authored
      On big-endian machines, functions converting between tunnel ID
      and VNI use the three LSBs of tunnel ID storage to map VNI.
      
      The comparison function eq_tun_id_and_vni(), on the other hand,
      attempted to map the VNI from the three MSBs. Fix it by using
      the same check implemented on LE, which maps VNI from the three
      LSBs of tunnel ID.
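
A Python sketch of the mapping may make the mismatch concrete. It treats the tunnel ID as a plain 64-bit number and abstracts away in-memory byte order; `vni_to_tunnel_id` and `tunnel_id_to_vni` are simplified models of the conversion helpers, and `eq_using_msbs` is a hypothetical reconstruction of the mismatched comparison, not the original code.

```python
def vni_to_tunnel_id(vni):
    """Place a 3-byte VNI in the three least significant bytes of a 64-bit tunnel ID."""
    return int.from_bytes(vni, "big")


def tunnel_id_to_vni(tun_id):
    """Recover the 3-byte VNI from the three LSBs of the tunnel ID."""
    return (tun_id & 0xFFFFFF).to_bytes(3, "big")


def eq_tun_id_and_vni(tun_id, vni):
    """Compare using the same three LSBs that the conversion helpers use."""
    return tunnel_id_to_vni(tun_id) == vni


def eq_using_msbs(tun_id, vni):
    """The mismatch: reading the VNI from the three MSBs instead."""
    return tun_id.to_bytes(8, "big")[:3] == vni
```

With a VNI of 0x123456, the LSB-based comparison matches the converted tunnel ID, whereas the MSB-based one compares against three zero bytes and fails.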
      
      Fixes: 2e0b26e1 ("geneve: Optimize geneve device lookup.")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-by: Jakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'linux-can-fixes-for-4.14-20171019' of... · c69d75ae
      David S. Miller authored
      Merge tag 'linux-can-fixes-for-4.14-20171019' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2017-10-19
      
      this is a pull request of 11 patches for the upcoming 4.14 release.
      
      There are 6 patches by ZHU Yi for the flexcan driver, that work around
      the CAN error handling state transition problems found in various
      incarnations of the flexcan IP core.
      
      The patch by Colin Ian King fixes a potential NULL pointer deref in the
      CAN broadcast manager (bcm). One patch by me replaces a direct deref of a RCU
      protected pointer by rcu_access_pointer. My second patch adds missing
      OOM error handling in af_can. A patch by Stefan Mätje for the esd_usb2
      driver fixes the dlc in received RTR frames. And the last patch is by
      Wolfgang Grandegger, it fixes a busy loop in the gs_usb driver in case
      it runs out of TX contexts.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hv_sock: add locking in the open/close/release code paths · b4562ca7
      Dexuan Cui authored
      Without the patch, when hvs_open_connection() hasn't completely established
      a connection (e.g. it has changed sk->sk_state to SS_CONNECTED, but hasn't
      inserted the sock into the connected queue), vsock_stream_connect() may see
      the sk_state change and return the connection to the userspace, and next
      when the userspace closes the connection quickly, hvs_release() may not see
      the connection in the connected queue; finally hvs_open_connection()
      inserts the connection into the queue, but we won't be able to purge the
      connection forever.
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Rolf Neugebauer <rolf.neugebauer@docker.com>
      Cc: Marcelo Cerri <marcelo.cerri@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Fix length of GVI response packet · 0a90e251
      Gavin Shan authored
      The length of GVI (GetVersionInfo) response packet should be 40 instead
      of 36. This issue was found from /sys/kernel/debug/ncsi/eth0/stats.
      
       # ethtool --ncsi eth0 swstats
           :
       RESPONSE     OK       TIMEOUT  ERROR
       =======================================
       GVI          0        0        2
      
      With this applied, no error reported on GVI response packets:
      
       # ethtool --ncsi eth0 swstats
           :
       RESPONSE     OK       TIMEOUT  ERROR
       =======================================
       GVI          2        0        0
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Enforce failover on link monitor timeout · 52b4c862
      Gavin Shan authored
      The NCSI channel has been configured to provide service if its link
      monitor timer is enabled, regardless of its state (inactive or active).
      So the timeout event on the link monitor indicates the out-of-service
      on that channel, for which a failover is needed.
      
      This sets NCSI_DEV_RESHUFFLE flag to enforce failover on link monitor
      timeout, regardless of the channel's original state (inactive or active).
      Also, the link is put into "down" state to give the failing channel
      lowest priority when selecting for the active channel. The state of
      failing channel should be set to active in order for deinitialization
      and failover to be done.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Disable HWA mode when no channels are found · 100ef01f
      Gavin Shan authored
      When there are no NCSI channels probed, HWA (Hardware Arbitration)
      mode is enabled. It's not correct because HWA depends on the fact:
      NCSI channels exist and all of them support HWA mode. This disables
      HWA when no channels are probed.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Stop monitor if channel times out or is inactive · 0795fb20
      Samuel Mendoza-Jonas authored
      ncsi_channel_monitor() misses stopping the channel monitor in several
      places that it should, causing a WARN_ON_ONCE() to trigger when the
      monitor is re-started later, eg:
      
      [  459.040000] WARNING: CPU: 0 PID: 1093 at net/ncsi/ncsi-manage.c:269 ncsi_start_channel_monitor+0x7c/0x90
      [  459.040000] CPU: 0 PID: 1093 Comm: kworker/0:3 Not tainted 4.10.17-gaca2fdd #140
      [  459.040000] Hardware name: ASpeed SoC
      [  459.040000] Workqueue: events ncsi_dev_work
      [  459.040000] [<80010094>] (unwind_backtrace) from [<8000d950>] (show_stack+0x20/0x24)
      [  459.040000] [<8000d950>] (show_stack) from [<801dbf70>] (dump_stack+0x20/0x28)
      [  459.040000] [<801dbf70>] (dump_stack) from [<80018d7c>] (__warn+0xe0/0x108)
      [  459.040000] [<80018d7c>] (__warn) from [<80018e70>] (warn_slowpath_null+0x30/0x38)
      [  459.040000] [<80018e70>] (warn_slowpath_null) from [<803f6a08>] (ncsi_start_channel_monitor+0x7c/0x90)
      [  459.040000] [<803f6a08>] (ncsi_start_channel_monitor) from [<803f7664>] (ncsi_configure_channel+0xdc/0x5fc)
      [  459.040000] [<803f7664>] (ncsi_configure_channel) from [<803f8160>] (ncsi_dev_work+0xac/0x474)
      [  459.040000] [<803f8160>] (ncsi_dev_work) from [<8002d244>] (process_one_work+0x1e0/0x450)
      [  459.040000] [<8002d244>] (process_one_work) from [<8002d510>] (worker_thread+0x5c/0x570)
      [  459.040000] [<8002d510>] (worker_thread) from [<80033614>] (kthread+0x124/0x164)
      [  459.040000] [<80033614>] (kthread) from [<8000a5e8>] (ret_from_fork+0x14/0x2c)
      
      This also updates the monitor instead of just returning if
      ncsi_xmit_cmd() fails to send the get-link-status command so that the
      monitor properly times out.
      
      Fixes: e6f44ed6 "net/ncsi: Package and channel management"
      Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
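The failure mode in this commit (a monitor that is restarted without having been stopped) can be sketched in userspace as follows. All names here are hypothetical and the structure is heavily simplified; the `warnings` counter merely stands in for what a WARN_ON_ONCE() in the start path would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a per-channel monitor. */
struct demo_monitor {
	bool enabled;  /* true while the monitor is considered running */
	int warnings;  /* counts restarts of an already-running monitor */
};

static void demo_monitor_start(struct demo_monitor *m)
{
	assert(m != NULL);
	if (m->enabled)
		m->warnings++;  /* started again without being stopped */
	m->enabled = true;
}

static void demo_monitor_stop(struct demo_monitor *m)
{
	assert(m != NULL);
	m->enabled = false;
}

/* Timeout path that returns without stopping the monitor. */
static void demo_on_timeout_buggy(struct demo_monitor *m)
{
	(void)m;  /* m->enabled is left set */
}

/* Timeout path that stops the monitor before returning. */
static void demo_on_timeout_fixed(struct demo_monitor *m)
{
	demo_monitor_stop(m);
}
```

In this sketch, a start, a timeout through the first handler, and a second start record one warning; the same sequence through the second handler records none.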
    • net/ncsi: Fix AEN HNCDSC packet length · 6850d0f8
      Samuel Mendoza-Jonas authored
      Correct the length of the HNCDSC AEN packet.
      Fixes: 7a82ecf4 "net/ncsi: NCSI AEN packet handler"
      Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: avoid panic in packet_getsockopt() · 509c7a1e
      Eric Dumazet authored
      syzkaller got crashes in packet_getsockopt() while it was processing
      the PACKET_ROLLOVER_STATS command and another thread was managing
      to change po->rollover.
      
      Using RCU will fix this bug. We might later add proper RCU annotations
      for sparse's sake.
      
      In v2: I replaced kfree(rollover) in fanout_add() with the kfree_rcu()
      variant, as spotted by John.
      
      Fixes: a9b63918 ("packet: rollover statistics")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: John Sperbeck <jsperbeck@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp/dccp: fix ireq->opt races · c92e8c02
      Eric Dumazet authored
      syzkaller found another bug in the DCCP/TCP stacks [1].
      
      For the reasons explained in commit ce105008 ("tcp/dccp: fix
      ireq->pktopts race"), we need to make sure we do not access
      ireq->opt unless we own the request sock.
      
      Note the opt field is renamed to ireq_opt to ease grep games.
      
      [1]
      BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
      Read of size 1 at addr ffff8801c951039c by task syz-executor5/3295
      
      CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       print_address_description+0x73/0x250 mm/kasan/report.c:252
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x25b/0x340 mm/kasan/report.c:409
       __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427
       ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
       tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135
       tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587
       tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557
       __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072
       tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline]
       tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071
       tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816
       tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682
       ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
       dst_input include/net/dst.h:464 [inline]
       ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
       __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
       netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
       netif_receive_skb+0xae/0x390 net/core/dev.c:4611
       tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
       tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
       tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
       call_write_iter include/linux/fs.h:1770 [inline]
       new_sync_write fs/read_write.c:468 [inline]
       __vfs_write+0x68a/0x970 fs/read_write.c:481
       vfs_write+0x18f/0x510 fs/read_write.c:543
       SYSC_write fs/read_write.c:588 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:580
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x40c341
      RSP: 002b:00007f469523ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 000000000040c341
      RDX: 0000000000000037 RSI: 0000000020004000 RDI: 0000000000000015
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 00000000000f4240 R11: 0000000000000293 R12: 00000000004b7fd1
      R13: 00000000ffffffff R14: 0000000020000000 R15: 0000000000025000
      
      Allocated by task 3295:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
       __do_kmalloc mm/slab.c:3725 [inline]
       __kmalloc+0x162/0x760 mm/slab.c:3734
       kmalloc include/linux/slab.h:498 [inline]
       tcp_v4_save_options include/net/tcp.h:1962 [inline]
       tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271
       tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283
       tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313
       tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857
       tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482
       tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711
       ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
       dst_input include/net/dst.h:464 [inline]
       ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
       __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
       netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
       netif_receive_skb+0xae/0x390 net/core/dev.c:4611
       tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
       tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
       tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
       call_write_iter include/linux/fs.h:1770 [inline]
       new_sync_write fs/read_write.c:468 [inline]
       __vfs_write+0x68a/0x970 fs/read_write.c:481
       vfs_write+0x18f/0x510 fs/read_write.c:543
       SYSC_write fs/read_write.c:588 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:580
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Freed by task 3306:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
       __cache_free mm/slab.c:3503 [inline]
       kfree+0xca/0x250 mm/slab.c:3820
       inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157
       __sk_destruct+0xfd/0x910 net/core/sock.c:1560
       sk_destruct+0x47/0x80 net/core/sock.c:1595
       __sk_free+0x57/0x230 net/core/sock.c:1603
       sk_free+0x2a/0x40 net/core/sock.c:1614
       sock_put include/net/sock.h:1652 [inline]
       inet_csk_complete_hashdance+0xd5/0xf0 net/ipv4/inet_connection_sock.c:959
       tcp_check_req+0xf4d/0x1620 net/ipv4/tcp_minisocks.c:765
       tcp_v4_rcv+0x17f6/0x2f80 net/ipv4/tcp_ipv4.c:1675
       ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
       dst_input include/net/dst.h:464 [inline]
       ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
       __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
       netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
       netif_receive_skb+0xae/0x390 net/core/dev.c:4611
       tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
       tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
       tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
       call_write_iter include/linux/fs.h:1770 [inline]
       new_sync_write fs/read_write.c:468 [inline]
       __vfs_write+0x68a/0x970 fs/read_write.c:481
       vfs_write+0x18f/0x510 fs/read_write.c:543
       SYSC_write fs/read_write.c:588 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:580
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 20 Oct, 2017 1 commit
    • Merge branch 'sockmap-fixes' · e95c6cf4
      David S. Miller authored
      John Fastabend says:
      
      ====================
      sockmap fixes for net
      
      The following implements a set of fixes for sockmap and changes the
      API slightly in a few places to reduce preempt_disable/enable scope.
      We do this here in net because it requires an API change and this
      avoids getting stuck with legacy API going forward.
      
      The short description:
      
      Access to skb mark is removed. It is problematic when we add
      features in the future because mark is a union and is used by the
      TCP/socket code internally. We don't want to expose this to the
      BPF programs or let programs change the values.
      
      The other change is caching metadata in the skb itself between
      when the BPF program returns a redirect code and the core code
      implements the redirect. This avoids having per cpu metadata.
      
      Finally, tighten the restrictions on sockmap: using it now requires
      CAP_NET_ADMIN, and only SOCK_STREAM sockets are accepted.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>