1. 30 Jul, 2018 6 commits
    • Yonghong Song's avatar
      tools/bpftool: fix a percpu_array map dump problem · 573b3aa6
      Yonghong Song authored
      I hit the following problem when I tried to use bpftool
      to dump a percpu array.
      
        $ sudo ./bpftool map show
        61: percpu_array  name stub  flags 0x0
                key 4B  value 4B  max_entries 1  memlock 4096B
        ...
        $ sudo ./bpftool map dump id 61
        bpftool: malloc.c:2406: sysmalloc: Assertion
        `(old_top == initial_top (av) && old_size == 0) || \
         ((unsigned long) (old_size) >= MINSIZE && \
         prev_inuse (old_top) && \
         ((unsigned long) old_end & (pagesize - 1)) == 0)'
        failed.
        Aborted
      
      Further debugging revealed that this is due to
      miscommunication between bpftool and kernel.
      For example, for the above percpu_array with value size of 4B.
      The map info returned to user space has value size of 4B.
      
      In bpftool, the values array for lookup is allocated like:
         info->value_size * get_possible_cpus() = 4 * get_possible_cpus()
      In kernel (kernel/bpf/syscall.c), the values array size is
      rounded up to multiple of 8.
         round_up(map->value_size, 8) * num_possible_cpus()
         = 8 * num_possible_cpus()
      So when kernel copies the values to user buffer, the kernel will
      overwrite beyond user buffer boundary.
      
      This patch fixed the issue by allocating and stepping through
      percpu map value array properly in bpftool.
      
      Fixes: 71bb428f ("tools: bpf: add bpftool")
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      573b3aa6
    • Dmitry Safonov's avatar
      netlink: Don't shift with UB on nlk->ngroups · 61f4b237
      Dmitry Safonov authored
      On i386 nlk->ngroups might be 32 or 0. Which leads to UB, resulting in
      hang during boot.
      Check for 0 ngroups and use (unsigned long long) as a type to shift.
      
      Fixes: 7acf9d42 ("netlink: Do not subscribe to non-existent groups").
      Reported-by: default avatarkernel test robot <rong.a.chen@intel.com>
      Signed-off-by: default avatarDmitry Safonov <dima@arista.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      61f4b237
    • David S. Miller's avatar
      Merge tag 'linux-can-fixes-for-4.18-20180730' of... · af87f72e
      David S. Miller authored
      Merge tag 'linux-can-fixes-for-4.18-20180730' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2018-07-30
      
      this is a pull request of one patch for net/master.
      
      The patch by Anton Vasilyev and the Linux Driver Verification project
      fixes a memory leak in the ems_usb driver's disconnect function.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      af87f72e
    • Sabrina Dubroca's avatar
      net/ipv6: fix metrics leak · df18b504
      Sabrina Dubroca authored
      Since commit d4ead6b3 ("net/ipv6: move metrics from dst to
      rt6_info"), ipv6 metrics are shared and refcounted. rt6_set_from()
      assigns the rt->from pointer and increases the refcount on from's
      metrics. This reference is never released.
      
      Introduce the fib6_metrics_release() helper and use it to release the
      metrics.
      
      Fixes: d4ead6b3 ("net/ipv6: move metrics from dst to rt6_info")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df18b504
    • Xiao Liang's avatar
      xen-netfront: wait xenbus state change when load module manually · 822fb18a
      Xiao Liang authored
      When loading module manually, after call xenbus_switch_state to initializes
      the state of the netfront device, the driver state did not change so fast
      that may lead no dev created in latest kernel. This patch adds wait to make
      sure xenbus knows the driver is not in closed/unknown state.
      
      Current state:
      [vm]# ethtool eth0
      Settings for eth0:
      	Link detected: yes
      [vm]# modprobe -r xen_netfront
      [vm]# modprobe  xen_netfront
      [vm]# ethtool eth0
      Settings for eth0:
      Cannot get device settings: No such device
      Cannot get wake-on-lan settings: No such device
      Cannot get message level: No such device
      Cannot get link status: No such device
      No data available
      
      With the patch installed.
      [vm]# ethtool eth0
      Settings for eth0:
      	Link detected: yes
      [vm]# modprobe -r xen_netfront
      [vm]# modprobe xen_netfront
      [vm]# ethtool eth0
      Settings for eth0:
      	Link detected: yes
      Signed-off-by: default avatarXiao Liang <xiliang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      822fb18a
    • Anton Vasilyev's avatar
      can: ems_usb: Fix memory leak on ems_usb_disconnect() · 72c05f32
      Anton Vasilyev authored
      ems_usb_probe() allocates memory for dev->tx_msg_buffer, but there
      is no its deallocation in ems_usb_disconnect().
      
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: default avatarAnton Vasilyev <vasilyev@ispras.ru>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      72c05f32
  2. 29 Jul, 2018 11 commits
  3. 28 Jul, 2018 5 commits
    • Stefan Wahren's avatar
      net: lan78xx: fix rx handling before first packet is send · 136f55f6
      Stefan Wahren authored
      As long the bh tasklet isn't scheduled once, no packet from the rx path
      will be handled. Since the tx path also schedule the same tasklet
      this situation only persits until the first packet transmission.
      So fix this issue by scheduling the tasklet after link reset.
      
      Link: https://github.com/raspberrypi/linux/issues/2617
      Fixes: 55d7de9d ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet")
      Suggested-by: default avatarFloris Bos <bos@je-eigen-domein.nl>
      Signed-off-by: default avatarStefan Wahren <stefan.wahren@i2se.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      136f55f6
    • John Hurley's avatar
      nfp: flower: fix port metadata conversion bug · ee614c87
      John Hurley authored
      Function nfp_flower_repr_get_type_and_port expects an enum nfp_repr_type
      return value but, if the repr type is unknown, returns a value of type
      enum nfp_flower_cmsg_port_type.  This means that if FW encodes the port
      ID in a way the driver does not understand instead of dropping the frame
      driver may attribute it to a physical port (uplink) provided the port
      number is less than physical port count.
      
      Fix this and ensure a net_device of NULL is returned if the repr can not
      be determined.
      
      Fixes: 1025351a ("nfp: add flower app")
      Signed-off-by: default avatarJohn Hurley <john.hurley@netronome.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ee614c87
    • Taehee Yoo's avatar
      bpf: use GFP_ATOMIC instead of GFP_KERNEL in bpf_parse_prog() · 71eb5255
      Taehee Yoo authored
      bpf_parse_prog() is protected by rcu_read_lock().
      so that GFP_KERNEL is not allowed in the bpf_parse_prog().
      
      [51015.579396] =============================
      [51015.579418] WARNING: suspicious RCU usage
      [51015.579444] 4.18.0-rc6+ #208 Not tainted
      [51015.579464] -----------------------------
      [51015.579488] ./include/linux/rcupdate.h:303 Illegal context switch in RCU read-side critical section!
      [51015.579510] other info that might help us debug this:
      [51015.579532] rcu_scheduler_active = 2, debug_locks = 1
      [51015.579556] 2 locks held by ip/1861:
      [51015.579577]  #0: 00000000a8c12fd1 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x2e0/0x910
      [51015.579711]  #1: 00000000bf815f8e (rcu_read_lock){....}, at: lwtunnel_build_state+0x96/0x390
      [51015.579842] stack backtrace:
      [51015.579869] CPU: 0 PID: 1861 Comm: ip Not tainted 4.18.0-rc6+ #208
      [51015.579891] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
      [51015.579911] Call Trace:
      [51015.579950]  dump_stack+0x74/0xbb
      [51015.580000]  ___might_sleep+0x16b/0x3a0
      [51015.580047]  __kmalloc_track_caller+0x220/0x380
      [51015.580077]  kmemdup+0x1c/0x40
      [51015.580077]  bpf_parse_prog+0x10e/0x230
      [51015.580164]  ? kasan_kmalloc+0xa0/0xd0
      [51015.580164]  ? bpf_destroy_state+0x30/0x30
      [51015.580164]  ? bpf_build_state+0xe2/0x3e0
      [51015.580164]  bpf_build_state+0x1bb/0x3e0
      [51015.580164]  ? bpf_parse_prog+0x230/0x230
      [51015.580164]  ? lock_is_held_type+0x123/0x1a0
      [51015.580164]  lwtunnel_build_state+0x1aa/0x390
      [51015.580164]  fib_create_info+0x1579/0x33d0
      [51015.580164]  ? sched_clock_local+0xe2/0x150
      [51015.580164]  ? fib_info_update_nh_saddr+0x1f0/0x1f0
      [51015.580164]  ? sched_clock_local+0xe2/0x150
      [51015.580164]  fib_table_insert+0x201/0x1990
      [51015.580164]  ? lock_downgrade+0x610/0x610
      [51015.580164]  ? fib_table_lookup+0x1920/0x1920
      [51015.580164]  ? lwtunnel_valid_encap_type.part.6+0xcb/0x3a0
      [51015.580164]  ? rtm_to_fib_config+0x637/0xbd0
      [51015.580164]  inet_rtm_newroute+0xed/0x1b0
      [51015.580164]  ? rtm_to_fib_config+0xbd0/0xbd0
      [51015.580164]  rtnetlink_rcv_msg+0x331/0x910
      [ ... ]
      
      Fixes: 3a0af8fd ("bpf: BPF for lightweight tunnel infrastructure")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      71eb5255
    • Daniel Borkmann's avatar
      bpf: fix bpf_skb_load_bytes_relative pkt length check · 3eee1f75
      Daniel Borkmann authored
      The len > skb_headlen(skb) cannot be used as a maximum upper bound
      for the packet length since it does not have any relation to the full
      linear packet length when filtering is used from upper layers (e.g.
      in case of reuseport BPF programs) as by then skb->data, skb->len
      already got mangled through __skb_pull() and others.
      
      Fixes: 4e1ec56c ("bpf: add skb_load_bytes_relative helper")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      3eee1f75
    • Thomas Richter's avatar
      perf build: Build error in libbpf missing initialization · b611da43
      Thomas Richter authored
      In linux-next tree compiling the perf tool with additional make flags
      EXTRA_CFLAGS="-Wp,-D_FORTIFY_SOURCE=2 -O2" causes a compiler error.
      It is the warning 'variable may be used uninitialized' which is treated
      as error: I compile it using a FEDORA 28 installation, my gcc compiler
      version: gcc (GCC) 8.0.1 20180324 (Red Hat 8.0.1-0.20). The file that
      causes the error is tools/lib/bpf/libbpf.c.
      
        [root@p23lp27] # make V=1 EXTRA_CFLAGS="-Wp,-D_FORTIFY_SOURCE=2 -O2"
        [...]
        Makefile.config:849: No openjdk development package found, please
           install JDK package, e.g. openjdk-8-jdk, java-1.8.0-openjdk-devel
        Warning: Kernel ABI header at 'tools/include/uapi/linux/if_link.h'
                differs from latest version at 'include/uapi/linux/if_link.h'
          CC       libbpf.o
        libbpf.c: In function ‘bpf_perf_event_read_simple’:
        libbpf.c:2342:6: error: ‘ret’ may be used uninitialized in this
        			function [-Werror=maybe-uninitialized]
          int ret;
              ^
        cc1: all warnings being treated as errors
        mv: cannot stat './.libbpf.o.tmp': No such file or directory
        /home6/tmricht/linux-next/tools/build/Makefile.build:96: recipe for target 'libbpf.o' failed
      Suggested-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarThomas Richter <tmricht@linux.ibm.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      b611da43
  4. 27 Jul, 2018 4 commits
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · d0fdb366
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2018-07-27
      
      1) Fix PMTU handling of vti6. We update the PMTU on
         the xfrm dst_entry which is not cached anymore
         after the flowchache removal. So update the
         PMTU of the original dst_entry instead.
         From Eyal Birger.
      
      2) Fix a leak of kernel memory to userspace.
         From Eric Dumazet.
      
      3) Fix a possible dst_entry memleak in xfrm_lookup_route.
         From Tommi Rantala.
      
      4) Fix a skb leak in case we can't call nlmsg_multicast
         from xfrm_nlmsg_multicast. From Florian Westphal.
      
      5) Fix a leak of a temporary buffer in the error path of
         esp6_input. From Zhen Lei.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d0fdb366
    • Gal Pressman's avatar
      net: ena: Fix use of uninitialized DMA address bits field · 101f0cd4
      Gal Pressman authored
      UBSAN triggers the following undefined behaviour warnings:
      [...]
      [   13.236124] UBSAN: Undefined behaviour in drivers/net/ethernet/amazon/ena/ena_eth_com.c:468:22
      [   13.240043] shift exponent 64 is too large for 64-bit type 'long long unsigned int'
      [...]
      [   13.744769] UBSAN: Undefined behaviour in drivers/net/ethernet/amazon/ena/ena_eth_com.c:373:4
      [   13.748694] shift exponent 64 is too large for 64-bit type 'long long unsigned int'
      [...]
      
      When splitting the address to high and low, GENMASK_ULL is used to generate
      a bitmask with dma_addr_bits field from io_sq (in ena_com_prepare_tx and
      ena_com_add_single_rx_desc).
      The problem is that dma_addr_bits is not initialized with a proper value
      (besides being cleared in ena_com_create_io_queue).
      Assign dma_addr_bits the correct value that is stored in ena_dev when
      initializing the SQ.
      
      Fixes: 1738cd3e ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
      Signed-off-by: default avatarGal Pressman <pressmangal@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      101f0cd4
    • Martin KaFai Lau's avatar
      bpf: btf: Use exact btf value_size match in map_check_btf() · 5f300e80
      Martin KaFai Lau authored
      The current map_check_btf() in BPF_MAP_TYPE_ARRAY rejects
      '> map->value_size' to ensure map_seq_show_elem() will not
      access things beyond an array element.
      
      Yonghong suggested that using '!=' is a more correct
      check.  The 8 bytes round_up on value_size is stored
      in array->elem_size.  Hence, using '!=' on map->value_size
      is a proper check.
      
      This patch also adds new tests to check the btf array
      key type and value type.  Two of these new tests verify
      the btf's value_size (the change in this patch).
      
      It also fixes two existing tests that wrongly encoded
      a btf's type size (pprint_test) and the value_type_id (in one
      of the raw_tests[]).  However, that do not affect these two
      BTF verification tests before or after this test changes.
      These two tests mainly failed at array creation time after
      this patch.
      
      Fixes: a26ca7c9 ("bpf: btf: Add pretty print support to the basic arraymap")
      Suggested-by: default avatarYonghong Song <yhs@fb.com>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      5f300e80
    • Taehee Yoo's avatar
      xdp: add NULL pointer check in __xdp_return() · 36e0f12b
      Taehee Yoo authored
      rhashtable_lookup() can return NULL. so that NULL pointer
      check routine should be added.
      
      Fixes: 02b55e56 ("xdp: add MEM_TYPE_ZERO_COPY")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarBjörn Töpel <bjorn.topel@intel.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      36e0f12b
  5. 26 Jul, 2018 7 commits
    • Avinash Repaka's avatar
      RDS: RDMA: Fix the NULL-ptr deref in rds_ib_get_mr · 9e630bcb
      Avinash Repaka authored
      Registration of a memory region(MR) through FRMR/fastreg(unlike FMR)
      needs a connection/qp. With a proxy qp, this dependency on connection
      will be removed, but that needs more infrastructure patches, which is a
      work in progress.
      
      As an intermediate fix, the get_mr returns EOPNOTSUPP when connection
      details are not populated. The MR registration through sendmsg() will
      continue to work even with fast registration, since connection in this
      case is formed upfront.
      
      This patch fixes the following crash:
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 4244 Comm: syzkaller468044 Not tainted 4.16.0-rc6+ #361
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:rds_ib_get_mr+0x5c/0x230 net/rds/ib_rdma.c:544
      RSP: 0018:ffff8801b059f890 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff8801b07e1300 RCX: ffffffff8562d96e
      RDX: 000000000000000d RSI: 0000000000000001 RDI: 0000000000000068
      RBP: ffff8801b059f8b8 R08: ffffed0036274244 R09: ffff8801b13a1200
      R10: 0000000000000004 R11: ffffed0036274243 R12: ffff8801b13a1200
      R13: 0000000000000001 R14: ffff8801ca09fa9c R15: 0000000000000000
      FS:  00007f4d050af700(0000) GS:ffff8801db300000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f4d050aee78 CR3: 00000001b0d9b006 CR4: 00000000001606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       __rds_rdma_map+0x710/0x1050 net/rds/rdma.c:271
       rds_get_mr_for_dest+0x1d4/0x2c0 net/rds/rdma.c:357
       rds_setsockopt+0x6cc/0x980 net/rds/af_rds.c:347
       SYSC_setsockopt net/socket.c:1849 [inline]
       SyS_setsockopt+0x189/0x360 net/socket.c:1828
       do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x4456d9
      RSP: 002b:00007f4d050aedb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 00000000006dac3c RCX: 00000000004456d9
      RDX: 0000000000000007 RSI: 0000000000000114 RDI: 0000000000000004
      RBP: 00000000006dac38 R08: 00000000000000a0 R09: 0000000000000000
      R10: 0000000020000380 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007fffbfb36d6f R14: 00007f4d050af9c0 R15: 0000000000000005
      Code: fa 48 c1 ea 03 80 3c 02 00 0f 85 cc 01 00 00 4c 8b bb 80 04 00 00
      48
      b8 00 00 00 00 00 fc ff df 49 8d 7f 68 48 89 fa 48 c1 ea 03 <80> 3c 02
      00 0f
      85 9c 01 00 00 4d 8b 7f 68 48 b8 00 00 00 00 00
      RIP: rds_ib_get_mr+0x5c/0x230 net/rds/ib_rdma.c:544 RSP:
      ffff8801b059f890
      ---[ end trace 7e1cea13b85473b0 ]---
      
      Reported-by: syzbot+b51c77ef956678a65834@syzkaller.appspotmail.com
      Signed-off-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarAvinash Repaka <avinash.repaka@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9e630bcb
    • Tariq Toukan's avatar
      net: rollback orig value on failure of dev_qdisc_change_tx_queue_len · 7effaf06
      Tariq Toukan authored
      Fix dev_change_tx_queue_len so it rolls back original value
      upon a failure in dev_qdisc_change_tx_queue_len.
      This is already done for notifirers' failures, share the code.
      
      In case of failure in dev_qdisc_change_tx_queue_len, some tx queues
      would still be of the new length, while they should be reverted.
      Currently, the revert is not done, and is marked with a TODO label
      in dev_qdisc_change_tx_queue_len, and should find some nice solution
      to do it.
      Yet it is still better to not apply the newly requested value.
      
      Fixes: 48bfd55e ("net_sched: plug in qdisc ops change_tx_queue_len")
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Reviewed-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Reported-by: default avatarRan Rozenstein <ranro@mellanox.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7effaf06
    • tangpengpeng's avatar
      net: fix amd-xgbe flow-control issue · 7f3fc7dd
      tangpengpeng authored
      If we enable or disable xgbe flow-control by ethtool ,
      it does't work.Because the parameter is not properly
      assigned,so we need to adjust the assignment order
      of the parameters.
      
      Fixes: c1ce2f77 ("amd-xgbe: Fix flow control setting logic")
      Signed-off-by: default avatartangpengpeng <tangpengpeng@higon.com>
      Acked-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7f3fc7dd
    • Arjun Vynipadath's avatar
      cxgb4: Added missing break in ndo_udp_tunnel_{add/del} · 942a656f
      Arjun Vynipadath authored
      Break statements were missing for Geneve case in
      ndo_udp_tunnel_{add/del}, thereby raw mac matchall
      entries were not getting added.
      
      Fixes: c746fc0e("cxgb4: add geneve offload support for T6")
      Signed-off-by: default avatarArjun Vynipadath <arjun@chelsio.com>
      Signed-off-by: default avatarGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      942a656f
    • Jakub Kicinski's avatar
      netdevsim: don't leak devlink resources · c259b4fb
      Jakub Kicinski authored
      Devlink resources registered with devlink_resource_register() have
      to be unregistered.
      
      Fixes: 37923ed6 ("netdevsim: Add simple FIB resource controller via devlink")
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: default avatarQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c259b4fb
    • Björn Töpel's avatar
      xsk: fix poll/POLLIN premature returns · d24458e4
      Björn Töpel authored
      Polling for the ingress queues relies on reading the producer/consumer
      pointers of the Rx queue.
      
      Prior this commit, a cached consumer pointer could be used, instead of
      the actual consumer pointer and therefore report POLLIN prematurely.
      
      This patch makes sure that the non-cached consumer pointer is used
      instead.
      Reported-by: default avatarQi Zhang <qi.z.zhang@intel.com>
      Tested-by: default avatarQi Zhang <qi.z.zhang@intel.com>
      Fixes: c497176c ("xsk: add Rx receive functions and poll support")
      Signed-off-by: default avatarBjörn Töpel <bjorn.topel@intel.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      d24458e4
    • Wang YanQing's avatar
      bpf, x32: Fix regression caused by commit 24dea047 · 9e4e5b5c
      Wang YanQing authored
      Commit 24dea047 ("bpf, x32: remove ld_abs/ld_ind")
      removed the 4 /* Extra space for skb_copy_bits buffer */
      from _STACK_SIZE, but it didn't fix the concerned code
      in emit_prologue and emit_epilogue, and this error will
      bring very strange kernel runtime errors. This patch
      fixes it.
      
      Fixes: 24dea047 ("bpf, x32: remove ld_abs/ld_ind")
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Bisected-by: default avatarMeelis Roos <mroos@linux.ee>
      Signed-off-by: default avatarWang YanQing <udknight@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      9e4e5b5c
  6. 25 Jul, 2018 7 commits
    • Wei Yongjun's avatar
      net: igmp: make function __ip_mc_inc_group() static · b87bac10
      Wei Yongjun authored
      Fixes the following sparse warnings:
      
      net/ipv4/igmp.c:1391:6: warning:
       symbol '__ip_mc_inc_group' was not declared. Should it be static?
      
      Fixes: 6e2059b5 ("ipv4/igmp: init group mode as INCLUDE when join source group")
      Signed-off-by: default avatarWei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b87bac10
    • Lawrence Brakmo's avatar
      tcp: ack immediately when a cwr packet arrives · 9aee4000
      Lawrence Brakmo authored
      We observed high 99 and 99.9% latencies when doing RPCs with DCTCP. The
      problem is triggered when the last packet of a request arrives CE
      marked. The reply will carry the ECE mark causing TCP to shrink its cwnd
      to 1 (because there are no packets in flight). When the 1st packet of
      the next request arrives, the ACK was sometimes delayed even though it
      is CWR marked, adding up to 40ms to the RPC latency.
      
      This patch insures that CWR marked data packets arriving will be acked
      immediately.
      
      Packetdrill script to reproduce the problem:
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
      0.000 bind(3, ..., ...) = 0
      0.000 listen(3, 1) = 0
      
      0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
      0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
      0.110 < [ect0] . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      
      0.200 < [ect0] . 1:1001(1000) ack 1 win 257
      0.200 > [ect01] . 1:1(0) ack 1001
      
      0.200 write(4, ..., 1) = 1
      0.200 > [ect01] P. 1:2(1) ack 1001
      
      0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
      0.200 write(4, ..., 1) = 1
      0.200 > [ect01] P. 2:3(1) ack 2001
      
      0.200 < [ect0] . 2001:3001(1000) ack 3 win 257
      0.200 < [ect0] . 3001:4001(1000) ack 3 win 257
      0.200 > [ect01] . 3:3(0) ack 4001
      
      0.210 < [ce] P. 4001:4501(500) ack 3 win 257
      
      +0.001 read(4, ..., 4500) = 4500
      +0 write(4, ..., 1) = 1
      +0 > [ect01] PE. 3:4(1) ack 4501
      
      +0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257
      // Previously the ACK sequence below would be 4501, causing a long RTO
      +0.040~+0.045 > [ect01] . 4:4(0) ack 5501   // delayed ack
      
      +0.311 < [ect0] . 5501:6501(1000) ack 4 win 257  // More data
      +0 > [ect01] . 4:4(0) ack 6501     // now acks everything
      
      +0.500 < F. 9501:9501(0) ack 4 win 257
      
      Modified based on comments by Neal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarLawrence Brakmo <brakmo@fb.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9aee4000
    • dann frazier's avatar
      hinic: Link the logical network device to the pci device in sysfs · 7856e861
      dann frazier authored
      Otherwise interfaces get exposed under /sys/devices/virtual, which
      doesn't give udev the context it needs for PCI-based predictable
      interface names.
      Signed-off-by: default avatardann frazier <dann.frazier@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7856e861
    • Toshiaki Makita's avatar
      virtio_net: Fix incosistent received bytes counter · ecbc42ca
      Toshiaki Makita authored
      When received packets are dropped in virtio_net driver, received packets
      counter is incremented but bytes counter is not.
      As a result, for instance if we drop all packets by XDP, only received
      is counted and bytes stays 0, which looks inconsistent.
      IMHO received packets/bytes should be counted if packets are produced by
      the hypervisor, like what common NICs on physical machines are doing.
      So fix the bytes counter.
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ecbc42ca
    • Daniel Borkmann's avatar
      Merge branch 'bpf-annotate-kv-pair' · 684cce1c
      Daniel Borkmann authored
      Martin KaFai Lau says:
      
      ====================
      The series allows the BPF loader to figure out the btf_key_id
      and btf_value_id from a map's name by using BPF_ANNOTATE_KV_PAIR()
      similarly as in iproute2 commit f823f36012fb ("bpf: implement
      btf handling and map annotation").
      
      It also removes the old 'typedef' way which requires two separate
      typedefs (one for the key and one for the value).
      
      By doing this, iproute2 and libbpf have one consistent way to
      figure out the btf_key_type_id and btf_value_type_id for a map.
      
      The first two patches are some prep/cleanup works. The last patch
      introduces BPF_ANNOTATE_KV_PAIR.
      
      v3:
      - Replace some more *int*_t and u* usages with the
        equivalent __[su]* in btf.c
      
      v2:
      - Fix the incorrect '&&' check on container_type
        in bpf_map_find_btf_info().
      - Expose the existing static btf_type_by_id() instead of
        creating a new one.
      ====================
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      684cce1c
    • Martin KaFai Lau's avatar
      bpf: Introduce BPF_ANNOTATE_KV_PAIR · 38d5d3b3
      Martin KaFai Lau authored
      This patch introduces BPF_ANNOTATE_KV_PAIR to signal the
      bpf loader about the btf key_type and value_type of a bpf map.
      Please refer to the changes in test_btf_haskv.c for its usage.
      Both iproute2 and libbpf loader will then have the same
      convention to find out the map's btf_key_type_id and
      btf_value_type_id from a map's name.
      
      Fixes: 8a138aed ("bpf: btf: Add BTF support to libbpf")
      Suggested-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      38d5d3b3
    • Martin KaFai Lau's avatar
      bpf: Replace [u]int32_t and [u]int64_t in libbpf · 5b891af7
      Martin KaFai Lau authored
      This patch replaces [u]int32_t and [u]int64_t usage with
      __[su]32 and __[su]64.  The same change goes for [u]int16_t
      and [u]int8_t.
      
      Fixes: 8a138aed ("bpf: btf: Add BTF support to libbpf")
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      5b891af7