1. 11 Apr, 2018 25 commits
    • Merge branch 'l2tp-tunnel-creation-fixes' · 0c84cee8
      David S. Miller authored
      Guillaume Nault says:
      
      ====================
      l2tp: tunnel creation fixes
      
      L2TP tunnel creation is racy. We need to make sure that the tunnel
      returned by l2tp_tunnel_create() isn't going to be freed while the
      caller is using it. This is done in patch #1, by separating tunnel
      creation from tunnel registration.
      
      With the tunnel registration code in place, we can now check for
      duplicate tunnels in a race-free way. This is done in patch #2, which
      incidentally removes the last use of l2tp_tunnel_find().
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: fix race in duplicate tunnel detection · f6cd651b
      Guillaume Nault authored
      We can't use l2tp_tunnel_find() to prevent l2tp_nl_cmd_tunnel_create()
      from creating a duplicate tunnel. A tunnel can be concurrently
      registered after l2tp_tunnel_find() returns. Therefore, searching for
      duplicates must be done at registration time.
      
      Finally, remove l2tp_tunnel_find() entirely as it isn't used anywhere
      anymore.
      
      Fixes: 309795f4 ("l2tp: Add netlink control API for L2TP")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: fix races in tunnel creation · 6b9f3423
      Guillaume Nault authored
      l2tp_tunnel_create() inserts the new tunnel into the namespace's tunnel
      list and sets the socket's ->sk_user_data field, before returning it to
      the caller. Therefore, there are two ways the tunnel can be accessed
      and freed, before the caller even had the opportunity to take a
      reference. In practice, syzbot could crash the module by closing the
      socket right after a new tunnel was returned to pppol2tp_create().
      
      This patch moves tunnel registration out of l2tp_tunnel_create(), so
      that the caller can safely hold a reference before publishing the
      tunnel. This second step is done with the new l2tp_tunnel_register()
      function, which is now responsible for associating the tunnel to its
      socket and for inserting it into the namespace's list.
      
      While moving the code to l2tp_tunnel_register(), a few modifications
      have been made. First, the socket validation tests are done in a helper
      function, for clarity. Also, modifying the socket is now done after
      having inserted the tunnel into the namespace's tunnels list. This will
      allow insertion to fail, without having to revert these modifications
      in the error path (a followup patch will check for duplicate tunnels
      before insertion). Either the socket is a kernel socket which we
      control, or it is a user-space socket for which we have a reference on
      the file descriptor. In any case, the socket isn't going to be closed
      from under us.
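
The two-step scheme described above can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct tunnel`, `tunnel_create()`, `tunnel_hold()`, `tunnel_put()` and `tunnel_register()` are hypothetical stand-ins, not the kernel's l2tp functions, and a C11 `atomic_flag` spin lock stands in for whatever lock protects the namespace's list. The point it shows is that the caller can take its reference before the tunnel is published, and that the duplicate check happens in the same critical section as the insertion.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical user-space model; not the kernel's struct l2tp_tunnel. */
struct tunnel {
    unsigned int id;
    atomic_int refcount;
    struct tunnel *next;
};

static struct tunnel *tunnel_list;               /* stands in for the namespace's list */
static atomic_flag list_lock = ATOMIC_FLAG_INIT; /* stands in for the list lock */

/* Step 1: allocate only. The tunnel is not reachable by anyone else yet. */
struct tunnel *tunnel_create(unsigned int id)
{
    struct tunnel *t = calloc(1, sizeof(*t));

    if (!t)
        return NULL;
    t->id = id;
    atomic_init(&t->refcount, 1);
    return t;
}

void tunnel_hold(struct tunnel *t)
{
    assert(t != NULL);
    atomic_fetch_add(&t->refcount, 1);
}

/* Drop one reference; returns 1 if this was the last one and t was freed. */
int tunnel_put(struct tunnel *t)
{
    assert(t != NULL);
    if (atomic_fetch_sub(&t->refcount, 1) == 1) {
        free(t);
        return 1;
    }
    return 0;
}

/* Step 2: publish. The duplicate check and the insertion share one
 * critical section, so no tunnel with the same id can slip in between. */
int tunnel_register(struct tunnel *t)
{
    int ret = 0;

    while (atomic_flag_test_and_set(&list_lock))
        ; /* spin until the lock is free */
    for (struct tunnel *it = tunnel_list; it; it = it->next) {
        if (it->id == t->id) {
            ret = -EEXIST;
            goto out;
        }
    }
    t->next = tunnel_list;
    tunnel_list = t;
out:
    atomic_flag_clear(&list_lock);
    return ret;
}
```

A caller would run `tunnel_create()`, then `tunnel_hold()`, then `tunnel_register()`, and drop the unpublished tunnel with `tunnel_put()` if registration returns `-EEXIST`. Unregistration is left out of the sketch to keep it short.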
      
      Reported-by: syzbot+fbeeb5c3b538e8545644@syzkaller.appspotmail.com
      Fixes: fd558d18 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: send netlink notification when the device is modified · 83c1f36f
      Sabrina Dubroca authored
      I added dumping of link information about tun devices over netlink in
      commit 1ec010e7 ("tun: export flags, uid, gid, queue information
      over netlink"), but didn't add the missing netlink notifications when
      the device's exported properties change.
      
      This patch adds notifications when owner/group or flags are modified,
      when queues are attached/detached, and when a tun fd is closed.
      Reported-by: Thomas Haller <thaller@redhat.com>
      Fixes: 1ec010e7 ("tun: export flags, uid, gid, queue information over netlink")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: set the flags before registering the netdevice · 9fffc5c6
      Sabrina Dubroca authored
      Otherwise, register_netdevice advertises the creation of the device with
      the default flags, instead of what the user requested.
      Reported-by: Thomas Haller <thaller@redhat.com>
      Fixes: 1ec010e7 ("tun: export flags, uid, gid, queue information over netlink")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lan78xx: Don't reset the interface on open · 47b99865
      Phil Elwell authored
      Commit 92571a1a ("lan78xx: Connect phy early") moves the PHY
      initialisation into lan78xx_probe, but lan78xx_open subsequently calls
      lan78xx_reset. As well as forcing a second round of link negotiation,
      this reset frequently prevents the phy interrupt from being generated
      (even though the link is up), rendering the interface unusable.
      
      Fix this issue by removing the lan78xx_reset call from lan78xx_open.
      
      Fixes: 92571a1a ("lan78xx: Connect phy early")
      Signed-off-by: Phil Elwell <phil@raspberrypi.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bnxt_en-Fixes-for-net' · 9cf74f59
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Fixes for net.
      
      This bug fix series includes NULL pointer fixes in the ethtool -x code
      path and in the error cleanup path when freeing IRQs, a fix for a ring
      accounting bug that missed rings used by the RDMA driver, and 3 bug
      fixes related to TC Flower and VF-reps.
      
      v2: Fixed commit message of patch 4.  Changed the pound sign to $ sign
      in front of the ip command.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix NULL pointer dereference at bnxt_free_irq(). · cb98526b
      Michael Chan authored
      When open fails during ethtool -L ring change, for example, the driver
      may crash at bnxt_free_irq() because bp->bnapi is NULL.
      
      If we fail to allocate all the new rings, bnxt_open_nic() will free
      all the memory including bp->bnapi.  Subsequent call to bnxt_close_nic()
      will try to dereference bp->bnapi in bnxt_free_irq().
      
      Fix it by checking for !bp->bnapi in bnxt_free_irq().
      
      Fixes: e5811b8c ("bnxt_en: Add IRQ remapping logic.")
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Need to include RDMA rings in bnxt_check_rings(). · 11c3ec7b
      Michael Chan authored
      With recent changes to reserve both L2 and RDMA rings, we need to include
      the RDMA rings in bnxt_check_rings().  Otherwise we will under-estimate
      the rings we need during ethtool -L and may lead to failure.
      
      Fixes: fbcfc8e4 ("bnxt_en: Reserve completion rings and MSIX for bnxt_re RDMA driver.")
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Support max-mtu with VF-reps · 9d96465b
      Sriharsha Basavapatna authored
      While a VF is configured with a bigger mtu (> 1500), any packets that
      are punted to the VF-rep (slow-path) get dropped by OVS kernel-datapath
      with the following message: "dropped over-mtu packet". Fix this by
      returning the max-mtu value for a VF-rep derived from its corresponding VF.
      VF-rep's mtu can be changed using 'ip' command as shown in this example:
      
      	$ ip link set bnxt0_pf0vf0 mtu 9000
      Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Ignore src port field in decap filter nodes · 479ca3bf
      Sriharsha Basavapatna authored
      The driver currently uses src port field (along with other fields) in the
      decap tunnel key, while looking up and adding tunnel nodes. This leads to
      redundant cfa_decap_filter_alloc() requests to the FW and flow-miss in the
      flow engine. Fix this by ignoring the src port field in decap tunnel nodes.
      
      Fixes: f484f678 ("bnxt_en: add hwrm FW cmds for cfa_encap_record and decap_filter")
      Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: do not allow wildcard matches for L2 flows · e85a9be9
      Andy Gospodarek authored
      Before this patch the following commands would succeed as far as the
      user was concerned:
      
      $ tc qdisc add dev p1p1 ingress
      $ tc filter add dev p1p1 parent ffff: protocol all \
      	flower skip_sw action drop
      $ tc filter add dev p1p1 parent ffff: protocol ipv4 \
      	flower skip_sw src_mac 00:02:00:00:00:01/44 action drop
      
      The current flow offload infrastructure used does not support wildcard
      matching for ethernet headers, so do not allow the second or third
      commands to succeed.  If a user wants to drop traffic on that interface
      the protocol and MAC addresses need to be specified explicitly:
      
      $ tc qdisc add dev p1p1 ingress
      $ tc filter add dev p1p1 parent ffff: protocol arp \
      	flower skip_sw action drop
      $ tc filter add dev p1p1 parent ffff: protocol ipv4 \
      	flower skip_sw action drop
      ...
      $ tc filter add dev p1p1 parent ffff: protocol ipv4 \
      	flower skip_sw src_mac 00:02:00:00:00:01 action drop
      $ tc filter add dev p1p1 parent ffff: protocol ipv4 \
      	flower skip_sw src_mac 00:02:00:00:00:02 action drop
      ...
      
      There are also checks for VLAN parameters in this patch as other callers
      may wildcard those parameters even if tc does not.  Using different
      flow infrastructure could allow this to work in the future for L2 flows,
      but for now it does not.
      
      Fixes: 2ae7408f ("bnxt_en: bnxt: add TC flower filter offload support")
      Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix ethtool -x crash when device is down. · 7991cb9c
      Michael Chan authored
      Fix ethtool .get_rxfh() crash by checking for valid indirection table
      address before copying the data.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'vhost-fix-vhost_vq_access_ok-log-check' · 949989b3
      David S. Miller authored
      Stefan Hajnoczi says:
      
      ====================
      vhost: fix vhost_vq_access_ok() log check
      
      v3:
       * Rebased onto net/master and resolved conflict [DaveM]
      
      v2:
       * Rewrote the conditional to make the vq access check clearer [Linus]
       * Added Patch 2 to make the return type consistent and harder to misuse [Linus]
      
      The first patch fixes the vhost virtqueue access check which was recently
      broken.  The second patch replaces the int return type with bool to prevent
      future bugs.
      ====================
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vhost: return bool from *_access_ok() functions · ddd3d408
      Stefan Hajnoczi authored
      Currently vhost *_access_ok() functions return int.  This is error-prone
      because there are two popular conventions:
      
      1. 0 means failure, 1 means success
      2. -errno means failure, 0 means success
      
      Although vhost mostly uses #1, it does not do so consistently.
      umem_access_ok() uses #2.
      
      This patch changes the return type from int to bool so that false means
      failure and true means success.  This eliminates a potential source of
      errors.
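
The hazard of mixing the two int conventions can be seen in a small sketch. All function names here (`range_ok_flag`, `range_ok_errno`, `range_ok`, `caller_accepts`, `show_mixup`) are hypothetical and chosen for illustration; none of them is a vhost function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Convention #1: 1 means success, 0 means failure. */
int range_ok_flag(size_t len) { return len > 0 ? 1 : 0; }

/* Convention #2: 0 means success, -errno means failure. */
int range_ok_errno(size_t len) { return len > 0 ? 0 : -EINVAL; }

/* With bool there is only one reading: true means success. */
bool range_ok(size_t len) { return len > 0; }

/* A caller written against convention #1: non-zero is treated as success. */
bool caller_accepts(int ret) { return ret != 0; }

void show_mixup(void)
{
    /* Invalid input (len 0) is accepted when #2 is read as #1 ... */
    assert(caller_accepts(range_ok_errno(0)));
    /* ... while the same caller correctly rejects it under #1. */
    assert(!caller_accepts(range_ok_flag(0)));
}
```

With a `bool` return type the caller no longer has to know which convention the callee follows, which is the property the patch is after.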
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vhost: fix vhost_vq_access_ok() log check · d14d2b78
      Stefan Hajnoczi authored
      Commit d65026c6 ("vhost: validate log when IOTLB is enabled")
      introduced a regression.  The logic was originally:
      
        if (vq->iotlb)
            return 1;
        return A && B;
      
      After the patch the short-circuit logic for A was inverted:
      
        if (A || vq->iotlb)
            return A;
        return B;
      
      This patch fixes the regression by rewriting the checks in the obvious
      way, no longer returning A when vq->iotlb is non-NULL (which is hard to
      understand).
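
The difference between the three variants is easiest to see as a truth table. In the sketch below, `a` and `b` are the two checks the message calls A and B, and `iotlb` stands for `vq->iotlb` being non-NULL. The first two functions transcribe the snippets quoted above; `access_ok_fixed()` is an assumed shape for the "obvious" rewrite, not a copy of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* The logic as it was originally. */
bool access_ok_original(bool iotlb, bool a, bool b)
{
    if (iotlb)
        return true;
    return a && b;
}

/* The logic after d65026c6: without an IOTLB this degrades to "a || b". */
bool access_ok_regressed(bool iotlb, bool a, bool b)
{
    if (a || iotlb)
        return a;
    return b;
}

/* Assumed shape of the fix: A must always hold, B only without an IOTLB. */
bool access_ok_fixed(bool iotlb, bool a, bool b)
{
    if (!a)
        return false;
    if (iotlb)
        return true;
    return b;
}

/* Walk all eight input combinations and check the fixed variant. */
void check_truth_table(void)
{
    for (int i = 0; i < 8; i++) {
        bool iotlb = i & 4, a = i & 2, b = i & 1;

        assert(access_ok_fixed(iotlb, a, b) == (a && (iotlb || b)));
        if (!iotlb)
            assert(access_ok_fixed(iotlb, a, b) ==
                   access_ok_original(iotlb, a, b));
    }
}
```

Without an IOTLB, the regressed variant accepts a queue as soon as either check passes, whereas the original and the sketched fix require both.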
      
      Reported-by: syzbot+65a84dde0214b0387ccd@syzkaller.appspotmail.com
      Cc: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vhost: Fix vhost_copy_to_user() · 7ced6c98
      Eric Auger authored
      vhost_copy_to_user is used to copy vring used elements to userspace.
      We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.
      
      Fixes: f8894913 ("vhost: introduce O(1) vq metadata cache")
      Signed-off-by: Eric Auger <eric.auger@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Aquantia-atlantic-critical-fixes-04-2018' · 98239c90
      David S. Miller authored
      Igor Russkikh says:
      
      ====================
      Aquantia atlantic critical fixes 04/2018
      
      Two regressions in the latest 4.16 driver were reported by users.

      Some old FW (1.5.44) had link management logic which prevents the
      driver from making a clean reset. The 4.16 driver has a full hardware
      reset implemented, and that broke the link and traffic on such cards.

      The second is an oops in the shutdown callback in case the interface
      is already closed or was never opened.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: aquantia: oops when shutdown on already stopped device · 9a11aff2
      Igor Russkikh authored
      In case netdev is closed at the moment of pci shutdown, aq_nic_stop
      gets called a second time. napi_disable in that case hangs indefinitely.
      In the other case, if the device was never opened at all, we get an oops
      because of a null pointer access.
      
      We should invoke aq_nic_stop conditionally, only if device is running
      at the moment of shutdown.
      Reported-by: David Arcari <darcari@redhat.com>
      Fixes: 90869ddf ("net: aquantia: Implement pci shutdown callback")
      Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: aquantia: Regression on reset with 1.x firmware · cce96d18
      Igor Russkikh authored
      On ASUS XG-C100C with 1.5.44 firmware a special mode called "dirty wake"
      is active. In this mode, when the motherboard gets powered (but no
      power-on has happened yet), the NIC automatically enables a powersave
      link and watches for a WOL packet.
      This normally allows powering up the PC after AC power failures.

      Not all motherboards or BIOS settings give power to PCI slots,
      so this mode is not enabled on all hardware.

      The 4.16 Linux driver introduced a full hardware reset sequence.
      This is required since before that we had no NIC hardware
      reset implemented and there were side effects of a "not clean start".

      But this full reset is incompatible with the "dirty wake" WOL feature:
      it keeps the PHY link in a special mode forever. As a consequence,
      the driver sees no link and no traffic.

      To fix this we forcibly change the FW state to idle before doing
      the full reset. This makes the FW restore the link state.
      
      Fixes: c8c82eb3 ("net: aquantia: Introduce global AQC hardware reset sequence")
      Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN · 53765341
      Bassem Boubaker authored
      The Cinterion AHS8 is a 3G device with one embedded WWAN interface
      using cdc_ether as a driver.
      
      The modem is controlled via AT commands through the exposed TTYs.
      
      AT+CGDCONT write command can be used to activate or deactivate a WWAN
      connection for a PDP context defined with the same command. UE
      supports one WWAN adapter.
      Signed-off-by: Bassem Boubaker <bassem.boubaker@actia.fr>
      Acked-by: Oliver Neukum <oneukum@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • slip: Check if rstate is initialized before uncompressing · 3f01ddb9
      Tejaswi Tanikella authored
      On receiving a packet the state index points to the rstate which must be
      used to fill up IP and TCP headers. But if the state index points to a
      rstate which is uninitialized, i.e. filled with zeros, it gets stuck in an
      infinite loop inside ip_fast_csum trying to compute the IP checksum of a
      header with zero length.
      
      89.666953:   <2> [<ffffff9dd3e94d38>] slhc_uncompress+0x464/0x468
      89.666965:   <2> [<ffffff9dd3e87d88>] ppp_receive_nonmp_frame+0x3b4/0x65c
      89.666978:   <2> [<ffffff9dd3e89dd4>] ppp_receive_frame+0x64/0x7e0
      89.666991:   <2> [<ffffff9dd3e8a708>] ppp_input+0x104/0x198
      89.667005:   <2> [<ffffff9dd3e93868>] pppopns_recv_core+0x238/0x370
      89.667027:   <2> [<ffffff9dd4428fc8>] __sk_receive_skb+0xdc/0x250
      89.667040:   <2> [<ffffff9dd3e939e4>] pppopns_recv+0x44/0x60
      89.667053:   <2> [<ffffff9dd4426848>] __sock_queue_rcv_skb+0x16c/0x24c
      89.667065:   <2> [<ffffff9dd4426954>] sock_queue_rcv_skb+0x2c/0x38
      89.667085:   <2> [<ffffff9dd44f7358>] raw_rcv+0x124/0x154
      89.667098:   <2> [<ffffff9dd44f7568>] raw_local_deliver+0x1e0/0x22c
      89.667117:   <2> [<ffffff9dd44c8ba0>] ip_local_deliver_finish+0x70/0x24c
      89.667131:   <2> [<ffffff9dd44c92f4>] ip_local_deliver+0x100/0x10c
      
      ./scripts/faddr2line vmlinux slhc_uncompress+0x464/0x468 output:
       ip_fast_csum at arch/arm64/include/asm/checksum.h:40
       (inlined by) slhc_uncompress at drivers/net/slip/slhc.c:615
      
      Add a variable to indicate whether the current rstate is initialized. If
      a packet arrives for an uninitialized rstate, move to the toss state.
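
The guard described above can be sketched as follows. The types and names (`struct rstate`, `struct decompressor`, `remember()`, `uncompress()`, `NSLOTS`) are hypothetical simplifications for illustration and do not mirror the layout of the kernel's slhc code; the sketch only shows the idea of refusing to use a slot that was never filled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 4 /* hypothetical number of receive slots */

/* Hypothetical, simplified receive state. */
struct rstate {
    bool initialized;  /* set once a full header has been stored */
    unsigned char ihl; /* saved IP header length in 32-bit words */
};

struct decompressor {
    struct rstate slots[NSLOTS];
    bool toss;         /* drop packets until the state is resynchronised */
};

/* Store a header in a slot, as an uncompressed packet would. */
void remember(struct decompressor *d, size_t idx, unsigned char ihl)
{
    assert(idx < NSLOTS);
    d->slots[idx].ihl = ihl;
    d->slots[idx].initialized = true;
    d->toss = false;
}

/* Returns 0 on success, -1 if the packet has to be tossed. */
int uncompress(struct decompressor *d, size_t idx)
{
    if (idx >= NSLOTS || !d->slots[idx].initialized) {
        d->toss = true; /* never checksum a zero-length header */
        return -1;
    }
    return 0;
}
```

A zero-filled `struct decompressor` rejects every compressed packet until `remember()` has populated the slot the packet refers to.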
      Signed-off-by: Tejaswi Tanikella <tejaswit@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lan78xx: Avoid spurious kevent 4 "error" · fed56079
      Phil Elwell authored
      lan78xx_defer_event generates an error message whenever the work item
      is already scheduled. lan78xx_open defers three events -
      EVENT_STAT_UPDATE, EVENT_DEV_OPEN and EVENT_LINK_RESET. Being aware
      of the likelihood (or certainty) of an error message, the DEV_OPEN
      event is added to the set of pending events directly, relying on
      the subsequent deferral of the EVENT_LINK_RESET call to schedule the
      work.  Take the same precaution with EVENT_STAT_UPDATE to avoid a
      totally unnecessary error message.
      Signed-off-by: Phil Elwell <phil@raspberrypi.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lan78xx: Correctly indicate invalid OTP · 4bfc3380
      Phil Elwell authored
      lan78xx_read_otp tries to return -EINVAL in the event of invalid OTP
      content, but the value gets overwritten before it is returned and the
      read goes ahead anyway. Make the read conditional as it should be
      and preserve the error code.
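
The bug pattern is a classic overwritten error code. The sketch below reproduces it in isolation: `SIG_VALID`, `raw_read()`, `read_otp_buggy()` and `read_otp_fixed()` are hypothetical names and values, and the OTP contents are modelled as a plain byte buffer whose first byte is the signature.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SIG_VALID 0xA5 /* hypothetical signature byte */

/* Stand-in for the raw OTP read; always succeeds in this model. */
static int raw_read(const unsigned char *otp, size_t off, size_t len,
                    unsigned char *out)
{
    assert(otp != NULL && out != NULL);
    memcpy(out, otp + off, len);
    return 0;
}

/* Buggy shape: -EINVAL is assigned, then overwritten by the read. */
int read_otp_buggy(const unsigned char *otp, size_t len, unsigned char *out)
{
    int ret = 0;

    if (otp[0] != SIG_VALID)
        ret = -EINVAL;
    ret = raw_read(otp, 1, len, out); /* error code lost here */
    return ret;
}

/* Fixed shape: the read is conditional and the error code survives. */
int read_otp_fixed(const unsigned char *otp, size_t len, unsigned char *out)
{
    if (otp[0] != SIG_VALID)
        return -EINVAL;
    return raw_read(otp, 1, len, out);
}
```

With an invalid signature the buggy variant returns 0 and fills the output buffer anyway, while the fixed variant returns `-EINVAL` and leaves the buffer untouched.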
      
      Fixes: 55d7de9d ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
      Signed-off-by: Phil Elwell <phil@raspberrypi.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rds: MP-RDS may use an invalid c_path · a43cced9
      Ka-Cheong Poon authored
      rds_sendmsg() calls rds_send_mprds_hash() to find a c_path to use to
      send a message.  Suppose the RDS connection is not yet up.  In
      rds_send_mprds_hash(), it does
      
      	if (conn->c_npaths == 0)
      		wait_event_interruptible(conn->c_hs_waitq,
      					 (conn->c_npaths != 0));
      
      If it is interrupted before the connection is set up,
      rds_send_mprds_hash() will return a non-zero hash value.  Hence
      rds_sendmsg() will use a non-zero c_path to send the message.  But if
      the RDS connection ends up being non-MP capable, the message will be
      lost as only the zero c_path can be used.
      Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 10 Apr, 2018 2 commits
    • ip_gre: clear feature flags when incompatible o_flags are set · 1cc5954f
      Sabrina Dubroca authored
      Commit dd9d598c ("ip_gre: add the support for i/o_flags update via
      netlink") added the ability to change o_flags, but missed that the
      GSO/LLTX features are disabled by default, and only enabled when some
      gre features are unused. Thus we also need to disable the GSO/LLTX
      features on the device when the TUNNEL_SEQ or TUNNEL_CSUM flags are set.
      
      These two examples should result in the same features being set:
      
          ip link add gre_none type gre local 192.168.0.10 remote 192.168.0.20 ttl 255 key 0
      
          ip link set gre_none type gre seq
          ip link add gre_seq type gre local 192.168.0.10 remote 192.168.0.20 ttl 255 key 1 seq
      
      Fixes: dd9d598c ("ip_gre: add the support for i/o_flags update via netlink")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c18bb396
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) The sockmap code has to free socket memory on close if there is
          corked data, from John Fastabend.
      
       2) Tunnel names coming from userspace need to be length validated. From
          Eric Dumazet.
      
       3) arp_filter() has to take VRFs properly into account, from Miguel
          Fadon Perlines.
      
       4) Fix oops in error path of tcf_bpf_init(), from Davide Caratti.
      
       5) Missing idr_remove() in u32_delete_key(), from Cong Wang.
      
       6) More syzbot stuff. Several use of uninitialized value fixes all
          over, from Eric Dumazet.
      
       7) Do not leak kernel memory to userspace in sctp, also from Eric
          Dumazet.
      
       8) Discard frames from unused ports in DSA, from Andrew Lunn.
      
       9) Fix DMA mapping and reset/failover problems in ibmvnic, from Thomas
          Falcon.
      
      10) Do not access dp83640 PHY registers prematurely after reset, from
          Esben Haabendal.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
        vhost-net: set packet weight of tx polling to 2 * vq size
        net: thunderx: rework mac addresses list to u64 array
        inetpeer: fix uninit-value in inet_getpeer
        dp83640: Ensure against premature access to PHY registers after reset
        devlink: convert occ_get op to separate registration
        ARM: dts: ls1021a: Specify TBIPA register address
        net/fsl_pq_mdio: Allow explicit speficition of TBIPA address
        ibmvnic: Do not reset CRQ for Mobility driver resets
        ibmvnic: Fix failover case for non-redundant configuration
        ibmvnic: Fix reset scheduler error handling
        ibmvnic: Zero used TX descriptor counter on reset
        ibmvnic: Fix DMA mapping mistakes
        tipc: use the right skb in tipc_sk_fill_sock_diag()
        sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6
        net: dsa: Discard frames from unused ports
        sctp: do not leak kernel memory to user space
        soreuseport: initialise timewait reuseport field
        ipv4: fix uninit-value in ip_route_output_key_hash_rcu()
        dccp: initialize ireq->ir_mark
        net: fix uninit-value in __hw_addr_add_ex()
        ...
  3. 09 Apr, 2018 13 commits
    • Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · fd3b36d2
      Linus Torvalds authored
      Pull vfs namei updates from Al Viro:
      
       - make lookup_one_len() safe with parent locked only shared (incoming
         afs series wants that)
      
       - fix of getname_kernel() regression from 2015 (-stable fodder, that
         one).
      
      * 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        getname_kernel() needs to make sure that ->name != ->iname in long case
        make lookup_one_len() safe to use with directory locked shared
        new helper: __lookup_slow()
        merge common parts of lookup_one_len{,_unlocked} into common helper
    • Merge tag 'for-linus-4.17-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux · 8ea4a5d8
      Linus Torvalds authored
      Pull orangefs updates from Mike Marshall:
       "Fixes and cleanups:
      
         - Documentation cleanups
      
         - removal of unused code
      
         - make some structs static
      
         - implement Orangefs vm_operations fault callout
      
         - eliminate two single-use functions and put their cleaned up code in
           line.
      
         - replace a vmalloc/memset instance with vzalloc
      
         - fix a race condition bug in wait code"
      
      * tag 'for-linus-4.17-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
        Orangefs: documentation updates
        orangefs: document package install and xfstests procedure
        orangefs: remove unused code
        orangefs: make several *_operations structs static
        orangefs: implement vm_ops->fault
        orangefs: open code short single-use functions
        orangefs: replace vmalloc and memset with vzalloc
        orangefs: bug fix for a race condition when getting a slot
    • Merge tag 'pstore-v4.17-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 190f2ace
      Linus Torvalds authored
      Pull pstore fix from Kees Cook:
       "Fix another compression Kconfig combination missed in testing (Tobias
        Regnery)"
      
      * tag 'pstore-v4.17-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        pstore: fix crypto dependencies without compression
    • selinux: fix missing dput() before selinuxfs unmount · fd40ffc7
      Stephen Smalley authored
      Commit 0619f0f5 ("selinux: wrap selinuxfs state") triggers a BUG
      when SELinux is runtime-disabled (i.e. systemd or equivalent disables
      SELinux before initial policy load via /sys/fs/selinux/disable based on
      /etc/selinux/config SELINUX=disabled).
      
      This does not manifest if SELinux is disabled via kernel command line
      argument or if SELinux is enabled (permissive or enforcing).
      
      Before:
        SELinux:  Disabled at runtime.
        BUG: Dentry 000000006d77e5c7{i=17,n=null}  still in use (1) [unmount of selinuxfs selinuxfs]
      
      After:
        SELinux:  Disabled at runtime.
      
      Fixes: 0619f0f5 ("selinux: wrap selinuxfs state")
      Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · d8312a3f
      Linus Torvalds authored
      Pull kvm updates from Paolo Bonzini:
       "ARM:
         - VHE optimizations
      
         - EL2 address space randomization
      
         - speculative execution mitigations ("variant 3a", aka execution past
           invalid privilege register access)
      
         - bugfixes and cleanups
      
        PPC:
         - improvements for the radix page fault handler for HV KVM on POWER9
      
        s390:
         - more kvm stat counters
      
         - virtio gpu plumbing
      
         - documentation
      
         - facilities improvements
      
        x86:
         - support for VMware magic I/O port and pseudo-PMCs
      
         - AMD pause loop exiting
      
         - support for AMD core performance extensions
      
         - support for synchronous register access
      
         - expose nVMX capabilities to userspace
      
         - support for Hyper-V signaling via eventfd
      
         - use Enlightened VMCS when running on Hyper-V
      
         - allow userspace to disable MWAIT/HLT/PAUSE vmexits
      
         - usual roundup of optimizations and nested virtualization bugfixes
      
        Generic:
         - API selftest infrastructure (though the only tests are for x86 as
           of now)"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits)
        kvm: x86: fix a prototype warning
        kvm: selftests: add sync_regs_test
        kvm: selftests: add API testing infrastructure
        kvm: x86: fix a compile warning
        KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
        KVM: X86: Introduce handle_ud()
        KVM: vmx: unify adjacent #ifdefs
        x86: kvm: hide the unused 'cpu' variable
        KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
        Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
        kvm: Add emulation for movups/movupd
        KVM: VMX: raise internal error for exception during invalid protected mode state
        KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending
        KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
        KVM: x86: Fix misleading comments on handling pending exceptions
        KVM: x86: Rename interrupt.pending to interrupt.injected
        KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt
        x86/kvm: use Enlightened VMCS when running on Hyper-V
        x86/hyper-v: detect nested features
        x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
        ...
      d8312a3f
    • Linus Torvalds's avatar
      Fix subtle macro variable shadowing in min_not_zero() · e9092d0d
      Linus Torvalds authored
      Commit 3c8ba0d6 ("kernel.h: Retain constant expression output for
      max()/min()") rewrote our min/max macros to be very clever, but in the
      meantime resurrected a variable name shadow issue that we had
      previously fixed in commit 589a9785 ("min/max: remove sparse
      warnings when they're nested").
      
      That commit talks about the sparse warnings that this shadowing causes,
      which we ignored as just a minor annoyance.  But it turns out that the
      sparse warning is the least of our problems.  We actually have a real
      bug due to the shadowing through the interaction with "min_not_zero()",
      which ends up doing
      
         min(__x, __y)
      
      internally, and then the new declaration of "__x" and "__y" as new
      variables in __cmp_once() results in a complete mess of an expression,
      and "min_not_zero()" doesn't work at all.
      
      For some odd reason, this only ever caused (reported) problems on s390,
      even though it is a generic issue and most of the (obviously successful)
      testing of the problematic commit had happened on other architectures.
      
      Quoting Sebastian Ott:
       "What happened is that the bio built by the partition detection code
        was attempted to be split by the block layer because the block queue
        had a max_sector setting of 0. blk_queue_max_hw_sectors uses
        min_not_zero."
      
      So re-introduce the use of __UNIQUE_ID() to make sure that the min/max
      macros do not have these kinds of clashes.
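
      The clash and the unique-name fix described above can be sketched in
      plain C. This is a minimal illustration, not the kernel's code: every
      sketch_* / SKETCH_* name is hypothetical, it relies on the GNU
      statement-expression and __typeof__ extensions, and it assumes a
      compiler that provides __COUNTER__ (gcc and clang do).

```c
/* Paste a prefix and the compiler's __COUNTER__ into a fresh identifier. */
#define SKETCH_PASTE_(a, b) a##b
#define SKETCH_PASTE(a, b) SKETCH_PASTE_(a, b)
#define SKETCH_UNIQUE(prefix) SKETCH_PASTE(prefix, __COUNTER__)

/*
 * Shadowing-prone form, shown for contrast only (never expanded here):
 *
 *   #define bad_min(x, y) ({ __typeof__(x) x_ = (x);
 *                            __typeof__(y) y_ = (y);
 *                            x_ < y_ ? x_ : y_; })
 *
 * bad_min(x_, y_) would expand to "__typeof__(x_) x_ = (x_);", so the new
 * x_ is initialized from itself instead of from the caller's x_.
 */

/* Evaluate each argument once into caller-supplied temporaries ux and uy. */
#define sketch_cmp_once(x, y, ux, uy, op) ({	\
	__typeof__(x) ux = (x);			\
	__typeof__(y) uy = (y);			\
	ux op uy ? ux : uy; })

/* Each expansion gets temporaries with names no caller can already hold. */
#define sketch_min(x, y) \
	sketch_cmp_once(x, y, SKETCH_UNIQUE(x_), SKETCH_UNIQUE(y_), <)

/* Smaller of the two values, ignoring a value of zero. */
#define sketch_min_not_zero(x, y) ({		\
	__typeof__(x) x_ = (x);			\
	__typeof__(y) y_ = (y);			\
	x_ == 0 ? y_ : (y_ == 0 ? x_ : sketch_min(x_, y_)); })

/* Hypothetical caller: combine two limits where 0 means "no limit set". */
static unsigned int sketch_limit(unsigned int a, unsigned int b)
{
	return sketch_min_not_zero(a, b);
}
```

      With the unique temporaries, sketch_limit(0, 8) and sketch_limit(8, 0)
      both yield 8, and sketch_limit(4, 8) yields 4, because the inner
      comparison no longer redeclares x_ and y_.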
      
      [ That said, __UNIQUE_ID() itself has several issues that make it less
        than wonderful.
      
        In particular, the "uniqueness" has a fallback on the line number,
        which means that it's not actually unique in more complex cases if you
        don't build with gcc or clang (which have working unique counters that
        aren't tied to line numbers).
      
        That historical broken fallback also means that we have that pointless
        "prefix" argument that doesn't actually make much sense _except_ for
        the known-broken case. Oh well. ]
      
      Fixes: 3c8ba0d6 ("kernel.h: Retain constant expression output for max()/min()")
      Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e9092d0d
    • Linus Torvalds's avatar
      Merge branch 'for-linus-sa1100' of git://git.armlinux.org.uk/~rmk/linux-arm · 7886e8aa
      Linus Torvalds authored
      Pull ARM SA1100 updates from Russell King:
       "We have support for arbitrary MMIO registers providing platform GPIOs,
        which allows us to abstract some of the SA11x0 CF support.
      
        This set of updates makes that change"
      
      * 'for-linus-sa1100' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: sa1100/simpad: switch simpad CF to use gpiod APIs
        ARM: sa1100/shannon: convert to generic CF sockets
        ARM: sa1100/nanoengine: convert to generic CF sockets
        ARM: sa1100/h3xxx: switch h3xxx PCMCIA to use gpiod APIs
        ARM: sa1100/cerf: convert to generic CF sockets
        ARM: sa1100/assabet: convert to generic CF sockets
        ARM: sa1100: provide infrastructure to support generic CF sockets
        pcmcia: sa1100: provide generic CF support
      7886e8aa
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm · 4a1e0052
      Linus Torvalds authored
      Pull ARM updates from Russell King:
       "A number of core ARM changes:
      
         - Refactoring linker script by Nicolas Pitre
      
         - Enable source fortification
      
         - Add support for Cortex R8"
      
      * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: decompressor: fix warning introduced in fortify patch
        ARM: 8751/1: Add support for Cortex-R8 processor
        ARM: 8749/1: Kconfig: Add ARCH_HAS_FORTIFY_SOURCE
        ARM: simplify and fix linker script for TCM
        ARM: linker script: factor out TCM bits
        ARM: linker script: factor out vectors and stubs
        ARM: linker script: factor out unwinding table sections
        ARM: linker script: factor out stuff for the .text section
        ARM: linker script: factor out stuff for the DISCARD section
        ARM: linker script: factor out some common definitions between XIP and non-XIP
      4a1e0052
    • Linus Torvalds's avatar
      Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · 2025fef0
      Linus Torvalds authored
      Pull m68knommu update from Greg Ungerer:
       "Only a single fix to set the DMA masks in the ColdFire FEC platform
        data structure.
      
        This stops the warning from dma-mapping.h at boot time"
      
      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
        m68k: set dma and coherent masks for platform FEC ethernets
      2025fef0
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha · 5148408a
      Linus Torvalds authored
      Pull alpha updates from Matt Turner:
       "A few small changes for alpha"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
        alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering
        alpha: Implement CPU vulnerabilities sysfs functions.
        alpha: rtc: stop validating rtc_time in .read_time
        alpha: rtc: remove unused set_mmss ops
      5148408a
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · becdce1c
      Linus Torvalds authored
      Pull s390 updates from Martin Schwidefsky:
      
       - Improvements for the spectre defense:
          * The spectre related code is consolidated to a single file
            nospec-branch.c
          * Automatic enable/disable for the spectre v2 defenses (expoline vs.
            nobp)
          * Syslog messages for spectre v2 are added
          * Enable CONFIG_GENERIC_CPU_VULNERABILITIES and define the attribute
            functions for spectre v1 and v2
      
       - Add helper macros for assembler alternatives and use them to shorten
         the code in entry.S.
      
       - Add support for persistent configuration data via the SCLP Store Data
         interface. The H/W interface requires a page table that uses 4K pages
         only, the code to set up such an address space is added as well.
      
       - Enable virtio GPU emulation in QEMU. To do this the depends
         statements for a few common Kconfig options are modified.
      
       - Add support for format-3 channel path descriptors and add a binary
         sysfs interface to export the associated utility strings.
      
       - Add a sysfs attribute to control the IFCC handling in case of
         constant channel errors.
      
       - The vfio-ccw changes from Cornelia.
      
       - Bug fixes and cleanups.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (40 commits)
        s390/kvm: improve stack frame constants in entry.S
        s390/lpp: use assembler alternatives for the LPP instruction
        s390/entry.S: use assembler alternatives
        s390: add assembler macros for CPU alternatives
        s390: add sysfs attributes for spectre
        s390: report spectre mitigation via syslog
        s390: add automatic detection of the spectre defense
        s390: move nobp parameter functions to nospec-branch.c
        s390/cio: add util_string sysfs attribute
        s390/chsc: query utility strings via fmt3 channel path descriptor
        s390/cio: rename struct channel_path_desc
        s390/cio: fix unbind of io_subchannel_driver
        s390/qdio: split up CCQ handling for EQBS / SQBS
        s390/qdio: don't retry EQBS after CCQ 96
        s390/qdio: restrict buffer merging to eligible devices
        s390/qdio: don't merge ERROR output buffers
        s390/qdio: simplify math in get_*_buffer_frontier()
        s390/decompressor: trim uncompressed image head during the build
        s390/crypto: Fix kernel crash on aes_s390 module remove.
        s390/defkeymap: fix global init to zero
        ...
      becdce1c
    • haibinzhang(张海斌)'s avatar
      vhost-net: set packet weight of tx polling to 2 * vq size · a2ac9990
      haibinzhang(张海斌) authored
      handle_tx will delay rx for tens or even hundreds of milliseconds when tx is
      busy polling small UDP packets (e.g. 1-byte UDP payload), because
      VHOST_NET_WEIGHT takes into account only the bytes sent, not the number of
      packets.
      
      Ping-Latencies shown below were tested between two Virtual Machines using
      netperf (UDP_STREAM, len=1), and then another machine pinged the client:
      
      vq size=256
      Packet-Weight   Ping-Latencies(millisecond)
                         min      avg       max
      Origin           3.319   18.489    57.303
      64               1.643    2.021     2.552
      128              1.825    2.600     3.224
      256              1.997    2.710     4.295
      512              1.860    3.171     4.631
      1024             2.002    4.173     9.056
      2048             2.257    5.650     9.688
      4096             2.093    8.508    15.943
      
      vq size=512
      Packet-Weight   Ping-Latencies(millisecond)
                         min      avg       max
      Origin           6.537   29.177    66.245
      64               2.798    3.614     4.403
      128              2.861    3.820     4.775
      256              3.008    4.018     4.807
      512              3.254    4.523     5.824
      1024             3.079    5.335     7.747
      2048             3.944    8.201    12.762
      4096             4.158   11.057    19.985
      
      Seems pretty consistent, a small dip at 2 VQ sizes.
      Ring size is a hint from device about a burst size it can tolerate. Based on
      benchmarks, set the weight to 2 * vq size.
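
      The effect of adding a packet budget next to the byte budget can be
      sketched as a small userspace simulation. This is an illustration under
      assumptions, not the vhost-net code: the sketch_* names, the struct, and
      the loop are hypothetical, and the 0x80000 byte budget is an assumed
      value standing in for VHOST_NET_WEIGHT.

```c
#include <stddef.h>

/* Assumed byte budget for one tx pass (stand-in for VHOST_NET_WEIGHT). */
#define SKETCH_BYTE_WEIGHT 0x80000u

/* Hypothetical virtqueue: only the ring size matters for this sketch. */
struct sketch_vq {
	unsigned int num;	/* ring size (vq size) */
};

/* Packet budget for one tx pass: 2 * vq size, as the text proposes. */
static unsigned int sketch_pkt_weight(const struct sketch_vq *vq)
{
	return vq->num * 2;
}

/*
 * Simulate one tx pass over `pending` packets of `pkt_len` bytes each.
 * Returns how many packets are sent before the pass yields, which happens
 * when either the byte budget or the packet budget is used up.
 */
static unsigned int sketch_tx_pass(const struct sketch_vq *vq,
				   size_t pkt_len, unsigned int pending)
{
	size_t total_len = 0;
	unsigned int sent = 0;

	while (sent < pending) {
		total_len += pkt_len;
		sent++;
		if (total_len >= SKETCH_BYTE_WEIGHT ||
		    sent >= sketch_pkt_weight(vq))
			break;	/* yield so rx gets a turn */
	}
	return sent;
}
```

      With a 256-entry ring and 1-byte packets, the pass in this sketch stops
      after 512 packets; with the byte budget alone it would take 524288 such
      packets before tx yielded.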
      
      To evaluate this change, further tests were done using netperf (RR, TX) between
      two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was
      tweaked through qemu. The results below show no obvious changes.
      
      vq size=256 TCP_RR                vq size=512 TCP_RR
      size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
         1/       1/  -7%/        -2%      1/       1/   0%/        -2%
         1/       4/  +1%/         0%      1/       4/  +1%/         0%
         1/       8/  +1%/        -2%      1/       8/   0%/        +1%
        64/       1/  -6%/         0%     64/       1/  +7%/        +3%
        64/       4/   0%/        +2%     64/       4/  -1%/        +1%
        64/       8/   0%/         0%     64/       8/  -1%/        -2%
       256/       1/  -3%/        -4%    256/       1/  -4%/        -2%
       256/       4/  +3%/        +4%    256/       4/  +1%/        +2%
       256/       8/  +2%/         0%    256/       8/  +1%/        -1%
      
      vq size=256 UDP_RR                vq size=512 UDP_RR
      size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
         1/       1/  -5%/        +1%      1/       1/  -3%/        -2%
         1/       4/  +4%/        +1%      1/       4/  -2%/        +2%
         1/       8/  -1%/        -1%      1/       8/  -1%/         0%
        64/       1/  -2%/        -3%     64/       1/  +1%/        +1%
        64/       4/  -5%/        -1%     64/       4/  +2%/         0%
        64/       8/   0%/        -1%     64/       8/  -2%/        +1%
       256/       1/  +7%/        +1%    256/       1/  -7%/         0%
       256/       4/  +1%/        +1%    256/       4/  -3%/        -4%
       256/       8/  +2%/        +2%    256/       8/  +1%/        +1%
      
      vq size=256 TCP_STREAM            vq size=512 TCP_STREAM
      size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
        64/       1/   0%/        -3%     64/       1/   0%/         0%
        64/       4/  +3%/        -1%     64/       4/  -2%/        +4%
        64/       8/  +9%/        -4%     64/       8/  -1%/        +2%
       256/       1/  +1%/        -4%    256/       1/  +1%/        +1%
       256/       4/  -1%/        -1%    256/       4/  -3%/         0%
       256/       8/  +7%/        +5%    256/       8/  -3%/         0%
       512/       1/  +1%/         0%    512/       1/  -1%/        -1%
       512/       4/  +1%/        -1%    512/       4/   0%/         0%
       512/       8/  +7%/        -5%    512/       8/  +6%/        -1%
      1024/       1/   0%/        -1%   1024/       1/   0%/        +1%
      1024/       4/  +3%/         0%   1024/       4/  +1%/         0%
      1024/       8/  +8%/        +5%   1024/       8/  -1%/         0%
      2048/       1/  +2%/        +2%   2048/       1/  -1%/         0%
      2048/       4/  +1%/         0%   2048/       4/   0%/        -1%
      2048/       8/  -2%/         0%   2048/       8/   5%/        -1%
      4096/       1/  -2%/         0%   4096/       1/  -2%/         0%
      4096/       4/  +2%/         0%   4096/       4/   0%/         0%
      4096/       8/  +9%/        -2%   4096/       8/  -5%/        -1%
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Haibin Zhang <haibinzhang@tencent.com>
      Signed-off-by: Yunfang Tai <yunfangtai@tencent.com>
      Signed-off-by: Lidong Chen <lidongchen@tencent.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a2ac9990
    • Vadim Lomovtsev's avatar
      net: thunderx: rework mac addresses list to u64 array · 9b5c4dfb
      Vadim Lomovtsev authored
      It is too expensive to pass u64 values via a linked list; instead,
      allocate an array for them sized by the overall number of MAC addresses
      from the netdev.
      
      This removes multiple kmalloc() calls, avoids memory fragmentation,
      and allows a single NULL check on the kmalloc() return value to
      prevent a potential NULL pointer dereference.
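
      The single-allocation idea can be sketched in userspace C, with malloc()
      standing in for kmalloc(). This is a minimal illustration, not the
      driver's code: the struct layout and all sketch_* names are assumptions
      made for the example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical list: one allocation holds the header and every address. */
struct sketch_mc_list {
	unsigned int count;	/* entries filled so far */
	unsigned int capacity;	/* entries allocated */
	uint64_t mc[];		/* MAC addresses packed into u64 values */
};

/* Pack a 6-byte MAC address into the low 48 bits of a u64. */
static uint64_t sketch_mac_to_u64(const uint8_t mac[6])
{
	uint64_t u = 0;
	int i;

	for (i = 0; i < 6; i++)
		u = (u << 8) | mac[i];
	return u;
}

/* One allocation, so one NULL check covers the whole list. */
static struct sketch_mc_list *sketch_mc_alloc(unsigned int capacity)
{
	struct sketch_mc_list *list;

	list = malloc(sizeof(*list) + capacity * sizeof(list->mc[0]));
	if (!list)
		return NULL;
	list->count = 0;
	list->capacity = capacity;
	return list;
}

/* Append one address; returns 0 on success, -1 if the array is full. */
static int sketch_mc_add(struct sketch_mc_list *list, const uint8_t mac[6])
{
	if (list->count >= list->capacity)
		return -1;
	list->mc[list->count++] = sketch_mac_to_u64(mac);
	return 0;
}
```

      A caller would size the array from the device's address count, fill it
      with sketch_mc_add(), and release everything with a single free().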
      
      Addresses-Coverity-ID: 1467429 ("Dereference null return value")
      Fixes: 37c3347e ("net: thunderx: add ndo_set_rx_mode callback implementation for VF")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b5c4dfb