1. 30 Nov, 2016 23 commits
  2. 28 Nov, 2016 17 commits
    • bpf/samples: Fix PT_REGS_IP on s390x and use it · 2dbb4c05
      Michael Holzheu authored
      The files "sampleip_kern.c" and "trace_event_kern.c" directly access
      "ctx->regs.ip" which is not available on s390x. Fix this and use the
      PT_REGS_IP() macro instead.
      
      Also fix the macro for s390x and use "psw.addr" from "pt_regs".
      Reported-by: Zvonko Kosic <zvonko.kosic@de.ibm.com>
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
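
The idea of hiding the per-architecture register layout behind an accessor macro can be sketched in user-space C. This is only an illustration: the `mock_*` structs and `MOCK_*` macros are hypothetical stand-ins, not the kernel's definitions; only the field names `ip` and `psw.addr` come from the commit message above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for two register layouts. The real structures
 * live in each architecture's ptrace headers. */
struct mock_psw { uint64_t mask; uint64_t addr; };
struct mock_s390_regs { struct mock_psw psw; };
struct mock_x86_regs { uint64_t ip; };

/* One accessor per layout. Real code would pick exactly one of these
 * at build time, so callers never name the field directly. */
#define MOCK_S390_REGS_IP(x) ((x)->psw.addr)
#define MOCK_X86_REGS_IP(x)  ((x)->ip)

/* Callers go through the accessor instead of touching "regs.ip". */
static uint64_t s390_sample_ip(const struct mock_s390_regs *regs)
{
    return MOCK_S390_REGS_IP(regs);
}

static uint64_t x86_sample_ip(const struct mock_x86_regs *regs)
{
    return MOCK_X86_REGS_IP(regs);
}
```

Code that reads the instruction pointer only through the accessor compiles unchanged on either layout, which is the property the sample programs needed.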
    • net: dsa: fix unbalanced dsa_switch_tree reference counting · 7a99cd6e
      Nikita Yushchenko authored
      _dsa_register_switch() gets a dsa_switch_tree object either via
      dsa_get_dst() or via dsa_add_dst(). The former path does not increase
      the kref of the returned object (so the caller does not own a
      reference), while the latter path creates a new object (so the caller
      does own a reference).
      
      The rest of _dsa_register_switch() assumes that it owns a reference, and
      calls dsa_put_dst().
      
      This causes memory corruption if the first switch in the tree initialized
      successfully, but the second failed to initialize. In particular, the freed
      dsa_switch_tree object is left referenced by the switch that was
      initialized, and a later access to the sysfs attributes of that switch
      causes an oops.
      
      To fix this, add a kref_get() call to dsa_get_dst().
      
      Fixes: 83c0afae ("net: dsa: Add new binding implementation")
      Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
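
The ownership rule described above (both the lookup path and the create path must hand the caller a reference) can be modelled with a plain counter. This is a user-space sketch under that assumption; `struct tree` and the `tree_*` helpers are hypothetical names, not the DSA or kref API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of a refcounted tree object. */
struct tree {
    int refcount;   /* 0 means the object has been released */
    int id;
};

/* Creating an object hands the caller one reference. */
static void tree_create(struct tree *t, int id)
{
    t->id = id;
    t->refcount = 1;
}

/* Lookup also takes a reference, so both paths leave the caller
 * owning exactly one reference that it may later put. */
static struct tree *tree_lookup(struct tree *t, int id)
{
    if (t->refcount == 0 || t->id != id)
        return NULL;
    t->refcount++;
    return t;
}

/* Drop one reference; the object is dead once the count hits zero. */
static void tree_put(struct tree *t)
{
    assert(t->refcount > 0);
    t->refcount--;
}
```

With the increment in `tree_lookup()`, a second user that fails and puts its reference leaves the count at 1, so the first user's pointer stays valid. Without it, the same put would drop the count to 0 while the first user still holds the object.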
    • net: handle no dst on skb in icmp6_send · 79dc7e3f
      David Ahern authored
      Andrey reported the following while fuzzing the kernel with syzkaller:
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 0 PID: 3859 Comm: a.out Not tainted 4.9.0-rc6+ #429
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff8800666d4200 task.stack: ffff880067348000
      RIP: 0010:[<ffffffff833617ec>]  [<ffffffff833617ec>]
      icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451
      RSP: 0018:ffff88006734f2c0  EFLAGS: 00010206
      RAX: ffff8800666d4200 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000018
      RBP: ffff88006734f630 R08: ffff880064138418 R09: 0000000000000003
      R10: dffffc0000000000 R11: 0000000000000005 R12: 0000000000000000
      R13: ffffffff84e7e200 R14: ffff880064138484 R15: ffff8800641383c0
      FS:  00007fb3887a07c0(0000) GS:ffff88006cc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000000 CR3: 000000006b040000 CR4: 00000000000006f0
      Stack:
       ffff8800666d4200 ffff8800666d49f8 ffff8800666d4200 ffffffff84c02460
       ffff8800666d4a1a 1ffff1000ccdaa2f ffff88006734f498 0000000000000046
       ffff88006734f440 ffffffff832f4269 ffff880064ba7456 0000000000000000
      Call Trace:
       [<ffffffff83364ddc>] icmpv6_param_prob+0x2c/0x40 net/ipv6/icmp.c:557
       [<     inline     >] ip6_tlvopt_unknown net/ipv6/exthdrs.c:88
       [<ffffffff83394405>] ip6_parse_tlv+0x555/0x670 net/ipv6/exthdrs.c:157
       [<ffffffff8339a759>] ipv6_parse_hopopts+0x199/0x460 net/ipv6/exthdrs.c:663
       [<ffffffff832ee773>] ipv6_rcv+0xfa3/0x1dc0 net/ipv6/ip6_input.c:191
       ...
      
      icmp6_send / icmpv6_send is invoked for both rx and tx paths. In both
      cases the dst->dev should be preferred for determining the L3 domain
      if the dst has been set on the skb. Fall back to skb->dev if it has
      not. This covers the case reported here where icmp6_send is invoked on
      Rx before the route lookup.
      
      Fixes: 5d41ce29 ("net: icmp6_send should use dst dev to determine L3 domain")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
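
The selection rule in the last paragraph of that message (prefer the dst's device, fall back to the skb's device when no dst is attached) can be sketched as follows. The `mock_*` types and `l3_domain_dev()` are hypothetical and only mirror the shape of the decision, not the kernel's skb or dst structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel objects involved. */
struct mock_dev { int ifindex; };
struct mock_dst { struct mock_dev *dev; };
struct mock_skb {
    struct mock_dst *dst;   /* NULL until a route lookup has run */
    struct mock_dev *dev;   /* device the packet is associated with */
};

/* Prefer the route's device when a dst is attached; otherwise fall
 * back to the skb's own device instead of dereferencing a NULL dst. */
static struct mock_dev *l3_domain_dev(const struct mock_skb *skb)
{
    if (skb->dst)
        return skb->dst->dev;
    return skb->dev;
}
```

The Rx case from the report corresponds to `dst == NULL`: the helper returns `skb->dev` rather than faulting.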
    • Merge branch 'mlx4-fixes' · 2fc8d112
      David S. Miller authored
      Tariq Toukan says:
      
      ====================
      mlx4 bug fixes for 4.9
      
      This patchset includes 2 bug fixes:
      * In patch 1 we revert the commit that avoids invoking unregister_netdev
      in shutdown flow, as it introduces netdev presence issues where
      it can be accessed unsafely by ndo operations during the flow.
      * Patch 2 is a simple fix for a variable uninitialization issue.
      
      Series generated against net commit:
      6998cc6e tipc: resolve connection flow control compatibility problem
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4: Fix uninitialized fields in rule when adding promiscuous mode to device managed flow steering · 44b911e7
      Jack Morgenstein authored
      In procedure mlx4_flow_steer_promisc_add(), several fields
      were left uninitialized in the rule structure.
      Correctly initialize these fields.
      
      Fixes: 592e49dd ("net/mlx4: Implement promiscuous mode with device managed flow-steering")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
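
One common way to avoid leaving structure fields uninitialized in C is a designated initializer, which zero-initializes every member that is not named. The sketch below assumes a made-up `struct mock_rule`; all of its field names are hypothetical and are not taken from the mlx4 driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rule structure; field names are illustrative only. */
struct mock_rule {
    int queue_mode;
    int exclusive;
    int allow_loopback;
    int promisc_mode;
    int port;
    uint32_t qpn;
    int priority;
};

/* Members not named in a designated initializer are zero-initialized,
 * so no field of the returned rule holds stack garbage. */
static struct mock_rule make_promisc_rule(int port, uint32_t qpn)
{
    struct mock_rule rule = {
        .promisc_mode = 1,
        .port = port,
        .qpn = qpn,
    };
    return rule;
}
```

Declaring `struct mock_rule rule;` and assigning only three members would leave the other four indeterminate, which is the class of bug the commit describes.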
    • Revert "net/mlx4_en: Avoid unregister_netdev at shutdown flow" · b4353708
      Tariq Toukan authored
      This reverts commit 9d769311.
      
      Using unregister_netdev at shutdown flow prevents calling
      the netdev's ndos or trying to access its freed resources.
      
      This fixes crashes like the following:
       Call Trace:
        [<ffffffff81587a6e>] dev_get_phys_port_id+0x1e/0x30
        [<ffffffff815a36ce>] rtnl_fill_ifinfo+0x4be/0xff0
        [<ffffffff815a53f3>] rtmsg_ifinfo_build_skb+0x73/0xe0
        [<ffffffff815a5476>] rtmsg_ifinfo.part.27+0x16/0x50
        [<ffffffff815a54c8>] rtmsg_ifinfo+0x18/0x20
        [<ffffffff8158a6c6>] netdev_state_change+0x46/0x50
        [<ffffffff815a5e78>] linkwatch_do_dev+0x38/0x50
        [<ffffffff815a6165>] __linkwatch_run_queue+0xf5/0x170
        [<ffffffff815a6205>] linkwatch_event+0x25/0x30
        [<ffffffff81099a82>] process_one_work+0x152/0x400
        [<ffffffff8109a325>] worker_thread+0x125/0x4b0
        [<ffffffff8109a200>] ? rescuer_thread+0x350/0x350
        [<ffffffff8109fc6a>] kthread+0xca/0xe0
        [<ffffffff8109fba0>] ? kthread_park+0x60/0x60
        [<ffffffff816a1285>] ret_from_fork+0x25/0x30
      
      Fixes: 9d769311 ("net/mlx4_en: Avoid unregister_netdev at shutdown flow")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Reported-by: Steve Wise <swise@opengridcomputing.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: Export tc_tunnel_key so its UAPI accessible · faa1fa54
      Roi Dayan authored
      Export tc_tunnel_key so it can be used from user space.
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Amir Vadai <amir@vadai.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • amd-xgbe: Fix unused suspend handlers build warning · 91eefaab
      Borislav Petkov authored
      Fix:
      
        drivers/net/ethernet/amd/xgbe/xgbe-main.c:835:12: warning: ‘xgbe_suspend’ defined
          but not used [-Wunused-function]
        drivers/net/ethernet/amd/xgbe/xgbe-main.c:855:12: warning: ‘xgbe_resume’ defined
          but not used [-Wunused-function]
      
      I see it during randconfig builds here.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'fix-RTL8211F-TX-delay-handling' · 68c1644f
      David S. Miller authored
      Martin Blumenstingl says:
      
      ====================
      net: phy: realtek: fix RTL8211F TX-delay handling
      
      The RTL8211F PHY driver currently enables the TX-delay only when the
      phy-mode is PHY_INTERFACE_MODE_RGMII. This is incorrect, because there
      are three RGMII variations of the phy-mode which explicitly request the
      PHY to enable the RX and/or TX delay, while PHY_INTERFACE_MODE_RGMII
      specifies that the PHY should disable the RX and/or TX delays.
      
      In addition to the RTL8211F PHY driver change, this series contains a small
      update to the phy-mode documentation to clarify the purpose of the
      RGMII phy-modes.
      While this may not be perfect yet it's at least a start. Please feel
      free to drop this patch from this series and send an improved version
      yourself.
      
      These patches are the results of recent discussions, see [0]
      
      [0] http://lists.infradead.org/pipermail/linux-amlogic/2016-November/001688.html
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: realtek: fix enabling of the TX-delay for RTL8211F · e3230494
      Martin Blumenstingl authored
      The old logic always enabled the TX-delay when the phy-mode was set to
      PHY_INTERFACE_MODE_RGMII. There are dedicated phy-modes which tell the
      PHY driver to enable the RX and/or TX delays:
      - PHY_INTERFACE_MODE_RGMII should disable the RX and TX delay in the
        PHY (if required, the MAC should add the delays in this case)
      - PHY_INTERFACE_MODE_RGMII_ID should enable RX and TX delay in the PHY
      - PHY_INTERFACE_MODE_RGMII_TXID should enable the TX delay in the PHY
      - PHY_INTERFACE_MODE_RGMII_RXID should enable the RX delay in the PHY
        (currently not supported by RTL8211F)
      
      With this patch we enable the TX delay for PHY_INTERFACE_MODE_RGMII_ID
      and PHY_INTERFACE_MODE_RGMII_TXID.
      Additionally, we now explicitly disable the TX-delay, which seems to be
      enabled automatically after a hard-reset of the PHY (by triggering its
      reset pin), to get a consistent state (as defined by the phy-mode).
      
      This fixes a compatibility problem with some SoCs where the TX-delay was
      also added by the MAC. With the TX-delay being applied twice the TX
      clock was off and TX traffic was broken or very slow (<10Mbit/s) on
      1000Mbit/s links.
      Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
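
The per-mode decision listed in that message reduces to a small predicate. The sketch below is a user-space model only: the `MOCK_*` enum and `phy_should_enable_tx_delay()` are hypothetical names standing in for the kernel's `PHY_INTERFACE_MODE_*` values and the driver's config routine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the four RGMII phy-modes named above. */
enum mock_phy_mode {
    MOCK_RGMII,        /* PHY adds no delay */
    MOCK_RGMII_ID,     /* PHY adds RX and TX delay */
    MOCK_RGMII_RXID,   /* PHY adds RX delay only */
    MOCK_RGMII_TXID    /* PHY adds TX delay only */
};

/* The PHY enables its TX delay only for the modes that ask for it.
 * Every other mode gets an explicit "off", so the state after a hard
 * reset does not leak through. */
static bool phy_should_enable_tx_delay(enum mock_phy_mode mode)
{
    return mode == MOCK_RGMII_ID || mode == MOCK_RGMII_TXID;
}
```

A driver built this way would write the delay bit in both directions (set or clear) from the predicate's result, rather than only setting it in one branch.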
    • Documentation: devicetree: clarify usage of the RGMII phy-modes · e5f3a4a5
      Martin Blumenstingl authored
      RGMII requires special RX and/or TX delays depending on the actual
      hardware circuit/wiring. These delays can be added by the MAC, the PHY
      or the designer of the circuit (the latter means that no delay has to
      be added by PHY or MAC).
      There are 4 RGMII phy-modes used to describe where a delay should be
      applied:
      - rgmii: the RX and TX delays are either added by the MAC (where the
        exact delay is typically configurable, and can be turned off when no
        extra delay is needed) or not needed at all (because the hardware
        wiring adds the delay already). The PHY should neither add the RX nor
        TX delay in this case.
      - rgmii-rxid: configures the PHY to enable the RX delay. The MAC should
        not add the RX delay in this case.
      - rgmii-txid: configures the PHY to enable the TX delay. The MAC should
        not add the TX delay in this case.
      - rgmii-id: combines rgmii-rxid and rgmii-txid and thus configures the
        PHY to enable the RX and TX delays. The MAC should neither add the RX
        nor TX delay in this case.
      
      Document these cases in the ethernet.txt documentation to make it clear
      when to use each mode.
      If applied incorrectly one might end up with MAC and PHY both enabling
      for example the TX delay, which breaks ethernet TX traffic on 1000Mbit/s
      links.
      Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
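
The four phy-mode strings and the "delay applied twice" failure mode can be expressed as a small consistency check. This is a sketch, assuming a board description that also records whether the MAC adds a TX delay; `struct phy_delays`, `phy_delays_for_mode()` and `tx_delay_applied_twice()` are hypothetical helpers, not kernel or device-tree APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Which delays the PHY is asked to add for a given phy-mode string. */
struct phy_delays { bool rx; bool tx; };

static struct phy_delays phy_delays_for_mode(const char *phy_mode)
{
    struct phy_delays d = { false, false };   /* plain "rgmii": none */

    if (strcmp(phy_mode, "rgmii-id") == 0) {
        d.rx = true;
        d.tx = true;
    } else if (strcmp(phy_mode, "rgmii-rxid") == 0) {
        d.rx = true;
    } else if (strcmp(phy_mode, "rgmii-txid") == 0) {
        d.tx = true;
    }
    return d;
}

/* True when both the MAC and the PHY would add the TX delay, which is
 * the misconfiguration that breaks TX traffic on 1000Mbit/s links. */
static bool tx_delay_applied_twice(const char *phy_mode, bool mac_adds_tx_delay)
{
    return mac_adds_tx_delay && phy_delays_for_mode(phy_mode).tx;
}
```

For example, a MAC that adds its own TX delay is consistent with "rgmii" or "rgmii-rxid", but conflicts with "rgmii-txid" and "rgmii-id".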
    • net, sched: respect rcu grace period on cls destruction · d9363774
      Daniel Borkmann authored
      Roi reported a crash in flower where tp->root was NULL in ->classify()
      callbacks. Reason is that in ->destroy() tp->root is set to NULL via
      RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
      this doesn't respect RCU grace period for them, and as a result, still
      outstanding readers from tc_classify() will try to blindly dereference
      a NULL tp->root.
      
      The tp->root object is strictly private to the classifier implementation
      and holds internal data the core such as tc_ctl_tfilter() doesn't know
      about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
      is only checked for NULL in ->get() callback, but nowhere else. This is
      misleading and seemed to be copied from old classifier code that was not
      cleaned up properly. For example, d3fa76ee ("[NET_SCHED]: cls_basic:
      fix NULL pointer dereference") moved tp->root initialization into ->init()
      routine, where before it was part of ->change(), so ->get() had to deal
      with tp->root being NULL back then, so that was indeed a valid case, after
      d3fa76ee, not really anymore. We used to set tp->root to NULL long
      ago in ->destroy(), see 47a1a1d4 ("pkt_sched: remove unnecessary xchg()
      in packet classifiers"); but the NULLifying was reintroduced with the
      RCUification, but it's not correct for every classifier implementation.
      
      In the cases that are fixed here with one exception of cls_cgroup, tp->root
      object is allocated and initialized inside ->init() callback, which is always
      performed at a point in time after we allocate a new tp, which means tp and
      thus tp->root was not globally visible in the tp chain yet (see tc_ctl_tfilter()).
      Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
      handler, same for the tp which is kfree_rcu()'ed right when we return
      from ->destroy() in tcf_destroy(). This means, the head object's lifetime
      for such classifiers is always tied to the tp lifetime. The RCU callback
      invocation for the two kfree_rcu() could be out of order, but that's fine
      since both are independent.
      
      Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
      means that 1) we don't need a useless NULL check in the fast path, and 2)
      outstanding readers of that tp in tc_classify() can still execute while
      respecting the RCU grace period, as is actually expected.
      
      Things that haven't been touched here: cls_fw and cls_route. They each
      handle tp->root being NULL in ->classify() path for historic reasons, so
      their ->destroy() implementation can stay as is. If someone actually
      cares, they could get cleaned up at some point to avoid the test in fast
      path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
      !head check should anyone actually be using/testing it, so it at least aligns
      with cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
      destruction (to a sleepable context) after RCU grace period as concurrent
      readers might still access it. (Note that in this case we need to hold module
      reference to keep work callback address intact, since we only wait on module
      unload for all call_rcu()s to finish.)
      
      This fixes one race to bring RCU grace period guarantees back. Next step
      as worked on by Cong however is to fix 1e052be6 ("net_sched: destroy
      proto tp when all filters are gone") to get the order of unlinking the tp
      in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
      RCU_INIT_POINTER() before tcf_destroy() and let the notification for
      removal be done through the prior ->delete() callback. Both are independent
      issues. Once we have that right, we can then clean tp->root up for a number
      of classifiers by not making them RCU pointers, which requires a new callback
      (->uninit) that is triggered from tp's RCU callback, where we just kfree()
      tp->root from there.
      
      Fixes: 1f947bf1 ("net: sched: rcu'ify cls_bpf")
      Fixes: 9888faef ("net: sched: cls_basic use RCU")
      Fixes: 70da9f0b ("net: sched: cls_flow use RCU")
      Fixes: 77b9900e ("tc: introduce Flower classifier")
      Fixes: bf3994d2 ("net/sched: introduce Match-all classifier")
      Fixes: 952313bd ("net: sched: cls_cgroup use RCU")
      Reported-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Roi Dayan <roid@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix link statistics counter errors · 95901122
      Jon Paul Maloy authored
      In commit e4bf4f76 ("tipc: simplify packet sequence number
      handling") we changed the internal representation of the packet
      sequence number counters from u32 to u16, reflecting what is really
      sent over the wire.
      
      Since then some link statistics counters have been displaying incorrect
      values, partially because the counters meant to be used as sequence
      number snapshots are now used as direct counters, stored as u32, and
      partially because some counter updates are just missing in the code.
      
      In this commit we correct this in two ways. First, we base the
      displayed packet sent/received values on direct counters instead
      of as previously a calculated difference between current sequence
      number and a snapshot. Second, we add the missing updates of the
      counters.
      
      This change is compatible with the current netlink API, and requires
      no changes to the user space tools.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
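
One way to see why a 16-bit sequence number is a poor basis for packet statistics is to compare a "current minus snapshot" calculation with a direct 32-bit counter. The sketch below is an illustration of that arithmetic only, not the exact failure in the TIPC code; `struct mock_link_stats` and its helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical link state: a wrapping on-wire sequence number, a
 * snapshot of it taken at the last statistics reset, and a direct
 * packet counter. */
struct mock_link_stats {
    uint16_t snd_nxt;     /* wraps at 65536 */
    uint16_t snapshot;    /* snd_nxt when stats were last reset */
    uint32_t sent_pkts;   /* incremented once per packet sent */
};

/* Account for n packets in both representations. */
static void mock_send(struct mock_link_stats *s, uint32_t n)
{
    s->snd_nxt = (uint16_t)(s->snd_nxt + n);
    s->sent_pkts += n;
}

/* Derived value: loses every multiple of 65536. */
static uint32_t sent_by_difference(const struct mock_link_stats *s)
{
    return (uint16_t)(s->snd_nxt - s->snapshot);
}

/* Direct value: exact until the 32-bit counter itself wraps. */
static uint32_t sent_by_counter(const struct mock_link_stats *s)
{
    return s->sent_pkts;
}
```

After 70000 packets the difference-based figure reports 70000 mod 65536 = 4464, while the direct counter still reports 70000.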
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 8eb4adf6
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2016-11-25
      
      1) Fix a refcount leak in vti6.
         From Nicolas Dichtel.
      
      2) Fix a wrong if statement in xfrm_sk_policy_lookup.
         From Florian Westphal.
      
      3) The flowcache watermarks are per cpu. Take this into
         account when comparing to the threshold where we
         refuse new allocations. From Miroslav Urbanek.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • driver: macvtap: Unregister netdev rx_handler if macvtap_newlink fails · e824265d
      Gao Feng authored
      macvtap_newlink() registers the netdev rx_handler first, but it
      does not unregister the handler if macvlan_common_newlink() fails.
      Signed-off-by: Gao Feng <fgao@ikuai8.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
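
The fix follows the usual "undo what you registered when a later step fails" pattern. The sketch below models it in user space; every `mock_*` name is hypothetical and merely stands in for the handler registration and the common newlink step mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state: only tracks whether a handler is set. */
struct mock_netdev { bool rx_handler_registered; };

static int mock_register_rx_handler(struct mock_netdev *dev)
{
    if (dev->rx_handler_registered)
        return -1;              /* already taken */
    dev->rx_handler_registered = true;
    return 0;
}

static void mock_unregister_rx_handler(struct mock_netdev *dev)
{
    dev->rx_handler_registered = false;
}

/* Stand-in for the later setup step, which may fail. */
static int mock_common_newlink(bool should_fail)
{
    return should_fail ? -1 : 0;
}

/* Register first, then run the common step. On failure, roll the
 * registration back so the device is left as it was found. */
static int mock_newlink(struct mock_netdev *dev, bool common_fails)
{
    int err = mock_register_rx_handler(dev);

    if (err)
        return err;

    err = mock_common_newlink(common_fails);
    if (err) {
        mock_unregister_rx_handler(dev);
        return err;
    }
    return 0;
}
```

Without the rollback, a failed attempt would leave the handler registered, and a retry on the same device would be rejected at the registration step.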
    • Merge branch 'more-phydev-leaks' · a1cad5ee
      David S. Miller authored
      Johan Hovold says:
      
      ====================
      net: fix phydev reference leaks
      
      This series fixes a number of phydev reference leaks (and one of_node
      leak) due to failure to put the reference taken by of_phy_find_device().
      
      Note that I did not try to fix drivers/net/phy/xilinx_gmii2rgmii.c which
      still leaks a reference.
      
      Against net but should apply just as fine to net-next.
      
      v2:
       - use put_device() instead of phy_dev_free() to put the references
         taken in net/dsa (patch 1/4).
       - add four new patches fixing similar leaks
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
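
The leak class this series addresses is a lookup that returns an object with an extra reference which the caller never drops. A minimal user-space model, assuming a lookup that increments a counter and a matching put, looks like this; `mock_find_phy()`, `mock_put_phy()` and `mock_attach()` are hypothetical and do not reproduce the `of_phy_find_device()` or `put_device()` signatures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted PHY object. */
struct mock_phy { int refs; };

/* Lookup returns the object with one extra reference held. */
static struct mock_phy *mock_find_phy(struct mock_phy *p)
{
    p->refs++;
    return p;
}

/* Drop the reference taken by the lookup. */
static void mock_put_phy(struct mock_phy *p)
{
    p->refs--;
}

/* The lookup reference is dropped on the success path and on the
 * error path, so the count is unchanged after the call returns. */
static int mock_attach(struct mock_phy *p, bool setup_fails)
{
    struct mock_phy *phy = mock_find_phy(p);
    int err = setup_fails ? -1 : 0;

    mock_put_phy(phy);
    return err;
}
```

A quick check for this kind of bug is that the reference count before and after the call is identical on every exit path.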