1. 29 Oct, 2017 12 commits
    • Cong Wang's avatar
      net_sched: use tcf_queue_work() in basic filter · c96a4838
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c96a4838
    • Cong Wang's avatar
      net_sched: introduce a workqueue for RCU callbacks of tc filter · 7aa0045d
      Cong Wang authored
      This patch introduces a dedicated workqueue for tc filters
      so that each tc filter's RCU callback could defer their
      action destroy work to this workqueue. The helper
      tcf_queue_work() is introduced for them to use.
      
      Because we hold RTNL lock when calling tcf_block_put(), we
      can not simply flush works inside it, therefore we have to
      defer it again to this workqueue and make sure all flying RCU
      callbacks have already queued their work before this one, in
      other words, to ensure this is the last one to execute to
      prevent any use-after-free.
      
      On the other hand, this makes tcf_block_put() ugly and
      harder to understand. Since David and Eric strongly dislike
      adding synchronize_rcu(), this is probably the only
      solution that could make everyone happy.
      
      Please also see the code comments below.
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7aa0045d
    • David S. Miller's avatar
      Merge branch 'sctp-endianness-fixes' · 8c83c885
      David S. Miller authored
      Xin Long says:
      
      ====================
      sctp: a bunch of fixes for some sparse warnings
      
      As Eric noticed, when running 'make C=2 M=net/sctp/', a plenty of
      warnings or errors checked by sparse appear. They are all problems
      about Endian and type cast.
      
      Most of them are just warnings by which no issues could be caused
      while some might be bugs.
      
      This patchset fixes them with four patches basically according to
      how they are introduced.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8c83c885
    • Xin Long's avatar
      sctp: fix some type cast warnings introduced since very beginning · 978aa047
      Xin Long authored
      These warnings were found by running 'make C=2 M=net/sctp/'.
      They are there since very beginning.
      
      Note after this patch, there still one warning left in
      sctp_outq_flush():
        sctp_chunk_fail(chunk, SCTP_ERROR_INV_STRM)
      
      Since it has been moved to sctp_stream_outq_migrate on net-next,
      to avoid the extra job when merging net-next to net, I will post
      the fix for it after the merging is done.
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      978aa047
    • Xin Long's avatar
      sctp: fix a type cast warnings that causes a_rwnd gets the wrong value · f6fc6bc0
      Xin Long authored
      These warnings were found by running 'make C=2 M=net/sctp/'.
      
      Commit d4d6fb57 ("sctp: Try not to change a_rwnd when faking a
      SACK from SHUTDOWN.") expected to use the peers old rwnd and add
      our flight size to the a_rwnd. But with the wrong Endian, it may
      not work as well as expected.
      
      So fix it by converting to the right value.
      
      Fixes: d4d6fb57 ("sctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN.")
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f6fc6bc0
    • Xin Long's avatar
      sctp: fix some type cast warnings introduced by transport rhashtable · 8d32503e
      Xin Long authored
      These warnings were found by running 'make C=2 M=net/sctp/'.
      
      They are introduced by not aware of Endian for the port when
      coding transport rhashtable patches.
      
      Fixes: 7fda702f ("sctp: use new rhlist interface on sctp transport rhashtable")
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8d32503e
    • Xin Long's avatar
      sctp: fix some type cast warnings introduced by stream reconf · 1da4fc97
      Xin Long authored
      These warnings were found by running 'make C=2 M=net/sctp/'.
      
      They are introduced by not aware of Endian when coding stream
      reconf patches.
      
      Since commit c0d8bab6 ("sctp: add get and set sockopt for
      reconf_enable") enabled stream reconf feature for users, the
      Fixes tag below would use it.
      
      Fixes: c0d8bab6 ("sctp: add get and set sockopt for reconf_enable")
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1da4fc97
    • Cong Wang's avatar
      net_sched: avoid matching qdisc with zero handle · 50317fce
      Cong Wang authored
      Davide found the following script triggers a NULL pointer
      dereference:
      
      ip l a name eth0 type dummy
      tc q a dev eth0 parent :1 handle 1: htb
      
      This is because for a freshly created netdevice noop_qdisc
      is attached and when passing 'parent :1', kernel actually
      tries to match the major handle which is 0 and noop_qdisc
      has handle 0 so is matched by mistake. Commit 69012ae4
      tries to fix a similar bug but still misses this case.
      
      Handle 0 is not a valid one, should be just skipped. In
      fact, kernel uses it as TC_H_UNSPEC.
      
      Fixes: 69012ae4 ("net: sched: fix handling of singleton qdiscs with qdisc_hash")
      Fixes: 59cc1f61 ("net: sched:convert qdisc linked list to hashtable")
      Reported-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      50317fce
    • Xin Long's avatar
      sctp: reset owner sk for data chunks on out queues when migrating a sock · d04adf1b
      Xin Long authored
      Now when migrating sock to another one in sctp_sock_migrate(), it only
      resets owner sk for the data in receive queues, not the chunks on out
      queues.
      
      It would cause that data chunks length on the sock is not consistent
      with sk sk_wmem_alloc. When closing the sock or freeing these chunks,
      the old sk would never be freed, and the new sock may crash due to
      the overflow sk_wmem_alloc.
      
      syzbot found this issue with this series:
      
        r0 = socket$inet_sctp()
        sendto$inet(r0)
        listen(r0)
        accept4(r0)
        close(r0)
      
      Although listen() should have returned error when one TCP-style socket
      is in connecting (I may fix this one in another patch), it could also
      be reproduced by peeling off an assoc.
      
      This issue is there since very beginning.
      
      This patch is to reset owner sk for the chunks on out queues so that
      sk sk_wmem_alloc has correct value after accept one sock or peeloff
      an assoc to one sock.
      
      Note that when resetting owner sk for chunks on outqueue, it has to
      sctp_clear_owner_w/skb_orphan chunks before changing assoc->base.sk
      first and then sctp_set_owner_w them after changing assoc->base.sk,
      due to that sctp_wfree and it's callees are using assoc->base.sk.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d04adf1b
    • David S. Miller's avatar
      Merge branch 'sockmap-fixes' · 151516fa
      David S. Miller authored
      John Fastabend says:
      
      ====================
      net: sockmap fixes
      
      Last two fixes (as far as I know) for sockmap code this round.
      
      First, we are using the qdisc cb structure when making the data end
      calculation. This is really just wrong so, store it with the other
      metadata in the correct tcp_skb_cb sturct to avoid breaking things.
      
      Next, with recent work to attach multiple programs to a cgroup a
      specific enumeration of return codes was agreed upon. However,
      I wrote the sk_skb program types before seeing this work and used
      a different convention. Patch 2 in the series aligns the return
      codes to avoid breaking with this infrastructure and also aligns
      with other programming conventions to avoid being the odd duck out
      forcing programs to remember SK_SKB programs are different. Pusing
      to net because its a user visible change. With this SK_SKB program
      return codes are the same as other cgroup program types.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      151516fa
    • John Fastabend's avatar
      bpf: rename sk_actions to align with bpf infrastructure · bfa64075
      John Fastabend authored
      Recent additions to support multiple programs in cgroups impose
      a strict requirement, "all yes is yes, any no is no". To enforce
      this the infrastructure requires the 'no' return code, SK_DROP in
      this case, to be 0.
      
      To apply these rules to SK_SKB program types the sk_actions return
      codes need to be adjusted.
      
      This fix adds SK_PASS and makes 'SK_DROP = 0'. Finally, remove
      SK_ABORTED to remove any chance that the API may allow aborted
      program flows to be passed up the stack. This would be incorrect
      behavior and allow programs to break existing policies.
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bfa64075
    • John Fastabend's avatar
      bpf: bpf_compute_data uses incorrect cb structure · 8108a775
      John Fastabend authored
      SK_SKB program types use bpf_compute_data to store the end of the
      packet data. However, bpf_compute_data assumes the cb is stored in the
      qdisc layer format. But, for SK_SKB this is the wrong layer of the
      stack for this type.
      
      It happens to work (sort of!) because in most cases nothing happens
      to be overwritten today. This is very fragile and error prone.
      Fortunately, we have another hole in tcp_skb_cb we can use so lets
      put the data_end value there.
      
      Note, SK_SKB program types do not use data_meta, they are failed by
      sk_skb_is_valid_access().
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8108a775
  2. 28 Oct, 2017 3 commits
    • Girish Moodalbail's avatar
      tap: reference to KVA of an unloaded module causes kernel panic · dea6e19f
      Girish Moodalbail authored
      The commit 9a393b5d ("tap: tap as an independent module") created a
      separate tap module that implements tap functionality and exports
      interfaces that will be used by macvtap and ipvtap modules to create
      create respective tap devices.
      
      However, that patch introduced a regression wherein the modules macvtap
      and ipvtap can be removed (through modprobe -r) while there are
      applications using the respective /dev/tapX devices. These applications
      cause kernel to hold reference to /dev/tapX through 'struct cdev
      macvtap_cdev' and 'struct cdev ipvtap_dev' defined in macvtap and ipvtap
      modules respectively. So,  when the application is later closed the
      kernel panics because we are referencing KVA that is present in the
      unloaded modules.
      
      ----------8<------- Example ----------8<----------
      $ sudo ip li add name mv0 link enp7s0 type macvtap
      $ sudo ip li show mv0 |grep mv0| awk -e '{print $1 $2}'
        14:mv0@enp7s0:
      $ cat /dev/tap14 &
      $ lsmod |egrep -i 'tap|vlan'
      macvtap                16384  0
      macvlan                24576  1 macvtap
      tap                    24576  3 macvtap
      $ sudo modprobe -r macvtap
      $ fg
      cat /dev/tap14
      ^C
      
      <...system panics...>
      BUG: unable to handle kernel paging request at ffffffffa038c500
      IP: cdev_put+0xf/0x30
      ----------8<-----------------8<----------
      
      The fix is to set cdev.owner to the module that creates the tap device
      (either macvtap or ipvtap). With this set, the operations (in
      fs/char_dev.c) on char device holds and releases the module through
      cdev_get() and cdev_put() and will not allow the module to unload
      prematurely.
      
      Fixes: 9a393b5d (tap: tap as an independent module)
      Signed-off-by: default avatarGirish Moodalbail <girish.moodalbail@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dea6e19f
    • Eric Dumazet's avatar
      tcp: refresh tp timestamp before tcp_mtu_probe() · ee1836ae
      Eric Dumazet authored
      In the unlikely event tcp_mtu_probe() is sending a packet, we
      want tp->tcp_mstamp being as accurate as possible.
      
      This means we need to call tcp_mstamp_refresh() a bit earlier in
      tcp_write_xmit().
      
      Fixes: 385e2070 ("tcp: use tp->tcp_mstamp in output path")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ee1836ae
    • Jason Wang's avatar
      tuntap: properly align skb->head before building skb · 63b9ab65
      Jason Wang authored
      An unaligned alloc_frag->offset caused by previous allocation will
      result an unaligned skb->head. This will lead unaligned
      skb_shared_info and then unaligned dataref which requires to be
      aligned for accessing on some architecture. Fix this by aligning
      alloc_frag->offset before the frag refilling.
      
      Fixes: 0bbd7dad ("tun: make tun_build_skb() thread safe")
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Cc: Wei Wei <dotweiba@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Reported-by: default avatarWei Wei <dotweiba@gmail.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      63b9ab65
  3. 27 Oct, 2017 8 commits
    • David S. Miller's avatar
      Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · 8ab190fb
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2017-10-26
      
      This series contains fixes to e1000, igb, ixgbe and i40e.
      
      Vincenzo Maffione fixes a potential race condition which would result in
      the interface being up but transmits are disabled in the hardware.
      
      Colin Ian King fixes a possible NULL pointer dereference in e1000, which
      was found by Coverity.
      
      Jean-Philippe Brucker fixes a possible kernel panic when a driver cannot
      map a transmit buffer, which is caused by an erroneous test.
      
      Alex provides a fix for ixgbe, which is a partial revert of the commit
      ffed21bc ("ixgbe: Don't bother clearing buffer memory for descriptor rings")
      because the previous commit messed up the exception handling path by
      adding the count back in when we did not need to.  Also fixed a typo,
      where the transmit ITR setting was being used to determine if we were
      using adaptive receive interrupt moderation or not.  Lastly, fixed a
      memory leak by including programming descriptors in the cleaned count.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8ab190fb
    • Xin Long's avatar
      ip6_gre: update dst pmtu if dev mtu has been updated by toobig in __gre6_xmit · 8aec4959
      Xin Long authored
      When receiving a Toobig icmpv6 packet, ip6gre_err would just set
      tunnel dev's mtu, that's not enough. For skb_dst(skb)'s pmtu may
      still be using the old value, it has no chance to be updated with
      tunnel dev's mtu.
      
      Jianlin found this issue by reducing route's mtu while running
      netperf, the performance went to 0.
      
      ip6ip6 and ip4ip6 tunnel can work well with this, as they lookup
      the upper dst and update_pmtu it's pmtu or icmpv6_send a Toobig
      to upper socket after setting tunnel dev's mtu.
      
      We couldn't do that for ip6_gre, as gre's inner packet could be
      any protocol, it's difficult to handle them (like lookup upper
      dst) in a good way.
      
      So this patch is to fix it by updating skb_dst(skb)'s pmtu when
      dev->mtu < skb_dst(skb)'s pmtu in tx path. It's safe to do this
      update there, as usually dev->mtu <= skb_dst(skb)'s pmtu and no
      performance regression can be caused by this.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8aec4959
    • Xin Long's avatar
      ip6_gre: only increase err_count for some certain type icmpv6 in ip6gre_err · f8d20b46
      Xin Long authored
      The similar fix in patch 'ipip: only increase err_count for some
      certain type icmp in ipip_err' is needed for ip6gre_err.
      
      In Jianlin's case, udp netperf broke even when receiving a TooBig
      icmpv6 packet.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f8d20b46
    • Xin Long's avatar
      ipip: only increase err_count for some certain type icmp in ipip_err · f3594f0a
      Xin Long authored
      t->err_count is used to count the link failure on tunnel and an err
      will be reported to user socket in tx path if t->err_count is not 0.
      udp socket could even return EHOSTUNREACH to users.
      
      Since commit fd58156e ("IPIP: Use ip-tunneling code.") removed
      the 'switch check' for icmp type in ipip_err(), err_count would be
      increased by the icmp packet with ICMP_EXC_FRAGTIME code. an link
      failure would be reported out due to this.
      
      In Jianlin's case, when receiving ICMP_EXC_FRAGTIME a icmp packet,
      udp netperf failed with the err:
        send_data: data send error: No route to host (errno 113)
      
      We expect this error reported from tunnel to socket when receiving
      some certain type icmp, but not ICMP_EXC_FRAGTIME, ICMP_SR_FAILED
      or ICMP_PARAMETERPROB ones.
      
      This patch is to bring 'switch check' for icmp type back to ipip_err
      so that it only reports link failure for the right type icmp, just as
      in ipgre_err() and ipip6_err().
      
      Fixes: fd58156e ("IPIP: Use ip-tunneling code.")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f3594f0a
    • Jose Abreu's avatar
      net: stmmac: First Queue must always be in DCB mode · 6d9f0790
      Jose Abreu authored
      According to DWMAC databook the first queue operating mode
      must always be in DCB.
      
      As MTL_QUEUE_DCB = 1, we need to always set the first queue
      operating mode to DCB otherwise driver will think that queue
      is in AVB mode (because MTL_QUEUE_AVB = 0).
      Signed-off-by: default avatarJose Abreu <joabreu@synopsys.com>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6d9f0790
    • Jose Abreu's avatar
      net: stmmac: dwc-qos-eth: Fix typo in DT bindings parsing · 4894ac6b
      Jose Abreu authored
      According to DT bindings documentation we are expecting a
      property called "snps,read-requests" but we are parsing
      instead a property called "read,read-requests".
      
      This is clearly a typo. Fix it.
      Signed-off-by: default avatarJose Abreu <joabreu@synopsys.com>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4894ac6b
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2017-10-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 5be9541a
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2017-10-26
      
      The series includes some misc fixes for mlx5 core and etherent driver.
      Please pull and let me know if there's any problem.
      
      For -Stable:
      net/mlx5e: Properly deal with encap flows add/del under neigh update (kernels >= 4.12)
      net/mlx5: Fix health work queue spin lock to IRQ safe  (kernels >= 4.13)
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5be9541a
    • David S. Miller's avatar
      Merge tag 'mac80211-for-davem-2017-10-25' of... · 9618aec3
      David S. Miller authored
      Merge tag 'mac80211-for-davem-2017-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      pull-request: mac80211 2017-10-25
      
      Here are:
       * follow-up fixes for the WoWLAN security issue, to fix a
         partial TKIP key material problem and to use crypto_memneq()
       * a change for better enforcement of FQ's memory limit
       * a disconnect/connect handling fix, and
       * a user rate mask validation fix
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9618aec3
  4. 26 Oct, 2017 17 commits