1. 01 Jul, 2016 24 commits
    • Merge branch 'bpf-robustify' · 6bd3847b
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      Further robustify putting BPF progs
      
      This series addresses a potential issue reported to us by Jann Horn
      with regards to putting progs. First patch moves progs generally under
      RCU destruction and second patch refactors getting of progs to simplify
      code a bit. For details, please see individual patches. Note, we think
      that addressing this one in net-next should be sufficient.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: refactor bpf_prog_get and type check into helper · 113214be
      Daniel Borkmann authored
      Since bpf_prog_get() and program type check is used in a couple of places,
      refactor this into a small helper function that we can make use of. Since
      the non RO prog->aux part is not used in performance critical paths and a
      program destruction via RCU is very unlikely when doing the put, we
      shouldn't have an issue just doing the bpf_prog_get() + prog->type != type
      check, but actually not taking the ref at all (due to being in fdget() /
      fdput() section of the bpf fd) is even cleaner and makes the diff smaller
      as well, so just go for that. Callsites are changed to make use of the new
      helper where possible.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: generally move prog destruction to RCU deferral · 1aacde3d
      Daniel Borkmann authored
      Jann Horn reported following analysis that could potentially result
      in a very hard to trigger (if not impossible) UAF race, to quote his
      event timeline:
      
       - Set up a process with threads T1, T2 and T3
       - Let T1 set up a socket filter F1 that invokes another filter F2
         through a BPF map [tail call]
       - Let T1 trigger the socket filter via a unix domain socket write,
         don't wait for completion
       - Let T2 call PERF_EVENT_IOC_SET_BPF with F2, don't wait for completion
       - Now T2 should be behind bpf_prog_get(), but before bpf_prog_put()
       - Let T3 close the file descriptor for F2, dropping the reference
         count of F2 to 2
       - At this point, T1 should have looked up F2 from the map, but not
         finished executing it
       - Let T3 remove F2 from the BPF map, dropping the reference count of
         F2 to 1
       - Now T2 should call bpf_prog_put() (wrong BPF program type), dropping
         the reference count of F2 to 0 and scheduling bpf_prog_free_deferred()
         via schedule_work()
       - At this point, the BPF program could be freed
       - BPF execution is still running in a freed BPF program
      
      While at PERF_EVENT_IOC_SET_BPF time it's only guaranteed that the perf
      event fd we're doing the syscall on doesn't disappear from underneath us
      for whole syscall time, it may not be the case for the bpf fd used as
      an argument only after we did the put. It needs to be a valid fd pointing
      to a BPF program at the time of the call to make the bpf_prog_get() and
      while T2 gets preempted, F2 must have dropped reference to 1 on the other
      CPU. The fput() from the close() in T3 should also add additional delay
      to the reference drop via exit_task_work() when bpf_prog_release() gets
      called as well as scheduling bpf_prog_free_deferred().
      
      That said, it makes nevertheless sense to move the BPF prog destruction
      generally after RCU grace period to guarantee that such scenario above,
      but also others as recently fixed in ceb56070 ("bpf, perf: delay release
      of BPF prog after grace period") with regards to tail calls won't happen.
      Integrating bpf_prog_free_deferred() directly into the RCU callback is
      not allowed since the invocation might happen from either softirq or
      process context, so we're not permitted to block. Reviewing all bpf_prog_put()
      invocations from eBPF side (note, cBPF -> eBPF progs don't use this for
      their destruction) with call_rcu() looks good to me.
      
      Since we don't know whether at the time of attaching the program, we're
      already part of a tail call map, we need to use RCU variant. However, due
      to this, there won't be severely more stress on the RCU callback queue:
      situations with above bpf_prog_get() and bpf_prog_put() combo in practice
      normally won't lead to releases, but even if they would, enough effort/
      cycles have to be put into loading a BPF program into the kernel already.
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
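The idea behind the RCU deferral in 1aacde3d can be sketched in user space. This is only a toy model under stated assumptions: the reader counter stands in for an RCU read-side section, `grace_period_end()` stands in for the point at which call_rcu() callbacks may run, and every name below is hypothetical rather than a kernel API. What it shows is that dropping the last reference only queues the program for destruction, so a reader that is still executing it never sees freed memory.

```c
#include <assert.h>
#include <stddef.h>

/* User-space toy model; none of these names are kernel APIs. */
struct prog {
	int refcnt;
	int freed;          /* set instead of calling free(), so callers can inspect it */
	struct prog *next;  /* link in the deferred-destruction list */
};

static struct prog *deferred_head;
static int active_readers;  /* stand-in for "inside an RCU read-side section" */

static void reader_enter(void) { active_readers++; }
static void reader_exit(void)  { active_readers--; }

/* Drop a reference; on the last one, queue destruction instead of freeing now. */
static void prog_put(struct prog *p)
{
	if (--p->refcnt == 0) {
		p->next = deferred_head;
		deferred_head = p;
	}
}

/*
 * Model of a grace period ending: queued programs are destroyed only
 * when no reader is active. Returns the number of programs destroyed.
 */
static int grace_period_end(void)
{
	int destroyed = 0;

	if (active_readers > 0)
		return 0;
	while (deferred_head) {
		struct prog *p = deferred_head;

		deferred_head = p->next;
		p->freed = 1;
		destroyed++;
	}
	return destroyed;
}
```

In the timeline quoted above, T1 corresponds to a caller between `reader_enter()` and `reader_exit()`, and T2's final put corresponds to `prog_put()`; real RCU tracks readers very differently from a single counter.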
    • atm: horizon: Use setup_timer · 466fc793
      Amitoj Kaur Chawla authored
      Convert a call to init_timer and accompanying initializations of
      the timer's data and function fields to a call to setup_timer.
      
      The Coccinelle semantic patch that fixes this problem is
      as follows:
      @@
      expression t,d,f,e1;
      identifier x1;
      statement S1;
      @@
      
      (
      -t.data = d;
      |
      -t.function = f;
      |
      -init_timer(&t);
      +setup_timer(&t,f,d);
      |
      -init_timer_on_stack(&t);
      +setup_timer_on_stack(&t,f,d);
      )
      <... when != S1
      t.x1 = e1;
      ...>
      Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'qed-next' · e3cc6e37
      David S. Miller authored
      Manish Chopra says:
      
      ====================
      qede: Enhancements
      
      This patch series adds support for a few small fastpath
      features and includes some code refactoring.
      
      Note - regarding get/set tunable configuration via ethtool
      Surprisingly, there is NO ethtool application support for
      such configuration given that we have kernel support.
      Do let us know if we need to add support for that in user ethtool.
      
      Please consider applying this series to "net-next".
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qede: Utilize xmit_more · 312e0676
      Manish Chopra authored
      This patch uses the xmit_more optimization to reduce
      the number of TX doorbell writes per packet.
      Signed-off-by: Manish <manish.chopra@qlogic.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
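A minimal sketch of the xmit_more idea, assuming a simplified ring where the "doorbell" is just a counter. The struct, the `ring_full` parameter, and the function names are hypothetical and are not taken from the qede driver. The point is that the doorbell write is skipped while the stack signals that more packets follow, and is always issued for the last packet of a burst or when the ring cannot take more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical TX ring model; not the qede data structures. */
struct tx_ring {
	unsigned int prod;       /* producer index, advanced per queued packet */
	unsigned int hw_prod;    /* last producer value made visible to hardware */
	unsigned int doorbells;  /* number of simulated doorbell writes */
};

/* Simulated doorbell: publish the producer index to the "hardware". */
static void ring_doorbell(struct tx_ring *r)
{
	r->hw_prod = r->prod;
	r->doorbells++;
}

/*
 * Queue one packet. `more` models the xmit_more hint (another packet
 * follows immediately); `ring_full` forces a flush so queued packets
 * are never left unannounced when no further transmit call can come.
 */
static void tx_one(struct tx_ring *r, bool more, bool ring_full)
{
	r->prod++;
	if (!more || ring_full)
		ring_doorbell(r);
}
```

With a burst of four packets where only the last has `more == false`, this model performs one doorbell write instead of four.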
    • qede: qede_poll refactoring · c774169d
      Manish Chopra authored
      This patch cleans up the qede_poll() routine a bit
      and makes qede_poll() do a single iteration to handle
      TX completion [under heavy TX load, qede_poll() might
      run indefinitely in the while(1) loop for TX
      completion processing and cause the CPU to get stuck].
      Signed-off-by: Manish <manish.chopra@qlogic.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qede: Add support for handling IP fragmented packets. · c72a6125
      Manish Chopra authored
      When handling IP fragmented packets with csum in their
      transport header, the csum isn't changed as part of the
      fragmentation. As a result, the packet containing the
      transport headers would have the correct csum of the original
      packet, but one that mismatches the actual packet that
      passes on the wire. As a result, on receive path HW would
      give an indication that the packet has incorrect csum,
      which would cause qede to discard the incoming packet.
      
      Since HW also delivers a notification of IP fragments,
      change driver behavior to pass such incoming packets
      to stack and let it make the decision whether it needs
      to be dropped.
      Signed-off-by: Manish <manish.chopra@qlogic.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tun-skb_array' · beb528d0
      David S. Miller authored
      Jason Wang says:
      
      ====================
      switch to use tx skb array in tun
      
      This series tries to switch to use skb array in tun. This is used to
      eliminate the spinlock contention between producer and consumer. The
      conversion was straightforward: just introduce a tx skb array and use
      it instead of sk_receive_queue.
      
      A minor issue is to keep the tx_queue_len behaviour, since tun used to
      use it for the length of sk_receive_queue. This is done through:
      
      - add the ability to resize multiple rings at once to avoid handling
        partial resize failure for multiple rings.
      - add the support for zero length ring.
      - introduce a notifier which was triggered when tx_queue_len was
        changed for a netdev.
      - resize all queues during the tx_queue_len changing.
      
      Tests show about 15% improvement on guest rx pps:
      
      Before: ~1300000pps
      After : ~1500000pps
      
      Changes from V3:
      - fix kbuild warnings
      - call NETDEV_CHANGE_TX_QUEUE_LEN on IFLA_TXQLEN
      
      Changes from V2:
      - add multiple rings resizing support for ptr_ring/skb_array
      - add zero length ring support
      - introduce a NETDEV_CHANGE_TX_QUEUE_LEN
      - drop new flags
      
      Changes from V1:
      - switch to use skb array instead of a customized circular buffer
      - add non-blocking support
      - rename .peek to .peek_len
      - drop lockless peeking since test show very minor improvement
      ====================
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-from-altitude: 34697 feet.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: switch to use skb array for tx · 1576d986
      Jason Wang authored
      We used to queue tx packets in sk_receive_queue, this is less
      efficient since it requires spinlocks to synchronize between producer
      and consumer.
      
      This patch tries to address this by:
      
      - switch from sk_receive_queue to a skb_array, and resize it when
        tx_queue_len was changed.
      - introduce a new proto_ops peek_len which was used for peeking the
        skb length.
      - implement a tun version of peek_len for vhost_net to use and convert
        vhost_net to use peek_len if possible.
      
      Pktgen test shows about 15.3% improvement on guest receiving pps for small
      buffers:
      
      Before: ~1300000pps
      After : ~1500000pps
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: introduce NETDEV_CHANGE_TX_QUEUE_LEN · 08294a26
      Jason Wang authored
      This patch introduces a new event, NETDEV_CHANGE_TX_QUEUE_LEN, which
      is triggered when tx_queue_len is changed. It can be used by net
      devices that want to do some processing at that time. An example is
      tun, which may want to resize its tx array when tx_queue_len is changed.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skb_array: add wrappers for resizing · bf900b3d
      Jason Wang authored
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptr_ring: support resizing multiple queues · 59e6ae53
      Michael S. Tsirkin authored
      Sometimes we need to support resizing multiple queues at once. This is
      because it is not easy to recover from a partial failure when
      resizing multiple queues.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
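One way to avoid partial-failure recovery is an all-or-nothing resize: allocate every new queue first, and only swap them in once all allocations have succeeded. The sketch below illustrates that pattern in user space. It is not the ptr_ring implementation; the struct, `alloc_queue()`, and the `fail_after` fault-injection hook are hypothetical, and copying of existing entries into the new queues is deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical ring; entry migration is omitted to keep the sketch short. */
struct ring {
	void **queue;
	int size;
};

/* Fault injection: -1 never fails; n lets n allocations succeed, then fails. */
static int fail_after = -1;

static void **alloc_queue(int size)
{
	if (fail_after == 0)
		return NULL;
	if (fail_after > 0)
		fail_after--;
	return calloc(size ? size : 1, sizeof(void *));
}

/* Resize all rings to `size`, or leave every ring untouched on failure. */
static int resize_multiple(struct ring **rings, int nrings, int size)
{
	void ***queues = calloc(nrings, sizeof(*queues));
	int i;

	if (!queues)
		return -1;

	/* Phase 1: allocate everything; no ring has been modified yet. */
	for (i = 0; i < nrings; i++) {
		queues[i] = alloc_queue(size);
		if (!queues[i])
			goto rollback;
	}

	/* Phase 2: cannot fail, so the swap is applied to all rings. */
	for (i = 0; i < nrings; i++) {
		free(rings[i]->queue);
		rings[i]->queue = queues[i];
		rings[i]->size = size;
	}
	free(queues);
	return 0;

rollback:
	while (--i >= 0)
		free(queues[i]);
	free(queues);
	return -1;
}
```

Because all failure points sit in phase 1, a caller never has to undo a resize that was applied to only some of the rings.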
    • skb_array: minor tweak · fd68adec
      Jason Wang authored
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptr_ring: support zero length ring · 982fb490
      Jason Wang authored
      Sometimes we need a zero-length ring, but the current code will crash
      since it doesn't do any check before accessing the ring. This patch
      fixes this.
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
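A simplified user-space sketch of why the size check matters. The ring below loosely follows the convention of a pointer ring in which a NULL slot means "empty", but the struct and function names are illustrative assumptions, not the kernel's ptr_ring API. Without the `size == 0` tests, both functions would dereference `queue[0]` of a ring that has no slots.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative pointer ring: a NULL slot means the slot is empty. */
struct ring {
	void **queue;
	int size;
	int producer;
	int consumer;
};

/* Returns 0 on success, -1 if the ring is full or has zero length. */
static int ring_produce(struct ring *r, void *ptr)
{
	if (r->size == 0 || r->queue[r->producer])
		return -1;
	r->queue[r->producer++] = ptr;
	if (r->producer >= r->size)
		r->producer = 0;
	return 0;
}

/* Returns the oldest entry, or NULL if the ring is empty or has zero length. */
static void *ring_consume(struct ring *r)
{
	void *ptr;

	if (r->size == 0)
		return NULL;
	ptr = r->queue[r->consumer];
	if (!ptr)
		return NULL;
	r->queue[r->consumer++] = NULL;
	if (r->consumer >= r->size)
		r->consumer = 0;
	return ptr;
}
```

A zero-length ring then simply behaves as a ring that is always full for producers and always empty for consumers.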
    • Merge branch 'sch_hfsc-fixes-cleanups' · 8dc7243a
      David S. Miller authored
      Michal Soltys says:
      
      ====================
      HFSC patches, part 1
      
      It's a revised version of part of the patches I submitted a really, really long
      time ago (back then I asked Patrick to ignore them as I found some issues
      shortly after submitting).
      
      Anyway this is the first set with very simple fixes/changes though some of them
      relatively subtle (I tried to do very exhaustive commit messages explaining what
      and why with those).
      
      The patches are against net-next tree.
      
      The second set will be heavier - or rather with more complex explanations, among those I have:
      
      - a fix to subtle issue introduced in
        http://permalink.gmane.org/gmane.linux.kernel.commits.2-4/8281
        along with simplifying related stuff
      - update times to 96 bits (which allows to "just" use 32 bit shifts and
        improves curve definition accuracy at more extreme low/high speeds)
      - add curve "merging" instead of just selecting in convex case (computations
        mirror those from concave intersection)
      
      But these are eventually for later.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched/sch_hfsc.c: anchor virtual curve at proper vt in hfsc_change_fsc() · 33ef84a7
      Michal Soltys authored
      cl->cl_vt alone is relative only to the current backlog period, while
      the curve operates on cumulative virtual time. This patch adds missing
      cl->cl_vtoff.
      Signed-off-by: Michal Soltys <soltys@ziu.info>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched/sch_hfsc.c: go passive after vt update · ab12cb47
      Michal Soltys authored
      When a class is going passive, it should update its cl_vt first
      to be consistent with the last dequeue operation.
      
      Otherwise its cl_vt will be one packet behind and parent's cvtmax might
      not be updated as well.
      
      One possible side effect is if some class goes passive and subsequently
      goes active /without/ its parent going passive - with cl_vt lagging one
      packet behind - comparison made in init_vf() will be affected (same
      period).
      Signed-off-by: Michal Soltys <soltys@ziu.info>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched/sch_hfsc.c: remove leftover dlist and droplist · 2354f056
      Michal Soltys authored
      This is an update to:
      commit a09ceb0e ("sched: remove qdisc->drop")
      
      That commit removed qdisc->drop, but left alone dlist and droplist
      that no longer serve any meaningful purpose.
      Signed-off-by: Michal Soltys <soltys@ziu.info>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched/sch_hfsc.c: add unlikely() in qdisc_peek_len() · d1d0fc5e
      Michal Soltys authored
      The condition can only succeed on wrong configurations.
      Signed-off-by: Michal Soltys <soltys@ziu.info>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched/sch_hfsc.c: handle corner cases where head may change invalidating calculated deadline · 12d0ad3b
      Michal Soltys authored
      Realtime scheduling implemented in HFSC uses head of the queue to make
      the decision about which packet to schedule next. But in case of any
      head drop, the deadline calculated for the previous head is not
      necessarily correct for the next head (unless both packets have the same
      length).
      
      Thanks to peek() function used during dequeue - which internally is a
      dequeue operation - hfsc is almost safe from this issue, as peek()
      dequeues and isolates the head storing it temporarily until the real
      dequeue happens.
      
      But there is one exception: if after the class activation a drop happens
      before the first dequeue operation, there's never a chance to do the
      peek().
      
      Adding peek() call in enqueue - if this is the first packet in a new
      backlog period AND the scheduler has realtime curve defined - fixes that
      one corner case. The 1st hfsc_dequeue() will use that peeked packet,
      similarly as every subsequent hfsc_dequeue() call uses packet peeked by
      the previous call.
      Signed-off-by: Michal Soltys <soltys@ziu.info>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: md5: use kmalloc() backed scratch areas · 19689e38
      Eric Dumazet authored
      Some arches have virtually mapped kernel stacks, or will soon have.
      
      tcp_md5_hash_header() uses an automatic variable to copy tcp header
      before mangling th->check and calling crypto function, which might
      be problematic on such arches.
      
      David says that using percpu storage is also problematic on non SMP
      builds.
      
      Just use kmalloc() to allocate scratch areas.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 30 Jun, 2016 16 commits
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 435c556c
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2016-06-29
      
      This series contains updates and fixes to e1000e, igb, ixgbe and fm10k.  A
      true smorgasbord of changes.
      
      Jake cleans up some obscurity by not using the BIT() macro on bitshift
      operations and also fixes the calculated index when looping through the
      indir array.  Fixes the issue with igb's workqueue item for overflow
      check from causing a surprise remove event.  The ptp_flags variable is
      added to simplify the work of writing several complex MAC type checks
      in the PTP code while fixing the workqueue.
      
      Alex Duyck fixes the receive buffers alignment which should not be L1
      cache aligned, but to 512 bytes instead.
      
      Denys Vlasenko prevents a division by zero which was reported under
      VMWare for e1000e.
      
      Amritha fixes an issue where filters in a child hash table must be
      cleared from the hardware before deleting the filter links in ixgbe.
      
      Bhaktipriya Shridhar simply replaces the deprecated create_workqueue()
      with alloc_workqueue() for fm10k.
      
      Tony corrects ixgbe ethtool reporting to show x550 supports hardware
      timestamping of all packets.
      
      Emil fixes an issue where MAC-VLANs on the VF fail to pass traffic due
      to spoofed packets.
      
      Andrew Lunn increases performance on some systems where syncing a buffer
      for DMA is expensive.  So rather than sync the whole 2K receive buffer,
      only synchronize the length of the frame.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'nfp-next' · c435e6e0
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      nfp: few code improvements
      
      Three small patches for net-next.  First and second patches
      improve the code quality by spelling things correctly and
      removing unused parameters.  Third patch hooks-in standard
      kernel implementation of .get_link() in ethtool ops.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: implement ethtool .get_link() callback · 2370def2
      Jakub Kicinski authored
      Point the ethtool .get_link() callback to the standard
      ethtool_op_get_link() implementation.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: remove unused parameter from nfp_net_write_mac_addr() · f642963b
      Jakub Kicinski authored
      nfp_net_write_mac_addr() always writes to the BAR the current
      device address taken from netdev struct.  The address given
      as parameter is actually ignored.  Since all callers pass
      netdev->dev_addr simply remove the parameter.
      
      While at it improve the function's kdoc a bit.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: correct name of control BAR define · 796312cd
      Jakub Kicinski authored
      Spell abbreviation of control as ctrl not crtl.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • be2net: signedness bug in be_msix_enable() · 6fde0e63
      Dan Carpenter authored
      "num_vec" needs to be signed for the error handling to work.
      
      Fixes: e261768e ('be2net: support asymmetric rx/tx queue counts')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Sathya Perla <sathya.perla@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
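The general class of bug can be shown in a few lines, assuming (as the commit title suggests, though the message does not spell it out) that a negative error code is stored in a counter variable and then tested with `< 0`. The functions below are hypothetical and not taken from be2net. The unsigned variant writes the comparison through a wider type only so that the example compiles without an "always false" warning; it evaluates the same way `num_vec < 0` would for an unsigned variable.

```c
#include <assert.h>

/* Error check as it behaves when the variable is unsigned. */
static int failed_if_unsigned(int ret)
{
	unsigned int num_vec = (unsigned int)ret;  /* -28 wraps to a huge value */

	/* Equivalent to `num_vec < 0` on an unsigned value: never true. */
	return (long long)num_vec < 0;
}

/* Error check as it behaves when the variable is signed. */
static int failed_if_signed(int ret)
{
	int num_vec = ret;

	return num_vec < 0;
}
```

With a return value of -28 (ENOSPC on Linux), the unsigned variant reports no failure while the signed variant does.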
    • net: netcp: Fix a typo in keystone-netcp.txt · 9b9a553c
      Masanari Iida authored
      This patch fixes a spelling typo in keystone-netcp.txt
      Signed-off-by: Masanari Iida <standby24x7@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mediatek-next' · 833ba3d5
      David S. Miller authored
      John Crispin says:
      
      ====================
      net-next: mediatek: IRQ cleanups, fixes and grouping
      
      This series contains 2 small code cleanups that are leftovers from the
      MIPS support. There is also a small fix that adds proper locking to the
      code accessing the IRQ registers. Without this fix we saw deadlocks caused
      by the last patch of the series, which adds IRQ grouping. The grouping
      feature allows us to use different IRQs for TX and RX. By doing so we can
      use affinity to let the SoC handle the IRQs on different cores.
      
      This series depends on a previous series currently sitting in net.git
      starting with
      	commit 562c5a70 ("net: mediatek: only wake the queue if it is stopped")
      up to
      	commit 82c6544d ("net: mediatek: remove superfluous queue wake up call")
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-next: mediatek: add support for IRQ grouping · 80673029
      John Crispin authored
      The ethernet core has 3 IRQs. Using the IRQ grouping registers we are able
      to separate TX and RX IRQs, which allows us to service them on separate
      cores. This patch splits the IRQ handler into 2 separate functions, one for
      TX and another for RX. The TX housekeeping is split out into its own NAPI
      handler.
      Signed-off-by: John Crispin <john@phrozen.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-next: mediatek: add IRQ locking · 7bc9ccec
      John Crispin authored
      The code that enables and disables IRQs is missing proper locking. After
      adding the IRQ grouping patch and routing the RX and TX IRQs to different
      cores we experienced IRQ stalls. Fix this by adding proper locking.
      We use a dedicated lock to reduce the latency of the IRQ code.
      Signed-off-by: John Crispin <john@phrozen.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-next: mediatek: don't use intermediate variables to store IRQ masks · eece71e8
      John Crispin authored
      The code currently uses variables to store and never modify the bit masks
      of interrupts. This is legacy code from an early version of the driver
      that supported MIPS based SoCs where the IRQ bits depended on the actual
      SoC. As the bits are the same for all ARM based SoCs using this driver we
      can remove the intermediate variables.
      Signed-off-by: John Crispin <john@phrozen.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-next: mediatek: remove superfluous register reads · 6e6edd8b
      John Crispin authored
      The driver was originally written for MIPS based SoC. These required the
      IRQ mask register to be read after writing it to ensure that the content
      was actually applied. As this version only works on ARM based SoCs, we can
      safely remove the 2 reads as they are no longer required.
      Signed-off-by: John Crispin <john@phrozen.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib_rules: Added NLM_F_EXCL support to fib_nl_newrule · 153380ec
      Mateusz Bajorski authored
      When adding a rule with the NLM_F_EXCL flag, check if the same rule exists.
      If yes then exit with -EEXIST.
      
      This is already implemented in iproute2:
              if (cmd == RTM_NEWRULE) {
                      req.n.nlmsg_flags |= NLM_F_CREATE|NLM_F_EXCL;
                      req.r.rtm_type = RTN_UNICAST;
              }
      
      Tested ipv4 and ipv6 with net-next linux on qemu x86
      
      expected behavior after patch:
      localhost ~ # ip rule
      0:    from all lookup local
      32766:    from all lookup main
      32767:    from all lookup default
      localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005
      localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005
      RTNETLINK answers: File exists
      localhost ~ # ip rule
      0:    from all lookup local
      1005:    from 10.46.177.97 lookup 104
      32766:    from all lookup main
      32767:    from all lookup default
      
      There was already topic regarding this but I don't see any changes
      merged and problem still occurs.
      https://lkml.kernel.org/r/1135778809.5944.7.camel+%28%29+localhost+%21+localdomain
      Signed-off-by: Mateusz Bajorski <mateusz.bajorski@nokia.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
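The behaviour described in this commit (reject a duplicate rule with -EEXIST when NLM_F_EXCL is set) can be sketched with a small in-memory rule table. The `struct rule`, the table, and `new_rule()` are hypothetical simplifications and do not reflect the fib_rules data structures; only the flag value 0x200 for NLM_F_EXCL is taken from linux/netlink.h and is defined locally so the example builds without kernel headers.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define NLM_F_EXCL 0x200  /* same value as in linux/netlink.h */
#define MAX_RULES  16

/* Hypothetical, heavily simplified rule: priority, table, source prefix. */
struct rule {
	unsigned int pref;
	unsigned int table;
	char src[16];
};

static struct rule rules[MAX_RULES];
static int nrules;

static int rule_exists(const struct rule *r)
{
	int i;

	for (i = 0; i < nrules; i++)
		if (rules[i].pref == r->pref && rules[i].table == r->table &&
		    strcmp(rules[i].src, r->src) == 0)
			return 1;
	return 0;
}

/* Add a rule; with NLM_F_EXCL set, an identical rule is rejected. */
static int new_rule(const struct rule *r, unsigned int flags)
{
	if ((flags & NLM_F_EXCL) && rule_exists(r))
		return -EEXIST;
	if (nrules == MAX_RULES)
		return -ENOMEM;
	rules[nrules++] = *r;
	return 0;
}
```

Adding the same rule twice with NLM_F_EXCL set mirrors the second `ip rule add` in the transcript above, which fails with "File exists".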
    • tcp: increase size at which tcp_bound_to_half_wnd bounds to > TCP_MSS_DEFAULT · 2631b79f
      Seymour, Shane M authored
      In previous commit 01f83d69
      the following comments were added:
      
      "When peer uses tiny windows, there is no use in packetizing to sub-MSS
      pieces for the sake of SWS or making sure there are enough packets in
      the pipe for fast recovery."
      
      The test should be > TCP_MSS_DEFAULT not >= 512. This allows low end
      devices that send an MSS of 536 (TCP_MSS_DEFAULT) to see better network
      performance by sending it 536 bytes of data at a time instead of bounding
      to half window size (268). Other network stacks work this way, e.g. HP-UX.
      Signed-off-by: Shane Seymour <shane.seymour@hpe.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
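The effect of changing the test from `>= 512` to `> TCP_MSS_DEFAULT` can be checked with a reduced model of the bounding logic. This is a simplified sketch, not the kernel function: it takes the peer's maximum window and the packet size as plain integers and leaves out any lower bound the real code applies to very small cutoffs.

```c
#include <assert.h>

#define TCP_MSS_DEFAULT 536

/* Old test: windows of 512 bytes or more are bounded to half the window. */
static int bound_old(int max_window, int pktsize)
{
	int cutoff = (max_window >= 512) ? max_window / 2 : max_window;

	return (cutoff && pktsize > cutoff) ? cutoff : pktsize;
}

/* New test: only windows larger than TCP_MSS_DEFAULT are halved. */
static int bound_new(int max_window, int pktsize)
{
	int cutoff = (max_window > TCP_MSS_DEFAULT) ? max_window / 2 : max_window;

	return (cutoff && pktsize > cutoff) ? cutoff : pktsize;
}
```

For a peer whose largest advertised window is 536 bytes, the old test bounds a 536-byte segment to 268 bytes, while the new test lets the full 536 bytes through, matching the numbers given in the commit message.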
    • tcp: add an ability to dump and restore window parameters · b1ed4c4f
      Andrey Vagin authored
      We found that sometimes a restored tcp socket doesn't work.
      
      A reason for this bug is incorrect window parameters, and in this case
      tcp_acceptable_seq() returns tcp_wnd_end(tp) instead of tp->snd_nxt. The
      other side drops packets with this seq, because seq is less than
      tp->rcv_nxt ( tcp_sequence() ).
      
      Data from a send queue is sent only if there is enough space in a
      window, so when we restore unacked data, we need to expand a window to
      fit this data.
      
      This was in a first version of this patch:
      "tcp: extend window to fit all restored unacked data in a send queue"
      
      Then Alexey recommended that I restore window parameters instead of
      adjusting them according to data in the send queue. This sounds reasonable.
      
      rcv_wnd has to be restored, because it was reported to another side
      and the offered window is never shrunk.
      One of reasons why we need to restore snd_wnd was described above.
      
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge-igmp-stats' · 641f7e40
      David S. Miller authored
      Nikolay Aleksandrov says:
      
      ====================
      net: bridge: add support for IGMP/MLD stats
      
      This patchset adds support for the new IFLA_STATS_LINK_XSTATS_SLAVE
      attribute which can be used with RTM_GETSTATS in order to export per-slave
      statistics. It works by passing the attribute to the linkxstats callback
      and if the callback user supports it - it should dump that slave's stats.
      This is much more scalable and permits us to request only a single port's
      statistics instead of dumping everything every time.
      The second patch adds support for per-port IGMP/MLD statistics and uses
      the new API to export them for the bridge and its ports. The stats are
      made in a very lightweight manner, the normal fast-path is not affected
      at all and the flood paths (br_flood/br_multicast_flood) are only affected
      if the packet is IGMP and the IGMP stats have been enabled using cache-hot
      data for the check.
      
      v2: Patch 01 is new, patch 02 has been reworked to use the new API, also
      in addition counters for IGMP/MLD parse errors have been added and members
      are added for per-port multicast traffic stats. The multicast counting has
      been slightly optimized (moved the br_multicast_count inside the IPv4/6
      IGMP functions after the checks for IGMP traffic) to avoid one conditional
      that was on all of the multicast traffic path (both IGMP and other).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>