1. 09 Jan, 2015 5 commits
  2. 07 Jan, 2015 3 commits
    • David S. Miller's avatar
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · bdec4196
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Just a pile of random fixes, including:
      
         1) Do not apply TSO limits to non-TSO packets, fix from Herbert Xu.
      
         2) MDI{,X} eeprom check in e100 driver is reversed, from John W.
            Linville.
      
         3) Missing error return assignments in several ethernet drivers, from
            Julia Lawall.
      
         4) Altera TSE device doesn't come back up after ifconfig down/up
            sequence, fix from Kostya Belezko.
      
         5) Add more cases to the check for whether the qmi_wwan device has a
            bogus MAC address and needs to be assigned a random one.  From
            Kristian Evensen.
      
         6) Fix interrupt hangs in CPSW, from Felipe Balbi.
      
         7) Implement ndo_features_check in r8152 so that the stack doesn't
            feed GSO packets which are outside of the chip's capabilities.
            From Hayes Wang"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
        qla3xxx: don't allow never end busy loop
        xen-netback: fixing the propagation of the transmit shaper timeout
        r8152: support ndo_features_check
        batman-adv: fix potential TT client + orig-node memory leak
        batman-adv: fix multicast counter when purging originators
        batman-adv: fix counter for multicast supporting nodes
        batman-adv: fix lock class for decoding hash in network-coding.c
        batman-adv: fix delayed foreign originator recognition
        batman-adv: fix and simplify condition when bonding should be used
        Revert "mac80211: Fix accounting of the tailroom-needed counter"
        net: ethernet: cpsw: fix hangs with interrupts
        enic: free all rq buffs when allocation fails
        qmi_wwan: Set random MAC on devices with buggy fw
        openvswitch: Consistently include VLAN header in flow and port stats.
        tcp: Do not apply TSO segment limit to non-TSO packets
        Altera TSE: Add missing phydev
        net/mlx4_core: Fix error flow in mlx4_init_hca()
        net/mlx4_core: Correcly update the mtt's offset in the MR re-reg flow
        qlcnic: Fix return value in qlcnic_probe()
        net: axienet: fix error return code
        ...
      bdec4196
    • Linus Torvalds's avatar
      Merge tag 'for-linus-3' of git://git.code.sf.net/p/openipmi/linux-ipmi · 0adc1803
      Linus Torvalds authored
      Pull IPMI fixlet from Corey Minyard:
       "Fix a compile warning"
      
      * tag 'for-linus-3' of git://git.code.sf.net/p/openipmi/linux-ipmi:
        ipmi: Fix compile warning with tv_usec
      0adc1803
  3. 06 Jan, 2015 32 commits
    • Feng Kan's avatar
      net: eth: xgene: change APM X-Gene SoC platform ethernet to support ACPI · de7b5b3d
      Feng Kan authored
      This adds support for APM X-Gene ethernet driver to use ACPI table to derive
      ethernet driver parameter.
      Signed-off-by: default avatarFeng Kan <fkan@apm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      de7b5b3d
    • Andy Shevchenko's avatar
      qla3xxx: don't allow never end busy loop · 2abad79a
      Andy Shevchenko authored
      The counter variable wasn't increased at all which may stuck under
      certain circumstances.
      Signed-off-by: default avatarAndy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2abad79a
    • Andy Fleming's avatar
      net/fsl: Add mEMAC MDIO support to XGMAC MDIO · 1fcf77c8
      Andy Fleming authored
      The Freescale mEMAC supports operating at 10/100/1000/10G, and
      its associated MDIO controller is likewise capable of operating
      both Clause 22 and Clause 45 MDIO buses. It is nearly identical
      to the MDIO controller on the XGMAC, so we just modify that
      driver.
      
      Portions of this driver developed by:
      
      Sandeep Singh <sandeep@freescale.com>
      Roy Zang <tie-fei.zang@freescale.com>
      Signed-off-by: default avatarAndy Fleming <afleming@gmail.com>
      Signed-off-by: default avatarShaohui Xie <Shaohui.Xie@freescale.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1fcf77c8
    • Ed Swierk's avatar
      ethtool: Extend ethtool plugin module eeprom API to phylib · 2f438366
      Ed Swierk authored
      This patch extends the ethtool plugin module eeprom API to support cards
      whose phy support is delegated to a separate driver.
      
      The handlers for ETHTOOL_GMODULEINFO and ETHTOOL_GMODULEEEPROM call the
      module_info and module_eeprom functions if the phy driver provides them;
      otherwise the handlers call the equivalent ethtool_ops functions provided
      by network drivers with built-in phy support.
      Signed-off-by: default avatarEd Swierk <eswierk@skyportsystems.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2f438366
    • Linus Torvalds's avatar
      Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 3b421b80
      Linus Torvalds authored
      Pull ext4 bugfixes from Ted Ts'o:
       "Revert a potential seek_data/hole regression which shows up when using
        ext4 to handle ext3 file systems, plus two minor bug fixes"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: remove spurious KERN_INFO from ext4_warning call
        Revert "ext4: fix suboptimal seek_{data,hole} extents traversial"
        ext4: prevent online resize with backup superblock
      3b421b80
    • Linus Torvalds's avatar
      mm: propagate error from stack expansion even for guard page · fee7e49d
      Linus Torvalds authored
      Jay Foad reports that the address sanitizer test (asan) sometimes gets
      confused by a stack pointer that ends up being outside the stack vma
      that is reported by /proc/maps.
      
      This happens due to an interaction between RLIMIT_STACK and the guard
      page: when we do the guard page check, we ignore the potential error
      from the stack expansion, which effectively results in a missing guard
      page, since the expected stack expansion won't have been done.
      
      And since /proc/maps explicitly ignores the guard page (commit
      d7824370: "mm: fix up some user-visible effects of the stack guard
      page"), the stack pointer ends up being outside the reported stack area.
      
      This is the minimal patch: it just propagates the error.  It also
      effectively makes the guard page part of the stack limit, which in turn
      measn that the actual real stack is one page less than the stack limit.
      
      Let's see if anybody notices.  We could teach acct_stack_growth() to
      allow an extra page for a grow-up/grow-down stack in the rlimit test,
      but I don't want to add more complexity if it isn't needed.
      Reported-and-tested-by: default avatarJay Foad <jay.foad@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fee7e49d
    • David S. Miller's avatar
      Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge · 627d2cc0
      David S. Miller authored
      Included changes:
      - ensure bonding is used (if enabled) for packets coming in the soft
        interface
      - fix race condition to avoid orig_nodes to be deleted right after
        being added
      - avoid false positive lockdep splats by assigning lockclass to
        the proper hashtable lock objects
      - avoid miscounting of multicast 'disabled' nodes in the network
      - fix memory leak in the Global Translation Table in case of
        originator interval change
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      627d2cc0
    • Palik, Imre's avatar
      xen-netback: fixing the propagation of the transmit shaper timeout · 07ff890d
      Palik, Imre authored
      Since e9ce7cb6 ("xen-netback: Factor queue-specific data into queue struct"),
      the transimt shaper timeout is always set to 0.  The value the user sets via
      xenbus is never propagated to the transmit shaper.
      
      This patch fixes the issue.
      
      Cc: Anthony Liguori <aliguori@amazon.com>
      Signed-off-by: default avatarImre Palik <imrep@amazon.de>
      Acked-by: default avatarIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      07ff890d
    • Shrikrishna Khare's avatar
      Driver: Vmxnet3: Make Rx ring 2 size configurable · 53831aa1
      Shrikrishna Khare authored
      Rx ring 2 size can be configured by adjusting rx-jumbo parameter
      of ethtool -G.
      Signed-off-by: default avatarRamya Bolla <bollar@vmware.com>
      Signed-off-by: default avatarShreyas Bhatewara <sbhatewara@vmware.com>
      Signed-off-by: default avatarShrikrishna Khare <skhare@vmware.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      53831aa1
    • David S. Miller's avatar
      Merge tag 'mac80211-for-davem-2015-01-06' of... · 15ecf7a0
      David S. Miller authored
      Merge tag 'mac80211-for-davem-2015-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Here's just a single fix - a revert of a patch that broke the
      p54 and cw2100 drivers (arguably due to bad assumptions there.)
      Since this affects kernels since 3.17, I decided to revert for
      now and we'll revisit this optimisation properly for -next.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      15ecf7a0
    • hayeswang's avatar
      r8152: support ndo_features_check · a5e31255
      hayeswang authored
      Support ndo_features_check to avoid:
       - the transport offset is more than the hw limitation when using hw checksum.
       - the skb->len of a GSO packet is more than the limitation.
      Signed-off-by: default avatarHayes Wang <hayeswang@realtek.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a5e31255
    • Richard Cochran's avatar
      arm_arch_timer: include clocksource.h directly · 7c8f1e78
      Richard Cochran authored
      This driver makes use of the clocksource code. Previously it had only
      included the proper header indirectly, but that chain was inadvertently
      broken by 74d23cc7 "time: move the timecounter/cyclecounter code into its
      own file."
      
      This patch fixes the issue by including clocksource.h directly.
      Signed-off-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c8f1e78
    • Hariprasad Shenai's avatar
    • Linus Lüssing's avatar
      batman-adv: fix potential TT client + orig-node memory leak · 9d31b3ce
      Linus Lüssing authored
      This patch fixes a potential memory leak which can occur once an
      originator times out. On timeout the according global translation table
      entry might not get purged correctly. Furthermore, the non purged TT
      entry will cause its orig-node to leak, too. Which additionally can lead
      to the new multicast optimization feature not kicking in because of a
      therefore bogus counter.
      
      In detail: The batadv_tt_global_entry->orig_list holds the reference to
      the orig-node. Usually this reference is released after
      BATADV_PURGE_TIMEOUT through: _batadv_purge_orig()->
      batadv_purge_orig_node()->batadv_update_route()->_batadv_update_route()->
      batadv_tt_global_del_orig() which purges this global tt entry and
      releases the reference to the orig-node.
      
      However, if between two batadv_purge_orig_node() calls the orig-node
      timeout grew to 2*BATADV_PURGE_TIMEOUT then this call path isn't
      reached. Instead the according orig-node is removed from the
      originator hash in _batadv_purge_orig(), the batadv_update_route()
      part is skipped and won't be reached anymore.
      
      Fixing the issue by moving batadv_tt_global_del_orig() out of the rcu
      callback.
      Signed-off-by: default avatarLinus Lüssing <linus.luessing@c0d3.blue>
      Acked-by: default avatarAntonio Quartulli <antonio@meshcoding.com>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <antonio@meshcoding.com>
      9d31b3ce
    • Linus Lüssing's avatar
      batman-adv: fix multicast counter when purging originators · a5164886
      Linus Lüssing authored
      When purging an orig_node we should only decrease counter tracking the
      number of nodes without multicast optimizations support if it was
      increased through this orig_node before.
      
      A not yet quite initialized orig_node (meaning it did not have its turn
      in the mcast-tvlv handler so far) which gets purged would not adhere to
      this and will lead to a counter imbalance.
      
      Fixing this by adding a check whether the orig_node is mcast-initalized
      before decreasing the counter in the mcast-orig_node-purging routine.
      
      Introduced by 60432d75
      ("batman-adv: Announce new capability via multicast TVLV")
      Reported-by: default avatarTobias Hachmer <tobias@hachmer.de>
      Signed-off-by: default avatarLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <antonio@meshcoding.com>
      a5164886
    • Linus Lüssing's avatar
      batman-adv: fix counter for multicast supporting nodes · e8829f00
      Linus Lüssing authored
      A miscounting of nodes having multicast optimizations enabled can lead
      to multicast packet loss in the following scenario:
      
      If the first OGM a node receives from another one has no multicast
      optimizations support (no multicast tvlv) then we are missing to
      increase the counter. This potentially leads to the wrong assumption
      that we could safely use multicast optimizations.
      
      Fixings this by increasing the counter if the initial OGM has the
      multicast TVLV unset, too.
      
      Introduced by 60432d75
      ("batman-adv: Announce new capability via multicast TVLV")
      Reported-by: default avatarTobias Hachmer <tobias@hachmer.de>
      Signed-off-by: default avatarLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <antonio@meshcoding.com>
      e8829f00
    • Martin Hundebøll's avatar
      batman-adv: fix lock class for decoding hash in network-coding.c · f44d5407
      Martin Hundebøll authored
      batadv_has_set_lock_class() is called with the wrong hash table as first
      argument (probably due to a copy-paste error), which leads to false
      positives when running with lockdep.
      
      Introduced-by: 612d2b4f
      ("batman-adv: network coding - save overheard and tx packets for decoding")
      Signed-off-by: default avatarMartin Hundebøll <martin@hundeboll.net>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <antonio@meshcoding.com>
      f44d5407
    • Linus Lüssing's avatar
      batman-adv: fix delayed foreign originator recognition · 2c667a33
      Linus Lüssing authored
      Currently it can happen that the reception of an OGM from a new
      originator is not being accepted. More precisely it can happen that
      an originator struct gets allocated and initialized
      (batadv_orig_node_new()), even the TQ gets calculated and set correctly
      (batadv_iv_ogm_calc_tq()) but still the periodic orig_node purging
      thread will decide to delete it if it has a chance to jump between
      these two function calls.
      
      This is because batadv_orig_node_new() initializes the last_seen value
      to zero and its caller (batadv_iv_ogm_orig_get()) makes it visible to
      other threads by adding it to the hash table already.
      batadv_iv_ogm_calc_tq() will set the last_seen variable to the correct,
      current time a few lines later but if the purging thread jumps in between
      that it will think that the orig_node timed out and will wrongly
      schedule it for deletion already.
      
      If the purging interval is the same as the originator interval (which is
      the default: 1 second), then this game can continue for several rounds
      until the random OGM jitter added enough difference between these
      two (in tests, two to about four rounds seemed common).
      
      Fixing this by initializing the last_seen variable of an orig_node
      to the current time before adding it to the hash table.
      Signed-off-by: default avatarLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <antonio@meshcoding.com>
      2c667a33
    • Simon Wunderlich's avatar
      batman-adv: fix and simplify condition when bonding should be used · 329887ad
      Simon Wunderlich authored
      The current condition actually does NOT consider bonding when the
      interface the packet came in from is the soft interface, which is the
      opposite of what it should do (and the comment describes). Fix that and
      slightly simplify the condition.
      Reported-by: default avatarRay Gibson <booray@gmail.com>
      Signed-off-by: default avatarSimon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarAntonio Quartulli <antonio@meshcoding.com>
      329887ad
    • David S. Miller's avatar
      Merge branch 'rt_cong_ctrl' · a918eb9f
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      net: allow setting congctl via routing table
      
      This is the second part of our work and allows for setting the congestion
      control algorithm via routing table. For details, please see individual
      patches.
      
      Since patch 1 is a bug fix, we suggest applying patch 1 to net, and then
      merging net into net-next, for example, and following up with the remaining
      feature patches wrt dependencies.
      
      Joint work with Florian Westphal, suggested by Hannes Frederic Sowa.
      
      Patch for iproute2 is available under [1], but will be reposted with along
      with the man-page update when this set hits net-next.
      
        [1] http://patchwork.ozlabs.org/patch/418149/
      
      Thanks!
      
      v2 -> v3:
       - Added module auto-loading as suggested by David Miller, thanks!
        - Added patch 2 for handling possible sleeps in fib6
        - While working on this, we discovered a bug, hence fix in patch 1
        - Added auto-loading to patch 4
       - Rebased, retested, rest the same.
      v1 -> v2:
       - Very sorry, I noticed I had decnet disabled during testing.
         Added missing header include in decnet, rest as is.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a918eb9f
    • Daniel Borkmann's avatar
      net: tcp: add per route congestion control · 81164413
      Daniel Borkmann authored
      This work adds the possibility to define a per route/destination
      congestion control algorithm. Generally, this opens up the possibility
      for a machine with different links to enforce specific congestion
      control algorithms with optimal strategies for each of them based
      on their network characteristics, even transparently for a single
      application listening on all links.
      
      For our specific use case, this additionally facilitates deployment
      of DCTCP, for example, applications can easily serve internal
      traffic/dsts in DCTCP and external one with CUBIC. Other scenarios
      would also allow for utilizing e.g. long living, low priority
      background flows for certain destinations/routes while still being
      able for normal traffic to utilize the default congestion control
      algorithm. We also thought about a per netns setting (where different
      defaults are possible), but given its actually a link specific
      property, we argue that a per route/destination setting is the most
      natural and flexible.
      
      The administrator can utilize this through ip-route(8) by appending
      "congctl [lock] <name>", where <name> denotes the name of a
      congestion control algorithm and the optional lock parameter allows
      to enforce the given algorithm so that applications in user space
      would not be allowed to overwrite that algorithm for that destination.
      
      The dst metric lookups are being done when a dst entry is already
      available in order to avoid a costly lookup and still before the
      algorithms are being initialized, thus overhead is very low when the
      feature is not being used. While the client side would need to drop
      the current reference on the module, on server side this can actually
      even be avoided as we just got a flat-copied socket clone.
      
      Joint work with Florian Westphal.
      Suggested-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      81164413
    • Daniel Borkmann's avatar
      net: tcp: add RTAX_CC_ALGO fib handling · ea697639
      Daniel Borkmann authored
      This patch adds the minimum necessary for the RTAX_CC_ALGO congestion
      control metric to be set up and dumped back to user space.
      
      While the internal representation of RTAX_CC_ALGO is handled as a u32
      key, we avoided to expose this implementation detail to user space, thus
      instead, we chose the netlink attribute that is being exchanged between
      user space to be the actual congestion control algorithm name, similarly
      as in the setsockopt(2) API in order to allow for maximum flexibility,
      even for 3rd party modules.
      
      It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot as
      it should have been stored in RTAX_FEATURES instead, we first thought
      about reusing it for the congestion control key, but it brings more
      complications and/or confusion than worth it.
      
      Joint work with Florian Westphal.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ea697639
    • Daniel Borkmann's avatar
      net: tcp: add key management to congestion control · c5c6a8ab
      Daniel Borkmann authored
      This patch adds necessary infrastructure to the congestion control
      framework for later per route congestion control support.
      
      For a per route congestion control possibility, our aim is to store
      a unique u32 key identifier into dst metrics, which can then be
      mapped into a tcp_congestion_ops struct. We argue that having a
      RTAX key entry is the most simple, generic and easy way to manage,
      and also keeps the memory footprint of dst entries lower on 64 bit
      than with storing a pointer directly, for example. Having a unique
      key id also allows for decoupling actual TCP congestion control
      module management from the FIB layer, i.e. we don't have to care
      about expensive module refcounting inside the FIB at this point.
      
      We first thought of using an IDR store for the realization, which
      takes over dynamic assignment of unused key space and also performs
      the key to pointer mapping in RCU. While doing so, we stumbled upon
      the issue that due to the nature of dynamic key distribution, it
      just so happens, arguably in very rare occasions, that excessive
      module loads and unloads can lead to a possible reuse of previously
      used key space. Thus, previously stale keys in the dst metric are
      now being reassigned to a different congestion control algorithm,
      which might lead to unexpected behaviour. One way to resolve this
      would have been to walk FIBs on the actually rare occasion of a
      module unload and reset the metric keys for each FIB in each netns,
      but that's just very costly.
      
      Therefore, we argue a better solution is to reuse the unique
      congestion control algorithm name member and map that into u32 key
      space through jhash. For that, we split the flags attribute (as it
      currently uses 2 bits only anyway) into two u32 attributes, flags
      and key, so that we can keep the cacheline boundary of 2 cachelines
      on x86_64 and cache the precalculated key at registration time for
      the fast path. On average we might expect 2 - 4 modules being loaded
      worst case perhaps 15, so a key collision possibility is extremely
      low, and guaranteed collision-free on LE/BE for all in-tree modules.
      Overall this results in much simpler code, and all without the
      overhead of an IDR. Due to the deterministic nature, modules can
      now be unloaded, the congestion control algorithm for a specific
      but unloaded key will fall back to the default one, and on module
      reload time it will switch back to the expected algorithm
      transparently.
      
      Joint work with Florian Westphal.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c5c6a8ab
    • Daniel Borkmann's avatar
      net: tcp: refactor reinitialization of congestion control · 29ba4fff
      Daniel Borkmann authored
      We can just move this to an extra function and make the code
      a bit more readable, no functional change.
      
      Joint work with Florian Westphal.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      29ba4fff
    • Florian Westphal's avatar
      net: fib6: convert cfg metric to u32 outside of table write lock · e715b6d3
      Florian Westphal authored
      Do the nla validation earlier, outside the write lock.
      
      This is needed by followup patch which needs to be able to call
      request_module (which can sleep) if needed.
      
      Joint work with Daniel Borkmann.
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e715b6d3
    • Daniel Borkmann's avatar
      net: fib6: fib6_commit_metrics: fix potential NULL pointer dereference · 0409c9a5
      Daniel Borkmann authored
      When IPv6 host routes with metrics attached are being added, we fetch
      the metrics store from the dst via COW through dst_metrics_write_ptr(),
      added through commit e5fd387a.
      
      One remaining problem here is that we actually call into inet_getpeer()
      and may end up allocating/creating a new peer from the kmemcache, which
      may fail.
      
      Example trace from perf probe (inet_getpeer:41) where create is 1:
      
      ip 6877 [002] 4221.391591: probe:inet_getpeer: (ffffffff8165e293)
        85e294 inet_getpeer.part.7 (<- kmem_cache_alloc())
        85e578 inet_getpeer
        8eb333 ipv6_cow_metrics
        8f10ff fib6_commit_metrics
      
      Therefore, a check for NULL on the return of dst_metrics_write_ptr()
      is necessary here.
      
      Joint work with Florian Westphal.
      
      Fixes: e5fd387a ("ipv6: do not overwrite inetpeer metrics prematurely")
      Cc: Michal Kubeček <mkubecek@suse.cz>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0409c9a5
    • Hubert Sokolowski's avatar
      net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined · 6cb69742
      Hubert Sokolowski authored
      Add checking whether the call to ndo_dflt_fdb_dump is needed.
      It is not expected to call ndo_dflt_fdb_dump unconditionally
      by some drivers (i.e. qlcnic or macvlan) that defines
      own ndo_fdb_dump. Other drivers define own ndo_fdb_dump
      and don't want ndo_dflt_fdb_dump to be called at all.
      At the same time it is desirable to call the default dump
      function on a bridge device.
      Fix attributes that are passed to dev->netdev_ops->ndo_fdb_dump.
      Add extra checking in br_fdb_dump to avoid duplicate entries
      as now filter_dev can be NULL.
      
      Following tests for filtering have been performed before
      the change and after the patch was applied to make sure
      they are the same and it doesn't break the filtering algorithm.
      
      [root@localhost ~]# cd /root/iproute2-3.18.0/bridge
      [root@localhost bridge]# modprobe dummy
      [root@localhost bridge]# ./bridge fdb add f1:f2:f3:f4:f5:f6 dev dummy0
      [root@localhost bridge]# brctl addbr br0
      [root@localhost bridge]# brctl addif  br0 dummy0
      [root@localhost bridge]# ip link set dev br0 address 02:00:00:12:01:04
      [root@localhost bridge]# # show all
      [root@localhost bridge]# ./bridge fdb show
      33:33:00:00:00:01 dev p2p1 self permanent
      01:00:5e:00:00:01 dev p2p1 self permanent
      33:33:ff:ac:ce:32 dev p2p1 self permanent
      33:33:00:00:02:02 dev p2p1 self permanent
      01:00:5e:00:00:fb dev p2p1 self permanent
      33:33:00:00:00:01 dev p7p1 self permanent
      01:00:5e:00:00:01 dev p7p1 self permanent
      33:33:ff:79:50:53 dev p7p1 self permanent
      33:33:00:00:02:02 dev p7p1 self permanent
      01:00:5e:00:00:fb dev p7p1 self permanent
      f2:46:50:85:6d:d9 dev dummy0 master br0 permanent
      f2:46:50:85:6d:d9 dev dummy0 vlan 1 master br0 permanent
      33:33:00:00:00:01 dev dummy0 self permanent
      f1:f2:f3:f4:f5:f6 dev dummy0 self permanent
      33:33:00:00:00:01 dev br0 self permanent
      02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent
      02:00:00:12:01:04 dev br0 master br0 permanent
      [root@localhost bridge]# # filter by bridge
      [root@localhost bridge]# ./bridge fdb show br br0
      f2:46:50:85:6d:d9 dev dummy0 master br0 permanent
      f2:46:50:85:6d:d9 dev dummy0 vlan 1 master br0 permanent
      33:33:00:00:00:01 dev dummy0 self permanent
      f1:f2:f3:f4:f5:f6 dev dummy0 self permanent
      33:33:00:00:00:01 dev br0 self permanent
      02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent
      02:00:00:12:01:04 dev br0 master br0 permanent
      [root@localhost bridge]# # filter by port
      [root@localhost bridge]# ./bridge fdb show brport dummy0
      f2:46:50:85:6d:d9 master br0 permanent
      f2:46:50:85:6d:d9 vlan 1 master br0 permanent
      33:33:00:00:00:01 self permanent
      f1:f2:f3:f4:f5:f6 self permanent
      [root@localhost bridge]# # filter by port + bridge
      [root@localhost bridge]# ./bridge fdb show br br0 brport dummy0
      f2:46:50:85:6d:d9 master br0 permanent
      f2:46:50:85:6d:d9 vlan 1 master br0 permanent
      33:33:00:00:00:01 self permanent
      f1:f2:f3:f4:f5:f6 self permanent
      [root@localhost bridge]#
      Signed-off-by: default avatarHubert Sokolowski <hubert.sokolowski@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6cb69742
    • David S. Miller's avatar
      Merge branch 'ip_cmsg_csum' · d4253c62
      David S. Miller authored
      Tom Herbert says:
      
      ====================
      ip: Support checksum returned in csmg
      
      This patch set allows the packet checksum for a datagram socket
      to be returned in csum data in recvmsg. This allows userspace
      to implement its own checksum over the data, for instance if an
      IP tunnel was be implemented in user space, the inner checksum
      could be validated.
      
      Changes in this patch set:
        - Move checksum conversion to inet_sock from udp_sock. This
          generalizes checksum conversion for use with other protocols.
        - Move IP cmsg constants to a header file and make processing
          of the flags more efficient in ip_cmsg_recv
        - Return checksum value in cmsg. This is specifically the unfolded
          32 bit checksum of the full packet starting from the first byte
          returned in recvmsg
      
      Tested: Wrote a little server to get checksums in cmsg for UDP and
              verfied correct checksum is returned.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d4253c62
    • Tom Herbert's avatar
      ip: Add offset parameter to ip_cmsg_recv · ad6f939a
      Tom Herbert authored
      Add ip_cmsg_recv_offset function which takes an offset argument
      that indicates the starting offset in skb where data is being received
      from. This will be useful in the case of UDP and provided checksum
      to user space.
      
      ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of
      zero.
      Signed-off-by: default avatarTom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ad6f939a
    • Tom Herbert's avatar
      ip: Add offset parameter to ip_cmsg_recv · 5961de9f
      Tom Herbert authored
      Add ip_cmsg_recv_offset function which takes an offset argument
      that indicates the starting offset in skb where data is being received
      from. This will be useful in the case of UDP and provided checksum
      to user space.
      
      ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of
      zero.
      Signed-off-by: default avatarTom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5961de9f
    • Tom Herbert's avatar
      ip: IP cmsg cleanup · c44d13d6
      Tom Herbert authored
      Move the IP_CMSG_* constants from ip_sockglue.c to inet_sock.h so that
      they can be referenced in other source files.
      
      Restructure ip_cmsg_recv to not go through flags using shift, check
      for flags by 'and'. This eliminates both the shift and a conditional
      per flag check.
      Signed-off-by: default avatarTom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c44d13d6
    • Tom Herbert's avatar
      ip: Move checksum convert defines to inet · 224d019c
      Tom Herbert authored
      Move convert_csum from udp_sock to inet_sock. This allows the
      possibility that we can use convert checksum for different types
      of sockets and also allows convert checksum to be enabled from
      inet layer (what we'll want to do when enabling IP_CHECKSUM cmsg).
      Signed-off-by: default avatarTom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      224d019c