1. 03 Dec, 2013 2 commits
    • Venkat Venkatsubra's avatar
      rds: prevent BUG_ON triggered on congestion update to loopback · 18fc25c9
      Venkat Venkatsubra authored
      After congestion update on a local connection, when rds_ib_xmit returns
      less bytes than that are there in the message, rds_send_xmit calls
      back rds_ib_xmit with an offset that causes BUG_ON(off & RDS_FRAG_SIZE)
      to trigger.
      
      For a 4Kb PAGE_SIZE rds_ib_xmit returns min(8240,4096)=4096 when actually
      the message contains 8240 bytes. rds_send_xmit thinks there is more to send
      and calls rds_ib_xmit again with a data offset "off" of 4096-48(rds header)
      =4048 bytes thus hitting the BUG_ON(off & RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k].
      
      The commit 6094628b
      "rds: prevent BUG_ON triggering on congestion map updates" introduced
      this regression. That change was addressing the triggering of a different
      BUG_ON in rds_send_xmit() on PowerPC architecture with 64Kbytes PAGE_SIZE:
       	BUG_ON(ret != 0 &&
          		 conn->c_xmit_sg == rm->data.op_nents);
      This was the sequence it was going through:
      (rds_ib_xmit)
      /* Do not send cong updates to IB loopback */
      if (conn->c_loopback
         && rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) {
        	rds_cong_map_updated(conn->c_fcong, ~(u64) 0);
          	return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES;
      }
      rds_ib_xmit returns 8240
      rds_send_xmit:
        c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the first time)
         		 = 8192
        c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again
      rds_ib_xmit returns 8240
      rds_send_xmit:
        c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again
        and so on (c_xmit_data_off 24672,32912,41152,49392,57632)
      rds_ib_xmit returns 8240
      On this iteration this sequence causes the BUG_ON in rds_send_xmit:
          while (ret) {
          	tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off);
          	[tmp = 65536 - 57632 = 7904]
          	conn->c_xmit_data_off += tmp;
          	[c_xmit_data_off = 57632 + 7904 = 65536]
          	ret -= tmp;
          	[ret = 8240 - 7904 = 336]
          	if (conn->c_xmit_data_off == sg->length) {
          		conn->c_xmit_data_off = 0;
          		sg++;
          		conn->c_xmit_sg++;
          		BUG_ON(ret != 0 &&
          			conn->c_xmit_sg == rm->data.op_nents);
          		[c_xmit_sg = 1, rm->data.op_nents = 1]
      
      What the current fix does:
      Since the congestion update over loopback is not actually transmitted
      as a message, all that rds_ib_xmit needs to do is let the caller think
      the full message has been transmitted and not return partial bytes.
      It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4Kb.
      And 64Kb+48 when page size is 64Kb.
      Reported-by: default avatarJosh Hunt <joshhunt00@gmail.com>
      Tested-by: default avatarHonggang Li <honli@redhat.com>
      Acked-by: default avatarBang Nguyen <bang.nguyen@oracle.com>
      Signed-off-by: default avatarVenkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      18fc25c9
    • Paul Durrant's avatar
      xen-netback: clear vif->task on disconnect · 67fa3660
      Paul Durrant authored
      xenvif_start_xmit() relies on checking vif->task for NULL to determine
      whether the vif is ready to accept packets. The task thread is stopped in
      xenvif_disconnect() but task is not set to NULL. Thus, on a re-connect the
      check will give a false positive.
      
      Also since commit ea732dff (Handle backend
      state transitions in a more robust way) it should not be possible for
      xenvif_connect() to be called if the vif is already connected so change the
      check of vif->tx_irq to a BUG_ON() and also add a BUG_ON(vif->task).
      Signed-off-by: default avatarPaul Durrant <paul.durrant@citrix.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Acked-by: default avatarWei Liu <wei.liu2@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      67fa3660
  2. 02 Dec, 2013 19 commits
    • David S. Miller's avatar
      Revert "net: Handle CHECKSUM_COMPLETE more adequately in pskb_trim_rcsum()." · 7ce5a27f
      David S. Miller authored
      This reverts commit 018c5bba.
      
      It causes regressions for people using chips driven by the sungem
      driver.  Suspicion is that the skb->csum value isn't being adjusted
      properly.
      
      The change also has a bug in that if __pskb_trim() fails, we'll leave
      a corruped skb->csum value in there.  We would really need to revert
      it to it's original value in that case.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7ce5a27f
    • Eric Dumazet's avatar
      net: do not pretend FRAGLIST support · 28e24c62
      Eric Dumazet authored
      Few network drivers really supports frag_list : virtual drivers.
      
      Some drivers wrongly advertise NETIF_F_FRAGLIST feature.
      
      If skb with a frag_list is given to them, packet on the wire will be
      corrupt.
      
      Remove this flag, as core networking stack will make sure to
      provide packets that can be sent without corruption.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Cc: Anirudha Sarangi <anirudh@xilinx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28e24c62
    • Kamala R's avatar
      IPv6: Fixed support for blackhole and prohibit routes · 7150aede
      Kamala R authored
      The behaviour of blackhole and prohibit routes has been corrected by setting
      the input and output pointers of the dst variable appropriately. For
      blackhole routes, they are set to dst_discard and to ip6_pkt_discard and
      ip6_pkt_discard_out respectively for prohibit routes.
      
      ipv6: ip6_pkt_prohibit(_out) should not depend on
      CONFIG_IPV6_MULTIPLE_TABLES
      
      We need ip6_pkt_prohibit(_out) available without
      CONFIG_IPV6_MULTIPLE_TABLES
      Signed-off-by: default avatarKamala R <kamala@aristanetworks.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7150aede
    • François-Xavier Le Bail's avatar
    • Sebastian Siewior's avatar
      net: fec_main: dma_map() only the length of the skb · 2488a54e
      Sebastian Siewior authored
      On tx submit the driver always dma_map_single() FEC_ENET_TX_FRSIZE (=2048)
      bytes. This works because we don't overwrite any memory after the data buffer,
      we remove it from cache if it was there. So we hurt performace in case the
      mapping of a smaller area makes a difference.
      There is also a bug: If the data area starts shortly before the end of
      RAM say 0xc7fffa10 and the RAM ends at 0xc8000000 then we have enough
      space to fit the data area (according to skb->len) but we would map beyond
      end of ram if we are using 2048. In v2.6.31 (against which kernel this patch
      made) there is the following check in dma_cache_maint():
      
      |BUG_ON(!virt_addr_valid(start) || !virt_addr_valid(start + size - 1));
      
      Since the area starting at 0xc8000000 is no longer virt_addr_valid() we
      BUG() during dma_map_single(). The BUG() statement was removed in v3.5-rc1 as
      per 2dc6a016 ("ARM: dma-mapping: use asm-generic/dma-mapping-common.h").
      
      This patch was tested on v2.6.31 and then forward-ported and compile
      tested only against the net tree. I think it is still worth fixing
      mainline even after the BUG() statement is gone.
      Tested-by: default avatarFugang Duan <B38611@freescale.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: default avatarFugang Duan <B38611@freescale.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2488a54e
    • Mugunthan V N's avatar
      drivers: net: cpsw: fix dt probe for one port ethernet · 3a27bfac
      Mugunthan V N authored
      When only one port of the two port is pinned out, then dt probe is failing
      because second port phy is not found. fixing this by checking the number of
      slaves and breaking the loop.
      Signed-off-by: default avatarMugunthan V N <mugunthanvnm@ti.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3a27bfac
    • Rafael J. Wysocki's avatar
      PCI / tg3: Give up chip reset and carrier loss handling if PCI device is not present · 8496e85c
      Rafael J. Wysocki authored
      Modify tg3_chip_reset() and tg3_close() to check if the PCI network
      adapter device is accessible at all in order to skip poking it or
      trying to handle a carrier loss in vain when that's not the case.
      Introduce a special PCI helper function pci_device_is_present()
      for this purpose.
      
      Of course, this uncovers the lack of the appropriate RTNL locking
      in tg3_suspend() and tg3_resume(), so add that locking in there
      too.
      
      These changes prevent tg3 from burning a CPU at 100% load level for
      solid several seconds after the Thunderbolt link is disconnected from
      a Matrox DS1 docking station.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarMichael Chan <mchan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8496e85c
    • Duan Jiong's avatar
      ipv6: judge the accept_ra_defrtr before calling rt6_route_rcv · 30e56918
      Duan Jiong authored
      when dealing with a RA message, if accept_ra_defrtr is false,
      the kernel will not add the default route, and then deal with
      the following route information options. Unfortunately, those
      options maybe contain default route, so let's judge the
      accept_ra_defrtr before calling rt6_route_rcv.
      Signed-off-by: default avatarDuan Jiong <duanj.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      30e56918
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a45299e7
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       - Correction of fuzzy and fragile IRQ_RETVAL macro
       - IRQ related resume fix affecting only XEN
       - ARM/GIC fix for chained GIC controllers
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip: Gic: fix boot for chained gics
        irq: Enable all irqs unconditionally in irq_resume
        genirq: Correct fuzzy and fragile IRQ_RETVAL() definition
      a45299e7
    • Linus Torvalds's avatar
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a0b57ca3
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "Various smaller fixlets, all over the place"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/doc: Fix generation of device-drivers
        sched: Expose preempt_schedule_irq()
        sched: Fix a trivial typo in comments
        sched: Remove unused variable in 'struct sched_domain'
        sched: Avoid NULL dereference on sd_busy
        sched: Check sched_domain before computing group power
        MAINTAINERS: Update file patterns in the lockdep and scheduler entries
      a0b57ca3
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e321ae4c
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Misc kernel and tooling fixes"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        tools lib traceevent: Fix conversion of pointer to integer of different size
        perf/trace: Properly use u64 to hold event_id
        perf: Remove fragile swevent hlist optimization
        ftrace, perf: Avoid infinite event generation loop
        tools lib traceevent: Fix use of multiple options in processing field
        perf header: Fix possible memory leaks in process_group_desc()
        perf header: Fix bogus group name
        perf tools: Tag thread comm as overriden
      e321ae4c
    • Linus Torvalds's avatar
      Merge tag 'stable/for-linus-3.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · bcc2f9b7
      Linus Torvalds authored
      Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
       "Fixes to patches that went in this merge window along with a latent
        bug:
         - Fix lazy flushing in case m2p override fails.
         - Fix module compile issues with ARM/Xen
         - Add missing call to DMA map page for Xen SWIOTLB for ARM"
      
      * tag 'stable/for-linus-3.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/gnttab: leave lazy MMU mode in the case of a m2p override failure
        xen/arm: p2m_init and p2m_lock should be static
        arm/xen: Export phys_to_mach to fix Xen module link errors
        swiotlb-xen: add missing xen_dma_map_page call
      bcc2f9b7
    • Linus Torvalds's avatar
      Merge tag 'spi-v3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · aeac8103
      Linus Torvalds authored
      Pull spi fixes from Mark Brown:
       "A smattering of driver specific fixes here, including a bunch for a
        long standing common pattern in the error handling paths, and a fix
        for an embarrassing thinko in the new devm master registration code"
      
      * tag 'spi-v3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi/pxa2xx: Restore private register bits.
        spi/qspi: Fix qspi remove path.
        spi/qspi: cleanup pm_runtime error check.
        spi/qspi: set correct platform drvdata in ti_qspi_probe()
        spi/pxa2xx: add new ACPI IDs
        spi: core: invert success test in devm_spi_register_master
        spi: spi-mxs: fix reference leak to master in mxs_spi_remove()
        spi: bcm63xx: fix reference leak to master in bcm63xx_spi_remove()
        spi: txx9: fix reference leak to master in txx9spi_remove()
        spi: mpc512x: fix reference leak to master in mpc512x_psc_spi_do_remove()
        spi: rspi: use platform drvdata correctly in rspi_remove()
        spi: bcm2835: fix reference leak to master in bcm2835_spi_remove()
      aeac8103
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 5fc92de3
      Linus Torvalds authored
      Pull networking updates from David Miller:
       "Here is a pile of bug fixes that accumulated while I was in Europe"
      
       1) In fixing kernel leaks to userspace during copying of socket
          addresses, we broke a case that used to work, namely the user
          providing a buffer larger than the in-kernel generic socket address
          structure.  This broke Ruby amongst other things.  Fix from Dan
          Carpenter.
      
       2) Fix regression added by byte queue limit support in 8139cp driver,
          from Yang Yingliang.
      
       3) The addition of MSG_SENDPAGE_NOTLAST buggered up a few sendpage
          implementations, they should just treat it the same as MSG_MORE.
          Fix from Richard Weinberger and Shawn Landden.
      
       4) Handle icmpv4 errors received on ipv6 SIT tunnels correctly, from
          Oussama Ghorbel.  In particular we should send an ICMPv6 unreachable
          in such situations.
      
       5) Fix some regressions in the recent genetlink fixes, in particular
          get the pmcraid driver to use the new safer interfaces correctly.
          From Johannes Berg.
      
       6) macvtap was converted to use a per-cpu set of statistics, but some
          code was still bumping tx_dropped elsewhere.  From Jason Wang.
      
       7) Fix build failure of xen-netback due to missing include on some
          architectures, from Andy Whitecroft.
      
       8) macvtap double counts received packets in statistics, fix from Vlad
          Yasevich.
      
       9) Fix various cases of using *_STATS_BH() when *_STATS() is more
          appropriate.  From Eric Dumazet and Hannes Frederic Sowa.
      
      10) Pktgen ipsec mode doesn't update the ipv4 header length and checksum
          properly after encapsulation.  Fix from Fan Du.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
        net/mlx4_en: Remove selftest TX queues empty condition
        {pktgen, xfrm} Update IPv4 header total len and checksum after tranformation
        virtio_net: make all RX paths handle erors consistently
        virtio_net: fix error handling for mergeable buffers
        virtio_net: Fixed a trivial typo (fitler --> filter)
        netem: fix gemodel loss generator
        netem: fix loss 4 state model
        netem: missing break in ge loss generator
        net/hsr: Support iproute print_opt ('ip -details ...')
        net/hsr: Very small fix of comment style.
        MAINTAINERS: Added net/hsr/ maintainer
        ipv6: fix possible seqlock deadlock in ip6_finish_output2
        ixgbe: Make ixgbe_identify_qsfp_module_generic static
        ixgbe: turn NETIF_F_HW_L2FW_DOFFLOAD off by default
        ixgbe: ixgbe_fwd_ring_down needs to be static
        e1000: fix possible reset_task running after adapter down
        e1000: fix lockdep warning in e1000_reset_task
        e1000: prevent oops when adapter is being closed and reset simultaneously
        igb: Fixed Wake On LAN support
        inet: fix possible seqlock deadlocks
        ...
      5fc92de3
    • Linus Torvalds's avatar
      vfs: fix subtle use-after-free of pipe_inode_info · b0d8d229
      Linus Torvalds authored
      The pipe code was trying (and failing) to be very careful about freeing
      the pipe info only after the last access, with a pattern like:
      
              spin_lock(&inode->i_lock);
              if (!--pipe->files) {
                      inode->i_pipe = NULL;
                      kill = 1;
              }
              spin_unlock(&inode->i_lock);
              __pipe_unlock(pipe);
              if (kill)
                      free_pipe_info(pipe);
      
      where the final freeing is done last.
      
      HOWEVER.  The above is actually broken, because while the freeing is
      done at the end, if we have two racing processes releasing the pipe
      inode info, the one that *doesn't* free it will decrement the ->files
      count, and unlock the inode i_lock, but then still use the
      "pipe_inode_info" afterwards when it does the "__pipe_unlock(pipe)".
      
      This is *very* hard to trigger in practice, since the race window is
      very small, and adding debug options seems to just hide it by slowing
      things down.
      
      Simon originally reported this way back in July as an Oops in
      kmem_cache_allocate due to a single bit corruption (due to the final
      "spin_unlock(pipe->mutex.wait_lock)" incrementing a field in a different
      allocation that had re-used the free'd pipe-info), it's taken this long
      to figure out.
      
      Since the 'pipe->files' accesses aren't even protected by the pipe lock
      (we very much use the inode lock for that), the simple solution is to
      just drop the pipe lock early.  And since there were two users of this
      pattern, create a helper function for it.
      
      Introduced commit ba5bb147 ("pipe: take allocation and freeing of
      pipe_inode_info out of ->i_mutex").
      Reported-by: default avatarSimon Kirby <sim@hostway.ca>
      Reported-by: default avatarIan Applegate <ia@cloudflare.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Cc: stable@kernel.org   # v3.10+
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b0d8d229
    • Eugenia Emantayev's avatar
      net/mlx4_en: Remove selftest TX queues empty condition · 833846e8
      Eugenia Emantayev authored
      Remove waiting for TX queues to become empty during selftest.
      This check is not necessary for any purpose, and might put
      the driver into an infinite loop.
      Signed-off-by: default avatarEugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: default avatarAmir Vadai <amirv@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      833846e8
    • fan.du's avatar
      {pktgen, xfrm} Update IPv4 header total len and checksum after tranformation · 3868204d
      fan.du authored
      commit a553e4a6 ("[PKTGEN]: IPSEC support")
      tried to support IPsec ESP transport transformation for pktgen, but acctually
      this doesn't work at all for two reasons(The orignal transformed packet has
      bad IPv4 checksum value, as well as wrong auth value, reported by wireshark)
      
      - After transpormation, IPv4 header total length needs update,
        because encrypted payload's length is NOT same as that of plain text.
      
      - After transformation, IPv4 checksum needs re-caculate because of payload
        has been changed.
      
      With this patch, armmed pktgen with below cofiguration, Wireshark is able to
      decrypted ESP packet generated by pktgen without any IPv4 checksum error or
      auth value error.
      
      pgset "flag IPSEC"
      pgset "flows 1"
      Signed-off-by: default avatarFan Du <fan.du@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3868204d
    • Michael S. Tsirkin's avatar
      virtio_net: make all RX paths handle erors consistently · f121159d
      Michael S. Tsirkin authored
      receive mergeable now handles errors internally.
      Do same for big and small packet paths, otherwise
      the logic is too hard to follow.
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f121159d
    • Michael S. Tsirkin's avatar
      virtio_net: fix error handling for mergeable buffers · 8fc3b9e9
      Michael S. Tsirkin authored
      Eric Dumazet noticed that if we encounter an error
      when processing a mergeable buffer, we don't
      dequeue all of the buffers from this packet,
      the result is almost sure to be loss of networking.
      
      Jason Wang noticed that we also leak a page and that we don't decrement
      the rq buf count, so we won't repost buffers (a resource leak).
      
      Fix both issues.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael Dalton <mwdalton@google.com>
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8fc3b9e9
  3. 01 Dec, 2013 4 commits
  4. 30 Nov, 2013 15 commits