1. 28 Oct, 2010 18 commits
    • pktgen: Limit how much data we copy onto the stack. · 448d7b5d
      Nelson Elhage authored
      A program that accidentally writes too much data to the pktgen file can overflow
      the kernel stack and oops the machine. This is only triggerable by root, so
      there's no security issue, but it's still an unfortunate bug.
      
      printk() won't print more than 1024 bytes in a single call, anyways, so let's
      just never copy more than that much data. We're on a fairly shallow stack, so
      that should be safe even with CONFIG_4KSTACKS.
      Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
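The bounded copy described in the pktgen commit can be illustrated with a minimal user-space sketch. This is not the patch itself: `DEBUG_COPY_MAX` and `copy_clamped()` are hypothetical names, and `memcpy()` stands in for whatever copy routine the kernel code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cap, mirroring the 1024-byte printk() limit cited in the commit. */
#define DEBUG_COPY_MAX 1024

/*
 * Copy at most dst_size - 1 bytes of src into a fixed-size buffer and
 * NUL-terminate it. Returns the number of bytes actually copied, so the
 * caller can tell when the input was truncated.
 */
static size_t copy_clamped(char *dst, size_t dst_size,
                           const char *src, size_t count)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t limit = dst_size - 1;              /* leave room for the terminator */
    size_t n = count < limit ? count : limit; /* never exceed the buffer */

    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

A caller would declare `char buf[DEBUG_COPY_MAX];` on its stack and pass `sizeof(buf)` as `dst_size`, so stack usage stays fixed regardless of how much data the writer supplied.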
    • net: Limit socket I/O iovec total length to INT_MAX. · 8acfe468
      David S. Miller authored
      This helps protect us from overflow issues down in the
      individual protocol sendmsg/recvmsg handlers.  Once
      we hit INT_MAX we truncate out the rest of the iovec
      by setting the iov_len members to zero.
      
      This works because:
      
      1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial
         writes are allowed and the application will just continue
         with another write to send the rest of the data.
      
      2) For datagram oriented sockets, where there must be a
         one-to-one correspondence between write() calls and
         packets on the wire, INT_MAX is going to be far larger
         than the packet size limit the protocol is going to
         check for and signal with -EMSGSIZE.
      
      Based upon a patch by Linus Torvalds.
      Signed-off-by: David S. Miller <davem@davemloft.net>
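The truncation rule in the iovec commit (stop at INT_MAX and zero the remaining iov_len members) can be sketched in user space as follows. `cap_iovec_total()` is a hypothetical helper written for illustration; it is not the kernel function touched by the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Walk the vector, keep a running total, and shrink any segment that
 * would push the total past INT_MAX. Segments after the cut-off end up
 * with iov_len == 0. Returns the capped total length.
 */
static int cap_iovec_total(struct iovec *iov, int iovcnt)
{
    assert(iov != NULL || iovcnt == 0);

    size_t total = 0;
    for (int i = 0; i < iovcnt; i++) {
        size_t room = (size_t)INT_MAX - total; /* bytes still allowed */

        if (iov[i].iov_len > room)
            iov[i].iov_len = room;             /* truncate, possibly to 0 */
        total += iov[i].iov_len;
    }
    return (int)total;
}
```

Because the total can never exceed INT_MAX, the return value always fits in an int, which is the property the per-protocol handlers rely on.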
    • USB: gadget: fix ethernet gadget crash in gether_setup · 349f6c5c
      Dmitry Artamonow authored
      The crash is triggered by commit e6484930 ("net: allocate tx queues in
      register_netdevice"), which moved tx netqueue creation into register_netdev.
      Calling netif_stop_queue() before register_netdev now causes an oops.
      Move netif_stop_queue() after net device registration to fix the crash.
      Signed-off-by: Dmitry Artamonow <mad_soft@inbox.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib: Fix fib zone and its hash leak on namespace stop · 4aa2c466
      Pavel Emelyanov authored
      When we stop a namespace we flush the table and free it, but the
      added fn_zone-s (and their hashes, if grown) are leaked and need to be
      freed. Tries release all their state in the flushing code.
      
      Shame on us - this bug has existed since the very first make-fib-per-net
      patches in 2.6.27 :(
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb3: Fix panic in free_tx_desc() · b1424ed9
      Krishna Kumar authored
      I got a few of these panics (on 2.6.36-rc7) when running a high
      number of netperf sessions:
      
      BUG: unable to handle kernel paging request at 0000100000000000
      IP: [<ffffffff813125f0>] skb_release_data+0xa0/0xd0
      Oops: 0000 [#1] SMP
      Pid: 2155, comm: vhost-2115 Not tainted 2.6.36-rc7-ORG #1 49Y6512     /System x3650 M2 -[7947AC1]-
      RIP: 0010:[<ffffffff813125f0>]  [<ffffffff813125f0>] skb_release_data+0xa0/0xd0
      RSP: 0018:ffff880001803738  EFLAGS: 00010206
      RAX: ffff880179b0fc00 RBX: ffff880178b441c0 RCX: 0000000000000000
      RDX: ffff880179b0fd40 RSI: 0000000000000000 RDI: 0000100000000000
      RBP: ffff880001803748 R08: 0000000000000001 R09: ffff88017f117000
      R10: ffff88017b990608 R11: ffff88017f117090 R12: ffff880178b441c0
      R13: ffff88017f117090 R14: 0000000000000000 R15: ffff880178b441c0
      FS:  0000000000000000(0000) GS:ffff880001800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000100000000000 CR3: 000000017ea64000 CR4: 00000000000026e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process vhost-2115 (pid: 2155, threadinfo ffff88017d872000, task ffff88017e954680)
      Stack:
      ffff880178b441c0 0000000000000007 ffff880001803768 ffffffff81312119
      <0> 0000000000000000 0000000000000002 ffff880001803778 ffffffff813121f9
      <0> ffff880001803818 ffffffffa012d14c ffffffffa02de076 ffff880001803700
      Call Trace:
      <IRQ>
      [<ffffffff81312119>] __kfree_skb+0x19/0xa0
      [<ffffffff813121f9>] kfree_skb+0x19/0x40
      [<ffffffffa012d14c>] free_tx_desc+0x2fc/0x350 [cxgb3]
      [<ffffffffa02de076>] ? vhost_poll_wakeup+0x16/0x20 [vhost_net]
      [<ffffffffa01323db>] t3_eth_xmit+0x28b/0x380 [cxgb3]
      [<ffffffff8131ce47>] dev_hard_start_xmit+0x377/0x5a0
      [<ffffffff81335a4a>] sch_direct_xmit+0xfa/0x1d0
      [<ffffffff8131d1a9>] dev_queue_xmit+0x139/0x450
      [<ffffffff81326225>] neigh_resolve_output+0x125/0x340
      [<ffffffff8135a77c>] ip_finish_output+0x14c/0x320
      [<ffffffff8135a9fe>] ip_output+0xae/0xc0
      [<ffffffff8135620f>] ip_forward_finish+0x3f/0x50
      [<ffffffff8135641f>] ip_forward+0x1ff/0x400
      [<ffffffff81354789>] ip_rcv_finish+0x119/0x3e0
      [<ffffffff81354c7d>] ip_rcv+0x22d/0x300
      [<ffffffff8131a95b>] __netif_receive_skb+0x29b/0x570
      [<ffffffff8131ba70>] ? netif_receive_skb+0x0/0x80
      [<ffffffff8131bae8>] netif_receive_skb+0x78/0x80
      [<ffffffffa02a96d8>] br_handle_frame_finish+0x198/0x260 [bridge]
      [<ffffffffa02aebc8>] br_nf_pre_routing_finish+0x238/0x380 [bridge]
      [<ffffffff813424bc>] ? nf_hook_slow+0x6c/0x100
      [<ffffffffa02ae990>] ? br_nf_pre_routing_finish+0x0/0x380 [bridge]
      [<ffffffffa02afb08>] br_nf_pre_routing+0x698/0x7a0 [bridge]
      [<ffffffff81342414>] nf_iterate+0x64/0xa0
      [<ffffffffa02a9540>] ? br_handle_frame_finish+0x0/0x260 [bridge]
      [<ffffffff813424bc>] nf_hook_slow+0x6c/0x100
      [<ffffffffa02a9540>] ? br_handle_frame_finish+0x0/0x260 [bridge]
      [<ffffffffa02a9931>] br_handle_frame+0x191/0x240 [bridge]
      [<ffffffffa02a97a0>] ? br_handle_frame+0x0/0x240 [bridge]
      [<ffffffff8131a863>] __netif_receive_skb+0x1a3/0x570
      [<ffffffff812ef3f6>] ? dma_issue_pending_all+0x76/0xa0
      [<ffffffff8131ad32>] process_backlog+0x102/0x200
      [<ffffffff8131c2d0>] net_rx_action+0x100/0x220
      [<ffffffff810548ef>] __do_softirq+0xaf/0x140
      [<ffffffff8100bcdc>] call_softirq+0x1c/0x30
      [<ffffffff8100dfc5>] ? do_softirq+0x65/0xa0
      [<ffffffff8131c6b8>] netif_rx_ni+0x28/0x30
      [<ffffffffa02c305d>] tun_sendmsg+0x2cd/0x4b0 [tun]
      [<ffffffffa02e01af>] handle_tx+0x1df/0x340 [vhost_net]
      [<ffffffffa02e0340>] handle_tx_kick+0x10/0x20 [vhost_net]
      [<ffffffffa02de29b>] vhost_worker+0xbb/0x130 [vhost_net]
      [<ffffffffa02de1e0>] ? vhost_worker+0x0/0x130 [vhost_net]
      [<ffffffffa02de1e0>] ? vhost_worker+0x0/0x130 [vhost_net]
      [<ffffffff81069686>] kthread+0x96/0xa0
      [<ffffffff8100bbe4>] kernel_thread_helper+0x4/0x10
      [<ffffffff810695f0>] ? kthread+0x0/0xa0
      [<ffffffff8100bbe0>] ? kernel_thread_helper+0x0/0x10
      Code: 8b 94 24 d0 00 00 00 49 8b 84 24 d8 00 00 00 48 8d 14 10 0f b7 0a 39 d9 7f d1 48 8b 7a 10 48 85 ff 74 20 48 c7 42 10 00 00 00 00 <48> 8b 1f e8 e8 fb ff ff 48 85 db 48 89 df 75 f0 49 8b 84 24 d8
      
      Patch below fixes the panic. cxgb4 and cxgb4vf already have this fix.
      Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb3: fix crash due to manipulating queues before registration · 69dcfc8a
      Nishanth Aravamudan authored
      Along the same lines as "cxgb4: fix crash due to manipulating queues
      before registration" (8f6d9f40): before commit "net: allocate tx queues
      in register_netdevice", netif_tx_stop_all_queues and related functions
      could be used between device allocation and registration, but now they
      can be used only after registration. cxgb3 has such a call before
      registration and now crashes.  Move it after register_netdev.
      Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: eric.dumazet@gmail.com
      Cc: sonnyrao@us.ibm.com
      Cc: Divy Le Ray <divy@chelsio.com>
      Cc: Dimitris Michailidis <dm@chelsio.com>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Tested-by: Nishanth Aravamudan <nacc@us.ibm.com>
      Acked-by: Divy Le Ray <divy@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 8390: Don't oops on starting dev queue · b7126d8c
      Pavel Emelyanov authored
      The __NS8390_init tries to start the device queue before the
      device is registered. This results in an oops (snipped):
      
      [    2.865493] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      [    2.866106] IP: [<ffffffffa000602a>] netif_start_queue+0xb/0x12 [8390]
      [    2.881267] Call Trace:
      [    2.881437]  [<ffffffffa000624d>] __NS8390_init+0x102/0x15a [8390]
      [    2.881999]  [<ffffffffa00062ae>] NS8390_init+0x9/0xb [8390]
      [    2.882237]  [<ffffffffa000d820>] ne2k_pci_init_one+0x297/0x354 [ne2k_pci]
      [    2.882955]  [<ffffffff811c7a0e>] local_pci_probe+0x12/0x16
      [    2.883308]  [<ffffffff811c85ad>] pci_device_probe+0xc3/0xef
      [    2.884049]  [<ffffffff8129218d>] driver_probe_device+0xbe/0x14b
      [    2.884937]  [<ffffffff81292260>] __driver_attach+0x46/0x62
      [    2.885170]  [<ffffffff81291788>] bus_for_each_dev+0x49/0x78
      [    2.885781]  [<ffffffff81291fbb>] driver_attach+0x1c/0x1e
      [    2.886089]  [<ffffffff812912ab>] bus_add_driver+0xba/0x227
      [    2.886330]  [<ffffffff8129259a>] driver_register+0x9e/0x115
      [    2.886933]  [<ffffffff811c8815>] __pci_register_driver+0x50/0xac
      [    2.887785]  [<ffffffffa001102c>] ne2k_pci_init+0x2c/0x2e [ne2k_pci]
      [    2.888093]  [<ffffffff81000212>] do_one_initcall+0x7c/0x130
      [    2.888693]  [<ffffffff8106d74f>] sys_init_module+0x99/0x1da
      [    2.888946]  [<ffffffff81002a2b>] system_call_fastpath+0x16/0x1b
      
      This happens because netif_start_queue sets the respective bit in the
      dev->_tx array, which is not yet allocated.
      
      As far as I understand the code, removing netif_start_queue from __NS8390_init
      is OK, since the queue will be started later on device open. Please correct me if I'm wrong.
      
      Found in Dave's current tree, so he's in Cc.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
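The ordering problem behind this oops (and the similar gadget, cxgb3, and igbvf fixes in this list) can be modelled with a small user-space sketch. Every name here (`toy_netdev`, `toy_register()`, `toy_stop_queue()`) is a hypothetical stand-in, not the kernel's `struct net_device` API; the only assumption taken from the commit messages is that the TX queue array does not exist until registration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins; not the kernel's struct net_device or struct netdev_queue. */
struct toy_queue  { bool stopped; };
struct toy_netdev { struct toy_queue *tx; bool registered; };

/* Registration is the point where the TX queue array gets allocated. */
static int toy_register(struct toy_netdev *dev)
{
    assert(dev != NULL);
    dev->tx = calloc(1, sizeof(*dev->tx));
    if (!dev->tx)
        return -1;
    dev->registered = true;
    return 0;
}

/*
 * Returns false instead of dereferencing a NULL queue pointer.
 * In the kernel there is no such guard, so this path is the oops.
 */
static bool toy_stop_queue(struct toy_netdev *dev)
{
    assert(dev != NULL);
    if (!dev->tx)
        return false;
    dev->tx->stopped = true;
    return true;
}

/* Release the queue array again. */
static void toy_unregister(struct toy_netdev *dev)
{
    assert(dev != NULL);
    free(dev->tx);
    dev->tx = NULL;
    dev->registered = false;
}
```

Calling `toy_stop_queue()` on a freshly zeroed device fails, while the same call after `toy_register()` succeeds, which is why these fixes either move the queue call after registration or drop it.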
    • dccp ccid-2: Stop polling · 1c0e0a05
      Gerrit Renker authored
      This updates CCID-2 to use the CCID dequeuing mechanism, converting it from
      the previous continuous polling to an event-driven mechanism.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dccp: Refine the wait-for-ccid mechanism · b1fcf55e
      Gerrit Renker authored
      This extends the existing wait-for-ccid routine so that it may be used with
      different types of CCID, addressing the following problems:
      
       1) The queue-drain mechanism only works with rate-based CCIDs. If CCID-2 for
          example has a full TX queue and becomes network-limited just as the
          application wants to close, then waiting for CCID-2 to become unblocked
          could lead to an indefinite delay (i.e., application "hangs").
       2) Since each TX CCID in turn uses a feedback mechanism, there may be changes
          in its sending policy while the queue is being drained. This can lead to
          further delays during which the application will not be able to terminate.
       3) The minimum wait time for CCID-3/4 can be expected to be the queue length
          times the current inter-packet delay. For example if tx_qlen=100 and a delay
          of 15 ms is used for each packet, then the application would have to wait
          for a minimum of 1.5 seconds before being allowed to exit.
       4) There is no way for the user/application to control this behaviour. It would
          be good to use the timeout argument of dccp_close() as an upper bound. Then
          the maximum time that an application is willing to wait for its CCIDs can
          be set via the SO_LINGER option.
      
      These problems are addressed by giving the CCID a grace period of up to the
      `timeout' value.
      
      The wait-for-ccid function is, as before, used when the application
       (a) has read all the data in its receive buffer and
       (b) if SO_LINGER was set with a non-zero linger time, or
       (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ
           state (client application closes after receiving CloseReq).
      
      In addition, there is a catch-all case of __skb_queue_purge() after waiting for
      the CCID. This is necessary since the write queue may still have data when
       (a) the host has been passively-closed,
       (b) abnormal termination (unread data, zero linger time),
       (c) wait-for-ccid could not finish within the given time limit.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
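The arithmetic in point 3 and the upper bound proposed in point 4 of this commit can be written out as a short sketch. The helper names are hypothetical and all times are assumed to be in milliseconds; this only illustrates the numbers, not the kernel's wait loop.

```c
#include <assert.h>
#include <limits.h>

/* Lower bound on drain time for a rate-based CCID: queue length x gap. */
static unsigned long min_drain_ms(unsigned long tx_qlen, unsigned long gap_ms)
{
    /* Guard against multiplication overflow in this toy helper. */
    assert(gap_ms == 0 || tx_qlen <= ULONG_MAX / gap_ms);
    return tx_qlen * gap_ms;
}

/* Grace period: wait for the drain, but never longer than the timeout. */
static unsigned long bounded_wait_ms(unsigned long drain_ms,
                                     unsigned long timeout_ms)
{
    return drain_ms < timeout_ms ? drain_ms : timeout_ms;
}
```

With the values from the commit message, `min_drain_ms(100, 15)` gives 1500 ms, and a linger timeout of 1000 ms would cut the wait to 1000 ms.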
    • dccp: Extend CCID packet dequeueing interface · dc841e30
      Gerrit Renker authored
      This extends the packet dequeuing interface of dccp_write_xmit() to allow
       1. CCIDs to take care of timing when the next packet may be sent;
       2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds).
      
      The main purpose is to take CCID-2 out of its polling mode (when it is network-
      limited, it tries every millisecond to send, without interruption).
      
      The mode of operation for (2) is as follows:
       * new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(),
       * ccid_hc_tx_send_packet() detects that it may not send (e.g. window full),
       * it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER',
       * dccp_write_xmit() returns without further action;
       * after some time the wait-condition for CCID becomes true,
       * that CCID schedules the tasklet,
       * tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(),
       * since the wait-condition is now true, ccid_hc_tx_send_packet() returns "send now",
       * packet is sent, and possibly more (since dccp_write_xmit() loops).
      
      Code reuse: the tasklet function calls dccp_write_xmit(), the timer function
                  reduces to a wrapper around the same code.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
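The mode of operation listed in this commit can be traced with a toy user-space model. All names are hypothetical (`toy_sock`, `toy_write_xmit()`, and so on), the "window" is a plain credit counter, and a direct function call stands in for scheduling the tasklet.

```c
#include <assert.h>
#include <stddef.h>

/* Toy decision codes; the real patch uses its own symbolic constants. */
enum toy_decision { TOY_SEND_NOW, TOY_DEQUEUE_LATER };

struct toy_sock {
    int queued;  /* packets waiting in the write queue */
    int window;  /* packets the CCID currently allows  */
    int sent;    /* packets handed to the network      */
};

/* The CCID decides whether the head packet may go out right now. */
static enum toy_decision toy_hc_tx_send_packet(const struct toy_sock *sk)
{
    return sk->window > 0 ? TOY_SEND_NOW : TOY_DEQUEUE_LATER;
}

/* Send as many queued packets as the CCID allows, then return. */
static void toy_write_xmit(struct toy_sock *sk)
{
    assert(sk != NULL);
    while (sk->queued > 0) {
        if (toy_hc_tx_send_packet(sk) == TOY_DEQUEUE_LATER)
            return;            /* no polling: wait for the CCID's event */
        sk->queued--;
        sk->window--;
        sk->sent++;
    }
}

/* Event from the CCID: the wait-condition became true again. */
static void toy_window_opened(struct toy_sock *sk, int credits)
{
    assert(sk != NULL);
    sk->window += credits;
    toy_write_xmit(sk);        /* stands in for the scheduled tasklet */
}
```

The point of the design is visible in `toy_write_xmit()`: when the CCID says "later", the sender simply returns and does nothing until `toy_window_opened()` fires, instead of retrying on a timer.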
    • dccp: Return-value convention of hc_tx_send_packet() · fe84f414
      Gerrit Renker authored
      This patch reorganises the return value convention of the CCID TX sending
      function, to permit more flexible schemes, as required by subsequent patches.
      
      Currently the convention is
       * values < 0     mean error,
       * a value == 0   means "send now", and
       * a value x > 0  means "send in x milliseconds".
      
      The patch provides symbolic constants and a function to interpret return values.
      
      In addition, it caps the maximum positive return value to 0xFFFF milliseconds,
      corresponding to 65.535 seconds.  This is possible since in CCID-3/4 the
      maximum possible inter-packet gap is fixed at t_mbi = 64 sec.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
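A minimal sketch of how the convention listed in this commit (negative = error, zero = send now, positive = delay in milliseconds, capped at 0xFFFF) might be decoded is shown here. The enum, struct, and function names are hypothetical; the symbolic constants and the helper that the patch actually adds may be named and encoded differently.

```c
#include <assert.h>

/* Hypothetical decision codes for illustration only. */
enum tx_decision { TX_SEND_NOW, TX_DELAY, TX_ERROR };

#define TX_DELAY_MAX_MS 0xFFFF /* 65.535 seconds, the cap named in the commit */
#define T_MBI_MS        64000  /* t_mbi = 64 sec, the CCID-3/4 maximum gap    */

struct tx_verdict {
    enum tx_decision decision;
    unsigned int delay_ms;     /* meaningful only for TX_DELAY */
};

/* Map a raw return code onto a decision plus a capped delay. */
static struct tx_verdict interpret_tx_rc(int rc)
{
    struct tx_verdict v = { TX_SEND_NOW, 0 };

    /* The cap is safe because the largest legitimate gap fits below it. */
    assert(T_MBI_MS <= TX_DELAY_MAX_MS);

    if (rc < 0) {
        v.decision = TX_ERROR;
    } else if (rc > 0) {
        v.decision = TX_DELAY;
        v.delay_ms = rc > TX_DELAY_MAX_MS ? TX_DELAY_MAX_MS : (unsigned int)rc;
    }
    return v;
}
```

Capping the delay at 16 bits leaves the remaining bits of the return value free, which is presumably what lets later patches encode further decisions in the same integer.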
    • igbvf: fix panic on load · de7fe787
      Emil Tantilov authored
      Introduced by commit e6484930
      ("net: allocate tx queues in register_netdevice").
      Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Acked-by: Greg Rose <greg.v.rose@intel.com>
      Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ixgb: call pci_disable_device in ixgb_remove · ec43a81c
      Emil Tantilov authored
      ixgb fails to work after reload on recent kernels:
      
      rmmod ixgb (dev->current_state = PCI_UNKNOWN)
      modprobe ixgb (pci_enable_device will bail leaving current_state to PCI_UNKNOWN)
      ifup eth0
      do_IRQ: 2.82 No irq handler for vector (irq -1)
      
      The issue was exposed by commit fcd097f3
      PCI: MSI: Remove unsafe and unnecessary hardware access
      
      which avoids HW writes for power states != PCI_D0
      
      CC: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ixgbe: DCB, fix TX hang occurring in stress condition with PFC · 9806307a
      John Fastabend authored
      The DCB credits refill quantum _must_ be greater than half the max
      packet size. This is needed to guarantee that TX DMA operations
      are not attempted during a pause state. Additionally, the min IFG
      must be set correctly for DCB mode. If a DMA operation is
      requested unexpectedly during the pause state the HW data
      store may be corrupted leading to a DMA hang.  The DMA hang
      requires a reset to correct. This fixes the HW configuration
      to avoid this condition.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Tested-by: Ross Brattain <ross.b.brattain@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • e1000e: Add check for reset flags before displaying reset message · affa9dfb
      Carolyn Wyborny authored
      Some parts need to execute resets during normal operation.  This flag
      check ensures that those parts reset without needlessly alarming the
      user.  Other unexpected resets by other parts will dump debug info
      and message the reset action to the user, as originally intended.
      Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
      Acked-by: Bruce Allan <bruce.w.allan@intel.com>
      Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • e1000e: reset PHY after errors detected · ff10e13c
      Carolyn Wyborny authored
      Some errors can be induced in the PHY via environmental testing
      (specifically extreme temperature changes and electrostatic
      discharge testing), and in the case of the PHY hanging due to
      this input, this detects the problem and resets to continue.
      This issue only applies to 82574 silicon.
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
      Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pch_gbe: Select MII. · 116c1ea0
      David S. Miller authored
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • igb: Fix unused variable warning. · c1758012
      Jesse Gross authored
      Commit eab6d18d "vlan: Don't check for vlan group before
      vlan_tx_tag_present" removed the need for the adapter variable
      in igb_xmit_frame_ring_adv().  This removes the variable as well
      to avoid the compiler warning.
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 27 Oct, 2010 22 commits