1. 28 Oct, 2009 2 commits
    • Michael S. Tsirkin's avatar
      virtio: order used ring after used index read · 2d61ba95
      Michael S. Tsirkin authored
      On SMP guests, reads from the ring might bypass used index reads. This
      causes guest crashes because host writes to used index to signal ring
      data readiness.  Fix this by inserting rmb before used ring reads.
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Cc: stable@kernel.org
      2d61ba95
    • Michael S. Tsirkin's avatar
      virtio-pci: fix per-vq MSI-X request logic · 0b22bd0b
      Michael S. Tsirkin authored
      Commit f68d2408
      in 2.6.32-rc1 broke requesting IRQs for per-VQ MSI-X vectors:
      - vector number was used instead of the vector itself
      - we try to request an IRQ for VQ which does not
        have a callback handler
      
      This is a regression that causes warnings in kernel log,
      potentially lower performance as we need to scan vq list,
      and might cause system failure if the interrupt
      requested is in fact needed by another system.
      
      This was not noticed earlier because in most cases
      we were falling back on shared interrupt for all vqs.
      
      The warnings often look like this:
      
      virtio-pci 0000:00:03.0: irq 26 for MSI/MSI-X
      virtio-pci 0000:00:03.0: irq 27 for MSI/MSI-X
      virtio-pci 0000:00:03.0: irq 28 for MSI/MSI-X
      IRQ handler type mismatch for IRQ 1
      current handler: i8042
      Pid: 2400, comm: modprobe Tainted: G        W
      2.6.32-rc3-11952-gf3ed8d8-dirty #1
      Call Trace:
       [<ffffffff81072aed>] ? __setup_irq+0x299/0x304
       [<ffffffff81072ff3>] ? request_threaded_irq+0x144/0x1c1
       [<ffffffff813455af>] ? vring_interrupt+0x0/0x30
       [<ffffffff81346598>] ? vp_try_to_find_vqs+0x583/0x5c7
       [<ffffffffa0015188>] ? skb_recv_done+0x0/0x34 [virtio_net]
       [<ffffffff81346609>] ? vp_find_vqs+0x2d/0x83
       [<ffffffff81345d00>] ? vp_get+0x3c/0x4e
       [<ffffffffa0016373>] ? virtnet_probe+0x2f1/0x428 [virtio_net]
       [<ffffffffa0015188>] ? skb_recv_done+0x0/0x34 [virtio_net]
       [<ffffffffa00150d8>] ? skb_xmit_done+0x0/0x39 [virtio_net]
       [<ffffffff8110ab92>] ? sysfs_do_create_link+0xcb/0x116
       [<ffffffff81345cc2>] ? vp_get_status+0x14/0x16
       [<ffffffff81345464>] ? virtio_dev_probe+0xa9/0xc8
       [<ffffffff8122b11c>] ? driver_probe_device+0x8d/0x128
       [<ffffffff8122b206>] ? __driver_attach+0x4f/0x6f
       [<ffffffff8122b1b7>] ? __driver_attach+0x0/0x6f
       [<ffffffff8122a9f9>] ? bus_for_each_dev+0x43/0x74
       [<ffffffff8122a374>] ? bus_add_driver+0xea/0x22d
       [<ffffffff8122b4a3>] ? driver_register+0xa7/0x111
       [<ffffffffa001a000>] ? init+0x0/0xc [virtio_net]
       [<ffffffff81009051>] ? do_one_initcall+0x50/0x148
       [<ffffffff8106e117>] ? sys_init_module+0xc5/0x21a
       [<ffffffff8100af02>] ? system_call_fastpath+0x16/0x1b
      virtio-pci 0000:00:03.0: irq 26 for MSI/MSI-X
      virtio-pci 0000:00:03.0: irq 27 for MSI/MSI-X
      Reported-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      Reported-by: default avatarShirley Ma <xma@us.ibm.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      0b22bd0b
  2. 22 Oct, 2009 8 commits
  3. 21 Oct, 2009 14 commits
  4. 20 Oct, 2009 11 commits
    • Andreas Gruenbacher's avatar
      dnotify: ignore FS_EVENT_ON_CHILD · 94552684
      Andreas Gruenbacher authored
      Mask off FS_EVENT_ON_CHILD in dnotify_handle_event().  Otherwise, when there
      is more than one watch on a directory and dnotify_should_send_event()
      succeeds, events with FS_EVENT_ON_CHILD set will trigger all watches and cause
      spurious events.
      
      This case was overlooked in commit e42e2773.
      
      	#define _GNU_SOURCE
      
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <unistd.h>
      	#include <signal.h>
      	#include <sys/types.h>
      	#include <sys/stat.h>
      	#include <fcntl.h>
      	#include <string.h>
      
      	static void create_event(int s, siginfo_t* si, void* p)
      	{
      		printf("create\n");
      	}
      
      	static void delete_event(int s, siginfo_t* si, void* p)
      	{
      		printf("delete\n");
      	}
      
      	int main (void) {
      		struct sigaction action;
      		char *tmpdir, *file;
      		int fd1, fd2;
      
      		sigemptyset (&action.sa_mask);
      		action.sa_flags = SA_SIGINFO;
      
      		action.sa_sigaction = create_event;
      		sigaction (SIGRTMIN + 0, &action, NULL);
      
      		action.sa_sigaction = delete_event;
      		sigaction (SIGRTMIN + 1, &action, NULL);
      
      	#	define TMPDIR "/tmp/test.XXXXXX"
      		tmpdir = malloc(strlen(TMPDIR) + 1);
      		strcpy(tmpdir, TMPDIR);
      		mkdtemp(tmpdir);
      
      	#	define TMPFILE "/file"
      		file = malloc(strlen(tmpdir) + strlen(TMPFILE) + 1);
      		sprintf(file, "%s/%s", tmpdir, TMPFILE);
      
      		fd1 = open (tmpdir, O_RDONLY);
      		fcntl(fd1, F_SETSIG, SIGRTMIN);
      		fcntl(fd1, F_NOTIFY, DN_MULTISHOT | DN_CREATE);
      
      		fd2 = open (tmpdir, O_RDONLY);
      		fcntl(fd2, F_SETSIG, SIGRTMIN + 1);
      		fcntl(fd2, F_NOTIFY, DN_MULTISHOT | DN_DELETE);
      
      		if (fork()) {
      			/* This triggers a create event */
      			creat(file, 0600);
      			/* This triggers a create and delete event (!) */
      			unlink(file);
      		} else {
      			sleep(1);
      			rmdir(tmpdir);
      		}
      
      		return 0;
      	}
      Signed-off-by: default avatarAndreas Gruenbacher <agruen@suse.de>
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      94552684
    • Eric Dumazet's avatar
      net: Fix struct inet_timewait_sock bitfield annotation · abf90cca
      Eric Dumazet authored
      commit 9e337b0f (net: annotate inet_timewait_sock bitfields)
      added 4/8 bytes in struct inet_timewait_sock.
      
      Fix this by declaring tw_ipv6_offset in the 'flags' bitfield
      The 14 bits hole is named tw_pad to make it cleary apparent.
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      abf90cca
    • Herbert Xu's avatar
      tcp: Try to catch MSG_PEEK bug · b6b39e8f
      Herbert Xu authored
      This patch tries to print out more information when we hit the
      MSG_PEEK bug in tcp_recvmsg.  It's been around since at least
      2005 and it's about time that we finally fix it.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b6b39e8f
    • Huang Ying's avatar
      crypto: aesni-intel - Fix irq_fpu_usable usage · 13b79b97
      Huang Ying authored
      When renaming kernel_fpu_using to irq_fpu_usable, the semantics of the
      function is changed too, from mesuring whether kernel is using FPU,
      that is, the FPU is NOT available, to measuring whether FPU is usable,
      that is, the FPU is available.
      
      But the usage of irq_fpu_usable in aesni-intel_glue.c is not changed
      accordingly. This patch fixes this.
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      13b79b97
    • Eric Dumazet's avatar
      net: Fix IP_MULTICAST_IF · 55b80503
      Eric Dumazet authored
      ipv4/ipv6 setsockopt(IP_MULTICAST_IF) have dubious __dev_get_by_index() calls.
      
      This function should be called only with RTNL or dev_base_lock held, or reader
      could see a corrupt hash chain and eventually enter an endless loop.
      
      Fix is to call dev_get_by_index()/dev_put().
      
      If this happens to be performance critical, we could define a new dev_exist_by_index()
      function to avoid touching dev refcount.
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      55b80503
    • Dave Young's avatar
      bluetooth: static lock key fix · 45054dc1
      Dave Young authored
      When shutdown ppp connection, lockdep waring about non-static key
      will happen, it is caused by the lock is not initialized properly
      at that time.
      
      Fix with tuning the lock/skb_queue_head init order
      
      [   94.339261] INFO: trying to register non-static key.
      [   94.342509] the code is fine but needs lockdep annotation.
      [   94.342509] turning off the locking correctness validator.
      [   94.342509] Pid: 0, comm: swapper Not tainted 2.6.31-mm1 #2
      [   94.342509] Call Trace:
      [   94.342509]  [<c0248fbe>] register_lock_class+0x58/0x241
      [   94.342509]  [<c024b5df>] ? __lock_acquire+0xb57/0xb73
      [   94.342509]  [<c024ab34>] __lock_acquire+0xac/0xb73
      [   94.342509]  [<c024b7fa>] ? lock_release_non_nested+0x17b/0x1de
      [   94.342509]  [<c024b662>] lock_acquire+0x67/0x84
      [   94.342509]  [<c04cd1eb>] ? skb_dequeue+0x15/0x41
      [   94.342509]  [<c054a857>] _spin_lock_irqsave+0x2f/0x3f
      [   94.342509]  [<c04cd1eb>] ? skb_dequeue+0x15/0x41
      [   94.342509]  [<c04cd1eb>] skb_dequeue+0x15/0x41
      [   94.342509]  [<c054a648>] ? _read_unlock+0x1d/0x20
      [   94.342509]  [<c04cd641>] skb_queue_purge+0x14/0x1b
      [   94.342509]  [<fab94fdc>] l2cap_recv_frame+0xea1/0x115a [l2cap]
      [   94.342509]  [<c024b5df>] ? __lock_acquire+0xb57/0xb73
      [   94.342509]  [<c0249c04>] ? mark_lock+0x1e/0x1c7
      [   94.342509]  [<f8364963>] ? hci_rx_task+0xd2/0x1bc [bluetooth]
      [   94.342509]  [<fab95346>] l2cap_recv_acldata+0xb1/0x1c6 [l2cap]
      [   94.342509]  [<f8364997>] hci_rx_task+0x106/0x1bc [bluetooth]
      [   94.342509]  [<fab95295>] ? l2cap_recv_acldata+0x0/0x1c6 [l2cap]
      [   94.342509]  [<c02302c4>] tasklet_action+0x69/0xc1
      [   94.342509]  [<c022fbef>] __do_softirq+0x94/0x11e
      [   94.342509]  [<c022fcaf>] do_softirq+0x36/0x5a
      [   94.342509]  [<c022fe14>] irq_exit+0x35/0x68
      [   94.342509]  [<c0204ced>] do_IRQ+0x72/0x89
      [   94.342509]  [<c02038ee>] common_interrupt+0x2e/0x34
      [   94.342509]  [<c024007b>] ? pm_qos_add_requirement+0x63/0x9d
      [   94.342509]  [<c038e8a5>] ? acpi_idle_enter_bm+0x209/0x238
      [   94.342509]  [<c049d238>] cpuidle_idle_call+0x5c/0x94
      [   94.342509]  [<c02023f8>] cpu_idle+0x4e/0x6f
      [   94.342509]  [<c0534153>] rest_init+0x53/0x55
      [   94.342509]  [<c0781894>] start_kernel+0x2f0/0x2f5
      [   94.342509]  [<c0781091>] i386_start_kernel+0x91/0x96
      Reported-by: default avatarOliver Hartkopp <oliver@hartkopp.net>
      Signed-off-by: default avatarDave Young <hidave.darkstar@gmail.com>
      Tested-by: default avatarOliver Hartkopp <oliver@hartkopp.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      45054dc1
    • Dave Young's avatar
      bluetooth: scheduling while atomic bug fix · f74c77cb
      Dave Young authored
      Due to driver core changes dev_set_drvdata will call kzalloc which should be
      in might_sleep context, but hci_conn_add will be called in atomic context
      
      Like dev_set_name move dev_set_drvdata to work queue function.
      
      oops as following:
      
      Oct  2 17:41:59 darkstar kernel: [  438.001341] BUG: sleeping function called from invalid context at mm/slqb.c:1546
      Oct  2 17:41:59 darkstar kernel: [  438.001345] in_atomic(): 1, irqs_disabled(): 0, pid: 2133, name: sdptool
      Oct  2 17:41:59 darkstar kernel: [  438.001348] 2 locks held by sdptool/2133:
      Oct  2 17:41:59 darkstar kernel: [  438.001350]  #0:  (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.+.}, at: [<faa1d2f5>] lock_sock+0xa/0xc [l2cap]
      Oct  2 17:41:59 darkstar kernel: [  438.001360]  #1:  (&hdev->lock){+.-.+.}, at: [<faa20e16>] l2cap_sock_connect+0x103/0x26b [l2cap]
      Oct  2 17:41:59 darkstar kernel: [  438.001371] Pid: 2133, comm: sdptool Not tainted 2.6.31-mm1 #2
      Oct  2 17:41:59 darkstar kernel: [  438.001373] Call Trace:
      Oct  2 17:41:59 darkstar kernel: [  438.001381]  [<c022433f>] __might_sleep+0xde/0xe5
      Oct  2 17:41:59 darkstar kernel: [  438.001386]  [<c0298843>] __kmalloc+0x4a/0x15a
      Oct  2 17:41:59 darkstar kernel: [  438.001392]  [<c03f0065>] ? kzalloc+0xb/0xd
      Oct  2 17:41:59 darkstar kernel: [  438.001396]  [<c03f0065>] kzalloc+0xb/0xd
      Oct  2 17:41:59 darkstar kernel: [  438.001400]  [<c03f04ff>] device_private_init+0x15/0x3d
      Oct  2 17:41:59 darkstar kernel: [  438.001405]  [<c03f24c5>] dev_set_drvdata+0x18/0x26
      Oct  2 17:41:59 darkstar kernel: [  438.001414]  [<fa51fff7>] hci_conn_init_sysfs+0x40/0xd9 [bluetooth]
      Oct  2 17:41:59 darkstar kernel: [  438.001422]  [<fa51cdc0>] ? hci_conn_add+0x128/0x186 [bluetooth]
      Oct  2 17:41:59 darkstar kernel: [  438.001429]  [<fa51ce0f>] hci_conn_add+0x177/0x186 [bluetooth]
      Oct  2 17:41:59 darkstar kernel: [  438.001437]  [<fa51cf8a>] hci_connect+0x3c/0xfb [bluetooth]
      Oct  2 17:41:59 darkstar kernel: [  438.001442]  [<faa20e87>] l2cap_sock_connect+0x174/0x26b [l2cap]
      Oct  2 17:41:59 darkstar kernel: [  438.001448]  [<c04c8df5>] sys_connect+0x60/0x7a
      Oct  2 17:41:59 darkstar kernel: [  438.001453]  [<c024b703>] ? lock_release_non_nested+0x84/0x1de
      Oct  2 17:41:59 darkstar kernel: [  438.001458]  [<c028804b>] ? might_fault+0x47/0x81
      Oct  2 17:41:59 darkstar kernel: [  438.001462]  [<c028804b>] ? might_fault+0x47/0x81
      Oct  2 17:41:59 darkstar kernel: [  438.001468]  [<c033361f>] ? __copy_from_user_ll+0x11/0xce
      Oct  2 17:41:59 darkstar kernel: [  438.001472]  [<c04c9419>] sys_socketcall+0x82/0x17b
      Oct  2 17:41:59 darkstar kernel: [  438.001477]  [<c020329d>] syscall_call+0x7/0xb
      Signed-off-by: default avatarDave Young <hidave.darkstar@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f74c77cb
    • Julian Anastasov's avatar
      tcp: fix TCP_DEFER_ACCEPT retrans calculation · b103cf34
      Julian Anastasov authored
      Fix TCP_DEFER_ACCEPT conversion between seconds and
      retransmission to match the TCP SYN-ACK retransmission periods
      because the time is converted to such retransmissions. The old
      algorithm selects one more retransmission in some cases. Allow
      up to 255 retransmissions.
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Acked-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b103cf34
    • Julian Anastasov's avatar
      tcp: reduce SYN-ACK retrans for TCP_DEFER_ACCEPT · 0c3d79bc
      Julian Anastasov authored
      Change SYN-ACK retransmitting code for the TCP_DEFER_ACCEPT
      users to not retransmit SYN-ACKs during the deferring period if
      ACK from client was received. The goal is to reduce traffic
      during the deferring period. When the period is finished
      we continue with sending SYN-ACKs (at least one) but this time
      any traffic from client will change the request to established
      socket allowing application to terminate it properly.
      Also, do not drop acked request if sending of SYN-ACK fails.
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Acked-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0c3d79bc
    • Julian Anastasov's avatar
      tcp: accept socket after TCP_DEFER_ACCEPT period · d1b99ba4
      Julian Anastasov authored
      Willy Tarreau and many other folks in recent years
      were concerned what happens when the TCP_DEFER_ACCEPT period
      expires for clients which sent ACK packet. They prefer clients
      that actively resend ACK on our SYN-ACK retransmissions to be
      converted from open requests to sockets and queued to the
      listener for accepting after the deferring period is finished.
      Then application server can decide to wait longer for data
      or to properly terminate the connection with FIN if read()
      returns EAGAIN which is an indication for accepting after
      the deferring period. This change still can have side effects
      for applications that expect always to see data on the accepted
      socket. Others can be prepared to work in both modes (with or
      without TCP_DEFER_ACCEPT period) and their data processing can
      ignore the read=EAGAIN notification and to allocate resources for
      clients which proved to have no data to send during the deferring
      period. OTOH, servers that use TCP_DEFER_ACCEPT=1 as flag (not
      as a timeout) to wait for data will notice clients that didn't
      send data for 3 seconds but that still resend ACKs.
      Thanks to Willy Tarreau for the initial idea and to
      Eric Dumazet for the review and testing the change.
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Acked-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d1b99ba4
    • David S. Miller's avatar
      Revert "tcp: fix tcp_defer_accept to consider the timeout" · a1a2ad91
      David S. Miller authored
      This reverts commit 6d01a026.
      
      Julian Anastasov, Willy Tarreau and Eric Dumazet have come up
      with a more correct way to deal with this.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a1a2ad91
  5. 19 Oct, 2009 3 commits
    • Tomoki Sekiyama's avatar
      AF_UNIX: Fix deadlock on connecting to shutdown socket · 77238f2b
      Tomoki Sekiyama authored
      I found a deadlock bug in UNIX domain socket, which makes able to DoS
      attack against the local machine by non-root users.
      
      How to reproduce:
      1. Make a listening AF_UNIX/SOCK_STREAM socket with an abstruct
          namespace(*), and shutdown(2) it.
       2. Repeat connect(2)ing to the listening socket from the other sockets
          until the connection backlog is full-filled.
       3. connect(2) takes the CPU forever. If every core is taken, the
          system hangs.
      
      PoC code: (Run as many times as cores on SMP machines.)
      
      int main(void)
      {
      	int ret;
      	int csd;
      	int lsd;
      	struct sockaddr_un sun;
      
      	/* make an abstruct name address (*) */
      	memset(&sun, 0, sizeof(sun));
      	sun.sun_family = PF_UNIX;
      	sprintf(&sun.sun_path[1], "%d", getpid());
      
      	/* create the listening socket and shutdown */
      	lsd = socket(AF_UNIX, SOCK_STREAM, 0);
      	bind(lsd, (struct sockaddr *)&sun, sizeof(sun));
      	listen(lsd, 1);
      	shutdown(lsd, SHUT_RDWR);
      
      	/* connect loop */
      	alarm(15); /* forcely exit the loop after 15 sec */
      	for (;;) {
      		csd = socket(AF_UNIX, SOCK_STREAM, 0);
      		ret = connect(csd, (struct sockaddr *)&sun, sizeof(sun));
      		if (-1 == ret) {
      			perror("connect()");
      			break;
      		}
      		puts("Connection OK");
      	}
      	return 0;
      }
      
      (*) Make sun_path[0] = 0 to use the abstruct namespace.
          If a file-based socket is used, the system doesn't deadlock because
          of context switches in the file system layer.
      
      Why this happens:
       Error checks between unix_socket_connect() and unix_wait_for_peer() are
       inconsistent. The former calls the latter to wait until the backlog is
       processed. Despite the latter returns without doing anything when the
       socket is shutdown, the former doesn't check the shutdown state and
       just retries calling the latter forever.
      
      Patch:
       The patch below adds shutdown check into unix_socket_connect(), so
       connect(2) to the shutdown socket will return -ECONREFUSED.
      Signed-off-by: default avatarTomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
      Signed-off-by: default avatarMasanori Yoshida <masanori.yoshida.tv@hitachi.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      77238f2b
    • Thomas Chou's avatar
      ethoc: clear only pending irqs · 50c54a57
      Thomas Chou authored
      This patch fixed the problem of dropped packets due to lost of
      interrupt requests. We should only clear what was pending at the
      moment we read the irq source reg.
      Signed-off-by: default avatarThomas Chou <thomas@wytron.com.tw>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      50c54a57
    • Thomas Chou's avatar
      ethoc: inline regs access · 16dd18b0
      Thomas Chou authored
      Signed-off-by: default avatarThomas Chou <thomas@wytron.com.tw>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      16dd18b0
  6. 18 Oct, 2009 2 commits