1. 02 Aug, 2010 23 commits
  2. 05 Jul, 2010 17 commits
    • Linux 2.6.32.16 · 6c708176
      Greg Kroah-Hartman authored
    • sctp: fix append error cause to ERROR chunk correctly · a0bda22f
      Wei Yongjun authored
      commit 2e3219b5 upstream.
      
      commit 5fa782c2
        sctp: Fix skb_over_panic resulting from multiple invalid \
          parameter errors (CVE-2010-1173) (v4)
      
      caused the 'error cause' never to be added to the ERROR chunk, due to
      a typo in the valid-length check in sctp_init_cause_fixed().
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • qla2xxx: Disable MSI on qla24xx chips other than QLA2432. · 966399a8
      Ben Hutchings authored
      commit 6377a7ae upstream.
      
      On specific platforms, MSI is unreliable on some of the QLA24xx chips, resulting
      in fatal I/O errors under load, as reported in <http://bugs.debian.org/572322>
      and by some RHEL customers.
      Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • KEYS: find_keyring_by_name() can gain access to a freed keyring · 48b97a01
      Toshiyuki Okajima authored
      commit cea7daa3 upstream.
      
      find_keyring_by_name() can gain access to a keyring that has had its reference
      count reduced to zero, and is thus ready to be freed.  This then allows the
      dead keyring to be brought back into use whilst it is being destroyed.
      
      The following timeline illustrates the process:
      
      |(cleaner)                           (user)
      |
      | free_user(user)                    sys_keyctl()
      |  |                                  |
      |  key_put(user->session_keyring)     keyctl_get_keyring_ID()
      |  ||	//=> keyring->usage = 0        |
      |  |schedule_work(&key_cleanup_task)   lookup_user_key()
      |  ||                                   |
      |  kmem_cache_free(,user)               |
      |  .                                    |[KEY_SPEC_USER_KEYRING]
      |  .                                    install_user_keyrings()
      |  .                                    ||
      | key_cleanup() [<= worker_thread()]    ||
      |  |                                    ||
      |  [spin_lock(&key_serial_lock)]        |[mutex_lock(&key_user_keyr..mutex)]
      |  |                                    ||
      |  atomic_read() == 0                   ||
      |  |{ rb_erase(&key->serial_node,) }    ||
      |  |                                    ||
      |  [spin_unlock(&key_serial_lock)]      |find_keyring_by_name()
      |  |                                    |||
      |  keyring_destroy(keyring)             ||[read_lock(&keyring_name_lock)]
      |  ||                                   |||
      |  |[write_lock(&keyring_name_lock)]    ||atomic_inc(&keyring->usage)
      |  |.                                   ||| *** GET freeing keyring ***
      |  |.                                   ||[read_unlock(&keyring_name_lock)]
      |  ||                                   ||
      |  |list_del()                          |[mutex_unlock(&key_user_k..mutex)]
      |  ||                                   |
      |  |[write_unlock(&keyring_name_lock)]  ** INVALID keyring is returned **
      |  |                                    .
      |  kmem_cache_free(,keyring)            .
      |                                       .
      |                                       atomic_dec(&keyring->usage)
      v                                         *** DESTROYED ***
      TIME
      
      If CONFIG_SLUB_DEBUG=y then we may see the following message generated:
      
      	=============================================================================
      	BUG key_jar: Poison overwritten
      	-----------------------------------------------------------------------------
      
      	INFO: 0xffff880197a7e200-0xffff880197a7e200. First byte 0x6a instead of 0x6b
      	INFO: Allocated in key_alloc+0x10b/0x35f age=25 cpu=1 pid=5086
      	INFO: Freed in key_cleanup+0xd0/0xd5 age=12 cpu=1 pid=10
      	INFO: Slab 0xffffea000592cb90 objects=16 used=2 fp=0xffff880197a7e200 flags=0x200000000000c3
      	INFO: Object 0xffff880197a7e200 @offset=512 fp=0xffff880197a7e300
      
      	Bytes b4 0xffff880197a7e1f0:  5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
      	  Object 0xffff880197a7e200:  6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b jkkkkkkkkkkkkkkk
      
      Alternatively, we may see a system panic happen, such as:
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
      	IP: [<ffffffff810e61a3>] kmem_cache_alloc+0x5b/0xe9
      	PGD 6b2b4067 PUD 6a80d067 PMD 0
      	Oops: 0000 [#1] SMP
      	last sysfs file: /sys/kernel/kexec_crash_loaded
      	CPU 1
      	...
      	Pid: 31245, comm: su Not tainted 2.6.34-rc5-nofixed-nodebug #2 D2089/PRIMERGY
      	RIP: 0010:[<ffffffff810e61a3>]  [<ffffffff810e61a3>] kmem_cache_alloc+0x5b/0xe9
      	RSP: 0018:ffff88006af3bd98  EFLAGS: 00010002
      	RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88007d19900b
      	RDX: 0000000100000000 RSI: 00000000000080d0 RDI: ffffffff81828430
      	RBP: ffffffff81828430 R08: ffff88000a293750 R09: 0000000000000000
      	R10: 0000000000000001 R11: 0000000000100000 R12: 00000000000080d0
      	R13: 00000000000080d0 R14: 0000000000000296 R15: ffffffff810f20ce
      	FS:  00007f97116bc700(0000) GS:ffff88000a280000(0000) knlGS:0000000000000000
      	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      	CR2: 0000000000000001 CR3: 000000006a91c000 CR4: 00000000000006e0
      	DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      	DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      	Process su (pid: 31245, threadinfo ffff88006af3a000, task ffff8800374414c0)
      	Stack:
      	 0000000512e0958e 0000000000008000 ffff880037f8d180 0000000000000001
      	 0000000000000000 0000000000008001 ffff88007d199000 ffffffff810f20ce
      	 0000000000008000 ffff88006af3be48 0000000000000024 ffffffff810face3
      	Call Trace:
      	 [<ffffffff810f20ce>] ? get_empty_filp+0x70/0x12f
      	 [<ffffffff810face3>] ? do_filp_open+0x145/0x590
      	 [<ffffffff810ce208>] ? tlb_finish_mmu+0x2a/0x33
      	 [<ffffffff810ce43c>] ? unmap_region+0xd3/0xe2
      	 [<ffffffff810e4393>] ? virt_to_head_page+0x9/0x2d
      	 [<ffffffff81103916>] ? alloc_fd+0x69/0x10e
      	 [<ffffffff810ef4ed>] ? do_sys_open+0x56/0xfc
      	 [<ffffffff81008a02>] ? system_call_fastpath+0x16/0x1b
      	Code: 0f 1f 44 00 00 49 89 c6 fa 66 0f 1f 44 00 00 65 4c 8b 04 25 60 e8 00 00 48 8b 45 00 49 01 c0 49 8b 18 48 85 db 74 0d 48 63 45 18 <48> 8b 04 03 49 89 00 eb 14 4c 89 f9 83 ca ff 44 89 e6 48 89 ef
      	RIP  [<ffffffff810e61a3>] kmem_cache_alloc+0x5b/0xe9
      
      The problem is that find_keyring_by_name() does not confirm that the keyring
      is valid before accepting it.
      
      Skipping keyrings that have been reduced to a zero count seems the way to go.
      To this end, use atomic_inc_not_zero() to increment the usage count and skip
      the candidate keyring if that returns false.
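
The "increment unless zero" idea can be sketched in portable C11 as below. This is only an illustration of the technique, not the kernel's atomic_inc_not_zero() implementation; struct keyring_sketch and get_ref_unless_zero() are hypothetical names invented for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object; not the kernel's struct key. */
struct keyring_sketch {
	atomic_int usage;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns true if a reference was taken, false if the object is
 * already on its way to being freed and must be skipped.
 */
static bool get_ref_unless_zero(struct keyring_sketch *k)
{
	int old = atomic_load(&k->usage);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&k->usage, &old, old + 1))
			return true;
	}
	return false;
}
```

A lookup loop using such a helper would simply continue to the next candidate when it returns false, so a keyring whose count has already dropped to zero can never be resurrected.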
      
      The following script _may_ cause the bug to happen, but there's no guarantee
      as the window of opportunity is small:
      
      	#!/bin/sh
      	LOOP=100000
      	USER=dummy_user
      	/bin/su -c "exit;" $USER || { /usr/sbin/adduser -m $USER; add=1; }
      	for ((i=0; i<LOOP; i++))
      	do
      		/bin/su -c "echo '$i' > /dev/null" $USER
      	done
      	(( add == 1 )) && /usr/sbin/userdel -r $USER
      	exit
      
      Note that the nominated user must not be in use.
      
      An alternative way of testing this may be:
      
      	for ((i=0; i<100000; i++))
      	do
      		keyctl session foo /bin/true || break
      	done >&/dev/null
      
      as that uses a keyring named "foo" rather than relying on the user and
      user-session named keyrings.
      Reported-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Chuck Ebbert <cebbert@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • KEYS: Return more accurate error codes · ec098d19
      Dan Carpenter authored
      commit 4d09ec0f upstream.
      
      We were using the wrong variable here so the error codes weren't being returned
      properly.  The original code returns -ENOKEY.
      Signed-off-by: Dan Carpenter <error27@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • dm snapshot: simplify sector_to_chunk expression · 31f1b308
      Mikulas Patocka authored
      commit 102c6ddb upstream.
      
      Removed unnecessary 'and' masking: The right shift discards the lower
      bits so there is no need to clear them.
      
      (A later patch needs this change to support a 32-bit chunk_mask.)
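
The equivalence can be checked with a small sketch. The function and parameter names below are illustrative only and assume a chunk of 2^chunk_shift sectors with chunk_mask equal to the chunk size minus one; they are not taken from the dm-snapshot source.

```c
#include <stdint.h>

/* Old form: clear the low bits, then shift them away. */
static uint64_t chunk_with_mask(uint64_t sector, uint64_t chunk_mask,
				unsigned chunk_shift)
{
	return (sector & ~chunk_mask) >> chunk_shift;
}

/* Simplified form: the shift alone already discards the low bits. */
static uint64_t chunk_without_mask(uint64_t sector, unsigned chunk_shift)
{
	return sector >> chunk_shift;
}
```

For example, with chunk_shift = 4 and chunk_mask = 15, sector 35 maps to chunk 2 in both forms. Note also that in C, inverting a 32-bit unsigned mask and then widening it to 64 bits leaves the upper 32 bits zero, so the masked form would be unsafe with a narrower mask type; this is presumably why the later patch depends on this change.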
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • parisc: clear floating point exception flag on SIGFPE signal · 3cbc7919
      Helge Deller authored
      commit 550f0d92 upstream.
      
      Clear the floating point exception flag before returning to
      user space. This is needed, else the libc trampoline handler
      may hit the same SIGFPE again while building up a trampoline
      to a signal handler.
      
      Fixes debian bug #559406.
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • jbd: jbd-debug and jbd2-debug should be writable · 36d28220
      Yin Kangkai authored
      commit 765f8361 upstream.
      
      jbd-debug and jbd2-debug are currently read-only (S_IRUGO), which is not
      correct. Make them writable so that we can start debugging.
      Signed-off-by: Yin Kangkai <kangkai.yin@intel.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • KVM: x86: Inject #GP with the right rip on efer writes · fbec9e1f
      Roedel, Joerg authored
      This patch fixes a bug in the KVM efer-msr write path. If a
      guest writes to a reserved efer bit the set_efer function
      injects the #GP directly. The architecture dependent wrmsr
      function does not see this, assumes success and advances the
      rip. This results in a #GP in the guest with the wrong rip.
      This patch fixes this by reporting efer write errors back to
      the architectural wrmsr function.
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      (cherry picked from commit b69e8cae)
    • KVM: x86: Add missing locking to arch specific vcpu ioctls · c86db80a
      Avi Kivity authored
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      (cherry picked from commit 8fbf065d)
    • KVM: Fix wallclock version writing race · 0890bb8d
      Avi Kivity authored
      Wallclock writing uses an unprotected global variable to hold the version;
      this can cause one guest to interfere with another if both write their
      wallclock at the same time.
      Acked-by: Glauber Costa <glommer@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      (cherry picked from commit 9ed3c444)
    • KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots · 4a277f9c
      Avi Kivity authored
      On svm, kvm_read_pdptr() may require reading guest memory, which can sleep.
      
      Push the spinlock into mmu_alloc_roots(), and only take it after we've read
      the pdptr.
      Tested-by: Joerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      (cherry picked from commit 8facbbff)
    • KVM: VMX: enable VMXON check with SMX enabled (Intel TXT) · 66307ba1
      Shane Wang authored
      Per the documentation for the feature control MSR:
      
        Bit 1 enables VMXON in SMX operation. If the bit is clear, execution
              of VMXON in SMX operation causes a general-protection exception.
        Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution
              of VMXON outside SMX operation causes a general-protection exception.
      
      This patch enables this check for VMXON under SMX in KVM.
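
A minimal sketch of the bit test described above follows. The macro and function names are illustrative rather than the kernel's, and the sketch deliberately ignores everything the quoted text does not mention (for example, any other bits of the MSR or how SMX operation is detected).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as quoted in the message; the names are made up here. */
#define VMXON_ENABLED_INSIDE_SMX   (UINT64_C(1) << 1)
#define VMXON_ENABLED_OUTSIDE_SMX  (UINT64_C(1) << 2)

/*
 * Returns true if VMXON would be permitted for the given feature
 * control MSR value, depending on whether we are in SMX operation.
 */
static bool vmxon_permitted(uint64_t feature_control, bool in_smx)
{
	uint64_t required = in_smx ? VMXON_ENABLED_INSIDE_SMX
				   : VMXON_ENABLED_OUTSIDE_SMX;

	return (feature_control & required) != 0;
}
```

With only bit 2 set, vmxon_permitted() is true outside SMX and false inside it, which is the case a check that looked at bit 2 alone would get wrong on a TXT-launched system.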
      Signed-off-by: Shane Wang <shane.wang@intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      (cherry picked from commit cafd6659)
    • KVM: MMU: Segregate shadow pages with different cr0.wp · 3b271148
      Avi Kivity authored
      When cr0.wp=0, we may shadow a gpte having u/s=1 and r/w=0 with an spte
      having u/s=0 and r/w=1.  This allows excessive access if the guest sets
      cr0.wp=1 and accesses through this spte.
      
      Fix by making cr0.wp part of the base role; we'll have different sptes for
      the two cases and the problem disappears.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      (cherry picked from commit 3dbe1415)
    • KVM: x86: Check LMA bit before set_efer · e4a13296
      Sheng Yang authored
      kvm_x86_ops->set_efer() executes vcpu->arch.efer = efer, so the
      subsequent check of the LMA bit did not work.
      Signed-off-by: Sheng Yang <sheng@linux.intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      (cherry picked from commit a3d204e2)
    • KVM: Don't allow lmsw to clear cr0.pe · 90a08dc7
      Avi Kivity authored
      The current lmsw implementation allows the guest to clear cr0.pe, contrary
      to the manual, which breaks EMM386.EXE.
      
      Fix by ORing the old cr0.pe with lmsw's operand.
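
As a sketch of the fix, assuming the architectural behaviour that lmsw loads only CR0 bits 0-3 from its operand, the new CR0 value can be computed as below. The function and macro names are invented for this example and are not the KVM code.

```c
#include <stdint.h>

#define CR0_PE     UINT64_C(0x1)  /* protection enable, bit 0 */
#define LMSW_MASK  UINT64_C(0xf)  /* lmsw touches CR0 bits 0-3 */

/*
 * Compute the CR0 value after lmsw: bits 1-3 are replaced by the
 * operand, while the old PE bit is kept and ORed with the operand's
 * PE bit, so lmsw can set PE but never clear it.
 */
static uint64_t lmsw_sketch(uint64_t old_cr0, uint16_t msw)
{
	return (old_cr0 & ~(LMSW_MASK & ~CR0_PE)) | (msw & LMSW_MASK);
}
```

A guest already in protected mode (PE = 1) that executes lmsw with an operand of 0 therefore keeps PE set, whereas masking out all four low bits first would have dropped it.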
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      (cherry picked from commit f78e9176)
    • x86, paravirt: Add a global synchronization point for pvclock · 1345126c
      Glauber Costa authored
      In recent stress tests, it was found that pvclock-based systems
      could seriously warp on smp systems. Using Ingo's time-warp-test.c,
      I could trigger a scenario as bad as 1.5 million warps a minute on some
      systems (to be fair, it wasn't that bad on most of them). Investigating
      further, I found out that such warps were caused by the very offset-based
      calculation pvclock is based on.
      
      This happens even on some machines that report constant_tsc in their tsc
      flags, especially on multi-socket ones.
      
      Two reads of the same kernel timestamp at approximately the same time will
      likely have their tsc sampled at different moments too. This means the delta
      we calculate is unpredictable at best, and can well be smaller on a cpu
      that is legitimately reading the clock at a later moment.
      
      Some adjustments on the host could make this window less likely to happen,
      but still, it pretty much poses as an intrinsic problem of the mechanism.
      
      A while ago, I thought about using a shared variable anyway, to hold the
      clock's last state, but gave up due to the high contention that locking was
      likely to introduce, possibly rendering the thing useless on big machines.
      I argue, however, that locking is not necessary.
      
      We do a read-and-return sequence in pvclock, and between read and return,
      the global value can have changed. However, it can only have changed
      by means of an addition of a positive value. So if we detected that our
      clock timestamp is less than the current global, we know that we need to
      return a higher one, even though it is not exactly the one we compared to.
      
      OTOH, if we detect we're greater than the current time source, we atomically
      replace the value with our new reading. This does cause contention on big
      boxes (but big here means *BIG*), but it seems like a good trade-off, since
      it provides us with a time source guaranteed to be stable wrt time warps.
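
The lock-free scheme described above can be sketched in portable C11 as follows. This is a minimal illustration of the technique under the assumption of a single shared 64-bit "last value"; the names last_value and monotonic_read() are hypothetical and the code is not the pvclock implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Shared across all readers: the highest timestamp handed out so far. */
static _Atomic uint64_t last_value;

/*
 * Given this cpu's freshly computed reading, return a value that never
 * goes backwards relative to what any other reader has already returned.
 */
static uint64_t monotonic_read(uint64_t local_reading)
{
	uint64_t last = atomic_load(&last_value);

	do {
		/* Someone already returned a later time: use theirs. */
		if (local_reading < last)
			return last;
		/* Otherwise try to publish ours; on failure 'last' is refreshed. */
	} while (!atomic_compare_exchange_weak(&last_value, &last, local_reading));

	return local_reading;
}
```

Because the shared value only ever increases, a failed compare-and-exchange just means another reader published a newer time, and the loop re-checks against that newer value instead of taking a lock.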
      
      After this patch was applied, I did not see a single time warp during 5 days
      of execution on any of the machines where I had seen them before.
      Signed-off-by: Glauber Costa <glommer@redhat.com>
      Acked-by: Zachary Amsden <zamsden@redhat.com>
      CC: Jeremy Fitzhardinge <jeremy@goop.org>
      CC: Avi Kivity <avi@redhat.com>
      CC: Marcelo Tosatti <mtosatti@redhat.com>
      CC: Zachary Amsden <zamsden@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      (cherry picked from commit 489fb490)