1. 17 Sep, 2010 5 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://neil.brown.name/md · 343d04d4
      Linus Torvalds authored
      * 'for-linus' of git://neil.brown.name/md:
        md: fix v1.x metadata update when a disk is missing.
        md: call md_update_sb even for 'external' metadata arrays.
      343d04d4
    • Al Viro's avatar
      arm: fix really nasty sigreturn bug · 653d48b2
      Al Viro authored
      If a signal hits us outside of a syscall and another gets delivered
      when we are in sigreturn (e.g. because it had been in sa_mask for
      the first one and got sent to us while we'd been in the first handler),
      we have a chance of returning from the second handler to location one
      insn prior to where we ought to return.  If r0 happens to contain -513
      (-ERESTARTNOINTR), sigreturn will get confused into doing restart
      syscall song and dance.
      
      Incredible joy to debug, since it manifests as random, infrequent and
      very hard to reproduce double execution of instructions in userland
      code...
      
      The fix is simple - mark it "don't bother with restarts" in wrapper,
      i.e. set r8 to 0 in sys_sigreturn and sys_rt_sigreturn wrappers,
      suppressing the syscall restart handling on return from these guys.
      They can't legitimately return a restart-worthy error anyway.
      
      Testcase:
      	#include <unistd.h>
      	#include <signal.h>
      	#include <stdlib.h>
      	#include <sys/time.h>
      	#include <errno.h>
      
      	void f(int n)
      	{
      		__asm__ __volatile__(
      			"ldr r0, [%0]\n"
      			"b 1f\n"
      			"b 2f\n"
      			"1:b .\n"
      			"2:\n" : : "r"(&n));
      	}
      
      	void handler1(int sig) { }
      	void handler2(int sig) { raise(1); }
      	void handler3(int sig) { exit(0); }
      
      	main()
      	{
      		struct sigaction s = {.sa_handler = handler2};
      		struct itimerval t1 = { .it_value = {1} };
      		struct itimerval t2 = { .it_value = {2} };
      
      		signal(1, handler1);
      
      		sigemptyset(&s.sa_mask);
      		sigaddset(&s.sa_mask, 1);
      		sigaction(SIGALRM, &s, NULL);
      
      		signal(SIGVTALRM, handler3);
      
      		setitimer(ITIMER_REAL, &t1, NULL);
      		setitimer(ITIMER_VIRTUAL, &t2, NULL);
      
      		f(-513); /* -ERESTARTNOINTR */
      
      		write(1, "buggered\n", 9);
      		return 1;
      	}
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Acked-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      653d48b2
    • NeilBrown's avatar
      md: fix v1.x metadata update when a disk is missing. · ddcf3522
      NeilBrown authored
      If an array with 1.x metadata is assembled with the last disk missing,
      md doesn't properly record the fact that the disk was missing.
      
      This is unlikely to cause a real problem as the event count will be
      different to the count on the missing disk so it won't be included in
      the array.  However it could still cause confusion.
      
      So make sure we clear all the relevant slots, not just the early ones.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      ddcf3522
    • NeilBrown's avatar
      md: call md_update_sb even for 'external' metadata arrays. · 126925c0
      NeilBrown authored
      Now that we depend on md_update_sb to clear variable bits in
      mddev->flags (rather than trying not to set them) it is important to
      always call md_update_sb when appropriate.
      
      md_check_recovery has this job but explicitly avoids it for ->external
      metadata arrays.  This is not longer appropraite, or needed.
      
      However we do want to avoid taking the mddev lock if only
      MD_CHANGE_PENDING is set as that is not cleared by md_update_sb for
      external-metadata arrays.
      Reported-by: default avatar"Kwolek, Adam" <adam.kwolek@intel.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      126925c0
    • Linus Torvalds's avatar
      Merge branch 'x86-fixes-for-linus' of... · a5b61736
      Linus Torvalds authored
      Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        x86: hpet: Work around hardware stupidity
        x86, build: Disable -fPIE when compiling with CONFIG_CC_STACKPROTECTOR=y
        x86, cpufeature: Suppress compiler warning with gcc 3.x
        x86, UV: Fix initialization of max_pnode
      a5b61736
  2. 16 Sep, 2010 10 commits
  3. 15 Sep, 2010 20 commits
  4. 14 Sep, 2010 5 commits
    • Jeff Layton's avatar
      cifs: fix potential double put of TCP session reference · 460cf341
      Jeff Layton authored
      cifs_get_smb_ses must be called on a server pointer on which it holds an
      active reference. It first does a search for an existing SMB session. If
      it finds one, it'll put the server reference and then try to ensure that
      the negprot is done, etc.
      
      If it encounters an error at that point then it'll return an error.
      There's a potential problem here though. When cifs_get_smb_ses returns
      an error, the caller will also put the TCP server reference leading to a
      double-put.
      
      Fix this by having cifs_get_smb_ses only put the server reference if
      it found an existing session that it could use and isn't returning an
      error.
      
      Cc: stable@kernel.org
      Reviewed-by: default avatarSuresh Jayaraman <sjayaraman@suse.de>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarSteve French <sfrench@us.ibm.com>
      460cf341
    • Roland McGrath's avatar
      x86-64, compat: Retruncate rax after ia32 syscall entry tracing · eefdca04
      Roland McGrath authored
      In commit d4d67150, we reopened an old hole for a 64-bit ptracer touching a
      32-bit tracee in system call entry.  A %rax value set via ptrace at the
      entry tracing stop gets used whole as a 32-bit syscall number, while we
      only check the low 32 bits for validity.
      
      Fix it by truncating %rax back to 32 bits after syscall_trace_enter,
      in addition to testing the full 64 bits as has already been added.
      Reported-by: default avatarBen Hawkes <hawkes@sota.gen.nz>
      Signed-off-by: default avatarRoland McGrath <roland@redhat.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      eefdca04
    • H. Peter Anvin's avatar
      x86-64, compat: Test %rax for the syscall number, not %eax · 36d001c7
      H. Peter Anvin authored
      On 64 bits, we always, by necessity, jump through the system call
      table via %rax.  For 32-bit system calls, in theory the system call
      number is stored in %eax, and the code was testing %eax for a valid
      system call number.  At one point we loaded the stored value back from
      the stack to enforce zero-extension, but that was removed in checkin
      d4d67150.  An actual 32-bit process
      will not be able to introduce a non-zero-extended number, but it can
      happen via ptrace.
      
      Instead of re-introducing the zero-extension, test what we are
      actually going to use, i.e. %rax.  This only adds a handful of REX
      prefixes to the code.
      Reported-by: default avatarBen Hawkes <hawkes@sota.gen.nz>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@kernel.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      36d001c7
    • H. Peter Anvin's avatar
      compat: Make compat_alloc_user_space() incorporate the access_ok() · c41d68a5
      H. Peter Anvin authored
      compat_alloc_user_space() expects the caller to independently call
      access_ok() to verify the returned area.  A missing call could
      introduce problems on some architectures.
      
      This patch incorporates the access_ok() check into
      compat_alloc_user_space() and also adds a sanity check on the length.
      The existing compat_alloc_user_space() implementations are renamed
      arch_compat_alloc_user_space() and are used as part of the
      implementation of the new global function.
      
      This patch assumes NULL will cause __get_user()/__put_user() to either
      fail or access userspace on all architectures.  This should be
      followed by checking the return value of compat_access_user_space()
      for NULL in the callers, at which time the access_ok() in the callers
      can also be removed.
      Reported-by: default avatarBen Hawkes <hawkes@sota.gen.nz>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Acked-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: default avatarChris Metcalf <cmetcalf@tilera.com>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarTony Luck <tony.luck@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: James Bottomley <jejb@parisc-linux.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: <stable@kernel.org>
      c41d68a5
    • Thomas Gleixner's avatar
      x86: hpet: Work around hardware stupidity · 54ff7e59
      Thomas Gleixner authored
      This more or less reverts commits 08be9796 (x86: Force HPET
      readback_cmp for all ATI chipsets) and 30a564be (x86, hpet: Restrict
      read back to affected ATI chipsets) to the status of commit 8da854cb
      (x86, hpet: Erratum workaround for read after write of HPET
      comparator).
      
      The delta to commit 8da854cb is mostly comments and the change from
      WARN_ONCE to printk_once as we know the call path of this function
      already.
      
      This needs really in depth explanation:
      
      First of all the HPET design is a complete failure. Having a counter
      compare register which generates an interrupt on matching values
      forces the software to do at least one superfluous readback of the
      counter register.
      
      While it is nice in theory to program "absolute" time events it is
      practically useless because the timer runs at some absurd frequency
      which can never be matched to real world units. So we are forced to
      calculate a relative delta and this forces a readout of the actual
      counter value, adding the delta and programming the compare
      register. When the delta is small enough we run into the danger that
      we program a compare value which is already in the past. Due to the
      compare for equal nature of HPET we need to read back the counter
      value after writing the compare rehgister (btw. this is necessary for
      absolute timeouts as well) to make sure that we did not miss the timer
      event. We try to work around that by setting the minimum delta to a
      value which is larger than the theoretical time which elapses between
      the counter readout and the compare register write, but that's only
      true in theory. A NMI or SMI which hits between the readout and the
      write can easily push us beyond that limit. This would result in
      waiting for the next HPET timer interrupt until the 32bit wraparound
      of the counter happens which takes about 306 seconds.
      
      So we designed the next event function to look like:
      
         match = read_cnt() + delta;
         write_compare_ref(match);
         return read_cnt() < match ? 0 : -ETIME;
      
      At some point we got into trouble with certain ATI chipsets. Even the
      above "safe" procedure failed. The reason was that the write to the
      compare register was delayed probably for performance reasons. The
      theory was that they wanted to avoid the synchronization of the write
      with the HPET clock, which is understandable. So the write does not
      hit the compare register directly instead it goes to some intermediate
      register which is copied to the real compare register in sync with the
      HPET clock. That opens another window for hitting the dreaded "wait
      for a wraparound" problem.
      
      To work around that "optimization" we added a read back of the compare
      register which either enforced the update of the just written value or
      just delayed the readout of the counter enough to avoid the issue. We
      unfortunately never got any affirmative info from ATI/AMD about this.
      
      One thing is sure, that we nuked the performance "optimization" that
      way completely and I'm pretty sure that the result is worse than
      before some HW folks came up with those.
      
      Just for paranoia reasons I added a check whether the read back
      compare register value was the same as the value we wrote right
      before. That paranoia check triggered a couple of years after it was
      added on an Intel ICH9 chipset. Venki added a workaround (commit
      8da854cb) which was reading the compare register twice when the first
      check failed. We considered this to be a penalty in general and
      restricted the readback (thus the wasted CPU cycles) to the known to
      be affected ATI chipsets.
      
      This turned out to be a utterly wrong decision. 2.6.35 testers
      experienced massive problems and finally one of them bisected it down
      to commit 30a564be which spured some further investigation.
      
      Finally we got confirmation that the write to the compare register can
      be delayed by up to two HPET clock cycles which explains the problems
      nicely. All we can do about this is to go back to Venki's initial
      workaround in a slightly modified version.
      
      Just for the record I need to say, that all of this could have been
      avoided if hardware designers and of course the HPET committee would
      have thought about the consequences for a split second. It's out of my
      comprehension why designing a working timer is so hard. There are two
      ways to achieve it:
      
       1) Use a counter wrap around aware compare_reg <= counter_reg
          implementation instead of the easy compare_reg == counter_reg
      
          Downsides:
      
      	- It needs more silicon.
      
      	- It needs a readout of the counter to apply a relative
      	  timeout. This is necessary as the counter does not run in
      	  any useful (and adjustable) frequency and there is no
      	  guarantee that the counter which is used for timer events is
      	  the same which is used for reading the actual time (and
      	  therefor for calculating the delta)
      
          Upsides:
      
      	- None
      
        2) Use a simple down counter for relative timer events
      
          Downsides:
      
      	- Absolute timeouts are not possible, which is not a problem
      	  at all in the context of an OS and the expected
      	  max. latencies/jitter (also see Downsides of #1)
      
         Upsides:
      
      	- It needs less or equal silicon.
      
      	- It works ALWAYS
      
      	- It is way faster than a compare register based solution (One
      	  write versus one write plus at least one and up to four
      	  reads)
      
      I would not be so grumpy about all of this, if I would not have been
      ignored for many years when pointing out these flaws to various
      hardware folks. I really hate timers (at least those which seem to be
      designed by janitors).
      
      Though finally we got a reasonable explanation plus a solution and I
      want to thank all the folks involved in chasing it down and providing
      valuable input to this.
      Bisected-by: default avatarNix <nix@esperi.org.uk>
      Reported-by: default avatarArtur Skawina <art.08.09@gmail.com>
      Reported-by: default avatarDamien Wyart <damien.wyart@free.fr>
      Reported-by: default avatarJohn Drescher <drescherjm@gmail.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: stable@kernel.org
      Acked-by: default avatarSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      54ff7e59