1. 01 Oct, 2011 1 commit
  2. 30 Sep, 2011 9 commits
    • Peter Zijlstra's avatar
      posix-cpu-timers: Cure SMP wobbles · d670ec13
      Peter Zijlstra authored
      David reported:
      
        Attached below is a watered-down version of rt/tst-cpuclock2.c from
        GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
        similar.
      
        Run it several times, and you will see cases where the main thread
        will measure a process clock difference before and after the nanosleep
        which is smaller than the cpu-burner thread's individual thread clock
        difference.  This doesn't make any sense since the cpu-burner thread
        is part of the top-level process's thread group.
      
        I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
        64-bit binaries).
      
        For example:
      
        [davem@boricha build-x86_64-linux]$ ./test
        process: before(0.001221967) after(0.498624371) diff(497402404)
        thread:  before(0.000081692) after(0.498316431) diff(498234739)
        self:    before(0.001223521) after(0.001240219) diff(16698)
        [davem@boricha build-x86_64-linux]$ 
      
        The diff of 'process' should always be >= the diff of 'thread'.
      
        I make sure to wrap the 'thread' clock measurements the most tightly
        around the nanosleep() call, and that the 'process' clock measurements
        are the outer-most ones.
      
        ---
        #include <unistd.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>
        #include <fcntl.h>
        #include <string.h>
        #include <errno.h>
        #include <pthread.h>
      
        static pthread_barrier_t barrier;
      
        static void *chew_cpu(void *arg)
        {
      	  pthread_barrier_wait(&barrier);
      	  while (1)
      		  __asm__ __volatile__("" : : : "memory");
      	  return NULL;
        }
      
        int main(void)
        {
      	  clockid_t process_clock, my_thread_clock, th_clock;
      	  struct timespec process_before, process_after;
      	  struct timespec me_before, me_after;
      	  struct timespec th_before, th_after;
      	  struct timespec sleeptime;
      	  unsigned long diff;
      	  pthread_t th;
      	  int err;
      
      	  err = clock_getcpuclockid(0, &process_clock);
      	  if (err)
      		  return 1;
      
      	  err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
      	  if (err)
      		  return 1;
      
      	  pthread_barrier_init(&barrier, NULL, 2);
      	  err = pthread_create(&th, NULL, chew_cpu, NULL);
      	  if (err)
      		  return 1;
      
      	  err = pthread_getcpuclockid(th, &th_clock);
      	  if (err)
      		  return 1;
      
      	  pthread_barrier_wait(&barrier);
      
      	  err = clock_gettime(process_clock, &process_before);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(my_thread_clock, &me_before);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(th_clock, &th_before);
      	  if (err)
      		  return 1;
      
      	  sleeptime.tv_sec = 0;
      	  sleeptime.tv_nsec = 500000000;
      	  nanosleep(&sleeptime, NULL);
      
      	  err = clock_gettime(th_clock, &th_after);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(my_thread_clock, &me_after);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(process_clock, &process_after);
      	  if (err)
      		  return 1;
      
      	  diff = process_after.tv_nsec - process_before.tv_nsec;
      	  printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 process_before.tv_sec, process_before.tv_nsec,
      		 process_after.tv_sec, process_after.tv_nsec, diff);
      	  diff = th_after.tv_nsec - th_before.tv_nsec;
      	  printf("thread:  before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 th_before.tv_sec, th_before.tv_nsec,
      		 th_after.tv_sec, th_after.tv_nsec, diff);
      	  diff = me_after.tv_nsec - me_before.tv_nsec;
      	  printf("self:    before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 me_before.tv_sec, me_before.tv_nsec,
      		 me_after.tv_sec, me_after.tv_nsec, diff);
      
      	  return 0;
        }
      
      This is due to us using p->se.sum_exec_runtime in
      thread_group_cputime() where we iterate the thread group and sum all
      data. This does not take time since the last schedule operation (tick
      or otherwise) into account. We can cure this by using
      task_sched_runtime() at the cost of having to take locks.
      
      This also means we can (and must) do away with
      thread_group_sched_runtime() since the modified thread_group_cputime()
      is now more accurate and would deadlock when called from
      thread_group_sched_runtime().
      
      Aside of that it makes the function safe on 32 bit systems. The old
      code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a
      64bit value and could be changed on another cpu at the same time.
      Reported-by: default avatarDavid Miller <davem@davemloft.net>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twinsTested-by: default avatarDavid Miller <davem@davemloft.net>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      d670ec13
    • Ram Pai's avatar
      Resource: fix wrong resource window calculation · 47ea91b4
      Ram Pai authored
      __find_resource() incorrectly returns a resource window which overlaps
      an existing allocated window.  This happens when the parent's
      resource-window spans 0x00000000 to 0xffffffff and is entirely allocated
      to all its children resource-windows.
      
      __find_resource() looks for gaps in resource allocation among the
      children resource windows.  When it encounters the last child window it
      blindly tries the range next to one allocated to the last child.  Since
      the last child's window ends at 0xffffffff the calculation overflows,
      leading the algorithm to believe that any window in the range 0x0000000
      to 0xfffffff is available for allocation.  This leads to a conflicting
      window allocation.
      
      Michal Ludvig reported this issue seen on his platform.  The following
      patch fixes the problem and has been verified by Michal.  I believe this
      bug has been there for ages.  It got exposed by git commit 2bbc6942
      ("PCI : ability to relocate assigned pci-resources")
      Signed-off-by: default avatarRam Pai <linuxram@us.ibm.com>
      Tested-by: default avatarMichal Ludvig <mludvig@logix.net.nz>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      47ea91b4
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://github.com/NewDreamNetwork/ceph-client · 92bb062f
      Linus Torvalds authored
      * 'for-linus' of git://github.com/NewDreamNetwork/ceph-client:
        libceph: fix pg_temp mapping update
        libceph: fix pg_temp mapping calculation
        libceph: fix linger request requeuing
        libceph: fix parse options memory leak
        libceph: initialize ack_stamp to avoid unnecessary connection reset
      92bb062f
    • Linus Torvalds's avatar
      Merge branch 'v4l_for_linus' of git://linuxtv.org/mchehab/for_linus · 7409b713
      Linus Torvalds authored
      * 'v4l_for_linus' of git://linuxtv.org/mchehab/for_linus:
        [media] omap3isp: Fix build error in ispccdc.c
        [media] uvcvideo: Fix crash when linking entities
        [media] v4l: Make sure we hold a reference to the v4l2_device before using it
        [media] v4l: Fix use-after-free case in v4l2_device_release
        [media] uvcvideo: Set alternate setting 0 on resume if the bus has been reset
        [media] OMAP_VOUT: Fix build break caused by update_mode removal in DSS2
      7409b713
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6 · 0ecdb12a
      Linus Torvalds authored
      * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
        [S390] cio: fix cio_tpi ignoring adapter interrupts
        [S390] gmap: always up mmap_sem properly
        [S390] Do not clobber personality flags on exec
      0ecdb12a
    • Linus Torvalds's avatar
      Merge git://github.com/davem330/sparc · 5fe858b5
      Linus Torvalds authored
      * git://github.com/davem330/sparc:
        sparc64: Force the execute bit in OpenFirmware's translation entries.
        sparc: Make '-p' boot option meaningful again.
        sparc, exec: remove redundant addr_limit assignment
        sparc64: Future proof Niagara cpu detection.
      5fe858b5
    • Linus Torvalds's avatar
      Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~keithp/linux · 8e8e500f
      Linus Torvalds authored
      * 'drm-intel-fixes' of git://people.freedesktop.org/~keithp/linux:
        drm/i915: FBC off for ironlake and older, otherwise on by default
        drm/i915: Enable SDVO hotplug interrupts for HDMI and DVI
        drm/i915: Enable dither whenever display bpc < frame buffer bpc
      8e8e500f
    • Benjamin Herrenschmidt's avatar
      powerpc: Fix device-tree matching for Apple U4 bridge · 16fa42af
      Benjamin Herrenschmidt authored
      Apple Quad G5 has some oddity in it's device-tree which causes the new
      generic matching code to fail to relate nodes for PCI-E devices below U4
      with their respective struct pci_dev.  This breaks graphics on those
      machines among others.
      
      This fixes it using a quirk which copies the node pointer from the host
      bridge for the root complex, which makes the generic code work for the
      children afterward.
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      16fa42af
    • wangyanqing's avatar
      bootup: move 'usermodehelper_enable()' a little earlier · b0f84374
      wangyanqing authored
      Commit d5767c53 ("bootup: move 'usermodehelper_enable()' to the end
      of do_basic_setup()") moved 'usermodehelper_enable()' to end of
      do_basic_setup() to after the initcalls.  But then I get failed to let
      uvesafb work on my computer, and lose the splash boot.
      
      So maybe we could start usermodehelper_enable a little early to make
      some task work that need eary init with the help of user mode.
      
      [ I would *really* prefer that initcalls not call into user space - even
        the real 'init' hasn't been execve'd yet, after all! But for uvesafb
        it really does look like we don't have much choice.
      
        I considered doing this when we mount the root filesystem, but
        depending on config options that is in multiple places.  We could do
        the usermode helper enable as a rootfs_initcall()..
      
        So I'm just using wang yanqing's trivial patch.  It's not wonderful,
        but it's simple and should work.  We should revisit this some day,
        though.      - Linus ]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b0f84374
  3. 29 Sep, 2011 1 commit
    • David S. Miller's avatar
      sparc64: Force the execute bit in OpenFirmware's translation entries. · f4142cba
      David S. Miller authored
      In the OF 'translations' property, the template TTEs in the mappings
      never specify the executable bit.  This is the case even though some
      of these mappings are for OF's code segment.
      
      Therefore, we need to force the execute bit on in every mapping.
      
      This problem can only really trigger on Niagara/sun4v machines and the
      history behind this is a little complicated.
      
      Previous to sun4v, the sun4u TTE entries lacked a hardware execute
      permission bit.  So OF didn't have to ever worry about setting
      anything to handle executable pages.  Any valid TTE loaded into the
      I-TLB would be respected by the chip.
      
      But sun4v Niagara chips have a real hardware enforced executable bit
      in their TTEs.  So it has to be set or else the I-TLB throws an
      instruction access exception with type code 6 (protection violation).
      
      We've been extremely fortunate to not get bitten by this in the past.
      
      The best I can tell is that the OF's mappings for it's executable code
      were mapped using permanent locked mappings on sun4v in the past.
      Therefore, the fact that we didn't have the exec bit set in the OF
      translations we would use did not matter in practice.
      
      Thanks to Greg Onufer for helping me track this down.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f4142cba
  4. 28 Sep, 2011 9 commits
    • Linus Torvalds's avatar
      bootup: move 'usermodehelper_enable()' to the end of do_basic_setup() · d5767c53
      Linus Torvalds authored
      Doing it just before starting to call into cpu_idle() made a sick kind
      of sense only because the original bug we fixed (see commit
      288d5abe: "Boot up with usermodehelper disabled") was about problems
      with some scheduler data structures not being initialized, and they had
      better be initialized at that point.
      
      But it really didn't make any other conceptual sense, and doing it after
      the initial "schedule()" call for the idle thread actually opened up a
      race: what if the main initialization thread did everything without
      needing to sleep, and got all the way into user land too? Without
      actually having scheduled back to the idle thread?
      
      Now, in normal circumstances that doesn't ever happen, but it looks like
      Richard Cochran triggered exactly that on his ARM IXP4xx machines:
      
        "I have some ARM IXP4xx based machines that use the two on chip MAC
         ports (aka NPEs).  The NPE needs a firmware in order to function.
         Ever since the following commit [that 288d5abe one], it is no
         longer possible to bring up the interfaces during the init scripts."
      
      with a call trace showing an ioctl coming from user space. Richard says:
      
        "The init is busybox, and the startup script does mount, syslogd, and
         then ifup, so that all can go by quickly."
      
      The fix is to move the usermodehelper_enable() into the main 'init'
      thread, and just put it after we've done all our initcalls.  By then,
      everything really should be up, but we've obviously not actually started
      the user-mode portion of init yet.
      Reported-and-tested-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d5767c53
    • Sage Weil's avatar
      libceph: fix pg_temp mapping update · 8adc8b3d
      Sage Weil authored
      The incremental map updates have a record for each pg_temp mapping that is
      to be add/updated (len > 0) or removed (len == 0).  The old code was
      written as if the updates were a complete enumeration; that was just wrong.
      Update the code to remove 0-length entries and drop the rbtree traversal.
      
      This avoids misdirected (and hung) requests that manifest as server
      errors like
      
      [WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11
      Signed-off-by: default avatarSage Weil <sage@newdream.net>
      8adc8b3d
    • Sage Weil's avatar
      libceph: fix pg_temp mapping calculation · 782e182e
      Sage Weil authored
      We need to apply the modulo pg_num calculation before looking up a pgid in
      the pg_temp mapping rbtree.  This fixes pg_temp mappings, and fixes
      (some) misdirected requests that result in messages like
      
      [WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11
      
      on the server and stall make the client block without getting a reply (at
      least until the pg_temp mapping goes way, but that can take a long long
      time).
      
      Reorder calc_pg_raw() a bit to make more sense.
      Signed-off-by: default avatarSage Weil <sage@newdream.net>
      782e182e
    • Linus Torvalds's avatar
      Merge git://github.com/davem330/net · 2ef7b45a
      Linus Torvalds authored
      * git://github.com/davem330/net:
        ipv6-multicast: Fix memory leak in IPv6 multicast.
        ipv6: check return value for dst_alloc
        net: check return value for dst_alloc
        ipv6-multicast: Fix memory leak in input path.
        bnx2x: add missing break in bnx2x_dcbnl_get_cap
        bnx2x: fix WOL by enablement PME in config space
        bnx2x: fix hw attention handling
        net: fix a typo in Documentation/networking/scaling.txt
        ath9k: Fix a dma warning/memory leak
        rtlwifi: rtl8192cu: Fix unitialized struct
        iwlagn: fix dangling scan request
        batman-adv: do_bcast has to be true for broadcast packets only
        cfg80211: Fix validation of AKM suites
        iwlegacy: do not use interruptible waits
        iwlegacy: fix command queue timeout
        ath9k_hw: Fix Rx DMA stuck for AR9003 chips
      2ef7b45a
    • Linus Torvalds's avatar
      Merge git://bedivere.hansenpartnership.com/git/scsi-rc-fixes-2.6 · 07117e30
      Linus Torvalds authored
      * git://bedivere.hansenpartnership.com/git/scsi-rc-fixes-2.6:
        [SCSI] 3w-9xxx: fix iommu_iova leak
        [SCSI] cxgb3i: convert cdev->l2opt to use rcu to prevent NULL dereference
        [SCSI] scsi: qla4xxx needs libiscsi.o
        [SCSI] libsas: fix failure to revalidate domain for anything but the first expander child.
        [SCSI] aacraid: reset should disable MSI interrupt
      07117e30
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · c54a06d4
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.dk/linux-block:
        block: Free queue resources at blk_release_queue()
      c54a06d4
    • Linus Torvalds's avatar
      Merge branch 'writeback-for-linus' of git://github.com/fengguang/linux · e689ec80
      Linus Torvalds authored
      * 'writeback-for-linus' of git://github.com/fengguang/linux:
        writeback: show raw dirtied_when in trace writeback_single_inode
      e689ec80
    • Hannes Reinecke's avatar
      block: Free queue resources at blk_release_queue() · 777eb1bf
      Hannes Reinecke authored
      A kernel crash is observed when a mounted ext3/ext4 filesystem is
      physically removed. The problem is that blk_cleanup_queue() frees up
      some resources eg by calling elevator_exit(), which are not checked for
      in normal operation. So we should rather move these calls to the
      destructor function blk_release_queue() as at that point all remaining
      references are gone. However, in doing so we have to ensure that any
      externally supplied queue_lock is disconnected as the driver might free
      up the lock after the call of blk_cleanup_queue(),
      Signed-off-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      777eb1bf
    • David S. Miller's avatar
  5. 27 Sep, 2011 18 commits
  6. 26 Sep, 2011 2 commits