1. 25 May, 2012 2 commits
    • Dave Airlie's avatar
      dma-buf: add vmap interface · 98f86c9e
      Dave Airlie authored
      The main requirement I have for this interface is for scanning out
      using the USB gpu devices. Since these devices have to read the
      framebuffer on updates and linearly compress it, using kmaps
      is a major overhead for every update.
      
      v2: fix warn issues pointed out by Sylwester Nawrocki.
      
      v3: fix compile !CONFIG_DMA_SHARED_BUFFER and add _GPL for now
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Reviewed-by: default avatarRob Clark <rob.clark@linaro.org>
      Signed-off-by: default avatarSumit Semwal <sumit.semwal@linaro.org>
      98f86c9e
    • Daniel Vetter's avatar
      dma-buf: mmap support · 4c78513e
      Daniel Vetter authored
      Compared to Rob Clark's RFC I've ditched the prepare/finish hooks
      and corresponding ioctls on the dma_buf file. The major reason for
      that is that many people seem to be under the impression that this is
      also for synchronization with outstanding asynchronous processsing.
      I'm pretty massively opposed to this because:
      
      - It boils down reinventing a new rather general-purpose userspace
        synchronization interface. If we look at things like futexes, this
        is hard to get right.
      - Furthermore a lot of kernel code has to interact with this
        synchronization primitive. This smells a look like the dri1 hw_lock,
        a horror show I prefer not to reinvent.
      - Even more fun is that multiple different subsystems would interact
        here, so we have plenty of opportunities to create funny deadlock
        scenarios.
      
      I think synchronization is a wholesale different problem from data
      sharing and should be tackled as an orthogonal problem.
      
      Now we could demand that prepare/finish may only ensure cache
      coherency (as Rob intended), but that runs up into the next problem:
      We not only need mmap support to facilitate sw-only processing nodes
      in a pipeline (without jumping through hoops by importing the dma_buf
      into some sw-access only importer), which allows for a nicer
      ION->dma-buf upgrade path for existing Android userspace. We also need
      mmap support for existing importing subsystems to support existing
      userspace libraries. And a loot of these subsystems are expected to
      export coherent userspace mappings.
      
      So prepare/finish can only ever be optional and the exporter /needs/
      to support coherent mappings. Given that mmap access is always
      somewhat fallback-y in nature I've decided to drop this optimization,
      instead of just making it optional. If we demonstrate a clear need for
      this, supported by benchmark results, we can always add it in again
      later as an optional extension.
      
      Other differences compared to Rob's RFC is the above mentioned support
      for mapping a dma-buf through facilities provided by the importer.
      Which results in mmap support no longer being optional.
      
      Note that this dma-buf mmap patch does _not_ support every possible
      insanity an existing subsystem could pull of with mmap: Because it
      does not allow to intercept pagefaults and shoot down ptes importing
      subsystems can't add some magic of their own at these points (e.g. to
      automatically synchronize with outstanding rendering or set up some
      special resources). I've done a cursory read through a few mmap
      implementions of various subsytems and I'm hopeful that we can avoid
      this (and the complexity it'd bring with it).
      
      Additonally I've extended the documentation a bit to explain the hows
      and whys of this mmap extension.
      
      In case we ever want to add support for explicitly cache maneged
      userspace mmap with a prepare/finish ioctl pair, we could specify that
      userspace needs to mmap a different part of the dma_buf, e.g. the
      range starting at dma_buf->size up to dma_buf->size*2. This works
      because the size of a dma_buf is invariant over it's lifetime. The
      exporter would obviously need to fall back to coherent mappings for
      both ranges if a legacy clients maps the coherent range and the
      architecture cannot suppor conflicting caching policies. Also, this
      would obviously be optional and userspace needs to be able to fall
      back to coherent mappings.
      
      v2:
      - Spelling fixes from Rob Clark.
      - Compile fix for !DMA_BUF from Rob Clark.
      - Extend commit message to explain how explicitly cache managed mmap
        support could be added later.
      - Extend the documentation with implementations notes for exporters
        that need to manually fake coherency.
      
      v3:
      - dma_buf pointer initialization goof-up noticed by Rebecca Schultz
        Zavin.
      
      Cc: Rob Clark <rob.clark@linaro.org>
      Cc: Rebecca Schultz Zavin <rebecca@android.com>
      Acked-by: default avatarRob Clark <rob.clark@linaro.org>
      Signed-Off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarSumit Semwal <sumit.semwal@linaro.org>
      4c78513e
  2. 20 May, 2012 1 commit
  3. 19 May, 2012 10 commits
    • Linus Torvalds's avatar
      Merge tag 'parisc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6 · d6c77973
      Linus Torvalds authored
      Pull PA-RISC fixes from James Bottomley:
       "This is a set of three bug fixes that gets parisc running again on
        systems with PA1.1 processors.
      
        Two fix regressions introduced in 2.6.39 and one fixes a prefetch bug
        that only affects PA7300LC processors.  We also have another pending
        fix to do with the sectional arrangement of vmlinux.lds, but there's a
        query on it during testing on one particular system type, so I'll hold
        off sending it in for now."
      
      * tag 'parisc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6:
        [PARISC] fix panic on prefetch(NULL) on PA7300LC
        [PARISC] fix crash in flush_icache_page_asm on PA1.1
        [PARISC] fix PA1.1 oops on boot
      d6c77973
    • Linus Torvalds's avatar
      Merge branch 'x86/ld-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5d120458
      Linus Torvalds authored
      Pull x86 linker bug workarounds from Peter Anvin.
      
      GNU ld-2.22.52.0.[12] (*) has an unfortunate bug where it incorrectly
      turns certain relocation entries absolute.  Section-relative symbols
      that are part of otherwise empty sections are silently changed them to
      absolute.  We rely on section-relative symbols staying section-relative,
      and actually have several sections in the linker script solely for this
      purpose.
      
      See for example
      
         http://sourceware.org/bugzilla/show_bug.cgi?id=14052
      
      We could just black-list the buggy linker, but it appears that it got
      shipped in at least F17, and possibly other distros too, so it's sadly
      not some rare unusual case.
      
      This backports the workaround from the x86/trampoline branch, and as
      Peter says: "This is not a minimal fix, not at all, but it is a tested
      code base."
      
      * 'x86/ld-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, relocs: When printing an error, say relative or absolute
        x86, relocs: Workaround for binutils 2.22.52.0.1 section bug
        x86, realmode: 16-bit real-mode code support for relocs tool
      
      (*) That's a manly release numbering system. Stupid, sure. But manly.
      5d120458
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 14e931a2
      Linus Torvalds authored
      Pull block layer fixes from Jens Axboe:
       "A few small, but important fixes.  Most of them are marked for stable
        as well
      
         - Fix failure to release a semaphore on error path in mtip32xx.
         - Fix crashable condition in bio_get_nr_vecs().
         - Don't mark end-of-disk buffers as mapped, limit it to i_size.
         - Fix for build problem with CONFIG_BLOCK=n on arm at least.
         - Fix for a buffer overlow on UUID partition printing.
         - Trivial removal of unused variables in dac960."
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        block: fix buffer overflow when printing partition UUIDs
        Fix blkdev.h build errors when BLOCK=n
        bio allocation failure due to bio_get_nr_vecs()
        block: don't mark buffers beyond end of disk as mapped
        mtip32xx: release the semaphore on an error path
        dac960: Remove unused variables from DAC960_CreateProcEntries()
      14e931a2
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · a2ae9787
      Linus Torvalds authored
      Pull one more networking bug-fix from David Miller:
       "One last straggler.
      
        Eric Dumazet's pktgen unload oops fix was not entirely complete, but
        all the cases should be handled properly now....  fingers crossed."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        pktgen: fix module unload for good
      a2ae9787
    • Hugh Dickins's avatar
      memcg,thp: fix res_counter:96 regression · 62ade86a
      Hugh Dickins authored
      Occasionally, testing memcg's move_charge_at_immigrate on rc7 shows
      a flurry of hundreds of warnings at kernel/res_counter.c:96, where
      res_counter_uncharge_locked() does WARN_ON(counter->usage < val).
      
      The first trace of each flurry implicates __mem_cgroup_cancel_charge()
      of mc.precharge, and an audit of mc.precharge handling points to
      mem_cgroup_move_charge_pte_range()'s THP handling in commit 12724850
      ("memcg: avoid THP split in task migration").
      
      Checking !mc.precharge is good everywhere else, when a single page is to
      be charged; but here the "mc.precharge -= HPAGE_PMD_NR" likely to
      follow, is liable to result in underflow (a lot can change since the
      precharge was estimated).
      
      Simply check against HPAGE_PMD_NR: there's probably a better
      alternative, trying precharge for more, splitting if unsuccessful; but
      this one-liner is safer for now - no kernel/res_counter.c:96 warnings
      seen in 26 hours.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      62ade86a
    • H. Peter Anvin's avatar
      x86, relocs: When printing an error, say relative or absolute · 24ab82bd
      H. Peter Anvin authored
      When the relocs tool throws an error, let the error message say if it
      is an absolute or relative symbol.  This should make it a lot more
      clear what action the programmer needs to take and should help us find
      the reason if additional symbol bugs show up.
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Cc: <stable@vger.kernel.org>
      24ab82bd
    • H. Peter Anvin's avatar
      x86, relocs: Workaround for binutils 2.22.52.0.1 section bug · a3e854d9
      H. Peter Anvin authored
      GNU ld 2.22.52.0.1 has a bug that it blindly changes symbols from
      section-relative to absolute if they are in a section of zero length.
      This turns the symbols __init_begin and __init_end into absolute
      symbols.  Let the relocs program know that those should be treated as
      relative symbols.
      Reported-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: <stable@vger.kernel.org>
      Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
      a3e854d9
    • H. Peter Anvin's avatar
      x86, realmode: 16-bit real-mode code support for relocs tool · 6520fe55
      H. Peter Anvin authored
      A new option is added to the relocs tool called '--realmode'.
      This option causes the generation of 16-bit segment relocations
      and 32-bit linear relocations for the real-mode code. When
      the real-mode code is moved to the low-memory during kernel
      initialization, these relocation entries can be used to
      relocate the code properly.
      
      In the assembly code 16-bit segment relocations must be relative
      to the 'real_mode_seg' absolute symbol. Linear relocations must be
      relative to a symbol prefixed with 'pa_'.
      
      16-bit segment relocation is used to load cs:ip in 16-bit code.
      Linear relocations are used in the 32-bit code for relocatable
      data references. They are declared in the linker script of the
      real-mode code.
      
      The relocs tool is moved to arch/x86/tools/relocs.c, and added new
      target archscripts that can be used to build scripts needed building
      an architecture.  be compiled before building the arch/x86 tree.
      
      [ hpa: accelerating this because it detects invalid absolute
        relocations, a serious bug in binutils 2.22.52.0.x which currently
        produces bad kernels. ]
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1336501366-28617-2-git-send-email-jarkko.sakkinen@intel.comSigned-off-by: default avatarJarkko Sakkinen <jarkko.sakkinen@intel.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      6520fe55
    • Linus Torvalds's avatar
      Merge tag 'dm-3.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm · b1dab2f0
      Linus Torvalds authored
      Pull a dm fix from Alasdair G Kergon:
       "A fix to the thin provisioning userspace interface."
      
      * tag 'dm-3.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
        dm thin: fix table output when pool target disables discard passdown internally
      b1dab2f0
    • Mike Snitzer's avatar
      dm thin: fix table output when pool target disables discard passdown internally · f402693d
      Mike Snitzer authored
      When the thin pool target clears the discard_passdown parameter
      internally, it incorrectly changes the table line reported to userspace.
      This breaks dumb string comparisons on these table lines in generic
      userspace device-mapper library code and leads to tables being reloaded
      repeatedly when nothing is actually meant to be changing.
      
      This patch corrects this by no longer changing the table line when
      discard passdown was disabled.
      
      We can still tell when discard passdown is overridden by looking for the
      message "Discard unsupported by data device (sdX): Disabling discard passdown."
      
      This automatic detection is also moved from the 'load' to the 'resume'
      so that it is re-evaluated should the properties of underlying devices
      change.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      f402693d
  4. 18 May, 2012 12 commits
    • Linus Torvalds's avatar
      Merge tag 'md-3.4-fixes' of git://neil.brown.name/md · 2f05af8b
      Linus Torvalds authored
      Pull one more md bugfix from NeilBrown:
       "Fix bug in recent fix to RAID10.
      
        Without this patch, recovery will crash"
      
      * tag 'md-3.4-fixes' of git://neil.brown.name/md:
        md/raid10: fix transcription error in calc_sectors conversion.
      2f05af8b
    • Linus Torvalds's avatar
      Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile · 8394edf3
      Linus Torvalds authored
      Pull tile tree bugfix from Chris Metcalf:
       "This fixes a security vulnerability (and correctness bug) in tilegx"
      
      * 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
        tilegx: enable SYSCALL_WRAPPERS support
      8394edf3
    • NeilBrown's avatar
      md/raid10: fix transcription error in calc_sectors conversion. · b0d634d5
      NeilBrown authored
      The old code was
      		sector_div(stride, fc);
      the new code was
      		sector_dir(size, conf->near_copies);
      
      'size' is right (the stride various wasn't really needed), but
      'fc' means 'far_copies', and that is an important difference.
      
      Signed-off-by: NeilBrown <neilb@suse.de>       
      b0d634d5
    • Linus Torvalds's avatar
      Merge branch 'akpm' (Andrew's patch-bomb) · 73f1f5dd
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton.
      
      * emailed from Andrew Morton <akpm@linux-foundation.org>: (4 patches)
        frv: delete incorrect task prototypes causing compile fail
        slub: missing test for partial pages flush work in flush_all()
        fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries
        drivers/rtc/rtc-pl031.c: configure correct wday for 2000-01-01
      73f1f5dd
    • Linus Torvalds's avatar
      proc: move fd symlink i_mode calculations into tid_fd_revalidate() · 30a08bf2
      Linus Torvalds authored
      Instead of doing the i_mode calculations at proc_fd_instantiate() time,
      move them into tid_fd_revalidate(), which is where the other inode state
      (notably uid/gid information) is updated too.
      
      Otherwise we'll end up with stale i_mode information if an fd is re-used
      while the dentry still hangs around.  Not that anything really *cares*
      (symlink permissions don't really matter), but Tetsuo Handa noticed that
      the owner read/write bits don't always match the state of the
      readability of the file descriptor, and we _used_ to get this right a
      long time ago in a galaxy far, far away.
      
      Besides, aside from fixing an ugly detail (that has apparently been this
      way since commit 61a28784: "proc: Remove the hard coded inode
      numbers" in 2006), this removes more lines of code than it adds.  And it
      just makes sense to update i_mode in the same place we update i_uid/gid.
      
      Al Viro correctly points out that we could just do the inode fill in the
      inode iops ->getattr() function instead.  However, that does require
      somewhat slightly more invasive changes, and adds yet *another* lookup
      of the file descriptor.  We need to do the revalidate() for other
      reasons anyway, and have the file descriptor handy, so we might as well
      fill in the information at this point.
      Reported-by: default avatarTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: default avatarEric Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      30a08bf2
    • Eric Dumazet's avatar
      pktgen: fix module unload for good · d4b11335
      Eric Dumazet authored
      commit c57b5468 (pktgen: fix crash at module unload) did a very poor
      job with list primitives.
      
      1) list_splice() arguments were in the wrong order
      
      2) list_splice(list, head) has undefined behavior if head is not
      initialized.
      
      3) We should use the list_splice_init() variant to clear pktgen_threads
      list.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d4b11335
    • Chris Metcalf's avatar
      tilegx: enable SYSCALL_WRAPPERS support · e6d9668e
      Chris Metcalf authored
      Some discussion with the glibc mailing lists revealed that this was
      necessary for 64-bit platforms with MIPS-like sign-extension rules
      for 32-bit values.  The original symptom was that passing (uid_t)-1 to
      setreuid() was failing in programs linked -pthread because of the "setxid"
      mechanism for passing setxid-type function arguments to the syscall code.
      SYSCALL_WRAPPERS handles ensuring that all syscall arguments end up with
      proper sign-extension and is thus the appropriate fix for this problem.
      
      On other platforms (s390, powerpc, sparc64, and mips) this was fixed
      in 2.6.28.6.  The general issue is tracked as CVE-2009-0029.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarChris Metcalf <cmetcalf@tilera.com>
      e6d9668e
    • Linus Torvalds's avatar
      Merge tag 'linus-mce-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · 3d994497
      Linus Torvalds authored
      Pull a machine check recovery fix from Tony Luck.
      
      I really don't like how the MCE code does some of the things it does,
      but this does seem to be an improvement.
      
      * tag 'linus-mce-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
        x86/mce: Only restart instruction after machine check recovery if it is safe
      3d994497
    • Paul Gortmaker's avatar
      frv: delete incorrect task prototypes causing compile fail · 93c2d656
      Paul Gortmaker authored
      Commit 41101809 ("fork: Provide weak arch_release_[task_struct|
      thread_info] functions") in -tip highlights a problem in the frv arch,
      where it has needles prototypes for alloc_task_struct_node and
      free_task_struct.  This now shows up as:
      
        kernel/fork.c:120:66: error: static declaration of 'alloc_task_struct_node' follows non-static declaration
        kernel/fork.c:127:51: error: static declaration of 'free_task_struct' follows non-static declaration
      
      since that commit turned them into real functions.  Since arch/frv does
      does not define define __HAVE_ARCH_TASK_STRUCT_ALLOCATOR (i.e.  it just
      uses the generic ones) it shouldn't list these at all.
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      93c2d656
    • majianpeng's avatar
      slub: missing test for partial pages flush work in flush_all() · 02e1a9cd
      majianpeng authored
      I found some kernel messages such as:
      
          SLUB raid5-md127: kmem_cache_destroy called for cache that still has objects.
          Pid: 6143, comm: mdadm Tainted: G           O 3.4.0-rc6+        #75
          Call Trace:
          kmem_cache_destroy+0x328/0x400
          free_conf+0x2d/0xf0 [raid456]
          stop+0x41/0x60 [raid456]
          md_stop+0x1a/0x60 [md_mod]
          do_md_stop+0x74/0x470 [md_mod]
          md_ioctl+0xff/0x11f0 [md_mod]
          blkdev_ioctl+0xd8/0x7a0
          block_ioctl+0x3b/0x40
          do_vfs_ioctl+0x96/0x560
          sys_ioctl+0x91/0xa0
          system_call_fastpath+0x16/0x1b
      
      Then using kmemleak I found these messages:
      
          unreferenced object 0xffff8800b6db7380 (size 112):
            comm "mdadm", pid 5783, jiffies 4294810749 (age 90.589s)
            hex dump (first 32 bytes):
              01 01 db b6 ad 4e ad de ff ff ff ff ff ff ff ff  .....N..........
              ff ff ff ff ff ff ff ff 98 40 4a 82 ff ff ff ff  .........@J.....
            backtrace:
              kmemleak_alloc+0x21/0x50
              kmem_cache_alloc+0xeb/0x1b0
              kmem_cache_open+0x2f1/0x430
              kmem_cache_create+0x158/0x320
              setup_conf+0x649/0x770 [raid456]
              run+0x68b/0x840 [raid456]
              md_run+0x529/0x940 [md_mod]
              do_md_run+0x18/0xc0 [md_mod]
              md_ioctl+0xba8/0x11f0 [md_mod]
              blkdev_ioctl+0xd8/0x7a0
              block_ioctl+0x3b/0x40
              do_vfs_ioctl+0x96/0x560
              sys_ioctl+0x91/0xa0
              system_call_fastpath+0x16/0x1b
      
      This bug was introduced by commit a8364d55 ("slub: only IPI CPUs that
      have per cpu obj to flush"), which did not include checks for per cpu
      partial pages being present on a cpu.
      Signed-off-by: default avatarmajianpeng <majianpeng@gmail.com>
      Cc: Gilad Ben-Yossef <gilad@benyossef.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Tested-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      02e1a9cd
    • Cyrill Gorcunov's avatar
      fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries · eb94cd96
      Cyrill Gorcunov authored
      map_files/ entries are never supposed to be executed, still curious
      minds might try to run them, which leads to the following deadlock
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        3.4.0-rc4-24406-g841e6a6 #121 Not tainted
        -------------------------------------------------------
        bash/1556 is trying to acquire lock:
         (&sb->s_type->i_mutex_key#8){+.+.+.}, at: do_lookup+0x267/0x2b1
      
        but task is already holding lock:
         (&sig->cred_guard_mutex){+.+.+.}, at: prepare_bprm_creds+0x2d/0x69
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (&sig->cred_guard_mutex){+.+.+.}:
               validate_chain+0x444/0x4f4
               __lock_acquire+0x387/0x3f8
               lock_acquire+0x12b/0x158
               __mutex_lock_common+0x56/0x3a9
               mutex_lock_killable_nested+0x40/0x45
               lock_trace+0x24/0x59
               proc_map_files_lookup+0x5a/0x165
               __lookup_hash+0x52/0x73
               do_lookup+0x276/0x2b1
               walk_component+0x3d/0x114
               do_last+0xfc/0x540
               path_openat+0xd3/0x306
               do_filp_open+0x3d/0x89
               do_sys_open+0x74/0x106
               sys_open+0x21/0x23
               tracesys+0xdd/0xe2
      
        -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}:
               check_prev_add+0x6a/0x1ef
               validate_chain+0x444/0x4f4
               __lock_acquire+0x387/0x3f8
               lock_acquire+0x12b/0x158
               __mutex_lock_common+0x56/0x3a9
               mutex_lock_nested+0x40/0x45
               do_lookup+0x267/0x2b1
               walk_component+0x3d/0x114
               link_path_walk+0x1f9/0x48f
               path_openat+0xb6/0x306
               do_filp_open+0x3d/0x89
               open_exec+0x25/0xa0
               do_execve_common+0xea/0x2f9
               do_execve+0x43/0x45
               sys_execve+0x43/0x5a
               stub_execve+0x6c/0xc0
      
      This is because prepare_bprm_creds grabs task->signal->cred_guard_mutex
      and when do_lookup happens we try to grab task->signal->cred_guard_mutex
      again in lock_trace.
      
      Fix it using plain ptrace_may_access() helper in proc_map_files_lookup()
      and in proc_map_files_readdir() instead of lock_trace(), the caller must
      be CAP_SYS_ADMIN granted anyway.
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Reported-by: default avatarSasha Levin <levinsasha928@gmail.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Vasiliy Kulikov <segoon@openwall.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      eb94cd96
    • Rajkumar Kasirajan's avatar
      drivers/rtc/rtc-pl031.c: configure correct wday for 2000-01-01 · c0a5f4a0
      Rajkumar Kasirajan authored
      The reset date of the ST Micro version of PL031 is 2000-01-01.  The
      correct weekday for 2000-01-01 is saturday, but pl031 is initialized to
      sunday.  This may lead to alarm malfunction, so configure the correct
      wday if RTC_DR indicates reset.
      Signed-off-by: default avatarRajkumar Kasirajan <rajkumar.kasirajan@stericsson.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Cc: Mattias Wallin <mattias.wallin@stericsson.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c0a5f4a0
  5. 17 May, 2012 15 commits
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm · 42ea7d7f
      Linus Torvalds authored
      Pull ARM fixes from Russell King:
       "Small set of fixes again."
      
      * 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
        ARM: 7419/1: vfp: fix VFP flushing regression on sigreturn path
        ARM: 7418/1: LPAE: fix access flag setup in mem_type_table
        ARM: prevent VM_GROWSDOWN mmaps extending below FIRST_USER_ADDRESS
        ARM: 7417/1: vfp: ensure preemption is disabled when enabling VFP access
      42ea7d7f
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 39c20285
      Linus Torvalds authored
      Pull two networking fixes from David S. Miller:
      
      1) Thanks to Willy Tarreau and Eric Dumazet, we've unlocked a bug that's
         been present in do_tcp_sendpages() since that function was written in
         2002.
      
         When we block to wait for memory we have to unconditionally try and
         push out pending TCP data, otherwise we can block for an unreasonably
         long amount of time.
      
      2) Fix deadlock in e1000, fixes kernel bugzilla 43132
      
         From Tushar Dave.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        e1000: Prevent reset task killing itself.
        tcp: do_tcp_sendpages() must try to push data out on oom conditions
      39c20285
    • Rafael J. Wysocki's avatar
      ACPI / PCI / PM: Fix device PM regression related to D3hot/D3cold · 5c7dd710
      Rafael J. Wysocki authored
      Commit 1cc0c998 ("ACPI: Fix D3hot v D3cold confusion") introduced a
      bug in __acpi_bus_set_power() and changed the behavior of
      acpi_pci_set_power_state() in such a way that it generally doesn't work
      as expected if PCI_D3hot is passed to it as the second argument.
      
      First off, if ACPI_STATE_D3 (equal to ACPI_STATE_D3_COLD) is passed to
      __acpi_bus_set_power() and the explicit_set flag is set for the D3cold
      state, the function will try to execute AML method called "_PS4", which
      doesn't exist.
      
      Fix this by adding a check to ensure that the name of the AML method
      to execute for transitions to ACPI_STATE_D3_COLD is correct in
      __acpi_bus_set_power().  Also make sure that the explicit_set flag
      for ACPI_STATE_D3_COLD will be set if _PS3 is present and modify
      acpi_power_transition() to avoid accessing power resources for
      ACPI_STATE_D3_COLD, because they don't exist.
      
      Second, if PCI_D3hot is passed to acpi_pci_set_power_state() as the
      target state, the function will request a transition to
      ACPI_STATE_D3_HOT instead of ACPI_STATE_D3.  However,
      ACPI_STATE_D3_HOT is now only marked as supported if the _PR3 AML
      method is defined for the given device, which is rare.  This causes
      problems to happen on systems where devices were successfully put
      into ACPI D3 by pci_set_power_state(PCI_D3hot) which doesn't work
      now.  In particular, some unused graphics adapters are not turned
      off as a result.
      
      To fix this issue restore the old behavior of
      acpi_pci_set_power_state(), which is to request a transition to
      ACPI_STATE_D3 (equal to ACPI_STATE_D3_COLD) if either PCI_D3hot or
      PCI_D3cold is passed to it as the argument.
      
      This approach is not ideal, because generally power should not
      be removed from devices if PCI_D3hot is the target power state,
      but since this behavior is relied on, we have no choice but to
      restore it at the moment and spend more time on designing a
      better solution in the future.
      
      References: https://bugzilla.kernel.org/show_bug.cgi?id=43228Reported-by: default avatarrocko <rockorequin@hotmail.com>
      Reported-by: default avatarCristian Rodríguez <crrodriguez@opensuse.org>
      Reported-and-tested-by: default avatarPeter <lekensteyn@gmail.com>
      Signed-off-by: default avatarRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5c7dd710
    • Tushar Dave's avatar
      e1000: Prevent reset task killing itself. · 8ce6909f
      Tushar Dave authored
      Killing reset task while adapter is resetting causes deadlock.
      Only kill reset task if adapter is not resetting.
      Ref bug #43132 on bugzilla.kernel.org
      
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarTushar Dave <tushar.n.dave@intel.com>
      Tested-by: default avatarAaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8ce6909f
    • Willy Tarreau's avatar
      tcp: do_tcp_sendpages() must try to push data out on oom conditions · bad115cf
      Willy Tarreau authored
      Since recent changes on TCP splicing (starting with commits 2f533844
      "tcp: allow splice() to build full TSO packets" and 35f9c09f "tcp:
      tcp_sendpages() should call tcp_push() once"), I started seeing
      massive stalls when forwarding traffic between two sockets using
      splice() when pipe buffers were larger than socket buffers.
      
      Latest changes (net: netdev_alloc_skb() use build_skb()) made the
      problem even more apparent.
      
      The reason seems to be that if do_tcp_sendpages() fails on out of memory
      condition without being able to send at least one byte, tcp_push() is not
      called and the buffers cannot be flushed.
      
      After applying the attached patch, I cannot reproduce the stalls at all
      and the data rate it perfectly stable and steady under any condition
      which previously caused the problem to be permanent.
      
      The issue seems to have been there since before the kernel migrated to
      git, which makes me think that the stalls I occasionally experienced
      with tux during stress-tests years ago were probably related to the
      same issue.
      
      This issue was first encountered on 3.0.31 and 3.2.17, so please backport
      to -stable.
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: <stable@vger.kernel.org>
      bad115cf
    • Linus Torvalds's avatar
      Merge branch '3.4-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · eea03647
      Linus Torvalds authored
      Pull two more target-core updates from Nicholas Bellinger:
       "The first patch addresses a SPC-2 reservations RELEASE bug in a
        special (iscsi specific) multi-ISID setup case that was allowing the
        same initiator to be able to incorrect release it's own reservation on
        a different SCSI path with enforce_pr_isid=1 operation.  This bug was
        caught by Bernhard Kohl.
      
        The second patch is to address a bug with FILEIO backends where the
        incorrect number of blocks for READ_CAPACITY was being reported after
        an underlying device-mapper block_device size change.  This patch uses
        now i_size_read() in fd_get_blocks() for FILEIO backends with an
        underlying block_device, instead of trying to determine this value at
        setup time during fd_create_virtdevice().  (hch CC'ed)
      
        Both are CC'ed to stable."
      
      * '3.4-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
        target: Fix bug in handling of FILEIO + block_device resize ops
        target: Fix SPC-2 RELEASE bug for multi-session iSCSI client setups
      eea03647
    • Nicholas Bellinger's avatar
      target: Fix bug in handling of FILEIO + block_device resize ops · cd9323fd
      Nicholas Bellinger authored
      This patch fixes a bug in the handling of FILEIO w/ underlying block_device
      resize operations where the original fd_dev->fd_dev_size was incorrectly being
      used in fd_get_blocks() for READ_CAPACITY response payloads.
      
      This patch avoids using fd_dev->fd_dev_size for FILEIO devices with
      an underlying block_device, and instead changes fd_get_blocks() to
      get the sector count directly from i_size_read() as recommended by hch.
      Reported-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      cd9323fd
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma · 1be5f0b7
      Linus Torvalds authored
      Pull slave-dmaengine fixes fromVinod Koul:
       "fixes of cylic dma usages in slave dma drivers"
      
      * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: fix cyclic dma usage
        dmaengine: pl330: dont complete descriptor for cyclic dma
      1be5f0b7
    • Linus Torvalds's avatar
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 42e8b9c0
      Linus Torvalds authored
      Pull last minute virtio fixes from Michael S. Tsirkin:
       "Here are a couple of last minute virtio fixes for 3.4.  Hope it's not
        too late yes - I might have tried too hard to make sure the fix is
        well tested.
      
        Fixes are by Amit and myself.  One fixes module removal and one
        suspend of a VM, the last one the handling of out of memory condition.
      
        They are thus very low risk as most people never hit these paths, but
        do fix very annoying problems for people that do use the feature.
      
        Signed-off-by: Michael S. Tsirkin <mst@redhat.com>"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio_net: invoke softirqs after __napi_schedule
        virtio: balloon: let host know of updated balloon size before module removal
        virtio: console: tell host of open ports after resume from s3/s4
      42e8b9c0
    • Linus Torvalds's avatar
      Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 674ff517
      Linus Torvalds authored
      Pull ARM: SoC fixes from Olof Johansson:
       "I will stop trying to predict when we're done with fixes for a
        release.
      
        Here's another small batch of three patches for arm-soc:
      
         - A fix for a boot time WARN_ON() due to irq domain conversion on
           PRIMA2
         - Fix for a regression in Tegra SMP spinup code due to swapped
           register offsets
         - Fixed config dependency for mv_cesa crypto driver to avoid build
           breakage"
      
      * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        ARM: PRIMA2: fix irq domain size and IRQ mask of internal interrupt controller
        crypto: mv_cesa requires on CRYPTO_HASH to build
        ARM: tegra: Fix flow controller accesses
      674ff517
    • Linus Torvalds's avatar
      Merge tag 'md-3.4-fixes' of git://neil.brown.name/md · 36a1987c
      Linus Torvalds authored
      Pull two md fixes from NeilBrown:
       "One fixes a bug in the new raid10 resize code so is relevant to 3.4
        only.
      
        The other fixes a bug in the use of md by dm-raid, so is relevant to
        any kernel with dm-raid support"
      
      * tag 'md-3.4-fixes' of git://neil.brown.name/md:
        MD: Add del_timer_sync to mddev_suspend (fix nasty panic)
        md/raid10: set dev_sectors properly when resizing devices in array.
      36a1987c
    • Linus Torvalds's avatar
      Merge branches 'perf-urgent-for-linus', 'x86-urgent-for-linus' and... · 31ae9835
      Linus Torvalds authored
      Merge branches 'perf-urgent-for-linus', 'x86-urgent-for-linus' and 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
      
      Pull perf, x86 and scheduler updates from Ingo Molnar.
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        tracing: Do not enable function event with enable
        perf stat: handle ENXIO error for perf_event_open
        perf: Turn off compiler warnings for flex and bison generated files
        perf stat: Fix case where guest/host monitoring is not supported by kernel
        perf build-id: Fix filename size calculation
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, kvm: KVM paravirt kernels don't check for CPUID being unavailable
        x86: Fix section annotation of acpi_map_cpu2node()
        x86/microcode: Ensure that module is only loaded on supported Intel CPUs
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched: Fix KVM and ia64 boot crash due to sched_groups circular linked list assumption
      31ae9835
    • Will Deacon's avatar
      ARM: 7419/1: vfp: fix VFP flushing regression on sigreturn path · 56cb2484
      Will Deacon authored
      Commit ff9a184c ("ARM: 7400/1: vfp: clear fpscr length and stride bits
      on entry to sig handler") flushes the VFP state prior to entering a
      signal handler so that a VFP operation inside the handler will trap and
      force a restore of ABI-compliant registers. Reflushing and disabling VFP
      on the sigreturn path is predicated on the saved thread state indicating
      that VFP was used by the handler -- however for SMP platforms this is
      only set on context-switch, making the check unreliable and causing VFP
      register corruption in userspace since the register values are not
      necessarily those restored from the sigframe.
      
      This patch unconditionally flushes the VFP state after a signal handler.
      Since we already perform the flush before the handler and the flushing
      itself happens lazily, the redundant flush when VFP is not used by the
      handler is essentially a nop.
      Reported-by: default avatarJon Medhurst <tixy@linaro.org>
      Signed-off-by: default avatarJon Medhurst <tixy@linaro.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      56cb2484
    • Vitaly Andrianov's avatar
      ARM: 7418/1: LPAE: fix access flag setup in mem_type_table · 1a3abcf4
      Vitaly Andrianov authored
      A zero value for prot_sect in the memory types table implies that
      section mappings should never be created for the memory type in question.
      This is checked for in alloc_init_section().
      
      With LPAE, we set a bit to mask access flag faults for kernel mappings.
      This breaks the aforementioned (!prot_sect) check in alloc_init_section().
      
      This patch fixes this bug by first checking for a non-zero
      prot_sect before setting the PMD_SECT_AF flag.
      Signed-off-by: default avatarVitaly Andrianov <vitalya@ti.com>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      1a3abcf4
    • Michael S. Tsirkin's avatar
      virtio_net: invoke softirqs after __napi_schedule · ec13ee80
      Michael S. Tsirkin authored
      __napi_schedule might raise softirq but nothing
      causes do_softirq to trigger, so it does not in fact
      run. As a result,
      the error message "NOHZ: local_softirq_pending 08"
      sometimes occurs during boot of a KVM guest when the network service is
      started and we are oom:
      
        ...
        Bringing up loopback interface:  [  OK  ]
        Bringing up interface eth0:
        Determining IP information for eth0...NOHZ: local_softirq_pending 08
         done.
        [  OK  ]
        ...
      
      Further, receive queue processing might get delayed
      indefinitely until some interrupt triggers:
      virtio_net expected napi to be run immediately.
      
      One way to cause do_softirq to be executed is by
      invoking local_bh_enable(). As __napi_schedule is
      normally called from bh or irq context, this
      seems to make sense: disable bh before __napi_schedule
      and enable afterwards.
      
      In fact it's a very complicated way of calling do_softirq(),
      and works since this function is only used when we are not
      in interrupt context.  It's not hot at all, in any ideal scenario.
      Reported-by: default avatarUlrich Obergfell <uobergfe@redhat.com>
      Tested-by: default avatarUlrich Obergfell <uobergfe@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      ec13ee80