1. 26 Aug, 2010 6 commits
  2. 20 Aug, 2010 3 commits
  3. 13 Aug, 2010 31 commits
    • Linux 2.6.32.19 · c6f69a47
      Greg Kroah-Hartman authored
      c6f69a47
    • x86: don't send SIGBUS for kernel page faults · 495b5936
      Linus Torvalds authored
      commit 96054569 upstream.
      
      It's wrong for several reasons, but the most direct one is that the
      fault may be for the stack accesses to set up a previous SIGBUS.  When
      we have a kernel exception, the kernel exception handler does all the
      fixups, not some user-level signal handler.
      
      Even apart from the nested SIGBUS issue, it's also wrong to give out
      kernel fault addresses in the signal handler info block, or to send a
      SIGBUS when a system call already returns EFAULT.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      495b5936
    • mm: fix missing page table unmap for stack guard page failure case · ab832422
      Linus Torvalds authored
      commit 5528f913 upstream.
      
      .. which didn't show up in my tests because it's a no-op on x86-64 and
      most other architectures.  But we enter the function with the last-level
      page table mapped, and should unmap it at exit.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      ab832422
    • mm: keep a guard page below a grow-down stack segment · 7e281afe
      Linus Torvalds authored
      commit 320b2b8d upstream.
      
      This is a rather minimally invasive patch to solve the problem of the
      user stack growing into a memory mapped area below it.  Whenever we fill
      the first page of the stack segment, expand the segment down by one
      page.
      
      Now, admittedly some odd application might _want_ the stack to grow down
      into the preceding memory mapping, and so we may at some point need to
      make this a process tunable (some people might also want to have more
      than a single page of guarding), but let's try the minimal approach
      first.
      
      Tested with trivial application that maps a single page just below the
      stack, and then starts recursing.  Without this, we will get a SIGSEGV
      _after_ the stack has smashed the mapping.  With this patch, we'll get a
      nice SIGBUS just as the stack touches the page just above the mapping.
      Requested-by: Keith Packard <keithp@keithp.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      7e281afe
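The guard-page check described in commit 7e281afe can be modelled in user space. The following is a minimal sketch under stated assumptions, not the kernel code: `struct region`, `PAGE_SZ` and `touch_stack()` are hypothetical names, and the SIGBUS is represented by a `-1` return value.

```c
#include <stdint.h>

#define PAGE_SZ 4096u  /* assumed page size for this model */

/* Hypothetical stand-in for a memory mapping covering [start, end). */
struct region {
    uintptr_t start;
    uintptr_t end;
};

/*
 * Model of an access to a grow-down stack. If the access falls in the
 * lowest page of the stack, try to extend the stack down by one page.
 * Returns 0 on success, or -1 (standing in for SIGBUS) when the page that
 * would become the new lowest page already belongs to the mapping below.
 */
static int touch_stack(struct region *stack, const struct region *below,
                       uintptr_t addr)
{
    uintptr_t page = addr & ~(uintptr_t)(PAGE_SZ - 1);

    if (page != stack->start)
        return 0;                   /* not the first page: nothing to do */

    uintptr_t guard = stack->start - PAGE_SZ;
    if (below && guard < below->end)
        return -1;                  /* no room left for a guard page */

    stack->start = guard;           /* expand the segment down by one page */
    return 0;
}
```

In this model the stack may grow until it is adjacent to the mapping below it, but the access that fills that last page fails instead of letting the stack run into the mapping.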
    • mm: fix corruption of hibernation caused by reusing swap during image saving · 46dc12d5
      KAMEZAWA Hiroyuki authored
      commit 966cca02 upstream.
      
      Since 2.6.31, swap_map[]'s refcounting can show that a used swap entry
      is held only by the swap cache and can therefore be reused.  As a
      result, while scanning swap_map[] for a free entry, such an entry may
      be reclaimed and reused.  This behavior was introduced by commit
      c9e44410 ("mm: reuse unused swap entry if necessary").
      
      But this caused data corruption at resume.  The scenario is:
      
      - Assume a clean swap-cache page that is still mapped.
      
      - At hibernation_snapshot(), the page is saved as clean swap cache and
        its swap_map[] entry is marked SWAP_HAS_CACHE.
      
      - Then save_image() is called.  It reuses the SWAP_HAS_CACHE entry to
        save the image, overwriting the old contents.
      
      After resume:
      
      - Memory reclaim runs, finds the clean, unreferenced swap-cache page and
        discards it because it is marked clean.  But at this point the
        contents on disk and in the swap cache are inconsistent.
      
      Hence memory is corrupted.
      
      This patch avoids the bug by not reclaiming swap-entry during hibernation.
      This is a quick fix for backporting.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Reported-by: Ondreg Zary <linux@rainbow-software.org>
      Tested-by: Ondreg Zary <linux@rainbow-software.org>
      Tested-by: Andrea Gelmini <andrea.gelmini@gmail.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      
      46dc12d5
    • md/raid1: delay reads that could overtake behind-writes. · 8c2b26e4
      NeilBrown authored
      commit e555190d upstream.
      
      When a raid1 array is configured to support write-behind
      on some devices, it normally only reads from other devices.
      If all devices are write-behind (because the rest have failed)
      it is possible for a read request to be serviced before a
      behind-write request, which would appear as data corruption.
      
      So when forced to read from a WriteMostly device, wait for any
      write-behind to complete, and don't start any more behind-writes.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      
      8c2b26e4
    • ibmvfc: Reduce error recovery timeout · 0c5210f7
      Brian King authored
      commit daa142d1 upstream.
      
      If a command times out resulting in EH getting invoked, we wait for the
      aborted commands to come back after sending the abort. Shorten
      the amount of time we wait for these responses, to ensure we don't
      get stuck in EH for several minutes.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      0c5210f7
    • ibmvfc: Fix command completion handling · d006b64e
      Brian King authored
      commit f5832fa2 upstream.
      
      Commands which are completed by the VIOS are placed on a CRQ
      in kernel memory for the ibmvfc driver to process. Each CRQ
      entry is 16 bytes. The ibmvfc driver reads the first 8 bytes
      to check if the entry is valid, then reads the next 8 bytes to get
      the handle, which is a pointer to the completed command. This fixes
      an issue seen on Power 7 where the processor reordered the
      loads from memory, resulting in processing command completion
      with a stale handle. This could result in command timeouts,
      and also early completion of commands.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      d006b64e
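The load-ordering problem described in commit d006b64e can be illustrated in user space. This is only a sketch: `struct crq_entry`, `crq_post()` and `crq_poll()` are hypothetical names, and C11 acquire/release atomics are used here as an analogue for whatever barrier primitive the kernel driver actually uses.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 16-byte queue entry: 8 bytes of validity, 8 bytes of handle. */
struct crq_entry {
    _Atomic uint64_t valid;   /* non-zero once the entry is complete */
    uint64_t handle;          /* identifies the completed command */
};

/* Producer side: write the handle first, then publish with a release store. */
static void crq_post(struct crq_entry *e, uint64_t handle)
{
    e->handle = handle;
    atomic_store_explicit(&e->valid, 1, memory_order_release);
}

/*
 * Consumer side: the acquire load prevents the handle read from being
 * satisfied before the valid check. Returns 1 and fills *handle when an
 * entry was consumed, 0 when the slot is still empty.
 */
static int crq_poll(struct crq_entry *e, uint64_t *handle)
{
    if (atomic_load_explicit(&e->valid, memory_order_acquire) == 0)
        return 0;
    *handle = e->handle;
    atomic_store_explicit(&e->valid, 0, memory_order_relaxed);  /* free slot */
    return 1;
}
```

Without the ordering between the two loads, a weakly ordered CPU may read `handle` before `valid`, which is the stale-handle case the commit message describes.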
    • aic79xx: check for non-NULL scb in ahd_handle_nonpkt_busfree · b92f4435
      Hannes Reinecke authored
      commit 534ef056 upstream.
      
      When removing several devices aic79xx will occasionally Oops
      in ahd_handle_nonpkt_busfree during rescan. Looking at the
      code I found that we're indeed not checking if the scb in
      question is NULL. So check for it before accessing it.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      b92f4435
    • loop: Update mtime when writing using aops · bde7acea
      Nikanth Karthikesan authored
      commit 02246c41 upstream.
      
      Update mtime when writing to backing filesystem using the address space
      operations write_begin and write_end.
      Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      bde7acea
    • Skip check for mandatory locks when unlocking · 1fe6910c
      Sachin Prabhu authored
      commit ee860b6a upstream.
      
      ocfs2_lock() will skip locks on a file whose mode is set to 02666. This
      is a problem in cases where the mode of the file is changed after a
      process has obtained a lock on the file.
      
      ocfs2_lock() should skip the check for mandatory locks when unlocking a
      file.
      Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      1fe6910c
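The rule from commit 1fe6910c can be sketched as a small user-space check. The helper names `is_mandatory_lock_mode()` and `lock_precheck()` are hypothetical, and the `-ENOLCK` return value is an assumption about how a refused lock is reported. The mode test follows the usual convention that set-group-ID without group execute marks a file for mandatory locking.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

/* Mandatory locking is requested by set-group-ID without group execute. */
static int is_mandatory_lock_mode(mode_t mode)
{
    return (mode & (S_ISGID | S_IXGRP)) == S_ISGID;
}

/*
 * Hypothetical lock entry check: refuse new locks on mandatory-lock files,
 * but always let an unlock through so that a lock taken before the mode
 * change can still be released.
 */
static int lock_precheck(mode_t mode, short lock_type)
{
    if (lock_type != F_UNLCK && is_mandatory_lock_mode(mode))
        return -ENOLCK;
    return 0;
}
```

With mode 02666 a write lock is refused while an unlock is still accepted, which is the case the commit addresses.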
    • ocfs2: Set MS_POSIXACL on remount · bce8a761
      Jan Kara authored
      commit 57b09bb5 upstream.
      
      We have to set MS_POSIXACL on remount as well. Otherwise VFS
      would not know we started supporting ACLs after remount and
      thus ACLs would not work.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      bce8a761
    • ocfs2: Find proper end cpos for a leaf refcount block. · 7b2212b2
      Tao Ma authored
      commit 38a04e43 upstream.
      
      ocfs2 refcount tree is stored as an extent tree while
      the leaf ocfs2_refcount_rec points to a refcount block.
      
      The following steps can trigger a kernel panic:
      mkfs.ocfs2 -b 512 -C 1M --fs-features=refcount $DEVICE
      mount -t ocfs2 $DEVICE $MNT_DIR
      FILE_NAME=$RANDOM
      FILE_NAME_1=$RANDOM
      FILE_REF="${FILE_NAME}_ref"
      FILE_REF_1="${FILE_NAME}_ref_1"
      for((i=0;i<305;i++))
      do
      # /mnt/1048576 is a 1048576-byte file.
      cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
      cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1
      done
      for((i=0;i<3;i++))
      do
      cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
      done
      
      for((i=0;i<2;i++))
      do
      cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
      cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1
      done
      
      cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
      
      for((i=0;i<11;i++))
      do
      cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
      cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1
      done
      reflink $MNT_DIR/$FILE_NAME $MNT_DIR/$FILE_REF
      # write_f is a program which will write some bytes to a file at offset.
      # write_f -f file_name -l offset -w write_bytes.
      ./write_f -f $MNT_DIR/$FILE_REF -l $[310*1048576] -w 4096
      ./write_f -f $MNT_DIR/$FILE_REF -l $[306*1048576] -w 4096
      ./write_f -f $MNT_DIR/$FILE_REF -l $[311*1048576] -w 4096
      ./write_f -f $MNT_DIR/$FILE_NAME -l $[310*1048576] -w 4096
      ./write_f -f $MNT_DIR/$FILE_NAME -l $[311*1048576] -w 4096
      reflink $MNT_DIR/$FILE_NAME $MNT_DIR/$FILE_REF_1
      ./write_f -f $MNT_DIR/$FILE_NAME -l $[311*1048576] -w 4096
      #kernel panic here.
      
      The reason is that if the ocfs2_extent_rec is the last record
      in a leaf extent block, the old solution fails to find a
      suitable end cpos. So this patch walks through the b-tree,
      finds the next sub root and gets the cpos at which the next
      sub-tree starts.
      
      By the way, I have run Tristan's test case against the patched kernel
      for several days and this type of kernel panic has not happened again.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      7b2212b2
    • dlm: send reply before bast · 8b52a196
      David Teigland authored
      commit cf6620ac upstream.
      
      When the lock master processes a successful operation (request,
      convert, cancel, or unlock), it will process the effects of the
      change before sending the reply for the operation.  The "effects"
      of the operation are:
      
      - blocking callbacks (basts) for any newly granted locks
      - waiting or converting locks that can now be granted
      
      The cast is queued on the local node when the reply from the lock
      master is received.  This means that a lock holder can receive a
      bast for a lock mode that it doesn't yet know has been granted.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      8b52a196
    • dlm: fix ordering of bast and cast · 6ce7a93b
      David Teigland authored
      commit 7fe2b319 upstream.
      
      When both blocking and completion callbacks are queued for a lock,
      the dlm would always deliver the completion callback (cast) first.
      In some cases the blocking callback (bast) is queued before the
      cast, though, and should be delivered first.  This patch keeps
      track of the order in which they were queued and delivers them
      in that order.
      
      This patch also keeps track of the granted mode in the last cast
      and eliminates the following bast if the bast mode is compatible
      with the preceding cast mode.  This happens when a remotely mastered
      lock is demoted, e.g. EX->NL, in which case the local node queues
      a cast immediately after sending the demote message.  In this way
      a cast can be queued for a mode, e.g. NL, that makes an in-transit
      bast extraneous.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      6ce7a93b
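The two rules from commit 6ce7a93b (deliver in queue order, and drop a bast that is compatible with the preceding cast) can be sketched as follows. This is a simplified model rather than the dlm implementation: `struct callback`, the `seq` field and `deliver()` are hypothetical, and the table is the conventional DLM lock-mode compatibility matrix.

```c
#include <stddef.h>

/* DLM lock modes, weakest to strongest. */
enum mode { NL, CR, CW, PR, PW, EX };

/* Conventional DLM compatibility matrix: compat[held][requested]. */
static const int compat[6][6] = {
    /*        NL CR CW PR PW EX */
    /* NL */ { 1, 1, 1, 1, 1, 1 },
    /* CR */ { 1, 1, 1, 1, 1, 0 },
    /* CW */ { 1, 1, 1, 0, 0, 0 },
    /* PR */ { 1, 1, 0, 1, 0, 0 },
    /* PW */ { 1, 1, 0, 0, 0, 0 },
    /* EX */ { 1, 0, 0, 0, 0, 0 },
};

enum cb_type { CB_CAST, CB_BAST };

struct callback {
    enum cb_type type;
    enum mode mode;       /* granted mode for a cast, blocked mode for a bast */
    unsigned long seq;    /* order in which the callback was queued */
};

/*
 * Deliver two queued callbacks in queue order. A bast that follows a cast
 * is dropped when its mode is compatible with the mode just granted.
 * Writes the delivered callbacks to out[] and returns how many there are.
 */
static size_t deliver(struct callback a, struct callback b,
                      struct callback out[2])
{
    struct callback first = a.seq <= b.seq ? a : b;
    struct callback second = a.seq <= b.seq ? b : a;
    size_t n = 0;

    out[n++] = first;
    if (first.type == CB_CAST && second.type == CB_BAST &&
        compat[first.mode][second.mode])
        return n;                   /* extraneous bast: skip it */
    out[n++] = second;
    return n;
}
```

For the EX->NL demotion mentioned in the commit message, a cast for NL followed by a bast for EX yields only the cast, because NL is compatible with every mode.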
    • dlm: always use GFP_NOFS · d53f5912
      David Teigland authored
      commit 573c24c4 upstream.
      
      Replace all GFP_KERNEL and ls_allocation with GFP_NOFS.
      ls_allocation would be GFP_KERNEL for userland lockspaces
      and GFP_NOFS for file system lockspaces.
      
      It was discovered that any lockspaces on the system can
      affect all others by triggering memory reclaim in the
      file system which could in turn call back into the dlm
      to acquire locks, deadlocking dlm threads that were
      shared by all lockspaces, like dlm_recv.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      d53f5912
    • reiserfs: fix oops while creating privroot with selinux enabled · bd91f592
      Jeff Mahoney authored
      commit 6cb4aff0 upstream.
      
      Commit 57fe60df ("reiserfs: add atomic addition of selinux attributes
      during inode creation") contains a bug that will cause it to oops when
      mounting a file system that didn't previously contain extended attributes
      on a system using security.* xattrs.
      
      The issue is that while creating the privroot during mount
      reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which
      dereferences the xattr root.  The xattr root doesn't exist, so we get an
      oops.
      
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=15309
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      bd91f592
    • reiserfs: properly honor read-only devices · ee0f79dd
      Jeff Mahoney authored
      commit 3f8b5ee3 upstream.
      
      The reiserfs journal behaves inconsistently when determining whether to
      allow a mount of a read-only device.
      
      This is due to the use of the continue_replay variable to short circuit
      the journal scanning.  If it's set, it's assumed that there are
      transactions to replay, but there may not be.  If it's unset, it's assumed
      that there aren't any, and that may not be the case either.
      
      I've observed two failure cases:
      1) Where a clean file system on a read-only device refuses to mount
      2) Where a clean file system on a read-only device passes the
         optimization and then tries writing the journal header to update
         the latest mount id.
      
      The former is easily observable by using a freshly created file system on
      a read-only loopback device.
      
      This patch moves the check into journal_read_transaction, where it can
      bail out before it's about to replay a transaction.  That way it can go
      through and skip transactions where appropriate, yet still refuse to mount
      a file system with outstanding transactions.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      ee0f79dd
    • ext4: Fix optional-arg mount options · 670a13a7
      Eric Sandeen authored
      commit 15121c18 upstream.
      
      We have 2 mount options, "barrier" and "auto_da_alloc" which may or
      may not take a 1/0 argument.  This causes the ext4 superblock mount
      code to subtract uninitialized pointers and pass the result to
      kmalloc, which results in very noisy failures.
      
      Per Ted's suggestion, initialize the args struct so that
      we know whether match_token() found an argument for the
      option, and skip match_int() if not.
      
      Also, return error (0) from parse_options if we thought
      we found an argument, but match_int() fails.
      Reported-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Acked-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      670a13a7
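The approach in commit 670a13a7, initializing the argument descriptor so that a missing argument is detectable, can be modelled in user space. This is a sketch under assumptions: `struct substring` only imitates the idea of the kernel's argument descriptor, `parse_optional_int()` is a hypothetical helper, and `strtol()` stands in for match_int().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an argument descriptor covering [from, to). */
struct substring {
    const char *from;
    const char *to;
};

/*
 * Parse "name" or "name=<int>". args is cleared first, so args.from stays
 * NULL when the option carries no argument and the integer parse is skipped.
 * Returns 1 on success and 0 on error (the convention the commit mentions).
 */
static int parse_optional_int(const char *opt, const char *name,
                              int dflt, int *value)
{
    struct substring args = { NULL, NULL };
    size_t len = strlen(name);

    if (strncmp(opt, name, len) != 0)
        return 0;
    if (opt[len] == '=') {
        args.from = opt + len + 1;
        args.to = opt + strlen(opt);
    } else if (opt[len] != '\0') {
        return 0;                   /* e.g. "barrierx" is a different option */
    }

    if (args.from == NULL) {        /* no argument: use the default */
        *value = dflt;
        return 1;
    }

    char *end;
    long v = strtol(args.from, &end, 10);
    if (end == args.from || end != args.to)
        return 0;                   /* argument present but not an integer */
    *value = (int)v;
    return 1;
}
```

The key point is that the descriptor is never read uninitialized: either it was filled in because an `=` was seen, or it is still NULL and the default applies.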
    • ext4: Make sure the MOVE_EXT ioctl can't overwrite append-only files · c60ca623
      Theodore Ts'o authored
      commit 1f5a81e4 upstream.
      
      Dan Rosenberg has reported a problem with the MOVE_EXT ioctl.  If the
      donor file is an append-only file, we should not allow the operation
      to proceed, lest we end up overwriting the contents of an append-only
      file.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Dan Rosenberg <dan.j.rosenberg@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      c60ca623
    • ACPI: Fix regression where _PPC is not read at boot even when ignore_ppc=0 · a234580f
      Darrick J. Wong authored
      commit 455c0d71 upstream.
      
      Earlier, Ingo Molnar posted a patch to make it so that the kernel would avoid
      reading _PPC on his broken T60.  Unfortunately, it seems that with Thomas
      Renninger's patch last July to eliminate _PPC evaluations when the processor
      driver loads, the kernel never actually reads _PPC at all!  This is problematic
      if you happen to boot your non-T60 computer in a state where the BIOS _wants_
      _PPC to be something other than zero.
      
      So, put the _PPC evaluation back into acpi_processor_get_performance_info if
      ignore_ppc isn't 1.
      Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
      Acked-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      a234580f
    • powerpc/eeh: Fix a bug when pci structure is null · 876c10ad
      Breno Leitao authored
      commit 8d3d50bf upstream.
      
      During an EEH recovery, the pci_dev structure can be null, mainly if an
      EEH event is detected during a PCI config operation. In this case, the
      pci_dev will not be known (and will be null), and the kernel will crash
      with the following message:
      
      Unable to handle kernel paging request for data at address 0x000000a0
      Faulting instruction address: 0xc00000000006b8b4
      Oops: Kernel access of bad area, sig: 11 [#1]
      
      NIP [c00000000006b8b4] .eeh_event_handler+0x10c/0x1a0
      LR [c00000000006b8a8] .eeh_event_handler+0x100/0x1a0
      Call Trace:
      [c0000003a80dff00] [c00000000006b8a8] .eeh_event_handler+0x100/0x1a0
      [c0000003a80dff90] [c000000000031f1c] .kernel_thread+0x54/0x70
      
      The bug occurs because pci_name() tries to access a null pointer.
      This patch guarantees that pci_name() is not called on null pointers.
      Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
      Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      876c10ad
    • HWPOISON: abort on failed unmap · d0cddca7
      Wu Fengguang authored
      commit 1668bfd5 upstream.
      
      Don't try to isolate a still mapped page. Otherwise we will hit the
      BUG_ON(page_mapped(page)) in __remove_from_page_cache().
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Thomas Renninger <trenn@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      d0cddca7
    • HWPOISON: remove the anonymous entry · b30604b9
      Wu Fengguang authored
      commit 9b9a29ec upstream.
      
      (PG_swapbacked && !PG_lru) pages should not happen.
      Better to treat them as unknown pages.
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Thomas Renninger <trenn@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      b30604b9
    • x86: Fix out of order of gsi · b0cac079
      Eric W. Biederman authored
      commit fad53995 upstream.
      
      Iranna D Ankad reported that IBM x3950 systems have boot
      problems after this commit:
      
       |
       | commit b9c61b70
       |
       |    x86/pci: update pirq_enable_irq() to setup io apic routing
       |
      
      The problem is that with the patch, the machine freezes when
      console=ttyS0,... kernel serial parameter is passed.
      
      It seems to freeze at DVD initialization and the whole problem
      seems to be DVD/pata related, but somehow exposed through the
      serial parameter.
      
      Such apic problems can expose really weird behavior:
      
        ACPI: IOAPIC (id[0x10] address[0xfecff000] gsi_base[0])
        IOAPIC[0]: apic_id 16, version 0, address 0xfecff000, GSI 0-2
        ACPI: IOAPIC (id[0x0f] address[0xfec00000] gsi_base[3])
        IOAPIC[1]: apic_id 15, version 0, address 0xfec00000, GSI 3-38
        ACPI: IOAPIC (id[0x0e] address[0xfec01000] gsi_base[39])
        IOAPIC[2]: apic_id 14, version 0, address 0xfec01000, GSI 39-74
        ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 4 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 5 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 3 global_irq 6 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 4 global_irq 7 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 6 global_irq 9 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 7 global_irq 10 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 8 global_irq 11 low edge)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 12 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 12 global_irq 15 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 13 global_irq 16 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 17 low edge)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 18 dfl dfl)
      
      It turns out that the system has three io apic controllers, but
      boot ioapic routing is in the second one, and that gsi_base is
      not 0 - it is using a bunch of INT_SRC_OVR...
      
      So these recent changes:
      
       1. one set routing for first io apic controller
       2. assume irq = gsi
      
      ... will break that system.
      
      So try to remap those GSIs; this requires separating boot_ioapic_idx
      detection out of enable_IO_APIC() and calling it early.
      
      So introduce boot_ioapic_idx, and remap_ioapic_gsi()...
      
       -v2: shift gsi with delta instead of gsi_base of boot_ioapic_idx
      
       -v3: double check with find_isa_irq_apic(0, mp_INT) to get right
            boot_ioapic_idx
      
       -v4: nr_legacy_irqs
      
       -v5: add a printout for boot_ioapic_idx, and also make the patch
            applicable to both the current and previous kernels
      
       -v6: add bus_irq in acpi_sci_ioapic_setup, so the override gives
            the right mapping for the SCI...
      
       -v7: looks like pnpacpi get irq instead of gsi, so need to revert
            them back...
      
       -v8: split into two patches
      
       -v9: according to Eric, use fixed 16 for shifting instead of remap
      
       -v10: still need to touch rsparser.c
      
       -v11: just revert back to way Eric suggest...
            anyway the ioapic in first ioapic is blocked by second...
      
       -v12: two patches, this one will add more loop but check apic_id and irq > 16
      Reported-by: Iranna D Ankad <iranna.ankad@in.ibm.com>
      Bisected-by: Iranna D Ankad <iranna.ankad@in.ibm.com>
      Tested-by: Gary Hade <garyhade@us.ibm.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: len.brown@intel.com
      LKML-Reference: <4B8A321A.1000008@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      b0cac079
    • memory hotplug: fix a bug on /dev/mem for 64-bit kernels · 768cde06
      Shaohui Zheng authored
      commit ea085417 upstream.
      
      Newly added memory can not be accessed via /dev/mem, because we do not
      update the variables high_memory, max_pfn and max_low_pfn.
      
      Add a function update_end_of_memory_vars() to update these variables for
      64-bit kernels.
      
      [akpm@linux-foundation.org: simplify comment]
      Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Li Haicheng <haicheng.li@intel.com>
      Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      768cde06
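What commit 768cde06 asks of update_end_of_memory_vars() can be modelled in user space as raising the end-of-memory limits when a hot-added range ends above them. The sketch below is an assumption-laden model: `struct mem_limits`, `MODEL_PAGE_SHIFT` and the function signature are hypothetical, and `high_memory` is modelled as a plain byte address rather than a kernel virtual address.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12   /* assumed 4 KiB pages for this model */

/* Model of the variables named in the commit message. */
struct mem_limits {
    uint64_t max_pfn;       /* one past the highest page frame number */
    uint64_t max_low_pfn;
    uint64_t high_memory;   /* modelled as a byte address */
};

/* Raise the limits when the added range [start, start + size) ends above them. */
static void update_end_of_memory_vars(struct mem_limits *m,
                                      uint64_t start, uint64_t size)
{
    uint64_t page = (uint64_t)1 << MODEL_PAGE_SHIFT;
    /* Round the end of the range up to a whole page frame. */
    uint64_t end_pfn = (start + size + page - 1) >> MODEL_PAGE_SHIFT;

    if (end_pfn > m->max_pfn) {
        m->max_pfn = end_pfn;
        m->max_low_pfn = end_pfn;
        m->high_memory = end_pfn << MODEL_PAGE_SHIFT;
    }
}
```

Ranges that end below the current limits leave all three variables unchanged, so the limits only ever grow.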
    • crypto: testmgr - Fix complain about lack test for internal used algorithm · 07049d16
      Song Youquan authored
      commit 863b557a upstream.
      
      When the aesni-intel and ghash_clmulni-intel drivers are loaded, the
      kernel complains that there is no test for some internally used
      algorithms.  The messages look like this:
      
      alg: No test for __aes-aesni (__driver-aes-aesni)
      alg: No test for __ecb-aes-aesni (__driver-ecb-aes-aesni)
      alg: No test for __cbc-aes-aesni (__driver-cbc-aes-aesni)
      alg: No test for __ecb-aes-aesni (cryptd(__driver-ecb-aes-aesni)
      alg: No test for __ghash (__ghash-pclmulqdqni)
      alg: No test for __ghash (cryptd(__ghash-pclmulqdqni))
      
      This patch adds NULL test entries for these algorithms and drivers.
      Signed-off-by: Song Youquan <youquan.song@intel.com>
      Signed-off-by: Hang Ying <ying.huang@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Acked-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      07049d16
    • fix SBA IOMMU to handle allocation failure properly · 588809a6
      FUJITA Tomonori authored
      commit e2a46567 upstream.
      
      It's possible that SBA IOMMU might fail to find I/O space under heavy
      I/Os.  SBA IOMMU panics on allocation failure but it shouldn't; drivers
      can handle the failure.  The majority of other IOMMU drivers don't panic
      on allocation failure.
      
      This patch fixes SBA IOMMU path to handle allocation failure properly.
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Acked-by: Leonardo Chiquitto <lchiquitto@novell.com>
      Acked-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      588809a6
    • mutex: Don't spin when the owner CPU is offline or other weird cases · f40bf5f2
      Benjamin Herrenschmidt authored
      commit 4b402210 upstream.
      
      Due to recent load-balancer changes that delay the task migration to
      the next wakeup, the adaptive mutex spinning ends up in a live lock
      when the owner's CPU gets offlined because the cpu_online() check
      lives before the owner running check.
      
      This patch changes mutex_spin_on_owner() to return 0 (don't spin) in
      any case where we aren't sure about the owner struct validity or CPU
      number, and if the said CPU is offline. There is no point going back &
      re-evaluate spinning in corner cases like that, let's just go to
      sleep.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1271212509.13059.135.camel@pasglop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      f40bf5f2
    • sched, cputime: Introduce thread_group_times() · 19eb722b
      Hidetoshi Seto authored
      commit 0cf55e1e upstream.
      
      This is a real fix for the problem of utime/stime values decreasing,
      described in the thread:
      
         http://lkml.org/lkml/2009/11/3/522
      
      Now cputime is accounted in the following way:
      
       - {u,s}time in task_struct are increased every time the thread
         is interrupted by a tick (timer interrupt).
      
       - When a thread exits, its {u,s}time are added to signal->{u,s}time,
         after being adjusted by task_times().
      
       - When all threads in a thread_group exit, the accumulated {u,s}time
         (and also c{u,s}time) in the signal struct are added to c{u,s}time
         in the signal struct of the group's parent.
      
      So {u,s}time in the task struct are "raw" tick counts, while
      {u,s}time and c{u,s}time in the signal struct are "adjusted" values.
      
      The accounted values are used by:
      
       - task_times(), to get the cputime of a thread:
         This function returns adjusted values that originate from the raw
         {u,s}time, scaled by the sum_exec_runtime accounted by CFS.
      
       - thread_group_cputime(), to get the cputime of a thread group:
         This function returns the sum of all {u,s}time of living threads
         in the group, plus {u,s}time in the signal struct, which is the
         sum of adjusted cputimes of all exited threads that belonged to
         the group.
      
      The problem is the return value of thread_group_cputime(),
      because it is a mixed sum of "raw" and "adjusted" values:
      
        group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)
      
      This misbehavior can break {u,s}time monotonicity.
      Assume there is a thread whose raw values are greater than its
      adjusted values (e.g. interrupted by 1000Hz ticks 50 times but
      only ran for 45ms). When it exits, the group's cputime decreases
      (e.g. by 5ms).
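      The 50-tick / 45ms example can be worked through with a small model.
      This is a sketch under stated assumptions: struct thread_sketch and
      both helper functions are hypothetical, times are plain millisecond
      counts, and only a single combined time value per thread is tracked.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread record; times are in milliseconds. */
struct thread_sketch {
    uint64_t raw_ms;        /* tick-based count */
    uint64_t adjusted_ms;   /* runtime-scaled value */
    int exited;             /* non-zero once the thread has exited */
};

/* Mixed sum: living threads add raw values, exited ones adjusted values. */
static uint64_t group_time_mixed(const struct thread_sketch *t, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += t[i].exited ? t[i].adjusted_ms : t[i].raw_ms;
    return sum;
}

/* Consistent sum: raw values regardless of whether the thread exited. */
static uint64_t group_time_raw(const struct thread_sketch *t, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += t[i].raw_ms;
    return sum;
}
```

      With one thread of raw_ms = 50 and adjusted_ms = 45, the mixed sum
      reads 50 while the thread lives and 45 after it exits, whereas the
      raw sum stays at 50.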
      
      To fix this, we could do:
      
        group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)
      
      But task_times() contains costly divisions, so applying it to
      every thread should be avoided.
      
      This patch fixes the above problem in the following way:
      
       - Modify thread exit (= __exit_signal()) not to use task_times().
         This means {u,s}time in the signal struct accumulate raw values
         instead of adjusted values.  As a result, thread_group_cputime()
         returns a pure sum of "raw" values.
      
       - Introduce a new function thread_group_times(*task, *utime, *stime)
         that converts the "raw" values of thread_group_cputime() to
         "adjusted" values, using the same calculation as task_times().
      
       - Modify group exit (= wait_task_zombie()) to use the newly
         introduced thread_group_times().  This makes c{u,s}time in the
         signal struct hold adjusted values, as before this patch.
      
       - Replace some thread_group_cputime() calls with thread_group_times().
         These replacements are applied only where the "adjusted" cputime
         is conveyed to users and where task_times() is already used
         nearby (i.e. sys_times(), getrusage(), and /proc/<PID>/stat).
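      The "accumulate raw values, then adjust once" idea can be sketched
      as follows. This is an illustrative model, not the kernel function:
      struct thread_raw_sketch and thread_group_times_sketch() are
      hypothetical, all values share one arbitrary unit, and the
      adjustment is assumed to be a proportional split of the runtime
      between utime and stime. Any clamping against previously reported
      values is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical raw per-thread counters, all in the same unit. */
struct thread_raw_sketch {
    uint64_t utime;     /* raw user ticks */
    uint64_t stime;     /* raw system ticks */
    uint64_t runtime;   /* measured execution time */
};

/*
 * Sum the raw counters of every thread, then scale once for the group.
 * Only one division is needed, however many threads there are.
 * Overflow of runtime * utime is ignored in this sketch.
 */
static void thread_group_times_sketch(const struct thread_raw_sketch *threads,
                                      size_t n,
                                      uint64_t *utime, uint64_t *stime)
{
    uint64_t u = 0, s = 0, rtime = 0;

    for (size_t i = 0; i < n; i++) {
        u += threads[i].utime;
        s += threads[i].stime;
        rtime += threads[i].runtime;
    }

    uint64_t total = u + s;
    /* With no ticks recorded, attribute the whole runtime to utime. */
    uint64_t adj_u = total ? (rtime * u) / total : rtime;

    *utime = adj_u;
    *stime = rtime - adj_u;
}
```

      For two threads with (utime, stime, runtime) of (30, 20, 45) and
      (10, 10, 18), the raw sums are 40, 30 and 63, so the group is
      reported as utime 36 and stime 27.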
      
      This patch has a positive side effect:
      
       - Before this patch, if a group contained many short-lived threads
         (e.g. each runs 0.9ms and is not interrupted by ticks), the group's
         cputime could be invisible, since each thread's cputime was
         adjusted before being accumulated: imagine the adjustment
         function as adj(ticks, runtime),
           {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
         After this patch this will not happen, because the adjustment is
         applied after accumulation.
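      The order-of-operations effect can be reproduced with a toy model.
      The sketch below is an assumption-laden simplification: it models
      only the rounding part of adj(), as truncation of a microsecond
      runtime to whole 1ms ticks, and all names in it are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define US_PER_TICK_SKETCH 1000u    /* assumed 1ms tick for this model */

/* Toy adjustment: truncate a runtime in microseconds to whole ticks. */
static uint64_t runtime_to_ticks_sketch(uint64_t runtime_us)
{
    return runtime_us / US_PER_TICK_SKETCH;
}

/* Adjust each thread first, then accumulate. */
static uint64_t adjust_then_sum(const uint64_t *runtime_us, size_t n)
{
    uint64_t ticks = 0;
    for (size_t i = 0; i < n; i++)
        ticks += runtime_to_ticks_sketch(runtime_us[i]);
    return ticks;
}

/* Accumulate raw runtimes first, then adjust once. */
static uint64_t sum_then_adjust(const uint64_t *runtime_us, size_t n)
{
    uint64_t total_us = 0;
    for (size_t i = 0; i < n; i++)
        total_us += runtime_us[i];
    return runtime_to_ticks_sketch(total_us);
}
```

      For 1000 threads that each ran 900us, adjusting first reports 0
      ticks, while accumulating first reports 900 ticks.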
      
      v2:
       - remove if()s, put new variables into signal_struct.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      LKML-Reference: <4B162517.8040909@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      
      19eb722b
    • Hidetoshi Seto's avatar
      sched: Fix granularity of task_u/stime() · 2b2513f3
      Hidetoshi Seto authored
      commit 761b1d26 upstream.
      
      Originally task_s/utime() were designed to return clock_t, but
      were later changed to return cputime_t by the following commit:
      
        commit efe567fc
        Author: Christian Borntraeger <borntraeger@de.ibm.com>
        Date:   Thu Aug 23 15:18:02 2007 +0200
      
      It only changed the type of the return value, but not the
      implementation. As a result, the granularity of task_s/utime()
      is still that of clock_t, not that of cputime_t.
      
      So using task_s/utime() in __exit_signal() causes the values
      accumulated in the signal struct to be rounded and coarse
      grained.
      
      This patch removes the casts to clock_t in task_u/stime(), to keep
      the granularity of cputime_t throughout the calculation.
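      The loss of granularity can be illustrated with a small model. This
      is a sketch under assumptions, not the kernel code: it assumes
      cputime_t counts 1000Hz ticks and clock_t counts 100Hz ticks, and
      every function and macro name below is hypothetical.

```c
#include <stdint.h>

#define HZ_SKETCH      1000u    /* assumed cputime_t resolution */
#define USER_HZ_SKETCH  100u    /* assumed clock_t resolution */
#define RATIO_SKETCH   (HZ_SKETCH / USER_HZ_SKETCH)

/* Hypothetical conversions between the two granularities. */
static uint64_t to_clock_sketch(uint64_t ct)    { return ct / RATIO_SKETCH; }
static uint64_t to_cputime_sketch(uint64_t clk) { return clk * RATIO_SKETCH; }

/* Scale utime by runtime, doing the arithmetic in coarse clock_t units. */
static uint64_t scaled_utime_coarse(uint64_t utime, uint64_t stime,
                                    uint64_t rtime)
{
    uint64_t u = to_clock_sketch(utime);
    uint64_t total = u + to_clock_sketch(stime);
    uint64_t temp = to_clock_sketch(rtime);

    if (total)
        temp = temp * u / total;
    return to_cputime_sketch(temp);
}

/* Same scaling, kept in cputime_t units throughout. */
static uint64_t scaled_utime_fine(uint64_t utime, uint64_t stime,
                                  uint64_t rtime)
{
    uint64_t total = utime + stime;

    return total ? rtime * utime / total : rtime;
}
```

      With utime = 37, stime = 8 and rtime = 45, the coarse path yields 40
      and the fine path yields 37. Results of the coarse path are always
      multiples of 10 in this model.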
      
      v2:
        Use div_u64() to avoid the error "undefined reference to `__udivdi3`"
        on some 32-bit systems.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: xiyou.wangcong@gmail.com
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      LKML-Reference: <4AFB9029.9000208@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      2b2513f3