1. 21 Sep, 2012 8 commits
  2. 19 Sep, 2012 12 commits
    • Konrad Rzeszutek Wilk's avatar
      xen/boot: Disable BIOS SMP MP table search. · bd49940a
      Konrad Rzeszutek Wilk authored
      As the initial domain we are able to search/map certain regions
      of memory to harvest configuration data. For all low-level we
      use ACPI tables - for interrupts we use exclusively ACPI _PRT
      (so DSDT) and MADT for INT_SRC_OVR.
      
      The SMP MP table is not used at all. As a matter of fact we do
      not even support machines that only have SMP MP but no ACPI tables.
      
      Lets follow how Moorestown does it and just disable searching
      for BIOS SMP tables.
      
      This also fixes an issue on HP Proliant BL680c G5 and DL380 G6:
      
      9f->100 for 1:1 PTE
      Freeing 9f-100 pfn range: 97 pages freed
      1-1 mapping on 9f->100
      .. snip..
      e820: BIOS-provided physical RAM map:
      Xen: [mem 0x0000000000000000-0x000000000009efff] usable
      Xen: [mem 0x000000000009f400-0x00000000000fffff] reserved
      Xen: [mem 0x0000000000100000-0x00000000cfd1dfff] usable
      .. snip..
      Scan for SMP in [mem 0x00000000-0x000003ff]
      Scan for SMP in [mem 0x0009fc00-0x0009ffff]
      Scan for SMP in [mem 0x000f0000-0x000fffff]
      found SMP MP-table at [mem 0x000f4fa0-0x000f4faf] mapped at [ffff8800000f4fa0]
      (XEN) mm.c:908:d0 Error getting mfn 100 (pfn 5555555555555555) from L1 entry 0000000000100461 for l1e_owner=0, pg_owner=0
      (XEN) mm.c:4995:d0 ptwr_emulate: could not get_page_from_l1e()
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff81ac07e2>] xen_set_pte_init+0x66/0x71
      . snip..
      Pid: 0, comm: swapper Not tainted 3.6.0-rc6upstream-00188-gb6fb969-dirty #2 HP ProLiant BL680c G5
      .. snip..
      Call Trace:
       [<ffffffff81ad31c6>] __early_ioremap+0x18a/0x248
       [<ffffffff81624731>] ? printk+0x48/0x4a
       [<ffffffff81ad32ac>] early_ioremap+0x13/0x15
       [<ffffffff81acc140>] get_mpc_size+0x2f/0x67
       [<ffffffff81acc284>] smp_scan_config+0x10c/0x136
       [<ffffffff81acc2e4>] default_find_smp_config+0x36/0x5a
       [<ffffffff81ac3085>] setup_arch+0x5b3/0xb5b
       [<ffffffff81624731>] ? printk+0x48/0x4a
       [<ffffffff81abca7f>] start_kernel+0x90/0x390
       [<ffffffff81abc356>] x86_64_start_reservations+0x131/0x136
       [<ffffffff81abfa83>] xen_start_kernel+0x65f/0x661
      (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
      
      which is that ioremap would end up mapping 0xff using _PAGE_IOMAP
      (which is what early_ioremap sticks as a flag) - which meant
      we would get MFN 0xFF (pte ff461, which is OK), and then it would
      also map 0x100 (b/c ioremap tries to get page aligned request, and
      it was trying to map 0xf4fa0 + PAGE_SIZE - so it mapped the next page)
      as _PAGE_IOMAP. Since 0x100 is actually a RAM page, and the _PAGE_IOMAP
      bypasses the P2M lookup we would happily set the PTE to 1000461.
      Xen would deny the request since we do not have access to the
      Machine Frame Number (MFN) of 0x100. The P2M[0x100] is for example
      0x80140.
      
      CC: stable@vger.kernel.org
      Fixes-Oracle-Bugzilla: https://bugzilla.oracle.com/bugzilla/show_bug.cgi?id=13665Acked-by: default avatarJan Beulich <jbeulich@suse.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      bd49940a
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · c46de226
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A small collection of driver fixes/updates and a core fix for 3.6.  It
        contains:
      
         - Bug fixes for mtip32xx, and support for new hardware (just addition
           of IDs).  They have been queued up for 3.7 for a few weeks as well.
      
         - rate-limit a failing command error message in block core.
      
         - A fix for an old cciss bug from Stephen.
      
         - Prevent overflow of partition count from Alan."
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        cciss: fix handling of protocol error
        blk: add an upper sanity check on partition adding
        mtip32xx: fix user_buffer check in exec_drive_command
        mtip32xx: Remove dead code
        mtip32xx: Change printk to pr_xxxx
        mtip32xx: Proper reporting of write protect status on big-endian
        mtip32xx: Increase timeout for standby command
        mtip32xx: Handle NCQ commands during the security locked state
        mtip32xx: Add support for new devices
        block: rate-limit the error message from failing commands
      c46de226
    • Linus Torvalds's avatar
      Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh · 077fee00
      Linus Torvalds authored
      Pull SuperH fixes from Paul Mundt.
      
      * tag 'sh-for-linus' of git://github.com/pmundt/linux-sh:
        sh: Fix up TIF_NOTIFY_RESUME sans TIF_SIGPENDING handling.
        sh: pfc: Release spinlock in sh_pfc_gpio_request_enable() error path
        sh: intc: Fix up multi-evt irq association.
      077fee00
    • Linus Torvalds's avatar
      Merge tag 'rpmsg-3.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg · cf42d543
      Linus Torvalds authored
      Pull rpmsg fix from Ohad Ben-Cohen:
       "A quick rpmsg fix from Fernando, fixing two buggy invocations of
        dma_free_coherent"
      
      * tag 'rpmsg-3.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg:
        rpmsg: fix dma_free_coherent dev parameter
      cf42d543
    • Linus Torvalds's avatar
      Merge tag 'md-3.6-fixes' of git://neil.brown.name/md · 4b92c17e
      Linus Torvalds authored
      Pull md fixes from NeilBrown:
       "3 fixes for md in 3.6.
      
        One reverts a recent patch which turns out to not be such a good idea.
      
        Other two fix minor bugs with the new (since 3.3) 'replacement' code
        and have been tagged for -stable."
      
      * tag 'md-3.6-fixes' of git://neil.brown.name/md:
        md: make sure metadata is updated when spares are activated or removed.
        md/raid5: fix calculate of 'degraded' when a replacement becomes active.
        Revert "md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE."
      4b92c17e
    • Linus Torvalds's avatar
      Merge branch 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · c5c473e2
      Linus Torvalds authored
      Pull workqueue / powernow-k8 fix from Tejun Heo:
       "This is the fix for the bug where cpufreq/powernow-k8 was tripping
        BUG_ON() in try_to_wake_up_local() by migrating workqueue worker to a
        different CPU.
      
          https://bugzilla.kernel.org/show_bug.cgi?id=47301
      
        As discussed, the fix is now two parts - one to reimplement
        work_on_cpu() so that it doesn't create a new kthread each time and
        the actual fix which makes powernow-k8 use work_on_cpu() instead of
        performing manual migration.
      
        While pretty late in the merge cycle, both changes are on the safer
        side.  Jiri and I verified two existing users of work_on_cpu() and
        Duncan confirmed that the powernow-k8 fix survived about 18 hours of
        testing."
      
      * 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
        cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU
        workqueue: reimplement work_on_cpu() using system_wq
      c5c473e2
    • Tejun Heo's avatar
      cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU · 6889125b
      Tejun Heo authored
      powernowk8_target() runs off a per-cpu work item and if the
      cpufreq_policy->cpu is different from the current one, it migrates the
      kworker to the target CPU by manipulating current->cpus_allowed.  The
      function migrates the kworker back to the original CPU but this is
      still broken.  Workqueue concurrency management requires the kworkers
      to stay on the same CPU and powernowk8_target() ends up triggerring
      BUG_ON(rq != this_rq()) in try_to_wake_up_local() if it contends on
      fidvid_mutex and sleeps.
      
      It is unclear why this bug is being reported now.  Duncan says it
      appeared to be a regression of 3.6-rc1 and couldn't reproduce it on
      3.5.  Bisection seemed to point to 63d95a91 "workqueue: use @pool
      instead of @gcwq or @cpu where applicable" which is an non-functional
      change.  Given that the reproduce case sometimes took upto days to
      trigger, it's easy to be misled while bisecting.  Maybe something made
      contention on fidvid_mutex more likely?  I don't know.
      
      This patch fixes the bug by using work_on_cpu() instead if @pol->cpu
      isn't the same as the current one.  The code assumes that
      cpufreq_policy->cpu is kept online by the caller, which Rafael tells
      me is the case.
      
      stable: ed48ece2 ("workqueue: reimplement work_on_cpu() using
              system_wq") should be applied before this; otherwise, the
              behavior could be horrible.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarDuncan <1i5t5.duncan@cox.net>
      Tested-by: default avatarDuncan <1i5t5.duncan@cox.net>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: stable@vger.kernel.org
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=47301
      6889125b
    • Tejun Heo's avatar
      workqueue: reimplement work_on_cpu() using system_wq · ed48ece2
      Tejun Heo authored
      The existing work_on_cpu() implementation is hugely inefficient.  It
      creates a new kthread, execute that single function and then let the
      kthread die on each invocation.
      
      Now that system_wq can handle concurrent executions, there's no
      advantage of doing this.  Reimplement work_on_cpu() using system_wq
      which makes it simpler and way more efficient.
      
      stable: While this isn't a fix in itself, it's needed to fix a
              workqueue related bug in cpufreq/powernow-k8.  AFAICS, this
              shouldn't break other existing users.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarJiri Kosina <jkosina@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: stable@vger.kernel.org
      ed48ece2
    • Guenter Roeck's avatar
      linux/kernel.h: Fix warning seen with W=1 due to change in DIV_ROUND_CLOSEST · 263a523d
      Guenter Roeck authored
      After commit b6d86d3d (Fix DIV_ROUND_CLOSEST to support negative dividends),
      the following warning is seen if the kernel is compiled with W=1 (-Wextra):
      
      warning: comparison of unsigned expression >= 0 is always true
      
      The warning is due to the test '((typeof(x))-1) >= 0', which is used to detect
      if the variable type is unsigned. Research on the web suggests that the warning
      disappears if '>' instead of '>=' is used for the comparison.
      
      Tests after changing the macro along that line show that the warning is gone,
      and that the result is still correct:
      
      i=-4: DIV_ROUND_CLOSEST(i, 2)=-2
      i=-3: DIV_ROUND_CLOSEST(i, 2)=-2
      i=-2: DIV_ROUND_CLOSEST(i, 2)=-1
      i=-1: DIV_ROUND_CLOSEST(i, 2)=-1
      i=0: DIV_ROUND_CLOSEST(i, 2)=0
      i=1: DIV_ROUND_CLOSEST(i, 2)=1
      i=2: DIV_ROUND_CLOSEST(i, 2)=1
      i=3: DIV_ROUND_CLOSEST(i, 2)=2
      i=4: DIV_ROUND_CLOSEST(i, 2)=2
      
      Code size is the same as before.
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Tested-by: default avatarMauro Carvalho Chehab <mchehab@redhat.com>
      Acked-by: default avatarJean Delvare <khali@linux-fr.org>
      263a523d
    • NeilBrown's avatar
      md: make sure metadata is updated when spares are activated or removed. · 6dafab6b
      NeilBrown authored
      It isn't always necessary to update the metadata when spares are
      removed as the presence-or-not of a spare isn't really important to
      the integrity of an array.
      Also activating a spare doesn't always require updating the metadata
      as the update on 'recovery-completed' is usually sufficient.
      
      However the introduction of 'replacement' devices have made these
      transitions sometimes more important.  For example the 'Replacement'
      flag isn't cleared until the original device is removed, so we need
      to ensure a metadata update after that 'spare' is removed.
      
      So set MD_CHANGE_DEVS whenever a spare is activated or removed, to
      complement the current situation where it is set when a spare is added
      or a device is failed (or a number of other less common situations).
      
      This is suitable for -stable as out-of-data metadata could lead
      to data corruption.
      This is only relevant for 3.3 and later 9when 'replacement' as
      introduced.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      6dafab6b
    • NeilBrown's avatar
      md/raid5: fix calculate of 'degraded' when a replacement becomes active. · e5c86471
      NeilBrown authored
      When a replacement device becomes active, we mark the device that it
      replaces as 'faulty' so that it can subsequently get removed.
      However 'calc_degraded' only pays attention to the primary device, not
      the replacement, so the array appears to become degraded, which is
      wrong.
      
      So teach 'calc_degraded' to consider any replacement if a primary
      device is faulty.
      
      This is suitable for -stable as an incorrect 'degraded' value can
      confuse md and could lead to data corruption.
      This is only relevant for 3.3 and later.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarRobin Hill <robin@robinhill.me.uk>
      Reported-by: default avatarJohn Drescher <drescherjm@gmail.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      e5c86471
    • NeilBrown's avatar
      Revert "md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE." · a852d7b8
      NeilBrown authored
      This reverts commit 895e3c5c.
      
      While this patch seemed like a good idea and did help some workloads,
      it hurts other workloads.
      Large sequential O_DIRECT writes were faster,
      Small random O_DIRECT writes were slower.
      
      Other changes (batching RAID5 writes) have improved the sequential
      writes using a different mechanism, so the net result of this patch
      is definitely negative.  So revert it.
      Reported-by: default avatarShaohua Li <shli@kernel.org>
      Tested-by: default avatarJianpeng Ma <majianpeng@gmail.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      a852d7b8
  3. 18 Sep, 2012 18 commits
  4. 17 Sep, 2012 2 commits
    • Linus Torvalds's avatar
      Merge branch 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · 4651afbb
      Linus Torvalds authored
      Pull another workqueue fix from Tejun Heo:
       "Unfortunately, yet another late fix.  This too is discovered and fixed
        by Lai.  This bug was introduced during this merge window by commit
        25511a47 ("workqueue: reimplement CPU online rebinding to handle
        idle workers") which started using WORKER_REBIND flag for idle rebind
        too.
      
        The bug is relatively easy to trigger if the CPU rapidly goes through
        off, on and then off (and stay off).  The fix is on the safer side.
        This hasn't been on linux-next yet but I'm pushing early so that it
        can get more exposure before v3.6 release."
      
      * 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
        workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn()
      4651afbb
    • Lai Jiangshan's avatar
      workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn() · 960bd11b
      Lai Jiangshan authored
      busy_worker_rebind_fn() didn't clear WORKER_REBIND if rebinding failed
      (CPU is down again).  This used to be okay because the flag wasn't
      used for anything else.
      
      However, after 25511a47 "workqueue: reimplement CPU online rebinding
      to handle idle workers", WORKER_REBIND is also used to command idle
      workers to rebind.  If not cleared, the worker may confuse the next
      CPU_UP cycle by having REBIND spuriously set or oops / get stuck by
      prematurely calling idle_worker_rebind().
      
        WARNING: at /work/os/wq/kernel/workqueue.c:1323 worker_thread+0x4cd/0x5
       00()
        Hardware name: Bochs
        Modules linked in: test_wq(O-)
        Pid: 33, comm: kworker/1:1 Tainted: G           O 3.6.0-rc1-work+ #3
        Call Trace:
         [<ffffffff8109039f>] warn_slowpath_common+0x7f/0xc0
         [<ffffffff810903fa>] warn_slowpath_null+0x1a/0x20
         [<ffffffff810b3f1d>] worker_thread+0x4cd/0x500
         [<ffffffff810bc16e>] kthread+0xbe/0xd0
         [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10
        ---[ end trace e977cf20f4661968 ]---
        BUG: unable to handle kernel NULL pointer dereference at           (null)
        IP: [<ffffffff810b3db0>] worker_thread+0x360/0x500
        PGD 0
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        Modules linked in: test_wq(O-)
        CPU 0
        Pid: 33, comm: kworker/1:1 Tainted: G        W  O 3.6.0-rc1-work+ #3 Bochs Bochs
        RIP: 0010:[<ffffffff810b3db0>]  [<ffffffff810b3db0>] worker_thread+0x360/0x500
        RSP: 0018:ffff88001e1c9de0  EFLAGS: 00010086
        RAX: 0000000000000000 RBX: ffff88001e633e00 RCX: 0000000000004140
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
        RBP: ffff88001e1c9ea0 R08: 0000000000000000 R09: 0000000000000001
        R10: 0000000000000002 R11: 0000000000000000 R12: ffff88001fc8d580
        R13: ffff88001fc8d590 R14: ffff88001e633e20 R15: ffff88001e1c6900
        FS:  0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 0000000000000000 CR3: 00000000130e8000 CR4: 00000000000006f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Process kworker/1:1 (pid: 33, threadinfo ffff88001e1c8000, task ffff88001e1c6900)
        Stack:
         ffff880000000000 ffff88001e1c9e40 0000000000000001 ffff88001e1c8010
         ffff88001e519c78 ffff88001e1c9e58 ffff88001e1c6900 ffff88001e1c6900
         ffff88001e1c6900 ffff88001e1c6900 ffff88001fc8d340 ffff88001fc8d340
        Call Trace:
         [<ffffffff810bc16e>] kthread+0xbe/0xd0
         [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10
        Code: b1 00 f6 43 48 02 0f 85 91 01 00 00 48 8b 43 38 48 89 df 48 8b 00 48 89 45 90 e8 ac f0 ff ff 3c 01 0f 85 60 01 00 00 48 8b 53 50 <8b> 02 83 e8 01 85 c0 89 02 0f 84 3b 01 00 00 48 8b 43 38 48 8b
        RIP  [<ffffffff810b3db0>] worker_thread+0x360/0x500
         RSP <ffff88001e1c9de0>
        CR2: 0000000000000000
      
      There was no reason to keep WORKER_REBIND on failure in the first
      place - WORKER_UNBOUND is guaranteed to be set in such cases
      preventing incorrectly activating concurrency management.  Always
      clear WORKER_REBIND.
      
      tj: Updated comment and description.
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      960bd11b