1. 31 May, 2011 3 commits
    • Robert Richter's avatar
      oprofile, dcookies: Fix possible circular locking dependency · fe47ae7f
      Robert Richter authored
      The lockdep warning below detects a possible A->B/B->A locking
      dependency of mm->mmap_sem and dcookie_mutex. The order in
      sync_buffer() is mm->mmap_sem/dcookie_mutex, while in
      sys_lookup_dcookie() it is vice versa.
      
      Fixing it in sys_lookup_dcookie() by unlocking dcookie_mutex before
      copy_to_user().
      
      oprofiled/4432 is trying to acquire lock:
       (&mm->mmap_sem){++++++}, at: [<ffffffff810b444b>] might_fault+0x53/0xa3
      
      but task is already holding lock:
       (dcookie_mutex){+.+.+.}, at: [<ffffffff81124d28>] sys_lookup_dcookie+0x45/0x149
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (dcookie_mutex){+.+.+.}:
             [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
             [<ffffffff814634f0>] mutex_lock_nested+0x63/0x309
             [<ffffffff81124e5c>] get_dcookie+0x30/0x144
             [<ffffffffa0000fba>] sync_buffer+0x196/0x3ec [oprofile]
             [<ffffffffa0001226>] task_exit_notify+0x16/0x1a [oprofile]
             [<ffffffff81467b96>] notifier_call_chain+0x37/0x63
             [<ffffffff8105803d>] __blocking_notifier_call_chain+0x50/0x67
             [<ffffffff81058068>] blocking_notifier_call_chain+0x14/0x16
             [<ffffffff8105a718>] profile_task_exit+0x1a/0x1c
             [<ffffffff81039e8f>] do_exit+0x2a/0x6fc
             [<ffffffff8103a5e4>] do_group_exit+0x83/0xae
             [<ffffffff8103a626>] sys_exit_group+0x17/0x1b
             [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b
      
      -> #0 (&mm->mmap_sem){++++++}:
             [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711
             [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
             [<ffffffff810b4478>] might_fault+0x80/0xa3
             [<ffffffff81124de7>] sys_lookup_dcookie+0x104/0x149
             [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b
      
      other info that might help us debug this:
      
      1 lock held by oprofiled/4432:
       #0:  (dcookie_mutex){+.+.+.}, at: [<ffffffff81124d28>] sys_lookup_dcookie+0x45/0x149
      
      stack backtrace:
      Pid: 4432, comm: oprofiled Not tainted 2.6.39-00008-ge5a450d #9
      Call Trace:
       [<ffffffff81063193>] print_circular_bug+0xae/0xbc
       [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711
       [<ffffffff8102ef13>] ? get_parent_ip+0x11/0x42
       [<ffffffff810b444b>] ? might_fault+0x53/0xa3
       [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
       [<ffffffff810b444b>] ? might_fault+0x53/0xa3
       [<ffffffff810d7d54>] ? path_put+0x22/0x27
       [<ffffffff810b4478>] might_fault+0x80/0xa3
       [<ffffffff810b444b>] ? might_fault+0x53/0xa3
       [<ffffffff81124de7>] sys_lookup_dcookie+0x104/0x149
       [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b
      
      References: https://bugzilla.kernel.org/show_bug.cgi?id=13809
      Cc: <stable@kernel.org> # .27+
      Signed-off-by: default avatarRobert Richter <robert.richter@amd.com>
      fe47ae7f
    • Robert Richter's avatar
      oprofile: Fix locking dependency in sync_start() · 130c5ce7
      Robert Richter authored
      This fixes the A->B/B->A locking dependency, see the warning below.
      
      The function task_exit_notify() is called with (task_exit_notifier)
      .rwsem set and then calls sync_buffer() which locks buffer_mutex. In
      sync_start() the buffer_mutex was set to prevent notifier functions to
      be started before sync_start() is finished. But when registering the
      notifier, (task_exit_notifier).rwsem is locked too, but now in
      different order than in sync_buffer(). In theory this causes a locking
      dependency, what does not occur in practice since task_exit_notify()
      is always called after the notifier is registered which means the lock
      is already released.
      
      However, after checking the notifier functions it turned out the
      buffer_mutex in sync_start() is unnecessary. This is because
      sync_buffer() may be called from the notifiers even if sync_start()
      did not finish yet, the buffers are already allocated but empty. No
      need to protect this with the mutex.
      
      So we fix this theoretical locking dependency by removing buffer_mutex
      in sync_start(). This is similar to the implementation before commit:
      
       750d857c oprofile: fix crash when accessing freed task structs
      
      which introduced the locking dependency.
      
      Lockdep warning:
      
      oprofiled/4447 is trying to acquire lock:
       (buffer_mutex){+.+...}, at: [<ffffffffa0000e55>] sync_buffer+0x31/0x3ec [oprofile]
      
      but task is already holding lock:
       ((task_exit_notifier).rwsem){++++..}, at: [<ffffffff81058026>] __blocking_notifier_call_chain+0x39/0x67
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 ((task_exit_notifier).rwsem){++++..}:
             [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
             [<ffffffff81463a2b>] down_write+0x44/0x67
             [<ffffffff810581c0>] blocking_notifier_chain_register+0x52/0x8b
             [<ffffffff8105a6ac>] profile_event_register+0x2d/0x2f
             [<ffffffffa00013c1>] sync_start+0x47/0xc6 [oprofile]
             [<ffffffffa00001bb>] oprofile_setup+0x60/0xa5 [oprofile]
             [<ffffffffa00014e3>] event_buffer_open+0x59/0x8c [oprofile]
             [<ffffffff810cd3b9>] __dentry_open+0x1eb/0x308
             [<ffffffff810cd59d>] nameidata_to_filp+0x60/0x67
             [<ffffffff810daad6>] do_last+0x5be/0x6b2
             [<ffffffff810dbc33>] path_openat+0xc7/0x360
             [<ffffffff810dbfc5>] do_filp_open+0x3d/0x8c
             [<ffffffff810ccfd2>] do_sys_open+0x110/0x1a9
             [<ffffffff810cd09e>] sys_open+0x20/0x22
             [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b
      
      -> #0 (buffer_mutex){+.+...}:
             [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711
             [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
             [<ffffffff814634f0>] mutex_lock_nested+0x63/0x309
             [<ffffffffa0000e55>] sync_buffer+0x31/0x3ec [oprofile]
             [<ffffffffa0001226>] task_exit_notify+0x16/0x1a [oprofile]
             [<ffffffff81467b96>] notifier_call_chain+0x37/0x63
             [<ffffffff8105803d>] __blocking_notifier_call_chain+0x50/0x67
             [<ffffffff81058068>] blocking_notifier_call_chain+0x14/0x16
             [<ffffffff8105a718>] profile_task_exit+0x1a/0x1c
             [<ffffffff81039e8f>] do_exit+0x2a/0x6fc
             [<ffffffff8103a5e4>] do_group_exit+0x83/0xae
             [<ffffffff8103a626>] sys_exit_group+0x17/0x1b
             [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b
      
      other info that might help us debug this:
      
      1 lock held by oprofiled/4447:
       #0:  ((task_exit_notifier).rwsem){++++..}, at: [<ffffffff81058026>] __blocking_notifier_call_chain+0x39/0x67
      
      stack backtrace:
      Pid: 4447, comm: oprofiled Not tainted 2.6.39-00007-gcf4d8d4 #10
      Call Trace:
       [<ffffffff81063193>] print_circular_bug+0xae/0xbc
       [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711
       [<ffffffffa0000e55>] ? sync_buffer+0x31/0x3ec [oprofile]
       [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
       [<ffffffffa0000e55>] ? sync_buffer+0x31/0x3ec [oprofile]
       [<ffffffff81062627>] ? mark_lock+0x42f/0x552
       [<ffffffffa0000e55>] ? sync_buffer+0x31/0x3ec [oprofile]
       [<ffffffff814634f0>] mutex_lock_nested+0x63/0x309
       [<ffffffffa0000e55>] ? sync_buffer+0x31/0x3ec [oprofile]
       [<ffffffffa0000e55>] sync_buffer+0x31/0x3ec [oprofile]
       [<ffffffff81058026>] ? __blocking_notifier_call_chain+0x39/0x67
       [<ffffffff81058026>] ? __blocking_notifier_call_chain+0x39/0x67
       [<ffffffffa0001226>] task_exit_notify+0x16/0x1a [oprofile]
       [<ffffffff81467b96>] notifier_call_chain+0x37/0x63
       [<ffffffff8105803d>] __blocking_notifier_call_chain+0x50/0x67
       [<ffffffff81058068>] blocking_notifier_call_chain+0x14/0x16
       [<ffffffff8105a718>] profile_task_exit+0x1a/0x1c
       [<ffffffff81039e8f>] do_exit+0x2a/0x6fc
       [<ffffffff81465031>] ? retint_swapgs+0xe/0x13
       [<ffffffff8103a5e4>] do_group_exit+0x83/0xae
       [<ffffffff8103a626>] sys_exit_group+0x17/0x1b
       [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b
      Reported-by: default avatarMarcin Slusarz <marcin.slusarz@gmail.com>
      Cc: Carl Love <carll@us.ibm.com>
      Cc: <stable@kernel.org> # .36+
      Signed-off-by: default avatarRobert Richter <robert.richter@amd.com>
      130c5ce7
    • Robert Richter's avatar
      oprofile: Free potentially owned tasks in case of errors · 6ac6519b
      Robert Richter authored
      After registering the task free notifier we possibly have tasks in our
      dying_tasks list. Free them after unregistering the notifier in case
      of an error.
      
      Cc: <stable@kernel.org> # .36+
      Signed-off-by: default avatarRobert Richter <robert.richter@amd.com>
      6ac6519b
  2. 30 May, 2011 1 commit
  3. 24 May, 2011 2 commits
  4. 20 May, 2011 1 commit
    • Robert Richter's avatar
      oprofile, x86: Enable preemption during pci device setup in IBS init · 3d2606f4
      Robert Richter authored
      IBS initialization is a mix of per-core register access and per-node
      pci device setup. Register access should be pinned to the cpu, but pci
      setup must run with preemption enabled.
      
      This patch better separates the code into non-/preemptible sections
      and fixes sleeping with preemption disabled. See bug message below.
      
      Fixes also freeing the eilvt entry by introducing put_eilvt().
      
      BUG: sleeping function called from invalid context at mm/slub.c:824
      in_atomic(): 1, irqs_disabled(): 0, pid: 32357, name: modprobe
      INFO: lockdep is turned off.
      Pid: 32357, comm: modprobe Not tainted 2.6.39-rc7+ #14
      Call Trace:
       [<ffffffff8104bdc8>] __might_sleep+0x112/0x117
       [<ffffffff81129693>] kmem_cache_alloc_trace+0x4b/0xe7
       [<ffffffff81278f14>] kzalloc.constprop.0+0x29/0x2b
       [<ffffffff81278f4c>] pci_get_subsys+0x36/0x78
       [<ffffffff81022689>] ? setup_APIC_eilvt+0xfb/0x139
       [<ffffffff81278fa4>] pci_get_device+0x16/0x18
       [<ffffffffa06c8b5d>] op_amd_init+0xd3/0x211 [oprofile]
       [<ffffffffa064d000>] ? 0xffffffffa064cfff
       [<ffffffffa064d298>] op_nmi_init+0x21e/0x26a [oprofile]
       [<ffffffffa064d062>] oprofile_arch_init+0xe/0x26 [oprofile]
       [<ffffffffa064d010>] oprofile_init+0x10/0x42 [oprofile]
       [<ffffffff81002099>] do_one_initcall+0x7f/0x13a
       [<ffffffff81096524>] sys_init_module+0x132/0x281
       [<ffffffff814cc682>] system_call_fastpath+0x16/0x1b
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Cc: <stable@kernel.org>         [2.6.37.x]
      Signed-off-by: default avatarRobert Richter <robert.richter@amd.com>
      3d2606f4
  5. 19 May, 2011 1 commit
  6. 18 May, 2011 22 commits
  7. 17 May, 2011 10 commits
    • Jeff Layton's avatar
      cifs: fix cifsConvertToUCS() for the mapchars case · 11379b5e
      Jeff Layton authored
      As Metze pointed out, commit 84cdf74e broke mapchars option:
      
          Commit "cifs: fix unaligned accesses in cifsConvertToUCS"
          (84cdf74e) does multiple steps
          in just one commit (moving the function and changing it without
          testing).
      
          put_unaligned_le16(temp, &target[j]); is never called for any
          codepoint the goes via the 'default' switch statement. As a result
          we put just zero (or maybe uninitialized) bytes into the target
          buffer.
      
      His proposed patch looks correct, but doesn't apply to the current head
      of the tree. This patch should also fix it.
      
      Cc: <stable@kernel.org> # .38.x: 581ade4d: cifs: clean up various nits in unicode routines (try #2)
      Reported-by: default avatarStefan Metzmacher <metze@samba.org>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarSteve French <sfrench@us.ibm.com>
      11379b5e
    • Jeff Layton's avatar
      cifs: add fallback in is_path_accessible for old servers · 221d1d79
      Jeff Layton authored
      The is_path_accessible check uses a QPathInfo call, which isn't
      supported by ancient win9x era servers. Fall back to an older
      SMBQueryInfo call if it fails with the magic error codes.
      
      Cc: stable@kernel.org
      Reported-and-Tested-by: default avatarSandro Bonazzola <sandro.bonazzola@gmail.com>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarSteve French <sfrench@us.ibm.com>
      221d1d79
    • Linus Torvalds's avatar
      Merge branch 'timers-fixes-for-linus' of... · a085963a
      Linus Torvalds authored
      Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        tick: Clear broadcast active bit when switching to oneshot
        rtc: mc13xxx: Don't call rtc_device_register while holding lock
        rtc: rp5c01: Initialize drvdata before registering device
        rtc: pcap: Initialize drvdata before registering device
        rtc: msm6242: Initialize drvdata before registering device
        rtc: max8998: Initialize drvdata before registering device
        rtc: max8925: Initialize drvdata before registering device
        rtc: m41t80: Initialize clientdata before registering device
        rtc: ds1286: Initialize drvdata before registering device
        rtc: ep93xx: Initialize drvdata before registering device
        rtc: davinci: Initialize drvdata before registering device
        rtc: mxc: Initialize drvdata before registering device
        clocksource: Install completely before selecting
      a085963a
    • Borislav Petkov's avatar
      x86, AMD: Fix ARAT feature setting again · 14fb57dc
      Borislav Petkov authored
      Trying to enable the local APIC timer on early K8 revisions
      uncovers a number of other issues with it, in conjunction with
      the C1E enter path on AMD. Fixing those causes much more churn
      and troubles than the benefit of using that timer brings so
      don't enable it on K8 at all, falling back to the original
      functionality the kernel had wrt to that.
      Reported-and-bisected-by: default avatarNick Bowler <nbowler@elliptictech.com>
      Cc: Boris Ostrovsky <Boris.Ostrovsky@amd.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
      Cc: Nick Bowler <nbowler@elliptictech.com>
      Cc: Joerg-Volker-Peetz <jvpeetz@web.de>
      Signed-off-by: default avatarBorislav Petkov <borislav.petkov@amd.com>
      Link: http://lkml.kernel.org/r/1305636919-31165-3-git-send-email-bp@amd64.orgSigned-off-by: default avatarIngo Molnar <mingo@elte.hu>
      14fb57dc
    • Borislav Petkov's avatar
      Revert "x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors" · 328935e6
      Borislav Petkov authored
      This reverts commit e20a2d20, as it crashes
      certain boxes with specific AMD CPU models.
      
      Moving the lower endpoint of the Erratum 400 check to accomodate
      earlier K8 revisions (A-E) opens a can of worms which is simply
      not worth to fix properly by tweaking the errata checking
      framework:
      
      * missing IntPenging MSR on revisions < CG cause #GP:
      
      http://marc.info/?l=linux-kernel&m=130541471818831
      
      * makes earlier revisions use the LAPIC timer instead of the C1E
      idle routine which switches to HPET, thus not waking up in
      deeper C-states:
      
      http://lkml.org/lkml/2011/4/24/20
      
      Therefore, leave the original boundary starting with K8-revF.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      328935e6
    • Jens Axboe's avatar
      scsi: remove performance regression due to async queue run · 9937a5e2
      Jens Axboe authored
      Commit c21e6beb removed our queue request_fn re-enter
      protection, and defaulted to always running the queues from
      kblockd to be safe. This was a known potential slow down,
      but should be safe.
      
      Unfortunately this is causing big performance regressions for
      some, so we need to improve this logic. Looking into the details
      of the re-enter, the real issue is on requeue of requests.
      
      Requeue of requests upon seeing a BUSY condition from the device
      ends up re-running the queue, causing traces like this:
      
      scsi_request_fn()
              scsi_dispatch_cmd()
                      scsi_queue_insert()
                              __scsi_queue_insert()
                                      scsi_run_queue()
      					scsi_request_fn()
      						...
      
      potentially causing the issue we want to avoid. So special
      case the requeue re-run of the queue, but improve it to offload
      the entire run of local queue and starved queue from a single
      workqueue callback. This is a lot better than potentially
      kicking off a workqueue run for each device seen.
      
      This also fixes the issue of the local device going into recursion,
      since the above mentioned commit never moved that queue run out
      of line.
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      9937a5e2
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · c1d10d18
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
        net: Change netdev_fix_features messages loglevel
        vmxnet3: Fix inconsistent LRO state after initialization
        sfc: Fix oops in register dump after mapping change
        IPVS: fix netns if reading ip_vs_* procfs entries
        bridge: fix forwarding of IPv6
      c1d10d18
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc · 477de0de
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
        Revert "mmc: fix a race between card-detect rescan and clock-gate work instances"
      477de0de
    • Randy Dunlap's avatar
      mm: fix kernel-doc warning in page_alloc.c · b5e6ab58
      Randy Dunlap authored
      Fix new kernel-doc warning in mm/page_alloc.c:
      
        Warning(mm/page_alloc.c:2370): No description found for parameter 'nid'
      Signed-off-by: default avatarRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b5e6ab58
    • Yinghai Lu's avatar
      PCI: Clear bridge resource flags if requested size is 0 · 93d2175d
      Yinghai Lu authored
      During pci remove/rescan testing found:
      
        pci 0000:c0:03.0: PCI bridge to [bus c4-c9]
        pci 0000:c0:03.0:   bridge window [io  0x1000-0x0fff]
        pci 0000:c0:03.0:   bridge window [mem 0xf0000000-0xf00fffff]
        pci 0000:c0:03.0:   bridge window [mem 0xfc180000000-0xfc197ffffff 64bit pref]
        pci 0000:c0:03.0: device not available (can't reserve [io  0x1000-0x0fff])
        pci 0000:c0:03.0: Error enabling bridge (-22), continuing
        pci 0000:c0:03.0: enabling bus mastering
        pci 0000:c0:03.0: setting latency timer to 64
        pcieport 0000:c0:03.0: device not available (can't reserve [io  0x1000-0x0fff])
        pcieport: probe of 0000:c0:03.0 failed with error -22
      
      This bug was caused by commit c8adf9a3 ("PCI: pre-allocate
      additional resources to devices only after successful allocation of
      essential resources.")
      
      After that commit, pci_hotplug_io_size is changed to additional_io_size
      from minium size.  So it will not go through resource_size(res) != 0
      path, and will not be reset.
      
      The root cause is: pci_bridge_check_ranges will set RESOURCE_IO flag for
      pci bridge, and later if children do not need IO resource.  those bridge
      resources will not need to be allocated.  but flags is still there.
      that will confuse the the pci_enable_bridges later.
      
      related code:
      
         static void assign_requested_resources_sorted(struct resource_list *head,
                                          struct resource_list_x *fail_head)
         {
                 struct resource *res;
                 struct resource_list *list;
                 int idx;
      
                 for (list = head->next; list; list = list->next) {
                         res = list->res;
                         idx = res - &list->dev->resource[0];
                         if (resource_size(res) && pci_assign_resource(list->dev, idx)) {
         ...
                                 reset_resource(res);
                         }
                 }
         }
      
      At last, We have to clear the flags in pbus_size_mem/io when requested
      size == 0 and !add_head.  becasue this case it will not go through
      adjust_resources_sorted().
      
      Just make size1 = size0 when !add_head. it will make flags get cleared.
      
      At the same time when requested size == 0, add_size != 0, will still
      have in head and add_list.  because we do not clear the flags for it.
      
      After this, we will get right result:
      
        pci 0000:c0:03.0: PCI bridge to [bus c4-c9]
        pci 0000:c0:03.0:   bridge window [io  disabled]
        pci 0000:c0:03.0:   bridge window [mem 0xf0000000-0xf00fffff]
        pci 0000:c0:03.0:   bridge window [mem 0xfc180000000-0xfc197ffffff 64bit pref]
        pci 0000:c0:03.0: enabling bus mastering
        pci 0000:c0:03.0: setting latency timer to 64
        pcieport 0000:c0:03.0: setting latency timer to 64
        pcieport 0000:c0:03.0: irq 160 for MSI/MSI-X
        pcieport 0000:c0:03.0: Signaling PME through PCIe PME interrupt
        pci 0000:c4:00.0: Signaling PME through PCIe PME interrupt
        pcie_pme 0000:c0:03.0:pcie01: service driver pcie_pme loaded
        aer 0000:c0:03.0:pcie02: service driver aer loaded
        pciehp 0000:c0:03.0:pcie04: Hotplug Controller:
      
      v3: more simple fix. also fix one typo in pbus_size_mem
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Reviewed-by: default avatarRam Pai <linuxram@us.ibm.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      93d2175d