1. 03 Mar, 2016 40 commits
    • Stefan Haberland's avatar
      s390/dasd: fix performance drop · a90ad5c5
      Stefan Haberland authored
      commit 12d319b9 upstream.
      
      Commit ca369d51 ("sd: Fix device-imposed transfer length limits")
      introduced a new queue limit max_dev_sectors which limits the maximum
      sectors for requests. The default value leads to small dasd requests
      and therefor to a performance drop.
      Set the max_dev_sectors value to the same value as the max_hw_sectors
      to use the maximum available request size for DASD devices.
      Signed-off-by: default avatarStefan Haberland <sth@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a90ad5c5
    • Stefan Haberland's avatar
      s390/dasd: fix refcount for PAV reassignment · 2e80d52b
      Stefan Haberland authored
      commit 9d862aba upstream.
      
      Add refcount to the DASD device when a summary unit check worker is
      scheduled. This prevents that the device is set offline with worker
      in place.
      Signed-off-by: default avatarStefan Haberland <stefan.haberland@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2e80d52b
    • Stefan Haberland's avatar
      s390/dasd: prevent incorrect length error under z/VM after PAV changes · ecf5ebf5
      Stefan Haberland authored
      commit 020bf042 upstream.
      
      The channel checks the specified length and the provided amount of
      data for CCWs and provides an incorrect length error if the size does
      not match. Under z/VM with simulation activated the length may get
      changed. Having the suppress length indication bit set is stated as
      good CCW coding practice and avoids errors under z/VM.
      Signed-off-by: default avatarStefan Haberland <stefan.haberland@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ecf5ebf5
    • Ard Biesheuvel's avatar
      s390: fix normalization bug in exception table sorting · ee17e6e3
      Ard Biesheuvel authored
      commit bcb7825a upstream.
      
      The normalization pass in the sorting routine of the relative exception
      table serves two purposes:
      - it ensures that the address fields of the exception table entries are
        fully ordered, so that no ambiguities arise between entries with
        identical instruction offsets (i.e., when two instructions that are
        exactly 8 bytes apart each have an exception table entry associated with
        them)
      - it ensures that the offsets of both the instruction and the fixup fields
        of each entry are relative to their final location after sorting.
      
      Commit eb608fb3 ("s390/exceptions: switch to relative exception table
      entries") ported the relative exception table format from x86, but modified
      the sorting routine to only normalize the instruction offset field and not
      the fixup offset field. The result is that the fixup offset of each entry
      will be relative to the original location of the entry before sorting,
      likely leading to crashes when those entries are dereferenced.
      
      Fixes: eb608fb3 ("s390/exceptions: switch to relative exception table entries")
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee17e6e3
    • Sebastian Andrzej Siewior's avatar
      btrfs: initialize the seq counter in struct btrfs_device · e59ba555
      Sebastian Andrzej Siewior authored
      commit 546bed63 upstream.
      
      I managed to trigger this:
      | INFO: trying to register non-static key.
      | the code is fine but needs lockdep annotation.
      | turning off the locking correctness validator.
      | CPU: 1 PID: 781 Comm: systemd-gpt-aut Not tainted 4.4.0-rt2+ #14
      | Hardware name: ARM-Versatile Express
      | [<80307cec>] (dump_stack)
      | [<80070e98>] (__lock_acquire)
      | [<8007184c>] (lock_acquire)
      | [<80287800>] (btrfs_ioctl)
      | [<8012a8d4>] (do_vfs_ioctl)
      | [<8012ac14>] (SyS_ioctl)
      
      so I think that btrfs_device_data_ordered_init() is not invoked behind
      a macro somewhere.
      
      Fixes: 7cc8e58d ("Btrfs: fix unprotected device's variants on 32bits machine")
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e59ba555
    • Chandan Rajendra's avatar
      Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots · 8102f96e
      Chandan Rajendra authored
      commit f32e48e9 upstream.
      
      The following call trace is seen when btrfs/031 test is executed in a loop,
      
      [  158.661848] ------------[ cut here ]------------
      [  158.662634] WARNING: CPU: 2 PID: 890 at /home/chandan/repos/linux/fs/btrfs/ioctl.c:558 create_subvol+0x3d1/0x6ea()
      [  158.664102] BTRFS: Transaction aborted (error -2)
      [  158.664774] Modules linked in:
      [  158.665266] CPU: 2 PID: 890 Comm: btrfs Not tainted 4.4.0-rc6-g511711af #2
      [  158.666251] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [  158.667392]  ffffffff81c0a6b0 ffff8806c7c4f8e8 ffffffff81431fc8 ffff8806c7c4f930
      [  158.668515]  ffff8806c7c4f920 ffffffff81051aa1 ffff880c85aff000 ffff8800bb44d000
      [  158.669647]  ffff8808863b5c98 0000000000000000 00000000fffffffe ffff8806c7c4f980
      [  158.670769] Call Trace:
      [  158.671153]  [<ffffffff81431fc8>] dump_stack+0x44/0x5c
      [  158.671884]  [<ffffffff81051aa1>] warn_slowpath_common+0x81/0xc0
      [  158.672769]  [<ffffffff81051b27>] warn_slowpath_fmt+0x47/0x50
      [  158.673620]  [<ffffffff813bc98d>] create_subvol+0x3d1/0x6ea
      [  158.674440]  [<ffffffff813777c9>] btrfs_mksubvol.isra.30+0x369/0x520
      [  158.675376]  [<ffffffff8108a4aa>] ? percpu_down_read+0x1a/0x50
      [  158.676235]  [<ffffffff81377a81>] btrfs_ioctl_snap_create_transid+0x101/0x180
      [  158.677268]  [<ffffffff81377b52>] btrfs_ioctl_snap_create+0x52/0x70
      [  158.678183]  [<ffffffff8137afb4>] btrfs_ioctl+0x474/0x2f90
      [  158.678975]  [<ffffffff81144b8e>] ? vma_merge+0xee/0x300
      [  158.679751]  [<ffffffff8115be31>] ? alloc_pages_vma+0x91/0x170
      [  158.680599]  [<ffffffff81123f62>] ? lru_cache_add_active_or_unevictable+0x22/0x70
      [  158.681686]  [<ffffffff813d99cf>] ? selinux_file_ioctl+0xff/0x1d0
      [  158.682581]  [<ffffffff8117b791>] do_vfs_ioctl+0x2c1/0x490
      [  158.683399]  [<ffffffff813d3cde>] ? security_file_ioctl+0x3e/0x60
      [  158.684297]  [<ffffffff8117b9d4>] SyS_ioctl+0x74/0x80
      [  158.685051]  [<ffffffff819b2bd7>] entry_SYSCALL_64_fastpath+0x12/0x6a
      [  158.685958] ---[ end trace 4b63312de5a2cb76 ]---
      [  158.686647] BTRFS: error (device loop0) in create_subvol:558: errno=-2 No such entry
      [  158.709508] BTRFS info (device loop0): forced readonly
      [  158.737113] BTRFS info (device loop0): disk space caching is enabled
      [  158.738096] BTRFS error (device loop0): Remounting read-write after error is not allowed
      [  158.851303] BTRFS error (device loop0): cleaner transaction attach returned -30
      
      This occurs because,
      
      Mount filesystem
      Create subvol with ID 257
      Unmount filesystem
      Mount filesystem
      Delete subvol with ID 257
        btrfs_drop_snapshot()
          Add root corresponding to subvol 257 into
          btrfs_transaction->dropped_roots list
      Create new subvol (i.e. create_subvol())
        257 is returned as the next free objectid
        btrfs_read_fs_root_no_name()
          Finds the btrfs_root instance corresponding to the old subvol with ID 257
          in btrfs_fs_info->fs_roots_radix.
          Returns error since btrfs_root_item->refs has the value of 0.
      
      To fix the issue the commit initializes tree root's and subvolume root's
      highest_objectid when loading the roots from disk.
      Signed-off-by: default avatarChandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8102f96e
    • Filipe Manana's avatar
      Btrfs: fix transaction handle leak on failure to create hard link · e6edd99c
      Filipe Manana authored
      commit 271dba45 upstream.
      
      If we failed to create a hard link we were not always releasing the
      the transaction handle we got before, resulting in a memory leak and
      preventing any other tasks from being able to commit the current
      transaction.
      Fix this by always releasing our transaction handle.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6edd99c
    • Filipe Manana's avatar
      Btrfs: fix number of transaction units required to create symlink · c6362a1e
      Filipe Manana authored
      commit 9269d12b upstream.
      
      We weren't accounting for the insertion of an inline extent item for the
      symlink inode nor that we need to update the parent inode item (through
      the call to btrfs_add_nondir()). So fix this by including two more
      transaction units.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c6362a1e
    • Filipe Manana's avatar
      Btrfs: send, don't BUG_ON() when an empty symlink is found · d70554d6
      Filipe Manana authored
      commit a879719b upstream.
      
      When a symlink is successfully created it always has an inline extent
      containing the source path. However if an error happens when creating
      the symlink, we can leave in the subvolume's tree a symlink inode without
      any such inline extent item - this happens if after btrfs_symlink() calls
      btrfs_end_transaction() and before it calls the inode eviction handler
      (through the final iput() call), the transaction gets committed and a
      crash happens before the eviction handler gets called, or if a snapshot
      of the subvolume is made before the eviction handler gets called. Sadly
      we can't just avoid this by making btrfs_symlink() call
      btrfs_end_transaction() after it calls the eviction handler, because the
      later can commit the current transaction before it removes any items from
      the subvolume tree (if it encounters ENOSPC errors while reserving space
      for removing all the items).
      
      So make send fail more gracefully, with an -EIO error, and print a
      message to dmesg/syslog informing that there's an empty symlink inode,
      so that the user can delete the empty symlink or do something else
      about it.
      Reported-by: default avatarStephen R. van den Berg <srb@cuci.nl>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d70554d6
    • David Sterba's avatar
      btrfs: statfs: report zero available if metadata are exhausted · ee695402
      David Sterba authored
      commit ca8a51b3 upstream.
      
      There is one ENOSPC case that's very confusing. There's Available
      greater than zero but no file operation succeds (besides removing
      files). This happens when the metadata are exhausted and there's no
      possibility to allocate another chunk.
      
      In this scenario it's normal that there's still some space in the data
      chunk and the calculation in df reflects that in the Avail value.
      
      To at least give some clue about the ENOSPC situation, let statfs report
      zero value in Avail, even if there's still data space available.
      
      Current:
        /dev/sdb1             4.0G  3.3G  719M  83% /mnt/test
      
      New:
        /dev/sdb1             4.0G  3.3G     0 100% /mnt/test
      
      We calculate the remaining metadata space minus global reserve. If this
      is (supposedly) smaller than zero, there's no space. But this does not
      hold in practice, the exhausted state happens where's still some
      positive delta. So we apply some guesswork and compare the delta to a 4M
      threshold. (Practically observed delta was 2M.)
      
      We probably cannot calculate the exact threshold value because this
      depends on the internal reservations requested by various operations, so
      some operations that consume a few metadata will succeed even if the
      Avail is zero. But this is better than the other way around.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee695402
    • Josef Bacik's avatar
      Btrfs: igrab inode in writepage · 11066392
      Josef Bacik authored
      commit be7bd730 upstream.
      
      We hit this panic on a few of our boxes this week where we have an
      ordered_extent with an NULL inode.  We do an igrab() of the inode in writepages,
      but weren't doing it in writepage which can be called directly from the VM on
      dirty pages.  If the inode has been unlinked then we could have I_FREEING set
      which means igrab() would return NULL and we get this panic.  Fix this by trying
      to igrab in btrfs_writepage, and if it returns NULL then just redirty the page
      and return AOP_WRITEPAGE_ACTIVATE; so the VM knows it wasn't successful.  Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      11066392
    • Anand Jain's avatar
      Btrfs: add missing brelse when superblock checksum fails · 98d41a06
      Anand Jain authored
      commit b2acdddf upstream.
      
      Looks like oversight, call brelse() when checksum fails. Further down the
      code, in the non error path, we do call brelse() and so we don't see
      brelse() in the goto error paths.
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      98d41a06
    • David Hildenbrand's avatar
      KVM: s390: fix memory overwrites when vx is disabled · 3824f787
      David Hildenbrand authored
      commit 9abc2a08 upstream.
      
      The kernel now always uses vector registers when available, however KVM
      has special logic if support is really enabled for a guest. If support
      is disabled, guest_fpregs.fregs will only contain memory for the fpu.
      The kernel, however, will store vector registers into that area,
      resulting in crazy memory overwrites.
      
      Simply extending that area is not enough, because the format of the
      registers also changes. We would have to do additional conversions, making
      the code even more complex. Therefore let's directly use one place for
      the vector/fpu registers + fpc (in kvm_run). We just have to convert the
      data properly when accessing it. This makes current code much easier.
      
      Please note that vector/fpu registers are now always stored to
      vcpu->run->s.regs.vrs. Although this data is visible to QEMU and
      used for migration, we only guarantee valid values to user space  when
      KVM_SYNC_VRS is set. As that is only the case when we have vector
      register support, we are on the safe side.
      
      Fixes: b5510d9b ("s390/fpu: always enable the vector facility if it is available")
      Cc: stable@vger.kernel.org # v4.4 d9a3a09a s390/kvm: remove dependency on struct save_area definition
      Signed-off-by: default avatarDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      [adopt to d9a3a09a]
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3824f787
    • Martin Schwidefsky's avatar
      s390/kvm: remove dependency on struct save_area definition · b5e72574
      Martin Schwidefsky authored
      commit d9a3a09a upstream.
      
      Replace the offsets based on the struct area_area with the offset
      constants from asm-offsets.c based on the struct _lowcore.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b5e72574
    • Roman Volkov's avatar
      clocksource/drivers/vt8500: Increase the minimum delta · 142ad71d
      Roman Volkov authored
      commit f9eccf24 upstream.
      
      The vt8500 clocksource driver declares itself as capable to handle the
      minimum delay of 4 cycles by passing the value into
      clockevents_config_and_register(). The vt8500_timer_set_next_event()
      requires the passed cycles value to be at least 16. The impact is that
      userspace hangs in nanosleep() calls with small delay intervals.
      
      This problem is reproducible in Linux 4.2 starting from:
      c6eb3f70 ('hrtimer: Get rid of hrtimer softirq')
      
      From Russell King, more detailed explanation:
      
      "It's a speciality of the StrongARM/PXA hardware. It takes a certain
      number of OSCR cycles for the value written to hit the compare registers.
      So, if a very small delta is written (eg, the compare register is written
      with a value of OSCR + 1), the OSCR will have incremented past this value
      before it hits the underlying hardware. The result is, that you end up
      waiting a very long time for the OSCR to wrap before the event fires.
      
      So, we introduce a check in set_next_event() to detect this and return
      -ETIME if the calculated delta is too small, which causes the generic
      clockevents code to retry after adding the min_delta specified in
      clockevents_config_and_register() to the current time value.
      
      min_delta must be sufficient that we don't re-trip the -ETIME check - if
      we do, we will return -ETIME, forward the next event time, try to set it,
      return -ETIME again, and basically lock the system up. So, min_delta
      must be larger than the check inside set_next_event(). A factor of two
      was chosen to ensure that this situation would never occur.
      
      The PXA code worked on PXA systems for years, and I'd suggest no one
      changes this mechanism without access to a wide range of PXA systems,
      otherwise they're risking breakage."
      
      Cc: Russell King <linux@arm.linux.org.uk>
      Acked-by: default avatarAlexey Charkov <alchark@gmail.com>
      Signed-off-by: default avatarRoman Volkov <rvolkov@v1ros.org>
      Signed-off-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      142ad71d
    • Thomas Gleixner's avatar
      genirq: Validate action before dereferencing it in handle_irq_event_percpu() · 827685a6
      Thomas Gleixner authored
      commit 570540d5 upstream.
      
      commit 71f64340 changed the handling of irq_desc->action from
      
      CPU 0                   CPU 1
      free_irq()              lock(desc)
        lock(desc)            handle_edge_irq()
                              if (desc->action) {
                                handle_irq_event()
                                  action = desc->action
                                  unlock(desc)
        desc->action = NULL       handle_irq_event_percpu(desc, action)
                                    action->xxx
      to
      
      CPU 0                   CPU 1
      free_irq()              lock(desc)
        lock(desc)            handle_edge_irq()
                              if (desc->action) {
                                handle_irq_event()
                                  unlock(desc)
        desc->action = NULL       handle_irq_event_percpu(desc, action)
                                    action = desc->action
                                    action->xxx
      
      So if free_irq manages to set the action to NULL between the unlock and before
      the readout, we happily dereference a null pointer.
      
      We could simply revert 71f64340, but we want to preserve the better code
      generation. A simple solution is to change the action loop from a do {} while
      to a while {} loop.
      
      This is safe because we either see a valid desc->action or NULL. If the action
      is about to be removed it is still valid as free_irq() is blocked on
      synchronize_irq().
      
      CPU 0                   CPU 1
      free_irq()              lock(desc)
        lock(desc)            handle_edge_irq()
                                handle_irq_event(desc)
                                  set(INPROGRESS)
                                  unlock(desc)
                                  handle_irq_event_percpu(desc)
                                  action = desc->action
        desc->action = NULL           while (action) {
                                        action->xxx
                                        ...
                                        action = action->next;
        sychronize_irq()
          while(INPROGRESS);      lock(desc)
                                  clr(INPROGRESS)
      free(action)
      
      That's basically the same mechanism as we have for shared
      interrupts. action->next can become NULL while handle_irq_event_percpu()
      runs. Either it sees the action or NULL. It does not matter, because action
      itself cannot go away before the interrupt in progress flag has been cleared.
      
      Fixes: commit 71f64340 "genirq: Remove the second parameter from handle_irq_event_percpu()"
      Reported-by: zyjzyj2000@gmail.com
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Huang Shijie <shijie.huang@arm.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1601131224190.3575@nanosSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      827685a6
    • Mel Gorman's avatar
      mm: numa: quickly fail allocations for NUMA balancing on full nodes · 18b75e0b
      Mel Gorman authored
      commit 8479eba7 upstream.
      
      Commit 4167e9b2 ("mm: remove GFP_THISNODE") removed the GFP_THISNODE
      flag combination due to confusing semantics.  It noted that
      alloc_misplaced_dst_page() was one such user after changes made by
      commit e97ca8e5 ("mm: fix GFP_THISNODE callers and clarify").
      
      Unfortunately when GFP_THISNODE was removed, users of
      alloc_misplaced_dst_page() started waking kswapd and entering direct
      reclaim because the wrong GFP flags are cleared.  The consequence is
      that workloads that used to fit into memory now get reclaimed which is
      addressed by this patch.
      
      The problem can be demonstrated with "mutilate" that exercises memcached
      which is software dedicated to memory object caching.  The configuration
      uses 80% of memory and is run 3 times for varying numbers of clients.
      The results on a 4-socket NUMA box are
      
      mutilate
                                  4.4.0                 4.4.0
                                vanilla           numaswap-v1
      Hmean    1      8394.71 (  0.00%)     8395.32 (  0.01%)
      Hmean    4     30024.62 (  0.00%)    34513.54 ( 14.95%)
      Hmean    7     32821.08 (  0.00%)    70542.96 (114.93%)
      Hmean    12    55229.67 (  0.00%)    93866.34 ( 69.96%)
      Hmean    21    39438.96 (  0.00%)    85749.21 (117.42%)
      Hmean    30    37796.10 (  0.00%)    50231.49 ( 32.90%)
      Hmean    47    18070.91 (  0.00%)    38530.13 (113.22%)
      
      The metric is queries/second with the more the better.  The results are
      way outside of the noise and the reason for the improvement is obvious
      from some of the vmstats
      
                                       4.4.0       4.4.0
                                     vanillanumaswap-v1r1
      Minor Faults                1929399272  2146148218
      Major Faults                  19746529        3567
      Swap Ins                      57307366        9913
      Swap Outs                     50623229       17094
      Allocation stalls                35909         443
      DMA allocs                           0           0
      DMA32 allocs                  72976349   170567396
      Normal allocs               5306640898  5310651252
      Movable allocs                       0           0
      Direct pages scanned         404130893      799577
      Kswapd pages scanned         160230174           0
      Kswapd pages reclaimed        55928786           0
      Direct pages reclaimed         1843936       41921
      Page writes file                  2391           0
      Page writes anon              50623229       17094
      
      The vanilla kernel is swapping like crazy with large amounts of direct
      reclaim and kswapd activity.  The figures are aggregate but it's known
      that the bad activity is throughout the entire test.
      
      Note that simple streaming anon/file memory consumers also see this
      problem but it's not as obvious.  In those cases, kswapd is awake when
      it should not be.
      
      As there are at least two reclaim-related bugs out there, it's worth
      spelling out the user-visible impact.  This patch only addresses bugs
      related to excessive reclaim on NUMA hardware when the working set is
      larger than a NUMA node.  There is a bug related to high kswapd CPU
      usage but the reports are against laptops and other UMA hardware and is
      not addressed by this patch.
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      18b75e0b
    • Andrea Arcangeli's avatar
      mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED · 915d0245
      Andrea Arcangeli authored
      commit ad33bb04 upstream.
      
      pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
      introduced to locklessy (but atomically) detect when a pmd is a regular
      (stable) pmd or when the pmd is unstable and can infinitely transition
      from pmd_none() and pmd_trans_huge() from under us, while only holding
      the mmap_sem for reading (for writing not).
      
      While holding the mmap_sem only for reading, MADV_DONTNEED can run from
      under us and so before we can assume the pmd to be a regular stable pmd
      we need to compare it against pmd_none() and pmd_trans_huge() in an
      atomic way, with pmd_trans_unstable().  The old pmd_trans_huge() left a
      tiny window for a race.
      
      Useful applications are unlikely to notice the difference as doing
      MADV_DONTNEED concurrently with a page fault would lead to undefined
      behavior.
      
      [akpm@linux-foundation.org: tidy up comment grammar/layout]
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      915d0245
    • Guozhonghua's avatar
      ocfs2: unlock inode if deleting inode from orphan fails · c88edc09
      Guozhonghua authored
      commit a4a8481f upstream.
      
      When doing append direct io cleanup, if deleting inode fails, it goes
      out without unlocking inode, which will cause the inode deadlock.
      
      This issue was introduced by commit cf1776a9 ("ocfs2: fix a tiny
      race when truncate dio orohaned entry").
      Signed-off-by: default avatarGuozhonghua <guozhonghua@h3c.com>
      Signed-off-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Reviewed-by: default avatarGang He <ghe@suse.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c88edc09
    • Daniel Vetter's avatar
      drm/i915: shut up gen8+ SDE irq dmesg noise · cc082a1c
      Daniel Vetter authored
      commit 97e5ed11 upstream.
      
      We get tons of cases where the master interrupt handler apparently set
      a bit, with the SDEIIR disagreeing. No idea what's going on there, but
      it's consistent on gen8+, no one seems to care about it and it's
      making CI results flaky.
      
      Shut it up.
      
      No idea what's going on here, but we've had fun with PCH interrupts
      before:
      
      commit 44498aea
      Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Date:   Fri Feb 22 17:05:28 2013 -0300
      
          drm/i915: also disable south interrupts when handling them
      
      Note that there's a regression report in Bugzilla, and other
      regression reports on the mailing lists keep croping up. But no ill
      effects have ever been reported. But for paranoia still keep the
      message at a debug level as a breadcrumb, just in case.
      
      This message was introduced in
      
      commit 38cc46d7
      Author: Oscar Mateo <oscar.mateo@intel.com>
      Date:   Mon Jun 16 16:10:59 2014 +0100
      
          drm/i915/bdw: Ack interrupts before handling them (GEN8)
      
      v2: Improve commit message a bit.
      
      Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1445590572-23631-2-git-send-email-daniel.vetter@ffwll.ch
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92084
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80896Acked-by: default avatarMika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc082a1c
    • Hariprasad S's avatar
      iw_cxgb3: Fix incorrectly returning error on success · 64fb3e29
      Hariprasad S authored
      commit 67f1aee6 upstream.
      
      The cxgb3_*_send() functions return NET_XMIT_ values, which are
      positive integers values. So don't treat positive return values
      as an error.
      Signed-off-by: default avatarSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: default avatarHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      [a pox on developers and maintainers who do not cc: stable for bug fixes like this - gregkh]
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      64fb3e29
    • Michael Welling's avatar
      spi: omap2-mcspi: Prevent duplicate gpio_request · 50d60403
      Michael Welling authored
      commit 2f538c01 upstream.
      
      Occasionally the setup function will be called multiple times. Only request
      the gpio the first time otherwise -EBUSY will occur on subsequent calls to
      setup.
      Reported-by: default avatarJoseph Bell <joe@iachieved.it>
      Signed-off-by: default avatarMichael Welling <mwelling@ieee.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      50d60403
    • Lisa Du's avatar
      drivers: android: correct the size of struct binder_uintptr_t for BC_DEAD_BINDER_DONE · 3e908446
      Lisa Du authored
      commit 7a64cd88 upstream.
      
      There's one point was missed in the patch commit da49889d ("staging:
      binder: Support concurrent 32 bit and 64 bit processes."). When configure
      BINDER_IPC_32BIT, the size of binder_uintptr_t was 32bits, but size of
      void * is 64bit on 64bit system. Correct it here.
      Signed-off-by: default avatarLisa Du <cldu@marvell.com>
      Signed-off-by: default avatarNicolas Boichat <drinkcat@chromium.org>
      Fixes: da49889d ("staging: binder: Support concurrent 32 bit and 64 bit processes.")
      Acked-by: default avatarOlof Johansson <olof@lixom.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3e908446
    • Bjørn Mork's avatar
      USB: option: add "4G LTE usb-modem U901" · 80d0fecf
      Bjørn Mork authored
      commit d061c1ca upstream.
      
      Thomas reports:
      
      T:  Bus=01 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#=  4 Spd=480 MxCh= 0
      D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=05c6 ProdID=6001 Rev=00.00
      S:  Manufacturer=USB Modem
      S:  Product=USB Modem
      S:  SerialNumber=1234567890ABCDEF
      C:  #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
      I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      I:  If#= 4 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
      Reported-by: default avatarThomas Schäfer <tschaefer@t-online.de>
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80d0fecf
    • Andrey Skvortsov's avatar
      USB: option: add support for SIM7100E · aed43336
      Andrey Skvortsov authored
      commit 3158a8d4 upstream.
      
      $ lsusb:
      Bus 001 Device 101: ID 1e0e:9001 Qualcomm / Option
      
      $ usb-devices:
      T:  Bus=01 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#=101 Spd=480  MxCh= 0
      D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  2
      P:  Vendor=1e0e ProdID=9001 Rev= 2.32
      S:  Manufacturer=SimTech, Incorporated
      S:  Product=SimTech, Incorporated
      S:  SerialNumber=0123456789ABCDEF
      C:* #Ifs= 7 Cfg#= 1 Atr=80 MxPwr=500mA
      I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      I:* If#= 5 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      I:* If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
      
      The last interface (6) is used for Android Composite ADB interface.
      
      Serial port layout:
      0: QCDM/DIAG
      1: NMEA
      2: AT
      3: AT/PPP
      4: audio
      Signed-off-by: default avatarAndrey Skvortsov <andrej.skvortzov@gmail.com>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aed43336
    • Ken Lin's avatar
      USB: cp210x: add IDs for GE B650V3 and B850V3 boards · a9d25862
      Ken Lin authored
      commit 6627ae19 upstream.
      
      Add USB ID for cp2104/5 devices on GE B650v3 and B850v3 boards.
      Signed-off-by: default avatarKen Lin <ken.lin@advantech.com.tw>
      Signed-off-by: default avatarAkshay Bhat <akshay.bhat@timesys.com>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a9d25862
    • John Youn's avatar
      usb: dwc3: Fix assignment of EP transfer resources · e8d63424
      John Youn authored
      commit c4509601 upstream.
      
      The assignement of EP transfer resources was not handled properly in the
      dwc3 driver. Commit aebda618 ("usb: dwc3: Reset the transfer
      resource index on SET_INTERFACE") previously fixed one aspect of this
      where resources may be exhausted with multiple calls to SET_INTERFACE.
      However, it introduced an issue where composite devices with multiple
      interfaces can be assigned the same transfer resources for different
      endpoints. This patch solves both issues.
      
      The assignment of transfer resources cannot perfectly follow the data
      book due to the fact that the controller driver does not have all
      knowledge of the configuration in advance. It is given this information
      piecemeal by the composite gadget framework after every
      SET_CONFIGURATION and SET_INTERFACE. Trying to follow the databook
      programming model in this scenario can cause errors. For two reasons:
      
      1) The databook says to do DEPSTARTCFG for every SET_CONFIGURATION and
      SET_INTERFACE (8.1.5). This is incorrect in the scenario of multiple
      interfaces.
      
      2) The databook does not mention doing more DEPXFERCFG for new endpoint
      on alt setting (8.1.6).
      
      The following simplified method is used instead:
      
      All hardware endpoints can be assigned a transfer resource and this
      setting will stay persistent until either a core reset or hibernation.
      So whenever we do a DEPSTARTCFG(0) we can go ahead and do DEPXFERCFG for
      every hardware endpoint as well. We are guaranteed that there are as
      many transfer resources as endpoints.
      
      This patch triggers off of the calling dwc3_gadget_start_config() for
      EP0-out, which always happens first, and which should only happen in one
      of the above conditions.
      
      Fixes: aebda618 ("usb: dwc3: Reset the transfer resource index on SET_INTERFACE")
      Reported-by: default avatarRavi Babu <ravibabu@ti.com>
      Signed-off-by: default avatarJohn Youn <johnyoun@synopsys.com>
      Signed-off-by: default avatarFelipe Balbi <balbi@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8d63424
    • Gerhard Uttenthaler's avatar
      can: ems_usb: Fix possible tx overflow · f847ff0d
      Gerhard Uttenthaler authored
      commit 90cfde46 upstream.
      
      This patch fixes the problem that more CAN messages could be sent to the
      interface as could be send on the CAN bus. This was more likely for slow baud
      rates. The sleeping _start_xmit was woken up in the _write_bulk_callback. Under
      heavy TX load this produced another bulk transfer without checking the
      free_slots variable and hence caused the overflow in the interface.
      Signed-off-by: default avatarGerhard Uttenthaler <uttenthaler@ems-wuensche.com>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f847ff0d
    • Nikolay Borisov's avatar
      dm thin: fix race condition when destroying thin pool workqueue · 5c8a03a3
      Nikolay Borisov authored
      commit 18d03e8c upstream.
      
      When a thin pool is being destroyed delayed work items are
      cancelled using cancel_delayed_work(), which doesn't guarantee that on
      return the delayed item isn't running.  This can cause the work item to
      requeue itself on an already destroyed workqueue.  Fix this by using
      cancel_delayed_work_sync() which guarantees that on return the work item
      is not running anymore.
      
      Fixes: 905e51b3 ("dm thin: commit outstanding data every second")
      Fixes: 85ad643b ("dm thin: add timeout to stop out-of-data-space mode holding IO forever")
      Signed-off-by: default avatarNikolay Borisov <kernel@kyup.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5c8a03a3
    • Kent Overstreet's avatar
      bcache: Change refill_dirty() to always scan entire disk if necessary · a556b804
      Kent Overstreet authored
      commit 627ccd20 upstream.
      
      Previously, it would only scan the entire disk if it was starting from
      the very start of the disk - i.e. if the previous scan got to the end.
      
      This was broken by refill_full_stripes(), which updates last_scanned so
      that refill_dirty was never triggering the searched_from_start path.
      
      But if we change refill_dirty() to always scan the entire disk if
      necessary, regardless of what last_scanned was, the code gets cleaner
      and we fix that bug too.
      Signed-off-by: default avatarKent Overstreet <kent.overstreet@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a556b804
    • Stefan Bader's avatar
      bcache: prevent crash on changing writeback_running · 9e761c1b
      Stefan Bader authored
      commit 8d16ce54 upstream.
      
      Added a safeguard in the shutdown case. At least while not being
      attached it is also possible to trigger a kernel bug by writing into
      writeback_running. This change  adds the same check before trying to
      wake up the thread for that case.
      Signed-off-by: default avatarStefan Bader <stefan.bader@canonical.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e761c1b
    • Gabriel de Perthuis's avatar
      bcache: allows use of register in udev to avoid "device_busy" error. · b38798df
      Gabriel de Perthuis authored
      commit d7076f21 upstream.
      
      Allows to use register, not register_quiet in udev to avoid "device_busy" error.
      The initial patch proposed at https://lkml.org/lkml/2013/8/26/549 by Gabriel de Perthuis
      <g2p.code@gmail.com> does not unlock the mutex and hangs the kernel.
      
      See http://thread.gmane.org/gmane.linux.kernel.bcache.devel/2594 for the discussion.
      
      Cc: Denis Bychkov <manover@gmail.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Gabriel de Perthuis <g2p.code@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b38798df
    • Zheng Liu's avatar
      bcache: unregister reboot notifier if bcache fails to unregister device · d81b4c86
      Zheng Liu authored
      commit 2ecf0cdb upstream.
      
      In bcache_init() function it forgot to unregister reboot notifier if
      bcache fails to unregister a block device.  This commit fixes this.
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Tested-by: default avatarJoshua Schmid <jschmid@suse.com>
      Tested-by: default avatarEric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d81b4c86
    • Al Viro's avatar
      bcache: fix a leak in bch_cached_dev_run() · 8e086f9f
      Al Viro authored
      commit 4d4d8573 upstream.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarJoshua Schmid <jschmid@suse.com>
      Tested-by: default avatarEric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e086f9f
    • Zheng Liu's avatar
      bcache: clear BCACHE_DEV_UNLINK_DONE flag when attaching a backing device · 1ebc8501
      Zheng Liu authored
      commit fecaee6f upstream.
      
      This bug can be reproduced by the following script:
      
        #!/bin/bash
      
        bcache_sysfs="/sys/fs/bcache"
      
        function clear_cache()
        {
        	if [ ! -e $bcache_sysfs ]; then
        		echo "no bcache sysfs"
        		exit
        	fi
      
        	cset_uuid=$(ls -l $bcache_sysfs|head -n 2|tail -n 1|awk '{print $9}')
        	sudo sh -c "echo $cset_uuid > /sys/block/sdb/sdb1/bcache/detach"
        	sleep 5
        	sudo sh -c "echo $cset_uuid > /sys/block/sdb/sdb1/bcache/attach"
        }
      
        for ((i=0;i<10;i++)); do
        	clear_cache
        done
      
      The warning messages look like below:
      [  275.948611] ------------[ cut here ]------------
      [  275.963840] WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xb8/0xd0() (Tainted: P        W
      ---------------   )
      [  275.979253] Hardware name: Tecal RH2285
      [  275.994106] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:09.0/0000:08:00.0/host4/target4:2:1/4:2:1:0/block/sdb/sdb1/bcache/cache'
      [  276.024105] Modules linked in: bcache tcp_diag inet_diag ipmi_devintf ipmi_si ipmi_msghandler
      bonding 8021q garp stp llc ipv6 ext3 jbd loop sg iomemory_vsl(P) bnx2 microcode serio_raw i2c_i801
      i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 jbd2 mbcache megaraid_sas
      pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
      [  276.072643] Pid: 2765, comm: sh Tainted: P        W  ---------------    2.6.32 #1
      [  276.089315] Call Trace:
      [  276.105801]  [<ffffffff81070fe7>] ? warn_slowpath_common+0x87/0xc0
      [  276.122650]  [<ffffffff810710d6>] ? warn_slowpath_fmt+0x46/0x50
      [  276.139361]  [<ffffffff81205c08>] ? sysfs_add_one+0xb8/0xd0
      [  276.156012]  [<ffffffff8120609b>] ? sysfs_do_create_link+0x12b/0x170
      [  276.172682]  [<ffffffff81206113>] ? sysfs_create_link+0x13/0x20
      [  276.189282]  [<ffffffffa03bda21>] ? bcache_device_link+0xc1/0x110 [bcache]
      [  276.205993]  [<ffffffffa03bfa08>] ? bch_cached_dev_attach+0x478/0x4f0 [bcache]
      [  276.222794]  [<ffffffffa03c4a17>] ? bch_cached_dev_store+0x627/0x780 [bcache]
      [  276.239680]  [<ffffffff8116783a>] ? alloc_pages_current+0xaa/0x110
      [  276.256594]  [<ffffffff81203b15>] ? sysfs_write_file+0xe5/0x170
      [  276.273364]  [<ffffffff811887b8>] ? vfs_write+0xb8/0x1a0
      [  276.290133]  [<ffffffff811890b1>] ? sys_write+0x51/0x90
      [  276.306368]  [<ffffffff8100c072>] ? system_call_fastpath+0x16/0x1b
      [  276.322301] ---[ end trace 9f5d4fcdd0c3edfb ]---
      [  276.338241] ------------[ cut here ]------------
      [  276.354109] WARNING: at /home/wenqing.lz/bcache/bcache/super.c:720
      bcache_device_link+0xdf/0x110 [bcache]() (Tainted: P        W  ---------------   )
      [  276.386017] Hardware name: Tecal RH2285
      [  276.401430] Couldn't create device <-> cache set symlinks
      [  276.401759] Modules linked in: bcache tcp_diag inet_diag ipmi_devintf ipmi_si ipmi_msghandler
      bonding 8021q garp stp llc ipv6 ext3 jbd loop sg iomemory_vsl(P) bnx2 microcode serio_raw i2c_i801
      i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 jbd2 mbcache megaraid_sas
      pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
      [  276.465477] Pid: 2765, comm: sh Tainted: P        W  ---------------    2.6.32 #1
      [  276.482169] Call Trace:
      [  276.498610]  [<ffffffff81070fe7>] ? warn_slowpath_common+0x87/0xc0
      [  276.515405]  [<ffffffff810710d6>] ? warn_slowpath_fmt+0x46/0x50
      [  276.532059]  [<ffffffffa03bda3f>] ? bcache_device_link+0xdf/0x110 [bcache]
      [  276.548808]  [<ffffffffa03bfa08>] ? bch_cached_dev_attach+0x478/0x4f0 [bcache]
      [  276.565569]  [<ffffffffa03c4a17>] ? bch_cached_dev_store+0x627/0x780 [bcache]
      [  276.582418]  [<ffffffff8116783a>] ? alloc_pages_current+0xaa/0x110
      [  276.599341]  [<ffffffff81203b15>] ? sysfs_write_file+0xe5/0x170
      [  276.616142]  [<ffffffff811887b8>] ? vfs_write+0xb8/0x1a0
      [  276.632607]  [<ffffffff811890b1>] ? sys_write+0x51/0x90
      [  276.648671]  [<ffffffff8100c072>] ? system_call_fastpath+0x16/0x1b
      [  276.664756] ---[ end trace 9f5d4fcdd0c3edfc ]---
      
      We forget to clear BCACHE_DEV_UNLINK_DONE flag in bcache_device_attach()
      function when we attach a backing device first time.  After detaching this
      backing device, this flag will be true and sysfs_remove_link() isn't called in
      bcache_device_unlink().  Then when we attach this backing device again,
      sysfs_create_link() will return EEXIST error in bcache_device_link().
      
      So the fix is trival and we clear this flag in bcache_device_link().
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Tested-by: default avatarJoshua Schmid <jschmid@suse.com>
      Tested-by: default avatarEric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ebc8501
    • Kent Overstreet's avatar
      bcache: Add a cond_resched() call to gc · f4c87d7c
      Kent Overstreet authored
      commit c5f1e5ad upstream.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Tested-by: default avatarEric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4c87d7c
    • Zheng Liu's avatar
      bcache: fix a livelock when we cause a huge number of cache misses · 69ccf81e
      Zheng Liu authored
      commit 2ef9ccbf upstream.
      
      Subject :	[PATCH v2] bcache: fix a livelock in btree lock
      Date :	Wed, 25 Feb 2015 20:32:09 +0800 (02/25/2015 04:32:09 AM)
      
      This commit tries to fix a livelock in bcache.  This livelock might
      happen when we causes a huge number of cache misses simultaneously.
      
      When we get a cache miss, bcache will execute the following path.
      
      ->cached_dev_make_request()
        ->cached_dev_read()
          ->cached_lookup()
            ->bch->btree_map_keys()
              ->btree_root()  <------------------------
                ->bch_btree_map_keys_recurse()        |
                  ->cache_lookup_fn()                 |
                    ->cached_dev_cache_miss()         |
                      ->bch_btree_insert_check_key() -|
                        [If btree->seq is not equal to seq + 1, we should return
                         EINTR and traverse btree again.]
      
      In bch_btree_insert_check_key() function we first need to check upgrade
      flag (op->lock == -1), and when this flag is true we need to release
      read btree->lock and try to take write btree->lock.  During taking and
      releasing this write lock, btree->seq will be monotone increased in
      order to prevent other threads modify this in cache miss (see btree.h:74).
      But if there are some cache misses caused by some requested, we could
      meet a livelock because btree->seq is always changed by others.  Thus no
      one can make progress.
      
      This commit will try to take write btree->lock if it encounters a race
      when we traverse btree.  Although it sacrifice the scalability but we
      can ensure that only one can modify the btree.
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Tested-by: default avatarJoshua Schmid <jschmid@suse.com>
      Tested-by: default avatarEric Wheeler <bcache@linux.ewheeler.net>
      Cc: Joshua Schmid <jschmid@suse.com>
      Cc: Zhu Yanhai <zhu.yanhai@gmail.com>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      69ccf81e
    • Jason Andryuk's avatar
      lib/ucs2_string: Correct ucs2 -> utf8 conversion · 9e8afc94
      Jason Andryuk authored
      commit a6807590 upstream.
      
      The comparisons should be >= since 0x800 and 0x80 require an additional bit
      to store.
      
      For the 3 byte case, the existing shift would drop off 2 more bits than
      intended.
      
      For the 2 byte case, there should be 5 bits bits in byte 1, and 6 bits in
      byte 2.
      Signed-off-by: default avatarJason Andryuk <jandryuk@gmail.com>
      Reviewed-by: default avatarLaszlo Ersek <lersek@redhat.com>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Matthew Garrett <mjg59@coreos.com>
      Cc: "Lee, Chun-Yi" <jlee@suse.com>
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e8afc94
    • Matt Fleming's avatar
      efi: Add pstore variables to the deletion whitelist · 9168b9b4
      Matt Fleming authored
      commit e246eb56 upstream.
      
      Laszlo explains why this is a good idea,
      
       'This is because the pstore filesystem can be backed by UEFI variables,
        and (for example) a crash might dump the last kilobytes of the dmesg
        into a number of pstore entries, each entry backed by a separate UEFI
        variable in the above GUID namespace, and with a variable name
        according to the above pattern.
      
        Please see "drivers/firmware/efi/efi-pstore.c".
      
        While this patch series will not prevent the user from deleting those
        UEFI variables via the pstore filesystem (i.e., deleting a pstore fs
        entry will continue to delete the backing UEFI variable), I think it
        would be nice to preserve the possibility for the sysadmin to delete
        Linux-created UEFI variables that carry portions of the crash log,
        *without* having to mount the pstore filesystem.'
      
      There's also no chance of causing machines to become bricked by
      deleting these variables, which is the whole purpose of excluding
      things from the whitelist.
      
      Use the LINUX_EFI_CRASH_GUID guid and a wildcard '*' for the match so
      that we don't have to update the string in the future if new variable
      name formats are created for crash dump variables.
      Reported-by: default avatarLaszlo Ersek <lersek@redhat.com>
      Acked-by: default avatarPeter Jones <pjones@redhat.com>
      Tested-by: default avatarPeter Jones <pjones@redhat.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: "Lee, Chun-Yi" <jlee@suse.com>
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9168b9b4
    • Peter Jones's avatar
      efi: Make efivarfs entries immutable by default · 05913989
      Peter Jones authored
      commit ed8b0de5 upstream.
      
      "rm -rf" is bricking some peoples' laptops because of variables being
      used to store non-reinitializable firmware driver data that's required
      to POST the hardware.
      
      These are 100% bugs, and they need to be fixed, but in the mean time it
      shouldn't be easy to *accidentally* brick machines.
      
      We have to have delete working, and picking which variables do and don't
      work for deletion is quite intractable, so instead make everything
      immutable by default (except for a whitelist), and make tools that
      aren't quite so broad-spectrum unset the immutable flag.
      Signed-off-by: default avatarPeter Jones <pjones@redhat.com>
      Tested-by: default avatarLee, Chun-Yi <jlee@suse.com>
      Acked-by: default avatarMatthew Garrett <mjg59@coreos.com>
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      05913989