1. 17 Dec, 2015 6 commits
    • Omar Sandoval's avatar
      Btrfs: implement the free space B-tree · a5ed9182
      Omar Sandoval authored
      The free space cache has turned out to be a scalability bottleneck on
      large, busy filesystems. When the cache for a lot of block groups needs
      to be written out, we can get extremely long commit times; if this
      happens in the critical section, things are especially bad because we
      block new transactions from happening.
      
      The main problem with the free space cache is that it has to be written
      out in its entirety and is managed in an ad hoc fashion. Using a B-tree
      to store free space fixes this: updates can be done as needed and we get
      all of the benefits of using a B-tree: checksumming, RAID handling,
      well-understood behavior.
      
      With the free space tree, we get commit times that are about the same as
      the no cache case with load times slower than the free space cache case
      but still much faster than the no cache case. Free space is represented
      with extents until it becomes more space-efficient to use bitmaps,
      giving us similar space overhead to the free space cache.
      
      The operations on the free space tree are: adding and removing free
      space, handling the creation and deletion of block groups, and loading
      the free space for a block group. We can also create the free space tree
      by walking the extent tree and clear the free space tree.
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      a5ed9182
    • Omar Sandoval's avatar
      Btrfs: introduce the free space B-tree on-disk format · 208acb8c
      Omar Sandoval authored
      The on-disk format for the free space tree is straightforward. Each
      block group is represented in the free space tree by a free space info
      item that stores accounting information: whether the free space for this
      block group is stored as bitmaps or extents and how many extents of free
      space exist for this block group (regardless of which format is being
      used in the tree). Extents are (start, FREE_SPACE_EXTENT, length) keys
      with no corresponding item, and bitmaps instead have the
      FREE_SPACE_BITMAP type and have a bitmap item attached, which is just an
      array of bytes.
      Reviewed-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      208acb8c
    • Omar Sandoval's avatar
      Btrfs: refactor caching_thread() · 73fa48b6
      Omar Sandoval authored
      We're also going to load the free space tree from caching_thread(), so
      we should refactor some of the common code.
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      73fa48b6
    • Omar Sandoval's avatar
      Btrfs: add helpers for read-only compat bits · 1abfbcdf
      Omar Sandoval authored
      We're finally going to add one of these for the free space tree, so
      let's add the same nice helpers that we have for the incompat bits.
      While we're add it, also add helpers to clear the bits.
      Reviewed-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      1abfbcdf
    • Omar Sandoval's avatar
      Btrfs: add extent buffer bitmap sanity tests · 0f331229
      Omar Sandoval authored
      Sanity test the extent buffer bitmap operations (test, set, and clear)
      against the equivalent standard kernel operations.
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      0f331229
    • Omar Sandoval's avatar
      Btrfs: add extent buffer bitmap operations · 3e1e8bb7
      Omar Sandoval authored
      These are going to be used for the free space tree bitmap items.
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      3e1e8bb7
  2. 02 Nov, 2015 1 commit
  3. 01 Nov, 2015 7 commits
    • Linus Torvalds's avatar
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 95fc00a4
      Linus Torvalds authored
      Pull memremap fix from Dan Williams:
       "The new memremap() api introduced in the 4.3 cycle to unify/replace
        ioremap_cache() and ioremap_wt() is mishandling the highmem case.
        This patch has received a build success notification from a
        0day-kbuild-robot run and has received an ack from Ard"
      
      From the commit message:
       "The impact of this bug is low for now since the pmem driver is the
        only user of memremap(), but this is important to fix before more
        conversions to memremap arrive in 4.4"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        memremap: fix highmem support
      95fc00a4
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ca04d396
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "This set of updates contains:
      
         - Another bugfix for the pathologic vm86 machinery.  Clear
           thread.vm86 on fork to prevent corrupting the parent state.  This
           comes along with an update to the vm86 selftest case
      
         - Fix another corner case in the ioapic setup code which causes a
           boot crash on some oddball systems
      
         - Fix the fallout from the dma allocation consolidation work, which
           leads to a NULL pointer dereference when the allocation code is
           called with a NULL device"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/vm86: Set thread.vm86 to NULL on fork/clone
        selftests/x86: Add a fork() to entry_from_vm86 to catch fork bugs
        x86/ioapic: Prevent NULL pointer dereference in setup_ioapic_dest()
        x86/dma-mapping: Fix arch_dma_alloc_attrs() oops with NULL dev
      ca04d396
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f5eab267
      Linus Torvalds authored
      Pull timer fixes from Thomas Gleixner:
       "The last round of minimalistic fixes for clocksource drivers:
      
         - Prevent multiple shutdown of the sh_mtu2 clocksource
      
         - Annotate a bunch of clocksource/schedclock functions with notrace
           to prevent an annoying ftrace recursion issue"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource/drivers/sh_mtu2: Fix multiple shutdown call issue
        clocksource/drivers/digicolor: Prevent ftrace recursion
        clocksource/drivers/fsl_ftm_timer: Prevent ftrace recursion
        clocksource/drivers/vf_pit_timer: Prevent ftrace recursion
        clocksource/drivers/prima2: Prevent ftrace recursion
        clocksource/drivers/samsung_pwm_timer: Prevent ftrace recursion
        clocksource/drivers/pistachio: Prevent ftrace recursion
        clocksource/drivers/arm_global_timer: Prevent ftrace recursion
      f5eab267
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4bf690d7
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "The last two one-liners for 4.3 from the irqchip space:
      
         - Regression fix for armada SoC which addresses the fallout from the
           set_irq_flags() cleanup
      
         - Add the missing propagation of the irq_set_type() callback to the
           parent interrupt controller of the tegra interrupt chip"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/tegra: Propagate IRQ type setting to parent
        irqchip/armada-370-xp: Fix regression by clearing IRQ_NOAUTOEN
      4bf690d7
    • Linus Torvalds's avatar
      Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 56ef9db2
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "This should be our final batch of fixes for 4.3:
      
         - A patch from Sudeep Holla that fixes annotation of wakeup sources
           properly, old unused format seems to have spread through copying.
      
         - Two patches from Tony for OMAP.  One dealing with MUSB setup
           problems due to runtime PM being enabled too early on the parent
           device.  The other fixes IRQ numbering for OMAP1"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        usb: musb: omap2430: Fix regression caused by driver core change
        ARM: OMAP1: fix incorrect INT_DMA_LCD
        ARM: dts: fix gpio-keys wakeup-source property
      56ef9db2
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 060b85b0
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "This is three essential bug fixes for various SCSI parts.
      
        The only affected users are SCSI multi-path via device handler
        (basically all the enterprise) and mvsas users.  The dh bugs are an
        async entanglement in boot resulting in a serious WARN_ON trip and a
        use after free on remove leading to a crash with strict memory
        accounting.  The mvsas bug manifests as a null deref oops but only on
        abort sequences; however, these can commonly occur with SATA attached
        devices, hence the fix"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi_dh: don't try to load a device handler during async probing
        scsi_dh: fix use-after-free when removing scsi device
        mvsas: Fix NULL pointer dereference in mvs_slot_task_free
      060b85b0
    • Linus Torvalds's avatar
      Merge tag 'md/4.3-rc7-fixes' of git://neil.brown.name/md · af7eba01
      Linus Torvalds authored
      Pull md bug fixes from Neil Brown:
       "Two more bug fixes for md.
      
        One bugfix for a list corruption in raid5 because of incorrect
        locking.
      
        Other for possible data corruption when a recovering device is failed,
        removed, and re-added.
      
        Both tagged for -stable"
      
      * tag 'md/4.3-rc7-fixes' of git://neil.brown.name/md:
        Revert "md: allow a partially recovered device to be hot-added to an array."
        md/raid5: fix locking in handle_stripe_clean_event()
      af7eba01
  4. 31 Oct, 2015 12 commits
  5. 30 Oct, 2015 4 commits
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 9b971e77
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "Apologies for this being so late, but we've uncovered a few nasty
        issues on arm64 which didn't settle down until yesterday and the fixes
        all look suitable for 4.3.  Of the four patches, three of them are
        Cc'd to stable, with the remaining patch fixing an issue that only
        took effect during the merge window.
      
        Summary:
      
         - Fix corruption in SWP emulation when STXR fails due to contention
         - Fix MMU re-initialisation when resuming from a low-power state
         - Fix stack unwinding code to match what ftrace expects
         - Fix relocation code in the EFI stub when DRAM base is not 2MB aligned"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64/efi: do not assume DRAM base is aligned to 2 MB
        Revert "ARM64: unwind: Fix PC calculation"
        arm64: kernel: fix tcr_el1.t0sz restore on systems with extended idmap
        arm64: compat: fix stxr failure case in SWP emulation
      9b971e77
    • Linus Torvalds's avatar
      Merge tag 'please-pull-syscalls' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux · 7c0f488f
      Linus Torvalds authored
      Pull ia64 kcmp syscall from Tony Luck:
       "Missed adding the kcmp() syscall a long time ago.  Now it seems that
        it is essential to build systemd"
      
      * tag 'please-pull-syscalls' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
        [IA64] Wire up kcmp syscall
      7c0f488f
    • Roman Gushchin's avatar
      md/raid5: fix locking in handle_stripe_clean_event() · b8a9d66d
      Roman Gushchin authored
      After commit 566c09c5 ("raid5: relieve lock contention in get_active_stripe()")
      __find_stripe() is called under conf->hash_locks + hash.
      But handle_stripe_clean_event() calls remove_hash() under
      conf->device_lock.
      
      Under some cirscumstances the hash chain can be circuited,
      and we get an infinite loop with disabled interrupts and locked hash
      lock in __find_stripe(). This leads to hard lockup on multiple CPUs
      and following system crash.
      
      I was able to reproduce this behavior on raid6 over 6 ssd disks.
      The devices_handle_discard_safely option should be set to enable trim
      support. The following script was used:
      
      for i in `seq 1 32`; do
          dd if=/dev/zero of=large$i bs=10M count=100 &
      done
      
      neilb: original was against a 3.x kernel.  I forward-ported
        to 4.3-rc.  This verison is suitable for any kernel since
        Commit: 59fc630b ("RAID5: batch adjacent full stripe write")
        (v4.1+).  I'll post a version for earlier kernels to stable.
      Signed-off-by: default avatarRoman Gushchin <klamm@yandex-team.ru>
      Fixes: 566c09c5 ("raid5: relieve lock contention in get_active_stripe()")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <stable@vger.kernel.org> # 3.13 - 4.2
      b8a9d66d
    • Ronny Hegewald's avatar
      rbd: require stable pages if message data CRCs are enabled · bae818ee
      Ronny Hegewald authored
      rbd requires stable pages, as it performs a crc of the page data before
      they are send to the OSDs.
      
      But since kernel 3.9 (patch 1d1d1a76
      "mm: only enforce stable page writes if the backing device requires
      it") it is not assumed anymore that block devices require stable pages.
      
      This patch sets the necessary flag to get stable pages back for rbd.
      
      In a ceph installation that provides multiple ext4 formatted rbd
      devices "bad crc" messages appeared regularly (ca 1 message every 1-2
      minutes on every OSD that provided the data for the rbd) in the
      OSD-logs before this patch. After this patch this messages are pretty
      much gone (only ca 1-2 / month / OSD).
      
      Cc: stable@vger.kernel.org # 3.9+, needs backporting
      Signed-off-by: default avatarRonny Hegewald <Ronny.Hegewald@online.de>
      [idryomov@gmail.com: require stable pages only in crc case, changelog]
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      bae818ee
  6. 29 Oct, 2015 6 commits
  7. 28 Oct, 2015 4 commits
    • Émeric MASCHINO's avatar
      [IA64] Wire up kcmp syscall · d305c477
      Émeric MASCHINO authored
      systemd > 218 fails to compile on ia64 with:
      
           error: ‘__NR_kcmp’ undeclared [1].
      
      I've been told that this is because the kcmp syscall hasn't been wired up
      for the ia64 arch [2].
      
      The proposed patch thus wire up the kcmp syscall for the ia64 arch.
      
      [1] https://bugs.gentoo.org/show_bug.cgi?id=560492
      [2] https://bugs.gentoo.org/show_bug.cgi?id=560492#c17Signed-off-by: default avatarÉmeric MASCHINO <emeric.maschino@gmail.com>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      d305c477
    • Tony Lindgren's avatar
      usb: musb: omap2430: Fix regression caused by driver core change · 8f2279d5
      Tony Lindgren authored
      Commit ddef08dd ("Driver core: wakeup the parent device before trying
      probe") started automatically ensuring the parent device is enabled when
      the child gets probed.
      
      This however caused a regression for MUSB omap2430 interface as the
      runtime PM for the parent device needs the child initialized to access
      the MUSB hardware registers.
      
      Let's delay the enabling of PM runtime for the parent until the child
      has been properly initialized as suggested in an earlier patch by
      Grygorii Strashko <grygorii.strashko@ti.com>.
      
      In addition to delaying pm_runtime_enable, we now also need to make sure
      the parent is enabled during omap2430_musb_init. We also want to propagate
      an error from omap2430_runtime_resume if struct musb is not initialized.
      
      Note that we use pm_runtime_put_noidle here for both the child and parent
      to prevent an extra runtime_suspend/resume cycle.
      
      Let's also add some comments to avoid confusion between the
      two different devices.
      
      Fixes: ddef08dd ("Driver core: wakeup the parent device before
      trying probe")
      Suggested-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Reviewed-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Acked-by: default avatarFelipe Balbi <balbi@ti.com>
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      8f2279d5
    • Will Deacon's avatar
      Revert "ARM64: unwind: Fix PC calculation" · 9702970c
      Will Deacon authored
      This reverts commit e306dfd0.
      
      With this patch applied, we were the only architecture making this sort
      of adjustment to the PC calculation in the unwinder. This causes
      problems for ftrace, where the PC values are matched against the
      contents of the stack frames in the callchain and fail to match any
      records after the address adjustment.
      
      Whilst there has been some effort to change ftrace to workaround this,
      those patches are not yet ready for mainline and, since we're the odd
      architecture in this regard, let's just step in line with other
      architectures (like arch/arm/) for now.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      9702970c
    • Lorenzo Pieralisi's avatar
      arm64: kernel: fix tcr_el1.t0sz restore on systems with extended idmap · e13d918a
      Lorenzo Pieralisi authored
      Commit dd006da2 ("arm64: mm: increase VA range of identity map")
      introduced a mechanism to extend the virtual memory map range
      to support arm64 systems with system RAM located at very high offset,
      where the identity mapping used to enable/disable the MMU requires
      additional translation levels to map the physical memory at an equal
      virtual offset.
      
      The kernel detects at boot time the tcr_el1.t0sz value required by the
      identity mapping and sets-up the tcr_el1.t0sz register field accordingly,
      any time the identity map is required in the kernel (ie when enabling the
      MMU).
      
      After enabling the MMU, in the cold boot path the kernel resets the
      tcr_el1.t0sz to its default value (ie the actual configuration value for
      the system virtual address space) so that after enabling the MMU the
      memory space translated by ttbr0_el1 is restored as expected.
      
      Commit dd006da2 ("arm64: mm: increase VA range of identity map")
      also added code to set-up the tcr_el1.t0sz value when the kernel resumes
      from low-power states with the MMU off through cpu_resume() in order to
      effectively use the identity mapping to enable the MMU but failed to add
      the code required to restore the tcr_el1.t0sz to its default value, when
      the core returns to the kernel with the MMU enabled, so that the kernel
      might end up running with tcr_el1.t0sz value set-up for the identity
      mapping which can be lower than the value required by the actual virtual
      address space, resulting in an erroneous set-up.
      
      This patchs adds code in the resume path that restores the tcr_el1.t0sz
      default value upon core resume, mirroring this way the cold boot path
      behaviour therefore fixing the issue.
      
      Cc: <stable@vger.kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Fixes: dd006da2 ("arm64: mm: increase VA range of identity map")
      Acked-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: default avatarJames Morse <james.morse@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      e13d918a