1. 24 Nov, 2016 5 commits
  2. 09 Nov, 2016 1 commit
    • Brian Foster's avatar
      xfs: fix unbalanced inode reclaim flush locking · 98efe8af
      Brian Foster authored
      Filesystem shutdown testing on an older distro kernel has uncovered an
      imbalanced locking pattern for the inode flush lock in
      xfs_reclaim_inode(). Specifically, there is a double unlock sequence
      between the call to xfs_iflush_abort() and xfs_reclaim_inode() at the
      "reclaim:" label.
      
      This actually does not cause obvious problems on current kernels due to
      the current flush lock implementation. Older kernels use a counting
      based flush lock mechanism, however, which effectively breaks the lock
      indefinitely when an already unlocked flush lock is repeatedly unlocked.
      Though this only currently occurs on filesystem shutdown, it has
      reproduced the effect of elevating an fs shutdown to a system-wide crash
      or hang.
      
      As it turns out, the flush lock is not actually required for the reclaim
      logic in xfs_reclaim_inode() because by that time we have already cycled
      the flush lock once while holding ILOCK_EXCL. Therefore, remove the
      additional flush lock/unlock cycle around the 'reclaim:' label and
      update branches into this label to release the flush lock where
      appropriate. Add an assert to xfs_ifunlock() to help prevent future
      occurences of the same problem.
      Reported-by: default avatarZorro Lang <zlang@redhat.com>
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      
      98efe8af
  3. 08 Nov, 2016 4 commits
    • Eric Sandeen's avatar
      xfs: provide helper for counting extents from if_bytes · 5d829300
      Eric Sandeen authored
      The open-coded pattern:
      
      ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)
      
      is all over the xfs code; provide a new helper
      xfs_iext_count(ifp) to count the number of inline extents
      in an inode fork.
      
      [dchinner: pick up several missed conversions]
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      5d829300
    • Eric Sandeen's avatar
      xfs: fix up xfs_swap_extent_forks inline extent handling · 4dfce57d
      Eric Sandeen authored
      There have been several reports over the years of NULL pointer
      dereferences in xfs_trans_log_inode during xfs_fsr processes,
      when the process is doing an fput and tearing down extents
      on the temporary inode, something like:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
          [exception RIP: xfs_trans_log_inode+0x10]
       #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
      #10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
      #11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
      #12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
      #13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
      #14 [ffff8800a57bbe00] evict at ffffffff811e1b67
      #15 [ffff8800a57bbe28] iput at ffffffff811e23a5
      #16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
      #17 [ffff8800a57bbe88] dput at ffffffff811dd06c
      #18 [ffff8800a57bbea8] __fput at ffffffff811c823b
      #19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
      #20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
      #21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
      #22 [ffff8800a57bbf50] int_signal at ffffffff8161405d
      
      As it turns out, this is because the i_itemp pointer, along
      with the d_ops pointer, has been overwritten with zeros
      when we tear down the extents during truncate.  When the in-core
      inode fork on the temporary inode used by xfs_fsr was originally
      set up during the extent swap, we mistakenly looked at di_nextents
      to determine whether all extents fit inline, but this misses extents
      generated by speculative preallocation; we should be using if_bytes
      instead.
      
      This mistake corrupts the in-memory inode, and code in
      xfs_iext_remove_inline eventually gets bad inputs, causing
      it to memmove and memset incorrect ranges; this became apparent
      because the two values in ifp->if_u2.if_inline_ext[1] contained
      what should have been in d_ops and i_itemp; they were memmoved due
      to incorrect array indexing and then the original locations
      were zeroed with memset, again due to an array overrun.
      
      Fix this by properly using i_df.if_bytes to determine the number
      of extents, not di_nextents.
      
      Thanks to dchinner for looking at this with me and spotting the
      root cause.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      4dfce57d
    • Brian Foster's avatar
      xfs: don't BUG() on mixed direct and mapped I/O · 04197b34
      Brian Foster authored
      We've had reports of generic/095 causing XFS to BUG() in
      __xfs_get_blocks() due to the existence of delalloc blocks on a
      direct I/O read. generic/095 issues a mix of various types of I/O,
      including direct and memory mapped I/O to a single file. This is
      clearly not supported behavior and is known to lead to such
      problems. E.g., the lack of exclusion between the direct I/O and
      write fault paths means that a write fault can allocate delalloc
      blocks in a region of a file that was previously a hole after the
      direct read has attempted to flush/inval the file range, but before
      it actually reads the block mapping. In turn, the direct read
      discovers a delalloc extent and cannot proceed.
      
      While the appropriate solution here is to not mix direct and memory
      mapped I/O to the same regions of the same file, the current
      BUG_ON() behavior is probably overkill as it can crash the entire
      system.  Instead, localize the failure to the I/O in question by
      returning an error for a direct I/O that cannot be handled safely
      due to delalloc blocks. Be careful to allow the case of a direct
      write to post-eof delalloc blocks. This can occur due to speculative
      preallocation and is safe as post-eof blocks are not accompanied by
      dirty pages in pagecache (conversely, preallocation within eof must
      have been zeroed, and thus dirtied, before the inode size could have
      been increased beyond said blocks).
      
      Finally, provide an additional warning if a direct I/O write occurs
      while the file is memory mapped. This may not catch all problematic
      scenarios, but provides a hint that some known-to-be-problematic I/O
      methods are in use.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      04197b34
    • Brian Foster's avatar
      xfs: don't skip cow forks w/ delalloc blocks in cowblocks scan · 39937234
      Brian Foster authored
      The cowblocks background scanner currently clears the cowblocks tag
      for inodes without any real allocations in the cow fork. This
      excludes inodes with only delalloc blocks in the cow fork. While we
      might never expect to clear delalloc blocks from the cow fork in the
      background scanner, it is not necessarily correct to clear the
      cowblocks tag from such inodes.
      
      For example, if the background scanner happens to process an inode
      between a buffered write and writeback, the scanner catches the
      inode in a state after delalloc blocks have been allocated to the
      cow fork but before the delalloc blocks have been converted to real
      blocks by writeback. The background scanner then incorrectly clears
      the cowblocks tag, even if part of the aforementioned delalloc
      reservation will not be remapped to the data fork (i.e., extra
      blocks due to the cowextsize hint). This means that any such
      additional blocks in the cow fork might never be reclaimed by the
      background scanner and could persist until the inode itself is
      reclaimed.
      
      To address this problem, only skip and clear inodes without any cow
      fork allocations whatsoever from the background scanner. While we
      generally do not want to cancel delalloc reservations from the
      background scanner, the pagecache dirty check following the
      cowblocks check should prevent that situation. If we do end up with
      delalloc cow fork blocks without a dirty address space mapping, this
      is probably an indication that something has gone wrong and the
      blocks should be reclaimed, as they may never be converted to a real
      allocation.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      39937234
  4. 24 Oct, 2016 4 commits
  5. 20 Oct, 2016 22 commits
  6. 15 Oct, 2016 4 commits
    • Linus Torvalds's avatar
      Linux 4.9-rc1 · 1001354c
      Linus Torvalds authored
      1001354c
    • Linus Torvalds's avatar
      Merge tag 'befs-v4.9-rc1' of git://github.com/luisbg/linux-befs · df34d04a
      Linus Torvalds authored
      Pull befs fixes from Luis de Bethencourt:
       "I recently took maintainership of the befs file system [0]. This is
        the first time I send you a git pull request, so please let me know if
        all the below is OK.
      
        Salah Triki and myself have been cleaning the code and fixing a few
        small bugs.
      
        Sorry I couldn't send this sooner in the merge window, I was waiting
        to have my GPG key signed by kernel members at ELCE in Berlin a few
        days ago."
      
      [0] https://lkml.org/lkml/2016/7/27/502
      
      * tag 'befs-v4.9-rc1' of git://github.com/luisbg/linux-befs: (39 commits)
        befs: befs: fix style issues in datastream.c
        befs: improve documentation in datastream.c
        befs: fix typos in datastream.c
        befs: fix typos in btree.c
        befs: fix style issues in super.c
        befs: fix comment style
        befs: add check for ag_shift in superblock
        befs: dump inode_size superblock information
        befs: remove unnecessary initialization
        befs: fix typo in befs_sb_info
        befs: add flags field to validate superblock state
        befs: fix typo in befs_find_key
        befs: remove unused BEFS_BT_PARMATCH
        fs: befs: remove ret variable
        fs: befs: remove in vain variable assignment
        fs: befs: remove unnecessary *befs_sb variable
        fs: befs: remove useless initialization to zero
        fs: befs: remove in vain variable assignment
        fs: befs: Insert NULL inode to dentry
        fs: befs: Remove useless calls to brelse in befs_find_brun_dblindirect
        ...
      df34d04a
    • Linus Torvalds's avatar
      Merge tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 9ffc6694
      Linus Torvalds authored
      Pull gcc plugins update from Kees Cook:
       "This adds a new gcc plugin named "latent_entropy". It is designed to
        extract as much possible uncertainty from a running system at boot
        time as possible, hoping to capitalize on any possible variation in
        CPU operation (due to runtime data differences, hardware differences,
        SMP ordering, thermal timing variation, cache behavior, etc).
      
        At the very least, this plugin is a much more comprehensive example
        for how to manipulate kernel code using the gcc plugin internals"
      
      * tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        latent_entropy: Mark functions with __latent_entropy
        gcc-plugins: Add latent_entropy plugin
      9ffc6694
    • Linus Torvalds's avatar
      Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 133d970e
      Linus Torvalds authored
      Pull MIPS updates from Ralf Baechle:
       "This is the main MIPS pull request for 4.9:
      
        MIPS core arch code:
         - traps: 64bit kernels should read CP0_EBase 64bit
         - traps: Convert ebase to KSEG0
         - c-r4k: Drop bc_wback_inv() from icache flush
         - c-r4k: Split user/kernel flush_icache_range()
         - cacheflush: Use __flush_icache_user_range()
         - uprobes: Flush icache via kernel address
         - KVM: Use __local_flush_icache_user_range()
         - c-r4k: Fix flush_icache_range() for EVA
         - Fix -mabi=64 build of vdso.lds
         - VDSO: Drop duplicated -I*/-E* aflags
         - tracing: move insn_has_delay_slot to a shared header
         - tracing: disable uprobe/kprobe on compact branch instructions
         - ptrace: Fix regs_return_value for kernel context
         - Squash lines for simple wrapper functions
         - Move identification of VP(E) into proc.c from smp-mt.c
         - Add definitions of SYNC barrierstype values
         - traps: Ensure full EBase is written
         - tlb-r4k: If there are wired entries, don't use TLBINVF
         - Sanitise coherentio semantics
         - dma-default: Don't check hw_coherentio if device is non-coherent
         - Support per-device DMA coherence
         - Adjust MIPS64 CAC_BASE to reflect Config.K0
         - Support generating Flattened Image Trees (.itb)
         - generic: Introduce generic DT-based board support
         - generic: Convert SEAD-3 to a generic board
         - Enable hardened usercopy
         - Don't specify STACKPROTECTOR in defconfigs
      
        Octeon:
         - Delete dead code and files across the platform.
         - Change to use all memory into use by default.
         - Rename upper case variables in setup code to lowercase.
         - Delete legacy hack for broken bootloaders.
         - Leave maintaining the link state to the actual ethernet/PHY drivers.
         - Add DTS for D-Link DSR-500N.
         - Fix PCI interrupt routing on D-Link DSR-500N.
      
        Pistachio:
         - Remove ANDROID_TIMED_OUTPUT from defconfig
      
        TX39xx:
         - Move GPIO setup from .mem_setup() to .arch_init()
         - Convert to Common Clock Framework
      
        TX49xx:
         - Move GPIO setup from .mem_setup() to .arch_init()
         - Convert to Common Clock Framework
      
        txx9wdt:
         - Add missing clock (un)prepare calls for CCF
      
        BMIPS:
         - Add PW, GPIO SDHCI and NAND device node names
         - Support APPENDED_DTB
         - Add missing bcm97435svmb to DT_NONE
         - Rename bcm96358nb4ser to bcm6358-neufbox4-sercom
         - Add DT examples for BCM63268, BCM3368 and BCM6362
         - Add support for BCM3368 and BCM6362
      
        PCI
         - Reduce stack frame usage
         - Use struct list_head lists
         - Support for CONFIG_PCI_DOMAINS_GENERIC
         - Make pcibios_set_cache_line_size an initcall
         - Inline pcibios_assign_all_busses
         - Split pci.c into pci.c & pci-legacy.c
         - Introduce CONFIG_PCI_DRIVERS_LEGACY
         - Support generic drivers
      
        CPC
         - Convert bare 'unsigned' to 'unsigned int'
         - Avoid lock when MIPS CM >= 3 is present
      
        GIC:
         - Delete unused file smp-gic.c
      
        mt7620:
         - Delete unnecessary assignment for the field "owner" from PCI
      
        BCM63xx:
         - Let clk_disable() return immediately if clk is NULL
      
        pm-cps:
         - Change FSB workaround to CPU blacklist
         - Update comments on barrier instructions
         - Use MIPS standard lightweight ordering barrier
         - Use MIPS standard completion barrier
         - Remove selection of sync types
         - Add MIPSr6 CPU support
         - Support CM3 changes to Coherence Enable Register
      
        SMP:
         - Wrap call to mips_cpc_lock_other in mips_cm_lock_other
         - Introduce mechanism for freeing and allocating IPIs
      
        cpuidle:
         - cpuidle-cps: Enable use with MIPSr6 CPUs.
      
        SEAD3:
         - Rewrite to use DT and generic kernel feature.
      
        USB:
         - host: ehci-sead3: Remove SEAD-3 EHCI code
      
        FBDEV:
         - cobalt_lcdfb: Drop SEAD3 support
      
        dt-bindings:
         -  Document a binding for simple ASCII LCDs
      
        auxdisplay:
         - img-ascii-lcd: driver for simple ASCII LCD displays
      
        irqchip i8259:
         - i8259: Add domain before mapping parent irq
         - i8259: Allow platforms to override poll function
         - i8259: Remove unused i8259A_irq_pending
      
        Malta:
         - Rewrite to use DT
      
        of/platform:
         - Probe "isa" busses by default
      
        CM:
         - Print CM error reports upon bus errors
      
        Module:
         - Migrate exception table users off module.h and onto extable.h
         - Make various drivers explicitly non-modular:
         - Audit and remove any unnecessary uses of module.h
      
        mailmap:
         - Canonicalize to Qais' current email address.
      
        Documentation:
         - MIPS supports HAVE_REGS_AND_STACK_ACCESS_API
      
        Loongson1C:
         - Add CPU support for Loongson1C
         - Add board support
         - Add defconfig
         - Add RTC support for Loongson1C board
      
        All this except one Documentation fix has sat in linux-next and has
        survived Imagination's automated build test system"
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (127 commits)
        Documentation: MIPS supports HAVE_REGS_AND_STACK_ACCESS_API
        MIPS: ptrace: Fix regs_return_value for kernel context
        MIPS: VDSO: Drop duplicated -I*/-E* aflags
        MIPS: Fix -mabi=64 build of vdso.lds
        MIPS: Enable hardened usercopy
        MIPS: generic: Convert SEAD-3 to a generic board
        MIPS: generic: Introduce generic DT-based board support
        MIPS: Support generating Flattened Image Trees (.itb)
        MIPS: Adjust MIPS64 CAC_BASE to reflect Config.K0
        MIPS: Print CM error reports upon bus errors
        MIPS: Support per-device DMA coherence
        MIPS: dma-default: Don't check hw_coherentio if device is non-coherent
        MIPS: Sanitise coherentio semantics
        MIPS: PCI: Support generic drivers
        MIPS: PCI: Introduce CONFIG_PCI_DRIVERS_LEGACY
        MIPS: PCI: Split pci.c into pci.c & pci-legacy.c
        MIPS: PCI: Inline pcibios_assign_all_busses
        MIPS: PCI: Make pcibios_set_cache_line_size an initcall
        MIPS: PCI: Support for CONFIG_PCI_DOMAINS_GENERIC
        MIPS: PCI: Use struct list_head lists
        ...
      133d970e