1. 17 Feb, 2017 1 commit
    • Brian Foster's avatar
      xfs: clear delalloc and cache on buffered write failure · fa7f138a
      Brian Foster authored
      The buffered write failure handling code in
      xfs_file_iomap_end_delalloc() has a couple minor problems. First, if
      written == 0, start_fsb is not rounded down and it fails to kill off a
      delalloc block if the start offset is block unaligned. This results in a
      lingering delalloc block and broken delalloc block accounting detected
      at unmount time. Fix this by rounding down start_fsb in the unlikely
      event that written == 0.
      
      Second, it is possible for a failed overwrite of a delalloc extent to
      leave dirty pagecache around over a hole in the file. This is because is
      possible to hit ->iomap_end() on write failure before the iomap code has
      attempted to allocate pagecache, and thus has no need to clean it up. If
      the targeted delalloc extent was successfully written by a previous
      write, however, then it does still have dirty pages when ->iomap_end()
      punches out the underlying blocks. This ultimately results in writeback
      over a hole. To fix this problem, unconditionally punch out the
      pagecache from XFS before the associated delalloc range.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      fa7f138a
  2. 09 Feb, 2017 6 commits
  3. 07 Feb, 2017 5 commits
  4. 03 Feb, 2017 1 commit
  5. 02 Feb, 2017 7 commits
    • Darrick J. Wong's avatar
      xfs: mark speculative prealloc CoW fork extents unwritten · 5eda4300
      Darrick J. Wong authored
      Christoph Hellwig pointed out that there's a potentially nasty race when
      performing simultaneous nearby directio cow writes:
      
      "Thread 1 writes a range from B to c
      
      "                    B --------- C
                                 p
      
      "a little later thread 2 writes from A to B
      
      "        A --------- B
                     p
      
      [editor's note: the 'p' denote cowextsize boundaries, which I added to
      make this more clear]
      
      "but the code preallocates beyond B into the range where thread
      "1 has just written, but ->end_io hasn't been called yet.
      "But once ->end_io is called thread 2 has already allocated
      "up to the extent size hint into the write range of thread 1,
      "so the end_io handler will splice the unintialized blocks from
      "that preallocation back into the file right after B."
      
      We can avoid this race by ensuring that thread 1 cannot accidentally
      remap the blocks that thread 2 allocated (as part of speculative
      preallocation) as part of t2's write preparation in t1's end_io handler.
      The way we make this happen is by taking advantage of the unwritten
      extent flag as an intermediate step.
      
      Recall that when we begin the process of writing data to shared blocks,
      we create a delayed allocation extent in the CoW fork:
      
      D: --RRRRRRSSSRRRRRRRR---
      C: ------DDDDDDD---------
      
      When a thread prepares to CoW some dirty data out to disk, it will now
      convert the delalloc reservation into an /unwritten/ allocated extent in
      the cow fork.  The da conversion code tries to opportunistically
      allocate as much of a (speculatively prealloc'd) extent as possible, so
      we may end up allocating a larger extent than we're actually writing
      out:
      
      D: --RRRRRRSSSRRRRRRRR---
      U: ------UUUUUUU---------
      
      Next, we convert only the part of the extent that we're actively
      planning to write to normal (i.e. not unwritten) status:
      
      D: --RRRRRRSSSRRRRRRRR---
      U: ------UURRUUU---------
      
      If the write succeeds, the end_cow function will now scan the relevant
      range of the CoW fork for real extents and remap only the real extents
      into the data fork:
      
      D: --RRRRRRRRSRRRRRRRR---
      U: ------UU--UUU---------
      
      This ensures that we never obliterate valid data fork extents with
      unwritten blocks from the CoW fork.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      5eda4300
    • Darrick J. Wong's avatar
      xfs: allow unwritten extents in the CoW fork · 05a630d7
      Darrick J. Wong authored
      In the data fork, we only allow extents to perform the following state
      transitions:
      
      delay -> real <-> unwritten
      
      There's no way to move directly from a delalloc reservation to an
      /unwritten/ allocated extent.  However, for the CoW fork we want to be
      able to do the following to each extent:
      
      delalloc -> unwritten -> written -> remapped to data fork
      
      This will help us to avoid a race in the speculative CoW preallocation
      code between a first thread that is allocating a CoW extent and a second
      thread that is remapping part of a file after a write.  In order to do
      this, however, we need two things: first, we have to be able to
      transition from da to unwritten, and second the function that converts
      between real and unwritten has to be made aware of the cow fork.  Do
      both of those things.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      05a630d7
    • Darrick J. Wong's avatar
      xfs: verify free block header fields · de14c5f5
      Darrick J. Wong authored
      Perform basic sanity checking of the directory free block header
      fields so that we avoid hanging the system on invalid data.
      
      (Granted that just means that now we shutdown on directory write,
      but that seems better than hanging...)
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      de14c5f5
    • Darrick J. Wong's avatar
      xfs: check for obviously bad level values in the bmbt root · b3bf607d
      Darrick J. Wong authored
      We can't handle a bmbt that's taller than BTREE_MAXLEVELS, and there's
      no such thing as a zero-level bmbt (for that we have extents format),
      so if we see this, send back an error code.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      b3bf607d
    • Darrick J. Wong's avatar
      xfs: filter out obviously bad btree pointers · d5a91bae
      Darrick J. Wong authored
      Don't let anybody load an obviously bad btree pointer.  Since the values
      come from disk, we must return an error, not just ASSERT.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      d5a91bae
    • Darrick J. Wong's avatar
      xfs: fail _dir_open when readahead fails · 7a652bbe
      Darrick J. Wong authored
      When we open a directory, we try to readahead block 0 of the directory
      on the assumption that we're going to need it soon.  If the bmbt is
      corrupt, the directory will never be usable and the readahead fails
      immediately, so we might as well prevent the directory from being opened
      at all.  This prevents a subsequent read or modify operation from
      hitting it and taking the fs offline.
      
      NOTE: We're only checking for early failures in the block mapping, not
      the readahead directory block itself.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      7a652bbe
    • Darrick J. Wong's avatar
      xfs: fix toctou race when locking an inode to access the data map · 4b5bd5bf
      Darrick J. Wong authored
      We use di_format and if_flags to decide whether we're grabbing the ilock
      in btree mode (btree extents not loaded) or shared mode (anything else),
      but the state of those fields can be changed by other threads that are
      also trying to load the btree extents -- IFEXTENTS gets set before the
      _bmap_read_extents call and cleared if it fails.
      
      We don't actually need to have IFEXTENTS set until after the bmbt
      records are successfully loaded and validated, which will fix the race
      between multiple threads trying to read the same directory.  The next
      patch strengthens directory bmbt validation by refusing to open the
      directory if reading the bmbt to start directory readahead fails.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      4b5bd5bf
  6. 31 Jan, 2017 9 commits
  7. 29 Jan, 2017 3 commits
    • Linus Torvalds's avatar
      Linux 4.10-rc6 · 566cf877
      Linus Torvalds authored
      566cf877
    • Linus Torvalds's avatar
      drm/i915: Check for NULL i915_vma in intel_unpin_fb_obj() · 39cb2c9a
      Linus Torvalds authored
      I've seen this trigger twice now, where the i915_gem_object_to_ggtt()
      call in intel_unpin_fb_obj() returns NULL, resulting in an oops
      immediately afterwards as the (inlined) call to i915_vma_unpin_fence()
      tries to dereference it.
      
      It seems to be some race condition where the object is going away at
      shutdown time, since both times happened when shutting down the X
      server.  The call chains were different:
      
       - VT ioctl(KDSETMODE, KD_TEXT):
      
          intel_cleanup_plane_fb+0x5b/0xa0 [i915]
          drm_atomic_helper_cleanup_planes+0x6f/0x90 [drm_kms_helper]
          intel_atomic_commit_tail+0x749/0xfe0 [i915]
          intel_atomic_commit+0x3cb/0x4f0 [i915]
          drm_atomic_commit+0x4b/0x50 [drm]
          restore_fbdev_mode+0x14c/0x2a0 [drm_kms_helper]
          drm_fb_helper_restore_fbdev_mode_unlocked+0x34/0x80 [drm_kms_helper]
          drm_fb_helper_set_par+0x2d/0x60 [drm_kms_helper]
          intel_fbdev_set_par+0x18/0x70 [i915]
          fb_set_var+0x236/0x460
          fbcon_blank+0x30f/0x350
          do_unblank_screen+0xd2/0x1a0
          vt_ioctl+0x507/0x12a0
          tty_ioctl+0x355/0xc30
          do_vfs_ioctl+0xa3/0x5e0
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x13/0x94
      
       - i915 unpin_work workqueue:
      
          intel_unpin_work_fn+0x58/0x140 [i915]
          process_one_work+0x1f1/0x480
          worker_thread+0x48/0x4d0
          kthread+0x101/0x140
      
      and this patch purely papers over the issue by adding a NULL pointer
      check and a WARN_ON_ONCE() to avoid the oops that would then generally
      make the machine unresponsive.
      
      Other callers of i915_gem_object_to_ggtt() seem to also check for the
      returned pointer being NULL and warn about it, so this clearly has
      happened before in other places.
      
      [ Reported it originally to the i915 developers on Jan 8, applying the
        ugly workaround on my own now after triggering the problem for the
        second time with no feedback.
      
        This is likely to be the same bug reported as
      
           https://bugs.freedesktop.org/show_bug.cgi?id=98829
           https://bugs.freedesktop.org/show_bug.cgi?id=99134
      
        which has a patch for the underlying problem, but it hasn't gotten to
        me, so I'm applying the workaround. ]
      
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Imre Deak <imre.deak@intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      39cb2c9a
    • Linus Torvalds's avatar
      Merge branch 'parisc-4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 2c5d9555
      Linus Torvalds authored
      Pull two parisc fixes from Helge Deller:
       "One fix to avoid usage of BITS_PER_LONG in user-space exported swab.h
        header which breaks compiling qemu, and one trivial fix for printk
        continuation in the parisc parport driver"
      
      * 'parisc-4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Don't use BITS_PER_LONG in userspace-exported swab.h header
        parisc, parport_gsc: Fixes for printk continuation lines
      2c5d9555
  8. 28 Jan, 2017 7 commits
  9. 27 Jan, 2017 1 commit
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 1b1bc42c
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) GTP fixes from Andreas Schultz (missing genl module alias, clear IP
          DF on transmit).
      
       2) Netfilter needs to reflect the fwmark when sending resets, from Pau
          Espin Pedrol.
      
       3) nftable dump OOPS fix from Liping Zhang.
      
       4) Fix erroneous setting of VIRTIO_NET_HDR_F_DATA_VALID on transmit,
          from Rolf Neugebauer.
      
       5) Fix build error of ipt_CLUSTERIP when procfs is disabled, from Arnd
          Bergmann.
      
       6) Fix regression in handling of NETIF_F_SG in harmonize_features(),
          from Eric Dumazet.
      
       7) Fix RTNL deadlock wrt. lwtunnel module loading, from David Ahern.
      
       8) tcp_fastopen_create_child() needs to setup tp->max_window, from
          Alexey Kodanev.
      
       9) Missing kmemdup() failure check in ipv6 segment routing code, from
          Eric Dumazet.
      
      10) Don't execute unix_bind() under the bindlock, otherwise we deadlock
          with splice. From WANG Cong.
      
      11) ip6_tnl_parse_tlv_enc_lim() potentially reallocates the skb buffer,
          therefore callers must reload cached header pointers into that skb.
          Fix from Eric Dumazet.
      
      12) Fix various bugs in legacy IRQ fallback handling in alx driver, from
          Tobias Regnery.
      
      13) Do not allow lwtunnel drivers to be unloaded while they are
          referenced by active instances, from Robert Shearman.
      
      14) Fix truncated PHY LED trigger names, from Geert Uytterhoeven.
      
      15) Fix a few regressions from virtio_net XDP support, from John
          Fastabend and Jakub Kicinski.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (102 commits)
        ISDN: eicon: silence misleading array-bounds warning
        net: phy: micrel: add support for KSZ8795
        gtp: fix cross netns recv on gtp socket
        gtp: clear DF bit on GTP packet tx
        gtp: add genl family modules alias
        tcp: don't annotate mark on control socket from tcp_v6_send_response()
        ravb: unmap descriptors when freeing rings
        virtio_net: reject XDP programs using header adjustment
        virtio_net: use dev_kfree_skb for small buffer XDP receive
        r8152: check rx after napi is enabled
        r8152: re-schedule napi for tx
        r8152: avoid start_xmit to schedule napi when napi is disabled
        r8152: avoid start_xmit to call napi_schedule during autosuspend
        net: dsa: Bring back device detaching in dsa_slave_suspend()
        net: phy: leds: Fix truncated LED trigger names
        net: phy: leds: Break dependency of phy.h on phy_led_triggers.h
        net: phy: leds: Clear phy_num_led_triggers on failure to avoid crash
        net-next: ethernet: mediatek: change the compatible string
        Documentation: devicetree: change the mediatek ethernet compatible string
        bnxt_en: Fix RTNL lock usage on bnxt_get_port_module_status().
        ...
      1b1bc42c