1. 08 Apr, 2017 14 commits
    • Darrick J. Wong
      xfs: mark speculative prealloc CoW fork extents unwritten · e02f0ff2
      Darrick J. Wong authored
      commit 5eda4300 upstream.
      
      Christoph Hellwig pointed out that there's a potentially nasty race when
      performing simultaneous nearby directio cow writes:
      
      "Thread 1 writes a range from B to C
      
      "                    B --------- C
                                 p
      
      "a little later thread 2 writes from A to B
      
      "        A --------- B
                     p
      
      [editor's note: the 'p' marks denote cowextsize boundaries, which I
      added to make this clearer]
      
      "but the code preallocates beyond B into the range where thread
      "1 has just written, but ->end_io hasn't been called yet.
      "But once ->end_io is called thread 2 has already allocated
      "up to the extent size hint into the write range of thread 1,
      "so the end_io handler will splice the uninitialized blocks from
      "that preallocation back into the file right after B."
      
      We can avoid this race by ensuring that thread 1's end_io handler
      cannot accidentally remap the blocks that thread 2 allocated (as
      speculative preallocation) while preparing its own write.  We make
      this happen by using the unwritten extent flag as an intermediate
      step.
      
      Recall that when we begin the process of writing data to shared blocks,
      we create a delayed allocation extent in the CoW fork:
      
      D: --RRRRRRSSSRRRRRRRR---
      C: ------DDDDDDD---------
      
      When a thread prepares to CoW some dirty data out to disk, it will now
      convert the delalloc reservation into an /unwritten/ allocated extent in
      the cow fork.  The da conversion code tries to opportunistically
      allocate as much of a (speculatively prealloc'd) extent as possible, so
      we may end up allocating a larger extent than we're actually writing
      out:
      
      D: --RRRRRRSSSRRRRRRRR---
      U: ------UUUUUUU---------
      
      Next, we convert only the part of the extent that we're actively
      planning to write to normal (i.e. not unwritten) status:
      
      D: --RRRRRRSSSRRRRRRRR---
      U: ------UURRUUU---------
      
      If the write succeeds, the end_cow function will now scan the relevant
      range of the CoW fork for real extents and remap only the real extents
      into the data fork:
      
      D: --RRRRRRRRSRRRRRRRR---
      U: ------UU--UUU---------
      
      This ensures that we never obliterate valid data fork extents with
      unwritten blocks from the CoW fork.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
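      The unwritten-extent scheme described in this commit can be modelled
      with the same one-character-per-block notation its diagrams use.  This
      is a hypothetical userspace sketch, not XFS code: the helper names
      (cow_alloc_unwritten, cow_convert_write_range, cow_end_io) and the
      character encoding are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: one char per block, mirroring the diagrams.
 * Data fork: 'R' = real extent, 'S' = shared extent, '-' = hole.
 * CoW fork:  'D' = delalloc, 'U' = unwritten, 'R' = real, '-' = empty. */
#define NBLOCKS 22

/* Step 1: convert a delalloc reservation in the CoW fork to unwritten
 * blocks (possibly more than is being written, due to preallocation). */
static void cow_alloc_unwritten(char *cow, int start, int end)
{
	for (int i = start; i < end; i++)
		if (cow[i] == 'D')
			cow[i] = 'U';
}

/* Step 2: mark only the blocks actually being written as real. */
static void cow_convert_write_range(char *cow, int start, int end)
{
	for (int i = start; i < end; i++)
		if (cow[i] == 'U')
			cow[i] = 'R';
}

/* Step 3 (end_cow): remap only real CoW blocks into the data fork;
 * unwritten (speculatively preallocated) blocks stay in the CoW fork. */
static void cow_end_io(char *data, char *cow, int start, int end)
{
	for (int i = start; i < end; i++) {
		if (cow[i] == 'R') {
			data[i] = 'R';
			cow[i] = '-';
		}
	}
}
```

      Only the 'R' blocks in the write range reach the data fork; the 'U'
      blocks left over from speculative preallocation stay in the CoW fork,
      which is the property that closes the race.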
    • Darrick J. Wong
      xfs: allow unwritten extents in the CoW fork · 8370826f
      Darrick J. Wong authored
      commit 05a630d7 upstream.
      
      In the data fork, we only allow extents to perform the following state
      transitions:
      
      delay -> real <-> unwritten
      
      There's no way to move directly from a delalloc reservation to an
      /unwritten/ allocated extent.  However, for the CoW fork we want to be
      able to do the following to each extent:
      
      delalloc -> unwritten -> written -> remapped to data fork
      
      This will help us to avoid a race in the speculative CoW preallocation
      code between a first thread that is allocating a CoW extent and a second
      thread that is remapping part of a file after a write.  In order to do
      this, however, we need two things: first, we have to be able to
      transition from da to unwritten, and second the function that converts
      between real and unwritten has to be made aware of the cow fork.  Do
      both of those things.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
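      The allowed transitions in this commit can be written down as a small
      table.  This is a hypothetical sketch that encodes only the
      transitions named in the message; the enum and function names are
      assumptions, and the real XFS code may permit transitions not listed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical extent states and forks, for illustration only. */
enum ext_state { EXT_DELALLOC, EXT_UNWRITTEN, EXT_REAL, EXT_REMAPPED };
enum fork { DATA_FORK, COW_FORK };

/* Data fork: delay -> real <-> unwritten.
 * CoW fork:  delalloc -> unwritten -> written (real) -> remapped. */
static bool transition_allowed(enum fork f, enum ext_state from,
			       enum ext_state to)
{
	switch (from) {
	case EXT_DELALLOC:
		/* the CoW fork gains the new delalloc -> unwritten step */
		return f == DATA_FORK ? to == EXT_REAL : to == EXT_UNWRITTEN;
	case EXT_UNWRITTEN:
		/* unwritten -> real is valid in both forks */
		return to == EXT_REAL;
	case EXT_REAL:
		/* data fork: real -> unwritten; CoW fork: written -> remapped */
		return f == DATA_FORK ? to == EXT_UNWRITTEN : to == EXT_REMAPPED;
	default:
		return false;
	}
}
```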
    • Darrick J. Wong
      xfs: verify free block header fields · 3d2bd2fd
      Darrick J. Wong authored
      commit de14c5f5 upstream.
      
      Perform basic sanity checking of the directory free block header
      fields so that we avoid hanging the system on invalid data.
      
      (Granted that just means that now we shutdown on directory write,
      but that seems better than hanging...)
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
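      A minimal sketch of the kind of header sanity check this commit
      describes, assuming a simplified header with firstdb, nvalid and nused
      fields; the struct, the expected_firstdb and maxbests parameters, and
      the function name are hypothetical, not the XFS definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified directory free-block header. */
struct free_hdr {
	uint32_t firstdb; /* first data block covered by this free block */
	uint32_t nvalid;  /* number of valid entries in the bests array */
	uint32_t nused;   /* number of those entries in use */
};

/* Returns false for an obviously inconsistent on-disk header, so the
 * caller can fail the read instead of walking bogus counts. */
static bool free_hdr_ok(const struct free_hdr *hdr, uint32_t expected_firstdb,
			uint32_t maxbests)
{
	if (hdr->firstdb != expected_firstdb)
		return false;
	if (hdr->nvalid > maxbests)   /* more entries than fit in a block */
		return false;
	if (hdr->nused > hdr->nvalid) /* more used entries than valid ones */
		return false;
	return true;
}
```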
    • Darrick J. Wong
      xfs: check for obviously bad level values in the bmbt root · 4056a74a
      Darrick J. Wong authored
      commit b3bf607d upstream.
      
      We can't handle a bmbt that's taller than BTREE_MAXLEVELS, and there's
      no such thing as a zero-level bmbt (for that we have extents format),
      so if we see this, send back an error code.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
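      The root-level check in this commit amounts to a simple range test.
      In this sketch SKETCH_BTREE_MAXLEVELS is an illustrative value rather
      than the kernel's constant, and -EINVAL stands in for the corruption
      error XFS actually returns.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative limit only; the kernel defines its own maximum. */
#define SKETCH_BTREE_MAXLEVELS 9

/* Level 0 would mean the fork belongs in extents format, and a tree taller
 * than the maximum cannot be handled, so both are treated as corruption. */
static int check_bmbt_root_level(unsigned int level)
{
	if (level == 0 || level > SKETCH_BTREE_MAXLEVELS)
		return -EINVAL;
	return 0;
}
```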
    • Darrick J. Wong
      xfs: filter out obviously bad btree pointers · efab3ae2
      Darrick J. Wong authored
      commit d5a91bae upstream.
      
      Don't let anybody load an obviously bad btree pointer.  Since the values
      come from disk, we must return an error, not just ASSERT.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
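      The difference between an ASSERT, which only fires on debug kernels,
      and a returned error can be sketched as a pointer validator.  The null
      marker, the fs_blocks bound and the function name are assumptions for
      illustration; the real check depends on the filesystem geometry.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical "no block" marker. */
#define SKETCH_NULLBLOCK UINT64_MAX

/* Validate a child pointer read from disk and report corruption to the
 * caller instead of trusting it. */
static int check_btree_ptr(uint64_t ptr, uint64_t fs_blocks)
{
	if (ptr == SKETCH_NULLBLOCK || ptr >= fs_blocks)
		return -EINVAL;
	return 0;
}
```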
    • Darrick J. Wong
      xfs: fail _dir_open when readahead fails · 7e2dd1fb
      Darrick J. Wong authored
      commit 7a652bbe upstream.
      
      When we open a directory, we try to readahead block 0 of the directory
      on the assumption that we're going to need it soon.  If the bmbt is
      corrupt, the directory will never be usable and the readahead fails
      immediately, so we might as well prevent the directory from being opened
      at all.  This prevents a subsequent read or modify operation from
      hitting it and taking the fs offline.
      
      NOTE: We're only checking for early failures in the block mapping, not
      the readahead directory block itself.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Darrick J. Wong
      xfs: fix toctou race when locking an inode to access the data map · 0a6844ab
      Darrick J. Wong authored
      commit 4b5bd5bf upstream.
      
      We use di_format and if_flags to decide whether we're grabbing the ilock
      in btree mode (btree extents not loaded) or shared mode (anything else),
      but the state of those fields can be changed by other threads that are
      also trying to load the btree extents -- IFEXTENTS gets set before the
      _bmap_read_extents call and cleared if it fails.
      
      We don't actually need to have IFEXTENTS set until after the bmbt
      records are successfully loaded and validated, which will fix the race
      between multiple threads trying to read the same directory.  The next
      patch strengthens directory bmbt validation by refusing to open the
      directory if reading the bmbt to start directory readahead fails.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
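      The ordering fix in this commit can be sketched in a single-threaded
      model: the flag that selects the lock mode is published only after the
      extents have been loaded and validated, so a failed load never leaves
      it set.  The flag value, struct, and loader callbacks are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_IFEXTENTS 0x1 /* hypothetical "extent list loaded" flag */

struct sketch_ifork {
	unsigned int flags;
	int nextents;
};

/* Hypothetical loader: reads and validates records, returns 0 or -1. */
typedef int (*load_fn)(struct sketch_ifork *ifp);

/* Exclusive lock is needed while a btree-format fork is not yet loaded. */
static bool need_excl_lock(const struct sketch_ifork *ifp, bool btree_format)
{
	return btree_format && !(ifp->flags & SKETCH_IFEXTENTS);
}

/* Set the flag only after a successful load, so no other thread can see
 * it set for an extent list that is about to be torn down again. */
static int read_extents(struct sketch_ifork *ifp, load_fn load)
{
	int error;

	if (ifp->flags & SKETCH_IFEXTENTS)
		return 0; /* already loaded */
	error = load(ifp);
	if (error)
		return error; /* flag was never set, nothing to undo */
	ifp->flags |= SKETCH_IFEXTENTS;
	return 0;
}

static int load_ok(struct sketch_ifork *ifp) { ifp->nextents = 3; return 0; }
static int load_corrupt(struct sketch_ifork *ifp) { (void)ifp; return -1; }
```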
    • Brian Foster
      xfs: fix eofblocks race with file extending async dio writes · 4127a5d9
      Brian Foster authored
      commit e4229d6b upstream.
      
      It's possible for post-eof blocks to end up being used for direct I/O
      writes. dio write performs an upfront unwritten extent allocation, sends
      the dio and then updates the inode size (if necessary) on write
      completion. If a file release occurs while a file extending dio write is
      in flight, it is possible to mistake the post-eof blocks for speculative
      preallocation and incorrectly truncate them from the inode. This means
      that the resulting dio write completion can discover a hole and allocate
      new blocks rather than perform unwritten extent conversion.
      
      This requires a strange mix of I/O and is thus not likely to reproduce
      in real world workloads. It is intermittently reproduced by generic/299.
      The error manifests as an assert failure due to transaction overrun
      because the aforementioned write completion transaction has only
      reserved enough blocks for btree operations:
      
        XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, \
         file: fs/xfs//xfs_trans.c, line: 309
      
      The root cause is that xfs_free_eofblocks() uses i_size to truncate
      post-eof blocks from the inode, but async, file extending direct writes
      do not update i_size until write completion, long after inode locks are
      dropped. Therefore, xfs_free_eofblocks() effectively truncates the inode
      to the incorrect size.
      
      Update xfs_free_eofblocks() to serialize against dio similar to how
      extending writes are serialized against i_size updates before post-eof
      block zeroing. Specifically, wait on dio while under the iolock. This
      ensures that dio write completions have updated i_size before post-eof
      blocks are processed.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
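      A hypothetical single-threaded model of this race, with sizes counted
      in blocks: trimming at a stale i_size frees blocks that an in-flight
      extending write still needs, while waiting for dio first lets the
      write completion update i_size.  The struct and function names are
      assumptions, and the real code performs the wait under the iolock.

```c
#include <assert.h>

/* Hypothetical inode model; all quantities are in blocks. */
struct sketch_inode {
	long i_size;       /* size seen by the eofblocks trimming code */
	long allocated;    /* blocks allocated to the file */
	long pending_size; /* size an in-flight extending dio will set */
	int dio_in_flight;
};

/* Stand-in for waiting on dio: the write completes and updates i_size. */
static void sketch_dio_wait(struct sketch_inode *ip)
{
	if (ip->dio_in_flight) {
		if (ip->pending_size > ip->i_size)
			ip->i_size = ip->pending_size;
		ip->dio_in_flight = 0;
	}
}

/* Trim blocks past EOF, optionally serializing against dio first. */
static void sketch_free_eofblocks(struct sketch_inode *ip, int wait_for_dio)
{
	if (wait_for_dio)
		sketch_dio_wait(ip);
	if (ip->allocated > ip->i_size)
		ip->allocated = ip->i_size;
}
```

      Without the wait, blocks 10 to 15 under the in-flight write are freed;
      with it, only the genuine post-EOF blocks beyond 16 are trimmed.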
    • Brian Foster
      xfs: sync eofblocks scans under iolock are livelock prone · 4d725d74
      Brian Foster authored
      commit c3155097 upstream.
      
      The xfs_eofblocks.eof_scan_owner field is an internal field to
      facilitate invoking eofb scans from the kernel while under the iolock.
      This is necessary because the eofb scan acquires the iolock of each
      inode. Synchronous scans are invoked on certain buffered write failures
      while under iolock. In such cases, the scan owner indicates that the
      context for the scan already owns the particular iolock and prevents a
      double lock deadlock.
      
      eofblocks scans while under iolock are still livelock prone in the event
      of multiple parallel scans, however. If multiple buffered writes to
      different inodes fail and invoke eofblocks scans at the same time, each
      scan avoids a deadlock with its own inode by virtue of the
      eof_scan_owner field, but will never be able to acquire the iolock of
      the inode from the parallel scan. Because the low free space scans are
      invoked with SYNC_WAIT, the scan will not return until it has processed
      every tagged inode and thus both scans will spin indefinitely on the
      iolock being held across the opposite scan. This problem can be
      reproduced reliably by generic/224 on systems with higher cpu counts
      (x16).
      
      To avoid this problem, simplify the semantics of eofblocks scans to
      never invoke a scan while under iolock. This means that the buffered
      write context must drop the iolock before the scan. It must reacquire
      the lock before the write retry and also repeat the initial write
      checks, as the original state might no longer be valid once the iolock
      was dropped.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
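      The new rule in this commit (never run the scan while holding the
      iolock) can be sketched with the lock reduced to a flag.  This is a
      hypothetical single-threaded model; the struct, the retry-once policy
      and the function names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical file state; "iolock" is modelled as a flag. */
struct sketch_file {
	bool iolock_held;
	long free_blocks;
	long eof_prealloc; /* speculative preallocation a scan can reclaim */
};

/* The scan always runs without the caller's iolock, so it can take every
 * inode's iolock itself instead of skipping the caller's inode. */
static void sketch_eofblocks_scan(struct sketch_file *f)
{
	assert(!f->iolock_held);
	f->free_blocks += f->eof_prealloc;
	f->eof_prealloc = 0;
}

/* Buffered write: on ENOSPC drop the iolock, scan, retake the lock and
 * redo the initial checks, since state may have changed meanwhile. */
static int sketch_buffered_write(struct sketch_file *f, long need)
{
	bool retried = false;

	f->iolock_held = true;
again:
	/* initial write checks are (re)evaluated here after every relock */
	if (f->free_blocks < need) {
		f->iolock_held = false;
		if (retried)
			return -1; /* still no space after one scan */
		sketch_eofblocks_scan(f);
		f->iolock_held = true;
		retried = true;
		goto again;
	}
	f->free_blocks -= need;
	f->iolock_held = false;
	return 0;
}
```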
    • Brian Foster
      xfs: pull up iolock from xfs_free_eofblocks() · 798b1dc5
      Brian Foster authored
      commit a36b9261 upstream.
      
      xfs_free_eofblocks() requires the IOLOCK_EXCL lock, but is called from
      different contexts where the lock may or may not be held. The
      need_iolock parameter exists for this reason, to indicate whether
      xfs_free_eofblocks() must acquire the iolock itself before it can
      proceed.
      
      This is ugly and confusing. Simplify the semantics of
      xfs_free_eofblocks() to require the caller to acquire the iolock
      appropriately and kill the need_iolock parameter. While here, the mp
      param can be removed as well as the xfs_mount is accessible from the
      xfs_inode structure. This patch does not change behavior.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Christoph Hellwig
      xfs: use per-AG reservations for the finobt · 08a2a268
      Christoph Hellwig authored
      commit 76d771b4 upstream.
      
      Currently we try to rely on the global reserved block pool for block
      allocations for the free inode btree, but I have customer reports
      (fairly complex workload, need to find an easier reproducer) where that
      is not enough as the AG where we free an inode that requires a new
      finobt block is entirely full.  This causes us to cancel a dirty
      transaction and thus a file system shutdown.
      
      I think the right way to guard against this is to treat the finobt the
      same way as the refcount btree and have a per-AG reservation for its
      possible worst-case size, and the patch below implements that.
      
      Note that this could increase mount times with large finobt trees.  In
      an ideal world we would have added a field for the number of finobt
      fields to the AGI, similar to what we did for the refcount blocks.
      We should add it the next time we rev the AGI or AGF format by adding
      new fields.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
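      Sizing a reservation for the worst case means assuming every btree
      block is only minimally full.  A minimal sketch of that arithmetic,
      assuming hypothetical leaf_min and node_min fan-outs; it is not the
      XFS helper, and the record count the real code feeds in is not shown
      here.

```c
#include <assert.h>

/* Worst-case block count for a btree indexing nrecs records when each leaf
 * holds only leaf_min records and each interior node only node_min
 * pointers. Requires leaf_min >= 1 and node_min >= 2 to terminate. */
static unsigned long long btree_worst_case_blocks(unsigned long long nrecs,
						  unsigned int leaf_min,
						  unsigned int node_min)
{
	unsigned long long len = nrecs, total = 0;
	unsigned int per_block = leaf_min;

	if (nrecs == 0)
		return 0;
	do {
		len = (len + per_block - 1) / per_block; /* blocks at this level */
		total += len;
		per_block = node_min; /* levels above the leaves hold pointers */
	} while (len > 1);
	return total;
}
```

      For example, 1000 records at 10 per block need 100 leaves, 10 interior
      nodes and 1 root, or 111 blocks in total.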
    • Christoph Hellwig
      xfs: only update mount/resv fields on success in __xfs_ag_resv_init · 9be1c33d
      Christoph Hellwig authored
      commit 4dfa2b84 upstream.
      
      Try to reserve the blocks first and only then update the fields in
      or hanging off the mount structure.  This way we can call __xfs_ag_resv_init
      again after a previous failure.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
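      The reserve-then-commit ordering in this commit can be sketched as
      follows; the struct and function names are hypothetical stand-ins for
      the mount and per-AG reservation state.

```c
#include <assert.h>

/* Hypothetical stand-ins for mount-wide counters and per-AG state. */
struct sketch_mount { long free_blocks; long reserved; };
struct sketch_resv  { long asked; long used; };

/* Either takes the whole reservation or changes nothing. */
static int sketch_reserve(struct sketch_mount *mp, long len)
{
	if (len > mp->free_blocks)
		return -1;
	mp->free_blocks -= len;
	return 0;
}

/* Reserve first; only on success update the mount and reservation
 * fields, so a failed call leaves state untouched and can be retried. */
static int sketch_resv_init(struct sketch_mount *mp, struct sketch_resv *resv,
			    long ask)
{
	int error = sketch_reserve(mp, ask);

	if (error)
		return error;
	mp->reserved += ask;
	resv->asked = ask;
	resv->used = 0;
	return 0;
}
```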
    • Ross Lagerwall
      xen/setup: Don't relocate p2m over existing one · 8b08aec6
      Ross Lagerwall authored
      commit 7ecec850 upstream.
      
      When relocating the p2m, take special care not to relocate it so
      that it overlaps with the current location of the p2m/initrd. This is
      needed since the full extent of the current location is not marked as a
      reserved region in the e820.
      
      This was seen to happen to a dom0 with a large initial p2m and a small
      reserved region in the middle of the initial p2m.
      Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
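      Avoiding the old p2m/initrd comes down to a half-open interval overlap
      test applied to each candidate location, since the e820 map alone does
      not mark the whole range as reserved.  The placement helper below is a
      hypothetical sketch, not the Xen setup code; it scans in fixed steps
      and uses 0 as its "no fit" result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct range { uint64_t start, size; }; /* covers [start, start + size) */

static bool ranges_overlap(struct range a, struct range b)
{
	return a.start < b.start + b.size && b.start < a.start + a.size;
}

/* Return the first align-stepped spot in free_region that overlaps none
 * of avoid[0..n). Assumes align > 0 and a nonzero free_region.start so
 * that 0 can mean "nothing fits". */
static uint64_t place_avoiding(struct range free_region, uint64_t size,
			       uint64_t align, const struct range *avoid, int n)
{
	for (uint64_t s = free_region.start;
	     s + size <= free_region.start + free_region.size; s += align) {
		struct range cand = { s, size };
		bool clash = false;

		for (int i = 0; i < n && !clash; i++)
			clash = ranges_overlap(cand, avoid[i]);
		if (!clash)
			return s;
	}
	return 0;
}
```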
    • Ilya Dryomov
      libceph: force GFP_NOIO for socket allocations · 86015377
      Ilya Dryomov authored
      commit 633ee407 upstream.
      
      sock_alloc_inode() allocates socket+inode and socket_wq with
      GFP_KERNEL, which is not allowed on the writeback path:
      
          Workqueue: ceph-msgr con_work [libceph]
          ffff8810871cb018 0000000000000046 0000000000000000 ffff881085d40000
          0000000000012b00 ffff881025cad428 ffff8810871cbfd8 0000000000012b00
          ffff880102fc1000 ffff881085d40000 ffff8810871cb038 ffff8810871cb148
          Call Trace:
          [<ffffffff816dd629>] schedule+0x29/0x70
          [<ffffffff816e066d>] schedule_timeout+0x1bd/0x200
          [<ffffffff81093ffc>] ? ttwu_do_wakeup+0x2c/0x120
          [<ffffffff81094266>] ? ttwu_do_activate.constprop.135+0x66/0x70
          [<ffffffff816deb5f>] wait_for_completion+0xbf/0x180
          [<ffffffff81097cd0>] ? try_to_wake_up+0x390/0x390
          [<ffffffff81086335>] flush_work+0x165/0x250
          [<ffffffff81082940>] ? worker_detach_from_pool+0xd0/0xd0
          [<ffffffffa03b65b1>] xlog_cil_force_lsn+0x81/0x200 [xfs]
          [<ffffffff816d6b42>] ? __slab_free+0xee/0x234
          [<ffffffffa03b4b1d>] _xfs_log_force_lsn+0x4d/0x2c0 [xfs]
          [<ffffffff811adc1e>] ? lookup_page_cgroup_used+0xe/0x30
          [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
          [<ffffffffa03b4dcf>] xfs_log_force_lsn+0x3f/0xf0 [xfs]
          [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
          [<ffffffffa03a62c6>] xfs_iunpin_wait+0xc6/0x1a0 [xfs]
          [<ffffffff810aa250>] ? wake_atomic_t_function+0x40/0x40
          [<ffffffffa039a723>] xfs_reclaim_inode+0xa3/0x330 [xfs]
          [<ffffffffa039ac07>] xfs_reclaim_inodes_ag+0x257/0x3d0 [xfs]
          [<ffffffffa039bb13>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
          [<ffffffffa03ab745>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
          [<ffffffff811c0c18>] super_cache_scan+0x178/0x180
          [<ffffffff8115912e>] shrink_slab_node+0x14e/0x340
          [<ffffffff811afc3b>] ? mem_cgroup_iter+0x16b/0x450
          [<ffffffff8115af70>] shrink_slab+0x100/0x140
          [<ffffffff8115e425>] do_try_to_free_pages+0x335/0x490
          [<ffffffff8115e7f9>] try_to_free_pages+0xb9/0x1f0
          [<ffffffff816d56e4>] ? __alloc_pages_direct_compact+0x69/0x1be
          [<ffffffff81150cba>] __alloc_pages_nodemask+0x69a/0xb40
          [<ffffffff8119743e>] alloc_pages_current+0x9e/0x110
          [<ffffffff811a0ac5>] new_slab+0x2c5/0x390
          [<ffffffff816d71c4>] __slab_alloc+0x33b/0x459
          [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
          [<ffffffff8164bda1>] ? inet_sendmsg+0x71/0xc0
          [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
          [<ffffffff811a21f2>] kmem_cache_alloc+0x1a2/0x1b0
          [<ffffffff815b906d>] sock_alloc_inode+0x2d/0xd0
          [<ffffffff811d8566>] alloc_inode+0x26/0xa0
          [<ffffffff811da04a>] new_inode_pseudo+0x1a/0x70
          [<ffffffff815b933e>] sock_alloc+0x1e/0x80
          [<ffffffff815ba855>] __sock_create+0x95/0x220
          [<ffffffff815baa04>] sock_create_kern+0x24/0x30
          [<ffffffffa04794d9>] con_work+0xef9/0x2050 [libceph]
          [<ffffffffa04aa9ec>] ? rbd_img_request_submit+0x4c/0x60 [rbd]
          [<ffffffff81084c19>] process_one_work+0x159/0x4f0
          [<ffffffff8108561b>] worker_thread+0x11b/0x530
          [<ffffffff81085500>] ? create_worker+0x1d0/0x1d0
          [<ffffffff8108b6f9>] kthread+0xc9/0xe0
          [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90
          [<ffffffff816e1b98>] ret_from_fork+0x58/0x90
          [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90
      
      Use memalloc_noio_{save,restore}() to temporarily force GFP_NOIO here.
      
      Link: http://tracker.ceph.com/issues/19309
      Reported-by: Sergey Jerusalimov <wintchester@gmail.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
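      In the kernel this pattern brackets the socket creation with
      noio_flag = memalloc_noio_save() and memalloc_noio_restore(noio_flag).
      The userspace model below only illustrates the semantics, assuming
      made-up flag values and names: while the NOIO flag is set, a
      GFP_KERNEL-style request loses its I/O and FS bits, and nested
      save/restore pairs leave an enclosing section intact.

```c
#include <assert.h>

/* Made-up flag values; the kernel's real bits differ. */
#define SK_GFP_IO           0x1u
#define SK_GFP_FS           0x2u
#define SK_GFP_KERNEL       (SK_GFP_IO | SK_GFP_FS)
#define SK_PF_MEMALLOC_NOIO 0x1u

static unsigned int task_flags; /* stands in for current->flags */

/* Set the NOIO flag and return its previous state. */
static unsigned int sk_noio_save(void)
{
	unsigned int old = task_flags & SK_PF_MEMALLOC_NOIO;

	task_flags |= SK_PF_MEMALLOC_NOIO;
	return old;
}

/* Restore the NOIO flag to the state returned by the matching save. */
static void sk_noio_restore(unsigned int old)
{
	task_flags = (task_flags & ~SK_PF_MEMALLOC_NOIO) | old;
}

/* Effective mask for an allocation made inside the section, even if the
 * callee (e.g. the socket inode allocator) asked for GFP_KERNEL. */
static unsigned int sk_effective_gfp(unsigned int gfp)
{
	if (task_flags & SK_PF_MEMALLOC_NOIO)
		gfp &= ~(SK_GFP_IO | SK_GFP_FS);
	return gfp;
}
```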
  2. 31 Mar, 2017 17 commits
  3. 30 Mar, 2017 9 commits
    • Greg Kroah-Hartman
      Linux 4.9.19 · c8e13160
      Greg Kroah-Hartman authored
    • Jiri Slaby
      crypto: algif_hash - avoid zero-sized array · bc959a40
      Jiri Slaby authored
      commit 62071194 upstream.
      
      With this reproducer:
        struct sockaddr_alg alg = {
                .salg_family = 0x26,
                .salg_type = "hash",
                .salg_feat = 0xf,
                .salg_mask = 0x5,
                .salg_name = "digest_null",
        };
        int sock, sock2;
      
        sock = socket(AF_ALG, SOCK_SEQPACKET, 0);
        bind(sock, (struct sockaddr *)&alg, sizeof(alg));
        sock2 = accept(sock, NULL, NULL);
        setsockopt(sock, SOL_ALG, ALG_SET_KEY, "\x9b\xca", 2);
        accept(sock2, NULL, NULL);
      
      ==== 8< ======== 8< ======== 8< ======== 8< ====
      
      one can immediately see a UBSAN warning:
      UBSAN: Undefined behaviour in crypto/algif_hash.c:187:7
      variable length array bound value 0 <= 0
      CPU: 0 PID: 15949 Comm: syz-executor Tainted: G            E      4.4.30-0-default #1
      ...
      Call Trace:
      ...
       [<ffffffff81d598fd>] ? __ubsan_handle_vla_bound_not_positive+0x13d/0x188
       [<ffffffff81d597c0>] ? __ubsan_handle_out_of_bounds+0x1bc/0x1bc
       [<ffffffffa0e2204d>] ? hash_accept+0x5bd/0x7d0 [algif_hash]
       [<ffffffffa0e2293f>] ? hash_accept_nokey+0x3f/0x51 [algif_hash]
       [<ffffffffa0e206b0>] ? hash_accept_parent_nokey+0x4a0/0x4a0 [algif_hash]
       [<ffffffff8235c42b>] ? SyS_accept+0x2b/0x40
      
      It is a correct warning, as hash state is propagated to accept as zero,
      but creating a zero-length variable array is not allowed in C.
      
      Fix this as proposed by Herbert -- do "?: 1" on that site. No sizeof or
      similar happens in the code there, so we just allocate one byte even
      though we do not use the array.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: "David S. Miller" <davem@davemloft.net> (maintainer:CRYPTO API)
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
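      The "?: 1" fix in this commit is GNU C shorthand for "use the value
      unless it is zero, otherwise 1".  A standard-C sketch of the same idea
      follows; the state size is a parameter here, where the kernel would
      take it from the transform, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Standard-C spelling of "statesize ?: 1": never size a VLA with 0. */
static size_t state_buf_size(size_t statesize)
{
	return statesize ? statesize : 1;
}

/* Declares the VLA the way the fixed code does and reports its size. */
static size_t use_state_vla(size_t statesize)
{
	char state[state_buf_size(statesize)]; /* at least one byte */

	memset(state, 0, sizeof(state));
	return sizeof(state);
}
```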
    • Takashi Iwai
      fbcon: Fix vc attr at deinit · 3fd37725
      Takashi Iwai authored
      commit 8aac7f34 upstream.
      
      fbcon can deal with vc_hi_font_mask (the upper 256 chars) and adjust
      the vc attrs dynamically when vc_hi_font_mask is changed at
      fbcon_init().  When the vc_hi_font_mask is set, it remaps the attrs in
      the existing console buffer with one bit shift up (for 9 bits), while
      it remaps with one bit shift down (for 8 bits) when the value is
      cleared.  It works fine as long as the font gets updated after fbcon
      was initialized.
      
      However, we hit a bizarre problem when the console is switched to
      another fb driver (typically from vesafb or efifb to drmfb).  At
      switching to the new fb driver, we temporarily rebind the console to
      the dummy console, then rebind to the new driver.  During the
      switching, we leave the modified attrs as is.  Thus, the new fbcon
      takes over the old buffer as if it contained 8-bit chars (although
      the attrs are still shifted for 9 bits), which effectively results
      in yellow text instead of the original white, as found in the
      bugzilla entry below.
      
      An easy fix for this is to re-adjust the attrs before leaving the
      fbcon at con_deinit callback.  Since the code to adjust the attrs is
      already present in the current fbcon code, in this patch, we simply
      factor out the relevant code, and call it from fbcon_deinit().
      
      Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1000619
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
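      The attribute remapping in this commit can be sketched on 16-bit
      cells, assuming a simplified layout with the glyph index in the low
      byte and the attribute in the high byte; this illustrates the idea
      rather than reproducing fbcon's code, and the function names are
      hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Entering 9-bit (512-glyph) mode: bit 8 becomes a glyph bit, so the
 * attribute moves up one bit and its top bit falls off. */
static uint16_t attr_shift_up(uint16_t c)
{
	return (uint16_t)(((c & 0xff00u) << 1) | (c & 0x00ffu));
}

/* Leaving 9-bit mode (what deinit must now do): move the attribute back
 * down, dropping the 9th glyph bit. */
static uint16_t attr_shift_down(uint16_t c)
{
	return (uint16_t)(((c & 0xfe00u) >> 1) | (c & 0x00ffu));
}
```

      Under these assumptions an attribute of 0x07 (light gray, the usual
      console default) reads back as 0x0e if the shift is never undone, and
      0x0e is yellow in the standard VGA text palette, which is consistent
      with the white-to-yellow symptom.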
    • Daniel Vetter
      drm: reference count event->completion · c75fe789
      Daniel Vetter authored
      commit 24835e44 upstream.
      
      When writing the generic nonblocking commit code I assumed that
      through clever lifetime management I can assure that the completion
      (stored in drm_crtc_commit) only gets freed after it is completed. And
      that worked.
      
      I also wanted to make nonblocking helpers resilient against driver
      bugs, by having timeouts everywhere. And that worked too.
      
      Unfortunately taking both things together results in oopses :( Well,
      at least sometimes: What seems to happen is that the drm event hangs
      around forever stuck in limbo land. The nonblocking helpers eventually
      time out, move on and release it. Now the bug I tested all this
      against is drivers that just entirely fail to deliver the vblank
      events like they should, and in those cases the event is simply
      leaked. But what seems to happen, at least sometimes, on i915 is that
      the event is set up correctly, but somehow the vblank fails to fire in
      time. Which means the event isn't leaked, it's still there waiting for
      eventually a vblank to fire. That tends to happen when re-enabling the
      pipe, and then the trap springs and the kernel oopses.
      
      The correct fix here is simply to refcount the crtc commit to make
      sure that the event sticks around even for drivers which only
      sometimes fail to deliver vblanks for some arbitrary reasons. Since
      crtc commits are already refcounted that's easy to do.
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=96781
      Cc: Jim Rees <rees@umich.edu>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161221102331.31033-1-daniel.vetter@ffwll.ch
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
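      The lifetime rule in this commit can be sketched with a plain
      reference count: arming the event takes its own reference, so a
      helper that times out and drops its reference no longer frees memory
      that a late vblank will touch.  The names are hypothetical, and the
      count here is not atomic, unlike the kernel's refcounting primitives.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical commit object with a completion-like "done" flag. */
struct sketch_commit {
	int refcount;
	int done;
};

static int freed_count; /* lets the example observe frees */

static struct sketch_commit *commit_new(void)
{
	struct sketch_commit *c = calloc(1, sizeof(*c));

	if (c)
		c->refcount = 1; /* reference held by the nonblocking helper */
	return c;
}

static void commit_get(struct sketch_commit *c) { c->refcount++; }

static void commit_put(struct sketch_commit *c)
{
	if (--c->refcount == 0) {
		free(c);
		freed_count++;
	}
}

/* The event pins the commit until the vblank handler runs, however late. */
static void event_arm(struct sketch_commit *c) { commit_get(c); }
static void event_fire(struct sketch_commit *c) { c->done = 1; commit_put(c); }
```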
    • Johannes Berg
      nl80211: fix dumpit error path RTNL deadlocks · 56769e7a
      Johannes Berg authored
      commit ea90e0dc upstream.
      
      Sowmini pointed out Dmitry's RTNL deadlock report to me, and it turns out
      to be perfectly accurate - there are various error paths that miss unlock
      of the RTNL.
      
      To fix those, change the locking a bit to not be conditional in all those
      nl80211_prepare_*_dump() functions, but make those require the RTNL to
      start with, and fix the buggy error paths. This also let me use sparse
      (by appropriately overriding the rtnl_lock/rtnl_unlock functions) to
      validate the changes.
      Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Marek Szyprowski
      drm/bridge: analogix dp: Fix runtime PM state on driver bind · 7b3c8b2a
      Marek Szyprowski authored
      commit f0a8b49c upstream.
      
      Analogix_dp_bind() can be called from component framework, which doesn't
      guarantee proper runtime PM state of the device during bind operation,
      so ensure that device is runtime active before doing any register access.
      This ensures that the power domain, to which DP module belongs, is turned
      on. While at it, also fix the unbalanced call to phy_power_on() in
      analogix_dp_bind() function.
      
      This patch solves the following kernel oops on Samsung Exynos5250 Snow
      board:
      
      Unhandled fault: imprecise external abort (0x406) at 0x00000000
      pgd = c0004000
      [00000000] *pgd=00000000
      Internal error: : 406 [#1] PREEMPT SMP ARM
      Modules linked in:
      CPU: 0 PID: 75 Comm: kworker/0:2 Not tainted 4.9.0 #1046
      Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
      Workqueue: events deferred_probe_work_func
      task: ee272300 task.stack: ee312000
      PC is at analogix_dp_enable_sw_function+0x18/0x2c
      LR is at analogix_dp_init_dp+0x2c/0x50
      ...
      [<c03fcb38>] (analogix_dp_enable_sw_function) from [<c03fa9c4>] (analogix_dp_init_dp+0x2c/0x50)
      [<c03fa9c4>] (analogix_dp_init_dp) from [<c03fab6c>] (analogix_dp_bind+0x184/0x42c)
      [<c03fab6c>] (analogix_dp_bind) from [<c03fdb84>] (component_bind_all+0xf0/0x218)
      [<c03fdb84>] (component_bind_all) from [<c03ed64c>] (exynos_drm_load+0x134/0x200)
      [<c03ed64c>] (exynos_drm_load) from [<c03d5058>] (drm_dev_register+0xa0/0xd0)
      [<c03d5058>] (drm_dev_register) from [<c03d66b8>] (drm_platform_init+0x58/0xb0)
      [<c03d66b8>] (drm_platform_init) from [<c03fe0c4>] (try_to_bring_up_master+0x14c/0x188)
      [<c03fe0c4>] (try_to_bring_up_master) from [<c03fe188>] (component_add+0x88/0x138)
      [<c03fe188>] (component_add) from [<c0403a38>] (platform_drv_probe+0x50/0xb0)
      [<c0403a38>] (platform_drv_probe) from [<c0402470>] (driver_probe_device+0x1f0/0x2a8)
      [<c0402470>] (driver_probe_device) from [<c0400a54>] (bus_for_each_drv+0x44/0x8c)
      [<c0400a54>] (bus_for_each_drv) from [<c04021f8>] (__device_attach+0x9c/0x100)
      [<c04021f8>] (__device_attach) from [<c04018e8>] (bus_probe_device+0x84/0x8c)
      [<c04018e8>] (bus_probe_device) from [<c0401d1c>] (deferred_probe_work_func+0x60/0x8c)
      [<c0401d1c>] (deferred_probe_work_func) from [<c012fc14>] (process_one_work+0x120/0x318)
      [<c012fc14>] (process_one_work) from [<c012fe34>] (process_scheduled_works+0x28/0x38)
      [<c012fe34>] (process_scheduled_works) from [<c0130048>] (worker_thread+0x204/0x4ac)
      [<c0130048>] (worker_thread) from [<c01352c4>] (kthread+0xd8/0xf4)
      [<c01352c4>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
      Code: e59035f0 e5935018 f57ff04f e3c55001 (f57ff04e)
      ---[ end trace 3d1d0d87796de344 ]---
      Reviewed-by: Sean Paul <seanpaul@chromium.org>
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: Archit Taneja <architt@codeaurora.org>
      Link: http://patchwork.freedesktop.org/patch/msgid/1483091866-1088-1-git-send-email-m.szyprowski@samsung.com
      Cc: Javier Martinez Canillas <javier@osg.samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Dave Jiang
      device-dax: fix pmd/pte fault fallback handling · eae72468
      Dave Jiang authored
      commit 0134ed4f upstream.
      
      Jeff Moyer reports:
      
          With a device dax alignment of 4KB or 2MB, I get sigbus when running
          the attached fio job file for the current kernel (4.11.0-rc1+).  If
          I specify an alignment of 1GB, it works.
      
          I turned on debug output, and saw that it was failing in the huge
          fault code.
      
           dax dax1.0: dax_open
           dax dax1.0: dax_mmap
           dax dax1.0: dax_dev_huge_fault: fio: write (0x7f08f0a00000 -
           dax dax1.0: __dax_dev_pud_fault: phys_to_pgoff(0xffffffffcf60
           dax dax1.0: dax_release
      
          fio config for reproduce:
          [global]
          ioengine=dev-dax
          direct=0
          filename=/dev/dax0.0
          bs=2m
      
          [write]
          rw=write
      
          [read]
          stonewall
          rw=read
      
      The driver fails to fallback when taking a fault that is larger than
      the device alignment, or handling a larger fault when a smaller
      mapping is already established. While we could support larger
      mappings for a device with a smaller alignment, that change is
      too large for the immediate fix. The simplest change is to force
      fallback until the fault size matches the alignment.
      
      Fixes: dee41079 ("/dev/dax, core: file operations and dax-mmap")
      Cc: <stable@vger.kernel.org>
      Reported-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
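      The fallback rule in this commit can be sketched as a size check tried
      from the largest fault size down.  The constants and names are
      assumptions; treating a fault smaller than the alignment as an error
      is this sketch's own choice, since the message only specifies the
      larger-than-alignment case.

```c
#include <assert.h>

#define SZ_4K (4096UL)
#define SZ_2M (2UL * 1024 * 1024)
#define SZ_1G (1024UL * 1024 * 1024)

enum fault_result { FAULT_MAPPED, FAULT_FALLBACK, FAULT_SIGBUS };

/* Larger than the device alignment: fall back to the next smaller size. */
static enum fault_result dax_fault_size_check(unsigned long fault_size,
					      unsigned long align)
{
	if (fault_size > align)
		return FAULT_FALLBACK;
	if (fault_size < align)
		return FAULT_SIGBUS; /* assumption of this sketch */
	return FAULT_MAPPED;
}

/* Try PUD-, PMD-, then PTE-sized faults; return the size that maps. */
static unsigned long dax_fault_mapped_size(unsigned long align)
{
	const unsigned long sizes[] = { SZ_1G, SZ_2M, SZ_4K };

	for (int i = 0; i < 3; i++) {
		enum fault_result r = dax_fault_size_check(sizes[i], align);

		if (r == FAULT_MAPPED)
			return sizes[i];
		if (r == FAULT_SIGBUS)
			return 0;
	}
	return 0;
}
```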
    • Ilya Dryomov
      libceph: don't set weight to IN when OSD is destroyed · 81ec3dc1
      Ilya Dryomov authored
      commit b581a585 upstream.
      
      Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info,
      osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted.
      This changes the result of applying an incremental for clients, not
      just OSDs.  Because CRUSH computations are obviously affected,
      pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on
      object placement, resulting in misdirected requests.
      
      Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f.
      
      Fixes: 930c5328 ("libceph: apply new_state before new_up_client on incrementals")
      Link: http://tracker.ceph.com/issues/19122
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Sage Weil <sage@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • K. Y. Srinivasan
      Drivers: hv: vmbus: Don't leak memory when a channel is rescinded · df1fe6c9
      K. Y. Srinivasan authored
      commit 5e030d5c upstream.
      
      When we close a channel that has been rescinded, we will leak memory since
      vmbus_teardown_gpadl() returns an error. Fix this so that we can properly
      clean up the memory allocated to the ring buffers.
      
      Fixes: ccb61f8a ("Drivers: hv: vmbus: Fix a rescind handling bug")
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
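      A hypothetical sketch of the close path this commit describes: a
      teardown error on a rescinded channel no longer skips freeing the
      ring buffer.  Keeping the buffer on other teardown failures is an
      assumption of this sketch, as are all names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical channel model. */
struct sketch_channel {
	bool rescinded;
	void *ringbuffer;
};

/* Teardown reports an error once the host has rescinded the channel. */
static int sketch_teardown_gpadl(struct sketch_channel *ch)
{
	return ch->rescinded ? -1 : 0;
}

/* Free the ring buffer even when teardown fails because of a rescind. */
static int sketch_close(struct sketch_channel *ch)
{
	int ret = sketch_teardown_gpadl(ch);

	if (ret && !ch->rescinded)
		return ret; /* assumed: keep the buffer on other failures */
	free(ch->ringbuffer);
	ch->ringbuffer = NULL;
	return 0;
}
```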