1. 11 Jun, 2015 1 commit
    • Joe Thornber's avatar
      dm cache: add stochastic-multi-queue (smq) policy · 66a63635
      Joe Thornber authored
      The stochastic-multi-queue (smq) policy addresses some of the problems
      with the current multiqueue (mq) policy.
      
      Memory usage
      ------------
      
      The mq policy uses a lot of memory; 88 bytes per cache block on a 64
      bit machine.
      
      SMQ uses 28bit indexes to implement it's data structures rather than
      pointers.  It avoids storing an explicit hit count for each block.  It
      has a 'hotspot' queue rather than a pre cache which uses a quarter of
      the entries (each hotspot block covers a larger area than a single
      cache block).
      
      All these mean smq uses ~25bytes per cache block.  Still a lot of
      memory, but a substantial improvement nontheless.
      
      Level balancing
      ---------------
      
      MQ places entries in different levels of the multiqueue structures
      based on their hit count (~ln(hit count)).  This means the bottom
      levels generally have the most entries, and the top ones have very
      few.  Having unbalanced levels like this reduces the efficacy of the
      multiqueue.
      
      SMQ does not maintain a hit count, instead it swaps hit entries with
      the least recently used entry from the level above.  The over all
      ordering being a side effect of this stochastic process.  With this
      scheme we can decide how many entries occupy each multiqueue level,
      resulting in better promotion/demotion decisions.
      
      Adaptability
      ------------
      
      The MQ policy maintains a hit count for each cache block.  For a
      different block to get promoted to the cache it's hit count has to
      exceed the lowest currently in the cache.  This means it can take a
      long time for the cache to adapt between varying IO patterns.
      Periodically degrading the hit counts could help with this, but I
      haven't found a nice general solution.
      
      SMQ doesn't maintain hit counts, so a lot of this problem just goes
      away.  In addition it tracks performance of the hotspot queue, which
      is used to decide which blocks to promote.  If the hotspot queue is
      performing badly then it starts moving entries more quickly between
      levels.  This lets it adapt to new IO patterns very quickly.
      
      Performance
      -----------
      
      In my tests SMQ shows substantially better performance than MQ.  Once
      this matures a bit more I'm sure it'll become the default policy.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      66a63635
  2. 29 May, 2015 25 commits
  3. 27 May, 2015 3 commits
    • Mike Snitzer's avatar
      dm: requeue from blk-mq dm_mq_queue_rq() using BLK_MQ_RQ_QUEUE_BUSY · 45714fbe
      Mike Snitzer authored
      Use BLK_MQ_RQ_QUEUE_BUSY to requeue a blk-mq request directly from the
      DM blk-mq device's .queue_rq.  This cleans up the previous convoluted
      handling of request requeueing that would return BLK_MQ_RQ_QUEUE_OK
      (even though it wasn't) and then run blk_mq_requeue_request() followed
      by blk_mq_kick_requeue_list().
      
      Also, document that DM blk-mq ontop of old request_fn devices cannot
      fail in clone_rq() since the clone request is preallocated as part of
      the pdu.
      Reported-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      45714fbe
    • Mike Snitzer's avatar
      dm mpath: fix leak of dm_mpath_io structure in blk-mq .queue_rq error path · 4c6dd53d
      Mike Snitzer authored
      Otherwise kmemleak reported:
      
      unreferenced object 0xffff88009b14e2b0 (size 16):
        comm "fio", pid 4274, jiffies 4294978034 (age 1253.210s)
        hex dump (first 16 bytes):
          40 12 f3 99 01 88 ff ff 00 10 00 00 00 00 00 00  @...............
        backtrace:
          [<ffffffff81600029>] kmemleak_alloc+0x49/0xb0
          [<ffffffff811679a8>] kmem_cache_alloc+0xf8/0x160
          [<ffffffff8111c950>] mempool_alloc_slab+0x10/0x20
          [<ffffffff8111cb37>] mempool_alloc+0x57/0x150
          [<ffffffffa04d2b61>] __multipath_map.isra.17+0xe1/0x220 [dm_multipath]
          [<ffffffffa04d2cb5>] multipath_clone_and_map+0x15/0x20 [dm_multipath]
          [<ffffffffa02889b5>] map_request.isra.39+0xd5/0x220 [dm_mod]
          [<ffffffffa028b0e4>] dm_mq_queue_rq+0x134/0x240 [dm_mod]
          [<ffffffff812cccb5>] __blk_mq_run_hw_queue+0x1d5/0x380
          [<ffffffff812ccaa5>] blk_mq_run_hw_queue+0xc5/0x100
          [<ffffffff812ce350>] blk_sq_make_request+0x240/0x300
          [<ffffffff812c0f30>] generic_make_request+0xc0/0x110
          [<ffffffff812c0ff2>] submit_bio+0x72/0x150
          [<ffffffff811c07cb>] do_blockdev_direct_IO+0x1f3b/0x2da0
          [<ffffffff811c166e>] __blockdev_direct_IO+0x3e/0x40
          [<ffffffff8120aa1a>] ext4_direct_IO+0x1aa/0x390
      
      Fixes: e5863d9a ("dm: allocate requests in target when stacking on blk-mq devices")
      Reported-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 4.0+
      4c6dd53d
    • Junichi Nomura's avatar
      dm: fix NULL pointer when clone_and_map_rq returns !DM_MAPIO_REMAPPED · 3a140755
      Junichi Nomura authored
      When stacking request-based DM on blk_mq device, request cloning and
      remapping are done in a single call to target's clone_and_map_rq().
      The clone is allocated and valid only if clone_and_map_rq() returns
      DM_MAPIO_REMAPPED.
      
      The "IS_ERR(clone)" check in map_request() does not cover all the
      !DM_MAPIO_REMAPPED cases that are possible (E.g. if underlying devices
      are not ready or unavailable, clone_and_map_rq() may return
      DM_MAPIO_REQUEUE without ever having established an ERR_PTR).  Fix this
      by explicitly checking for a return that is not DM_MAPIO_REMAPPED in
      map_request().
      
      Without this fix, DM core may call setup_clone() for a NULL clone
      and oops like this:
      
         BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
         IP: [<ffffffff81227525>] blk_rq_prep_clone+0x7d/0x137
         ...
         CPU: 2 PID: 5793 Comm: kdmwork-253:3 Not tainted 4.0.0-nm #1
         ...
         Call Trace:
          [<ffffffffa01d1c09>] map_tio_request+0xa9/0x258 [dm_mod]
          [<ffffffff81071de9>] kthread_worker_fn+0xfd/0x150
          [<ffffffff81071cec>] ? kthread_parkme+0x24/0x24
          [<ffffffff81071cec>] ? kthread_parkme+0x24/0x24
          [<ffffffff81071fdd>] kthread+0xe6/0xee
          [<ffffffff81093a59>] ? put_lock_stats+0xe/0x20
          [<ffffffff81071ef7>] ? __init_kthread_worker+0x5b/0x5b
          [<ffffffff814c2d98>] ret_from_fork+0x58/0x90
          [<ffffffff81071ef7>] ? __init_kthread_worker+0x5b/0x5b
      
      Fixes: e5863d9a ("dm: allocate requests in target when stacking on blk-mq devices")
      Reported-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: default avatarJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 4.0+
      3a140755
  4. 26 May, 2015 2 commits
  5. 25 May, 2015 1 commit
  6. 24 May, 2015 3 commits
  7. 23 May, 2015 3 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 086e8ddb
      Linus Torvalds authored
      Pull two Ceph fixes from Sage Weil:
       "These fix an issue with the RBD notifications when there are topology
        changes in the cluster"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
        Revert "libceph: clear r_req_lru_item in __unregister_linger_request()"
        libceph: request a new osdmap if lingering request maps to no osd
      086e8ddb
    • Linus Torvalds's avatar
      Merge branch 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 7ce14f6f
      Linus Torvalds authored
      Pull btrfs fixes from Chris Mason:
       "I fixed up a regression from 4.0 where conversion between different
        raid levels would sometimes bail out without converting.
      
        Filipe tracked down a race where it was possible to double allocate
        chunks on the drive.
      
        Mark has a fix for fiemap.  All three will get bundled off for stable
        as well"
      
      * 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: fix regression in raid level conversion
        Btrfs: fix racy system chunk allocation when setting block group ro
        btrfs: clear 'ret' in btrfs_check_shared() loop
      7ce14f6f
    • Linus Torvalds's avatar
      Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · cf539cbd
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Radeon has two displayport fixes, one for a regression.
      
        i915 regression flicker fix needed so 4.0 can get fixed.
      
        A bunch of msm fixes and a bunch of exynos fixes, these two are
        probably a bit larger than I'd like, but most of them seems pretty
        good"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (29 commits)
        drm/radeon: fix error flag checking in native aux path
        drm/radeon: retry dcpd fetch
        drm/msm/mdp5: fix incorrect parameter for msm_framebuffer_iova()
        drm/exynos: dp: Lower level of EDID read success message
        drm/exynos: cleanup exynos_drm_plane
        drm/exynos: 'win' is always unsigned
        drm/exynos: mixer: don't dump registers under spinlock
        drm/exynos: Consolidate return statements in fimd_bind()
        drm/exynos: Constify exynos_drm_crtc_ops
        drm/exynos: Fix build breakage on !DRM_EXYNOS_FIMD
        drm/exynos: mixer: Constify platform_device_id
        drm/exynos: mixer: cleanup pixelformat handling
        drm/exynos: mixer: also allow NV21 for the video processor
        drm/exynos: mixer: remove buffer count handling in vp_video_buffer()
        drm/exynos: plane: honor buffer offset for dma_addr
        drm/exynos: fb: use drm_format_num_planes to get buffer count
        drm/i915: fix screen flickering
        drm/msm: fix locking inconsistencies in gpu->destroy()
        drm/msm/dsi: Simplify the code to get the number of read byte
        drm/msm: Attach assigned encoder to eDP and DSI connectors
        ...
      cf539cbd
  8. 22 May, 2015 2 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 0b6280c6
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Don't leak ipvs->sysctl_tbl, from Tommi Rentala.
      
       2) Fix neighbour table entry leak in rocker driver, from Ying Xue.
      
       3) Do not emit bonding notifications for unregistered interfaces, from
          Nicolas Dichtel.
      
       4) Set ipv6 flow label properly when in TIME_WAIT state, from Florent
          Fourcot.
      
       5) Fix regression in ipv6 multicast filter test, from Henning Rogge.
      
       6) do_replace() in various footables netfilter modules is missing a
          check for 0 counters in the datastructure provided by the user.  Fix
          from Dave Jones, and found with trinity.
      
       7) Fix RCU bug in packet scheduler classifier module unloads, from
          Daniel Borkmann.
      
       8) Avoid deadlock in tcp_get_info() by using u64_sync.  From Eric
          Dumzaet.
      
       9) Input packet processing can race with inetdev_destroy() teardown,
          fix potential OOPS in ip_error() by explicitly testing whether the
          inetdev is still attached.  From Eric W Biederman.
      
      10) MLDv2 parser in bridge multicast code breaks too early while
          parsing.  Fix from Thadeu Lima de Souza Cascardo.
      
      11) Asking for settings on non-zero PHYID doesn't work because we do not
          import the command structure from the user and use the PHYID
          provided there.  Fix from Arun Parameswaran.
      
      12) Fix UDP checksums with IPV6 RAW sockets, from Vlad Yasevich.
      
      13) Missing NF_TABLES depends for TPROXY etc can cause build failures,
          fix from Florian Westphal.
      
      14) Fix netfilter conntrack to handle RFC5961 challenge ACKs properly,
          from Jesper Dangaard Brouer.
      
      15) If netlink autobind retry fails, we have to reset the sockets portid
          back to zero.  From Herbert Xu.
      
      16) VXLAN netns exit code unregisters using wrong device, from John W
          Linville.
      
      17) Add some USB device IDs to ath3k and btusb bluetooth drivers, from
          Dmitry Tunin and Wen-chien Jesse Sung.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
        bridge: fix lockdep splat
        net: core: 'ethtool' issue with querying phy settings
        bridge: fix parsing of MLDv2 reports
        ARM: zynq: DT: Use the zynq binding with macb
        net: macb: Disable half duplex gigabit on Zynq
        net: macb: Document zynq gem dt binding
        ipv4: fill in table id when replacing a route
        cdc_ncm: Fix tx_bytes statistics
        ipv4: Avoid crashing in ip_error
        tcp: fix a potential deadlock in tcp_get_info()
        net: sched: fix call_rcu() race on classifier module unloads
        net: phy: Make sure phy_start() always re-enables the phy interrupts
        ipv6: fix ECMP route replacement
        ipv6: do not delete previously existing ECMP routes if add fails
        Revert "netfilter: bridge: query conntrack about skb dnat"
        netfilter: ensure number of counters is >0 in do_replace()
        netfilter: nfnetlink_{log,queue}: Register pernet in first place
        tcp: don't over-send F-RTO probes
        tcp: only undo on partial ACKs in CA_Loss
        net/ipv6/udp: Fix ipv6 multicast socket filter regression
        ...
      0b6280c6
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 1c8df7bd
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Three small fixes that have been picked up the last few weeks.
        Specifically:
      
         - Fix a memory corruption issue in NVMe with malignant user
           constructed request.  From Christoph.
      
         - Kill (now) unused blk_queue_bio(), dm was changed to not need this
           anymore.  From Mike Snitzer.
      
         - Always use blk_schedule_flush_plug() from the io_schedule() path
           when flushing a plug, fixing a !TASK_RUNNING warning with md.  From
           Shaohua"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        sched: always use blk_schedule_flush_plug in io_schedule_out
        nvme: fix kernel memory corruption with short INQUIRY buffers
        block: remove export for blk_queue_bio
      1c8df7bd