1. 24 Oct, 2015 11 commits
    • raid5: add basic stripe log · f6bed0ef
      Shaohua Li authored
      This introduces a simple log for raid5. Data/parity written to the raid
      array is first written to the log, then written to the raid array disks.
      If a crash happens, we can recover data from the log. This can speed up
      raid resync and fix the write hole issue.
      
      The log structure is pretty simple. Data/meta data is stored in block
      units, which are generally 4k. There is only one type of meta data block.
      The meta data block can track 3 types of data: stripe data, stripe
      parity and flush block. The MD superblock will point to the last valid
      meta data block. Each meta data block has a checksum/seq number, so
      recovery can scan the log correctly. We store a checksum of stripe
      data/parity in the meta data block, so meta data and stripe data/parity
      can be written to the log disk together. Otherwise, the meta data write
      must wait until the stripe data/parity write has finished.
      
      For stripe data, the meta data block will record the stripe data sector
      and size. Currently the size is always 4k. This meta data record could be
      made simpler if we just fixed the write hole (e.g., we could record data of
      a stripe's different disks together), but this format can be extended to
      support caching in the future, which must record data address/size.
      
      For stripe parity, the meta data block will record the stripe sector. Its
      size should be 4k (for raid5) or 8k (for raid6). We always store p
      parity first. This format should work for caching too.
      
      A flush block indicates that a stripe is on the raid array disks. Fixing write
      hole doesn't need this type of meta data, it's for caching extension.
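
      The format described above can be sketched roughly as follows. This is
      only an illustration, not the on-disk layout from the patch: every name,
      field width, the magic value and the payload limit are assumptions, and
      FNV-1a is used purely as a stand-in checksum.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: names, widths and the magic value are illustrative. */
#define SKETCH_MAGIC        0x6c6f6721u
#define SKETCH_MAX_PAYLOADS 4

enum sketch_payload_type {
    SKETCH_DATA   = 1,  /* stripe data */
    SKETCH_PARITY = 2,  /* stripe parity, P stored first */
    SKETCH_FLUSH  = 3   /* stripe has reached the raid array disks */
};

struct sketch_payload {
    uint16_t type;      /* enum sketch_payload_type */
    uint16_t reserved0;
    uint32_t size;      /* bytes: 4k for data, 4k or 8k for parity */
    uint64_t sector;    /* stripe (data) sector on the array */
    uint32_t checksum;  /* checksum of the data/parity that follows */
    uint32_t reserved1;
};

struct sketch_meta_block {
    uint32_t magic;
    uint32_t checksum;  /* over the whole block, with this field zeroed */
    uint64_t seq;       /* sequence number checked during the recovery scan */
    uint32_t nr_payloads;
    uint32_t reserved;
    struct sketch_payload payloads[SKETCH_MAX_PAYLOADS];
};

/* The byte-wise checksum below relies on there being no padding. */
_Static_assert(sizeof(struct sketch_payload) == 24, "unexpected padding");
_Static_assert(sizeof(struct sketch_meta_block) == 120, "unexpected padding");

/* FNV-1a, used only as a stand-in checksum for this sketch. */
uint32_t sketch_checksum(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++)
        h = (h ^ p[i]) * 16777619u;
    return h;
}

/* Append one record; returns 0 on success, -1 if the block is full. */
int sketch_add_payload(struct sketch_meta_block *mb, uint16_t type,
                       uint64_t sector, const void *data, uint32_t size)
{
    struct sketch_payload *pl;

    assert(mb != NULL);
    if (mb->nr_payloads >= SKETCH_MAX_PAYLOADS)
        return -1;
    pl = &mb->payloads[mb->nr_payloads++];
    memset(pl, 0, sizeof(*pl));
    pl->type = type;
    pl->size = size;
    pl->sector = sector;
    /* A flush record carries no data, so it has nothing to checksum. */
    pl->checksum = data ? sketch_checksum(data, size) : 0;
    return 0;
}

/* Compute the block checksum with the checksum field treated as zero. */
uint32_t sketch_block_checksum(const struct sketch_meta_block *mb)
{
    struct sketch_meta_block tmp;

    memcpy(&tmp, mb, sizeof(tmp));
    tmp.checksum = 0;
    return sketch_checksum(&tmp, sizeof(tmp));
}

/* Recovery-side check: magic, expected sequence number and checksum. */
int sketch_block_valid(const struct sketch_meta_block *mb, uint64_t expected_seq)
{
    return mb->magic == SKETCH_MAGIC && mb->seq == expected_seq &&
           mb->checksum == sketch_block_checksum(mb);
}
```

      Because each record carries a checksum of its own data/parity, a
      recovery scan in this sketch can validate a meta block and its payload
      independently, which is what lets both be written to the log together.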
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • raid5: add a new state for stripe log handling · b70abcb2
      Shaohua Li authored
      When a stripe finishes construction, we write the stripe to raid in
      ops_run_io normally. With a log, we do a bunch of other operations before
      the stripe is written to raid, mainly writing the stripe to the log disk,
      flushing the disk cache and so on. The operations are still driven by raid5d
      and run in the stripe state machine. We introduce a new state for such
      stripes (trapped into log). A stripe is in this state from the time it
      first enters ops_run_io (construction finished) to the time it is written
      to raid. Since we know the state is only for the log, we bypass other
      check/operation in handle_stripe.
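
      A toy model of that state, for illustration only: the struct, the
      function names and the phase enum below are assumptions made for this
      sketch and are not taken from the raid5 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical phases a stripe moves through in this sketch. */
enum sketch_phase {
    SKETCH_CONSTRUCTED,     /* stripe finished construction */
    SKETCH_IN_LOG,          /* being written to the log disk / cache flush */
    SKETCH_WRITTEN_TO_RAID  /* written to the raid array disks */
};

struct sketch_stripe {
    bool log_trapped;       /* the new state: stripe is trapped into the log */
    enum sketch_phase phase;
    int checks_run;         /* how often the ordinary checks were performed */
};

/* Ordinary handling is bypassed while the stripe is trapped into the log.
 * Returns true if the ordinary checks ran, false if they were skipped. */
bool sketch_handle_stripe(struct sketch_stripe *sh)
{
    assert(sh != NULL);
    if (sh->log_trapped)
        return false;
    sh->checks_run++;
    return true;
}

/* First entry after construction: with a log, trap the stripe instead of
 * writing it to raid straight away. Without a log, write it to raid. */
void sketch_run_io(struct sketch_stripe *sh, bool have_log)
{
    assert(sh != NULL);
    if (have_log && sh->phase == SKETCH_CONSTRUCTED) {
        sh->log_trapped = true;
        sh->phase = SKETCH_IN_LOG;
        return;
    }
    sh->phase = SKETCH_WRITTEN_TO_RAID;
}

/* Log write and cache flush are done: release the stripe and write to raid. */
void sketch_log_done(struct sketch_stripe *sh)
{
    assert(sh != NULL);
    sh->log_trapped = false;
    sh->phase = SKETCH_WRITTEN_TO_RAID;
}
```

      The point of the single flag in this model is that the handler needs
      only one early test to skip everything unrelated to the log.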
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • raid5: export some functions · 6d036f7d
      Shaohua Li authored
      The next several patches use some raid5 functions; rename them with a
      raid5 prefix and export them.
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • md: override md superblock recovery_offset for journal device · 3069aa8d
      Shaohua Li authored
      A journal device stores data in a log structure. We need to record the
      log start. Here we override the md superblock recovery_offset for this
      purpose. This field is otherwise meaningless for a journal device.
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • MD: add a new disk role to present write journal device · bac624f3
      Song Liu authored
      The next patches will use a disk for raid5/6 journaling. We need a new
      disk role to represent the journal device, and we add MD_FEATURE_JOURNAL
      to feature_map for backward compatibility.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • MD: replace special disk roles with macros · c4d4c91b
      Song Liu authored
      Add the following two macros for special roles: spare and faulty
      
      MD_DISK_ROLE_SPARE	0xffff
      MD_DISK_ROLE_FAULTY	0xfffe
      
      Add MD_DISK_ROLE_MAX	0xff00 as the maximal possible regular role,
      and minimal value of special role.
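
      As an illustration of how the three values partition the 16-bit role
      space, here is a small sketch. The macro values come from the message
      above; the enum, the function names, and the choice to treat 0xff00
      itself as a regular role are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Values as listed in the commit message. */
#define MD_DISK_ROLE_SPARE  0xffff
#define MD_DISK_ROLE_FAULTY 0xfffe
#define MD_DISK_ROLE_MAX    0xff00

/* Hypothetical classification used only in this sketch. */
enum sketch_role_kind {
    SKETCH_ROLE_REGULAR,        /* an ordinary slot in the array */
    SKETCH_ROLE_SPARE,
    SKETCH_ROLE_FAULTY,
    SKETCH_ROLE_OTHER_SPECIAL   /* above MAX but not spare or faulty */
};

/* Special roles must sit above the regular range for this to make sense. */
_Static_assert(MD_DISK_ROLE_FAULTY > MD_DISK_ROLE_MAX, "role ranges overlap");

enum sketch_role_kind sketch_classify_role(uint16_t role)
{
    switch (role) {
    case MD_DISK_ROLE_SPARE:
        return SKETCH_ROLE_SPARE;
    case MD_DISK_ROLE_FAULTY:
        return SKETCH_ROLE_FAULTY;
    default:
        break;
    }
    /* Assumption: the boundary value itself still counts as regular. */
    return role <= MD_DISK_ROLE_MAX ? SKETCH_ROLE_REGULAR
                                    : SKETCH_ROLE_OTHER_SPECIAL;
}

/* Return the array slot for a regular role; callers must check the kind. */
unsigned int sketch_raid_slot(uint16_t role)
{
    assert(sketch_classify_role(role) == SKETCH_ROLE_REGULAR);
    return role;
}
```

      Naming the values this way leaves the range between MD_DISK_ROLE_MAX
      and MD_DISK_ROLE_FAULTY free for further special roles.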
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • md-cluster: Call update_raid_disks() if another node --grow's raid_disks · 28c1b9fd
      Goldwyn Rodrigues authored
      To incorporate a --grow executed on one node, the other nodes need to
      acknowledge the change in the number of disks. Call update_raid_disks()
      to update internal data structures.
      
      This leads to the call chain check_reshape() -> md_allow_write() ->
      md_update_sb(), which results in a deadlock. md_allow_write() is called so
      that memory can be allocated safely (allocation might trigger writeback,
      which might write to raid1). This is not required for md with a bitmap.
      
      In the clustered case, we don't perform md_update_sb() in md_allow_write(),
      but in do_md_run(). Also we disable safemode for clustered mode.
      
      mddev->recovery_cp need not be set in check_sb_changes() because this
      is required only when a node reads another node's bitmap. mddev->recovery_cp
      (which is read from sb->resync_offset), is set only if mddev is in_sync.
      Since we disabled safemode, in_sync is set to zero.
      In a clustered environment, the MD may not be in sync because another
      node could be writing to it. So make sure that in_sync is not set in
      case of clustered node in __md_stop_writes().
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • md-cluster: remove mddev arg from add_resync_info() · 30661b49
      NeilBrown authored
      The arg isn't used, so its presence is only confusing.
      Signed-off-by: NeilBrown <neilb@suse.com>
    • md-cluster: don't cast void pointers when assigning them. · 2e2a7cd9
      NeilBrown authored
      It is common practice in the kernel to leave out this cast.
      It isn't needed and adds little if any value.
      Signed-off-by: NeilBrown <neilb@suse.com>
    • md-cluster: discard unused sb_mutex. · 82381523
      NeilBrown authored
      Signed-off-by: NeilBrown <neilb@suse.com>
    • md-cluster: Fix warnings when build with CF=-D__CHECK_ENDIAN__ · cf97a348
      Guoqing Jiang authored
      This patch fixes sparse warnings such as "incorrect type in assignment
      (different base types)" and "cast to restricted __le64".
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
  2. 16 Oct, 2015 1 commit
  3. 13 Oct, 2015 1 commit
    • Merge branch 'md-next' of git://github.com/goldwynr/linux into for-next · c2a06c38
      NeilBrown authored
      md-cluster: A better way for METADATA_UPDATED processing
      
      The processing of METADATA_UPDATED message is too simple and prone to
      errors. Besides, it would not update the internal data structures as
      required.
      
      This set of patches reads the superblock from one of the devices of the MD
      and checks for changes against the in-memory data structures. If there is a
      change, it performs the necessary actions to keep the internal data
      structures as they would be on the primary node.
      
      An example is when a device turns faulty. The algorithm is:
      
      1. The initiator node marks the device as faulty and updates the superblock
      2. The initiator node sends METADATA_UPDATED with an advisory device number to the rest of the nodes.
      3. The receiving node on receiving the METADATA_UPDATED message
        3.1 Reads the superblock
        3.2 Detects a device has failed by comparing with memory structure
        3.3 Calls the necessary functions to record the failure and get the device out of the active array.
        3.4 Acknowledges the message.
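
      The receiving side of steps 3.1 to 3.4 can be sketched as below. This
      is a simplified model, not the md-cluster code: the two structs, the
      device limit and the function name are assumptions, the superblock is
      passed in already read (3.1), and the return value stands in for
      whatever the caller does before acknowledging (3.4).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_DEVS 8

/* Hypothetical view of the superblock as just re-read from disk (3.1). */
struct sketch_sb {
    int nr_devs;
    bool faulty[SKETCH_MAX_DEVS];
};

/* Hypothetical in-memory view held by the receiving node. */
struct sketch_array {
    int nr_devs;
    bool faulty[SKETCH_MAX_DEVS];
    bool active[SKETCH_MAX_DEVS];
};

/* Compare the superblock with the in-memory state (3.2), record any new
 * failure and take the device out of the active set (3.3).
 * Returns the number of devices newly marked faulty. */
int sketch_handle_metadata_updated(struct sketch_array *arr,
                                   const struct sketch_sb *sb)
{
    int changed = 0;

    assert(arr != NULL && sb != NULL);
    for (int i = 0; i < sb->nr_devs && i < arr->nr_devs &&
                    i < SKETCH_MAX_DEVS; i++) {
        if (sb->faulty[i] && !arr->faulty[i]) {
            arr->faulty[i] = true;
            arr->active[i] = false;
            changed++;
        }
    }
    return changed;
}
```

      Since the receiver derives the change from the superblock itself, the
      device number in the message only has to be advisory, and handling the
      same message twice is harmless in this model.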
      
      The patch series also fixes adding a disk, which was impacted by these
      changes.
      
      Patches can also be found at
      https://github.com/goldwynr/linux branch md-next
      
      Changes since V2:
       - Fix status synchronization after --add and --re-add operations
       - Included Guoqing's patches on endian correctness, zeroing cmsg etc
       - Restructure add_new_disk() and cancel()
  4. 12 Oct, 2015 18 commits
  5. 11 Oct, 2015 8 commits
  6. 10 Oct, 2015 1 commit
    • Merge tag 'usb-4.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 4a06c8ac
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB and PHY fixes and quirk updates for 4.3-rc5.
      
        Nothing major here, full details in the shortlog, and all of these
        have been in linux-next for a while"
      
      * tag 'usb-4.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: Add device quirk for Logitech PTZ cameras
        USB: chaoskey read offset bug
        USB: Add reset-resume quirk for two Plantronics usb headphones.
        usb: renesas_usbhs: Add support for R-Car H3
        usb: renesas_usbhs: fix build warning if 64-bit architecture
        usb: gadget: bdc: fix memory leak
        phy: berlin-sata: Fix module autoload for OF platform driver
        phy: rockchip-usb: power down phy when rockchip phy probe
        phy: qcom-ufs: fix build error when the component is built as a module