1. 14 Apr, 2023 2 commits
    • md/raid10: fix null-ptr-deref in raid10_sync_request · a405c6f0
      Li Nan authored
      init_resync() initializes the mempool and sets conf->have_replacement at
      the beginning of sync; close_sync() frees the mempool when sync completes.
      
      After [1], recovery might be skipped, in which case init_resync() is called
      but close_sync() is not. A null-ptr-deref then occurs on
      r10bio->dev[i].repl_bio.
      
      The following is one way to reproduce the issue.
      
        1) Create an array and wait for resync to complete; mddev->recovery_cp
           is set to MaxSector.
        2) Recovery is woken and it is skipped. conf->have_replacement is set to
           0 in init_resync(). close_sync() is not called.
        3) Some io errors occur and rdev A is set to WantReplacement.
        4) A new device is added and set as A's replacement.
        5) Recovery is woken. A has a replacement, but conf->have_replacement
           is still 0, so r10bio->dev[i].repl_bio is not allocated and a
           null-ptr-deref occurs.
      
      Fix it by not calling init_resync() if recovery is skipped.
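
      The reproduction steps above can be replayed with a small user-space toy
      model. This is only a sketch, not the kernel code: every toy_* name is
      hypothetical, the "pool" is reduced to a flag, and skip_before_init
      stands in for the reordering that this commit describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
struct toy_array {
    bool pool_ready;          /* models the mempool set up by init_resync() */
    bool have_replacement;    /* snapshot taken when the pool is set up */
    bool replacement_present; /* whether a replacement device exists now */
    bool recovery_needed;     /* false models "recovery is skipped" */
};

enum toy_result { TOY_SKIPPED, TOY_OK, TOY_NULL_REPL_BIO };

/* Models init_resync(): record the replacement state, set up the pool. */
static void toy_init_resync(struct toy_array *a)
{
    a->have_replacement = a->replacement_present;
    a->pool_ready = true;
}

/* Models close_sync(): tear the pool down so the next sync re-initializes. */
static void toy_close_sync(struct toy_array *a)
{
    a->pool_ready = false;
}

/* One sync request. skip_before_init selects the fixed ordering. */
static enum toy_result toy_sync_request(struct toy_array *a,
                                        bool skip_before_init)
{
    assert(a != NULL);

    /* Fixed ordering: bail out before anything is initialized. */
    if (skip_before_init && !a->recovery_needed)
        return TOY_SKIPPED;

    if (!a->pool_ready)
        toy_init_resync(a);

    /* Buggy ordering: the pool stays set up and close_sync() never runs. */
    if (!a->recovery_needed)
        return TOY_SKIPPED;

    /* repl_bio is only allocated when the snapshot says a replacement exists. */
    bool repl_bio_allocated = a->have_replacement;
    enum toy_result r = (a->replacement_present && !repl_bio_allocated)
                            ? TOY_NULL_REPL_BIO
                            : TOY_OK;
    toy_close_sync(a);
    return r;
}

/* Replays steps 1) to 5) and reports what the final recovery would hit. */
static enum toy_result toy_reproduce(bool skip_before_init)
{
    struct toy_array a = { .pool_ready = false, .have_replacement = false,
                           .replacement_present = false,
                           .recovery_needed = false };

    (void)toy_sync_request(&a, skip_before_init); /* steps 1-2: skipped */
    a.replacement_present = true;                 /* steps 3-4 */
    a.recovery_needed = true;
    return toy_sync_request(&a, skip_before_init); /* step 5 */
}
```

      In this model toy_reproduce(false) ends in TOY_NULL_REPL_BIO because the
      stale snapshot is reused, while toy_reproduce(true) ends in TOY_OK
      because the pool is first set up when recovery actually runs.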
      
      [1] commit 7e83ccbe ("md/raid10: Allow skipping recovery when clean arrays are assembled")
      Fixes: 7e83ccbe ("md/raid10: Allow skipping recovery when clean arrays are assembled")
      Cc: stable@vger.kernel.org
      Signed-off-by: Li Nan <linan122@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230222041000.3341651-3-linan666@huaweicloud.com
    • md/raid10: fix task hung in raid10d · 72c215ed
      Li Nan authored
      Commit fe630de0 ("md/raid10: avoid deadlock on recovery.") allowed
      normal io and sync io to exist at the same time. As a result, a task
      hang can occur as shown below:
      
      T1                      T2              T3              T4
      raid10d
       handle_read_error
        allow_barrier
         conf->nr_pending--
          -> 0
                              //submit sync io
                              raid10_sync_request
                               raise_barrier
                                ->will not be blocked
                                ...
                              //submit to drivers
        raid10_read_request
         wait_barrier
          conf->nr_pending++
           -> 1
                                              //retry read fail
                                              raid10_end_read_request
                                               reschedule_retry
                                                add to retry_list
                                                conf->nr_queued++
                                                 -> 1
                                                              //sync io fail
                                                              end_sync_read
                                                               __end_sync_read
                                                                reschedule_retry
                                                                 add to retry_list
                                                                 conf->nr_queued++
                                                                  -> 2
       ...
       handle_read_error
       get from retry_list
       conf->nr_queued--
        freeze_array
         wait nr_pending == nr_queued+1
              ->1           ->2
         //task hung
      
      The retried read and the sync io are both added to retry_list
      (nr_queued -> 2) if they fail. raid10d() calls handle_read_error() and
      hangs in freeze_array(). nr_queued will not decrease because raid10d is
      blocked, and nr_pending will not increase because conf->barrier is not
      released.
      
      Fix it by moving allow_barrier() after raid10_read_request().
      raise_barrier() will wait for nr_waiting to become 0. Therefore, sync io
      and regular io will not be issued at the same time.
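
      The counter arithmetic in the timeline can be replayed in user space.
      This is a sketch under assumptions: the toy_* names are hypothetical,
      the counters are plain ints updated in the order the diagram shows, and
      sync_io_overlaps models whether the sync io (T2/T4) runs alongside the
      retried read.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two counters freeze_array() compares. */
struct toy_counters {
    int nr_pending;
    int nr_queued;
};

/* The wait condition from the diagram: nr_pending == nr_queued + extra. */
static bool toy_freeze_can_proceed(const struct toy_counters *c, int extra)
{
    return c->nr_pending == c->nr_queued + extra;
}

/* Replays the diagram; returns the counters just before freeze_array(). */
static struct toy_counters toy_replay_timeline(bool sync_io_overlaps)
{
    /* raid10d is handling one failed read that is still pending. */
    struct toy_counters c = { .nr_pending = 1, .nr_queued = 0 };

    c.nr_pending--;      /* T1: allow_barrier            -> 0 */
    c.nr_pending++;      /* T1: wait_barrier             -> 1 */
    c.nr_queued++;       /* T3: retried read fails       -> 1 */
    if (sync_io_overlaps)
        c.nr_queued++;   /* T4: sync read fails          -> 2 */
    c.nr_queued--;       /* T1: raid10d takes one entry off retry_list */
    return c;
}

/* True when freeze_array() would wait forever in this model. */
static bool toy_timeline_hangs(bool sync_io_overlaps)
{
    struct toy_counters c = toy_replay_timeline(sync_io_overlaps);

    return !toy_freeze_can_proceed(&c, 1);
}
```

      With the overlap, the model leaves nr_pending at 1 and nr_queued + 1 at
      2, so the condition never holds. Without it, both sides are 1 and
      freeze_array() can proceed.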
      
      Also remove the nr_queued check in stop_waiting_barrier(): nr_queued can
      be 0 without the caller needing to block. Remove the check for
      MD_RECOVERY_RUNNING as well, since it is redundant.
      
      Fixes: fe630de0 ("md/raid10: avoid deadlock on recovery.")
      Signed-off-by: Li Nan <linan122@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230222041000.3341651-2-linan666@huaweicloud.com
  2. 13 Apr, 2023 33 commits
  3. 12 Apr, 2023 5 commits