1. 03 Nov, 2021 1 commit
    • Fix MR commits with missing committers/authors · 9b553e50
      Yorick Peterse authored
      In MR https://gitlab.com/gitlab-org/gitlab/-/merge_requests/63669 we
      introduced a new data format for storing merge request diff commit
      authors and committers. As part of this work we made changes to the
      import/export code to support this new format, and added a set of
      migrations to migrate existing data to this new format. At that time we
      supported reading and writing of data in both the old and new format,
      allowing us to gradually migrate data over to the new format.
      
      In https://gitlab.com/gitlab-org/gitlab/-/merge_requests/72219 we
      ensured all migrations are done, stopped using the old data format, and
      removed the columns storing this data.
      
      Unfortunately, this chain of events uncovered a bug in our import/export
      logic. Consider the following timeline of events:
      
      1. You export project "Cooking Recipes" from a GitLab instance running a
         version earlier than 14.1 (e.g. 14.0).
      2. The instance you intend to import this project into is running 14.1
         or newer. Existing data has been fully migrated already.
      3. You import the project into this new instance.
      
      At this point, the imported data is using the old format, not the new
      format. This is because we forgot to take into account users importing
      exports generated using GitLab 14.0 or older, instead only covering
      exports generated using GitLab 14.1 or newer. Because the background
      migrations had already finished, or because the imported data would
      fall in a "bucket" (= a chunk of rows to migrate) that had already
      been migrated, the data would never be updated to the new format.
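      
      To illustrate why such rows are skipped, here is a minimal sketch of a
      bucketed migration. It is a simplified model, not the actual GitLab
      migration code: the bucket size, the `format` field, and all function
      names are assumptions made for this example.

```python
BUCKET_SIZE = 100  # hypothetical number of rows per bucket


def bucket_of(row_id):
    """Return the index of the bucket that a row ID falls into."""
    return (row_id - 1) // BUCKET_SIZE


def run_pending_buckets(rows, done_buckets):
    """Migrate rows in place, skipping buckets that were already processed."""
    for row in rows:
        if bucket_of(row["id"]) not in done_buckets:
            row["format"] = "new"
    # Every bucket touched in this run is now considered finished.
    done_buckets.update(bucket_of(row["id"]) for row in rows)
    return rows


# Bucket 0 finished before the import happened.
done = {0}
rows = [
    {"id": 5, "format": "old"},    # imported into an already migrated bucket
    {"id": 150, "format": "old"},  # bucket 1 is still pending
]
run_pending_buckets(rows, done)
```

      In this model the row with ID 5 keeps the old format because its
      bucket was marked as done before the row existed, while the row with
      ID 150 is migrated as usual.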
      
      In this commit we resolve this problem in two steps. First, we change
      the import/export logic to support importing data in both the old and
      new format. Exports still use the new format. Second, we include a
      background migration that processes all projects created using a GitLab
      import/export since the first mentioned merge request was introduced.
      For each such project we scan over the merge request diff commits and
      fix any that are missing the commit author or committer details.
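      
      The import side of this fix can be sketched roughly as follows. This
      is not the code from this commit: the key names (`author_name`,
      `commit_author`, `committer`, and so on) and both helper functions are
      assumptions chosen to illustrate accepting both formats and detecting
      commits that still need fixing.

```python
# Hypothetical mapping from a new-format key to the old flat keys.
ROLES = {
    "commit_author": ("author_name", "author_email"),
    "committer": ("committer_name", "committer_email"),
}
OLD_KEYS = {key for pair in ROLES.values() for key in pair}


def normalize_commit(commit):
    """Return a copy of an exported diff commit in the nested (new) layout."""
    result = {k: v for k, v in commit.items() if k not in OLD_KEYS}
    for role, (name_key, email_key) in ROLES.items():
        has_old_data = commit.get(name_key) or commit.get(email_key)
        # Only fall back to the old flat keys when the new key is absent.
        if result.get(role) is None and has_old_data:
            result[role] = {
                "name": commit.get(name_key),
                "email": commit.get(email_key),
            }
    return result


def needs_fix(commit):
    """True when a stored commit lacks author or committer details."""
    return commit.get("commit_author") is None or commit.get("committer") is None


old_style = {
    "sha": "abc",
    "author_name": "Alice",
    "author_email": "alice@example.com",
    "committer_name": "Bob",
    "committer_email": "bob@example.com",
}
normalized = normalize_commit(old_style)
```

      Normalizing data that is already in the new layout leaves it
      unchanged, so the same code path can handle exports from both older
      and newer versions.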
      
      For small self-hosted instances this process is unlikely to take more
      than a few minutes. On GitLab.com, however, we expect this process to
      take a few days, as we have to process around 200 000 projects imported
      since July. This means we'll likely need additional manual intervention,
      similar to the manual work needed for
      https://gitlab.com/gitlab-org/gitlab/-/issues/334394.
      
      See https://gitlab.com/gitlab-org/gitlab/-/issues/344080 for additional
      details.
      
      Changelog: fixed
  2. 02 Nov, 2021 39 commits