1. 31 Jan, 2025 1 commit
    • Kirill Smelkov's avatar
      pull, restore: Read refs of a saved repository only once; Use that reading to... · a066d5e8
      Kirill Smelkov authored
      pull, restore: Read refs of a saved repository only once; Use that reading to recreate refs on restore
      
      Since the beginning `git-backup pull` was working by saving discovered
      files as blobs and only taking special care of *.git/objects/ via
      fetching data as git packs from there. This means that besides
      *.git/objects/ other files and directories inside *.git are saved as just
      regular files. For example *.git/HEAD, *.git/description and
      *.git/refs/heads/master are all saved as regular files. This works ok
      when we know that there are no simultaneous modifications to the saved
      repository, but could result in inconsistency in the refs part in the presence of
      concurrent updates.
      
      For example Alain and Thomas report that restore side of lab.nexedi.com
      PBS complains sometimes as e.g.
      
      main.cmd_restore: E: extracted /srv/slapgrid/slappart43/srv/runner/instance/slappart0/var/repositories.1737918101/nexedi/erp5.git refs corrupt:
      
          want:
          9e7ca95f7104af0c3acf7e69cdd80306f10c8567 refs/heads/DateTime.equalTo_fix
          c11e08bac6fb8d41a2ce6c3c4f3321b4f3cd295c refs/heads/TMP-2to3
          48b8c69f291026f6cb08fb118978dcf537b0e12a refs/heads/UpdateValidationStateFromConsistency
          335648bdb91c75da22f9a4a627127933de5a6cd3 refs/heads/UserPropertySheet_backward_compatibility
          d5cef37ec4bba46168b04400978edf4a64ab781f refs/heads/addToDate_implicit_localtime
          78e033ac7f1f0e80b51194fd732a495b003c1947 refs/heads/add_boolean_type
          6ee39f470be53eb306e65b52e3ae6770f5487cf6 refs/heads/allow_login_change
          5f5d2daada9502b7b4db8b3bc275571c40e39b54 refs/heads/allow_login_change_wip
          8b6848d8bf8e60f491eb34b102f85eaf466f751c refs/heads/arnau
          33c86fbbdf2f847b32bd5eabf7c34fbf5eddc8fa refs/heads/arnau-RD-Components-CacheTool
          9f1ce1bab9489307be29b3c8439c78158e57a48e refs/heads/arnau-RD-Components-ERP5Form-ERP5Report
          03a1712fedeabe643cdb1833b26b6d6370133a74 refs/heads/arnau-RD-Components-ERP5Form-SelectionTool-MemcachedTool
          ...
      
          have:
          9e7ca95f7104af0c3acf7e69cdd80306f10c8567 refs/heads/DateTime.equalTo_fix
          c11e08bac6fb8d41a2ce6c3c4f3321b4f3cd295c refs/heads/TMP-2to3
          48b8c69f291026f6cb08fb118978dcf537b0e12a refs/heads/UpdateValidationStateFromConsistency
          335648bdb91c75da22f9a4a627127933de5a6cd3 refs/heads/UserPropertySheet_backward_compatibility
          d5cef37ec4bba46168b04400978edf4a64ab781f refs/heads/addToDate_implicit_localtime
          78e033ac7f1f0e80b51194fd732a495b003c1947 refs/heads/add_boolean_type
          6ee39f470be53eb306e65b52e3ae6770f5487cf6 refs/heads/allow_login_change
          5f5d2daada9502b7b4db8b3bc275571c40e39b54 refs/heads/allow_login_change_wip
          8b6848d8bf8e60f491eb34b102f85eaf466f751c refs/heads/arnau
          33c86fbbdf2f847b32bd5eabf7c34fbf5eddc8fa refs/heads/arnau-RD-Components-CacheTool
          9f1ce1bab9489307be29b3c8439c78158e57a48e refs/heads/arnau-RD-Components-ERP5Form-ERP5Report
          03a1712fedeabe643cdb1833b26b6d6370133a74 refs/heads/arnau-RD-Components-ERP5Form-SelectionTool-MemcachedTool
          ...
      
      which was found to result in the following difference:
      
          diff --git a/a b/b
          index fb51541cace8d..98efbacb8ed6f 100644
          --- a/a
          +++ b/b
          @@ -18997,13 +18997,13 @@ ce76cb544752122200c72b1fe0e8d22de3409e89 refs/merge-requests/1127/head
           726b07bccde0ed7f88b8ffcf9e4adc9c64624a75 refs/merge-requests/113/head
           dca6727ddb752a803764f6f2248c41b5ad98f653 refs/merge-requests/1130/head
           0ecb76f1d433f6b04bf9562313f84375769377e4 refs/merge-requests/1131/head
          -ebf766512111f25e9cadd7f00d7e730e980ecde8 refs/merge-requests/1131/merge
          +cb636a38c13e00de2400f90f17d50f6a15ddb584 refs/merge-requests/1131/merge
           f8a87dfd1a0af405bd162b5c6884081f72fb29e1 refs/merge-requests/1132/head
           da16b24621c7fd8940906a718acf2aa3ad325ba6 refs/merge-requests/1132/merge
           2d34bbeac38908a08546e8df0c912242dfbe0077 refs/merge-requests/1133/head
           9f2fa83567bd0c1491d45d5b6809998276581354 refs/merge-requests/1134/head
           850dcdc6c174487b51098e833aa4a8e868e3d3e6 refs/merge-requests/1135/head
          -959cfc12c6b3d61fe324813070803c92efea8373 refs/merge-requests/1135/merge
          +ff7515979f75b83adecaa18a72b114ce5742b906 refs/merge-requests/1135/merge
           64ec4b8e7abf11f3f1877285dc4ec04e936b75b8 refs/merge-requests/1136/head
           1f4e93c0c458739f0a97834802682a90642ddb10 refs/merge-requests/1137/head
           41fa1d0c5e95ad51bb9d1d3d5bc4b404e9f7b44b refs/merge-requests/1138/head
      
      If we check e.g. merge request 1131, we can see that
      ebf766512111f25e9cadd7f00d7e730e980ecde8 is a recent commit, done a few days
      before the time of backup, and that cb636a38c13e00de2400f90f17d50f6a15ddb584 is
      not there in the backup repository at all:
      
          backup-gitlab.git$ git log -1 HEAD
          commit 296fe74285068fcd1bbc7333244883b49612bef8 (HEAD -> master)
          Merge: b4cd493bd50ae 0000025c4085f 00007c41785ae ...
          Date:   Sun Jan 26 20:35:43 2025 +0100
      
          Git-backup 20250126-2005
      
          backup-gitlab.git$ git log -1 ebf766512111f25e9cadd7f00d7e730e980ecde8
          commit ebf766512111f25e9cadd7f00d7e730e980ecde8
          Merge: 569f78098281a 0ecb76f1d433f
          Author: Łukasz Nowak <luke@nexedi.com>
          Date:   Wed Jan 22 19:46:15 2025 +0100
      
              WIP: Feature/rjs gadget pagehelp
      
              See merge request nexedi/erp5!1131
      
          backup-gitlab.git$ git log -1 cb636a38c13e00de2400f90f17d50f6a15ddb584
          fatal: bad object cb636a38c13e00de2400f90f17d50f6a15ddb584
      
      So what happened here is that GitLab, it seems, automatically creates
      refs/merge-requests/X/merge from time to time, and from time to time updates
      it with a fresh merge commit, ready to be used as the merge commit
      if/when the merge request in question becomes merged. Here the time of
      updating such a merge commit coincided with the time when `git-backup pull`
      was running: git-backup first fetched when the state of the saved repository
      was not yet modified, then GitLab updated the refs/merge-requests/1131/merge
      reference, and then `git-backup pull` continued and saved the content of the
      repo.git/refs/merge-requests/1131/merge file in its updated state.
      However the backup.refs in the backup repository is built from the state
      of saved references observed at fetch time, and so this results in
      inconsistency between the information saved into backup.refs and
      the information saved about those refs via just files from under *.git/ .
      
      -> Fix that by reading refs state only once - when doing fetch, and
      restoring refs state from that read state without relying on per-file
      level save/restore.
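
      The idea can be sketched as follows. This is only an illustration under
      stated assumptions, not the actual git-backup code: the `Snapshot` type and
      the `Mismatch` helper are hypothetical names. Two independent reads of refs
      can disagree when the repository changes in between, while a single read
      reused for both saving and restoring cannot.

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot is a hypothetical in-memory view of a repository's refs,
// read once at fetch time: full ref name -> sha1.
type Snapshot map[string]string

// Lines renders the snapshot as sorted "sha1 refname" lines, the same
// shape as the want/have listings in the error message.
func (s Snapshot) Lines() []string {
	names := make([]string, 0, len(s))
	for name := range s {
		names = append(names, name)
	}
	sort.Strings(names)
	lines := make([]string, 0, len(names))
	for _, name := range names {
		lines = append(lines, s[name]+" "+name)
	}
	return lines
}

// Mismatch returns the sorted names of refs that differ between want and
// have, including refs present on only one side.
func Mismatch(want, have Snapshot) []string {
	seen := map[string]bool{}
	var bad []string
	for name, sha1 := range want {
		seen[name] = true
		if have[name] != sha1 {
			bad = append(bad, name)
		}
	}
	for name := range have {
		if !seen[name] {
			bad = append(bad, name)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	// State observed at fetch time.
	fetched := Snapshot{
		"refs/merge-requests/1131/head":  "0ecb76f1d433f6b04bf9562313f84375769377e4",
		"refs/merge-requests/1131/merge": "ebf766512111f25e9cadd7f00d7e730e980ecde8",
	}
	// State saved later via per-file copy, after GitLab updated the ref.
	files := Snapshot{
		"refs/merge-requests/1131/head":  "0ecb76f1d433f6b04bf9562313f84375769377e4",
		"refs/merge-requests/1131/merge": "cb636a38c13e00de2400f90f17d50f6a15ddb584",
	}
	fmt.Println("two reads:", Mismatch(fetched, files))
	fmt.Println("one read: ", Mismatch(fetched, fetched))
}
```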
      
      For reference: Git itself implements fetching so that it always provides
      some consistent state of the fetched repository. So even though in the
      presence of simultaneous changes git-backup cannot guarantee overall
      atomicity of the saved backup, what it now should be able to guarantee is
      that the saved state is some set of per-individual-repository consistent
      snapshots. As before, if one wants to make a fully atomic snapshot of
      the data, GitLab service should be stopped before running `git-backup pull`.
      
      Added test fails as follows without the fix:
      
          git-backup_test.go:291: pull: third run was not noop: δ:
              diff --git a/b1/dir/hello.git/refs/test/ref-to-blob b/b1/dir/hello.git/refs/test/ref-to-blob
              new file mode 100644
              index 0000000..379708e
              --- /dev/null
              +++ b/b1/dir/hello.git/refs/test/ref-to-blob
              @@ -0,0 +1 @@
              +cb8d6bb5e54b1c7159698442057416473a3b5385
      
      and even if we disable δ23 check in the test via
      
          @@ -286,7 +287,7 @@ func TestPullRestore(t *testing.T) {
                          t.Fatal("pull: third run did not adjusted HEAD")
                  }
                  δ23 := xgit(ctx, "diff", h2, h3)
          -       if δ23 != "" {
          +       if false && δ23 != "" {
                          t.Fatalf("pull: third run was not noop: δ:\n%s", δ23)
                  }
      
      then the test starts to fail in the restore part, confirming that
      it effectively covers the problem we hit:
      
          E: Problem while checking connectivity of extracted repo:
              git-backup_test.go:109: git-backup_test.go:296: lab.nexedi.com/kirr/git-backup.cmd_restore: E: extracted /tmp/t-git-backup10753034/1/dir/hello.git refs corrupt:
      
                  want:
                  647e137fd3b31939b36889eba854a298ef97b6ff refs/heads/branch2
                  feeed96ca75fcf8dcf183008f61dbf72e91ab4de refs/heads/master
                  11e67095628aa17b03436850e690faea3006c25d refs/tags/tag-to-blob
                  f735011c9fcece41219729a33f7876cd8791f659 refs/tags/tag-to-commit
                  7124713e403925bc772cd252b0dec099f3ced9c5 refs/tags/tag-to-tag
                  ba899e5639273a6fa4d50d684af8db1ae070351e refs/tags/tag-to-tree
                  7a3343f584218e973165d943d7c0af47a52ca477 refs/test/ref-to-blob	<-- NOTE
                  61882eb85774ed4401681d800bb9c638031375e2 refs/test/ref-to-tree
      
                  have:
                  647e137fd3b31939b36889eba854a298ef97b6ff refs/heads/branch2
                  feeed96ca75fcf8dcf183008f61dbf72e91ab4de refs/heads/master
                  11e67095628aa17b03436850e690faea3006c25d refs/tags/tag-to-blob
                  f735011c9fcece41219729a33f7876cd8791f659 refs/tags/tag-to-commit
                  7124713e403925bc772cd252b0dec099f3ced9c5 refs/tags/tag-to-tag
                  ba899e5639273a6fa4d50d684af8db1ae070351e refs/tags/tag-to-tree
                  cb8d6bb5e54b1c7159698442057416473a3b5385 refs/test/ref-to-blob	<-- NOTE
                  61882eb85774ed4401681d800bb9c638031375e2 refs/test/ref-to-tree
      
      /reported-by @alain.takoudjou, @tomo
      /reported-on https://www.erp5.com/group_section/forum/Gitlab-backup-zDVMZqaMAK
      /reviewed-by @jerome, @alain.takoudjou, @tomo
      /reviewed-on !11
      /cc @rafael
  2. 28 Jan, 2025 1 commit
  3. 14 Jun, 2024 1 commit
  4. 28 Apr, 2024 2 commits
    • Kirill Smelkov's avatar
      pull: test: Update expected error for "missing blob in pack" for Git ≥ 2.31 · 3230197c
      Kirill Smelkov authored
      Starting from the same https://git.kernel.org/pub/scm/git/git.git/commit/?id=5476e1efded5
      (see previous patch) git changed output when reporting error about a bad
      pack. Before that patch it was something like
      
              $ git -c fetch.fsckObjects=true fetch-pack --thin --upload-pack=git -c uploadpack.allowAnySHA1InWant=true -c uploadpack.allowTipSHA1InWant=true -c uploadpack.allowReachableSHA1InWant=true upload-pack /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/3/incomplete-send-pack.git 46094318d7ea2dab446294556097361409ca1e84 </dev/null
              warning: no common commits
              remote: I: x-missing-blob/hook-pack-object is running ...
              remote: Enumerating objects: 4, done.
              remote: Counting objects: 100% (4/4), done.
              remote: Compressing objects: 100% (3/3), done.
              remote: Total 4 (delta 1), reused 0 (delta 0)
              fatal: object of unexpected type                               <-- NOTE
              fatal: unpack-objects failed                                   <-- NOTE
      
      and after the patch it became
      
              $ git -c fetch.fsckObjects=true fetch-pack --thin --upload-pack=git -c uploadpack.allowAnySHA1InWant=true -c uploadpack.allowTipSHA1InWant=true -c uploadpack.allowReachableSHA1InWant=true upload-pack /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/3/incomplete-send-pack.git 46094318d7ea2dab446294556097361409ca1e84 </dev/null
              remote: I: x-missing-blob/hook-pack-object is running ...
              remote: Enumerating objects: 4, done.
              remote: Counting objects: 100% (4/4), done.
              remote: Compressing objects: 100% (3/3), done.
              remote: Total 4 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0)
              Receiving objects: 100% (4/4), 377 bytes | 377.00 KiB/s, done.
              Resolving deltas: 100% (1/1), done.
              fatal: did not receive expected object 737592d2855213009bd3fa689167fe9a8363e45f         <-- NOTE
              fatal: index-pack failed                                                                <-- NOTE
      
      -> Adjust the test to detect the second case as also ok to avoid failing
      tests with Git ≥ 2.31.
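
      A check that accepts both wordings could look like the sketch below. The
      helper name `isBadPackError` is hypothetical and this is not the actual
      test code; only the four error strings are taken from the outputs quoted
      in this message.

```go
package main

import (
	"fmt"
	"strings"
)

// isBadPackError reports whether git's stderr looks like a "bad pack"
// failure, accepting both the older and the newer wording.
func isBadPackError(stderr string) bool {
	oldStyle := strings.Contains(stderr, "fatal: object of unexpected type") &&
		strings.Contains(stderr, "fatal: unpack-objects failed")
	newStyle := strings.Contains(stderr, "fatal: did not receive expected object") &&
		strings.Contains(stderr, "fatal: index-pack failed")
	return oldStyle || newStyle
}

func main() {
	before := "fatal: object of unexpected type\nfatal: unpack-objects failed\n"
	after := "fatal: did not receive expected object 737592d2855213009bd3fa689167fe9a8363e45f\n" +
		"fatal: index-pack failed\n"
	fmt.Println(isBadPackError(before), isBadPackError(after), isBadPackError("ok"))
}
```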
    • Kirill Smelkov's avatar
      pull: test: Fix thinko when pruning test backup.git · e60e2e94
      Kirill Smelkov authored
      Pull test verifies that pulled tag objects become gone after becoming
      represented via specially-crafted commits. The "gone" is verified by
      removing all unreachable objects and asserting that looking up an object
      by original tag sha1 results in error.
      
      However there was a thinko in "removing all unreachable objects" - this
      step was implemented via `git prune`, but as documented, `git prune`
      removes only loose unreachable objects, while unreachable objects that
      are already in packs remain intact. From `git prune --help`:
      
          Note that unreachable, packed objects will remain. If this is not desired, see git-repack(1).
      
      So previously we were lucky that running the tests did not pull objects
      from test repositories into packs and plain `git prune` was enough.
      However starting from Git 2.31 fetching objects from test repositories
      pulls some of them as packs and the test starts to fail because original
      tag object remains alive:
      
          === RUN   TestPullRestore
          # creating root commit
          # building "already-have" index
          # git  b0       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/0/dir/empty repo.git
          # file b0       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/0/dir/empty repo.git/HEAD
          # file b0       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/0/dir/empty repo.git/config
          # file b0       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/0/dir/empty repo.git/description
          # file b0       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/0/dir/empty repo.git/info/exclude
          # file b0       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/0/dir/empty repo.git/refs/.keep
          # building "already-have" index
          # git  b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/hello.git
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/hello.git/HEAD
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/hello.git/config
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/hello.git/description
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/hello.git/info/exclude
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/hello.git/packed-refs
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/hello.git/refs/tags/tag-to-blob
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/hello.git/refs/test/ref-to-tree
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/world.txt
          # git  b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/empty repo.git
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/empty repo.git/HEAD
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/empty repo.git/config
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/empty repo.git/description
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/empty repo.git/info/exclude
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/empty repo.git/refs/.keep
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/file 2
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/nongit.git/latest
          # git  b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/repo with+fragile name %αβγ.git
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/repo with+fragile name %αβγ.git/HEAD
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/repo with+fragile name %αβγ.git/config
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/repo with+fragile name %αβγ.git/description
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/repo with+fragile name %αβγ.git/info/exclude
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/repo with+fragile name %αβγ.git/refs/heads/master
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/file
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/file with space + α
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/fileexec
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/symlink.dir
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/symlink.file
          # file b1       <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/symlink.missing
              git-backup_test.go:203: tag f735011c9fcece41219729a33f7876cd8791f659 still present in backup.git after git-prune
          --- FAIL: TestPullRestore (0.14s)
      
      Bisecting shows that particular behaviour of pulling into packs from test
      repositories starts to happen after https://git.kernel.org/pub/scm/git/git.git/commit/?id=5476e1efded571e374cd97c7d69f17962ba1c44f .
      But as explained in the beginning it is not a Git bug as in the general case just `git prune` is not enough to remove all unreachable objects.
      
      -> Fix it by accompanying `git prune` with `git repack -ad` that removes unreachable objects from the packs.
      
      NOTE just `git repack -ad` is not enough as it puts in packs only reachable
      objects, so any unreachable objects that were loose before repacking, will
      remain alive.
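
      The combined cleanup could be sketched like this. The names `gitRunner`,
      `dropUnreachable` and `execGit` are hypothetical, and the order of the two
      commands is an assumption of the sketch: the text only says that both are
      needed. The runner is a parameter so that the command sequence can be
      checked without a real repository.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// gitRunner runs `git <args...>` against some repository.
type gitRunner func(args ...string) error

// dropUnreachable removes unreachable objects, both packed and loose.
func dropUnreachable(git gitRunner) error {
	// Rewrite packs so that they contain only reachable objects and
	// delete the old packs.
	if err := git("repack", "-ad"); err != nil {
		return fmt.Errorf("repack: %w", err)
	}
	// Remove unreachable objects that are still loose.
	if err := git("prune"); err != nil {
		return fmt.Errorf("prune: %w", err)
	}
	return nil
}

// execGit returns a runner that invokes the real git on gitDir.
func execGit(gitDir string) gitRunner {
	return func(args ...string) error {
		cmd := exec.Command("git", append([]string{"--git-dir=" + gitDir}, args...)...)
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		return cmd.Run()
	}
}

func main() {
	// By default only print the commands; pass a path to a bare
	// repository to actually run them there.
	run := gitRunner(func(args ...string) error {
		fmt.Println("git", strings.Join(args, " "))
		return nil
	})
	if len(os.Args) == 2 {
		run = execGit(os.Args[1])
	}
	if err := dropUnreachable(run); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```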
  5. 07 Jul, 2021 1 commit
    • Kirill Smelkov's avatar
      Fix module-based build · c9db60e8
      Kirill Smelkov authored
      @lpgeneau reports that module-based build fails with
      
          2021-07-07 16:37:34 slapos[2415] INFO GOMODSRC /srv/slapgrid/slappart11/srv/runner/software/a12cdf3481c4202ae72cb255d8a6c183/go.work/src/lab.nexedi.com/kirr/git-backup:./...
          2021-07-07 16:37:34 slapos[2415] INFO ../../../../pkg/mod/lab.nexedi.com/kirr/go123@v0.0.0-20210302025843-863c4602a230/xerr/xerr.go:77:2: missing go.sum entry for module providing package github.com/pkg/errors (imported by lab.nexedi.com/kirr/go123/xerr); to add:
          2021-07-07 16:37:34 slapos[2415] INFO   go get lab.nexedi.com/kirr/go123/xerr@v0.0.0-20210302025843-863c4602a230
          2021-07-07 16:37:34 slapos[2415] INFO gowork.goinstall: Non zero exit code (1) while running command.
      
      -> Fix it by updating go123 dependency and running recommended command to update go.sum.
      
      Amends 3c804105 (*: Add Go modules support; Upgrade to latest git2go release)
  6. 02 Mar, 2021 1 commit
  7. 02 Jul, 2020 1 commit
  8. 20 May, 2020 3 commits
  9. 25 Feb, 2020 1 commit
  10. 10 Feb, 2020 3 commits
  11. 13 Jan, 2020 1 commit
    • Kirill Smelkov's avatar
      restore: Rework extraction pipeline to use xsync.WorkGroup · 6af054b0
      Kirill Smelkov authored
      The pattern where multiple workers are spawned to work on a common task
      and where whole work needs to be canceled on first error is now well
      understood, with the functionality to broadcast cancel and propagate
      errors being wrapped into libraries such as
      
      	https://godoc.org/golang.org/x/sync/errgroup			and
      	https://godoc.org/lab.nexedi.com/kirr/go123/xsync#WorkGroup
      	(kirr/go123@515a6d14)
      
      Let's streamline the code by using xsync.WorkGroup (it is in our hands,
      a bit more well designed (imho), has analog in Pygolang, and can be
      changed/enhanced as needed).
      
      The other reason to rework the code is that the workgroup is created
      under context (currently always background) and can be canceled by that
      context cancel. In the next patch we'll teach all git-backup
      subcommands, including restore, to work under context, and by using
      xsync.WorkGroup we will automatically handle cancellation from outside,
      while without reworking extraction pipeline we would need to
      additionally glue ctx cancel to signal to workers to stop.
      
      Compared to previous state both xsync.WorkGroup and errgroup return
      only the first error, however it should likely not cause problems in
      practice as the first error is usually the most informative one.
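
      For illustration, the pattern itself can be sketched with the standard
      library only. The `workGroup` type below is a hypothetical stand-in written
      for this example; it is not the API of xsync.WorkGroup or errgroup.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// workGroup spawns workers under a common context, cancels all of them
// on the first error and reports only that first error.
type workGroup struct {
	ctx    context.Context
	cancel context.CancelFunc
	wg     sync.WaitGroup
	once   sync.Once
	err    error
}

func newWorkGroup(parent context.Context) *workGroup {
	ctx, cancel := context.WithCancel(parent)
	return &workGroup{ctx: ctx, cancel: cancel}
}

// Go runs f in a new goroutine; the first non-nil error cancels the rest.
func (g *workGroup) Go(f func(ctx context.Context) error) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		if err := f(g.ctx); err != nil {
			g.once.Do(func() {
				g.err = err
				g.cancel()
			})
		}
	}()
}

// Wait blocks until all workers are done and returns the first error.
func (g *workGroup) Wait() error {
	g.wg.Wait()
	g.cancel()
	return g.err
}

// demo runs one failing worker and one worker that waits for cancel.
func demo() error {
	g := newWorkGroup(context.Background())
	g.Go(func(ctx context.Context) error {
		return errors.New("extract failed")
	})
	g.Go(func(ctx context.Context) error {
		<-ctx.Done() // unblocks once the sibling worker fails
		return ctx.Err()
	})
	return g.Wait()
}

func main() {
	fmt.Println(demo())
}
```

      Because the group is created from a parent context, canceling that parent
      stops the workers in the same way as the first error does.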
  12. 05 Jan, 2020 2 commits
    • Kirill Smelkov's avatar
      gitlab-backup: pull|restore: Cleanup $tmpd in defer-style · 00f58d0b
      Kirill Smelkov authored
      Similarly to the previous patch, let's always clean up the gitlab-backup
      temporary folder, even in the presence of errors. Keeping $tmpd
      on error did not prevent further gitlab-backup runs from proceeding, but it
      can quickly eat up disk space if there are many such runs. If debugging
      is needed one can comment out the cleanup, but by default let's be
      production friendly out of the box.
      
      Based on patch by @alain.takoudjou:
      !4
      
      Original description from Alain:
      
      ---- 8< ----
      When script exit, remove tmp backup folder which are not longuer needed.
      Keep this folder when backup is failing will contribute to fill the disk
      of server. backup.locked is also removed, because we want to
      automatically retry gitlab-backup if previous backup failed, without
      human action. If the file is not removed automatically, backup is
      blocked until someone remove it.
    • Kirill Smelkov's avatar
      pull: Don't leave backup repository locked on error · 2cc61da3
      Kirill Smelkov authored
      On pull git-backup locks backup repository to make sure another
      concurrent `git-backup pull` process is not running. However until now,
      if a pull was failing, the lock was left unreleased, which made follow-up
      pull attempts fail while acquiring the lock until the lock was
      manually removed with `git update-ref -d ...`. Probably originally I
      made it like this in 6f237f22 (git-backup: Initial draft) to make sure
      that if there is a problem it does not go unnoticed and forces me to
      investigate. But in general we do _not_ need to keep the lock on error
      return after `git-backup pull` completes even abnormally.
      
      This "lock left unreleased" is causing operational issues on
      lab.nexedi.com from time to time: if a pull attempt fails for some, even
      temporary, reason, all subsequent pull attempts will fail until a human
      intervenes and removes the lock ref.
      
      Fix it.
      
      See also: kirr/git-backup!4
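
      The defer-style release can be sketched as below. `repoLock`, `pull` and
      `errWork` are hypothetical names for illustration; the real lock is
      presumably a git ref, since it was removed with `git update-ref -d`, while
      here it is reduced to a flag.

```go
package main

import (
	"errors"
	"fmt"
)

// errWork simulates a temporary failure during pull.
var errWork = errors.New("temporary failure")

// repoLock is a stand-in for the backup repository lock.
type repoLock struct{ held bool }

func (l *repoLock) acquire() error {
	if l.held {
		return errors.New("backup repository is already locked")
	}
	l.held = true
	return nil
}

func (l *repoLock) release() { l.held = false }

// pull runs work under the lock and releases the lock on every return path.
func pull(l *repoLock, work func() error) error {
	if err := l.acquire(); err != nil {
		return err
	}
	defer l.release() // runs on success and on error alike
	return work()
}

func main() {
	l := &repoLock{}
	err := pull(l, func() error { return errWork })
	fmt.Println("first pull:", err, "| still locked:", l.held)
	err = pull(l, func() error { return nil })
	fmt.Println("second pull:", err)
}
```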
  13. 29 Aug, 2018 1 commit
    • Kirill Smelkov's avatar
      Fix build with Go1.11 · 9791c04e
      Kirill Smelkov authored
      	# lab.nexedi.com/kirr/git-backup
      	./git.go:177: Raisef call needs 1 arg but has 2 args
      
      The bug was there from day 1 after rewrite in Go in 28986e0e.
  14. 20 Jun, 2018 1 commit
  15. 13 Jun, 2018 1 commit
  16. 12 Jun, 2018 4 commits
    • Kirill Smelkov's avatar
    • Kirill Smelkov's avatar
      pull: Speedup fetching by prebuilding index of objects we already have at start · 3efed898
      Kirill Smelkov authored
      Like it was already said in 899103bf (pull: Switch from porcelain `git
      fetch` to plumbing `git fetch-pack` + friends) currently on
      lab.nexedi.com `git-backup pull` became slow and most of the slowness
      was tracked down to the fact that `git fetch` for every pulled repository does
      linear scan of whole backup repository history just to find out there is
      usually nothing to fetch. Quoting 899103bf:
      
      """
          `git fetch`, before fetching data from remote repository, first checks
          whether it already locally has all the objects remote advertises. This
          boils down to running
      
      	echo $remote_tips | git rev-list --quiet --objects --stdin --not --all
      
          and checking whether it succeeds or not:
      
      	https://git.kernel.org/pub/scm/git/git.git/commit/?h=4191c35671
      	https://git.kernel.org/pub/scm/git/git.git/tree/builtin/fetch.c?h=v2.18.0-rc1-1-g6f333ff2fb#n925
      	https://git.kernel.org/pub/scm/git/git.git/tree/connected.c?h=v2.18.0-rc1-1-g6f333ff2fb#n8
      
          The "--not --all" in the query means that objects should be not
          reachable from all locally existing refs and is implemented by linearly
          scanning from tip of those existing refs and marking objects reachable
          from there as "do not print".
      
          In case of git-backup, where we have mostly master which is super commit
          merging from whole histories of all projects and from backup history,
          linearly scanning from such a tip goes through lots of commits. Up to
          the point where fetching a small, outdated repository, which was already
          pulled into backup and has not changed for a long time, takes more than 30
          seconds with almost 100% of that time being spent in quickfetch() only.
      """
      
      The solution is to build an index of objects we already have
      only once at startup, and then in fetch, after checking lsremote output, consult
      that index, and if we see we already have everything for an advertised
      reference - just avoid giving it to fetch-pack to process. It turns out that for
      many pulled repositories no references changed at all and this way
      fetch-pack can be skipped completely. This leads to a dramatic speedup: before
      `gitlab-backup pull` was taking ~ 2 hours, and now it takes under ~ 5 minutes.
      
      The index building itself takes ~ 30 seconds - the time which we were
      previously spending to fetch just from 1 unchanged repository. The index size
      is small and so it all can be kept in RAM - please see details in the code
      comments on this.
      
      I initially wanted to speedup fetching by teaching `git fetch-pack` to
      consult backup repo bitmap reachability index (if, for a commit, we can see
      that there is an entry in this index -> we know we already have all reachable
      objects for this commit and can skip fetching). This won't however work
      fully for all our refs - 40% of them are mostly tags, and since in the backup
      repository we don't keep tag objects - we keep tags/tree/blobs encoded as
      commits - sha1 of those 40% references to tags won't be in bitmap index.
      
      So just do the indexing ourselves.
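
      A minimal sketch of consulting such an index follows. The names `Ref`,
      `Sha1Set` and `needFetch` are hypothetical, and the sketch assumes the
      index holds only sha1 of tips whose whole history is already in the
      backup; the actual data structures are described in the code comments.

```go
package main

import "fmt"

// Ref is an advertised reference: its name and the sha1 it points to.
type Ref struct {
	Name string
	Sha1 string
}

// Sha1Set is an in-RAM index of objects the backup already has.
type Sha1Set map[string]struct{}

// Add records sha1 as already present in the backup.
func (s Sha1Set) Add(sha1 string) { s[sha1] = struct{}{} }

// Has reports whether sha1 is already present in the backup.
func (s Sha1Set) Has(sha1 string) bool {
	_, ok := s[sha1]
	return ok
}

// needFetch returns the advertised refs whose tips are not in the index;
// only those have to be handed to fetch-pack.
func needFetch(advertised []Ref, have Sha1Set) []Ref {
	var todo []Ref
	for _, r := range advertised {
		if !have.Has(r.Sha1) {
			todo = append(todo, r)
		}
	}
	return todo
}

func main() {
	have := Sha1Set{}
	have.Add("feeed96ca75fcf8dcf183008f61dbf72e91ab4de")

	advertised := []Ref{
		{Name: "refs/heads/master", Sha1: "feeed96ca75fcf8dcf183008f61dbf72e91ab4de"},
		{Name: "refs/heads/branch2", Sha1: "647e137fd3b31939b36889eba854a298ef97b6ff"},
	}
	todo := needFetch(advertised, have)
	if len(todo) == 0 {
		fmt.Println("nothing changed, fetch-pack can be skipped")
		return
	}
	for _, r := range todo {
		fmt.Println("fetch:", r.Sha1, r.Name)
	}
}
```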
    • Kirill Smelkov's avatar
      Factor out backup.refs loading code from restore · 1be6aaaa
      Kirill Smelkov authored
      In the next patch we will need to load backup.refs in the beginning of
      pull too. Factored function changed to return regular error instead of
      raising exception (which will be the general plan from now on).
    • Kirill Smelkov's avatar
      pull: Switch from porcelain `git fetch` to plumbing `git fetch-pack` + friends · 899103bf
      Kirill Smelkov authored
      On lab.nexedi.com `git-backup pull` became slow, and most of the slowness
      was tracked down to the following:
      
      `git fetch`, before fetching data from remote repository, first checks
      whether it already locally has all the objects remote advertises. This
      boils down to running
      
      	echo $remote_tips | git rev-list --quiet --objects --stdin --not --all
      
      and checking whether it succeeds or not:
      
      	https://git.kernel.org/pub/scm/git/git.git/commit/?h=4191c35671
      	https://git.kernel.org/pub/scm/git/git.git/tree/builtin/fetch.c?h=v2.18.0-rc1-1-g6f333ff2fb#n925
      	https://git.kernel.org/pub/scm/git/git.git/tree/connected.c?h=v2.18.0-rc1-1-g6f333ff2fb#n8
      
      The "--not --all" in the query means that objects should be not
      reachable from all locally existing refs and is implemented by linearly
      scanning from tip of those existing refs and marking objects reachable
      from there as "do not print".
      
      In case of git-backup, where we have mostly master which is super commit
      merging from whole histories of all projects and from backup history,
      linearly scanning from such a tip goes through lots of commits. Up to
      the point where fetching a small, outdated repository, which was already
      pulled into backup and has not changed for a long time, takes more than 30
      seconds with almost 100% of that time being spent in quickfetch() only.
      
      The solution will be to optimize checking whether we already have all the
      remote objects and to not repeat whole backup-repo scanning for every
      pulled repository. This will be done via first querying through `git
      ls-remote` what tips remote repository has, then checking on
      git-backup specific index which tips we already have and then fetching
      only the rest. This way we are essentially moving most of quickfetch
      phase of git into git-backup.
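
      The first of these steps, reading what tips the remote has, could be
      sketched as below. `parseLsRemote` is a hypothetical helper; it relies on
      `git ls-remote` printing one "<sha1>\t<refname>" entry per line and
      assumes 40-character sha1 object names. Skipping peeled "^{}" entries is a
      choice of this sketch, not something the text prescribes.

```go
package main

import (
	"fmt"
	"strings"
)

// Ref is one advertised reference of the remote repository.
type Ref struct {
	Sha1 string
	Name string
}

// parseLsRemote parses `git ls-remote` output into a list of refs.
func parseLsRemote(out string) ([]Ref, error) {
	var refs []Ref
	for _, line := range strings.Split(out, "\n") {
		if line == "" {
			continue
		}
		sha1, name, ok := strings.Cut(line, "\t")
		if !ok || len(sha1) != 40 || name == "" {
			return nil, fmt.Errorf("invalid ls-remote line: %q", line)
		}
		// Peeled tag entries describe what a tag points to, not a ref.
		if strings.HasSuffix(name, "^{}") {
			continue
		}
		refs = append(refs, Ref{Sha1: sha1, Name: name})
	}
	return refs, nil
}

func main() {
	out := "feeed96ca75fcf8dcf183008f61dbf72e91ab4de\trefs/heads/master\n" +
		"f735011c9fcece41219729a33f7876cd8791f659\trefs/tags/tag-to-commit\n"
	refs, err := parseLsRemote(out)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range refs {
		fmt.Println(r.Sha1, r.Name)
	}
}
```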
      
      Since we'll be telling git to fetch only some of the remote refs, we
      will either have to amend ourselves the refs `git fetch` creates after
      fetching, or to not rely on `git fetch` creating any refs at all. Since
      we already have a long-standing issue that the many refs that
      come alive after `git fetch` slow down further git fetches
      
      https://lab.nexedi.com/kirr/git-backup/blob/0ab7bbb6/git-backup.go#L551
      
      the longer term plan will be not to create unneeded references.
      Since 2 forks could have references covering the same commits, we would
      either have to compare references created after git-fetch and deduplicate
      them or manage references creation ourselves.
      
      It is also generally better to split `git fetch` into steps at the plumbing
      layer, because after doing so we have the chance to optimize or
      tweak any of the steps on our side, knowing the full git-backup context
      and indices.
      
      This commit only switches from using `git fetch` to its plumbing
      counterpart `git fetch-pack` + friends + manually creating fetched refs
      exactly the way `git fetch` used to do. There should be neither a
      functionality change nor any speedup.
      
      Further commits will start to take advantage of the switch and optimize
      `git-backup pull`.
      899103bf
  17. 11 Jun, 2018 2 commits
    • Clarify git Ref* types a bit · 350a01f9
      Kirill Smelkov authored
      - tell that reference name always goes without "refs/" prefix
      - use .name for reference name, not .ref: this way
      
      	ref.name
      
        is more readable than
      
      	ref.ref
      
        and so there is less need to use for __ in range loops.
      350a01f9
    • restore: Show details when extracted repo refs were found corrupt · 23e07d70
      Kirill Smelkov authored
      Noticed this while changing how pull works and accidentally introducing
      an error there by leaving an extra "refs/" prefix. With that error,
      before this patch the tests show:
      
              git-backup_test.go:91: git-backup_test.go:204: lab.nexedi.com/kirr/git-backup.cmd_restore: 2 errors:
      			- E: extracted /tmp/t-git-backup981909377/1/dir 2 + β/repo with+fragile name %αβγ.git refs corrupt:
      			- E: extracted /tmp/t-git-backup981909377/1/dir/hello.git refs corrupt:
      
      with the patch tests report:
      
              git-backup_test.go:91: git-backup_test.go:204: lab.nexedi.com/kirr/git-backup.cmd_restore: 2 errors:
                              - E: extracted /tmp/t-git-backup981909377/1/dir 2 + β/repo with+fragile name %αβγ.git refs corrupt:
      
                      want:
                      cbb6d3f205749888f77fb1a88fbac3b8a0b8000f refs/refs/heads/master
      
                      have:
                      cbb6d3f205749888f77fb1a88fbac3b8a0b8000f refs/heads/master
                              - E: extracted /tmp/t-git-backup981909377/1/dir/hello.git refs corrupt:
      
                      want:
                      647e137fd3b31939b36889eba854a298ef97b6ff refs/refs/heads/branch2
                      feeed96ca75fcf8dcf183008f61dbf72e91ab4de refs/refs/heads/master
                      11e67095628aa17b03436850e690faea3006c25d refs/refs/tags/tag-to-blob
                      f735011c9fcece41219729a33f7876cd8791f659 refs/refs/tags/tag-to-commit
                      7124713e403925bc772cd252b0dec099f3ced9c5 refs/refs/tags/tag-to-tag
                      ba899e5639273a6fa4d50d684af8db1ae070351e refs/refs/tags/tag-to-tree
                      7a3343f584218e973165d943d7c0af47a52ca477 refs/refs/test/ref-to-blob
                      61882eb85774ed4401681d800bb9c638031375e2 refs/refs/test/ref-to-tree
      
                      have:
                      647e137fd3b31939b36889eba854a298ef97b6ff refs/heads/branch2
                      feeed96ca75fcf8dcf183008f61dbf72e91ab4de refs/heads/master
                      11e67095628aa17b03436850e690faea3006c25d refs/tags/tag-to-blob
                      f735011c9fcece41219729a33f7876cd8791f659 refs/tags/tag-to-commit
                      7124713e403925bc772cd252b0dec099f3ced9c5 refs/tags/tag-to-tag
                      ba899e5639273a6fa4d50d684af8db1ae070351e refs/tags/tag-to-tree
                      7a3343f584218e973165d943d7c0af47a52ca477 refs/test/ref-to-blob
                      61882eb85774ed4401681d800bb9c638031375e2 refs/test/ref-to-tree
      
      It should be good to have these details if something really breaks after restore.
      23e07d70
  18. 08 Jun, 2018 2 commits
  19. 05 Jun, 2018 1 commit
  20. 25 Apr, 2018 1 commit
    • gitlab-backup: don't keep backup_gitlab.pulled files · 0b8d834b
      Alain Takoudjou authored
      add option to remove or keep pulled backup data
      
      [ kirr: The .pulled files with gitlab backup data (SQL and the like)
        were originally not removed "just in case" in the early days of
        git/gitlab-backup. There is clearly no need to keep them, since their
        content is entered into the git backup database by gitlab-backup, and
        leaving those .pulled files around just wastes disk space.
      
        So default to not keeping them around, and for now add an option to
        forcibly preserve the raw gitlab backup in case we need it just in case
        or for debugging.
      
        However, if it turns out we don't really need -keep in practice, it
        might go away after some time. ]
      
      /reviewed-on kirr/git-backup!3
      0b8d834b
  21. 07 Mar, 2018 1 commit
  22. 24 Oct, 2017 1 commit
    • Relicense to GPLv3+ with wide exception for all Free Software / Open Source... · e37d99b4
      Kirill Smelkov authored
      Relicense to GPLv3+ with wide exception for all Free Software / Open Source projects + Business options.
      
      Nexedi stack is licensed under Free Software licenses with various exceptions
      that cover three business cases:
      
      - Free Software
      - Proprietary Software
      - Rebranding
      
      As long as one intends to develop Free Software based on Nexedi stack, no
      license cost is involved. Developing proprietary software based on Nexedi stack
      may require a proprietary exception license. Rebranding Nexedi stack is
      prohibited unless rebranding license is acquired.
      
      Through this licensing approach, Nexedi expects to encourage Free Software
      development without restrictions and at the same time create a framework for
      proprietary software to contribute to the long term sustainability of the
      Nexedi stack.
      
      Please see https://www.nexedi.com/licensing for details, rationale and options.
      e37d99b4
  23. 19 Apr, 2017 1 commit
  24. 13 Dec, 2016 4 commits
  25. 03 Nov, 2016 1 commit
    • Don't be fooled by strings.Split(..., "\n") result always having empty "" last element · 3ba6cf73
      Kirill Smelkov authored
      By definition of strings.Split(..., sep) it "slices s into all substrings
      separated by sep and returns a slice of the substrings between those
      separators". That means that
      
          strings.Split("hello\nworld\n", "\n") -> ["hello", "world", ""]     # NOTE the last ""
      
      When parsing a file by lines, though, it is handy not to get the last
      empty "" after the last "\n". #6 shows how we failed to do that filtering
      for the case of an empty backup.refs file and errored out because of that.
      
      To fix let's introduce a helper - splitlines(), which does the job of
      filtering-out last empty entry after last separator. By using this
      helper everywhere we can hopefully avoid problems while pulling only
      empty repositories (#6 case), and also similar ones.
      
      Fixes #6
      /reported-by @iv
      3ba6cf73
  26. 01 Aug, 2016 1 commit