-
Kirill Smelkov authored
pull, restore: Read refs of a saved repository only once; Use that reading to recreate refs on restore Since the beginning `git-backup pull` was working by saving discovered files as blobs and only making special care about *.git/objects/ via fetching data as git packs from there. This means that besides *.git/objects/ other files and directories inside *.git are saved as just regular files. For example *.git/HEAD , *.git/description and *.git/heads/refs/master are all saved as regular files. Which works ok when we know that there is no simultaneous modifications to saved repository, but could result in inconsistency in refs part in the presence of concurrent updates. For example Alain and Thomas report that restore side of lab.nexedi.com PBS complains sometimes as e.g. main.cmd_restore: E: extracted /srv/slapgrid/slappart43/srv/runner/instance/slappart0/var/repositories.1737918101/nexedi/erp5.git refs corrupt: want: 9e7ca95f7104af0c3acf7e69cdd80306f10c8567 refs/heads/DateTime.equalTo_fix c11e08bac6fb8d41a2ce6c3c4f3321b4f3cd295c refs/heads/TMP-2to3 48b8c69f291026f6cb08fb118978dcf537b0e12a refs/heads/UpdateValidationStateFromConsistency 335648bdb91c75da22f9a4a627127933de5a6cd3 refs/heads/UserPropertySheet_backward_compatibility d5cef37ec4bba46168b04400978edf4a64ab781f refs/heads/addToDate_implicit_localtime 78e033ac7f1f0e80b51194fd732a495b003c1947 refs/heads/add_boolean_type 6ee39f470be53eb306e65b52e3ae6770f5487cf6 refs/heads/allow_login_change 5f5d2daada9502b7b4db8b3bc275571c40e39b54 refs/heads/allow_login_change_wip 8b6848d8bf8e60f491eb34b102f85eaf466f751c refs/heads/arnau 33c86fbbdf2f847b32bd5eabf7c34fbf5eddc8fa refs/heads/arnau-RD-Components-CacheTool 9f1ce1bab9489307be29b3c8439c78158e57a48e refs/heads/arnau-RD-Components-ERP5Form-ERP5Report 03a1712fedeabe643cdb1833b26b6d6370133a74 refs/heads/arnau-RD-Components-ERP5Form-SelectionTool-MemcachedTool ... have: 9e7ca95f7104af0c3acf7e69cdd80306f10c8567 refs/heads/DateTime.equalTo_fix c11e08bac6fb8d41a2ce6c3c4f3321b4f3cd295c refs/heads/TMP-2to3 48b8c69f291026f6cb08fb118978dcf537b0e12a refs/heads/UpdateValidationStateFromConsistency 335648bdb91c75da22f9a4a627127933de5a6cd3 refs/heads/UserPropertySheet_backward_compatibility d5cef37ec4bba46168b04400978edf4a64ab781f refs/heads/addToDate_implicit_localtime 78e033ac7f1f0e80b51194fd732a495b003c1947 refs/heads/add_boolean_type 6ee39f470be53eb306e65b52e3ae6770f5487cf6 refs/heads/allow_login_change 5f5d2daada9502b7b4db8b3bc275571c40e39b54 refs/heads/allow_login_change_wip 8b6848d8bf8e60f491eb34b102f85eaf466f751c refs/heads/arnau 33c86fbbdf2f847b32bd5eabf7c34fbf5eddc8fa refs/heads/arnau-RD-Components-CacheTool 9f1ce1bab9489307be29b3c8439c78158e57a48e refs/heads/arnau-RD-Components-ERP5Form-ERP5Report 03a1712fedeabe643cdb1833b26b6d6370133a74 refs/heads/arnau-RD-Components-ERP5Form-SelectionTool-MemcachedTool ... which was found to result in the following difference: diff --git a/a b/b index fb51541cace8d..98efbacb8ed6f 100644 --- a/a +++ b/b @@ -18997,13 +18997,13 @@ ce76cb544752122200c72b1fe0e8d22de3409e89 refs/merge-requests/1127/head 726b07bccde0ed7f88b8ffcf9e4adc9c64624a75 refs/merge-requests/113/head dca6727ddb752a803764f6f2248c41b5ad98f653 refs/merge-requests/1130/head 0ecb76f1d433f6b04bf9562313f84375769377e4 refs/merge-requests/1131/head -ebf766512111f25e9cadd7f00d7e730e980ecde8 refs/merge-requests/1131/merge +cb636a38c13e00de2400f90f17d50f6a15ddb584 refs/merge-requests/1131/merge f8a87dfd1a0af405bd162b5c6884081f72fb29e1 refs/merge-requests/1132/head da16b24621c7fd8940906a718acf2aa3ad325ba6 refs/merge-requests/1132/merge 2d34bbeac38908a08546e8df0c912242dfbe0077 refs/merge-requests/1133/head 9f2fa83567bd0c1491d45d5b6809998276581354 refs/merge-requests/1134/head 850dcdc6c174487b51098e833aa4a8e868e3d3e6 refs/merge-requests/1135/head -959cfc12c6b3d61fe324813070803c92efea8373 refs/merge-requests/1135/merge +ff7515979f75b83adecaa18a72b114ce5742b906 refs/merge-requests/1135/merge 64ec4b8e7abf11f3f1877285dc4ec04e936b75b8 refs/merge-requests/1136/head 1f4e93c0c458739f0a97834802682a90642ddb10 refs/merge-requests/1137/head 41fa1d0c5e95ad51bb9d1d3d5bc4b404e9f7b44b refs/merge-requests/1138/head If we check about e.g. 1131 merge request, we can see that ebf766512111f25e9cadd7f00d7e730e980ecde8 is a recent commit, done a few days before the time of backup, and that cb636a38c13e00de2400f90f17d50f6a15ddb584 is not there in the backup repository at all: backup-gitlab.git$ git log -1 HEAD commit 296fe74285068fcd1bbc7333244883b49612bef8 (HEAD -> master) Merge: b4cd493bd50ae 0000025c4085f 00007c41785ae ... Date: Sun Jan 26 20:35:43 2025 +0060 Git-backup 20250126-2005 backup-gitlab.git$ git log -1 ebf766512111f25e9cadd7f00d7e730e980ecde8 commit ebf766512111f25e9cadd7f00d7e730e980ecde8 Merge: 569f78098281a 0ecb76f1d433f Author: Łukasz Nowak <luke@nexedi.com> Date: Wed Jan 22 19:46:15 2025 +0100 WIP: Feature/rjs gadget pagehelp See merge request nexedi/erp5!1131 backup-gitlab.git$ git log -1 cb636a38c13e00de2400f90f17d50f6a15ddb584 fatal: bad object cb636a38c13e00de2400f90f17d50f6a15ddb584 So what happenned here is that GitLab, it seems, automatically creates refs/merge-requests/X/merge from time to time, and from time to updates it with a fresh merge commit, ready to be used as the merge commit, if/when the merge-request in question becomes merged. Then the time of updating such merge commit aligned with the time when `git-backup pull` was running, git-backup first pulled when the state of saved repository was not yet modifed, then GitLab updated refs/merge-requests/1131/merge reference, and then `git-backup pull` continued and saved content of repo.git/refs/merge-requests/1131/merge file in its updated state. However the backup.refs in the backup repository is built from the state of saved references observed at fetch time, and so this results in the inconsistency in between information saved into backup.refs and information saved about those refs via just files from under *.git/ . -> Fix that by reading refs state only once - when doing fetch, and restoring refs state from that read state without relying on per-file level save/restore. For the reference Git itself implements fetching with always providing some consistent state of fetched repository. So even though in the presence of simultaneous changes git-backup cannot guarantee overall atomicity of saved backup, what it now should be able to guarantee is that saved state is some set of per-individual-repository consistent snapshots. As before, if one wants to make a fully atomic snapshot of the data, GitLab service should be stopped before running `git-backup pull`. Added test fails as follows without the fix: git-backup_test.go:291: pull: third run was not noop: δ: diff --git a/b1/dir/hello.git/refs/test/ref-to-blob b/b1/dir/hello.git/refs/test/ref-to-blob new file mode 100644 index 0000000..379708e --- /dev/null +++ b/b1/dir/hello.git/refs/test/ref-to-blob @@ -0,0 +1 @@ +cb8d6bb5e54b1c7159698442057416473a3b5385 and even if we disable δ23 check in the test via @@ -286,7 +287,7 @@ func TestPullRestore(t *testing.T) { t.Fatal("pull: third run did not adjusted HEAD") } δ23 := xgit(ctx, "diff", h2, h3) - if δ23 != "" { + if false && δ23 != "" { t.Fatalf("pull: third run was not noop: δ:\n%s", δ23) } then it becomes to fail in the restore part of the test confirming that the test is effective to cover hit problem: E: Problem while checking connectivity of extracted repo: git-backup_test.go:109: git-backup_test.go:296: lab.nexedi.com/kirr/git-backup.cmd_restore: E: extracted /tmp/t-git-backup10753034/1/dir/hello.git refs corrupt: want: 647e137fd3b31939b36889eba854a298ef97b6ff refs/heads/branch2 feeed96ca75fcf8dcf183008f61dbf72e91ab4de refs/heads/master 11e67095628aa17b03436850e690faea3006c25d refs/tags/tag-to-blob f735011c9fcece41219729a33f7876cd8791f659 refs/tags/tag-to-commit 7124713e403925bc772cd252b0dec099f3ced9c5 refs/tags/tag-to-tag ba899e5639273a6fa4d50d684af8db1ae070351e refs/tags/tag-to-tree 7a3343f584218e973165d943d7c0af47a52ca477 refs/test/ref-to-blob <-- NOTE 61882eb85774ed4401681d800bb9c638031375e2 refs/test/ref-to-tree have: 647e137fd3b31939b36889eba854a298ef97b6ff refs/heads/branch2 feeed96ca75fcf8dcf183008f61dbf72e91ab4de refs/heads/master 11e67095628aa17b03436850e690faea3006c25d refs/tags/tag-to-blob f735011c9fcece41219729a33f7876cd8791f659 refs/tags/tag-to-commit 7124713e403925bc772cd252b0dec099f3ced9c5 refs/tags/tag-to-tag ba899e5639273a6fa4d50d684af8db1ae070351e refs/tags/tag-to-tree cb8d6bb5e54b1c7159698442057416473a3b5385 refs/test/ref-to-blob <-- NOTE 61882eb85774ed4401681d800bb9c638031375e2 refs/test/ref-to-tree /reported-by @alain.takoudjou, @tomo /reported-on https://www.erp5.com/group_section/forum/Gitlab-backup-zDVMZqaMAK /reviewed-by @jerome, @alain.takoudjou, @tomo /reviewed-on !11 /cc @rafael
a066d5e8