- 31 Jan, 2025 1 commit
-
-
Kirill Smelkov authored
pull, restore: Read refs of a saved repository only once; Use that reading to recreate refs on restore Since the beginning `git-backup pull` has worked by saving discovered files as blobs, taking special care only of *.git/objects/ by fetching data from there as git packs. This means that, besides *.git/objects/, other files and directories inside *.git are saved as just regular files. For example *.git/HEAD, *.git/description and *.git/refs/heads/master are all saved as regular files. This works ok when we know that there are no simultaneous modifications to the saved repository, but it can result in inconsistency in the refs part in the presence of concurrent updates. For example Alain and Thomas report that the restore side of lab.nexedi.com PBS sometimes complains as e.g. main.cmd_restore: E: extracted /srv/slapgrid/slappart43/srv/runner/instance/slappart0/var/repositories.1737918101/nexedi/erp5.git refs corrupt: want: 9e7ca95f7104af0c3acf7e69cdd80306f10c8567 refs/heads/DateTime.equalTo_fix c11e08bac6fb8d41a2ce6c3c4f3321b4f3cd295c refs/heads/TMP-2to3 48b8c69f291026f6cb08fb118978dcf537b0e12a refs/heads/UpdateValidationStateFromConsistency 335648bdb91c75da22f9a4a627127933de5a6cd3 refs/heads/UserPropertySheet_backward_compatibility d5cef37ec4bba46168b04400978edf4a64ab781f refs/heads/addToDate_implicit_localtime 78e033ac7f1f0e80b51194fd732a495b003c1947 refs/heads/add_boolean_type 6ee39f470be53eb306e65b52e3ae6770f5487cf6 refs/heads/allow_login_change 5f5d2daada9502b7b4db8b3bc275571c40e39b54 refs/heads/allow_login_change_wip 8b6848d8bf8e60f491eb34b102f85eaf466f751c refs/heads/arnau 33c86fbbdf2f847b32bd5eabf7c34fbf5eddc8fa refs/heads/arnau-RD-Components-CacheTool 9f1ce1bab9489307be29b3c8439c78158e57a48e refs/heads/arnau-RD-Components-ERP5Form-ERP5Report 03a1712fedeabe643cdb1833b26b6d6370133a74 refs/heads/arnau-RD-Components-ERP5Form-SelectionTool-MemcachedTool ... 
have: 9e7ca95f7104af0c3acf7e69cdd80306f10c8567 refs/heads/DateTime.equalTo_fix c11e08bac6fb8d41a2ce6c3c4f3321b4f3cd295c refs/heads/TMP-2to3 48b8c69f291026f6cb08fb118978dcf537b0e12a refs/heads/UpdateValidationStateFromConsistency 335648bdb91c75da22f9a4a627127933de5a6cd3 refs/heads/UserPropertySheet_backward_compatibility d5cef37ec4bba46168b04400978edf4a64ab781f refs/heads/addToDate_implicit_localtime 78e033ac7f1f0e80b51194fd732a495b003c1947 refs/heads/add_boolean_type 6ee39f470be53eb306e65b52e3ae6770f5487cf6 refs/heads/allow_login_change 5f5d2daada9502b7b4db8b3bc275571c40e39b54 refs/heads/allow_login_change_wip 8b6848d8bf8e60f491eb34b102f85eaf466f751c refs/heads/arnau 33c86fbbdf2f847b32bd5eabf7c34fbf5eddc8fa refs/heads/arnau-RD-Components-CacheTool 9f1ce1bab9489307be29b3c8439c78158e57a48e refs/heads/arnau-RD-Components-ERP5Form-ERP5Report 03a1712fedeabe643cdb1833b26b6d6370133a74 refs/heads/arnau-RD-Components-ERP5Form-SelectionTool-MemcachedTool ... which was found to result in the following difference: diff --git a/a b/b index fb51541cace8d..98efbacb8ed6f 100644 --- a/a +++ b/b @@ -18997,13 +18997,13 @@ ce76cb544752122200c72b1fe0e8d22de3409e89 refs/merge-requests/1127/head 726b07bccde0ed7f88b8ffcf9e4adc9c64624a75 refs/merge-requests/113/head dca6727ddb752a803764f6f2248c41b5ad98f653 refs/merge-requests/1130/head 0ecb76f1d433f6b04bf9562313f84375769377e4 refs/merge-requests/1131/head -ebf766512111f25e9cadd7f00d7e730e980ecde8 refs/merge-requests/1131/merge +cb636a38c13e00de2400f90f17d50f6a15ddb584 refs/merge-requests/1131/merge f8a87dfd1a0af405bd162b5c6884081f72fb29e1 refs/merge-requests/1132/head da16b24621c7fd8940906a718acf2aa3ad325ba6 refs/merge-requests/1132/merge 2d34bbeac38908a08546e8df0c912242dfbe0077 refs/merge-requests/1133/head 9f2fa83567bd0c1491d45d5b6809998276581354 refs/merge-requests/1134/head 850dcdc6c174487b51098e833aa4a8e868e3d3e6 refs/merge-requests/1135/head -959cfc12c6b3d61fe324813070803c92efea8373 refs/merge-requests/1135/merge 
+ff7515979f75b83adecaa18a72b114ce5742b906 refs/merge-requests/1135/merge 64ec4b8e7abf11f3f1877285dc4ec04e936b75b8 refs/merge-requests/1136/head 1f4e93c0c458739f0a97834802682a90642ddb10 refs/merge-requests/1137/head 41fa1d0c5e95ad51bb9d1d3d5bc4b404e9f7b44b refs/merge-requests/1138/head If we check e.g. merge request 1131, we can see that ebf766512111f25e9cadd7f00d7e730e980ecde8 is a recent commit, done a few days before the time of backup, and that cb636a38c13e00de2400f90f17d50f6a15ddb584 is not there in the backup repository at all: backup-gitlab.git$ git log -1 HEAD commit 296fe74285068fcd1bbc7333244883b49612bef8 (HEAD -> master) Merge: b4cd493bd50ae 0000025c4085f 00007c41785ae ... Date: Sun Jan 26 20:35:43 2025 +0060 Git-backup 20250126-2005 backup-gitlab.git$ git log -1 ebf766512111f25e9cadd7f00d7e730e980ecde8 commit ebf766512111f25e9cadd7f00d7e730e980ecde8 Merge: 569f78098281a 0ecb76f1d433f Author: Łukasz Nowak <luke@nexedi.com> Date: Wed Jan 22 19:46:15 2025 +0100 WIP: Feature/rjs gadget pagehelp See merge request nexedi/erp5!1131 backup-gitlab.git$ git log -1 cb636a38c13e00de2400f90f17d50f6a15ddb584 fatal: bad object cb636a38c13e00de2400f90f17d50f6a15ddb584 So what happened here is that GitLab, it seems, automatically creates refs/merge-requests/X/merge from time to time, and from time to time updates it with a fresh merge commit, ready to be used as the merge commit if/when the merge request in question is merged. Here the time of updating such a merge commit coincided with the time when `git-backup pull` was running: git-backup first fetched while the state of the saved repository was not yet modified, then GitLab updated the refs/merge-requests/1131/merge reference, and then `git-backup pull` continued and saved the content of the repo.git/refs/merge-requests/1131/merge file in its updated state. 
However the backup.refs in the backup repository is built from the state of saved references observed at fetch time, and so this results in an inconsistency between the information saved into backup.refs and the information saved about those refs as plain files from under *.git/ . -> Fix that by reading refs state only once, when doing the fetch, and restoring refs state from that read state without relying on per-file save/restore. For reference, Git itself implements fetching so that it always provides some consistent state of the fetched repository. So even though in the presence of simultaneous changes git-backup cannot guarantee overall atomicity of the saved backup, what it should now be able to guarantee is that the saved state is a set of snapshots, each consistent for its individual repository. As before, if one wants to make a fully atomic snapshot of the data, the GitLab service should be stopped before running `git-backup pull`. The added test fails as follows without the fix: git-backup_test.go:291: pull: third run was not noop: δ: diff --git a/b1/dir/hello.git/refs/test/ref-to-blob b/b1/dir/hello.git/refs/test/ref-to-blob new file mode 100644 index 0000000..379708e --- /dev/null +++ b/b1/dir/hello.git/refs/test/ref-to-blob @@ -0,0 +1 @@ +cb8d6bb5e54b1c7159698442057416473a3b5385 and even if we disable the δ23 check in the test via @@ -286,7 +287,7 @@ func TestPullRestore(t *testing.T) { t.Fatal("pull: third run did not adjusted HEAD") } δ23 := xgit(ctx, "diff", h2, h3) - if δ23 != "" { + if false && δ23 != "" { t.Fatalf("pull: third run was not noop: δ:\n%s", δ23) } then it starts to fail in the restore part of the test, confirming that the test effectively covers the problem that was hit: E: Problem while checking connectivity of extracted repo: git-backup_test.go:109: git-backup_test.go:296: lab.nexedi.com/kirr/git-backup.cmd_restore: E: extracted /tmp/t-git-backup10753034/1/dir/hello.git refs corrupt: want: 647e137fd3b31939b36889eba854a298ef97b6ff refs/heads/branch2 
feeed96ca75fcf8dcf183008f61dbf72e91ab4de refs/heads/master 11e67095628aa17b03436850e690faea3006c25d refs/tags/tag-to-blob f735011c9fcece41219729a33f7876cd8791f659 refs/tags/tag-to-commit 7124713e403925bc772cd252b0dec099f3ced9c5 refs/tags/tag-to-tag ba899e5639273a6fa4d50d684af8db1ae070351e refs/tags/tag-to-tree 7a3343f584218e973165d943d7c0af47a52ca477 refs/test/ref-to-blob <-- NOTE 61882eb85774ed4401681d800bb9c638031375e2 refs/test/ref-to-tree have: 647e137fd3b31939b36889eba854a298ef97b6ff refs/heads/branch2 feeed96ca75fcf8dcf183008f61dbf72e91ab4de refs/heads/master 11e67095628aa17b03436850e690faea3006c25d refs/tags/tag-to-blob f735011c9fcece41219729a33f7876cd8791f659 refs/tags/tag-to-commit 7124713e403925bc772cd252b0dec099f3ced9c5 refs/tags/tag-to-tag ba899e5639273a6fa4d50d684af8db1ae070351e refs/tags/tag-to-tree cb8d6bb5e54b1c7159698442057416473a3b5385 refs/test/ref-to-blob <-- NOTE 61882eb85774ed4401681d800bb9c638031375e2 refs/test/ref-to-tree /reported-by @alain.takoudjou, @tomo /reported-on https://www.erp5.com/group_section/forum/Gitlab-backup-zDVMZqaMAK /reviewed-by @jerome, @alain.takoudjou, @tomo /reviewed-on !11 /cc @rafael
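The want/have check described above can be sketched roughly as follows. This is only an illustration of comparing two ref snapshots, not the actual git-backup code: `parseRefs` and `diffRefs` are hypothetical helper names, and the input layout ("<sha1> <refname>" per line) is assumed from the listings quoted in the message.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parseRefs parses lines of the form "<sha1> <refname>" into a map
// refname -> sha1. Blank or malformed lines are skipped.
// (Hypothetical helper, not part of git-backup.)
func parseRefs(listing string) map[string]string {
	refs := make(map[string]string)
	for _, line := range strings.Split(listing, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue
		}
		refs[fields[1]] = fields[0]
	}
	return refs
}

// diffRefs returns a sorted list of differences between the refs recorded
// at fetch time (want) and the refs found in the restored repository (have).
func diffRefs(want, have map[string]string) []string {
	var diffs []string
	for name, w := range want {
		h, ok := have[name]
		switch {
		case !ok:
			diffs = append(diffs, fmt.Sprintf("%s: missing (want %s)", name, w))
		case h != w:
			diffs = append(diffs, fmt.Sprintf("%s: want %s, have %s", name, w, h))
		}
	}
	for name, h := range have {
		if _, ok := want[name]; !ok {
			diffs = append(diffs, fmt.Sprintf("%s: unexpected (have %s)", name, h))
		}
	}
	sort.Strings(diffs)
	return diffs
}

func main() {
	// The two values of refs/merge-requests/1131/merge from the report above.
	want := parseRefs("ebf766512111f25e9cadd7f00d7e730e980ecde8 refs/merge-requests/1131/merge")
	have := parseRefs("cb636a38c13e00de2400f90f17d50f6a15ddb584 refs/merge-requests/1131/merge")
	for _, d := range diffRefs(want, have) {
		fmt.Println(d)
	}
}
```

When both snapshots come from one and the same reading of the refs, as the fix intends, such a comparison has nothing to report.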
-
- 28 Jan, 2025 1 commit
-
-
Kirill Smelkov authored
Even though we have the readme, people keep wondering why git-backup was created at all. Rafael suggested sharing the original announcement email "so we dont forget later on why we have this." -> Do that. /suggested-and-reviewed-by @rafael /reviewed-on !10 /reviewed-on https://www.erp5.com/group_section/forum/Gitlab-backup-zDVMZqaMAK/view?list_start=8&reset=1#2074720302
-
- 14 Jun, 2024 1 commit
-
-
Alain Takoudjou authored
The pg_restore command now needs at least a -d/--dbname or -f/--file argument. Use --if-exists to not drop a table if it doesn't exist. -------- kirr: * pg_restore started to require `-f` in PostgreSQL 12. From https://www.postgresql.org/docs/release/12.0/ : In pg_restore, require specification of -f - to send the dump contents to standard output (Euler Taveira) Previously, this happened by default if no destination was specified, but that was deemed to be unfriendly. * --if-exists suppresses "does not exist" warnings if the restored table did not exist in the database. From https://www.postgresql.org/docs/current/app-pgrestore.html : --if-exists Use DROP ... IF EXISTS commands to drop objects in --clean mode. This suppresses “does not exist” errors that might otherwise be reported. This option is not valid unless --clean is also specified. /reviewed-by @kirr /reviewed-on !9
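As an illustration of the flags discussed above, a pg_restore invocation could be assembled as in the sketch below. It is written in Go only for consistency with the rest of the project; the helper name `pgRestoreToStdout` and the dump file name are hypothetical, and the exact command line used by gitlab-backup is not shown in this message, so the combination of flags is an assumption.

```go
package main

import (
	"fmt"
	"os/exec"
)

// pgRestoreToStdout prepares (but does not run) a pg_restore command that
// writes the SQL script for dumpPath to standard output.
// (Hypothetical helper; the flag set is an assumption.)
func pgRestoreToStdout(dumpPath string) (*exec.Cmd, error) {
	if dumpPath == "" {
		return nil, fmt.Errorf("pg_restore: empty dump path")
	}
	return exec.Command("pg_restore",
		"--clean",     // emit DROP commands before re-creating objects
		"--if-exists", // use DROP ... IF EXISTS; only valid together with --clean
		"-f", "-",     // explicit "write to stdout", required since PostgreSQL 12
		dumpPath,
	), nil
}

func main() {
	cmd, err := pgRestoreToStdout("database.pgdump") // hypothetical file name
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(cmd.Args)
}
```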
-
- 28 Apr, 2024 2 commits
-
-
Kirill Smelkov authored
Starting from the same https://git.kernel.org/pub/scm/git/git.git/commit/?id=5476e1efded5 (see previous patch) git changed output when reporting error about a bad pack. Before that patch it was something like $ git -c fetch.fsckObjects=true fetch-pack --thin --upload-pack=git -c uploadpack.allowAnySHA1InWant=true -c uploadpack.allowTipSHA1InWant=true -c uploadpack.allowReachableSHA1InWant=true upload-pack /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/3/incomplete-send-pack.git 46094318d7ea2dab446294556097361409ca1e84 </dev/null warning: no common commits remote: I: x-missing-blob/hook-pack-object is running ... remote: Enumerating objects: 4, done. remote: Counting objects: 100% (4/4), done. remote: Compressing objects: 100% (3/3), done. remote: Total 4 (delta 1), reused 0 (delta 0) fatal: object of unexpected type <-- NOTE fatal: unpack-objects failed <-- NOTE and after the patch it became $ git -c fetch.fsckObjects=true fetch-pack --thin --upload-pack=git -c uploadpack.allowAnySHA1InWant=true -c uploadpack.allowTipSHA1InWant=true -c uploadpack.allowReachableSHA1InWant=true upload-pack /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/3/incomplete-send-pack.git 46094318d7ea2dab446294556097361409ca1e84 </dev/null remote: I: x-missing-blob/hook-pack-object is running ... remote: Enumerating objects: 4, done. remote: Counting objects: 100% (4/4), done. remote: Compressing objects: 100% (3/3), done. remote: Total 4 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0) Receiving objects: 100% (4/4), 377 bytes | 377.00 KiB/s, done. Resolving deltas: 100% (1/1), done.) fatal: did not receive expected object 737592d2855213009bd3fa689167fe9a8363e45f <-- NOTE fatal: index-pack failed <-- NOTE -> Adjust the test to detect the second case as also ok to avoid failing tests with Git ≥ 2.31.
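A test that accepts both outputs could check for either of the final "fatal:" lines, roughly as in this sketch. The helper names are hypothetical and the real test in git-backup may match the output differently; only the two marker strings are taken from the outputs quoted above.

```go
package main

import (
	"fmt"
	"strings"
)

// badPackMarkers are the final error lines quoted above: the first one was
// printed before the referenced Git change, the second one after it.
var badPackMarkers = []string{
	"fatal: unpack-objects failed",
	"fatal: index-pack failed",
}

// isBadPackFailure reports whether stderr of a failed fetch-pack run
// contains one of the known "bad pack" error lines.
func isBadPackFailure(stderr string) bool {
	for _, line := range strings.Split(stderr, "\n") {
		line = strings.TrimSpace(line)
		for _, marker := range badPackMarkers {
			if strings.HasPrefix(line, marker) {
				return true
			}
		}
	}
	return false
}

// checkExpectedFailure returns nil when the failure is of the expected
// "bad pack" kind and an error describing the output otherwise.
func checkExpectedFailure(stderr string) error {
	if isBadPackFailure(stderr) {
		return nil
	}
	return fmt.Errorf("fetch-pack failed in an unexpected way: %q", stderr)
}

func main() {
	oldOut := "fatal: object of unexpected type\nfatal: unpack-objects failed\n"
	newOut := "fatal: did not receive expected object 737592d2855213009bd3fa689167fe9a8363e45f\nfatal: index-pack failed\n"
	fmt.Println(checkExpectedFailure(oldOut), checkExpectedFailure(newOut))
}
```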
-
Kirill Smelkov authored
Pull test verifies that pulled tag objects become gone after becoming represented via specially-crafted commits. The "gone" is verified by removing all unreachable objects and asserting that looking up an object by original tag sha1 results in error. However there was a thinko in "removing all unreachable objects" - this step was implemented via `git prune`, but as documented, `git prune` removes only loose unreachable objects, while unreachable objects that are already in packs remain intact. From `git prune --help`: Note that unreachable, packed objects will remain. If this is not desired, see git-repack(1). So previously we were lucky that running the tests did not pull objects from test repositories into packs and plain `git prune` was enough. However starting from Git 2.31 fetching objects from test repositories pulls some of them as packs and the test starts to fail because original tag object remains alive: === RUN TestPullRestore # creating root commit # building "already-have" index # git b0 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/0/dir/empty repo.git # file b0 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/0/dir/empty repo.git/HEAD # file b0 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/0/dir/empty repo.git/config # file b0 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/0/dir/empty repo.git/description # file b0 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/0/dir/empty repo.git/info/exclude # file b0 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/0/dir/empty repo.git/refs/.keep # building "already-have" index # git b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/hello.git # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/hello.git/HEAD # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/hello.git/config # file b1 <- 
/home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/hello.git/description # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/hello.git/info/exclude # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/hello.git/packed-refs # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/hello.git/refs/tags/tag-to-blob # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/hello.git/refs/test/ref-to-tree # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir/world.txt # git b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/empty repo.git # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/empty repo.git/HEAD # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/empty repo.git/config # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/empty repo.git/description # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/empty repo.git/info/exclude # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/empty repo.git/refs/.keep # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/file 2 # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/nongit.git/latest # git b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/repo with+fragile name %αβγ.git # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/repo with+fragile name %αβγ.git/HEAD # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/repo with+fragile name %αβγ.git/config # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/repo with+fragile name %αβγ.git/description # file b1 
<- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/repo with+fragile name %αβγ.git/info/exclude # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/dir 2 + β/repo with+fragile name %αβγ.git/refs/heads/master # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/file # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/file with space + α # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/fileexec # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/symlink.dir # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/symlink.file # file b1 <- /home/kirr/src/neo/src/lab.nexedi.com/kirr/git-backup/testdata/1/symlink.missing git-backup_test.go:203: tag f735011c9fcece41219729a33f7876cd8791f659 still present in backup.git after git-prune --- FAIL: TestPullRestore (0.14s) Bisecting shows that particular behaviour of pulling into packs from test repositories starts to happen after https://git.kernel.org/pub/scm/git/git.git/commit/?id=5476e1efded571e374cd97c7d69f17962ba1c44f . But as explained in the beginning it is not a Git bug as in the general case just `git prune` is not enough to remove all unreachable objects. -> Fix it by accompanying `git prune` with `git repack -ad` that removes unreachable objects from the packs. NOTE just `git repack -ad` is not enough as it puts in packs only reachable objects, so any unreachable objects that were loose before repacking, will remain alive.
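The combined cleanup can be sketched as below. The function names are hypothetical, and the order of the two commands is an assumption: the message only says that `git prune` is accompanied by `git repack -ad`. `git prune` handles loose unreachable objects, and `git repack -ad` rewrites the packs with reachable objects only.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
)

// cleanupCommands lists the git commands that together remove unreachable
// objects both in loose form and inside packs. (The order is an assumption.)
func cleanupCommands() [][]string {
	return [][]string{
		{"git", "prune"},         // drops loose unreachable objects
		{"git", "repack", "-ad"}, // repacks reachable objects, deletes old packs
	}
}

// dropUnreachable runs the cleanup commands inside repoDir and stops at the
// first failure, returning the command output as part of the error.
func dropUnreachable(ctx context.Context, repoDir string) error {
	for _, argv := range cleanupCommands() {
		cmd := exec.CommandContext(ctx, argv[0], argv[1:]...)
		cmd.Dir = repoDir
		if out, err := cmd.CombinedOutput(); err != nil {
			return fmt.Errorf("%v in %s: %w\n%s", argv, repoDir, err, out)
		}
	}
	return nil
}

func main() {
	// Only print the plan here; dropUnreachable needs a real repository.
	for _, argv := range cleanupCommands() {
		fmt.Println(argv)
	}
}
```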
-
- 07 Jul, 2021 1 commit
-
-
Kirill Smelkov authored
@lpgeneau reports that module-based build fails with 2021-07-07 16:37:34 slapos[2415] INFO GOMODSRC /srv/slapgrid/slappart11/srv/runner/software/a12cdf3481c4202ae72cb255d8a6c183/go.work/src/lab.nexedi.com/kirr/git-backup:./... 2021-07-07 16:37:34 slapos[2415] INFO ../../../../pkg/mod/lab.nexedi.com/kirr/go123@v0.0.0-20210302025843-863c4602a230/xerr/xerr.go:77:2: missing go.sum entry for module providing package github.com/pkg/errors (imported by lab.nexedi.com/kirr/go123/xerr); to add: 2021-07-07 16:37:34 slapos[2415] INFO go get lab.nexedi.com/kirr/go123/xerr@v0.0.0-20210302025843-863c4602a230 2021-07-07 16:37:34 slapos[2415] INFO gowork.goinstall: Non zero exit code (1) while running command. -> Fix it by updating go123 dependency and running recommended command to update go.sum. Amends 3c804105 (*: Add Go modules support; Upgrade to latest git2go release)
-
- 02 Mar, 2021 1 commit
-
-
Kirill Smelkov authored
-
- 02 Jul, 2020 1 commit
-
-
Alain Takoudjou authored
If there is a folder among the git repositories whose name ends with .git and which is not a git repository, git-backup will fail because it will try to use it as a git repository. This patch just checks if it's really a git repo before continuing inside. [kirr: Reworked the patch a bit] /reviewed-on: kirr/git-backup!8
-
- 20 May, 2020 3 commits
-
-
Kirill Smelkov authored
@kirr These two commits are required to back up and restore with recent versions of gitlab. - Skip gitlab page, else gitlab will try to restore it and fail. - `Backup::Database.new` now requires a parameter /reviewed-on: kirr/git-backup!7
-
Alain Takoudjou authored
-
- 25 Feb, 2020 1 commit
-
-
Alain Takoudjou authored
Solve the problem: NoMethodError: undefined method `parse' for Time:Class -------- kirr: it probably started to fail due to a gitlab codebase upgrade, as earlier gitlab-rake was likely requiring "time" and now does not. /reviewed-by @kirr /reviewed-on kirr/git-backup!6
-
- 10 Feb, 2020 3 commits
-
-
Kirill Smelkov authored
So that a further `git-backup pull` run can proceed, and so that the many `$tmpd` directories left behind by gitlab-backup don't eat up disk space. Rework of kirr/git-backup!4. /reviewed-on kirr/git-backup!5
-
Kirill Smelkov authored
This lets deferred cleanup run, instead of the whole process being killed without having a chance to leave external data in a consistent state.
-
Kirill Smelkov authored
Add a ctx parameter to cmd_pull, cmd_restore and the inner functions they call that can block, and handle ctx cancel where we can. For now both pull and restore are always run under the background context, but in the next patch we'll connect SIGINT+SIGTERM to cancel spawned work. In general a service - even a command line utility - needs to handle cancellation properly and to maintain consistency of external state by itself. See e.g. https://callistaenterprise.se/blogg/teknik/2019/10/05/go-worker-cancellation/ .
-
- 13 Jan, 2020 1 commit
-
-
Kirill Smelkov authored
The pattern where multiple workers are spawned to work on a common task and where the whole work needs to be canceled on first error is now well understood, with the functionality to broadcast cancel and propagate errors being wrapped into libraries such as https://godoc.org/golang.org/x/sync/errgroup and https://godoc.org/lab.nexedi.com/kirr/go123/xsync#WorkGroup (kirr/go123@515a6d14) Let's streamline the code by using xsync.WorkGroup (it is in our hands, a bit more well designed (imho), has an analog in Pygolang, and can be changed/enhanced as needed). The other reason to rework the code is that the workgroup is created under a context (currently always background) and can be canceled by that context's cancel. In the next patch we'll teach all git-backup subcommands, including restore, to work under context, and by using xsync.WorkGroup we will automatically handle cancellation from outside, while without reworking the extraction pipeline we would need to additionally glue ctx cancel to a signal telling workers to stop. Compared to the previous state both xsync.WorkGroup and errgroup return only the first error, however this should likely not cause problems in practice as the first error is usually the most informative one.
-
- 05 Jan, 2020 2 commits
-
-
Kirill Smelkov authored
Similarly to the previous patch, let's always clean up the gitlab-backup temporary folder, unconditionally, even in the presence of errors. Keeping $tmpd on error did not prevent further gitlab-backup runs from proceeding, but it can quickly eat up disk space if there are many such runs. If debugging is needed one can comment out the cleanup, but by default let's be production friendly out of the box. Based on patch by @alain.takoudjou: !4 Original description from Alain: ---- 8< ---- When the script exits, remove the tmp backup folders which are no longer needed. Keeping this folder when a backup fails contributes to filling the server's disk. backup.locked is also removed, because we want to automatically retry gitlab-backup if the previous backup failed, without human action. If the file is not removed automatically, backup is blocked until someone removes it.
-
Kirill Smelkov authored
On pull git-backup locks the backup repository to make sure another concurrent `git-backup pull` process is not running. However until now, if a pull failed, the lock was left unreleased, which made follow-up pull attempts fail while acquiring the lock until the lock was manually removed with `git update-ref -d ...`. Probably originally I made it like this in 6f237f22 (git-backup: Initial draft) to make sure that if there is a problem it does not go unnoticed and forces me to investigate. But in general we do _not_ need to keep the lock on error return after `git-backup pull` completes, even abnormally. This "lock left unreleased" is causing operational issues on lab.nexedi.com from time to time: if a pull attempt fails for some, even temporary, reason, all subsequent pull attempts will fail until a human intervenes and removes the lock ref. Fix it. See also: kirr/git-backup!4
-
- 29 Aug, 2018 1 commit
-
-
Kirill Smelkov authored
# lab.nexedi.com/kirr/git-backup ./git.go:177: Raisef call needs 1 arg but has 2 args The bug was there from day 1 after rewrite in Go in 28986e0e.
-
- 20 Jun, 2018 1 commit
-
-
Kirill Smelkov authored
Without trailing dot the following sentence was included into TOC.
-
- 13 Jun, 2018 1 commit
-
-
Kirill Smelkov authored
-
- 12 Jun, 2018 4 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
As was already said in 899103bf (pull: Switch from porcelain `git fetch` to plumbing `git fetch-pack` + friends), currently on lab.nexedi.com `git-backup pull` became slow and most of the slowness was tracked down to the fact that `git fetch` for every pulled repository does a linear scan of the whole backup repository history just to find out there is usually nothing to fetch. Quoting 899103bf: """ `git fetch`, before fetching data from remote repository, first checks whether it already locally has all the objects remote advertises. This boils down to running echo $remote_tips | git rev-list --quiet --objects --stdin --not --all and checking whether it succeeds or not: https://git.kernel.org/pub/scm/git/git.git/commit/?h=4191c35671 https://git.kernel.org/pub/scm/git/git.git/tree/builtin/fetch.c?h=v2.18.0-rc1-1-g6f333ff2fb#n925 https://git.kernel.org/pub/scm/git/git.git/tree/connected.c?h=v2.18.0-rc1-1-g6f333ff2fb#n8 The "--not --all" in the query means that objects should be not reachable from all locally existing refs and is implemented by linearly scanning from tip of those existing refs and marking objects reachable from there as "do not print". In case of git-backup, where we have mostly master which is super commit merging from whole histories of all projects and from backup history, linearly scanning from such a tip goes through lots of commits. Up to the point where fetching a small, outdated repository, which was already pulled into backup and has not changed for a long time, takes more than 30 seconds with almost 100% of that time being spent in quickfetch() only. """ The solution is that we can build an index of the objects we already have ourselves only once at startup, and then in fetch, after checking lsremote output, consult that index, and if we see we already have everything for an advertised reference - just avoid giving it to fetch-pack to process. 
It turns out that for many pulled repositories no references have changed at all, and this way fetch-pack can be skipped completely. This leads to a dramatic speedup: before, `gitlab-backup pull` was taking ~ 2 hours, and now it takes something under ~ 5 minutes. The index building itself takes ~ 30 seconds - the time which we were previously spending to fetch from just 1 unchanged repository. The index size is small and so it can all be kept in RAM - please see details in the code comments on this. I initially wanted to speed up fetching by teaching `git fetch-objects` to consult backup repo bitmap reachability index (if, for a commit, we can see that there is an entry in this index -> we know we already have all reachable objects for this commit and can skip fetching). This won't however work fully for all our refs - 40% of them are mostly tags, and since in the backup repository we don't keep tag objects - we keep tags/tree/blobs encoded as commits - sha1 of those 40% references to tags won't be in bitmap index. So just do the indexing ourselves.
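The idea of consulting an "already-have" index before fetching can be sketched as follows. The types and function names are hypothetical and much simpler than whatever git-backup keeps internally; the sketch only shows filtering advertised refs against a set of known sha1s.

```go
package main

import "fmt"

// ref is an advertised reference: its name and the sha1 it points to.
type ref struct {
	name string
	sha1 string
}

// String renders the ref in "<sha1> <name>" form.
func (r ref) String() string {
	return fmt.Sprintf("%s %s", r.sha1, r.name)
}

// buildIndex turns the list of sha1s we already have into a set that can be
// consulted in O(1) for every advertised ref.
func buildIndex(sha1s []string) map[string]struct{} {
	idx := make(map[string]struct{}, len(sha1s))
	for _, s := range sha1s {
		idx[s] = struct{}{}
	}
	return idx
}

// refsToFetch returns only those advertised refs whose sha1 is not yet in
// the index; an empty result means fetching can be skipped completely.
func refsToFetch(advertised []ref, have map[string]struct{}) []ref {
	var todo []ref
	for _, r := range advertised {
		if _, ok := have[r.sha1]; !ok {
			todo = append(todo, r)
		}
	}
	return todo
}

func main() {
	have := buildIndex([]string{"feeed96ca75fcf8dcf183008f61dbf72e91ab4de"})
	advertised := []ref{
		{name: "refs/heads/master", sha1: "feeed96ca75fcf8dcf183008f61dbf72e91ab4de"},
		{name: "refs/heads/branch2", sha1: "647e137fd3b31939b36889eba854a298ef97b6ff"},
	}
	for _, r := range refsToFetch(advertised, have) {
		fmt.Println(r)
	}
}
```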
-
Kirill Smelkov authored
In the next patch we will need to load backup.refs in the beginning of pull too. Factored function changed to return regular error instead of raising exception (which will be the general plan from now on).
-
Kirill Smelkov authored
On lab.nexedi.com `git-backup pull` became slow, and most of the slowness was tracked down to the following: `git fetch`, before fetching data from remote repository, first checks whether it already locally has all the objects remote advertises. This boils down to running echo $remote_tips | git rev-list --quiet --objects --stdin --not --all and checking whether it succeeds or not: https://git.kernel.org/pub/scm/git/git.git/commit/?h=4191c35671 https://git.kernel.org/pub/scm/git/git.git/tree/builtin/fetch.c?h=v2.18.0-rc1-1-g6f333ff2fb#n925 https://git.kernel.org/pub/scm/git/git.git/tree/connected.c?h=v2.18.0-rc1-1-g6f333ff2fb#n8 The "--not --all" in the query means that objects should be not reachable from all locally existing refs and is implemented by linearly scanning from tip of those existing refs and marking objects reachable from there as "do not print". In case of git-backup, where we have mostly master which is super commit merging from whole histories of all projects and from backup history, linearly scanning from such a tip goes through lots of commits. Up to the point where fetching a small, outdated repository, which was already pulled into backup and has not changed for a long time, takes more than 30 seconds with almost 100% of that time being spent in quickfetch() only. The solution will be to optimize checking whether we already have all the remote objects and to not repeat whole backup-repo scanning for every pulled repository. This will be done via first querying through `git ls-remote` what tips remote repository has, then checking on git-backup specific index which tips we already have and then fetching only the rest. This way we are essentially moving most of quickfetch phase of git into git-backup. Since we'll be telling git to fetch only some of the remote refs, we will either have to amend ourselves the refs `git fetch` creates after fetching, or to not rely on `git fetch` creating any refs at all. 
Since we already have a long standing issue that many many refs that are coming live after `git fetch` slow down further git fetches https://lab.nexedi.com/kirr/git-backup/blob/0ab7bbb6/git-backup.go#L551 the longer term plan will be not to create unneeded references. Since 2 forks could have references covering the same commits, we would either have to compare references created after git-fetch and deduplicate them or manage references creation ourselves. It is also generally better to split `git fetch` into steps at plumbing layer, because after doing so, we can have the chance to optimize or tweak any of the steps at our side with knowing full git-backup context and indices. This commit only switches from using `git fetch` to its plumbing counterpart `git fetch-pack` + friends + manually creating fetched refs the way `git fetch` used to do exactly. There should be neither functionality changed nor any speedup. Further commits will start to take advantage of the switch and optimize `git-backup pull`.
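The first plumbing step, learning which tips the remote has, amounts to parsing `git ls-remote` output, which prints one "<sha1><TAB><refname>" pair per line. A minimal parser could look like the sketch below; the names are hypothetical, and the 40-character check assumes SHA-1 repositories.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// remoteRef is one line of `git ls-remote` output.
type remoteRef struct {
	sha1 string
	name string
}

// parseLsRemote parses "<sha1>\t<refname>" lines. It assumes 40-character
// SHA-1 object names and rejects anything else.
func parseLsRemote(out string) ([]remoteRef, error) {
	var refs []remoteRef
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := sc.Text()
		if line == "" {
			continue
		}
		parts := strings.SplitN(line, "\t", 2)
		if len(parts) != 2 || len(parts[0]) != 40 {
			return nil, fmt.Errorf("ls-remote: unexpected line %q", line)
		}
		refs = append(refs, remoteRef{sha1: parts[0], name: parts[1]})
	}
	return refs, sc.Err()
}

func main() {
	out := "feeed96ca75fcf8dcf183008f61dbf72e91ab4de\trefs/heads/master\n" +
		"f735011c9fcece41219729a33f7876cd8791f659\trefs/tags/tag-to-commit\n"
	refs, err := parseLsRemote(out)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, r := range refs {
		fmt.Println(r.name, "->", r.sha1)
	}
}
```

Only the refs that survive a check against the local index would then be passed on to `git fetch-pack`.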
-
- 11 Jun, 2018 2 commits
-
-
Kirill Smelkov authored
- tell that reference name always goes without "refs/" prefix - use .name for reference name, not .ref: this way ref.name is more readable than ref.ref and so there is less need to use for __ in range loops.
-
Kirill Smelkov authored
Noticed this while changing how pull works and incidentally making an error there by leaving an extra "refs/" prefix. With that error, before this patch the tests show: git-backup_test.go:91: git-backup_test.go:204: lab.nexedi.com/kirr/git-backup.cmd_restore: 2 errors: - E: extracted /tmp/t-git-backup981909377/1/dir 2 + β/repo with+fragile name %αβγ.git refs corrupt: - E: extracted /tmp/t-git-backup981909377/1/dir/hello.git refs corrupt: with the patch the tests report: git-backup_test.go:91: git-backup_test.go:204: lab.nexedi.com/kirr/git-backup.cmd_restore: 2 errors: - E: extracted /tmp/t-git-backup981909377/1/dir 2 + β/repo with+fragile name %αβγ.git refs corrupt: want: cbb6d3f205749888f77fb1a88fbac3b8a0b8000f refs/refs/heads/master have: cbb6d3f205749888f77fb1a88fbac3b8a0b8000f refs/heads/master - E: extracted /tmp/t-git-backup981909377/1/dir/hello.git refs corrupt: want: 647e137fd3b31939b36889eba854a298ef97b6ff refs/refs/heads/branch2 feeed96ca75fcf8dcf183008f61dbf72e91ab4de refs/refs/heads/master 11e67095628aa17b03436850e690faea3006c25d refs/refs/tags/tag-to-blob f735011c9fcece41219729a33f7876cd8791f659 refs/refs/tags/tag-to-commit 7124713e403925bc772cd252b0dec099f3ced9c5 refs/refs/tags/tag-to-tag ba899e5639273a6fa4d50d684af8db1ae070351e refs/refs/tags/tag-to-tree 7a3343f584218e973165d943d7c0af47a52ca477 refs/refs/test/ref-to-blob 61882eb85774ed4401681d800bb9c638031375e2 refs/refs/test/ref-to-tree have: 647e137fd3b31939b36889eba854a298ef97b6ff refs/heads/branch2 feeed96ca75fcf8dcf183008f61dbf72e91ab4de refs/heads/master 11e67095628aa17b03436850e690faea3006c25d refs/tags/tag-to-blob f735011c9fcece41219729a33f7876cd8791f659 refs/tags/tag-to-commit 7124713e403925bc772cd252b0dec099f3ced9c5 refs/tags/tag-to-tag ba899e5639273a6fa4d50d684af8db1ae070351e refs/tags/tag-to-tree 7a3343f584218e973165d943d7c0af47a52ca477 refs/test/ref-to-blob 61882eb85774ed4401681d800bb9c638031375e2 refs/test/ref-to-tree It should be good to have these details if something really breaks after restore.
-
- 08 Jun, 2018 2 commits
-
-
Kirill Smelkov authored
This way, if the backup repository was freshly repacked with bitmap index generation turned on, we can get a ~30%-50% speedup for a typical erp5.git pack extraction. The "--use-bitmap-index" option was added to git in v2.0, but was only active for to-stdout pack generation. It was enabled for to-file pack generation in git v2.11. Since git v2.0 was released in 2014 - 4 years ago - I'm not adding runtime detection of "--use-bitmap-index" availability. See https://git.kernel.org/pub/scm/git/git.git/commit/?h=645c432d61 for details.
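A rough sketch of how such an invocation might be assembled from Go follows. It only builds the command and does not run it. The helper name `packObjectsCmd`, its parameters and the exact set of other options are assumptions for illustration; the actual git-backup code may pass different arguments. `--revs` and `--use-bitmap-index` are existing `git pack-objects` options.

```go
package main

import (
	"fmt"
	"os/exec"
)

// packObjectsCmd is a hypothetical helper that prepares (but does not start)
// a `git pack-objects` command writing a pack to files named after packBase.
// The list of revisions to pack is expected on stdin because of --revs.
func packObjectsCmd(gitDir, packBase string, useBitmap bool) *exec.Cmd {
	args := []string{"--git-dir=" + gitDir, "pack-objects", "--revs"}
	if useBitmap {
		// Only helps if the repository was repacked with a bitmap index.
		args = append(args, "--use-bitmap-index")
	}
	args = append(args, packBase)
	return exec.Command("git", args...)
}

func main() {
	cmd := packObjectsCmd("/path/to/backup.git", "/tmp/extract/pack", true)
	fmt.Println(cmd.Args)
}
```

Since the option is passed unconditionally in the commit above, the `useBitmap` switch here exists only to make the difference visible in the sketch.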
-
Kirill Smelkov authored
-
- 05 Jun, 2018 1 commit
-
-
Kirill Smelkov authored
- remove blank line between main description and package clause, so that the main description is understood as such;
- move notes describing what a file does after package clause, so that those notes do not get mixed into program description under godoc.
-
- 25 Apr, 2018 1 commit
-
-
Alain Takoudjou authored
add option to remove or keep pulled backup data

[ kirr: The .pulled files with gitlab backup data (SQL and the like) were originally not removed "just in case" in the early days of git/gitlab-backup. There is clearly no need to keep them, since their content is entered into the git backup database by gitlab-backup, and leaving those .pulled files around just wastes disk space. So default to not keeping them, and for now add an option to forcibly preserve the raw gitlab backup in case we need it, e.g. for debugging. However, if it turns out we don't really need -keep in practice, it might go away at some point. ]

/reviewed-on kirr/git-backup!3
-
- 07 Mar, 2018 1 commit
-
-
Alain Takoudjou authored
If a repository is removed while git-backup is running, print a warning message and continue pulling instead of exiting with an error. /reviewed-on kirr/git-backup!2
-
- 24 Oct, 2017 1 commit
-
-
Kirill Smelkov authored
Relicense to GPLv3+ with wide exception for all Free Software / Open Source projects + Business options. Nexedi stack is licensed under Free Software licenses with various exceptions that cover three business cases: - Free Software - Proprietary Software - Rebranding As long as one intends to develop Free Software based on Nexedi stack, no license cost is involved. Developing proprietary software based on Nexedi stack may require a proprietary exception license. Rebranding Nexedi stack is prohibited unless rebranding license is acquired. Through this licensing approach, Nexedi expects to encourage Free Software development without restrictions and at the same time create a framework for proprietary software to contribute to the long term sustainability of the Nexedi stack. Please see https://www.nexedi.com/licensing for details, rationale and options.
-
- 19 Apr, 2017 1 commit
-
-
Kirill Smelkov authored
- myname moved -> my kirr/go123@98249b24
- Traceback now returns []runtime.Frame kirr/go123@7deb28a5
-
- 13 Dec, 2016 4 commits
-
-
Kirill Smelkov authored
to xflag.Count
-
https://lab.nexedi.com/kirr/go123/xstrings/
Kirill Smelkov authored
- xstrings.SplitLines
- xstrings.Split2
- xstrings.HeadTail

Other string-related routines stay in git-backup for now, as I don't feel they are general enough or that the chosen interface is really ok.
-
https://lab.nexedi.com/kirr/go123/mem/
Kirill Smelkov authored
It is now mem.String(), and mem.Bytes()
-
Kirill Smelkov authored
error.go is being moved completely to that shared place for handy Go utilities, split into several subpackages:

    lab.nexedi.com/kirr/go123/exc      -- exception-style error handling for Go
    lab.nexedi.com/kirr/go123/myname   -- easy way to determine current function's name and package
    lab.nexedi.com/kirr/go123/xerr     -- addons for error-handling
    lab.nexedi.com/kirr/go123/xruntime -- addons to standard package runtime
-
- 03 Nov, 2016 1 commit
-
-
Kirill Smelkov authored
By definition of strings.Split(..., sep) it "slices s into all substrings separated by sep and returns a slice of the substrings between those separators". That means that

    strings.Split("hello\nworld\n", "\n") -> ["hello", "world", ""]    # NOTE the last ""

When parsing a file by lines, it is handy though not to get the last empty "" after the last "\n". #6 shows how we missed that filtering-out for the case of an empty backup.refs file and errored out because of that.

To fix this, let's introduce a helper - splitlines() - which does the job of filtering out the last empty entry after the last separator. By using this helper everywhere we can hopefully avoid problems while pulling only empty repositories (the #6 case), and also similar ones.

Fixes #6

/reported-by @iv
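The behaviour described in this commit message can be sketched as follows. This is a minimal illustration and not the actual git-backup code: the name `splitLines` and its exact signature are assumptions, and it assumes a non-empty separator.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLines splits s by sep and drops the single trailing empty entry that
// strings.Split yields when s ends with sep (or when s is empty).
// Hypothetical sketch; the real helper in git-backup may differ.
func splitLines(s, sep string) []string {
	parts := strings.Split(s, sep)
	if n := len(parts); n > 0 && parts[n-1] == "" {
		parts = parts[:n-1]
	}
	return parts
}

func main() {
	// Plain strings.Split keeps the empty entry after the last "\n".
	fmt.Printf("%q\n", strings.Split("hello\nworld\n", "\n"))

	// The helper drops it.
	fmt.Printf("%q\n", splitLines("hello\nworld\n", "\n"))

	// An empty file yields no lines at all instead of one empty line.
	fmt.Printf("%q\n", splitLines("", "\n"))
}
```

With such a helper an empty backup.refs content gives zero lines, so per-line parsing is simply never entered for it.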
-
- 01 Aug, 2016 1 commit
-
-
Kirill Smelkov authored
Continuing 62374038 (pull: Turns unused refs are removed not 100% and a lot of empty directories are accumulated) we just make sure to remove them at the end of pull. But NOTE: there could still be hidden O(n^2) behaviour, so it makes sense to eventually revisit it and clean up empty dirs earlier. For now we just take care not to degrade future pull performance. The appropriate time for revisiting could be when reworking pull to do fetches in parallel. Updates: https://lab.nexedi.com/lab.nexedi.com/lab.nexedi.com/issues/4
-