Commit 7c0e3ff2 authored by Kirill Smelkov

fsck incoming objects on pull

Since objects are shared between backed up repositories, it is important
to make sure we never pull a broken object: once pulled, it would cause
corruption of that object after restore in every repository which uses
it.

Object corruption could happen for two reasons:

    - plain storage corruption, or
    - someone intentionally pushing a corrupted object with a known sha1 to
      any repository.

The second case is even more dangerous, as it potentially allows an
attacker to change data in repositories he has no access to.
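
To illustrate why a corrupted object with a known sha1 is detectable at all,
here is a minimal sketch, not taken from git-backup, of how a loose object can
be checked against its name. It assumes a sha1 repository; the helper names
`git_object_id` and `loose_object_is_intact` are hypothetical, and a real
`git fsck` performs more checks than this:

```python
import hashlib
import zlib


def git_object_id(obj_type: bytes, payload: bytes) -> str:
    """Return the sha1 that git uses as the ID of an object with this payload."""
    # git hashes "<type> <size>\0" followed by the raw payload
    header = obj_type + b" " + str(len(payload)).encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def loose_object_is_intact(expected_id: str, compressed: bytes) -> bool:
    """Check that a zlib-compressed loose object inflates and hashes to its ID."""
    try:
        raw = zlib.decompress(compressed)
    except zlib.error:
        # storage corruption typically shows up here as an inflate error
        return False
    return hashlib.sha1(raw).hexdigest() == expected_id


payload = b"hello\n"
oid = git_object_id(b"blob", payload)

# a well-formed loose object for that payload
good = zlib.compress(b"blob 6\0" + payload)

# case 1: storage corruption, simulated by flipping the last byte of the stream
damaged = bytearray(good)
damaged[-1] ^= 0xFF

# case 2: a validly compressed object whose content does not match the ID
forged = zlib.compress(b"blob 6\0" + b"HELLO\n")

print(loose_object_is_intact(oid, good))
print(loose_object_is_intact(oid, bytes(damaged)))
print(loose_object_is_intact(oid, forged))
```

Only the first check succeeds. Both corruption cases above are caught the same
way: the stored bytes no longer reproduce the object ID they are filed under.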

Now objects are checked on pull, and if corrupt, git-backup complains,
e.g. this way:

    RuntimeError: git -c fetch.fsckObjects=true fetch --no-tags ../D/corrupt.git refs/*:refs/backup/20151014-1914/aaa/corrupt.git/*
    error: inflate: data stream error (incorrect data check)
    fatal: loose object 52baccfe8479b61c2a0d5447bc0a6bf7c6827c60 (stored in ./objects/52/baccfe8479b61c2a0d5447bc0a6bf7c6827c60) is corrupt
    fatal: The remote end hung up unexpectedly
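
The behaviour above can be approximated outside of git-backup with a small
wrapper. This is only a sketch under assumptions: `fsck_fetch_argv` and
`fetch_checked` are hypothetical names, and git-backup itself goes through its
own `xgit` helper rather than `subprocess.run`:

```python
import subprocess


def fsck_fetch_argv(gitrepo, refspec):
    """Build a `git fetch` command line that checks objects as they arrive."""
    # -c is an option of `git` itself, so it has to precede the `fetch` subcommand
    return ["git", "-c", "fetch.fsckObjects=true",
            "fetch", "--no-tags", gitrepo, refspec]


def fetch_checked(gitrepo, refspec, cwd=None):
    """Run the fetch and raise RuntimeError with git's stderr if it fails."""
    argv = fsck_fetch_argv(gitrepo, refspec)
    proc = subprocess.run(argv, cwd=cwd, stderr=subprocess.PIPE, text=True)
    if proc.returncode != 0:
        raise RuntimeError("%s\n%s" % (" ".join(argv), proc.stderr))


# the command line only; nothing is executed here
print(" ".join(fsck_fetch_argv("../D/corrupt.git", "refs/*:refs/backup/example/*")))
```

Passing the setting with `-c` keeps the check local to this one invocation, so
the configuration of the backup repository itself is left untouched.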
parent 19b35be9
@@ -455,14 +455,16 @@ def cmd_pull_(pullspecv):
 info('# git %s\t<- %s' % (prefix, gitrepo))
 # NOTE --no-tags : do not try to autoextend commit -> covering tag
-xgit('fetch', '--no-tags', gitrepo,
+# NOTE fetch.fsckObjects=true : check objects for corruption as they are fetched
+xgit('-c', 'fetch.fsckObjects=true',
+     'fetch', '--no-tags', gitrepo,
      'refs/*:%s%s/*' % (backup_refs_work,
          # NOTE repo name is quoted as it can contain spaces, and refs must not
          quote(reprefix(dir_, prefix, gitrepo))),
      # TODO do not show which ref we pulled - show only pack transfer progress
      stderr=gitprogress())
-# XXX do we want to fsck source git repo on pull ?
+# XXX do we want to do full fsck of source git repo on pull as well ?
 # do not recurse into dirs so marked