    git-backup: Initial draft · 6f237f22
    Kirill Smelkov authored
    This program backs up files and a set of bare Git repositories into one Git
    repository. Files are copied into blobs and then added to the tree at a
    certain place. For Git repositories, all reachable objects are pulled in,
    and an index is maintained that remembers reference -> sha1 for every
    pulled repository.
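    The following minimal Python sketch illustrates the two pieces described
    above, under stated assumptions. git_blob_sha1 computes the id Git assigns
    to a file's content when it is stored as a blob. This is Git's standard
    object hashing: "blob <size>", a NUL byte, then the content.
    format_backup_index and its one "<repo> <ref> <sha1>" line per ref are
    hypothetical. They only make the reference -> sha1 index concrete;
    git-backup's actual index format may differ.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the object id Git assigns to `data` stored as a blob."""
    # Git hashes the header "blob <size>\0" followed by the raw content.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def format_backup_index(entries):
    """Render a HYPOTHETICAL index: one "<repo> <ref> <sha1>" line per ref.

    `entries` maps repository path -> {ref name -> sha1}. This layout is an
    assumption for illustration, not git-backup's actual on-disk format.
    """
    lines = []
    for repo in sorted(entries):
        for ref in sorted(entries[repo]):
            lines.append("%s %s %s" % (repo, ref, entries[repo][ref]))
    return "\n".join(lines) + "\n"


blob = git_blob_sha1(b"hello\n")
print(format_backup_index({"srv/repo.git": {"refs/heads/master": blob}}))
```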
    
    After objects from the backed-up Git repositories are pulled in, we create a
    new commit which references a tree with the updated backup index and files,
    and which also has all head objects from the pulled-in repositories as its
    parents(*). This way the backup has history, and all pulled objects become
    reachable from a single head commit in the backup repository. In particular,
    this means that the whole state of the backup can be described by a single
    sha1, and that the backup repository itself can be synchronized via standard
    git pull/push, repacked, etc.
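    This minimal sketch makes the "many parents" commit concrete. It assembles
    the raw body of a Git commit object, with a tree line, one parent line per
    pulled-in head, author/committer lines and a message. It then computes the
    sha1 the way Git does. The helper name git_commit_object and the placeholder
    parent ids are hypothetical. In practice one would let a tool such as
    git commit-tree (one -p per parent) create the object.

```python
import hashlib


def git_commit_object(tree, parents, ident, message):
    """Return (raw body, sha1) of a commit object with the given parents.

    `ident` is a "Name <email> <unix-time> <tz>" string, used here for both
    the author and the committer line. HYPOTHETICAL helper for illustration.
    """
    lines = ["tree %s" % tree]
    lines += ["parent %s" % p for p in parents]  # one line per pulled-in head
    lines += ["author %s" % ident, "committer %s" % ident]
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    # Git hashes "commit <size>\0" followed by the body.
    sha1 = hashlib.sha1((b"commit %d\0" % len(body)) + body).hexdigest()
    return body, sha1


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
body, sha1 = git_commit_object(
    EMPTY_TREE,
    ["1111111111111111111111111111111111111111",   # placeholder head of repo A
     "2222222222222222222222222222222222222222"],  # placeholder head of repo B
    "Backup <backup@example.com> 0 +0000",
    "Git-backup snapshot\n",
)
print(sha1)
```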
    
    The restoration process is the opposite: from a particular backup state,
    files are extracted to their proper place, and for each Git repository a
    pack with all objects reachable from that repository's heads is prepared
    and extracted from the backup repository's object database.
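    Selecting "all objects reachable from that repository's heads" is a graph
    walk. The sketch below performs it over a toy in-memory graph. The graph
    maps each object id to the ids it references: a commit's tree and parents,
    a tree's entries, or a tag's target. The graph and the reachable helper are
    assumptions for illustration. Against a real object database, the same
    selection can be produced with git rev-list --objects and fed to
    git pack-objects.

```python
from collections import deque


def reachable(objects, heads):
    """Return the set of object ids reachable from `heads` in `objects`.

    `objects` maps an object id to the ids it references; ids with no entry
    (e.g. blobs) are treated as leaves. Toy model, not a real object DB.
    """
    seen = set()
    todo = deque(heads)
    while todo:
        oid = todo.popleft()
        if oid in seen:
            continue
        seen.add(oid)
        todo.extend(objects.get(oid, ()))
    return seen


# Toy backup: repos A and B share commit c1 together with its tree and blob.
objects = {
    "c1": ["t1"], "t1": ["b1"],
    "cA": ["c1", "tA"], "tA": ["b1", "bA"],
    "cB": ["c1", "tB"], "tB": ["b1", "bB"],
}
print(sorted(reachable(objects, ["cA"])))  # ['b1', 'bA', 'c1', 'cA', 't1', 'tA']
```

    Restoring repository A thus pulls in the shared c1/t1/b1 objects but none of
    B's private objects.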
    
    This approach leverages Git's strong ability to deduplicate and pack object
    contents. It pays off especially when there are many hosted repositories
    that are forks of each other: they share a mostly common base and differ
    only by relatively minor changes, both between each other and over time.
    In the author's experience, the backup is dramatically smaller than with
    the straightforward "let's tar it all" approach.
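    A toy content-addressed store shows why forks back up so compactly. Objects
    are keyed by the hash of their content, so a file that is identical across
    forks is stored only once. This minimal sketch models only that exact-match
    deduplication. Git's packing additionally delta-compresses similar (not just
    identical) objects, which is not attempted here. The ContentStore class and
    its methods are hypothetical.

```python
import hashlib


class ContentStore:
    """HYPOTHETICAL toy content-addressed store keyed by Git blob ids."""

    def __init__(self):
        self.objects = {}

    def add_blob(self, data: bytes) -> str:
        oid = hashlib.sha1((b"blob %d\0" % len(data)) + data).hexdigest()
        self.objects.setdefault(oid, data)  # identical content is kept once
        return oid

    def stored_bytes(self) -> int:
        return sum(len(d) for d in self.objects.values())


upstream = {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"}
fork = {**upstream, "main.c": b"int main(void) { return 1; }\n"}

store = ContentStore()
for repo in (upstream, fork):
    for content in repo.values():
        store.add_blob(content)

naive = sum(len(c) for repo in (upstream, fork) for c in repo.values())
print(len(store.objects), store.stored_bytes(), naive)
```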
    
    Anyone with access to the backup repository can read the data of all
    backed-up files and repositories, so either they all should be in the same
    security domain, or extra care has to be taken to protect access to the
    backup repository.
    
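    File permissions are not preserved in full detail due to the inherent
    nature of Git: a tree entry records only whether a regular file is
    executable, not its full mode bits or ownership. This aspect can be
    improved with e.g. an etckeeper-like (http://etckeeper.branchable.com/)
    approach if needed.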
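    The permission loss can be shown concretely. When Git stages a regular file,
    it keeps only the owner-executable bit and normalizes the mode to 100644 or
    100755. Symlinks and directories get fixed modes, and owner/group are not
    recorded at all. The minimal sketch below mirrors that mapping.
    git_tree_mode is a hypothetical helper; modes are written as git ls-tree
    prints them, and submodule (gitlink) entries are left out.

```python
import stat


def git_tree_mode(st_mode: int) -> str:
    """HYPOTHETICAL helper: map st_mode to the mode Git records for an entry."""
    if stat.S_ISDIR(st_mode):
        return "040000"
    if stat.S_ISLNK(st_mode):
        return "120000"
    if stat.S_ISREG(st_mode):
        # Only the owner-executable bit survives; 0600, 0640, 0644 all collapse.
        return "100755" if st_mode & stat.S_IXUSR else "100644"
    raise ValueError("Git cannot store this file type: %o" % stat.S_IFMT(st_mode))


for mode in (0o600, 0o644, 0o750, 0o755):
    print(oct(mode), "->", git_tree_mode(stat.S_IFREG | mode))
```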
    
    Please see README.txt for a user-level overview of how to use git-backup.
    
    NOTE: the idea of pulling all refs together is similar to git-namespaces:
         http://git-scm.com/docs/gitnamespaces
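    For comparison, gitnamespaces(7) stores the refs of namespace foo under
    refs/namespaces/foo/. Nested namespaces such as foo/bar become
    refs/namespaces/foo/refs/namespaces/bar/. The helper below (hypothetical
    name namespaced_ref) reproduces only that documented layout. It does not
    describe how git-backup itself names the refs it pulls.

```python
def namespaced_ref(namespace: str, ref: str) -> str:
    """HYPOTHETICAL helper: where `ref` lives under a gitnamespaces namespace.

    Each component of a nested namespace such as "foo/bar" adds its own
    "refs/namespaces/<component>/" prefix, as described in gitnamespaces(7).
    """
    prefix = "".join("refs/namespaces/%s/" % part for part in namespace.split("/"))
    return prefix + ref


print(namespaced_ref("foo", "refs/heads/master"))
print(namespaced_ref("foo/bar", "refs/heads/master"))
```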
    
    (*) Tag objects are handled specially, because in many places Git insists on,
        and assumes, commit parents being commit objects only. We encode tag
        objects in specially-crafted commit objects on pull, and decode them
        back on backup restore.
    
        We do likewise if a ref points to a tree or blob, which is valid in Git.
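    The encoding can be pictured as a reversible round-trip. The minimal sketch
    below stores a non-commit object's type, id and raw content in a commit
    message, then recovers them and checks the id on decode. The marker text and
    layout are hypothetical, not git-backup's actual format. A real encoding must
    also keep the objects that a tag (or tree) points to reachable, for example
    through the commit's tree or parents; this sketch leaves that out.

```python
import hashlib

HEADER = b"git-backup: encoded "  # HYPOTHETICAL marker, not the real format


def object_id(obj_type: str, raw: bytes) -> str:
    """Git object id: sha1 of "<type> <size>\\0" followed by the raw content."""
    return hashlib.sha1((b"%s %d\0" % (obj_type.encode(), len(raw))) + raw).hexdigest()


def encode_as_commit_message(obj_type: str, raw: bytes) -> bytes:
    oid = object_id(obj_type, raw)
    return HEADER + b"%s %s\n\n" % (obj_type.encode(), oid.encode()) + raw


def decode_commit_message(msg: bytes):
    if not msg.startswith(HEADER):
        raise ValueError("not an encoded object")
    header, raw = msg[len(HEADER):].split(b"\n\n", 1)  # raw may contain "\n\n"
    obj_type, oid = header.decode().split(" ")
    if object_id(obj_type, raw) != oid:
        raise ValueError("object id mismatch")
    return obj_type, raw


tag = (b"object 4b825dc642cb6eb9a060e54bf8d69288fbee4904\ntype tree\n"
       b"tag v1\ntagger Someone <someone@example.com> 0 +0000\n\nrelease\n")
msg = encode_as_commit_message("tag", tag)
print(decode_commit_message(msg)[0])
```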