1. 08 Sep, 2015 1 commit
      Don't forget to save symlinks pointing to directories · 380b65f1
      Kirill Smelkov authored
      os.walk() yields symlinks to directories in dirnames and does not follow
      them. Our backup cycle expects everything that has to be stored as a blob
      to be in filenames, and dirnames to hold only what walk() recurses into.
      
      Thus, until now, a symlink to a directory was simply ignored and not
      backed up. In particular, *.git/hooks are usually symlinks to a common
      place.
      
      The fix is to adjust our xwalk() to always represent blob-ish things in
      filenames, and leave dirnames only for real directories.
      
      /cc @kazuhiko
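A minimal Python sketch of the fix, assuming xwalk() is a thin wrapper around os.walk() with the default followlinks=False (the real xwalk() may do more). Symlinks that os.walk() reports in dirnames are moved into filenames, so the link itself is stored as a blob, while dirnames keeps only real directories.

```python
import os


def xwalk(top):
    """Like os.walk(), but report symlinks to directories in filenames.

    Sketch: with the default followlinks=False, os.walk() lists a symlink
    to a directory in dirnames and never descends into it. Moving such
    entries to filenames means they get backed up as symlinks, and
    dirnames holds only real directories that the walk recurses into.
    """
    for dirpath, dirnames, filenames in os.walk(top):
        # Symlinks that os.walk() classified as directories.
        dirlinks = [d for d in dirnames
                    if os.path.islink(os.path.join(dirpath, d))]
        for d in dirlinks:
            dirnames.remove(d)    # in-place edit, honoured by os.walk()
            filenames.append(d)   # blob-ish: store the link itself
        yield dirpath, dirnames, filenames
```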
  2. 31 Aug, 2015 3 commits
      gitlab-backup: Initial draft · 32e1f7af
      Kirill Smelkov authored
      This is a convenience program to pull backup data for a GitLab instance
      into a git-backup managed repository, and to restore it from there.
      
      Backup layout is:
      
          gitlab/misc   - db + uploads + ...
          gitlab/repo   - git repositories
      
      On restoration we extract repositories into
      .../git-data/repositories.<timestamp> and the db backup into a standard
      GitLab backup tar, and advise the user how to proceed, giving the exact
      commands to finish.
      
      This will hopefully be improved and changed to finish automatically,
      after some testing.
      git-backup: Initial draft · 6f237f22
      Kirill Smelkov authored
      This program backs up files and a set of bare Git repositories into one Git
      repository. Files are copied to blobs and then added to the tree under a
      certain place; for Git repositories, all reachable objects are pulled in
      while maintaining an index which remembers reference -> sha1 for every
      pulled repository.
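
For reference, the id Git gives a file copied to a blob is the SHA-1 of a short "blob <size>\0" header followed by the content, so identical files collapse into one object. The index helper below is hypothetical: it only illustrates a reference -> sha1 mapping for one pulled repository in a made-up "<sha1> <ref>" line format, not git-backup's actual index layout.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Object id Git assigns to `data` stored as a blob (SHA-1 repositories)."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def format_backup_index(refs: dict[str, str]) -> bytes:
    """Serialize one repository's reference -> sha1 mapping.

    Hypothetical format for illustration: one "<sha1> <ref>" line per
    reference, sorted by reference name for stable, diff-friendly output.
    """
    lines = [f"{sha1} {ref}\n" for ref, sha1 in sorted(refs.items())]
    return "".join(lines).encode()
```

Because the blob id depends only on content, the same file backed up twice, or present in several places, is stored once.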
      
      After objects from the backed-up Git repositories are pulled in, we create a
      new commit which references a tree with the changed backup index and files,
      and which also has all head objects from the pulled-in repositories as its
      parents(*). This way the backup has history, and all pulled objects become
      reachable from a single head commit in the backup repository. In particular,
      this means that the whole state of the backup can be described by a single
      sha1, and that the backup repository itself can be synchronized via standard
      git pull/push, be repacked, etc.
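
A sketch of the raw object such a backup commit could be, assuming SHA-1 object ids and a fixed +0000 timezone; make_commit() and its parameters are hypothetical names. Every parent gets its own "parent" line, which is what makes all pulled-in heads reachable from the one commit.

```python
import hashlib


def make_commit(tree: str, parents: list[str], ident: str, when: int,
                message: str) -> tuple[str, bytes]:
    """Build a raw Git commit object and return (object id, raw body).

    `parents` would hold the previous backup commit plus the heads of all
    pulled-in repositories. `ident` is "Name <email>", `when` is a Unix
    timestamp; the timezone is fixed at +0000 for simplicity.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {ident} {when} +0000")
    lines.append(f"committer {ident} {when} +0000")
    body = ("\n".join(lines) + "\n\n" + message).encode()
    # The object id covers a "commit <size>\0" header plus the body.
    header = b"commit %d\0" % len(body)
    return hashlib.sha1(header + body).hexdigest(), body
```

With real Git the same kind of object can be produced by `git commit-tree <tree> -p <parent> -p <parent> ...`, or stored from raw bytes with `git hash-object -t commit -w --stdin`.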
      
      The restoration process is the opposite: from a particular backup state, files
      are extracted to their proper place, and for each Git repository a pack with
      all objects reachable from that repository's heads is prepared and extracted
      from the backup repository's object database.
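
The set of objects that goes into such a restore pack is the reachability closure of the repository's heads. A toy sketch over an in-memory graph, where `links` (a hypothetical representation) maps each object id to the ids it references:

```python
from collections import deque


def reachable(heads, links):
    """Return every object id reachable from `heads` (breadth-first).

    `links` maps an object id to the ids it references: a commit to its
    tree and parents, a tree to its entries, a tag to its target; blobs
    have no entry. Each object is visited once, even if shared.
    """
    seen = set()
    todo = deque(heads)
    while todo:
        oid = todo.popleft()
        if oid in seen:
            continue
        seen.add(oid)
        todo.extend(links.get(oid, ()))
    return seen
```

With real Git, `git rev-list --objects <heads>` lists this set, and `git pack-objects --revs` can build a pack from revisions given on standard input; which exact commands git-backup uses is not stated here.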
      
      This approach makes it possible to leverage Git's strong ability to deduplicate
      and pack object contents, especially when there are many hosted repositories
      which are forks of each other, sharing a mostly common base with relatively
      minor changes between each other and over time. In the author's experience,
      the backup is dramatically smaller than with the straightforward "let's tar
      it all" approach.
      
      Anyone with access to the backup repository can access the data of all
      backed-up files and repositories, so either they all should be in the same
      security domain, or extra care has to be taken to protect access to the
      backup repository.
      
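      File permissions are not preserved in full detail, due to the inherent
      nature of Git (for regular files a tree records only the executable bit,
      and ownership is not recorded at all). This aspect can be improved with
      e.g. an etckeeper-like (http://etckeeper.branchable.com/) approach if
      needed.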
      
      Please see README.txt for a user-level overview of how to use git-backup.
      
      NOTE the idea of pulling all refs together is similar to git-namespaces
           http://git-scm.com/docs/gitnamespaces
      
      (*) Tag objects are handled specially, because in a lot of places Git insists
          on, and assumes, commit parents being commit objects only. We encode a tag
          object in a specially-crafted commit object on pull, and decode it back on
          backup restore.
      
          We do likewise if a ref points to a tree or a blob, which is valid in Git.
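One possible way to make such an encoding reversible, shown for the message part only. The marker line and the "type" header are a hypothetical format, not git-backup's actual one; a complete encoding commit would also have to keep the wrapped object reachable (for example through its parents or tree), which this sketch leaves out.

```python
from typing import Optional, Tuple

# Hypothetical marker identifying commits that wrap a non-commit object.
MAGIC = b"git-backup: encoded object\n"


def encode_as_commit_message(obj_type: str, raw: bytes) -> bytes:
    """Wrap a textual object (e.g. a tag) into a commit message.

    Layout (hypothetical): marker line, "type <obj_type>" line, a blank
    line, then the object's raw content verbatim.
    """
    return MAGIC + b"type " + obj_type.encode() + b"\n\n" + raw


def decode_commit_message(msg: bytes) -> Optional[Tuple[str, bytes]]:
    """Inverse of encode_as_commit_message(); None for ordinary commits."""
    if not msg.startswith(MAGIC):
        return None
    # Split at the first blank line; the raw object may contain its own.
    head, sep, raw = msg[len(MAGIC):].partition(b"\n\n")
    if not sep or not head.startswith(b"type "):
        raise ValueError("malformed encoded object")
    return head[len(b"type "):].decode(), raw
```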
      Start of git-backup.git · bbee44ce
      Kirill Smelkov authored
      The project to efficiently back up repositories hosted on a git hosting
      service.