1. 02 Jul, 2020 1 commit
  2. 12 Jun, 2018 1 commit
    • Kirill Smelkov's avatar
      pull: Switch from porcelain `git fetch` to plumbing `git fetch-pack` + friends · 899103bf
      Kirill Smelkov authored
      On lab.nexedi.com `git-backup pull` became slow, and most of the slowness
      was tracked down to the following:
      
      `git fetch`, before fetching data from remote repository, first checks
      whether it already locally has all the objects remote advertises. This
      boils down to running
      
      	echo $remote_tips | git rev-list --quiet --objects --stdin --not --all
      
      and checking whether it succeeds or not:
      
      	https://git.kernel.org/pub/scm/git/git.git/commit/?h=4191c35671
      	https://git.kernel.org/pub/scm/git/git.git/tree/builtin/fetch.c?h=v2.18.0-rc1-1-g6f333ff2fb#n925
      	https://git.kernel.org/pub/scm/git/git.git/tree/connected.c?h=v2.18.0-rc1-1-g6f333ff2fb#n8
      
      The "--not --all" in the query means that objects should be not
      reachable from all locally existing refs and is implemented by linearly
      scanning from tip of those existing refs and marking objects reachable
      from there as "do not print".
      
      In case of git-backup, where we have mostly master which is super commit
      merging from whole histories of all projects and from backup history,
      linearly scanning from such a tip goes through lots of commits. Up to
      the point where fetching a small, outdated repository, which was already
      pulled into backup and did not changed since long, takes more than 30
      seconds with almost 100% of that time being spent in quickfetch() only.
      
      The solution will be to optimize checking whether we already have all the
      remote objects and to not repeat whole backup-repo scanning for every
      pulled repository. This will be done via first querying through `git
      ls-remote` what tips remote repository has, then checking on
      git-backup specific index which tips we already have and then fetching
      only the rest. This way we are essentially moving most of quickfetch
      phase of git into git-backup.
      
      Since we'll be tailing to git to fetch only some of the remote refs, we
      will either have to amend ourselves the refs `git fetch` creates after
      fetching, or to not rely on `git fetch` creating any refs at all. Since
      we already have a long standing issue that many many refs that are
      coming live after `git fetch` slow down further git fetches
      
      https://lab.nexedi.com/kirr/git-backup/blob/0ab7bbb6/git-backup.go#L551
      
      the longer term plan will be not to create unneeded references.
      Since 2 forks could have references covering the same commits, we would
      either have to compare references created after git-fetch and deduplicate
      them or manage references creation ourselves.
      
      It is also generally better to split `git fetch` into steps at plumbing
      layer, because after doing so, we can have the chance to optimize or
      tweak any of the steps at our side with knowing full git-backup context
      and indices.
      
      This commit only switches from using `git fetch` to its plumbing
      counterpart `git fetch-pack` + friends + manually creating fetched refs
      the way `git fetch` used to do exactly. There should be neither
      functionality changed nor any speedup.
      
      Further commits will start to take advantage of the switch and optimize
      `git-backup pull`.
      899103bf
  3. 03 Nov, 2016 1 commit
    • Kirill Smelkov's avatar
      Don't be fooled by strings.Split(..., "\n") result always having empty "" last element · 3ba6cf73
      Kirill Smelkov authored
      By definition of strings.Split(..., sep) it "slices s into all substrings
      separated by sep and returns a slice of the substrings between those
      separators". That means that
      
          string.Split("hello\nworld\n", "\n") -> ["hello", "world", ""])     # NOTE the last ""
      
      when parsing file by lines, it is handy though to do not get last empty
      "" after last "\n". #6 shows how we missed to do that filtering-out for
      case of empty backup.refs file and errored-out because of that.
      
      To fix let's introduce a helper - splitlines(), which does the job of
      filtering-out last empty entry after last separator. By using this
      helper everywhere we can hopefully avoid problems while pulling only
      empty repositories (#6 case), and also similar ones.
      
      Fixes #6
      /reported-by @iv
      3ba6cf73
  4. 06 Jul, 2016 1 commit
    • Kirill Smelkov's avatar
      Rewrite in Go · 28986e0e
      Kirill Smelkov authored
      This is more-or-less 1-to-1 port of git-backup to Go. There are things
      we handle a bit differently:
      
      - there is a separate type for Sha1
      - conversion of repo paths to git references is now more robust wrt
        avoiding not-allowed in git constructs like ".." or ".lock"
      
        https://git.kernel.org/cgit/git/git.git/tree/refs.c?h=v2.9.0-37-g6d523a3#n34
      
      The rewrite happened because we need to optimize restore, and for e.g.
      parallelizing part it should be convenient to use goroutines and channels.
      
      I'm not very comfortable with how error handling is done, because
      contrary to what canonical Go way seems to be, in a lot of places it still
      looks to me exceptions are better idea compared to just error codes,
      though in many places just error codes are better and makes more sense.
      Probably there will be less exceptions over time once the code starts to
      be collaborating set of goroutines with communications done via
      channels.
      
      Still a lot of python habits on my side.
      
      And as a bonus we now have end-to-end pull/restore tests...
      28986e0e