Commit 10478e98 authored by Julien Muchembled

fixup! download: add support for slapos.libnetworkcache

See commit 47ab68e0.
parent 30fb1cff
Pipeline #33753 failed in 0 seconds
@@ -523,9 +523,15 @@ class Buildout(DictMixin):
         except ImportError:
             pass
         except Exception:
-            self._logger.exception("There was problem while trying to"
-                                   " import slapos.libnetworkcache."
-                                   " Networkcache forced to be disabled.")
+            self._logger.exception(
+                "Failed to setup Networkcache. Continue without.")
+        else:
+            networkcache_client._buildout_download_filter = match_list = []
+            for pattern in networkcache_section.get('file-urlmd5-exclude',
+                                                    '').split():
+                include = pattern.startswith('!')
+                match_list.append((not include,
+                                   re.compile(pattern[include:]).match))
 
     def _buildout_path(self, name):
         if '${' in name:
@@ -270,23 +270,25 @@ class Download(object):
         url_host, url_port = parsed[-2:]
         return '%s:%s' % (url_host, url_port)
 
-    def urlretrieve(self, url, tmp_path):
+    def _auth(self, url):
         parsed_url = urlparse(url)
-        req = url
-        while parsed_url.scheme in ('http', 'https'): # not a loop
+        if parsed_url.scheme in ('http', 'https'):
             auth_host = parsed_url.netloc.rsplit('@', 1)
             if len(auth_host) > 1:
-                auth = auth_host[0]
-                url = parsed_url._replace(netloc=auth_host[1]).geturl()
-            else:
-                auth = netrc.authenticators(parsed_url.hostname)
-                if not auth:
-                    break
-                auth = '{0}:{2}'.format(*auth)
-            req = Request(url)
+                return (auth_host[0],
+                        parsed_url._replace(netloc=auth_host[1]).geturl())
+            auth = netrc.authenticators(parsed_url.hostname)
+            if auth:
+                return '{0}:{2}'.format(*auth), url
+
+    def urlretrieve(self, url, tmp_path):
+        auth = self._auth(url)
+        if auth:
+            req = Request(auth[1])
             req.add_header("Authorization",
-                "Basic " + bytes2str(b64encode(str2bytes(auth))))
-            break
+                "Basic " + bytes2str(b64encode(str2bytes(auth[0]))))
+        else:
+            req = url
         with closing(urlopen(req)) as src:
             with open(tmp_path, 'wb') as dst:
                 shutil.copyfileobj(src, dst)
@@ -297,8 +299,15 @@ class Download(Download):
 
     def _download(self, url, path, md5sum=None, alternate_url=None):
         from .buildout import networkcache_client as nc
-        key = 'file-urlmd5:' + md5(url.encode()).hexdigest()
-        if nc and nc.tryDownload(key):
+        while nc: # not a loop
+            if (self._auth(url) # do not cache restricted data
+                    or next((x for x, match in nc._buildout_download_filter
+                             if match(url)), False)):
+                nc = None
+                break
+            key = 'file-urlmd5:' + md5(url.encode()).hexdigest()
+            if not nc.tryDownload(key):
+                break
             with nc:
                 entry = next(nc.select(key, {'url': url}), None)
                 if entry is None:
@@ -311,6 +320,7 @@ class Download(Download):
                         return
                     err = 'MD5 checksum mismatch'
             self.logger.info('Cannot download from network cache: %s', err)
+            break
         super(Download, self)._download(url, path, md5sum, alternate_url)
         if nc and nc.tryUpload(key):
             with nc, open(path, 'rb') as f:
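
For reference, here is a minimal, self-contained sketch of the credential lookup performed by the new Download._auth helper above: prefer userinfo embedded in the URL, otherwise fall back to ~/.netrc. The function names (resolve_basic_auth, basic_auth_header) are only illustrative and not buildout's API, and the plain stdlib netrc/urllib modules are used instead of buildout's own wrappers.

import netrc
from base64 import b64encode
from urllib.parse import urlparse

def resolve_basic_auth(url):
    # returns ('user:password', url_without_userinfo) or None
    parsed = urlparse(url)
    if parsed.scheme in ('http', 'https'):
        auth_host = parsed.netloc.rsplit('@', 1)
        if len(auth_host) > 1:
            # inline credentials, e.g. https://user:password@host/path
            return auth_host[0], parsed._replace(netloc=auth_host[1]).geturl()
        try:
            entry = netrc.netrc().authenticators(parsed.hostname)
        except (OSError, netrc.NetrcParseError):
            entry = None  # no usable ~/.netrc
        if entry:
            login, _account, password = entry
            return '%s:%s' % (login, password), url

def basic_auth_header(userpass):
    # value of the "Authorization" request header that urlretrieve adds
    return 'Basic ' + b64encode(userpass.encode()).decode()

print(resolve_basic_auth('https://user:secret@lab.nexedi.com/x'))
# ('user:secret', 'https://lab.nexedi.com/x')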
  • This adds 2 things:

    • Do not cache restricted data. I think the reason is quite obvious, but ask me if it is not.
    • New ${networkcache-section:file-urlmd5-exclude} option to exclude URLs. This is required because caching URLs whose contents change over time (e.g. https://lab.nexedi.com/nexedi/slapos/raw/master/software/fluentd/instance.cfg) does not work (the server returns a random version, which is rarely the correct one, and the client uploads all the time). Given that GitLab URLs don't distinguish branches from tags or commits, we'd need something like the following (the matching semantics are sketched right after this comment):
    file-urlmd5-exclude =
      !https://lab\.nexedi\.com/[^/]+/[^/]+/raw/[0-9a-f]{40}/
      !https://lab\.nexedi\.com/[^/]+/[^/]+/raw/[^/]*\d+\.\d+\.\d+
      https://lab\.nexedi\.com/

    Waiting for comments before testing.

    /cc @xavier_thompson @jerome @tomo
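
    To make the first-match semantics concrete, here is a small standalone illustration. The match_list construction mirrors the first hunk of the diff above; the excluded() helper and the sample URLs are only illustrative and not part of buildout. A pattern starting with ! keeps the URL cacheable, a bare pattern excludes it from the cache, the first pattern that matches decides, and URLs matching no pattern are cached as before.

      import re

      patterns = r"""
      !https://lab\.nexedi\.com/[^/]+/[^/]+/raw/[0-9a-f]{40}/
      !https://lab\.nexedi\.com/[^/]+/[^/]+/raw/[^/]*\d+\.\d+\.\d+
      https://lab\.nexedi\.com/
      """.split()

      # same (exclude, matcher) pairs as _buildout_download_filter in the diff
      match_list = []
      for pattern in patterns:
          include = pattern.startswith('!')
          match_list.append((not include, re.compile(pattern[include:]).match))

      def excluded(url):
          # first matching pattern wins; URLs matching no pattern are cached
          return next((exclude for exclude, match in match_list if match(url)), False)

      # a commit URL stays cacheable, a branch URL is excluded from the cache
      assert not excluded('https://lab.nexedi.com/nexedi/slapos/raw/'
                          '0123456789abcdef0123456789abcdef01234567/instance.cfg')
      assert excluded('https://lab.nexedi.com/nexedi/slapos/raw/master/instance.cfg')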

  • I'd really prefer to have the same syntax and vocabulary as we use for the binary network cache. See https://lab.nexedi.com/nexedi/slapos.core/-/blob/acff45335692847d3aa5e87017b6325abab75699/slapos.cfg.example#L162

    So it would be file-urlmd5-blacklist, and the content would be exactly the same as what I pointed to above.

  • I don't know this well, so I don't have much to say, except that it's not good to use words like "blacklist" (https://en.wikipedia.org/wiki/Blacklist_(computing)#Controversy_over_use_of_the_term); we should really consider changing slapos.cfg instead, or at least stop using this kind of terminology.

    The syntax is ! to include and no prefix to exclude? We could use two options maybe (though maybe the problem is in which order to apply them?).

    Newer GitLab uses a - in URLs (https://lab.nexedi.com/nexedi/slapos/-/raw/master/software/fluentd/instance.cfg); once we write the real regexes we should allow these as well.

    Is the URL Pattern API (https://developer.mozilla.org/en-US/docs/Web/API/URL_Pattern_API) more suitable than regular expressions here? I think we don't like the idea of adding dependencies to buildout, so it might be a bad idea. EDIT: this seems to be a different kind of thing; it's not just a way to describe URL patterns with a string.

    Another thing I am wondering: isn't it possible/better to refuse the URLs on the server side?

    Edited by Jérome Perrin
  • > it's not good to use words like "blacklist" (https://en.wikipedia.org/wiki/Blacklist_(computing)#Controversy_over_use_of_the_term)

    I'd use 'blacklist' just because I find this controversy stupid.

    I chose something a little closer to Git ($GIT_DIR/info/exclude, though I could not do exactly the same because it is too limited), and I also felt it was less strange to have negative patterns with this naming.

    > The syntax is ! to include and no prefix to exclude?

    So yes, like gitignore. I think everyone knows this way of filtering, so it should be fine. And anyway, this option will rarely be touched.

    > We could use two options maybe (though maybe the problem is in which order to apply them?)

    Indeed, the order matters, and with 2 options you lose a lot of expressiveness.

    > Another thing I am wondering: isn't it possible/better to refuse the URLs on the server side?

    Unfortunately no, that is too late. First, because one of the purposes is to avoid useless shadir lookups for things that should not be cached. And also, the way a blob is indexed is exclusively the client's business.

    > I'd really prefer to have the same syntax and vocabulary as we use for the binary network cache.

    You don't take into account what I tried to solve by doing things differently. Are you questioning the goal?

    Edited by Julien Muchembled
  • I'm waiting on this to cherry-pick/fixup it onto !30 (merged). Is this ready to push to master?

  • mentioned in merge request !30 (merged)

  • Pushed to master. You can include it in !30 (merged).

  • Unfortunately, @jp is against the principle of excluding some stuff from shacache. The philosophy is "everything buildout uses should go into shacache". Files at URLs pointing to a branch will of course change over time, but with the new filter mechanism of slapos.libnetworkcache we can make buildout select the most recent version of the files. @jp is aware that the state of the files in shacache for the master branch will be incoherent (but in a sense this is already the same problem when using GitLab directly, and only a local git repository or a tag is coherent).

    So I will partially revert this commit (only the exclude part), so that Xavier can still rebase with the part that avoids caching authenticated material.

    Sorry for not warning beforehand. I just had the discussion yesterday afternoon.

    After slapos.libnetworkcache!9 (merged) is done, we can discuss the changes needed in buildout for using those cached branches (maybe add an option --accept-shacache-incoherent-buildout-files).

  • mentioned in commit dbdea2a0
