Support streaming big files in DMS.
-
Owner
@romain, @vpelletier, we were talking with @rafael recently and I realized it would be handy if BigFile implemented the usual file-like interface, so that a BigFile can be passed directly to functions that expect one (e.g. scipy.io.wavfile.read(), and many other use cases). Something like this external wrapper is possible:
    class BigFileReader:
        def __init__(self, bigfile):
            self.bigfile = bigfile
            self.pos = 0  # current read offset, in bytes

        def tell(self):
            return self.pos

        def seek(self, pos):
            # TODO whence
            # TODO check for out of range
            self.pos = pos

        def read(self, n):
            # Collect the chunks covering [self.pos, self.pos + n).
            chunkv = []
            for chunk in self.bigfile.iterate(self.pos, n):
                chunkv.append(chunk)
            data = b''.join(chunkv)
            self.pos += len(data)
            return data
but IMHO we'd better place it inside BigFile itself.
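To get the complete file-like interface (readline, iteration, buffering) without writing each method by hand, one option is to build on the standard `io.RawIOBase`. The following is only a sketch in Python 3: `BigFileRawReader`, `InMemoryBigFile` and the `size()` method are hypothetical names introduced for illustration, and `iterate(offset, length)` is assumed to yield byte chunks as in the wrapper above.

```python
import io
import os


class InMemoryBigFile:
    """Hypothetical stand-in for BigFile, used only to make the sketch runnable."""

    def __init__(self, data, chunk_size=4):
        self._data = data
        self._chunk_size = chunk_size

    def size(self):  # assumed helper, not a confirmed BigFile method
        return len(self._data)

    def iterate(self, offset, length):
        # Yield the requested byte range in chunk_size pieces.
        end = min(offset + length, len(self._data))
        for start in range(offset, end, self._chunk_size):
            yield self._data[start:min(start + self._chunk_size, end)]


class BigFileRawReader(io.RawIOBase):
    """Read-only raw stream over an object exposing iterate() and size()."""

    def __init__(self, bigfile):
        super().__init__()
        self.bigfile = bigfile
        self.pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self.pos

    def seek(self, offset, whence=os.SEEK_SET):
        if whence == os.SEEK_SET:
            new_pos = offset
        elif whence == os.SEEK_CUR:
            new_pos = self.pos + offset
        elif whence == os.SEEK_END:
            new_pos = self.bigfile.size() + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if new_pos < 0:
            raise ValueError("negative seek position: %r" % (new_pos,))
        self.pos = new_pos
        return new_pos

    def readinto(self, buf):
        # RawIOBase.read() and readall() are implemented on top of readinto().
        n = min(len(buf), max(self.bigfile.size() - self.pos, 0))
        data = b"".join(self.bigfile.iterate(self.pos, n)) if n else b""
        buf[:len(data)] = data
        self.pos += len(data)
        return len(data)


# Wrapping the raw stream adds buffering, readline() and line iteration.
reader = io.BufferedReader(BigFileRawReader(InMemoryBigFile(b"line1\nline2\n")))
first_line = reader.readline()  # b"line1\n"
```

Only `readinto`, `seek` and `tell` touch the underlying data here; everything else is inherited from the `io` base classes.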
/cc @Tyagov, @donkey-hotei, @frequent
-
Owner
BTreeData originated as a spare-time project to benchmark NEO with weird workloads: I wanted to compile Linux in a FUSE mountpoint using NEO as mass storage (I never went further than extracting the tarball[1], as I failed at transaction boundaries, then lost interest in this idea).
As a result, its API is very close to what FUSE expects, so it should be quite trivial to implement a file-ish API on top of it.
[1] I mean extracting the kernel source code tarball inside the mountpoint.
-
Owner
Ah, and one comment about where to put the code: a file object is (as you proposed) some access to the underlying data (self.bigfile) plus a current offset (self.pos). I think this current offset does not belong in persistent data; it should only be accessible to whatever opened the file. So the handle would not be a persistent class, and therefore would not be part of the BigFile class, but it could live in the same file and be returned by an open method on the BigFile class (or maybe even be integrated at the BTreeData level?).
-
Owner
OK, I agree: it should be a file handle that has .read() etc., and the file handle should be created with BigFile.open(). From this point of view, we add BigFile.open() returning a BigFileHandle, with code from e.g. my example.
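Roughly, that split could look like the sketch below (Python 3). It is not the actual implementation: the `BigFile` here is an in-memory stand-in for the persistent class, `size()` is an assumed helper, and `iterate(offset, length)` is taken to yield byte chunks. The point is that the offset lives only on the handle, so two handles on the same file do not disturb each other.

```python
import os


class BigFileHandle:
    """Non-persistent handle: a reference to the data plus a private offset."""

    def __init__(self, bigfile):
        self.bigfile = bigfile
        self.pos = 0

    def tell(self):
        return self.pos

    def seek(self, offset, whence=os.SEEK_SET):
        if whence == os.SEEK_SET:
            new_pos = offset
        elif whence == os.SEEK_CUR:
            new_pos = self.pos + offset
        elif whence == os.SEEK_END:
            new_pos = self.bigfile.size() + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if new_pos < 0:
            raise ValueError("negative seek position: %r" % (new_pos,))
        self.pos = new_pos
        return new_pos

    def read(self, n=-1):
        # Clamp the request to what is left; n < 0 means "read to the end".
        remaining = max(self.bigfile.size() - self.pos, 0)
        if n is None or n < 0 or n > remaining:
            n = remaining
        if n == 0:
            return b""
        data = b"".join(self.bigfile.iterate(self.pos, n))
        self.pos += len(data)
        return data


class BigFile:
    """In-memory stand-in for the persistent class (illustration only)."""

    def __init__(self, data, chunk_size=4):
        self._data = data
        self._chunk_size = chunk_size

    def size(self):  # assumed helper, not a confirmed BigFile method
        return len(self._data)

    def iterate(self, offset, length):
        end = min(offset + length, len(self._data))
        for start in range(offset, end, self._chunk_size):
            yield self._data[start:min(start + self._chunk_size, end)]

    def open(self):
        # Every call returns a fresh handle with its own offset.
        return BigFileHandle(self)


bigfile = BigFile(b"0123456789")
a = bigfile.open()
b = bigfile.open()
head = a.read(3)  # advances a only; b still sits at offset 0
```

Since the handle holds no persistent state, nothing about the read position is ever written back to the database.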
-
Developer
I'd like it if we could use a file-ish API. +1