    bigfile/zodb: Format #1 which is optimized for small changes

    Our current approach is that each file block is represented by 1 ZODB
    object, with block size being 2M. Even with trailing \0 trimming, which
    halves the overhead on average, DB size grows very quickly if we do a
    lot of small appends or changes. So another format needs to be
    introduced which has lower overhead for storing small changes:
    
    In general, to represent BigFile as ZODB objects, each file block could
    be represented separately either as
    
        1) one ZODB object, or          (ZBlk0 - this is what we have already)
        2) group of ZODB objects        (ZBlk1 - this is what we introduce)
    
    with a top-level BTree directory mapping #blk -> object(s) representing
    the block.
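
    For illustration, here is a minimal sketch of that top-level layout in
    Python (names are illustrative only - the real classes live in
    file_zodb.py):

        from persistent import Persistent
        from BTrees.LOBTree import LOBTree      # integer-key -> object BTree

        class ZBigFileSketch(Persistent):       # hypothetical, simplified
            def __init__(self, blksize):
                self.blksize = blksize          # e.g. 2*1024*1024 (2M)
                self.blktab  = LOBTree()        # #blk -> ZBlk0 | ZBlk1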
    
    For "1" we have
    
        - low-overhead access time (only 1 object loaded from DB), but
        - high-overhead in terms of ZODB size (with FileStorage / ZEO, every change
          to a block causes it to be written into DB in full again)
    
    For "2" we have
    
        - low-overhead in terms of ZODB size (only part of a block is
          overwritten in DB on a single change), but
        - high-overhead in terms of access time
          (several objects need to be loaded for 1 block)
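
    A back-of-envelope example of the size tradeoff for a 1-byte change,
    using the 2M block size from above and the 4K chunk size chosen below:

        blksize   = 2*1024*1024                 # 2M block
        chunksize = 4*1024                      # 4K chunk (ZBlk1, see below)

        zblk0_written = blksize                 # "1": whole block rewritten
        zblk1_written = chunksize               # "2": only the changed chunk
                                                #      (+ a few BTree records)

        print(zblk0_written // zblk1_written)   # -> 512x less data written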
    
    In general it is not possible to have low overhead for both i) access
    time, and ii) DB size, with an approach where block object
    representation / management is done on the *client* side.
    
    On the other hand, if object management is moved to the DB *server*
    side, it becomes possible to deduplicate objects there and this way
    have low overhead for both access time and DB size, with the client
    storing just 1 object per file block. This will be our future approach
    after we teach NEO about object deduplication.
    
    ~~~~
    
    As shown above, it is not possible to perform optimally on the client
    side. Thus ZBlk1 should be only an intermediate solution until we move
    data management to the DB server side, with the main criterion for
    ZBlk1 being to keep it simple.
    
    In this patch a simple scheme is used, where every block is divided
    into chunks organized via a BTree. When part of a block changes, only
    the corresponding chunk is updated. The chunk size is chosen to be 4K,
    which gives ~512 fanout for a 2M block.
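
    A minimal sketch of this scheme (simplified; identifiers are
    illustrative and the real ZBlk1 in file_zodb.py differs in details):

        from persistent import Persistent
        from BTrees.LOBTree import LOBTree

        CHUNKSIZE = 4*1024                  # 4K -> ~512 chunks per 2M block

        class ZChunk(Persistent):           # hypothetical chunk object
            def __init__(self, data):
                self.data = data            # bytes, trailing \0 trimmed

        class ZBlk1Sketch(Persistent):
            def __init__(self):
                self.chunktab = LOBTree()   # in-block offset -> ZChunk

            def setblkdata(self, buf):
                # rewrite only chunks whose content actually changed
                for offset in range(0, len(buf), CHUNKSIZE):
                    data = bytes(buf[offset:offset+CHUNKSIZE]).rstrip(b'\0')
                    prev = self.chunktab.get(offset)
                    if prev is not None and prev.data == data:
                        continue            # unchanged -> nothing written
                    if data:
                        self.chunktab[offset] = ZChunk(data)
                    elif prev is not None:
                        del self.chunktab[offset]   # all-\0 chunk dropped

            def loadblkdata(self, blksize):
                # reassemble full block, zero-filling trimmed/absent parts
                blk = bytearray(blksize)
                for offset, chunk in self.chunktab.items():
                    blk[offset:offset+len(chunk.data)] = chunk.data
                return bytes(blk)

    With this, a 1-byte change commits one ~4K chunk record plus a few
    small BTree records, instead of a full 2M block record.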
    
    DB size after tests is changed as follows:
    
            bigfile     bigarray
    
    ZBlk0     24K       6200K
    ZBlk1     36K         36K
    
    ( the slight size increase for bigfile tests is due to the overhead of
      the BTree structures )
    
    Time to run tests stays approximately the same.
    
    /cc @Tyagov, @klaus