    interface: Require invalidations to be called with full set of objects and not to skip transactions · c1e08052
    Kirill Smelkov authored
    Currently the invalidate documentation is not clear about whether it
    should be called for every transaction, and whether it should include
    the full set of objects created/modified by that transaction. Until now
    this was working relatively well for the sole purpose of invalidating
    the client ZEO cache, because for that particular task it is relatively
    OK not to include just-created objects in invalidation messages, and
    even to completely skip sending an invalidation if a transaction only
    creates - not modifies - objects. Due to this fact the workings of the
    client cache were indifferent to the ambiguity of the interface.
    
    In 2016 skipping transactions with only created objects was reconsidered
    as a bug and fixed in ZEO5, because ZODB5 relies more heavily on MVCC
    semantics and needs to be notified about every transaction committed to
    storage to be able to properly update the ZODB.Connection view:
    
    https://github.com/zopefoundation/ZEO/commit/02943acd#diff-52fb76aaf08a1643cdb8fdaf69e37802L889-R834
    https://github.com/zopefoundation/ZEO/commit/9613f09b
    
    However, just-created objects were not included in invalidation
    messages until, hopefully, recently:
    
    https://github.com/zopefoundation/ZEO/pull/160
    
    As ZODB starts to be used more widely in areas where it was not
    traditionally used before, the ambiguity in the invalidate interface and
    the lack of guarantees - for any storage - to be notified with the full
    set of information create at least the following problems:
    
    - a ZODB client (not necessarily the native ZODB/py client) can maintain
      a raw cache for the storage. If such a client tries to load an oid at
      a database view when that object did not exist yet, it gets a "no
      object" reply and stores that information in the raw cache. To
      properly invalidate the cache it then needs an invalidation message
      from the ZODB server that *includes* the created object.
    
    - tools like `zodb watch` [1,2,3] don't work properly (give incorrect output)
      if not all objects modified/created by a transaction are included in
      invalidation messages.
    
    - similarly to `zodb watch`, a monitoring tool that wants to be
      notified of all created/modified objects won't see the full
      database-change picture, and so won't work properly without knowing
      which objects were created.
    
    - wendelin.core 2 - which builds data from ZODB BTrees and data objects
      into a virtual filesystem - needs to get invalidation messages with both
      modified and created objects to properly implement its own lazy
      invalidation and isolation protocol for file blocks in the OS cache: when
      a block of a file is accessed, all clients that have this block mmapped
      need to be notified and asked to remmap that block to a particular
      revision of the file, depending on the client's view of the filesystem
      and database [4,5].
    
      To compute where a client needs to remmap the block, the WCFS server
      (which in turn acts as a ZODB client wrt the ZEO/NEO server) needs to be
      able to see whether the client's view of the filesystem is before object
      creation (and then ask that client to pin the block to a hole), or after
      creation (and then ask the client to pin the block to the corresponding
      revision).
    
      This computation needs the ZODB server to send invalidation messages in
      full: with both modified and just-created objects.
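
    As a rough illustration of the first problem above, here is a minimal
    Python sketch of a raw cache that also remembers "no object" replies.
    `RawCache`, its `load`/`invalidate` methods and the dict standing in for
    the server are hypothetical names used only for this example and are not
    part of ZODB or ZEO; for simplicity the cache tracks only the latest
    database view:

```python
class RawCache:
    """Toy client-side cache keyed by oid (hypothetical, not a ZODB class).

    It caches whatever the server answered, including None for
    "no object", so a stale negative entry is possible.
    """

    def __init__(self, load_from_server):
        # load_from_server: assumed callable oid -> bytes, or None if absent
        self._load = load_from_server
        self._entries = {}

    def load(self, oid):
        if oid not in self._entries:
            self._entries[oid] = self._load(oid)
        return self._entries[oid]

    def invalidate(self, tid, oids):
        # Drop every entry named in the invalidation message.
        for oid in oids:
            self._entries.pop(oid, None)


server = {}                        # stands in for the storage server
cache = RawCache(server.get)
OID = b'\x00' * 7 + b'\x01'

before_creation = cache.load(OID)  # server has no such object yet
server[OID] = b'object data'       # a transaction creates the object

cache.invalidate(tid=1, oids=[])   # message that omits the created object
stale = cache.load(OID)            # negative entry was never dropped

cache.invalidate(tid=1, oids=[OID])  # full message: created object included
fresh = cache.load(OID)
```

    With the message that omits the created oid the cache keeps answering
    "no object"; only the full message lets it drop the negative entry and
    reload the data.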
    
    Also:
    
    - the property that all objects - both modified and just created -
      are included in invalidation messages is required for, and can help
      with, removing `next_serial` from the `loadBefore` return value in the
      future. This, in turn, can help to do 2x fewer SQL queries in
      loadBefore for NEO and RelStorage (and maybe other storages too):
      https://github.com/zopefoundation/ZODB/issues/318#issuecomment-657685745
    
    Current state of storages with respect to new requirements:
    
    - ZEO: does not skip transactions, but includes only modified - not
      created - objects. This is fixed by https://github.com/zopefoundation/ZEO/pull/160
    
    - NEO: already implements the requirements in full
    
    - RelStorage: already implements the requirements in full, if I
      understand correctly:
    
      https://github.com/zodb/relstorage/blob/3.1.2-1-gaf57d6c/src/relstorage/adapters/poller.py#L28-L145
    
    While editing the invalidate documentation, use the occasion to document
    the recently added property that invalidate(tid) is always called before
    the storage starts to report its lastTransaction() ≥ tid - see 4a6b0283
    (mvccadapter: check if the last TID changed without invalidation).
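
    The ordering property can be pictured with a small Python sketch. This
    is only a toy model under stated assumptions: `ToyStorage`, its `commit`
    method and the integer tids are made up for illustration, and real
    storages deliver invalidations through their own mechanisms:

```python
class ToyStorage:
    """Toy storage (hypothetical) that notifies before advancing its last tid."""

    def __init__(self, on_invalidate):
        # on_invalidate: assumed callback taking (tid, oids)
        self._on_invalidate = on_invalidate
        self._last_tid = 0

    def lastTransaction(self):
        return self._last_tid

    def commit(self, tid, oids):
        # 1. Deliver the invalidation with the full set of
        #    created/modified oids for this transaction.
        self._on_invalidate(tid, oids)
        # 2. Only then start reporting lastTransaction() >= tid.
        self._last_tid = tid


seen = []  # (tid, lastTransaction() observed inside the callback, oids)


def on_invalidate(tid, oids):
    seen.append((tid, storage.lastTransaction(), sorted(oids)))


storage = ToyStorage(on_invalidate)
storage.commit(1, {b'a'})
storage.commit(2, {b'a', b'b'})
```

    Inside each callback the observed lastTransaction() is still below the
    tid being invalidated, so a client that sees lastTransaction() ≥ tid can
    assume the invalidation for tid has already been delivered.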
    
    /cc @jimfulton, @jamadden, @jmuchemb, @vpelletier, @arnaud-fontaine, @gidzit, @klawlf82, @jwolf083
    /reviewed-on https://github.com/zopefoundation/ZODB/pull/319
    /reviewed-by @dataflake
    /reviewed-by @jmuchemb
    
    [1] https://lab.nexedi.com/kirr/neo/blob/049cb9a0/go/zodb/zodbtools/watch.go
    [2] neo@e0d59f5d
    [3] neo@c41c2907
    
    [4] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/wcfs.go#L94-182
    [5] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/client/wcfs.h#L20-71