1. 30 Nov, 2020 1 commit
    • [ZODB4] Backport the way MVCC is handled from ZODB5 · 8e7eab33
      Kirill Smelkov authored
      This backports to ZODB4's Connection the ZODB5 approach of handling MVCC
      via always calling storage.loadBefore(), instead of the "load latest
      version + loadBefore if we were notified of database changes" approach.
      
      Why?
      ----
      
      Short answer: because Wendelin.core 2 needs to know at which particular
      database state an application-level ZODB connection is viewing the
      database, and it is hard to implement such functionality correctly
      without this backport. Please see the appendix for the explanation.
      
      What
      ----
      
      This backports to ZODB4 the minimum necessary part of upstream commit 227953b9
      (Simplify MVCC by determining transaction start time using lastTransaction) +
      follow-up correctness fixes:
      
      https://github.com/zopefoundation/ZODB/issues/50
      https://github.com/zopefoundation/ZODB/pull/56
      https://github.com/zopefoundation/ZODB/pull/291
      https://github.com/zopefoundation/ZODB/pull/307
      
      In short:
      
      - a Connection is always opened explicitly tied to a particular database revision
      - a Connection uses only loadBefore with that revision to load objects
        (see the sketch below)
      - every time a Connection is (re)opened, the result of queued invalidations
        and of an explicit query to storage.lastTransaction is carefully merged to
        refresh the Connection's idea of which database state it corresponds to.
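
      To make the second point concrete, here is a minimal sketch (not code
      from the patch) of loading an object when the view is pinned to database
      state `at`, using only storage.loadBefore:

          from ZODB.utils import p64, u64

          def load_at(storage, oid, at):
              # a view pinned to state `at` sees the last revision of oid
              # with tid <= at; loadBefore takes an exclusive upper bound
              before = p64(u64(at) + 1)
              r = storage.loadBefore(oid, before)
              if r is None:
                  raise KeyError(oid)     # object did not exist at that state
              data, serial, _next_serial = r
              return data, serial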
      
      The "careful" in last point is important. Historically ZODB5 was first reworked
      in commit 227953b9 (https://github.com/zopefoundation/ZODB/pull/56) to always
      call lastTransaction to refresh state of Connection view. Since there
      was no proper synchronisation with respect to process of handling
      invalidations, that lead to data corruption issue due to race in
      Connection.open() vs invalidations:
      
      https://github.com/zopefoundation/ZODB/issues/290
      
      That race and the resulting data corruption were fixed in commit b5895a5c
      (https://github.com/zopefoundation/ZODB/pull/291) by avoiding the
      lastTransaction call and relying only on the invalidations channel when
      refreshing the Connection view.
      
      This fix in turn led to another data corruption issue, because in the
      presence of client-server reconnections, ZODB storage drivers can partly
      skip notifying the client with detailed invalidation messages:
      
      https://github.com/zopefoundation/ZODB/pull/291#issuecomment-581047061
      
      A fix to that issue (https://github.com/zopefoundation/ZODB/pull/307)
      proposed to switch back to querying the storage for lastTransaction on
      every Connection refresh, but to carefully merge the lastTransaction
      result with the data from the invalidation channel. However, it was found
      that this "careful merging" can work correctly only if we require storage
      drivers to provide a particular ordering of invalidation events with
      respect to the lastTransaction call and its result:
      
      https://github.com/zopefoundation/ZODB/pull/307#discussion_r434145034
      
      While ZEO already complied with that requirement, NEO had to be fixed to
      support it:
      
      https://github.com/zopefoundation/ZODB/pull/307#discussion_r434166238
      nexedi/neoppod@a7d101ec
      nexedi/neoppod@96a5c01f
      
      Patch details
      -------------
      
      We change Connection._txn_time to be a "before" for the database state to
      which the Connection view corresponds. This state is initialized and
      updated in Connection._flush_invalidations - the function that is called
      both from explicit Connection (re)open and, at transaction boundaries,
      via the Connection.afterCompletion hook.
      
      Object loading is factored into Connection._load, which replaces the old
      "load + check invalidated + fall back to loadBefore" game in
      Connection._setstate.
      
      Connection.open now calls Connection._flush_invalidations
      unconditionally - even on a global cache reset event - because, besides
      flushing invalidations, the latter is now responsible for querying the
      storage's lastTransaction.
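
      For illustration only, the refresh performed at (re)open time amounts to
      something like the following sketch. It is heavily simplified, assumes
      the invalidation ordering requirement discussed above, and _flush_queued
      is a hypothetical stand-in for the invalidation processing:

          from ZODB.utils import p64, u64

          def refresh_view(self):
              # under the ordering requirement, every invalidation for
              # transactions <= ltid has been delivered by the time
              # lastTransaction() returns
              ltid = self._storage.lastTransaction()
              self._flush_queued()            # hypothetical helper
              # the view now corresponds to state ltid, i.e. before = ltid+1
              self._txn_time = p64(u64(ltid) + 1)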
      
      TmpStore - a "storage" that provides runtime support for savepoints - is
      refactored correspondingly to delegate loading of original objects back
      to the underlying Connection.
      
      DB.close is modified - similarly to ZODB5 - to carefully release the DB's
      Connections, preventing connections from the DB pool from implicitly
      starting new transactions via the afterCompletion hook.
      
      ZODB.nxd_patches is introduced to indicate to client software that this
      particular patch is present and can be relied upon.
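
      Client software can thus probe for the patch along the following lines
      (a sketch; only the attribute's existence follows from the text above,
      the exact value stored in it is not specified here):

          import ZODB

          # the attribute exists only on a ZODB4 with this patch applied
          mvcc_backport_present = hasattr(ZODB, 'nxd_patches')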
      
      Tests are updated correspondingly. In 227953b9 Jim talks about
      converting many tests - because
      
      	"Lots of tests didn't clean up databases and connections properly"
      
      and because new MVCC approach
      
      	"makes database and connection hygiene a bit more important,
      	especially for tests, because a connection will continue to interact
      	with storages if it isn't properly closed, which can lead to errors if
      	the storage is closed."
      
      but finally implemented automatic cleanup at transaction boundaries,
      because there were too many tests to fix. We backport only the automatic
      cleanup + the necessary explicit test fixes to keep the diff minimal.
      
      All tests pass. This includes tests for ZODB itself, ZEO and NEO tests
      run over the hereby modified ZODB(*), my test programs from

      https://github.com/zopefoundation/ZODB/issues/290 and
      https://github.com/zopefoundation/ZEO/issues/155

      and ERP5 tests. Upcoming wendelin.core 2 also works with this change.
      
      (*) ZEO, NEO and ERP5 tests fail sometimes, but there is no regression
      here because ZEO, NEO and ERP5 tests are failing regularly, and in the
      same way, even with unmodified ZODB.
      
      Appendix. zconn_at
      ------------------
      
      This appendix provides motivation for the backport:
      
      For wendelin.core v2 we need a way to know at which particular database
      state an application-level ZODB connection is viewing the database.
      Knowing that state, the WCFS client library interacts with the WCFS
      filesystem server and, in simple terms, requests the server to provide
      data as of that particular database state. Let us call zconn_at the
      function that, for a client ZODB connection, returns the database state
      corresponding to its database view.
      
      Now here is the problem: zconn_at is relatively easy to implement for
      ZODB5 - see e.g. here:

      https://lab.nexedi.com/nexedi/wendelin.core/blob/v0.13-54-ga6a8f5b/lib/zodb.py#L142-181
      nexedi/wendelin.core@3bd82127

      however, for ZODB4, since its operational model is not directly MVCC, it
      is not that straightforward. Still, even for older ZODB4, for every
      client connection there _is_ such an "at" that corresponds to that
      connection's view of the database.
      
      We need ZODB4 support because ZODB4 is currently the version that Nexedi
      uses, and my understanding is that it will stay like this for quite some
      time. I have the feeling that ZODB5 was reworked in a better direction,
      but without enough care about quality, which resulted in concurrency
      bugs with data corruption effects like
      
      https://github.com/zopefoundation/ZODB/issues/290
      https://github.com/zopefoundation/ZEO/issues/155
      etc.
      
      Even though the first one is now fixed (but it broke other parts, and so
      both ZODB had to be fixed again _and_ NEO had to be fixed for that ZODB
      fix to work correctly), I feel that upgrading Nexedi to ZODB5 will
      require a non-negligible amount of QA work, and thus it is better if we
      move step-by-step - even if we eventually upgrade to ZODB5, it is better
      to first migrate wendelin.core 1 -> wendelin.core 2 while keeping the
      current version of ZODB.
      
      Now please note what would happen if zconn_at gives an even slightly
      wrong answer: the WCFS client will ask the WCFS server to provide array
      data as of a database state different from the one seen by the current
      on-client ZODB connection. The result is that data accessed via
      ZBigArray will _not_ correspond to all the other data accessed via the
      regular ZODB mechanism. In other words, it is a data corruption. In some
      scenarios it can be harmless, but generally it is the worst thing that
      can happen to a database.
      
      It is good to keep ZODB issue 290 in mind when imagining corner cases
      that zconn_at has to deal with. Even though that issue is ZODB5-only, it
      shows what kind of bugs there can be in a zconn_at implementation for
      ZODB4.
      
      Just for reference: in Wendelin-based systems there is usually a constant
      stream of database ingestions coming from many places simultaneously,
      plus many activities crunching on the DB at the same time. And the more
      clients a system handles, the higher the level of concurrency. This
      means that the problem of correctly handling concurrency in zconn_at is
      not purely theoretical, but directly relevant to our systems.
      
      --------
      
      With this backport, zconn_at for ZODB4 becomes trivial and robust to implement:
      
      https://lab.nexedi.com/kirr/wendelin.core/blob/484071b3/lib/zodb.py#L183-195
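
      In essence it boils down to the following sketch (assuming this patched
      ZODB4, where Connection._txn_time is the "before" of the connection's
      view; see the link above for the real implementation):

          from ZODB.utils import p64, u64

          def zconn_at(conn):
              # the view covers transactions with tid < _txn_time, so the
              # view's database state is the transaction just before it
              return p64(u64(conn._txn_time) - 1)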
      
      I would like to thank Joshua Wölfel, whose internship helped this topic
      take shape:
      
      https://www.erp5.com/project_section/wendelin-ia-project/forum/Joshua-internship-D8b7NNhWfz
      
      /cc @nexedi, @jwolf083

      Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
  4. 31 Mar, 2020 1 commit
    • FileStorage: Save committed transaction to disk even if changed data is empty · fdf9e7a2
      Kirill Smelkov authored
      [ This is a ZODB4 backport of commit bb9bf539
        (https://github.com/zopefoundation/ZODB/pull/298) ]
      
      ZODB tries to avoid saving empty transactions to storage on
      `transaction.commit()`. The way it works is: if no objects were changed
      during the ongoing transaction, ZODB.Connection does not join the current
      TransactionManager, and transaction.commit() performs the two-phase
      commit protocol only on joined DataManagers. In other words, if no
      objects were changed, no tpc_*() methods are called at all on
      ZODB.Connection at transaction.commit() time.
      
      This way application servers like Zope/ZServer/ERP5/... can have
      something like
      
          try:
              # process incoming request
              transaction.commit()    # processed ok
          except:
              transaction.abort()
              # problem: log + reraise
              raise
      
      in top-level code to process requests without creating many on-disk
      transactions with empty data changes just because read-only requests
      were served.
      
      Everything is working as intended.
      
      However, at the storage level, FileStorage currently also checks whether
      the transaction being committed comes with empty data changes, and
      _skips_ saving the transaction to disk *at all* in such cases, even if
      it has been explicitly told to commit the transaction via two-phase
      commit protocol calls done at the storage level.
      
      This creates a situation where, contrary to the promise in
      ZODB/interfaces.py(*), after successful tpc_begin/tpc_vote/tpc_finish()
      calls made at the storage level, the transaction is _not_ made
      permanent, despite the tid of the "committed" transaction being returned
      to the caller. In other words FileStorage, when asked to commit a
      transaction, even one with empty data changes, reports "ok" and gives a
      transaction ID to the caller, without creating a corresponding
      transaction record on disk.
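
      The problem can be demonstrated at the storage level along these lines
      (a sketch, not code from this patch; the tid is delivered via the
      tpc_finish callback):

          import transaction
          from ZODB.FileStorage import FileStorage

          stor = FileStorage('data.fs')
          txn = transaction.Transaction()
          txn.note(u'only metadata, no data changes')

          committed = []
          stor.tpc_begin(txn)
          stor.tpc_vote(txn)
          stor.tpc_finish(txn, lambda tid: committed.append(tid))
          # committed[0] holds the tid reported to the caller, but before
          # this fix no transaction record with that tid appears on disk -
          # e.g. iterating the storage does not find it.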
      
      This behaviour

      a) is redundant to the application-level avoidance of creating empty
         transactions on storage described in the beginning, and

      b) creates problems:

      The first problem is that an application that works at the storage level
      might be interested in persisting a transaction, even with empty data
      changes, just because it wants to save the metadata, similarly to e.g.
      `git commit --allow-empty`.
      
      The other problem is that the application's view and the data in the
      database become inconsistent: the application is told that a transaction
      was created with a corresponding transaction ID, but if the storage is
      actually inspected, e.g. by iteration, the transaction is not there.
      This, in particular, can create problems if the TID of the committed
      transaction is reported elsewhere and a second database client does not
      find the transaction it was told should exist.
      
      I hit this particular problem with wendelin.core. In wendelin.core,
      there is a custom virtual memory layer that keeps memory in sync with
      data in ZODB. At commit time, the memory is inspected for being dirtied,
      and if a page was changed, the virtual memory layer joins the current
      transaction _and_ forces the corresponding ZODB.Connection - via which
      it will be saving data into ZODB objects - to join the transaction too,
      because it would be too late to join ZODB.Connection after the 2PC
      process has begun(+). One of the formats in which data is saved tries to
      optimize disk space usage, and it can actually happen that, even if the
      data in RAM was dirtied, the data itself stayed the same and so nothing
      should be saved into ZODB. However, by that point ZODB.Connection has
      already joined the transaction, and it is hard to avoid joining it in
      advance, because joining a DataManager when the 2PC is already ongoing
      does not work.
      
      This used to work OK with wendelin.core 1, but with wendelin.core 2 -
      where a separate virtual filesystem is also connected to the database to
      provide the base layer for array mappings - this creates a problem:
      when wcfs (the filesystem) is told to synchronize to view the database
      @tid of the committed transaction, it can wait forever for that, or a
      later, transaction to appear on disk in the database, creating an
      application-level deadlock.
      
      I agree that some more effort might be made on the wendelin.core side to
      avoid committing transactions with empty data at the storage level.
      
      However, the cleanest way to fix this problem, in my view, is to fix
      FileStorage itself: if at the storage level it was asked to commit
      something, it should not silently skip doing so, dropping even non-empty
      metadata, while returning "ok" and a committed transaction ID to the
      caller.
      
      As described in the beginning, this should not create problems for
      application-level ZODB users, while at the storage level the
      implementation now consistently matches the interface and common sense.
      
      ----
      
      (*) tpc_finish: Finish the transaction, making any transaction changes permanent.
          Changes must be made permanent at this point.
          ...
      
          https://github.com/zopefoundation/ZODB/blob/5.5.1-35-gb5895a5c2/src/ZODB/interfaces.py#L828-L831
      
      (+) https://lab.nexedi.com/kirr/wendelin.core/blob/9ff5ed32/bigfile/file_zodb.py#L788-822
  7. 03 Apr, 2017 1 commit
    • FileStorage: Report problem on read-only open of non-existent file · 56c96a11
      Kirill Smelkov authored
      ... instead of silently creating empty database on such opens.
      
      The use-case for this is utilities like e.g. zodbdump and zodbcmp, which
      expect such storage opens to fail, so that the tool can know there is no
      such storage and report it to the user.

      In contrast, the current state is: a read-only open gets a
      created-on-the-fly empty storage with no content, which can nevertheless
      be iterated over without getting any error.

      This way e.g. `zodbdump non-existent.fs` produces empty output _and_
      exit code 0, which is not what the caller expects.
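
      After this change such an open fails; in a sketch (the exact exception
      type raised is an assumption here):

          from ZODB.FileStorage import FileStorage

          try:
              stor = FileStorage('non-existent.fs', read_only=True)
          except Exception as e:
              # the read-only open of a missing file now reports the problem
              # instead of silently creating an empty storage
              print('open failed: %s' % e)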
      
      (cherry picked from commit 30bbabf1)
  20. 12 Jul, 2016 8 commits
    • fstail: print the txn offset and header size, instead of only the data offset · 3807ace8
      Julien Muchembled authored
      Before:
      
          2016-07-01 09:41:50.416574: hash=d7101c5ee7b8e412d7b6d54873204421e09b7f34
          user='' description='' length=1629 offset=58990284
      
      After:
      
          2016-07-01 09:41:50.416574: hash=d7101c5ee7b8e412d7b6d54873204421e09b7f34
          user='' description='' length=1629 offset=58990261 (+23)
      
      (The previously printed offset is the data offset: transaction offset +
      header size, i.e. 58990261 + 23 = 58990284 above.) The structure of a
      FileStorage DB is such that it is easy to revert the last transactions
      by truncating the file at the right offset. With the above change,
      `fstail` can now be used to get this offset.
      
      In the above example:
      
          truncate -s 58990261 Data.fs
      
      would delete the transaction and all those after.
    • Merge pull request #89 from zopefoundation/undo-refactor · 75bae1a6
      Jim Fulton authored
      Refactored FileStorage transactional undo
    • removed out of date comment · b563487e
      Jim Fulton authored
    • Merge pull request #86 from NextThought/handle-serials4 · e080bdcc
      Jim Fulton authored
      Fix handle_all_serials for the new and old protocols.
    • Update comment. [skip ci] · c13649da
      Jason Madden authored
    • changes · d64a0cbf
      Jim Fulton authored
    • Refactored FileStorage transactional undo · d717a685
      Jim Fulton authored
      As part of a project to provide object-level commit locks for ZEO, I'm
      refactoring FileStorage to maintain transaction-specific data in
      Transaction.data.  This involved undo.  In trying to figure this out, I
      found:

      - a bug in _undoDataInfo, which I verified with some tests, and

      - that _transactionalUndoRecord was maddeningly difficult to reason
        about (and thus change).

      I was concerned less by the bug than by my inability to know whether a
      change to the code would be correct.

      So I refactored the code, mainly transactionalUndoRecord, to make it
      easier to understand, fixing some logic errors (I'm pretty sure) along
      the way.  This included lots of comments. (Comments are much easier to
      compose when you're working out logic you didn't understand.)

      In addition to making the code cleaner, this allows undo to be handled
      in cases that weren't handled before.
    • Long lines. Grrrr. · a6c1713d
      Jim Fulton authored