  1. 10 Jul, 2019 1 commit
  2. 11 Jan, 2019 1 commit
    • bigfile/py: Properly untrack PyVMA from GC before dealloc · d97641d2
      Kirill Smelkov authored
      On a testing instance we started to see segfaults in pyvma_dealloc(),
      inside its call to vma_unmap(), but with pyvma->fileh being NULL. That was
      strange, because before calling vma_unmap() the code explicitly checks
      that pyvma->fileh is not NULL.
      
      That was, as it turned out, due to pyvma_dealloc being called twice at the
      same time from two python threads. Here is how that was possible:
      
      T1 decrefs pyvma and finds its reference count drops to zero. It calls
      pyvma_dealloc. From there vma_unmap() is called, which calls virt_lock()
      and that releases GIL first. Another thread T2 was waiting for GIL, it
      acquires it, does some work at python level and somehow triggers GC.
      Since PyVMA supports cyclic GC, it was on GC list and thus GC calls
      dealloc for the same vma again. Here is how it looks in the backtraces:
      
      T1:
      
      	#0  0x00007f6aefc57827 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x1e011d0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
      	#1  do_futex_wait (sem=sem@entry=0x1e011d0, abstime=0x0) at sem_waitcommon.c:111
      	#2  0x00007f6aefc578d4 in __new_sem_wait_slow (sem=0x1e011d0, abstime=0x0) at sem_waitcommon.c:181
      	#3  0x00007f6aefc5797a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
      	#4  0x00000000004ffbc4 in PyThread_acquire_lock ()
      	#5  0x00000000004dbe8a in PyEval_RestoreThread ()
      	#6  0x00007f6ac6d3b8fc in py_gil_retake_if_waslocked (arg=0x4f18f00) at bigfile/_bigfile.c:1048
      	#7  0x00007f6ac6d3dcfc in virt_gil_retake_if_waslocked (gilstate=0x4f18f00) at bigfile/virtmem.c:78
      	#8  0x00007f6ac6d3dd30 in virt_lock () at bigfile/virtmem.c:92
      	#9  0x00007f6ac6d3e724 in vma_unmap (vma=0x7f6a7e0c4100) at bigfile/virtmem.c:271
      	#10 0x00007f6ac6d3a0bc in pyvma_dealloc (pyvma0=0x7f6a7e0c40e0) at bigfile/_bigfile.c:284
      	...
      	#13 0x00000000004d76b0 in PyEval_EvalFrameEx ()
      
      T2:
      
      	#5  0x00007f6ac6d3a081 in pyvma_dealloc (pyvma0=0x7f6a7e0c40e0) at bigfile/_bigfile.c:276
      	#6  0x0000000000500450 in ?? ()
      	#7  0x00000000004ffd82 in _PyObject_GC_New ()
      	#8  0x0000000000485392 in PyList_New ()
      	#9  0x00000000004d3bff in PyEval_EvalFrameEx ()
      
      T2 does the work of vma_unmap and clears C-level vma. Then, when T1 wakes up and
      returns to vma_unmap, it sees vma->file and all other fields cleared -> oops
      segfault.
      
      Fix it by removing pyvma from GC list before going to do actual destruction.
      This way if a concurrent GC triggers, it won't see the vma object on its list,
      and thus won't have a chance to invoke its destructor the second time.
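The effect of untracking first can be illustrated with a toy model in Python. This is only a sketch: `ToyGC`, `ToyVMA`, `on_gil_release` and `count_dealloc_calls` are hypothetical names made up for the illustration, and the model is single-threaded; it does not reproduce CPython's collector or the real pyvma_dealloc. The GIL release inside virt_lock() is modeled as a callback during which "another thread" runs the collector.

```python
class ToyGC:
    """Toy stand-in for the GC list (hypothetical, not CPython's collector)."""

    def __init__(self):
        self.tracked = []

    def track(self, obj):
        self.tracked.append(obj)

    def untrack(self, obj):
        if obj in self.tracked:
            self.tracked.remove(obj)

    def collect(self):
        # Pretend every tracked object is garbage and destroy it.
        for obj in list(self.tracked):
            obj.dealloc()


class ToyVMA:
    """Toy stand-in for PyVMA; fileh plays the role of pyvma->fileh."""

    def __init__(self, gc, untrack_first):
        self.gc = gc
        self.untrack_first = untrack_first
        self.fileh = "fileh"
        self.dealloc_calls = 0
        gc.track(self)

    def dealloc(self, on_gil_release=lambda: None):
        self.dealloc_calls += 1
        if self.untrack_first:
            # The fix: leave the GC list before doing any real destruction.
            self.gc.untrack(self)
        if self.fileh is not None:
            # Models vma_unmap() -> virt_lock(): the GIL is dropped here and
            # another thread may trigger a collection.
            on_gil_release()
            self.fileh = None


def count_dealloc_calls(untrack_first):
    gc = ToyGC()
    vma = ToyVMA(gc, untrack_first)
    # "T1" starts deallocation; "T2" runs the collector while the GIL is released.
    vma.dealloc(on_gil_release=gc.collect)
    return vma.dealloc_calls
```

In this model `count_dealloc_calls(False)` reports two destructor invocations for the same object, while `count_dealloc_calls(True)` reports one, because the collector no longer finds the object on its list.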
      
      The bug was introduced in 450ad804 (bigarray: ArrayRef support for BigArray)
      when PyVMA was changed to be cyclic-GC aware. However at that time, even Python
      documentation itself was not saying PyObject_GC_UnTrack is needed, as it was
      added only in 2.7.15 after finding that many types in CPython itself are
      vulnerable to similar segfaults:
      
      https://github.com/python/cpython/commit/4cde4bdcc86
      https://bugs.python.org/issue31095
      
      It is a pity that CPython took the approach of forcing all type authors
      to take care to invoke PyObject_GC_UnTrack explicitly, instead of doing
      that automatically in the Python runtime before calling tp_dealloc.
      
      /cc @Tyagov, @klaus
      /reviewed-on !11
  3. 24 Oct, 2017 1 commit
    • Relicense to GPLv3+ with wide exception for all Free Software / Open Source... · f11386a4
      Kirill Smelkov authored
      Relicense to GPLv3+ with wide exception for all Free Software / Open Source projects + Business options.
      
      Nexedi stack is licensed under Free Software licenses with various exceptions
      that cover three business cases:
      
      - Free Software
      - Proprietary Software
      - Rebranding
      
      As long as one intends to develop Free Software based on Nexedi stack, no
      license cost is involved. Developing proprietary software based on Nexedi stack
      may require a proprietary exception license. Rebranding Nexedi stack is
      prohibited unless rebranding license is acquired.
      
      Through this licensing approach, Nexedi expects to encourage Free Software
      development without restrictions and at the same time create a framework for
      proprietary software to contribute to the long term sustainability of the
      Nexedi stack.
      
      Please see https://www.nexedi.com/licensing for details, rationale and options.
  4. 21 Aug, 2017 1 commit
    • bigfile/py: Don't forget to clear exception state after retrieving pybuf referrers · 4228d8b6
      Kirill Smelkov authored
      A buffer object (pybuf) is passed by C-level loadblk to python loadblk
      implementation. Since pybuf points to memory that will go away after
      loadblk call returns to virtmem, PyBigFile tries hard to make sure
      nothing stays referencing pybuf so it can be released.
      
      It tries to:
      
      1. automatically GC cycles referencing pybuf (9aa6a5d7 "bigfile/py: Teach
         loadblk() to automatically break reference cycles to pybuf")
      2. replace pybuf with stub object if a calling frame referencing it still
         stays alive (61b18a40 "bigfile/py/loadblk: Replace pybuf with a stub
         object in calling frame in case it stays alive")
      3. and as a last resort unpin pybuf from the original buffer memory to
         point it to NULL (024c246c "bigfile/py/loadblk: Resort to pybuf
         unpinning, if nothing helps")
      
      Step #1 invokes GC. Step #2 calls gc.get_referrers(pybuf) and looks for
      frames in there.
      
      The gc.get_referrers() call happens at python level and allocates some
      objects, e.g. a tuple to pass arguments, the resulting list etc. And we
      all know that any object allocation might cause automatic garbage
      collection, and GC'ing can in turn run arbitrary code due to __del__ in
      released objects and weakref callbacks.
      
      At first glance the scenario that GC will be triggered at step #2
      looks unrealistic because the GC was just run at step #1 and it is only
      a few objects being allocated for the call at step #2. However if
      arbitrary code runs from under GC it can create new garbage and thus
      upon returning from gc.collect() the garbage list is not empty as the
      following program demonstrates:
      
          ---- 8< ----
          import gc
      
          # just an object we can set attributes on
          class X:
              pass
      
          # call f on __del__
          class DelCall:
              def __init__(self, f):
                  self.f = f
      
              def __del__(self):
                  self.f()
      
          # _mkgarbage creates n objects of garbage kept referenced from an object cycle
          # so that only cyclic GC can free them.
          def _mkgarbage(n):
              # cycle
              a, b = X(), X()
              a.b, b.a = b, a
      
              # cycle references [n] garbage
              a.objv = [X() for _ in range(n)]
              return a
      
          # mkgarbage creates cycled garbage and arranges for twice more garbage to be
          # created when original garbage is collected
          def mkgarbage(n):
              a = _mkgarbage(n)
              a.ondel = DelCall(lambda : _mkgarbage(2*n))
      
          def main():
              for i in xrange(10):
                  mkgarbage(1000)
                  print '> %s' % (gc.get_count(),)
                  n = gc.collect()
                  print '< %s' % (gc.get_count(),)
      
          main()
          ---- 8< ----
      
          kirr@deco:~/tmp/trashme/t$ ./gcmoregarbage.py
          > (482, 11, 0)
          < (1581, 0, 0)
          > (531, 3, 0)
          < (2070, 0, 0)
          > (531, 3, 0)
          < (2070, 0, 0)
          > (531, 3, 0)
          < (2070, 0, 0)
          > (531, 3, 0)
          < (2070, 0, 0)
          > (531, 3, 0)
          < (2070, 0, 0)
          > (531, 3, 0)
          < (2070, 0, 0)
          > (531, 3, 0)
          < (2070, 0, 0)
          > (531, 3, 0)
          < (2070, 0, 0)
          > (531, 3, 0)
          < (2070, 0, 0)
      
      Here lines starting with "<" show the amount of live garbage objects after
      the gc.collect() call has finished.
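The program above is Python 2 code. A Python 3 port can be sketched as follows; instead of printing gc.get_count() it returns what two consecutive gc.collect() calls report as collected. The helper name `collect_twice` is made up for this sketch, and automatic GC is disabled around the measurement only to keep the numbers independent of allocation thresholds.

```python
import gc


class X:
    """Just an object we can set attributes on."""


class DelCall:
    """Call f when the instance is finalized."""

    def __init__(self, f):
        self.f = f

    def __del__(self):
        self.f()


def _mkgarbage(n):
    # Create n objects kept referenced from an object cycle,
    # so that only cyclic GC can free them.
    a, b = X(), X()
    a.b, b.a = b, a
    a.objv = [X() for _ in range(n)]
    return a


def mkgarbage(n):
    # Create cycled garbage and arrange for twice more garbage
    # to be created when the original garbage is collected.
    a = _mkgarbage(n)
    a.ondel = DelCall(lambda: _mkgarbage(2 * n))


def collect_twice(n=1000):
    # Hypothetical helper: report how much each of two collections finds.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        gc.collect()            # start from a clean state
        mkgarbage(n)
        first = gc.collect()    # frees the original garbage, runs __del__
        second = gc.collect()   # finds the garbage created from under GC #1
        return first, second
    finally:
        if was_enabled:
            gc.enable()
```

If the second number is non-zero, the first gc.collect() returned with fresh garbage already pending, which is the situation described above.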
      
      This way on a busy server there could be arrangements when GC is
      explicitly run at step #1 and then automatically run at step #2 (because of
      the gc.get_referrers() python-level call), and from under GC #2 arbitrary
      code runs, thus potentially mutating exception state, which shows in logs as
      
          bigfile/_bigfile.c:685: pybigfile_loadblk: Assertion `!(ts->exc_type || ts->exc_value || ts->exc_traceback)' failed.
      
      ----
      
      So don't assume we end with clean exception state after collecting pybuf
      referrers and just clear exception state once again as we do after explicit GC.
      Don't make a similar assumption for buffer unpinning as an object is
      decrefed there and in theory this can run some code.
      
      A test is added to automatically exercise exception state clearing for
      the get_referrers code path via an approach similar to the one
      demonstrated above: we generate more garbage from under garbage and also
      arrange for finalizers, which mutate exception state, to be run at GC #2.
      
      The test without the fix applied fails like this:
      
          bigfile/_bigfile.c:710 pybigfile_loadblk WARN: python thread-state found with handled but not cleared exception state
          bigfile/_bigfile.c:711 pybigfile_loadblk WARN: I will dump it and then crash
          ts->exc_type:   None
          ts->exc_value:  <nil>
          ts->exc_traceback:      <nil>
          Segmentation fault (core dumped)
      
      The None in ts->exc_type and nil value and traceback are probably coming from
      here in cpython runtime:
      
          https://github.com/python/cpython/blob/883520a8/Python/ceval.c#L3717
      
      Since this took some time to find, more diagnostics are also added before
      the BUG_ONs corresponding to finding unclean exception state.
  5. 06 Jul, 2017 1 commit
    • bigfile/virtmem: Don't forget to release fileh->writeout_inprogress on storeblk error · 87bf4908
      Kirill Smelkov authored
      Commit fb4bfb32 (bigfile/virtmem: Do storeblk() with virtmem lock
      released) added bug-protection to fileh_dirty_writeout() so that it could
      not be called twice at the same time or in parallel with other functions
      which modify pages.
      
      However it missed the code path when storeblk() call returned with error
      and whole writeout was thus erroring out, but with fileh->writeout_inprogress
      still left set to 1 incorrectly.
      
      This was leading to things like
      
          bigfile/virtmem.c:419: fileh_dirty_discard: Assertion `!(fileh->writeout_inprogress)' failed.
      
      and crashes.
      
      Fix it.
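The pattern behind the fix can be sketched in Python as a toy model. `ToyFileH` and its methods are hypothetical names for this illustration only; the real code is C in bigfile/virtmem.c and is not shown here. The point is that the in-progress flag must be reset on every exit path, including the one where storeblk fails.

```python
class ToyFileH:
    """Toy model of a file handle with a writeout-in-progress guard."""

    def __init__(self, dirty_blocks):
        self.dirty_blocks = list(dirty_blocks)
        self.writeout_inprogress = False

    def dirty_writeout(self, storeblk):
        # Bug protection: writeout must not run twice at the same time.
        assert not self.writeout_inprogress, "writeout already in progress"
        self.writeout_inprogress = True
        try:
            for blk in self.dirty_blocks:
                storeblk(blk)           # may raise
            self.dirty_blocks.clear()
        finally:
            # Reset the flag on success and on error alike.
            self.writeout_inprogress = False

    def dirty_discard(self):
        # Must not run in parallel with writeout.
        assert not self.writeout_inprogress, "writeout in progress"
        self.dirty_blocks.clear()
```

With the `finally` clause removed, a failing `storeblk` would leave the flag set and a later `dirty_discard()` would trip its assertion, similar to the failure quoted above.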
  6. 16 Jan, 2017 3 commits
    • bigfile/py/loadblk: Resort to pybuf unpinning, if nothing helps · 024c246c
      Kirill Smelkov authored
      There are situations possible when both exc_traceback and frame objects are
      garbage-collected, but frame's f_locals remains not collected because e.g. it
      was explicitly added to somewhere. We cannot detect such cases (dicts are not
      listed in referrers).
      
      So if nothing helped, as a last resort, unpin pybuf from its original
      memory and make it point to zero-sized NULL.
      
      In general this is not strictly correct to do, as other buffer &
      memoryview objects created from pybuf copy its pointer on
      initialization, and thus pybuf unpinning won't adjust them.
      
      However we require BigFile implementations to make sure not to use
      such-created objects, if any, after return from loadblk().
      
      Finally fixes #7
    • bigfile/py/loadblk: Replace pybuf with a stub object in calling frame in case it stays alive · 61b18a40
      Kirill Smelkov authored
      It turns out some code wants to store tracebacks, e.g. for further
      logging. In that case GC won't help to free up references to pybuf.
      However if pybuf remains referenced only from calling frames, we can
      change their reference to pybuf to a stub object "<pybuf>" and this way
      remove the reference.
      
      With added test but without loadblk changes the failure would be as:
      
          pybigfile_loadblk WARN: pybuf->ob_refcnt != 1 even after GC:
          pybuf (ob_refcnt=2):    <read-write buffer ptr 0x7fae4911f000, size 2097152 at 0x7fae4998cef0>
          pybuf referrers:        [<frame object at 0x556daff41aa0>]		<-- NOTE
          bigfile/_bigfile.c:613 pybigfile_loadblk        BUG!
    • bigfile/py/test_basic: Rework exception testing codepath so it is active on py3 also · f01b27d2
      Kirill Smelkov authored
      As the comment being removed states, "on python3 exception state is
      cleared upon exiting from `except`", so let's move exc_* fetching under
      the except clause. This way we get correct exception objects on both py2
      and py3.
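The Python 3 behaviour referred to here can be checked with a minimal sketch. The function name is made up for the illustration, and it assumes it is not itself called from inside another `except` block.

```python
import sys


def exc_type_inside_and_after():
    try:
        raise ValueError("boom")
    except ValueError:
        # Inside the except clause the exception being handled is visible.
        inside = sys.exc_info()[0]
    # After leaving the except clause Python 3 has cleared the state.
    after = sys.exc_info()[0]
    return inside, after
```

On Python 3 the first element is `ValueError` and the second is `None`, which is why the exc_* values have to be fetched while still under the except clause.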
  7. 11 Jan, 2017 1 commit
    • bigfile/py: Teach loadblk() to automatically break reference cycles to pybuf · 9aa6a5d7
      Kirill Smelkov authored
      Because otherwise we bug on pybuf->ob_refcnt != 1.
      
      Such cycles might happen if inside the loadblk implementation an
      exception is internally raised and then caught, even in a deeply internal
      function which receives pybuf neither as an argument nor in any other way:
      
      After
      
      	_, _, exc_traceback = sys.exc_info()
      
      there is a reference loop created:
      
      	exc_traceback
      	  |        ^
      	  |        |
      	  v     .f_localsplus
      	 frame
      
      and since exc_traceback object holds reference to deepest frame, which via f_back
      will be holding reference to frames up to frame with pybuf argument, it
      will result in additional reference to pybuf being held until the above
      cycle is garbage collected.
      
      So to solve the problem while leaving loadblk, if pybuf->ob_refcnt != 1,
      let's first do garbage-collection, and only then recheck left
      references. After GC reference-loops created by exceptions should go
      away.
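The reference loop described above can be reproduced at Python level. The following is a sketch for CPython 3, not the code from the test suite: `Buf`, `_inner`, `loadblk` and `buf_alive_before_and_after_gc` are stand-in names, with `Buf` playing the role of pybuf (a plain class is used so that a weak reference can observe its lifetime).

```python
import gc
import sys
import weakref


class Buf:
    """Stand-in for pybuf; a plain class so it can be weakly referenced."""


def _inner():
    # Deeply internal function: it never sees the buffer.
    try:
        raise RuntimeError("internal error")
    except RuntimeError:
        # frame -> exc_traceback -> frame: a reference loop
        _, _, exc_traceback = sys.exc_info()


def loadblk(buf):
    _inner()


def buf_alive_before_and_after_gc():
    was_enabled = gc.isenabled()
    gc.disable()                    # keep automatic GC out of the picture
    try:
        buf = Buf()
        ref = weakref.ref(buf)
        loadblk(buf)
        del buf                     # drop our own reference
        before = ref() is not None  # still held via the frame chain?
        gc.collect()                # break the traceback <-> frame loop
        after = ref() is not None
        return before, after
    finally:
        if was_enabled:
            gc.enable()
```

On CPython the buffer stand-in stays alive after `loadblk` returns, held through `_inner`'s frame and its f_back chain, and goes away only once the cycle is garbage collected.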
      
      NOTE PyGC_Collect() (C way to call gc.collect()) always performs
          GC - it is not affected by gc.disable() which disables only
          _automatic_ garbage collection.
      
      NOTE it turned out that the storeblk logic to unpin pybuf (see
          6da5172e "bigfile/py: Teach storeblk() how to correctly propagate
          traceback on error") is flawed, because when e.g. creating memoryview
          from pybuf internal pointer is copied and then clearing original buf
          does not result in clearing the copy.
      
      NOTE it is ok to do gc.collect() from under sighandler - at least we are
          already doing it for a long time via running non-trivial python code
          which for sure triggers automatic GC from time to time (see also
          786d418d "bigfile: Simple test that we can handle GC from-under
          sighandler" for the reference)
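The first NOTE above has a Python-level analogue: gc.disable() turns off only automatic collection, while an explicit gc.collect() still runs. A minimal sketch, with `Node` and `collect_while_disabled` being names made up for the illustration:

```python
import gc
import weakref


class Node:
    """Plain object used to build a reference cycle."""


def collect_while_disabled():
    was_enabled = gc.isenabled()
    gc.disable()                        # only automatic GC is switched off
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a         # reference cycle
        ref = weakref.ref(a)
        del a, b
        alive_before = ref() is not None
        gc.collect()                    # explicit collection still works
        freed_after = ref() is None
        return gc.isenabled(), alive_before, freed_after
    finally:
        if was_enabled:
            gc.enable()
```

The function reports that automatic GC was disabled the whole time, that the cycle survived until the explicit call, and that it was freed by it.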
      
      Fixes: #7
  8. 06 Aug, 2015 1 commit
  9. 03 Apr, 2015 1 commit