- 10 Jul, 2019 1 commit
-
-
Kirill Smelkov authored
-
- 11 Jan, 2019 1 commit
-
-
Kirill Smelkov authored
On a testing instance we started to see segfaults in pyvma_dealloc(), inside the call to vma_unmap(), with pyvma->fileh being NULL. That was strange, because before calling vma_unmap() the code explicitly checks that pyvma->fileh is not NULL.

As it turned out, the cause was pyvma_dealloc being called twice at the same time from two Python threads. Here is how that was possible:

T1 decrefs pyvma and finds that its reference count drops to zero. It calls pyvma_dealloc. From there vma_unmap() is called, which calls virt_lock(), and that releases the GIL first. Another thread, T2, was waiting for the GIL; it acquires it, does some work at Python level and somehow triggers GC. Since PyVMA supports cyclic GC, it was on the GC list, and thus GC calls dealloc for the same vma again.

Here is how it looks in the backtraces:

T1:

    #0  0x00007f6aefc57827 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x1e011d0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
    #1  do_futex_wait (sem=sem@entry=0x1e011d0, abstime=0x0) at sem_waitcommon.c:111
    #2  0x00007f6aefc578d4 in __new_sem_wait_slow (sem=0x1e011d0, abstime=0x0) at sem_waitcommon.c:181
    #3  0x00007f6aefc5797a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
    #4  0x00000000004ffbc4 in PyThread_acquire_lock ()
    #5  0x00000000004dbe8a in PyEval_RestoreThread ()
    #6  0x00007f6ac6d3b8fc in py_gil_retake_if_waslocked (arg=0x4f18f00) at bigfile/_bigfile.c:1048
    #7  0x00007f6ac6d3dcfc in virt_gil_retake_if_waslocked (gilstate=0x4f18f00) at bigfile/virtmem.c:78
    #8  0x00007f6ac6d3dd30 in virt_lock () at bigfile/virtmem.c:92
    #9  0x00007f6ac6d3e724 in vma_unmap (vma=0x7f6a7e0c4100) at bigfile/virtmem.c:271
    #10 0x00007f6ac6d3a0bc in pyvma_dealloc (pyvma0=0x7f6a7e0c40e0) at bigfile/_bigfile.c:284
    ...
    #13 0x00000000004d76b0 in PyEval_EvalFrameEx ()

T2:

    #5  0x00007f6ac6d3a081 in pyvma_dealloc (pyvma0=0x7f6a7e0c40e0) at bigfile/_bigfile.c:276
    #6  0x0000000000500450 in ?? ()
    #7  0x00000000004ffd82 in _PyObject_GC_New ()
    #8  0x0000000000485392 in PyList_New ()
    #9  0x00000000004d3bff in PyEval_EvalFrameEx ()

T2 does the work of vma_unmap and clears the C-level vma. Then, when T1 wakes up and returns to vma_unmap, it sees vma->file and all other fields cleared -> oops, segfault.

Fix it by removing pyvma from the GC list before going on to do the actual destruction. This way, if a concurrent GC triggers, it won't see the vma object on its list, and thus won't have a chance to invoke its destructor a second time.

The bug was introduced in 450ad804 (bigarray: ArrayRef support for BigArray) when PyVMA was changed to be cyclic-GC aware. However at that time even the Python documentation itself was not saying that PyObject_GC_UnTrack is needed, as that was added only in 2.7.15, after finding that many types in CPython itself are vulnerable to similar segfaults:

    https://github.com/python/cpython/commit/4cde4bdcc86
    https://bugs.python.org/issue31095

It is a pity that CPython took the approach of forcing all type authors to invoke PyObject_GC_UnTrack explicitly, instead of doing that automatically in the Python runtime before calling tp_dealloc.

/cc @Tyagov, @klaus
/reviewed-on !11
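The race above can be illustrated with a pure-Python toy model. This is only a sketch under loud assumptions: `ToyCollector`, `ToyVMA` and the `at_gil_release` hook are hypothetical names invented for illustration, the "collector" simply destroys everything still on its list, and the second thread is simulated by a callback invoked at the point where the real code would release the GIL. It is not the actual C fix, which calls PyObject_GC_UnTrack at the start of the destructor.

```python
class ToyCollector:
    """Toy stand-in for the cyclic GC: it only keeps a list of tracked objects."""

    def __init__(self):
        self.tracked = []

    def collect(self):
        # Simplification: every object still on the list gets destroyed.
        for obj in list(self.tracked):
            obj.dealloc(self)


class ToyVMA:
    """Hypothetical model of an object whose destructor can be re-entered."""

    def __init__(self, collector, untrack_first):
        self.fileh = "fileh"
        self.untrack_first = untrack_first
        self.dealloc_calls = 0
        self.saw_cleared_fileh = False
        collector.tracked.append(self)

    def dealloc(self, collector, at_gil_release=None):
        self.dealloc_calls += 1
        if self.untrack_first and self in collector.tracked:
            collector.tracked.remove(self)   # analogue of PyObject_GC_UnTrack
        if self.fileh is None:
            return
        if at_gil_release is not None:
            at_gil_release()                 # "another thread" runs here
        if self.fileh is None:
            self.saw_cleared_fileh = True    # the real code segfaults at this point
            return
        self.fileh = None                    # the actual unmap work
        if self in collector.tracked:
            collector.tracked.remove(self)


def run(untrack_first):
    collector = ToyCollector()
    vma = ToyVMA(collector, untrack_first)
    # T1 starts destruction; the collector fires while T1 waits for the lock.
    vma.dealloc(collector, at_gil_release=collector.collect)
    return vma.dealloc_calls, vma.saw_cleared_fileh


buggy = run(untrack_first=False)   # destructor entered twice
fixed = run(untrack_first=True)    # destructor entered once
```

In the model, untracking first means the nested collection finds nothing to destroy, so the outer destructor still sees its own fields intact.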
-
- 24 Oct, 2017 1 commit
-
-
Kirill Smelkov authored
Relicense to GPLv3+ with wide exception for all Free Software / Open Source projects + Business options. Nexedi stack is licensed under Free Software licenses with various exceptions that cover three business cases: - Free Software - Proprietary Software - Rebranding As long as one intends to develop Free Software based on Nexedi stack, no license cost is involved. Developing proprietary software based on Nexedi stack may require a proprietary exception license. Rebranding Nexedi stack is prohibited unless rebranding license is acquired. Through this licensing approach, Nexedi expects to encourage Free Software development without restrictions and at the same time create a framework for proprietary software to contribute to the long term sustainability of the Nexedi stack. Please see https://www.nexedi.com/licensing for details, rationale and options.
-
- 21 Aug, 2017 1 commit
-
-
Kirill Smelkov authored
A buffer object (pybuf) is passed by C-level loadblk to the python loadblk implementation. Since pybuf points to memory that will go away after the loadblk call returns to virtmem, PyBigFile tries hard to make sure nothing stays referencing pybuf so it can be released. It tries to:

1. automatically GC cycles referencing pybuf (9aa6a5d7 "bigfile/py: Teach loadblk() to automatically break reference cycles to pybuf")
2. replace pybuf with a stub object if a calling frame referencing it still stays alive (61b18a40 "bigfile/py/loadblk: Replace pybuf with a stub object in calling frame in case it stays alive")
3. and as a last resort unpin pybuf from the original buffer memory to point it to NULL (024c246c "bigfile/py/loadblk: Resort to pybuf unpinning, if nothing helps")

Step #1 invokes GC. Step #2 calls gc.get_referrers(pybuf) and looks for frames in there. The gc.get_referrers() call happens at python level and allocates some objects, e.g. a tuple to pass arguments, the resulting list etc. And we all know that any object allocation might cause automatic garbage collection, and GC'ing can in turn run arbitrary code due to __del__ in released objects and weakref callbacks.

At first glance the scenario that GC will be triggered at step #2 looks unrealistic, because the GC was just run at step #1 and only a few objects are allocated for the call at step #2. However, if arbitrary code runs from under GC, it can create new garbage, and thus upon returning from gc.collect() the garbage list is not empty, as the following program demonstrates:

---- 8< ----
import gc

# just an object we can set attributes on
class X:
    pass

# call f on __del__
class DelCall:
    def __init__(self, f):
        self.f = f

    def __del__(self):
        self.f()

# _mkgarbage creates n objects of garbage kept referenced from an object cycle
# so that only cyclic GC can free them.
def _mkgarbage(n):
    # cycle
    a, b = X(), X()
    a.b, b.a = b, a

    # cycle references [n] garbage
    a.objv = [X() for _ in range(n)]
    return a

# mkgarbage creates cycled garbage and arranges for twice more garbage to be
# created when original garbage is collected
def mkgarbage(n):
    a = _mkgarbage(n)
    a.ondel = DelCall(lambda : _mkgarbage(2*n))

def main():
    for i in xrange(10):
        mkgarbage(1000)
        print '> %s' % (gc.get_count(),)
        n = gc.collect()
        print '< %s' % (gc.get_count(),)

main()
---- 8< ----

    kirr@deco:~/tmp/trashme/t$ ./gcmoregarbage.py
    > (482, 11, 0)
    < (1581, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)
    > (531, 3, 0)
    < (2070, 0, 0)

Here lines starting with "<" show the amount of live garbage objects after the gc.collect() call has finished.

This way on a busy server there could be arrangements when GC is explicitly run at step #1 and then automatically run at step #2 (because of the gc.get_referrers() python-level call), and from under GC #2 arbitrary code runs, thus potentially mutating exception state, which shows in logs as

    bigfile/_bigfile.c:685: pybigfile_loadblk: Assertion `!(ts->exc_type || ts->exc_value || ts->exc_traceback)' failed.

----

So don't assume we end up with clean exception state after collecting pybuf referrers, and just clear exception state once again, as we do after explicit GC. Don't make a similar assumption for buffer unpinning either, as an object is decrefed there and in theory this can run some code.

A test is added to automatically exercise exception state clearing for the get_referrers code path via an approach similar to the one demonstrated above: we generate more garbage from under garbage and also arrange for finalizers, which mutate exception state, to be run at GC #2.

The test without the fix applied fails like this:

    bigfile/_bigfile.c:710 pybigfile_loadblk WARN: python thread-state found with handled but not cleared exception state
    bigfile/_bigfile.c:711 pybigfile_loadblk WARN: I will dump it and then crash
    ts->exc_type: None
    ts->exc_value: <nil>
    ts->exc_traceback: <nil>
    Segmentation fault (core dumped)

The None in ts->exc_type and the nil value and traceback are probably coming from here in the cpython runtime:

    https://github.com/python/cpython/blob/883520a8/Python/ceval.c#L3717

Since this took some time to find, more diagnostics are also added before the BUG_ONs corresponding to finding unclean exception state.
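The program above targets Python 2 and reads gc.get_count(). As a hedged Python 3 variant of the same idea, the sketch below counts live `X` instances directly instead, with automatic GC disabled so that only explicit gc.collect() calls run. The names `make_cycle`, `make_breeding_garbage` and `live_x` are hypothetical helpers for this illustration, and the behaviour shown is what CPython does, not a language guarantee.

```python
import gc


class X:
    """Just an object we can set attributes on."""


class DelCall:
    """Calls f when the instance is finalized."""

    def __init__(self, f):
        self.f = f

    def __del__(self):
        self.f()


def make_cycle(n):
    # Two objects referencing each other, plus n more objects hanging off the
    # cycle, so that only the cyclic collector can free them.
    a, b = X(), X()
    a.b, b.a = b, a
    a.objv = [X() for _ in range(n)]
    return a


def make_breeding_garbage(n):
    # Garbage that creates twice as much new garbage while being collected.
    a = make_cycle(n)
    a.ondel = DelCall(lambda: make_cycle(2 * n))


def live_x():
    # Number of X instances the collector currently tracks.
    return sum(type(o) is X for o in gc.get_objects())


gc.disable()                 # keep automatic collections out of the picture
try:
    gc.collect()
    make_breeding_garbage(100)
    before = live_x()        # the original garbage: 2 + 100 objects
    gc.collect()
    after_first = live_x()   # garbage created by the finalizer during collection
    gc.collect()
    after_second = live_x()  # nothing breeds anymore
finally:
    gc.enable()
```

The point carried over from the commit message is `after_first`: an explicit collection does not leave the interpreter garbage-free when finalizers allocate.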
-
- 06 Jul, 2017 1 commit
-
-
Kirill Smelkov authored
Commit fb4bfb32 (bigfile/virtmem: Do storeblk() with virtmem lock released) added bug-protection to fileh_dirty_writeout() so that it could not be called twice at the same time or in parallel with other functions which modify pages. However, it missed the code path where the storeblk() call returns an error: the whole writeout then errors out, but fileh->writeout_inprogress is incorrectly left set to 1. This was leading to things like

    bigfile/virtmem.c:419: fileh_dirty_discard: Assertion `!(fileh->writeout_inprogress)' failed.

and crashes. Fix it.
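The pattern behind this fix, resetting an "in progress" guard on every exit path including the failing one, can be sketched in Python. `ToyFileH`, `dirty_writeout`, `dirty_discard` and `failing_storeblk` are hypothetical stand-ins made up for this illustration; the real code is C in bigfile/virtmem.c and does not necessarily look like this.

```python
class ToyFileH:
    """Hypothetical model of a file handle with a writeout guard flag."""

    def __init__(self, dirty_blocks):
        self.dirty_blocks = list(dirty_blocks)
        self.writeout_inprogress = False

    def dirty_writeout(self, storeblk):
        # Guard against re-entrance and against parallel page modification.
        assert not self.writeout_inprogress
        self.writeout_inprogress = True
        try:
            for blk in self.dirty_blocks:
                storeblk(blk)            # may fail
            self.dirty_blocks = []
        finally:
            # Reset the guard on success AND on error.
            self.writeout_inprogress = False

    def dirty_discard(self):
        assert not self.writeout_inprogress
        self.dirty_blocks = []


def failing_storeblk(blk):
    raise IOError("cannot store block %d" % blk)


fileh = ToyFileH([1, 2, 3])
try:
    fileh.dirty_writeout(failing_storeblk)
    writeout_failed = False
except IOError:
    writeout_failed = True

# Without the reset on the error path this call would trip the assertion.
fileh.dirty_discard()
```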
-
- 16 Jan, 2017 3 commits
-
-
Kirill Smelkov authored
Situations are possible where both the exc_traceback and frame objects are garbage-collected, but a frame's f_locals remains uncollected, e.g. because it was explicitly stored somewhere. We cannot detect such cases (dicts are not listed in referrers). So if nothing helped, as a last resort, unpin pybuf from its original memory and make it point to zero-sized NULL. In general this is not strictly correct, because other buffer and memoryview objects created from pybuf copy its pointer on initialization, and thus unpinning pybuf won't adjust them. However we require BigFile implementations to make sure not to use such objects, if any were created, after return from loadblk(). Finally fixes #7
-
Kirill Smelkov authored
It turns out some code wants to store tracebacks, e.g. for further logging/whatever. In that case GC won't help to free up references to pybuf. However, if pybuf remains referenced only from calling frames, we can change their reference to pybuf to a stub object "<pybuf>" and this way remove the reference.

With the added test but without the loadblk changes the failure would be as follows:

    pybigfile_loadblk WARN: pybuf->ob_refcnt != 1 even after GC:
    pybuf (ob_refcnt=2):    <read-write buffer ptr 0x7fae4911f000, size 2097152 at 0x7fae4998cef0>
    pybuf referrers:        [<frame object at 0x556daff41aa0>]      <-- NOTE
    bigfile/_bigfile.c:613 pybigfile_loadblk BUG!
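How a stored traceback keeps a frame, and through it the buffer argument, alive can be seen from Python with gc.get_referrers(). The sketch below covers only the detection step, under assumptions: `saved_tracebacks`, `helper` and this `loadblk` are hypothetical names for illustration, and replacing the frame's reference with a stub, which the commit does at C level, is not attempted here.

```python
import gc
import sys
import types

saved_tracebacks = []        # e.g. kept around "for further logging"


def helper():
    raise ValueError("internal error")


def loadblk(buf):
    try:
        helper()
    except ValueError:
        # Storing the traceback keeps this frame, and thus `buf`, referenced
        # after loadblk returns.
        saved_tracebacks.append(sys.exc_info()[2])


pybuf = bytearray(16)
loadblk(pybuf)

# Look for frame objects among whatever still refers to the buffer.
frames = [r for r in gc.get_referrers(pybuf) if isinstance(r, types.FrameType)]
frame_names = [f.f_code.co_name for f in frames]
```

Here gc.collect() would not help, since the traceback is reachable from `saved_tracebacks` and is not garbage.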
-
Kirill Smelkov authored
As the comment being removed states, "on python3 exception state is cleared upon exiting from `except`". So let's move exc_* fetching under the except clause; this way we'll get correct exception objects on both py2 and py3.
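A minimal Python 3 sketch of the difference, with `fetch_inside` and `fetch_after` being hypothetical function names used only here:

```python
import sys


def fetch_inside():
    try:
        raise KeyError("x")
    except KeyError:
        # Exception state is still set while the handler runs.
        return sys.exc_info()


def fetch_after():
    try:
        raise KeyError("x")
    except KeyError:
        pass
    # On Python 3 the handled exception is no longer visible here.
    return sys.exc_info()


inside = fetch_inside()
after = fetch_after()
```

On Python 3, `inside[0]` is KeyError while `after` is `(None, None, None)`, assuming no outer exception is being handled when the functions are called.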
-
- 11 Jan, 2017 1 commit
-
-
Kirill Smelkov authored
Because otherwise we bug on pybuf->ob_refcnt != 1.

Such cycles might happen if inside the loadblk implementation an exception is internally raised and then caught, even in a deeply internal function which does not receive pybuf as an argument or by some other way:

After

    _, _, exc_traceback = sys.exc_info()

there is a reference loop created:

    exc_traceback
      |        ^
      |        |
      v     .f_localsplus
     frame

and since the exc_traceback object holds a reference to the deepest frame, which via f_back will be holding references to frames up to the frame with the pybuf argument, it will result in an additional reference to pybuf being held until the above cycle is garbage collected.

So to solve the problem, while leaving loadblk, if pybuf->ob_refcnt != 1, let's first do garbage collection, and only then recheck the remaining references. After GC, reference loops created by exceptions should go away.

NOTE PyGC_Collect() (the C way to call gc.collect()) always performs GC - it is not affected by gc.disable(), which disables only _automatic_ garbage collection.

NOTE it turned out that the storeblk logic to unpin pybuf (see 6da5172e "bigfile/py: Teach storeblk() how to correctly propagate traceback on error") is flawed, because when e.g. creating a memoryview from pybuf the internal pointer is copied, and then clearing the original buf does not result in clearing the copy.

NOTE it is ok to do gc.collect() from under sighandler - at least we are already doing it for a long time via running non-trivial python code which for sure triggers automatic GC from time to time (see also 786d418d "bigfile: Simple test that we can handle GC from-under sighandler" for the reference)

Fixes: #7
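The reference loop described above can be reproduced at Python level. The following is a sketch assuming CPython behaviour: `Buf` is a hypothetical weak-referenceable stand-in for pybuf, `deep_helper` plays the deeply internal function, and automatic GC is disabled so that only the explicit gc.collect() breaks the cycle.

```python
import gc
import sys
import weakref


class Buf:
    """Hypothetical stand-in for pybuf that supports weak references."""


def deep_helper():
    try:
        raise ValueError("handled internally")
    except ValueError:
        # local variable -> traceback -> this frame -> its locals: a cycle
        _, _, exc_traceback = sys.exc_info()


def loadblk(buf):
    # This frame holds `buf`; deep_helper's frame reaches it via f_back.
    deep_helper()


def run():
    buf = Buf()
    ref = weakref.ref(buf)
    loadblk(buf)
    del buf
    alive_before = ref() is not None   # still referenced via the frame chain
    gc.collect()                       # explicit GC breaks the cycle
    alive_after = ref() is not None
    return alive_before, alive_after


gc.disable()                           # only the explicit collection runs
try:
    alive_before_gc, alive_after_gc = run()
finally:
    gc.enable()
```

Note that `deep_helper` never receives the buffer, yet it is what keeps the buffer alive until collection.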
-
- 06 Aug, 2015 1 commit
-
-
Kirill Smelkov authored
And specifically that GC'ed object __del__ calls into virtmem (vma_dealloc and fileh_dealloc) again. NOTE not sure it is a good idea to do GC from under sighandler, but currently it happens in practice, because we did not care to protect against it.
-
- 03 Apr, 2015 1 commit
-
-
Kirill Smelkov authored
Exposes BigFile - this way users can define a BigFile backend in Python. Also exposed are BigFile handles, and VMA objects which are the results of mmapping.
-