- 01 Jun, 2015 7 commits
-
-
Kevin Modzelewski authored
We weren't even doing any rewriting for hash, so there's not much downside. This also cuts down on boxing quite a bit since we can usually avoid boxing the hash value.
-
Kevin Modzelewski authored
Perf investigations
-
Kevin Modzelewski authored
Coming from looking into regex performance: re_compile is reduced from django-template startup, and dict_hashing_ubench is in turn reduced from re_compile.
-
Kevin Modzelewski authored
Also, quiet some debug output
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
use llvm::StringRef instead of std::string typeNew
-
Chris Toshok authored
Turns out we allocate/free the same std::strings for every slot name, for every typeNew. Instead, do it once, when we create the slotdefs array. Also, use llvm::StringRefs instead of std::strings since we already have them in setattrGeneric (the other caller of update_slot).
-
- 29 May, 2015 11 commits
-
-
Kevin Modzelewski authored
Add UTF8-BOM support, int.bit_length, function.func_doc, fix '(-1)**0'
-
Marius Wachtler authored
-
Marius Wachtler authored
-
Kevin Modzelewski authored
Python-level sampling profiler
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
Uses setitimer() to set a recurring signal, and prints a Python stacktrace at the next safepoint aka allowGLReadPreemption. This is not great since allowGLReadPreemption can happen a decent amount later than the signal. (I'll play around with trying to get the signal to be acted on sooner, but it might be better to wait for full signal-handling support.) Still, it seems to provide some decent high-level info. For example, half of the startup time of the django-template benchmark seems to be due to regular expressions.
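The same idea can be sketched at the Python level. This is only an illustration, not Pyston's implementation: it assumes a POSIX system, uses the standard `signal.setitimer` wrapper, and relies on the fact that CPython runs Python-level signal handlers between bytecodes, which plays a role loosely comparable to the safepoint described above. The names `start_profiler`, `stop_profiler`, `busy_work` and `samples` are hypothetical.

```python
import signal
import time
import traceback

# Each sample is a list of (function name, line number) pairs, outermost first.
samples = []


def _record_sample(signum, frame):
    # The handler runs some time after the timer fired, at the next point
    # where the interpreter checks for pending signals.
    stack = traceback.extract_stack(frame)
    samples.append([(entry.name, entry.lineno) for entry in stack])


def start_profiler(interval=0.005):
    # Install the handler, then ask for a recurring SIGALRM every `interval` seconds.
    signal.signal(signal.SIGALRM, _record_sample)
    signal.setitimer(signal.ITIMER_REAL, interval, interval)


def stop_profiler():
    # A zero value disarms the timer.
    signal.setitimer(signal.ITIMER_REAL, 0)


def busy_work(duration=0.2):
    # Hypothetical workload: spin until the deadline so samples land here.
    deadline = time.time() + duration
    total = 0
    while time.time() < deadline:
        total += 1
    return total
```

Calling `start_profiler()`, running `busy_work()`, and then `stop_profiler()` leaves a list of stacks in `samples`; counting how often each function name appears gives the kind of rough high-level picture mentioned in the message.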
-
Kevin Modzelewski authored
Handle list self assignment during slicing
-
Kevin Modzelewski authored
Add some parsing tests
-
Marius Wachtler authored
-
Kevin Modzelewski authored
  - byte order marker
  - newline between decorator and its function
  - strings with size >64k [including a fix]

also, a decimal.Decimal test
-
- 28 May, 2015 12 commits
-
-
Kevin Modzelewski authored
Add PyErr_GetExcInfo, make __builtins__ more similar to cpython and id() output more useful
-
Kevin Modzelewski authored
Convert runtime functions to take llvm::StringRef
-
Marius Wachtler authored
-
Marius Wachtler authored
-
Marius Wachtler authored
-
Marius Wachtler authored
-
Kevin Modzelewski authored
This only happens to me on the gcc build, and it apparently doesn't happen on travis-ci. Not exactly sure why that would cause the code to get linked or not, but anyway it's stuff we're not using right now so just ifdef it out.
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
Should hopefully cut down on allocations to pass around 'const std::string&' objects (since we don't always store things as std::strings anymore), or to calls to strlen if we pass around const char*s. Haven't looked yet at the calls that we embed in the llvm IR.

Here are the perf results:

pyston django_migrate.py   :  2.3s  baseline:  2.3 (-1.7%)
pyston django-template.py  : 15.1s  baseline: 15.4 (-1.6%)
pyston interp2.py          :  5.3s  baseline:  6.3 (-15.1%)
pyston raytrace.py         :  6.1s  baseline:  6.2 (-0.7%)
pyston nbody.py            :  8.4s  baseline:  8.1 (+4.1%)
pyston fannkuch.py         :  7.5s  baseline:  7.5 (+0.2%)
pyston chaos.py            : 20.2s  baseline: 20.0 (+0.7%)
pyston fasta.py            :  5.4s  baseline:  5.4 (+0.3%)
pyston pidigits.py         :  5.7s  baseline:  5.7 (+0.0%)
pyston richards.py         :  2.5s  baseline:  2.7 (-6.2%)
pyston deltablue.py        :  1.8s  baseline:  1.8 (-0.0%)
pyston (geomean-3424)      :  5.7s  baseline:  5.8 (-2.0%)

I looked into the regression in nbody.py, and it is in an unrelated piece of code (list unpacking) that has the same assembly and gets called the same number of times. Maybe there's some weird cache collision. It's an extremely small benchmark (a single 13-line loop) so I'm happy to write it off as microbenchmark sensitivity.

We can also optimize this if we want to; we could speculate on the type that we are unpacking and inline the parts of the unpacking code we need.
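As a quick sanity check on the geomean row, the geometric mean can be recomputed from the per-benchmark times quoted in the message. This is a sketch over the rounded figures only, so the percentage deltas (which were presumably computed from unrounded timings) are not reproduced; the helper name `geomean` is an assumption.

```python
import math

# (benchmark, new time in seconds, baseline time in seconds), as quoted in the message.
results = [
    ("django_migrate.py", 2.3, 2.3),
    ("django-template.py", 15.1, 15.4),
    ("interp2.py", 5.3, 6.3),
    ("raytrace.py", 6.1, 6.2),
    ("nbody.py", 8.4, 8.1),
    ("fannkuch.py", 7.5, 7.5),
    ("chaos.py", 20.2, 20.0),
    ("fasta.py", 5.4, 5.4),
    ("pidigits.py", 5.7, 5.7),
    ("richards.py", 2.5, 2.7),
    ("deltablue.py", 1.8, 1.8),
]


def geomean(values):
    # Geometric mean computed via the mean of logarithms.
    return math.exp(sum(math.log(v) for v in values) / len(values))


new_geomean = geomean([new for _, new, _ in results])
base_geomean = geomean([base for _, _, base in results])
print(f"new: {new_geomean:.1f}s  baseline: {base_geomean:.1f}s")
```

Rounded to one decimal place, the two values agree with the 5.7s and 5.8 shown in the geomean row.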
-
Kevin Modzelewski authored
switch to CPython's thread._local implementation
-
Kevin Modzelewski authored
i.e. make sure not only that it's a valid allocation, but also that it will have Python destructor semantics applied when it is freed (as opposed to, say, STLCompatAllocator-allocated memory). This is to make sure that extension modules don't use a different allocation routine than we expected. I could find only a few specialized places where we actually want the old behavior: in dump(), and in PyObject_Init right before we call setIsPythonObject. So for those cases, add a new isValidGCMemory call that doesn't do the allocation-kind check.
-
- 27 May, 2015 6 commits
-
-
Kevin Modzelewski authored
We were using a slightly different (flattened) version; not sure why.
-
Kevin Modzelewski authored
CPython's implementation has quite a few more features than our old one. We only particularly need one of them (call __init__ when accessed from a new thread), but it looks like there are some other features in there that have a decent chance of biting us in annoying ways (some gc-related stuff). That implementation forced some of the other work in this PR, of supporting weakrefs on extension objects (which this uses), and making object.tp_init get set the same way it does in CPython.
-
Kevin Modzelewski authored
Without this, one could use freed objects without issues, until something else was allocated in that space. And even then, it would still be a valid object. So, in debug mode overwrite the data with garbage to try to surface these issues. This exposed an issue with our "nonheap roots" handling, where we weren't scanning all of the memory that they pointed to. This is mostly fine, but there are some cases (time.gmtime) where gc-allocated memory would be stored in these objects. So, now you have to register the size of the object, and the memory range will be scanned conservatively.
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
Add a new allocation type CONSERVATIVE_PYTHON for extension objects, for which we don't have heap maps, but which should still have Python finalization semantics (i.e. destructors and weakrefs). Previously we were just marking them as CONSERVATIVE and skipping them during the sweep phase, and not running destructors or handling weakrefs. It's a bit tricky to figure out when to mark an allocation as conservative vs conservative-python; the approach in this commit is to mark all capi-originated allocations as conservative, and then when we call PyObject_Init or PyObject_InitVar, switch them from conservative to conservative-python. I think this is more expensive but safer than assuming that certain APIs will always/never be used as object memory. Unfortunately there are quite a few extension classes that request a custom tp_dealloc, and this commit just keeps the old (bad) behavior of ignoring those. I tried to verify as many as I could and they all seem benign, but it will be nice to have real destructor support :)
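The tagging scheme can be modeled in a few lines. This is a toy Python model for illustration, not the actual GC code: `GCKind`, `Allocation`, `capi_malloc`, `object_init` and `needs_finalization` are hypothetical names, with `object_init` standing in for PyObject_Init / PyObject_InitVar.

```python
from enum import Enum


class GCKind(Enum):
    CONSERVATIVE = "conservative"
    CONSERVATIVE_PYTHON = "conservative_python"


class Allocation:
    def __init__(self, size):
        self.size = size
        # Every C-API-originated allocation starts out as plain conservative.
        self.kind = GCKind.CONSERVATIVE


def capi_malloc(size):
    # Hypothetical stand-in for an allocation made through the C API.
    return Allocation(size)


def object_init(alloc):
    # Stand-in for PyObject_Init / PyObject_InitVar: once memory is
    # initialized as an object, it is retagged so the sweep phase knows
    # to apply Python finalization semantics.
    alloc.kind = GCKind.CONSERVATIVE_PYTHON
    return alloc


def needs_finalization(alloc):
    # Only conservative-python allocations get destructors and weakref handling.
    return alloc.kind is GCKind.CONSERVATIVE_PYTHON
```

In this model a raw buffer from `capi_malloc` is never finalized, while the same allocation passed through `object_init` is, which mirrors the "tag late, at init time" choice described in the message.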
-
Kevin Modzelewski authored
-
- 26 May, 2015 4 commits
-
-
Kevin Modzelewski authored
Make our pyc handling more robust
-
Kevin Modzelewski authored
Add __cmp__ and dictproxy
-
Kevin Modzelewski authored
and add a pyc "stress test". I think these issues are the source of our sporadic ci failures; it makes sense based on where things fail (usually in the parser), and because it's stateful (if you already have pycs generated you don't run into the issue) and because it only happens in multithreaded mode.

changes:
  - read the entire file at once, then do checks
  - add a simple xor checksum in addition to the expected length
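A minimal sketch of a length-plus-xor-checksum check follows. The layout here (a 4-byte little-endian length followed by a 1-byte checksum) and the names `pack_cache` / `unpack_cache` are assumptions made for illustration; the actual pyc format used by the commit is not shown in the message.

```python
import struct

# Assumed header layout: payload length (4 bytes) + xor checksum (1 byte).
HEADER = struct.Struct("<IB")


def xor_checksum(data: bytes) -> int:
    # Fold every byte of the payload together with xor.
    checksum = 0
    for byte in data:
        checksum ^= byte
    return checksum


def pack_cache(payload: bytes) -> bytes:
    return HEADER.pack(len(payload), xor_checksum(payload)) + payload


def unpack_cache(blob: bytes):
    """Validate a whole cache file held in memory; return the payload or None."""
    if len(blob) < HEADER.size:
        return None  # too short to even hold a header
    length, checksum = HEADER.unpack_from(blob)
    payload = blob[HEADER.size:]
    if len(payload) != length:
        return None  # truncated or over-long file, e.g. a partially written cache
    if xor_checksum(payload) != checksum:
        return None  # right length but corrupted contents
    return payload
```

Here `blob` is meant to be the full file contents read in one call, so that every check runs against a single consistent snapshot rather than a file that another thread may still be writing. A caller that gets `None` back would simply fall back to re-parsing the source.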
-
Kevin Modzelewski authored
add libunwind_patch for binary search
-