Merge pull request #687 from rudi-c/docs

Add docs and tips for new contributors

Commit cf04e092 authored Jul 21, 2015 by Chris Toshok
9 changed files
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
 ## Contributing to Pyston

-#### Pull Requests
+### Pull Requests

 Before a pull request can be merged, you need to sign the [Dropbox Contributor License Agreement](https://opensource.dropbox.com/cla/).

-Please make sure to run the tests (`make check` and `make check_format`) on your changes.
+Please make sure to run at least the basic tests (`make quick_check`) on your changes. Travis CI will run the full test suite (`make check`) on each pull request.

 ##### Formatting

-Please make sure `make check_format` passes on your commits.  If it reports any issues, you can run `make format` to auto-apply
-the project formatting rules.  Note that this will format your files in-place with no built-in undo, so you may want to
-create a temporary commit if you have any unstaged changes.
+Please make sure `make check_format` passes on your commits.  If it reports any issues, you can run `make format` to auto-apply the project formatting rules.  Note that this will format your files in-place with no built-in undo, so you may want to create a temporary commit if you have any unstaged changes.

-### Projects
+A pre-commit hook can catch formatting errors earlier. For example, put the following in `~/pyston/.git/hooks/pre-commit` (and make the file executable):

-If you don't know where to start:
- Check out our [Github issues list](https://github.com/dropbox/pyston/issues), especially those marked ["probably easy"](https://github.com/dropbox/pyston/labels/probably%20easy)
+```
+#!/bin/sh
+exec make -C ~/pyston check_format
+```
+
+### Getting started with the codebase
+
+The easiest way to contribute to Pyston is to help us improve our compatibility with CPython. There are many small tasks to do, such as built-in functions that are not yet implemented, or edge cases that Pyston does not handle or where our output differs slightly from CPython's. The fix will often involve a local change, giving a smooth start to learning the codebase. One of Python's greatest strengths is that it comes ["batteries included"](https://xkcd.com/353/), but this means that there is a long tail of these little tasks to work through - your help is immensely valuable there!
+
+The command `make quick_check` will first run our Pyston tests (a great way to make sure everything is in order) and then the default CPython tests. The output looks like this:
+
+```
+                  ...
+                  test_bastion.py    Correct output (125.7ms)
+                 test_unittest.py    (skipped: expected to fail)
+                     test_json.py    (skipped: expected to fail)
+                  test_future3.py    Correct output (952.8ms)
+                  ...
+```
+
+Notice that a large number of tests are currently marked as failing (with an `# expected: fail` comment at the top of the file). Just pick any that you think is interesting and get it to pass! Remove the `# expected: fail` comment and run `make check_TESTNAME` (without `.py`) to compare the result to CPython's (the command will search for TESTNAME in the `test/` directory). If the test is crashing, the easiest way to start debugging is to use `make dbg_TESTNAME`, which is essentially `make check_TESTNAME` inside GDB.
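
To find candidates, it can help to list which test files still carry the marker. The following is a minimal sketch, assuming the marker appears as a comment within the first few lines of each file. The helper name `find_expected_failures`, the `max_lines` cutoff, and the tolerant whitespace matching are illustrative assumptions and are not taken from `tools/tester.py`.

```python
import os
import re

# Matches both "# expected: fail" and "#expected: fail".
MARKER = re.compile(r"^#\s*expected:\s*fail\b")


def find_expected_failures(test_dir, max_lines=5):
    """Return sorted paths of .py files whose first few lines carry the marker."""
    matches = []
    for root, _dirs, files in os.walk(test_dir):
        for name in files:
            if not name.endswith(".py"):
                continue
            path = os.path.join(root, name)
            with open(path, errors="replace") as f:
                # Only inspect the top of the file, where annotations live.
                for _ in range(max_lines):
                    line = f.readline()
                    if not line:
                        break
                    if MARKER.match(line):
                        matches.append(path)
                        break
    return sorted(matches)
```

Calling `find_expected_failures("test/cpython")` from the repository root would then give a list of files to choose from; the directory name here is taken from the notes file path in this commit.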
+
+This kind of work often happens where native libraries are defined (e.g. `src/runtime/builtin_modules/builtins.cpp`) or where types are implemented (e.g. `src/runtime/str.cpp`), and debugging may involve tracing through the interpreter (`src/codegen/ast_interpreter.cpp`). The code in those files should be relatively straightforward. Code that involves the JIT (function rewriting, assembly generation, etc.) is more intricate and can be confusing at first (e.g. `src/asm_writing/rewriter.cpp`). Keep in mind, it's perfectly fine to ask for help!
+
+To save you some time, the cause of failures for some of the tests [may have already been identified](test/CPYTHON_TEST_NOTES.md). Do note, however, that not all of CPython's behavior can be matched exactly. For example, because Pyston uses a garbage collector instead of reference counting, objects are freed non-deterministically and we can't necessarily call object finalizers in the same order as CPython does.
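
As an illustration of the finalizer caveat, code (or a test) that relies on `__del__` running at a particular moment is only predictable under reference counting. Below is a sketch of the portable alternative, releasing the resource explicitly through a context manager. The `Resource` class and the `log` list are hypothetical names used only for this example.

```python
log = []


class Resource(object):
    """Toy resource that records when it is released."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        # Explicit, idempotent release: does not depend on when the GC runs.
        if not self.closed:
            self.closed = True
            log.append("closed " + self.name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions

    def __del__(self):
        # Safety net only; a garbage collector may call this late or in any order.
        self.close()


with Resource("a") as a:
    with Resource("b") as b:
        log.append("using " + a.name + " and " + b.name)
```

Because `close()` runs when each `with` block exits, the release order is fixed by the program structure rather than by the memory manager, so the same output is expected under either strategy.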
+
+See also [tips on challenges you might run into and on debugging](docs/TIPS.md).
+
+You can also check out our [Github issues list](https://github.com/dropbox/pyston/issues), especially those marked ["probably easy"](https://github.com/dropbox/pyston/labels/probably%20easy).
+
+### Communicating
+
+- We use a [Gitter chat room](https://gitter.im/dropbox/pyston) for most of our discussions. If you need help figuring out where to start or with the codebase, you should get a response there fairly quickly. If you have already found a small project to work on and are eager to start, by all means get started! It is still a good idea to drop us a note - we might have some suggestions or be able to think of an edge case or two.
 - Email the [pyston-dev mailing list](http://lists.pyston.org/cgi-bin/mailman/listinfo/pyston-dev), or [browse the archives](http://lists.pyston.org/pipermail/pyston-dev/)
- Join us on [#pyston](http://webchat.freenode.net/?channels=pyston) on freenode.
+
+### Bigger projects
+
+There are many big areas where Pyston can use improvement. These include, for example, a better garbage collector, better performance profiling tools (including finding more benchmarks), and finding new libraries to add to our test suite. These can be very involved - if you are interested in bigger projects (e.g. as part of graduate studies), please contact us directly.
--- a/README.md
+++ b/README.md
@@ -17,6 +17,8 @@ Currently, Pyston targets Python 2.7, only runs on x86_64 platforms, and only ha
 Pyston welcomes any kind of contribution; please see [CONTRIBUTING.md](https://github.com/dropbox/pyston/blob/master/CONTRIBUTING.md) for details.
 > tl;dr: You will need to sign the [Dropbox CLA](https://opensource.dropbox.com/cla/) and run the tests.

+Pyston is a fairly low-level program with a lot of necessary hacks for compatibility or performance purposes. We recommend taking a look at the [development tips](docs/TIPS.md).
+
 ### Roadmap

 ##### v0.1: [released 4/2/2014](https://tech.dropbox.com/2014/04/introducing-pyston-an-upcoming-jit-based-python-implementation/)
@@ -67,29 +69,12 @@ You can get a simple REPL by simply typing `make run`; it is not very robust rig

 #### Makefile targets

+- `make pyston_release`: to compile in release mode and generate the `pyston_release` executable
 - `make check`: run the tests
 - `make run`: run the REPL
 - `make format`: run clang-format over the codebase
- We have a number of helpers of the form `make VERB_TESTNAME`, where `TESTNAME` can be any of the tests/benchmarks, and `VERB` can be one of:
- - `make run_TESTNAME`: runs the file under pyston_dbg.
- - `make run_release_TESTNAME`: runs the file under pyston_release.
- - `make dbg_TESTNAME`: same as `run`, but runs pyston under gdb.
- - `make check_TESTNAME`: checks that the script has the same behavior under pyston_dbg as it does under CPython.  See tools/tester.py for information about test annotations.
- - `make perf_TESTNAME`: runs the script in pyston_release, and uses perf to record and display performance statistics.
- - A few lesser used ones; see the Makefile for details.
- `make watch_cmd`: meta-command which uses inotifywait to run `make cmd` every time a source file changes.
- - For example, `make watch_pyston_dbg` will rebuild pyston_dbg every time you save a source file.  This is handy enough to have the alias `make watch`.
- - `make watch_run_TESTNAME` will rebuild pyston_dbg and run TESTNAME every time you change a file.
- - `make wdbg_TESTNAME` is mostly an alias for `make watch_dbg_TESTNAME`, but will automatically quit GDB for you.  This is handy if pyston is crashing and you want to get a C-level stacktrace.
-
-There are a number of common flags you can pass to your make invocations:
- `V=1` or `VERBOSE=1`: display the full commands being executed
- `ARGS=-v`: pass the given args (in this example, `-v`) to the executable.
- - Note: these will usually end up before the script name, and so apply to the pyston runtime as opposed to appearing in sys.argv.  For example, `make run_test ARGS=-v` will execute `./pyston_dbg -v test.py`.
- `BR=breakpoint`: when running under gdb, automatically set a breakpoint at the given location.
- `SELF_HOST=1`: run all of our Python scripts using pyston_dbg.
-
-For a full list, please check out the [Makefile](https://github.com/dropbox/pyston/blob/master/Makefile).
+
+For more, see the [development tips](docs/TIPS.md).

 #### Pyston command-line options:

@@ -99,16 +84,16 @@ Pyston-specific flags:
  <dd>Set verbosity to 0</dd>
 <dt>-v</dt>
  <dd>Increase verbosity by 1</dd>
-  
+
 <dt>-s</dt>
  <dd>Print out the internal stats at exit.</dd>
-  
+
 <dt>-n</dt>
  <dd>Disable the Pyston interpreter.  This is mostly used for debugging, to force the use of higher compilation tiers in situations they wouldn't typically be used.</dd>
-  
+
 <dt>-O</dt>
  <dd>Force Pyston to always run at the highest compilation tier.  This doesn't always produce the fastest running time due to the lack of type recording from lower compilation tiers, but similar to -n can help test the code generator.</dd>
-  
+
 <dt>-I</dt>
  <dd>Force always using the Pyston interpreter.  This is mostly used for debugging / testing. (Takes precedence over -n and -O)</dd>

@@ -170,7 +155,7 @@ int square(int n) {
    int r = 0;
    for (int i = 0; i < n; i++) {
        r += n;
-        
+
        // OSR exit here:
        _backedge_trip_count++;
        if (_backedge_trip_count >= 10000) {

--- a/docs/TIPS.md
+++ b/docs/TIPS.md
--- a/src/capi/typeobject.h
+++ b/src/capi/typeobject.h
@@ -19,6 +19,14 @@

 namespace pyston {

+/*
+ * Type objects refer to Python's new-style classes that inherit from
+ * `object`. That is, classes declared as:
+ *
+ * class Foo(object):
+ *  ...
+ */
+
 // Returns if a slot was updated
 bool update_slot(BoxedClass* self, llvm::StringRef attr) noexcept;


--- a/src/codegen/ast_interpreter.cpp
+++ b/src/codegen/ast_interpreter.cpp
@@ -76,6 +76,12 @@ public:
    static void deregister(void* frame_addr);
 };

+/*
+ * ASTInterpreters exist per function frame - there's no global interpreter object that executes
+ * all non-jitted code!
+ *
+ * The ASTInterpreter inherits from Box as part of garbage collection support.
+ */
 class ASTInterpreter : public Box {
 public:
    typedef ContiguousMap<InternedString, Box*, llvm::SmallDenseMap<InternedString, int, 16>> SymMap;

--- a/src/core/types.h
+++ b/src/core/types.h
@@ -471,6 +471,12 @@ public:
 class BoxedDict;
 class BoxedString;

+// In Pyston, this is the same type as CPython's PyObject (they are interchangeable, but we
+// use Box in Pyston wherever possible as a convention).
+//
+// Other types in Pyston inherit from Box (e.g. BoxedString is a Box). Why is this class not
+// polymorphic? Because of C extension support -- having virtual methods would change the layout
+// of the object.
 class Box {
 private:
    BoxedDict** getDictPtr();

--- a/src/runtime/classobj.h
+++ b/src/runtime/classobj.h
@@ -19,6 +19,17 @@

 namespace pyston {

+/*
+ * Class objects refer to Python's old-style classes that don't inherit from
+ * `object`. That is, classes declared as:
+ *
+ * class Foo:
+ *  ...
+ *
+ * When debugging, "obj->cls->tp_name" will have value "instance" for all
+ * old-style classes rather than the name of the class itself.
+ */
+
 void setupClassobj();

 class BoxedClass;

--- a/src/runtime/types.h
+++ b/src/runtime/types.h
@@ -165,7 +165,8 @@ extern "C" void printFloat(double d);
 Box* objectStr(Box*);
 Box* objectRepr(Box*);

-
+// In Pyston, this is the same type as CPython's PyTypeObject (they are interchangeable, but we
+// use BoxedClass in Pyston wherever possible as a convention).
 class BoxedClass : public BoxVar {
 public:
    typedef void (*gcvisit_func)(GCVisitor*, Box*);

--- a/test/cpython/NOTES.org
+++ b/test/cpython/NOTES.org
 These are rntz's notes from trying to get the CPython testing framework to run,
 for future reference.

-* getting regrtest to work is hard
-regrtest works by __import__()ing the tests to be run and then doing some stuff.
-This means that if even a single test causes pyston to crash or assert(), the
-whole test-runner dies.
-
-The best fix for this would be to simply run each test in a separate pyston
-process. It's not clear to accomplish this without breaking the tests, however,
-because Python's test support is a big, gnarly piece of code. In particular, it
-will skip tests based on various criteria, which we probably want to support.
-But it's not clear how to disentangle that knowledge from the way it __import__s
-the tests.
-
-So instead I ran regrtest manually, looked at what tests it ran, and imported
-those. tester.py will run them separately.
-
-** Obsolete notes: Hacking regrtest to manually change directories
-Need to run test/regrtest.py for testing harness; The standard way to do this in
-CPython is `python -m test.regrtest` from Lib/. The -m is important because
-otherwise it can't find the tests properly. Turns out implementing the -m flag
-is hard, because the machinery for imports is pretty complicated and there's no
-easy way to ask it "which file *would* I load to import this module". So I
-hacked regrtest to manually change directories.
-
-** Obsolete FIXME for regrtest.py: CFG/analysis bug with osr
-test/regrtest.py trips an assert in PropagatingTypeAnalysis::getTypeAtBlockStart
-if not run with -I, looks like malformed CFG or bug in analysis
-* tests are slow
-CPython's tests are pretty slow for us. In particular, since we're not running
-with regrtest, we have to go through the set-up time of loading
-test.test_support for each test. On my machine it's that's about a half-second
-per test.
-
-To handle this, by default we don't run tests marked "expected: fail". To
-disable this, pass --all-cpython-tests to tester.py.
-
-* bugs uncovered
+## Bugs uncovered
 The CPython tests I've included fail for various reasons. Recurring issues include:
- use of compile()
- missing __hash__ implementations for some builtin types
- we don't have imp.get_magic()
+- use of `compile()`
+- missing `__hash__` implementations for some builtin types
+- we don't have `imp.get_magic()`
 - segfaults
- missing __name__ attributes on capifuncs
- missing sys.__stdout__ attribute
- serialize_ast.cpp: writeColOffset: assert(v < 100000 || v == -1) gets tripped
- pypa-parser.cpp: readName: assert(e.type == pypa::AstType::Name)
- src/runtime/util.cpp: parseSlice: assert(isSubclass(start->cls, int_cls) || start->cls == none_cls)
+- missing `__name__` attributes on capifuncs
+- missing `sys.__stdout__` attribute
+- `serialize_ast.cpp`: `writeColOffset`: `assert(v < 100000 || v == -1)` gets tripped
+- `pypa-parser.cpp`: `readName`: `assert(e.type == pypa::AstType::Name)`
+- `src/runtime/util.cpp`: `parseSlice`: `assert(isSubclass(start->cls, int_cls) || start->cls == none_cls)`

-** list of files & why they're failing
+## List of files & why they're failing
+```
 FILE                    REASONS
 ------------------------------------------------------
 test_augassign          missing oldstyle-class __add__, __iadd__, etc
@@ -131,3 +97,39 @@ test_weakset            unknown issues
 test_with               weird codegen assert
 test_wsgiref            unknown issue
 test_xrange             unknown type analysis issue
+```
+
+### Getting regrtest to work is hard
+regrtest works by `__import__()ing` the tests to be run and then doing some stuff.
+This means that if even a single test causes pyston to crash or assert(), the
+whole test-runner dies.
+
+The best fix for this would be to simply run each test in a separate pyston
+process. It's not clear how to accomplish this without breaking the tests, however,
+because Python's test support is a big, gnarly piece of code. In particular, it
+will skip tests based on various criteria, which we probably want to support.
+But it's not clear how to disentangle that knowledge from the way it `__import__`s
+the tests.
+
+So instead I ran regrtest manually, looked at what tests it ran, and imported
+those. tester.py will run them separately.
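
The per-process isolation described above can be sketched as follows. This is not how `tester.py` is implemented. It is a minimal illustration that assumes each test is a standalone script, with `interpreter` standing in for the path to a pyston binary; the function name `run_isolated` and the default timeout are hypothetical.

```python
import subprocess


def run_isolated(interpreter, test_paths, timeout=60):
    """Run each test script in its own process and return {path: status}."""
    results = {}
    for path in test_paths:
        try:
            proc = subprocess.run(
                [interpreter, path],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            results[path] = "timeout"
            continue
        if proc.returncode == 0:
            results[path] = "ok"
        elif proc.returncode < 0:
            # Negative return code on POSIX: killed by a signal (segfault, abort).
            results[path] = "crashed (signal %d)" % -proc.returncode
        else:
            results[path] = "failed (exit %d)" % proc.returncode
    return results
```

A crash or failed assert in one test only affects that child process, so the loop still reaches the remaining tests. What this sketch does not capture is regrtest's knowledge of which tests to skip, which is the part the notes describe as hard to disentangle.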
+
+### Obsolete notes: Hacking regrtest to manually change directories
+Need to run test/regrtest.py for the testing harness; the standard way to do this in
+CPython is `python -m test.regrtest` from Lib/. The -m is important because
+otherwise it can't find the tests properly. Turns out implementing the -m flag
+is hard, because the machinery for imports is pretty complicated and there's no
+easy way to ask it "which file *would* I load to import this module". So I
+hacked regrtest to manually change directories.
+
+### Obsolete FIXME for regrtest.py: CFG/analysis bug with osr
+test/regrtest.py trips an assert in PropagatingTypeAnalysis::getTypeAtBlockStart
+if not run with -I, looks like malformed CFG or bug in analysis
+### Tests are slow
+CPython's tests are pretty slow for us. In particular, since we're not running
+with regrtest, we have to go through the set-up time of loading
+test.test_support for each test. On my machine that's about a half-second
+per test.
+
+To handle this, by default we don't run tests marked "expected: fail". To
+disable this, pass --all-cpython-tests to tester.py.