Commit c4652cb3 authored by Boxiang Sun

add description of the implementation details section

parent eb9da666
......@@ -6,10 +6,10 @@ I rewrote RestrictedPython because we (Nexedi) are trying to use it in
Pyston (a new Python implementation originally developed by Dropbox).
But Pyston doesn't support the `compiler` package, and Pyston uses different
bytecode than CPython. Luckily, Pyston supports the `ast` package. But due to
the `compiler` package was obsolated. So I try to use `ast` function and
`compile` function to reimplement the RestrictedPython.
the `compiler` package being obsolete, Pyston does not intend to support it anymore.
So I try to use the `ast` package and the `compile` function to reimplement RestrictedPython.
The new implemenation can support both CPython 2/3, and it can also support Pyston.
In theory, the new implementation can support both CPython 2/3, and it can also support Pyston.
## Introduction
......@@ -29,16 +29,119 @@ implementation is based on `ast` package.
However, the AST in the `ast` package differs from the AST in `compiler.ast`,
so there are some corner cases which I have to handle as "exceptions".
For more information, please refer to the source code, which the key differences
are in `src/RestrictionMutator`.
For more information, please see the "Implementation details" section below.
## Current state
This is not finished yet. But there already has the skeleton. Some tests
The new implementation is not finished yet, but the skeleton is already there. Some tests
pass now, but some of them are disabled, and it is not production ready.
So please feel free to contact if you have any suggestion about the new implementation.
## Implementation details
I will refer to the RestrictedPython based on the `compiler` package as RPc, and to the new
RestrictedPython based on the `ast` package as RPa.
The first thing I did was try to understand what RPc does. RPc
basically does these things:
- Use `compiler.parse` to parse the source code into a `compiler.ast` tree (in RCompile, `niceParse`).
- Use `MutatingWalker` to modify the abstract syntax tree.
- Use `compiler.Expression` and similar classes to generate CPython bytecode from the AST.
- Use `RestrictedCodeGenerator` to modify the CPython bytecode again.
- Execute the bytecode in a custom builtin environment.
You can get more information from the original notes below.
---------------
In RPa, I just replace the `compiler` package with the `ast` package.
RPa basically does these things:
- Use `ast.parse` to generate the `ast.AST` tree from the source code.
- Use `ast.NodeTransformer` to modify the AST.
- Use the `compile` builtin function to generate the bytecode.
- Use `exec` to run the bytecode. (We don't care about the type of bytecode: in Pyston,
`exec` runs Pyston bytecode, and in CPython, `exec` runs
CPython bytecode.)
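The four steps above can be sketched roughly as follows. This is only an illustration, not the actual RPa code: `NameGuard`, `run_restricted` and the underscore policy are hypothetical names and rules made up for the example.

```python
import ast


class NameGuard(ast.NodeTransformer):
    """Hypothetical policy: reject any name that starts with an underscore."""

    def visit_Name(self, node):
        if node.id.startswith("_"):
            raise SyntaxError("name %r is not allowed" % node.id)
        return node


def run_restricted(source, restricted_globals):
    # 1. Parse the source code into an ast.AST tree.
    tree = ast.parse(source, filename="<restricted>", mode="exec")
    # 2. Modify (here: only check) the tree with a NodeTransformer.
    tree = NameGuard().visit(tree)
    ast.fix_missing_locations(tree)
    # 3. Let the interpreter generate its own kind of bytecode.
    code = compile(tree, "<restricted>", "exec")
    # 4. Run the bytecode in a custom builtin environment.
    exec(code, restricted_globals)
    return restricted_globals


env = run_restricted("total = sum([1, 2, 3])", {"__builtins__": {"sum": sum}})
print(env["total"])
```

Because only `ast.parse`, `compile` and `exec` are used, nothing in this sketch depends on the bytecode format of the interpreter that runs it.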
I compared the AST before and after processing by `MutatingWalker`,
then transformed the tree (in `compiler.ast` form) into the equivalent `ast.AST` form.
This work is somewhat like reverse engineering. But there are several differences:
- RPa uses `ast.parse` to generate the AST from Python source code. This AST is
different from the `compiler.ast` one.
- RPc uses a class named `MutatingWalker` to visit the AST nodes manually.
More specifically, RPc defines lots of `visitXXX` methods (where XXX refers to a type of node,
such as `visitFunction`). When RPc encounters a node whose type is a function, it
calls `visitFunction` (this is done by the `MutatingWalker`). This method modifies
the AST node.
As you can see, the tree generated by RPa is not a `compiler.ast` tree; it is an `ast.AST` tree.
The `ast` package has a class named `ast.NodeTransformer`, which can replace the "Mutator",
so we don't need the `MutatingWalker` anymore. For example, we can define a class which
inherits from `ast.NodeTransformer`. Inside this class, we define a method
named `visit_FunctionDef`. Then, when we encounter a function node, this class
automatically calls `visit_FunctionDef`, like `visitFunction` in the `MutatingWalker`.
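A minimal sketch of this dispatch, assuming nothing beyond the standard `ast` package. The class name `FunctionRecorder` is hypothetical, and it only records the function names it sees instead of modifying them.

```python
import ast


class FunctionRecorder(ast.NodeTransformer):
    """Hypothetical transformer that records every function definition."""

    def __init__(self):
        self.seen = []

    def visit_FunctionDef(self, node):
        # NodeTransformer.visit() dispatches here for every ast.FunctionDef,
        # based on the node's class name.
        self.seen.append(node.name)
        # Keep walking so nested functions are visited too.
        self.generic_visit(node)
        # Returning the node keeps it in the tree; a real mutator would
        # return a modified node instead.
        return node


source = "def outer():\n    def inner():\n        pass\n"
recorder = FunctionRecorder()
recorder.visit(ast.parse(source))
print(recorder.seen)
```

There is no hand-written walker in this sketch: the base class finds the `visit_<ClassName>` method by itself.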
Furthermore, `compiler.ast` has a different hierarchy than `ast.AST`. Consider these rewrite rules:
* `a = foo[bar, baz]` becomes `a = _getitem(foo, (bar, baz))`
* `a = foo[bar: baz]` becomes `a = _getitem(foo, slice(bar, baz))`
* `del foo[bar]` becomes `del _write(foo)[bar]`
* `foo[bar] = a` becomes `_write(foo)[bar] = a`
In `compiler.ast`, all of these fall into one type of node, which is Subscript. They just have
different `node.flags`:
* OP_APPLY, but the type of the subscript is Subscript.
* OP_APPLY, but the type of the subscript is Slice.
* OP_DELETE
* OP_ASSIGN
In `ast.AST`, we encounter these statements in different types of nodes:
* In node Assign, `node.value` is an `ast.Subscript`.
* In node Assign, `node.value` is an `ast.Subscript` whose `slice` is an `ast.Slice`.
* In node Delete, `node.targets` contains an `ast.Subscript`.
* In node Assign, `node.targets` contains an `ast.Subscript`.
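The four cases above can be checked with a small script. This is just a sketch for inspection; the helper name `locate_subscript` is hypothetical and is not part of RPa.

```python
import ast


def locate_subscript(source):
    """Return (statement type, field holding the Subscript, is it a slice?)."""
    stmt = ast.parse(source).body[0]
    if isinstance(stmt, ast.Delete):
        field, sub = "targets", stmt.targets[0]
    elif isinstance(stmt.value, ast.Subscript):
        field, sub = "value", stmt.value
    else:
        field, sub = "targets", stmt.targets[0]
    return type(stmt).__name__, field, isinstance(sub.slice, ast.Slice)


for source in ("a = foo[bar, baz]", "a = foo[bar: baz]",
               "del foo[bar]", "foo[bar] = a"):
    print(source, "->", locate_subscript(source))
```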
These differences produce a serious problem:
In `compiler.ast`, we can use one method, `visitSubscript`, to handle all of the scenarios above,
handling each differently according to `node.flags`. But in `ast.AST`, the handling is spread
across methods like `visit_Subscript`, `visit_Delete` or `visit_Slice`, rather than a single
`visitSubscript`.
The consequence is that we don't know whether an `ast.Subscript` comes from the left side of an assignment,
from an `ast.Delete`, or from the right side of an assignment (we need to apply different rewrite rules in
different situations). My solution is to rewrite an `ast.Subscript` to `_getitem` by default. If
we encounter the `ast.Subscript` in an `ast.Delete` or on the left side of an assignment, we rewrite it
to `_write`.
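A rough sketch of this strategy, assuming Python 3.9 or newer (where `Subscript.slice` is a plain expression node). It is not the code in `src/RestrictionMutator`: the class `SubscriptRewriter`, the helpers `_call`, `_index_expr` and `restricted_exec`, and the demo guards are all hypothetical. Tuple unpacking, augmented assignment, `for` and `with` targets are deliberately not handled here.

```python
import ast


def _call(func_name, *args):
    # Build the AST for `func_name(*args)`.
    return ast.Call(func=ast.Name(id=func_name, ctx=ast.Load()),
                    args=list(args), keywords=[])


def _index_expr(slc):
    # `bar:baz` cannot be passed as an argument, so build slice(bar, baz, None).
    if isinstance(slc, ast.Slice):
        parts = [p if p is not None else ast.Constant(value=None)
                 for p in (slc.lower, slc.upper, slc.step)]
        return _call("slice", *parts)
    return slc


class SubscriptRewriter(ast.NodeTransformer):
    def visit_Subscript(self, node):
        # Default rule: every subscript is treated as a read.
        self.generic_visit(node)
        new = _call("_getitem", node.value, _index_expr(node.slice))
        return ast.copy_location(new, node)

    def _guard_target(self, target):
        # Exception: a subscript that is written to or deleted keeps its
        # Subscript form, but the container is wrapped in _write().
        if isinstance(target, ast.Subscript):
            target.value = _call("_write", self.visit(target.value))
            target.slice = self.visit(target.slice)
            return target
        return self.visit(target)

    def visit_Assign(self, node):
        node.targets = [self._guard_target(t) for t in node.targets]
        node.value = self.visit(node.value)
        return node

    def visit_Delete(self, node):
        node.targets = [self._guard_target(t) for t in node.targets]
        return node


def restricted_exec(source, env):
    tree = SubscriptRewriter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    exec(compile(tree, "<restricted>", "exec"), env)
    return env


# Demo guards that only log the access and then allow it.
log = []


def demo_getitem(obj, index):
    log.append("read")
    return obj[index]


def demo_write(obj):
    log.append("write")
    return obj


env = restricted_exec(
    "a = foo['k']\nb = data[1:3]\nc = foo[1, 2]\n"
    "foo['k'] = 2\nd = foo['k']\ndel foo['k']\n",
    {"_getitem": demo_getitem, "_write": demo_write,
     "foo": {"k": 1, (1, 2): 5}, "data": [0, 1, 2, 3]},
)
print(env["a"], env["b"], env["c"], env["d"], log)
```

The sketch shows why the assignment and delete visitors must handle their targets themselves: if the generic walk reached those subscripts, the default rule would turn them into `_getitem` calls, which are not valid assignment or delete targets.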
Other places have the same issue, such as attribute access in an assignment,
in a for loop, in a `with` statement, in a plain read, etc. Handling these
corner cases cost me a lot of time.
The other problem is that RPc uses a two-stage strategy. The first stage modifies the AST.
The second stage modifies the CPython bytecode directly to do some additional work. RPa needs
to support Pyston, so we can't modify the CPython bytecode or the Pyston bytecode.
I merged these two stages into one. This also brings some serious corner cases.
For more information, please refer to the source code; the key differences
are in `src/RestrictionMutator`.
## Conclusion
The corner cases are not 100% covered, but according to the RP test suite, it seems most
scenarios are covered.
Please feel free to contact me if you have any suggestions about the new implementation,
such as the design or the architecture. Thanks!
===============================
ORIGINAL NOTES
===============================
How it works
============
......