Commit 05128543, authored Oct 03, 2017 by David Wilson
importer: module preloading docs
parent b3bd075d
Showing 2 changed files with 112 additions and 7 deletions:

docs/howitworks.rst (+92, -7)
docs/index.rst (+20, -0)

docs/howitworks.rst
@@ -270,9 +270,12 @@ Masters listen on the following handles:
.. data:: mitogen.core.GET_MODULE

-    Receives `(reply_to, fullname)` 2-tuples, looks up the source code for the
-    module named ``fullname``, and writes the source along with some metadata
-    back to the handle ``reply_to``. If lookup fails, ``None`` is sent instead.
+    Receives the name of a module to load `fullname`, locates the source code
+    for ``fullname``, and routes one or more ``LOAD_MODULE`` messages back
+    towards the sender of the ``GET_MODULE`` request. See below for a longer
+    discussion of ``GET_MODULE``/``LOAD_MODULE``.
+
+    If lookup fails, ``None`` is sent instead.

.. data:: mitogen.core.ALLOCATE_ID

@@ -285,6 +288,25 @@ Masters listen on the following handles:
Children listen on the following handles:

.. _LOAD_MODULE:
.. data:: mitogen.core.LOAD_MODULE

    Receives `(pkg_present, path, compressed, related)` tuples, composed of:

    * **pkg_present**: Either ``None`` for a plain ``.py`` module, or a list of
      canonical names of submodules existing within this package. For example,
      a ``LOAD_MODULE`` for the ``mitogen`` package would return a list like:
      `["mitogen.core", "mitogen.fakessh", ..]`. This list is used by children
      to avoid generating useless round-trips due to Python 2.x's ``import``
      statement behavior.
    * **path**: Original filesystem path where the module was found on the
      master.
    * **compressed**: :py:mod:`zlib`-compressed module source code.
    * **related**: list of canonical module names on which this module appears
      to depend. Used by children that have ever started any children of their
      own to preload those children with ``LOAD_MODULE`` messages in response
      to a ``GET_MODULE`` request.

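The tuple layout described above can be illustrated with a minimal sketch. The helper names ``pack_load_module`` and ``unpack_load_module`` are hypothetical and are not part of Mitogen; the sketch assumes UTF-8 source text and only shows how the four fields relate, not the real wire encoding.

```python
import zlib


def pack_load_module(source, path, pkg_present=None, related=()):
    """Hypothetical helper: build a LOAD_MODULE-style 4-tuple.

    pkg_present is None for a plain module, or a list of submodule
    names when the module is a package.
    """
    compressed = zlib.compress(source.encode("utf-8"))
    return (pkg_present, path, compressed, list(related))


def unpack_load_module(payload):
    """Hypothetical helper: decode the 4-tuple on the receiving side."""
    pkg_present, path, compressed, related = payload
    return {
        "is_package": pkg_present is not None,
        "submodules": pkg_present or [],
        "path": path,
        "source": zlib.decompress(compressed).decode("utf-8"),
        "related": related,
    }
```

A receiver could treat ``is_package`` as the signal to remember ``submodules`` for later negative-import checks, and ``related`` as the list of names worth forwarding to its own children.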
.. _CALL_FUNCTION:
.. data:: mitogen.core.CALL_FUNCTION
Receives `(mod_name, class_name, func_name, args, kwargs)`
@@ -312,6 +334,7 @@ Children listen on the following handles:
to it, and arranging for the connection to its parent to be closed shortly
thereafter.
.. _ADD_ROUTE:
.. data:: mitogen.core.ADD_ROUTE
Receives `(target_id, via_id)` integer tuples, describing how messages
@@ -470,9 +493,10 @@ Python if it can satisfy the import by itself, and if not, indicating to Python
that it is capable of loading the module.
In :py:meth:`load_module() <mitogen.core.Importer.load_module>` an RPC is
-started to the parent context, requesting the module source code. Once the
-source is fetched, the method builds a new module object using the best
-practice documented in PEP-302.
+started to the parent context, requesting the module source code by way of a
+``GET_MODULE``. If the parent context does not have the module available, it
+recursively forwards the request upstream, while avoiding duplicate requests
+for the same module from its own threads and any child contexts.

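One way to picture the duplicate-request avoidance mentioned above is a small coalescing cache: the first caller for a module name performs the upstream fetch, and every concurrent caller for the same name waits for that result. This is only a sketch; ``RequestCoalescer`` and ``fetch_upstream`` are hypothetical names, and Mitogen's actual importer is not shown here.

```python
import threading


class RequestCoalescer:
    """Hypothetical sketch: at most one upstream fetch per module name."""

    def __init__(self, fetch_upstream):
        self._fetch_upstream = fetch_upstream  # callable(fullname) -> source
        self._lock = threading.Lock()
        self._cache = {}    # fullname -> fetched result (None on failure)
        self._pending = {}  # fullname -> Event set once the fetch completes

    def get(self, fullname):
        with self._lock:
            if fullname in self._cache:
                return self._cache[fullname]
            event = self._pending.get(fullname)
            owner = event is None
            if owner:
                event = self._pending[fullname] = threading.Event()

        if owner:
            result = None
            try:
                result = self._fetch_upstream(fullname)
            finally:
                # Publish the result (or None if the fetch raised) and
                # wake every thread waiting on the same name.
                with self._lock:
                    self._cache[fullname] = result
                    del self._pending[fullname]
                event.set()
            return result

        event.wait()
        with self._lock:
            return self._cache[fullname]
```

The same idea extends to requests arriving from child contexts: they would simply be additional callers of ``get()`` for the same name.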
Neutralizing ``__main__``
@@ -510,7 +534,7 @@ In Python 2.x, Python will first try to load ``mypkg.sys`` and ``mypkg.os``,
which do not exist, before falling back on :py:mod:`sys` and :py:mod:`os`.
These negative imports present a challenge, as they introduce a large number of
-pointless network roundtrips. Therefore in addition to the
+pointless network round-trips. Therefore in addition to the
:py:mod:`zlib`-compressed source, for packages the master sends along a list of
child modules known to exist.

@@ -521,6 +545,67 @@ module does not appear in the enumeration of child modules belonging to the
package that was provided by the master.

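The negative-import shortcut can be sketched as a simple membership test against the submodule list the master supplied for each package. The function name ``needs_round_trip`` and the ``present_by_package`` mapping are hypothetical, chosen only for illustration.

```python
def needs_round_trip(fullname, present_by_package):
    """Return False when the import can be refused locally.

    present_by_package maps a package name to the list of submodule
    names the master reported for it (hypothetical structure).
    """
    package, _, _ = fullname.rpartition(".")
    present = present_by_package.get(package)
    if present is None:
        # Nothing is known about the parent package, so the parent
        # context has to be asked.
        return True
    # The master enumerated this package: anything not listed is absent.
    return fullname in present
```

With ``{"mypkg": ["mypkg.models"]}``, a lookup of ``mypkg.sys`` is refused locally, while ``mypkg.models`` and the top-level ``sys`` still go through the normal path.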
Import Preloading
#################

To further avoid round-trips, when a module or package is requested by a child,
its bytecode is scanned in the master to find all the module's ``import``
statements, and of those, which associated modules appear to have been loaded
in the master's :py:data:`sys.modules`.

The :py:data:`sys.modules` check is necessary to handle various kinds of
conditional execution, for example, when a module's code guards an ``import``
statement based on the active Python runtime version, operating system, or
optional third party dependencies.

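As a rough illustration of the :py:data:`sys.modules` check, the sketch below filters a list of imported names (however they were obtained) down to those that are actually loaded. ``loaded_in_master`` is a hypothetical helper, not a Mitogen function.

```python
import sys


def loaded_in_master(imported_names, modules=None):
    """Keep only names that are present, and not None, in sys.modules.

    An import guarded by a version or platform check that never ran
    leaves no entry behind, so it is filtered out here.
    """
    modules = sys.modules if modules is None else modules
    return [name for name in imported_names
            if modules.get(name) is not None]
```

The optional ``modules`` argument exists only so the behaviour can be exercised against a plain dictionary.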
Before replying to a child's request for a module with dependencies:

* If the request is for a package, any dependent modules used by the package
  that appear within the package itself are known to be missing from the child,
  since the child requested the top-level package module. They are therefore
  pre-loaded into the child using ``LOAD_MODULE`` messages before sending the
  ``LOAD_MODULE`` message for the requested package module itself. In this way,
  the child will already have dependent modules cached by the time it receives
  the requested module, avoiding one round-trip for each dependency.

  For example, when a child requests the ``django`` package, and the master
  determines the ``django`` module code in the master has import statements for
  ``django.utils``, ``django.utils.lru_cache``, and ``django.utils.version``,
  and that execution of the module code on the master caused those modules to
  appear in the master's :py:data:`sys.modules`, there is a high probability
  that execution of the ``django`` module code in the child will cause the same
  modules to be loaded. Since all those modules exist within the ``django``
  package, and we already know the child lacks that package, it is safe to
  assume the child will make follow-up requests for those modules too.

  In the example, this replaces 4 round-trips with 1 round-trip.

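The package case can be sketched as follows: from the names the package imports, keep those that live inside the package, and send them ahead of the package itself. ``preload_order`` is a hypothetical helper used only to make the ordering concrete.

```python
def preload_order(requested_package, related):
    """Return module names in the order LOAD_MODULE messages would go out.

    Dependencies inside the requested package come first, followed by
    the package itself, so the child has them cached when it executes
    the package's code.
    """
    prefix = requested_package + "."
    inside = [name for name in related if name.startswith(prefix)]
    return inside + [requested_package]
```

For the ``django`` example above, the result holds four names, which corresponds to the four round-trips that collapse into one.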
For any package module ever requested by a child, the parent keeps a note of
the name of the package for one final optimization:

* If the request is for a sub-module of a package, and it is known the child
  loaded the package's implementation from the parent, then any dependent
  modules of the requested module at any nesting level within the package that
  are known to be missing are sent using ``LOAD_MODULE`` messages before
  sending the ``LOAD_MODULE`` message for the requested module, avoiding 1
  round-trip for each dependency within the same top-level package.

  For example, when a child has previously requested the ``django`` package,
  the parent knows it was completely absent on the child. Therefore when the
  child subsequently requests the ``django.db`` module, it is safe to assume
  the child will generate subsequent ``GET_MODULE`` requests for the 2
  ``django.conf``, 3 ``django.core``, 2 ``django.db``, 3 ``django.dispatch``,
  and 7 ``django.utils`` indirect dependencies for ``django.db``.

  In the example, this replaces 17 round-trips with 1 round-trip.

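The sub-module case amounts to a transitive walk over known dependencies, restricted to the same top-level package and skipping anything already sent. The sketch below assumes a hypothetical ``related_map`` dictionary from module name to the names it imports; ``transitive_package_deps`` is likewise an illustrative name.

```python
def transitive_package_deps(fullname, related_map, already_sent=()):
    """List in-package dependencies of fullname, dependencies first.

    related_map maps a module name to the module names it imports
    (hypothetical structure). Names outside fullname's top-level
    package, and names in already_sent, are skipped.
    """
    top_prefix = fullname.split(".", 1)[0] + "."
    sent = set(already_sent)
    order = []

    def visit(name):
        for dep in related_map.get(name, ()):
            if dep == fullname or dep in sent:
                continue
            if not dep.startswith(top_prefix):
                continue
            sent.add(dep)  # mark before recursing so cycles terminate
            visit(dep)
            order.append(dep)

    visit(fullname)
    return order + [fullname]
```

The returned list ends with the requested module, mirroring the rule that its own ``LOAD_MODULE`` message is sent last.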
The method used to detect import statements is similar to the standard library
:py:mod:`modulefinder` module: rather than analyze module source code,
``IMPORT_NAME`` opcodes are extracted from the module's bytecode. This is
because clean source analysis methods (:py:mod:`ast` and :py:mod:`compiler`)
are an order of magnitude slower, and incompatible across major Python
versions.

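To make the bytecode approach concrete, here is a minimal sketch that collects ``IMPORT_NAME`` arguments from a code object. It relies on ``dis.get_instructions()``, which exists only in Python 3.4 and later, so it is an illustration of the technique rather than the scanner used by Mitogen; ``scan_imports`` is a hypothetical name.

```python
import dis
import types


def scan_imports(code):
    """Return names referenced by IMPORT_NAME opcodes in a code object.

    Nested code objects (function and class bodies) are scanned too,
    since imports may appear inside them.
    """
    names = []
    for instr in dis.get_instructions(code):
        if instr.opname == "IMPORT_NAME":
            names.append(instr.argval)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names.extend(scan_imports(const))
    return names
```

A caller would typically pass the result of ``compile(source, path, "exec")``. Note that the scan reports every import that appears in the bytecode, including ones guarded by conditions that never run.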
Child Module Enumeration
########################
docs/index.rst
@@ -142,6 +142,26 @@ further effort.
.. _py2exe: http://www.py2exe.org/
Common sources of import latency and bandwidth consumption are mitigated:

* Modules need only be uploaded once per directly connected context. Subsequent
  requests for modules from children of that context will be served by the
  child itself.

* Imports by threads within a context triggering a load are deduplicated and
  joined with any identical requests triggered by other threads in the same
  context and children in the context's subtree.

* No round-trip is required for negative responses due to Python 2's import
  statement semantics: children have a list of submodules belonging to a
  package, and ignore requests for submodules that did not exist on the master.

* Imports are extracted from each module, compared to those found in memory,
  and recursively preloaded into children requesting that module, minimizing
  round-trips to one per package nesting level. For example,
  :py:mod:`django.db.models` only requires 3 round-trips to transfer 456KiB,
  representing 1.7MiB of uncompressed source split across 148 modules.

SSH Client Emulation
####################