Lowering the overhead of Cyberbrain #58
Benchmark results (all tests are executed with Python 3.8) from:

hyperfine -m 5 "python3.8 -m examples.password.password examples/password/sonnets/* -s 1 --l33t"

Before optimization:

Benchmark #1: python -m examples.password.password examples/password/sonnets/* -s 1 --l33t
  Time (mean ± σ):     13.538 s ±  1.564 s    [User: 11.830 s, System: 0.504 s]
  Range (min … max):   11.626 s … 15.674 s    5 runs

After 6ee0e94:

After af9b0d4:
In my humble opinion, there is no need to save every invocation in the frame every time. Skipping that could save a lot of memory and time.
Hi @linw1995, thanks for the suggestions. Collecting less information is definitely one option we should consider. There are probably a few misunderstandings here, so I'd like to clarify and see what you think.

First, Cyberbrain currently does not step into each function call. So if you have a function

@trace
def f():
    g()
    h()

Cyberbrain will only know that g() and h() get called, but will not trace what happens inside g() or h(). I think what you have in mind is multi-frame tracing, which is the major feature planned for v2.

Second, let's say we want to apply this optimization to multi-frame tracing. How do we know whether a function is pure? Note that even if we can analyze the Python code, we don't know what happens at the C level. The analysis itself might also take a lot of time.

Third, this is not possible: Cyberbrain does not store the original value at each step, because you can't deepcopy unpicklable objects. Instead, we convert the value to JSON immediately and just store the JSON (which is the "JSON pickle" step mentioned in the main thread). Ultimately a JSON representation is what users see in the devtools console, so there's no need to store the original object either.
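To illustrate the "convert to JSON immediately" point, here is a minimal sketch using plain jsonpickle (not Cyberbrain's actual internals). The captured JSON string is immutable, so later mutation of the object cannot affect the stored snapshot:

import jsonpickle  # pip install jsonpickle

state = {"count": 1, "items": ["a"]}
snapshot = jsonpickle.encode(state)   # JSON string captured at this moment
state["items"].append("b")            # later mutation doesn't touch the snapshot
print(snapshot)                       # {"count": 1, "items": ["a"]}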
@laike9m thanks for the clarification. Yes, I am thinking about multi-frame tracing. In my vision, we can do this, or get close enough. One way to lower the overhead of Cyberbrain in multi-frame tracing is to cut some tracing branches. There are two types of branches that can be cut.

Pure function calls: every invocation produces a snapshot of the variables currently in use. If we collect snapshots before and after the invocation in the current frame, and the invocation contains no non-pure calls, the detailed events inside the deeper frame can be re-calculated.

About determining non-pure invocations in a frame: yes, we cannot do that automatically. Maybe let the user decide which calls are pure calculations, to cut some tracing branches:

def add(a, b):
    return a + b

def print_time():
    print(time.time())

@trace(pure=(add,), depth=2)
def multiply(a, b):
    answer = 0
    print_time()
    for _ in range(a):
        answer = add(answer, b)
    return answer

In the above example, we only need to save the arguments and the return value of the add calls. There is no need for deep copying everything. The snapshots need further discussion.
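As a rough illustration of this proposal (a minimal sketch only; record_pure_call is a hypothetical helper, not Cyberbrain's API), recording just the arguments and return value of a user-declared pure function could look like this:

import functools

def record_pure_call(fn, events):
    # Hypothetical helper: instead of tracing every event inside a function
    # the user has declared pure, record only its arguments and return value.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        events.append({"func": fn.__name__, "args": args, "kwargs": kwargs, "return": result})
        return result
    return wrapper

def add(a, b):
    return a + b

events = []
add = record_pure_call(add, events)
print(add(1, 2))   # 3
print(events)      # [{'func': 'add', 'args': (1, 2), 'kwargs': {}, 'return': 3}]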
After 9789ab0 (replaced protobuf with msgpack):

py-spy result: message encoding is not a bottleneck anymore.
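For context, a minimal msgpack round-trip looks like the following (illustrative only; the event dict here is made up and is not Cyberbrain's actual message schema):

import msgpack  # pip install msgpack

event = {"type": "binding", "target": "answer", "value": 42}
packed = msgpack.packb(event)     # compact binary encoding
print(msgpack.unpackb(packed))    # {'type': 'binding', 'target': 'answer', 'value': 42}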
Cyberbrain adds a huge overhead to program execution, both in time spent and memory usage. This issue is for discussing possible improvements.
Time
Profiled one run with py-spy; here's the result: https://laike9m.github.io/images/cyberbrain_profile.svg
I only did a brief check. In summary, the overhead of sys.settrace is smaller than expected; it took up ~1/6 of the extra time.

Major time-consuming operations:

- parameters = inspect.signature(handler).parameters in value_stack.py. Kinda unexpected (a possible caching fix is sketched below).
- The log function in logger.py. This is also unexpected.

Apparently there are some low-hanging fruits, and we should fix them first.
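Here is a minimal sketch of one possible fix for the inspect.signature cost (my own assumption, not a committed change): since the set of handlers is fixed and functions are hashable, the signature lookup can be computed once per handler and cached:

import functools
import inspect

@functools.lru_cache(maxsize=None)
def handler_parameters(handler):
    # inspect.signature() is surprisingly expensive; compute it once per
    # handler and reuse the result on every subsequent event.
    return inspect.signature(handler).parameters

def example_handler(frame, value):
    pass

print(list(handler_parameters(example_handler)))  # ['frame', 'value']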
Ultimately, we need to rewrite part of Cyberbrain in C/C++. There are many options, but I'd like to automate it as much as I can, so I will first look into Nuitka and mypyc. If they don't work well, Cython is also a good option.
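For reference, compiling selected modules with mypyc can be wired up through setuptools roughly like this (a sketch; the module paths are hypothetical and only for illustration):

# setup.py
from setuptools import setup
from mypyc.build import mypycify

setup(
    name="cyberbrain",
    # Hypothetical list of hot modules; mypyc compiles them to C extensions
    # without requiring a manual rewrite.
    ext_modules=mypycify(["cyberbrain/value_stack.py", "cyberbrain/logger.py"]),
)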
Probably the only good news is that sys.settrace only contributes a small portion of the overhead, so in the short term I won't bother replacing it. Once we've optimized the other parts to the point where sys.settrace becomes the majority of the overhead, we'll come back to it.

Optimize JSON pickle
Cyberbrain uses the jsonpickle library to convert Python objects to JSON so that they can be displayed in the devtools console. jsonpickle is pure Python and really slow; it took ~23% of the total time, which makes it the biggest performance bottleneck.
The to-JSON conversion can't be parallelized, since we have to do it before the original object gets modified. Thus the only way left is to speed up the library. Some options I've considered or tried:

- [ ] Propose to use orjson as the preferred backend jsonpickle/jsonpickle#326
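One of the cheaper experiments (a sketch, under the assumption that raw JSON dumping rather than jsonpickle's own object flattening is where the time goes) is to point jsonpickle at a faster backend. The example uses ujson instead of orjson because ujson's dumps returns str, which keeps the swap trivial:

import jsonpickle  # pip install jsonpickle ujson

# Register ujson as a backend and prefer it for dumps/loads.
jsonpickle.load_backend("ujson", "dumps", "loads", ValueError)
jsonpickle.set_preferred_backend("ujson")

print(jsonpickle.encode({"answer": 42}))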
Memory
TBD