WIP: Add decorators to modify map function behaviour #8
base: master
Conversation
Can you show how you would achieve the same things without these decorators? Perhaps just for one of the examples to start with - I don't want to take up too much of your time. I'd like to see the problem you're trying to solve, but my initial reaction is that you're optimising too much for short code rather than clear code. In particular: …

---
My hope is that, by keeping the extra logic outside of the actual implementations, it's more akin to syntactic sugar than a built-in feature. It's supposed to be unnecessary, not adding something you cannot do without, yet making it shorter to write where it fits. And honestly, one can probably be opinionated in these cases (as with …).

Another pattern I like about the decorators is that these details rest with the defined map function, not with the map call. One could define competing map functions, each having slightly different semantics, and encode these details within them rather than around the map call.

No worries btw, it's an off-time project currently anyway - all the happier if it's also useful.

Here are recent examples, which actually happened when prototyping REMI calibration code. I won't make the effort to have runnable code, just to make a point:

a) Initializer

```python
sort = sort_class([0, 1, 2, 3, 4, 5, 6])
sort.set_scale_factors(0.34, 0.34, 0.386)

def process_train(worker_id, index, train_id, data):
    ...
    sort.set_data(group)
```

The new pattern would allow me to properly create a custom object for each worker in an initializer, and if necessary even de-allocate it.
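To sketch the flavour of this without any pasha machinery (everything below is an illustrative stand-in, not the API implemented in this branch), a generator can host both the init and the teardown around a single yield:

```python
def run_worker(init_gen, kernel, share, worker_id):
    # Toy driver: everything before the `yield` in the initializer runs
    # once per worker, everything after it runs at teardown.
    gen = init_gen(worker_id)
    state = next(gen)
    for index, item in enumerate(share):
        kernel(worker_id, index, item, state)
    next(gen, None)  # resume once more to execute the finalizer

def init(worker_id):
    state = {'seen': 0}  # stands in for expensive per-worker setup
    yield state
    print(f'worker {worker_id} processed {state["seen"]} items')  # teardown

def kernel(worker_id, index, item, state):
    state['seen'] += 1

run_worker(init, kernel, range(5), worker_id=0)
```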
b) Local copies

```python
indices = np.empty((100000,), dtype=np.float64)

def process_train(worker_id, index, train_id, data):
    ...
    edge_idx = cfd_native(trace, indices, ...)
    # indices[:edge_idx] contains relevant data
```
c) Reduction

```python
avg_spectra = psh.array_per_worker((100000,), dtype=np.float64)

def average_run(worker_id, index, train_id, data):
    avg_spectra[worker_id] += data['my_data_source']

psh.map(average_run, run)
avg_spectra = avg_spectra.sum(axis=0) / len(run.train_ids)
```

This one's a fairly common pattern for me in near-online analysis, as it can tremendously accelerate this simple operation to get a mean or sum spectrum of a run (especially on ONC with its SSD storage). I wrote the lines above countless times, and often you do a few processing steps on the raw data before adding it.

---
I agree with you that it's nice to have the different steps together. But I don't particularly like gluing them together with decorators. For the pre/post worker steps, what if you had the option to supply a worker function rather than a 'kernel', so you tie the steps together inside a single function? Rewriting your first example:

```python
import numpy as np
import pasha as psh

outp = psh.alloc(shape=10, dtype=np.int32)

def worker(functor, share, worker_id):
    worker_id_plus_2 = worker_id + 2
    for index, row in functor.iterate(share):
        outp[index] = worker_id_plus_2 - worker_id
        assert worker_id_plus_2 == worker_id + 2

psh.do_work(worker, np.arange(10))
np.testing.assert_allclose(outp, 2)
```

This example looks rather ridiculous now, because … This is possible because you've made a rather nice, general 'functor' interface, so having a 'worker' function with a loop is hardly any more code than a 'kernel' function which pasha calls in a loop. It's also potentially more flexible, e.g. if you want to do some vectorised operations on an array, you might be able to avoid looping, or you can loop over the values twice if that's useful for some reason.
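To make the vectorisation point concrete, here is a pasha-free sketch (assuming, purely for illustration, that a worker's share is an index range into the input):

```python
import numpy as np

data = np.random.rand(1000, 100)
means = np.empty(1000)

def worker_looped(share):
    # kernel-style: one Python-level iteration per row
    for index in share:
        means[index] = data[index].mean()

def worker_vectorised(share):
    # worker-style: one array operation over the whole share
    means[share] = data[share].mean(axis=1)

worker_vectorised(slice(0, 500))
worker_looped(range(500, 1000))
np.testing.assert_allclose(means, data.mean(axis=1))
```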
For making local copies, I'd do it explicitly:

```python
def worker(functor, share, worker_id):
    local_buf = buf.copy()
    for index, row in functor.iterate(share):
        local_buf[:] = worker_id
        np.testing.assert_allclose(local_buf, worker_id)
    local_bufs[worker_id] = local_buf
```

You could have a helper function to simplify copying several values at once (…).
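For instance, such a helper might look like this (local_copies is a made-up name, purely illustrative):

```python
import numpy as np

def local_copies(*arrays):
    # Hypothetical convenience helper: per-worker copies of several
    # template arrays in one call.
    return tuple(a.copy() for a in arrays)

buf = np.zeros(5)
indices = np.zeros(3)

def worker(worker_id):
    local_buf, local_indices = local_copies(buf, indices)
    local_buf[:] = worker_id       # only the copies are modified
    local_indices[:] = worker_id

worker(3)
np.testing.assert_allclose(buf, 0)  # the templates remain untouched
```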
There's a bit more going on in the reduction case. Maybe a decorator-based solution is useful here - I'm still thinking about how this could be done. One thing that might make it clearer is injecting the special variables as arguments to the kernel/worker function, rather than as globals, so you can see that they're coming from the machinery calling it:

```python
@psh.with_reduction('outp')
def kernel(wid, index, row, outp):
    ...
```
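A toy, sequential mock of what that injection could look like (this is a sketch of the idea, not pasha's actual machinery, and with_reduction's real signature may well differ):

```python
import numpy as np

def with_reduction(name, per_worker_buffers):
    # Toy stand-in: wrap a kernel so its per-worker reduction buffer is
    # passed in as an argument instead of being read from a global.
    def decorator(kernel):
        def wrapper(wid, index, row):
            return kernel(wid, index, row, **{name: per_worker_buffers[wid]})
        return wrapper
    return decorator

num_workers = 4
outp_per_worker = np.zeros((num_workers, 10))

@with_reduction('outp', outp_per_worker)
def kernel(wid, index, row, outp):
    outp += row  # accumulates into this worker's buffer only

# sequential stand-in for the parallel map call
for index, row in enumerate(np.ones((100, 10))):
    kernel(index % num_workers, index, row)

outp = outp_per_worker.sum(axis=0)  # the final reduction step
np.testing.assert_allclose(outp, 100)
```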
---

Thanks a lot for your feedback! I understand your preference for making it explicit, and as it turns out, this is definitely something one should add (see below), too. That being said, I'm still a fan of making some syntactic sugar for it. The inspiration of … This is not meant to dismiss your concerns, but maybe you have another idea on how to somehow make these patterns more convenient without the need to be fully explicit?
With the concerns voiced in the beginning, your idea of a worker function turned out very wonderful in another portion of the code I'm working on 😄 By now, I've split the calculation into several map calls, and one of them is living entirely in Cython. With a worker function, one can eliminate even more Python overhead by moving the whole loop into Cython. And that map call is iterating A LOT.
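As a sketch of that shape (process_block stands in for the compiled Cython routine; all names here are illustrative):

```python
import numpy as np

def process_block(traces, out):
    # Stand-in for a Cython routine: the entire chunk is handled in one
    # call, with no Python in the inner loop.
    out[:] = traces.sum(axis=1)

data = np.random.rand(1000, 64)
out = np.empty(1000)

def worker(functor, share, worker_id):
    # One call into compiled code per share, instead of one Python-level
    # kernel call per train.
    process_block(data[share], out[share])

worker(None, slice(0, 1000), 0)
np.testing.assert_allclose(out, data.sum(axis=1))
```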
I'm feeling similarly to the above - it's quite a change for a minor thing like "give me a local copy, please". It's kind of what's happening behind the scenes, but I'm wondering if one can make an expressive interface for it.
That was actually my first version (for all three features), until I figured out the generator trick. I'm not sure I like it anymore though, so this might be a more stable, yet less fancy way to do it.

---
Force-pushed from e729be7 to 5b57124.
Force-pushed from ada49a1 to ff6f4d9.

---
I've developed a few common patterns when using pasha and its spiritual predecessors, which can add up to quite a bit of boilerplate at times and/or not fully work across all context types if implemented lazily. I always wanted to somehow automate these things or at least make them more convenient, but I was wary of making pasha's somewhat clean API uglier with keywords and calls and such. Over the weekend, I had an inspiration on how one might get away with this as a purely optional feature. After some tests, it actually turned into a fully featured API to hook into various parts of the mapping process via decorators.

An example for the already implemented decorators:

a) Initialize/finalize a worker scope
b) Special case of a) for local buffers (see the sketch below)
c) Automate reduction
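As a flavour of b), a minimal stand-alone mock of the local-buffer idea (the decorator here is a toy stand-in, not the implementation in this branch):

```python
import numpy as np

def with_local_copy(template):
    # Toy stand-in: hand each worker its own copy of a template buffer,
    # so no two workers ever write into the same scratch array.
    def decorator(kernel):
        copies = {}
        def wrapper(worker_id, index, item):
            if worker_id not in copies:
                copies[worker_id] = template.copy()
            return kernel(worker_id, index, item, copies[worker_id])
        return wrapper
    return decorator

scratch = np.zeros(4)

@with_local_copy(scratch)
def kernel(worker_id, index, item, local_scratch):
    local_scratch[:] = item  # touches only this worker's private copy

for i in range(8):
    kernel(i % 2, i, float(i))

np.testing.assert_allclose(scratch, 0)  # the template stays untouched
```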
Any thoughts?

(The MR compares against the single_alloc_api branch to keep the diff smaller.)