Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rfc): Write RFC 0001 to propose task chunking support #54

Open
wants to merge 1 commit into
base: mainline
Choose a base branch
from

Conversation

mwiebe
Copy link

@mwiebe mwiebe commented Dec 2, 2024

Request for comment on task chunking proposal

Tracking Issue: #53

Chunking frames together is a commonly supported solution for amortizing application and scene loading overhead, and Open Job Description should support it. This RFC adds chunking to job templates with the goal that it be easy to author and read templates that work well with both contiguous frame ranges and pick up renders, and is not to much work to implement.

Here's a link directly to the rendered RFC: https://github.com/mwiebe/openjd-specifications/blob/task-chunking-rfc/rfcs/0001-task-chunking.md


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Ahuge
Copy link
Contributor

Ahuge commented Dec 3, 2024

Love this RFC, thanks Mark.

Question for you, do we have a reference to the rangeConstraint value to determine if the {{Task.Param.Frame}} value is a NONCONTIGUOUS or a CONTIGUOUS value?

Perhaps that isn't something we need to worry about, I am just getting my head around this RFC but I wanted to ask if that had been considered.

**Edit the rangeConstraint determines how OpenJD will pass chunks to the task, that makes sense.
The CONTIGUOUS vs NONCONTIGUOUS value refers to if the task/adaptor supports NONCONTIGUOUS ranges.

I was implementing a POC for the NukeAdaptor and by default it currently only supports (\d+)-(\d+) frame ranges (CONTIGUOUS) but because we're running an adaptor workflow, we could add NONCONTIGUOUS functionality

Please correct me if I am wrong in that assumption.

**Perhaps having a third example would help. Where the frame range passed to the job parameter is a complex/NONCONTIGUOUS range but setting the rangeConstraint value to be CONTIGUOUS shows that the task only ever receives CONTIGUOUS chunks

@mwiebe
Copy link
Author

mwiebe commented Dec 3, 2024

Love this RFC, thanks Mark.

Awesome, thanks!

Question for you, do we have a reference to the rangeConstraint value to determine if the {{Task.Param.Frame}} value is a NONCONTIGUOUS or a CONTIGUOUS value?

I expect that the person setting the rangeConstraint and the person writing the step script are the same, so for a particular step it would always be one or the other. Do you have an idea about when you might reach for that value programmatically from the step script, versus just implementing one or the other behavior directly?

Perhaps that isn't something we need to worry about, I am just getting my head around this RFC but I wanted to ask if that had been considered.

I had considered a slightly different angle, of whether it makes sense for someone submitting a job to control this option. My conclusion was it's better not to, and the "@fmtstring selection" Design Rationale section talks about it.

@mwiebe
Copy link
Author

mwiebe commented Dec 3, 2024

I was implementing a POC for the NukeAdaptor and by default it currently only supports (\d+)-(\d+) frame ranges (CONTIGUOUS) but because we're running an adaptor workflow, we could add NONCONTIGUOUS functionality

Please correct me if I am wrong in that assumption

Yes this is exactly the idea. Then when someone submits a pickup render with a few scattered frames, it can run with chunks like "7,12,25".

@mwiebe
Copy link
Author

mwiebe commented Dec 3, 2024

**Perhaps having a third example would help. Where the frame range passed to the job parameter is a complex/NONCONTIGUOUS range but setting the rangeConstraint value to be CONTIGUOUS shows that the task only ever receives CONTIGUOUS chunks

This is a good idea, maybe I can do that in the first example by setting a non-contiguous default there with a comment explaining how it will get chunked.

UPDATE: I've changes the first example's default frames to something noncontiguous and added a description of likely chunk selection. Thanks for the suggestion!

@mwiebe mwiebe force-pushed the task-chunking-rfc branch 2 times, most recently from eaf88bf to eb5b459 Compare December 3, 2024 22:51
+ as chunks.
+ 3. *range* — The list of values that the parameter takes on to define Tasks of the Step.
+ 4. *chunks* — Specifies how to form sets of values into chunks.
+ 1. *defaultTaskCount* — How many tasks to combine into a single chunk by default.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using the default prefix? Is it because this value can be overridden at runtime based on targetRuntimeSeconds or by the remaining frames in a range? If so, is this the same as maxTaskCount?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing the proposal! It's called default to indicate the scheduler should use this value if it doesn't have a better value to use. It's not the same as maxTaskCount because the scheduler can choose larger values, like when chunks are finishing quicker than in targetRuntimeSeconds.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could consider renaming it to initialTaskCount or even initialChunkSize to emphasize that it's the starting point for chunk sizing and avoid confusion with optional parameters that have predefined defaults?

* Also fix rfc.md issue link and missing `about:` in markdown issue
  templates.
* Add a "Design Choice Rationale" section to the template RFC, as it was
  a useful section for this RFC.
* Show usage of RFC 0002, marking it as an extension of the current
  revision.

Signed-off-by: Mark Wiebe <[email protected]>
+ chunks:
+ defaultTaskCount: <integer> | <intstring> # @fmtstring
+ targetRuntimeSeconds: <integer> | <intstring> # @optional @fmtstring
+ rangeConstraint: "CONTIGUOUS" | "NONCONTIGUOUS"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to make rangeContraint optional and have a default value of NONCONTIGUOUS? Pedantically, the NONCONTIGUOUS option is a lack of a constraint. I'd guess that NONCONTIGUOUS would be the much more common option, so it'd simplify the parameter definition for many templates.

Copy link
Author

@mwiebe mwiebe Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it required so that templates will be more understandable without having to know what the default is. I would have picked CONTIGUOUS as the default, and NONCONTIGUOUS as the more general option you can opt into. I was thinking of CLI commands that have start/end frame options.


![chunk example runtime](.images/0001-task-chunking-chunk-example-runtime.svg)

Here's an example timeline for running 15 tasks with this chunking on one worker host,
Copy link

@eherozhao eherozhao Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I misunderstand this.
Does these 15 tasks mean 15 frames here? or openjd Tasks? I am thinking that we are talking about 3 openjd tasks with 6 frames chunk size but I am a bit confused about the terminalogy here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, task == frame here, so it means 15 tasks and 15 frames. A chunk of 6 tasks is then also a chunk of 6 frames.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants