
[feature] GCPC CustomTrainingJobOp should support runtime parameters for worker pool spec args #11504

adamActable opened this issue Jan 8, 2025 · 0 comments


What feature would you like to see?

Recent work (#10883) enabled runtime parameter expansion for certain arguments of this operator, CustomTrainingJobOp.

This is great, but it gives the user the impression that such expansion also works in other parts of worker_pool_specs. Unfortunately, it does not. For example, the following pipeline:

from kfp import dsl
from google_cloud_pipeline_components.v1.custom_job import CustomTrainingJobOp


@dsl.pipeline(
    name="example",
)
def pipeline(
    some_arg: str,
    machine_type: str = "n1-standard-16",
):
    job = CustomTrainingJobOp(
        display_name="foo",
        worker_pool_specs=[
            {
                "machineSpec": {
                    ### working, nice 
                    "machineType": machine_type,
                    ### 
                },
                "replicaCount": "1",
                "diskSpec": {"bootDiskType": "", "bootDiskSizeGb": 100},
                "containerSpec": {
                    "imageUri": "...",
                    "command": ["..."],
                    ### does not work
                    "args": ["--some_arg=%s" % some_arg],
                    ###
                },
            }
        ],
        timeout="604800s",
        project="...",
    )

produces the following YAML:

# PIPELINE DEFINITION
# Name: example
# Inputs:
#    machine_type: str [Default: 'n1-standard-16']
#    some_arg: str
components:
...
              runtimeValue:
                constant:
                - containerSpec:
                    args:
                    # issue
                    - --some_arg={{channel:task=;name=some_arg;type=String;}}
                    command:
                    - '...'
                    imageUri: '...'
                  diskSpec:
                    bootDiskSizeGb: 100.0
                    bootDiskType: ''
                  machineSpec:
                    # functional after merge of #10883
                    machineType: '{{$.inputs.parameters[''pipelinechannel--machine_type'']}}'
                  replicaCount: '1'
...

What is the use case or pain point?

I need to use Vertex AI Pipelines to deploy nearly identical jobs, differing only in the arguments supplied to the containers in those jobs.
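For context, the goal would be to compile the pipeline once and submit it repeatedly with different container arguments via parameter_values. A minimal sketch of the intended usage, assuming some_arg could flow through to containerSpec.args (the template path and values are placeholders):

from google.cloud import aiplatform

# One compiled template, many nearly-identical jobs that differ only in the
# arguments handed to the training container.
for value in ["variant-a", "variant-b"]:
    aiplatform.PipelineJob(
        display_name="example",
        template_path="pipeline.yaml",  # compiled from the pipeline above
        parameter_values={"some_arg": value},  # would end up in containerSpec.args
    ).submit()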

Is there a workaround currently?

I can't find a reasonable workaround; the pipeline code has to be duplicated for each set of container arguments, as sketched below.
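For reference, that duplication looks roughly like the following: a pipeline factory that bakes some_arg in at compile time, so each argument set produces its own compiled template (function and file names are illustrative):

from kfp import compiler, dsl
from google_cloud_pipeline_components.v1.custom_job import CustomTrainingJobOp


def make_pipeline(some_arg: str):
    # some_arg is fixed at compile time, so it can be spliced into
    # containerSpec.args as a plain string instead of a pipeline channel.
    @dsl.pipeline(name=f"example-{some_arg}")
    def pipeline(machine_type: str = "n1-standard-16"):
        CustomTrainingJobOp(
            display_name="foo",
            worker_pool_specs=[
                {
                    "machineSpec": {"machineType": machine_type},
                    "replicaCount": "1",
                    "diskSpec": {"bootDiskType": "", "bootDiskSizeGb": 100},
                    "containerSpec": {
                        "imageUri": "...",
                        "command": ["..."],
                        "args": [f"--some_arg={some_arg}"],
                    },
                }
            ],
            timeout="604800s",
            project="...",
        )

    return pipeline


# One compiled template per argument value.
for value in ["variant-a", "variant-b"]:
    compiler.Compiler().compile(make_pipeline(value), f"pipeline_{value}.yaml")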


Love this idea? Give it a 👍.
