[ADAP-427] Add support for custom containers while using dataproc serverless #642
Comments
Hey @bveber, excuse my lack of expertise on the topic, but what kind of changes are required on …? Is that a new parameter in the profile that points to a container image, wired to the call dbt makes to dataproc?
Hi @Fleid, you are correct. The custom container can be defined in the runtime_config that is passed to the dataproc call. It expects an image that exists in Container Registry or Artifact Registry with a naming format like …
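To illustrate the idea in the comment above, here is a minimal sketch of a batch definition that carries a custom image in its runtime config. It only builds a plain dict using the snake_case field names of the Dataproc Serverless Batch resource (`pyspark_batch`, `runtime_config`, `container_image`). The helper name `build_batch` and the placeholder image string are assumptions for illustration, not dbt-bigquery's actual code or the naming format referred to above.

```python
from typing import Optional


def build_batch(main_python_file_uri: str, container_image: Optional[str] = None) -> dict:
    """Hypothetical helper: build a dict shaped like a Dataproc Serverless Batch."""
    batch = {
        "pyspark_batch": {"main_python_file_uri": main_python_file_uri},
        "runtime_config": {},
    }
    # Only set the image when the user configured one; otherwise the
    # standard Dataproc Serverless runtime image is used.
    if container_image:
        batch["runtime_config"]["container_image"] = container_image
    return batch


# Placeholder values; substitute a real script URI and image reference.
batch = build_batch("gs://<bucket>/<model>.py", container_image="<your-image-uri>")
```

A dict like this could then be handed to the Dataproc client when creating the batch; leaving `container_image` unset keeps the default behaviour unchanged.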
Sweet! It looks like you would be able to contribute a PR here?
I'm happy to raise a PR for this. I do have another related feature request for the dataproc serverless integration, specifically the ability to pass in custom Spark properties. Currently the number of executor instances is hard-coded and it would be nice to be able to define this in the model config as well as any other relevant properties. Should I open a new issue to track that change or should I try to implement it in my upcoming PR?
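As a rough sketch of the request above, user-supplied Spark properties could be layered over the adapter's defaults so that a model config can override the executor count. `spark.executor.instances` is a standard Spark property; the default value of `"2"`, the constant `DEFAULT_SPARK_PROPERTIES`, and the helper `merge_spark_properties` are assumptions for illustration only.

```python
from typing import Dict, Optional

# Assumed default for illustration; the actual hard-coded value may differ.
DEFAULT_SPARK_PROPERTIES: Dict[str, str] = {"spark.executor.instances": "2"}


def merge_spark_properties(overrides: Optional[dict] = None) -> Dict[str, str]:
    """Hypothetical helper: defaults first, then user overrides win."""
    merged = dict(DEFAULT_SPARK_PROPERTIES)
    # Dataproc expects property values as strings, so coerce ints and the like.
    merged.update({key: str(value) for key, value in (overrides or {}).items()})
    return merged


properties = merge_spark_properties(
    {"spark.executor.instances": 4, "spark.executor.memory": "8g"}
)
```

The merged mapping would go into the `properties` field of the batch's runtime config, alongside any custom container image.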
Thank you everyone for all of the conversation here! I'm closing this out because it is not something we will be prioritizing on our roadmap for now. We are going to revisit our Spark strategy this year, but that will not include expanding Dataproc support in dbt-bigquery for Python models.
Is this your first time submitting a feature request?
Describe the feature
I would like to use a custom container for my dataproc serverless jobs.
Describe alternatives you've considered
Dataproc serverless does not offer the ability to install extra packages in its standard runtime. The only other way I could add unsupported dependencies is by using a cluster.
Who will this benefit?
Users creating python models with additional dependencies not supported by the standard dataproc serverless runtime.
Are you interested in contributing this feature?
I have a working fork but haven't run the test suite yet.
Anything else?
No response