
[Feature] support copy multiple tables in parallel using copy_partitions #559

Open
3 tasks done
Klimmy opened this issue May 15, 2024 · 0 comments · May be fixed by dbt-labs/dbt-bigquery#1413
Labels
feature:python-models Issues related to python models pkg:dbt-bigquery Issue affects dbt-bigquery type:enhancement New feature request

Comments


Klimmy commented May 15, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

The Python BigQuery client supports asynchronous copy jobs, but the dbt-bigquery adapter sends BigQuery copy requests one by one (when using incremental_strategy = 'insert_overwrite' with copy_partitions = true).

We could achieve better performance by sending requests in small batches of partitions.

dbt-bigquery already supports parallel execution in the copy_bq_table function, but the bq_copy_partitions macro sends partitions to it one at a time.

We could probably implement this feature by introducing a batch_size argument to the config:

{{ config(
    materialized = 'incremental',
    incremental_strategy = 'insert_overwrite',
    partition_by = {
      "field": "day",
      "data_type": "date",
      "copy_partitions": true,
      "batch_size": 5
    }
) }}

The default value would be 1. The bq_copy_partitions macro would then pass lists of partitions to copy_bq_table, where each list contains at most batch_size partitions.
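To illustrate the idea, here is a rough sketch of the batching logic in plain Python. The names copy_partitions_in_batches, chunked, and start_copy_job are hypothetical, not the adapter's actual API; in dbt-bigquery the job submission would go through copy_bq_table and the BigQuery client's copy jobs, which return a job handle whose result() blocks until the copy finishes.

```python
from typing import Callable, Iterator, List


def chunked(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive batches of at most batch_size partitions."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def copy_partitions_in_batches(
    partitions: List[str],
    batch_size: int,
    start_copy_job: Callable[[str], "object"],
) -> None:
    """Submit copy jobs batch by batch instead of one partition at a time.

    start_copy_job stands in for kicking off an asynchronous BigQuery copy
    job for a single partition; it must return a job-like object with a
    blocking result() method, mirroring google.cloud.bigquery's CopyJob.
    """
    for batch in chunked(partitions, batch_size):
        # Start every job in the batch without waiting on any of them...
        jobs = [start_copy_job(partition) for partition in batch]
        # ...then wait for the whole batch to finish before moving on.
        for job in jobs:
            job.result()
```

With batch_size = 1 this degenerates to the current sequential behavior, so the change would be backward compatible by default.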

Describe alternatives you've considered

No response

Who will this benefit?

Anyone who has a large number of heavy BigQuery partitions.

Are you interested in contributing this feature?

Definitely, just need a green light to proceed

Anything else?

No response

@Klimmy Klimmy added type:enhancement New feature request triage:product In Product's queue labels May 15, 2024
@amychen1776 amychen1776 added feature:python-models Issues related to python models and removed triage:product In Product's queue labels Aug 27, 2024
@mikealfare mikealfare added the pkg:dbt-bigquery Issue affects dbt-bigquery label Jan 14, 2025
@mikealfare mikealfare transferred this issue from dbt-labs/dbt-bigquery Jan 14, 2025