From cff992c9585a5d344fe5241ba1e39a9403b38d03 Mon Sep 17 00:00:00 2001
From: Christopher Fiore
Date: Thu, 9 Jan 2025 14:09:47 -0800
Subject: [PATCH 1/5] Update metrics-overview.md

Update Cumulative metrics example to latest format.
---
 website/docs/docs/build/metrics-overview.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/website/docs/docs/build/metrics-overview.md b/website/docs/docs/build/metrics-overview.md
index e874dced63a..93a33759dd5 100644
--- a/website/docs/docs/build/metrics-overview.md
+++ b/website/docs/docs/build/metrics-overview.md
@@ -179,6 +179,7 @@ metrics:
        name: active_users
        fill_nulls_with: 0
        join_to_timespine: true
+    cumulative_type_params:
      window: 7 days
 ```
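
To see this first patch in context, here is a minimal sketch of a complete cumulative metric in the format the patch targets. The metric `name` and `label` are illustrative assumptions; only the measure fields and the `cumulative_type_params.window` nesting come from the hunk above.

```yaml
metrics:
  - name: wau_rolling_7          # illustrative metric name
    label: Weekly active users   # illustrative label
    type: cumulative
    type_params:
      measure:
        name: active_users
        fill_nulls_with: 0
        join_to_timespine: true
    cumulative_type_params:
      window: 7 days             # roll the measure over a trailing 7-day window
```
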
From 7ffb29852f8193284a94c0cdadd3be79c86b0566 Mon Sep 17 00:00:00 2001
From: Joel Labes
Date: Fri, 10 Jan 2025 16:01:00 +1300
Subject: [PATCH 2/5] Swap pluses for daggers on databricks page

---
 .../resource-configs/databricks-configs.md    | 21 ++++++++++---------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md
index 1ee89efc95c..05e32473721 100644
--- a/website/docs/reference/resource-configs/databricks-configs.md
+++ b/website/docs/reference/resource-configs/databricks-configs.md
@@ -21,8 +21,7 @@ When materializing a model as `table`, you may include several optional configs
| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` |
| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` |

-\* Beginning in 1.7.12, we have added tblproperties to Python models via an alter statement that runs after table creation.
-We do not yet have a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to annotate their python-derived tables with tblproperties.
+\* Beginning in 1.7.12, we have added tblproperties to Python models via an alter statement that runs after table creation. There is not yet a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to annotate their python-derived tables with tblproperties.
@@ -39,13 +38,13 @@
| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` |
| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` |
| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` |
-| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL+, Python+ | `{'my_tag': 'my_value'}` |
+| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL†, Python† | `{'my_tag': 'my_value'}` |
| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` |

\* Beginning in 1.7.12, we have added tblproperties to Python models via an alter statement that runs after table creation.
We do not yet have a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to annotate their python-derived tables with tblproperties.
-\+ `databricks_tags` are currently only supported at the table level, and applied via `ALTER` statements.
+\† `databricks_tags` are currently only supported at the table level, and applied via `ALTER` statements.
@@ -56,19 +55,21 @@ dbt-databricks v1.9 adds support for the `table_format: iceberg` config. Try it
| Option | Description | Required? | Model Support | Example |
|---------------------|-----------------------------|-------------------------------------------|-----------------|--------------------------|
| table_format | Whether or not to provision [Iceberg](https://docs.databricks.com/en/delta/uniform.html) compatibility for the materialization | Optional | SQL, Python | `iceberg` |
-| file_format+ | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` |
+| file_format† | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` |
| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` |
| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` |
| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL, Python | `date_day` |
| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` |
| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` |
| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` |
-| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL++, Python++ | `{'my_tag': 'my_value'}` |
+| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL‡, Python‡ | `{'my_tag': 'my_value'}` |
| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` |

\* We do not yet have a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to annotate their python-derived tables with tblproperties.
-\+ When `table_format` is `iceberg`, `file_format` must be `delta`.
-\++ `databricks_tags` are currently only supported at the table level, and applied via `ALTER` statements.
+
+† When `table_format` is `iceberg`, `file_format` must be `delta`.
+
+‡ `databricks_tags` are currently only supported at the table level, and applied via `ALTER` statements.
@@ -260,7 +261,7 @@ This strategy is currently only compatible with All Purpose Clusters, not SQL Wa
This strategy is most effective when specified alongside a `partition_by` clause in your model config.
dbt will run an [atomic `insert overwrite` statement](https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-dml-insert-overwrite-table.html) that dynamically replaces all partitions included in your query. Be sure to re-select _all_ of the relevant data for a partition when using this incremental strategy.

-If no `partition_by` is specified, then the `insert_overwrite` strategy will atomically replace all contents of the table, overriding all existing data with only the new records. The column schema of the table remains the same, however. This can be desirable in some limited circumstances, since it minimizes downtime while the table contents are overwritten. The operation is comparable to running `truncate` + `insert` on other databases. For atomic replacement of Delta-formatted tables, use the `table` materialization (which runs `create or replace`) instead.
+If no `partition_by` is specified, then the `insert_overwrite` strategy will atomically replace all contents of the table, overriding all existing data with only the new records. The column schema of the table remains the same, however. This can be desirable in some limited circumstances, since it minimizes downtime while the table contents are overwritten. The operation is comparable to running `truncate` and `insert` on other databases. For atomic replacement of Delta-formatted tables, use the `table` materialization (which runs `create or replace`) instead.
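
The last hunk above edits the prose describing the `insert_overwrite` strategy. As a companion to that paragraph, a minimal sketch of a model that uses the strategy with a partition column might look like the following; the model name, columns, and the seven-day lookback are illustrative assumptions, not taken from the docs page.

```sql
-- models/daily_events.sql (illustrative model)
{{
    config(
        materialized = 'incremental',
        incremental_strategy = 'insert_overwrite',
        partition_by = 'date_day',
        file_format = 'parquet'
    )
}}

-- Re-select *all* rows for every partition this run touches:
-- insert overwrite replaces whole partitions, not individual rows.
select
    date_day,
    count(*) as event_count
from {{ ref('stg_events') }}
{% if is_incremental() %}
where date_day >= date_add(current_date(), -7)
{% endif %}
group by date_day
```
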
From: Joel Labes
Date: Fri, 10 Jan 2025 16:13:33 +1300
Subject: [PATCH 3/5] remove an unnecessary escaping backslash

---
 website/docs/reference/resource-configs/databricks-configs.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md
index 05e32473721..aa825c51f69 100644
--- a/website/docs/reference/resource-configs/databricks-configs.md
+++ b/website/docs/reference/resource-configs/databricks-configs.md
@@ -44,7 +44,7 @@ When materializing a model as `table`, you may include several optional configs
\* Beginning in 1.7.12, we have added tblproperties to Python models via an alter statement that runs after table creation.
We do not yet have a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to annotate their python-derived tables with tblproperties.
-\† `databricks_tags` are currently only supported at the table level, and applied via `ALTER` statements.
+† `databricks_tags` are currently only supported at the table level, and applied via `ALTER` statements.
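
The footnote this patch reformats says `databricks_tags` are applied via `ALTER` statements. A sketch of the kind of statement that implies is below; the three-part table name is an illustrative assumption, and the actual statements are generated by the dbt-databricks adapter rather than written by hand.

```sql
-- Unity Catalog tag assignment, one key/value pair per configured tag
ALTER TABLE my_catalog.my_schema.my_model
SET TAGS ('my_tag' = 'my_value');
```
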
From 8271cba27838a57c2bdbed7a9373f4fb1c1a9ede Mon Sep 17 00:00:00 2001
From: Joel Labes
Date: Fri, 10 Jan 2025 16:19:03 +1300
Subject: [PATCH 4/5] add spaces after daggers

---
 .../docs/reference/resource-configs/databricks-configs.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md
index aa825c51f69..dbcd9e0b01f 100644
--- a/website/docs/reference/resource-configs/databricks-configs.md
+++ b/website/docs/reference/resource-configs/databricks-configs.md
@@ -38,7 +38,7 @@ When materializing a model as `table`, you may include several optional configs
| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` |
| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` |
| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` |
-| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL†, Python† | `{'my_tag': 'my_value'}` |
+| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL †, Python † | `{'my_tag': 'my_value'}` |
| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` |

\* Beginning in 1.7.12, we have added tblproperties to Python models via an alter statement that runs after table creation.
@@ -55,14 +55,14 @@ dbt-databricks v1.9 adds support for the `table_format: iceberg` config. Try it
| Option | Description | Required? | Model Support | Example |
|---------------------|-----------------------------|-------------------------------------------|-----------------|--------------------------|
| table_format | Whether or not to provision [Iceberg](https://docs.databricks.com/en/delta/uniform.html) compatibility for the materialization | Optional | SQL, Python | `iceberg` |
-| file_format† | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` |
+| file_format † | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` |
| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` |
| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` |
| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL, Python | `date_day` |
| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` |
| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` |
| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` |
-| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL‡, Python‡ | `{'my_tag': 'my_value'}` |
+| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL ‡, Python ‡ | `{'my_tag': 'my_value'}` |
| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` |

\* We do not yet have a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to annotate their python-derived tables with tblproperties.
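
Patches 2 through 4 all rework the footnote markers on the same config tables. For reference while reading them, here is a minimal sketch of a model config that combines several of the documented options; the model name and body are illustrative, the option values are taken from the tables' example column, and `databricks_tags` requires dbt-databricks 1.8 or later per the tables above.

```sql
-- models/my_partitioned_table.sql (illustrative model)
{{
    config(
        materialized = 'table',
        file_format = 'delta',
        location_root = '/mnt/root',
        partition_by = 'date_day',
        tblproperties = {'this.is.my.key': 12},
        databricks_tags = {'my_tag': 'my_value'}
    )
}}

select
    current_date() as date_day,
    1 as id
```
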
| Optional | SQL, Python | `zstd` |

\* We do not yet have a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to annotate their python-derived tables with tblproperties.

From bda0e84c9dcf7436347d6488af31f30e5e71f305 Mon Sep 17 00:00:00 2001
From: mirnawong1
Date: Fri, 10 Jan 2025 09:36:58 +0000
Subject: [PATCH 5/5] embed raw HTML

---
 .../resource-configs/databricks-configs.md    | 66 +++++++++----------
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md
index dbcd9e0b01f..95bed967a14 100644
--- a/website/docs/reference/resource-configs/databricks-configs.md
+++ b/website/docs/reference/resource-configs/databricks-configs.md
@@ -10,16 +10,16 @@ When materializing a model as `table`, you may include several optional configs

| Option | Description | Required? | Model support | Example |
|-----------|---------|-------------------|---------------|-------------|
| file_format | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` |
| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` |
| partition_by | Partition the created table by the specified columns. A directory is created for each partition.| Optional | SQL, Python | `date_day` |
| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL | `date_day` |
| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` |
| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` |
| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` |
| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` |

\* Beginning in 1.7.12, we have added tblproperties to Python models via an alter statement that runs after table creation. There is not yet a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to annotate their python-derived tables with tblproperties.
@@ -29,41 +29,41 @@

1.8 introduces support for [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) at the table level, in addition to all table configuration supported in 1.7.

| Option | Description | Required?| Model support | Example |
|-----------|---------------|----------|---------------|----------|
| file_format | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` |
| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` |
| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` |
| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL, Python | `date_day` |
| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` |
| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` |
| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` |
| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL, Python | `{'my_tag': 'my_value'}` |
| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` |

\* Beginning in 1.7.12, we have added tblproperties to Python models via an alter statement that runs after table creation. We do not yet have a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to annotate their python-derived tables with tblproperties.

† `databricks_tags` are currently only supported at the table level, and applied via `ALTER` statements.

dbt-databricks v1.9 adds support for the `table_format: iceberg` config. Try it now on the [dbt Cloud "Latest" release track](/docs/dbt-versions/cloud-release-tracks). All other table configurations were also supported in 1.8.

| Option | Description| Required? | Model support | Example |
|-------------|--------|-----------|-----------------|---------------|
| table_format | Whether or not to provision [Iceberg](https://docs.databricks.com/en/delta/uniform.html) compatibility for the materialization | Optional | SQL, Python | `iceberg` |
| file_format | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` |
| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` |
| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` |
| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL, Python | `date_day` |
| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` |
| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` |
| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` |
| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL, Python | `{'my_tag': 'my_value'}` |
| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` |

\* We do not yet have a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to annotate their python-derived tables with tblproperties.
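
The 1.9 table above documents `table_format: iceberg` alongside the footnote that `file_format` must remain `delta`. A minimal sketch of a model opting in under those constraints might look like this; the model name and body are illustrative assumptions.

```sql
-- models/iceberg_compatible_model.sql (illustrative model)
{{
    config(
        materialized = 'table',
        table_format = 'iceberg',  -- provisions Iceberg compatibility for the table
        file_format = 'delta'      -- per the footnote, iceberg requires delta
    )
}}

select 1 as id
```
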