1.7.0 Catalog fetch improvement #486

benc-db · 2023-10-24T17:13:47Z

Description

The core idea is that we only fetch catalog data for relations managed by dbt. For Unity Catalog (UC), this uses the information schema. For the Hive metastore (HMS), this continues to use the existing approach but filters to the selected nodes. I will be updating this PR to refactor and add a couple of unit tests, but the key tests are essentially the existing functional tests around documentation.
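As a rough illustration of the idea (not the code in this PR), the sketch below restricts catalog work to dbt-managed relations in two ways: filtering an HMS-style table listing after the fact, and building a filter for a UC-style information schema query. The `Relation` tuple and all three function names are hypothetical, and the string-built SQL fragment assumes trusted, already-normalized names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical stand-in for a dbt relation: (catalog, schema, identifier).
Relation = Tuple[str, str, str]


def group_by_schema(relations: Iterable[Relation]) -> Dict[Tuple[str, str], Set[str]]:
    """Group managed relation names by (catalog, schema), lowercased."""
    grouped: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for catalog, schema, identifier in relations:
        grouped[(catalog.lower(), schema.lower())].add(identifier.lower())
    return dict(grouped)


def filter_to_managed(listed: Iterable[str], managed: Set[str]) -> List[str]:
    """HMS-style: list everything in the schema, then keep only managed tables."""
    return sorted(name for name in listed if name.lower() in managed)


def information_schema_filter(schema: str, managed: Set[str]) -> str:
    """UC-style: build a WHERE clause so the query only returns managed tables.

    Assumes names are trusted; real code should escape or parameterize them.
    """
    names = ", ".join(f"'{name}'" for name in sorted(managed))
    return f"table_schema = '{schema}' and table_name in ({names})"
```

Either way, tables that exist in the schema but are not managed by dbt never reach the catalog-building step.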

Checklist

I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

benc-db · 2023-10-24T17:15:47Z

@mikealfare please take a look when you have time.

dbt/include/databricks/macros/catalog.sql

benc-db · 2023-10-24T17:27:20Z

tests/integration/debug/test_debug.py

@@ -13,23 +13,23 @@ def schema(self):
    def models(self):
        return "models"

-    def run_and_test(self, contains_catalog: bool):
+    def run_and_test(self):


These tests changed because I now explicitly set the catalog to hive_metastore if it is set to None (after checking the cluster setting). We were defaulting to spark_catalog, but I couldn't find where this was set, and it was causing me pain.

So question: is there any situation where changing this default behavior is breaking?
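For reviewers, the fallback order described above can be sketched roughly as follows. This is a simplified illustration, not the adapter's actual code: `resolve_catalog`, `cluster_default`, and `DEFAULT_CATALOG` are hypothetical names.

```python
from typing import Optional

# The explicit default described in the comment above.
DEFAULT_CATALOG = "hive_metastore"


def resolve_catalog(configured: Optional[str], cluster_default: Optional[str] = None) -> str:
    """Pick the catalog name: explicit config, then cluster setting, then default."""
    if configured is not None:
        return configured
    if cluster_default is not None:
        return cluster_default
    # Previously this case fell through to spark_catalog implicitly.
    return DEFAULT_CATALOG
```

With this ordering the catalog is never None by the time a relation is built, which is why the tests no longer need a contains_catalog flag.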

benc-db · 2023-10-24T21:22:37Z

@susodapop @rcypher-databricks ready for review. I had planned on doing more refactoring, in part to get the functions that are only for Hive into their own module, but I think I'll wait until I can take advantage of what Jade is working on.

rcypher-databricks

looks good to me

susodapop

LGTM

susodapop · 2023-10-25T16:37:53Z

dbt/adapters/databricks/impl.py

@@ -425,60 +431,54 @@ def parse_columns_from_information(  # type: ignore[override]
    def get_catalog(
        self, manifest: Manifest, selected_nodes: Optional[Set] = None


Noting for my own sake that in the previous implementation of this method, selected_nodes was optional but was never enforced.

I believe selected_nodes was added recently for the get_catalog performance improvements. We had a miss where we called it with the additional argument before checking if we could and @benc-db had to update this signature as a result. So the kwarg did nothing besides allowing get_catalog to be called with an additional argument.
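To make the distinction concrete, here is a minimal sketch of a keyword argument that is accepted but ignored versus one that is enforced. Both functions and their string-based node representation are hypothetical simplifications, not the real `get_catalog` implementations.

```python
from typing import Iterable, List, Optional, Set


def get_catalog_ignoring(nodes: Iterable[str], selected_nodes: Optional[Set[str]] = None) -> List[str]:
    """Accepts selected_nodes only so callers can pass it; the value is unused."""
    return sorted(nodes)


def get_catalog_filtering(nodes: Iterable[str], selected_nodes: Optional[Set[str]] = None) -> List[str]:
    """Enforces selected_nodes: None means no selection, so return everything."""
    if selected_nodes is None:
        return sorted(nodes)
    return sorted(node for node in nodes if node in selected_nodes)
```

Both signatures are call-compatible, so a caller cannot tell from the signature alone whether the selection is honored.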

benc-db added 3 commits October 23, 2023 10:20

wip

7d6146c

wip

148657d

first crack

5ef63e1

benc-db requested review from andrefurlan-db, susodapop and rcypher-databricks as code owners October 24, 2023 17:13

rcypher-databricks added the "blocked" label ("We cannot currently make progress on this issue, but it is on our radar.") Oct 24, 2023

benc-db temporarily deployed to azure-prod October 24, 2023 17:23 — with GitHub Actions Inactive

benc-db had a problem deploying to azure-prod October 24, 2023 17:23 — with GitHub Actions Failure

rcypher-databricks reviewed Oct 24, 2023

dbt/include/databricks/macros/catalog.sql (outdated)

benc-db commented Oct 24, 2023

small cleanup

40ed74b

benc-db temporarily deployed to azure-prod October 24, 2023 20:44 — with GitHub Actions Inactive

revert sql for default

eaa5fc9

benc-db temporarily deployed to azure-prod October 24, 2023 21:20 — with GitHub Actions Inactive

rcypher-databricks previously approved these changes Oct 24, 2023

update changelog

04b17c2

benc-db dismissed rcypher-databricks’s stale review via 04b17c2 October 24, 2023 23:20

benc-db removed the "blocked" label Oct 24, 2023

susodapop approved these changes Oct 25, 2023

benc-db merged commit 5d53855 into main Oct 26, 2023
18 checks passed

mikealfare Oct 27, 2023
