-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1.7.0 Catalog fetch improvement #486
Conversation
@mikealfare please take a look when you have time. |
@@ -13,23 +13,23 @@ def schema(self): | |||
def models(self): | |||
return "models" | |||
|
|||
def run_and_test(self, contains_catalog: bool): | |||
def run_and_test(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests changed because I now explicitly set the catalog to hive_metastore if it is set to None (after checking the cluster setting). We were defaulting to spark_catalog, but I couldn't find where this was set, and it was causing me pain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So question: is there any situation where changing this default behavior is breaking?
@susodapop @rcypher-databricks ready for review...I had planned on doing more refactoring, in part to get functions that are only for hive into their own module, but I think I'll wait for the day when I can take advantage of what Jade is working on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -425,60 +431,54 @@ def parse_columns_from_information( # type: ignore[override] | |||
def get_catalog( | |||
self, manifest: Manifest, selected_nodes: Optional[Set] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting for my own sake that in the previous implementation of this method, selected_nodes
was optional but was never enforced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe selected_nodes
was added recently for the get_catalog
performance improvements. We had a miss where we called it with the additional argument before checking if we could and @benc-db had to update this signature as a result. So the kwarg did nothing besides allowing get_catalog
to be called with an additional argument.
Description
Catalog fetch improvement
Core idea is that we only get data for relations managed by dbt. For UC, this uses information schema. For HMS, this continues to use the existing approach, but filters to selected nodes. Will be updating this PR to refactor and add a couple of unit tests, but the key tests are basically the existing functional tests around documentation.
Checklist
CHANGELOG.md
and added information about my change to the "dbt-databricks next" section.