* Move `SemanticModel` sub dataclasses to dbt/artifacts
* Move `NodeRelation` to dbt/artifacts
* Move `SemanticModelConfig` to dbt/artifacts
* Move data portion of `SemanticModel` to dbt/artifacts
* Add contextual comments to `semantic_model.py` about DSI protocols
* Fixup mypy complaint
* Migrate v12 manifest to use artifact definitions of `SavedQuery`, `Metric`, and `SemanticModel`
* Convert `SemanticModel` and `Metric` resources to full nodes in selector search

  In the `search` method in `selector_methods.py`, we were getting object representations from the incoming writable manifest by unique id. What we get from the writable manifest, though, is increasingly the `resource` (data artifact) part of the node, not the full node. This was problematic because a number of the selector processes _compare_ the old node to the new node, but the `resource` representation doesn't have the comparator methods. In this commit we dict-ify the resource and then get the full node by undictifying that. We should probably have a better built-in way on the full node objects to do this, but this will do for now.
* Add `from_resource` implementation on `BaseNode` to ease resource to node conversion

  We want to easily be able to create nodes from their resource counterparts. It's actually imperative that we can do so. The previous commit had a manual way to do so where needed. However, we don't want to have to put `from_dict(.to_dict())` everywhere. So here we added a `from_resource` class method to `BaseNode`. Everything that inherits from `BaseNode` thus automatically gets this functionality.

  HOWEVER, the implementation currently has a problem. Specifically, the type for `resource_instance` is `BaseResource`, which means that if one calls, say, `Metric.from_resource()`, one could hand it a `SemanticModelResource` and mypy won't complain. In this case, a semi-cryptic error might get raised at runtime. Whether or not an error gets raised depends entirely on whether or not the dictified resource instance manages to satisfy all the required attributes of the desired node class. THIS IS VERY BAD.

  We should be able to solve this issue in an upcoming (hopefully next) commit, wherein we genericize `BaseNode` such that when inheriting it you declare it with a resource type. Technically a runtime error will still be possible; however, any mixups should be caught by mypy on pre-commit hooks as well as PRs.
* Make `BaseNode` a generic that is defined with a `ResourceType`

  Turning `BaseNode` into an ABC generic allows us to say that the inheriting class can define what resource type from artifacts it should be used with. This gives us added type safety for what resource type can be passed into `from_resource` when called via `SemanticModel.from_resource(...)`, `Metric.from_resource(...)`, etc.

  NOTE: This only gives us type safety from mypy. If we begin ignoring mypy errors during development, we can still end up with runtime errors (it's just harder to do so now).
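The `from_resource` idea described in the commit message can be illustrated with a small, self-contained sketch. This is not the dbt-core implementation: `BaseResource`, `MetricResource`, `SemanticModelResource`, `BaseNode`, and `Metric` below are simplified stand-ins written for this example, and `to_dict`/`from_dict` are approximated with plain `dataclasses` helpers rather than `dbtClassMixin`. It assumes the node class carries the same fields as its resource.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Generic, TypeVar


@dataclass
class BaseResource:
    """Stand-in for a data-only artifact class (hypothetical, not dbt's)."""

    name: str

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


@dataclass
class MetricResource(BaseResource):
    label: str = ""


@dataclass
class SemanticModelResource(BaseResource):
    model: str = ""


# The resource type a node class declares when it inherits from BaseNode.
ResourceT = TypeVar("ResourceT", bound=BaseResource)


class BaseNode(Generic[ResourceT]):
    """Stand-in for the node base class, parameterized by its resource type."""

    @classmethod
    def from_dict(cls, data: Dict[str, Any]):
        return cls(**data)

    @classmethod
    def from_resource(cls, resource_instance: ResourceT):
        # Dict-ify the resource, then build the full node from that dict.
        return cls.from_dict(resource_instance.to_dict())


@dataclass
class Metric(BaseNode[MetricResource], MetricResource):
    """Full node: the resource data plus behavior such as comparators."""

    def same_label(self, other: "Metric") -> bool:
        return self.label == other.label


metric = Metric.from_resource(MetricResource(name="revenue", label="Revenue"))
```

Because `Metric` declares `BaseNode[MetricResource]`, a type checker has enough information to flag `Metric.from_resource(SemanticModelResource(...))`. At runtime nothing prevents that call; in this sketch it fails with a `TypeError` because the dict-ified resource carries a `model` key that `Metric` does not accept.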
Showing 16 changed files with 361 additions and 306 deletions.
@@ -0,0 +1,6 @@
kind: Under the Hood
body: Move data portion of `SemanticModel` to dbt/artifacts
time: 2024-01-29T16:38:00.245253-08:00
custom:
  Author: QMalcolm
  Issue: "9387"
@@ -0,0 +1,273 @@
import time

from dataclasses import dataclass, field
from dbt.artifacts.resources.base import GraphResource
from dbt.artifacts.resources.v1.components import DependsOn, RefArgs
from dbt_common.contracts.config.base import BaseConfig, CompareBehavior, MergeBehavior
from dbt_common.dataclass_schema import dbtClassMixin
from dbt_semantic_interfaces.references import (
    DimensionReference,
    EntityReference,
    LinkableElementReference,
    MeasureReference,
    SemanticModelReference,
    TimeDimensionReference,
)
from dbt_semantic_interfaces.type_enums import (
    AggregationType,
    DimensionType,
    EntityType,
    TimeGranularity,
)
from dbt.artifacts.resources import SourceFileMetadata
from typing import Any, Dict, List, Optional, Sequence


"""
The classes in this file are dataclasses which are used to construct the Semantic
Model node in dbt-core. Additionally, these classes need to at a minimum support
what is specified in their protocol definitions in dbt-semantic-interfaces.
Their protocol definitions can be found here:
https://github.com/dbt-labs/dbt-semantic-interfaces/blob/main/dbt_semantic_interfaces/protocols/semantic_model.py
"""


@dataclass
class Defaults(dbtClassMixin):
    agg_time_dimension: Optional[str] = None


@dataclass
class NodeRelation(dbtClassMixin):
    alias: str
    schema_name: str  # TODO: Could this be called simply "schema" so we could reuse StateRelation?
    database: Optional[str] = None
    relation_name: Optional[str] = None


# ====================================
# Dimension objects
# Dimension protocols: https://github.com/dbt-labs/dbt-semantic-interfaces/blob/main/dbt_semantic_interfaces/protocols/dimension.py
# ====================================


@dataclass
class DimensionValidityParams(dbtClassMixin):
    is_start: bool = False
    is_end: bool = False


@dataclass
class DimensionTypeParams(dbtClassMixin):
    time_granularity: TimeGranularity
    validity_params: Optional[DimensionValidityParams] = None


@dataclass
class Dimension(dbtClassMixin):
    name: str
    type: DimensionType
    description: Optional[str] = None
    label: Optional[str] = None
    is_partition: bool = False
    type_params: Optional[DimensionTypeParams] = None
    expr: Optional[str] = None
    metadata: Optional[SourceFileMetadata] = None

    @property
    def reference(self) -> DimensionReference:
        return DimensionReference(element_name=self.name)

    @property
    def time_dimension_reference(self) -> Optional[TimeDimensionReference]:
        if self.type == DimensionType.TIME:
            return TimeDimensionReference(element_name=self.name)
        else:
            return None

    @property
    def validity_params(self) -> Optional[DimensionValidityParams]:
        if self.type_params:
            return self.type_params.validity_params
        else:
            return None


# ====================================
# Entity objects
# Entity protocols: https://github.com/dbt-labs/dbt-semantic-interfaces/blob/main/dbt_semantic_interfaces/protocols/entity.py
# ====================================


@dataclass
class Entity(dbtClassMixin):
    name: str
    type: EntityType
    description: Optional[str] = None
    label: Optional[str] = None
    role: Optional[str] = None
    expr: Optional[str] = None

    @property
    def reference(self) -> EntityReference:
        return EntityReference(element_name=self.name)

    @property
    def is_linkable_entity_type(self) -> bool:
        return self.type in (EntityType.PRIMARY, EntityType.UNIQUE, EntityType.NATURAL)


# ====================================
# Measure objects
# Measure protocols: https://github.com/dbt-labs/dbt-semantic-interfaces/blob/main/dbt_semantic_interfaces/protocols/measure.py
# ====================================


@dataclass
class MeasureAggregationParameters(dbtClassMixin):
    percentile: Optional[float] = None
    use_discrete_percentile: bool = False
    use_approximate_percentile: bool = False


@dataclass
class NonAdditiveDimension(dbtClassMixin):
    name: str
    window_choice: AggregationType
    window_groupings: List[str]


@dataclass
class Measure(dbtClassMixin):
    name: str
    agg: AggregationType
    description: Optional[str] = None
    label: Optional[str] = None
    create_metric: bool = False
    expr: Optional[str] = None
    agg_params: Optional[MeasureAggregationParameters] = None
    non_additive_dimension: Optional[NonAdditiveDimension] = None
    agg_time_dimension: Optional[str] = None

    @property
    def reference(self) -> MeasureReference:
        return MeasureReference(element_name=self.name)


# ====================================
# SemanticModel final parts
# ====================================


@dataclass
class SemanticModelConfig(BaseConfig):
    enabled: bool = True
    group: Optional[str] = field(
        default=None,
        metadata=CompareBehavior.Exclude.meta(),
    )
    meta: Dict[str, Any] = field(
        default_factory=dict,
        metadata=MergeBehavior.Update.meta(),
    )


@dataclass
class SemanticModel(GraphResource):
    model: str
    node_relation: Optional[NodeRelation]
    description: Optional[str] = None
    label: Optional[str] = None
    defaults: Optional[Defaults] = None
    entities: Sequence[Entity] = field(default_factory=list)
    measures: Sequence[Measure] = field(default_factory=list)
    dimensions: Sequence[Dimension] = field(default_factory=list)
    metadata: Optional[SourceFileMetadata] = None
    depends_on: DependsOn = field(default_factory=DependsOn)
    refs: List[RefArgs] = field(default_factory=list)
    created_at: float = field(default_factory=lambda: time.time())
    config: SemanticModelConfig = field(default_factory=SemanticModelConfig)
    unrendered_config: Dict[str, Any] = field(default_factory=dict)
    primary_entity: Optional[str] = None
    group: Optional[str] = None

    @property
    def entity_references(self) -> List[LinkableElementReference]:
        return [entity.reference for entity in self.entities]

    @property
    def dimension_references(self) -> List[LinkableElementReference]:
        return [dimension.reference for dimension in self.dimensions]

    @property
    def measure_references(self) -> List[MeasureReference]:
        return [measure.reference for measure in self.measures]

    @property
    def has_validity_dimensions(self) -> bool:
        return any([dim.validity_params is not None for dim in self.dimensions])

    @property
    def validity_start_dimension(self) -> Optional[Dimension]:
        validity_start_dims = [
            dim for dim in self.dimensions if dim.validity_params and dim.validity_params.is_start
        ]
        if not validity_start_dims:
            return None
        return validity_start_dims[0]

    @property
    def validity_end_dimension(self) -> Optional[Dimension]:
        validity_end_dims = [
            dim for dim in self.dimensions if dim.validity_params and dim.validity_params.is_end
        ]
        if not validity_end_dims:
            return None
        return validity_end_dims[0]

    @property
    def partitions(self) -> List[Dimension]:  # noqa: D
        return [dim for dim in self.dimensions or [] if dim.is_partition]

    @property
    def partition(self) -> Optional[Dimension]:
        partitions = self.partitions
        if not partitions:
            return None
        return partitions[0]

    @property
    def reference(self) -> SemanticModelReference:
        return SemanticModelReference(semantic_model_name=self.name)

    def checked_agg_time_dimension_for_measure(
        self, measure_reference: MeasureReference
    ) -> TimeDimensionReference:
        measure: Optional[Measure] = None
        # Use a distinct loop variable: iterating with `measure` itself would leave it
        # bound to the last element even when no measure matched the reference.
        for candidate in self.measures:
            if candidate.reference == measure_reference:
                measure = candidate
                break

        assert (
            measure is not None
        ), f"No measure with name ({measure_reference.element_name}) in semantic_model with name ({self.name})"

        default_agg_time_dimension = (
            self.defaults.agg_time_dimension if self.defaults is not None else None
        )

        agg_time_dimension_name = measure.agg_time_dimension or default_agg_time_dimension
        assert agg_time_dimension_name is not None, (
            f"Aggregation time dimension for measure {measure.name} on semantic model {self.name} is not set! "
            "To fix this either specify a default `agg_time_dimension` for the semantic model or define an "
            "`agg_time_dimension` on the measure directly."
        )
        return TimeDimensionReference(element_name=agg_time_dimension_name)

    @property
    def primary_entity_reference(self) -> Optional[EntityReference]:
        return (
            EntityReference(element_name=self.primary_entity)
            if self.primary_entity is not None
            else None
        )
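The lookup in `checked_agg_time_dimension_for_measure` resolves a measure and then falls back from the measure's own `agg_time_dimension` to the semantic model default. A minimal sketch of that resolution order follows, using hypothetical stand-in classes (`SketchMeasure`, `SketchSemanticModel`) that match by name instead of by `MeasureReference` and raise exceptions where the original uses `assert`. The loop uses its own variable, `candidate`, so a miss cannot be mistaken for a match.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchMeasure:
    """Hypothetical stand-in for the Measure dataclass."""

    name: str
    agg_time_dimension: Optional[str] = None


@dataclass
class SketchSemanticModel:
    """Hypothetical stand-in carrying only what the lookup needs."""

    name: str
    measures: List[SketchMeasure] = field(default_factory=list)
    default_agg_time_dimension: Optional[str] = None

    def agg_time_dimension_for(self, measure_name: str) -> str:
        match: Optional[SketchMeasure] = None
        # A separate loop variable keeps `match` as None when nothing matches.
        for candidate in self.measures:
            if candidate.name == measure_name:
                match = candidate
                break
        if match is None:
            raise KeyError(f"No measure named {measure_name!r} in {self.name!r}")

        # The measure-level setting wins; otherwise use the model default.
        resolved = match.agg_time_dimension or self.default_agg_time_dimension
        if resolved is None:
            raise ValueError(f"No aggregation time dimension set for {measure_name!r}")
        return resolved


orders = SketchSemanticModel(
    name="orders",
    measures=[
        SketchMeasure(name="revenue", agg_time_dimension="ordered_at"),
        SketchMeasure(name="cost"),
    ],
    default_agg_time_dimension="created_at",
)
```

Here `orders.agg_time_dimension_for("revenue")` uses the measure-level value, `orders.agg_time_dimension_for("cost")` falls back to the model default, and an unknown name raises `KeyError` rather than silently returning the last measure in the list.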