Merge branch 'current' into nfiann-sourcefreshness
mirnawong1 authored Jan 14, 2025
2 parents 2226796 + b4d2ce8 commit 2727549
Showing 4 changed files with 98 additions and 24 deletions.
16 changes: 11 additions & 5 deletions website/docs/docs/build/unit-tests.md
@@ -10,9 +10,6 @@ keywords:

<VersionCallout version="1.8" />

Historically, dbt's test coverage was confined to [“data” tests](/docs/build/data-tests), assessing the quality of input data or resulting datasets' structure. However, these tests could only be executed _after_ building a model.

Starting in dbt Core v1.8, we have introduced an additional type of test to dbt - unit tests. In software programming, unit tests validate small portions of your functional code, and they work much the same way here. Unit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. Unit tests enable test-driven development, benefiting developer efficiency and code reliability.
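A unit test is defined in YAML: you name the test, point it at a model, mock the model's inputs with static rows, and declare the rows you expect back. A minimal sketch of the shape (the column values here are illustrative):

```yml
unit_tests:
  - name: test_is_valid_email_address  # unique name for this unit test
    model: dim_customers               # the model whose SQL logic is under test
    given:                             # static, mocked rows for the model's inputs
      - input: ref('stg_customers')
        rows:
          - {email: cool@example.com, email_top_level_domain: example.com}
          - {email: cool@unknown.com, email_top_level_domain: unknown.com}
    expect:                            # the rows the model should return for those inputs
      rows:
        - {email: cool@example.com, is_valid_email_address: true}
        - {email: cool@unknown.com, is_valid_email_address: false}
```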
@@ -219,10 +216,19 @@ dbt test --select test_is_valid_email_address

Your model is now ready for production! Adding this unit test helped catch an issue with the SQL logic _before_ you materialized `dim_customers` in your warehouse and will better ensure the reliability of this model in the future.


## Unit testing incremental models

When configuring your unit test, you can override the output of macros, vars, or environment variables. This enables you to unit test your incremental models in "full refresh" and "incremental" modes.

:::note
Incremental models must already exist in the database before you run unit tests or a `dbt build`. Use the [`--empty` flag](/reference/commands/build#the---empty-flag) to build empty versions of the models and save warehouse spend. You can also optionally select only your incremental models using the [`--select` flag](/reference/node-selection/syntax#shorthand).

```shell
dbt run --select "config.materialized:incremental" --empty
```

After running the command, perform a regular `dbt build` for that model and then run your unit test.
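Taken together, the sequence might look like this (`my_incremental_model` is a hypothetical model name):

```shell
# Build empty versions of all incremental models (no warehouse spend on data)
dbt run --select "config.materialized:incremental" --empty

# Materialize the model for real, then run its unit tests
dbt build --select my_incremental_model
dbt test --select my_incremental_model
```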
:::

When testing an incremental model, the expected output is the __result of the materialization__ (what will be merged/inserted), not the resulting model itself (what the final table will look like after the merge/insert).
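For instance, a unit test can use the `overrides` block to pin the `is_incremental` macro and exercise the model in "full refresh" mode. A minimal sketch, with illustrative model and column names:

```yml
unit_tests:
  - name: test_my_incremental_model_full_refresh_mode
    model: my_incremental_model
    overrides:
      macros:
        is_incremental: false  # unit test the model as if it were running in "full refresh" mode
    given:
      - input: ref('raw_events')
        rows:
          - {event_id: 1, occurred_at: "2024-01-01"}
    expect:
      rows:
        - {event_id: 1, occurred_at: "2024-01-01"}
```

Setting `is_incremental: true` instead exercises the "incremental" path; in that case, also mock the current state of the model itself by adding `this` as one of the `given` inputs.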

1 change: 1 addition & 0 deletions website/docs/guides/customize-schema-alias.md
@@ -9,6 +9,7 @@ icon: 'guides'
hide_table_of_contents: true
level: 'Advanced'
recently_updated: true
keywords: ["generate", "schema name", "guide", "dbt", "schema customization", "custom schema"]
---

<div style={{maxWidth: '900px'}}>
27 changes: 17 additions & 10 deletions website/docs/reference/resource-configs/event-time.md
@@ -21,7 +21,6 @@ models:
```
</File>
<File name='models/properties.yml'>
```yml
@@ -139,20 +138,28 @@ sources:
## Definition

You can configure `event_time` for a [model](/docs/build/models), [seed](/docs/build/seeds), or [source](/docs/build/sources) in your `dbt_project.yml` file, property YAML file, or config block.

`event_time` is required for the [incremental microbatch](/docs/build/incremental-microbatch) strategy and highly recommended for [Advanced CI's compare changes](/docs/deploy/advanced-ci#optimizing-comparisons) in CI/CD workflows, where it ensures the same time-slice of data is correctly compared between your CI and production environments.
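For example, a minimal sketch of the config in `dbt_project.yml` (the project, model, and column names are illustrative):

```yml
models:
  my_project:
    user_sessions:
      +event_time: session_began_at  # the column recording when the event actually occurred
```

The same setting can be expressed in a property YAML file or in the model's config block.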

### Best practices

Set the `event_time` to the name of the field that represents the actual timestamp of the event (like `account_created_at`). The timestamp of the event should represent "at what time did the row occur" rather than an event ingestion date. Marking a column as the `event_time` when it isn't one diverges from the column's semantic meaning, which can confuse users when other tools make use of the metadata.

However, if an ingestion date (like `loaded_at`, `ingested_at`, or `last_updated_at`) is the only timestamp available, you can set `event_time` to one of these fields. Here are some considerations to keep in mind if you do this:

- Using `last_updated_at` or `loaded_at` &mdash; May result in duplicate entries in the resulting table in the data warehouse over multiple runs. Setting an appropriate [lookback](/reference/resource-configs/lookback) value can reduce duplicates (see the sketch after this list), but it can't fully eliminate them, since some updates outside the lookback window won't be processed.
- Using `ingested_at` &mdash; Since this column is created by your ingestion/EL tool instead of coming from the original source, it will change if/when you need to resync your connector for some reason. This means that data will be reprocessed and loaded into your warehouse for a second time against a second date. As long as this never happens (or you run a full refresh when it does), microbatches will be processed correctly when using `ingested_at`.
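For instance, a sketch of pairing an ingestion timestamp with a small [lookback](/reference/resource-configs/lookback) to reprocess recent batches (the project and model names are illustrative):

```yml
models:
  my_project:
    raw_orders:
      +event_time: loaded_at  # ingestion timestamp standing in for a true event time
      +lookback: 3            # reprocess the three most recent batches to pick up late changes
```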

Here are some examples of recommended and not recommended `event_time` columns:

| <div style={{width:'200px'}}>Status</div> | Column name | Description |
|--------------------|---------------------|----------------------|
| ✅ Recommended | `account_created_at` | Represents the specific time when an account was created, making it a fixed event in time. |
| ✅ Recommended | `session_began_at` | Captures the exact timestamp when a user session started, which won’t change and directly ties to the event. |
| ❌ Not recommended | `_fivetran_synced` | This represents the time the event was ingested, not when it happened. |
| ❌ Not recommended | `last_updated_at` | Changes over time and isn't tied to the event itself. If used, note the considerations mentioned earlier in [best practices](#best-practices). |

## Examples

78 changes: 69 additions & 9 deletions website/docs/reference/resource-configs/tags.md
@@ -93,6 +93,21 @@ resource_type:
```
</File>
To apply tags to a model in your `models/` directory, add the `config` property as shown in the following example:

<File name='models/model.yml'>

```yaml
models:
- name: my_model
description: A model description
config:
tags: ['example_tag']
```

</File>

</TabItem>

<TabItem value="config">
@@ -126,10 +141,24 @@ You can use the [`+` operator](/reference/node-selection/graph-operators#the-plu
- `dbt run --select +model_name+` &mdash; Run a model, its upstream dependencies, and its downstream dependencies.
- `dbt run --select tag:my_tag+ --exclude tag:exclude_tag` &mdash; Run models tagged with `my_tag` and their downstream dependencies, and exclude models tagged with `exclude_tag`, regardless of their dependencies.


:::tip Usage notes about tags

When using tags, consider the following:

- Tags are additive across the project hierarchy.
- Some resource types (like sources and exposures) require tags at the top level.

Refer to [usage notes](#usage-notes) for more information.
:::
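As a brief sketch of the additive behavior, a tag set on a folder in `dbt_project.yml` combines with any tags set on the models inside it (the project and folder names are illustrative):

```yml
models:
  my_project:
    staging:
      +tags: "hourly"  # every model under staging/ picks up the 'hourly' tag
```

A model in `staging/` that also sets `tags=["finance"]` in its own config block ends up tagged with both `hourly` and `finance`.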

## Examples

The following examples show how to apply tags to resources in your project. You can configure tags in the `dbt_project.yml`, `schema.yml`, or SQL files.

### Use tags to run parts of your project

Apply tags in your `dbt_project.yml` as a single value or a string. In the following example, models in the `jaffle_shop` project are tagged with `contains_pii`.

<File name='dbt_project.yml'>

@@ -153,16 +182,52 @@ models:
- "published"
```
</File>
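With the `contains_pii` tag applied, you can include or exclude those models at run time:

```shell
# Run only models tagged "contains_pii"
dbt run --select tag:contains_pii

# Run everything except models tagged "contains_pii"
dbt run --exclude tag:contains_pii
```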


### Apply tags to models

This section demonstrates applying tags to models in the `dbt_project.yml`, `schema.yml`, and SQL files.

To apply tags to a model in your `dbt_project.yml` file, you would add the following:

<File name='dbt_project.yml'>

```yaml
models:
jaffle_shop:
    +tags: finance # all models in the jaffle_shop project are tagged with 'finance'.
```

</File>

To apply tags to a model in your `models/` directory YAML file, you would add the following using the `config` property:

<File name='models/stg_customers.yml'>

```yaml
models:
- name: stg_customers
description: Customer data with basic cleaning and transformation applied, one row per customer.
config:
      tags: ['santi'] # the stg_customers model is tagged with 'santi'.
columns:
- name: customer_id
description: The unique key for each customer.
data_tests:
- not_null
- unique
```

</File>

To apply tags to a model in your SQL file, you would add the following:

<File name='models/staging/stg_payments.sql'>

```sql
-- The stg_payments model is tagged with 'finance'.
{{ config(
    tags=["finance"]
) }}
select ...
@@ -211,14 +276,10 @@ seeds:

<VersionBlock lastVersion="1.8">

<VersionCallout version="1.9" />

</VersionBlock>

<VersionBlock firstVersion="1.9">

The following example shows how to apply a tag to a saved query in the `dbt_project.yml` file. The saved query is then tagged with `order_metrics`.
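A minimal sketch of what that configuration might look like (the `customer_order_metrics` saved query name is hypothetical):

```yml
saved-queries:
  my_project:
    customer_order_metrics:
      +tags: order_metrics
```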

@@ -263,7 +324,6 @@ Run resources with multiple tags using the following commands:
```shell
# Run all resources tagged "order_metrics" and all resources tagged "hourly" (a union)
dbt build --select tag:order_metrics tag:hourly
```
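To select only resources that carry _both_ tags, comma-separate the selectors to take the intersection:

```shell
# Run resources tagged with both "order_metrics" and "hourly"
dbt build --select tag:order_metrics,tag:hourly
```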
</VersionBlock>

## Usage notes

