update dbt_is_deleted to string #6749

Merged
merged 12 commits into from
Jan 9, 2025
125 changes: 61 additions & 64 deletions website/docs/docs/build/snapshots.md
Expand Up @@ -437,103 +437,100 @@ Snapshot <Term id="table">tables</Term> will be created as a clone of your sourc

In dbt Core v1.9+ (or available sooner in [the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks)):
- These column names can be customized to your team or organizational conventions using the [`snapshot_meta_column_names`](/reference/resource-configs/snapshot_meta_column_names) config.
- Use the [`dbt_valid_to_current` config](/reference/resource-configs/dbt_valid_to_current) to set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date such as `9999-12-31`). By default, this value is `NULL`. When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table.
- Use the [`hard_deletes`](/reference/resource-configs/hard-deletes) config to track deleted records as new rows, flagged by the `dbt_is_deleted` meta field, by setting `hard_deletes='new_record'`.
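
As a minimal sketch of the two configs above, a dbt Core v1.9+ YAML snapshot definition might look like the following. The snapshot name `orders_snapshot`, the `jaffle_shop.orders` source, and the `id` and `updated_at` columns are assumptions for illustration, and the `to_date(...)` expression may need adjusting for your warehouse.

```yaml
snapshots:
  - name: orders_snapshot                      # hypothetical snapshot name
    relation: source('jaffle_shop', 'orders')  # hypothetical source
    config:
      unique_key: id                           # assumed primary key column
      strategy: timestamp
      updated_at: updated_at                   # assumed source timestamp column
      # Current records get this value in dbt_valid_to instead of NULL.
      dbt_valid_to_current: "to_date('9999-12-31')"
      # Deleted source rows are recorded as new snapshot rows,
      # flagged through the dbt_is_deleted meta field.
      hard_deletes: new_record
```

With this in place, current rows could be selected by comparing `dbt_valid_to` against the configured date rather than checking for `NULL`.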


| Field | Meaning | Usage |
| -------------- | ------- | ----- |
| dbt_valid_from | The timestamp when this snapshot row was first inserted | This column can be used to order the different "versions" of a record. |
| dbt_valid_to | The timestamp when this row became invalidated. <br /> For current records, this is `NULL` by default <VersionBlock firstVersion="1.9"> or the value specified in `dbt_valid_to_current`.</VersionBlock> | The most recent snapshot record will have `dbt_valid_to` set to `NULL` <VersionBlock firstVersion="1.9"> or the specified value. </VersionBlock> |
| dbt_scd_id | A unique key generated for each snapshotted record. | This is used internally by dbt |
| dbt_updated_at | The updated_at timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt |
| dbt_is_deleted | A boolean value indicating if the record has been deleted. `True` if deleted, `False` otherwise. | Added when `hard_deletes='new_record'` is configured. This is used internally by dbt |
| Field | <div style={{width:'250px'}}>Meaning</div> | Notes | Example|
| -------------- | ------- | ----- | ------- |
| `dbt_valid_from` | The timestamp when this snapshot row was first inserted and became valid. | This column can be used to order the different "versions" of a record. | `snapshot_meta_column_names: {dbt_valid_from: start_date}` |
| `dbt_valid_to` | The timestamp when this row became invalidated. For current records, this is `NULL` by default or the value specified in `dbt_valid_to_current`. | The most recent snapshot record will have `dbt_valid_to` set to `NULL` or the specified value. | `snapshot_meta_column_names: {dbt_valid_to: end_date}` |
| `dbt_scd_id` | A unique key generated for each snapshot row. | This is used internally by dbt. | `snapshot_meta_column_names: {dbt_scd_id: scd_id}` |
| `dbt_updated_at` | The `updated_at` timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt. | `snapshot_meta_column_names: {dbt_updated_at: modified_date}` |
| `dbt_is_deleted` | A string value indicating if the record has been deleted. (`True` if deleted, `False` if not deleted). |Added when `hard_deletes='new_record'` is configured. | `snapshot_meta_column_names: {dbt_is_deleted: is_deleted}` |

*The timestamps used for each column are subtly different depending on the strategy you use:
All of these column names can be customized using the `snapshot_meta_column_names` config. Refer to this [example](/reference/resource-configs/snapshot_meta_column_names#example) for more details.

For the `timestamp` strategy, the configured `updated_at` column is used to populate the `dbt_valid_from`, `dbt_valid_to` and `dbt_updated_at` columns.
*The timestamps used for each column are subtly different depending on the strategy you use:
Contributor

I think we should leave the * out unless it's used as an endnote/footnote somewhere else for reference. We could use language we use elsewhere like Note: or similar

Suggested change
*The timestamps used for each column are subtly different depending on the strategy you use:
The timestamps used for each column are subtly different depending on the strategy you use:

Contributor Author

I thought the same, but the asterisk is actually referenced in line 436, which is why I added it back in 😅

Contributor

I didn't even see that. Pretty easy to miss both ways and might want to consider removing both but doesn't need to block this PR


<details>
<summary> Details for the timestamp strategy </summary>
- For the `timestamp` strategy, the configured `updated_at` column is used to populate the `dbt_valid_from`, `dbt_valid_to` and `dbt_updated_at` columns.

Snapshot query results at `2024-01-01 11:00`
<Expandable alt_header="Details for the timestamp strategy">

| id | status | updated_at |
| -- | ------- | ---------------- |
| 1 | pending | 2024-01-01 10:47 |
Snapshot query results at `2024-01-01 11:00`

Snapshot results (note that `11:00` is not used anywhere):
| id | status | updated_at |
| -- | ------- | ---------------- |
| 1 | pending | 2024-01-01 10:47 |

| id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at |
| -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- |
| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | | 2024-01-01 10:47 |
Snapshot results (note that `11:00` is not used anywhere):

Query results at `2024-01-01 11:30`:
| id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at |
| -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- |
| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | | 2024-01-01 10:47 |

| id | status | updated_at |
| -- | ------- | ---------------- |
| 1 | shipped | 2024-01-01 11:05 |
Query results at `2024-01-01 11:30`:

Snapshot results (note that `11:30` is not used anywhere):
| id | status | updated_at |
| -- | ------- | ---------------- |
| 1 | shipped | 2024-01-01 11:05 |

| id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at |
| -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- |
| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 |
| 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | | 2024-01-01 11:05 |
Snapshot results (note that `11:30` is not used anywhere):

Snapshot results with `hard_deletes='new_record'`:
| id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at |
| -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- |
| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 |
| 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | | 2024-01-01 11:05 |

| id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | dbt_is_deleted |
|----|---------|------------------|------------------|------------------|------------------|----------------|
| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 | False |
| 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | 2024-01-01 11:20 | 2024-01-01 11:05 | False |
| 1 | deleted | 2024-01-01 11:20 | 2024-01-01 11:20 | | 2024-01-01 11:20 | True |
Snapshot results with `hard_deletes='new_record'`:

| id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | dbt_is_deleted |
|----|---------|------------------|------------------|------------------|------------------|----------------|
| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 | False |
| 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | 2024-01-01 11:20 | 2024-01-01 11:05 | False |
| 1 | deleted | 2024-01-01 11:20 | 2024-01-01 11:20 | | 2024-01-01 11:20 | True |

</details>

<br/>
</Expandable>
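
The timestamp-strategy walkthrough above corresponds to a snapshot configured roughly as follows. This is a sketch under assumptions: the snapshot name and source are hypothetical, and the `id` and `updated_at` columns are taken from the example tables.

```yaml
snapshots:
  - name: orders_snapshot                      # hypothetical snapshot name
    relation: source('jaffle_shop', 'orders')  # hypothetical source
    config:
      unique_key: id
      strategy: timestamp
      # dbt_valid_from, dbt_valid_to, and dbt_updated_at are populated from
      # this column, not from the time the snapshot runs.
      updated_at: updated_at
```

This is why the run times `11:00` and `11:30` never appear in the snapshot results: only the source's `updated_at` values (`10:47`, `11:05`) are recorded.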

For the `check` strategy, the current timestamp is used to populate each column. If configured, the `check` strategy uses the `updated_at` column instead, as with the timestamp strategy.
- For the `check` strategy, the current timestamp is used to populate each column. If configured, the `check` strategy uses the `updated_at` column instead, as with the timestamp strategy.

<details>
<summary> Details for the check strategy </summary>
<Expandable alt_header="Details for the check strategy">

Snapshot query results at `2024-01-01 11:00`
Snapshot query results at `2024-01-01 11:00`

| id | status |
| -- | ------- |
| 1 | pending |
| id | status |
| -- | ------- |
| 1 | pending |

Snapshot results:
Snapshot results:

| id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at |
| -- | ------- | ---------------- | ---------------- | ---------------- |
| 1 | pending | 2024-01-01 11:00 | | 2024-01-01 11:00 |
| id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at |
| -- | ------- | ---------------- | ---------------- | ---------------- |
| 1 | pending | 2024-01-01 11:00 | | 2024-01-01 11:00 |

Query results at `2024-01-01 11:30`:
Query results at `2024-01-01 11:30`:

| id | status |
| -- | ------- |
| 1 | shipped |
| id | status |
| -- | ------- |
| 1 | shipped |

Snapshot results:
Snapshot results:

| id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at |
| --- | ------- | ---------------- | ---------------- | ---------------- |
| 1 | pending | 2024-01-01 11:00 | 2024-01-01 11:30 | 2024-01-01 11:00 |
| 1 | shipped | 2024-01-01 11:30 | | 2024-01-01 11:30 |
| id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at |
| --- | ------- | ---------------- | ---------------- | ---------------- |
| 1 | pending | 2024-01-01 11:00 | 2024-01-01 11:30 | 2024-01-01 11:00 |
| 1 | shipped | 2024-01-01 11:30 | | 2024-01-01 11:30 |

Snapshot results with `hard_deletes='new_record'`:
Snapshot results with `hard_deletes='new_record'`:

| id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | dbt_is_deleted |
|----|---------|------------------|------------------|------------------|----------------|
| 1 | pending | 2024-01-01 11:00 | 2024-01-01 11:30 | 2024-01-01 11:00 | False |
| 1 | shipped | 2024-01-01 11:30 | 2024-01-01 11:40 | 2024-01-01 11:30 | False |
| 1 | deleted | 2024-01-01 11:40 | | 2024-01-01 11:40 | True |
| id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | dbt_is_deleted |
|----|---------|------------------|------------------|------------------|----------------|
| 1 | pending | 2024-01-01 11:00 | 2024-01-01 11:30 | 2024-01-01 11:00 | False |
| 1 | shipped | 2024-01-01 11:30 | 2024-01-01 11:40 | 2024-01-01 11:30 | False |
| 1 | deleted | 2024-01-01 11:40 | | 2024-01-01 11:40 | True |

</details>
</Expandable>
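
For comparison, the check-strategy walkthrough above could come from a configuration along these lines. The snapshot name and source are hypothetical, and `status` is the only tracked column because it is the only non-key column in the example tables.

```yaml
snapshots:
  - name: orders_snapshot_check                # hypothetical snapshot name
    relation: source('jaffle_shop', 'orders')  # hypothetical source
    config:
      unique_key: id
      strategy: check
      # A new snapshot row is written when any of these columns changes.
      check_cols:
        - status
      # Optional: uncomment to populate the meta columns from a source
      # timestamp column instead of the current timestamp (assumes the
      # source has an updated_at column).
      # updated_at: updated_at
```

Without `updated_at`, the snapshot run times (`11:00`, `11:30`, `11:40`) are what appear in `dbt_valid_from`, `dbt_valid_to`, and `dbt_updated_at`.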

## Configure snapshots in versions 1.8 and earlier

Expand Up @@ -19,7 +19,7 @@ snapshots:
dbt_valid_to: <string>
dbt_scd_id: <string>
dbt_updated_at: <string>
dbt_is_deleted: <boolean>
dbt_is_deleted: <string>

```

Expand All @@ -35,7 +35,7 @@ snapshots:
"dbt_valid_to": "<string>",
"dbt_scd_id": "<string>",
"dbt_updated_at": "<string>",
"dbt_is_deleted": "<boolean>",
"dbt_is_deleted": "<string>",
}
)
}}
Expand All @@ -54,7 +54,7 @@ snapshots:
dbt_valid_to: <string>
dbt_scd_id: <string>
dbt_updated_at: <string>
dbt_is_deleted: <boolean>
dbt_is_deleted: <string>
```

</File>
Expand All @@ -67,17 +67,17 @@ In order to align with an organization's naming conventions, the `snapshot_meta_

By default, dbt snapshots use the following column names to track change history using [Type 2 slowly changing dimension](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row) records:

| Field | Meaning | Notes |
| -------------- | ------- | ----- |
| `dbt_valid_from` | The timestamp when this snapshot row was first inserted and became valid. | The value is affected by the [`strategy`](/reference/resource-configs/strategy). |
| `dbt_valid_to` | The timestamp when this row is no longer valid. | |
| `dbt_scd_id` | A unique key generated for each snapshot row. | This is used internally by dbt. |
| `dbt_updated_at` | The `updated_at` timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt. |
| `dbt_is_deleted` | A boolean value indicating if the record has been deleted. `True` if deleted, `False` otherwise. | Added when `hard_deletes='new_record'` is configured. |
| Field | <div style={{width:'250px'}}>Meaning</div> | Notes | [Example](#example) |
| -------------- | ------- | ----- | ------- |
| `dbt_valid_from` | The timestamp when this snapshot row was first inserted and became valid. | The value is affected by the [`strategy`](/reference/resource-configs/strategy). | `snapshot_meta_column_names: {dbt_valid_from: start_date}` |
| `dbt_valid_to` | The timestamp when this row is no longer valid. | | `snapshot_meta_column_names: {dbt_valid_to: end_date}` |
| `dbt_scd_id` | A unique key generated for each snapshot row. | This is used internally by dbt. | `snapshot_meta_column_names: {dbt_scd_id: scd_id}` |
| `dbt_updated_at` | The `updated_at` timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt. | `snapshot_meta_column_names: {dbt_updated_at: modified_date}` |
| `dbt_is_deleted` | A string value indicating if the record has been deleted. (`True` if deleted, `False` if not deleted). |Added when `hard_deletes='new_record'` is configured. | `snapshot_meta_column_names: {dbt_is_deleted: is_deleted}` |

However, these column names can be customized using the `snapshot_meta_column_names` config.
All of these column names can be customized using the `snapshot_meta_column_names` config. Refer to the [Example](#example) for more details.
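
Putting the per-column examples from the table together, a snapshot that renames every meta column might be sketched as below. The snapshot name and source are hypothetical; the replacement column names are the ones used in the table.

```yaml
snapshots:
  - name: orders_snapshot                      # hypothetical snapshot name
    relation: source('jaffle_shop', 'orders')  # hypothetical source
    config:
      unique_key: id
      strategy: timestamp
      updated_at: updated_at
      hard_deletes: new_record                 # needed for dbt_is_deleted to exist
      snapshot_meta_column_names:
        dbt_valid_from: start_date
        dbt_valid_to: end_date
        dbt_scd_id: scd_id
        dbt_updated_at: modified_date
        dbt_is_deleted: is_deleted
```

Only the meta columns you list are renamed; any you leave out keep their default `dbt_` names.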

:::warning
:::warning

To avoid any unintentional data modification, dbt will **not** automatically apply any column renames. So if a user applies `snapshot_meta_column_names` config for a snapshot without updating the pre-existing table, they will get an error. We recommend either only using these settings for net-new snapshots, or arranging an update of pre-existing tables prior to committing a column name change.
