Merge branch 'current' into add_duckdb_ref
mirnawong1 authored Jan 9, 2025
2 parents 874d4c5 + 13c3968 commit eb4f97e
Showing 60 changed files with 1,178 additions and 346 deletions.
12 changes: 6 additions & 6 deletions website/blog/2022-04-19-dbt-cloud-postman-collection.md
@@ -19,7 +19,7 @@ is_featured: true
The dbt Cloud API has well-documented endpoints for creating, triggering and managing dbt Cloud jobs. But there are other endpoints that aren’t well documented yet, and they’re extremely useful for end-users. These endpoints exposed by the API enable organizations not only to orchestrate jobs, but to manage their dbt Cloud accounts programmatically. This creates some really interesting capabilities for organizations to scale their dbt Cloud implementations.

The main goal of this article is to spread awareness of these endpoints as the docs are being built & show you how to use them.

<!--truncate-->

@@ -45,7 +45,7 @@ Beyond the day-to-day process of managing their dbt Cloud accounts, many organiz

*Below this you’ll find a series of example requests - use these to guide you or [check out the Postman Collection](https://dbtlabs.postman.co/workspace/Team-Workspace~520c7ac4-3895-4779-8bc3-9a11b5287c1c/request/12491709-23cd2368-aa58-4c9a-8f2d-e8d56abb6b1dlinklink) to try it out yourself.*

## Appendix

### Examples of how to use the Postman Collection

@@ -55,7 +55,7 @@ Let’s run through some examples on how to make good use of this Postman Collec

One common question we hear from customers is “How can we migrate resources from one dbt Cloud project to another?” Often, they’ll create a development project, in which users have access to the UI and can manually make changes, and then migrate selected resources from the development project to a production project once things are ready.

There are several reasons one might want to do this, including:

- Probably the most common is separating dev/test/prod environments across dbt Cloud projects to enable teams to build manually in a development project, and then automatically migrate those environments & jobs to a production project.
- Building “starter projects” they can deploy as templates for new teams onboarding to dbt from a learning standpoint.
@@ -90,10 +90,10 @@ https://cloud.getdbt.com/api/v3/accounts/28885/projects/86704/environments/75286

#### Push the environment to the production project

We take the response from the GET request above, and then do the following:

1. Adjust some of the variables for the new environment:
- Change the value of the “project_id” field from 86704 to 86711
- Change the value of the “name” field from “dev-staging” to “production–api-generated”
- Set the “custom_branch” field to “main”

@@ -116,7 +116,7 @@ We take the response from the GET request above, and then to the following:
}
```

3. Note the environment ID returned in the response, as we’ll use it to create a dbt Cloud job in the next step
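
The field changes listed in step 1 can be sketched as a small helper. This is only an illustrative sketch, not code from the Postman Collection: the function name `retarget_environment` and the trimmed-down `dev_env` dictionary are hypothetical stand-ins, and a real GET response carries many more fields than shown here. Only the field names and values (`project_id`, `name`, `custom_branch`) come from the steps above.

```python
import copy


def retarget_environment(env, project_id, name, custom_branch):
    """Return a copy of an environment payload pointed at another project."""
    updated = copy.deepcopy(env)  # leave the original GET response untouched
    updated["project_id"] = project_id
    updated["name"] = name
    updated["custom_branch"] = custom_branch
    return updated


# Hypothetical, trimmed-down stand-in for the body returned by the GET request.
dev_env = {"project_id": 86704, "name": "dev-staging", "custom_branch": None}

prod_env = retarget_environment(
    dev_env,
    project_id=86711,
    name="production–api-generated",
    custom_branch="main",
)
print(prod_env)
```

Working on a deep copy keeps the development payload intact, which makes it easy to compare the two environments before sending the new one.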

#### Pull the job definition from the dev project

14 changes: 7 additions & 7 deletions website/blog/2022-05-17-stakeholder-friendly-model-names.md
@@ -29,7 +29,7 @@ In this article, we’ll take a deeper look at why model naming conventions are

>“[Data folks], what we [create in the database]… echoes in eternity.” -Max(imus, Gladiator)

Analytics Engineers are often centrally located in the company, sandwiched between data analysts and data engineers. This means everything AEs create might be read and need to be understood by both an analytics or customer-facing team and by teams who spend most of their time in code and the database. Depending on the audience, the scope of access differs, which means the user experience and context changes. Let’s elaborate on what that experience might look like by breaking end-users into two buckets:

- Analysts / BI users
- Analytics engineers / Data engineers
@@ -49,21 +49,21 @@ Here we have drag and drop functionality and a skin over top of the underlying `
**How model names can make this painful:**
The end users might not even know what tables the data refers to, as potentially everything is joined by the system and they don’t need to write their own queries. If model names are chosen poorly, there is a good chance that the BI layer on top of the database tables has been renamed to something more useful for the analysts. This adds an extra step of mental complexity in tracing the <Term id="data-lineage">lineage</Term> from data model to BI.

#### Read only access to the dbt Cloud IDE docs
If Analysts want more context via documentation, they may traverse back to the dbt layer and check out the data models in either the context of the Project or Database. In the Project view, they will see the data models in the folder hierarchy present in your project’s repository. In the Database view, they will see the output of the data models as present in your database, i.e., `database / schema / object`.

![A screenshot depicting the dbt Cloud IDE menu's Database view which shows you the output of your data models. Next to this view, is the Project view.](/img/blog/2022-05-17-stakeholder-friendly-model-names/project-view.png)

**How model names can make this painful:**
For the Project view, generally abstracted department or organizational structures as folder names presupposes the reader/engineer knows what is contained within the folder beforehand or what that department actually does, or promotes haphazard clicking to open folders to see what is within. Organizing the final outputs by business unit or analytics function is great for end users but doesn't accurately represent all the sources and references that had to come together to build this output, as they often live in another folder.

For the Database view, pray your team has been declaring a logical schema bucketing, or a logical model naming convention, otherwise you will have a long, alphabetized list of database objects to scroll through, where staging, intermediate, and final output models are all intermixed. Clicking into a data model and viewing the documentation is helpful, but you would need to check out the DAG to see where the model lives in the overall flow.

#### The full dropdown list in their data warehouse.

If they have access to Worksheets, SQL Runner, or another way to write ad hoc SQL queries, then they will have access to the data models as present in your database, i.e., `database / schema / object`, but with less documentation attached, and more proclivity towards querying tables to check out their contents, which costs time and money.

![A screenshot of the SQL Runner menu within Looker showcasing the dropdown list of all data models present in the database.](/img/blog/2022-05-17-stakeholder-friendly-model-names/data-warehouse-dropdown.png)

**How model names can make this painful:**
Without proper naming conventions, you will encounter `analytics.order`, `analytics.orders`, `analytics.orders_new` and not know which one is which, so you will open up a scratch statement tab and attempt to figure out which is correct:
@@ -73,9 +73,9 @@ Without proper naming conventions, you will encounter `analytics.order`, `analyt
-- select * from analytics.orders limit 10
select * from analytics.orders_new limit 10
```
Hopefully you get it right via sampling queries, or eventually find out there is a true source of truth defined in a totally separate area: `core.dim_orders`.

The problem here is the only information you can use to determine what data is within an object or the purpose of the object is within the schema and model name.

### The engineer’s user experience

@@ -98,7 +98,7 @@ There is not much worse than spending all week developing on a task, submitting
This is largely the same as the Analyst experience above, except they created the data models or are aware of their etymologies. They are likely more comfortable writing ad hoc queries, but also have the ability to make changes, which adds a layer of thought processing when working.

**How model names can make this painful:**
It takes time to become a subject matter expert in the database. You will need to know which schema a subject lives in, what tables are the source of truth and/or output models, versus experiments, outdated objects, or building blocks used along the way. Working within this context, engineers know the history and company lore behind why a table was named that way or how its purpose may differ slightly from its name, but they also have the ability to make changes.

Change management is hard; how many places would you need to update, rename, re-document, and retest to fix a poor naming choice from long ago? It is a daunting position, which can create internal strife when constrained for time over whether we should continually revamp and refactor for maintainability or focus on building new models in the same pattern as before.

2 changes: 1 addition & 1 deletion website/blog/2024-05-07-unit-testing.md
@@ -223,7 +223,7 @@ group by 1

### Caveats and pro-tips

See the docs for [helpful information before you begin](https://docs.getdbt.com/docs/build/unit-tests#before-you-begin), including unit testing [incremental models](https://docs.getdbt.com/docs/build/unit-tests#unit-testing-incremental-models), [models that depend on ephemeral model(s)](https://docs.getdbt.com/docs/build/unit-tests#unit-testing-a-model-that-depends-on-ephemeral-models), and platform-specific considerations like `STRUCT`s in BigQuery. In many cases, the [`sql` format](https://docs.getdbt.com/reference/resource-properties/data-formats#sql) can help solve tricky edge cases that come up.

Another advanced topic is overcoming issues when non-deterministic factors are involved, such as a current timestamp. To ensure that the output remains consistent regardless of when the test is run, you can set a fixed, predetermined value by using the [`overrides`](https://docs.getdbt.com/reference/resource-properties/unit-test-overrides) configuration.

92 changes: 92 additions & 0 deletions website/docs/docs/build/dimensions.md
@@ -22,6 +22,7 @@ All dimensions require a `name`, `type`, and can optionally include an `expr` pa
| `description` | A clear description of the dimension. | Optional | String |
| `expr` | Defines the underlying column or SQL query for a dimension. If no `expr` is specified, MetricFlow will use the column with the same name as the group. You can use the column name itself to input a SQL expression. | Optional | String |
| `label` | Defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Optional | String |
| [`meta`](/reference/resource-configs/meta) | Set metadata for a resource and organize resources. Accepts plain text, spaces, and quotes. | Optional | Dictionary |

Refer to the following for the complete specification for dimensions:

@@ -37,6 +38,8 @@ dimensions:
Refer to the following example to see how dimensions are used in a semantic model:
<VersionBlock firstVersion="1.9">
```yaml
semantic_models:
  - name: transactions
@@ -59,13 +62,50 @@ semantic_models:
        type_params:
          time_granularity: day
        label: "Date of transaction" # Recommend adding a label to provide more context to users consuming the data
        config:
          meta:
            data_owner: "Finance team"
        expr: ts
      - name: is_bulk
        type: categorical
        expr: case when quantity > 10 then true else false end
      - name: type
        type: categorical
```
</VersionBlock>
<VersionBlock lastVersion="1.8">
```yaml
semantic_models:
  - name: transactions
    description: A record for every transaction that takes place. Carts are considered multiple transactions for each SKU.
    model: {{ ref('fact_transactions') }}
    defaults:
      agg_time_dimension: order_date
    # --- entities ---
    entities:
      - name: transaction
        type: primary
        ...
    # --- measures ---
    measures:
      ...
    # --- dimensions ---
    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
        label: "Date of transaction" # Recommend adding a label to provide more context to users consuming the data
        expr: ts
      - name: is_bulk
        type: categorical
        expr: case when quantity > 10 then true else false end
      - name: type
        type: categorical
```
</VersionBlock>
Dimensions are bound to the primary entity of the semantic model in which they are defined. For example, the dimension `type` is defined in a model that has `transaction` as a primary entity. `type` is scoped to the `transaction` entity, and to reference this dimension you would use the fully qualified dimension name, i.e., `transaction__type`.
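
As a minimal sketch of that naming rule, the fully qualified name is the primary entity and the dimension name joined by a double underscore. The helper `qualified_dimension_name` is a hypothetical illustration and not part of dbt or MetricFlow:

```python
def qualified_dimension_name(primary_entity: str, dimension: str) -> str:
    """Join a primary entity and a dimension name with a double underscore."""
    return f"{primary_entity}__{dimension}"


# The `type` dimension scoped to the `transaction` entity.
print(qualified_dimension_name("transaction", "type"))  # prints: transaction__type
```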

@@ -101,12 +141,28 @@ This section further explains the dimension definitions, along with examples. Di

Categorical dimensions are used to group metrics by different attributes, features, or characteristics such as product type. They can refer to existing columns in your dbt model or be calculated using a SQL expression with the `expr` parameter. An example of a categorical dimension is `is_bulk_transaction`, which is a group created by applying a case statement to the underlying column `quantity`. This allows users to group or filter the data based on bulk transactions.

<VersionBlock firstVersion="1.9">

```yaml
dimensions:
  - name: is_bulk_transaction
    type: categorical
    expr: case when quantity > 10 then true else false end
    config:
      meta:
        usage: "Filter to identify bulk transactions, like where quantity > 10."
```
</VersionBlock>

<VersionBlock lastVersion="1.8">

```yaml
dimensions:
  - name: is_bulk_transaction
    type: categorical
    expr: case when quantity > 10 then true else false end
```
</VersionBlock>

## Time

Expand All @@ -130,12 +186,17 @@ You can set `is_partition` for time to define specific time spans. Additionally,

Use `is_partition: True` to show that a dimension exists over a specific time window. For example, a date-partitioned dimensional table. When you query metrics from different tables, the dbt Semantic Layer uses this parameter to ensure that the correct dimensional values are joined to measures.

<VersionBlock firstVersion="1.9">

```yaml
dimensions:
  - name: created_at
    type: time
    label: "Date of creation"
    expr: ts_created # ts_created is the underlying column name from the table
    config:
      meta:
        notes: "Only valid for orders from 2022 onward"
    is_partition: True
    type_params:
      time_granularity: day
@@ -156,6 +217,37 @@ measures:
    expr: 1
    agg: sum
```
</VersionBlock>

<VersionBlock lastVersion="1.8">

```yaml
dimensions:
  - name: created_at
    type: time
    label: "Date of creation"
    expr: ts_created # ts_created is the underlying column name from the table
    is_partition: True
    type_params:
      time_granularity: day
  - name: deleted_at
    type: time
    label: "Date of deletion"
    expr: ts_deleted # ts_deleted is the underlying column name from the table
    is_partition: True
    type_params:
      time_granularity: day
measures:
  - name: users_deleted
    expr: 1
    agg: sum
    agg_time_dimension: deleted_at
  - name: users_created
    expr: 1
    agg: sum
```
</VersionBlock>

</TabItem>
