Refactor pytest unit tests to dbt unit tests #346

Open
wants to merge 48 commits into
base: main
Changes from 38 commits
9d8f0c8
Replaced python unit test with dbt 1.8 unit test
adamribaudo-velir Jun 7, 2024
298f373
refactored unit tests for stg_ga4__session_conversions_daily
adamribaudo-velir Jun 7, 2024
906bec2
update test name
adamribaudo-velir Jun 7, 2024
e994408
Replaced Python unit test with dbt unit test
adamribaudo-velir Jun 22, 2024
34bbab9
variable override working properly
adamribaudo-velir Jun 22, 2024
4b66d1f
using overrides properly
adamribaudo-velir Jun 22, 2024
79f7e27
replaced another unit test
adamribaudo-velir Jun 22, 2024
cecf337
replaced python unit test
adamribaudo-velir Jun 22, 2024
63a7d86
add unit test for stg_ga4__client_key_first_last_pageviews
adamribaudo-velir Jun 22, 2024
6e709db
replace unit test
adamribaudo-velir Jun 22, 2024
9d53c9a
unit test for stg_ga4__sessions_traffic_sources_last_non_direct_daily…
adamribaudo-velir Jun 22, 2024
3425fdf
Add package-lock.yml to .gitignore
davidbooke4 Oct 22, 2024
c3ba7f7
Add vars to dbt_project.yml for testing
davidbooke4 Oct 23, 2024
10456ef
Merge branch 'main' into feature/dbt-unit-tests
davidbooke4 Oct 23, 2024
a1f10df
Add unit tests to stg_ga4__events.yml for the url_parsing macros
davidbooke4 Oct 23, 2024
5972788
Add conditions for cases when event_source is null for session parame…
davidbooke4 Oct 23, 2024
20598fb
Add unit test to stg_ga4__sessions_traffic_sources_daily for testing …
davidbooke4 Oct 23, 2024
282eeee
Add unit test to stg_ga4__user_id_mapping to test the latest mapping …
davidbooke4 Oct 23, 2024
c321197
Add descriptions for unit tests that were missing them
davidbooke4 Oct 23, 2024
8a1796e
Remove python unit tests that have been migrated to dbt unit tests
davidbooke4 Oct 23, 2024
c0aba5f
Add unit test to stg_ga4__events for testing transformations in stg_g…
davidbooke4 Oct 24, 2024
922ba07
Remove todo and example stg_ga4__events unit test files
davidbooke4 Oct 24, 2024
3a4f677
Add sessions_traffic_sources_last_non_direct_daily python unit test back
davidbooke4 Oct 24, 2024
c870130
Comment out unit tests for disabled models
davidbooke4 Oct 24, 2024
7386371
Remove edits from dbt_project.yml
davidbooke4 Oct 24, 2024
76f2c7f
Comment out unit test for sessions_traffic_sources_last_non_direct_da…
davidbooke4 Oct 24, 2024
697bafd
Update unit test section in README
davidbooke4 Oct 24, 2024
616da99
Simplify event_params construction in test_base_to_stg_ga4__events in…
davidbooke4 Oct 24, 2024
653e1ae
Update yml files to use consistent new line convention
davidbooke4 Oct 24, 2024
50ff2e8
update PR template
adamribaudo-velir Oct 24, 2024
68f9f87
Update default channel grouping test to use seed instead of fixture a…
davidbooke4 Oct 25, 2024
a3d9c1e
Comment out unit tests for disabled models
davidbooke4 Oct 28, 2024
1dd415e
Un-comment unit tests
davidbooke4 Oct 29, 2024
4ef2503
Add profiles.yml for Github Actions to execute dbt commands and add .…
davidbooke4 Oct 29, 2024
83bd23b
Add profile and variables to dbt_project.yml so Github Action can run…
davidbooke4 Oct 29, 2024
6f4335e
Add dbt unit tests job to github CI workflow
davidbooke4 Oct 29, 2024
947868d
Remove empty step
davidbooke4 Oct 29, 2024
8c879f7
Add repo to checkout step so PR code is checked out to test adding ne…
davidbooke4 Oct 29, 2024
95b3a60
Change workflow on behavior for testing changes
davidbooke4 Oct 29, 2024
555671e
Add comments related to unit tests and new Github Actions job to mark…
davidbooke4 Oct 31, 2024
f198262
Make updates for dbt unit test Github Action and allow for use of env…
davidbooke4 Oct 31, 2024
621e429
Add conditional logic to allow for use of --empty flag
davidbooke4 Oct 31, 2024
c82a36f
Fix spacing for comments added to README.md
davidbooke4 Oct 31, 2024
0ae02cc
Enable models dependent on project variables if environment variables…
davidbooke4 Nov 5, 2024
99a50c8
Set start_date to environment variable if it exists
davidbooke4 Nov 5, 2024
cd35ef9
Remove variables from dbt_project.yml and have models look for increm…
davidbooke4 Nov 5, 2024
ac415db
Add more environment variables to CI workflow
davidbooke4 Nov 5, 2024
7e44907
Update README after removing project variables in dbt_project.yml
davidbooke4 Nov 5, 2024
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
@@ -7,4 +7,4 @@ Describe your changes, and why you're making them.
- [ ] I have verified that these changes work locally
- [ ] I have updated the README.md (if applicable)
- [ ] I have added tests & descriptions to my models (and macros if applicable)
- [ ] I have run `dbt test` and `python -m pytest .` to validate existing tests
- [ ] I have run `dbt test` to validate existing tests
33 changes: 33 additions & 0 deletions .github/workflows/run_unit_tests_on_pr.yml
@@ -3,6 +3,9 @@ name: Run Unit Tests on Pull Request
on: [pull_request_target,workflow_dispatch]
env:
BIGQUERY_PROJECT: ${{ secrets.BIGQUERY_PROJECT }}
BIGQUERY_PROPERTY_ID: ${{ secrets.BIGQUERY_PROPERTY_ID }}
BIGQUERY_DATASET: ${{ secrets.BIGQUERY_DATASET }}
BIGQUERY_KEYFILE: ./unit_tests/dbt-service-account.json

jobs:
pytest_run_all:
@@ -16,6 +19,7 @@ jobs:
uses: actions/checkout@v3
with:
ref: ${{ github.event.pull_request.head.sha }}
repository: ${{ github.event.pull_request.head.repo.full_name }}

- uses: actions/setup-python@v1
with:
@@ -35,3 +39,32 @@ jobs:

- name: Run tests
run: python -m pytest .

run_dbt_unit_tests:
name: Run dbt Unit Tests
runs-on: ubuntu-latest
steps:
- name: Check out
uses: actions/checkout@v3
with:
ref: ${{ github.event.pull_request.head.sha }}
repository: ${{ github.event.pull_request.head.repo.full_name }}

- uses: actions/setup-python@v1
with:
python-version: "3.11.x"

- name: Authenticate using service account
run: 'echo "$KEYFILE" > ./unit_tests/dbt-service-account.json'
shell: bash
env:
KEYFILE: ${{ secrets.GCP_BIGQUERY_USER_KEYFILE }}

- name: Install dbt
run: |
pip install dbt-core
pip install dbt-bigquery
dbt deps

- name: Run dbt unit tests
run: dbt test -s test_type:unit
2 changes: 2 additions & 0 deletions .gitignore
@@ -2,6 +2,8 @@
target/
dbt_packages/
logs/
package-lock.yml
.user.yml

google-cloud-sdk/
unit_tests/.env
21 changes: 20 additions & 1 deletion README.md
@@ -304,7 +304,26 @@ gcloud auth application-default login --scopes=https://www.googleapis.com/auth/b
```
# Unit Testing

This package uses `pytest` as a method of unit testing individual models. More details can be found in the [unit_tests/README.md](unit_tests) folder.
The dbt-ga4 package treats each model and macro as a 'unit' of code. If we fix the input to each unit, we can test that we receive the expected output.

This package currently uses a combination of dbt unit tests and `pytest` to unit test individual models. The remaining `pytest` unit test will be refactored into a dbt unit test once the bug blocking that work is resolved. Progress on the bug can be tracked [here](https://github.com/dbt-labs/dbt-core/issues/10353).

### dbt unit tests

dbt's documentation on unit tests can be found [here](https://docs.getdbt.com/docs/build/unit-tests). Unit tests are executed the same way as other types of dbt tests.

Execute a specific test:
```
dbt test -s <test_name>
```
Execute all tests configured for a model:
```
dbt test -s <model_name>
```

### pytest

More details on using `pytest` for unit testing can be found in the [unit_tests/README.md](unit_tests) folder.

# Overriding Default Channel Groupings

18 changes: 18 additions & 0 deletions dbt_project.yml
@@ -8,6 +8,24 @@ seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

profile: 'default'

vars:
source_project: "{{ env_var('BIGQUERY_PROJECT') }}"
property_ids: ["{{ env_var('BIGQUERY_PROPERTY_ID') }}"]
start_date: "20230306"
static_incremental_days: 3
derived_session_properties:
- event_parameter: "page_location"
session_property_name: "most_recent_page_location"
value_type: "string_value"
derived_user_properties:
- event_parameter: "page_title"
user_property_name: "most_recent_page_title"
value_type: "string_value"
conversion_events: ['large_button_clicked', 'add_to_cart']
session_attribution_lookback_window_days: 30

target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
- "target"
18 changes: 17 additions & 1 deletion models/staging/stg_ga4__client_key_first_last_events.yml
@@ -7,4 +7,20 @@ models:
- name: client_key
description: Hashed combination of user_pseudo_id and stream_id
tests:
- unique
- unique
unit_tests:
- name: test_stg_ga4__client_key_first_last_events
description: Test pulling the first and last event per client key
model: stg_ga4__client_key_first_last_events
given:
- input: ref('stg_ga4__events')
format: csv
rows: |
stream_id,client_key,event_key,event_timestamp
1,IX+OyYJBgjwqML19GB/XIQ==,H06dLW6OhNJJ6SoEPFsSyg==,1661339279816517
1,IX+OyYJBgjwqML19GB/XIQ==,gt1SoAtrxDv33uDGwVeMVA==,1661339279816518
expect:
format: csv
rows: |
client_key,first_event,last_event
IX+OyYJBgjwqML19GB/XIQ==,H06dLW6OhNJJ6SoEPFsSyg==,gt1SoAtrxDv33uDGwVeMVA==
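
The fixture above can be read as: given two events for one client key, the model should return the earliest event key as `first_event` and the latest as `last_event`. The following is a minimal plain-Python sketch of that expectation, not the model's SQL (which is not part of this diff). It assumes ordering is by `event_timestamp`; the helper name `first_last_events` is hypothetical.

```python
def first_last_events(rows):
    """Map each client_key to (first_event_key, last_event_key).

    Assumption: "first" and "last" are determined by event_timestamp.
    """
    grouped = {}
    for row in rows:
        grouped.setdefault(row["client_key"], []).append(row)

    result = {}
    for client_key, events in grouped.items():
        ordered = sorted(events, key=lambda e: e["event_timestamp"])
        result[client_key] = (ordered[0]["event_key"], ordered[-1]["event_key"])
    return result


# Rows copied from the `given` block of the unit test.
fixture_rows = [
    {
        "stream_id": 1,
        "client_key": "IX+OyYJBgjwqML19GB/XIQ==",
        "event_key": "H06dLW6OhNJJ6SoEPFsSyg==",
        "event_timestamp": 1661339279816517,
    },
    {
        "stream_id": 1,
        "client_key": "IX+OyYJBgjwqML19GB/XIQ==",
        "event_key": "gt1SoAtrxDv33uDGwVeMVA==",
        "event_timestamp": 1661339279816518,
    },
]

summary = first_last_events(fixture_rows)
```

The two fixture rows differ by a single microsecond, so the test only passes if the model orders on the timestamp rather than on input order or on the event key.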
18 changes: 17 additions & 1 deletion models/staging/stg_ga4__client_key_first_last_pageviews.yml
@@ -7,4 +7,20 @@ models:
- name: client_key
description: Hashed combination of user_pseudo_id and stream_id
tests:
- unique
- unique
unit_tests:
- name: test_stg_ga4__client_key_first_last_pageviews
description: Test pulling the first and last page view per client key
model: stg_ga4__client_key_first_last_pageviews
given:
- input: ref('stg_ga4__event_page_view')
format: csv
rows: |
stream_id,client_key,event_key,event_timestamp,page_location
1,IX+OyYJBgjwqML19GB/XIQ==,H06dLW6OhNJJ6SoEPFsSyg==,1661339279816517,A
1,IX+OyYJBgjwqML19GB/XIQ==,gt1SoAtrxDv33uDGwVeMVA==,1661339279816518,B
expect:
format: csv
rows: |
client_key,first_page_view_event_key,last_page_view_event_key,first_page_location,last_page_location
IX+OyYJBgjwqML19GB/XIQ==,H06dLW6OhNJJ6SoEPFsSyg==,gt1SoAtrxDv33uDGwVeMVA==,A,B
37 changes: 36 additions & 1 deletion models/staging/stg_ga4__derived_session_properties.yml
@@ -8,4 +8,39 @@ models:
columns:
- name: session_key
tests:
- unique
- unique
unit_tests:
- name: test_derived_session_properties
description: Test whether a derived property is successfully retrieved from multiple event payloads
model: stg_ga4__derived_session_properties
given:
- input: ref('stg_ga4__events')
format: sql
rows: |
select
'AAA' as session_key
, 1617691790431476 as event_timestamp
, 'first_visit' as event_name
, ARRAY[STRUCT('my_param' as key, STRUCT(1 as int_value) as value)] as event_params
, ARRAY[STRUCT('my_property' as key, STRUCT('value1' as string_value) as value)] as user_properties
union all
select
'AAA' as session_key
, 1617691790431477 as event_timestamp
, 'first_visit' as event_name
, ARRAY[STRUCT('my_param' as key, STRUCT(2 as int_value) as value)] as event_params
, ARRAY[] as user_properties
union all
select
'BBB' as session_key
, 1617691790431477 as event_timestamp
, 'first_visit' as event_name
, ARRAY[STRUCT('my_param' as key, STRUCT(1 as int_value) as value)] as event_params
, ARRAY[STRUCT('my_property' as key, STRUCT('value2' as string_value) as value)] as user_properties
expect:
format: dict
rows:
- {session_key: AAA, my_derived_property: 2, my_derived_property2: value1}
- {session_key: BBB, my_derived_property: 1, my_derived_property2: value2}
overrides:
vars: {derived_session_properties: [{event_parameter: 'my_param',session_property_name: 'my_derived_property',value_type: 'int_value'},{user_property: 'my_property',session_property_name: 'my_derived_property2',value_type: 'string_value'}]}
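
The expected rows suggest that a derived property takes the most recent non-null value seen in a session: session `AAA` gets `my_derived_property: 2` from its later event, while `my_derived_property2` falls back to `value1` because the later event carries no user properties. A rough Python sketch of that reading follows. It is an interpretation of the fixture, not the model's implementation, and the helper name `latest_non_null` is hypothetical.

```python
def latest_non_null(events, array_field, key, value_type):
    """Return the most recent non-null value stored under `key`.

    Assumption: events are ranked by event_timestamp, newest first, and
    events that lack the key (or hold a null value) are skipped.
    """
    newest_first = sorted(events, key=lambda e: e["event_timestamp"], reverse=True)
    for event in newest_first:
        for entry in event.get(array_field, []):
            value = entry["value"].get(value_type)
            if entry["key"] == key and value is not None:
                return value
    return None


# The two events of session 'AAA' from the `given` block.
session_aaa = [
    {
        "event_timestamp": 1617691790431476,
        "event_params": [{"key": "my_param", "value": {"int_value": 1}}],
        "user_properties": [{"key": "my_property", "value": {"string_value": "value1"}}],
    },
    {
        "event_timestamp": 1617691790431477,
        "event_params": [{"key": "my_param", "value": {"int_value": 2}}],
        "user_properties": [],
    },
]

my_derived_property = latest_non_null(session_aaa, "event_params", "my_param", "int_value")
my_derived_property2 = latest_non_null(session_aaa, "user_properties", "my_property", "string_value")
```

The empty `user_properties` array on the second event is what makes the fixture useful: it exercises the fallback to an earlier event.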
34 changes: 33 additions & 1 deletion models/staging/stg_ga4__derived_user_properties.yml
@@ -7,4 +7,36 @@ models:
- name: client_key
description: Hashed combination of user_pseudo_id and stream_id
tests:
- unique
- unique
unit_tests:
- name: test_derived_user_properties
description: Test whether a derived user property is successfully retrieved from multiple event payloads
model: stg_ga4__derived_user_properties
given:
- input: ref('stg_ga4__events')
format: sql
rows: |
select
'AAA' as client_key
, 1617691790431476 as event_timestamp
, 'first_visit' as event_name
, ARRAY[STRUCT('my_param' as key, STRUCT(1 as int_value) as value)] as event_params
union all
select
'AAA' as client_key
, 1617691790431477 as event_timestamp
, 'first_visit' as event_name
, ARRAY[STRUCT('my_param' as key, STRUCT(2 as int_value) as value)] as event_params
union all
select
'BBB' as client_key
, 1617691790431477 as event_timestamp
, 'first_visit' as event_name
, ARRAY[STRUCT('my_param' as key, STRUCT(1 as int_value) as value)] as event_params
expect:
format: dict
rows:
- {client_key: AAA, my_derived_property: 2}
- {client_key: BBB, my_derived_property: 1}
overrides:
vars: {derived_user_properties: [{event_parameter: 'my_param',user_property_name: 'my_derived_property',value_type: 'int_value'}]}
21 changes: 20 additions & 1 deletion models/staging/stg_ga4__event_to_query_string_params.yml
@@ -3,4 +3,23 @@ version: 2
models:
- name: stg_ga4__event_to_query_string_params
description: This model pivots the query string parameters contained within the event's page_location field to become rows. Each row is a single parameter/value combination contained in a single event's query string.

unit_tests:
- name: test_stg_ga4__event_to_query_string_params
description: Test whether event query strings are flattened for each query string parameter
model: stg_ga4__event_to_query_string_params
given:
- input: ref('stg_ga4__events')
format: csv
rows: |
event_key,page_query_string
aaa,param1=value1&param2=value2
bbb,param1
ccc,param1=
expect:
format: csv
rows: |
event_key,param,value
aaa,param1,value1
aaa,param2,value2
bbb,param1,
ccc,param1,
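
The three input rows cover a full `key=value` pair, a bare key, and a key with an empty value. A small Python sketch of the flattening the expected rows describe is shown below. It assumes an empty CSV cell in the fixture corresponds to a null value; the function name `query_string_to_rows` is hypothetical and the logic is not taken from the model's SQL.

```python
def query_string_to_rows(event_key, page_query_string):
    """Split a query string into one (event_key, param, value) row per parameter.

    Assumption: a missing or empty value is represented as None.
    """
    rows = []
    for pair in page_query_string.split("&"):
        param, _, value = pair.partition("=")
        rows.append((event_key, param, value or None))
    return rows


# Inputs copied from the `given` block of the unit test.
flattened = (
    query_string_to_rows("aaa", "param1=value1&param2=value2")
    + query_string_to_rows("bbb", "param1")
    + query_string_to_rows("ccc", "param1=")
)
```

Under that assumption, `bbb` and `ccc` produce the same shape of row, which matches the last two expected rows in the fixture.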