dbt run-operation stage_external_sources erroring (potential package incompatibility issue) #325

Closed
1 of 3 tasks
ashleemtib opened this issue Nov 25, 2024 · 7 comments
Labels
bug (Something isn't working), triage

Comments

ashleemtib commented Nov 25, 2024

Describe the bug

The dbt run-operation stage_external_sources command has suddenly stopped working, even though there have been no changes to our external source or the way it is configured. We suspect this is some sort of package incompatibility issue.

Update: this issue appears related to the way the SQL compiles after the --full-refresh flag is applied.

Same as reported here

Steps to reproduce

  1. Install the following packages with pip install (this is what our git workflow is installing and failing with)

pip install agate==1.9.1 alembic==1.13.3 annotated-types==0.7.0 anyio==4.6.2.post1 asn1crypto==1.5.1 asynciolimiter==1.0.0 attrs==24.2.0 babel==2.16.0 backoff==2.2.1 bcrypt==4.2.0 boto3==1.34.108 botocore==1.34.162 cachetools==5.5.0 certifi==2024.8.30 cffi==1.17.1 charset-normalizer==2.0.4 click==8.1.7 colorama==0.4.6 coloredlogs==14.0 croniter==3.0.3 cryptography==43.0.0 daff==1.3.46 dagster==1.7.14 dagster-aws==0.23.14 dagster-dbt==0.23.14 dagster-graphql==1.7.14 dagster-k8s==0.23.14 dagster-pipes==1.7.14 dagster-postgres==0.23.14 dagster-slack==0.23.14 dagster-snowflake==0.23.14 dagster-webserver==1.7.14 dbt-adapters==1.7.0 dbt-common==1.11.0 dbt-core==1.8.7 dbt-extractor==0.5.1 dbt-semantic-interfaces==0.5.1 dbt-snowflake==1.8.4 deepdiff==7.0.1 defusedxml==0.7.1 docstring_parser==0.16 durationpy==0.9 filelock==3.16.1 fsspec==2024.10.0 google-auth==2.35.0 gql==3.5.0 graphene==3.4 graphql-core==3.2.5 graphql-relay==3.2.0 grpcio==1.64.3 grpcio-health-checking==1.62.3 h11==0.14.0 httptools==0.6.4 humanfriendly==10.0 idna==3.7 importlib-metadata==6.11.0 isodate==0.6.1 jaraco.classes==3.4.0 jaraco.context==6.0.1 jaraco.functools==4.1.0 Jinja2==3.1.4 jmespath==1.0.1 joblib==1.4.2 jsonschema==4.23.0 jsonschema-specifications==2024.10.1 keyring==25.4.1 kubernetes==31.0.0 leather==0.4.0 Logbook==1.5.3 Mako==1.3.6 markdown-it-py==3.0.0 MarkupSafe==3.0.2 mashumaro==3.13.1 mdurl==0.1.2 minimal-snowplow-tracker==0.0.2 more-itertools==10.5.0 msgpack==1.1.0 multidict==6.1.0 networkx==3.4.2 numpy==2.1.2 oauthlib==3.2.2 ordered-set==4.1.0 orjson==3.10.9 packaging==24.1 pandas==2.2.2 paramiko==3.4.0 parsedatetime==2.6 pathspec==0.12.1 pendulum==3.0.0 pip==24.2 platformdirs==3.10.0 propcache==0.2.0 protobuf==4.25.5 psycopg2-binary==2.9.9 pyarrow==17.0.0 pyasn1==0.6.1 pyasn1_modules==0.4.1 pycparser==2.21 pydantic==2.9.2 pydantic_core==2.23.4 Pygments==2.18.0 PyJWT==2.9.0 PyNaCl==1.5.0 pyOpenSSL==24.2.1 python-dateutil==2.9.0.post0 python-dotenv==1.0.0 python-slugify==8.0.4 
pytimeparse==1.1.8 pytz==2024.2 PyYAML==6.0.2 referencing==0.35.1 requests==2.31.0 requests-oauthlib==2.0.0 requests-toolbelt==1.0.0 rich==13.9.2 rpds-py==0.20.0 rsa==4.9 s3transfer==0.10.3 scikit-learn==1.5.2 scipy==1.14.1 setuptools==71.1.0 shellingham==1.5.4 six==1.16.0 slack_sdk==3.33.1 sniffio==1.3.1 snowflake-connector-python==3.12.2 sortedcontainers==2.4.0 SQLAlchemy==2.0.36 sqlglot==25.26.0 sqlglotrs==0.2.12 sqlparse==0.5.1 starlette==0.41.0 structlog==24.4.0 tableauserverclient==0.25 tabulate==0.9.0 text-unidecode==1.3 threadpoolctl==3.5.0 time-machine==2.16.0 tomli==2.0.2 tomlkit==0.13.2 toposort==1.10 tqdm==4.66.5 typer==0.12.5 typing_extensions==4.12.2 tzdata==2024.2 universal_pathlib==0.2.5 urllib3==1.26.20 uvicorn==0.32.0 uvloop==0.21.0 watchdog==5.0.3 watchfiles==0.24.0 websocket-client==1.8.0 websockets==13.1 yarl==1.16.0 zipp==3.20.2 --force-reinstall

  2. Run dbt run-operation stage_external_sources --args "select: [source]" --vars "ext_full_refresh: true"

Expected results

The external table is refreshed without errors.

Actual results

The external table is not refreshed; the operation fails with a SQL compilation error (shown below).

Screenshots and log output

Error:

16:04:09  Found 544 models, 329 data tests, 54 seeds, 1 operation, 673 sources, 17 exposures, 895 macros
16:04:09  1 of 36 START external source ingest_eden.external_eden_adt_v001
16:04:11  1 of 36 (1) select 'Schema ingest_eden exists' from dual;
16:04:11  1 of 36 (1) SUCCESS 1
16:04:11  1 of 36 (2) create or replace external table DATA_PLATFORM_DEV.ingest_eden.external_eden_adt...  
16:04:11  Encountered an error while running operation: Database Error
  001003 (42000): SQL compilation error:
  syntax error line 2 at position 14 unexpected 'as'.
  syntax error line 49 at position 24 unexpected ')'.

System information

The contents of your packages.yml file:
Package version:

  - package: dbt-labs/dbt_external_tables
    version: 0.10.0

Which database are you using dbt with?

  • [ ] redshift
  • [x] snowflake
  • [ ] other (specify: ____________)

The output of dbt --version:

Core:
  - installed: 1.8.7
  - latest:    1.8.9 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - snowflake: 1.8.4 - Up to date!

The operating system you're using:
MacOS 14.6.1

The output of python --version:
3.12.7

Additional context

ashleemtib added the bug and triage labels Nov 25, 2024
nash71 commented Nov 25, 2024

We are encountering the same issue. In our experience it works in dbt Cloud, but the error occurs with dbt Core in our CI/CD pipeline.

ashleemtib (Author) commented

@nash71 something I noticed is that this appears to happen only when the --full-refresh flag is applied. Based on the Snowflake query history, it looks like the SQL is being compiled incorrectly. For now I've taken out that flag to get our CI/CD running, with the understanding that changes to external source DDL (adding a column, changing a data type, etc.) will not take effect.

jonas-berg-h2gs commented

I spent some time debugging this, so this might save someone else a bit of time. The error occurs here for me:

image

The variable partitions looks normal when logged, but for me it is a string rather than a mapping (unlike the variable external, which is a mapping). I used the following log statements to check this:

{%- do log(partitions is mapping, info=true) -%}
{%- do log(partitions is string, info=true) -%}

As a result, {{ partition.name }} seems to return a null value (which can be verified with the same kind of logging as above), so the code simply outputs:

...(
              as 
        ) partition by ()

Causing the error you reported. Hope it helps someone!
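To illustrate the string-versus-mapping observation above, here is a minimal plain-Python sketch (not the package's macro code). It assumes that Jinja's `is string` and `is mapping` tests behave roughly like the `isinstance` checks below; the partition definition and its field names are hypothetical. It shows why a stringified value can look normal when logged yet expose no `name` to the template.

```python
from collections.abc import Mapping


def classify(value):
    """Roughly mirror Jinja's `is string` / `is mapping` tests in plain Python."""
    if isinstance(value, str):
        return "string"
    if isinstance(value, Mapping):
        return "mapping"
    return type(value).__name__


# Hypothetical partition definition (illustrative names only).
as_mapping = {"name": "event_date", "data_type": "date"}
# The same data after being turned into a string somewhere upstream.
as_string = str(as_mapping)

# Both values render to the same text, so a plain log of the value looks normal.
looks_identical = f"{as_mapping}" == f"{as_string}"

# Only the mapping exposes a usable name; the string has no such field.
name_from_mapping = as_mapping.get("name")
name_from_string = getattr(as_string, "name", None)

if __name__ == "__main__":
    print(classify(as_mapping), name_from_mapping)
    print(classify(as_string), name_from_string)
```

Logging the type tests, as done above, is therefore more informative than logging the value itself.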

kendalldyke14 commented

This is caused by a package incompatibility in dbt-core. We were able to fix it by pinning the mashumaro version to <3.15, thanks to the comments here: dbt-labs/dbt-core#11044

Guipetris commented

Same issue here. I was only able to fix it by pinning mashumaro==3.14
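For anyone who wants CI to fail fast when the pin is lost, here is a minimal sketch of a guard script. It is not an official dbt check: the function names are hypothetical, and the 3.15 threshold is taken only from the comments in this thread.

```python
import re
from importlib import metadata


def major_minor(version):
    """Extract (major, minor) as integers from a version string like '3.14' or '3.15.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_within_pin(version, upper_bound=(3, 15)):
    """True when the version is below the <3.15 pin reported to work in this thread."""
    return major_minor(version) < upper_bound


def installed_mashumaro_ok():
    """Check the installed mashumaro; returns None if it is not installed."""
    try:
        return is_within_pin(metadata.version("mashumaro"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("mashumaro within pin:", installed_mashumaro_ok())
```

Running this before dbt run-operation in the pipeline would surface an unpinned mashumaro earlier than the SQL compilation error does.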

dataders (Collaborator) commented Dec 19, 2024

sorry y'all are experiencing this! as in dbt-labs/dbt-core#11044, which @kendalldyke14 referenced, it appears to be a dependency mismatch b/w dbt-core and some adapters (dbt-snowflake, dbt-redshift).

Good news is that dbt-labs/dbt-core#11051 is open and still in progress (though I'll admit the problem seems rather thorny)

I'm going to close the issue here and flag the Core issue internally. I recommend that those here who have experienced the issue:

ashleemtib (Author) commented

Thank you all @kendalldyke14 @Guipetris @dataders
