cast as int bug #140

Merged
merged 24 commits into from
Apr 16, 2024

fivetran-reneeli
Contributor

@fivetran-reneeli fivetran-reneeli commented Apr 8, 2024

PR Overview

This PR will address the following Issue/Feature:
#139
This PR will result in the following new package version: v0.17.1

Please detail what change(s) this PR introduces and any additional information that should be known during the review of this PR:

  • Updates the cast of vid_to_merge from {{ dbt.type_int() }} to {{ dbt.type_string() }}. Casting to int caused model failures when values exceeded the integer range allowed in certain warehouses. In addition, when the contact_merge_audit table is not present, the calculated_merged_vids parsed from the contact table are output as strings, so the join requires a string cast on both keys.
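
As a rough illustration of the join-key mismatch described above, here is a minimal Python sketch. It is not the package's SQL: the `split_merged_vids` helper, the `;` delimiter, and the sample IDs are all hypothetical, and Python only stands in for the warehouse (which might raise a type error or cast implicitly rather than silently finding no match).

```python
# Hypothetical illustration only; none of these names come from the package.

def split_merged_vids(raw: str) -> list[str]:
    """Split a delimited string of IDs. Every element comes back as a str."""
    return [part.strip() for part in raw.split(";") if part.strip()]


# Contacts keyed by an integer ID, as an integer contact_id column would be.
contacts = {3000000000: "contact A", 7: "contact B"}

# IDs parsed out of a string column are strings, not integers.
merged_vids = split_merged_vids("3000000000;42")

# Joining str keys against int keys finds nothing, even for matching values.
naive_matches = [vid for vid in merged_vids if vid in contacts]

# Casting both sides to string makes the keys comparable.
contacts_by_string = {str(key): name for key, name in contacts.items()}
string_matches = [vid for vid in merged_vids if vid in contacts_by_string]

print(naive_matches)   # []
print(string_matches)  # ['3000000000']
```

Casting both keys to string also sidesteps the integer range question entirely, since no numeric type is involved in the comparison.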

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

  • dbt compile
  • dbt run --full-refresh
  • dbt run
  • dbt test
  • dbt run --vars 'hubspot_contact_merge_audit_enabled: true'

Before marking this PR as "ready for review" the following have been applied:

  • The appropriate issue has been linked and tagged
  • You are assigned to the corresponding issue and this PR
  • BuildKite integration tests are passing

Detailed Validation

Please acknowledge that the following validation checks have been performed prior to marking this PR as "ready for review":

  • You have validated these changes and assure this PR will address the respective Issue/Feature.
  • You are reasonably confident these changes will not impact any other components of this package or any dependent packages.
  • You have provided details below around the validation steps performed to gain confidence in these changes.

Testing where contact_merge_audit exists

  • I first reproduced the issue by changing a vid_to_merge value in the contact_merge_audit table to something larger than 2147483647 and then running the model. Running against prod, I got the expected size error (Value out of range for 4 bytes.)

  • Then in this branch, I updated the cast to use string. The model ran successfully.

  • Updating to bigint was also successful.

  • We ultimately chose to cast as string for an additional reason: when the contact_merge_audit table is not present, the calculated_merged_vids parsed from the contact table are output as strings, so the join requires a string cast anyway. Both join keys are therefore cast as strings.
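
The reproduction above can be sketched outside the warehouse. This minimal Python example uses the standard `struct` module to mimic 4-byte and 8-byte signed integer columns; the helper name `fits` is hypothetical, and the actual error text in the PR comes from the warehouse, not from this code.

```python
import struct

INT32_MAX = 2**31 - 1  # 2147483647, the largest 4-byte signed integer


def fits(value: int, fmt: str) -> bool:
    """Return True if value can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


vid = INT32_MAX + 1  # one past the 4-byte limit, like the test value used above

print(fits(vid, "<i"))  # False: does not fit in a 4-byte signed int
print(fits(vid, "<q"))  # True: fits in an 8-byte signed int (bigint)

# A string cast has no numeric range to exceed.
as_string = str(vid)
print(as_string)  # 2147483648
```

This mirrors the three outcomes listed above: the 4-byte cast fails, the 8-byte cast succeeds, and the string cast succeeds regardless of magnitude.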

Testing for when contact_merge_audit doesn't exist

  • I removed the hubspot_contact_merge_audit_enabled variable, since its default is false.
  • Then I ran the compiled code with the customer's shared data and the model succeeds (see screenshots in internal ticket)

Standard Updates

Please acknowledge that your PR contains the following standard updates:

  • Package versioning has been appropriately indexed in the following locations:
    • indexed within dbt_project.yml
    • indexed within integration_tests/dbt_project.yml
  • CHANGELOG has individual entries for each respective change in this PR
  • README updates have been applied (if applicable)
  • DECISIONLOG updates have been applied (if applicable)
  • Appropriate yml documentation has been added (if applicable)

dbt Docs

Please acknowledge that after the above were all completed the below were applied to your branch:

  • docs were regenerated (unless this PR does not include any code or yml updates)

If you had to summarize this PR in an emoji, which would it be?

💃

@fivetran-reneeli fivetran-reneeli changed the title from "cast as bigint" to "cast as int bug" Apr 9, 2024
@fivetran-reneeli
Contributor Author

regen docs once approved

Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

@fivetran-reneeli thanks for applying these changes. I just have a few comments that I feel should be applied. In addition, would you be able to provide more details to your validation efforts? I see you listed the steps you took, however, would you also be able to share screenshots (if appropriate to share publicly then you may include them in the PR, if not you can share in the height ticket) of your validation steps. This way I can see directly the steps taken and the expected failures and successes. Finally, it looks like buildkite is still failing. It may just need a schema refresh.

Lastly, were you able to test these changes on the customers data who initially opened this bug report? I would want to get confirmation that these solve the initial error directly. Thanks!

CHANGELOG.md Outdated
Comment on lines 5 to 7
- In `int_hubspot__contact_merge_adjust`, updates casting of `vid_to_merge` as `{{ dbt.type_int() }}` to `{{ dbt.type_string() }}` in the join of `contact_merge_audit` to `contacts`. Previously, casting only to `int` caused model failures resulting from integer fields that exceeded the range allowed in certain warehouses. In addition, for the case where the `contact_merge_audit` table is not present, the parsed `calculated_merged_vids` from the contact table are outputted as strings, therefore requiring the titular datatype cast in the join.

For context, the [Nov 2022 release of the Hubspot connector](https://fivetran.com/docs/connectors/applications/hubspot/changelog#november2022) should not have the `contact_merge_audit` table as that was deprecated in place of storing `property_hs_calculated_merged_vids` in the `contact` table.
Contributor

This feels quite lengthy and crowded. I believe we can make this a bit more streamlined and direct. What are your thoughts on the following?

Suggested change
- Included explicit datatype casts to `{{ dbt.type_string() }}` within the join of `contact_merge_audit.vid_to_merge` to `contacts.contact_id` in the `int_hubspot__contact_merge_adjust` model.
- This update was required to address a bug where the IDs in the join would overflow to bigint or be interpreted as strings. This change ensures the join fields have matching datatypes.

Contributor Author

looks good! I would still add that snippet referring to the nov 2022 api update, so that people reading this who don't have contact_merge_audit table will know it's still relevant, but I'm going to update that part to make it more clear.

Contributor Author

update-- never mind, will not mention the nov 2022 api update since it's been a while since then

@fivetran-reneeli fivetran-reneeli linked an issue Apr 12, 2024 that may be closed by this pull request
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

@fivetran-reneeli thanks for working on these updates and adding more validation details to the PR description. Ultimately this PR is good to go! I do have a small comment in the CHANGELOG and reminder to uncomment the variables in the integration_tests/dbt_project.yml. Once that is updated this is good for release review.

Comment on lines +10 to +14
# hubspot_sales_enabled: true # enable when generating docs
# hubspot_marketing_enabled: true # enable when generating docs
# hubspot_contact_merge_audit_enabled: true # enable when generating docs
# hubspot_using_all_email_events: true # enable when generating docs
# hubspot_merged_deal_enabled: true # enable when generating docs
Contributor

These should be commented back in to properly test during integration tests

Contributor Author

@fivetran-joemarkiewicz I just had these enabled to generate docs, but commented out otherwise since I assumed they weren't needed for tests as they're explicitly configured in the run script?

Contributor

Got it, I was wondering why they were not commented out previously. This looks good then, thanks!

@fivetran-avinash fivetran-avinash self-requested a review April 16, 2024 19:59
@fivetran-reneeli fivetran-reneeli merged commit 3c56c7f into main Apr 16, 2024
9 checks passed
@fivetran-reneeli fivetran-reneeli deleted the bugfix/integer_overflow branch April 16, 2024 20:39
Development

Successfully merging this pull request may close these issues.

[Bug] Integer overflow
3 participants