Unify join and joinEach behavior #1212

Open
norberttech opened this issue Sep 7, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@norberttech
Member

Currently join and joinEach behave a bit differently.

join uses a HashJoin algorithm under the hood, whereas joinEach is based on a nested loop algorithm.

The problem is that the Nested Loop implementation forces the use of join_prefix: if we try to join two dataframes on an id column that is called id on both sides, we get a DuplicatedEntriesException coming from the Rows::merge() method.
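To make the failure mode concrete, here is a minimal, language-agnostic sketch in Python. It is not the library's actual code: `merge_rows` and `DuplicatedEntriesError` are hypothetical stand-ins for `Rows::merge()` and `DuplicatedEntriesException`, and the sketch assumes that merging refuses to combine two rows that share an entry name.

```python
class DuplicatedEntriesError(Exception):
    """Hypothetical stand-in for DuplicatedEntriesException."""


def merge_rows(left: dict, right: dict) -> dict:
    """Merge two rows, refusing to combine rows that share an entry name."""
    duplicates = sorted(set(left) & set(right))
    if duplicates:
        raise DuplicatedEntriesError(f"Duplicated entries: {duplicates}")
    return {**left, **right}


def nested_loop_join(left_rows, right_rows, key, prefix=""):
    """Naive nested loop inner join; `prefix` mimics the role of join_prefix."""
    joined = []
    for left in left_rows:
        for right in right_rows:
            if left[key] == right[key]:
                # Rename every right-side entry, including the join column.
                renamed = {prefix + name: value for name, value in right.items()}
                joined.append(merge_rows(left, renamed))
    return joined


left_rows = [{"id": 1, "name": "a"}]
right_rows = [{"id": 1, "active": True}]

# Without a prefix, both sides carry "id", so the merge fails.
try:
    nested_loop_join(left_rows, right_rows, key="id")
    raised = False
except DuplicatedEntriesError:
    raised = True

# With a prefix the join works, but the key is duplicated as "joined_id".
with_prefix = nested_loop_join(left_rows, right_rows, key="id", prefix="joined_")
```

In this sketch the prefix is the only way to get a result at all, which is the behavior described above.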

What we should do is remove the join columns from the right dataset before merging, so that the merged row no longer contains duplicates.
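A rough sketch of that idea, again in plain Python rather than the library's real API (the function name and the dict-based rows are assumptions made for illustration): the join columns are stripped from each matching right row before it is merged into the left row.

```python
def nested_loop_join_dropping_keys(left_rows, right_rows, keys):
    """Nested loop inner join that drops the join columns from the right side."""
    joined = []
    for left in left_rows:
        for right in right_rows:
            if all(left[k] == right[k] for k in keys):
                # The left row already holds the join columns, so the
                # right-side copies are redundant and can be removed.
                right_without_keys = {
                    name: value for name, value in right.items() if name not in keys
                }
                joined.append({**left, **right_without_keys})
    return joined


left_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
right_rows = [{"id": 1, "active": True}]

result = nested_loop_join_dropping_keys(left_rows, right_rows, keys=["id"])
```

This only removes the clash on the join columns. Non-key columns that share a name on both sides would still collide (in this sketch the right-hand value would silently win), so a prefix would still be needed for those.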

@norberttech norberttech converted this from a draft issue Sep 7, 2024
@norberttech norberttech added the bug Something isn't working label Sep 7, 2024
Projects
Status: Todo