Unmatched duplicates connected to max_distance #22

Open
Janjko opened this issue Jan 12, 2021 · 4 comments

Comments

Janjko commented Jan 12, 2021

Hi,
I'm importing addresses in Croatia, and I'm running into a problem with unmatched duplicates. When max_distance is set too high, e.g. 30 meters, conflator finds addresses in the dataset that are closer than 30 meters to each other and puts them in the unmatched-duplicates category, and therefore does not upload them. As I gradually lower max_distance, the number of unmatched duplicates drops too, and once it reaches 0 I upload. But by then I have lost some of the existing OSM points I could have merged with, because they are now too far away.

Here is my profile: https://github.com/osm-hr/imports/blob/main/Adrese_Sisak-Moslavina/adrese.py

I hope you can find the cause.

Thanks for the great software,
Janko

@Janjko Janjko changed the title Unmatched duplicates Unmatched duplicates connected to max_distance Jan 12, 2021
Contributor

Zverik commented Jan 15, 2021

How come there are duplicates in your dataset? Are there identical addresses all over?

Author

Janjko commented Jan 16, 2021

No, unless I misunderstand how conflator determines duplicate addresses. The points have unique IDs in the dataset and different coordinates. Their tags are addr:housenumber, addr:street, source:addr, and source:addr:date, and the same combination never occurs twice. I don't have a ref tag, and I set no_dataset_id = True, so maybe conflator doesn't know how to find duplicates? Should I define the find_ref function?

Author

Janjko commented Jan 18, 2021

I tried setting find_ref, concatenating the city, street, and house-number strings so that the result is always unique. But I saw that the unmatched duplicates are printed before find_ref is even called, so it's not relevant. Now I think maybe the master tags should be unique? I'll try that today.
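For reference, this is roughly what my find_ref attempt looked like. It's just a sketch: I'm assuming conflator passes the source point's tag dict to find_ref, and the addr:* keys are the ones from my dataset, not anything conflator requires:

```python
# Hypothetical profile snippet: build a unique ref string from
# the address components. Assumes find_ref receives the point's
# tag dict; tag keys below are specific to my dataset.
def find_ref(tags):
    return '|'.join([
        tags.get('addr:city', ''),
        tags.get('addr:street', ''),
        tags.get('addr:housenumber', ''),
    ])
```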

Author

Janjko commented Jan 28, 2021

I added a new tag to the dataset, a ref:addr tag that concatenates place name, street name, and house number. That made conflator treat the points as unique, and it no longer removed "duplicates". Then I deleted those ref:addr tags from the output OSM file. This is a workaround, but I still don't know exactly how conflator decides to delete nodes closer than max_distance. Tags are in the equation somehow.
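The workaround boils down to two steps, sketched below. The helper names are mine (hypothetical), and in practice I removed the tags from the output .osm file afterwards rather than in the profile; only the idea matches what I did:

```python
# Sketch of the workaround: give every dataset point a synthetic
# ref:addr tag so conflator considers it unique, then strip that
# tag again from the result before uploading.
def add_ref_addr(tags):
    tags['ref:addr'] = ' '.join([
        tags.get('addr:city', ''),
        tags.get('addr:street', ''),
        tags.get('addr:housenumber', ''),
    ])
    return tags

def strip_ref_addr(tags):
    tags.pop('ref:addr', None)
    return tags
```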
