Unmatched duplicates connected to max_distance #22
Comments
How come there are duplicates in your dataset? Are there identical addresses all over?
Not as far as I can tell, unless conflator determines duplicate addresses differently than I expect. The points have unique IDs in the dataset collection and different coordinates. The tags are addr:housenumber, addr:street, source:addr, and source:addr:date, and no two points share the same combination of values. I don't have a ref tag, and I set no_dataset_id = True, so maybe conflator doesn't know how to find duplicates? Should I set the find_ref function?
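For context, a minimal sketch of the profile setup described above, assuming OSM Conflator's profile API (SourcePoint is provided to the profile at runtime); the reader and field names are illustrative, not the actual adrese.py:

```python
# Hypothetical excerpt of a conflator profile for an address import.
# SourcePoint is injected by conflator when it loads the profile;
# the JSON field names below are illustrative.
import json

no_dataset_id = True   # dataset points carry no stable ref tag
max_distance = 30      # metres; lowering this reduced "unmatched duplicates"

def dataset(fileobj):
    # Every address is unique by its tag combination, but has no ref tag.
    source_points = []
    for i, row in enumerate(json.load(fileobj)):
        tags = {
            'addr:housenumber': row['housenumber'],
            'addr:street': row['street'],
            'source:addr': row['source'],
            'source:addr:date': row['date'],
        }
        source_points.append(SourcePoint(i, row['lat'], row['lon'], tags))
    return source_points
```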
I tried setting find_ref and concatenated the city, street, and house number strings, so that the result is always unique. But I saw that the unmatched duplicates are reported before find_ref is even called, so it is not relevant. Now I think maybe the master tags should be unique? I'll try that today.
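A sketch of that find_ref attempt, under the assumption that conflator passes an OSM object's coordinates and tags and expects a ref string back to match against dataset ids; the exact signature may differ, so check conflate.py:

```python
# Assumed signature: conflator calls this on OSM objects and matches the
# returned string against dataset point ids (a sketch, not verified API).
def find_ref(lat, lon, tags):
    city = tags.get('addr:city')
    street = tags.get('addr:street')
    housenumber = tags.get('addr:housenumber')
    if city and street and housenumber:
        # Unique per address, so it can serve as a synthetic ref.
        return ' '.join((city, street, housenumber))
    return None
```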
I added a new tag in the dataset, a ref:addr tag that concatenates the place name, street name, and house number. That made conflator consider the points unique, and it no longer removed "duplicates". Then I deleted those ref:addr tags from the output OSM file. This is a workaround, but I still don't know exactly how conflator decides to delete nodes closer than max_distance. Tags are in the equation somehow.
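In code, the workaround might look like this: a unique ref:addr is added to every dataset point, and the tag is then stripped from the conflator output before upload. The strip_ref_addr helper and file names are hypothetical, and the XML handling assumes the usual JOSM-style .osm format:

```python
# Hypothetical cleanup step for the ref:addr workaround described above.
# (In the profile's dataset() loop, something like
#  tags['ref:addr'] = ' '.join((place, street, housenumber)) was added.)
import xml.etree.ElementTree as ET

def strip_ref_addr(in_path, out_path):
    # Remove every <tag k="ref:addr" .../> from nodes in the output .osm file.
    tree = ET.parse(in_path)
    for node in tree.getroot().iter('node'):
        for tag in node.findall('tag'):
            if tag.get('k') == 'ref:addr':
                node.remove(tag)
    tree.write(out_path, encoding='utf-8', xml_declaration=True)

strip_ref_addr('conflator_output.osm', 'upload_ready.osm')
```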
Original issue report:
Hi,
I'm importing addresses in Croatia, and I am running into a problem with unmatched duplicates. When max_distance is set too high, e.g. 30 meters, conflator finds addresses in the dataset that are closer than 30 meters to each other, puts them in the unmatched duplicates category, and therefore does not upload them. When I gradually lower max_distance, the number of unmatched duplicates drops too, and once I reach 0 unmatched duplicates, I upload. But by then I have lost matches with some existing OSM points I could have merged with, because they are now farther away than max_distance.
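To illustrate the behavior being reported (this is my reading of it, not conflator's actual code): dataset points lying within max_distance of one another appear to be classed as duplicates, so a larger radius silently discards more of them:

```python
# Illustration of the suspected duplicate check (not conflator's real code):
# any two dataset points closer than max_distance count as duplicates.
from math import radians, sin, cos, asin, sqrt

def distance_m(p1, p2):
    # Haversine great-circle distance in metres.
    lat1, lon1, lat2, lon2 = map(radians, (p1[0], p1[1], p2[0], p2[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371000 * 2 * asin(sqrt(a))

def suspected_duplicates(points, max_distance=30):
    dupes = set()
    for i, a in enumerate(points):
        for b in points[i + 1:]:
            if distance_m(a, b) < max_distance:
                dupes.add(b)   # the later point would be dropped
    return dupes

# Two real houses about 20 m apart: with max_distance=30 one is flagged
# as a "duplicate"; with max_distance=10 neither is.
pts = [(45.48800, 16.37400), (45.48818, 16.37400)]
print(suspected_duplicates(pts, 30), suspected_duplicates(pts, 10))
```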
Here is my profile: https://github.com/osm-hr/imports/blob/main/Adrese_Sisak-Moslavina/adrese.py
I hope you find the cause,
Thanks for the great software,
Janko