Unmatched duplicates connected to max_distance #22

Open
Janjko opened this issue Jan 12, 2021 · 4 comments

Comments

Janjko commented Jan 12, 2021

Hi,
I'm importing addresses in Croatia, and I'm running into a problem with unmatched duplicates. When max_distance is set too high, e.g. 30 meters, conflator finds addresses in the dataset that are closer than 30 meters to each other and puts them in the unmatched-duplicates category, and therefore does not upload them. As I gradually lower max_distance, the number of unmatched duplicates drops too, and once it reaches 0 I upload. But by then I have lost some of the existing OSM points I could have merged with, because they are now too far away.

Here is my profile: https://github.com/osm-hr/imports/blob/main/Adrese_Sisak-Moslavina/adrese.py

I hope you can find the cause.

Thanks for the great software,
Janko

@Janjko Janjko changed the title Unmatched duplicates Unmatched duplicates connected to max_distance Jan 12, 2021
Contributor

Zverik commented Jan 15, 2021

How come there are duplicates in your dataset? Are there identical addresses all over?

Author

Janjko commented Jan 16, 2021

No, unless I misunderstand how conflator determines duplicate addresses. The points have unique IDs in the dataset and different coordinates. Their tags are addr:housenumber, addr:street, source:addr, and source:addr:date, and the same combination never occurs twice. I don't have a ref tag, and I set no_dataset_id = True, so maybe conflator doesn't know how to find duplicates? Should I define the find_ref function?

Author

Janjko commented Jan 18, 2021

I tried setting find_ref, concatenating the city, street, and house-number strings so that the result is always unique. But I saw that the unmatched duplicates are printed before find_ref is even called, so it's not relevant. Now I think maybe the master tags should be unique? I'll try that today.
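For reference, this is roughly what my find_ref attempt looked like. It's just a sketch: I'm assuming conflator passes the source point's tag dict to find_ref, and the addr:* keys are the ones from my dataset, not anything conflator requires:

```python
# Hypothetical profile snippet: build a unique ref string from
# the address components. Assumes find_ref receives the point's
# tag dict; tag keys below are specific to my dataset.
def find_ref(tags):
    return '|'.join([
        tags.get('addr:city', ''),
        tags.get('addr:street', ''),
        tags.get('addr:housenumber', ''),
    ])
```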

Author

Janjko commented Jan 28, 2021

I added a new tag to the dataset, a ref:addr tag that concatenates place name, street name, and house number. That made conflator treat the points as unique, and it no longer removed "duplicates". Then I deleted those ref:addr tags from the output OSM file. This is a workaround, but I still don't know exactly how conflator decides to delete nodes closer than max_distance. Tags are in the equation somehow.
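The workaround boils down to two steps, sketched below. The helper names are mine (hypothetical), and in practice I removed the tags from the output .osm file afterwards rather than in the profile; only the idea matches what I did:

```python
# Sketch of the workaround: give every dataset point a synthetic
# ref:addr tag so conflator considers it unique, then strip that
# tag again from the result before uploading.
def add_ref_addr(tags):
    tags['ref:addr'] = ' '.join([
        tags.get('addr:city', ''),
        tags.get('addr:street', ''),
        tags.get('addr:housenumber', ''),
    ])
    return tags

def strip_ref_addr(tags):
    tags.pop('ref:addr', None)
    return tags
```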
