fix ids for Swedish data #71
Comments
If you need help with the slug generation, just say so.
@augusto-herrmann if you could take over, that would be great.
Ok, @todrobbins, I've got it for now. Before considering this issue closed, @mattiasaxell please check if the generated ids are acceptable.
@augusto-herrmann Great. I have checked and I believe the split function may be OK; it looks good to me at least. @peterk do you know if Python's .split() function is OK for the Swedish language?
@augusto-herrmann They are correct. The duplicate ids like habo are there because there is Habo Kommun (Municipality) and Håbo Kommun. I'm suggesting to change Håbo to
@augusto-herrmann @mattiasaxell 👍 This looks great. Thanks for solving this, Augusto! Pending @peterk's review, I think we're ready to merge.
@mattiasaxell @todrobbins split() is fine for word tokenization. Please note that you may end up with dupes if you do closest Latin character substitution. Also, I noticed there may be some need for data normalization in the url and org id fields.
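To illustrate the duplicate risk mentioned above, here is a minimal sketch, not the script actually used for this dataset. It assumes slugs are built by folding accented letters to their closest ASCII letter, lowercasing, and joining the `split()` words with hyphens; the helper names `fold_to_ascii`, `slugify`, and `find_collisions` are hypothetical.

```python
import unicodedata
from collections import defaultdict


def fold_to_ascii(text):
    """Drop combining marks so that e.g. 'å' becomes 'a' (assumed folding rule)."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def slugify(name):
    """Hypothetical slug: folded, lowercased, whitespace-split words joined by '-'."""
    return "-".join(fold_to_ascii(name).lower().split())


def find_collisions(names):
    """Return only the slugs that more than one distinct name maps to."""
    by_slug = defaultdict(list)
    for name in names:
        by_slug[slugify(name)].append(name)
    return {slug: group for slug, group in by_slug.items() if len(group) > 1}


names = ["Habo kommun", "Håbo kommun", "Rättshjälpsmyndigheten"]
collisions = find_collisions(names)
print(collisions)
```

With this folding rule, Habo and Håbo end up with the same slug, so a collision check like this would need to run before the ids are committed.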
@augusto-herrmann @mattiasaxell @peterk I'm going to review the URL normalization and commit/merge accordingly.
@augusto-herrmann good point - if the org name changes, is it the same? My inclination would be to say "no", at least in some sense.
What about "Rättshjälpsmyndigheten", @mattiasaxell? I believe this line is really duplicated, considering that both entries share the same email address. They have different Boxes in the
The Swedish data has blank cells in the 'id' column.
These should be filled with a code that combines the jurisdiction and a local identifier:
se/{local-id}
as in the example at http://data.okfn.org/data/okfn/public-bodies .
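As a rough sketch of how the blank cells could be filled under this scheme: the column names `id` and `name`, the sample rows, and the helpers `make_id` and `fill_blank_ids` are assumptions for illustration, not taken from the actual Swedish CSV.

```python
import csv
import io

JURISDICTION = "se"  # jurisdiction prefix from the se/{local-id} pattern


def make_id(jurisdiction, local_id):
    """Combine jurisdiction and local identifier as '{jurisdiction}/{local-id}'."""
    return f"{jurisdiction}/{local_id}"


def fill_blank_ids(rows, jurisdiction=JURISDICTION):
    """Return copies of the rows, filling only the ids that are blank."""
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        if not row["id"].strip():
            # Assumed local id: lowercased name, split() words joined by '-'.
            local_id = "-".join(row["name"].lower().split())
            row["id"] = make_id(jurisdiction, local_id)
        filled.append(row)
    return filled


# Made-up sample data standing in for the real file.
sample = io.StringIO("id,name\n,Habo kommun\nse/existing,Some Body\n")
rows = fill_blank_ids(csv.DictReader(sample))
for row in rows:
    print(row["id"], row["name"])
```

Rows that already carry an id are left alone, so the same step could be rerun later once official local ids are available.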
@todrobbins is working on generating slugs for the data.
Once #65 is solved, we can also add the official local ids, which @mattiasaxell mentioned having.