motis: Enable duplicate matching #54

Merged
merged 1 commit into from
Feb 29, 2024

Conversation

jbruechert
Collaborator

No description provided.

@jbruechert jbruechert marked this pull request as ready for review February 26, 2024 21:38
@felixguendling
Contributor

Loading performance with this option has not been well tested for timetables with large overlap. It works well for the MOTIS demo instance, where only a few long-distance trains overlap.

This vecvec<K, V> data structure is not really made for many push_back operations. If you want to use this for large overlaps it might make sense to try a different data structure (like mutable_fws_multimap).

https://github.com/motis-project/nigiri/blob/d806245b6b313f143e1705a01df4dc7d9abbbe19/src/loader/build_footpaths.cc#L152

https://github.com/motis-project/nigiri/blob/d806245b6b313f143e1705a01df4dc7d9abbbe19/include/nigiri/timetable.h#L356
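To illustrate the concern about many push_back operations, here is a minimal sketch of a toy container, assuming a flat layout where all buckets share one contiguous array. The type `flat_vecvec` and its members are hypothetical and are not the actual vecvec<K, V> or mutable_fws_multimap code. It only shows why appending to a bucket that is not the last one forces later elements to move.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (assumption, not the real vecvec): every value lives in one
// contiguous array, and starts[i]..starts[i + 1] delimits bucket i.
struct flat_vecvec {
  std::vector<std::size_t> starts{0};  // always bucket_count() + 1 entries
  std::vector<int> data;

  // Adds a new, empty bucket at the end.
  void add_bucket() { starts.push_back(data.size()); }

  std::size_t bucket_count() const { return starts.size() - 1; }

  // Appends `value` to bucket `b` and returns how many existing elements
  // had to be shifted to make room for it.
  std::size_t push_back(std::size_t b, int value) {
    assert(b < bucket_count());
    std::size_t const pos = starts[b + 1];
    std::size_t const moved = data.size() - pos;
    data.insert(data.begin() + static_cast<std::ptrdiff_t>(pos), value);
    // Every later bucket now starts one slot further to the right.
    for (std::size_t i = b + 1; i < starts.size(); ++i) {
      ++starts[i];
    }
    return moved;
  }

  // Returns a copy of bucket `b`.
  std::vector<int> bucket(std::size_t b) const {
    assert(b < bucket_count());
    return {data.begin() + static_cast<std::ptrdiff_t>(starts[b]),
            data.begin() + static_cast<std::ptrdiff_t>(starts[b + 1])};
  }
};
```

In this toy model, appending to the last bucket moves nothing, while appending to an earlier bucket shifts everything stored after it. With many overlapping trips that cost is paid repeatedly, which is the kind of workload a multimap designed for mutation would handle better.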

@vkrause
Member

vkrause commented Feb 27, 2024

Loading times still seem OK here locally (but then we don't really have large overlaps yet). However, it also doesn't seem to filter the DE <-> NL Regionalexpress duplicates that originally triggered this.

Your optimization suggestion, however, seems to be the solution for a problem we have with GTFS-RT updates :)

@felixguendling
Contributor

Maybe it makes sense to create a minimal example with those trips? Usually what I do is:

  • click on the trains on the web UI, get the trip_id from the URL
  • grep for the corresponding trip_id in the GTFS data and create a stop_times.txt with only times from this trip
  • reuse all other files (stops.txt, routes.txt, etc.) from the GTFS feed(s) and call gtfs-tidy
  • voilà - you have a minimal testing example where you can print some output on where the code bails out and decides that they can't be duplicates

Maybe it would even make sense to write a very simple command line tool to do this extraction (like gtfs-extract feed.zip trip_id1 trip_id2 ...) to be able to generate minimal examples and debug them quickly.
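The core of such an extraction tool could look roughly like the sketch below. It covers only the stop_times.txt filtering step and works on the file contents as a string. The function names `split_csv` and `filter_stop_times` are hypothetical, and the sketch assumes simple CSV (no quoted fields containing commas, no UTF-8 byte order mark). Reading from and writing to a feed zip is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

// Splits one CSV line on commas.
// Assumption: no quoted fields that contain commas.
std::vector<std::string> split_csv(std::string const& line) {
  std::vector<std::string> fields;
  std::size_t start = 0;
  while (true) {
    auto const comma = line.find(',', start);
    if (comma == std::string::npos) {
      fields.push_back(line.substr(start));
      break;
    }
    fields.push_back(line.substr(start, comma - start));
    start = comma + 1;
  }
  return fields;
}

// Returns the header of a stop_times.txt plus only those rows whose trip_id
// is contained in `trip_ids`. Row order is preserved.
std::string filter_stop_times(std::string const& csv,
                              std::unordered_set<std::string> const& trip_ids) {
  std::istringstream in{csv};
  std::string line;
  std::string out;
  if (!std::getline(in, line)) {
    return out;  // empty input
  }
  if (!line.empty() && line.back() == '\r') {
    line.pop_back();  // tolerate Windows line endings
  }

  // Locate the trip_id column by name instead of assuming its position.
  auto const header = split_csv(line);
  auto const it = std::find(header.begin(), header.end(), "trip_id");
  if (it == header.end()) {
    throw std::runtime_error{"stop_times.txt has no trip_id column"};
  }
  auto const col = static_cast<std::size_t>(it - header.begin());
  out += line + "\n";

  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') {
      line.pop_back();
    }
    auto const fields = split_csv(line);
    if (col < fields.size() && trip_ids.count(fields[col]) != 0) {
      out += line + "\n";
    }
  }
  return out;
}
```

A command line wrapper would pass the trip ids given as arguments to `filter_stop_times`, write the result as the new stop_times.txt, and copy the remaining feed files unchanged before running the tidy step.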

@jbruechert jbruechert merged commit 96b2d67 into main Feb 29, 2024
2 checks passed
@jbruechert jbruechert deleted the work/jbb/duplicates branch March 3, 2024 22:56
3 participants