
mount remote filesystem with sshfs #255

Open · tmichela wants to merge 14 commits into master from processOnlineData
Conversation

tmichela
Member

This is an attempt to allow executing the context file on online data without maintaining two separate databases.

The current solution is very simple: it mounts the remote directory with sshfs, and the execution runs on Maxwell. Ideally, once we have a centralized backend, we may want a permanent NFS mount instead, but that's a longer-term solution.
There's a meta variable to set in order to run the context on online data.
I've tested it loosely and couldn't see a noticeable slowdown from reading remote data, at least with the kind of data we work with (at HED).
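For illustration, here is a minimal sketch of what the sshfs mount step could look like; the host, remote path, and function name are assumptions for this example, not the actual implementation:

```python
# Minimal sketch of mounting the online data directory over sshfs.
# The remote host/path and mount options here are hypothetical.
import subprocess
from pathlib import Path

def mount_online_data(remote: str, mount_point: Path) -> None:
    """Mount a remote directory read-only via sshfs."""
    mount_point.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["sshfs", "-o", "ro,reconnect", remote, str(mount_point)],
        check=True,
    )

# e.g. mount_online_data("online-host:/data/raw", Path("/tmp/online-raw"))
```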

Some limitations:

  • one has to (purposely) start the backend on one of the display nodes with the 10G interface to the online cluster (to avoid causing issues by going through the gateway machine)
  • we skip cluster variables. Supporting them would complicate things, as regular cluster nodes have no direct connection to the online cluster. Realistically, cluster variables are expected to take longer anyway, so I don't think there would be any requirement to run those on online data.

@tmichela tmichela force-pushed the processOnlineData branch from e583142 to 0617b50 Compare May 31, 2024 09:31
@tmichela tmichela marked this pull request as ready for review May 31, 2024 09:35
@takluyver
Member

I think this generally LGTM.

As I understand it, the backend will listen for all four possible events at all times. The online events will only do anything if we set a flag in the database; then it will process any non-cluster variables with the data mounted through sshfs. Once the data is migrated to Maxwell (and possibly corrected), the same variables will be re-evaluated with the data in the usual location, plus any cluster=True variables will run. So there's a bit of duplicated effort, but this should be used with relatively small data and light computation, so it shouldn't matter too much. Does that all sound correct?
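For concreteness, a rough sketch of the dispatch logic described above; every name here is an assumption for illustration, not DAMNIT's actual code:

```python
# Rough sketch of the event handling described above; names are hypothetical.
SSHFS_MOUNT = "/tmp/online-raw"   # hypothetical sshfs mount point
OFFLINE_ROOT = "/gpfs/offline"    # hypothetical migrated-data location

def handle_event(event, db, run_variables):
    """Dispatch an incoming run event to variable execution."""
    if event.from_online_cluster:
        # Online events are ignored unless the flag is set in the database
        if not db.get_meta("process_online_data", False):
            return
        # Online data via the sshfs mount: non-cluster variables only
        run_variables(event.run, data_root=SSHFS_MOUNT, include_cluster=False)
    else:
        # Data migrated (and possibly corrected): re-run the same variables
        # in the usual location, plus any cluster=True variables
        run_variables(event.run, data_root=OFFLINE_ROOT, include_cluster=True)
```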

@tmichela
Member Author

tmichela commented Jun 5, 2024

Yep, you got it right. I know there's some duplication; we could e.g. skip offline processing (at least for raw data) when the online flag is set. But for the use case we have, I don't think this will cause issues.
I also want to keep this simple, as it's likely a temporary solution. We can always iterate after some usage.

There are also other long-term ideas: for example, Philipp suggested that file-based migration might arrive soon, which could be an alternative that removes this special case. We could also special-case the calibration pipeline to run as soon as the data is available on Maxwell. But that's another conversation.

@tmichela
Member Author

tmichela commented Jun 5, 2024

I've now also prevented the GUI from starting the backend if it's running on the online cluster.
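A hypothetical version of such a guard; the hostname prefixes are assumptions about the online cluster's naming, not the actual check:

```python
# Hypothetical guard; the hostname prefixes below are assumptions,
# not the actual check in damnit/gui/main_window.py.
import socket

def on_online_cluster() -> bool:
    """Heuristic: decide from the hostname whether we're on the online cluster."""
    return socket.gethostname().startswith(("exflonc", "sa1-onc"))

# In the GUI: refuse to start the backend when on_online_cluster() is True.
```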

damnit/gui/main_window.py (review comment, resolved)
@takluyver
Member

Other than that, LGTM

@JamesWrigley
Member

Are we still doing this? I thought ITDM wasn't a fan?

@tmichela
Member Author

Are we still doing this? I thought ITDM wasn't a fan?

This is not an option for the long run, but it's fine as a temporary solution until IT sets up what we need online.

@tmichela
Member Author

tmichela commented Aug 8, 2024

I'll test this a bit more, and will merge at the end of this week or early next week, unless there are objections.

@tmichela
Member Author

tmichela commented Aug 8, 2024

Main change since the last review: it's been adapted to run on the solaris cluster, tunneling the data mounts through the display nodes.
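For illustration, tunneling an sshfs mount through a jump host can be done with ssh's ProxyJump option, which sshfs forwards to ssh; the hostnames and paths below are assumptions, not the actual configuration:

```python
# Sketch of an sshfs mount tunneled through a display node via ProxyJump.
# Hostnames and paths are illustrative only.
import subprocess

subprocess.run(
    [
        "sshfs",
        "-o", "ro,reconnect",
        "-o", "ProxyJump=max-display001",  # hop via the node with the 10G link
        "exflonc42:/data/raw", "/tmp/online-raw",
    ],
    check=True,
)
```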
