
mount remote filesystem with sshfs #255

Open · tmichela wants to merge 14 commits into master from processOnlineData
Conversation

tmichela
Member

This is an attempt to allow executing the context file on online data without maintaining two separate databases.

The current solution is very simple: it mounts the remote directory with sshfs, and the execution runs on Maxwell. Ideally, once we have a centralized backend, we may want a permanent NFS mount instead, but that's a longer-term solution.
There's a meta variable to set in order to run the context on online data.
I've tested it loosely and couldn't see a noticeable slowdown from reading remote data, at least with the kind of data we work with (at HED).
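For illustration, here is a minimal sketch of what the sshfs mount step could look like; the host, remote path, and function name are assumptions for this example, not the actual implementation:

```python
# Minimal sketch of mounting the online data directory over sshfs.
# The remote host/path and mount options here are hypothetical.
import subprocess
from pathlib import Path

def mount_online_data(remote: str, mount_point: Path) -> None:
    """Mount a remote directory read-only via sshfs."""
    mount_point.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["sshfs", "-o", "ro,reconnect", remote, str(mount_point)],
        check=True,
    )

# e.g. mount_online_data("online-host:/data/raw", Path("/tmp/online-raw"))
```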

Some limitations:

  • one has to (purposely) start the backend on one of the display nodes with the 10G interface to the online cluster (to avoid causing issues by going through the gateway machine)
  • we skip cluster variables. Supporting them would complicate things, as regular cluster nodes have no direct connection to the online cluster. Realistically, cluster variables are expected to take longer anyway, so I don't think there would be any requirement to run those on online data.

@tmichela tmichela force-pushed the processOnlineData branch from e583142 to 0617b50 Compare May 31, 2024 09:31
@tmichela tmichela marked this pull request as ready for review May 31, 2024 09:35
@takluyver
Member

I think this generally LGTM.

As I understand it, the backend will listen for all four possible events at all times. The online events will only do anything if we set a flag in the database; then it will process any non-cluster variables with the data mounted through sshfs. Once the data is migrated to Maxwell (and possibly corrected), the same variables will be re-evaluated with the data in the usual location, plus any cluster=True variables will run. So there's a bit of duplicated effort, but this should be used with relatively small data and light computation, so it shouldn't matter too much. Does that all sound correct?
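For concreteness, a rough sketch of the dispatch logic described above; every name here is an assumption for illustration, not DAMNIT's actual code:

```python
# Rough sketch of the event handling described above; names are hypothetical.
SSHFS_MOUNT = "/tmp/online-raw"   # hypothetical sshfs mount point
OFFLINE_ROOT = "/gpfs/offline"    # hypothetical migrated-data location

def handle_event(event, db, run_variables):
    """Dispatch an incoming run event to variable execution."""
    if event.from_online_cluster:
        # Online events are ignored unless the flag is set in the database
        if not db.get_meta("process_online_data", False):
            return
        # Online data via the sshfs mount: non-cluster variables only
        run_variables(event.run, data_root=SSHFS_MOUNT, include_cluster=False)
    else:
        # Data migrated (and possibly corrected): re-run the same variables
        # in the usual location, plus any cluster=True variables
        run_variables(event.run, data_root=OFFLINE_ROOT, include_cluster=True)
```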

@tmichela
Member Author

tmichela commented Jun 5, 2024

Yep, you got it right. I know there's some duplication; we could e.g. skip offline processing (at least for raw data) when the online flag is set. But for the use case we have, I don't think this will cause issues.
I also want to keep this simple, as it's likely a temporary solution. We can always iterate after some usage.

There are also other long-term ideas: for example, Philipp suggested that file-based migration might arrive soon, which could be an alternative that removes this special case. We could also special-case the calibration pipeline to run as soon as the data is available on Maxwell. But that's another conversation.

@tmichela
Member Author

tmichela commented Jun 5, 2024

I've now also prevented the GUI from starting the backend if it's running on the online cluster.
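A hypothetical version of such a guard; the hostname prefixes are assumptions about the online cluster's naming, not the actual check:

```python
# Hypothetical guard; the hostname prefixes below are assumptions,
# not the actual check in damnit/gui/main_window.py.
import socket

def on_online_cluster() -> bool:
    """Heuristic: decide from the hostname whether we're on the online cluster."""
    return socket.gethostname().startswith(("exflonc", "sa1-onc"))

# In the GUI: refuse to start the backend when on_online_cluster() is True.
```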

damnit/gui/main_window.py (review comment, resolved)
@takluyver
Member

Other than that, LGTM

@JamesWrigley
Member

Are we still doing this? I thought ITDM wasn't a fan?

@tmichela
Member Author

Are we still doing this? I thought ITDM wasn't a fan?

This is not an option for the long run, but it's fine as a temporary solution until IT sets up what we need online.

@tmichela
Member Author

tmichela commented Aug 8, 2024

I'll test this a bit more, and will merge at the end of this week or early next week, unless there are objections.

@tmichela
Member Author

tmichela commented Aug 8, 2024

Main change since the last review: it's been adapted to run on the solaris cluster, tunneling the data mounts through the display nodes.
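For illustration, tunneling an sshfs mount through a jump host can be done with ssh's ProxyJump option, which sshfs forwards to ssh; the hostnames and paths below are assumptions, not the actual configuration:

```python
# Sketch of an sshfs mount tunneled through a display node via ProxyJump.
# Hostnames and paths are illustrative only.
import subprocess

subprocess.run(
    [
        "sshfs",
        "-o", "ro,reconnect",
        "-o", "ProxyJump=max-display001",  # hop via the node with the 10G link
        "exflonc42:/data/raw", "/tmp/online-raw",
    ],
    check=True,
)
```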
