species_separator run as Slurm sbatch job on HPC exits prematurely #105
Comments
Hi Rebecca,

Many thanks for trying out the software! I haven't actually run Sargasso under a job scheduler before, so it would be very helpful to us to get this working for you in this new situation.

Can I just confirm that when you run in interactive mode and the tool completes, all the expected output files are present, as described here - e.g. that the filtered_reads directory contains a BAM file for each sample and species?

It's strange that the tool exits prematurely in batch mode, but my first guess would be that it is an interaction between the scheduler and the particular way in which Sargasso executes its commands. Basically, the main Python code (the "species_separator" invocation) writes a Makefile into the output directory, and then opens a subprocess to execute "make" using that Makefile. Subsequently, the "make" execution calls various other bash scripts and Python code as it works through the Makefile. My initial guess is that because the main Python code exits once it has initiated the execution of "make", the job scheduler notices this, concludes that all of the execution is finished, and terminates the job while the Makefile is still being executed. That could explain why adding a "sleep" allows you to work around this.

I think we can test this in the following way. If you remove the "--run-separation" flag from the "species_separator" invocation, the Makefile will be written but not executed. You could then manually add execution of that Makefile with "make" as a subsequent step in the script that you submit to the job scheduler (see the sketch below). In terms of Sargasso operation, this will behave exactly as if the Makefile had been executed automatically with "--run-separation", but the job scheduler would at least know about the make invocation, and so would hopefully not terminate the job prematurely. Would it be possible to try that as a test?

Best regards,
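To make that test concrete, here is a minimal sketch of the kind of sbatch script being suggested. All of the Slurm directives, paths, and species_separator options are hypothetical placeholders; only the omission of "--run-separation" and the explicit make step come from the suggestion above.

```bash
#!/bin/bash
#SBATCH --job-name=sargasso      # hypothetical Slurm settings
#SBATCH --cpus-per-task=8
#SBATCH --time=24:00:00

OUTPUT_DIR=/path/to/output       # placeholder path

# Step 1: run species_separator with your usual arguments, but WITHOUT
# the --run-separation flag, so the Makefile is written into the output
# directory without being executed.
species_separator samples.tsv "$OUTPUT_DIR"   # plus your usual options

# Step 2: execute the Makefile as an explicit step of the batch script,
# so the scheduler tracks the make process itself and keeps the job
# alive until separation finishes.
cd "$OUTPUT_DIR"
make
```

Because make now runs in the foreground of the submitted script, the batch job cannot finish before the Makefile does.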
Hi and thanks for the helpful response.
I'll ask our sys admins if they have any insight into the behavior. Thanks!
That's great that it works now. My hunch is that Slurm is not aware of the subprocess that Sargasso starts in order to run make, and so terminates the whole batch job as soon as the main Python process finishes (because it thinks that everything is done) - but as you suggest, your sys admins may have better insight into that! If they do have any clues it would be brilliant to know, and then I'll update the documentation to give this tip for running the tool under a batch scheduler.

Many thanks,
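For what it's worth, the behaviour described can be illustrated with a plain batch script, independent of Sargasso. This is a hypothetical demonstration, assuming Slurm's usual cleanup of leftover processes when the submitted script exits:

```bash
#!/bin/bash
#SBATCH --job-name=cleanup-demo   # hypothetical Slurm settings

# Start long-running work in the background, standing in for the
# make subprocess that Sargasso launches and does not wait for.
sleep 300 &

# The batch script itself now exits immediately. Slurm treats the job
# as complete and cleans up the backgrounded process, so the "work"
# never finishes - the same effect described above for the make run.
echo "submission script exiting"

# Uncommenting the following line would make the script wait for the
# child process, keeping the job alive until the work is done:
# wait
```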
Hello and thanks for the great software!
I run Sargasso on an HPC which uses the Slurm job scheduler.
I notice that when I run batch jobs using sbatch, they exit prematurely.
When I run in interactive mode, they complete.
The code I am running looks like this:
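(The original script is not reproduced here; the following is a hypothetical sketch of such a submission, with placeholder Slurm settings, paths, and options - only the --run-separation flag is taken from the rest of this thread.)

```bash
#!/bin/bash
#SBATCH --job-name=sargasso       # hypothetical Slurm settings
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=24:00:00

# Placeholder invocation: the real samples file, output directory, and
# other options differ. With --run-separation, species_separator writes
# the Makefile and immediately starts executing it via make.
species_separator --run-separation samples.tsv /path/to/output
```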
The stdout and log when it exits prematurely show that it stops after the `collate_raw_reads` step. Right now I am getting around it by putting a `sleep` command at the end of the script in order to keep the job alive. This isn't ideal because I don't know in advance how long the run will take, and I want to be mindful of cluster resources. Do you have any idea why the job is exiting prematurely in batch mode, or how I can keep it going until it is done? Thanks again.
Rebecca