Update sanger.config #724
I think the Sanger small queue limit is 30 min; might it be good to adjust this?
True, but
Check with @muffato if you should investigate the small queue batching before merging.
Hmm, let me try it on one pipeline that spins up loads of small jobs and see if it causes any issues.
- ISG thinks the 30 min limit applies to each job within the batch. I'm running a test now and will confirm later.
- However, once a job has been committed to a batch, it has to wait for all previous jobs of the batch to start. Say Nextflow submits one or more jobs to the small queue: LSF may group them with the user's other jobs into a batch that is dispatched to a machine somewhere. Because of batching, the n-th job may wait up to (n-1) x 30 min before even starting, which can be an unnecessary wait. (Without batching, each job has the chance of starting somewhere as soon as resources become available.) It's not something I want on the Tree of Life cluster without further testing. Can you add a test based on clustername.startsWith("tol") so that we still use a threshold of 1 min on the ToL farm?
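The requested check might look roughly like the sketch below. It is written in Python rather than Nextflow config syntax so that it runs on its own. Only the startsWith("tol") test and the 1 min ToL threshold come from the comment above. The function names, the 30 min default, the "less than or equal" comparison, and the sample cluster names are assumptions for illustration and are not taken from sanger.config.

```python
def small_queue_threshold_min(clustername: str, default_threshold_min: int = 30) -> int:
    """Return the longest requested runtime (minutes) still routed to the small queue.

    Hypothetical helper: clusters whose name starts with "tol" keep the
    1 min threshold; every other cluster uses the supplied default.
    """
    if clustername.startswith("tol"):
        return 1
    return default_threshold_min


def uses_small_queue(requested_min: float, clustername: str,
                     default_threshold_min: int = 30) -> bool:
    """Assumed rule: a job goes to the small queue if it requests at most the threshold."""
    return requested_min <= small_queue_threshold_min(clustername, default_threshold_min)


# Hypothetical cluster names, for illustration only.
print(uses_small_queue(10, "tol-example"))    # a 10 min job on a ToL-like cluster
print(uses_small_queue(10, "other-example"))  # the same job elsewhere
```

With this shape, the ToL farm keeps its current behaviour whatever default the other clusters end up with.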
I agree: if this is what it truly does, I don't think we want that in Humgen either.
It seems CHUNK_JOB_SIZE is set to 10. I agree it's probably not best to have jobs that take the full 30 min tied into chunks of 10. Maybe there is an intermediate limit, such as 10 or 15 min max, that wouldn't cause much delay but would still get some use out of the small queue? Not sure how often nf-core pipelines request that sort of time!
Hi. I can confirm that the 30 min RUNLIMIT applies to each job within the batch, not to the entire batch. In my test, each job ran for 20 min, and the batch took 200 min.
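The arithmetic behind these numbers can be sketched as follows, assuming the jobs of a batch run strictly one after another and the batch holds 10 jobs (the CHUNK_JOB_SIZE mentioned earlier in the thread). The function name is hypothetical.

```python
def batch_wait_times(job_runtimes_min):
    """Return (start delay of each job, total batch duration), in minutes.

    Assumes jobs in a batch run sequentially, so each job starts only
    after all earlier jobs in the same batch have finished.
    """
    waits = []
    elapsed = 0
    for runtime in job_runtimes_min:
        waits.append(elapsed)  # this job waits for everything before it
        elapsed += runtime
    return waits, elapsed


# The test described above: 10 jobs of 20 min each.
waits, total = batch_wait_times([20] * 10)
print(waits[-1], total)  # 180 200

# Worst case under a 30 min per-job limit: the 10th job waits (10-1) x 30 min.
worst_waits, worst_total = batch_wait_times([30] * 10)
print(worst_waits[-1], worst_total)  # 270 300
```

Under these assumptions the last job of a full batch could wait 4.5 hours before starting, which is the delay the thread is worried about.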
Hmm, yes, that's not good and defeats the purpose. Thanks for checking this, @muffato.
Some of the comments I got from ISG, @muffato:
I presume in ToL you have pipelines with jobs that request <30 min, and there are not too many of them?
Systems also told me this:
I think 10 minutes may even be too high a threshold for the small queue. 5 min may be fine? But how many jobs declare a
You should probably change the config to just remove the fall-through to the small queue. If you ever care about the efficiency of your jobs and want them to run as quickly as possible, especially if you have a workflow with thousands of <5 min jobs, you'll want to switch to wr for your scheduler.
name: New Config
about: A new cluster config
Please follow these steps before submitting your PR:
- [WIP] in its title
- master branch
Steps for adding a new config profile:
- conf/ directory
- docs/ directory
- nfcore_custom.config file in the top-level directory
- README.md file in the top-level directory
- profile: scope in .github/workflows/main.yml