Adding flye's --min-overlap command line option as a param to wf-clone-validation #52
Comments
Hi @potapovneb, many thanks for reaching out.
Could you explain some of these reasons? Thanks!
Hi @julibeg, Flye computes the minimum overlap value based on the observed read length distribution. I believe it takes the N90 value as the minimum overlap. This works fine for most samples. In my case, I extract a small subset of the reads based on their read length (let's say any read from 4900 nt to 5100 nt). This is done to build a plasmid assembly for a specific peak in the read length distribution (let's say 5000 nt in this case). In cases like this (when there is very little variation in read lengths), the N90 value computed by Flye is too high and the assembly fails. Manually overriding
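To make the effect concrete, here is a minimal Python sketch of the standard N90 statistic applied to a size-selected subset like the one described above. It is only an illustration: the helper name `n90` and the example read lengths are made up, and whether Flye derives its minimum overlap from exactly this value is the commenter's belief, not something confirmed in this thread.

```python
def n90(read_lengths):
    """Return the N90 of a collection of read lengths.

    N90 is the length L such that reads of length >= L together
    contain at least 90% of all sequenced bases.
    """
    lengths = sorted(read_lengths, reverse=True)
    if not lengths:
        raise ValueError("read_lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 10 >= total * 9:
            return length
    return lengths[-1]


# Hypothetical size-selected subset: 101 reads between 4900 and 5100 nt.
narrow_subset = list(range(4900, 5101, 2))

# Hypothetical unselected library: a mix of short and full-length reads.
broad_library = [1000] * 50 + [5000] * 50

print(n90(narrow_subset))  # 4920
print(n90(broad_library))  # 1000
```

With almost no variation in read length, N90 sits within about 100 nt of the full construct length, so any overlap threshold derived from it demands that reads overlap almost end to end.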
Hi @potapovneb, makes sense; thanks for the further information!
This is also critical for us, and it triggers a bug: it seems that Flye calculates the minimum overlap based on N90, but also rounds UP to the nearest kb. This probably works for genomes or large constructs, but for smaller plasmids it often yields a minimum overlap that is larger than the entire template, especially if the library doesn't have many smaller reads (i.e. mostly linearised circular plasmid). We often see failed assemblies for (what I suspect is) this reason.
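The suspected failure mode can be sketched in a few lines of Python. This models only the hypothesis stated above (N90 rounded up to the next kb) and is not taken from Flye's source; the function names and the example numbers are assumptions for illustration.

```python
def rounded_up_overlap(n90_value, step=1000):
    """Hypothetical rule: round an N90 value up to the next multiple of `step`."""
    # Ceiling division using integers only.
    return -(-n90_value // step) * step


def overlap_exceeds_template(n90_value, template_length, step=1000):
    """True if the rounded-up overlap would be longer than the template itself."""
    return rounded_up_overlap(n90_value, step) > template_length


# Hypothetical 4700 bp plasmid sequenced mostly as full-length linearised reads.
print(rounded_up_overlap(4600))              # 5000
print(overlap_exceeds_template(4600, 4700))  # True

# A larger construct with shorter reads is unaffected under the same rule.
print(overlap_exceeds_template(3000, 8000))  # False
```

If the rule really behaves this way, no pair of reads from the 4700 bp plasmid could satisfy a 5000 bp overlap requirement, which would be consistent with the failed assemblies reported here.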
This is valuable input, thank you! Have you seen similar issues when running the workflow with Canu?
I haven't rigorously tested both solutions, but on the occasion where we see the failed assemblies, they are almost always resolved by Canu.
Thanks for letting us know. We know that Canu sometimes assembles fine where Flye fails for smaller plasmids, but Canu does not work on Mac ARM, which is why we offer both and have Flye as the default. Once Canu supports ARM, which is in the pipeline, we will consider changing the default to Canu. In the meantime we will look into exposing min overlap.
We would much prefer to use Flye instead of Canu, because Canu (or something else in the pipeline) appears to regularly make small (<200bp) errors in the assemblies, which we suspect is related to read trimming. But the current behaviour of rounding up to the nearest 1kb (if that's what's happening) prevents us from using the Flye option.
the min overlap for flye is 1000, it complains if you go lower with the error
We saw another set of Canu assemblies that were ~200bp too short, and the --trim_length parameter seems to have solved the issue. We'd still prefer to use flye though, since flye seems to do a better job in general. But, the min-overlap problem causes too many failures.
Closing for now, but we have opened a ticket internally to add this in a future release. We can't give a firm timeline yet, but it should be sometime next year.
Hi @scottcoutts, we are looking into adding this. Do you have any example data you would be happy to share? If not, no worries.
Is your feature related to a problem?
Flye estimates the minimum read overlap automatically (when --assembly_tool is set to flye). Sometimes, the computed value (or the failsafe 3000) is not suitable for various reasons.
Describe the solution you'd like
It would be great to pass something like --flye-min-overlap to wf-clone-validation. This flag could be set to 'auto' or to an actual value specified by the user.
Describe alternatives you've considered
Manually editing main.nf in wf-clone-validation to a required value.
Additional context
No response
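As a rough illustration of the requested behaviour, the Python sketch below shows one way the proposed setting could be translated into Flye arguments. It is not the workflow's implementation: `--flye-min-overlap` is only the name suggested in this issue, the helper `flye_min_overlap_args` is hypothetical, and the lower bound of 1000 is taken from a comment in this thread rather than from Flye's documentation.

```python
# Lower bound reported by a commenter in this thread; treat it as an assumption.
MIN_ALLOWED_OVERLAP = 1000


def flye_min_overlap_args(value):
    """Translate a hypothetical --flye-min-overlap setting into Flye CLI arguments.

    'auto'      -> []  (let Flye pick the overlap itself)
    an integer  -> ['--min-overlap', '<value>']
    """
    if isinstance(value, str) and value.strip().lower() == "auto":
        return []
    overlap = int(value)  # raises ValueError for non-numeric input
    if overlap < MIN_ALLOWED_OVERLAP:
        raise ValueError(
            f"min overlap {overlap} is below the assumed minimum "
            f"of {MIN_ALLOWED_OVERLAP}"
        )
    return ["--min-overlap", str(overlap)]


print(flye_min_overlap_args("auto"))  # []
print(flye_min_overlap_args("2000"))  # ['--min-overlap', '2000']
```

Keeping 'auto' as the default would leave existing runs unchanged, while users with size-selected reads could pass an explicit value instead of editing main.nf.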