Adding flye's --min-overlap command line option as a param to wf-clone-validation #52

Open
potapovneb opened this issue Jun 26, 2024 · 12 comments

Comments

@potapovneb

Is your feature related to a problem?

When --assembly_tool is set to flye, the flye assembler estimates the minimum read overlap automatically. Sometimes, the computed value (or the failsafe 3000) is not suitable for various reasons.

Describe the solution you'd like

It would be great to pass something like --flye-min-overlap to wf-clone-validation. This flag could be set to 'auto' or to an actual value specified by the user.

Describe alternatives you've considered

Manually editing main.nf in wf-clone-validation to set the required value.

Additional context

No response

@julibeg
Contributor

julibeg commented Jun 27, 2024

Hi @potapovneb, many thanks for reaching out.

Sometimes, the computed value (or the failsafe 3000) is not suitable for various reasons.

Could you explain some of these reasons? Thanks!

@potapovneb
Author

Hi @julibeg,

Flye computes the minimum overlap value based on the observed read length distribution. I believe it takes the N90 value as the minimum overlap. This works fine for most samples. In my case, I extract a small subset of the reads based on their length (let's say any read from 4900 nt to 5100 nt). This is done to build a plasmid assembly for a specific peak in the read length distribution (let's say 5000 nt in this case). In cases like this, when there is very little variation in read lengths, the N90 value computed by flye is too high and the assembly fails. Manually overriding --min-overlap would be useful.
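To illustrate the point about narrow length distributions, here is a minimal sketch of an N90 calculation over read lengths. It is not flye's code; the helper name `n_stat` and the example read lengths are made up for illustration, and whether flye uses exactly this statistic is the poster's belief rather than something confirmed in this thread.

```python
def n_stat(lengths, fraction=0.9):
    """Return the length L such that reads of length >= L hold
    at least `fraction` of all sequenced bases (N90 when fraction=0.9)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("no reads given")


# Hypothetical size-selected subset: 21 reads between 4900 nt and 5100 nt.
narrow = list(range(4900, 5101, 10))
n90 = n_stat(narrow)
print(f"N90 = {n90} nt, shortest read = {min(narrow)} nt")
```

With reads this uniform, the N90 sits within a few percent of the full plasmid length, so an overlap threshold derived from it leaves almost no room for two reads to overlap by less than their whole length.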

@julibeg
Contributor

julibeg commented Jul 2, 2024

Hi @potapovneb, makes sense; thanks for the further information!
We will consider exposing flye's --min-overlap parameter in a future release.

@scottcoutts

scottcoutts commented Aug 1, 2024

This is also critical for us, and it triggers a bug: it seems that Flye calculates the minimum overlap based on N90, but also rounds UP to the nearest kb. This probably works for genomes or large constructs, but for smaller plasmids it often produces a minimum overlap that is larger than the entire template, especially if the library doesn't have many smaller reads (i.e. it is mostly linearised circular plasmid). We often see failed assemblies for (what I suspect is) this reason.
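The suspected failure mode can be shown with a few lines of arithmetic. This sketch models only the hypothesis described above (N90 rounded up to the next multiple of 1 kb); it is not taken from flye's source, and the function name and the 2.7 kb plasmid are assumptions for illustration.

```python
def round_up_to_step(value, step=1000):
    """Round `value` up to the next multiple of `step` using integer math."""
    return -(-value // step) * step


# Hypothetical 2.7 kb plasmid sequenced mostly as full-length linearised reads,
# so the N90 of the library is close to the plasmid length itself.
plasmid_length = 2700
n90 = 2650

min_overlap = round_up_to_step(n90)
print(f"rounded min overlap: {min_overlap} nt")
print(f"exceeds template length: {min_overlap > plasmid_length}")
```

Under that assumption, no pair of reads from this library could ever satisfy the overlap threshold, which would be consistent with the failed assemblies reported here.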

@julibeg
Contributor

julibeg commented Aug 1, 2024

This is valuable input, thank you! Have you seen similar issues when running the workflow with Canu?

@scottcoutts

I haven't rigorously tested both solutions, but on the occasions when we see failed assemblies, they are almost always resolved by Canu.

@sarahjeeeze
Contributor

Thanks for letting us know. We know that Canu sometimes assembles fine where Flye fails for smaller plasmids, but Canu does not work on Mac ARM, which is why we offer both and have Flye as the default. Once Canu supports ARM, which is in the pipeline, we will consider changing the default to Canu. In the meantime we will look into exposing min overlap.

@micromongenomics

We would much prefer to use Flye instead of Canu, because Canu (or something else in the pipeline) appears to regularly make small (<200bp) errors in the assemblies, which we suspect are related to read trimming. But the current behaviour of rounding up to the nearest 1kb (if that's what's happening) prevents us from using the Flye option.

@sarahjeeeze
Contributor

sarahjeeeze commented Oct 4, 2024

The minimum overlap for flye is 1000; it complains if you go lower, with the error "--min-overlap: value should be in the range [1000, 10000]". If you look at the flye repo, I think it's explained why somewhere. But for Canu mode, if you set the trim_length parameter of the workflow to 0, do you still get the 200bp errors?
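For reference, a workflow option of the kind requested in this issue ('auto' or an explicit value) would need to respect that range before handing the value to flye. The following is only a sketch: the function name `flye_overlap_args` is hypothetical and nothing here reflects how wf-clone-validation is or will be implemented. The only details taken from this thread are flye's `--min-overlap` flag and the [1000, 10000] range quoted in the error message.

```python
FLYE_MIN_OVERLAP_RANGE = (1000, 10000)  # range quoted in flye's error message


def flye_overlap_args(value):
    """Translate a hypothetical workflow setting into extra flye arguments.

    'auto' leaves flye to pick its own overlap; anything else must be an
    integer inside the range flye accepts.
    """
    if str(value).strip().lower() == "auto":
        return []
    overlap = int(value)  # raises ValueError for non-numeric input
    low, high = FLYE_MIN_OVERLAP_RANGE
    if not low <= overlap <= high:
        raise ValueError(
            f"min overlap {overlap} is outside flye's accepted range [{low}, {high}]"
        )
    return ["--min-overlap", str(overlap)]


print(flye_overlap_args("auto"))
print(flye_overlap_args("2000"))
```

Rejecting out-of-range values up front gives the user a clear message at launch instead of a failure partway through the assembly step.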

@scottcoutts

We saw another set of Canu assemblies that were ~200bp too short, and the --trim_length parameter seems to have solved the issue.

We'd still prefer to use flye though, since it seems to do a better job in general. But the min-overlap problem causes too many failures.

@sarahjeeeze
Contributor

Closing for now, but we have opened a ticket internally to add this in a future release. We can't give a timeline yet, but it should be sometime next year.

@sarahjeeeze
Contributor

sarahjeeeze commented Dec 17, 2024

Hi @scottcoutts, we are looking into adding this. Do you have any example data you would be happy to share? If not, no worries.

@sarahjeeeze sarahjeeeze reopened this Dec 17, 2024