Homopolymer indels not consistently aligned #48
This is actually the optimal alignment in terms of the whole partial-order graph.
The "AC" was placed after the gaps. This is because the number of gaps matches. So if you set the mismatch penalty larger than the penalty for 2 or more indels, it may give you the result you want. However, this may lead to other alignment issues where a lot of indels show up.
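A minimal pairwise sketch of that trade-off (plain pairwise alignment, not the partial-order graph). The scores are hypothetical illustration values, not abPOA's defaults, and a gap of length k is assumed to cost gap_open + k * gap_ext. It scores the same A/C difference explained either as one mismatch or as one deletion plus one insertion:

```python
def score_alignment(ref_row, read_row, match=2, mismatch=4, gap_open=4, gap_ext=2):
    """Score a pairwise alignment given as two equal-length rows ('-' marks a gap).

    Assumed affine convention (illustration only): a gap of length k costs
    gap_open + k * gap_ext.
    """
    if len(ref_row) != len(read_row):
        raise ValueError("alignment rows must have equal length")
    score = 0
    prev_gap = None  # row holding the gap in the previous column: "ref", "read" or None
    for r, q in zip(ref_row, read_row):
        if r == "-" and q == "-":
            raise ValueError("column with a gap in both rows")
        if r == "-" or q == "-":
            gap_in = "ref" if r == "-" else "read"
            if gap_in != prev_gap:
                score -= gap_open  # a new gap is opened
            score -= gap_ext       # every gap column pays the extension cost
            prev_gap = gap_in
        else:
            score += match if r == q else -mismatch
            prev_gap = None
    return score


one_mismatch = ("AAAAAC", "AAAACC")   # A/C difference explained as one mismatch
two_indels = ("AAAAAC-", "AAAA-CC")   # same difference as one deletion + one insertion
for x in (4, 14):
    print(x, score_alignment(*one_mismatch, mismatch=x), score_alignment(*two_indels, mismatch=x))
```

With mismatch=4 the single-mismatch alignment wins (6 vs -2); once the mismatch penalty exceeds the two indels' combined cost of 2 * (4 + 2) = 12, e.g. mismatch=14, the preference flips (-4 vs -2).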
Right... interesting. I thought that if I isolated these subsequences and reran progressive alignment with the same parameters, it could remove some of the order dependence. The result was slightly more interpretable, but still not ideal. However, if I first sort the reads by ascending length and then rerun, I get this output, which is closer to what I was hoping for:
Is the … Regardless, this result makes me think that combining seeding/chaining with locally computed guide trees could be interesting. But perhaps it would turn out to be just another exercise in manual parameter tuning.
@rlorigro FWIW, in cactus we pass the sequences to abPOA in descending order of length. If I recall correctly, sorting this way really helped accuracy, even (counterintuitively?) when abPOA's progressive mode is enabled.
Yeah, I tried both and got "better" results with ascending order this time. I think ascending order works in this case because we want to enforce that a gap is introduced early in the graph, giving later sequences a lower-cost path along which to extend the gap successively.
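For anyone wanting to try the ascending vs. descending comparison, here is a small stdlib-only sketch that reorders FASTA records by sequence length before they are handed to abPOA. The function names are hypothetical helpers, not part of abPOA or cactus:

```python
def read_fasta(lines):
    """Parse FASTA lines into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def sort_by_length(records, descending=False):
    """Order records by sequence length (ascending by default)."""
    # Python's sort is stable (also with reverse=True), so reads of equal
    # length keep their input order and the run stays reproducible.
    return sorted(records, key=lambda rec: len(rec[1]), reverse=descending)


def write_fasta(records, out):
    """Write (name, sequence) tuples to a text stream, one line per sequence."""
    for name, seq in records:
        out.write(f">{name}\n{seq}\n")


def sort_fasta_file(in_path, out_path, descending=False):
    """Read in_path, sort its records by length and write them to out_path."""
    with open(in_path) as fin:
        records = read_fasta(fin)
    with open(out_path, "w") as fout:
        write_fasta(sort_by_length(records, descending), fout)
```

Usage: `sort_fasta_file("reads.fa", "reads.sorted.fa")` (or with `descending=True`), then run abPOA on the sorted file as usual.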
It would be interesting to see each stage of the graph as it is built, to verify what is happening.
@rlorigro The --progressive mode in abPOA does not perform a pairwise alignment between every pair of sequences; it only calculates an approximate similarity, to minimize the run time of this step.
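To illustrate what an alignment-free similarity for ordering sequences can look like, here is a sketch using k-mer Jaccard similarity and a greedy "most similar first" order. This thread does not say which similarity measure or tree construction abPOA actually uses, so both the k-mer Jaccard measure and the greedy ordering below are stand-ins, not abPOA's implementation:

```python
from itertools import combinations


def kmer_set(seq, k):
    """Set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two k-mer sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_order(seqs, k=4):
    """Hypothetical guide order: start from the most similar pair, then repeatedly
    add the remaining sequence most similar to any sequence already placed.
    No pairwise alignment is computed, only k-mer set overlaps."""
    n = len(seqs)
    if n < 2:
        return list(range(n))
    sets = [kmer_set(s, k) for s in seqs]
    sim = {(i, j): jaccard(sets[i], sets[j]) for i, j in combinations(range(n), 2)}

    def pair_sim(i, j):
        return sim[(min(i, j), max(i, j))]

    # max() returns the first maximum, so ties go to the lowest-index pair
    first, second = max(sim, key=sim.get)
    order = [first, second]
    remaining = set(range(n)) - {first, second}
    while remaining:
        nxt = max(sorted(remaining), key=lambda r: max(pair_sim(r, o) for o in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

For example, `greedy_order(["AAAACCCCGGGG", "TTTTTTTTTTTT", "AAAACCCCGGGT"])` places the two near-identical reads first and the unrelated poly-T read last. Because only k-mer sets are compared, a homopolymer length difference barely changes the similarity, which is one reason such an order may not separate reads by gap length.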
Hi, I am trying to get a reasonable alignment in a region that has some tandem repeats flanked by non-repetitive sequence. I can get good (enough) results in the tandem region using these parameters:
However, in the (mostly non-repetitive) flanking region there is a long homopolymer, where I get this result:
It seems to arbitrarily assign different paths to the same "AC" prefix. Do you think this can be resolved with parameter choices, or is this an unavoidable aspect of POA?

Thanks
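A small pairwise sketch of why the homopolymer placement can look arbitrary: when a read is one base shorter inside a homopolymer run, every position of the 1-bp deletion within the run gives exactly the same score, so the chosen path comes down to tie-breaking (and, in a POA graph, to what earlier reads already added). The scores are hypothetical illustration values, with an assumed gap_open + k * gap_ext gap cost:

```python
def score_alignment(ref_row, read_row, match=2, mismatch=4, gap_open=4, gap_ext=2):
    """Affine-gap score of a pairwise alignment ('-' marks a gap).
    Assumed convention: a gap of length k costs gap_open + k * gap_ext."""
    score, prev_gap = 0, None
    for r, q in zip(ref_row, read_row):
        if r == "-" or q == "-":
            gap_in = "ref" if r == "-" else "read"
            if gap_in != prev_gap:
                score -= gap_open  # opening a new gap
            score -= gap_ext
            prev_gap = gap_in
        else:
            score += match if r == q else -mismatch
            prev_gap = None
    return score


def deletion_placements(ref, read):
    """Score every mismatch-free way of placing a single 1-bp deletion in read."""
    if len(read) != len(ref) - 1:
        raise ValueError("read must be exactly one base shorter than ref")
    placements = []
    for i in range(len(ref)):
        read_row = read[:i] + "-" + read[i:]
        # keep only placements where all non-gap columns match
        if all(r == q for r, q in zip(ref, read_row) if q != "-"):
            placements.append((i, score_alignment(ref, read_row)))
    return placements


print(deletion_placements("GAAAAAC", "GAAAAC"))
```

All five placements inside the A run score 6, so nothing in the scoring prefers one over another. Always choosing one deterministically (for example the leftmost, as in left-aligned indel representation) is a post hoc convention assumed here, not something abPOA is stated to do.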