[experimental] FastGA support #1459

Open
wants to merge 3 commits into
base: master

glennhickey
Collaborator

FastGA is a new pairwise genome aligner that seems like a good candidate to help speed up Cactus. This PR adds an option to drop it into progressive cactus as a lastz replacement.

It's got a ways to go before merging though, as it doesn't yet pass the evolver mammals test. Issues so far:

  • FastGA aborts when aligning trimmed ingroups to outgroups. I've hacked around this by disabling trimIngroups when fastga is activated.
  • The Anc0 alignment is empty, leading to a cactus_consolidated crash. I think this is because the ancestor(s) below it are too fragmented for FastGA to align.

Without having spent much time on this, it looks like FastGA does not work well with small contigs, at least with its default parameters. This leads to trouble with trimmed and ancestral sequences in Cactus.

This branch should still be runnable on pairwise alignments in Cactus, and pairwise tests are probably the next step before seeing how much it's worth pursuing the above issues.

@glennhickey
Collaborator Author

make evolver_test_poa_local (primates star tree) fails with

Comparing mafcomp accuracy 0.980491,0.980145 to baseline accuracy 0.998757,0.985563 with threshold (0.0025, 0.0075)

make evolver_test_local (mammals progressive) fails with

Comparing mafcomp accuracy 0.749021,0.290327 to baseline accuracy 0.894622,0.706771 with threshold (0.05, 0.13)

When I switch to a star tree for the mammals

Comparing mafcomp accuracy 0.840356,0.336892 to baseline accuracy 0.894622,0.706771 with threshold (0.05, 0.13)

which means that divergence, rather than the ancestor alignments, seems to be the driving force for most of the recall drop. So there could be hope for improvement via parameter tuning.
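
The three comparisons above can be checked arithmetically with a small sketch like the one below. It is not the Cactus test code: the pass rule (each number may fall at most its threshold below the baseline) is an assumption inferred from the log lines, and the function names are made up for illustration. The numbers themselves are copied from the output above.

```python
# Numbers copied from the three "Comparing mafcomp accuracy" lines.
# Each entry: (observed pair, baseline pair, threshold pair).
RUNS = {
    "primates star": ((0.980491, 0.980145), (0.998757, 0.985563), (0.0025, 0.0075)),
    "mammals progressive": ((0.749021, 0.290327), (0.894622, 0.706771), (0.05, 0.13)),
    "mammals star": ((0.840356, 0.336892), (0.894622, 0.706771), (0.05, 0.13)),
}


def drops(observed, baseline):
    """Per-metric drop below baseline (positive means worse than baseline)."""
    return [b - o for o, b in zip(observed, baseline)]


def within_threshold(observed, baseline, threshold):
    """ASSUMED pass rule: each metric may fall at most `threshold` below baseline."""
    return [d <= t for d, t in zip(drops(observed, baseline), threshold)]


def recovered_fraction(progressive, star, baseline):
    """Share of the progressive-tree drop that disappears with a star tree."""
    prog = drops(progressive, baseline)
    st = drops(star, baseline)
    return [(p - s) / p for p, s in zip(prog, st)]


if __name__ == "__main__":
    for name, (obs, base, thr) in RUNS.items():
        print(name, [round(d, 6) for d in drops(obs, base)],
              within_threshold(obs, base, thr))

    prog_obs, base, _ = RUNS["mammals progressive"]
    star_obs, _, _ = RUNS["mammals star"]
    print([round(f, 3) for f in recovered_fraction(prog_obs, star_obs, base)])
```

Under these assumptions, switching the mammals to a star tree removes roughly 63% of the drop in the first number but only about 11% of the drop in the second, so most of the second number's drop remains even with no ancestral sequences involved.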

@glennhickey
Collaborator Author

@benedictpaten super low priority, but paffy chain always gives scores of 0 to FastGA alignments, for reasons I don't quite see (but which are probably pretty obvious). File to reproduce here: http://public.gi.ucsc.edu/~hickey/debug/fastga-chaining/
