Error in step 8 of run_genespace() #171
Comments
I have seen this issue before. It can be caused by a couple of different (but rare) situations, typically when a single gene is assigned to very many blocks (maybe the plants you are working with have gone through a lot of WGDs?) while ploidy is set to 1. Is it possible that you have some genomes that are flagged as ploidy = 1 in
I am hoping to use the pan-gene sets too. The situation is that WGD definitely happened, but the plants then went through rediploidization.
In your case, I bet there are no genomes that are truly 1x relative to all the others. This makes the underlying graph structure very complex and causes this particular error. I have yet to be able to recreate it myself, but likely this is because I haven't tried as complex a run as you have tried here.
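To make the failure mode concrete: the data.table error reports a join that would produce more than 2^31 rows, and the row count of an equi-join is the sum, over shared keys, of (copies on the left) x (copies on the right). The sketch below is in Python rather than R and uses made-up gene IDs; it only illustrates that arithmetic and is not how GENESPACE builds its tables internally. The function names are hypothetical.

```python
from collections import Counter

ROW_LIMIT = 2**31  # row limit quoted in the data.table error message


def estimated_join_rows(left_keys, right_keys):
    """Rows an inner equi-join would produce:
    sum over shared keys of (count on left) * (count on right)."""
    left = Counter(left_keys)
    right = Counter(right_keys)
    return sum(n * right[k] for k, n in left.items() if k in right)


def worst_keys(left_keys, right_keys, top=3):
    """Keys contributing the most rows to the join, largest first."""
    left = Counter(left_keys)
    right = Counter(right_keys)
    contrib = {k: left[k] * right[k] for k in left if k in right}
    return sorted(contrib.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    # Hypothetical keys: gene "g1" sits in 4 blocks on one side and 5 on the other.
    left = ["g1"] * 4 + ["g2", "g3"]
    right = ["g1"] * 5 + ["g2"]
    print(estimated_join_rows(left, right))  # 4*5 + 1*1 = 21
    print(worst_keys(left, right))
    # A single key duplicated 50,000 times on each side already exceeds the limit.
    print(estimated_join_rows(["g1"] * 50_000, ["g1"] * 50_000) > ROW_LIMIT)
```

The point of the example is that a handful of genes assigned to very many blocks can dominate the total, which matches the situation described above where no genome is truly 1x relative to the others.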
Oh, I also didn't see this note:
Thank you for the explanation! Right, it seems that these plants underwent many rounds of WGD, and most of these genomes are probably ancient tetraploids of some sort. That makes sense. I noticed these notes too. After removing the three unscaffolded species, the run was OK.
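For anyone wanting to screen genomes before a long run, the criterion in the warning (fewer than 75% of genes on chromosomes that contain more than 10 genes) can be checked ahead of time. This is a rough Python sketch with toy data; the thresholds come from the warning text, but the function names and input layout are assumptions, and GENESPACE's own check may differ in detail.

```python
from collections import Counter


def fraction_on_large_seqs(gene_to_seq, min_genes=10):
    """Fraction of genes located on sequences carrying more than `min_genes` genes."""
    if not gene_to_seq:
        return 0.0
    genes_per_seq = Counter(gene_to_seq.values())
    on_large = sum(n for n in genes_per_seq.values() if n > min_genes)
    return on_large / len(gene_to_seq)


def flag_fragmented(genomes, min_fraction=0.75, min_genes=10):
    """Names of genomes falling below the threshold stated in the warning."""
    return [
        name
        for name, gene_to_seq in genomes.items()
        if fraction_on_large_seqs(gene_to_seq, min_genes) < min_fraction
    ]


if __name__ == "__main__":
    # Toy genomes (hypothetical): gene ID -> chromosome or scaffold ID.
    scaffolded = {f"a{i}": "chr1" for i in range(30)}
    scaffolded.update({"a30": "scaf1", "a31": "scaf1"})
    fragmented = {f"b{i}": "chr1" for i in range(12)}
    fragmented.update({f"b{i}": f"contig{i}" for i in range(12, 32)})
    print(flag_fragmented({"scaffolded_toy": scaffolded, "fragmented_toy": fragmented}))
```

Genomes returned by `flag_fragmented` would be the candidates to drop or scaffold further before rerunning, as was done here with the three unscaffolded species.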
Good to hear!
Thank you for developing genespace! I have been using it in many projects.
A previous run on a similar dataset was fine, but after I added another species the following error occurred:
############################
8. Constructing syntenic pan-gene sets ...
**WARNING**: genomes Aquifoliales_Ilex_paraguariensis, Escalloniales_Escallonia_herrerae have < 75% of genes on chromosomes that contain > 10 genes. Synteny is not a useful metric for these genomes. Be very careful with your pan-gene sets.
Camellia_lanceoleosa : Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, :
  Join results in more than 2^31 rows (internal vecseq reached physical limit). Very likely misspecified join. Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
Calls: run_genespace ... merge -> merge.data.table -> [ -> [.data.table -> vecseq
Execution halted
The dataset is 41 eudicot genomes with fairly deep divergence.
Thank you for any pointers.