No translocations called #20
Comments
Hi Christoph,
Hi Li, thank you for your reply and help! In the meantime I reran LinkedSV with temporary files (--save_temp_files), and it just finished. Interestingly, the raw SV calls in the file phased_possorted_bam.bam.raw_large_svcalls.bedpe list many translocations. For example:
chr1 121136682 121136683 chr2 90473950 90473951 TRA N.A. 13 0 R_end R_end 52.794027 0.000000 0.000000 CACAAACCAGACACTT-1,AAGCAGGCAGTCCGGT-1,CTCCACACAGAGGACT-1,CTCGTACGTCAACCGC-1,TCCCTTTCAACGTTTG- ....
and so on. This is just a random sample of the raw translocation calls (I also cut the list of barcodes, which would be too long), but I also saw many translocations in there that make sense and match the other WGS data.
I haven't used Loupe in a while, and it currently gets stuck opening my files on my Mac (maybe the old Loupe from 2016 has problems with newer macOS versions?). Anyway, I'll try to find a workaround, but in the meantime I hope the results from the temporary files help to clear things up. I would be really grateful for any hint on how I could proceed from here or how to interpret the temporary data. Best,
Hi Li, I kept looking into the temporary files and the source code, and I have a question (or rather a wild guess): are columns 14 and 15 of the raw SV call file "phased_possorted_bam.bam.raw_large_svcalls.bedpe" the "dbo-scores", and do they represent the significance of the barcode overlap between breakpoints? I saw that this score is used to filter translocations. I'm not sure whether any of this is correct, but maybe it helps to clear things up a little. Best,
You're right: columns 14 and 15 are the dbo-scores. LinkedSV applies several filters. A call may be filtered out if it is in the blacklist, lies in an extremely high-coverage region, or has low dbo-scores. Would you mind sharing the BAM file with me so I can figure out why the dbo-scores are 0? If your data is confidential, you can email it to [email protected] Thanks,
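To see how widespread the zero dbo-scores are, the raw call file can be summarized per SV type. The following is only a minimal sketch and not part of LinkedSV: the function name `dbo_zero_summary` is hypothetical, and the column positions (SV type in column 7, dbo-scores in columns 14 and 15) are taken from the example record and the reply above. It assumes whitespace- or tab-separated fields and skips lines starting with `#`.

```python
from collections import Counter


def dbo_zero_summary(lines):
    """Per SV type, return (total raw calls, calls with both dbo-scores == 0).

    Assumed layout (from the example record in this thread):
    column 7 = SV type, columns 14 and 15 = dbo-scores.
    """
    total, zero = Counter(), Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and possible header lines
        fields = line.split()
        if len(fields) < 15:
            continue  # not a complete record
        try:
            dbo1, dbo2 = float(fields[13]), float(fields[14])
        except ValueError:
            continue  # non-numeric score columns, e.g. an unmarked header
        sv_type = fields[6]
        total[sv_type] += 1
        if dbo1 == 0.0 and dbo2 == 0.0:
            zero[sv_type] += 1
    return {sv_type: (total[sv_type], zero[sv_type]) for sv_type in total}
```

It can be applied to an open file handle, for example `dbo_zero_summary(open("phased_possorted_bam.bam.raw_large_svcalls.bedpe"))`. If nearly all TRA records have zero scores while other SV types do not, that would point at the dbo-score calculation rather than at the data.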
@seismicon Best,
Hi Li, I am very sorry for my late response. I still have not really been able to solve this issue. I noticed, however, that the rearrangements from the temporary file are already quite good when I filter them manually by barcode support only, thereby "skipping" the dbo-score. Is that reasonable? Thanks a lot for your help and patience!
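The manual filtering described above could look roughly like the sketch below. It is an illustration under assumptions, not LinkedSV's own filter: the function name `filter_tra_by_barcodes` and the threshold `min_barcodes=10` are hypothetical, and the layout (SV type in column 7, comma-separated barcode list in column 16) is inferred from the example record earlier in this thread. The dbo-scores in columns 14 and 15 are deliberately ignored.

```python
def filter_tra_by_barcodes(lines, min_barcodes=10):
    """Keep raw TRA calls supported by at least `min_barcodes` distinct barcodes.

    Assumed layout (inferred from the example record in this thread):
    column 7 = SV type, column 16 = comma-separated supporting barcodes.
    `min_barcodes` is a hypothetical threshold; tune it for your data.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and possible header lines
        fields = line.split()
        if len(fields) < 16 or fields[6] != "TRA":
            continue  # incomplete record or not a translocation
        barcodes = {bc for bc in fields[15].split(",") if bc}
        if len(barcodes) >= min_barcodes:
            kept.append(line.rstrip("\n"))
    return kept
```

Counting distinct barcodes from the list itself avoids relying on any count column whose meaning is not confirmed in this thread. Calls kept this way have not passed the dbo-score, blacklist, or coverage filters, so they still need to be checked against the other WGS data.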
Sure. Thanks for letting me know about the bug!
Hello,
I am running LinkedSV on a 10x linked-read WGS dataset of a highly rearranged human cell line. The alignment was done previously with the 10x Longranger pipeline. LinkedSV ran through and the results seemed fine (I got all the bedpe files, 1401 large SVs, and the images), but for some reason no translocations were called. I know for sure from a standard WGS analysis and other sources that there are interchromosomal translocations in that sample.
I used the most recent LinkedSV version with Python 2.7, the same hg19 reference genome file that was used for the Longranger alignment, and the following parameters (I substituted all the long directory names with "dir"):
linkedsv.py -i dir/phased_possorted_bam.bam -d dir -r dir/genome.fa -v hg19 -t 8 --somatic_mode
There were some errors concerning reference sequences in standard output, but I don't know if they are related to the problem:
dir/LinkedSV/scripts/../fermikit/fermi.kit/fermi2 simplify -CSo 76 -m 114 -T 71 dir/region_000001/region_000001.all_hap.pre.gz 2> dir/region_000001/region_000001.all_hap.mag.gz.log | gzip -1 > dir/region_000001/region_000001.all_hap.mag.gz
[04/29/2021 18:00:25 (6.820 GB)] cd dir/region_000001 && perl dir/LinkedSV/scripts/../fermikit/fermi.kit/run-calling -t 1 dir/region_000001/region_000001.fasta dir/region_000001/region_000001.all_hap.mag.gz | sh
ERROR: can't find the reference sequences
gzip: dir/region_000002/region_000002.all_hap.flt.vcf.gz: No such file or directory
ERROR: can't find the reference sequences
gzip: dir/region_000002/region_000002.all_hap.sv.vcf.gz: No such file or directory
and so on; the same error appears for a few more regions.
Still, the program continued and called deletions, inversions and duplications (they seem fine and match previous WGS analyses); only translocations were missing.
I have no idea where to start looking for the problem, so I would be really grateful for your help!
Many thanks,
Christoph