CRAM reference problem in snp:extract_not_haplotagged_contigs #235

Open
fa2k opened this issue Jan 13, 2025 · 1 comment
fa2k commented Jan 13, 2025

Operating System

Other Linux (please specify below)

Other Linux

RHEL 9

Workflow Version

v2.6.0

Workflow Execution

Command line (Cluster)

Other workflow execution

No response

EPI2ME Version

No response

CLI command run

/data/common/tools/nextflow/nextflow-23.10.1-all run \
    /data/runScratch.boston/analysis/pipelines/epi2me-labs-wf-human-variation_v2.6.0/v2_6_0/ \
    -profile singularity \
    -c /data/runScratch.boston/analysis/pipelines/nsc_slurm.conf \
    -c singularity.conf \
    --bam ../../241218_TestS2/no_sample_id/20241218_1558_P2S-02531-B_PAY55351_5638a90f/bam_pass \
    --ref /data/runScratch.boston/analysis/pipelines/nf-core/references/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa \
    --sample_name TestS2 \
    --sv --snp --cnv --str --mod --phased \
    --bam_min_coverage 15 \
    --out_dir output \
    --threads 64 \
    -resume

Workflow Execution - CLI Execution Profile

singularity

What happened?

I have downloaded the pipeline using nf-core tools.

There is an error in the process snp:extract_not_haplotagged_contigs

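It is possibly caused by the fact that I don't have Internet access: in the log, samtools first fails to fetch the reference from www.ebi.ac.uk and then fails to open genome.fa at a path in a different work directory.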

I will submit a PR that passes the reference explicitly to the samtools commands.
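
As a rough illustration of that idea (not the change that was actually submitted or merged), the sketch below passes the reference FASTA to `samtools view` with its `--reference` option, so decoding the CRAM should not depend on the EBI lookup or on a path stored in the file. The function name `extract_region`, its argument order, and the `DRY_RUN` switch are hypothetical; the file names are taken from this report, and `genome.fa` is assumed to be staged in the task directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: extract one region from a CRAM into a BAM while
# naming the reference FASTA explicitly.
extract_region() {
    local cram="$1" ref="$2" region="$3" out="$4"
    local cmd=(samtools view --reference "$ref" --no-PG -@ 7 -o "$out" "$cram" "$region")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only print the command; useful where samtools is not installed.
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Dry run with the names from this report (assumes genome.fa is staged locally).
DRY_RUN=1 extract_region TestS2.cram genome.fa chr11_KI270721v1_random \
    "output/chr11_KI270721v1_random_nohp.bam"
```

Without `DRY_RUN=1` the same call would run samtools for real; the unaligned reads could be extracted the same way by passing `'*'` as the region.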

Relevant log output

ERROR ~ Error executing process > 'snp:extract_not_haplotagged_contigs (1)'

Caused by:
  Process `snp:extract_not_haplotagged_contigs (1)` terminated with an error exit status (1)

Command executed:

  mkdir -p output

  # create file of sequence names by extracting all SQ SN
  samtools view -H --no-PG 'TestS2.cram' | grep '^@SQ' | sed -nE 's,.*SN:([^[:space:]]*).*,\1,p' > all_sq.fosn

  # pull out contigs that do not appear in the haplotagged fofn
  # sort haplotagged contig list here (dont leave it to nextflow collectfile)
  comm -23 <(sort all_sq.fosn) <(sort haplotagged_sq.fosn) > unhaplotagged_sq.fosn

  if [ -s unhaplotagged_sq.fosn ]; then
      while read sq; do
          echo "Extracting ${sq}"
          samtools view TestS2.cram "${sq}" -@ 7 --no-PG -o "output/${sq}_nohp.bam"
      done < unhaplotagged_sq.fosn
  fi

  # bonus bam: pull out unaligned - this file will always be created
  # use '*' region rather than -f4 for speeds
  echo "Extracting *"
  samtools view TestS2.cram '*' -@ 7 --no-PG -o output/unaligned.bam

Command exit status:
  1

Command output:
  Extracting chr11_KI270721v1_random

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  [W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/9654b5d3f36845bb9d19a6dbd15d2f22": Destination address required
  /boston/runScratch/ONT/analysis/241218_TestS2_human_variation_failing/work/71/381ff83cc459aa72033bd3a5a5aa95/genome.fa: No such file or directory
  [E::refs_load_fai] Failed to open reference file '/boston/runScratch/ONT/analysis/241218_TestS2_human_variation_failing/work/71/381ff83cc459aa72033bd3a5a5aa95/genome.fa'
  [W::cram_get_ref] Failed to populate reference for id 43
  [E::cram_decode_slice] Unable to fetch reference #43:81454-98754

  [W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/9654b5d3f36845bb9d19a6dbd15d2f22": Destination address required
  /boston/runScratch/ONT/analysis/241218_TestS2_human_variation_failing/work/71/381ff83cc459aa72033bd3a5a5aa95/genome.fa: No such file or directory
  [E::refs_load_fai] Failed to open reference file '/boston/runScratch/ONT/analysis/241218_TestS2_human_variation_failing/work/71/381ff83cc459aa72033bd3a5a5aa95/genome.fa'
  [W::cram_get_ref] Failed to populate reference for id 43
  [E::cram_next_slice] Slice decode failure
  samtools view: retrieval of region #22058 failed

Work dir:
  /boston/runScratch/ONT/analysis/241218_TestS2_human_variation_failing/work/5e/0c0d8cd6bb10cbc6cfeba1dbfea803

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details
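
The `genome.fa` path in the error belongs to a different work directory than the failing task, which suggests (this is an inference, not something confirmed in the thread) that it comes from the `UR` tag recorded on the `@SQ` header lines when the CRAM was written, with the `M5` checksum being what the EBI URL is built from. A minimal sketch for checking what a header records is shown below; the function name `sq_reference_tags` and the demo header values are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read a SAM/CRAM header on stdin and print, for each
# @SQ line, the sequence name (SN), checksum (M5) and recorded reference
# location (UR), tab-separated. A missing tag is shown as "-".
sq_reference_tags() {
    awk -F'\t' '
        $1 == "@SQ" {
            sn = "-"; m5 = "-"; ur = "-"
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) sn = substr($i, 4)
                else if ($i ~ /^M5:/) m5 = substr($i, 4)
                else if ($i ~ /^UR:/) ur = substr($i, 4)
            }
            print sn "\t" m5 "\t" ur
        }'
}

# Demo on an invented header (values are illustrative, not from TestS2.cram).
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chrTest\tLN:4\tM5:0123456789abcdef0123456789abcdef\tUR:/old/work/dir/genome.fa\n' \
    | sq_reference_tags

# On the real file one would pipe in the header instead, e.g.:
#   samtools view -H --no-PG TestS2.cram | sq_reference_tags
```

If the `UR` column points at a path that does not exist in the current task, and the node cannot reach the Internet, the reference has to be supplied some other way, for example explicitly on the samtools command line.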

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

No response

SamStudio8 (Member) commented Jan 13, 2025

Thanks for spotting this one @fa2k - a classic CRAM footgun, we'll fix this in the next release.
