No such file or directory: 'bc_longlist_dir/3M-february-2018.txt.gz' #148

Open
Josephinedh opened this issue Dec 18, 2024 · 1 comment

@Josephinedh

Operating System

Other Linux (please specify below)

Other Linux

Red Hat Enterprise Linux 8.10 (Ootpa)

Workflow Version

v2.3.0

Workflow Execution

Command line (Cluster)

Other workflow execution

No response

EPI2ME Version

No response

CLI command run

nextflow run epi2me-labs/wf-single-cell \
  --expected_cells 100 \
  --fastq '../wf-single-cell-demo/chr17.fq.gz' \
  --kit '3prime:v3' \
  -r v2.3.0 \
  --ref_genome_dir '../wf-single-cell-demo' \
  -profile singularity \
  --genes_of_interest '../wf-single-cell-demo/umap_plot_genes.csv'

Workflow Execution - CLI Execution Profile

singularity

What happened?

The workflow was previously working fine, but this week I started getting the error "No such file or directory: 'bc_longlist_dir/3M-february-2018.txt.gz'". It also happens when running with the demo data.
demo_wfsinglecell.log

Relevant log output

ERROR ~ Error executing process > 'pipeline:preprocess:call_adapter_scan (1)'

Caused by:
  Process `pipeline:preprocess:call_adapter_scan (1)` terminated with an error exit status (1)

Command executed:

  export POLARS_MAX_THREADS=8
  
  workflow-glue adapter_scan_vsearch         chunk.fq.gz         --kit 3prime         --summary "adapters.json"         --keep_fl_only     | workflow-glue extract_barcode         -         bc_longlist_dir/3M-february-2018.txt.gz         --kit 3prime         --adapter1_suff_length 10         --min_barcode_qv 15         --barcode_length 16         --umi_length 12         --output_read_tags "bc_extract.tsv"         --output_barcode_counts "high_quality_bc_counts.tsv"     | minimap2 -ax splice -uf --MD         -t 7 -K 10M         --junc-bed ref_genes.bed          --cap-kalloc 100m --cap-sw-mem 50m         genome_index.mmi -     | samtools view -uh --no-PG -     | tee >(seqkit bam -s  2> bamstats.tsv )     | samtools view -uh -F 256 -     | tee >(samtools sort --write-index -o "sorted.bam"##idx##"sorted.bam.bai" --no-PG  -)     | seqkit bam -F - 2> bam_info.tsv
  
  # TODO: improve this with pipes?
  csvtk cut -tlf Read,Pos,EndPos,Ref,MapQual bam_info.tsv > bam_info_cut.tsv
  # Left join of barcode
  csvtk join -tlf 1 bam_info_cut.tsv bc_extract.tsv --left-join         | csvtk rename -tl -f Read,Pos,EndPos,Ref,MapQual -n read_id,start,end,chr,mapq -o read_tags.tsv
  
  rm bam_info.tsv bam_info_cut.tsv bc_extract.tsv

Command exit status:
  1

Command output:
  (empty)

Command error:
  [13:45:34 - workflow_glue] Bootstrapping CLI.
  [13:45:34 - workflow_glue] Bootstrapping CLI.
  [WARNING] Indexing parameters (-k, -w or -H) overridden by parameters used in the prebuilt index.
  [M::main::0.492*0.98] loaded/built the index for 1 target sequence(s)
  [M::mm_mapopt_update::0.599*0.98] mid_occ = 220
  [M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
  [M::mm_idx_stat::0.669*0.98] distinct minimizers: 9925342 (84.65% are singletons); average occurrences: 1.579; average spacing: 5.314; total length: 83257441
  [13:45:36 - workflow_glue] Starting entrypoint.
  [13:45:36 - workflow_glue.ExtractBC ] Loading barcode whitelist from bc_longlist_dir/3M-february-2018.txt.gz
  Traceback (most recent call last):
    File "/home/vbj167/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow-glue", line 7, in <module>
      cli()
    File "/home/vbj167/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/__init__.py", line 82, in cli
      args.func(args)
    File "/home/vbj167/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/extract_barcode.py", line 368, in main
      wl = pd.read_csv(args.superlist, header=None).iloc[:, 0].values
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
      return _read(filepath_or_buffer, kwds)
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 577, in _read
      parser = TextFileReader(filepath_or_buffer, **kwds)
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1407, in __init__
      self._engine = self._make_engine(f, self.engine)
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1661, in _make_engine
      self.handles = get_handle(
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/pandas/io/common.py", line 753, in get_handle
      handle = gzip.GzipFile(  # type: ignore[assignment]
    File "/home/epi2melabs/conda/lib/python3.8/gzip.py", line 173, in __init__
      fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
  FileNotFoundError: [Errno 2] No such file or directory: 'bc_longlist_dir/3M-february-2018.txt.gz'
  [13:45:36 - workflow_glue] Starting entrypoint.
  [13:45:36 - workflow_glue.AdaptScan ] Writing adapter sequences to adapters.fasta.
  [13:45:36 - workflow_glue.AdaptScan ] Running vsearch
  [13:45:36 - workflow_glue.AdaptScan ] seqkit fq2fa chunk.fq.gz | vsearch --usearch_global -         --db adapters.fasta --minseqlength 20 --maxaccepts 5 --id 0.7         --strand plus --wordlength 3 --minwordmatches 10 --output_no_hits --userfields         'query+target+id+alnlen+mism+opens+qilo+qihi+qstrand+tilo+tihi+ql+tl'         --userout chunk.fq.vsearch.tsv --threads 8
  [M::main] Version: 2.24-r1122
  [M::main] CMD: minimap2 -ax splice -uf --MD -t 7 -K 10M --junc-bed ref_genes.bed --cap-kalloc 100m --cap-sw-mem 50m genome_index.mmi -
  [M::main] Real time: 2.254 sec; CPU: 0.679 sec; Peak RSS: 0.388 GB
  [13:45:59 - workflow_glue.AdaptScan ] Parsing vsearch hits.
  [13:45:59 - workflow_glue.AdaptScan ] Reading data
  [13:45:59 - workflow_glue.AdaptScan ] Finished reading and sorting data
  [13:46:07 - workflow_glue.AdaptScan ] Creating fastq for 500000 reads.
  Traceback (most recent call last):
    File "/home/vbj167/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow-glue", line 7, in <module>
      cli()
    File "/home/vbj167/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/__init__.py", line 82, in cli
      args.func(args)
    File "/home/vbj167/.nextflow/assets/epi2me-labs/wf-single-cell/bin/workflow_glue/adapter_scan_vsearch.py", line 432, in main
      sys.stdout.write(read)
  BrokenPipeError: [Errno 32] Broken pipe

Work dir:
  /maps/datasets/weischenfeldt_lab-AUDIT/gbm/data/scrna/demo/work/f1/39eee5c74f6816c585a705a23c66ef

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

no

Other demo data information

No response

@nrhorner
Contributor

Hi @Josephinedh

Something fishy is going on here:
WARN: Unable to fetch attribute for file: /home/vbj167/.nextflow/assets/epi2me-labs/wf-single-cell/data/OPTIONAL_FILE - Hash is inferred from Git repository commit Id

Could you try deleting the following folder and then running the workflow again, please?
/home/vbj167/.nextflow/assets/epi2me-labs/wf-single-cell/
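
Before deleting the cached copy, it may help to confirm which part of the staged path is actually missing. The snippet below is only a minimal sketch, not part of the workflow: `diagnose` is a hypothetical helper, and it assumes the task work directory from the log still exists and that Nextflow staged `bc_longlist_dir` there as a symlink (its default staging mode). It walks the path one component at a time and reports the first one that is absent or is a symlink whose target no longer exists.

```python
import os
import sys
from pathlib import Path


def diagnose(work_dir, relative_path):
    """Report the first path component that is missing or a dangling symlink.

    Hypothetical helper for illustration; not part of wf-single-cell.
    """
    current = Path(work_dir)
    for part in Path(relative_path).parts:
        current = current / part
        # is_symlink() inspects the link itself; exists() follows it to the target.
        if current.is_symlink() and not current.exists():
            return f"dangling symlink: {current} -> {os.readlink(current)}"
        if not current.exists():
            return f"missing: {current}"
    return f"ok: {current}"


if __name__ == "__main__":
    # Usage: python check_staged.py <task work dir> <path relative to it>
    if len(sys.argv) == 3:
        print(diagnose(sys.argv[1], sys.argv[2]))
```

For the failing task above, the arguments would be the work dir printed in the log and `bc_longlist_dir/3M-february-2018.txt.gz`. A "dangling symlink" result would suggest that the file is gone from the cached workflow copy under `~/.nextflow/assets`, which would be consistent with the suggestion to delete that folder so Nextflow downloads it again.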
