Stochasticity in assemblies #61

Open
jamespblloyd-uwa opened this issue Jan 13, 2025 · 0 comments

@jamespblloyd-uwa

Operating System

macOS

Other Linux

No response

Workflow Version

v1.7.1-g5b0d735

Workflow Execution

Command line (Local)

Other workflow execution

No response

EPI2ME Version

v1.7.1-g5b0d735

CLI command run

nextflow run epi2me-labs/wf-clone-validation \
  --fastq /Users/jameslloyd/Documents/Main_Work_JPBL/LL/BULLYWUG/2024-12-19_James_plasmids/fastq_pass_cat/BigTest01 \
  --sample_sheet /Users/jameslloyd/Documents/Main_Work_JPBL/LL/BULLYWUG/2024-12-19_James_plasmids/fastq_pass_cat/BigTest01/sample_sheet_BigTest01.csv \
  --out_dir /Users/jameslloyd/Documents/Main_Work_JPBL/LL/BULLYWUG/2024-12-19_James_plasmids/fastq_pass_cat/BigTest01.2_output
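The report describes re-running this same command five times, changing only the output directory. A minimal bash sketch of that replicate procedure is given here; it reuses the paths from the command, while `BASE`, `N_RUNS`, `DRY_RUN`, `run_replicate` and the `BigTest01.repN_output` directory names are illustrative choices, not workflow options. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: repeat the reported wf-clone-validation run several times, changing only
# --out_dir, to check whether per-sample assemblies are reproducible.
# BASE, N_RUNS, DRY_RUN and run_replicate are illustrative names, not workflow options.
set -euo pipefail

BASE=/Users/jameslloyd/Documents/Main_Work_JPBL/LL/BULLYWUG/2024-12-19_James_plasmids/fastq_pass_cat
N_RUNS=${N_RUNS:-5}
DRY_RUN=${DRY_RUN:-1}   # 1 = print each command only; 0 = actually launch Nextflow

# Build (and optionally execute) the command for replicate number $1.
run_replicate() {
    local i=$1
    local cmd=(nextflow run epi2me-labs/wf-clone-validation
        --fastq "$BASE/BigTest01"
        --sample_sheet "$BASE/BigTest01/sample_sheet_BigTest01.csv"
        --out_dir "$BASE/BigTest01.rep${i}_output")
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

for i in $(seq 1 "$N_RUNS"); do
    run_replicate "$i"
done
```

Setting `DRY_RUN=0` launches the runs one after another. None of them passes `-resume`, so Nextflow re-executes every process in each replicate rather than reusing cached results.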

Workflow Execution - CLI Execution Profile

standard (default)

What happened?

I have found that the output of wf-clone-validation is stochastic rather than deterministic: sometimes the output is a mis-assembly, and at other times it matches the expected sequence. Sometimes the errors are small indels, but at other times the output is a very large (repeated) plasmid. I suspect this is related to the use of Flye, but I cannot use Canu on macOS. When I run the same command 5 times (changing only the output directory), I see different results for some plasmids. Interestingly, of my 12 test plasmids, some are always assembled the same way, but others are not. This could be related to clone purity: if a mutation occurred before plasmid purification, that might partly explain the issue. Plasmid size may also be a factor: I always got good in silico assemblies for the larger plasmids (8-20 kb), but not consistently for the smaller ones (<4 kb).

At the moment, for some plasmids I need to run the workflow at least a couple of times before I am confident that at least one of the assemblies is correct, rather than looking at the output of a single run and concluding "oh no, it is bad". It would be good if this could be improved. Perhaps this is a feature request rather than a bug report.
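Both failure modes described above (small indels and much larger repeated plasmids) change the assembly length, so a quick per-sample length comparison across replicate output directories can flag the unstable samples. This bash/awk sketch assumes each output directory contains one `<sample>.final.fasta` file per sample; that file name is an assumption and should be checked against the workflow's actual outputs. Equal lengths do not prove identical sequences: substitutions, or an insertion and a deletion of the same size, would go unnoticed.

```shell
#!/usr/bin/env bash
# Sketch: flag samples whose assembly length differs between replicate runs.
# ASSUMPTION: every output directory holds one "<sample>.final.fasta" per sample.
# Only samples present in the FIRST directory are listed; missing files print "NA".
set -euo pipefail

# Print the total sequence length of a FASTA file (all records summed).
fasta_length() {
    awk '!/^>/ { gsub(/[[:space:]]/, ""); n += length($0) } END { print n + 0 }' "$1"
}

# Usage: compare_replicates DIR1 DIR2 [DIR3 ...]
# Output (tab-separated): sample, space-separated lengths per run, CONSISTENT|DIFFERS
compare_replicates() {
    local f sample ref d len lens status
    for f in "$1"/*.final.fasta; do
        [ -e "$f" ] || continue            # glob matched nothing
        sample=$(basename "$f" .final.fasta)
        ref=$(fasta_length "$f")
        lens=""
        status=CONSISTENT
        for d in "$@"; do
            if [ -e "$d/$sample.final.fasta" ]; then
                len=$(fasta_length "$d/$sample.final.fasta")
            else
                len=NA                     # e.g. assembly failed in that run
            fi
            lens="$lens $len"
            [ "$len" = "$ref" ] || status=DIFFERS
        done
        printf '%s\t%s\t%s\n' "$sample" "${lens# }" "$status"
    done
}
```

For example, `compare_replicates run1_output run2_output run3_output` (placeholder directory names) prints one line per sample with its length in each run, marked DIFFERS whenever any run disagrees with the first.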

Relevant log output

N E X T F L O W   ~  version 24.10.3

Launching `https://github.com/epi2me-labs/wf-clone-validation` [tiny_poitras] DSL2 - revision: 5b0d7357de [master]


||||||||||   _____ ____ ___ ____  __  __ _____      _       _
||||||||||  | ____|  _ \_ _|___ \|  \/  | ____|    | | __ _| |__  ___
|||||       |  _| | |_) | |  __) | |\/| |  _| _____| |/ _` | '_ \/ __|
|||||       | |___|  __/| | / __/| |  | | |__|_____| | (_| | |_) \__ \
||||||||||  |_____|_|  |___|_____|_|  |_|_____|    |_|\__,_|_.__/|___/
||||||||||  wf-clone-validation v1.7.1-g5b0d735
--------------------------------------------------------------------------------
Core Nextflow options
  revision       : master
  runName        : tiny_poitras
  containerEngine: docker
  container      : [withLabel:wfplasmid:ontresearch/wf-clone-validation:sha55e97540ca3ca4f06310269c2ebd3175e1e9352a, withLabel:canu:ontresearch/canu:shabbdea3813f6fb436ea0cbaa19958ad772db9154c, withLabel:medaka:ontresearch/medaka:sha447c70a639b8bcf17dc49b51e74dfcde6474837b, withLabel:wf_common:ontresearch/wf-common:shaabceef445fb63214073cbf5836fdd33c04be4ac7, withLabel:plannotate:ontresearch/plannotate:shaf7f37f751dd0bc529121b765fb63322502288a03]
  launchDir      : /Users/jameslloyd/Documents/Main_Work_JPBL/LL/BULLYWUG/2024-12-19_James_plasmids/fastq_pass_cat
  workDir        : /Users/jameslloyd/Documents/Main_Work_JPBL/LL/BULLYWUG/2024-12-19_James_plasmids/fastq_pass_cat/work
  projectDir     : /Users/jameslloyd/.nextflow/assets/epi2me-labs/wf-clone-validation
  userName       : jameslloyd
  profile        : standard
  configFiles    : /Users/jameslloyd/.nextflow/assets/epi2me-labs/wf-clone-validation/nextflow.config

Input Options
  fastq          : /Users/jameslloyd/Documents/Main_Work_JPBL/LL/BULLYWUG/2024-12-19_James_plasmids/fastq_pass_cat/BigTest01

Sample Options
  sample_sheet   : /Users/jameslloyd/Documents/Main_Work_JPBL/LL/BULLYWUG/2024-12-19_James_plasmids/fastq_pass_cat/BigTest01/sample_sheet_BigTest01.csv

Output Options
  out_dir        : /Users/jameslloyd/Documents/Main_Work_JPBL/LL/BULLYWUG/2024-12-19_James_plasmids/fastq_pass_cat/BigTest01.2_output

!! Only displaying parameters that differ from the pipeline defaults !!
--------------------------------------------------------------------------------
If you use epi2me-labs/wf-clone-validation for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x


--------------------------------------------------------------------------------
This is epi2me-labs/wf-clone-validation v1.7.1-g5b0d735.
--------------------------------------------------------------------------------
Searching input for [.fastq, .fastq.gz, .fq, .fq.gz] files.
executor >  local (180)
[8d/3b5d2b] validate_sample_sheet                     [100%] 1 of 1 ✔
[d3/cfed9a] fastcat (11)                              [100%] 12 of 12 ✔
[-        ] cutsite_qc                                -
[-        ] pipeline:filterHostReads                  -
[d5/ff3fa8] pipeline:checkIfEnoughReads (11)          [100%] 12 of 12 ✔
[79/bf12bf] pipeline:assembleCore (10)                [100%] 21 of 21, failed: 9, retries: 9 ✔
[b9/b464c2] pipeline:medakaPolishAssembly (10)        [100%] 10 of 10 ✔
[a3/6733cf] pipeline:reorientateFastqAndGetFasta (10) [100%] 10 of 10 ✔
[bb/857b80] pipeline:downsampledStats (12)            [100%] 12 of 12 ✔
[27/8e65ae] pipeline:findPrimers (10)                 [100%] 10 of 10 ✔
[04/baafed] pipeline:medakaVersion                    [100%] 1 of 1 ✔
[53/628590] pipeline:flyeVersion                      [100%] 1 of 1 ✔
[92/ae1a29] pipeline:getVersions                      [100%] 1 of 1 ✔
[de/b26257] pipeline:getParams                        [100%] 1 of 1 ✔
[5d/94d5b2] pipeline:assembly_qc (10)                 [100%] 10 of 10 ✔
[f3/ed737a] pipeline:inserts (1)                      [100%] 1 of 1 ✔
[-        ] pipeline:insert_qc                        -
[-        ] pipeline:align_assembly                   -
[-        ] pipeline:assembly_comparison              -
[07/ecbbfd] pipeline:runPlannotate (1)                [100%] 1 of 1 ✔
[aa/bd819c] pipeline:assemblyMafs (10)                [100%] 10 of 10 ✔
[ef/f2ebd7] pipeline:report (1)                       [100%] 1 of 1 ✔
[40/70c00d] publish (65)                              [100%] 65 of 65 ✔
Completed at: 13-Jan-2025 05:49:50
Duration    : 15m 23s
CPU hours   : 1.5 (16.8% failed)
Succeeded   : 171
Failed      : 9

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

no

Other demo data information

No response
