Merge pull request #96 from peterk87/force-clair3-full-aln
Fix Clair3 sometimes missing variants
peterk87 authored Dec 13, 2024
2 parents 92f1f2e + 7b298d2 commit d4c5f42
Showing 11 changed files with 83 additions and 34 deletions.
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,17 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [[3.6.1](https://github.com/CFIA-NCFAD/nf-flu/releases/tag/3.6.1)] - 2024-12-13

This patch release fixes an issue with Clair3 not producing variant calls for some regions due to full-alignment not being triggered. This issue was resolved by adding `--var_pct_phasing=1`, `--var_pct_full=1` and `--ref_pct_full=1` to the Clair3 command line.

### Changes

* fix: Added `--var_pct_phasing=1`, `--var_pct_full=1` and `--ref_pct_full=1` to Clair3 command line to ensure full-alignment is triggered for all reads to avoid missing variant calls in some regions.
* fix: Added `stageAs: "input*/*"` to the `CAT_NANOPORE_FASTQ` process input channels. In rare cases, input files were being concatenated with themselves in an infinite loop until disk space was exhausted.
* feat: Don't save NCBI Influenza reference sequences, metadata CSV and BLAST DB to the output directory by default. Added `--save_ncbi_db` and `--save_blastdb` workflow params to save these files to the output directory if desired.
* docs: Updated README.md to mention Apptainer. Updated `usage.md` to describe new workflow params. Updated `output.md` to better describe BLAST subtyping results.

## [[3.6.0](https://github.com/CFIA-NCFAD/nf-flu/releases/tag/3.6.0)] - 2024-12-02

This minor release adds [FluMut](https://github.com/izsvenezie-virology/FluMut) "to search for molecular markers with potential impact on the biological characteristics of Influenza A viruses of the A(H5N1) subtype."
21 changes: 12 additions & 9 deletions README.md
@@ -1,12 +1,14 @@
# CFIA-NCFAD/nf-flu - Influenza A and B Virus Genome Assembly Nextflow Workflow

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13892044.svg)](https://doi.org/10.5281/zenodo.13892044)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.14268099.svg)](https://doi.org/10.5281/zenodo.14268099)
[![CI](https://github.com/CFIA-NCFAD/nf-flu/actions/workflows/ci.yml/badge.svg)](https://github.com/CFIA-NCFAD/nf-flu/actions/workflows/ci.yml)

[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A521.04.0-23aa62.svg?labelColor=000000)](https://www.nextflow.io/)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
[![run with apptainer](https://img.shields.io/badge/run%20with-apptainer-1d355c.svg?labelColor=000000)](https://apptainer.org/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
[![run with podman](https://img.shields.io/badge/run%20with-podman-1d355c.svg?labelColor=000000)](https://podman.io/)

## Introduction

@@ -32,25 +34,25 @@ After reference sequence selection, the pipeline performs read mapping to each r

## Quick Start

1. Install [`Nextflow`](https://www.nextflow.io/docs/latest/getstarted.html#installation) (`>=21.04.0`).
2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/), [`Podman`](https://podman.io/), [`Shifter`](https://nersc.gitlab.io/development/shifter/how-to-use/) or [`Charliecloud`](https://hpc.github.io/charliecloud/) for full pipeline reproducibility _(please only use [`Conda`](https://conda.io/miniconda.html) as a last resort)_
1. Install [`Nextflow`](https://www.nextflow.io/docs/latest/getstarted.html#installation) (`>=22.10.1`; latest stable release recommended!).
2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Apptainer`][], [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/), [`Podman`](https://podman.io/), [`Shifter`](https://nersc.gitlab.io/development/shifter/how-to-use/) or [`Charliecloud`](https://hpc.github.io/charliecloud/) for full pipeline reproducibility _(please only use [`Conda`](https://conda.io/miniconda.html) as a last resort)_
3. Download the pipeline and test it on a minimal dataset with a single command:

For Illumina workflow test:

```bash
nextflow run CFIA-NCFAD/nf-flu -profile test_illumina,<docker/singularity/podman/shifter/charliecloud/conda> \
nextflow run CFIA-NCFAD/nf-flu -profile test_illumina,<docker/apptainer/singularity/podman/shifter/charliecloud/conda> \
--max_cpus $(nproc) # use all available CPUs; default is 2
```

For Nanopore workflow test:

```bash
nextflow run CFIA-NCFAD/nf-flu -profile test_nanopore,<docker/singularity/podman/shifter/charliecloud/conda> \
nextflow run CFIA-NCFAD/nf-flu -profile test_nanopore,<docker/apptainer/singularity/podman/shifter/charliecloud/conda> \
--max_cpus $(nproc) # use all available CPUs; default is 2
```

> * If you are using `singularity` then the pipeline will auto-detect this and attempt to download the Singularity images directly as opposed to performing a conversion from Docker images. If you are persistently observing issues downloading Singularity images directly due to timeout or network issues then please use the `--singularity_pull_docker_container` parameter to pull and convert the Docker image instead. Alternatively, it is highly recommended to use the [`nf-core download`](https://nf-co.re/tools/#downloading-pipelines-for-offline-use) command to pre-download all of the required containers before running the pipeline and to set the [`NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir`](https://www.nextflow.io/docs/latest/singularity.html?#singularity-docker-hub) Nextflow options to be able to store and re-use the images from a central location for future pipeline runs.
> * If you are using `apptainer`/`singularity` then the pipeline will auto-detect this and attempt to download the Apptainer/Singularity images directly as opposed to performing a conversion from Docker images. If you are persistently observing issues downloading Apptainer/Singularity images directly due to timeout or network issues then please use the `--singularity_pull_docker_container` parameter to pull and convert the Docker image instead. Alternatively, it is highly recommended to use the [`nf-core download`](https://nf-co.re/tools/#downloading-pipelines-for-offline-use) command to pre-download all of the required containers before running the pipeline and to set the [`NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir`](https://www.nextflow.io/docs/latest/singularity.html?#singularity-docker-hub) Nextflow options to be able to store and re-use the images from a central location for future pipeline runs.
> * If you are using `conda`, it is highly recommended to use the [`NXF_CONDA_CACHEDIR` or `conda.cacheDir`](https://www.nextflow.io/docs/latest/conda.html) settings to store the environments in a central location for future pipeline runs.

4. Run your own analysis
@@ -69,7 +71,7 @@ After reference sequence selection, the pipeline performs read mapping to each r
nextflow run CFIA-NCFAD/nf-flu \
--input samplesheet.csv \
--platform illumina \
-profile <docker/singularity/podman/shifter/charliecloud/conda>
-profile <docker/apptainer/singularity/podman/shifter/charliecloud/conda>
```

* Typical command for Nanopore Platform
@@ -78,7 +80,7 @@ After reference sequence selection, the pipeline performs read mapping to each r
nextflow run CFIA-NCFAD/nf-flu \
--input samplesheet.csv \
--platform nanopore \
-profile <docker/singularity/conda>
-profile <docker/apptainer/singularity/conda>
```
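
The profile placeholder above has to be filled in by hand. As a rough sketch (not part of nf-flu), a small helper can pick whichever container runtime is on `PATH`. The function name `pick_profile` and the preference order are assumptions made for illustration; only the profile names come from the README.

```bash
#!/usr/bin/env bash
# Print the first container profile whose executable is found on PATH.
# The preference order here is an assumption, not something nf-flu prescribes.
pick_profile() {
  local tool
  for tool in docker apptainer singularity podman; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      return 0
    fi
  done
  # The README describes Conda as a last resort, so it is the fallback.
  printf 'conda\n'
}

profile="$(pick_profile)"
# Print the test command instead of running it.
echo "nextflow run CFIA-NCFAD/nf-flu -profile test_nanopore,${profile} --max_cpus 2"
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without Nextflow installed.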

## Documentation
@@ -223,8 +225,9 @@ Alejandro A Schäffer, Eneida L Hatcher, Linda Yankie, Lara Shonkwiler, J Rodney
* [nf-core](https://nf-co.re) project for establishing Nextflow workflow development best-practices, [nf-core tools](https://nf-co.re/tools-docs/) and [nf-core modules](https://github.com/nf-core/modules)
* [nf-core/viralrecon](https://github.com/nf-core/viralrecon) for inspiration and setting a high standard for viral sequence data analysis pipelines
* [Conda](https://docs.conda.io/projects/conda/en/latest/) and [Bioconda](https://bioconda.github.io/) project for making it easy to install, distribute and use bioinformatics software.
* [Biocontainers](https://biocontainers.pro/) for automatic creation of [Docker] and [Singularity] containers for bioinformatics software in [Bioconda]
* [Biocontainers](https://biocontainers.pro/) for automatic creation of [Docker] and [Apptainer]/[Singularity] containers for bioinformatics software in [Bioconda]
[Apptainer]: https://apptainer.org/
[BcfTools]: https://samtools.github.io/bcftools/
[BLAST]: https://blast.ncbi.nlm.nih.gov/Blast.cgi
[Clair3]: https://github.com/HKU-BAL/Clair3
8 changes: 4 additions & 4 deletions conf/modules.config
@@ -33,12 +33,12 @@ process {
}
withName: 'BLAST_MAKEBLASTDB' {
ext.args = '-dbtype nucl'
publishDir = [
publishDir = [ params.save_blastdb ?
[
path: { "${params.outdir}/blast/db/ncbi"},
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
mode: params.publish_dir_mode
]
] : []
]
}
withName: 'BLAST_BLASTN.*' {
@@ -99,12 +99,12 @@ process {
]
}
withName: 'ZSTD_DECOMPRESS_.*' {
publishDir = [
publishDir = [ params.save_ncbi_db ?
[
path: { "${params.outdir}/ncbi-influenza-db"},
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
mode: params.publish_dir_mode
]
] : []
]
}
withName: 'MQC_VERSIONS_TABLE' {
4 changes: 2 additions & 2 deletions conf/modules_illumina.config
@@ -12,12 +12,12 @@ process {

withName: 'BLAST_MAKEBLASTDB_NCBI' {
ext.args = '-dbtype nucl'
publishDir = [
publishDir = [ params.save_blastdb ?
[
path: { "${params.outdir}/blast/db/ncbi"},
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
mode: params.publish_dir_mode
]
] : []
]
}

14 changes: 2 additions & 12 deletions conf/modules_nanopore.config
@@ -12,12 +12,12 @@ process {
}
withName: 'BLAST_MAKEBLASTDB_REFDB' {
ext.args = '-dbtype nucl'
publishDir = [
publishDir = [ params.save_blastdb ?
[
path: { "${params.outdir}/blast/db/ref_db" },
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
mode: params.publish_dir_mode
]
] : []
]
}
withName: 'BLAST_BLASTN_IRMA' {
@@ -213,16 +213,6 @@ process {
]
}

withName: 'ZSTD_DECOMPRESS_.*' {
publishDir = [
[
path: { "${params.outdir}/ncbi-influenza-db"},
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
mode: params.publish_dir_mode
]
]
}

withName: 'READ_COUNT_FAIL_TSV' {
publishDir = [
[
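
With the `publishDir` changes above, the database directories should only appear in the output directory when the matching `--save_*` param is set. A minimal sketch for checking a finished run follows. The helper name `report_saved_dbs` is hypothetical; the directory names are copied from the `publishDir` paths in these config files, and `./results` is the default `outdir` from the schema.

```bash
#!/usr/bin/env bash
# Report which optional database directories exist under a results directory.
# Directory names mirror the publishDir paths in the config hunks of this commit.
report_saved_dbs() {
  local outdir="${1:-./results}" sub
  for sub in ncbi-influenza-db blast/db/ncbi blast/db/ref_db; do
    if [ -d "${outdir}/${sub}" ]; then
      printf 'saved\t%s\n' "$sub"
    else
      printf 'absent\t%s\n' "$sub"
    fi
  done
}

report_saved_dbs ./results
```

After a default run, every line would be expected to read `absent`, assuming the ternaries behave as the changelog describes.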
6 changes: 3 additions & 3 deletions docs/output.md
@@ -158,11 +158,11 @@ The report contains 2 sheets:
<details markdown="1">
<summary>Output files</summary>

- H/N subtyping Excel report: `iav-subtyping-report.xlsx`
- H/N subtyping Excel report: `nf-flu-subtyping-report.xlsx`

</details>

A H/N subtyping Excel report is generated from all [BLAST analysis](#blast-analysis) results for all samples and final assembled gene segments.
An H/N subtyping Excel report is generated from all [BLAST analysis](#blast-analysis) results for all samples and final assembled gene segments. The H and N subtypes are based on the proportion of high-quality BLAST matches that support each subtype prediction. The top BLAST match for the HA and NA segments does not by itself determine the subtype, since the metadata for the top match could have been entered incorrectly in NCBI.
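
To illustrate the idea of voting by proportion rather than trusting the top hit, here is a toy sketch. It is not the pipeline's implementation: the three-column TSV layout (subtype, percent identity, alignment length), the function name `majority_subtype`, and the default thresholds are all assumptions chosen for the example.

```bash
#!/usr/bin/env bash
# Read a hypothetical TSV (subtype, percent identity, alignment length) on stdin.
# Print the subtype supported by the largest share of hits that pass both
# thresholds, followed by that share. Ties are not handled in this sketch.
majority_subtype() {
  local min_pident="${1:-85}" min_len="${2:-700}"
  awk -F'\t' -v p="$min_pident" -v l="$min_len" '
    $2 >= p && $3 >= l { n[$1]++; total++ }
    END {
      if (total == 0) { print "NA\t0"; exit }
      best = ""
      for (s in n) if (best == "" || n[s] > n[best]) best = s
      printf "%s\t%.2f\n", best, n[best] / total
    }'
}

# The first row plays the role of a mislabeled top match; the last row fails the
# identity threshold. Three of the four passing hits support H5.
printf 'H7\t99.9\t1700\nH5\t99.1\t1700\nH5\t98.7\t1690\nH5\t97.0\t1701\nH5\t80.0\t1700\n' \
  | majority_subtype
```

In this example the single H7 hit is outvoted, which is the behaviour the paragraph above describes for a top match with incorrect metadata.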

The subtyping report spreadsheet contains four sheets:

@@ -190,7 +190,7 @@ This sheet contains the H/N subtype prediction results for each sample along wit

#### Sheet: 2_Top Segment Matches

This sheet contains the top 3 Influenza DB sequence matches for each segment of each sample along with BLASTN hit values and reference sequence metadata.
This sheet contains the top N Influenza DB sequence matches for each segment of each sample along with BLASTN hit values and reference sequence metadata.

| Field | Description | Example |
|-------|-------------|---------|
14 changes: 14 additions & 0 deletions docs/usage.md
@@ -133,6 +133,20 @@ Reference database in fasta file, sequence ID must be in format `SequenceName_se

The output directory where the results will be saved.

#### `--save_ncbi_db`

- Type: boolean
- Default: `false`

Save the NCBI Influenza database FASTA and metadata CSV to the output directory.

#### `--save_blastdb`

- Type: boolean
- Default: `false`

Save the BLAST database to the output directory.
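
Both params are booleans that default to `false`, so they only need to appear on the command line when the databases should be kept. A hedged sketch of assembling such a command is shown here. The helper name `build_nf_flu_cmd`, the `samplesheet.csv` path, the `docker` profile and the pinned `-r 3.6.1` revision are illustrative choices, not requirements.

```bash
#!/usr/bin/env bash
# Build an nf-flu command line as a string.
# Pass "true" as the first and/or second argument to opt in to saving each database.
build_nf_flu_cmd() {
  local save_ncbi_db="${1:-false}" save_blastdb="${2:-false}"
  local cmd="nextflow run CFIA-NCFAD/nf-flu -r 3.6.1 --input samplesheet.csv --platform nanopore -profile docker"
  if [ "$save_ncbi_db" = true ]; then cmd="$cmd --save_ncbi_db"; fi
  if [ "$save_blastdb" = true ]; then cmd="$cmd --save_blastdb"; fi
  printf '%s\n' "$cmd"
}

# Print (do not run) a command that keeps both databases in the output directory.
build_nf_flu_cmd true true
```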

### IRMA assembly options

#### `--irma_module`
5 changes: 4 additions & 1 deletion modules/local/clair3.nf
@@ -64,7 +64,10 @@ process CLAIR3 {
--haploid_sensitive \\
--enable_long_indel \\
--keep_iupac_bases \\
--include_all_ctgs
--include_all_ctgs \\
--var_pct_phasing=1 \\
--var_pct_full=1 \\
--ref_pct_full=1
ln -s ${clair3_dir}/merge_output.vcf.gz ${vcf}
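
For readers who run Clair3 outside the pipeline, the three added options can be sketched as a reusable argument list. This is an illustration only: the helper name `clair3_full_alignment_args` is hypothetical, the BAM, reference, model and output paths are placeholders, and the reading that a value of 1 makes every candidate eligible for full-alignment calling is an assumption based on the changelog and on Clair3 describing these options as percentages.

```bash
#!/usr/bin/env bash
# Print the Clair3 options added in this commit, one per line.
# Assumption: a value of 1 means 100% of candidates qualify, per the changelog's
# description of forcing full-alignment.
clair3_full_alignment_args() {
  printf '%s\n' \
    '--var_pct_phasing=1' \
    '--var_pct_full=1' \
    '--ref_pct_full=1'
}

# Hypothetical standalone invocation; paths and model are placeholders.
# The command is printed, not executed.
echo run_clair3.sh --bam_fn=sample.bam --ref_fn=ref.fasta --threads=4 \
  --platform=ont --model_path=/path/to/model --output=clair3_out \
  --haploid_sensitive --enable_long_indel --keep_iupac_bases --include_all_ctgs \
  $(clair3_full_alignment_args)
```

Forcing full-alignment on everything presumably costs extra runtime, which is likely acceptable for small influenza genomes but is worth keeping in mind before reusing the flags elsewhere.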
2 changes: 1 addition & 1 deletion modules/local/misc.nf
@@ -10,7 +10,7 @@ process CAT_NANOPORE_FASTQ {
}

input:
tuple val(meta), path(fqgz), path(fq)
tuple val(meta), path(fqgz, stageAs: "input*/*"), path(fq, stageAs: "input*/*")

output:
tuple val(meta), path(merged_fqgz), emit: reads
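
The commit does not spell out how the self-concatenation arose. One plausible reading, offered here as an assumption, is that an input staged in the task directory could share a path with the merged output. With `stageAs: "input*/*"`, Nextflow stages each input under its own numbered subdirectory, so the merged file written at the top level cannot be matched by the input glob. The simulation below mimics that layout with hypothetical file names; it does not run Nextflow.

```bash
#!/usr/bin/env bash
# Simulate two same-named inputs staged under input1/ and input2/, then merge them.
# Prints the number of lines in the merged file.
merge_staged_fastq() {
  local workdir="$1"
  mkdir -p "$workdir/input1" "$workdir/input2"
  printf '@r1\nACGT\n+\nIIII\n' > "$workdir/input1/reads.fastq"
  printf '@r2\nTTGA\n+\nIIII\n' > "$workdir/input2/reads.fastq"
  # The output lives outside input*/, so the glob can never pick it up as an input.
  ( cd "$workdir" && cat input*/* > merged.fastq )
  wc -l < "$workdir/merged.fastq" | tr -d ' '
}

merge_staged_fastq "$(mktemp -d)"   # prints 8 (two 4-line records)
```

Staging into subdirectories also lets two inputs with identical file names coexist, which a flat work directory would not allow.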
20 changes: 18 additions & 2 deletions nextflow.config
@@ -18,6 +18,8 @@ params {
irma_module = ''
keep_ref_deletions = true
skip_irma_subtyping_report = true
save_ncbi_db = false
save_blastdb = false
// H/N subtyping options
pident_threshold = 0.85
min_aln_length = 700
@@ -68,15 +70,26 @@ params {
includeConfig 'conf/base.config'

profiles {
apptainer {
apptainer.enabled = true
apptainer.autoMounts = true
charliecloud.enabled = false
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
apptainer.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
conda {
conda.enabled = true
apptainer.enabled = false
docker.enabled = false
singularity.enabled = false
podman.enabled = false
@@ -89,6 +102,7 @@ profiles {
mamba {
conda.enabled = true
conda.useMamba = true
apptainer.enabled = false
docker.enabled = false
singularity.enabled = false
podman.enabled = false
@@ -101,20 +115,22 @@ profiles {
docker {
docker.enabled = true
docker.userEmulation = true
apptainer.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
apptainer.enabled = false
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
@@ -155,7 +171,7 @@ manifest {
description = 'Influenza A virus genome assembly pipeline'
homePage = 'https://github.com/CFIA-NCFAD/nf-flu'
author = 'Peter Kruczkiewicz, Hai Nguyen'
version = '3.6.0'
version = '3.6.1'
nextflowVersion = '!>=22.10.1'
mainScript = 'main.nf'
doi = '10.5281/zenodo.13892044'
12 changes: 12 additions & 0 deletions nextflow_schema.json
@@ -36,6 +36,18 @@
"description": "The output directory where the results will be saved.",
"default": "./results",
"fa_icon": "fas fa-folder-open"
},
"save_ncbi_db": {
"type": "boolean",
"description": "Save the NCBI Influenza database FASTA and metadata CSV to the output directory.",
"default": false,
"fa_icon": "fas fa-database"
},
"save_blastdb": {
"type": "boolean",
"description": "Save the BLAST database to the output directory.",
"default": false,
"fa_icon": "fas fa-database"
}
},
"required": [