Add InterProScan to Pipeline and integrate in AMPcombi #428
base: dev
Changes from all commits
86592d9
491f25d
bbd456e
5b5cb3e
d8c5bf2
fd1ef46
b54f1ea
fee3adb
8b44ed5
ed81b0b
0340ba9
7e1f164
5be17ef
b782b54
e58b322
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -230,4 +230,11 @@ process { | |||||
memory = { 6.GB * task.attempt } | ||||||
time = { 2.h * task.attempt } | ||||||
} | ||||||
|
||||||
withName: INTERPROSCAN_DATABASE { | ||||||
memory = { 6.GB * task.attempt } | ||||||
time = { 4.h * task.attempt } // Download might take longer with some Bandwidth! | ||||||
|
||||||
cpus = { 6 * task.attempt } | ||||||
} | ||||||
|
||||||
} |
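For orientation, the `INTERPROSCAN_DATABASE` closures above scale linearly with the retry counter. A small sketch of that arithmetic in shell follows; the function name `scaled_resources` is made up for illustration, and any resource caps the pipeline may apply elsewhere are ignored.

```bash
# Hypothetical helper: mirrors 6.GB, 4.h and 6 cpus multiplied by task.attempt.
scaled_resources() {
    attempt="$1"
    printf 'attempt=%d memory=%dGB time=%dh cpus=%d\n' \
        "$attempt" "$((6 * attempt))" "$((4 * attempt))" "$((6 * attempt))"
}

# Show what the first three attempts would request.
for n in 1 2 3; do
    scaled_resources "$n"
done
```

A second attempt would therefore ask for 12 GB, 8 hours and 12 CPUs, which may matter on smaller nodes.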
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -83,7 +83,7 @@ process { | |||||
] | ||||||
} | ||||||
|
||||||
-withName: SEQKIT_SEQ { | ||||||
+withName: SEQKIT_SEQ_LENGTH { | ||||||
ext.prefix = { "${meta.id}_long" } | ||||||
publishDir = [ | ||||||
path: { "${params.outdir}/bgc/seqkit/" }, | ||||||
|
@@ -96,6 +96,45 @@ process { | |||||
].join(' ').trim() | ||||||
} | ||||||
|
||||||
withName: SEQKIT_SEQ_FILTER { | ||||||
ext.prefix = { "${meta.id}_cleaned.faa" } | ||||||
publishDir = [ | ||||||
path: { "${params.outdir}/function/interproscan/" }, | ||||||
mode: params.publish_dir_mode, | ||||||
enabled: { params.run_function_interproscan }, | ||||||
saveAs: { filename -> filename.equals('versions.yml') ? null : filename } | ||||||
] | ||||||
ext.args = [ | ||||||
"--gap-letters '* \t.' --remove-gaps" | ||||||
].join(' ').trim() | ||||||
} | ||||||
|
||||||
withName: INTERPROSCAN_DATABASE { | ||||||
publishDir = [ | ||||||
path: { "${params.outdir}/databases/interproscan/" }, | ||||||
mode: params.publish_dir_mode, | ||||||
enabled: params.save_db, | ||||||
saveAs: { filename -> filename.equals('versions.yml') ? null : filename } | ||||||
] | ||||||
} | ||||||
|
||||||
withName: INTERPROSCAN { | ||||||
ext.prefix = { "${meta.id}_interproscan.faa" } | ||||||
Review comment: Is this meant to have the file suffix at the end?
||||||
publishDir = [ | ||||||
path: { "${params.outdir}/function/interproscan/" }, | ||||||
mode: params.publish_dir_mode, | ||||||
enabled: params.run_function_interproscan, | ||||||
saveAs: { filename -> filename.equals('versions.yml') ? null : filename } | ||||||
] | ||||||
ext.args = [ | ||||||
"--applications ${params.function_interproscan_applications}", | ||||||
params.function_interproscan_enableprecalc ? '' : '--disable-precalc', | ||||||
params.function_interproscan_enableresidueannot ? '' : '--disable-residue-annot', | ||||||
params.function_interproscan_disableresidueannottsv ? '--enable-tsv-residue-annot' : '', | ||||||
"--formats tsv" | ||||||
].join(' ').trim() | ||||||
} | ||||||
|
||||||
withName: PROKKA { | ||||||
ext.prefix = { "${meta.id}_prokka" } | ||||||
publishDir = [ | ||||||
|
@@ -676,7 +715,7 @@ process { | |||||
|
||||||
withName: AMP_DATABASE_DOWNLOAD { | ||||||
publishDir = [ | ||||||
-path: { "${params.outdir}/databases/${params.amp_ampcombi_db}" }, | ||||||
+path: { "${params.outdir}/databases/" }, | ||||||
Review comment (suggested change): If we use the interproscan example above?
||||||
mode: params.publish_dir_mode, | ||||||
enabled: params.save_db, | ||||||
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, | ||||||
|
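To make the effect of the `INTERPROSCAN` `ext.args` list in this file concrete, the sketch below rebuilds the same flag string in plain shell. It is an illustration only: the function name `build_interproscan_args` and its positional arguments are hypothetical stand-ins for the `params.function_interproscan_*` values, and the flag spellings are copied from the diff rather than checked against the InterProScan command line.

```bash
# Hypothetical helper. Arguments, in order:
#   1: applications list, 2: enableprecalc, 3: enableresidueannot,
#   4: disableresidueannottsv (each of 2-4 is "true" or "false")
build_interproscan_args() {
    applications="$1"
    enable_precalc="$2"
    enable_residue_annot="$3"
    residue_annot_tsv="$4"

    args="--applications ${applications}"
    # The two "enable" parameters add a --disable-* flag when they are false.
    if [ "$enable_precalc" != "true" ]; then
        args="$args --disable-precalc"
    fi
    if [ "$enable_residue_annot" != "true" ]; then
        args="$args --disable-residue-annot"
    fi
    # As written in the diff, this parameter adds an --enable-* flag when true.
    if [ "$residue_annot_tsv" = "true" ]; then
        args="$args --enable-tsv-residue-annot"
    fi
    args="$args --formats tsv"
    printf '%s\n' "$args"
}

build_interproscan_args "PANTHER,ProSiteProfiles,ProSitePatterns,Pfam" false false false
# prints: --applications PANTHER,ProSiteProfiles,ProSitePatterns,Pfam --disable-precalc --disable-residue-annot --formats tsv
```

One thing the sketch makes visible: the parameter named `function_interproscan_disableresidueannottsv` adds `--enable-tsv-residue-annot` when it is true, which may be worth double-checking against the intended behaviour.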
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -25,6 +25,8 @@ results/ | |||||
| ├── prodigal/ | ||||||
| ├── prokka/ | ||||||
| └── pyrodigal/ | ||||||
├── function/ | ||||||
Review comment: What exactly do you mean here by function? I don't find that particularly descriptive and wouldn't necessarily know what you mean by it.
||||||
| └── interproscan/ | ||||||
├── amp/ | ||||||
| ├── ampir/ | ||||||
| ├── amplify/ | ||||||
|
@@ -74,6 +76,10 @@ ORF prediction and annotation with any of: | |||||
- [Prokka](#prokka) – open reading frame prediction and functional protein annotation. | ||||||
- [Bakta](#bakta) – open reading frame prediction and functional protein annotation. | ||||||
|
||||||
CDS domain annotation: | ||||||
|
||||||
- [InterProScan](#interproscan) (default) – for open reading frame protein and domain predictions. | ||||||
|
||||||
Antimicrobial Resistance Genes (ARGs): | ||||||
|
||||||
- [ABRicate](#abricate) – antimicrobial resistance gene detection, based on alignment to one of several databases. | ||||||
|
@@ -216,6 +222,23 @@ Output Summaries: | |||||
|
||||||
[Bakta](https://github.com/oschwengers/bakta) is a tool for the rapid & standardised annotation of bacterial genomes and plasmids from both isolates and MAGs. It provides dbxref-rich, sORF-including and taxon-independent annotations in machine-readable JSON & bioinformatics standard file formats for automated downstream analysis. The output is used by some of the functional screening tools. | ||||||
|
||||||
### Functional classifications | ||||||
Review comment: What about 'domainprediction' or something like that?
||||||
|
||||||
[InterProScan](#interproscan) | ||||||
|
||||||
#### InterProScan | ||||||
|
||||||
<details markdown="1"> | ||||||
<summary>Output files</summary> | ||||||
|
||||||
- `interproscan/` | ||||||
- `<samplename>_cleaned.faa`: clean version of the FASTA files (amino acids) generated by one of the annotated tools. These contain sequences with no special characters | ||||||
Review comment (suggested change): What is an 'annotated tool'? Is this a tool within InterProScan?
||||||
- `<samplename>_interproscan_faa.tsv`: predicted proteins and domains using the InterPro database in TSV format | ||||||
|
||||||
</details> | ||||||
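As an illustration of what "no special character" means for `<samplename>_cleaned.faa`: the `SEQKIT_SEQ_FILTER` step in this pull request passes `--gap-letters '* \t.' --remove-gaps`, so asterisks, spaces, tabs and dots should be stripped from the sequences. The `awk` one-liner below is a hypothetical stand-in that approximates this on a toy file. It is not how the pipeline does it, and `clean_faa` is an invented name.

```bash
# Hypothetical approximation of the cleaning step:
# header lines (starting with '>') pass through untouched,
# sequence lines lose every '*', '.', space and tab.
clean_faa() {
    awk '/^>/ { print; next } { gsub(/[*. \t]/, ""); print }' "$1"
}

# Toy input with a dot, a space and a trailing stop-codon asterisk.
tmp="$(mktemp)"
printf '>protein_1 example\nMKT.LLV *\n' > "$tmp"
clean_faa "$tmp"
rm -f "$tmp"
```

On the toy input the header is printed unchanged and the sequence line becomes `MKTLLV`.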
|
||||||
[InterProScan](https://academic.oup.com/bioinformatics/article/30/9/1236/237988?login=true) was designed to predict protein function and provide possible domain and motif information for the coding regions. It utilizes the InterPro database, which consists of multiple sister databases such as PANTHER, ProSite, Pfam, etc. More details can be found in the [documentation](https://interproscan-docs.readthedocs.io/en/latest/index.html). | ||||||
|
||||||
|
||||||
### AMP detection tools | ||||||
|
||||||
[ampir](#ampir), [AMPlify](#amplify), [hmmsearch](#hmmsearch), [Macrel](#macrel) | ||||||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -111,7 +111,7 @@ We highly recommend performing quality control on input contigs before running t | |||||
For example, ideally BGC screening requires contigs of at least 3,000 bp else downstream tools may crash. | ||||||
::: | ||||||
|
||||||
-## Notes on screening tools and taxonomic classification | ||||||
+## Notes on screening tools, taxonomic and functional classifications | ||||||
|
||||||
The implementation of some tools in the pipeline may have some particular behaviours that you should be aware of before you run the pipeline. | ||||||
|
||||||
|
@@ -133,6 +133,18 @@ MMseqs2 is currently the only taxonomic classification tool used in the pipeline | |||||
--taxa_classification_mmseqs_db_id 'Kalamari' | ||||||
``` | ||||||
|
||||||
### InterProScan | ||||||
|
||||||
[InterProScan](https://github.com/ebi-pf-team/interproscan) is currently the only functional classification tool in the pipeline; it gives a snapshot of the protein families and domains for each coding region. When the tool is activated with `--run_function_interproscan`, the [InterPro database](http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.67-99.0/) v5.67-99.0 is downloaded and prepared by default. To use a different version, download and extract the files from any [InterPro version](http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/) and supply the path to the resulting folder: | ||||||
Review comment: Please also give rough instructions on how to do this manually at the end of the USAGE section, in the corresponding section, as we do for all other tools.
||||||
|
||||||
```bash | ||||||
--function_interproscan_db 'path/to/InterPro_directory/' | ||||||
``` | ||||||
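A rough sketch of the manual route described above, assuming the release tarball follows the `interproscan-<version>-64-bit.tar.gz` naming on the EBI FTP site (an assumption; check the directory listing for the version you want). The helper names `interproscan_tarball_url` and `fetch_interproscan_db` are hypothetical and not part of the pipeline; the download step mirrors the `curl` and `tar` calls in the `INTERPROSCAN_DATABASE` module.

```bash
# Base URL taken from the usage text; the tarball naming is an assumption.
IPRSCAN_BASE_URL="http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5"

# Build the download URL for a given release, e.g. 5.67-99.0.
interproscan_tarball_url() {
    version="$1"
    printf '%s/%s/interproscan-%s-64-bit.tar.gz\n' \
        "$IPRSCAN_BASE_URL" "$version" "$version"
}

# Download and unpack a release into a target directory.
# Defined only; it is not called here because the archive is large.
fetch_interproscan_db() {
    version="$1"
    outdir="$2"
    url="$(interproscan_tarball_url "$version")"
    tarball="$outdir/$(basename "$url")"
    mkdir -p "$outdir"
    curl -L "$url" -o "$tarball"
    tar -xzf "$tarball" -C "$outdir"
}

# Print the URL that would be fetched for the default version.
interproscan_tarball_url 5.67-99.0
```

After something like `fetch_interproscan_db 5.67-99.0 path/to/InterPro_directory`, that folder would be the candidate for `--function_interproscan_db`; which directory level the pipeline expects is not stated here, so verify it before a long run.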
|
||||||
:::info | ||||||
By default, the databases used to assign the nearest protein domain are set to `PANTHER,ProSiteProfiles,ProSitePatterns,Pfam`. Adding other applications to the list does not guarantee that the results will be integrated correctly within `AMPcombi`. | ||||||
::: | ||||||
|
||||||
### antiSMASH | ||||||
|
||||||
antiSMASH has a minimum contig parameter, in which only contigs of a certain length (or longer) will be screened. In cases where no hits are found in these, the tool ends successfully without hits. However if no contigs in an input file reach that minimum threshold, the tool will end with a 'failure' code, and cause the pipeline to crash. | ||||||
|
@@ -258,6 +270,10 @@ The pipeline will automatically run Pyrodigal instead of Prodigal if the paramet | |||||
This is due to an incompatibility issue of Prodigal's output `.gbk` file with multiple downstream tools. | ||||||
::: | ||||||
|
||||||
:::tip | ||||||
If `run_function_interproscan` is activated, protein and domain classifications of the coding regions are generated, and the output is integrated into the `AMPcombi parsetables` table for every sample and into the complete summary files, e.g. `Ampcombi_summary.tsv`. | ||||||
|
||||||
::: | ||||||
|
||||||
### Abricate | ||||||
|
||||||
The default ABRicate installation comes with a series of 'default' databases: | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
process INTERPROSCAN_DATABASE { | ||
tag "interproscan_database_download" | ||
label 'process_medium' | ||
|
||
conda "conda-forge::sed=4.7" | ||
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | ||
'https://depot.galaxyproject.org/singularity/curl:7.80.0' : | ||
'biocontainers/curl:7.80.0' }" | ||
|
||
input: | ||
val database_url | ||
|
||
output: | ||
path("interproscan_db/*") , emit: db | ||
path "versions.yml" , emit: versions | ||
|
||
when: | ||
task.ext.when == null || task.ext.when | ||
|
||
script: | ||
""" | ||
mkdir -p interproscan_db/ | ||
|
||
filename=\$(basename ${database_url}) | ||
|
||
curl -L ${database_url} -o interproscan_db/\$filename | ||
tar -xzf interproscan_db/\$filename -C interproscan_db/ | ||
|
||
cat <<-END_VERSIONS > versions.yml | ||
"${task.process}": | ||
tar: \$(tar --version 2>&1 | sed -n '1s/tar (busybox) //p') | ||
curl: "\$(curl --version 2>&1 | sed -n '1s/^curl \\([0-9.]*\\).*/\\1/p')" | ||
END_VERSIONS | ||
""" | ||
} |
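The `versions.yml` heredoc in the script block relies on two `sed` expressions. The sketch below runs them on sample version banners to show what each one extracts. The banner strings are illustrative assumptions about what `curl` and a BusyBox `tar` print, and the function names are invented for this example.

```bash
# Same expressions as in the module, with single backslashes because
# no Groovy string escaping is involved in plain shell.
parse_curl_version() {
    sed -n '1s/^curl \([0-9.]*\).*/\1/p'
}

parse_busybox_tar_version() {
    sed -n '1s/tar (busybox) //p'
}

# Illustrative banners (assumed, not captured from the container).
printf 'curl 7.80.0 (x86_64-pc-linux-gnu) libcurl/7.80.0\n' | parse_curl_version
printf 'tar (busybox) 1.32.1\n' | parse_busybox_tar_version
# A GNU tar banner does not match the expression, so nothing is printed:
printf 'tar (GNU tar) 1.34\n' | parse_busybox_tar_version
```

The first two calls print `7.80.0` and `1.32.1`. The third prints nothing, so on a container that ships GNU `tar` the `tar:` entry in `versions.yml` would presumably be left empty.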