From a3e4810ebde04e3652dbff9ea3fd705c4f5d6748 Mon Sep 17 00:00:00 2001
From: Zethson
Date: Sun, 15 Dec 2019 19:12:30 +0100
Subject: [PATCH 001/374] Initial template commit

---
 .gitattributes                             |   1 +
 .github/CONTRIBUTING.md                    |  47 +++
 .github/ISSUE_TEMPLATE/bug_report.md       |  31 ++
 .github/ISSUE_TEMPLATE/feature_request.md  |  16 +
 .github/PULL_REQUEST_TEMPLATE.md           |  15 +
 .github/markdownlint.yml                   |   9 +
 .gitignore                                 |   7 +
 .travis.yml                                |  42 ++
 CHANGELOG.md                               |   4 +
 CODE_OF_CONDUCT.md                         |  46 +++
 Dockerfile                                 |   7 +
 LICENSE                                    |  21 +
 README.md                                  |  67 ++++
 assets/email_template.html                 |  54 +++
 assets/email_template.txt                  |  40 ++
 assets/multiqc_config.yaml                 |   9 +
 assets/nf-core-proteomicslfq_logo.png      | Bin 0 -> 11668 bytes
 assets/sendmail_template.txt               |  53 +++
 bin/markdown_to_html.r                     |  51 +++
 bin/scrape_software_versions.py            |  52 +++
 conf/awsbatch.config                       |  18 +
 conf/base.config                           |  58 +++
 conf/igenomes.config                       | 192 ++++++++++
 conf/test.config                           |  26 ++
 docs/README.md                             |  12 +
 docs/images/nf-core-proteomicslfq_logo.png | Bin 0 -> 20821 bytes
 docs/output.md                             |  41 ++
 docs/usage.md                              | 286 ++++++++++++++
 environment.yml                            |  13 +
 main.nf                                    | 421 +++++++++++++++++++++
 nextflow.config                            | 134 +++++++
 31 files changed, 1773 insertions(+)
 create mode 100644 .gitattributes
 create mode 100644 .github/CONTRIBUTING.md
 create mode 100644 .github/ISSUE_TEMPLATE/bug_report.md
 create mode 100644 .github/ISSUE_TEMPLATE/feature_request.md
 create mode 100644 .github/PULL_REQUEST_TEMPLATE.md
 create mode 100644 .github/markdownlint.yml
 create mode 100644 .gitignore
 create mode 100644 .travis.yml
 create mode 100644 CHANGELOG.md
 create mode 100644 CODE_OF_CONDUCT.md
 create mode 100644 Dockerfile
 create mode 100644 LICENSE
 create mode 100644 README.md
 create mode 100644 assets/email_template.html
 create mode 100644 assets/email_template.txt
 create mode 100644 assets/multiqc_config.yaml
 create mode 100644 assets/nf-core-proteomicslfq_logo.png
 create mode 100644 assets/sendmail_template.txt
 create mode 100755 bin/markdown_to_html.r
 create mode 100755 bin/scrape_software_versions.py
 create mode 100644 conf/awsbatch.config
 create mode 100644 conf/base.config
 create mode 100644 conf/igenomes.config
 create mode 100644 conf/test.config
 create mode 100644 docs/README.md
 create mode 100644 docs/images/nf-core-proteomicslfq_logo.png
 create mode 100644 docs/output.md
 create mode 100644 docs/usage.md
 create mode 100644 environment.yml
 create mode 100644 main.nf
 create mode 100644 nextflow.config

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..7fe5500
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1 @@
+*.config linguist-language=nextflow
diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
new file mode 100644
index 0000000..6b3a55b
--- /dev/null
+++ b/.github/CONTRIBUTING.md
@@ -0,0 +1,47 @@
+# nf-core/proteomicslfq: Contributing Guidelines
+
+Hi there! Many thanks for taking an interest in improving nf-core/proteomicslfq.
+
+We try to manage the required tasks for nf-core/proteomicslfq using GitHub issues; you probably came to this page when creating one. Please use the pre-filled template to save time.
+
+However, don't be put off by this template - other more general issues and suggestions are welcome! Contributions to the code are even more welcome ;)
+
+> If you need help using or modifying nf-core/proteomicslfq then the best place to ask is on the pipeline channel on [Slack](https://nf-co.re/join/slack/).
+
+## Contribution workflow
+If you'd like to write some code for nf-core/proteomicslfq, the standard workflow
+is as follows:
+
+1. Check that there isn't already an issue about your idea in the
+   [nf-core/proteomicslfq issues](https://github.com/nf-core/proteomicslfq/issues) to avoid
+   duplicating work.
+    * If there isn't one already, please create one so that others know you're working on this
+2. Fork the [nf-core/proteomicslfq repository](https://github.com/nf-core/proteomicslfq) to your GitHub account
+3. Make the necessary changes / additions within your forked repository
+4. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged.
+
+If you're not used to this workflow with git, you can start with some [basic docs from GitHub](https://help.github.com/articles/fork-a-repo/) or even their [excellent interactive tutorial](https://try.github.io/).
+
+## Tests
+When you create a pull request with changes, [Travis CI](https://travis-ci.org/) will run automatic tests.
+Typically, pull requests are only fully reviewed when these tests are passing, though of course we can help out before then.
+
+There are typically two types of tests that run:
+
+### Lint Tests
+nf-core has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to.
+To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command.
+
+If any failures or warnings are encountered, please follow the listed URL for more documentation.
+
+### Pipeline Tests
+Each nf-core pipeline should be set up with a minimal set of test data.
+Travis CI then runs the pipeline on this data to ensure that it exits successfully.
+If there are any failures then the automated tests fail.
+These tests are run both with the latest available version of Nextflow and also the minimum required version that is stated in the pipeline code.
+
+## Getting help
+For further information/help, please consult the [nf-core/proteomicslfq documentation](https://github.com/nf-core/proteomicslfq#documentation) and don't hesitate to get in touch on the [nf-core/proteomicslfq pipeline channel](https://nfcore.slack.com/channels/nf-core/proteomicslfq) on [Slack](https://nf-co.re/join/slack/).
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
new file mode 100644
index 0000000..5b1a680
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,31 @@
+Hi there!
+
+Thanks for telling us about a problem with the pipeline. Please delete this text and anything that's not relevant from the template below:
+
+#### Describe the bug
+A clear and concise description of what the bug is.
+
+#### Steps to reproduce
+Steps to reproduce the behaviour:
+1. Command line: `nextflow run ...`
+2. See error: _Please provide your error message_
+
+#### Expected behaviour
+A clear and concise description of what you expected to happen.
+
+#### System:
+ - Hardware: [e.g. HPC, Desktop, Cloud...]
+ - Executor: [e.g. slurm, local, awsbatch...]
+ - OS: [e.g. CentOS Linux, macOS, Linux Mint...]
+ - Version: [e.g. 7, 10.13.6, 18.3...]
+
+#### Nextflow Installation:
+ - Version: [e.g. 0.31.0]
+
+#### Container engine:
+ - Engine: [e.g. Conda, Docker or Singularity]
+ - Version: [e.g. 1.0.0]
+ - Image tag: [e.g. nfcore/proteomicslfq:1.0.0]
+
+#### Additional context
+Add any other context about the problem here.
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
new file mode 100644
index 0000000..1f025b7
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,16 @@
+Hi there!
+
+Thanks for suggesting a new feature for the pipeline! Please delete this text and anything that's not relevant from the template below:
+
+#### Is your feature request related to a problem? Please describe.
+A clear and concise description of what the problem is.
+Ex. I'm always frustrated when [...]
+
+#### Describe the solution you'd like
+A clear and concise description of what you want to happen.
+
+#### Describe alternatives you've considered
+A clear and concise description of any alternative solutions or features you've considered.
+
+#### Additional context
+Add any other context about the feature request here.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
new file mode 100644
index 0000000..97678e3
--- /dev/null
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,15 @@
+Many thanks for contributing to nf-core/proteomicslfq!
+
+Please fill in the appropriate checklist below (delete whatever is not relevant). These are the most common things requested on pull requests (PRs).
+
+## PR checklist
+ - [ ] This comment contains a description of changes (with reason)
+ - [ ] If you've fixed a bug or added code that should be tested, add tests!
+ - [ ] If necessary, also make a PR on the [nf-core/proteomicslfq branch on the nf-core/test-datasets repo](https://github.com/nf-core/test-datasets/pull/new/nf-core/proteomicslfq)
+ - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`).
+ - [ ] Make sure your code lints (`nf-core lint .`).
+ - [ ] Documentation in `docs` is updated
+ - [ ] `CHANGELOG.md` is updated
+ - [ ] `README.md` is updated
+
+**Learn more about contributing:** https://github.com/nf-core/proteomicslfq/tree/master/.github/CONTRIBUTING.md
diff --git a/.github/markdownlint.yml b/.github/markdownlint.yml
new file mode 100644
index 0000000..e052a63
--- /dev/null
+++ b/.github/markdownlint.yml
@@ -0,0 +1,9 @@
+# Markdownlint configuration file
+default: true
+line-length: false
+no-multiple-blanks: 0
+blanks-around-headers: false
+blanks-around-lists: false
+header-increment: false
+no-duplicate-header:
+    siblings_only: true
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..5b54e3e
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,7 @@
+.nextflow*
+work/
+data/
+results/
+.DS_Store
+tests/test_data
+*.pyc
diff --git a/.travis.yml b/.travis.yml
new file mode 100644
index 0000000..284fef2
--- /dev/null
+++ b/.travis.yml
@@ -0,0 +1,42 @@
+sudo: required
+language: python
+jdk: openjdk8
+services: docker
+python: '3.6'
+cache: pip
+matrix:
+  fast_finish: true
+
+before_install:
+  # PRs to master are only ok if coming from dev branch
+  - '[ $TRAVIS_PULL_REQUEST = "false" ] || [ $TRAVIS_BRANCH != "master" ] || ([ $TRAVIS_PULL_REQUEST_SLUG = $TRAVIS_REPO_SLUG ] && ([ $TRAVIS_PULL_REQUEST_BRANCH = "dev" ] || [ $TRAVIS_PULL_REQUEST_BRANCH = "patch" ]))'
+  # Pull the docker image first so the test doesn't wait for this
+  - docker pull nfcore/proteomicslfq:dev
+  # Fake the tag locally so that the pipeline runs properly
+  # Looks weird when this is :dev to :dev, but makes sense when testing code for a release (:dev to :1.0.1)
+  - docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev
+
+install:
+  # Install Nextflow
+  - mkdir /tmp/nextflow && cd /tmp/nextflow
+  - wget -qO- get.nextflow.io | bash
+  - sudo ln -s /tmp/nextflow/nextflow /usr/local/bin/nextflow
+  # Install nf-core/tools
+  - pip install --upgrade pip
+  - pip install nf-core
+  # Reset
+  - mkdir ${TRAVIS_BUILD_DIR}/tests && cd ${TRAVIS_BUILD_DIR}/tests
+  # Install markdownlint-cli
+  - sudo apt-get install npm && npm install -g markdownlint-cli
+
+env:
+  - NXF_VER='0.32.0' # Specify a minimum NF version that should be tested and work
+  - NXF_VER='' # Plus: get the latest NF version and check that it works
+
+script:
+  # Lint the pipeline code
+  - nf-core lint ${TRAVIS_BUILD_DIR}
+  # Lint the documentation
+  - markdownlint ${TRAVIS_BUILD_DIR} -c ${TRAVIS_BUILD_DIR}/.github/markdownlint.yml
+  # Run the pipeline with the test profile
+  - nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker
diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..14b0aab
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,4 @@
+# nf-core/proteomicslfq: Changelog
+
+## v1.0dev - [date]
+Initial release of nf-core/proteomicslfq, created with the [nf-core](http://nf-co.re/) template.
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
new file mode 100644
index 0000000..1cda760
--- /dev/null
+++ b/CODE_OF_CONDUCT.md
@@ -0,0 +1,46 @@
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
+ +## Our Standards + +Examples of behavior that contributes to creating a positive environment include: + +* Using welcoming and inclusive language +* Being respectful of differing viewpoints and experiences +* Gracefully accepting constructive criticism +* Focusing on what is best for the community +* Showing empathy towards other community members + +Examples of unacceptable behavior by participants include: + +* The use of sexualized language or imagery and unwelcome sexual attention or advances +* Trolling, insulting/derogatory comments, and personal or political attacks +* Public or private harassment +* Publishing others' private information, such as a physical or electronic address, without explicit permission +* Other conduct which could reasonably be considered inappropriate in a professional setting + +## Our Responsibilities + +Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. + +Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. + +## Scope + +This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. + +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on [Slack](https://nf-co.re/join/slack/). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. + +Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. 
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version]
+
+[homepage]: http://contributor-covenant.org
+[version]: http://contributor-covenant.org/version/1/4/
diff --git a/Dockerfile b/Dockerfile
new file mode 100644
index 0000000..155a2ad
--- /dev/null
+++ b/Dockerfile
@@ -0,0 +1,7 @@
+FROM nfcore/base:1.7
+LABEL authors="The Heumos Brothers - Simon and Lukas" \
+      description="Docker image containing all requirements for nf-core/proteomicslfq pipeline"
+
+COPY environment.yml /
+RUN conda env create -f /environment.yml && conda clean -a
+ENV PATH /opt/conda/envs/nf-core-proteomicslfq-1.0dev/bin:$PATH
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..a4a5bdb
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) The Heumos Brothers - Simon and Lukas
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..105b15d
--- /dev/null
+++ b/README.md
@@ -0,0 +1,67 @@
+# ![nf-core/proteomicslfq](docs/images/nf-core-proteomicslfq_logo.png)
+
+**Proteomics label-free quantification (LFQ) analysis pipeline using OpenMS and MSstats, with feature quantification, feature summarization, quality control and group-based statistical analysis.**
+
+[![Build Status](https://travis-ci.com/nf-core/proteomicslfq.svg?branch=master)](https://travis-ci.com/nf-core/proteomicslfq)
+[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A50.32.0-brightgreen.svg)](https://www.nextflow.io/)
+
+[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](http://bioconda.github.io/)
+[![Docker](https://img.shields.io/docker/automated/nfcore/proteomicslfq.svg)](https://hub.docker.com/r/nfcore/proteomicslfq)
+
+## Introduction
+
+The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.
+
+## Quick Start
+
+i. Install [`nextflow`](https://nf-co.re/usage/installation)
+
+ii. Install one of [`docker`](https://docs.docker.com/engine/installation/), [`singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`conda`](https://conda.io/miniconda.html)
+
+iii. Download the pipeline and test it on a minimal dataset with a single command
+
+```bash
+nextflow run nf-core/proteomicslfq -profile test,<docker/singularity/conda>
+```
+
+iv. Start running your own analysis!
+
+```bash
+nextflow run nf-core/proteomicslfq -profile <docker/singularity/conda> --reads '*_R{1,2}.fastq.gz' --genome GRCh37
+```
+
+See [usage docs](docs/usage.md) for all of the available options when running the pipeline.
+
+## Documentation
+
+The nf-core/proteomicslfq pipeline comes with documentation, found in the `docs/` directory:
+
+1. [Installation](https://nf-co.re/usage/installation)
+2. Pipeline configuration
+    * [Local installation](https://nf-co.re/usage/local_installation)
+    * [Adding your own system config](https://nf-co.re/usage/adding_own_config)
+    * [Reference genomes](https://nf-co.re/usage/reference_genomes)
+3. [Running the pipeline](docs/usage.md)
+4. [Output and how to interpret the results](docs/output.md)
+5. [Troubleshooting](https://nf-co.re/usage/troubleshooting)
+
+## Credits
+
+nf-core/proteomicslfq was originally written by The Heumos Brothers - Simon and Lukas.
+
+## Contributions and Support
+
+If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).
+
+For further information or help, don't hesitate to get in touch on [Slack](https://nfcore.slack.com/channels/nf-core/proteomicslfq) (you can join with [this invite](https://nf-co.re/join/slack)).
+
+## Citation
+
+You can cite the `nf-core` pre-print as follows:
+Ewels PA, Peltzer A, Fillinger S, Alneberg JA, Patel H, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. **nf-core: Community curated bioinformatics pipelines**. *bioRxiv*. 2019. p. 610741. [doi: 10.1101/610741](https://www.biorxiv.org/content/10.1101/610741v1).
diff --git a/assets/email_template.html b/assets/email_template.html
new file mode 100644
index 0000000..ba4fb4c
--- /dev/null
+++ b/assets/email_template.html
@@ -0,0 +1,54 @@
+<html>
+<head>
+<meta charset="utf-8">
+<meta http-equiv="X-UA-Compatible" content="IE=edge">
+<meta name="viewport" content="width=device-width, initial-scale=1">
+<meta name="description" content="nf-core/proteomicslfq: Proteomics label-free quantification (LFQ) analysis pipeline using OpenMS and MSstats.">
+<title>nf-core/proteomicslfq Pipeline Report</title>
+</head>
+<body>
+<div style="font-family: Helvetica, Arial, sans-serif; padding: 30px; max-width: 800px; margin: 0 auto;">
+
+<img src="cid:nfcorepipelinelogo">
+
+<h1>nf-core/proteomicslfq v${version}</h1>
+<h2>Run Name: $runName</h2>
+
+<% if (!success){
+    out << """
+    <div style="color: #a94442; background-color: #f2dede; border-color: #ebccd1; padding: 15px; margin-bottom: 20px; border: 1px solid transparent; border-radius: 4px;">
+        <h4 style="margin-top:0; color: inherit;">nf-core/proteomicslfq execution completed unsuccessfully!</h4>
+        <p>The exit status of the task that caused the workflow execution to fail was: <code>$exitStatus</code>.</p>
+        <p>The full error message was:</p>
+        <pre style="white-space: pre-wrap; overflow: visible; margin-bottom: 0;">${errorReport}</pre>
+    </div>
+    """
+} else {
+    out << """
+    <div style="color: #3c763d; background-color: #dff0d8; border-color: #d6e9c6; padding: 15px; margin-bottom: 20px; border: 1px solid transparent; border-radius: 4px;">
+        nf-core/proteomicslfq execution completed successfully!
+    </div>
+    """
+}
+%>
+
+<p>The workflow was completed at <strong>$dateComplete</strong> (duration: <strong>$duration</strong>)</p>
+<p>The command used to launch the workflow was as follows:</p>
+<pre style="white-space: pre-wrap; overflow: visible; margin-bottom: 0;">$commandLine</pre>
+
+<h3>Pipeline Configuration:</h3>
+<table style="width:100%; max-width:100%; border-spacing: 0; border-collapse: collapse; border:0; margin-bottom: 30px;">
+    <tbody style="border-bottom: 1px solid #ddd;">
+        <% out << summary.collect{ k,v -> "<tr><th style='text-align:left; padding: 8px 0; line-height: 1.42857143; vertical-align: top; border-top: 1px solid #ddd;'>$k</th><td style='text-align:left; padding: 8px; line-height: 1.42857143; vertical-align: top; border-top: 1px solid #ddd;'><pre style='white-space: pre-wrap; overflow: visible;'>$v</pre></td></tr>" }.join("\n") %>
    </tbody>
+</table>
+
+<p>nf-core/proteomicslfq</p>
+<p><a href="https://github.com/nf-core/proteomicslfq">https://github.com/nf-core/proteomicslfq</a></p>
+
+</div>
+
+</body>
+</html>
+ + + diff --git a/assets/email_template.txt b/assets/email_template.txt new file mode 100644 index 0000000..95765b1 --- /dev/null +++ b/assets/email_template.txt @@ -0,0 +1,40 @@ +---------------------------------------------------- + ,--./,-. + ___ __ __ __ ___ /,-._.--~\\ + |\\ | |__ __ / ` / \\ |__) |__ } { + | \\| | \\__, \\__/ | \\ |___ \\`-._,-`-, + `._,._,' + nf-core/proteomicslfq v${version} +---------------------------------------------------- + +Run Name: $runName + +<% if (success){ + out << "## nf-core/proteomicslfq execution completed successfully! ##" +} else { + out << """#################################################### +## nf-core/proteomicslfq execution completed unsuccessfully! ## +#################################################### +The exit status of the task that caused the workflow execution to fail was: $exitStatus. +The full error message was: + +${errorReport} +""" +} %> + + +The workflow was completed at $dateComplete (duration: $duration) + +The command used to launch the workflow was as follows: + + $commandLine + + + +Pipeline Configuration: +----------------------- +<% out << summary.collect{ k,v -> " - $k: $v" }.join("\n") %> + +-- +nf-core/proteomicslfq +https://github.com/nf-core/proteomicslfq diff --git a/assets/multiqc_config.yaml b/assets/multiqc_config.yaml new file mode 100644 index 0000000..5cc6771 --- /dev/null +++ b/assets/multiqc_config.yaml @@ -0,0 +1,9 @@ +report_comment: > + This report has been generated by the nf-core/proteomicslfq + analysis pipeline. For information about how to interpret these results, please see the + documentation. +report_section_order: + nf-core/proteomicslfq-software-versions: + order: -1000 + +export_plots: true diff --git a/assets/nf-core-proteomicslfq_logo.png b/assets/nf-core-proteomicslfq_logo.png new file mode 100644 index 0000000000000000000000000000000000000000..b7651c75ac5dc3298f6c264a3d42e2ce252f6499 GIT binary patch literal 11668 zcmb_?WmgaUJ^^?&GYs{+{_!AUMnFxti;oQAr{j~0i;ALNvbjWm zhNh$piH=0SD$Es0*yDxm2l+w&bJ^@&CKDAElSGr0#g>dW8m9On-h&=siA5fki9})l zb=0yx>*V?E+HLx7;=%2H;^A(B_I|3ezFlZTb5UsHUPI`St;-}}*UQIC-!$BN(GZ4Vucu zq7kVqiL0PXRMj?q16Yi>*}NthYdYozl7Y$OYv(XJOzE@JFxxu-NWnoL z387C%dIh5hmvuN?>KvBnMSb(0@#OIFu+9H8;rJ-&BMesXvx1s)ZHOndFVuTssyKs6 zG=+-xm%C*$quRn&*e2=_cN1NvbPV>Fo__+|f+z|aS3eamNV)@0E6Sj}-l1Qbi&}`C zQ+HpSknMfFpT^P@nQLhHenZ#l01IsMVR#nR!mfU1apKRNs%>K$cdeRDlL_7&7frIH z)6o1+RL*G6CBCp>E3OwSTdL1@n2z@#0uuUrM7AG|o0qdkMiV^sg6VE6Ht@5M*JkBe zp{Bgal!p1H&YrKtp3S|2S+y*ze&=+PEct`Gx3B|I8g+@ZL}*YY3-H_uP}S!bXU#)q zbY^87vODl-b4JKYz6*^2Pa z9wToQ+Yzbb9I>!H{Z}TJ)9}l`b>|1>bh3MuOhhe3I)WB+wlaav_XjP{8Yp~R^ss~| zgp#%+*r~#Z+V60G;0EUGl_Y?Wv{x9NG#A~7+WkZcW9zHWm&yX1&UEI(Jmkyy)2DBX z!+WO%L15_&v{(9ze}(-@O*+{T06zxvZlqV4F$;u)i=S1x@QVQCkjV;HywEk*4x-H+G4 zlE1v|o-Zlr%QzJ*4oGG3j))G zt8>fJko@Q=9-g^}CCg{X23F&^qdgJ6=>fx*lSdLWLcIhLdfMc~&uvojPNZp1nrrN< zb|Mn-#oU$|14a%=c70%`lP6 zCs__+(t+~1AC3vE90N;Th-abgi$55_BbTZM<+o3RRBQ?5 zC#BmX>zB{wd$V%I5+o}0O)?%H+^xFSP9FDshB|J($~$y>K}0%oFADg=U#m%yqdYp{ zax$!;gI-DZ;K5sF*R9(7&3L4~_hvjLAii|C+I}Cq(tk%={8O3dcB_ji|2D%>@A$b8 zOOE?h3ue|f;2$qB6;B5kTj5_x3t1i--{2`)k@%QcmN^Ru%&b|*)-O<_L~x35HATjEoiV}q(l zv(Jp3u1>_(``o&O!>DTj(2)6XQ5H4OAWmx&Kk8wKzX5Kz^>@2j|3Bq`V?3{)8|$3` zud{ldC#ywSQqAYkhu>*(gHGfn?nAF8AK_+*MYyd~}d}`?wKn^T>-rYxYzY zvP~64Vt3igQKSMJAO&-}3-LBXL}iUWwf$&H+*pg%Y=@lnzds)LZm={NHK-k2!TwSy zKQwBs*w+WJm6-y5xEr6>9&2gwI1p(Zbp-g#A4vmk&4*ErstL4B*k{Jh{2!HDi=MMp zIATf&>k%@*Z~1V93ID+P8B(|E7pEtp6{a$q{_SuKA$OSs%d}FbQ9j+F9_jiY>va0SE2vQSZ^= 
zx}J8^m?~d0R7q*Vdpiq4El?6w7d|DJDu|$GGYWSL8ZHrM%TUG=jurMM?IG|BL>XW7?wzO9Q9f~+|11|m6FV?2}7iC^viZgzUo%(lEnQr2FI7Yox zDx8;-9Q=xT0w0m>B5hSd)R+{1P_X>=$IVlQ4>ed-mDxG%mOd+*nSg1#CM(^=c6M^B zTA+JKIgzRG1wB@~)Fm(%+=h+_`UbdzpZ48F1!@FleB@VXp4B!BMbZsbyo=SGuNWhL z-$FWFs{_m51ZL2Opsk>-VBj%oAiQB3<=vkGq+hR?uo5lgL}Fy+_jY}r4B*$tMzAFe zdaip$d*Bhs@ILQ(Z#%I!|1AUHk?)ANnU;1UKYn#4N_5-qvFZ7S?6goN4L41M$qU+Q zpB`oKLYQ1o6Dj=tjJAbm9WF&`#ouflp88F?vl3Dq-g$5Ey@Y^z2|(cao)t@kKOeWN zakrFU_sDqDCPA9u(rq=W_bVpl$sDslI*qiArRx6VSC7r=*s>`WZ z-(UMXF2uK8Wyd>lEh#7jl_&xRN%x_)@qZB|GP$t62dgQ6Vc`tUNxJV63D*G2u@_uH zuG_iTPOHawhZ8-^#1(1Dd4H=miA=SMwHIk#Z@A77qy0soEQ*pG5=Cv(K*E0_so z`Q{?99EfUgzTssV8+8_ASUCWg`%?&BivgP$fMb=Qs?^(bT*^Z0g|&C>Pl?QAA*@+z$pvfcHs z>}EUo(#`QA>zuUqOy_Ev;)?X?1a?!biW}BERFpLJ6jmi=8Rh(s`n-p1@>5f^c-d|g zTvyYyQx@`;Hh)_~f?Bb@rbk%!w3zE*!=RC$<$;{0e#xMHM4lcoo~p$#%GNT>D^;PN zJOKX0f51WEj%vKZ5@S$}SX(7BV;JEcn+@8M9I`NAxeeNX=1;06R=3W%MF%{&!pwszYbDHJ`?pJHX40NK3pSQ)h(|rF-s)lU4=C*_JMy}swhZQe@M#GGmKIF9qP3^ zxhI;m>K@c)zHtQ)wjfuA)~^Xcvja7BQm8GoC`iKn*qcFHkNln0NvJC1P9B{Pp}a*b z9FCLRaZ`!yaFmD@RT;~^y=lQFC{NhsWi2RSzd2w+S6%U$_RGZna^8}?9x?=TkM zqV01Fhxt3Yv&hxwLM477N)&iimPI?!e9?84Mh*-Dz3;molDe9F!~IhGBf8s>7)J$g ze&MT(U8H>*wly%_zu2p?XJLg5`F`P*unf549o;$#=YU+yb8pisYx-10=A&H=c~2sK zqNOUGZ9J}~(e2-Y>w)u&rPz4N`8`>kHT4GKsaL%l9y@*h59Zo*SQFK(IUwQLw9&0g zg{O#arzlvYfEm|gB2UIDC#d*w3t&I2@pqXz7ADQEO7ZHo<(JtU@y1&_5r1yct$>(m zA&I@N?dh#6_Wb*5U_h6Q)#}_>{T=ZC=gBO=4SuQ^w7kBT%d7G&XP(WkbqxF zAAvp!o{iG0Bpk--_DsWejRN~4lcxnPeCpN1*wa*W=O#S~76O{BWK(Y3tH%3?w-e9# z-ti0{b72p@GwX=5{@#G#Srv3J@4eo_*3^9S35Y8RratXc z8%)fhzzK|f_IZ-yLB3hUxf3OsqLmOPz^G;%or>j&`w<3A#sU5tB-y z?Y2pb>o$nrOox~Gxw@u80H5ZrvV#1JIcALaID}_Ql5d$D#|K|13W+wF41L;51G6dO zf!f@|8+#eU>A_uulJB4N06S;mn1_GtUGlCIa=}&tpR5UK3sIJUT3k`ZWRM$$%Ypn? zDsV$Y{p$dcINb&F`v^BWJBI5-?6AH!E%Q(O5}$03l7V z<2p9?qD99&X6nB~9+bMB;3IluSu=LduQHYy9vpdQBW_LOP2>2^8d)f@+n)=&Qv`RL zqaRE37)r4{8v z3QoYW{T06FTB)z;`&VO=mU)$>ljV+re!s8r{iy}v#Jy>B*HuHuWjTM0p)S)*I>eRF zxJnFL*(zY&{4*6?uUb=;-)U&}mNtvEl3X42h)>FwechZ2j+i#0?x0g;E!~tMYYay> ziokCzKe9zEbdwKNnpox~10iIyO+qp;DkVeq4)ngqwA27e7lpR#N^Vudf2kyGNT(W|E=#ME7pcCyW{z|eFcGPS^&HWf~Z)?hj1 zTNi&aLUGxd9NbLFF|E{l8cVgD3R>5-bF(9sL>6e|e3uB7oiuC8Be#|>6*5g0`Q1cX z5yxjPy4H41qhNC3A+{y6=so*JA(>jdZlDCK-e>CQo8|+Lep?PTp-+r*!xz5HnD+hd zmxUwTVj#;mpJIl+IK|Oh%_Cz^_Pl7r@jrLAENh-$nf?8fHuGgHqzaGS_63_4@rj+r z&pF`An6dIhqI1nmZJFsv4vUy~?t0c?lwDD>lLp#tba3Yy8FpZdd-)!1wjf{M@*t`oxee;s z>YNm91$h2j8ks7!-<>aDX2magG$@$d4qd83uS)`ye+;vS zi2cK6WB6T~axRFN(q~IqC>$;Dz=7h5Mm@^SrSN<3H=>^OJ^T0KC$1g12d!kXEn8mL zWuwdCepe&BB8*n#0A!s+msX~eBp}x&kY2eAJFZ!a+v%XS*Sf<3)`=va(=VTjUthK= zQ!8-XG+zc|fU9zY=3}T}=dcqUsLNXY9-t63OdqEw5`=(K3E{g02lPVeuF%5gqofR& zsw6nzsLA)3=S?3G%lX${_;ohUR_DtFJ-u{+)nqrkmY%cAa!QD6F)J506`3}CBbY6! 
zziBo|qhh?LM$Na!8Bz~)`i`R4mD{AbmuBT*Nf}s zChLdKSEVu#mzgaE$a*)CR4vz%B6%E=mf*sL_1O$M4na5`Inr!nN;k68Cg5) zLYh%oFnz}cRD0@W?MoTL0uwfR^v!X)4tk!#5@6SDM#hcSf1QdOMkRgc41F$RkC)5# zWv@BS0Dwio{u<>t@Cb5&2rln|F?>t7veQ%wZ7-Cj8ZsoT?tIo&Xe?bXQV^WrWqD;k z#${ZYEK7nC9RjB_5LO+_phiCkE2%|Co5Jy@=7^|0>Ivu$U0KUVz!`Q{D4Ri+sk{L2 zesFLw-1v-NM;rCsf~%~KSTudHv(GsSONPp;2f-~GtyqvF5n$32A+J*m=d z+@EYR??FYTQEMan?lhD#&Zp;zwmE+L7X9Cc*@KXq-jcn9B;hNz6&a5-iwcq{qA4T8 zvU$RDYKE3=3@yA->S*P2y);oVH>oSo=@yIIiGuVn$Ee(gEbG;8!uzlm}cfww3&LP0>d+T{@W} zyRgxVeLH~}aN)iE!jMj-dgEjmes+om|Ec3Ahhd4B-R8f!Z||$;lzN0=Y1d6GEOly| zeseZ_-H06p<`RwTaJjN=X&%_!Fv1EB6sIzMwP9#uTaol&X&R#E8@@P}oLel~y?c#J z&IeMWp9M9)3uWmQ6ja(bi+Xx{?qda6mVLG>qNZiS-5^10B8KzVtxXdqKy0`JlPtQSE!Y@$UCdBT*t1K zCSs&1S&*;G3de|S!CvPAyu^j2$y)efP-Y2XWVxy`+egJ{I3`ftl7ndj%44Bf3ofAw z#Ps#**`}Ak08+8yHcxfm13xsEyhXOE&pJqFN!;r-P*Y-uK?=jK!pM|}Rl|vkcw~h> zm$tNZU>Tb6a)-3x2u?1`=6)aCNxk=n=M5wOv)p4?Rs&CHBBOb$PoL8JLy>XZIT7RX z%Q>eczxenI#Y|r~H+B(fzmq`e9;rxI6nk5hmiY6g%krb!oR~{mo|30M_?R=9p^R4K zx1qz%W1BzJvZ9fS7FTzEEK$fV_*fVRi{_GiX;Rf?WUXbD=sZU2`755m(~;)|33p%g zBJ!nfyJ9O@2UH8x_i?M{M(j*Uc2%rUn1n%v!hsQ@MOl4OF6ZD;3|&f4F%S0v?$Fj%BapBG)he_ep~CS=F% zvr$C=C)wYY74LE<*O$af0ccb(ar;~0$FAD8dNN^K-wlVc1dfB?D6BuDO6PkeYu}<( z5#u4f#bWnee#4BC9+?F-KiYt4E!1L3yGY?+Y@|#(axf$Oa$cyG>qvRyTBL{Goa@MG zBU3g{*ph*a@YCEx$kP05!?(|#nr>!7_1?y>1guqKWF2RJ8_8DuwISQJ7fPPAd-fA} zs~ITB9zv_#2R!_jZPufeE{>VvC9N}=G!*}d0?8I!@}BT29Hy$VQiY3>+g=SdKh&_J zcd?Ym%g0cHrL789xY$mz6!oC>59qbqSH*m&Tj%%;NqY&RfZdBTj163$?CzP<&G}Gu z@k){hC1XBiN2)nJQtFdLtwZfljvenszYs6G*8}Z_-B{?IQ{^bEZN%(9Qk6>|4)P2; zciMn0N(CbRvSu(pFZ7$`-ipeEgy#teKIqTTGen|buMbPRp`@4or_CfY`8U+neip-9 zr;)D}`x!Ag_*t*rmHH*(%mbVLzLQ<=MbDutkp^xl+@kh;|M%m(5pvC zNus~W@M42b?C;oR)7qI2%3A~RSs8f75{?+_^b*67s(cl-u8o_= z!!~1>mEX_7F@?}cw`mOY_&Znlm-Is<`ssWAs0~`cOI52LLL<|dJySV&>CnX0$37Uw znP$E>h<>N;u_cvZ9#9Q*h#Nj0B1DfF>`NhQOKv)!sp7TmQnM{@L8R}MUHXD=;R_|> zk2{!eo(G|qFLq*4WwI*ZaHRA!}@sZr*xG^-zCn5#P(he5J3h+uMQQAuJh%OElJz_A4 zQ&#~^RARE)6iIn97r#&v|Ri|#^`;pxshl^4aXacJzu7^;v zqDLli~Nn6%&C=KqJ}f7+Zs7eC*2 zqlm7iuP#~l3X0N`ZtNM15B^ACd?YidmBpVgE~HqI(}(<`N77i(wAEFV^u)I2`YTVk z3$H8$l&c*s1#d3KmSf{o@Xv}<~Z$?FcrXU7~>e@s82h}TWsQDK^ zI@yhve0?}PhW3Q*oad-{(q$`^N_xPhRn>_b)#c|A|8uiDLB=-$Zs52PYaV-i#koIq zxWviI?O7J|{*&FKzjA+&0`wu&CA=lHdJJ0@MRtLYC@yiJWl7%8-z)Z9y<%P8lCR)TEVi>U zpF-Bl_JWkjAt211s%nb;H)a?|%B;(2XiESYAYK~m-e@&)f8$5|gc=BKuk-_%{Bm4j zV9WpWjlmr`N;p%=%Z|^Qk1@fOh#_CPW-$*|YsRHS*ILDhJM27xWb6&*zIUrY?->C@ z-bnjDll^=JiC7tnCvE6yK%nRlOuCBi!B-MqQm-1w3-5mZ3raV)tbN56G-UoSzWO{p z*=;87UhZB}gc(hO>Y4xtbn6kB=xNrNFRF`EWa3b2;MQfoAnuYHF`A&lJue%m>7r*+ z)~=q^rAL38#n_JINrZ$dA*+W~#yM{-xB&B#;?*P1JIrBuXk^-o1#`Y^Nw5h6X!)wm6K%I;CIJwjXZEa@=wnt&ap7ZC_->H5Owu%%_gxsN1yI3yI(eqM%?0#3 zE+n$?r*#kgC%@f5I5rd9aaWVTwuOiMl$-EL^4l&Uj|nP6QWLZdd_bnCp)!R~I`CK& z*-p(^Az%N)IKcm(7hw4Xb8aj_g6_(@W?fErSkAGVDnEoyiN;%D;sxf0v@1*Wi3jrn z`b0FgNA~7#JpP((EF54ixTBKFCQ94UbEc5J#bsd=7}bsw#5zRZuW{%KN8pY?U|B3#U^qATA?PePEouO5AVVP3EA7zS znDpwh5tk+C>|i6ZTvESycgox1vpfT1mW*KFTV>SEp{^wNd^o+Q(%?Ewe%Rx7Hx5-n z@hAneGZq>}gc}1TNp3V3`4kn<0-`91p%xeO-742VZLujTmh7NJ7=uffIXW5qSrS^NkUq@rM0MFG+)&+Z<}Ed zsOx(}6JwkdDlhJ%dMp>0spKz=VMSN_fhW)gH)DrcKUfacJAi&ZU)qOIHG@>@sr{V%e$65iRv5OB$hX3bgS>1& z*6hnwcZ3^P-5d}fgdD|iF3MdxEc-IdKHtVE4J>`+zcnoe`ApnAzOE*2Ua~c#EAG){ zX!BjhAQ()0ZKFu=J4xzCjs^`RdA_U1+Ii0;O4D5vGN~dCCJ`p`4n0OL&@Kuig66ff z@-H$`ob;c(QQ56|q={1&Le%7z_)rh6v;pt^kSYGc~m1yRZ{XJ%+bM&pNO(MJTzhVza~qN3c!Kx8yv& zt*)67`&v&loSJ<0T5wj#gmrxvoBENkm zYp1y$pCU88LsJ_wj(uBt#){5-8~iF;QPEUfRk&gTd8Cv!6REfpW-^lixm%_FBlW~T z%yMjTE`VM5u$bLN!do4=256f*&Qy)FX5A|56oH5DVYjZGTzf5Tbd+|9fdnTP!Xbh= 
zG2h9$t8ubS9|cc*g(-oH;jo26(S-!#GTFsxjncu~LdT$zTshUtzbZl+TAQHi#--?= zi{MM^{yV{r*&xBaXq6B*Dl;h_$~`4dbP;>SN{q#qkjG$c^`B`dQ4G7*Y_>MQOlqgb zh|90;?e)cL-xN57l`M$McwPE{8`!0nMJ3d+%u%BbOP`x(rwMc!+exzQ?qv!GVXLA~ z@|_e)u-R=a0CUk6oK;wAb{0_pu4DBoWG>lzXk&^;oje+zPTFu!fkI-|S()0IEXUNpGnwVB`F+c;Q zQtQU5x@zf4)6?F6QlU2naleypIHez6Urd4(9Di!{EmW=vrKt@Rr@_xEqFHgmjV%!f z2g<~qTnO~NCJ&_=k#Woj&*2au$Q4bm;oU1JQFJK~PW)DNPO zTtD>VOdCJy`}reVd>p$t1R;}gISH<>GSUjAfK2Wcd?Q(ChA-yuZHiWm*#8L@Wo%6< zeJgf1jBxH_K28+8cv<_UQ3sU7;*~f)189SObQn&Cz9*nogRktf0X$@=0;U}E5 ziVI~&{{hLau&0Cfg8zozUc~fT(jzCR1^t#=n3%JQ-d2^6x%Zq<5CLIE2hBiI#I^MD z&U)gySG4*`yr11)2VlBOF##T%#v=JvNth-E<-Z%jwgtr`=|e;N;pqLiPZ4h(Tb1jf$!G zSn#VD!{gPSdV7;mvU>tDBzqPqi54#qwYAvSLLF^bu}k%$~s!TANQg25g|gBOK za^lj=wd{7&=q5L5@SU1tv7q4l*FtwN>a}=gjVR>|T%0PdP3Mq7gr|ojU-IYD&xbhF z1<_ek(h~efQn^HQ#>R~}Y+_*b+?R@^t2-X5xT03sX>%41z1OlHyO9Ge1Pf9ZQ_@-^dxGK46U#=3SJK?%okmA_7j;u2Vkkbxqz=Byd0%sI$=9V$Iv zi8A+=;F}#j`s(Xykb*R_WNsa9qWlBjZn*VBv*G}f877^J)gLCYJz^Z(cGYiV*~p(q zkW|Mx8;HG=6w+j!Uz{WeJiY=j8@mukCcs?z;Xk0Sn=hj5ZZQY&Nus&5P^AjZy3vz8 z4zwIVrCQwNk!ownE3oDWvCBzqMd_C&3xtV2&6Ow?)(z0mCk9+Mgos62rZG2FBJ2oA-RC!6lNG5RR#cQ~~x-UO~NglxZaf3=Q z)k*j*kw!UZvN;NGse{xJ(1myVm%_YGbIxAW5+D<9?u5JA53UWRXQ#58--{rXoUL@!{O;{u_%6xcPjwoQb1i zGnb&@vSqr@S^tu>(qyIP8`QQqB3UU-ADs_{DkE-AC&pacnSXtwR9FV#;P5myfw8%W z8mNNLm+4Be!}lg1%#fIKt`I78$InT6Vrcw21dPK=N@OWL0SF+MvL_a(0MCV|nqUa% zD+gyv*x}ShktIPIXHZM;ly48zOaFMIy?!ADz9bON7sN3c5o-TDaDp9J4n(+DJa$1( zTMkrR>nW7RHaJpkw+yf%0%Q@@6N{BnE^Ei|nln;934K4MuyZl=29tN`3~p_T`s{X`t{Br&!QF@hHK1CHi$;tr-lVTS{o!n>r zi}iGeZJ&hqQ-$5Xpx-%3ZMVuJpJs-WAh)m*K4vyQpHxaoWpmSpm_}!VJKWUKJ729q zxf=HB={~zm@ZVW0u-^a1h)kpT aCPK9^3 +Content-Disposition: inline; filename="nf-core-proteomicslfq_logo.png" + +<% out << new File("$baseDir/assets/nf-core-proteomicslfq_logo.png"). + bytes. + encodeBase64(). + toString(). + tokenize( '\n' )*. + toList()*. + collate( 76 )*. + collect { it.join() }. + flatten(). + join( '\n' ) %> + +<% +if (mqcFile){ +def mqcFileObj = new File("$mqcFile") +if (mqcFileObj.length() < mqcMaxSize){ +out << """ +--nfcoremimeboundary +Content-Type: text/html; name=\"multiqc_report\" +Content-Transfer-Encoding: base64 +Content-ID: +Content-Disposition: attachment; filename=\"${mqcFileObj.getName()}\" + +${mqcFileObj. + bytes. + encodeBase64(). + toString(). + tokenize( '\n' )*. + toList()*. + collate( 76 )*. + collect { it.join() }. + flatten(). 
+    join( '\n' )}
+"""
+}}
+%>
+
+--nfcoremimeboundary--
diff --git a/bin/markdown_to_html.r b/bin/markdown_to_html.r
new file mode 100755
index 0000000..abe1335
--- /dev/null
+++ b/bin/markdown_to_html.r
@@ -0,0 +1,51 @@
+#!/usr/bin/env Rscript
+
+# Command line argument processing
+args = commandArgs(trailingOnly=TRUE)
+if (length(args) < 2) {
+    stop("Usage: markdown_to_html.r <input.md> <output.html>", call.=FALSE)
+}
+markdown_fn <- args[1]
+output_fn <- args[2]
+
+# Load / install packages
+if (!require("markdown")) {
+    install.packages("markdown", dependencies=TRUE, repos='http://cloud.r-project.org/')
+    library("markdown")
+}
+
+base_css_fn <- getOption("markdown.HTML.stylesheet")
+base_css <- readChar(base_css_fn, file.info(base_css_fn)$size)
+custom_css <- paste(base_css, "
+body {
+  padding: 3em;
+  margin-right: 350px;
+  max-width: 100%;
+}
+#toc {
+  position: fixed;
+  right: 20px;
+  width: 300px;
+  padding-top: 20px;
+  overflow: scroll;
+  height: calc(100% - 3em - 20px);
+}
+#toc_header {
+  font-size: 1.8em;
+  font-weight: bold;
+}
+#toc > ul {
+  padding-left: 0;
+  list-style-type: none;
+}
+#toc > ul ul { padding-left: 20px; }
+#toc > ul > li > a { display: none; }
+img { max-width: 800px; }
+")
+
+markdownToHTML(
+  file = markdown_fn,
+  output = output_fn,
+  stylesheet = custom_css,
+  options = c('toc', 'base64_images', 'highlight_code')
+)
diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py
new file mode 100755
index 0000000..ca2b897
--- /dev/null
+++ b/bin/scrape_software_versions.py
@@ -0,0 +1,52 @@
+#!/usr/bin/env python
+from __future__ import print_function
+from collections import OrderedDict
+import re
+
+# TODO nf-core: Add additional regexes for new tools in process get_software_versions
+regexes = {
+    'nf-core/proteomicslfq': ['v_pipeline.txt', r"(\S+)"],
+    'Nextflow': ['v_nextflow.txt', r"(\S+)"],
+    'FastQC': ['v_fastqc.txt', r"FastQC v(\S+)"],
+    'MultiQC': ['v_multiqc.txt', r"multiqc, version (\S+)"],
+}
+results = OrderedDict()
+results['nf-core/proteomicslfq'] = 'N/A'
+results['Nextflow'] = 'N/A'
+results['FastQC'] = 'N/A'
+results['MultiQC'] = 'N/A'
+
+# Search each file using its regex
+for k, v in regexes.items():
+    try:
+        with open(v[0]) as x:
+            versions = x.read()
+            match = re.search(v[1], versions)
+            if match:
+                results[k] = "v{}".format(match.group(1))
+    except IOError:
+        results[k] = False
+
+# Remove software set to false in results
+# (iterate over a copy of the keys so deletion is safe in both Python 2 and 3)
+for k in list(results):
+    if not results[k]:
+        del(results[k])
+
+# Dump to YAML
+print ('''
+id: 'software_versions'
+section_name: 'nf-core/proteomicslfq Software Versions'
+section_href: 'https://github.com/nf-core/proteomicslfq'
+plot_type: 'html'
+description: 'are collected at run time from the software output.'
+data: |
+    <dl class="dl-horizontal">
+''')
+for k,v in results.items():
+    print("        <dt>{}</dt><dd><samp>{}</samp></dd>".format(k,v))
+print ("    </dl>")
+
+# Write out regexes as csv file:
+with open('software_versions.csv', 'w') as f:
+    for k,v in results.items():
+        f.write("{}\t{}\n".format(k,v))
diff --git a/conf/awsbatch.config b/conf/awsbatch.config
new file mode 100644
index 0000000..14af586
--- /dev/null
+++ b/conf/awsbatch.config
@@ -0,0 +1,18 @@
+/*
+ * -------------------------------------------------
+ *  Nextflow config file for running on AWS batch
+ * -------------------------------------------------
+ * Base config needed for running with -profile awsbatch
+ */
+params {
+  config_profile_name = 'AWSBATCH'
+  config_profile_description = 'AWSBATCH Cloud Profile'
+  config_profile_contact = 'Alexander Peltzer (@apeltzer)'
+  config_profile_url = 'https://aws.amazon.com/de/batch/'
+}
+
+aws.region = params.awsregion
+process.executor = 'awsbatch'
+process.queue = params.awsqueue
+executor.awscli = '/home/ec2-user/miniconda/bin/aws'
+params.tracedir = './'
diff --git a/conf/base.config b/conf/base.config
new file mode 100644
index 0000000..b3b69b0
--- /dev/null
+++ b/conf/base.config
@@ -0,0 +1,58 @@
+/*
+ * -------------------------------------------------
+ *  nf-core/proteomicslfq Nextflow base config file
+ * -------------------------------------------------
+ * A 'blank slate' config file, appropriate for general
+ * use on most high performance compute environments.
+ * Assumes that all software is installed and available
+ * on the PATH. Runs in `local` mode - all jobs will be
+ * run on the logged in environment.
+ */
+
+process {
+
+  // TODO nf-core: Check the defaults for all processes
+  cpus = { check_max( 1 * task.attempt, 'cpus' ) }
+  memory = { check_max( 7.GB * task.attempt, 'memory' ) }
+  time = { check_max( 4.h * task.attempt, 'time' ) }
+
+  errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' }
+  maxRetries = 1
+  maxErrors = '-1'
+
+  // Process-specific resource requirements
+  // NOTE - Only one of the labels below is used in the fastqc process in the main script.
+  // If possible, it would be nice to keep the same label naming convention when
+  // adding in your processes.
+  // TODO nf-core: Customise requirements for specific processes.
+  // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors
+  withLabel:process_low {
+    cpus = { check_max( 2 * task.attempt, 'cpus' ) }
+    memory = { check_max( 14.GB * task.attempt, 'memory' ) }
+    time = { check_max( 6.h * task.attempt, 'time' ) }
+  }
+  withLabel:process_medium {
+    cpus = { check_max( 6 * task.attempt, 'cpus' ) }
+    memory = { check_max( 42.GB * task.attempt, 'memory' ) }
+    time = { check_max( 8.h * task.attempt, 'time' ) }
+  }
+  withLabel:process_high {
+    cpus = { check_max( 12 * task.attempt, 'cpus' ) }
+    memory = { check_max( 84.GB * task.attempt, 'memory' ) }
+    time = { check_max( 10.h * task.attempt, 'time' ) }
+  }
+  withLabel:process_long {
+    time = { check_max( 20.h * task.attempt, 'time' ) }
+  }
+  withName:get_software_versions {
+    cache = false
+  }
+}
+
+params {
+  // Defaults only, expecting to be overwritten
+  max_memory = 128.GB
+  max_cpus = 16
+  max_time = 240.h
+  igenomes_base = 's3://ngi-igenomes/igenomes/'
+}
diff --git a/conf/igenomes.config b/conf/igenomes.config
new file mode 100644
index 0000000..392f250
--- /dev/null
+++ b/conf/igenomes.config
@@ -0,0 +1,192 @@
+/*
+ * -------------------------------------------------
+ *  Nextflow config file for iGenomes paths
+ * -------------------------------------------------
+ * Defines reference genomes, using iGenome paths
+ * Can be used by any config that customises the base
+ * path using $params.igenomes_base / --igenomes_base
+ */
+
+params {
+  // illumina iGenomes reference file paths
+  // TODO nf-core: Add new reference types and strip out those that are not needed
+  genomes {
+    'GRCh37' {
+      bed12   = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed"
+      fasta   = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa"
+      gtf     = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf"
+      star    = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/"
+      bowtie2 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/"
+      bwa     = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/"
+    }
+    'GRCm38' {
+      bed12   = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed"
+      fasta   = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa"
+      gtf     = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf"
+      star    = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex/"
+      bowtie2 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/"
+      bwa     = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/"
+    }
+    'TAIR10' {
+      bed12   = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.bed"
+      fasta   = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa"
+      gtf     = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.gtf"
+      star    = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/STARIndex/"
+      bowtie2 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/Bowtie2Index/"
+      bwa     = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BWAIndex/"
+    }
+    'EB2' {
+      bed12   = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.bed"
+      fasta   = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta/genome.fa"
+      gtf     =
"${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BWAIndex/" + } + 'UMD3.1' { + bed12 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa" + gtf = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BWAIndex/" + + } + 'WBcel235' { + bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta/genome.fa" + gtf = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BWAIndex/" + } + 'CanFam3.1' { + bed12 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta/genome.fa" + gtf = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BWAIndex/" + } + 'GRCz10' { + bed12 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta/genome.fa" + gtf = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BWAIndex/" + } + 'BDGP6' { + bed12 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta/genome.fa" + gtf = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BWAIndex/" + } + 'EquCab2' { + bed12 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta/genome.fa" + gtf = 
"${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BWAIndex/" + } + 'EB1' { + bed12 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta/genome.fa" + gtf = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BWAIndex/" + } + 'Galgal4' { + bed12 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa" + gtf = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BWAIndex/" + } + 'Gm01' { + bed12 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa" + gtf = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BWAIndex/" + } + 'Mmul_1' { + bed12 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa" + gtf = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BWAIndex/" + } + 'IRGSP-1.0' { + bed12 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta/genome.fa" + gtf = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BWAIndex/" + } + 'CHIMP2.1.4' { + bed12 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta/genome.fa" + gtf = 
"${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BWAIndex/" + } + 'Rnor_6.0' { + bed12 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa" + gtf = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BWAIndex/" + } + 'R64-1-1' { + bed12 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta/genome.fa" + gtf = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BWAIndex/" + } + 'EF2' { + bed12 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta/genome.fa" + gtf = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BWAIndex/" + } + 'Sbi1' { + bed12 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta/genome.fa" + gtf = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BWAIndex/" + } + 'Sscrofa10.2' { + bed12 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta/genome.fa" + gtf = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BWAIndex/" + } + 'AGPv3' { + bed12 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.bed" + fasta = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta/genome.fa" + gtf = 
"${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.gtf" + star = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/STARIndex/" + bowtie2 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/" + bwa = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BWAIndex/" + } + } +} diff --git a/conf/test.config b/conf/test.config new file mode 100644 index 0000000..08c1bf8 --- /dev/null +++ b/conf/test.config @@ -0,0 +1,26 @@ +/* + * ------------------------------------------------- + * Nextflow config file for running tests + * ------------------------------------------------- + * Defines bundled input files and everything required + * to run a fast and simple test. Use as follows: + * nextflow run nf-core/proteomicslfq -profile test + */ + +params { + config_profile_name = 'Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function' + // Limit resources so that this can run on Travis + max_cpus = 2 + max_memory = 6.GB + max_time = 48.h + + // Input data + // TODO nf-core: Specify the paths to your test data on nf-core/test-datasets + // TODO nf-core: Give any required params for the test so that command line flags are not needed + singleEnd = false + readPaths = [ + ['Testdata', ['https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R1.tiny.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R2.tiny.fastq.gz']], + ['SRR389222', ['https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub2.fastq.gz']] + ] +} diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..010beba --- /dev/null +++ b/docs/README.md @@ -0,0 +1,12 @@ +# nf-core/proteomicslfq: Documentation + +The nf-core/proteomicslfq documentation is split into the following files: + +1. [Installation](https://nf-co.re/usage/installation) +2. Pipeline configuration + * [Local installation](https://nf-co.re/usage/local_installation) + * [Adding your own system config](https://nf-co.re/usage/adding_own_config) + * [Reference genomes](https://nf-co.re/usage/reference_genomes) +3. [Running the pipeline](usage.md) +4. [Output and how to interpret the results](output.md) +5. 
[Troubleshooting](https://nf-co.re/usage/troubleshooting) diff --git a/docs/images/nf-core-proteomicslfq_logo.png b/docs/images/nf-core-proteomicslfq_logo.png new file mode 100644 index 0000000000000000000000000000000000000000..465ca25d947e6dce2b026d8c77f805aa39aceba5 GIT binary patch literal 20821 zcmdSB1ydYt*9M3aTnBduKDY);aCdiich}(V4#7jP!QI^h1ed`HfkA_1kmY&5-5;@4 zZB<=;`t-5euB&5Jm1QtcNl;;6U@+ulCDmbI;B?=w7e64qAF~$w6y8rLuCjU_Fff=S z|Lw4uESRJ)FjO#dl3z4^3(gAzLdmo;uR_|ps+JUoIKE5g)z*^?9J#x_YYhHFQ~E*lGY)`v9}OE8M^Earq;V9sxU4v?I4->zSMF&& z^u_H*S51I_B~)8S?Bw|y7q?Mzz>ls!E1f3=0ho+&rT>4}78w<55LNM|)adyc^-hlj~}Y*X81-il6JNmq|ngdbyy=b`68~he-FLglYxpgP zA^ZC!9{>#zSmA^n>UvUZQG)fuO*qD24$%wVMs|N@Q-(ZWub?zA@}@br!n{8;HDY!; zh^#z4%nu}U8AOwL3d7YRyU!nD->bKyx(%F{SZ1lfQ$rJ2sk1kk@=OHA?0*>Ah_n&tvN@9r*%al~Tj6*mDg2H*iYej6eWG z)Uh%om9FHx3~DDeR>A6yLOOv98DC@xAP$9vFXroeJ2Ierm8sXdVTog@)<*mG?rBec z1x(?e3dHqoYuTg;F1vVR^C7O65roc9Pfj7G%VUYc_rEk|@`bHxX8+cX<%DpK2|MiO z6Db??d#qwS2zM>I-1#xap^+TczYF6@NXHV*sk#O`?m@rIEVr}n5Vwmxno32UelJU+ zX*Ua6fxl64BKtSIQm2%qtCse^FUUAPSOzKS=sIRmv`|?j`u!XdhCd672zlEb-qZ(X zm-3J#4ywN!hm8_(M{v%cvIF7K-ebg$>`mB*jmSf|)X;2qzXD6N2l7xyps1D?`af6A zmSOpdaRe1OA%&Un$rOyG6>_aNKWz!swGvr)>vnkS)kLai(3hZUqveD=2@y^C9U?`o z-<4AMM>}r7boN|c%0B#qUaZMEyC+4zL7;U6T51`Ve%uEvI#g53f5ze$eS4l|#d0dX z^hFA_U2e6OEORl*H>)Dn$n{)J`J`*&;p`Lui zmpXM!LLksyKQ`W-qn}uG&4O*$gK`aU6V)E=1ZdYgHfqK#3TqD9V=K5Ajcq{A{+U6l zt{}O(O$6~cL8y4o<_W2J24nb{tWT5{gSM4iQl|tv5_jtn*>FawinkZTn8TPDY^bk) z!?fx$WYqE!nRPLCtwK6NTMbs(hHfMRh1grv!x)wK*wv?gP1%%uoR(~R4_Jd7G%_8l ze2jrsrIZRLM^b;b!l9XFU3BXY=ULj*<&3kEfZDHmo=b?1fycSD1FuL5VlkP_wP=)9 znoLVqF0;y&jN4;^pDU;Mvf(BE6XE+4O%bpuJWQ|z$P$#Q1|$k7=7bH}SergJw*HN- zWOQKCtvz%xcrIAhwmHZfJ^t^f55J}Ag{2S1%lk7KPzOmYX)bcVp87j{1B(U|%2?ci&T ze$c15{RC!Rl*V~16(ozffp+2#8I0&^v-XOuOt{~y;>puW84Kste^gm_pgn3FNXf-m zO`G21fvinn?oB$7#ERwfDZ~k1H&?m%!cBVGhgM{`E9}#>k~x03DGgpQPmAJ}sMiM&;l6 z!qmsFhl=wGPb#_&Hd1w(qr~B*X1TU40d|Mo$%Km3Ei28s2`5g|RkODgA=~`CSXrf> z^kT?Kls3Is1xSWVWN!P7Cm$IW#e!7k&p3nVoe0cyV~!$CPt0?ZS~WQ7nDpI z7dvl=uMdh-fq6L3+6C#Qj9wlD%g{9zJBgz|DWsw!Kd%OacWDCOW*;pTnw~P}2RT!{ zA}Pa`cZi~@l<%V+^3t{&-)>*FS47g!g3QzTj~4EB?%^K>^H9ez@HnHRng7-x-MsQx z%vkyEoSgte9yXzT{`r?8XZa{5uu|2O@9AQ?eS2@t?|>D>@!9BjHYQRi!H|7>9LD!p z}tuUDmtn-h`sLx zKdbNR_IFD`V-V%hi;+MW!z0y^pU^|@F~r}vU(9N>Ywy?G-n$vo_;=hYNP!fqQKV1z zy27`c5Nj3*H|oS)H&E}U!!6##yHs~ zq6x}j=~-0tDw0F2}-7NwxY*Ws^YChDC)+;jw(G7q}tTTpDh}aZejYhMd_#t4vlDnec%MfVX@gA{zW0Fhfou~ zFIjmx+3VZzY-0>U-T)N5$pTY(uY&LjyMmM*we&!FvF}?KzJq@ONiGoFbUEOc+K9MC z_YInfpgpasp&f2N=aZ#gW0ilZXbM*lg+^95A=0Z_lOoI@&!i07$WmxNvuELaVSV4r36s0^o z5$gku181Sxv{Raf+Hl42eG0NpPS;OGi{R5Z6iE2~&IwyPe7?!E65ZvuA-D253XZR!11~e;`BYFa zW6L7x1FVnnBQw}W!sWcY=pWG45gyO%Uqi{SuxAszULY)Qa$|xM>b_ z>=N#rKDpvvX^(K_lrL`}zY_D0S~I&+b4@$Thc_LDI;29YwKVRbVdF$0DI738*o`0B zicE^8g+m^r`A&-Wn+z&q!w(wBe3E@3l5Jre?cdf-&ECe9J4zw$@l1Hhm0cgIjNF8) zMeeO15^~?R>Yb|01W?ZAhZ&4ixoE5z8%bnDu|TO#3K;rUZEgk_c5(6>CTmFqCEzPutwa0pv@Pn!vvw#c-re z?C2Loc*f^viAOGo*R@U4gk$Ce+MYlLgyib!x5ambNB*qnSZ!K2;>*)V#iQN%Bbv1o!BG4lvxX5Z6byU$ z{Z`;MloBQcR=G+fg9R(~h@VHgGmZ=I5n%4+NZ(p%ZJWP)-`5a~uyjBk;ZpUr6Ngpu zpB5vd1m7dsQ*F5k+h4oUm(nv3pMW)pJu;d0#n%$-x+$B#@CuHq2Gd@Kh8OIb&Fp9I z1Ufdem6ayQE#cR~d!HvoJ{)Ww)VXl;`G$w9A@7@IGfd$Z5t0%g;kRc`akrW7rOmVQ z{nJ!8qDR!jZr2@j;t*p@tk zFfYa|-T>%AJIUo}-PA5^)u6ot{5WTNWXfN{e}WYc4D;Ls2p^G3d-x)ZccC%7HF{(b zjFSg$Larh2O=s-7NX_7e4s7J>LXNo@F@!}y1U2Jxc>Z?UK<<*V-eWDt?CN)NGY(= zo8{QP&X>km(>R|?!06C%@z)=m|H{QL$$jyI9CFWzLz+$|Ju;lj~nA;N$%-zuRIFgpk}}LHuC+WWWSKt3=UL$ROi8qqA>Pn>kY~ z+g{l~w?0r3kOaH4YEKQf81s=OmN>r764srfP9B+?+-;J#tJolP<|}!!R*RZFQfP(@ z``XhAK`(NoDBth+S|F4xZp_zWZTr@DbLZvN>(&O!Pu4>D-@Zx#N)Fj7>V_J3gqW7F 
diff --git a/docs/output.md b/docs/output.md new file mode 100644 index 0000000..c14ff13 --- /dev/null +++ b/docs/output.md @@ -0,0 +1,41 @@ +# nf-core/proteomicslfq: Output + +This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline. + + + +## Pipeline overview +The pipeline is built using [Nextflow](https://www.nextflow.io/) +and processes data using the following steps: + +* [FastQC](#fastqc) - read quality control +* [MultiQC](#multiqc) - aggregate report, describing results of the whole pipeline + +## FastQC +[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your reads. It provides information about the quality score distribution across your reads and the per base sequence content (%T/A/G/C), as well as about adapter contamination and other overrepresented sequences. + +For further reading and documentation see the [FastQC help](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/). + +> **NB:** The FastQC plots displayed in the MultiQC report show _untrimmed_ reads. They may contain adapter sequences and potentially regions with low quality. To see how your reads look after trimming, look at the FastQC reports in the `trim_galore` directory.
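+ +The per-sample FastQC archives written by the pipeline (see the output listing below) can be unpacked to inspect the raw metrics directly; the sample name in this sketch is illustrative: + +```bash +# unpack one FastQC archive and pull a metric from its data table +unzip -o results/fastqc/zips/sample_fastqc.zip -d fastqc_unpacked +grep 'Total Sequences' fastqc_unpacked/sample_fastqc/fastqc_data.txt +```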
+ +**Output directory: `results/fastqc`** + +* `sample_fastqc.html` + * FastQC report, containing quality metrics for your untrimmed raw fastq files +* `zips/sample_fastqc.zip` + * zip file containing the FastQC report, tab-delimited data file and plot images + + +## MultiQC +[MultiQC](http://multiqc.info) is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available within the report data directory. + +The pipeline has special steps which allow the software versions used to be reported in the MultiQC output for future traceability. + +**Output directory: `results/multiqc`** + +* `Project_multiqc_report.html` + * MultiQC report - a standalone HTML file that can be viewed in your web browser +* `Project_multiqc_data/` + * Directory containing parsed statistics from the different tools used in the pipeline + +For more information about how to use MultiQC reports, see [http://multiqc.info](http://multiqc.info) diff --git a/docs/usage.md b/docs/usage.md new file mode 100644 index 0000000..e6686bf --- /dev/null +++ b/docs/usage.md @@ -0,0 +1,286 @@ +# nf-core/proteomicslfq: Usage + +## Table of contents + + + +* [Table of contents](#table-of-contents) +* [Introduction](#introduction) +* [Running the pipeline](#running-the-pipeline) + * [Updating the pipeline](#updating-the-pipeline) + * [Reproducibility](#reproducibility) +* [Main arguments](#main-arguments) + * [`-profile`](#-profile) + * [`--reads`](#--reads) + * [`--singleEnd`](#--singleend) +* [Reference genomes](#reference-genomes) + * [`--genome` (using iGenomes)](#--genome-using-igenomes) + * [`--fasta`](#--fasta) + * [`--igenomesIgnore`](#--igenomesignore) +* [Job resources](#job-resources) + * [Automatic resubmission](#automatic-resubmission) + * [Custom resource requests](#custom-resource-requests) +* [AWS Batch specific parameters](#aws-batch-specific-parameters) + * [`--awsqueue`](#--awsqueue) + * [`--awsregion`](#--awsregion) +* [Other command line parameters](#other-command-line-parameters) + * [`--outdir`](#--outdir) + * [`--email`](#--email) + * [`--email_on_fail`](#--email_on_fail) + * [`-name`](#-name) + * [`-resume`](#-resume) + * [`-c`](#-c) + * [`--custom_config_version`](#--custom_config_version) + * [`--custom_config_base`](#--custom_config_base) + * [`--max_memory`](#--max_memory) + * [`--max_time`](#--max_time) + * [`--max_cpus`](#--max_cpus) + * [`--plaintext_email`](#--plaintext_email) + * [`--monochrome_logs`](#--monochrome_logs) + * [`--multiqc_config`](#--multiqc_config) + + + +## Introduction +Nextflow handles job submissions on SLURM or other environments, and supervises running the jobs. Thus the Nextflow process must run until the pipeline is finished. We recommend that you run this process in the background through `screen` / `tmux` or a similar tool. Alternatively, you can run Nextflow within a cluster job submitted to your job scheduler. + +It is recommended to limit the Nextflow Java virtual machine's memory. We recommend adding the following line to your environment (typically in `~/.bashrc` or `~/.bash_profile`): + +```bash +export NXF_OPTS='-Xms1g -Xmx4g' +``` + + + +## Running the pipeline +The typical command for running the pipeline is as follows: + +```bash +nextflow run nf-core/proteomicslfq --reads '*_R{1,2}.fastq.gz' -profile docker +``` + +This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
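+ +As a sketch (the release tag here is illustrative - see [Reproducibility](#reproducibility) below), the same launch can pin a pipeline version, and `-resume` reuses cached results from a previous run: + +```bash +nextflow run nf-core/proteomicslfq -r 1.0.0 -profile docker \
+    --reads '*_R{1,2}.fastq.gz' -resume +```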
+ +Note that the pipeline will create the following files in your working directory: + +```bash +work # Directory containing the nextflow working files +results # Finished results (configurable, see below) +.nextflow.log # Log file from Nextflow +# Other nextflow hidden files, e.g. history of pipeline runs and old logs. +``` + +### Updating the pipeline +When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: + +```bash +nextflow pull nf-core/proteomicslfq +``` + +### Reproducibility +It's a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. + +First, go to the [nf-core/proteomicslfq releases page](https://github.com/nf-core/proteomicslfq/releases) and find the latest version number - numeric only (e.g. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - e.g. `-r 1.3.1`. + +This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. + + +## Main arguments + +### `-profile` +Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. Note that multiple profiles can be loaded as a comma-separated list, for example: `-profile test,docker` - the order of arguments is important! + +If `-profile` is not specified at all, the pipeline runs locally and expects all software to be installed and available on the `PATH`. + +* `awsbatch` + * A generic configuration profile to be used with AWS Batch. +* `conda` + * A generic configuration profile to be used with [conda](https://conda.io/docs/) + * Pulls most software from [Bioconda](https://bioconda.github.io/) +* `docker` + * A generic configuration profile to be used with [Docker](http://docker.com/) + * Pulls software from Docker Hub: [`nfcore/proteomicslfq`](http://hub.docker.com/r/nfcore/proteomicslfq/) +* `singularity` + * A generic configuration profile to be used with [Singularity](http://singularity.lbl.gov/) + * Pulls software from Docker Hub: [`nfcore/proteomicslfq`](http://hub.docker.com/r/nfcore/proteomicslfq/) +* `test` + * A profile with a complete configuration for automated testing + * Includes links to test data so needs no other parameters + + + +### `--reads` +Use this to specify the location of your input FastQ files. For example: + +```bash +--reads 'path/to/data/sample_*_{1,2}.fastq' +``` + +Please note the following requirements: + +1. The path must be enclosed in quotes +2. The path must have at least one `*` wildcard character +3. When using the pipeline with paired-end data, the path must use `{1,2}` notation to specify read pairs. + +If left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz` + +### `--singleEnd` +By default, the pipeline expects paired-end data. If you have single-end data, you need to specify `--singleEnd` on the command line when you launch the pipeline. A normal glob pattern, enclosed in quotation marks, can then be used for `--reads`. 
For example: + +```bash +--singleEnd --reads '*.fastq' +``` + +It is not possible to run a mixture of single-end and paired-end files in one run. + + +## Reference genomes + +The pipeline config files come bundled with paths to the Illumina iGenomes reference index files. If running with Docker or AWS, the configuration is set up to use the [AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/) resource. + +### `--genome` (using iGenomes) +There are 31 different species supported in the iGenomes references. To run the pipeline, you must specify which to use with the `--genome` flag. + +You can find the keys to specify the genomes in the [iGenomes config file](../conf/igenomes.config). Common genomes that are supported are: + +* Human + * `--genome GRCh37` +* Mouse + * `--genome GRCm38` +* _Drosophila_ + * `--genome BDGP6` +* _S. cerevisiae_ + * `--genome 'R64-1-1'` + +> There are numerous others - check the config file for more. + +Note that you can use the same configuration setup to save sets of reference files for your own use, even if they are not part of the iGenomes resource. See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for instructions on where to save such a file. + +The syntax for this reference configuration is as follows: + + + +```nextflow +params { + genomes { + 'GRCh37' { + fasta = '' // Used if no star index given + } + // Any number of additional genomes, key is used with --genome + } +} +``` + + +### `--fasta` +If you prefer, you can specify the full path to your reference genome when you run the pipeline: + +```bash +--fasta '[path to Fasta reference]' +``` + +### `--igenomesIgnore` +Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`. + +## Job resources +### Automatic resubmission +Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with an error code of `143` (exceeded requested resources) it will automatically resubmit with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline run is stopped. + +### Custom resource requests +Wherever process-specific requirements are set in the pipeline, the default value can be changed by creating a custom config file. See the files hosted at [`nf-core/configs`](https://github.com/nf-core/configs/tree/master/conf) for examples. + +If you are likely to be running `nf-core` pipelines regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this, please test that the config file works with your pipeline of choice using the `-c` parameter (see definition below). You can then create a pull request to the `nf-core/configs` repository that adds your config file, an associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and an amendment to [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile. + +If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack/). + +## AWS Batch specific parameters +Running the pipeline on AWS Batch requires a couple of specific parameters to be set according to your AWS Batch configuration.
Please use the `awsbatch` profile (`-profile awsbatch`) and then specify all of the following parameters. +### `--awsqueue` +The JobQueue that you intend to use on AWS Batch. +### `--awsregion` +The AWS region to run your job in. Default is set to `eu-west-1` but can be adjusted to your needs. + +Please make sure to also set the `-w/--work-dir` and `--outdir` parameters to an S3 storage bucket of your choice - you'll get an error message notifying you if you didn't. + +## Other command line parameters + + + +### `--outdir` +The output directory where the results will be saved. + +### `--email` +Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run. + +### `--email_on_fail` +This works exactly as with `--email`, except emails are only sent if the workflow is not successful. + +### `-name` +Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic. + +This is used in the MultiQC report (if not default) and in the summary HTML / e-mail (always). + +**NB:** Single hyphen (core Nextflow option) + +### `-resume` +Specify this when restarting a pipeline. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously. + +You can also supply a run name to resume a specific run: `-resume [run-name]`. Use the `nextflow log` command to show previous run names. + +**NB:** Single hyphen (core Nextflow option) + +### `-c` +Specify the path to a specific config file (this is a core Nextflow command). + +**NB:** Single hyphen (core Nextflow option) + +Note - you can use this to override pipeline defaults. + +### `--custom_config_version` +Provide a git commit id for custom institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. Default is set to `master`. + +```bash +## Download and use the config file with the following git commit id +--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96 +``` + +### `--custom_config_base` +If you're running offline, Nextflow will not be able to fetch the institutional config files +from the internet. If you don't need them, then this is not a problem. If you do need them, +you should download the files from the repo and tell Nextflow where to find them with the +`custom_config_base` option. For example: + +```bash +## Download and unzip the config files +cd /path/to/my/configs +wget https://github.com/nf-core/configs/archive/master.zip +unzip master.zip + +## Run the pipeline +cd /path/to/my/data +nextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/ +``` + +> Note that the nf-core/tools helper package has a `download` command to download all required pipeline +> files + singularity containers + institutional configs in one go for you, to make this process easier. + +### `--max_memory` +Use to set a top-limit for the default memory requirement for each process. +Should be a string in the format integer-unit, e.g. `--max_memory '8.GB'` + +### `--max_time` +Use to set a top-limit for the default time requirement for each process. +Should be a string in the format integer-unit, e.g. `--max_time '2.h'` + +### `--max_cpus` +Use to set a top-limit for the default CPU requirement for each process. +Should be an integer, e.g. 
`--max_cpus 1` + +### `--plaintext_email` +Set to receive plain-text e-mails instead of HTML-formatted ones. + +### `--monochrome_logs` +Set to disable colourful command line output and live life in monochrome. + +### `--multiqc_config` +Specify a path to a custom MultiQC configuration file. diff --git a/environment.yml b/environment.yml new file mode 100644 index 0000000..d3a78f6 --- /dev/null +++ b/environment.yml @@ -0,0 +1,13 @@ +# You can use this file to create a conda environment for this pipeline: +# conda env create -f environment.yml +name: nf-core-proteomicslfq-1.0dev +channels: + - conda-forge + - bioconda + - defaults +dependencies: + # TODO nf-core: Add required software dependencies here + - bioconda::fastqc=0.11.8 + - bioconda::multiqc=1.7 + - conda-forge::r-markdown=1.1 + - conda-forge::r-base=3.6.1 diff --git a/main.nf b/main.nf new file mode 100644 index 0000000..002fd2b --- /dev/null +++ b/main.nf @@ -0,0 +1,421 @@ +#!/usr/bin/env nextflow +/* +======================================================================================== + nf-core/proteomicslfq +======================================================================================== + nf-core/proteomicslfq Analysis Pipeline. + #### Homepage / Documentation + https://github.com/nf-core/proteomicslfq +---------------------------------------------------------------------------------------- +*/ + +def helpMessage() { + // TODO nf-core: Add to this help message with new command line parameters + log.info nfcoreHeader() + log.info""" + + Usage: + + The typical command for running the pipeline is as follows: + + nextflow run nf-core/proteomicslfq --reads '*_R{1,2}.fastq.gz' -profile docker + + Mandatory arguments: + --reads Path to input data (must be surrounded with quotes) + -profile Configuration profile to use. Can use multiple (comma separated) + Available: conda, docker, singularity, awsbatch, test and more. + + Options: + --genome Name of iGenomes reference + --singleEnd Specifies that the input is single-end reads + + References If not specified in the configuration file or you wish to overwrite any of the references. + --fasta Path to Fasta reference + + Other options: + --outdir The output directory where the results will be saved + --email Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits + --email_on_fail Same as --email, except only send mail if the workflow is not successful + --maxMultiqcEmailFileSize Threshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: 25MB) + -name Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic. + + AWSBatch options: + --awsqueue The AWSBatch JobQueue that needs to be set when running on AWSBatch + --awsregion The AWS Region for your AWS Batch job to run on + """.stripIndent() +} + +// Show help message +if (params.help) { + helpMessage() + exit 0 +} + +/* + * SET UP CONFIGURATION VARIABLES + */ + +// Check if genome exists in the config file +if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) { + exit 1, "The provided genome '${params.genome}' is not available in the iGenomes file. 
Currently the available genomes are ${params.genomes.keySet().join(", ")}" +} + +// TODO nf-core: Add any reference files that are needed +// Configurable reference genomes +// +// NOTE - THIS IS NOT USED IN THIS PIPELINE, EXAMPLE ONLY +// If you want to use the channel below in a process, define the following: +// input: +// file fasta from ch_fasta +// +params.fasta = params.genome ? params.genomes[ params.genome ].fasta ?: false : false +if (params.fasta) { ch_fasta = file(params.fasta, checkIfExists: true) } + +// Has the run name been specified by the user? +// this has the bonus effect of catching both -name and --name +custom_runName = params.name +if (!(workflow.runName ==~ /[a-z]+_[a-z]+/)) { + custom_runName = workflow.runName +} + +if ( workflow.profile == 'awsbatch') { + // AWSBatch sanity checking + if (!params.awsqueue || !params.awsregion) exit 1, "Specify correct --awsqueue and --awsregion parameters on AWSBatch!" + // Check that the outdir path is an S3 bucket if running on AWSBatch + // related: https://github.com/nextflow-io/nextflow/issues/813 + if (!params.outdir.startsWith('s3:')) exit 1, "Outdir not on S3 - specify S3 Bucket to run on AWSBatch!" + // Prevent trace files from being stored on S3, since S3 does not support rolling files. + if (workflow.tracedir.startsWith('s3:')) exit 1, "Specify a local tracedir or run without trace! S3 cannot be used for tracefiles." +} + +// Stage config files +ch_multiqc_config = file(params.multiqc_config, checkIfExists: true) +ch_output_docs = file("$baseDir/docs/output.md", checkIfExists: true) + +/* + * Create a channel for input read files + */ +if (params.readPaths) { + if (params.singleEnd) { + Channel + .from(params.readPaths) + .map { row -> [ row[0], [ file(row[1][0], checkIfExists: true) ] ] } + .ifEmpty { exit 1, "params.readPaths was empty - no input files supplied" } + .into { read_files_fastqc; read_files_trimming } + } else { + Channel + .from(params.readPaths) + .map { row -> [ row[0], [ file(row[1][0], checkIfExists: true), file(row[1][1], checkIfExists: true) ] ] } + .ifEmpty { exit 1, "params.readPaths was empty - no input files supplied" } + .into { read_files_fastqc; read_files_trimming } + } +} else { + Channel + .fromFilePairs( params.reads, size: params.singleEnd ? 1 : 2 ) + .ifEmpty { exit 1, "Cannot find any reads matching: ${params.reads}\nNB: Path needs to be enclosed in quotes!\nIf this is single-end data, please specify --singleEnd on the command line." } + .into { read_files_fastqc; read_files_trimming } +} + +// Header log info +log.info nfcoreHeader() +def summary = [:] +if (workflow.revision) summary['Pipeline Release'] = workflow.revision +summary['Run Name'] = custom_runName ?: workflow.runName +// TODO nf-core: Report custom parameters here +summary['Reads'] = params.reads +summary['Fasta Ref'] = params.fasta +summary['Data Type'] = params.singleEnd ? 
'Single-End' : 'Paired-End' +summary['Max Resources'] = "$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job" +if (workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" +summary['Output dir'] = params.outdir +summary['Launch dir'] = workflow.launchDir +summary['Working dir'] = workflow.workDir +summary['Script dir'] = workflow.projectDir +summary['User'] = workflow.userName +if (workflow.profile == 'awsbatch') { + summary['AWS Region'] = params.awsregion + summary['AWS Queue'] = params.awsqueue +} +summary['Config Profile'] = workflow.profile +if (params.config_profile_description) summary['Config Description'] = params.config_profile_description +if (params.config_profile_contact) summary['Config Contact'] = params.config_profile_contact +if (params.config_profile_url) summary['Config URL'] = params.config_profile_url +if (params.email || params.email_on_fail) { + summary['E-mail Address'] = params.email + summary['E-mail on failure'] = params.email_on_fail + summary['MultiQC maxsize'] = params.maxMultiqcEmailFileSize +} +log.info summary.collect { k,v -> "${k.padRight(18)}: $v" }.join("\n") +log.info "-\033[2m--------------------------------------------------\033[0m-" + +// Check the hostnames against configured profiles +checkHostname() + +def create_workflow_summary(summary) { + def yaml_file = workDir.resolve('workflow_summary_mqc.yaml') + yaml_file.text = """ + id: 'nf-core-proteomicslfq-summary' + description: " - this information is collected when the pipeline is started." + section_name: 'nf-core/proteomicslfq Workflow Summary' + section_href: 'https://github.com/nf-core/proteomicslfq' + plot_type: 'html' + data: | +
+        <dl class=\"dl-horizontal\">
+${summary.collect { k,v -> "            <dt>$k</dt><dd><samp>${v ?: '<span style=\"color:#999999;\">N/A</span>'}</samp></dd>" }.join("\n")}
+        </dl>
+    """.stripIndent() + + return yaml_file +} + +/* + * Parse software version numbers + */ +process get_software_versions { + publishDir "${params.outdir}/pipeline_info", mode: 'copy', + saveAs: { filename -> + if (filename.indexOf(".csv") > 0) filename + else null + } + + output: + file 'software_versions_mqc.yaml' into software_versions_yaml + file "software_versions.csv" + + script: + // TODO nf-core: Get all tools to print their version number here + """ + echo $workflow.manifest.version > v_pipeline.txt + echo $workflow.nextflow.version > v_nextflow.txt + fastqc --version > v_fastqc.txt + multiqc --version > v_multiqc.txt + scrape_software_versions.py &> software_versions_mqc.yaml + """ +} + +/* + * STEP 1 - FastQC + */ +process fastqc { + tag "$name" + label 'process_medium' + publishDir "${params.outdir}/fastqc", mode: 'copy', + saveAs: { filename -> filename.indexOf(".zip") > 0 ? "zips/$filename" : "$filename" } + + input: + set val(name), file(reads) from read_files_fastqc + + output: + file "*_fastqc.{zip,html}" into fastqc_results + + script: + """ + fastqc --quiet --threads $task.cpus $reads + """ +} + +/* + * STEP 2 - MultiQC + */ +process multiqc { + publishDir "${params.outdir}/MultiQC", mode: 'copy' + + input: + file multiqc_config from ch_multiqc_config + // TODO nf-core: Add in log files from your new processes for MultiQC to find! + file ('fastqc/*') from fastqc_results.collect().ifEmpty([]) + file ('software_versions/*') from software_versions_yaml.collect() + file workflow_summary from create_workflow_summary(summary) + + output: + file "*multiqc_report.html" into multiqc_report + file "*_data" + file "multiqc_plots" + + script: + rtitle = custom_runName ? "--title \"$custom_runName\"" : '' + rfilename = custom_runName ? "--filename " + custom_runName.replaceAll('\\W','_').replaceAll('_+','_') + "_multiqc_report" : '' + // TODO nf-core: Specify which MultiQC modules to use with -m for a faster run time + """ + multiqc -f $rtitle $rfilename --config $multiqc_config . 
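+    # Illustrative sketch (assumption), following the TODO above: once the set of
+    # reporting modules is settled, restricting MultiQC to them speeds up the run,
+    # for example: multiqc -f -m fastqc -m custom_content .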
+ """ +} + +/* + * STEP 3 - Output Description HTML + */ +process output_documentation { + publishDir "${params.outdir}/pipeline_info", mode: 'copy' + + input: + file output_docs from ch_output_docs + + output: + file "results_description.html" + + script: + """ + markdown_to_html.r $output_docs results_description.html + """ +} + +/* + * Completion e-mail notification + */ +workflow.onComplete { + + // Set up the e-mail variables + def subject = "[nf-core/proteomicslfq] Successful: $workflow.runName" + if (!workflow.success) { + subject = "[nf-core/proteomicslfq] FAILED: $workflow.runName" + } + def email_fields = [:] + email_fields['version'] = workflow.manifest.version + email_fields['runName'] = custom_runName ?: workflow.runName + email_fields['success'] = workflow.success + email_fields['dateComplete'] = workflow.complete + email_fields['duration'] = workflow.duration + email_fields['exitStatus'] = workflow.exitStatus + email_fields['errorMessage'] = (workflow.errorMessage ?: 'None') + email_fields['errorReport'] = (workflow.errorReport ?: 'None') + email_fields['commandLine'] = workflow.commandLine + email_fields['projectDir'] = workflow.projectDir + email_fields['summary'] = summary + email_fields['summary']['Date Started'] = workflow.start + email_fields['summary']['Date Completed'] = workflow.complete + email_fields['summary']['Pipeline script file path'] = workflow.scriptFile + email_fields['summary']['Pipeline script hash ID'] = workflow.scriptId + if (workflow.repository) email_fields['summary']['Pipeline repository Git URL'] = workflow.repository + if (workflow.commitId) email_fields['summary']['Pipeline repository Git Commit'] = workflow.commitId + if (workflow.revision) email_fields['summary']['Pipeline Git branch/tag'] = workflow.revision + if (workflow.container) email_fields['summary']['Docker image'] = workflow.container + email_fields['summary']['Nextflow Version'] = workflow.nextflow.version + email_fields['summary']['Nextflow Build'] = workflow.nextflow.build + email_fields['summary']['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp + + // TODO nf-core: If not using MultiQC, strip out this code (including params.maxMultiqcEmailFileSize) + // On success try attach the multiqc report + def mqc_report = null + try { + if (workflow.success) { + mqc_report = multiqc_report.getVal() + if (mqc_report.getClass() == ArrayList) { + log.warn "[nf-core/proteomicslfq] Found multiple reports from process 'multiqc', will use only one" + mqc_report = mqc_report[0] + } + } + } catch (all) { + log.warn "[nf-core/proteomicslfq] Could not attach MultiQC report to summary email" + } + + // Check if we are only sending emails on failure + email_address = params.email + if (!params.email && params.email_on_fail && !workflow.success) { + email_address = params.email_on_fail + } + + // Render the TXT template + def engine = new groovy.text.GStringTemplateEngine() + def tf = new File("$baseDir/assets/email_template.txt") + def txt_template = engine.createTemplate(tf).make(email_fields) + def email_txt = txt_template.toString() + + // Render the HTML template + def hf = new File("$baseDir/assets/email_template.html") + def html_template = engine.createTemplate(hf).make(email_fields) + def email_html = html_template.toString() + + // Render the sendmail template + def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, baseDir: "$baseDir", mqcFile: mqc_report, mqcMaxSize: params.maxMultiqcEmailFileSize.toBytes() ] + def sf = new 
File("$baseDir/assets/sendmail_template.txt") + def sendmail_template = engine.createTemplate(sf).make(smail_fields) + def sendmail_html = sendmail_template.toString() + + // Send the HTML e-mail + if (email_address) { + try { + if ( params.plaintext_email ){ throw GroovyException('Send plaintext e-mail, not HTML') } + // Try to send HTML e-mail using sendmail + [ 'sendmail', '-t' ].execute() << sendmail_html + log.info "[nf-core/proteomicslfq] Sent summary e-mail to $email_address (sendmail)" + } catch (all) { + // Catch failures and try with plaintext + [ 'mail', '-s', subject, email_address ].execute() << email_txt + log.info "[nf-core/proteomicslfq] Sent summary e-mail to $email_address (mail)" + } + } + + // Write summary e-mail HTML to a file + def output_d = new File( "${params.outdir}/pipeline_info/" ) + if (!output_d.exists()) { + output_d.mkdirs() + } + def output_hf = new File( output_d, "pipeline_report.html" ) + output_hf.withWriter { w -> w << email_html } + def output_tf = new File( output_d, "pipeline_report.txt" ) + output_tf.withWriter { w -> w << email_txt } + + c_reset = params.monochrome_logs ? '' : "\033[0m"; + c_purple = params.monochrome_logs ? '' : "\033[0;35m"; + c_green = params.monochrome_logs ? '' : "\033[0;32m"; + c_red = params.monochrome_logs ? '' : "\033[0;31m"; + + if (workflow.stats.ignoredCount > 0 && workflow.success) { + log.info "${c_purple}Warning, pipeline completed, but with errored process(es) ${c_reset}" + log.info "${c_red}Number of ignored errored process(es) : ${workflow.stats.ignoredCount} ${c_reset}" + log.info "${c_green}Number of successfully ran process(es) : ${workflow.stats.succeedCount} ${c_reset}" + } + + if (workflow.success) { + log.info "${c_purple}[nf-core/proteomicslfq]${c_green} Pipeline completed successfully${c_reset}" + } else { + checkHostname() + log.info "${c_purple}[nf-core/proteomicslfq]${c_red} Pipeline completed with errors${c_reset}" + } + +} + + +def nfcoreHeader(){ + // Log colors ANSI codes + c_reset = params.monochrome_logs ? '' : "\033[0m"; + c_dim = params.monochrome_logs ? '' : "\033[2m"; + c_black = params.monochrome_logs ? '' : "\033[0;30m"; + c_green = params.monochrome_logs ? '' : "\033[0;32m"; + c_yellow = params.monochrome_logs ? '' : "\033[0;33m"; + c_blue = params.monochrome_logs ? '' : "\033[0;34m"; + c_purple = params.monochrome_logs ? '' : "\033[0;35m"; + c_cyan = params.monochrome_logs ? '' : "\033[0;36m"; + c_white = params.monochrome_logs ? '' : "\033[0;37m"; + + return """ -${c_dim}--------------------------------------------------${c_reset}- + ${c_green},--.${c_black}/${c_green},-.${c_reset} + ${c_blue} ___ __ __ __ ___ ${c_green}/,-._.--~\'${c_reset} + ${c_blue} |\\ | |__ __ / ` / \\ |__) |__ ${c_yellow}} {${c_reset} + ${c_blue} | \\| | \\__, \\__/ | \\ |___ ${c_green}\\`-._,-`-,${c_reset} + ${c_green}`._,._,\'${c_reset} + ${c_purple} nf-core/proteomicslfq v${workflow.manifest.version}${c_reset} + -${c_dim}--------------------------------------------------${c_reset}- + """.stripIndent() +} + +def checkHostname(){ + def c_reset = params.monochrome_logs ? '' : "\033[0m" + def c_white = params.monochrome_logs ? '' : "\033[0;37m" + def c_red = params.monochrome_logs ? '' : "\033[1;91m" + def c_yellow_bold = params.monochrome_logs ? 
'' : "\033[1;93m" + if (params.hostnames) { + def hostname = "hostname".execute().text.trim() + params.hostnames.each { prof, hnames -> + hnames.each { hname -> + if (hostname.contains(hname) && !workflow.profile.contains(prof)) { + log.error "====================================================\n" + + " ${c_red}WARNING!${c_reset} You are running with `-profile $workflow.profile`\n" + + " but your machine hostname is ${c_white}'$hostname'${c_reset}\n" + + " ${c_yellow_bold}It's highly recommended that you use `-profile $prof${c_reset}`\n" + + "============================================================" + } + } + } + } +} diff --git a/nextflow.config b/nextflow.config new file mode 100644 index 0000000..2039248 --- /dev/null +++ b/nextflow.config @@ -0,0 +1,134 @@ +/* + * ------------------------------------------------- + * nf-core/proteomicslfq Nextflow config file + * ------------------------------------------------- + * Default config options for all environments. + */ + +// Global default params, used in configs +params { + + // Workflow flags + // TODO nf-core: Specify your pipeline's command line flags + genome = false + reads = "data/*{1,2}.fastq.gz" + singleEnd = false + outdir = './results' + + // Boilerplate options + name = false + multiqc_config = "$baseDir/assets/multiqc_config.yaml" + email = false + email_on_fail = false + maxMultiqcEmailFileSize = 25.MB + plaintext_email = false + monochrome_logs = false + help = false + igenomes_base = "./iGenomes" + tracedir = "${params.outdir}/pipeline_info" + awsqueue = false + awsregion = 'eu-west-1' + igenomesIgnore = false + custom_config_version = 'master' + custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" + hostnames = false + config_profile_description = false + config_profile_contact = false + config_profile_url = false +} + +// Container slug. Stable releases should specify release tag! +// Developmental code should specify :dev +process.container = 'nfcore/proteomicslfq:dev' + +// Load base.config by default for all pipelines +includeConfig 'conf/base.config' + +// Load nf-core custom profiles from different Institutions +try { + includeConfig "${params.custom_config_base}/nfcore_custom.config" +} catch (Exception e) { + System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") +} + +profiles { + awsbatch { includeConfig 'conf/awsbatch.config' } + conda { process.conda = "$baseDir/environment.yml" } + debug { process.beforeScript = 'echo $HOSTNAME' } + docker { docker.enabled = true } + singularity { singularity.enabled = true } + test { includeConfig 'conf/test.config' } +} + +// Avoid this error: +// WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. +// Testing this in nf-core after discussion here https://github.com/nf-core/tools/pull/351, once this is established and works well, nextflow might implement this behavior as new default. 
+docker.runOptions = '-u \$(id -u):\$(id -g)' + +// Load igenomes.config if required +if (!params.igenomesIgnore) { + includeConfig 'conf/igenomes.config' +} + +// Capture exit codes from upstream processes when piping +process.shell = ['/bin/bash', '-euo', 'pipefail'] + +timeline { + enabled = true + file = "${params.tracedir}/execution_timeline.html" +} +report { + enabled = true + file = "${params.tracedir}/execution_report.html" +} +trace { + enabled = true + file = "${params.tracedir}/execution_trace.txt" +} +dag { + enabled = true + file = "${params.tracedir}/pipeline_dag.svg" +} + +manifest { + name = 'nf-core/proteomicslfq' + author = 'The Heumos Brothers - Simon and Lukas' + homePage = 'https://github.com/nf-core/proteomicslfq' + description = 'Proteomics label-free quantification (LFQ) analysis pipeline using OpenMS and MSstats, with feature quantification, feature summarization, quality control and group-based statistical analysis.' + mainScript = 'main.nf' + nextflowVersion = '>=0.32.0' + version = '1.0dev' +} + +// Function to ensure that resource requirements don't go beyond +// a maximum limit +def check_max(obj, type) { + if (type == 'memory') { + try { + if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) + return params.max_memory as nextflow.util.MemoryUnit + else + return obj + } catch (all) { + println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" + return obj + } + } else if (type == 'time') { + try { + if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) + return params.max_time as nextflow.util.Duration + else + return obj + } catch (all) { + println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" + return obj + } + } else if (type == 'cpus') { + try { + return Math.min( obj, params.max_cpus as int ) + } catch (all) { + println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj" + return obj + } + } +} From a9c52e2a536354caf8a8a3fe684e7f3b453380e4 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 15 Jan 2020 12:29:01 +0100 Subject: [PATCH 002/374] First real commit. --- bin/create_trivial_design.py | 50 ++++ main.nf | 427 +++++++++++++++++++++++++---------- nextflow.config | 2 +- 3 files changed, 363 insertions(+), 116 deletions(-) create mode 100755 bin/create_trivial_design.py diff --git a/bin/create_trivial_design.py b/bin/create_trivial_design.py new file mode 100755 index 0000000..e42b26c --- /dev/null +++ b/bin/create_trivial_design.py @@ -0,0 +1,50 @@ +#!/usr/bin/env python3 + +import sys +import glob +import re + +# code to sort in a human readable way +def atoi(text): + return int(text) if text.isdigit() else text + +def natural_keys(text): + ''' + alist.sort(key=natural_keys) sorts in human order + e.g. 
UPS1_50amol_R1 is less than UPS1_1200amol_R1 + http://nedbatchelder.com/blog/200712/human_sorting.html + ''' + return [ atoi(c) for c in re.split(r'(\d+)', text) ] + +if not len(sys.argv) == 3: + print("Usage: MZML_FOLDER LABEL_PER_FILE") + exit() + +in_path = sys.argv[1] +label_per_file = int(sys.argv[2]) + +mzmls = [f for f in glob.glob(in_path + "/*.mzML", recursive=False)] +mzmls.sort(key=natural_keys) + +file_count = 1 +fraction_group = 1 +label = 1 +sample = 1 +print("Fraction_Group\tFraction\tSpectra_Filepath\tLabel\tSample") + +for f in mzmls: + for label in range(1, label_per_file + 1): + print(str(file_count) + "\t" + str(fraction_group) + "\t" + f + "\t" + str(label) + "\t" + str(sample)) + sample += 1 + file_count += 1 +print() +print("Sample\tMSstats_Condition\tMSstats_BioReplicate") +sample = 1 +condition = 1 +bioreplicate = 1 +for f in mzmls: + for label in range(1, label_per_file + 1): + print(str(sample) + "\t" + str(condition) + "\t" + str(bioreplicate)) + sample += 1 + condition += 1 + bioreplicate += 1 diff --git a/main.nf b/main.nf index 4550fb7..ba3af60 100644 --- a/main.nf +++ b/main.nf @@ -19,29 +19,22 @@ def helpMessage() { The typical command for running the pipeline is as follows: - nextflow run nf-core/proteomicslfq --reads '*_R{1,2}.fastq.gz' -profile docker + nextflow run nf-core/proteomicslfq --spectra '*.mzML' -profile docker Mandatory arguments: - --reads Path to input data (must be surrounded with quotes) + --spectra Path to input spectra as mzML or Thermo Raw + --database Path to input protein database as fasta -profile Configuration profile to use. Can use multiple (comma separated) Available: conda, docker, singularity, awsbatch, test and more. Options: - --genome Name of iGenomes reference - --singleEnd Specifies that the input is single-end reads - - References If not specified in the configuration file or you wish to overwrite any of the references. - --fasta Path to Fasta reference + --expdesign Path to experimental design file Other options: --outdir The output directory where the results will be saved --email Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits - --maxMultiqcEmailFileSize Threshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: 25MB) -name Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic. - AWSBatch options: - --awsqueue The AWSBatch JobQueue that needs to be set when running on AWSBatch - --awsregion The AWS Region for your AWS Batch job to run on """.stripIndent() } @@ -55,26 +48,6 @@ if (params.help){ exit 0 } -// Check if genome exists in the config file -if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) { - exit 1, "The provided genome '${params.genome}' is not available in the iGenomes file. Currently the available genomes are ${params.genomes.keySet().join(", ")}" -} - -// TODO nf-core: Add any reference files that are needed -// Configurable reference genomes -fasta = params.genome ? 
params.genomes[ params.genome ].fasta ?: false : false -if ( params.fasta ){ - fasta = file(params.fasta) - if( !fasta.exists() ) exit 1, "Fasta file not found: ${params.fasta}" -} -// -// NOTE - THIS IS NOT USED IN THIS PIPELINE, EXAMPLE ONLY -// If you want to use the above in a process, define the following: -// input: -// file fasta from fasta -// - - // Has the run name been specified by the user? // this has the bonus effect of catching both -name and --name custom_runName = params.name @@ -82,53 +55,325 @@ if( !(workflow.runName ==~ /[a-z]+_[a-z]+/) ){ custom_runName = workflow.runName } - -if( workflow.profile == 'awsbatch') { - // AWSBatch sanity checking - if (!params.awsqueue || !params.awsregion) exit 1, "Specify correct --awsqueue and --awsregion parameters on AWSBatch!" - if (!workflow.workDir.startsWith('s3') || !params.outdir.startsWith('s3')) exit 1, "Specify S3 URLs for workDir and outdir parameters on AWSBatch!" - // Check workDir/outdir paths to be S3 buckets if running on AWSBatch - // related: https://github.com/nextflow-io/nextflow/issues/813 - if (!workflow.workDir.startsWith('s3:') || !params.outdir.startsWith('s3:')) exit 1, "Workdir or Outdir not on S3 - specify S3 Buckets for each to run on AWSBatch!" -} - // Stage config files -ch_multiqc_config = Channel.fromPath(params.multiqc_config) +//ch_multiqc_config = Channel.fromPath(params.multiqc_config) ch_output_docs = Channel.fromPath("$baseDir/docs/output.md") /* * Create a channel for input read files */ -if(params.readPaths){ - if(params.singleEnd){ +if (params.spectra) +{ + raw = hasExtension(params.spectra, 'raw') + if (raw){ Channel - .from(params.readPaths) - .map { row -> [ row[0], [file(row[1][0])]] } - .ifEmpty { exit 1, "params.readPaths was empty - no input files supplied" } - .into { read_files_fastqc; read_files_trimming } + .fromPath(params.spectra) + .ifEmpty { exit 1, "params.spectra was empty - no input files supplied" } + .into { rawfiles } + + process raw_file_conversion { + + input: + file rawfile from rawfiles + + output: + file "*.mzML" into mzmls, mzmls_plfq + + when: + raw + + script: + """ + # NOTE: placeholder - this only renames the file; a real Thermo Raw to mzML + # conversion step still needs to be plugged in here + mv ${rawfile} ${rawfile.baseName}.mzML + """ + } + } + else if (hasExtension(params.spectra, 'mzML')) { + Channel + .fromPath(params.spectra) + .ifEmpty { exit 1, "params.spectra was empty - no input files supplied" } + .into { mzmls; mzmls_plfq} } -} else { + else { + exit 1, "Unsupported spectra file type" + } +} +else { + exit 1, "No spectra provided" +} + +if (params.expdesign) +{ + Channel + .fromPath(params.expdesign) + .ifEmpty { exit 1, "params.expdesign was empty - no input files supplied" } + .into { expdesign } +} + + + +if (params.database) { Channel - .fromFilePairs( params.reads, size: params.singleEnd ? 1 : 2 ) - .ifEmpty { exit 1, "Cannot find any reads matching: ${params.reads}\nNB: Path needs to be enclosed in quotes!\nIf this is single-end data, please specify --singleEnd on the command line." } - .into { read_files_fastqc; read_files_trimming } + .fromPath(params.database) + .ifEmpty { exit 1, "params.database was empty - no input files supplied" } + .into { searchengine_in_db; pepidx_in_db; plfq_in_db } +} +else { + //WHY IS THE WHEN: NOT ENOUGH?? 
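+ // Note on the question above (assumption about Nextflow DSL1 semantics): `when:`
+ // only decides whether a task executes; the `file mydatabase from params.database`
+ // input is still resolved when the process is wired up, so it fails when
+ // params.database is unset - hence the surrounding if/else.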
+ process generate_decoy_database { + + input: + file mydatabase from params.database + + output: + file "${database.baseName}_decoy.fasta" into searchengine_in_db, pepidx_in_db, plfq_in_db + + when: + !params.database + + script: + """ + DecoyDatabase -in ${mydatabase} \\ + -out ${mydatabase.baseName}_decoy.fasta \\ + -decoy_string DECOY_ \\ + -decoy_string_position prefix + """ + } } +// Test +//process generate_simple_exp_design_file { +// publishDir "${params.outdir}", mode: 'copy' +// input: +// val mymzmls from mzmls.collect() + +// output: +// file "expdesign.csv" into expdesign + +// when: +// !params.expdesign + +// script: +// strng = mymzmls.join(',') +// """ +// echo ${strng} > expdesign.csv +// """ +//} + + +// Doesnt work. Py script needs all the inputs to be together in a folder +// Wont work with nextflow. It needs to accept a list of paths for the inputs!! +//process generate_simple_exp_design_file { +// publishDir "${params.outdir}", mode: 'copy' +// input: +// val mymzmls from mzmls.collect() + +// output: +// file "expdesign.tsv" into expdesign +// when: +// !params.expdesign + +// script: +// strng = new File(mymzmls[0].toString()).getParentFile() +// """ +// create_trivial_design.py ${strng} 1 > expdesign.tsv +// """ +//} + +/// Search engine +// TODO parameterize +process search_engine { + + input: + file database from searchengine_in_db.first() + file mzml_file from mzmls + + output: + file "${mzml_file.baseName}.idXML" into id_files + + script: + """ + CometAdapter -in ${mzml_file} \\ + -out ${mzml_file.baseName}.idXML \\ + -threads ${task.cpus} \\ + -database ${database} \\ + + """ + //-precursor_mass_tolerance ${params.precursor_mass_tolerance} \\ + //-fragment_bin_tolerance ${params.fragment_mass_tolerance} \\ + //-fragment_bin_offset ${params.fragment_bin_offset} \\ + //-num_hits ${params.num_hits} \\ + //-digest_mass_range ${params.digest_mass_range} \\ + //-max_variable_mods_in_peptide ${params.number_mods} \\ + //-allowed_missed_cleavages 0 \\ + //-precursor_charge ${params.prec_charge} \\ + //-activation_method ${params.activation_method} \\ + //-use_NL_ions true \\ + //-variable_modifications ${params.variable_mods.tokenize(',').collect { "'${it}'" }.join(" ") } \\ + //-fixed_modifications ${params.fixed_mods.tokenize(',').collect { "'${it}'"}.join(" ")} \\ + //-enzyme '${params.enzyme}' \\ + //-spectrum_batch_size ${params.spectrum_batch_size} \\ + //$a_ions \\ + //$c_ions \\ + //$x_ions \\ + //$z_ions \\ +} + + +process index_peptides { + + input: + file id_file from id_files + file database from pepidx_in_db.first() + + output: + file "${id_file.baseName}_idx.idXML" into id_files_idx + + script: + """ + PeptideIndexer -in ${id_file} \\ + -out ${id_file.baseName}_idx.idXML \\ + -threads ${task.cpus} \\ + -fasta ${database} + """ + +} + +process extract_perc_features { + + input: + file id_file from id_files_idx + + output: + file "${id_file.baseName}_feat.idXML" into id_files_idx_feat + + script: + """ + PSMFeatureExtractor -in ${id_file} \\ + -out ${id_file.baseName}_feat.idXML \\ + -threads ${task.cpus} + """ + +} + +//TODO parameterize +process percolator { + + input: + file id_file from id_files_idx_feat + + output: + file "${id_file.baseName}_perc.idXML" into id_files_idx_feat_perc + + script: + """ + PercolatorAdapter -in ${id_file_idx_feat} \\ + -out ${id_file_idx_feat.baseName}_perc.idXML \\ + -threads ${task.cpus} \\ + -post-processing-tdc -subset-max-train 100000 + """ + +} + +process fdr { + + input: + file id_file from id_files_idx_feat_perc 
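+  // Target-decoy FDR in a nutshell (an informal sketch of the idea, not the
+  // tool's exact internals): at a score threshold t,
+  //   FDR(t) ~ #decoy hits with score >= t / #target hits with score >= t
+  // FalseDiscoveryRate annotates each hit with the resulting q-value; the
+  // -algorithm:add_decoy_peptides / -algorithm:add_decoy_proteins flags below
+  // keep the decoy hits in the output so later steps can still see them.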
+ + output: + file "${id_file.baseName}_fdr.idXML" into id_files_idx_feat_perc_fdr + + script: + """ + FalseDiscoveryRate -in ${id_file_idx_feat_perc} \\ + -out ${id_file_idx_feat_perc.baseName}_fdr.idXML \\ + -threads ${task.cpus} \\ + -algorithm:add_decoy_peptides -algorithm:add_decoy_proteins + """ + +} + + +// TODO parameterize +process idfilter { + + input: + file id_file from id_files_idx_feat_perc_fdr + + output: + file "${id_file.baseName}_filter.idXML" into id_files_idx_feat_perc_fdr_filter + + script: + """ + IDFilter -in ${id_file_idx_feat_perc_fdr} \\ + -out ${id_file_idx_feat_perc_fdr.baseName}_filter.idXML \\ + -threads ${task.cpus} \\ + -score:pep 0.05 + """ + +} + +//TODO check if needed +process idscoreswitcher { + + input: + file id_file from id_files_idx_feat_perc_fdr_filter + + output: + file "${id_file.baseName}_switched.idXML" into id_files_idx_feat_perc_fdr_filter_switched + + script: + """ + IDFilter -in ${id_file_idx_feat_perc_fdr_filter} \\ + -out ${id_file_idx_feat_perc_fdr_filter.baseName}_switched.idXML \\ + -threads ${task.cpus} \\ + -score:pep 0.05 + -old_score q-value -new_score MS:1001493 -new_score_orientation lower_better -new_score_type "Posterior Error Probability" + """ + +} + +process proteomicslfq { + + publishDir "${params.outdir}/proteomics_lfq", mode: 'copy' + + input: + file mzmls from mzmls_plfq.collect() + file id_files from id_files_idx_feat_perc_fdr_filter_switched.collect() + file expdes from expdesign + file fasta from plfq_in_db + + output: + file "out.mzTab" into out_mzTab + file "out.consensusXML" into out_consensusXML + file "out.csv" into out_msstats + + script: + id_files_str = id_files.sort().join(' ') + mzmls_str = mzmls.sort().join(' ') + """ + ProteomicsLFQ -in ${mzmls_str} + -ids ${id_files_str} \\ + -design ${expdes} \\ + -fasta ${fasta} \\ + -targeted_only "true" \\ + -mass_recalibration "false" \\ + -out out.mzTab \\ + -threads ${task.cpus} \\ + -out_msstats out.csv \\ + -out_cxml out.consensusXML \\ + -debug 667 + + """ + +} + // Header log info log.info nfcoreHeader() def summary = [:] summary['Run Name'] = custom_runName ?: workflow.runName // TODO nf-core: Report custom parameters here -summary['Reads'] = params.reads -summary['Fasta Ref'] = params.fasta -summary['Data Type'] = params.singleEnd ? 'Single-End' : 'Paired-End' summary['Max Resources'] = "$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job" if(workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" summary['Output dir'] = params.outdir @@ -185,64 +430,11 @@ process get_software_versions { """ echo $workflow.manifest.version > v_pipeline.txt echo $workflow.nextflow.version > v_nextflow.txt - fastqc --version > v_fastqc.txt - multiqc --version > v_multiqc.txt - scrape_software_versions.py > software_versions_mqc.yaml - """ -} - - - -/* - * STEP 1 - FastQC - */ -process fastqc { - tag "$name" - publishDir "${params.outdir}/fastqc", mode: 'copy', - saveAs: {filename -> filename.indexOf(".zip") > 0 ? "zips/$filename" : "$filename"} - - input: - set val(name), file(reads) from read_files_fastqc - - output: - file "*_fastqc.{zip,html}" into fastqc_results - - script: - """ - fastqc -q $reads - """ -} - - - -/* - * STEP 2 - MultiQC - */ -process multiqc { - publishDir "${params.outdir}/MultiQC", mode: 'copy' - - input: - file multiqc_config from ch_multiqc_config - // TODO nf-core: Add in log files from your new processes for MultiQC to find! 
- file ('fastqc/*') from fastqc_results.collect().ifEmpty([]) - file ('software_versions/*') from software_versions_yaml - file workflow_summary from create_workflow_summary(summary) - - output: - file "*multiqc_report.html" into multiqc_report - file "*_data" - - script: - rtitle = custom_runName ? "--title \"$custom_runName\"" : '' - rfilename = custom_runName ? "--filename " + custom_runName.replaceAll('\\W','_').replaceAll('_+','_') + "_multiqc_report" : '' - // TODO nf-core: Specify which MultiQC modules to use with -m for a faster run time - """ - multiqc -f $rtitle $rfilename --config $multiqc_config . + echo "foo" > software_versions_mqc.yaml """ } - /* * STEP 3 - Output Description HTML */ @@ -410,3 +602,8 @@ def checkHostname(){ } } } + +// Check file extension +def hasExtension(it, extension) { + it.toString().toLowerCase().endsWith(extension.toLowerCase()) +} diff --git a/nextflow.config b/nextflow.config index bd585a6..b337591 100644 --- a/nextflow.config +++ b/nextflow.config @@ -38,7 +38,7 @@ params { // Container slug. Stable releases should specify release tag! // Developmental code should specify :dev -process.container = 'nfcore/proteomicslfq:dev' +process.container = 'openms/executables' // Load base.config by default for all pipelines includeConfig 'conf/base.config' From f35e17bc3e3630ad145582294b1c721ccda8a23c Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 15 Jan 2020 13:39:02 +0100 Subject: [PATCH 003/374] Deactivate nfcore docker for now. --- .travis.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.travis.yml b/.travis.yml index 23265b8..a3e44c8 100644 --- a/.travis.yml +++ b/.travis.yml @@ -11,10 +11,10 @@ before_install: # PRs to master are only ok if coming from dev branch - '[ $TRAVIS_PULL_REQUEST = "false" ] || [ $TRAVIS_BRANCH != "master" ] || ([ $TRAVIS_PULL_REQUEST_SLUG = $TRAVIS_REPO_SLUG ] && [ $TRAVIS_PULL_REQUEST_BRANCH = "dev" ])' # Pull the docker image first so the test doesn't wait for this - - docker pull nfcore/proteomicslfq:dev + #- docker pull nfcore/proteomicslfq:dev # Fake the tag locally so that the pipeline runs properly # Looks weird when this is :dev to :dev, but makes sense when testing code for a release (:dev to :1.0.1) - - docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev + #- docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev install: # Install Nextflow From bd021427018973a3254813e7e0b8ed073b75c5ab Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 15 Jan 2020 13:55:44 +0100 Subject: [PATCH 004/374] Modified test profile --- conf/test.config | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/conf/test.config b/conf/test.config index 6d1c793..16539bd 100644 --- a/conf/test.config +++ b/conf/test.config @@ -13,11 +13,15 @@ params { max_memory = 6.GB max_time = 48.h // Input data - // TODO nf-core: Specify the paths to your test data on nf-core/test-datasets // TODO nf-core: Give any required params for the test so that command line flags are not needed - singleEnd = false - readPaths = [ - ['Testdata', ['https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R1.tiny.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R2.tiny.fastq.gz']], - ['SRR389222', ['https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub2.fastq.gz']] + spectra = [ + 
'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA1_F1.mzML', + 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA1_F2.mzML', + 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA2_F1.mzML', + 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA2_F2.mzML', + 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA3_F1.mzML', + 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA3_F2.mzML' ] + database = 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/18Protein_SoCe_Tr_detergents_trace_target_decoy.fasta' + expdesign = 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA_design.tsv' } From 69680036b10fc141100e237800f892631f830fef Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 15 Jan 2020 14:10:51 +0100 Subject: [PATCH 005/374] Support both wildcard and list --- main.nf | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/main.nf b/main.nf index ba3af60..9f20d6f 100644 --- a/main.nf +++ b/main.nf @@ -64,7 +64,7 @@ ch_output_docs = Channel.fromPath("$baseDir/docs/output.md") */ if (params.spectra) { - raw = hasExtension(params.spectra, 'raw') + raw = hasExtension(params.spectra.first(), 'raw') if (raw){ Channel .fromPath(params.spectra) @@ -88,7 +88,7 @@ if (params.spectra) """ } } - else if (hasExtension(params.spectra, 'mzML')) { + else if (hasExtension(params.spectra.first(), 'mzML')) { Channel .fromPath(params.spectra) .ifEmpty { exit 1, "params.spectra was empty - no input files supplied" } From d9c0fdd82cea78f6326641775e0793602af35a68 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 15 Jan 2020 14:24:50 +0100 Subject: [PATCH 006/374] support both list and wildcard? 
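params.spectra may now be given either as a single wildcard String or as a
Groovy List of paths (as in the test profile), so the extension check has to
look at the right object first. Roughly, as a sketch of the idea
(`first_file` is only an illustrative name; the actual change is in the diff
below):

    def first_file = params.spectra instanceof List ? params.spectra.first() : params.spectra
    in_is_raw  = hasExtension(first_file, 'raw')
    in_is_mzml = hasExtension(first_file, 'mzML')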
--- main.nf | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/main.nf b/main.nf index 9f20d6f..8315e77 100644 --- a/main.nf +++ b/main.nf @@ -64,9 +64,18 @@ ch_output_docs = Channel.fromPath("$baseDir/docs/output.md") */ if (params.spectra) { - raw = hasExtension(params.spectra.first(), 'raw') - if (raw){ - Channel + if (params.spectra instanceof String) { + in_is_raw = hasExtension(params.spectra, 'raw') + in_is_mzml = hasExtension(params.spectra, 'mzML') + } else if (params.spectra instanceof List){ + in_is_raw = hasExtension(params.spectra.first(), 'raw') + in_is_mzml = hasExtension(params.spectra.first(), 'mzML') + } else { + log.error "Specify list or wildcard string" + } + + if (in_is_raw){ + channel .fromPath(params.spectra) .ifEmpty { exit 1, "params.spectra was empty - no input files supplied" } .into { rawfiles } @@ -80,7 +89,7 @@ if (params.spectra) file "*.mzML" into mzmls, mzmls_plfq when: - raw + in_is_raw script: """ @@ -88,7 +97,7 @@ if (params.spectra) """ } } - else if (hasExtension(params.spectra.first(), 'mzML')) { + else if (in_is_mzml) { Channel .fromPath(params.spectra) .ifEmpty { exit 1, "params.spectra was empty - no input files supplied" } From dcc9ac93a47e25514277f3de45b4e1c2fd761282 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 15 Jan 2020 15:01:21 +0100 Subject: [PATCH 007/374] Switched to working nf release for travis --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index a3e44c8..6802fe3 100644 --- a/.travis.yml +++ b/.travis.yml @@ -30,7 +30,7 @@ install: - sudo apt-get install npm && npm install -g markdownlint-cli env: - - NXF_VER='0.32.0' # Specify a minimum NF version that should be tested and work + - NXF_VER='19.10.0' # Specify a minimum NF version that should be tested and work - NXF_VER='' # Plus: get the latest NF version and check that it works script: From c6f9ff8aad8287f2168cdfa563e9e78df7344db4 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 15 Jan 2020 15:19:38 +0100 Subject: [PATCH 008/374] Comment out html generation. 
Needs R

---
 main.nf | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/main.nf b/main.nf
index 8315e77..745c260 100644
--- a/main.nf
+++ b/main.nf
@@ -447,7 +447,7 @@ process get_software_versions {
 /*
  * STEP 3 - Output Description HTML
  */
-process output_documentation {
+/*process output_documentation {
     publishDir "${params.outdir}/pipeline_info", mode: 'copy'

     input:
@@ -461,7 +461,7 @@ process output_documentation {
     markdown_to_html.r $output_docs results_description.html
     """
 }
-
+*/


 /*

From e3c83ba055cfc9aa126c8539b0328800ba72b1c5 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Wed, 15 Jan 2020 15:21:24 +0100
Subject: [PATCH 009/374] put docker pull in beginning as suggested

---
 .travis.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.travis.yml b/.travis.yml
index 6802fe3..8faffd9 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -12,6 +12,7 @@ before_install:
   - '[ $TRAVIS_PULL_REQUEST = "false" ] || [ $TRAVIS_BRANCH != "master" ] || ([ $TRAVIS_PULL_REQUEST_SLUG = $TRAVIS_REPO_SLUG ] && [ $TRAVIS_PULL_REQUEST_BRANCH = "dev" ])'
   # Pull the docker image first so the test doesn't wait for this
   #- docker pull nfcore/proteomicslfq:dev
+  - docker pull openms/executables
   # Fake the tag locally so that the pipeline runs properly
   # Looks weird when this is :dev to :dev, but makes sense when testing code for a release (:dev to :1.0.1)
   #- docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev

From fcc12d59dc7fdfb3332b4baf2c4b5ef91102bed4 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Wed, 15 Jan 2020 15:40:36 +0100
Subject: [PATCH 010/374] Deactivate ANSI feature of nf

---
 .travis.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index 8faffd9..f4ffb2d 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -31,8 +31,8 @@ install:
   - sudo apt-get install npm && npm install -g markdownlint-cli

 env:
-  - NXF_VER='19.10.0' # Specify a minimum NF version that should be tested and work
-  - NXF_VER='' # Plus: get the latest NF version and check that it works
+  - NXF_VER='19.10.0' NXF_ANSI_LOG=0 # Specify a minimum NF version that should be tested and work
+  - NXF_VER='' NXF_ANSI_LOG=0 # Plus: get the latest NF version and check that it works

 script:
   # Lint the pipeline code

From 3d4c37ba243ebad7e3b5998336d9372057f69e07 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Wed, 15 Jan 2020 15:52:13 +0100
Subject: [PATCH 011/374] Added gh actions

---
 .github/workflows/main.yml | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
 create mode 100644 .github/workflows/main.yml

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
new file mode 100644
index 0000000..1c931ab
--- /dev/null
+++ b/.github/workflows/main.yml
@@ -0,0 +1,20 @@
+name: nf-core proteomicslfq CI
+# This workflow is triggered on pushes and PRs to the repository.
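+# Left broad on purpose; a possible refinement (not enabled here) would mirror
+# the Travis branch rules by narrowing the trigger, e.g.:
+#   on:
+#     push:
+#       branches: [dev]
+#     pull_request: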
+on: [push, pull_request] + +jobs: + github_actions_ci: + runs-on: ubuntu-latest + env: + NXF_ANSI_LOG: 0 + steps: + - uses: actions/checkout@v1 + - name: Docker pull OpenMS image + run: docker pull openms/executables + - name: Install Nextflow + run: | + wget -qO- get.nextflow.io | bash + sudo mv nextflow /usr/local/bin/ + - name: BASIC Run the basic pipeline with the test + run: | + nextflow run ${GITHUB_WORKSPACE} -profile test,docker \ No newline at end of file From a2189b862af36fdae7a9f94a790c70be705d710c Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 15 Jan 2020 16:25:44 +0100 Subject: [PATCH 012/374] Typo --- main.nf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/main.nf b/main.nf index 745c260..dfb7eb1 100644 --- a/main.nf +++ b/main.nf @@ -75,7 +75,7 @@ if (params.spectra) } if (in_is_raw){ - channel + Channel .fromPath(params.spectra) .ifEmpty { exit 1, "params.spectra was empty - no input files supplied" } .into { rawfiles } From b917f36e26b6755de9e1b91e5b73e15dfada64f6 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 15 Jan 2020 20:03:43 +0100 Subject: [PATCH 013/374] Added @apeltzer s features --- main.nf | 130 ++++++++++++++++++++++++++------------------------------ 1 file changed, 60 insertions(+), 70 deletions(-) diff --git a/main.nf b/main.nf index dfb7eb1..2910ca6 100644 --- a/main.nf +++ b/main.nf @@ -29,6 +29,7 @@ def helpMessage() { Options: --expdesign Path to experimental design file + --adddecoys Add decoys to the given fasta Other options: --outdir The output directory where the results will be saved @@ -62,55 +63,45 @@ ch_output_docs = Channel.fromPath("$baseDir/docs/output.md") /* * Create a channel for input read files */ -if (params.spectra) -{ - if (params.spectra instanceof String) { - in_is_raw = hasExtension(params.spectra, 'raw') - in_is_mzml = hasExtension(params.spectra, 'mzML') - } else if (params.spectra instanceof List){ - in_is_raw = hasExtension(params.spectra.first(), 'raw') - in_is_mzml = hasExtension(params.spectra.first(), 'mzML') - } else { - log.error "Specify list or wildcard string" - } - if (in_is_raw){ - Channel - .fromPath(params.spectra) - .ifEmpty { exit 1, "params.spectra was empty - no input files supplied" } - .into { rawfiles } +ch_spectra = Channel.fromPath(params.spectra, checkIfExists: true) +if (!params.spectra) { exit 1, "Please provide --spectra as input!" } - process raw_file_conversion { +//use a branch operator for this sort of thing and access the files accordingly! - input: - file rawfile from rawfiles +ch_spectra +.branch { + raw: hasExtension(it, 'raw') + mzML_for_mix: hasExtension(it, 'mzML') +} +.set {branched_input} - output: - file "*.mzML" into mzmls, mzmls_plfq +//Push raw files through process that does the conversion, everything else directly to downstream Channel with mzMLs - when: - in_is_raw - - script: - """ - mv ${rawfile} ${rawfile.baseName}.mzML - """ - } - } - else if (in_is_mzml) { - Channel - .fromPath(params.spectra) - .ifEmpty { exit 1, "params.spectra was empty - no input files supplied" } - .into { mzmls; mzmls_plfq} - } - else { - log.error "Unsupported spectra file type" - } -} -else { - log.error "No spectra provided" + +//This piece only runs on data that is a.) raw and b.) 
needs conversion +//mzML files will be mixed after this step to provide output for downstream processing - allowing you to even specify mzMLs and RAW files in a mixed mode as input :-) + + +process raw_file_conversion { + + input: + file rawfile from branched_input.raw + + output: + file "*.mzML" into mzmls_converted + + script: + """ + mv ${rawfile} ${rawfile.baseName}.mzML + """ } +//Mix the converted raw data with the already supplied mzMLs and push these to the same channels as before + +branched_input.mzML_for_mix.mix(mzmls_converted).into{mzmls; mzmls_plfq} + + if (params.expdesign) { Channel @@ -120,34 +111,33 @@ if (params.expdesign) } +//Create channel from database, then depending on when add decoys or not +Channel.fromPath(params.database).set{ db_for_decoy_creation } -if (params.database) { - Channel - .fromPath(params.database) - .ifEmpty { exit 1, "params.database was empty - no input files supplied" } - .into { searchengine_in_db; pepidx_in_db; plfq_in_db } -} -else { - //WHY IS THE WHEN: NOT ENOUGH?? - process generate_decoy_database { +//Fill the channels with empty Channels in case that we want to add decoys. Otherwise fill with output from database. +(searchengine_in_db, pepidx_in_db, plfq_in_db) = ( params.adddecoys + ? [ Channel.empty(), Channel.empty(), Channel.empty() ] + : [ Channel.fromPath(params.database),Channel.fromPath(params.database), Channel.fromPath(params.database) ] ) - input: - file mydatabase from params.database +//Add decoys if params.adddecoys is set appropriately +process generate_decoy_database { - output: - file "${database.baseName}_decoy.fasta" into searchengine_in_db, pepidx_in_db, plfq_in_db - - when: - !params.database - - script: - """ - DecoyDatabase -in ${mydatabase} \\ - -out ${mydatabase.baseName}_decoy.fasta \\ - -decoy_string DECOY_ \\ - -decoy_string_position prefix - """ - } +input: + file(mydatabase) from db_for_decoy_creation + +output: + file "${database.baseName}_decoy.fasta" into searchengine_in_db_decoy, pepidx_in_db_decoy, plfq_in_db_decoy + //TODO need to add these channel with .mix(searchengine_in_db_decoy) for example to all subsequent processes that need this... 
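+//NB: the input above is staged as `mydatabase`, but the output name references
+//`database` - this most likely needs to be ${mydatabase.baseName}_decoy.fasta
+//to resolve. Consuming the decoy channels would then look like this at each
+//downstream process (a sketch; see search_engine below for the real usage):
+//  file database from searchengine_in_db.mix(searchengine_in_db_decoy)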
+ +when: params.adddecoys + +script: + """ + DecoyDatabase -in ${mydatabase} \\ + -out ${mydatabase.baseName}_decoy.fasta \\ + -decoy_string DECOY_ \\ + -decoy_string_position prefix + """ } @@ -195,7 +185,7 @@ else { process search_engine { input: - file database from searchengine_in_db.first() + file database from searchengine_in_db.mix(searchengine_in_db_decoy) file mzml_file from mzmls output: @@ -234,7 +224,7 @@ process index_peptides { input: file id_file from id_files - file database from pepidx_in_db.first() + file database from pepidx_in_db.mix(pepidx_in_db_decoy) output: file "${id_file.baseName}_idx.idXML" into id_files_idx @@ -351,7 +341,7 @@ process proteomicslfq { file mzmls from mzmls_plfq.collect() file id_files from id_files_idx_feat_perc_fdr_filter_switched.collect() file expdes from expdesign - file fasta from plfq_in_db + file fasta from plfq_in_db.mix(plfq_in_db_decoy) output: file "out.mzTab" into out_mzTab From 1bf3f35d780b925fd4e5d290e35dc763c128691a Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 15 Jan 2020 22:19:37 +0100 Subject: [PATCH 014/374] Wrong variables used in scripts --- main.nf | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/main.nf b/main.nf index 2910ca6..d5feaf5 100644 --- a/main.nf +++ b/main.nf @@ -267,8 +267,8 @@ process percolator { script: """ - PercolatorAdapter -in ${id_file_idx_feat} \\ - -out ${id_file_idx_feat.baseName}_perc.idXML \\ + PercolatorAdapter -in ${id_file} \\ + -out ${id_file.baseName}_perc.idXML \\ -threads ${task.cpus} \\ -post-processing-tdc -subset-max-train 100000 """ @@ -285,8 +285,8 @@ process fdr { script: """ - FalseDiscoveryRate -in ${id_file_idx_feat_perc} \\ - -out ${id_file_idx_feat_perc.baseName}_fdr.idXML \\ + FalseDiscoveryRate -in ${id_file} \\ + -out ${id_file.baseName}_fdr.idXML \\ -threads ${task.cpus} \\ -algorithm:add_decoy_peptides -algorithm:add_decoy_proteins """ @@ -305,8 +305,8 @@ process idfilter { script: """ - IDFilter -in ${id_file_idx_feat_perc_fdr} \\ - -out ${id_file_idx_feat_perc_fdr.baseName}_filter.idXML \\ + IDFilter -in ${id_file} \\ + -out ${id_file.baseName}_filter.idXML \\ -threads ${task.cpus} \\ -score:pep 0.05 """ @@ -324,8 +324,8 @@ process idscoreswitcher { script: """ - IDFilter -in ${id_file_idx_feat_perc_fdr_filter} \\ - -out ${id_file_idx_feat_perc_fdr_filter.baseName}_switched.idXML \\ + IDFilter -in ${id_file} \\ + -out ${id_file.baseName}_switched.idXML \\ -threads ${task.cpus} \\ -score:pep 0.05 -old_score q-value -new_score MS:1001493 -new_score_orientation lower_better -new_score_type "Posterior Error Probability" From ba26982e31f630c028c4fc69f5ffaf2d23fe4704 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 15 Jan 2020 22:27:29 +0100 Subject: [PATCH 015/374] Percolator decoy pattern --- main.nf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/main.nf b/main.nf index d5feaf5..7c7c2e6 100644 --- a/main.nf +++ b/main.nf @@ -270,7 +270,7 @@ process percolator { PercolatorAdapter -in ${id_file} \\ -out ${id_file.baseName}_perc.idXML \\ -threads ${task.cpus} \\ - -post-processing-tdc -subset-max-train 100000 + -post-processing-tdc -subset-max-train 100000 -decoy-pattern "DECOY_" """ } From b5fa7f5092c9b5987776e1abb35238680d5365c3 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 15 Jan 2020 22:42:05 +0100 Subject: [PATCH 016/374] Percolator decoy pattern2 --- main.nf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/main.nf b/main.nf index 7c7c2e6..51eba5a 100644 --- a/main.nf 
+++ b/main.nf @@ -270,7 +270,7 @@ process percolator { PercolatorAdapter -in ${id_file} \\ -out ${id_file.baseName}_perc.idXML \\ -threads ${task.cpus} \\ - -post-processing-tdc -subset-max-train 100000 -decoy-pattern "DECOY_" + -post-processing-tdc -subset-max-train 100000 -decoy-pattern "rev" """ } From a4ee0a569f07f56bd1b35fbdd30942369c273e0c Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 16 Jan 2020 01:11:13 +0100 Subject: [PATCH 017/374] upload artifact? --- .github/workflows/main.yml | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 1c931ab..14e29be 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -17,4 +17,9 @@ jobs: sudo mv nextflow /usr/local/bin/ - name: BASIC Run the basic pipeline with the test run: | - nextflow run ${GITHUB_WORKSPACE} -profile test,docker \ No newline at end of file + nextflow run ${GITHUB_WORKSPACE} -profile test,docker + + - uses: actions/upload-artifact@v1 + with: + name: workspace + path: /home/runner/work/proteomicslfq/proteomicslfq/work/ \ No newline at end of file From bc65fff303e26df702904469acc380af12db0f17 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 16 Jan 2020 01:19:55 +0100 Subject: [PATCH 018/374] wtf --- .github/workflows/main.yml | 5 ----- 1 file changed, 5 deletions(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 14e29be..8972b98 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -18,8 +18,3 @@ jobs: - name: BASIC Run the basic pipeline with the test run: | nextflow run ${GITHUB_WORKSPACE} -profile test,docker - - - uses: actions/upload-artifact@v1 - with: - name: workspace - path: /home/runner/work/proteomicslfq/proteomicslfq/work/ \ No newline at end of file From 60db7933ffdccb646567412faeb60f72055293b1 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 16 Jan 2020 01:22:33 +0100 Subject: [PATCH 019/374] try again --- .github/workflows/main.yml | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 8972b98..b3ee802 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -18,3 +18,7 @@ jobs: - name: BASIC Run the basic pipeline with the test run: | nextflow run ${GITHUB_WORKSPACE} -profile test,docker + - uses: actions/upload-artifact@v1 + with: + name: workspace + path: work/ \ No newline at end of file From 0e960014de02f31617685c666d286e7dc6b469c9 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 16 Jan 2020 01:24:13 +0100 Subject: [PATCH 020/374] wtff gh actions does not allow uploading --- .github/workflows/main.yml | 4 ---- 1 file changed, 4 deletions(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index b3ee802..8972b98 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -18,7 +18,3 @@ jobs: - name: BASIC Run the basic pipeline with the test run: | nextflow run ${GITHUB_WORKSPACE} -profile test,docker - - uses: actions/upload-artifact@v1 - with: - name: workspace - path: work/ \ No newline at end of file From 7b37b592e16a523eaa0e21bf42f05f63d9165e8c Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 16 Jan 2020 02:04:58 +0100 Subject: [PATCH 021/374] add debug output for previous steps --- main.nf | 29 ++++++----------------------- 1 file changed, 6 insertions(+), 23 deletions(-) diff --git a/main.nf b/main.nf index 51eba5a..05ced8a 100644 --- a/main.nf +++ b/main.nf @@ -183,7 +183,7 @@ 
script: /// Search engine // TODO parameterize process search_engine { - + echo true input: file database from searchengine_in_db.mix(searchengine_in_db_decoy) file mzml_file from mzmls @@ -193,39 +193,22 @@ process search_engine { script: """ - CometAdapter -in ${mzml_file} \\ + XTandemAdapter -in ${mzml_file} \\ -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ -database ${database} \\ - + -precursor_mass_tolerance 50 \\ + -missed_cleavages 2 """ - //-precursor_mass_tolerance ${params.precursor_mass_tolerance} \\ - //-fragment_bin_tolerance ${params.fragment_mass_tolerance} \\ - //-fragment_bin_offset ${params.fragment_bin_offset} \\ - //-num_hits ${params.num_hits} \\ - //-digest_mass_range ${params.digest_mass_range} \\ - //-max_variable_mods_in_peptide ${params.number_mods} \\ - //-allowed_missed_cleavages 0 \\ - //-precursor_charge ${params.prec_charge} \\ - //-activation_method ${params.activation_method} \\ - //-use_NL_ions true \\ - //-variable_modifications ${params.variable_mods.tokenize(',').collect { "'${it}'" }.join(" ") } \\ - //-fixed_modifications ${params.fixed_mods.tokenize(',').collect { "'${it}'"}.join(" ")} \\ - //-enzyme '${params.enzyme}' \\ - //-spectrum_batch_size ${params.spectrum_batch_size} \\ - //$a_ions \\ - //$c_ions \\ - //$x_ions \\ - //$z_ions \\ } process index_peptides { - + echo true input: file id_file from id_files file database from pepidx_in_db.mix(pepidx_in_db_decoy) - + output: file "${id_file.baseName}_idx.idXML" into id_files_idx From 6350a7250fb0847a978f84faa5700402b4e0a1be Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 16 Jan 2020 16:12:20 +0100 Subject: [PATCH 022/374] Try MSGF, since I installed java on the docker image --- main.nf | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/main.nf b/main.nf index 05ced8a..668d923 100644 --- a/main.nf +++ b/main.nf @@ -193,12 +193,10 @@ process search_engine { script: """ - XTandemAdapter -in ${mzml_file} \\ + MSGFPlusAdapter -in ${mzml_file} \\ -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ - -database ${database} \\ - -precursor_mass_tolerance 50 \\ - -missed_cleavages 2 + -database ${database} """ } From f25622586cbec2646c8e23069da3d65d550cf826 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Jan 2020 14:31:10 +0100 Subject: [PATCH 023/374] Added nf tower --- .github/workflows/main.yml | 11 ++++++++++- main.nf | 1 + 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 8972b98..3c9d844 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -7,14 +7,23 @@ jobs: runs-on: ubuntu-latest env: NXF_ANSI_LOG: 0 + TOWER_ACCESS_TOKEN: ${{ secrets.TOWER_ACCESS_TOKEN }} steps: - uses: actions/checkout@v1 - name: Docker pull OpenMS image run: docker pull openms/executables + - name: Extract branch name + shell: bash + run: echo "::set-env name=RUN_NAME::`echo ${GITHUB_REPOSITORY//\//_}`-`echo ${GITHUB_HEAD_REF//\//@} | rev | cut -f1 -d@ | rev`-${{ github.event_name }}-`echo ${GITHUB_SHA} | cut -c1-6`" + id: extract_branch + - name: Determine tower usage + shell: bash + run: echo "::set-env name=TOWER::`[ -z "$TOWER_ACCESS_TOKEN" ] && echo '' || echo '-with-tower'`" + id: tower_usage - name: Install Nextflow run: | wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ - name: BASIC Run the basic pipeline with the test run: | - nextflow run ${GITHUB_WORKSPACE} -profile test,docker + nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name 
"$RUN_NAME-basic" -profile test,docker diff --git a/main.nf b/main.nf index 668d923..e5a8828 100644 --- a/main.nf +++ b/main.nf @@ -193,6 +193,7 @@ process search_engine { script: """ + echo $PATH MSGFPlusAdapter -in ${mzml_file} \\ -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ From d611616b5b1fe038eea5e853db8d861e902e1c2f Mon Sep 17 00:00:00 2001 From: Zethson Date: Sun, 19 Jan 2020 22:32:07 +0100 Subject: [PATCH 024/374] [FEATURE] Some refactoring & first parameters as suggested by J P --- README.md | 2 +- docs/output.md | 34 +------------- docs/usage.md | 116 +++++++++++++++++++++--------------------------- main.nf | 33 +++++++++----- nextflow.config | 15 +++++-- 5 files changed, 88 insertions(+), 112 deletions(-) diff --git a/README.md b/README.md index 0a94e86..f33780a 100644 --- a/README.md +++ b/README.md @@ -27,4 +27,4 @@ The nf-core/proteomicslfq pipeline comes with documentation about the pipeline, ## Credits -nf-core/proteomicslfq was originally written by The Heumos Brothers - Simon and Lukas. +nf-core/proteomicslfq was originally written by [Julianus Pfeuffer](https://github.com/jpfeuffer), [Lukas Heumos](github.com/zethson), [Timo Sachsenberg](https://github.com/timosachsenberg) and [Leon Bichmann](https://github.com/Leon-Bichmann). diff --git a/docs/output.md b/docs/output.md index be99ac1..b9bfd25 100644 --- a/docs/output.md +++ b/docs/output.md @@ -1,6 +1,6 @@ # nf-core/proteomicslfq: Output -This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline. +This document describes the output produced by the pipeline. @@ -8,34 +8,4 @@ This document describes the output produced by the pipeline. Most of the plots a The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps: -* [FastQC](#fastqc) - read quality control -* [MultiQC](#multiqc) - aggregate report, describing results of the whole pipeline - -## FastQC -[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your reads. It provides information about the quality score distribution across your reads, the per base sequence content (%T/A/G/C). You get information about adapter contamination and other overrepresented sequences. - -For further reading and documentation see the [FastQC help](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/). - -> **NB:** The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality. To see how your reads look after trimming, look at the FastQC reports in the `trim_galore` directory. - -**Output directory: `results/fastqc`** - -* `sample_fastqc.html` - * FastQC report, containing quality metrics for your untrimmed raw fastq files -* `zips/sample_fastqc.zip` - * zip file containing the FastQC report, tab-delimited data file and plot images - - -## MultiQC -[MultiQC](http://multiqc.info) is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in within the report data directory. - -The pipeline has special steps which allow the software versions used to be reported in the MultiQC output for future traceability. 
- -**Output directory: `results/multiqc`** - -* `Project_multiqc_report.html` - * MultiQC report - a standalone HTML file that can be viewed in your web browser -* `Project_multiqc_data/` - * Directory containing parsed statistics from the different tools used in the pipeline - -For more information about how to use MultiQC reports, see http://multiqc.info +* diff --git a/docs/usage.md b/docs/usage.md index 39ef37c..4d4126f 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -10,13 +10,17 @@ * [Updating the pipeline](#updating-the-pipeline) * [Reproducibility](#reproducibility) * [Main arguments](#main-arguments) + * [`--spectra`](#--spectra) + * [`--database`](#--database) * [`-profile`](#-profile) - * [`--reads`](#--reads) - * [`--singleEnd`](#--singleend) -* [Reference genomes](#reference-genomes) - * [`--genome` (using iGenomes)](#--genome-using-igenomes) - * [`--fasta`](#--fasta) - * [`--igenomesIgnore`](#--igenomesignore) +* [Mass Spectrometry Search](#Mass-Spectrometry-Search) + * [`--precursor_mass_tolerance`](#--precursor_mass_tolerance) + * [`--enzyme`](#--enzyme) + * [`--fixed_mods`](#--fixed_mods) + * [`--variable_mods`](#--variable_mods) + * [`--allowed_missed cleavages`](#--allowed_missed_cleavages) + * [`--psm_level_fdr_cutoff`](#--psm_level_fdr_cutoff) + * [`--protein_level_fdr_cutoff](#--protein_level_fdr_cutoff) * [Job resources](#job-resources) * [Automatic resubmission](#automatic-resubmission) * [Custom resource requests](#custom-resource-requests) @@ -55,7 +59,7 @@ NXF_OPTS='-Xms1g -Xmx4g' The typical command for running the pipeline is as follows: ```bash -nextflow run nf-core/proteomicslfq --reads '*_R{1,2}.fastq.gz' -profile docker +nextflow run nf-core/proteomicslfq --spectra '*.mzML' --database '*.fasta' -profile docker ``` This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles. @@ -86,6 +90,27 @@ This version number will be logged in reports when you run the pipeline, so that ## Main arguments +### `--spectra` + +Use this to specify the location of your input mzML files. For example: + +```bash +--spectra 'path/to/data/*.mzML' +``` + +Please note the following requirements: + +1. The path must be enclosed in quotes +2. The path must have at least one `*` wildcard character + +### `--database` + +If you prefer, you can specify the full path to your fasta input protein database when you run the pipeline: + +```bash +--database '[path to Fasta protein database]' +``` + ### `-profile` Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. Note that multiple profiles can be loaded, for example: `-profile docker` - the order of arguments is important! @@ -104,82 +129,44 @@ If `-profile` is not specified at all the pipeline will be run locally and expec * Pulls software from DockerHub * `test` * A profile with a complete configuration for automated testing - * Includes links to test data so needs no other parameters - - + * Includes links to test data and therefore doesn't need additional parameters -### `--reads` -Use this to specify the location of your input FastQ files. For example: -```bash ---reads 'path/to/data/sample_*_{1,2}.fastq' -``` - -Please note the following requirements: +## Mass Spectrometry Search -1. The path must be enclosed in quotes -2. The path must have at least one `*` wildcard character -3. When using the pipeline with paired end data, the path must use `{1,2}` notation to specify read pairs. 
+### `--precursor_mass_tolerance` -If left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz` +Specify the precursor mass tolerance used for the comet database search. For High-Resolution instruments a precursor mass tolerance value of 5ppm is recommended. (eg. 5) -### `--singleEnd` -By default, the pipeline expects paired-end data. If you have single-end data, you need to specify `--singleEnd` on the command line when you launch the pipeline. A normal glob pattern, enclosed in quotation marks, can then be used for `--reads`. For example: +### `--enzyme` -```bash ---singleEnd --reads '*.fastq' -``` +Specify which enzymatic restriction should be applied ('unspecific cleavage', 'Trypsin', see OpenMS enzymes) -It is not possible to run a mixture of single-end and paired-end files in one run. +### `--fixed_mods` +Specify which fixed modifications should be applied to the database search (eg. '' or 'Carbamidomethyl (C)', see OpenMS modifications) -## Reference genomes +### `--variable_mods` -The pipeline config files come bundled with paths to the illumina iGenomes reference index files. If running with docker or AWS, the configuration is set up to use the [AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/) resource. +Specify which variable modifications should be applied to the database search (eg. 'Oxidation (M)', see OpenMS modifications) -### `--genome` (using iGenomes) -There are 31 different species supported in the iGenomes references. To run the pipeline, you must specify which to use with the `--genome` flag. +Multiple fixed or variable modifications can be specified comma separated (e.g. 'Carbamidomethyl (C),Oxidation (M)') -You can find the keys to specify the genomes in the [iGenomes config file](../conf/igenomes.config). Common genomes that are supported are: +## `--allowed_missed_cleavages` -* Human - * `--genome GRCh37` -* Mouse - * `--genome GRCm38` -* _Drosophila_ - * `--genome BDGP6` -* _S. cerevisiae_ - * `--genome 'R64-1-1'` +Specify the number of allowed missed enzyme cleavages in a peptide. The parameter is not applied if the no-enzyme option is specified for comet. -> There are numerous others - check the config file for more. +## `--psm_level_fdr_cutoff` -Note that you can use the same configuration setup to save sets of reference files for your own use, even if they are not part of the iGenomes resource. See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for instructions on where to save such a file. +Specify the PSM level cutoff for the identification FDR for IDFilter. -The syntax for this reference configuration is as follows: +## `--protein_level_fdr_cutoff` - +Specify the protein level cutoff for the identification FDR of PLFQ -```nextflow -params { - genomes { - 'GRCh37' { - fasta = '' // Used if no star index given - } - // Any number of additional genomes, key is used with --genome - } -} -``` +Note that you can use the same configuration setup to save sets of reference files for your own use, even if they are not part of the iGenomes resource. See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for instructions on where to save such a file. -### `--fasta` -If you prefer, you can specify the full path to your reference genome when you run the pipeline: - -```bash ---fasta '[path to Fasta reference]' -``` - -### `--igenomesIgnore` -Do not load `igenomes.config` when running the pipeline. 
You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`. ## Job resources ### Automatic resubmission @@ -277,6 +264,3 @@ Set to receive plain-text e-mails instead of HTML formatted. ### `--monochrome_logs` Set to disable colourful command line output and live life in monochrome. - -### `--multiqc_config` -Specify a path to a custom MultiQC configuration file. diff --git a/main.nf b/main.nf index e5a8828..b73c9e4 100644 --- a/main.nf +++ b/main.nf @@ -11,7 +11,6 @@ def helpMessage() { - // TODO nf-core: Add to this help message with new command line parameters log.info nfcoreHeader() log.info""" @@ -19,7 +18,7 @@ def helpMessage() { The typical command for running the pipeline is as follows: - nextflow run nf-core/proteomicslfq --reads '*.mzML' -profile docker + nextflow run nf-core/proteomicslfq --spectra '*.mzML' --database '*.fasta' -profile docker Mandatory arguments: --spectra Path to input spectra as mzML or Thermo Raw @@ -27,6 +26,15 @@ def helpMessage() { -profile Configuration profile to use. Can use multiple (comma separated) Available: conda, docker, singularity, awsbatch, test and more. + Mass Spectrometry Search: + --enzyme Enzymatic cleavage ('unspecific cleavage', 'Trypsin', see OpenMS enzymes) + --fixed_mods Fixed modifications ('Carbamidomethyl (C)', see OpenMS modifications) + --variable_mods Variable modifications ('Oxidation (M)', see OpenMS modifications) + --precursor_mass_tolerance Mass tolerance of precursor mass (ppm) + --allowed_missed_cleavages Allowed missed cleavages + --psm_level_fdr_cutoff Identification PSM-level FDR + --protein_level_fdr_cutoff Identification protein-level FDR + Options: --expdesign Path to experimental design file --adddecoys Add decoys to the given fasta @@ -43,7 +51,7 @@ def helpMessage() { * SET UP CONFIGURATION VARIABLES */ -// Show help emssage +// Show help message if (params.help){ helpMessage() exit 0 @@ -57,15 +65,21 @@ if( !(workflow.runName ==~ /[a-z]+_[a-z]+/) ){ } // Stage config files -//ch_multiqc_config = Channel.fromPath(params.multiqc_config) ch_output_docs = Channel.fromPath("$baseDir/docs/output.md") +// Validate inputs +params.spectra = params.spectra ?: { log.error "No read data privided. Make sure you have used the '--spectra' option."; exit 1 }() +params.database = params.database ?: { log.error "No read data privided. Make sure you have used the '--database' option."; exit 1 }() +// params.expdesign = params.expdesign ?: { log.error "No read data privided. Make sure you have used the '--design' option."; exit 1 }() +params.outdir = params.outdir ?: { log.warn "No output directory provided. Will put the results into './results'"; return "./results" }() + /* * Create a channel for input read files */ ch_spectra = Channel.fromPath(params.spectra, checkIfExists: true) -if (!params.spectra) { exit 1, "Please provide --spectra as input!" } +ch_database = Channel.fromPath(params.database).set{ db_for_decoy_creation } +// ch_expdesign = Channel.fromPath(params.design, checkIfExists: true) //use a branch operator for this sort of thing and access the files accordingly! @@ -82,7 +96,9 @@ ch_spectra //This piece only runs on data that is a.) raw and b.) 
needs conversion //mzML files will be mixed after this step to provide output for downstream processing - allowing you to even specify mzMLs and RAW files in a mixed mode as input :-) - +/* + * STEP 1 - Raw file conversion + */ process raw_file_conversion { input: @@ -107,13 +123,10 @@ if (params.expdesign) Channel .fromPath(params.expdesign) .ifEmpty { exit 1, "params.expdesign was empty - no input files supplied" } - .into { expdesign } + .set { expdesign } } -//Create channel from database, then depending on when add decoys or not -Channel.fromPath(params.database).set{ db_for_decoy_creation } - //Fill the channels with empty Channels in case that we want to add decoys. Otherwise fill with output from database. (searchengine_in_db, pepidx_in_db, plfq_in_db) = ( params.adddecoys ? [ Channel.empty(), Channel.empty(), Channel.empty() ] diff --git a/nextflow.config b/nextflow.config index b337591..9d9ac26 100644 --- a/nextflow.config +++ b/nextflow.config @@ -10,13 +10,13 @@ params { // Workflow flags // TODO nf-core: Specify your pipeline's command line flags - reads = "data/*{1,2}.fastq.gz" - singleEnd = false + spectra = "data/*.mzML" + database = "data/*.fasta" + //expdesign = "data/*.tsv" outdir = './results' // Boilerplate options name = false - multiqc_config = "$baseDir/conf/multiqc_config.yaml" email = false maxMultiqcEmailFileSize = 25.MB plaintext_email = false @@ -34,6 +34,15 @@ params { config_profile_description = false config_profile_contact = false config_profile_url = false + + //workflow defaults + precursor_mass_tolerance = 5 + enzyme = 'unspecific cleavage' + fixed_mods = '' + variable_mods = 'Oxidation (M)' + allowed_missed_cleavages = 0 + psm_level_fdr_cutoff = 0.05 + protein_level_fdr_cutoff = 0.05 } // Container slug. Stable releases should specify release tag! From b4c15a18f7a260b9bc3523723d1b02f5fc91c81d Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 20 Jan 2020 14:33:36 +0100 Subject: [PATCH 025/374] Fixes some logic with reusing the database. Introduce skipping Percolator as a hack to finally get to the actual ProteomicsLFQ step. --- conf/test.config | 1 + main.nf | 110 +++++++++++++++++++++++++++++++++++++++-------- nextflow.config | 2 + 3 files changed, 94 insertions(+), 19 deletions(-) diff --git a/conf/test.config b/conf/test.config index 16539bd..29cd4fc 100644 --- a/conf/test.config +++ b/conf/test.config @@ -24,4 +24,5 @@ params { ] database = 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/18Protein_SoCe_Tr_detergents_trace_target_decoy.fasta' expdesign = 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA_design.tsv' + skipPercolator = true } diff --git a/main.nf b/main.nf index b73c9e4..37288f6 100644 --- a/main.nf +++ b/main.nf @@ -68,8 +68,8 @@ if( !(workflow.runName ==~ /[a-z]+_[a-z]+/) ){ ch_output_docs = Channel.fromPath("$baseDir/docs/output.md") // Validate inputs -params.spectra = params.spectra ?: { log.error "No read data privided. Make sure you have used the '--spectra' option."; exit 1 }() -params.database = params.database ?: { log.error "No read data privided. Make sure you have used the '--database' option."; exit 1 }() +params.spectra = params.spectra ?: { log.error "No spectra data provided. Make sure you have used the '--spectra' option."; exit 1 }() +params.database = params.database ?: { log.error "No protein database provided. Make sure you have used the '--database' option."; exit 1 }() // params.expdesign = params.expdesign ?: { log.error "No read data privided. 
Make sure you have used the '--design' option."; exit 1 }() params.outdir = params.outdir ?: { log.warn "No output directory provided. Will put the results into './results'"; return "./results" }() @@ -86,10 +86,31 @@ ch_database = Channel.fromPath(params.database).set{ db_for_decoy_creation } ch_spectra .branch { raw: hasExtension(it, 'raw') - mzML_for_mix: hasExtension(it, 'mzML') + mzML: hasExtension(it, 'mzML') } .set {branched_input} + +//TODO we could also check for outdated mzML versions and try to update them +branched_input.mzML +.branch { + nonIndexedMzML: file(it).withReader { + f = it; + 1.upto(5) { + if (f.readLine().contains("indexedmzML")) return false; + } + return true; + } + inputIndexedMzML: file(it).withReader { + f = it; + 1.upto(5) { + if (f.readLine().contains("indexedmzML")) return true; + } + return false; + } +} +.set {branched_input_mzMLs} + //Push raw files through process that does the conversion, everything else directly to downstream Channel with mzMLs @@ -97,7 +118,7 @@ ch_spectra //mzML files will be mixed after this step to provide output for downstream processing - allowing you to even specify mzMLs and RAW files in a mixed mode as input :-) /* - * STEP 1 - Raw file conversion + * STEP 0.1 - Raw file conversion */ process raw_file_conversion { @@ -107,15 +128,34 @@ process raw_file_conversion { output: file "*.mzML" into mzmls_converted + // TODO use actual ThermoRawfileConverter!! script: """ mv ${rawfile} ${rawfile.baseName}.mzML """ } +/* + * STEP 0.2 - MzML indexing + */ +process mzml_indexing { + + input: + file mzmlfile from branched_input_mzMLs.nonIndexedMzML + + output: + file "out/*.mzML" into mzmls_indexed + + script: + """ + mkdir out + FileConverter -in ${mzmlfile} -out out/${mzmlfile.baseName}.mzML + """ +} + //Mix the converted raw data with the already supplied mzMLs and push these to the same channels as before -branched_input.mzML_for_mix.mix(mzmls_converted).into{mzmls; mzmls_plfq} +branched_input_mzMLs.inputIndexedMzML.mix(mzmls_converted).mix(mzmls_indexed).into{mzmls; mzmls_plfq} if (params.expdesign) @@ -195,7 +235,9 @@ script: /// Search engine // TODO parameterize -process search_engine { +if (params.se == "msgf") +{ + process search_engine_msgf { echo true input: file database from searchengine_in_db.mix(searchengine_in_db_decoy) @@ -212,17 +254,38 @@ process search_engine { -threads ${task.cpus} \\ -database ${database} """ + } +} else { + process search_engine_comet { + echo true + input: + file database from searchengine_in_db.mix(searchengine_in_db_decoy) + each file(mzml_file) from mzmls + + output: + file "${mzml_file.baseName}.idXML" into id_files + + script: + """ + echo $PATH + CometAdapter -in ${mzml_file} \\ + -out ${mzml_file.baseName}.idXML \\ + -threads ${task.cpus} \\ + -database ${database} + """ + } } + process index_peptides { echo true input: - file id_file from id_files + each file(id_file) from id_files file database from pepidx_in_db.mix(pepidx_in_db_decoy) output: - file "${id_file.baseName}_idx.idXML" into id_files_idx + file "${id_file.baseName}_idx.idXML" into id_files_idx, id_files_idx_2 script: """ @@ -242,6 +305,9 @@ process extract_perc_features { output: file "${id_file.baseName}_feat.idXML" into id_files_idx_feat + when: + !params.skipPercolator + script: """ PSMFeatureExtractor -in ${id_file} \\ @@ -251,7 +317,7 @@ process extract_perc_features { } -//TODO parameterize +//TODO parameterize and find a way to run across all runs merged process percolator { input: @@ -260,6 +326,9 @@ process 
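+    // Note: the `when:` below only skips this task; it does not fill
+    // id_files_idx_feat_perc for the skipped case. That path is instead covered
+    // by mixing the unscored IDs straight into the FDR step, as in:
+    //   file id_file from id_files_idx_feat_perc.mix(id_files_idx_2)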
percolator { output: file "${id_file.baseName}_perc.idXML" into id_files_idx_feat_perc + when: + !params.skipPercolator + script: """ PercolatorAdapter -in ${id_file} \\ @@ -270,10 +339,11 @@ process percolator { } +//TODO probably not needed when using Percolator. You can use the qval from there process fdr { input: - file id_file from id_files_idx_feat_perc + file id_file from id_files_idx_feat_perc.mix(id_files_idx_2) output: file "${id_file.baseName}_fdr.idXML" into id_files_idx_feat_perc_fdr @@ -283,7 +353,7 @@ process fdr { FalseDiscoveryRate -in ${id_file} \\ -out ${id_file.baseName}_fdr.idXML \\ -threads ${task.cpus} \\ - -algorithm:add_decoy_peptides -algorithm:add_decoy_proteins + -protein false -algorithm:add_decoy_peptides -algorithm:add_decoy_proteins """ } @@ -296,7 +366,7 @@ process idfilter { file id_file from id_files_idx_feat_perc_fdr output: - file "${id_file.baseName}_filter.idXML" into id_files_idx_feat_perc_fdr_filter + file "${id_file.baseName}_filter.idXML" into id_files_idx_feat_perc_fdr_filter, id_files_idx_feat_perc_fdr_filter_2 script: """ @@ -317,12 +387,14 @@ process idscoreswitcher { output: file "${id_file.baseName}_switched.idXML" into id_files_idx_feat_perc_fdr_filter_switched + when: + !params.skipPercolator + script: """ - IDFilter -in ${id_file} \\ + IDScoreSwitcher -in ${id_file} \\ -out ${id_file.baseName}_switched.idXML \\ -threads ${task.cpus} \\ - -score:pep 0.05 -old_score q-value -new_score MS:1001493 -new_score_orientation lower_better -new_score_type "Posterior Error Probability" """ @@ -334,7 +406,7 @@ process proteomicslfq { input: file mzmls from mzmls_plfq.collect() - file id_files from id_files_idx_feat_perc_fdr_filter_switched.collect() + file id_files from id_files_idx_feat_perc_fdr_filter_switched.mix(id_files_idx_feat_perc_fdr_filter_2).collect() file expdes from expdesign file fasta from plfq_in_db.mix(plfq_in_db_decoy) @@ -344,11 +416,11 @@ process proteomicslfq { file "out.csv" into out_msstats script: - id_files_str = id_files.sort().join(' ') - mzmls_str = mzmls.sort().join(' ') + //id_files_str = id_files.sort().join(' ') + //mzmls_str = mzmls.sort().join(' ') """ - ProteomicsLFQ -in ${mzmls_str} - -ids ${id_files_str} \\ + ProteomicsLFQ -in ${(mzmls as List).join(' ')} \\ + -ids ${(id_files as List).join(' ')} \\ -design ${expdes} \\ -fasta ${fasta} \\ -targeted_only "true" \\ diff --git a/nextflow.config b/nextflow.config index 9d9ac26..0295a65 100644 --- a/nextflow.config +++ b/nextflow.config @@ -13,6 +13,8 @@ params { spectra = "data/*.mzML" database = "data/*.fasta" //expdesign = "data/*.tsv" + adddecoys = false + se = "comet" outdir = './results' // Boilerplate options From 243519c87c92c27219dfed897d2b4a195d1d5ab0 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 20 Jan 2020 16:20:04 +0100 Subject: [PATCH 026/374] sort inputs to proteomics lfq --- main.nf | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/main.nf b/main.nf index 37288f6..e2e8270 100644 --- a/main.nf +++ b/main.nf @@ -419,8 +419,8 @@ process proteomicslfq { //id_files_str = id_files.sort().join(' ') //mzmls_str = mzmls.sort().join(' ') """ - ProteomicsLFQ -in ${(mzmls as List).join(' ')} \\ - -ids ${(id_files as List).join(' ')} \\ + ProteomicsLFQ -in ${(mzmls as List).sort().join(' ')} \\ + -ids ${(id_files as List).sort().join(' ')} \\ -design ${expdes} \\ -fasta ${fasta} \\ -targeted_only "true" \\ From ba4d93b228ff1a4d36d2b3a6a0a33bc554b456e6 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 20 Jan 2020 20:06:48 
+0100 Subject: [PATCH 027/374] Sort inputs for plfq --- main.nf | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/main.nf b/main.nf index e2e8270..21d4870 100644 --- a/main.nf +++ b/main.nf @@ -241,14 +241,13 @@ if (params.se == "msgf") echo true input: file database from searchengine_in_db.mix(searchengine_in_db_decoy) - file mzml_file from mzmls + each file(mzml_file) from mzmls output: file "${mzml_file.baseName}.idXML" into id_files script: """ - echo $PATH MSGFPlusAdapter -in ${mzml_file} \\ -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ @@ -267,7 +266,6 @@ if (params.se == "msgf") script: """ - echo $PATH CometAdapter -in ${mzml_file} \\ -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ @@ -405,8 +403,8 @@ process proteomicslfq { publishDir "${params.outdir}/proteomics_lfq", mode: 'copy' input: - file mzmls from mzmls_plfq.collect() - file id_files from id_files_idx_feat_perc_fdr_filter_switched.mix(id_files_idx_feat_perc_fdr_filter_2).collect() + file mzmls from mzmls_plfq.toSortedList({ a, b -> b.baseName <=> a.baseName }).view() + file id_files from id_files_idx_feat_perc_fdr_filter_switched.mix(id_files_idx_feat_perc_fdr_filter_2).toSortedList({ a, b -> b.baseName <=> a.baseName }).view() file expdes from expdesign file fasta from plfq_in_db.mix(plfq_in_db_decoy) @@ -416,11 +414,9 @@ process proteomicslfq { file "out.csv" into out_msstats script: - //id_files_str = id_files.sort().join(' ') - //mzmls_str = mzmls.sort().join(' ') """ - ProteomicsLFQ -in ${(mzmls as List).sort().join(' ')} \\ - -ids ${(id_files as List).sort().join(' ')} \\ + ProteomicsLFQ -in ${(mzmls as List).join(' ')} \\ + -ids ${(id_files as List).join(' ')} \\ -design ${expdes} \\ -fasta ${fasta} \\ -targeted_only "true" \\ From c5f800a9069615872d75c5c03fe99cb490d6ad88 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 24 Jan 2020 21:57:33 +0100 Subject: [PATCH 028/374] More parameterization. Added IDPEP branch. --- conf/test.config | 3 +- main.nf | 377 ++++++++++++++++++++++++++++++++++------------- nextflow.config | 28 ++-- 3 files changed, 295 insertions(+), 113 deletions(-) diff --git a/conf/test.config b/conf/test.config index 29cd4fc..2c3df32 100644 --- a/conf/test.config +++ b/conf/test.config @@ -24,5 +24,6 @@ params { ] database = 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/18Protein_SoCe_Tr_detergents_trace_target_decoy.fasta' expdesign = 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA_design.tsv' - skipPercolator = true + posterior_probabilities = "fit_distributions" + search_engine = "msgf" } diff --git a/main.nf b/main.nf index 21d4870..5f7620a 100644 --- a/main.nf +++ b/main.nf @@ -23,26 +23,51 @@ def helpMessage() { Mandatory arguments: --spectra Path to input spectra as mzML or Thermo Raw --database Path to input protein database as fasta - -profile Configuration profile to use. Can use multiple (comma separated) - Available: conda, docker, singularity, awsbatch, test and more. 
- Mass Spectrometry Search: + + Database Search: + --search_engine Which search engine: "comet" or "msgf" --enzyme Enzymatic cleavage ('unspecific cleavage', 'Trypsin', see OpenMS enzymes) --fixed_mods Fixed modifications ('Carbamidomethyl (C)', see OpenMS modifications) --variable_mods Variable modifications ('Oxidation (M)', see OpenMS modifications) --precursor_mass_tolerance Mass tolerance of precursor mass (ppm) --allowed_missed_cleavages Allowed missed cleavages - --psm_level_fdr_cutoff Identification PSM-level FDR - --protein_level_fdr_cutoff Identification protein-level FDR - - Options: - --expdesign Path to experimental design file - --adddecoys Add decoys to the given fasta - - Other options: + --psm_level_fdr_cutoff Identification PSM-level FDR cutoff + --posterior_probabilities How to calculate posterior probabilities for PSMs: + "percolator" = Re-score based on PSM-feature-based SVM and transform distance + to hyperplane for posteriors + "fit_distributions" = Fit positive and negative distributions to scores + (similar to PeptideProphet) + + + Inference: + --protein_inference Infer proteins through: + "aggregation" = aggregates all peptide scores across a protein (by calculating the maximum) + "bayesian" = computes a posterior probability for every protein based on a Bayesian network + --protein_level_fdr_cutoff Identification protein-level FDR cutoff + + Quantification: + --transfer_ids Transfer IDs over aligned samples to increase # of quantifiable features (WARNING: + increased memory consumption) + --targeted_only Only ID based quantification + --mass_recalibration Recalibrates masses to correct for instrument biases + --protein_quantification Quantify proteins based on: + "unique_peptides" = use peptides mapping to single proteins or a group of indistinguishable proteins (according to the set of experimentally identified peptides) + "strictly_unique_peptides" = use peptides mapping to a unique single protein only + "shared_peptides" = use shared peptides only for its best group (by inference score) + + General Options: + --expdesign Path to experimental design file (if not given, it assumes unfractionated, unrelated samples) + --add_decoys Add decoys to the given fasta + + Other nextflow options: --outdir The output directory where the results will be saved - --email Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits - -name Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic. + --email Set this parameter to your e-mail address to get a summary e-mail with details of the + run sent to you when the workflow exits + -name Name for the pipeline run. If not specified, Nextflow will automatically generate a random + mnemonic. + -profile Configuration profile to use. Can use multiple (comma separated) + Available: conda, docker, singularity, awsbatch, test and more. """.stripIndent() } @@ -117,6 +142,18 @@ branched_input.mzML //This piece only runs on data that is a.) raw and b.) 
needs conversion //mzML files will be mixed after this step to provide output for downstream processing - allowing you to even specify mzMLs and RAW files in a mixed mode as input :-) + +//GENERAL TODOS +// - Check why we depend on full filepaths and if that is needed +/* Proposition from nextflow gitter https://gitter.im/nextflow-io/nextflow?at=5e25fabea259cb0f0607a1a1 +* +* unless the specific filenames are important (depends on the tool you're using), I usually use the pattern outlined here: +* https://www.nextflow.io/docs/latest/process.html#multiple-input-files +* e.g: file "?????.mzML" from mzmls_plfq.toSortedList() and ProteomicsLFQ -in *.mzML -ids *.id +*/ +// - Check how to avoid copying of the database for example (currently we get one copy for each SE run). Is it the +// "each file()" pattern I used? + /* * STEP 0.1 - Raw file conversion */ @@ -168,52 +205,31 @@ if (params.expdesign) //Fill the channels with empty Channels in case that we want to add decoys. Otherwise fill with output from database. -(searchengine_in_db, pepidx_in_db, plfq_in_db) = ( params.adddecoys +(searchengine_in_db, pepidx_in_db, plfq_in_db) = ( params.add_decoys ? [ Channel.empty(), Channel.empty(), Channel.empty() ] : [ Channel.fromPath(params.database),Channel.fromPath(params.database), Channel.fromPath(params.database) ] ) -//Add decoys if params.adddecoys is set appropriately +//Add decoys if params.add_decoys is set appropriately process generate_decoy_database { -input: - file(mydatabase) from db_for_decoy_creation + input: + file(mydatabase) from db_for_decoy_creation -output: - file "${database.baseName}_decoy.fasta" into searchengine_in_db_decoy, pepidx_in_db_decoy, plfq_in_db_decoy - //TODO need to add these channel with .mix(searchengine_in_db_decoy) for example to all subsequent processes that need this... + output: + file "${database.baseName}_decoy.fasta" into searchengine_in_db_decoy, pepidx_in_db_decoy, plfq_in_db_decoy + //TODO need to add these channel with .mix(searchengine_in_db_decoy) for example to all subsequent processes that need this... -when: params.adddecoys + when: params.add_decoys -script: - """ - DecoyDatabase -in ${mydatabase} \\ - -out ${mydatabase.baseName}_decoy.fasta \\ - -decoy_string DECOY_ \\ - -decoy_string_position prefix - """ + script: + """ + DecoyDatabase -in ${mydatabase} \\ + -out ${mydatabase.baseName}_decoy.fasta \\ + -decoy_string DECOY_ \\ + -decoy_string_position prefix + """ } - -// Test -//process generate_simple_exp_design_file { -// publishDir "${params.outdir}", mode: 'copy' -// input: -// val mymzmls from mzmls.collect() - -// output: -// file "expdesign.csv" into expdesign - -// when: -// !params.expdesign - -// script: -// strng = mymzmls.join(',') -// """ -// echo ${strng} > expdesign.csv -// """ -//} - - // Doesnt work. Py script needs all the inputs to be together in a folder // Wont work with nextflow. It needs to accept a list of paths for the inputs!! 
//process generate_simple_exp_design_file { @@ -234,43 +250,53 @@ script: //} /// Search engine -// TODO parameterize -if (params.se == "msgf") +// TODO parameterize more +if (params.search_engine == "msgf") { - process search_engine_msgf { - echo true - input: - file database from searchengine_in_db.mix(searchengine_in_db_decoy) - each file(mzml_file) from mzmls + search_engine_score = "SpecEValue" - output: - file "${mzml_file.baseName}.idXML" into id_files - - script: - """ - MSGFPlusAdapter -in ${mzml_file} \\ - -out ${mzml_file.baseName}.idXML \\ - -threads ${task.cpus} \\ - -database ${database} - """ + process search_engine_msgf { + echo true + input: + tuple file(database), file(mzml_file) from searchengine_in_db.mix(searchengine_in_db_decoy).combine(mzmls) + + // This was another way of handling the combination + //file database from searchengine_in_db.mix(searchengine_in_db_decoy) + //each file(mzml_file) from mzmls + + + output: + file "${mzml_file.baseName}.idXML" into id_files + + script: + """ + MSGFPlusAdapter -in ${mzml_file} \\ + -out ${mzml_file.baseName}.idXML \\ + -threads ${task.cpus} \\ + -database ${database} + """ } + } else { + + search_engine_score = "expect" + process search_engine_comet { - echo true - input: - file database from searchengine_in_db.mix(searchengine_in_db_decoy) - each file(mzml_file) from mzmls + echo true + input: + file database from searchengine_in_db.mix(searchengine_in_db_decoy) + each file(mzml_file) from mzmls - output: - file "${mzml_file.baseName}.idXML" into id_files - - script: - """ - CometAdapter -in ${mzml_file} \\ - -out ${mzml_file.baseName}.idXML \\ - -threads ${task.cpus} \\ - -database ${database} - """ + output: + file "${mzml_file.baseName}.idXML" into id_files + + script: + """ + CometAdapter -in ${mzml_file} \\ + -out ${mzml_file.baseName}.idXML \\ + -threads ${task.cpus} \\ + -database ${database} + """ } } @@ -283,7 +309,7 @@ process index_peptides { file database from pepidx_in_db.mix(pepidx_in_db_decoy) output: - file "${id_file.baseName}_idx.idXML" into id_files_idx, id_files_idx_2 + file "${id_file.baseName}_idx.idXML" into id_files_idx_ForPerc, id_files_idx_ForIDPEP script: """ @@ -295,16 +321,21 @@ process index_peptides { } + +// --------------------------------------------------------------------- +// Branch a) Q-values and PEP from Percolator + + process extract_perc_features { input: - file id_file from id_files_idx + file id_file from id_files_idx_ForPerc output: file "${id_file.baseName}_feat.idXML" into id_files_idx_feat when: - !params.skipPercolator + params.posterior_probabilities == "percolator" script: """ @@ -325,7 +356,7 @@ process percolator { file "${id_file.baseName}_perc.idXML" into id_files_idx_feat_perc when: - !params.skipPercolator + params.posterior_probabilities == "percolator" script: """ @@ -337,14 +368,68 @@ process percolator { } -//TODO probably not needed when using Percolator. 
You can use the qval from there +process idfilter { + + publishDir "${params.outdir}/ids", mode: 'copy' + + input: + file id_file from id_files_idx_feat_perc + + output: + file "${id_file.baseName}_filter.idXML" into id_files_idx_feat_perc_filter + + when: + params.posterior_probabilities == "percolator" + + script: + """ + IDFilter -in ${id_file} \\ + -out ${id_file.baseName}_filter.idXML \\ + -threads ${task.cpus} \\ + -score:pep ${params.psm_level_fdr_cutoff} + """ + +} + +process idscoreswitcher { + + input: + file id_file from id_files_idx_feat_perc_filter + + output: + file "${id_file.baseName}_switched.idXML" into id_files_idx_feat_perc_fdr_filter_switched + + when: + params.posterior_probabilities == "percolator" + + script: + """ + IDScoreSwitcher -in ${id_file} \\ + -out ${id_file.baseName}_switched.idXML \\ + -threads ${task.cpus} \\ + -old_score q-value \\ + -new_score MS:1001493 \\ + -new_score_orientation lower_better \\ + -new_score_type "Posterior Error Probability" + """ + +} + + + +// --------------------------------------------------------------------- +// Branch b) Q-values and PEP from OpenMS + process fdr { input: - file id_file from id_files_idx_feat_perc.mix(id_files_idx_2) + file id_file from id_files_idx_ForIDPEP output: - file "${id_file.baseName}_fdr.idXML" into id_files_idx_feat_perc_fdr + file "${id_file.baseName}_fdr.idXML" into id_files_idx_ForIDPEP_fdr + + when: + params.posterior_probabilities != "percolator" script: """ @@ -356,55 +441,134 @@ process fdr { } +process idscoreswitcher1 { + + input: + file id_file from id_files_idx_ForIDPEP_fdr -// TODO parameterize -process idfilter { + output: + file "${id_file.baseName}_switched.idXML" into id_files_idx_ForIDPEP_fdr_switch + + when: + params.posterior_probabilities != "percolator" + + script: + """ + IDScoreSwitcher -in ${id_file} \\ + -out ${id_file.baseName}_switched.idXML \\ + -threads ${task.cpus} \\ + -old_score q-value \\ + -new_score ${search_engine_score}_score \\ + -new_score_orientation lower_better \\ + -new_score_type ${search_engine_score} + """ + +} + +//TODO probably not needed when using Percolator. 
You can use the qval from there +process idpep { + + input: + file id_file from id_files_idx_ForIDPEP_fdr_switch + + output: + file "${id_file.baseName}_idpep.idXML" into id_files_idx_ForIDPEP_fdr_switch_idpep + + when: + params.posterior_probabilities != "percolator" + + script: + """ + IDPosteriorErrorProbability -in ${id_file} \\ + -out ${id_file.baseName}_idpep.idXML \\ + -threads ${task.cpus} + """ + +} + +process idscoreswitcher2 { + + input: + file id_file from id_files_idx_ForIDPEP_fdr_switch_idpep + + output: + file "${id_file.baseName}_switched.idXML" into id_files_idx_ForIDPEP_fdr_switch_idpep_switch + + when: + params.posterior_probabilities != "percolator" + + script: + """ + IDScoreSwitcher -in ${id_file} \\ + -out ${id_file.baseName}_switched.idXML \\ + -threads ${task.cpus} \\ + -old_score "Posterior Error Probability" \\ + -new_score q-value \\ + -new_score_orientation lower_better + """ + +} + +process idfilter2 { + publishDir "${params.outdir}/ids", mode: 'copy' + input: - file id_file from id_files_idx_feat_perc_fdr + file id_file from id_files_idx_ForIDPEP_fdr_switch_idpep_switch output: - file "${id_file.baseName}_filter.idXML" into id_files_idx_feat_perc_fdr_filter, id_files_idx_feat_perc_fdr_filter_2 + file "${id_file.baseName}_filter.idXML" into id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter + when: + params.posterior_probabilities != "percolator" + script: """ IDFilter -in ${id_file} \\ -out ${id_file.baseName}_filter.idXML \\ -threads ${task.cpus} \\ - -score:pep 0.05 + -score:pep ${params.psm_level_fdr_cutoff} """ } -//TODO check if needed -process idscoreswitcher { +process idscoreswitcher3 { input: - file id_file from id_files_idx_feat_perc_fdr_filter + file id_file from id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter output: - file "${id_file.baseName}_switched.idXML" into id_files_idx_feat_perc_fdr_filter_switched + file "${id_file.baseName}_switched.idXML" into id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch when: - !params.skipPercolator + params.posterior_probabilities != "percolator" script: """ - IDScoreSwitcher -in ${id_file} \\ + IDScoreSwitcher -in ${id_file} \\ -out ${id_file.baseName}_switched.idXML \\ -threads ${task.cpus} \\ - -old_score q-value -new_score MS:1001493 -new_score_orientation lower_better -new_score_type "Posterior Error Probability" + -old_score q-value \\ + -new_score "Posterior Error Probability" \\ + -new_score_orientation lower_better """ } + +// --------------------------------------------------------------------- +// Main Branch + process proteomicslfq { publishDir "${params.outdir}/proteomics_lfq", mode: 'copy' input: file mzmls from mzmls_plfq.toSortedList({ a, b -> b.baseName <=> a.baseName }).view() - file id_files from id_files_idx_feat_perc_fdr_filter_switched.mix(id_files_idx_feat_perc_fdr_filter_2).toSortedList({ a, b -> b.baseName <=> a.baseName }).view() + file id_files from id_files_idx_feat_perc_fdr_filter_switched + .mix(id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch) + .toSortedList({ a, b -> b.baseName <=> a.baseName }) + .view() file expdes from expdesign file fasta from plfq_in_db.mix(plfq_in_db_decoy) @@ -421,6 +585,7 @@ process proteomicslfq { -fasta ${fasta} \\ -targeted_only "true" \\ -mass_recalibration "false" \\ + -transfer_ids "false" \\ -out out.mzTab \\ -threads ${task.cpus} \\ -out_msstats out.csv \\ @@ -431,6 +596,15 @@ process proteomicslfq { } + + + + +//--------------------------------------------------------------- // +//---------------------- Nextflow 
specifics --------------------- // +//--------------------------------------------------------------- // + + // Header log info log.info nfcoreHeader() def summary = [:] @@ -500,7 +674,8 @@ process get_software_versions { /* * STEP 3 - Output Description HTML */ -/*process output_documentation { +/* TODO Deactivated for now +process output_documentation { publishDir "${params.outdir}/pipeline_info", mode: 'copy' input: diff --git a/nextflow.config b/nextflow.config index 0295a65..43f4f06 100644 --- a/nextflow.config +++ b/nextflow.config @@ -12,9 +12,22 @@ params { // TODO nf-core: Specify your pipeline's command line flags spectra = "data/*.mzML" database = "data/*.fasta" - //expdesign = "data/*.tsv" - adddecoys = false - se = "comet" + expdesign = "data/*.tsv" + posterior_probabilities = "percolator" + transfer_ids = false + targeted_only = false + mass_recalibration = true + add_decoys = false + search_engine = "comet" + protein_inference = "aggregation" + precursor_mass_tolerance = 5 + enzyme = 'Trypsin' + fixed_mods = 'Carbamidomethyl (C)' + variable_mods = 'Oxidation (M)' + allowed_missed_cleavages = 1 + psm_level_fdr_cutoff = 0.05 + protein_level_fdr_cutoff = 0.05 + outdir = './results' // Boilerplate options @@ -37,14 +50,7 @@ params { config_profile_contact = false config_profile_url = false - //workflow defaults - precursor_mass_tolerance = 5 - enzyme = 'unspecific cleavage' - fixed_mods = '' - variable_mods = 'Oxidation (M)' - allowed_missed_cleavages = 0 - psm_level_fdr_cutoff = 0.05 - protein_level_fdr_cutoff = 0.05 + } // Container slug. Stable releases should specify release tag! From caffea9d4ead251dd8c9295b842c327c80fd61b6 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sat, 25 Jan 2020 01:29:09 +0100 Subject: [PATCH 029/374] No protein FDR on test data to get all results (also: it fails). Upload artifacts. 
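Context for the change below (an illustrative sketch, not part of the committed diff): with standard Nextflow profile semantics, values set in conf/test.config override the defaults from nextflow.config whenever `-profile test` is active, and a protein-level FDR cutoff of 1.0 keeps every protein, i.e. it effectively switches protein-level FDR filtering off for the test data:

```nextflow
// conf/test.config (illustrative excerpt), only active with -profile test
params {
    protein_level_fdr_cutoff = 1.0  // 1.0 = keep all proteins, i.e. no protein-level FDR filtering
}
```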
--- .github/workflows/main.yml | 6 ++++++ conf/test.config | 1 + main.nf | 3 +++ 3 files changed, 10 insertions(+) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 3c9d844..d8fac63 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -10,6 +10,7 @@ jobs: TOWER_ACCESS_TOKEN: ${{ secrets.TOWER_ACCESS_TOKEN }} steps: - uses: actions/checkout@v1 + name: Checkout sources - name: Docker pull OpenMS image run: docker pull openms/executables - name: Extract branch name @@ -27,3 +28,8 @@ - name: BASIC Run the basic pipeline with the test run: | nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "$RUN_NAME-basic" -profile test,docker + - uses: actions/upload-artifact@v1 + name: Upload results + with: + name: results + path: ${GITHUB_WORKSPACE}/results diff --git a/conf/test.config b/conf/test.config index 2c3df32..35512ef 100644 --- a/conf/test.config +++ b/conf/test.config @@ -26,4 +26,5 @@ params { expdesign = 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA_design.tsv' posterior_probabilities = "fit_distributions" search_engine = "msgf" + protein_level_fdr_cutoff = 1.0 } diff --git a/main.nf b/main.nf index 21d4870..5f7620a 100644 --- a/main.nf +++ b/main.nf @@ -576,6 +576,9 @@ process proteomicslfq { file "out.mzTab" into out_mzTab file "out.consensusXML" into out_consensusXML file "out.csv" into out_msstats + file "debug_mergedIDs.idXML" into debug_id + file "debug_mergedIDs_inference.idXML" into debug_id_inf + file "debug_mergedIDsGreedyResolved.idXML" into debug_id_resolve script: """ From 30bd399ff419ebaca578e1105c6793b5d687ed38 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sat, 25 Jan 2020 01:40:26 +0100 Subject: [PATCH 030/374] Pass param correctly. Always upload artifacts. --- .github/workflows/main.yml | 1 + main.nf | 1 + 2 files changed, 2 insertions(+) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index d8fac63..36461b1 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -29,6 +29,7 @@ jobs: run: | nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "$RUN_NAME-basic" -profile test,docker - uses: actions/upload-artifact@v1 + if: always() name: Upload results with: name: results diff --git a/main.nf b/main.nf index 37a7a4a..30b7e8d 100644 --- a/main.nf +++ b/main.nf @@ -593,6 +593,7 @@ process proteomicslfq { -threads ${task.cpus} \\ -out_msstats out.csv \\ -out_cxml out.consensusXML \\ + -proteinFDR ${params.protein_level_fdr_cutoff} \\ -debug 667 """ From 45aa0735f8555cc126d25ae46fbe9303668adcf8 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sat, 25 Jan 2020 01:51:33 +0100 Subject: [PATCH 031/374] One output too many.
wrong artifact path --- .github/workflows/main.yml | 2 +- main.nf | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 36461b1..3f6cc3d 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -33,4 +33,4 @@ jobs: name: Upload results with: name: results - path: ${GITHUB_WORKSPACE}/results + path: results diff --git a/main.nf b/main.nf index 30b7e8d..99830c3 100644 --- a/main.nf +++ b/main.nf @@ -578,7 +578,7 @@ process proteomicslfq { file "out.csv" into out_msstats file "debug_mergedIDs.idXML" into debug_id file "debug_mergedIDs_inference.idXML" into debug_id_inf - file "debug_mergedIDsGreedyResolved.idXML" into debug_id_resolve + //file "debug_mergedIDsGreedyResolved.idXML" into debug_id_resolve script: """ From 44008034211e69e3a510411bff8bee96da0a46b3 Mon Sep 17 00:00:00 2001 From: Zethson Date: Sat, 25 Jan 2020 12:49:23 +0100 Subject: [PATCH 032/374] [FEATURE] some refactoring & percolator options --- docs/usage.md | 52 ++++++++++-- main.nf | 205 ++++++++++++++++++++++-------------------------- nextflow.config | 11 ++- 3 files changed, 150 insertions(+), 118 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index 4d4126f..c0249ca 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -20,7 +20,14 @@ * [`--variable_mods`](#--variable_mods) * [`--allowed_missed cleavages`](#--allowed_missed_cleavages) * [`--psm_level_fdr_cutoff`](#--psm_level_fdr_cutoff) - * [`--protein_level_fdr_cutoff](#--protein_level_fdr_cutoff) +* [Protein inference](#Protein-Inference) + * [`--protein_level_fdr_cutoff`](#--protein_level_fdr_cutoff) + * [`train_FDR`](#--train_FDR) + * [`test_FDR`](#--test_FDR) + * [`percolator_enzyme`](#--percolator_enzyme) + * [`FDR_level`](#--FDR_level) + * [`klammer`](#--klammer) + * [`description_correct_features`](#--description_correct_features) * [Job resources](#job-resources) * [Automatic resubmission](#automatic-resubmission) * [Custom resource requests](#custom-resource-requests) @@ -131,7 +138,6 @@ If `-profile` is not specified at all the pipeline will be run locally and expec * A profile with a complete configuration for automated testing * Includes links to test data and therefore doesn't need additional parameters - ## Mass Spectrometry Search ### `--precursor_mass_tolerance` @@ -152,21 +158,53 @@ Specify which variable modifications should be applied to the database search (e Multiple fixed or variable modifications can be specified comma separated (e.g. 'Carbamidomethyl (C),Oxidation (M)') -## `--allowed_missed_cleavages` +### `--allowed_missed_cleavages` Specify the number of allowed missed enzyme cleavages in a peptide. The parameter is not applied if the no-enzyme option is specified for comet. -## `--psm_level_fdr_cutoff` +### `--psm_level_fdr_cutoff` Specify the PSM level cutoff for the identification FDR for IDFilter. -## `--protein_level_fdr_cutoff` +## Protein Inference + +### `--protein_level_fdr_cutoff` Specify the protein level cutoff for the identification FDR of PLFQ -Note that you can use the same configuration setup to save sets of reference files for your own use, even if they are not part of the iGenomes resource. See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for instructions on where to save such a file. +### `--train_FDR` + +False discovery rate threshold to define positive examples in training. Set to testFDR if 0. 
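As a short illustration (the values here are hypothetical, not the pipeline defaults), the two Percolator FDR thresholds are typically adjusted together, e.g. via a custom Nextflow config:

```nextflow
// Hypothetical custom.config: tighten both Percolator thresholds to 1% FDR
params {
    train_FDR = 0.01  // defines positive training examples at 1% FDR
    test_FDR  = 0.01  // threshold for cross-validation evaluation and final reporting
}
```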
+ +### `--test_FDR` + +False discovery rate threshold for evaluating best cross validation result and reported end result. + +### `--percolator_enzyme` + +The type of used enzyme("no_enzyme","elastase","pepsin","proteinasek","thermolysin","trypsinp","chymotrypsin","lys-n","lys-c","arg-c","asp-n","glu-c","trypsin"). + +### `--FDR_level` + +Level of FDR calculation ('peptide-level-fdrs', 'psm-level-fdrs', 'protein-level-fdrs'). - +### `--klammer` + +Retention time features are calculated as in Klammer et al. instead of with Elude. Only available if --description_correct_features is set. + +### `--description_correct_features` + +Percolator provides a possibility to use so called description of correct features, i.e. features for which desirable values are learnt from the previously identified target PSMs. The absolute value of the difference between desired value and observed value is the used as predictive features. + +1 iso-electric point + +2 mass calibration + +4 retention time + +8 delta_retention_time*delta_mass_calibration + +Note that you can use the same configuration setup to save sets of reference files for your own use, even if they are not part of the iGenomes resource. See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for instructions on where to save such a file. ## Job resources ### Automatic resubmission diff --git a/main.nf b/main.nf index 99830c3..9aa829a 100644 --- a/main.nf +++ b/main.nf @@ -45,6 +45,12 @@ def helpMessage() { "aggregation" = aggregates all peptide scores across a protein (by calculating the maximum) "bayesian" = computes a posterior probability for every protein based on a Bayesian network --protein_level_fdr_cutoff Identification protein-level FDR cutoff + --train_FDR False discovery rate threshold to define positive examples in training. Set to testFDR if 0. + --test_FDR False discovery rate threshold for evaluating best cross validation result and reported end result. + --percolator_enzyme Type of enzyme + --FDR_level Level of FDR calculation ('peptide-level-fdrs', 'psm-level-fdrs', 'protein-level-fdrs') + --description_correct_features Description of correct features for Percolator (0, 1, 2, 4, 8, see Percolator retention time and calibration) + --klammer Retention time features are calculated as in Klammer et al. instead of with Elude. Quantification: --transfer_ids Transfer IDs over aligned samples to increase # of quantifiable features (WARNING: @@ -119,20 +125,20 @@ ch_spectra //TODO we could also check for outdated mzML versions and try to update them branched_input.mzML .branch { - nonIndexedMzML: file(it).withReader { - f = it; - 1.upto(5) { - if (f.readLine().contains("indexedmzML")) return false; - } - return true; + nonIndexedMzML: file(it).withReader { + f = it; + 1.upto(5) { + if (f.readLine().contains("indexedmzML")) return false; } - inputIndexedMzML: file(it).withReader { - f = it; - 1.upto(5) { - if (f.readLine().contains("indexedmzML")) return true; - } - return false; + return true; + } + inputIndexedMzML: file(it).withReader { + f = it; + 1.upto(5) { + if (f.readLine().contains("indexedmzML")) return true; } + return false; + } } .set {branched_input_mzMLs} @@ -160,16 +166,16 @@ branched_input.mzML process raw_file_conversion { input: - file rawfile from branched_input.raw + file rawfile from branched_input.raw output: - file "*.mzML" into mzmls_converted + file "*.mzML" into mzmls_converted // TODO use actual ThermoRawfileConverter!! 
script: - """ - mv ${rawfile} ${rawfile.baseName}.mzML - """ + """ + mv ${rawfile} ${rawfile.baseName}.mzML + """ } /* @@ -178,16 +184,16 @@ process raw_file_conversion { process mzml_indexing { input: - file mzmlfile from branched_input_mzMLs.nonIndexedMzML + file mzmlfile from branched_input_mzMLs.nonIndexedMzML output: - file "out/*.mzML" into mzmls_indexed + file "out/*.mzML" into mzmls_indexed script: - """ - mkdir out - FileConverter -in ${mzmlfile} -out out/${mzmlfile.baseName}.mzML - """ + """ + mkdir out + FileConverter -in ${mzmlfile} -out out/${mzmlfile.baseName}.mzML + """ } //Mix the converted raw data with the already supplied mzMLs and push these to the same channels as before @@ -213,21 +219,22 @@ if (params.expdesign) process generate_decoy_database { input: - file(mydatabase) from db_for_decoy_creation + file(mydatabase) from db_for_decoy_creation output: - file "${database.baseName}_decoy.fasta" into searchengine_in_db_decoy, pepidx_in_db_decoy, plfq_in_db_decoy - //TODO need to add these channel with .mix(searchengine_in_db_decoy) for example to all subsequent processes that need this... + file "${database.baseName}_decoy.fasta" into searchengine_in_db_decoy, pepidx_in_db_decoy, plfq_in_db_decoy + //TODO need to add these channel with .mix(searchengine_in_db_decoy) for example to all subsequent processes that need this... - when: params.add_decoys + when: + params.add_decoys script: - """ - DecoyDatabase -in ${mydatabase} \\ - -out ${mydatabase.baseName}_decoy.fasta \\ - -decoy_string DECOY_ \\ - -decoy_string_position prefix - """ + """ + DecoyDatabase -in ${mydatabase} \\ + -out ${mydatabase.baseName}_decoy.fasta \\ + -decoy_string DECOY_ \\ + -decoy_string_position prefix + """ } // Doesnt work. Py script needs all the inputs to be together in a folder @@ -251,59 +258,61 @@ process generate_decoy_database { /// Search engine // TODO parameterize more -if (params.search_engine == "msgf") -{ - search_engine_score = "SpecEValue" +process search_engine_msgf { + echo true - process search_engine_msgf { - echo true - input: - tuple file(database), file(mzml_file) from searchengine_in_db.mix(searchengine_in_db_decoy).combine(mzmls) - - // This was another way of handling the combination - //file database from searchengine_in_db.mix(searchengine_in_db_decoy) - //each file(mzml_file) from mzmls + input: + tuple file(database), file(mzml_file) from searchengine_in_db.mix(searchengine_in_db_decoy).combine(mzmls) + + // This was another way of handling the combination + //file database from searchengine_in_db.mix(searchengine_in_db_decoy) + //each file(mzml_file) from mzmls - output: - file "${mzml_file.baseName}.idXML" into id_files - - script: - """ - MSGFPlusAdapter -in ${mzml_file} \\ - -out ${mzml_file.baseName}.idXML \\ - -threads ${task.cpus} \\ - -database ${database} - """ - } - -} else { - - search_engine_score = "expect" - - process search_engine_comet { - echo true - input: - file database from searchengine_in_db.mix(searchengine_in_db_decoy) - each file(mzml_file) from mzmls - - output: - file "${mzml_file.baseName}.idXML" into id_files - - script: - """ - CometAdapter -in ${mzml_file} \\ - -out ${mzml_file.baseName}.idXML \\ - -threads ${task.cpus} \\ - -database ${database} - """ - } + output: + file "${mzml_file.baseName}.idXML" into id_files + + when: + params.search_engine == "msgf" + + script: + search_engine_score = "SpecEValue" + """ + MSGFPlusAdapter -in ${mzml_file} \\ + -out ${mzml_file.baseName}.idXML \\ + -threads ${task.cpus} \\ + -database ${database} 
+ """ } + +process search_engine_comet { + echo true + + input: + file database from searchengine_in_db.mix(searchengine_in_db_decoy) + each file(mzml_file) from mzmls + + output: + file "${mzml_file.baseName}.idXML" into id_files + + when: + params.search_engine == "comet" + + script: + search_engine_score = "expect" + """ + CometAdapter -in ${mzml_file} \\ + -out ${mzml_file.baseName}.idXML \\ + -threads ${task.cpus} \\ + -database ${database} + """ +} process index_peptides { echo true + input: each file(id_file) from id_files file database from pepidx_in_db.mix(pepidx_in_db_decoy) @@ -318,7 +327,6 @@ process index_peptides { -threads ${task.cpus} \\ -fasta ${database} """ - } @@ -340,10 +348,9 @@ process extract_perc_features { script: """ PSMFeatureExtractor -in ${id_file} \\ - -out ${id_file.baseName}_feat.idXML \\ - -threads ${task.cpus} + -out ${id_file.baseName}_feat.idXML \\ + -threads ${task.cpus} """ - } //TODO parameterize and find a way to run across all runs merged @@ -358,14 +365,18 @@ process percolator { when: params.posterior_probabilities == "percolator" + if (params.klammer && params.description_correct_features == 0) { + log.warn('Klammer was specified, but description of correct features was still 0. Please provide a description of correct features greater than 0.') + log.warn('Klammer has been turned off!') + } + script: """ - PercolatorAdapter -in ${id_file} \\ + PercolatorAdapter -in ${id_file} \\ -out ${id_file.baseName}_perc.idXML \\ -threads ${task.cpus} \\ -post-processing-tdc -subset-max-train 100000 -decoy-pattern "rev" """ - } process idfilter { @@ -388,7 +399,6 @@ process idfilter { -threads ${task.cpus} \\ -score:pep ${params.psm_level_fdr_cutoff} """ - } process idscoreswitcher { @@ -412,7 +422,6 @@ process idscoreswitcher { -new_score_orientation lower_better \\ -new_score_type "Posterior Error Probability" """ - } @@ -438,7 +447,6 @@ process fdr { -threads ${task.cpus} \\ -protein false -algorithm:add_decoy_peptides -algorithm:add_decoy_proteins """ - } process idscoreswitcher1 { @@ -462,7 +470,6 @@ process idscoreswitcher1 { -new_score_orientation lower_better \\ -new_score_type ${search_engine_score} """ - } //TODO probably not needed when using Percolator. 
You can use the qval from there @@ -483,7 +490,6 @@ process idpep { -out ${id_file.baseName}_idpep.idXML \\ -threads ${task.cpus} """ - } process idscoreswitcher2 { @@ -506,7 +512,6 @@ process idscoreswitcher2 { -new_score q-value \\ -new_score_orientation lower_better """ - } process idfilter2 { @@ -529,7 +534,6 @@ process idfilter2 { -threads ${task.cpus} \\ -score:pep ${params.psm_level_fdr_cutoff} """ - } process idscoreswitcher3 { @@ -552,7 +556,6 @@ process idscoreswitcher3 { -new_score "Posterior Error Probability" \\ -new_score_orientation lower_better """ - } @@ -595,9 +598,7 @@ process proteomicslfq { -out_cxml out.consensusXML \\ -proteinFDR ${params.protein_level_fdr_cutoff} \\ -debug 667 - """ - } @@ -631,7 +632,6 @@ if(params.config_profile_contact) summary['Config Contact'] = params.con if(params.config_profile_url) summary['Config URL'] = params.config_profile_url if(params.email) { summary['E-mail Address'] = params.email - summary['MultiQC maxsize'] = params.maxMultiqcEmailFileSize } log.info summary.collect { k,v -> "${k.padRight(18)}: $v" }.join("\n") log.info "\033[2m----------------------------------------------------\033[0m" @@ -730,21 +730,6 @@ workflow.onComplete { email_fields['summary']['Nextflow Build'] = workflow.nextflow.build email_fields['summary']['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp - // TODO nf-core: If not using MultiQC, strip out this code (including params.maxMultiqcEmailFileSize) - // On success try attach the multiqc report - def mqc_report = null - try { - if (workflow.success) { - mqc_report = multiqc_report.getVal() - if (mqc_report.getClass() == ArrayList){ - log.warn "[nf-core/proteomicslfq] Found multiple reports from process 'multiqc', will use only one" - mqc_report = mqc_report[0] - } - } - } catch (all) { - log.warn "[nf-core/proteomicslfq] Could not attach MultiQC report to summary email" - } - // Render the TXT template def engine = new groovy.text.GStringTemplateEngine() def tf = new File("$baseDir/assets/email_template.txt") @@ -757,7 +742,7 @@ workflow.onComplete { def email_html = html_template.toString() // Render the sendmail template - def smail_fields = [ email: params.email, subject: subject, email_txt: email_txt, email_html: email_html, baseDir: "$baseDir", mqcFile: mqc_report, mqcMaxSize: params.maxMultiqcEmailFileSize.toBytes() ] + def smail_fields = [ email: params.email, subject: subject, email_txt: email_txt, email_html: email_html, baseDir: "$baseDir", mqcFile: mqc_report ] def sf = new File("$baseDir/assets/sendmail_template.txt") def sendmail_template = engine.createTemplate(sf).make(smail_fields) def sendmail_html = sendmail_template.toString() diff --git a/nextflow.config b/nextflow.config index 43f4f06..201dd70 100644 --- a/nextflow.config +++ b/nextflow.config @@ -9,10 +9,11 @@ params { // Workflow flags - // TODO nf-core: Specify your pipeline's command line flags spectra = "data/*.mzML" database = "data/*.fasta" expdesign = "data/*.tsv" + + // Tools flags posterior_probabilities = "percolator" transfer_ids = false targeted_only = false @@ -28,6 +29,14 @@ params { psm_level_fdr_cutoff = 0.05 protein_level_fdr_cutoff = 0.05 + // Percolator flags + train_FDR = 0.05 + test_FDR = 0.05 + percolator_enzyme = "no_enzyme" + FDR_level = 'peptide-level-fdrs' + klammer = false + description_correct_features = 0 + outdir = './results' // Boilerplate options From 03f8d633be11e53617b72c7efa42c3e1abf31996 Mon Sep 17 00:00:00 2001 From: Zethson Date: Sat, 25 Jan 2020 13:06:55 +0100 Subject: [PATCH 033/374] 
[FEATURE] SE refactoring --- main.nf | 34 +++++++++++++++++++++------------- 1 file changed, 21 insertions(+), 13 deletions(-) diff --git a/main.nf b/main.nf index 9aa829a..8da6a31 100644 --- a/main.nf +++ b/main.nf @@ -257,12 +257,17 @@ process generate_decoy_database { //} /// Search engine + +searchengine_in_db.into { searchengine_in_db_msgf; searchengine_in_db_comet } +searchengine_in_db_decoy.into { searchengine_in_db_decoy_msgf; searchengine_in_db_decoy_comet } +mzmls.into { mzmls_msgf; mzmls_comet } + // TODO parameterize more process search_engine_msgf { echo true input: - tuple file(database), file(mzml_file) from searchengine_in_db.mix(searchengine_in_db_decoy).combine(mzmls) + tuple file(database), file(mzml_file) from searchengine_in_db_msgf.mix(searchengine_in_db_decoy_msgf).combine(mzmls_msgf) // This was another way of handling the combination //file database from searchengine_in_db.mix(searchengine_in_db_decoy) @@ -270,7 +275,7 @@ process search_engine_msgf { output: - file "${mzml_file.baseName}.idXML" into id_files + file "${mzml_file.baseName}.idXML" into id_files_msgf when: params.search_engine == "msgf" @@ -290,25 +295,28 @@ process search_engine_comet { echo true input: - file database from searchengine_in_db.mix(searchengine_in_db_decoy) - each file(mzml_file) from mzmls + file database from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet) + each file(mzml_file) from mzmls_comet output: - file "${mzml_file.baseName}.idXML" into id_files + file "${mzml_file.baseName}.idXML" into id_files_comet when: - params.search_engine == "comet" + params.search_engine == "comet" script: - search_engine_score = "expect" - """ - CometAdapter -in ${mzml_file} \\ - -out ${mzml_file.baseName}.idXML \\ - -threads ${task.cpus} \\ - -database ${database} - """ + search_engine_score = "expect" + """ + CometAdapter -in ${mzml_file} \\ + -out ${mzml_file.baseName}.idXML \\ + -threads ${task.cpus} \\ + -database ${database} + """ } +id_files = Channel.create() +id_files.mix(id_files_msgf, id_files_comet) + process index_peptides { echo true From c1d8dd7ea752e89f657dd4475de7439197420cbb Mon Sep 17 00:00:00 2001 From: Zethson Date: Sat, 25 Jan 2020 13:17:11 +0100 Subject: [PATCH 034/374] [FIX] duplicated channel --- docs/usage.md | 12 ++++++------ main.nf | 4 +--- 2 files changed, 7 insertions(+), 9 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index c0249ca..c58d0a9 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -22,12 +22,12 @@ * [`--psm_level_fdr_cutoff`](#--psm_level_fdr_cutoff) * [Protein inference](#Protein-Inference) * [`--protein_level_fdr_cutoff`](#--protein_level_fdr_cutoff) - * [`train_FDR`](#--train_FDR) - * [`test_FDR`](#--test_FDR) - * [`percolator_enzyme`](#--percolator_enzyme) - * [`FDR_level`](#--FDR_level) - * [`klammer`](#--klammer) - * [`description_correct_features`](#--description_correct_features) + * [`--train_FDR`](#--train_FDR) + * [`--test_FDR`](#--test_FDR) + * [`--percolator_enzyme`](#--percolator_enzyme) + * [`--FDR_level`](#--FDR_level) + * [`--klammer`](#--klammer) + * [`--description_correct_features`](#--description_correct_features) * [Job resources](#job-resources) * [Automatic resubmission](#automatic-resubmission) * [Custom resource requests](#custom-resource-requests) diff --git a/main.nf b/main.nf index 8da6a31..11a0432 100644 --- a/main.nf +++ b/main.nf @@ -314,9 +314,7 @@ process search_engine_comet { """ } -id_files = Channel.create() -id_files.mix(id_files_msgf, id_files_comet) - +id_files = Channel.create().mix(id_files_msgf, 
id_files_comet) process index_peptides { echo true From 47241a499a1a5078de3a2b363f35b73dea382be8 Mon Sep 17 00:00:00 2001 From: Zethson Date: Sat, 25 Jan 2020 14:07:45 +0100 Subject: [PATCH 035/374] [FIX] reverted to old SE separation --- main.nf | 103 ++++++++++++++++++++++++++------------------------------ 1 file changed, 47 insertions(+), 56 deletions(-) diff --git a/main.nf b/main.nf index 11a0432..83f8cf1 100644 --- a/main.nf +++ b/main.nf @@ -257,65 +257,56 @@ process generate_decoy_database { //} /// Search engine - -searchengine_in_db.into { searchengine_in_db_msgf; searchengine_in_db_comet } -searchengine_in_db_decoy.into { searchengine_in_db_decoy_msgf; searchengine_in_db_decoy_comet } -mzmls.into { mzmls_msgf; mzmls_comet } - // TODO parameterize more -process search_engine_msgf { - echo true - - input: - tuple file(database), file(mzml_file) from searchengine_in_db_msgf.mix(searchengine_in_db_decoy_msgf).combine(mzmls_msgf) - - // This was another way of handling the combination - //file database from searchengine_in_db.mix(searchengine_in_db_decoy) - //each file(mzml_file) from mzmls - - - output: - file "${mzml_file.baseName}.idXML" into id_files_msgf - - when: - params.search_engine == "msgf" - - script: - search_engine_score = "SpecEValue" - """ - MSGFPlusAdapter -in ${mzml_file} \\ - -out ${mzml_file.baseName}.idXML \\ - -threads ${task.cpus} \\ - -database ${database} - """ -} - - -process search_engine_comet { - echo true - - input: - file database from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet) - each file(mzml_file) from mzmls_comet - - output: - file "${mzml_file.baseName}.idXML" into id_files_comet - - when: - params.search_engine == "comet" - - script: - search_engine_score = "expect" - """ - CometAdapter -in ${mzml_file} \\ - -out ${mzml_file.baseName}.idXML \\ - -threads ${task.cpus} \\ - -database ${database} - """ +if (params.search_engine == "msgf") +{ + search_engine_score = "SpecEValue" + + process search_engine_msgf { + echo true + input: + tuple file(database), file(mzml_file) from searchengine_in_db.mix(searchengine_in_db_decoy).combine(mzmls) + + // This was another way of handling the combination + //file database from searchengine_in_db.mix(searchengine_in_db_decoy) + //each file(mzml_file) from mzmls + + + output: + file "${mzml_file.baseName}.idXML" into id_files + + script: + """ + MSGFPlusAdapter -in ${mzml_file} \\ + -out ${mzml_file.baseName}.idXML \\ + -threads ${task.cpus} \\ + -database ${database} + """ + } + +} else { + + search_engine_score = "expect" + + process search_engine_comet { + echo true + input: + file database from searchengine_in_db.mix(searchengine_in_db_decoy) + each file(mzml_file) from mzmls + + output: + file "${mzml_file.baseName}.idXML" into id_files + + script: + """ + CometAdapter -in ${mzml_file} \\ + -out ${mzml_file.baseName}.idXML \\ + -threads ${task.cpus} \\ + -database ${database} + """ + } } -id_files = Channel.create().mix(id_files_msgf, id_files_comet) - process index_peptides { echo true From 0842a4bce832da0354e4a23522b50736f2d8bab7 Mon Sep 17 00:00:00 2001 From: Zethson Date: Tue, 28 Jan 2020 22:36:33 +0100 Subject: [PATCH 036/374] [FEATURE] MSGFPlus parameter documentation --- docs/usage.md | 70 ++++++++++++++++++++++++++++++++++++++++++------- main.nf | 17 +++++++++--- nextflow.config | 31 +++++++++++++++++----- 3 files changed, 98 insertions(+), 20 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index c58d0a9..88fbfb0 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -24,10 +24,20 
@@ * [`--protein_level_fdr_cutoff`](#--protein_level_fdr_cutoff) * [`--train_FDR`](#--train_FDR) * [`--test_FDR`](#--test_FDR) - * [`--percolator_enzyme`](#--percolator_enzyme) * [`--FDR_level`](#--FDR_level) * [`--klammer`](#--klammer) * [`--description_correct_features`](#--description_correct_features) + * [`--isotope_error_range`](#--isotope_error_range) + * [`--fragment_method`](#--fragment_method) + * [`--instrument`](#--instrument) + * [`--protocol`](#--protocol) + * [`--tryptic`](#--tryptic) + * [`--min_precursor_charge`](#--min_precursor_charge) + * [`--max_precursor_charge`](#--max_precursor_charge) + * [`--min_peptide_length`](#--min_peptide_length) + * [`--max_peptide_length`](#--max_peptide_length) + * [`--matches_per_spec`](#--matches_per_spec) + * [`--max_mods`](#--max_mods) * [Job resources](#job-resources) * [Automatic resubmission](#automatic-resubmission) * [Custom resource requests](#custom-resource-requests) @@ -174,27 +184,23 @@ Specify the protein level cutoff for the identification FDR of PLFQ ### `--train_FDR` -False discovery rate threshold to define positive examples in training. Set to testFDR if 0. +Percolator: False discovery rate threshold to define positive examples in training. Set to testFDR if 0. ### `--test_FDR` -False discovery rate threshold for evaluating best cross validation result and reported end result. - -### `--percolator_enzyme` - -The type of used enzyme("no_enzyme","elastase","pepsin","proteinasek","thermolysin","trypsinp","chymotrypsin","lys-n","lys-c","arg-c","asp-n","glu-c","trypsin"). +Percolator: False discovery rate threshold for evaluating best cross validation result and reported end result. ### `--FDR_level` -Level of FDR calculation ('peptide-level-fdrs', 'psm-level-fdrs', 'protein-level-fdrs'). +Percolator: Level of FDR calculation ('peptide-level-fdrs', 'psm-level-fdrs', 'protein-level-fdrs'). ### `--klammer` -Retention time features are calculated as in Klammer et al. instead of with Elude. Only available if --description_correct_features is set. +Percolator: Retention time features are calculated as in Klammer et al. instead of with Elude. Only available if --description_correct_features is set. ### `--description_correct_features` -Percolator provides a possibility to use so called description of correct features, i.e. features for which desirable values are learnt from the previously identified target PSMs. The absolute value of the difference between desired value and observed value is the used as predictive features. +Percolator provides the possibility to use so-called description of correct features, i.e. features for which desirable values are learnt from the previously identified target PSMs. The absolute value of the difference between desired value and observed value is then used as a predictive feature. 1 iso-electric point 2 mass calibration 4 retention time 8 delta_retention_time*delta_mass_calibration
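These values act as a bit mask, so several features can be requested at once by adding their codes (an assumption consistent with how Percolator documents this option; the combined value below is only an example):

```nextflow
// Hypothetical example: iso-electric point (1) + retention time (4) = 5
params.description_correct_features = 5
```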
+ +### `--fragment_method` + +MSGFPlus: Fragmentation method ('from_spectrum' relies on spectrum meta data and uses CID as fallback option; MS-GF+ parameter '-m') + +### `--instrument` + +MSGFPlus: Instrument that generated the data ('low_res'/'high_res' refer to LCQ and LTQ instruments; MS-GF+ parameter '-inst') + +### `--protocol` + +MSGFPlus: Labeling or enrichment protocol used, if any (MS-GF+ parameter '-p') + +### `--tryptic` + +MSGFPlus: Level of cleavage specificity required (MS-GF+ parameter '-ntt') + +### `--min_precursor_charge` + +MSGFPlus: Minimum precursor ion charge (only used for spectra without charge information; MS-GF+ parameter '-minCharge') + +### `--max_precursor_charge` + +MSGFPlus: Maximum precursor ion charge (only used for spectra without charge information; MS-GF+ parameter '-maxCharge') + +### `--min_peptide_length` + +MSGFPlus: Minimum peptide length to consider (MS-GF+ parameter '-minLength') + +### `--max_peptide_length` + +MSGFPlus: Maximum peptide length to consider (MS-GF+ parameter '-maxLength') + +### `--matches_per_spec` + +MSGFPLus: Number of matches per spectrum to be reported (MS-GF+ parameter '-n') + +### `--max_mods` + +MSGFPlus: Maximum number of modifications per peptide. If this value is large, the search may take very long. + Note that you can use the same configuration setup to save sets of reference files for your own use, even if they are not part of the iGenomes resource. See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for instructions on where to save such a file. ## Job resources diff --git a/main.nf b/main.nf index 83f8cf1..e4ae80d 100644 --- a/main.nf +++ b/main.nf @@ -45,12 +45,23 @@ def helpMessage() { "aggregation" = aggregates all peptide scores across a protein (by calculating the maximum) "bayesian" = computes a posterior probability for every protein based on a Bayesian network --protein_level_fdr_cutoff Identification protein-level FDR cutoff - --train_FDR False discovery rate threshold to define positive examples in training. Set to testFDR if 0. - --test_FDR False discovery rate threshold for evaluating best cross validation result and reported end result. + --train_FDR False discovery rate threshold to define positive examples in training. Set to testFDR if 0 + --test_FDR False discovery rate threshold for evaluating best cross validation result and reported end result --percolator_enzyme Type of enzyme --FDR_level Level of FDR calculation ('peptide-level-fdrs', 'psm-level-fdrs', 'protein-level-fdrs') --description_correct_features Description of correct features for Percolator (0, 1, 2, 4, 8, see Percolator retention time and calibration) - --klammer Retention time features are calculated as in Klammer et al. instead of with Elude. + --klammer Retention time features are calculated as in Klammer et al. 
instead of with Elude + --isotope_error_range Range of allowed isotope peak errors + --fragment_method Used fragmentation method + --instrument Type of instrument that generated the data + --protocol Used labeling or enrichment protocol (if any) + --tryptic Level of required cleavage specificity + --min_precursor_charge Minimum precursor ion charge (only used for spectra without charge information) + --max_precursor_charge Maximum precursor ion charge (only used for spectra without charge information) + --min_peptide_length Minimum peptide length to consider + --max_peptide_length Maximum peptide length to consider + --matches_per_spec Number of matches per spectrum to be reported + --max_mods Maximum number of modifications per peptide. If this value is large, the search may take very long Quantification: --transfer_ids Transfer IDs over aligned samples to increase # of quantifiable features (WARNING: increased memory consumption) --targeted_only Only ID based quantification --mass_recalibration Recalibrates masses to correct for instrument biases --protein_quantification Quantify proteins based on: "unique_peptides" = use peptides mapping to single proteins or a group of indistinguishable proteins (according to the set of experimentally identified peptides) "strictly_unique_peptides" = use peptides mapping to a unique single protein only "shared_peptides" = use shared peptides only for its best group (by inference score) General Options: --expdesign Path to experimental design file (if not given, it assumes unfractionated, unrelated samples) diff --git a/nextflow.config b/nextflow.config index 201dd70..a574a7c 100644 --- a/nextflow.config +++ b/nextflow.config @@ -21,28 +21,46 @@ params { add_decoys = false search_engine = "comet" protein_inference = "aggregation" - precursor_mass_tolerance = 5 - enzyme = 'Trypsin' - fixed_mods = 'Carbamidomethyl (C)' - variable_mods = 'Oxidation (M)' allowed_missed_cleavages = 1 psm_level_fdr_cutoff = 0.05 protein_level_fdr_cutoff = 0.05 + // shared search engine parameters + enzyme = 'Trypsin' + precursor_mass_tolerance = 5 + precursor_error_units = "ppm" + fixed_mods = 'Carbamidomethyl (C)' + variable_mods = 'Oxidation (M)' + // Percolator flags train_FDR = 0.05 test_FDR = 0.05 - percolator_enzyme = "no_enzyme" FDR_level = 'peptide-level-fdrs' klammer = false description_correct_features = 0 + // MSGF+ flags + isotope_error_range = "0,1" + fragment_method = "from_spectrum" + instrument = "high_res" + protocol = "automatic" + tryptic = "non" + min_precursor_charge = 2 + max_precursor_charge = 3 + min_peptide_length = 6 + max_peptide_length = 40 + matches_per_spec = 1 + max_mods = 2 + + // Comet flags + // TODO + + outdir = './results' // Boilerplate options name = false email = false - maxMultiqcEmailFileSize = 25.MB plaintext_email = false monochrome_logs = false help = false From 64ed0803166df79fc213f1e3a0458028cfa380ab Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sat, 8 Feb 2020 22:28:25 +0100 Subject: [PATCH 037/374] Fixed if-case not being in script section! Added many parameters to the help, but not yet to the tools.
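For context on the "if-case" fix (a minimal sketch with illustrative names, not the full percolator process from this patch): Nextflow only accepts its own block labels inside a process body, so free-standing Groovy statements between `when:` and `script:` are invalid; conditional logic has to move inside the `script:` block, before the command string is built:

```nextflow
process example {
    input:
    file id_file from some_channel  // some_channel is a placeholder

    when:
    params.posterior_probabilities == "percolator"

    script:
    // Plain Groovy is legal here, before the returned command string.
    if (params.klammer && params.description_correct_features == 0) {
        log.warn('Klammer was specified, but description of correct features was still 0.')
    }
    """
    PercolatorAdapter -in ${id_file} -out ${id_file.baseName}_perc.idXML
    """
}
```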
--- conf/test.config | 6 ++- main.nf | 113 +++++++++++++++++++++++++++++++---------------- nextflow.config | 12 ++++- 3 files changed, 90 insertions(+), 41 deletions(-) diff --git a/conf/test.config b/conf/test.config index 35512ef..0c99642 100644 --- a/conf/test.config +++ b/conf/test.config @@ -10,8 +10,8 @@ params { // Limit resources so that this can run on Travis max_cpus = 2 - max_memory = 6.GB - max_time = 48.h + max_memory = 4.GB + max_time = 24.h // Input data // TODO nf-core: Give any required params for the test so that command line flags are not needed spectra = [ @@ -27,4 +27,6 @@ params { posterior_probabilities = "fit_distributions" search_engine = "msgf" protein_level_fdr_cutoff = 1.0 + decoy_affix = "rev" + post-processing-tdc = false } diff --git a/main.nf b/main.nf index e4ae80d..0c246bb 100644 --- a/main.nf +++ b/main.nf @@ -24,50 +24,82 @@ def helpMessage() { --spectra Path to input spectra as mzML or Thermo Raw --database Path to input protein database as fasta + Decoy database: + --add_decoys Add decoys to the given fasta + --decoy_affix The decoy prefix or suffix used or to be used (default: DECOY_) + --affix_type Prefix (default) or suffix (WARNING: Percolator only supports prefices) Database Search: - --search_engine Which search engine: "comet" or "msgf" + --search_engine Which search engine: "comet" (default) or "msgf" --enzyme Enzymatic cleavage ('unspecific cleavage', 'Trypsin', see OpenMS enzymes) + --num_enzyme_termini Specify the termini where the cleavage rule has to match (default: + 'fully' valid: 'semi', 'fully', 'C-term unspecific', 'N-term unspecific') + --num_hits Number of peptide hits per spectrum (PSMs) in output file (default: '1') --fixed_mods Fixed modifications ('Carbamidomethyl (C)', see OpenMS modifications) --variable_mods Variable modifications ('Oxidation (M)', see OpenMS modifications) --precursor_mass_tolerance Mass tolerance of precursor mass (ppm) --allowed_missed_cleavages Allowed missed cleavages --psm_level_fdr_cutoff Identification PSM-level FDR cutoff + --min_precursor_charge Minimum precursor ion charge + --max_precursor_charge Maximum precursor ion charge + --min_peptide_length Minimum peptide length to consider + --max_peptide_length Maximum peptide length to consider + --instrument Type of instrument that generated the data + --protocol Used labeling or enrichment protocol (if any) + --fragment_method Used fragmentation method + --max_mods Maximum number of modifications per peptide. If this value is large, the search may take very long + --db_debug Debug level during database search + + //TODO probably also still some options missing. Try to consolidate them whenever the two search engines share them + + PSM Rescoring: --posterior_probabilities How to calculate posterior probabilities for PSMs: "percolator" = Re-score based on PSM-feature-based SVM and transform distance to hyperplane for posteriors "fit_distributions" = Fit positive and negative distributions to scores (similar to PeptideProphet) + --rescoring_debug Debug level during PSM rescoring + --psm_pep_fdr_cutoff FDR cutoff on PSM level (or potential peptide level; see Percolator options) before going into + feature finding, map alignment and inference. + Percolator specific: + --train_FDR False discovery rate threshold to define positive examples in training. 
Set to testFDR if 0 + --test_FDR False discovery rate threshold for evaluating best cross validation result and reported end result + --percolator_fdr_level Level of FDR calculation ('peptide-level-fdrs' or 'psm-level-fdrs') + --post-processing-tdc Use target-decoy competition to assign q-values and PEPs. + --description_correct_features Description of correct features for Percolator (0, 1, 2, 4, 8, see Percolator retention time and calibration) + --generic-feature-set Use only generic (i.e. not search engine specific) features. Generating search engine specific + features for common search engines by PSMFeatureExtractor will typically boost the identification rate significantly. + --subset-max-train Only train an SVM on a subset of PSMs, and use the resulting score vector to evaluate the other + PSMs. Recommended when analyzing huge numbers (>1 million) of PSMs. When set to 0, all PSMs are used for training as normal. + --klammer Retention time features are calculated as in Klammer et al. instead of with Elude + + Distribution specific: + --outlier_handling How to handle outliers during fitting: + - ignore_iqr_outliers (default): ignore outliers outside of 3*IQR from Q1/Q3 for fitting + - set_iqr_to_closest_valid: set IQR-based outliers to the last valid value for fitting + - ignore_extreme_percentiles: ignore everything outside 99th and 1st percentile (also removes equal values like potential censored max values in XTandem) + - none: do nothing + --top_hits_only Use only the top hits for fitting + + //TODO add more options for rescoring part Inference: --protein_inference Infer proteins through: "aggregation" = aggregates all peptide scores across a protein (by calculating the maximum) "bayesian" = computes a posterior probability for every protein based on a Bayesian network - --protein_level_fdr_cutoff Identification protein-level FDR cutoff - --train_FDR False discovery rate threshold to define positive examples in training. Set to testFDR if 0 - --test_FDR False discovery rate threshold for evaluating best cross validation result and reported end result - --percolator_enzyme Type of enzyme - --FDR_level Level of FDR calculation ('peptide-level-fdrs', 'psm-level-fdrs', 'protein-level-fdrs') - --description_correct_features Description of correct features for Percolator (0, 1, 2, 4, 8, see Percolator retention time and calibration) - --klammer Retention time features are calculated as in Klammer et al. instead of with Elude - --isotope_error_range Range of allowed isotope peak errors - --fragment_method Used fragmentation method - --instrument Type of instrument that generated the data - --protocol Used labeling or enrichment protocol (if any) - --tryptic Level of required cleavage specificity - --min_precursor_charge Minimum precursor ion charge (only used for spectra without charge information - --max_precursor_charge Maximum precursor ion charge (only used for spectra without charge information - --min_peptide_length Minimum peptide length to consider - --max_peptide_length Maximum peptide length to consider - --matches_per_spec Number of matches per spectrum to be reported - --max_mods Maximum number of modifications per peptide. 
If this value is large, the search may take very long + ("percolator" not yet supported) + --protein_level_fdr_cutoff Protein level FDR cutoff (this affects and chooses the peptides used for quantification) Quantification: --transfer_ids Transfer IDs over aligned samples to increase # of quantifiable features (WARNING: increased memory consumption) --targeted_only Only ID based quantification --mass_recalibration Recalibrates masses to correct for instrument biases + --psm_pep_fdr_for_quant PSM/peptide level FDR used for quantification (if filtering on protein level is not enough) + If Bayesian inference was chosen, this will be a peptide-level FDR and only the best PSMs per + peptide will be reported. + (default: off = 1.0) --protein_quantification Quantify proteins based on: "unique_peptides" = use peptides mapping to single proteins or a group of indistinguishable proteins (according to the set of experimentally identified peptides) "strictly_unique_peptides" = use peptides mapping to a unique single protein only @@ -75,7 +107,7 @@ def helpMessage() { General Options: --expdesign Path to experimental design file (if not given, it assumes unfractionated, unrelated samples) - --add_decoys Add decoys to the given fasta + Other nextflow options: --outdir The output directory where the results will be saved @@ -168,8 +200,6 @@ branched_input.mzML * https://www.nextflow.io/docs/latest/process.html#multiple-input-files * e.g: file "?????.mzML" from mzmls_plfq.toSortedList() and ProteomicsLFQ -in *.mzML -ids *.id */ -// - Check how to avoid copying of the database for example (currently we get one copy for each SE run). Is it the -// "each file()" pattern I used? /* * STEP 0.1 - Raw file conversion @@ -361,6 +391,8 @@ process extract_perc_features { """ } + + //TODO parameterize and find a way to run across all runs merged process percolator { @@ -373,18 +405,24 @@ process percolator { when: params.posterior_probabilities == "percolator" - if (params.klammer && params.description_correct_features == 0) { - log.warn('Klammer was specified, but description of correct features was still 0. Please provide a description of correct features greater than 0.') - log.warn('Klammer has been turned off!') - } - + // NICE-TO-HAVE: the decoy-pattern is automatically detected from PeptideIndexer. + // Parse its output and put the correct one here. script: - """ - PercolatorAdapter -in ${id_file} \\ - -out ${id_file.baseName}_perc.idXML \\ - -threads ${task.cpus} \\ - -post-processing-tdc -subset-max-train 100000 -decoy-pattern "rev" - """ + if (params.klammer && params.description_correct_features == 0) { + log.warn('Klammer was specified, but description of correct features was still 0. Please provide a description of correct features greater than 0.') + log.warn('Klammer will be implicitly off!') + } + + def pptdc = params.post-processing-tdc ? 
"" : "-post-processing-tdc" + + """ + PercolatorAdapter -in ${id_file} \\ + -out ${id_file.baseName}_perc.idXML \\ + -threads ${task.cpus} \\ + ${pptdc} \\ + -subset-max-train ${params.subset_max_train} \\ + -decoy-pattern ${params.decoy_affix} + """ } process idfilter { @@ -405,7 +443,7 @@ process idfilter { IDFilter -in ${id_file} \\ -out ${id_file.baseName}_filter.idXML \\ -threads ${task.cpus} \\ - -score:pep ${params.psm_level_fdr_cutoff} + -score:pep ${params.psm_pep_fdr_cutoff} """ } @@ -453,7 +491,9 @@ process fdr { FalseDiscoveryRate -in ${id_file} \\ -out ${id_file.baseName}_fdr.idXML \\ -threads ${task.cpus} \\ - -protein false -algorithm:add_decoy_peptides -algorithm:add_decoy_proteins + -protein false \\ + -algorithm:add_decoy_peptides \\ + -algorithm:add_decoy_proteins """ } @@ -480,7 +520,6 @@ process idscoreswitcher1 { """ } -//TODO probably not needed when using Percolator. You can use the qval from there process idpep { input: @@ -540,7 +579,7 @@ process idfilter2 { IDFilter -in ${id_file} \\ -out ${id_file.baseName}_filter.idXML \\ -threads ${task.cpus} \\ - -score:pep ${params.psm_level_fdr_cutoff} + -score:pep ${params.psm_pep_fdr_cutoff} """ } diff --git a/nextflow.config b/nextflow.config index a574a7c..f572bf7 100644 --- a/nextflow.config +++ b/nextflow.config @@ -21,10 +21,13 @@ params { add_decoys = false search_engine = "comet" protein_inference = "aggregation" - allowed_missed_cleavages = 1 - psm_level_fdr_cutoff = 0.05 + psm_pep_fdr_cutoff = 0.10 protein_level_fdr_cutoff = 0.05 + // decoys + decoy_affix = "DECOY_" + affix_type = "prefix" + // shared search engine parameters enzyme = 'Trypsin' precursor_mass_tolerance = 5 @@ -32,12 +35,17 @@ params { fixed_mods = 'Carbamidomethyl (C)' variable_mods = 'Oxidation (M)' + // Comet flags + allowed_missed_cleavages = 1 + // Percolator flags train_FDR = 0.05 test_FDR = 0.05 FDR_level = 'peptide-level-fdrs' klammer = false description_correct_features = 0 + subset_max_train = 0 + post-processing-tdc = false // MSGF+ flags isotope_error_range = "0,1" From 1a83875a86de50fb532889acfb59a031d8457ba3 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sun, 9 Feb 2020 01:17:53 +0100 Subject: [PATCH 038/374] small fixes to my changes, param names --- conf/test.config | 2 +- main.nf | 36 +++++++++++++++++++++++------------- nextflow.config | 16 +++++++++------- 3 files changed, 33 insertions(+), 21 deletions(-) diff --git a/conf/test.config b/conf/test.config index 0c99642..ca045a8 100644 --- a/conf/test.config +++ b/conf/test.config @@ -28,5 +28,5 @@ params { search_engine = "msgf" protein_level_fdr_cutoff = 1.0 decoy_affix = "rev" - post-processing-tdc = false + post_processing_tdc = false } diff --git a/main.nf b/main.nf index 0c246bb..47d0f3b 100644 --- a/main.nf +++ b/main.nf @@ -84,18 +84,24 @@ def helpMessage() { //TODO add more options for rescoring part - Inference: + Inference and Quantification: + --inf_quant_debug Debug level during inference and quantification. 
(WARNING: Higher than 666 may produce a lot + of additional output files) + Inference: --protein_inference Infer proteins through: "aggregation" = aggregates all peptide scores across a protein (by calculating the maximum) "bayesian" = computes a posterior probability for every protein based on a Bayesian network ("percolator" not yet supported) --protein_level_fdr_cutoff Protein level FDR cutoff (this affects and chooses the peptides used for quantification) - Quantification: + Quantification: --transfer_ids Transfer IDs over aligned samples to increase # of quantifiable features (WARNING: - increased memory consumption) - --targeted_only Only ID based quantification - --mass_recalibration Recalibrates masses to correct for instrument biases + increased memory consumption). (default: false) TODO must specify true or false + --targeted_only Only ID based quantification. (default: true) TODO must specify true or false + --mass_recalibration Recalibrates masses to correct for instrument biases. (default: false) TODO must specify true + or false + + //TODO the following need to be passed still --psm_pep_fdr_for_quant PSM/peptide level FDR used for quantification (if filtering on protein level is not enough) If Bayesian inference was chosen, this will be a peptide-level FDR and only the best PSMs per peptide will be reported. @@ -413,7 +419,7 @@ process percolator { log.warn('Klammer will be implicitly off!') } - def pptdc = params.post-processing-tdc ? "" : "-post-processing-tdc" + def pptdc = params.post_processing_tdc ? "" : "-post-processing-tdc" """ PercolatorAdapter -in ${id_file} \\ @@ -626,9 +632,12 @@ process proteomicslfq { file "out.mzTab" into out_mzTab file "out.consensusXML" into out_consensusXML file "out.csv" into out_msstats - file "debug_mergedIDs.idXML" into debug_id - file "debug_mergedIDs_inference.idXML" into debug_id_inf - //file "debug_mergedIDsGreedyResolved.idXML" into debug_id_resolve + file "debug_mergedIDs.idXML" optional true + file "debug_mergedIDs_inference.idXML" optional true + file "debug_mergedIDsGreedyResolved.idXML" optional true + file "debug_mergedIDsGreedyResolvedFDR.idXML" optional true + file "debug_mergedIDsGreedyResolvedFDRFiltered.idXML" optional true + file "debug_mergedIDsFDRFilteredStrictlyUniqueResolved.idXML" optional true script: """ @@ -636,15 +645,16 @@ process proteomicslfq { -ids ${(id_files as List).join(' ')} \\ -design ${expdes} \\ -fasta ${fasta} \\ - -targeted_only "true" \\ - -mass_recalibration "false" \\ - -transfer_ids "false" \\ + -protein_inference ${params.protein_inference} \\ + -targeted_only ${params.targeted_only} \\ + -mass_recalibration ${params.mass_recalibration} \\ + -transfer_ids ${params.transfer_ids} \\ -out out.mzTab \\ -threads ${task.cpus} \\ -out_msstats out.csv \\ -out_cxml out.consensusXML \\ -proteinFDR ${params.protein_level_fdr_cutoff} \\ - -debug 667 + -debug ${params.inf_quant_debug} """ } diff --git a/nextflow.config b/nextflow.config index f572bf7..e76cf17 100644 --- a/nextflow.config +++ b/nextflow.config @@ -15,9 +15,6 @@ params { // Tools flags posterior_probabilities = "percolator" - transfer_ids = false - targeted_only = false - mass_recalibration = true add_decoys = false search_engine = "comet" protein_inference = "aggregation" @@ -35,9 +32,6 @@ params { fixed_mods = 'Carbamidomethyl (C)' variable_mods = 'Oxidation (M)' - // Comet flags - allowed_missed_cleavages = 1 - // Percolator flags train_FDR = 0.05 test_FDR = 0.05 @@ -45,7 +39,7 @@ params { klammer = false description_correct_features = 0 
subset_max_train = 0 - post-processing-tdc = false + post_processing_tdc = false // MSGF+ flags isotope_error_range = "0,1" @@ -61,8 +55,16 @@ params { max_mods = 2 // Comet flags + allowed_missed_cleavages = 1 // TODO + // ProteomicsLFQ flags + inf_quant_debug = 0 + protein_inference = "aggregation" + // TODO convert to real flags? + targeted_only = "true" + mass_recalibration = "false" + transfer_ids = "false" outdir = './results' From a4157ec384b26d2509cc9a87faef8ac5e3e94fcc Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sun, 9 Feb 2020 01:25:55 +0100 Subject: [PATCH 039/374] name too long??? wth --- .github/workflows/main.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 3f6cc3d..b31b70e 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -27,7 +27,7 @@ jobs: sudo mv nextflow /usr/local/bin/ - name: BASIC Run the basic pipeline with the test run: | - nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "$RUN_NAME-basic" -profile test,docker + nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "$RUN_NAME" -profile test,docker - uses: actions/upload-artifact@v1 if: always() name: Upload results From a0655e1fc173ccbfbf9e089cdd33fe55c1b98c95 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sun, 9 Feb 2020 01:30:10 +0100 Subject: [PATCH 040/374] oh come on... --- .github/workflows/main.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index b31b70e..2fbc2cb 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -27,7 +27,7 @@ jobs: sudo mv nextflow /usr/local/bin/ - name: BASIC Run the basic pipeline with the test run: | - nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "$RUN_NAME" -profile test,docker + nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "wHATEVER" -profile test,docker - uses: actions/upload-artifact@v1 if: always() name: Upload results From c160efaa0e6385488a3ed45336cbc1bc344d27ed Mon Sep 17 00:00:00 2001 From: Zethson Date: Tue, 11 Feb 2020 10:30:10 +0100 Subject: [PATCH 041/374] [FEATURE] 1.8 TEMPLATE --- .gitattributes | 1 + .github/CONTRIBUTING.md | 57 +++ .github/ISSUE_TEMPLATE/bug_report.md | 42 ++ .github/ISSUE_TEMPLATE/feature_request.md | 24 ++ .github/PULL_REQUEST_TEMPLATE.md | 19 + .github/markdownlint.yml | 5 + .github/workflows/branch.yml | 16 + .github/workflows/ci.yml | 29 ++ .github/workflows/linting.yml | 41 ++ .gitignore | 7 + .travis.yml | 47 +++ CHANGELOG.md | 16 + CODE_OF_CONDUCT.md | 46 +++ Dockerfile | 13 + LICENSE | 21 + README.md | 73 ++++ assets/email_template.html | 54 +++ assets/email_template.txt | 40 ++ assets/multiqc_config.yaml | 9 + assets/nf-core-proteomicslfq_logo.png | Bin 0 -> 11668 bytes assets/sendmail_template.txt | 53 +++ bin/markdown_to_html.r | 51 +++ bin/scrape_software_versions.py | 52 +++ conf/base.config | 51 +++ conf/igenomes.config | 420 ++++++++++++++++++++ docs/README.md | 12 + docs/images/nf-core-proteomicslfq_logo.png | Bin 0 -> 20821 bytes docs/output.md | 43 +++ docs/usage.md | 328 ++++++++++++++++ environment.yml | 14 + main.nf | 424 +++++++++++++++++++++ nextflow.config | 147 +++++++ 32 files changed, 2155 insertions(+) create mode 100644 .gitattributes create mode 100644 .github/CONTRIBUTING.md create mode 100644 .github/ISSUE_TEMPLATE/bug_report.md create mode 100644 .github/ISSUE_TEMPLATE/feature_request.md create mode 100644 .github/PULL_REQUEST_TEMPLATE.md create mode 100644 .github/markdownlint.yml create 
mode 100644 .github/workflows/branch.yml create mode 100644 .github/workflows/ci.yml create mode 100644 .github/workflows/linting.yml create mode 100644 .gitignore create mode 100644 .travis.yml create mode 100644 CHANGELOG.md create mode 100644 CODE_OF_CONDUCT.md create mode 100644 Dockerfile create mode 100644 LICENSE create mode 100644 README.md create mode 100644 assets/email_template.html create mode 100644 assets/email_template.txt create mode 100644 assets/multiqc_config.yaml create mode 100644 assets/nf-core-proteomicslfq_logo.png create mode 100644 assets/sendmail_template.txt create mode 100755 bin/markdown_to_html.r create mode 100755 bin/scrape_software_versions.py create mode 100644 conf/base.config create mode 100644 conf/igenomes.config create mode 100644 docs/README.md create mode 100644 docs/images/nf-core-proteomicslfq_logo.png create mode 100644 docs/output.md create mode 100644 docs/usage.md create mode 100644 environment.yml create mode 100644 main.nf create mode 100644 nextflow.config diff --git a/.gitattributes b/.gitattributes new file mode 100644 index 0000000..7fe5500 --- /dev/null +++ b/.gitattributes @@ -0,0 +1 @@ +*.config linguist-language=nextflow diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md new file mode 100644 index 0000000..0d2da9a --- /dev/null +++ b/.github/CONTRIBUTING.md @@ -0,0 +1,57 @@ +# nf-core/proteomicslfq: Contributing Guidelines + +Hi there! +Many thanks for taking an interest in improving nf-core/proteomicslfq. + +We try to manage the required tasks for nf-core/proteomicslfq using GitHub issues, you probably came to this page when creating one. +Please use the pre-filled template to save time. + +However, don't be put off by this template - other more general issues and suggestions are welcome! +Contributions to the code are even more welcome ;) + +> If you need help using or modifying nf-core/proteomicslfq then the best place to ask is on the nf-core Slack [#proteomicslfq](https://nfcore.slack.com/channels/proteomicslfq) channel ([join our Slack here](https://nf-co.re/join/slack)). + +## Contribution workflow + +If you'd like to write some code for nf-core/proteomicslfq, the standard workflow is as follows: + +1. Check that there isn't already an issue about your idea in the [nf-core/proteomicslfq issues](https://github.com/nf-core/proteomicslfq/issues) to avoid duplicating work + * If there isn't one already, please create one so that others know you're working on this +2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/proteomicslfq repository](https://github.com/nf-core/proteomicslfq) to your GitHub account +3. Make the necessary changes / additions within your forked repository +4. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged + +If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/). + +## Tests + +When you create a pull request with changes, [GitHub Actions](https://github.com/features/actions) will run automatic tests. +Typically, pull-requests are only fully reviewed when these tests are passing, though of course we can help out before then. + +There are typically two types of tests that run: + +### Lint Tests + +`nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. 
+To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command.
+
+If any failures or warnings are encountered, please follow the listed URL for more documentation.
+
+### Pipeline Tests
+
+Each `nf-core` pipeline should be set up with a minimal set of test-data.
+`GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully.
+If there are any failures then the automated tests fail.
+These tests are run both with the latest available version of `Nextflow` and also the minimum required version that is stated in the pipeline code.
+
+## Patch
+
+:warning: Only in the unlikely and regretful event of a release happening with a bug.
+
+* On your own fork, make a new branch `patch` based on `upstream/master`.
+* Fix the bug, and bump version (X.Y.Z+1).
+* A PR should be made on `master` from `patch` to directly fix this particular bug.
+
+## Getting help
+
+For further information/help, please consult the [nf-core/proteomicslfq documentation](https://nf-co.re/nf-core/proteomicslfq/docs) and don't hesitate to get in touch on the nf-core Slack [#proteomicslfq](https://nfcore.slack.com/channels/proteomicslfq) channel ([join our Slack here](https://nf-co.re/join/slack)).
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
new file mode 100644
index 0000000..fc196f0
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,42 @@
+# nf-core/proteomicslfq bug report
+
+Hi there!
+
+Thanks for telling us about a problem with the pipeline.
+Please delete this text and anything that's not relevant from the template below:
+
+## Describe the bug
+
+A clear and concise description of what the bug is.
+
+## Steps to reproduce
+
+Steps to reproduce the behaviour:
+
+1. Command line: `nextflow run ...`
+2. See error: _Please provide your error message_
+
+## Expected behaviour
+
+A clear and concise description of what you expected to happen.
+
+## System
+
+- Hardware:
+- Executor:
+- OS:
+- Version
+
+## Nextflow Installation
+
+- Version:
+
+## Container engine
+
+- Engine:
+- version:
+- Image tag:
+
+## Additional context
+
+Add any other context about the problem here.
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
new file mode 100644
index 0000000..5b266e5
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,24 @@
+# nf-core/proteomicslfq feature request
+
+Hi there!
+
+Thanks for suggesting a new feature for the pipeline!
+Please delete this text and anything that's not relevant from the template below:
+
+## Is your feature request related to a problem? Please describe
+
+A clear and concise description of what the problem is.
+
+Ex. I'm always frustrated when [...]
+
+## Describe the solution you'd like
+
+A clear and concise description of what you want to happen.
+
+## Describe alternatives you've considered
+
+A clear and concise description of any alternative solutions or features you've considered.
+
+## Additional context
+
+Add any other context about the feature request here.
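The lint and pipeline tests described in the contributing guidelines above can be reproduced locally before opening a PR. A minimal sketch of that loop, assuming a local clone and a working Docker installation (these are the same commands the CI configuration in this template runs):

```bash
# Install the nf-core helper tool, then run the same checks as CI
pip install nf-core
nf-core lint .                        # static checks against the nf-core guidelines
nextflow run . -profile test,docker   # run on the bundled minimal test dataset
```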
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md new file mode 100644 index 0000000..70c10fd --- /dev/null +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -0,0 +1,19 @@ +# nf-core/proteomicslfq pull request + +Many thanks for contributing to nf-core/proteomicslfq! + +Please fill in the appropriate checklist below (delete whatever is not relevant). +These are the most common things requested on pull requests (PRs). + +## PR checklist + +- [ ] This comment contains a description of changes (with reason) +- [ ] If you've fixed a bug or added code that should be tested, add tests! +- [ ] If necessary, also make a PR on the [nf-core/proteomicslfq branch on the nf-core/test-datasets repo](https://github.com/nf-core/test-datasets/pull/new/nf-core/proteomicslfq) +- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`). +- [ ] Make sure your code lints (`nf-core lint .`). +- [ ] Documentation in `docs` is updated +- [ ] `CHANGELOG.md` is updated +- [ ] `README.md` is updated + +**Learn more about contributing:** [CONTRIBUTING.md](https://github.com/nf-core/proteomicslfq/tree/master/.github/CONTRIBUTING.md) \ No newline at end of file diff --git a/.github/markdownlint.yml b/.github/markdownlint.yml new file mode 100644 index 0000000..96b12a7 --- /dev/null +++ b/.github/markdownlint.yml @@ -0,0 +1,5 @@ +# Markdownlint configuration file +default: true, +line-length: false +no-duplicate-header: + siblings_only: true diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml new file mode 100644 index 0000000..b907a5c --- /dev/null +++ b/.github/workflows/branch.yml @@ -0,0 +1,16 @@ +name: nf-core branch protection +# This workflow is triggered on PRs to master branch on the repository +# It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev` +on: + pull_request: + branches: + - master + +jobs: + test: + runs-on: ubuntu-18.04 + steps: + # PRs are only ok if coming from an nf-core `dev` branch or a fork `patch` branch + - name: Check PRs + run: | + { [[ $(git remote get-url origin) == *nf-core/proteomicslfq ]] && [[ ${GITHUB_HEAD_REF} = "dev" ]]; } || [[ ${GITHUB_HEAD_REF} == "patch" ]] diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml new file mode 100644 index 0000000..704d792 --- /dev/null +++ b/.github/workflows/ci.yml @@ -0,0 +1,29 @@ +name: nf-core CI +# This workflow is triggered on pushes and PRs to the repository. +# It runs the pipeline with the minimal test dataset to check that it completes without any syntax errors +on: [push, pull_request] + +jobs: + test: + env: + NXF_VER: ${{ matrix.nxf_ver }} + NXF_ANSI_LOG: false + runs-on: ubuntu-latest + strategy: + matrix: + # Nextflow versions: check pipeline minimum and current latest + nxf_ver: ['19.10.0', ''] + steps: + - uses: actions/checkout@v2 + - name: Install Nextflow + run: | + wget -qO- get.nextflow.io | bash + sudo mv nextflow /usr/local/bin/ + - name: Pull docker image + run: | + docker pull nfcore/proteomicslfq:dev && docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev + - name: Run pipeline with test data + run: | + # TODO nf-core: You can customise CI pipeline run tests as required + # (eg. 
adding multiple test runs with different parameters) + nextflow run ${GITHUB_WORKSPACE} -profile test,docker diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml new file mode 100644 index 0000000..7354dc7 --- /dev/null +++ b/.github/workflows/linting.yml @@ -0,0 +1,41 @@ +name: nf-core linting +# This workflow is triggered on pushes and PRs to the repository. +# It runs the `nf-core lint` and markdown lint tests to ensure that the code meets the nf-core guidelines +on: [push, pull_request] + +jobs: + Markdown: + runs-on: ubuntu-18.04 + steps: + - uses: actions/checkout@v1 + - uses: actions/setup-node@v1 + with: + node-version: '10' + - name: Install markdownlint + run: | + npm install -g markdownlint-cli + - name: Run Markdownlint + run: | + markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.github/markdownlint.yml + nf-core: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v1 + - name: Install Nextflow + run: | + wget -qO- get.nextflow.io | bash + sudo mv nextflow /usr/local/bin/ + - uses: actions/setup-python@v1 + with: + python-version: '3.6' + architecture: 'x64' + - name: Install pip + run: | + sudo apt install python3-pip + pip install --upgrade pip + - name: Install nf-core tools + run: | + pip install nf-core + - name: Run nf-core lint + run: | + nf-core lint ${GITHUB_WORKSPACE} diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..0189a44 --- /dev/null +++ b/.gitignore @@ -0,0 +1,7 @@ +.nextflow* +work/ +data/ +results/ +.DS_Store +test* +*.pyc diff --git a/.travis.yml b/.travis.yml new file mode 100644 index 0000000..9a49c43 --- /dev/null +++ b/.travis.yml @@ -0,0 +1,47 @@ +sudo: required +language: python +jdk: openjdk8 +services: docker +python: '3.6' +cache: pip +matrix: + fast_finish: true + +before_install: + # PRs to master are only ok if coming from dev branch + - '[ $TRAVIS_PULL_REQUEST = "false" ] || [ $TRAVIS_BRANCH != "master" ] || ([ $TRAVIS_PULL_REQUEST_SLUG = $TRAVIS_REPO_SLUG ] && [ $TRAVIS_PULL_REQUEST_BRANCH = "dev" ]) || [ $TRAVIS_PULL_REQUEST_BRANCH = "patch" ]' + # Pull the docker image first so the test doesn't wait for this + - docker pull nfcore/proteomicslfq:dev + # Fake the tag locally so that the pipeline runs properly + # Looks weird when this is :dev to :dev, but makes sense when testing code for a release (:dev to :1.0.1) + - docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev + +install: + # Install Nextflow + - mkdir /tmp/nextflow && cd /tmp/nextflow + - wget -qO- get.nextflow.io | bash + - sudo ln -s /tmp/nextflow/nextflow /usr/local/bin/nextflow + # Install nf-core/tools + - pip install --upgrade pip + - pip install nf-core + # Reset + - mkdir ${TRAVIS_BUILD_DIR}/tests && cd ${TRAVIS_BUILD_DIR}/tests + # Install markdownlint-cli + - sudo apt-get install npm && npm install -g markdownlint-cli + +env: + # Tower token is to inspect runs on https://tower.nf + # Use public mailbox nf-core@mailinator.com to log in: https://www.mailinator.com/v3/index.jsp?zone=public&query=nf-core + # Specify a minimum NF version that should be tested and work + - NXF_VER='19.10.0' TOWER_ACCESS_TOKEN="1c1f493bc2703472d6f1b9f6fb9e9d117abab7b1" + # Plus: get the latest NF version and check that it works + - NXF_VER='' TOWER_ACCESS_TOKEN="1c1f493bc2703472d6f1b9f6fb9e9d117abab7b1" + + +script: + # Lint the pipeline code + - nf-core lint ${TRAVIS_BUILD_DIR} + # Lint the documentation + - markdownlint ${TRAVIS_BUILD_DIR} -c ${TRAVIS_BUILD_DIR}/.github/markdownlint.yml + # Run the pipeline with the test 
profile + - nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker -ansi-log false -name proteomicslfq-${TRAVIS_EVENT_TYPE}-${TRAVIS_PULL_REQUEST}-${TRAVIS_COMMIT:0:6}-test-description diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000..9bbc5e7 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,16 @@ +# nf-core/proteomicslfq: Changelog + +The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) +and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html). + +## v1.0dev - [date] + +Initial release of nf-core/proteomicslfq, created with the [nf-core](http://nf-co.re/) template. + +### `Added` + +### `Fixed` + +### `Dependencies` + +### `Deprecated` diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 0000000..cf930c8 --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,46 @@ +# Contributor Covenant Code of Conduct + +## Our Pledge + +In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation. + +## Our Standards + +Examples of behavior that contributes to creating a positive environment include: + +* Using welcoming and inclusive language +* Being respectful of differing viewpoints and experiences +* Gracefully accepting constructive criticism +* Focusing on what is best for the community +* Showing empathy towards other community members + +Examples of unacceptable behavior by participants include: + +* The use of sexualized language or imagery and unwelcome sexual attention or advances +* Trolling, insulting/derogatory comments, and personal or political attacks +* Public or private harassment +* Publishing others' private information, such as a physical or electronic address, without explicit permission +* Other conduct which could reasonably be considered inappropriate in a professional setting + +## Our Responsibilities + +Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. + +Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. + +## Scope + +This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. + +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on [Slack](https://nf-co.re/join/slack). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. 
The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. + +Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. + +## Attribution + +This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version] + +[homepage]: http://contributor-covenant.org +[version]: http://contributor-covenant.org/version/1/4/ diff --git a/Dockerfile b/Dockerfile new file mode 100644 index 0000000..aeee070 --- /dev/null +++ b/Dockerfile @@ -0,0 +1,13 @@ +FROM nfcore/base:1.8 +LABEL authors="Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg" \ + description="Docker image containing all software requirements for the nf-core/proteomicslfq pipeline" + +# Install the conda environment +COPY environment.yml / +RUN conda env create -f /environment.yml && conda clean -a + +# Add conda installation dir to PATH (instead of doing 'conda activate') +ENV PATH /opt/conda/envs/nf-core-proteomicslfq-1.0dev/bin:$PATH + +# Dump the details of the installed packages to a file for posterity +RUN conda env export --name nf-core-proteomicslfq-1.0dev > nf-core-proteomicslfq-1.0dev.yml diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..c14b073 --- /dev/null +++ b/LICENSE @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/README.md b/README.md new file mode 100644 index 0000000..93974e4 --- /dev/null +++ b/README.md @@ -0,0 +1,73 @@ +# ![nf-core/proteomicslfq](docs/images/nf-core-proteomicslfq_logo.png) + +**Proteomics label-free quantification (LFQ) analysis pipeline using OpenMS and MSstats, with feature quantification, feature summarization, quality control and group-based statistical analysis.**. 
+
+[![Build Status](https://travis-ci.com/nf-core/proteomicslfq.svg?branch=master)](https://travis-ci.com/nf-core/proteomicslfq)
+[![GitHub Actions CI Status](https://github.com/nf-core/proteomicslfq/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/proteomicslfq/actions)
+[![GitHub Actions Linting Status](https://github.com/nf-core/proteomicslfq/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/proteomicslfq/actions)
+[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A519.10.0-brightgreen.svg)](https://www.nextflow.io/)
+
+[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](http://bioconda.github.io/)
+[![Docker](https://img.shields.io/docker/automated/nfcore/proteomicslfq.svg)](https://hub.docker.com/r/nfcore/proteomicslfq)
+
+## Introduction
+
+The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker containers making installation trivial and results highly reproducible.
+
+## Quick Start
+
+i. Install [`nextflow`](https://nf-co.re/usage/installation)
+
+ii. Install one of [`docker`](https://docs.docker.com/engine/installation/), [`singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`conda`](https://conda.io/miniconda.html)
+
+iii. Download the pipeline and test it on a minimal dataset with a single command
+
+```bash
+nextflow run nf-core/proteomicslfq -profile test,<docker/singularity/conda>
+```
+
+> Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile institute` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.
+
+iv. Start running your own analysis!
+
+```bash
+nextflow run nf-core/proteomicslfq -profile <docker/singularity/conda> --reads '*_R{1,2}.fastq.gz' --genome GRCh37
+```
+
+See [usage docs](docs/usage.md) for all of the available options when running the pipeline.
+
+## Documentation
+
+The nf-core/proteomicslfq pipeline comes with documentation about the pipeline, found in the `docs/` directory:
+
+1. [Installation](https://nf-co.re/usage/installation)
+2. Pipeline configuration
+    * [Local installation](https://nf-co.re/usage/local_installation)
+    * [Adding your own system config](https://nf-co.re/usage/adding_own_config)
+    * [Reference genomes](https://nf-co.re/usage/reference_genomes)
+3. [Running the pipeline](docs/usage.md)
+4. [Output and how to interpret the results](docs/output.md)
+5. [Troubleshooting](https://nf-co.re/usage/troubleshooting)
+
+## Credits
+
+nf-core/proteomicslfq was originally written by Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg.
+
+## Contributions and Support
+
+If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).
+
+For further information or help, don't hesitate to get in touch on [Slack](https://nfcore.slack.com/channels/proteomicslfq) (you can join with [this invite](https://nf-co.re/join/slack)).
+
+## Citation
+
+You can cite the `nf-core` pre-print as follows:
+
+> Ewels PA, Peltzer A, Fillinger S, Alneberg JA, Patel H, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. **nf-core: Community curated bioinformatics pipelines**. *bioRxiv*. 2019. p. 610741. [doi: 10.1101/610741](https://www.biorxiv.org/content/10.1101/610741v1).
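Note that the Quick Start above still carries the template's generic `--reads`/`--genome` example. For this pipeline the required inputs are the spectra and the protein database documented in `main.nf`. A hedged sketch of a more representative command, with placeholder paths:

```bash
# Placeholder inputs; --spectra and --database are the pipeline's
# required inputs according to its help text.
nextflow run nf-core/proteomicslfq -profile docker \
    --spectra 'raws/*.mzML' \
    --database 'db/proteome.fasta' \
    --expdesign 'design.tsv' \
    --outdir results
```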
diff --git a/assets/email_template.html b/assets/email_template.html
new file mode 100644
index 0000000..ba4fb4c
--- /dev/null
+++ b/assets/email_template.html
@@ -0,0 +1,54 @@
+<html>
+<head>
+<meta charset="utf-8">
+<meta http-equiv="X-UA-Compatible" content="IE=edge">
+<meta name="viewport" content="width=device-width, initial-scale=1">
+
+<meta name="description" content="nf-core/proteomicslfq Pipeline Report">
+<title>nf-core/proteomicslfq Pipeline Report</title>
+</head>
+<body>
+<div style="font-family: Helvetica, Arial, sans-serif; padding: 30px; max-width: 800px; margin: 0 auto;">
+
+<img src="cid:nfcorepipelinelogo">
+
+<h1>nf-core/proteomicslfq v${version}</h1>
+<h2>Run Name: $runName</h2>
+
+<% if (!success){
+    out << """
+    <div style="color: #a94442; background-color: #f2dede; border-color: #ebccd1; padding: 15px; margin-bottom: 20px; border: 1px solid transparent; border-radius: 4px;">
+        <h4 style="margin-top:0; color: inherit;">nf-core/proteomicslfq execution completed unsuccessfully!</h4>
+        <p>The exit status of the task that caused the workflow execution to fail was: <code>$exitStatus</code>.</p>
+        <p>The full error message was:</p>
+        <pre style="white-space: pre-wrap; overflow: visible; margin-bottom: 0;">${errorReport}</pre>
+    </div>
+    """
+} else {
+    out << """
+    <div style="color: #3c763d; background-color: #dff0d8; border-color: #d6e9c6; padding: 15px; margin-bottom: 20px; border: 1px solid transparent; border-radius: 4px;">
+        nf-core/proteomicslfq execution completed successfully!
+    </div>
+    """
+}
+%>
+
+<p>The workflow was completed at <strong>$dateComplete</strong> (duration: <strong>$duration</strong>)</p>
+<p>The command used to launch the workflow was as follows:</p>
+<pre style="white-space: pre-wrap; overflow: visible; background-color: #ededed; padding: 15px; border-radius: 4px; margin-bottom:30px;">$commandLine</pre>
+
+<h3>Pipeline Configuration:</h3>
+<table style="width:100%; max-width:100%; border-spacing: 0; border-collapse: collapse; border:0; margin-bottom: 30px;">
+    <% out << summary.collect{ k,v -> "<tr><th style='text-align:left; padding: 8px 0; vertical-align: top; border-top: 1px solid #ddd;'>$k</th><td style='text-align:left; padding: 8px; vertical-align: top; border-top: 1px solid #ddd;'><pre style='white-space: pre-wrap; overflow: visible;'>$v</pre></td></tr>" }.join("\n") %>
+</table>
+
+<p>nf-core/proteomicslfq</p>
+<p><a href="https://github.com/nf-core/proteomicslfq">https://github.com/nf-core/proteomicslfq</a></p>
+
+</div>
+
+</body>
+</html>
+ + + diff --git a/assets/email_template.txt b/assets/email_template.txt new file mode 100644 index 0000000..95765b1 --- /dev/null +++ b/assets/email_template.txt @@ -0,0 +1,40 @@ +---------------------------------------------------- + ,--./,-. + ___ __ __ __ ___ /,-._.--~\\ + |\\ | |__ __ / ` / \\ |__) |__ } { + | \\| | \\__, \\__/ | \\ |___ \\`-._,-`-, + `._,._,' + nf-core/proteomicslfq v${version} +---------------------------------------------------- + +Run Name: $runName + +<% if (success){ + out << "## nf-core/proteomicslfq execution completed successfully! ##" +} else { + out << """#################################################### +## nf-core/proteomicslfq execution completed unsuccessfully! ## +#################################################### +The exit status of the task that caused the workflow execution to fail was: $exitStatus. +The full error message was: + +${errorReport} +""" +} %> + + +The workflow was completed at $dateComplete (duration: $duration) + +The command used to launch the workflow was as follows: + + $commandLine + + + +Pipeline Configuration: +----------------------- +<% out << summary.collect{ k,v -> " - $k: $v" }.join("\n") %> + +-- +nf-core/proteomicslfq +https://github.com/nf-core/proteomicslfq diff --git a/assets/multiqc_config.yaml b/assets/multiqc_config.yaml new file mode 100644 index 0000000..5cc6771 --- /dev/null +++ b/assets/multiqc_config.yaml @@ -0,0 +1,9 @@ +report_comment: > + This report has been generated by the nf-core/proteomicslfq + analysis pipeline. For information about how to interpret these results, please see the + documentation. +report_section_order: + nf-core/proteomicslfq-software-versions: + order: -1000 + +export_plots: true diff --git a/assets/nf-core-proteomicslfq_logo.png b/assets/nf-core-proteomicslfq_logo.png new file mode 100644 index 0000000000000000000000000000000000000000..b7651c75ac5dc3298f6c264a3d42e2ce252f6499 GIT binary patch literal 11668 zcmb_?WmgaUJ^^?&GYs{+{_!AUMnFxti;oQAr{j~0i;ALNvbjWm zhNh$piH=0SD$Es0*yDxm2l+w&bJ^@&CKDAElSGr0#g>dW8m9On-h&=siA5fki9})l zb=0yx>*V?E+HLx7;=%2H;^A(B_I|3ezFlZTb5UsHUPI`St;-}}*UQIC-!$BN(GZ4Vucu zq7kVqiL0PXRMj?q16Yi>*}NthYdYozl7Y$OYv(XJOzE@JFxxu-NWnoL z387C%dIh5hmvuN?>KvBnMSb(0@#OIFu+9H8;rJ-&BMesXvx1s)ZHOndFVuTssyKs6 zG=+-xm%C*$quRn&*e2=_cN1NvbPV>Fo__+|f+z|aS3eamNV)@0E6Sj}-l1Qbi&}`C zQ+HpSknMfFpT^P@nQLhHenZ#l01IsMVR#nR!mfU1apKRNs%>K$cdeRDlL_7&7frIH z)6o1+RL*G6CBCp>E3OwSTdL1@n2z@#0uuUrM7AG|o0qdkMiV^sg6VE6Ht@5M*JkBe zp{Bgal!p1H&YrKtp3S|2S+y*ze&=+PEct`Gx3B|I8g+@ZL}*YY3-H_uP}S!bXU#)q zbY^87vODl-b4JKYz6*^2Pa z9wToQ+Yzbb9I>!H{Z}TJ)9}l`b>|1>bh3MuOhhe3I)WB+wlaav_XjP{8Yp~R^ss~| zgp#%+*r~#Z+V60G;0EUGl_Y?Wv{x9NG#A~7+WkZcW9zHWm&yX1&UEI(Jmkyy)2DBX z!+WO%L15_&v{(9ze}(-@O*+{T06zxvZlqV4F$;u)i=S1x@QVQCkjV;HywEk*4x-H+G4 zlE1v|o-Zlr%QzJ*4oGG3j))G zt8>fJko@Q=9-g^}CCg{X23F&^qdgJ6=>fx*lSdLWLcIhLdfMc~&uvojPNZp1nrrN< zb|Mn-#oU$|14a%=c70%`lP6 zCs__+(t+~1AC3vE90N;Th-abgi$55_BbTZM<+o3RRBQ?5 zC#BmX>zB{wd$V%I5+o}0O)?%H+^xFSP9FDshB|J($~$y>K}0%oFADg=U#m%yqdYp{ zax$!;gI-DZ;K5sF*R9(7&3L4~_hvjLAii|C+I}Cq(tk%={8O3dcB_ji|2D%>@A$b8 zOOE?h3ue|f;2$qB6;B5kTj5_x3t1i--{2`)k@%QcmN^Ru%&b|*)-O<_L~x35HATjEoiV}q(l zv(Jp3u1>_(``o&O!>DTj(2)6XQ5H4OAWmx&Kk8wKzX5Kz^>@2j|3Bq`V?3{)8|$3` zud{ldC#ywSQqAYkhu>*(gHGfn?nAF8AK_+*MYyd~}d}`?wKn^T>-rYxYzY zvP~64Vt3igQKSMJAO&-}3-LBXL}iUWwf$&H+*pg%Y=@lnzds)LZm={NHK-k2!TwSy zKQwBs*w+WJm6-y5xEr6>9&2gwI1p(Zbp-g#A4vmk&4*ErstL4B*k{Jh{2!HDi=MMp zIATf&>k%@*Z~1V93ID+P8B(|E7pEtp6{a$q{_SuKA$OSs%d}FbQ9j+F9_jiY>va0SE2vQSZ^= 
zx}J8^m?~d0R7q*Vdpiq4El?6w7d|DJDu|$GGYWSL8ZHrM%TUG=jurMM?IG|BL>XW7?wzO9Q9f~+|11|m6FV?2}7iC^viZgzUo%(lEnQr2FI7Yox zDx8;-9Q=xT0w0m>B5hSd)R+{1P_X>=$IVlQ4>ed-mDxG%mOd+*nSg1#CM(^=c6M^B zTA+JKIgzRG1wB@~)Fm(%+=h+_`UbdzpZ48F1!@FleB@VXp4B!BMbZsbyo=SGuNWhL z-$FWFs{_m51ZL2Opsk>-VBj%oAiQB3<=vkGq+hR?uo5lgL}Fy+_jY}r4B*$tMzAFe zdaip$d*Bhs@ILQ(Z#%I!|1AUHk?)ANnU;1UKYn#4N_5-qvFZ7S?6goN4L41M$qU+Q zpB`oKLYQ1o6Dj=tjJAbm9WF&`#ouflp88F?vl3Dq-g$5Ey@Y^z2|(cao)t@kKOeWN zakrFU_sDqDCPA9u(rq=W_bVpl$sDslI*qiArRx6VSC7r=*s>`WZ z-(UMXF2uK8Wyd>lEh#7jl_&xRN%x_)@qZB|GP$t62dgQ6Vc`tUNxJV63D*G2u@_uH zuG_iTPOHawhZ8-^#1(1Dd4H=miA=SMwHIk#Z@A77qy0soEQ*pG5=Cv(K*E0_so z`Q{?99EfUgzTssV8+8_ASUCWg`%?&BivgP$fMb=Qs?^(bT*^Z0g|&C>Pl?QAA*@+z$pvfcHs z>}EUo(#`QA>zuUqOy_Ev;)?X?1a?!biW}BERFpLJ6jmi=8Rh(s`n-p1@>5f^c-d|g zTvyYyQx@`;Hh)_~f?Bb@rbk%!w3zE*!=RC$<$;{0e#xMHM4lcoo~p$#%GNT>D^;PN zJOKX0f51WEj%vKZ5@S$}SX(7BV;JEcn+@8M9I`NAxeeNX=1;06R=3W%MF%{&!pwszYbDHJ`?pJHX40NK3pSQ)h(|rF-s)lU4=C*_JMy}swhZQe@M#GGmKIF9qP3^ zxhI;m>K@c)zHtQ)wjfuA)~^Xcvja7BQm8GoC`iKn*qcFHkNln0NvJC1P9B{Pp}a*b z9FCLRaZ`!yaFmD@RT;~^y=lQFC{NhsWi2RSzd2w+S6%U$_RGZna^8}?9x?=TkM zqV01Fhxt3Yv&hxwLM477N)&iimPI?!e9?84Mh*-Dz3;molDe9F!~IhGBf8s>7)J$g ze&MT(U8H>*wly%_zu2p?XJLg5`F`P*unf549o;$#=YU+yb8pisYx-10=A&H=c~2sK zqNOUGZ9J}~(e2-Y>w)u&rPz4N`8`>kHT4GKsaL%l9y@*h59Zo*SQFK(IUwQLw9&0g zg{O#arzlvYfEm|gB2UIDC#d*w3t&I2@pqXz7ADQEO7ZHo<(JtU@y1&_5r1yct$>(m zA&I@N?dh#6_Wb*5U_h6Q)#}_>{T=ZC=gBO=4SuQ^w7kBT%d7G&XP(WkbqxF zAAvp!o{iG0Bpk--_DsWejRN~4lcxnPeCpN1*wa*W=O#S~76O{BWK(Y3tH%3?w-e9# z-ti0{b72p@GwX=5{@#G#Srv3J@4eo_*3^9S35Y8RratXc z8%)fhzzK|f_IZ-yLB3hUxf3OsqLmOPz^G;%or>j&`w<3A#sU5tB-y z?Y2pb>o$nrOox~Gxw@u80H5ZrvV#1JIcALaID}_Ql5d$D#|K|13W+wF41L;51G6dO zf!f@|8+#eU>A_uulJB4N06S;mn1_GtUGlCIa=}&tpR5UK3sIJUT3k`ZWRM$$%Ypn? zDsV$Y{p$dcINb&F`v^BWJBI5-?6AH!E%Q(O5}$03l7V z<2p9?qD99&X6nB~9+bMB;3IluSu=LduQHYy9vpdQBW_LOP2>2^8d)f@+n)=&Qv`RL zqaRE37)r4{8v z3QoYW{T06FTB)z;`&VO=mU)$>ljV+re!s8r{iy}v#Jy>B*HuHuWjTM0p)S)*I>eRF zxJnFL*(zY&{4*6?uUb=;-)U&}mNtvEl3X42h)>FwechZ2j+i#0?x0g;E!~tMYYay> ziokCzKe9zEbdwKNnpox~10iIyO+qp;DkVeq4)ngqwA27e7lpR#N^Vudf2kyGNT(W|E=#ME7pcCyW{z|eFcGPS^&HWf~Z)?hj1 zTNi&aLUGxd9NbLFF|E{l8cVgD3R>5-bF(9sL>6e|e3uB7oiuC8Be#|>6*5g0`Q1cX z5yxjPy4H41qhNC3A+{y6=so*JA(>jdZlDCK-e>CQo8|+Lep?PTp-+r*!xz5HnD+hd zmxUwTVj#;mpJIl+IK|Oh%_Cz^_Pl7r@jrLAENh-$nf?8fHuGgHqzaGS_63_4@rj+r z&pF`An6dIhqI1nmZJFsv4vUy~?t0c?lwDD>lLp#tba3Yy8FpZdd-)!1wjf{M@*t`oxee;s z>YNm91$h2j8ks7!-<>aDX2magG$@$d4qd83uS)`ye+;vS zi2cK6WB6T~axRFN(q~IqC>$;Dz=7h5Mm@^SrSN<3H=>^OJ^T0KC$1g12d!kXEn8mL zWuwdCepe&BB8*n#0A!s+msX~eBp}x&kY2eAJFZ!a+v%XS*Sf<3)`=va(=VTjUthK= zQ!8-XG+zc|fU9zY=3}T}=dcqUsLNXY9-t63OdqEw5`=(K3E{g02lPVeuF%5gqofR& zsw6nzsLA)3=S?3G%lX${_;ohUR_DtFJ-u{+)nqrkmY%cAa!QD6F)J506`3}CBbY6! 
zziBo|qhh?LM$Na!8Bz~)`i`R4mD{AbmuBT*Nf}s zChLdKSEVu#mzgaE$a*)CR4vz%B6%E=mf*sL_1O$M4na5`Inr!nN;k68Cg5) zLYh%oFnz}cRD0@W?MoTL0uwfR^v!X)4tk!#5@6SDM#hcSf1QdOMkRgc41F$RkC)5# zWv@BS0Dwio{u<>t@Cb5&2rln|F?>t7veQ%wZ7-Cj8ZsoT?tIo&Xe?bXQV^WrWqD;k z#${ZYEK7nC9RjB_5LO+_phiCkE2%|Co5Jy@=7^|0>Ivu$U0KUVz!`Q{D4Ri+sk{L2 zesFLw-1v-NM;rCsf~%~KSTudHv(GsSONPp;2f-~GtyqvF5n$32A+J*m=d z+@EYR??FYTQEMan?lhD#&Zp;zwmE+L7X9Cc*@KXq-jcn9B;hNz6&a5-iwcq{qA4T8 zvU$RDYKE3=3@yA->S*P2y);oVH>oSo=@yIIiGuVn$Ee(gEbG;8!uzlm}cfww3&LP0>d+T{@W} zyRgxVeLH~}aN)iE!jMj-dgEjmes+om|Ec3Ahhd4B-R8f!Z||$;lzN0=Y1d6GEOly| zeseZ_-H06p<`RwTaJjN=X&%_!Fv1EB6sIzMwP9#uTaol&X&R#E8@@P}oLel~y?c#J z&IeMWp9M9)3uWmQ6ja(bi+Xx{?qda6mVLG>qNZiS-5^10B8KzVtxXdqKy0`JlPtQSE!Y@$UCdBT*t1K zCSs&1S&*;G3de|S!CvPAyu^j2$y)efP-Y2XWVxy`+egJ{I3`ftl7ndj%44Bf3ofAw z#Ps#**`}Ak08+8yHcxfm13xsEyhXOE&pJqFN!;r-P*Y-uK?=jK!pM|}Rl|vkcw~h> zm$tNZU>Tb6a)-3x2u?1`=6)aCNxk=n=M5wOv)p4?Rs&CHBBOb$PoL8JLy>XZIT7RX z%Q>eczxenI#Y|r~H+B(fzmq`e9;rxI6nk5hmiY6g%krb!oR~{mo|30M_?R=9p^R4K zx1qz%W1BzJvZ9fS7FTzEEK$fV_*fVRi{_GiX;Rf?WUXbD=sZU2`755m(~;)|33p%g zBJ!nfyJ9O@2UH8x_i?M{M(j*Uc2%rUn1n%v!hsQ@MOl4OF6ZD;3|&f4F%S0v?$Fj%BapBG)he_ep~CS=F% zvr$C=C)wYY74LE<*O$af0ccb(ar;~0$FAD8dNN^K-wlVc1dfB?D6BuDO6PkeYu}<( z5#u4f#bWnee#4BC9+?F-KiYt4E!1L3yGY?+Y@|#(axf$Oa$cyG>qvRyTBL{Goa@MG zBU3g{*ph*a@YCEx$kP05!?(|#nr>!7_1?y>1guqKWF2RJ8_8DuwISQJ7fPPAd-fA} zs~ITB9zv_#2R!_jZPufeE{>VvC9N}=G!*}d0?8I!@}BT29Hy$VQiY3>+g=SdKh&_J zcd?Ym%g0cHrL789xY$mz6!oC>59qbqSH*m&Tj%%;NqY&RfZdBTj163$?CzP<&G}Gu z@k){hC1XBiN2)nJQtFdLtwZfljvenszYs6G*8}Z_-B{?IQ{^bEZN%(9Qk6>|4)P2; zciMn0N(CbRvSu(pFZ7$`-ipeEgy#teKIqTTGen|buMbPRp`@4or_CfY`8U+neip-9 zr;)D}`x!Ag_*t*rmHH*(%mbVLzLQ<=MbDutkp^xl+@kh;|M%m(5pvC zNus~W@M42b?C;oR)7qI2%3A~RSs8f75{?+_^b*67s(cl-u8o_= z!!~1>mEX_7F@?}cw`mOY_&Znlm-Is<`ssWAs0~`cOI52LLL<|dJySV&>CnX0$37Uw znP$E>h<>N;u_cvZ9#9Q*h#Nj0B1DfF>`NhQOKv)!sp7TmQnM{@L8R}MUHXD=;R_|> zk2{!eo(G|qFLq*4WwI*ZaHRA!}@sZr*xG^-zCn5#P(he5J3h+uMQQAuJh%OElJz_A4 zQ&#~^RARE)6iIn97r#&v|Ri|#^`;pxshl^4aXacJzu7^;v zqDLli~Nn6%&C=KqJ}f7+Zs7eC*2 zqlm7iuP#~l3X0N`ZtNM15B^ACd?YidmBpVgE~HqI(}(<`N77i(wAEFV^u)I2`YTVk z3$H8$l&c*s1#d3KmSf{o@Xv}<~Z$?FcrXU7~>e@s82h}TWsQDK^ zI@yhve0?}PhW3Q*oad-{(q$`^N_xPhRn>_b)#c|A|8uiDLB=-$Zs52PYaV-i#koIq zxWviI?O7J|{*&FKzjA+&0`wu&CA=lHdJJ0@MRtLYC@yiJWl7%8-z)Z9y<%P8lCR)TEVi>U zpF-Bl_JWkjAt211s%nb;H)a?|%B;(2XiESYAYK~m-e@&)f8$5|gc=BKuk-_%{Bm4j zV9WpWjlmr`N;p%=%Z|^Qk1@fOh#_CPW-$*|YsRHS*ILDhJM27xWb6&*zIUrY?->C@ z-bnjDll^=JiC7tnCvE6yK%nRlOuCBi!B-MqQm-1w3-5mZ3raV)tbN56G-UoSzWO{p z*=;87UhZB}gc(hO>Y4xtbn6kB=xNrNFRF`EWa3b2;MQfoAnuYHF`A&lJue%m>7r*+ z)~=q^rAL38#n_JINrZ$dA*+W~#yM{-xB&B#;?*P1JIrBuXk^-o1#`Y^Nw5h6X!)wm6K%I;CIJwjXZEa@=wnt&ap7ZC_->H5Owu%%_gxsN1yI3yI(eqM%?0#3 zE+n$?r*#kgC%@f5I5rd9aaWVTwuOiMl$-EL^4l&Uj|nP6QWLZdd_bnCp)!R~I`CK& z*-p(^Az%N)IKcm(7hw4Xb8aj_g6_(@W?fErSkAGVDnEoyiN;%D;sxf0v@1*Wi3jrn z`b0FgNA~7#JpP((EF54ixTBKFCQ94UbEc5J#bsd=7}bsw#5zRZuW{%KN8pY?U|B3#U^qATA?PePEouO5AVVP3EA7zS znDpwh5tk+C>|i6ZTvESycgox1vpfT1mW*KFTV>SEp{^wNd^o+Q(%?Ewe%Rx7Hx5-n z@hAneGZq>}gc}1TNp3V3`4kn<0-`91p%xeO-742VZLujTmh7NJ7=uffIXW5qSrS^NkUq@rM0MFG+)&+Z<}Ed zsOx(}6JwkdDlhJ%dMp>0spKz=VMSN_fhW)gH)DrcKUfacJAi&ZU)qOIHG@>@sr{V%e$65iRv5OB$hX3bgS>1& z*6hnwcZ3^P-5d}fgdD|iF3MdxEc-IdKHtVE4J>`+zcnoe`ApnAzOE*2Ua~c#EAG){ zX!BjhAQ()0ZKFu=J4xzCjs^`RdA_U1+Ii0;O4D5vGN~dCCJ`p`4n0OL&@Kuig66ff z@-H$`ob;c(QQ56|q={1&Le%7z_)rh6v;pt^kSYGc~m1yRZ{XJ%+bM&pNO(MJTzhVza~qN3c!Kx8yv& zt*)67`&v&loSJ<0T5wj#gmrxvoBENkm zYp1y$pCU88LsJ_wj(uBt#){5-8~iF;QPEUfRk&gTd8Cv!6REfpW-^lixm%_FBlW~T z%yMjTE`VM5u$bLN!do4=256f*&Qy)FX5A|56oH5DVYjZGTzf5Tbd+|9fdnTP!Xbh= 
zG2h9$t8ubS9|cc*g(-oH;jo26(S-!#GTFsxjncu~LdT$zTshUtzbZl+TAQHi#--?= zi{MM^{yV{r*&xBaXq6B*Dl;h_$~`4dbP;>SN{q#qkjG$c^`B`dQ4G7*Y_>MQOlqgb zh|90;?e)cL-xN57l`M$McwPE{8`!0nMJ3d+%u%BbOP`x(rwMc!+exzQ?qv!GVXLA~ z@|_e)u-R=a0CUk6oK;wAb{0_pu4DBoWG>lzXk&^;oje+zPTFu!fkI-|S()0IEXUNpGnwVB`F+c;Q zQtQU5x@zf4)6?F6QlU2naleypIHez6Urd4(9Di!{EmW=vrKt@Rr@_xEqFHgmjV%!f z2g<~qTnO~NCJ&_=k#Woj&*2au$Q4bm;oU1JQFJK~PW)DNPO zTtD>VOdCJy`}reVd>p$t1R;}gISH<>GSUjAfK2Wcd?Q(ChA-yuZHiWm*#8L@Wo%6< zeJgf1jBxH_K28+8cv<_UQ3sU7;*~f)189SObQn&Cz9*nogRktf0X$@=0;U}E5 ziVI~&{{hLau&0Cfg8zozUc~fT(jzCR1^t#=n3%JQ-d2^6x%Zq<5CLIE2hBiI#I^MD z&U)gySG4*`yr11)2VlBOF##T%#v=JvNth-E<-Z%jwgtr`=|e;N;pqLiPZ4h(Tb1jf$!G zSn#VD!{gPSdV7;mvU>tDBzqPqi54#qwYAvSLLF^bu}k%$~s!TANQg25g|gBOK za^lj=wd{7&=q5L5@SU1tv7q4l*FtwN>a}=gjVR>|T%0PdP3Mq7gr|ojU-IYD&xbhF z1<_ek(h~efQn^HQ#>R~}Y+_*b+?R@^t2-X5xT03sX>%41z1OlHyO9Ge1Pf9ZQ_@-^dxGK46U#=3SJK?%okmA_7j;u2Vkkbxqz=Byd0%sI$=9V$Iv zi8A+=;F}#j`s(Xykb*R_WNsa9qWlBjZn*VBv*G}f877^J)gLCYJz^Z(cGYiV*~p(q zkW|Mx8;HG=6w+j!Uz{WeJiY=j8@mukCcs?z;Xk0Sn=hj5ZZQY&Nus&5P^AjZy3vz8 z4zwIVrCQwNk!ownE3oDWvCBzqMd_C&3xtV2&6Ow?)(z0mCk9+Mgos62rZG2FBJ2oA-RC!6lNG5RR#cQ~~x-UO~NglxZaf3=Q z)k*j*kw!UZvN;NGse{xJ(1myVm%_YGbIxAW5+D<9?u5JA53UWRXQ#58--{rXoUL@!{O;{u_%6xcPjwoQb1i zGnb&@vSqr@S^tu>(qyIP8`QQqB3UU-ADs_{DkE-AC&pacnSXtwR9FV#;P5myfw8%W z8mNNLm+4Be!}lg1%#fIKt`I78$InT6Vrcw21dPK=N@OWL0SF+MvL_a(0MCV|nqUa% zD+gyv*x}ShktIPIXHZM;ly48zOaFMIy?!ADz9bON7sN3c5o-TDaDp9J4n(+DJa$1( zTMkrR>nW7RHaJpkw+yf%0%Q@@6N{BnE^Ei|nln;934K4MuyZl=29tN`3~p_T`s{X`t{Br&!QF@hHK1CHi$;tr-lVTS{o!n>r zi}iGeZJ&hqQ-$5Xpx-%3ZMVuJpJs-WAh)m*K4vyQpHxaoWpmSpm_}!VJKWUKJ729q zxf=HB={~zm@ZVW0u-^a1h)kpT aCPK9^3 +Content-Disposition: inline; filename="nf-core-proteomicslfq_logo.png" + +<% out << new File("$baseDir/assets/nf-core-proteomicslfq_logo.png"). + bytes. + encodeBase64(). + toString(). + tokenize( '\n' )*. + toList()*. + collate( 76 )*. + collect { it.join() }. + flatten(). + join( '\n' ) %> + +<% +if (mqcFile){ +def mqcFileObj = new File("$mqcFile") +if (mqcFileObj.length() < mqcMaxSize){ +out << """ +--nfcoremimeboundary +Content-Type: text/html; name=\"multiqc_report\" +Content-Transfer-Encoding: base64 +Content-ID: +Content-Disposition: attachment; filename=\"${mqcFileObj.getName()}\" + +${mqcFileObj. + bytes. + encodeBase64(). + toString(). + tokenize( '\n' )*. + toList()*. + collate( 76 )*. + collect { it.join() }. + flatten(). 
+ join( '\n' )} +""" +}} +%> + +--nfcoremimeboundary-- diff --git a/bin/markdown_to_html.r b/bin/markdown_to_html.r new file mode 100755 index 0000000..abe1335 --- /dev/null +++ b/bin/markdown_to_html.r @@ -0,0 +1,51 @@ +#!/usr/bin/env Rscript + +# Command line argument processing +args = commandArgs(trailingOnly=TRUE) +if (length(args) < 2) { + stop("Usage: markdown_to_html.r ", call.=FALSE) +} +markdown_fn <- args[1] +output_fn <- args[2] + +# Load / install packages +if (!require("markdown")) { + install.packages("markdown", dependencies=TRUE, repos='http://cloud.r-project.org/') + library("markdown") +} + +base_css_fn <- getOption("markdown.HTML.stylesheet") +base_css <- readChar(base_css_fn, file.info(base_css_fn)$size) +custom_css <- paste(base_css, " +body { + padding: 3em; + margin-right: 350px; + max-width: 100%; +} +#toc { + position: fixed; + right: 20px; + width: 300px; + padding-top: 20px; + overflow: scroll; + height: calc(100% - 3em - 20px); +} +#toc_header { + font-size: 1.8em; + font-weight: bold; +} +#toc > ul { + padding-left: 0; + list-style-type: none; +} +#toc > ul ul { padding-left: 20px; } +#toc > ul > li > a { display: none; } +img { max-width: 800px; } +") + +markdownToHTML( + file = markdown_fn, + output = output_fn, + stylesheet = custom_css, + options = c('toc', 'base64_images', 'highlight_code') +) diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py new file mode 100755 index 0000000..738d48b --- /dev/null +++ b/bin/scrape_software_versions.py @@ -0,0 +1,52 @@ +#!/usr/bin/env python +from __future__ import print_function +from collections import OrderedDict +import re + +# TODO nf-core: Add additional regexes for new tools in process get_software_versions +regexes = { + 'nf-core/proteomicslfq': ['v_pipeline.txt', r"(\S+)"], + 'Nextflow': ['v_nextflow.txt', r"(\S+)"], + 'FastQC': ['v_fastqc.txt', r"FastQC v(\S+)"], + 'MultiQC': ['v_multiqc.txt', r"multiqc, version (\S+)"], +} +results = OrderedDict() +results['nf-core/proteomicslfq'] = 'N/A' +results['Nextflow'] = 'N/A' +results['FastQC'] = 'N/A' +results['MultiQC'] = 'N/A' + +# Search each file using its regex +for k, v in regexes.items(): + try: + with open(v[0]) as x: + versions = x.read() + match = re.search(v[1], versions) + if match: + results[k] = "v{}".format(match.group(1)) + except IOError: + results[k] = False + +# Remove software set to false in results +for k in list(results): + if not results[k]: + del(results[k]) + +# Dump to YAML +print (''' +id: 'software_versions' +section_name: 'nf-core/proteomicslfq Software Versions' +section_href: 'https://github.com/nf-core/proteomicslfq' +plot_type: 'html' +description: 'are collected at run time from the software output.' +data: | +
+    <dl class="dl-horizontal">
+''')
+for k,v in results.items():
+    print("        <dt>{}</dt><dd><samp>{}</samp></dd>".format(k,v))
+print ("    </dl>
") + +# Write out regexes as csv file: +with open('software_versions.csv', 'w') as f: + for k,v in results.items(): + f.write("{}\t{}\n".format(k,v)) diff --git a/conf/base.config b/conf/base.config new file mode 100644 index 0000000..11e9d90 --- /dev/null +++ b/conf/base.config @@ -0,0 +1,51 @@ +/* + * ------------------------------------------------- + * nf-core/proteomicslfq Nextflow base config file + * ------------------------------------------------- + * A 'blank slate' config file, appropriate for general + * use on most high performace compute environments. + * Assumes that all software is installed and available + * on the PATH. Runs in `local` mode - all jobs will be + * run on the logged in environment. + */ + +process { + + // TODO nf-core: Check the defaults for all processes + cpus = { check_max( 1 * task.attempt, 'cpus' ) } + memory = { check_max( 7.GB * task.attempt, 'memory' ) } + time = { check_max( 4.h * task.attempt, 'time' ) } + + errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' } + maxRetries = 1 + maxErrors = '-1' + + // Process-specific resource requirements + // NOTE - Only one of the labels below are used in the fastqc process in the main script. + // If possible, it would be nice to keep the same label naming convention when + // adding in your processes. + // TODO nf-core: Customise requirements for specific processes. + // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors + withLabel:process_low { + cpus = { check_max( 2 * task.attempt, 'cpus' ) } + memory = { check_max( 14.GB * task.attempt, 'memory' ) } + time = { check_max( 6.h * task.attempt, 'time' ) } + } + withLabel:process_medium { + cpus = { check_max( 6 * task.attempt, 'cpus' ) } + memory = { check_max( 42.GB * task.attempt, 'memory' ) } + time = { check_max( 8.h * task.attempt, 'time' ) } + } + withLabel:process_high { + cpus = { check_max( 12 * task.attempt, 'cpus' ) } + memory = { check_max( 84.GB * task.attempt, 'memory' ) } + time = { check_max( 10.h * task.attempt, 'time' ) } + } + withLabel:process_long { + time = { check_max( 20.h * task.attempt, 'time' ) } + } + withName:get_software_versions { + cache = false + } + +} diff --git a/conf/igenomes.config b/conf/igenomes.config new file mode 100644 index 0000000..2de9242 --- /dev/null +++ b/conf/igenomes.config @@ -0,0 +1,420 @@ +/* + * ------------------------------------------------- + * Nextflow config file for iGenomes paths + * ------------------------------------------------- + * Defines reference genomes, using iGenome paths + * Can be used by any config that customises the base + * path using $params.igenomes_base / --igenomes_base + */ + +params { + // illumina iGenomes reference file paths + genomes { + 'GRCh37' { + fasta = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt" + mito_name = "MT" + macs_gsize = "2.7e9" + blacklist = 
"${baseDir}/assets/blacklists/GRCh37-blacklist.bed" + } + 'GRCh38' { + fasta = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "2.7e9" + blacklist = "${baseDir}/assets/blacklists/hg38-blacklist.bed" + } + 'GRCm38' { + fasta = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/README.txt" + mito_name = "MT" + macs_gsize = "1.87e9" + blacklist = "${baseDir}/assets/blacklists/GRCm38-blacklist.bed" + } + 'TAIR10' { + fasta = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/README.txt" + mito_name = "Mt" + } + 'EB2' { + fasta = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/README.txt" + } + 'UMD3.1' { + fasta = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/Bowtie2Index/" + star = 
"${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/README.txt" + mito_name = "MT" + } + 'WBcel235' { + fasta = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.bed" + mito_name = "MtDNA" + macs_gsize = "9e7" + } + 'CanFam3.1' { + fasta = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/README.txt" + mito_name = "MT" + } + 'GRCz10' { + fasta = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.bed" + mito_name = "MT" + } + 'BDGP6' { + fasta = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.bed" + mito_name = "M" + macs_gsize = "1.2e8" + } + 'EquCab2' { + fasta = 
"${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/README.txt" + mito_name = "MT" + } + 'EB1' { + fasta = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/README.txt" + } + 'Galgal4' { + fasta = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.bed" + mito_name = "MT" + } + 'Gm01' { + fasta = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/README.txt" + } + 'Mmul_1' { + fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.gtf" + bed12 = 
"${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/README.txt" + mito_name = "MT" + } + 'IRGSP-1.0' { + fasta = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.bed" + mito_name = "Mt" + } + 'CHIMP2.1.4' { + fasta = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/README.txt" + mito_name = "MT" + } + 'Rnor_6.0' { + fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.bed" + mito_name = "MT" + } + 'R64-1-1' { + fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.bed" + mito_name = "MT" + macs_gsize = "1.2e7" + } + 'EF2' { + fasta = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BWAIndex/genome.fa" + bowtie2 = 
"${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/README.txt" + mito_name = "MT" + macs_gsize = "1.21e7" + } + 'Sbi1' { + fasta = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/README.txt" + } + 'Sscrofa10.2' { + fasta = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/README.txt" + mito_name = "MT" + } + 'AGPv3' { + fasta = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.bed" + mito_name = "Mt" + } + 'hg38' { + fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "2.7e9" + blacklist = "${baseDir}/assets/blacklists/hg38-blacklist.bed" + } + 'hg19' { + fasta = 
"${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "2.7e9" + blacklist = "${baseDir}/assets/blacklists/hg19-blacklist.bed" + } + 'mm10' { + fasta = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "1.87e9" + blacklist = "${baseDir}/assets/blacklists/mm10-blacklist.bed" + } + 'bosTau8' { + fasta = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.bed" + mito_name = "chrM" + } + 'ce10' { + fasta = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "9e7" + } + 'canFam3' { + fasta = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BismarkIndex/" + gtf = 
"${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/README.txt" + mito_name = "chrM" + } + 'danRer10' { + fasta = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.bed" + mito_name = "chrM" + } + 'dm6' { + fasta = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "1.2e8" + } + 'equCab2' { + fasta = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/README.txt" + mito_name = "chrM" + } + 'galGal4' { + fasta = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/README.txt" + mito_name = "chrM" + } + 'panTro4' { + fasta = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/Bowtie2Index/" + star = 
"${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/README.txt" + mito_name = "chrM" + } + 'rn6' { + fasta = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.bed" + mito_name = "chrM" + } + 'sacCer3' { + fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BismarkIndex/" + readme = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "1.2e7" + } + 'susScr3' { + fasta = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/README.txt" + mito_name = "chrM" + } + } +} diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..010beba --- /dev/null +++ b/docs/README.md @@ -0,0 +1,12 @@ +# nf-core/proteomicslfq: Documentation + +The nf-core/proteomicslfq documentation is split into the following files: + +1. [Installation](https://nf-co.re/usage/installation) +2. Pipeline configuration + * [Local installation](https://nf-co.re/usage/local_installation) + * [Adding your own system config](https://nf-co.re/usage/adding_own_config) + * [Reference genomes](https://nf-co.re/usage/reference_genomes) +3. [Running the pipeline](usage.md) +4. [Output and how to interpret the results](output.md) +5. 
[Troubleshooting](https://nf-co.re/usage/troubleshooting) diff --git a/docs/images/nf-core-proteomicslfq_logo.png b/docs/images/nf-core-proteomicslfq_logo.png new file mode 100644 index 0000000000000000000000000000000000000000..465ca25d947e6dce2b026d8c77f805aa39aceba5 GIT binary patch literal 20821 [base85-encoded binary data for the nf-core/proteomicslfq logo PNG omitted]
diff --git a/docs/output.md b/docs/output.md new file mode 100644 index 0000000..48f4268 --- /dev/null +++ b/docs/output.md @@ -0,0 +1,43 @@ +# nf-core/proteomicslfq: Output + +This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline. + + + +## Pipeline overview + +The pipeline is built using [Nextflow](https://www.nextflow.io/) +and processes data using the following steps: + +* [FastQC](#fastqc) - read quality control +* [MultiQC](#multiqc) - aggregate report, describing results of the whole pipeline + +## FastQC + +[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your reads. It provides information about the quality score distribution across your reads, the per base sequence content (%T/A/G/C), adapter contamination and other overrepresented sequences. + +For further reading and documentation see the [FastQC help](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/). + +> **NB:** The FastQC plots displayed in the MultiQC report show _untrimmed_ reads. They may contain adapter sequences and potentially regions with low quality. To see how your reads look after trimming, look at the FastQC reports in the `trim_galore` directory.
+ +**Output directory: `results/fastqc`** + +* `sample_fastqc.html` + * FastQC report, containing quality metrics for your untrimmed raw fastq files +* `zips/sample_fastqc.zip` + * zip file containing the FastQC report, tab-delimited data file and plot images + +## MultiQC + +[MultiQC](http://multiqc.info) is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available within the report data directory. + +The pipeline has special steps which allow the software versions used to be reported in the MultiQC output for future traceability. + +**Output directory: `results/multiqc`** + +* `Project_multiqc_report.html` + * MultiQC report - a standalone HTML file that can be viewed in your web browser +* `Project_multiqc_data/` + * Directory containing parsed statistics from the different tools used in the pipeline + +For more information about how to use MultiQC reports, see [http://multiqc.info](http://multiqc.info) diff --git a/docs/usage.md b/docs/usage.md new file mode 100644 index 0000000..81d539a --- /dev/null +++ b/docs/usage.md @@ -0,0 +1,328 @@ +# nf-core/proteomicslfq: Usage + +## Table of contents + +* [Table of contents](#table-of-contents) +* [Introduction](#introduction) +* [Running the pipeline](#running-the-pipeline) + * [Updating the pipeline](#updating-the-pipeline) + * [Reproducibility](#reproducibility) +* [Main arguments](#main-arguments) + * [`-profile`](#-profile) + * [`--reads`](#--reads) + * [`--single_end`](#--single_end) +* [Reference genomes](#reference-genomes) + * [`--genome` (using iGenomes)](#--genome-using-igenomes) + * [`--fasta`](#--fasta) + * [`--igenomes_ignore`](#--igenomes_ignore) +* [Job resources](#job-resources) + * [Automatic resubmission](#automatic-resubmission) + * [Custom resource requests](#custom-resource-requests) +* [AWS Batch specific parameters](#aws-batch-specific-parameters) + * [`--awsqueue`](#--awsqueue) + * [`--awsregion`](#--awsregion) + * [`--awscli`](#--awscli) +* [Other command line parameters](#other-command-line-parameters) + * [`--outdir`](#--outdir) + * [`--email`](#--email) + * [`--email_on_fail`](#--email_on_fail) + * [`--max_multiqc_email_size`](#--max_multiqc_email_size) + * [`-name`](#-name) + * [`-resume`](#-resume) + * [`-c`](#-c) + * [`--custom_config_version`](#--custom_config_version) + * [`--custom_config_base`](#--custom_config_base) + * [`--max_memory`](#--max_memory) + * [`--max_time`](#--max_time) + * [`--max_cpus`](#--max_cpus) + * [`--plaintext_email`](#--plaintext_email) + * [`--monochrome_logs`](#--monochrome_logs) + * [`--multiqc_config`](#--multiqc_config) + +## Introduction + +Nextflow handles job submissions on SLURM or other environments, and supervises running the jobs. Thus the Nextflow process must run until the pipeline is finished. We recommend that you put the process running in the background through `screen` / `tmux` or a similar tool. Alternatively you can run Nextflow within a cluster job submitted to your job scheduler. + +It is recommended to limit the Nextflow Java virtual machine's memory.
We recommend adding the following line to your environment (typically in `~/.bashrc` or `~/.bash_profile`): + +```bash +export NXF_OPTS='-Xms1g -Xmx4g' +``` + + + +## Running the pipeline + +The typical command for running the pipeline is as follows: + +```bash +nextflow run nf-core/proteomicslfq --reads '*_R{1,2}.fastq.gz' -profile docker +``` + +This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles. + +Note that the pipeline will create the following files in your working directory: + +```bash +work # Directory containing the Nextflow working files +results # Finished results (configurable, see below) +.nextflow.log # Log file from Nextflow +# Other Nextflow hidden files, e.g. history of pipeline runs and old logs. +``` + +### Updating the pipeline + +When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, regularly update the cached version: + +```bash +nextflow pull nf-core/proteomicslfq +``` + +### Reproducibility + +It's a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. + +First, go to the [nf-core/proteomicslfq releases page](https://github.com/nf-core/proteomicslfq/releases) and find the latest version number - numeric only (e.g. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - e.g. `-r 1.3.1`. + +This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. + +## Main arguments + +### `-profile` + +Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. + +Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Conda) - see below. + +The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). + +Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important! +They are loaded in sequence, so later profiles can overwrite earlier profiles. + +If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended.
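+ +For example, to pin a release for reproducibility (see above) and run the bundled test configuration with Docker - a sketch; swap in the release number and profiles that fit your setup: + +```bash +# Run the small bundled test dataset at a pinned release, with Docker-provisioned software +nextflow run nf-core/proteomicslfq -r 1.3.1 -profile test,docker +```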
+ +* `conda` + * A generic configuration profile to be used with [conda](https://conda.io/docs/) + * Pulls most software from [Bioconda](https://bioconda.github.io/) +* `docker` + * A generic configuration profile to be used with [Docker](http://docker.com/) + * Pulls software from DockerHub: [`nfcore/proteomicslfq`](http://hub.docker.com/r/nfcore/proteomicslfq/) +* `singularity` + * A generic configuration profile to be used with [Singularity](http://singularity.lbl.gov/) + * Pulls software from DockerHub: [`nfcore/proteomicslfq`](http://hub.docker.com/r/nfcore/proteomicslfq/) +* `test` + * A profile with a complete configuration for automated testing + * Includes links to test data so needs no other parameters + + + +### `--reads` + +Use this to specify the location of your input FastQ files. For example: + +```bash +--reads 'path/to/data/sample_*_{1,2}.fastq' +``` + +Please note the following requirements: + +1. The path must be enclosed in quotes +2. The path must have at least one `*` wildcard character +3. When using the pipeline with paired-end data, the path must use `{1,2}` notation to specify read pairs. + +If left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz` + +### `--single_end` + +By default, the pipeline expects paired-end data. If you have single-end data, you need to specify `--single_end` on the command line when you launch the pipeline. A normal glob pattern, enclosed in quotation marks, can then be used for `--reads`. For example: + +```bash +--single_end --reads '*.fastq' +``` + +It is not possible to run a mixture of single-end and paired-end files in one run. + +## Reference genomes + +The pipeline config files come bundled with paths to the Illumina iGenomes reference index files. If running with Docker or AWS, the configuration is set up to use the [AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/) resource. + +### `--genome` (using iGenomes) + +There are 31 different species supported in the iGenomes references. To run the pipeline, you must specify which to use with the `--genome` flag. + +You can find the keys to specify the genomes in the [iGenomes config file](../conf/igenomes.config). Common genomes that are supported are: + +* Human + * `--genome GRCh37` +* Mouse + * `--genome GRCm38` +* _Drosophila_ + * `--genome BDGP6` +* _S. cerevisiae_ + * `--genome 'R64-1-1'` + +> There are numerous others - check the config file for more. + +Note that you can use the same configuration setup to save sets of reference files for your own use, even if they are not part of the iGenomes resource. See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for instructions on where to save such a file. + +The syntax for this reference configuration is as follows: + + + +```nextflow +params { + genomes { + 'GRCh37' { + fasta = '<path to the genome fasta file>' // Used if no star index given + } + // Any number of additional genomes, key is used with --genome + } +} +``` + + + +### `--fasta` + +If you prefer, you can specify the full path to your reference genome when you run the pipeline: + +```bash +--fasta '[path to Fasta reference]' +``` + +### `--igenomes_ignore` + +Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`. + +## Job resources + +### Automatic resubmission + +Each step in the pipeline has a default set of requirements for number of CPUs, memory and time.
For most of the steps in the pipeline, if the job exits with an error code of `143` (exceeded requested resources) it will automatically resubmit with higher requests (2 x original, then 3 x original). If it still fails after three attempts then the pipeline is stopped. + +### Custom resource requests + +Wherever process-specific requirements are set in the pipeline, the default value can be changed by creating a custom config file. See the files hosted at [`nf-core/configs`](https://github.com/nf-core/configs/tree/master/conf) for examples. + +If you are likely to be running `nf-core` pipelines regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this, please test that the config file works with your pipeline of choice using the `-c` parameter (see definition below). You can then create a pull request to the `nf-core/configs` repository adding your config file and an associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile. + +If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack). + +## AWS Batch specific parameters + +Running the pipeline on AWS Batch requires a couple of specific parameters to be set according to your AWS Batch configuration. Please use [`-profile awsbatch`](https://github.com/nf-core/configs/blob/master/conf/awsbatch.config) and then specify all of the following parameters. + +### `--awsqueue` + +The JobQueue that you intend to use on AWS Batch. + +### `--awsregion` + +The AWS region in which to run your job. Default is set to `eu-west-1` but can be adjusted to your needs. + +### `--awscli` + +The [AWS CLI](https://www.nextflow.io/docs/latest/awscloud.html#aws-cli-installation) path in your custom AMI. Default: `/home/ec2-user/miniconda/bin/aws`. + +Please make sure to also set the `-w/--work-dir` and `--outdir` parameters to an S3 storage bucket of your choice - you'll get an error message notifying you if you didn't. + +## Other command line parameters + + + +### `--outdir` + +The output directory where the results will be saved. + +### `--email` + +Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run. + +### `--email_on_fail` + +This works exactly as with `--email`, except emails are only sent if the workflow is not successful. + +### `--max_multiqc_email_size` + +Threshold size for MultiQC report to be attached in notification email. If the file generated by the pipeline exceeds the threshold, it will not be attached (Default: 25MB). + +### `-name` + +Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic. + +This is used in the MultiQC report (if not default) and in the summary HTML / e-mail (always). + +**NB:** Single hyphen (core Nextflow option) + +### `-resume` + +Specify this when restarting a pipeline. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously. + +You can also supply a run name to resume a specific run: `-resume [run-name]`. Use the `nextflow log` command to show previous run names.
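+ +As a minimal sketch, assuming an earlier run was launched from the current directory (so Nextflow can find its cache and run history): + +```bash +# List previous runs and their mnemonic names +nextflow log + +# Re-launch the same command with -resume to reuse cached task results +nextflow run nf-core/proteomicslfq --reads '*_R{1,2}.fastq.gz' -profile docker -resume +```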
+
+**NB:** Single hyphen (core Nextflow option)
+
+### `-c`
+
+Specify the path to a specific config file (this is a core Nextflow option).
+
+**NB:** Single hyphen (core Nextflow option)
+
+Note - you can use this to override pipeline defaults.
+
+### `--custom_config_version`
+
+Provide a git commit id for the custom institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. Default: `master`.
+
+```bash
+## Download and use config files with the following git commit id
+--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96
+```
+
+### `--custom_config_base`
+
+If you're running offline, Nextflow will not be able to fetch the institutional config files
+from the internet. If you don't need them, then this is not a problem. If you do need them,
+you should download the files from the repo and tell Nextflow where to find them with the
+`--custom_config_base` option. For example:
+
+```bash
+## Download and unzip the config files
+cd /path/to/my/configs
+wget https://github.com/nf-core/configs/archive/master.zip
+unzip master.zip
+
+## Run the pipeline
+cd /path/to/my/data
+nextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/
+```
+
+> Note that the nf-core/tools helper package has a `download` command to download all required pipeline
+> files + singularity containers + institutional configs in one go for you, to make this process easier.
+
+### `--max_memory`
+
+Use to set a top-limit for the default memory requirement for each process.
+Should be a string in the format integer-unit, e.g. `--max_memory '8.GB'`
+
+### `--max_time`
+
+Use to set a top-limit for the default time requirement for each process.
+Should be a string in the format integer-unit, e.g. `--max_time '2.h'`
+
+### `--max_cpus`
+
+Use to set a top-limit for the default CPU requirement for each process.
+Should be an integer, e.g. `--max_cpus 1`
+
+### `--plaintext_email`
+
+Set to receive plain-text e-mails instead of HTML-formatted ones.
+
+### `--monochrome_logs`
+
+Set to disable colourful command line output and live life in monochrome.
+
+### `--multiqc_config`
+
+Specify a path to a custom MultiQC configuration file.
diff --git a/environment.yml b/environment.yml
new file mode 100644
index 0000000..ee31f95
--- /dev/null
+++ b/environment.yml
@@ -0,0 +1,14 @@
+# You can use this file to create a conda environment for this pipeline:
+#   conda env create -f environment.yml
+name: nf-core-proteomicslfq-1.0dev
+channels:
+  - conda-forge
+  - bioconda
+  - defaults
+dependencies:
+  - conda-forge::python=3.7.3
+  # TODO nf-core: Add required software dependencies here
+  - bioconda::fastqc=0.11.8
+  - bioconda::multiqc=1.7
+  - conda-forge::r-markdown=1.1
+  - conda-forge::r-base=3.6.1
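
Since the header of the environment file documents how it is meant to be used, a quick local sanity check of the pinned dependencies might look like this (a sketch; assumes a working conda installation):

```bash
# Recreate and activate the pipeline's environment by hand, e.g. to debug a tool version
conda env create -f environment.yml
conda activate nf-core-proteomicslfq-1.0dev
```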
diff --git a/main.nf b/main.nf
new file mode 100644
index 0000000..a0c6d51
--- /dev/null
+++ b/main.nf
@@ -0,0 +1,424 @@
+#!/usr/bin/env nextflow
+/*
+========================================================================================
+                         nf-core/proteomicslfq
+========================================================================================
+ nf-core/proteomicslfq Analysis Pipeline.
+ #### Homepage / Documentation
+ https://github.com/nf-core/proteomicslfq
+----------------------------------------------------------------------------------------
+*/
+
+def helpMessage() {
+    // TODO nf-core: Add to this help message with new command line parameters
+    log.info nfcoreHeader()
+    log.info"""
+
+    Usage:
+
+    The typical command for running the pipeline is as follows:
+
+    nextflow run nf-core/proteomicslfq --reads '*_R{1,2}.fastq.gz' -profile docker
+
+    Mandatory arguments:
+      --reads [file]                  Path to input data (must be surrounded with quotes)
+      -profile [str]                  Configuration profile to use. Can use multiple (comma separated)
+                                      Available: conda, docker, singularity, test, awsbatch and more
+
+    Options:
+      --genome [str]                  Name of iGenomes reference
+      --single_end [bool]             Specifies that the input is single-end reads
+
+    References                        If not specified in the configuration file or you wish to overwrite any of the references
+      --fasta [file]                  Path to fasta reference
+
+    Other options:
+      --outdir [file]                 The output directory where the results will be saved
+      --email [email]                 Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits
+      --email_on_fail [email]         Same as --email, except only send mail if the workflow is not successful
+      --max_multiqc_email_size [str]  Threshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: 25MB)
+      -name [str]                     Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic
+
+    AWSBatch options:
+      --awsqueue [str]                The AWSBatch JobQueue that needs to be set when running on AWSBatch
+      --awsregion [str]               The AWS Region for your AWS Batch job to run on
+      --awscli [str]                  Path to the AWS CLI tool
+    """.stripIndent()
+}
+
+// Show help message
+if (params.help) {
+    helpMessage()
+    exit 0
+}
+
+/*
+ * SET UP CONFIGURATION VARIABLES
+ */
+
+// Check if genome exists in the config file
+if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) {
+    exit 1, "The provided genome '${params.genome}' is not available in the iGenomes file. Currently the available genomes are ${params.genomes.keySet().join(", ")}"
+}
+
+// TODO nf-core: Add any reference files that are needed
+// Configurable reference genomes
+//
+// NOTE - THIS IS NOT USED IN THIS PIPELINE, EXAMPLE ONLY
+// If you want to use the channel below in a process, define the following:
+//   input:
+//   file fasta from ch_fasta
+//
+params.fasta = params.genome ? params.genomes[ params.genome ].fasta ?: false : false
+if (params.fasta) { ch_fasta = file(params.fasta, checkIfExists: true) }
+
+// Has the run name been specified by the user?
+// this has the bonus effect of catching both -name and --name
+custom_runName = params.name
+if (!(workflow.runName ==~ /[a-z]+_[a-z]+/)) {
+    custom_runName = workflow.runName
+}
+
+if (workflow.profile.contains('awsbatch')) {
+    // AWSBatch sanity checking
+    if (!params.awsqueue || !params.awsregion) exit 1, "Specify correct --awsqueue and --awsregion parameters on AWSBatch!"
+    // Check outdir paths to be S3 buckets if running on AWSBatch
+    // related: https://github.com/nextflow-io/nextflow/issues/813
+    if (!params.outdir.startsWith('s3:')) exit 1, "Outdir not on S3 - specify S3 Bucket to run on AWSBatch!"
+    // Prevent trace files from being stored on S3, since S3 does not support rolling files.
+ if (params.tracedir.startsWith('s3:')) exit 1, "Specify a local tracedir or run without trace! S3 cannot be used for tracefiles." +} + +// Stage config files +ch_multiqc_config = file(params.multiqc_config, checkIfExists: true) +ch_output_docs = file("$baseDir/docs/output.md", checkIfExists: true) + +/* + * Create a channel for input read files + */ +if (params.readPaths) { + if (params.single_end) { + Channel + .from(params.readPaths) + .map { row -> [ row[0], [ file(row[1][0], checkIfExists: true) ] ] } + .ifEmpty { exit 1, "params.readPaths was empty - no input files supplied" } + .into { ch_read_files_fastqc; ch_read_files_trimming } + } else { + Channel + .from(params.readPaths) + .map { row -> [ row[0], [ file(row[1][0], checkIfExists: true), file(row[1][1], checkIfExists: true) ] ] } + .ifEmpty { exit 1, "params.readPaths was empty - no input files supplied" } + .into { ch_read_files_fastqc; ch_read_files_trimming } + } +} else { + Channel + .fromFilePairs(params.reads, size: params.single_end ? 1 : 2) + .ifEmpty { exit 1, "Cannot find any reads matching: ${params.reads}\nNB: Path needs to be enclosed in quotes!\nIf this is single-end data, please specify --single_end on the command line." } + .into { ch_read_files_fastqc; ch_read_files_trimming } +} + +// Header log info +log.info nfcoreHeader() +def summary = [:] +if (workflow.revision) summary['Pipeline Release'] = workflow.revision +summary['Run Name'] = custom_runName ?: workflow.runName +// TODO nf-core: Report custom parameters here +summary['Reads'] = params.reads +summary['Fasta Ref'] = params.fasta +summary['Data Type'] = params.single_end ? 'Single-End' : 'Paired-End' +summary['Max Resources'] = "$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job" +if (workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" +summary['Output dir'] = params.outdir +summary['Launch dir'] = workflow.launchDir +summary['Working dir'] = workflow.workDir +summary['Script dir'] = workflow.projectDir +summary['User'] = workflow.userName +if (workflow.profile.contains('awsbatch')) { + summary['AWS Region'] = params.awsregion + summary['AWS Queue'] = params.awsqueue + summary['AWS CLI'] = params.awscli +} +summary['Config Profile'] = workflow.profile +if (params.config_profile_description) summary['Config Description'] = params.config_profile_description +if (params.config_profile_contact) summary['Config Contact'] = params.config_profile_contact +if (params.config_profile_url) summary['Config URL'] = params.config_profile_url +if (params.email || params.email_on_fail) { + summary['E-mail Address'] = params.email + summary['E-mail on failure'] = params.email_on_fail + summary['MultiQC maxsize'] = params.max_multiqc_email_size +} +log.info summary.collect { k,v -> "${k.padRight(18)}: $v" }.join("\n") +log.info "-\033[2m--------------------------------------------------\033[0m-" + +// Check the hostnames against configured profiles +checkHostname() + +def create_workflow_summary(summary) { + def yaml_file = workDir.resolve('workflow_summary_mqc.yaml') + yaml_file.text = """ + id: 'nf-core-proteomicslfq-summary' + description: " - this information is collected when the pipeline is started." + section_name: 'nf-core/proteomicslfq Workflow Summary' + section_href: 'https://github.com/nf-core/proteomicslfq' + plot_type: 'html' + data: | +
+        <dl class="dl-horizontal">
+${summary.collect { k,v -> "            <dt>$k</dt><dd><samp>${v ?: '<span style=\"color:#999999;\">N/A</span>'}</samp></dd>" }.join("\n")}
+        </dl>
+ """.stripIndent() + + return yaml_file +} + +/* + * Parse software version numbers + */ +process get_software_versions { + publishDir "${params.outdir}/pipeline_info", mode: 'copy', + saveAs: { filename -> + if (filename.indexOf(".csv") > 0) filename + else null + } + + output: + file 'software_versions_mqc.yaml' into ch_software_versions_yaml + file "software_versions.csv" + + script: + // TODO nf-core: Get all tools to print their version number here + """ + echo $workflow.manifest.version > v_pipeline.txt + echo $workflow.nextflow.version > v_nextflow.txt + fastqc --version > v_fastqc.txt + multiqc --version > v_multiqc.txt + scrape_software_versions.py &> software_versions_mqc.yaml + """ +} + +/* + * STEP 1 - FastQC + */ +process fastqc { + tag "$name" + label 'process_medium' + publishDir "${params.outdir}/fastqc", mode: 'copy', + saveAs: { filename -> + filename.indexOf(".zip") > 0 ? "zips/$filename" : "$filename" + } + + input: + set val(name), file(reads) from ch_read_files_fastqc + + output: + file "*_fastqc.{zip,html}" into ch_fastqc_results + + script: + """ + fastqc --quiet --threads $task.cpus $reads + """ +} + +/* + * STEP 2 - MultiQC + */ +process multiqc { + publishDir "${params.outdir}/MultiQC", mode: 'copy' + + input: + file multiqc_config from ch_multiqc_config + // TODO nf-core: Add in log files from your new processes for MultiQC to find! + file ('fastqc/*') from ch_fastqc_results.collect().ifEmpty([]) + file ('software_versions/*') from ch_software_versions_yaml.collect() + file workflow_summary from create_workflow_summary(summary) + + output: + file "*multiqc_report.html" into ch_multiqc_report + file "*_data" + file "multiqc_plots" + + script: + rtitle = custom_runName ? "--title \"$custom_runName\"" : '' + rfilename = custom_runName ? "--filename " + custom_runName.replaceAll('\\W','_').replaceAll('_+','_') + "_multiqc_report" : '' + // TODO nf-core: Specify which MultiQC modules to use with -m for a faster run time + """ + multiqc -f $rtitle $rfilename --config $multiqc_config . 
+ """ +} + +/* + * STEP 3 - Output Description HTML + */ +process output_documentation { + publishDir "${params.outdir}/pipeline_info", mode: 'copy' + + input: + file output_docs from ch_output_docs + + output: + file "results_description.html" + + script: + """ + markdown_to_html.r $output_docs results_description.html + """ +} + +/* + * Completion e-mail notification + */ +workflow.onComplete { + + // Set up the e-mail variables + def subject = "[nf-core/proteomicslfq] Successful: $workflow.runName" + if (!workflow.success) { + subject = "[nf-core/proteomicslfq] FAILED: $workflow.runName" + } + def email_fields = [:] + email_fields['version'] = workflow.manifest.version + email_fields['runName'] = custom_runName ?: workflow.runName + email_fields['success'] = workflow.success + email_fields['dateComplete'] = workflow.complete + email_fields['duration'] = workflow.duration + email_fields['exitStatus'] = workflow.exitStatus + email_fields['errorMessage'] = (workflow.errorMessage ?: 'None') + email_fields['errorReport'] = (workflow.errorReport ?: 'None') + email_fields['commandLine'] = workflow.commandLine + email_fields['projectDir'] = workflow.projectDir + email_fields['summary'] = summary + email_fields['summary']['Date Started'] = workflow.start + email_fields['summary']['Date Completed'] = workflow.complete + email_fields['summary']['Pipeline script file path'] = workflow.scriptFile + email_fields['summary']['Pipeline script hash ID'] = workflow.scriptId + if (workflow.repository) email_fields['summary']['Pipeline repository Git URL'] = workflow.repository + if (workflow.commitId) email_fields['summary']['Pipeline repository Git Commit'] = workflow.commitId + if (workflow.revision) email_fields['summary']['Pipeline Git branch/tag'] = workflow.revision + email_fields['summary']['Nextflow Version'] = workflow.nextflow.version + email_fields['summary']['Nextflow Build'] = workflow.nextflow.build + email_fields['summary']['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp + + // TODO nf-core: If not using MultiQC, strip out this code (including params.max_multiqc_email_size) + // On success try attach the multiqc report + def mqc_report = null + try { + if (workflow.success) { + mqc_report = ch_multiqc_report.getVal() + if (mqc_report.getClass() == ArrayList) { + log.warn "[nf-core/proteomicslfq] Found multiple reports from process 'multiqc', will use only one" + mqc_report = mqc_report[0] + } + } + } catch (all) { + log.warn "[nf-core/proteomicslfq] Could not attach MultiQC report to summary email" + } + + // Check if we are only sending emails on failure + email_address = params.email + if (!params.email && params.email_on_fail && !workflow.success) { + email_address = params.email_on_fail + } + + // Render the TXT template + def engine = new groovy.text.GStringTemplateEngine() + def tf = new File("$baseDir/assets/email_template.txt") + def txt_template = engine.createTemplate(tf).make(email_fields) + def email_txt = txt_template.toString() + + // Render the HTML template + def hf = new File("$baseDir/assets/email_template.html") + def html_template = engine.createTemplate(hf).make(email_fields) + def email_html = html_template.toString() + + // Render the sendmail template + def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, baseDir: "$baseDir", mqcFile: mqc_report, mqcMaxSize: params.max_multiqc_email_size.toBytes() ] + def sf = new File("$baseDir/assets/sendmail_template.txt") + def sendmail_template = 
engine.createTemplate(sf).make(smail_fields) + def sendmail_html = sendmail_template.toString() + + // Send the HTML e-mail + if (email_address) { + try { + if (params.plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') } + // Try to send HTML e-mail using sendmail + [ 'sendmail', '-t' ].execute() << sendmail_html + log.info "[nf-core/proteomicslfq] Sent summary e-mail to $email_address (sendmail)" + } catch (all) { + // Catch failures and try with plaintext + [ 'mail', '-s', subject, email_address ].execute() << email_txt + log.info "[nf-core/proteomicslfq] Sent summary e-mail to $email_address (mail)" + } + } + + // Write summary e-mail HTML to a file + def output_d = new File("${params.outdir}/pipeline_info/") + if (!output_d.exists()) { + output_d.mkdirs() + } + def output_hf = new File(output_d, "pipeline_report.html") + output_hf.withWriter { w -> w << email_html } + def output_tf = new File(output_d, "pipeline_report.txt") + output_tf.withWriter { w -> w << email_txt } + + c_green = params.monochrome_logs ? '' : "\033[0;32m"; + c_purple = params.monochrome_logs ? '' : "\033[0;35m"; + c_red = params.monochrome_logs ? '' : "\033[0;31m"; + c_reset = params.monochrome_logs ? '' : "\033[0m"; + + if (workflow.stats.ignoredCount > 0 && workflow.success) { + log.info "-${c_purple}Warning, pipeline completed, but with errored process(es) ${c_reset}-" + log.info "-${c_red}Number of ignored errored process(es) : ${workflow.stats.ignoredCount} ${c_reset}-" + log.info "-${c_green}Number of successfully ran process(es) : ${workflow.stats.succeedCount} ${c_reset}-" + } + + if (workflow.success) { + log.info "-${c_purple}[nf-core/proteomicslfq]${c_green} Pipeline completed successfully${c_reset}-" + } else { + checkHostname() + log.info "-${c_purple}[nf-core/proteomicslfq]${c_red} Pipeline completed with errors${c_reset}-" + } + +} + + +def nfcoreHeader() { + // Log colors ANSI codes + c_black = params.monochrome_logs ? '' : "\033[0;30m"; + c_blue = params.monochrome_logs ? '' : "\033[0;34m"; + c_cyan = params.monochrome_logs ? '' : "\033[0;36m"; + c_dim = params.monochrome_logs ? '' : "\033[2m"; + c_green = params.monochrome_logs ? '' : "\033[0;32m"; + c_purple = params.monochrome_logs ? '' : "\033[0;35m"; + c_reset = params.monochrome_logs ? '' : "\033[0m"; + c_white = params.monochrome_logs ? '' : "\033[0;37m"; + c_yellow = params.monochrome_logs ? '' : "\033[0;33m"; + + return """ -${c_dim}--------------------------------------------------${c_reset}- + ${c_green},--.${c_black}/${c_green},-.${c_reset} + ${c_blue} ___ __ __ __ ___ ${c_green}/,-._.--~\'${c_reset} + ${c_blue} |\\ | |__ __ / ` / \\ |__) |__ ${c_yellow}} {${c_reset} + ${c_blue} | \\| | \\__, \\__/ | \\ |___ ${c_green}\\`-._,-`-,${c_reset} + ${c_green}`._,._,\'${c_reset} + ${c_purple} nf-core/proteomicslfq v${workflow.manifest.version}${c_reset} + -${c_dim}--------------------------------------------------${c_reset}- + """.stripIndent() +} + +def checkHostname() { + def c_reset = params.monochrome_logs ? '' : "\033[0m" + def c_white = params.monochrome_logs ? '' : "\033[0;37m" + def c_red = params.monochrome_logs ? '' : "\033[1;91m" + def c_yellow_bold = params.monochrome_logs ? 
'' : "\033[1;93m" + if (params.hostnames) { + def hostname = "hostname".execute().text.trim() + params.hostnames.each { prof, hnames -> + hnames.each { hname -> + if (hostname.contains(hname) && !workflow.profile.contains(prof)) { + log.error "====================================================\n" + + " ${c_red}WARNING!${c_reset} You are running with `-profile $workflow.profile`\n" + + " but your machine hostname is ${c_white}'$hostname'${c_reset}\n" + + " ${c_yellow_bold}It's highly recommended that you use `-profile $prof${c_reset}`\n" + + "============================================================" + } + } + } + } +} diff --git a/nextflow.config b/nextflow.config new file mode 100644 index 0000000..83c89c0 --- /dev/null +++ b/nextflow.config @@ -0,0 +1,147 @@ +/* + * ------------------------------------------------- + * nf-core/proteomicslfq Nextflow config file + * ------------------------------------------------- + * Default config options for all environments. + */ + +// Global default params, used in configs +params { + + // Workflow flags + // TODO nf-core: Specify your pipeline's command line flags + genome = false + reads = "data/*{1,2}.fastq.gz" + single_end = false + outdir = './results' + + // Boilerplate options + name = false + multiqc_config = "$baseDir/assets/multiqc_config.yaml" + email = false + email_on_fail = false + max_multiqc_email_size = 25.MB + plaintext_email = false + monochrome_logs = false + help = false + igenomes_base = 's3://ngi-igenomes/igenomes/' + tracedir = "${params.outdir}/pipeline_info" + igenomes_ignore = false + custom_config_version = 'master' + custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" + hostnames = false + config_profile_description = false + config_profile_contact = false + config_profile_url = false + + // Defaults only, expecting to be overwritten + max_memory = 128.GB + max_cpus = 16 + max_time = 240.h + +} + +// Container slug. Stable releases should specify release tag! +// Developmental code should specify :dev +process.container = 'nfcore/proteomicslfq:dev' + +// Load base.config by default for all pipelines +includeConfig 'conf/base.config' + +// Load nf-core custom profiles from different Institutions +try { + includeConfig "${params.custom_config_base}/nfcore_custom.config" +} catch (Exception e) { + System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") +} + +profiles { + conda { process.conda = "$baseDir/environment.yml" } + debug { process.beforeScript = 'echo $HOSTNAME' } + docker { + docker.enabled = true + // Avoid this error: + // WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. + // Testing this in nf-core after discussion here https://github.com/nf-core/tools/pull/351 + // once this is established and works well, nextflow might implement this behavior as new default. 
+ docker.runOptions = '-u \$(id -u):\$(id -g)' + } + singularity { + singularity.enabled = true + singularity.autoMounts = true + } + test { includeConfig 'conf/test.config' } +} + +// Load igenomes.config if required +if (!params.igenomes_ignore) { + includeConfig 'conf/igenomes.config' +} + +// Export this variable to prevent local Python libraries from conflicting with those in the container +env { + PYTHONNOUSERSITE = 1 +} + +// Capture exit codes from upstream processes when piping +process.shell = ['/bin/bash', '-euo', 'pipefail'] + +timeline { + enabled = true + file = "${params.tracedir}/execution_timeline.html" +} +report { + enabled = true + file = "${params.tracedir}/execution_report.html" +} +trace { + enabled = true + file = "${params.tracedir}/execution_trace.txt" +} +dag { + enabled = true + file = "${params.tracedir}/pipeline_dag.svg" +} + +manifest { + name = 'nf-core/proteomicslfq' + author = 'Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg' + homePage = 'https://github.com/nf-core/proteomicslfq' + description = 'Proteomics label-free quantification (LFQ) analysis pipeline using OpenMS and MSstats, with feature quantification, feature summarization, quality control and group-based statistical analysis.' + mainScript = 'main.nf' + nextflowVersion = '>=19.10.0' + version = '1.0dev' +} + +// Function to ensure that resource requirements don't go beyond +// a maximum limit +def check_max(obj, type) { + if (type == 'memory') { + try { + if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) + return params.max_memory as nextflow.util.MemoryUnit + else + return obj + } catch (all) { + println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" + return obj + } + } else if (type == 'time') { + try { + if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) + return params.max_time as nextflow.util.Duration + else + return obj + } catch (all) { + println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" + return obj + } + } else if (type == 'cpus') { + try { + return Math.min( obj, params.max_cpus as int ) + } catch (all) { + println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! 
Using default value: $obj"
+            return obj
+        }
+    }
+}

From 8a4a9f1de70ccb959f72bf946a61d1a2c4045938 Mon Sep 17 00:00:00 2001
From: MaxUlysse
Date: Tue, 11 Feb 2020 15:18:17 +0100
Subject: [PATCH 042/374] add social preview image

---
 .../nf-core-proteomicslfq_social_preview.png | Bin 0 -> 54748 bytes
 .../nf-core-proteomicslfq_social_preview.svg | 448 ++++++++++++++++++
 2 files changed, 448 insertions(+)
 create mode 100644 assets/nf-core-proteomicslfq_social_preview.png
 create mode 100644 assets/nf-core-proteomicslfq_social_preview.svg

diff --git a/assets/nf-core-proteomicslfq_social_preview.png b/assets/nf-core-proteomicslfq_social_preview.png
new file mode 100644
index 0000000000000000000000000000000000000000..6ad68529f99909a1171d627dd97f2ede164ab828
GIT binary patch
literal 54748
[... 54748 bytes of base85-encoded PNG data omitted ...]
zKS!%fF^0=ZQ8wt1;tZgqET_4?+2Il_`U&RT(g!53>hv0?jJs21qWN8N6%`fN>qv!T z`x_mV7lw*^)8sB|+trg(3ApBEs$#7ZyZu>sc?Yp}NfLpIj~@pBbb6oN0JH0c>Pl?^rWgmx5I&@%e9c^*^Xz;(~%er*kYFmxXDvDyB>j7qyDren#sfMtE7&&nXiU`kr z2(T=Yr4H!VMeGhSeUtO=TP8@HHA915JdNz1Ngs~hR1K!exg9el93$M-qB*6B89Vjp zxWMIiwngkMO--G1*qYp?up1TtqVLZ!+`1+8{1|c9&)VmJ5LJmRo%$h2@FNcGGdU%r ziq6i?B7X$JzI_uTrsY#671~Oq;Q-quQMpC7mA z%T|Z#XBY0KRleSGJVJeTck(9BSL?!CMqf^6Qqt;VWghwI2vt+rUB|7Mms{l#d!LzI z1E1CRLE3Y#%rQcfMt-b{*MvBEMhvl9qTe)>WbE7PcBKY|TM=G>6%_c}$z8tZWb! zmu331X9Nk2smaOHp^sSe6SMp$peO9K%+AzWGyE3BO>*ISN?p*N;`{TW9ug6%s*}T4 zsU4gf_fbe!ApO4Xq4weEUWj+|DCIc-YAG}$>C}+|85AX+=hy`qid+TgWa0(Xq007j z%L!|be-)OJmXYaA3Kq)r+TBoB^ck&mGw;uhi}%_XJVkjRrD$fBaj=$fm~Pba-hC72 z@*EfRo=rD*yzW&NR6R6%|FkFQUj2Hv8-AJ6m!mnC5~-7U=ki#;u1}6^9N$bJ_vi^{ z<|(V4^%-wJAMJ9-le+*M|14>e&0qZbHT3A*a~uIe__$s`ff~6Tig8qP*hrmDs%k01 z-n=)k5X)WCpY)q0!F2m0OSI=oRqWH_28v2boSs{@Eb0Y}1384^3JM{MRhy&lPAS^k z+dBdkFdHT()R~2?&P-1JlKWalN~#lv;jHa7Q44$?7Pu`sx8G!fyYq$6@55JDS2^r| zKg>}tlnA>n6N;k)P~{;&SGC&A>>XIORJ{x1^UCiCikV%}nCCbNHMKDE%8A>^l@2tj zP7SYiaaiqAM3*eR%F|*gSzYGZKNvPy@4H57`<}RMcz&=D4IhS*_b)$XXWu5K;&^!L z)~#CWdbZ1+%TN2Ty4ZH%LxC$MM_F|$)qHk`$JTK!W9mYxd9(tq@@RIgPPRP9HLF*x zcRk{>{poA`adKW0;HaYDiT<+B_kEKO=aI4}=bz3_tpM4Fi|dnbR~ zCy5JrG5Y8k@-ylAHD0RHEjzF9^4quTYLY%rrCIzSWy|_TTS_fO<_$YZ1_F5@?M^y{N%mAqY}N0)vfWqN_IfY>9Jvn<~N<3 z>Hhp9S=kMX5y*xMw6s>GG5#iQsm0tTs$A3#n}iia$cYnBg!J1N@ZYnV?Z!Ig2Kn(@ za<#%o&wOlLaGuMpSnTrJ#8KGmKfUGV=2m{NGY8%LvtG#)F?aX!+|pmCtj4Nwt_!QH z_D%~!obxssf=mnyqLJFpQCyY-XXi1V8)JtDsy_R3y3cV9yu(FiVHeqSph|rI{tysC z^RpnP_{;1igcW{y1^0v*23Qag5t#W!K5Q?IF0JEgWdQM9{Wga;|MB5&G`o>>p61DD z_^$m$mzwT!r@3c1e}$;1C@8>*B$tIOi*j2gyW#~S40<0+NxeA+_^&labn{2?RJ@1m zJ>wrIoT;aN*Hh?*_s{LVb$))F6a#n$Ci^E%w_CeMzluU0XU{WTK?vn}(R;Y${siZt zUxlMI7F_QTjpnk_fQFZu7Vj>7nu7I=Gxhs{&sH(k)!W8356B;Gj(T7**(dg?RiO-L z_4M@UmDow}^76V*#kuVLo$>2%%8z3ljvkKXDLo+GPnr~u9{gBh>@uHYw%~4VT7|J6 zqCK)ye~Pj`V(t5BB!cwF732lL4N8r?!$!T5r)QU&nX%K`QwgNb)c1?M_uU2YJxaCK zb&789w=2SE4UwZJi6_1?@ano9R%I8Po0C>uw|B6xzvjg6=51 zF64Ew`SORVH$AX=rb}Ci>PDunU=Y9&9db3d`(>D8u=d3zK~ND@UANa=ocl z6WjT8Kh1{ReZOfG5z9U%YJ!{)C#$ta5WT=+b^Ztq*N4PeJ%H)ceF4upwXl?=mzMHC z@#$^6>_c;Ln7>_IU;kr6@MY}gn2+#4&Wjho>@ab2a}QWu+WgKCJPq_!c%4H2XB+R; zrW??_Bikp?t`}on-UOkUy8fha$}N7n`Z_HKW9PN9d+lzIIbB`~nUCEUEwStyeX^D{ zlbb#&a@3+vDQ5aa{jfCo_^tN>DWR>bF)Y%PpG_xSlZwhpm6uHByJiK(gd};5h#EYO z`TGAILvHa!)x^r~Afx@66;uQ&j~{OipOqP-H3~CWcl>V-AK= zYhL!4Wu7px8nh%SYq_U_$x|^B=f4)WeWvqazkaJJ4%45jne@PA+ila~enjSuK^gP8S_&Zn&OG=Zem)&e_zK!hS20H~@p_lgMwP@* z*;tk6n(+XqPg+s#HC`^nJ7!S=t4s6+7?sd<*Fr9Hs!i z(SM?DGIp3|m159{oNY~@Fdy1r3)l7jbL9^9wp{03&b?Vms`M5bTNKNh zCSE*=?`5yfJx8B)^FVcK{6e{OxT;RFm7CenyFy83_7ZcqKNnk`*CrIWG%;f>3${6O zMHSBzn{{nwx9?tc$Z{tCuv@oYFhr|-<9ZV||1OzA!_Em8wDBp+pZ&lhho9|LLUDDD zCU>#Tm`<@pKSoh86n=sn!_NYQD_{9Vvxy}$gIk}J=M_IHfoznef0$) zubt_p8wovtK-~MLFjyUJ^l>E!d2x8}x#pBBk?KBYsv2T*^K+okNF1$y{{XWfAHWLh$S|$jCH!qLWiob6=hw9jfuEL5I$s9%ZYMQibEBbOLEQ?{BMrf`OuMtI-wY z2^1a*SuDFKL*mi7{yc(G96S3Q}Dp$7f9-10p_;o>b&Yd@_NU`>5 z@FYII`c-uNnW~N*`hByvHj>O@gS74+d&OtzqL{IpzHO5&wZ_4i$BJ^|H?(fX?Emid zxEpjcuUYAqxG^trbTfMu-)Ti!)2j*XY9mr`qll&C580{FT8l%kGbjR-Y zreXLE@U{g-MI4-FUEe};{3PqYeKY;`nvCl1lNYSjTTg1{E?l@^3j7^9T?uXf4!alM z2vapzvE|@uti{xyKbh9J@`iR1M;;n?;Lx2=M#FfG#Z|bo$gGrhw3tPw?K!=qy_*o1&A6=4e`S z*aX(%B{6ks)IAfbxf$T}3pogdRR;$L(Yaq&b6hoB?IYYn-2NN=C#tnaw2N)O+6509 zc3&$VGJhCWJ&V~VPG<0v0K0LRPRUQfoHN}swXG8D-yc(Tklgo%xV`LE!PkXm z;XA5)bxSN<9?6TNaXGzZQyT>Ohg8IPwK@%d+sRmm%OUx#~_gABjg-F~EwR7EdO zXhPCWC2#KE`AMl_Ky=0TP9w_~zk-#G0Q| z9oBo*_=ttiP~GZ9yjgdu`o)961!q7=()=!qJ-?g6+L?h`gWk6Phr6b8*rq1u=y4L$ zgS~B3__DR+9N6Ak@;S7S+S-Hxb^2g;Nq7ms{9@&rtk=#=8l((Y2W|zOgljNbeg)9z 
z5;*r-V5E)Ijjp=C50XkBI$P;PTJzA5$i#5TK=r)a*AF;3BEjb1>oHxKDE4~n@)8U_ z8Oq}c$zh-KNh~|G6Yw4nP{#1+tdroy11U-G#?q>Km7PyxV@{fD2L%bsg6JxvdpEH{ zemb$La|~ zYnkJ-MN>Dt6h8yahQHG-5=8AObP3N33$MfLj2DMZ8k1AqOEz_t*;1qPqic+NpN^jp zWQ}|T^Lb&iGjKk5e7~+mbNH?ImQ8PAB6RB4PJ`)3+sw=?n$IbdwQ~7E&0N%otQ$IU z<%}I{lt@@?%=Nd2s%Q+C*kIQqzr;3IELB@@T)}FVluFc^N{E(u&Jhu4ndQEgfU zH8s;gt~G~A=L$}v<_k3qLH`ItUGMFXHN}|v{!YzEfkYo&j@fKBF$wuXx#!hBDich_d$SDX<}SSS?UfONeO?cJEIPM~marFRdKziVXI7^PnmkC3 z?@NVR`ez9~yelQ>HND|V5~EF8Tp<<4RxNol@xlPGK@l4mZ@Ow|(TGtF}a z0mi{cg?4-ao~}HAv6XUQc)Edy>MdCsY!BqP%E~J1GU}EFB{}D^_tt#|$)NR_i}JyZ z6a9WiO}dgKNzoBo!h(*|@cY?78s08foZc&8@=kPDibl6qCR6}u7~tJOiU$Zfq9(}v z%vRTPYy8zvl}8CcsotzK=m4FbqU2g{(jIS?YzO|)Jt%eWUAN+#CU@d98>;u|;l#6b zRa6~*qEa}(CVw8JzYylQvBrSSRm`1j3sb%0@*0~fH}qLa25ZA6-+-ynQn0h~qH^Hd zpLOh$j1y@u>{GDy;iF1%51ildL*UaR}sG?X2;(?WHK z>AgVnqb~^C|9y^EAOEcNdd0m58xLR2hmeJ?eLozIBN>b%D-h>1bdaF&_;Hc)>M+|9 zf5aHqY|5LsSG&CvSP(gvDlKBCfs*V%=jdq89A1-E%W{{c+(lf)E{8E}FCUC#okG`T zV@(Ys`ey`So{hd|5(M*XK(AJ*?MI`JMmL)w^B@NUdqG0`h$g84KFIy&)8p$9uhbd% zL6d~0IXu;1!Qy|2c8{eh|9}9F5t;gatgHz>%V9^w$)p>4KR$bK=pkEMDOZ6JYk0qs z{muLljz{3E!f^UwxVaEXxwhnHf3e1_ozB}R_%t+hfc*ekcpE&>XEMyj20yBq_;ffdM-A51%WXcNKadnec-US>iohoL=v+%7$mqcx5>}= zR`qALb_E8iiiCv_Op3D|AD?$dKg|Q?HwkZDHhD1#FFOUS8ArKdR(kqF%@S))3|kM2 zS{_IaK$cPGxP*7l^r)z68)c6|Y6kYM)&oAZ0fEJRE+P{%&L;c6Q-b4*LiEDyKt-* zCY;@&9o*Q>I@_VYA3Op4NteNImeZoGQl>=} zEkjS9JPGJKLkP1Z^dQQt$nfwB9V+isb-ju^-3fgqFI>Dh3u7dRY$kGtHn#*R;`rj~ znzJ0hM8RZG+|~=zuZq_Eu5?}tYsHW0#-KF{%HjEWxz+MX!KoPUU<&dv9WKe~{p=(E z1~8@%j3!!nnqO|k-r5V3X?!L;jD1$4U^TH72ZaF;VxDo5auarKe zTC7!ITJR1u!t2&g1Xx*{?N5I@%>ssuh?9}NhzL>GbO5pQX`WijI^mhppG&{rEq%Lk z=hwj6QKuO_b@eDNn^DVbokHW*o?LA}W&~z{#3km6&3l_os|wAAKj(D1gQ6la=f~8| zeD&VFdnow>X%!wGZupyPuB%HiB_$;TMwVmcd=+I5Q%szky;F9OJ$=O7BVd`~xlDF9 z(6CUvd5(o!ho3z)ghtj`9PzzH=!?kL@~FjXe+)OFVDea*sT0=xrv7=0Pp*(^p-_97nA;D~9Xm}|HH6|qC+fr4sF#fM!ziV-t#Y*auA$0j3o7PGAc+XX- zBz*Q~hp9GSBHw?5ConS?$Z8W<$pc$2euO|`Vp|dH5saB8#2g;+mD-CqiF`2&H#sa= z#jbOVkDZM4j&;Y6-TF*){AX9N?me$vo0|!>PCvgKNx~D*A`;@_8Ey^^nu{cK0;Z00 z7U`Mu_os8UD+t)zgM|;$A)3Iq(9nQV_7@!q&(kOr2lf{Y(vwpTe|6PH(tn^b9*8#Ct5>h3nh%v>x|uW1_f(J&8on!gPq#*t3q0HJR-1pWEYY_guqTKIPwB>PNd! z-#uCpB9uMubXUn#zN@jM_2FO-QteU>h5zG2>XV#L)rj~|G*jJ&R%oZ`j@oBIdj1UT zO}64Ou_^-xr;$|Dzr6hR(9b8W=1$#3ePR7#-EX`jd(>MFhu*D@U(N&5Sfq0$>OkuS zIO+&QgpOm5`vNdiN_`Agl!1H$Vwa=cmd1qTOwud9@YUt0NulGX5=JNCu(#^V`yTcso~QW%ZsX)?GwQ3e*(&*4b~NnBV#L> z$_vpT>-^~PgAh?RguwTe&auE>)NpRC397Vnu=U9IpF!?$bN5346b!fKemm1ZT+Z4* zQ<4T#^AzKSD3X_=GN2Kp;KRbg_CZh?;o+sJ<>e=@&X@ewq-}i@3COfaCP99>?ll}o zL_QQ!3VC$Q=6#V?P*9MVBfof&D#wKt%n?vEDCsXTkL7_;(Kf7xJI$fvx$D847 zpYPT#IBQ}&x{{^51F2zvJ9qv(Y<`w&)t~B@G$DH~#}Q`KV|Dh=xpU_N`U-a{?=a^R zEE{Gc9G631QFRYD2lF{jzZ2T}c{HJM(2@%WPy}*k?$Rj)nmujA?ijzwpt#q`iyj^n+XH z$&u_Y*C1?P@OB?*ZwRD@t0b2@wbHTHSoE_(NofEI7)o!$ENHySJIjyQ4{Zz~4^UOJ zGa6&h*S?gxqI8Rmtsi;-SgPO&)ZxC^za$We!Ob&SAzu#vEix06V_q~$D!g~4WSBOz z#Gx-swcWG|G?z;`E?gjrOR(u@=+;!{YkbU>2&9sj6PJ{<=t)^n0HOY4*! 
zs?hizIeL5gL?3%0(SR}cWr?Fmke(MOg6QXhTqZF$`TMsi06__hWBxPbzE4Szeypxd z5#5oWWawC>u>^u0mj>H~Yp5g58q%R)c>Dad9@mj-8@v#C49mKX8iB8Tt(+=?eANPW zH=uuSxsEij|I7T4vCWQZc+#)%OpIX%c5nGPC7d+BHac0jd`VU`WA3<6aA$yiN-xLdl1skfKBUp zeCwPDl6U3^JjVzTxJF>#aG8V2_gn&4V^XI(EQMWT$+4iIAUrhmNs%#7b&#dtP$_0F zXIBnQm5npsYxpB?yZrNhYrG&>3c;Xj&=8QH$e`A0V2)2<#HRgFV?n;GpmCp0h*A*~ zoC&VKPL^r0PB`WCReO0P;X&*&DjJ=s1U{aL`QBz@KF*q(?&ZtO^z@r(B{OZUtgMtS zJGBh1I0dterL0=u{`jWJ`ZjV)Tfeou5>mtE1+Mi#=GiZyWNKh zYf%jJ`K6o^*cITssB+xsONMhIb-f2oTR~F0B_uRz+A0tFk?;iP4cXXq@Ox5YH5Gt{ zngeSj*gnBsdB)|r$V#byKrcOf!yFox@+Tcx>6eEc0K`8d_{U$ zXle>JQnRzOv*C|RnU*`K+yTA(5wN=Ut-!jY$Ijf*VCzv|*}e&O0g!xE6cu?N1VRwr z`WvbcZk@1DC>$pUUEEUhHFV@ChkcKHp@$j=7DUtljdKiJDTgMu(`w^A(I1#Fg+kQB zo&K7hZ1sEaXl7#@f&M`i&$~k{sCI`Uvm7^NkQeB@xAZ5Rz{g}h!6`au6sesGfbtTy z!hTX3=egF(#Ky+0GfV-n7_N3;G(+>Z;{%ZzYsnQeZhiLVBD=$pL`JufZHYOs-h%7` z2E$8yj_KE&3JVGxCnHp6VH_$+K6nF69{%fTwshb{&?(6g`h9}M2b~2K3?>!TJJY|9 zV0fffT?Ig-{m|9j9oUC5yD*fR6&a_$ef>I*kufqEG&~D z>k&1OfQu|q*3zQhpIj(7OQSdwMhautwF?tR65j%R%@#p3=2|BV#4XldEVpm>08yB0 zky|DKgG{ku5a8q)ltc#Nay_zUPaL?#^@=`l)ZVVZ&13HalvVY9uh74wnOeDro9@#U;b zhP!z8wf9{bX-c-1)Y z(U^54-j5dYD*xCPgnqf2B6p9|OjmtZLYU>U4?XH|LVUka zvKzjRhXh|3DeEra6`%`7-5FlslLBVs$qqMhrOsi?27M^^|~ zwP{Ub-1Rzmjg&#rW67W;x&HV=;?twYHTW7q{^fLC)RM*Y+x<_DiGXbo=IST(i~cgs)Rv+iFs}R*Gp>a()-~5DNJM_^w0XXw`uC)_I@QA zPRWTN2orf)cgCCZH)g@1bn?3=04GqVrKP0cb@e3>_R$*w5__W9!NGwOEDrkkeo#1p^4oIk}5zML@ zLaG2?z$=#uDJB*A>({TIPvWP+IU#!TYVvdQN{tt1IH@@6H$T@cv|oxhy(ze0n`Ms zGxb#i0A4^!ga0qRqx$f`qvo^pjceD4<}VSbrB%8H|Hi8+4r_Z4gq9=nQ7EXEm%`Zr z>9_EC_!&;31LeM zhVH`5Hfp49v`FPPy+469>ubZMt8bPt}Ir}qO4Fmo)M+d&RXhDz9K%;)u zD>gKRL!iU|La(e{;f&7qb+b>N-H3O6)^fY8yYnm_ zq+8U?DFKrVMDCw{7%Lj77LV1+V+?T{x^IAU*CRr+DKS$e%e?j(5fz82JlF>^VU(YR z%B(CVCYGnk%LYv_Sa9P5o!5FNeAh*KdU_5RpaU=6rH8#eanGc!2Xfw%u0Q1w1%9Tx z@mbJWQ%wBf4ce-cJpEm~NMga?^Piz8H{Tk^uL?@mJDiT2kN71a&(cw^O>N$|HocNx zk*{B$sM;8^aNk>Lggj8Q18oHKCVYn|sb4RXf=}h%Y6}2#XpZBuXK6+mxDm}C+6^+W z2F1z!1u&q;ezPvI`1S3L22hC5K`zWYKQmtbSqoJ`0kZu$&R-6GH@W^Tg#}Evwbs{e zgM1*k_T!#Anjyfx!Tag2T@%m34ZTrpyas7*6<3Nq^` zgk<%YY9lAeA&ZcojDT;-po%98A8xk;4SB|y-vmtgiBN2*u)`n;=IB&uK#eP08N#6L z8#+qk#TEoDRPZtloGC;V5x)OiDAS!u&W?`u{yX6srh4`}*Emy9HELRIO$nDRTwaPRB&Ba4rBVP3O0EaFS!x=g^Bqt5hD! zZ@yvBnp3AZg;`VRHPm*&w2)b(XWEk zvkP@-K^;9-oDeVQF^ih)fEc0kn7<4DLvys$6=2Vb2EdOF|>gv0LUU zS?I5MfyD=9(p?1Xk#ER-ZT2cf>J)Y4jm<{$8@UWMeY~$C%3^6#tzNsapcn6#s4 zo@m~*umr1sj|QiVTu#EZl75GgUFQqQF{Y%*gP_(q{INg$JP2}vAKIia96Lk}jNm{M zL-EIW-BgJ=10VvZAS%Ihf<36W)p^heTH2kSe0BH)>JUgv@Xkc-pL6|KRkMt;qkoc; zl6vzEB(>dVQp&;bgn}Ag8WID%eq3niIrQ3)aoy;r1U-IW`TXC!Q9xb1RUR%*OQYH- zV1O(OXlIZ@ErjS^B{YIJsu%l6)Ykf-YMTMhC@8bJ)M#pFs(YA)^v_}1;#Gg!QK<2? zW9xgV($lv{gaQZS6LN)fVl!FXc4fB@PE?4_9k2d$f`9DM5oK2NnSM$C@#8V-1k?O% zsQ@VF)TvY9VPQ!ic9UPY04C^W>k{Vc*Cl|6QvneXZiod4EM*EUTU1GBtPtejz6D-& z3-D3ytTbLm;{xUxCh!mgTxVis#ei!DXsz&s1nkz9D=hv$)@n%RziK?Ubv6v?S+INN zW7S@MH7|&!{w$Bzp)O^hE3=`HnL|7`9%%obnqp>SlY>T&nz})=|CE$OPC=0d)e>fF z*e4BCvW0^nSQy+g;C2D%+I>05(Q@n**QjzYZl6HCb(%I+z0ZQKNRj4xYhO2$84O~WJ}#Ew=oTVW z?|JKFPjR&>WbFRX?5&yJOpb0mzR7bMIl)07g1}E29)SIhQ=1{gcah7I4&1nU2RWme zTJvd*__E1?;mF>ZbRS2$T#ZiS;2XA)Cd}MU<&~pKgce*C0tzxk$sIWqF1aeQZcpRQ zJj&epx}dlwY>u47zhq)h?W9a7Oj}N*+g@3i5TY(x%mwBya;) zuKXhpap1ap;d8NtnT>C80gTa(3!iFc@U*%a1KP%hB-s9xXGbr`UpV0XG_DBVsM zq>`ZWmczUIiBY&=2AsLr?eoc4F-x(ROyDi$r|^3v#BcRZ?T?)q*hx5SS0<)tz(Hy( zH9D9;i0Hp0_eIw(YX(dw3Z=n{DK`--n}$MXKBrl=9!ka;eL)mUT4PZGtI z8Z{O@nkyj(x&%tLw(-(y+z+`b=G)4L=H7HSz9OBeD2!WIuH6uk!38{ez|q~V<%Tr#>i)>!?(j%@g{JnDi!gH5pZ*yfK7k>? 
ze+E6k#tZ5<__d>!*^pDsa+kl*GBx!I_qBJOj`=g&HAL$u`kbyj8KYS*9_w*QO{-M< zc(b;~oONdVz0H|dbuB8Zhw#@!C@c5x2rZjTt0dp7f2Ma|neeo^n8JBW(**P9=8G4F5f_S83hm(n@9 zR?epPsM#pH{4+(9AEioL88%&O#8^g5Z0FfM1Z?B4Ur+1$`}?DC-h5%6mXkNM3$F^u>#fvMI0 z_J{Fm{uN@Xr)h6!h4H3pwKl4bb=gc)I zkn?$rr&XJvp{MuyO6d#uNK9;*C7R5CFn5Q0-0Th+ z<>iA(1^`<0pM2EI9nFgAxB;EkMs!T5P1`XrXZ&RCBOj^fBWiHY&rbQ0E+*Fei}lOD zyCH=+^PAskip)J8Qc_ZMws`43tw6&vAHjN`jWaJN$?>F^m`3BVD`4{{BipY7Jy0{J zv1TUepvRkFL;n@p1jz@$?5zp3pnP95~siuA%Yd>x+}@%3{G2 zW~!<)=hlP5QUs}e#SFs-5XQgvd)LdjwbQ=d zC&gRZ$H+qZG8mFPsC(p?eS1kQ>$rcpr2{`$WL)+<=eTGmJVj!l7trr z__9ZwusQoUE|_oO=S5-3RKvNQPk|2SKV&|ZXEh6F6=&;VFR3lGK9G*QFP3~lhv4zQ z>%XG;Vb#%)k>6!B9T-VPRM!wtsQl(TPyUgI|NcDfnYQLF?PogaM%K@zWOi?&H~8c~ zE28gGy1Hl;4mh&lR%MB&my>*A?Lk-5sAv`i{v=rPPRt z=^cIdcbyNh8Hryd4;b__f6d^$a%ijas^^Z|7CI}JU!bRt`yzJ&-FJF=!T!w8H?Hrl ztHNcLKbcTuFRdQY${E;Wpf8uZCKa0ONcQiBmG4nWeYNNu`_&P5S~~L632qC?y~ZW- z^$pWB$6I1ze6NT_{$22>LLzI!z=ZC&)*s@WK=qc9W39AQiI0;SiYqVtjvzO)vPAN@ zis7_8oh!K1^EeB;W(Uwj{h!Z=YGPhL(SRr{DmQrOoah8md;6JhV`b>a^Qji?0^*1ax$^Brc%NuT&X^ibX=Ln<-D>_yx>B*?@vGCElR!r0`U%_B z1_$dSCvqrWyK}Xvv|XWRcmAL2U?*ED*qUm%h;TnDdTuhaZDR3UJX>EHnO^7688 zTG|aTv-$h_B1IPcsG~|{i}LKPSCPp1zYmHmM3&Ecj|ZgV*UQVcS^#t;)Z^ph2P$0H zV63{H&PTsr`=1IdCYF59|LxlqAOiJQe$~GY4yJe?^Wnqm`FTryH?f@;{|!cZ!Hqsm z8+~Al9~9{L>^oHg)%TzvQ#zCp%n27kK7IhsMS9M&luZAPyi7gtTDk(8<~IpMU;t>~m|n=rn=`ck-6<7$4Gn)dJfZIfbj{a- z*Xs&mVuU(CG5EI%ynZO~-!FOnkPK*m254vUii)2g|BeOz&mDJ_g9rZ!93Z;M!lDAX z{@mn~eLXf9n{h^X`V#$H+wkNJo#Hq%;$j@{08LNi%FpPPIRZqQM{LgYnD#{gRr%~ITpeC{CoQHIlX@+fW9jl zPHfx{WrUi?&h+1aAq&C4qtypE1ZCFH@xO(Ro<3)5CM`i;M<>^jQcU81>Ii+8Ke(!y zv9gfFsnbOd8QD;RPDfmHnkN5l#Oq>|EZ20z#5YCtVeg6p2hySFIA)L?j2Q#U@8xY9k@5K$z{O@YjVdyx8Lm2Pe zkypDIa`(Ue;Pq?6h|o}a4s3zNAzA0x|Mz}S)=sp5HT6YtadDmiy}_6N-2zZ}9;&?u zuS4E#dV^YR-#4!flQCH#k&&;!IuFi~Pl<^LoX6-eg}(wtQ&Uq-P3;S$D}Jv3RbXJ? 
zSMs^kf&%srF)_^A62%ek-=mMF)MJwsxx&mmcli+4*-3`Tr^(WmR#eR49a0)Smc48b zZR)&&f`an$@_gUEJw1ZMk!LHa1AqKbMn>G&_&TH11K5pvAEgZCbnOZ*VUAW7C=m^G}-|4;{-kI8O&1SFjF0tYJ=L;9LdMS$M;F6%H4Fdl9zyh;OyD6 zzF)qG^yFx&rp`u0hJ-u?KsF9MbgAbyJFwt>$;ph#X=x1n{M4`oc;T>**zalIa0E=}(x%OKryqV2Q7wh$W|{(hCY!bx|@!oPzs9s}~k)Q@@)C zLo<2cgD$I3#&Iz6y$%X$fV4s@--{Q7$;rve9ES;En+1S8ee&RDV*OPG+NMKiYcg$Xn zO>ZW7c6PR5MhG1XGxO;*xdaK2+3QUl6&24xxebYlc?(<%kj8ZoCGZ>}bF;U`Mn^}N z@p!-BU{dsSvFf2SC}$5JKmG>k{|+JzOd8-csB3P{{MDu8;o$)#=Qy0b^@H$Wlk)Nh z_8{d^2T_FKO-?cJ|p@ zMCB|fdY}%bM%;LD<2k3?>zjj{#=aHYhx7IagpungVNaiO2ncM6i;JV#xKSF=l9{NQ z(KoBzqP|+`j3U=BUTi8emREZS&Yhz^22Yw*?9MhoHsxT(@baU~OyJ}s*ZZ`~I{^;r`48$qkY5}^LYMw(fF(Jjq_hQY$oqIs z<@NRHS~F}XZ*4%15q6$G0=SBM=1kR+aZeyZM=TaE4J~c?RHK9+Li?M~pI_D-ZSF;Q zy3!ae)|O|!131SG7~=MveDyReEEZwh6bo+knlPK9HZ}rz~C$}x*$MymK z-5WraTo-9HnE_Ltf`s||$jBpH{aR}3XNUuWs%a;Wu$)hHndUoqaLX|~@|O8%%3|m1 zmM?>*4h|J)rJv}G&HCxPv{V@pe$U&G&qZq<_{o$6}2CN1t~uU|8f%j+s}q-SG$jGdM4qm3{K zAMnaspfc;WZp~Kc>MFFO!*i@al<*G=J9wlhXK`_nyNi03|54C8z>ETLd9+VTBM8wq z>!hSbMzW@9WUq$O^AMK4aB*$sWDxsWqJtW~c&sfHq*Mmr2JPmgu(Jo9kFX}qjF!7=Ah{>v4xk32*W{DIrg@qr_Y4ul9D zld$z}n6wVoGEsgvEAW4Z0BuXa(VGM3oc^?%1edyET#MeXs-6G!_aXdp43F8wyklIK znV}q9s zc)RnP`CH0fw3u=6@}47HxS@f;v!5d`?I*74H==mZZbo6W4-&x-Aa}YgEG-jZ$gBDz zPKugsu8&m<=KlVzniLV`#rzADp&^D=R+()#5!*7zN64kCnfISimxeXw2nf{b_wS`9 zX5TMqL+O%uL1IaT``|&P`Tn_3T)!M@$tQ>#e1J|bOtcPPe8qT3OBPV&oJ=R8U$YFm zGbbV5C+5$cJC^_$T&~b%sqo8kCo3zfd&>&pywk_1`v(T(iSt7Rk?+IG##OQ|E-uZW z@b|&iR{baPD2p#@YrbTxYj0K=NbVDP8;M-Ad>C8XIJiPlUS4^zR-C_N4vVSQ+HzXXwA3C z$jAin`*JM4XNN)+f%f`8zYvpgDYLWu0iF3`F~W5s1Q&8B{JkZB%|}K>C6WmRe;3>~ zzHwauTVXq3$m|Bo3?~xm`9FiY%ozRDj9CJlpl?514z27&bWZ>$aByDa+7T8mEK@?O zZQHUX_OH2ea0d|fL;$05upp%=9z5Dj>r*-X;c?_M`;^H<}7idNU0H^Ljg|Oc7x&=jTd4J(54ILd`XAlno z5fKqT8h^67L|&Q1II9&pOg-nw#7Lxt1k@f} z%PgY9YM-cLOG^$&N=hPcbDmzwv7m@!Q(d?N9S7EyUm@H%a;}RL9u|!yf{%Bde7;ra zEw|&&Bc@NtB$1ZZpV&Rk9_%g$fvj*XFK%4=l4q_SS060X)JoNJ*>x##M~pc}gOo?# z#4ezl60 z=lK5Zh?dQ%v+qI1G+|x`;>kf>@6%q79-RTuoqePCY@&ig0^oN-dEbB*G1?XA_Dsf)Ls*yz zZKiTzki7rPEZ!m{Tog>x^F@kw5F+hr6rI}O^7lJ+dU1m)scx}_uYebn50z9^L8Xl# z8jBt13ADX1l3Pwz_6;T@sAo5^GH)g=8HE>3QE#Q64Ax6?;%8Y!#Z8iD&SYQxK&x3< zfTE=E)-UfJdiJXzILdjP7zC*8D)et!+BCWes43hZJ{-2lmG6!eb7X|FV7WEpQ{mZX zO+6sJQku(u9o;9#050DP^o~NCG0oSYr%%11mm(=Kr_aRDFcy6~glcIy$WI z(fNM+rjw{}QAb~Y5A53R5MrF-muO8H0h#;rb8=wAaZ~KCV7|L4JxH2h=%_>ZbZLFh zKC?a1qO7L&5SdCJ&loqa^&YJNbBVOWx`e0sYYC;wvav3m32m#S9(%MsmuQ z=jl)hRaI3f6>byUjd_Kp-KAoW*(EY;xsL9aV}PLJ%9h3l+@zxvG`l_a?JZcq>B6Qd zVblSE{puyi3+=Hcn@LB83sEWDu;&QNMhGI@k@C59z4b7(M%h*tF$h3A!pU9V|Q^mms|BI|-?4%j_+ypm2t0*T+hnrv}dB9dlsc z4k=f;=@i}CxM$BEQc^Z2(p*wjz6XlkhB3vTG-^Iy@!H7cihs~P2ptv1`=|`^yEY%iwnj>f)vI50R6>ypmC*z27R|{Ek-~RT{L}`d2bGsg zimoh9VNI$PZVPw|kI^GiG1uDjo^bK3tFAr`!>IMluQ1Dh@5eD%+*iZ#| z%>={|d%$oHDZ4-N^Q%O~iXJs#9io?i6H#dY{a^)IumBWC2*udD4WCSpgbC=c2G|Z* z^)=R^53>-JS)SIA@^l4WqpeGQSyF++)PACU{a%1JgbBXu)1@&Gm>q`b=6>Y%>!Sci ze9Uw|<3>r&7rs3f*_>$FiS6n$=@96IcZB;)JRU)-S_$*;hRKrxOIwj-3}2R)m;3y< z&nGUf7qZ_i4j_2)>CC$}s9WzB=v=zK^O(gOWb0tQGoEnB>-}ArM;7_AFz_?b-NVD6 zza|){gy0i%OC5Tt<57e45gxE@(bRvYh7wpK>&o3`qbGMYUt8~i3jT#HbB_Fl3zcXP zfO{BLkUet9pzz2o2;GX;POryOKJ2>egjd*kZylYzL+>Rl{HvIJBOkiQhf^*L#-*%c z(lSE%#}JZ~itM4EReOdhm#k>4nM;Km+Q|T?@#BNZ8Q@SKnuq%u|K$r= zVc_W7C-Fo)ku>6wZ3u*N@glvTkdXhACvK4LRSELUeT+LJY~4eF8oq$V#ChVw4)jvn zl!CA0jhCYx@x^Qx*&$`wEUc4Wj3l{K=;KO%bWIkjTQY|2bh?qYCX#ZFh0CAam@sc>fuKhimLj!nE8Uz8}^#&VNDyn#V9IBKxkcn@=ceQ=}y87>e zJC@fv$nd1$ZYTK%ZF#%d9Lh2QgNhC4`f83nEJdIyN1eqf+FDaXYgIf!MfN|Qqr@K< z{5H9L@r6+#)|0o+nQlZzd2wUSZdSiZ?+uIY(p{+0a#}Dlsk3`yG#P5OtgLXj~`bfMI9IjS%XrjB9(ty$Ivk0qAp`ZWTX$u 
ze_>(agSxcU1iKW}vRV2ETz&kg^wP3}5%`l7nzf5)GLX<0M@rEl0`ruywLOAOW8GIp z-~RL-i!T)_%s0@K8C7R)Qdd`p+K7rYa1eHZIa|}#(E)&P6(EPKoSb`kc{62fp)oOj z*rt7beY^SjHy0EXRMpi<+1LmYuwmrOAwaO(Q5IcD*GKaS;3wnv;B)0<0ZgE&dwP_v zw6xT(BqlI$D{8@yDXKgh*hHD2W|lNG8;bHXfbMX;()y|BVD*bsKE|20yrNq9`rOlt$Z+>VpkUk|`!oq<^<9|$deGnlZVItHJ^+_9O+7l6caU3nL&LLq|Nda@6WG?YZ{NP9 z*|e#m*kQ{0$NTk8Fq*evsNCtsnwRRP9~2avK%eGGDn2~37njS#QNhP4C-0VHfKvSl7A*Qvj9ueEUH*mH3tQv& z>}=?vm~H#%>6w!eSQRMlor_CLiAcp?UOb0{OO|NB(Ac zRJC34J2fes?Ck9NzDyt>v$6!LQN-9s9k>vkcI$ln zGN#eMD)8~~t+2e}-uMRk<+#QIY)WMZhk}oTJaVqbgHig;X3iel>84s%-ATdb%;2pI z#0O532^@bvWPO~86tH+Tcp(uVd+T3ZVMiqLvi%gk;9)UaG= zrUmQmf!8SxQ#c(=EkqdV1Py+@%z_;hv-4)FDV3{hCKNHa1o+2o~xQF`sL6Ib*;EW3;Aj z{Fl9CLuUh_{EShOvR(H@#l*~uU4jt^`;nmJRr=+@K=vfqJ)scTbq&BIfq=X(= z3Z_Xj;NeK-*4Nc-VPLoipX(;bV>Mn~-&JTkq@2#%RCIC{ZA5eElG+1sbQrpG036K~ zz(wT}9dfMx-E?DUFe}jI9d`UP%Hk@Mm7iY$apQgTRWvDMRspday=0FQB9ikII5l?@ zJOn1-miNNL+xoR2YWVwoc|}@Eil#mSObv$9@EqAnftS_;_Es6w0etR*fuUSP6y!%` zjV|OZrA$3BeFxyCj3ID>sijwX>1mH(P}*r=Uhc-L-=GvF{YhN-!#inz7Zwcgc*}22 zO-{NS5j|-b<|t(qFZAh|WQmx~7ZgLScRF}62GQ(0z~CIG46(ZAW(LUb-oU#+59}Vb z_KHF8%Ov`e?1*m>yAg`70?HD;@x%SeuCH=wk9P2LV5o}eb6?+$x1eVGwSTskS0n3G z2>Z<{fq%E{##+X(=8aIJ=>4HjNp$r9<+tZka!a?iQ*3Q*w=G-E#FAp`=jbDJq1A;Q zS>o$AZ{86CWYir(aoXB7%h6uV7=pV;({*{yB$)k@p59|{R&>nl6d+cf5T&2j=z1r< zvUsztt&OI30bZ#m{{DM-FNBf`8xf!JDOV0A{o|tVM@2<_tiy(jbCBmbc<>Fp)DMjF zQ3-ti{@r`@UMCVYuWsQ>`JxsR6BD}h?;)OZSlp-aF~|)|p5; zz>oaap{AxzKQdG;kcBs#(2E{%?_$o*rUSI|2T`DbpV$_190N>HU(nS$YCt3Q8=UL< z`o4u8D~{XRY*4@Qmc8GMadNVX=gw`wnADW%F!D$-Ua;yf&U4iKQ4OB_LF>`~q z7CngM&e+0%`&g3-Fcba zFBsS=UA<7BK_Yrv)xyt{*C0G}H|lJKuo4#=VN@6^BIFojh*^b_tvM}Jmo8n3bAUTp z10BxCbIqGse4l?E=@<8wPb^*Ng9x_&wmmh*gjgV3Z3AZ0JNG8d=tEA{jn#a1eYiWwMjbpv57M4{RBqEJ# zg%6QM7T3!(nGIF^-szkpC%%MX>!kjs^2A!WP(eE zheVu}G&DRxW6;z_VB#e3J@@`=3cxiUm;l4SI{mZhXU8jksPPP@V?rw=r@I}S6qvTO ztR?UE&eb<=tIBggb3F+P;^e)+-ue2*8I6$4%*?E`q-&jpd0Aw1Sk$+Tmm;JZMj1|= zoVqBr)S*7pceYO}eEv=vCnslYLV}b=bOwD%LmDfRG^A8&EtY(`udP?18`jtrpDFw? z1R<<8qy#ciLZ+Y`)_|d;$<OToG~-oO``zH`RRKJEeTU6kk8T}DIdu@F$aJ9< zN5p$voos+Rx=)pB>dMaXnj^$qtF8BS1wgNA9WU8tyxq2IurgtMSb#0FTAv9S2m5Vq6!mlZl#L-8OUl7 zvnnhyl9$&US+Nb%OR_`@OpmbO@w}<5988`CN%#&Gp1W^)=EhSLs!qzbf`TVH8=Ux*#jgLzia&_f^az(~v&Jp9K_X5Tqyisw>% z<^^uY^-f?rQQAE@eE$6T@BRJzI8z06balV~{CSYKvCVIL^LY-}_u!m@vv~J`-8m?D zT~cy2hI2iJ9^h?zJ2Nii9caD>%N!%otxd9cyXWCqQ27S0bQy1fALmED&rF*$Nans z-JUvi-5}MP&6B;ZnwT8mtqr>%k4n`1Ws!(le)zsgjqJRhx^g^n!}(8u@eqsx@2%60^f>3T_tUGW}(Dc&&89^$IV-lRk+W= zgEJ(^#s&Tu-G(&kf!`>sm}Un3__i*c@k34WQ8fN3(WS`xc`kD3*B!`>Y?*fTaZn39 z%)Unc2xVqT)7s+2VN;*@#D&Y!d0eq-Fb5V3aA7bsbvwXI$w5wp`TL6n79@C7Ebk<$5~A~ zm!v*@F)FwI-$NhY5pdbggyQXVnMHoh77A?eUchpPoqEL~=!F0E4Xhx+d=Ct-$X0G+ zruw&nX)YLl*wX$P5Rm)XXY?WJ%uei{`gNfL2b(A=R|jvf@fiM;$~bInTJL?Mx8gen z!-24DW%T%`HWa`+4Q~s`%V4tyTyg}RbARYZ|NeyjI6m3G|{2M4Nx9-EHtxKI@ z8&5SKoiA(IO7Bl!-O#`o(%;jw8%rMJJ$8~Fjd2G1WJh7g&ri@Gt*>pK38%L~!neJB z+XFuz&lMbjtyg)LjGxFl4A{=m*DOqc7>H)` z=1Sm+6Jtf=9O$8j0@$tr6!?kL7n+Tjjy9ney8c=^;SeEuhP(jt#~sL>Z+`LQzL(c} zgcN^7Mb2>yruqHn4`aa05jX@STWK*GJesTyB-fYp=mdQLPtaC3ZzqD7oE(OHfja>B z!FN9}6srdd_~82V)&JQ*LKNv^l*Xu&)N%o~E>k|kux0%t-9g~31Cd(^${~nvh#k-w zNITa!G@bIHS6|{Au0Wjxp}-yoz?_mco>JfUy+Qi>0gZsH{r@FwYr;ZK#x}acvZEvmo?{KdP z-*wc}CF)EZ zn);gHY07h=&-PW{4Gj%dYvuba&CyLw`HiZXw*J#PQ8hO}KFp&={s}u%Iw%&<_J0(? 
zT;APBE?g$W6MFRuEDQwmyKrB^@_N7?>~vls=5jsS!BZl)wfaoo=04!LkTh)%kx{wy zKT8CXMSV@Y?}zE>>EudKXzOdD^dThHgMKkZ;QL7ADA;!IK81n}i&qE07_jCC#@y%o zvTXq3L7lR`)-e($mN+4P`?Mz+@R5{!EJho|vXr(DWuxuSkE`%+6?iIuyu3g{Uj=HK zb9@a}JRt!zD`O9AohvyNz-iH*Bdzhjo0YmDT8!uA+s~lrP|TwHzhK&h?dLuqR)b4| zs)vWy@BGGz%+qRWjA|J=Tt_ZlLkdA8U4$^Ee3H!sju=Ukrr#QS3RG`X#bU$#PGrR( zZGN=3n%de15GxW9A$HnBtZF)|VsY8NhgK?f9In%=-N4dllCbeFnNgF8t4A{7PL`aS zCLX;94fnpGe9ua&oWn+RAAny{&NNz_7%qO-?;?OW5+hB$it?#b6iAAkje-_5{5jW? z;jJIfvGF9-0^L3l=YX;`GH;p*Ty8l?GSc*tCKoHTymHIj8FtmEpX{C(*zOC%VO+RW z%Lky-{*M*#f#{7nj$E3qz9%kt{J4y@byij}V41yuwa{>3da@5T%KF+|$ZU}AAM|^v z;ar8hYzQj2j_@PaV*I~gm>mGn$dCD!+WHQt%cL-uB&*ni*pPlw?FM2FWDIJ{>4Qhx zPZtB`Ud=PtVV+s?JUH=CH2jgTn`c!4G}hJzW2SK z^!*PrGx>@A-_X#Irgk%MfKs&|SX}#U)kh%6rnTZ05fQ0r*4V>f3I9e;AJHTMA`TO= z<3`=cBPxOZoj~>wfdNP0ClK?|8d%dR0Z%^+Ii!6RZ0@}jCqX(ox;O{)%aT(shSBsK zSoEkz(kaH+`SRA8{VBAh!dMG>iw%;h{hT~|_pSkDUjY#E(4ycL6qy+-vs*3-C5(}e z28cd^uyW>Elgddr-k8Wb{awqv>IyiD@(Th;I}Zjho7MobZq++=KxQV$0T$-P$Ghy$ zI=Z3)_yw{4T}Q`G;7yX4 zuI#(~72zE@#1%I3?LFZ-|@rG+Fg<;zp&S61B>&|iJ; z?Y$2^phjt*0A}|tW|2BZ9@S6?hxBM^j|l~s+6l@NUqvEMU|Zc{u=zU0Dw!g^jwGDij4$B{f%44*crmioC!Acd_S(O zC}o%caRd^-2gIpwzkFc?)Nls+uqGsOdq+n?ok3-bpn2x(St{sMJfPtJfd@&7BfwMp zM#>vWpL9GxU@aPXVsz)-=nkZ}r(iJp47wlzunI&944l9e;7wiK@-ht{J`Eixuznwa z;skTd;VbMy6$EtSjQw>GckTr2_<>U?-t4&SbeCu_TU&RaoFvQ?t^)<(mhIb#&UsHS zjP^Sap@Ns(r*RiVD6c-TLgc+D+ejIWm2?*194H(V9YiDpoh@RRxt&3f%r)pRiO3T^ zU7AL5u%=+kC=wD98(4+6z(kJfYBTJpmz!ewxwuZ_it)YNkGInc9b_{BobY>UD)3Ou z&(j>@D=1ivoIo|B!}oFXjvfU_-j-uTmv7O|v7o^Lrhc!eC@Z$bTZrxs+x}FtcW`Jq zD0K9ws(s)KOou>#nMYW%Y^pXVloIE23C)p@-I7`qhSd=mgH{0@|u>p{_BImz6* zbqti0HDsfa60U3@8Qwz%W|Zu;ft?^hf|vnx3Y|6%kclr3~6;j=g)as16jfZQZ(c8s}}=^Qm?N$a!E-cKRm zG1B%i;MUJzwUM;mx`l&d%?@}6kX(3lA2tvY_J5QyIdg!0KjM6%_DKLpO|Psjw@>DA z0ii-oNDOu@FRxu%oGn{Xg~G!L{{QmZ)%)+T_?uHAo<6+_AT_u&A=vgEnq25R*z~cX zNXl_CaTLSi$#AcG>pa#O1(4h-T49Z&)d;7xyG&F@mu$YV3yM@IZ@!B7bp9n2$)kazBxL_1W(W-{#Q z{dd22$q77%VUvQF^CWgFDsyUp{Md&TglR|jV)m=A@$=_x&c8bNb_x061QnHc6%$)Knwk z1Hy-s!4Imf1Qb9&fcnFraj7uKYy?%upYiU^Sd_Siz56RSyjhqBNr$X{Q%GRfOH4m2 z#uy8zkb-fvf&^*I@c&g{m&BqE1BUej!W}pHMCHP5C0|e(LzA)rMWKACvJpXSKxb^% zq;%EQZNNEEa3VeUQW>Iv1VF+&f+&x!x1t&361eR6&` z4}5m~fb^d~e>8H9H@bkrx^JeoRm{u7gBty%2Y>0`d*`jzoj7^Y0F;tF%3{dVQr_a~ zh^7k>M*nL0E!v?Sr3PM}8{D9Khbq*Yh_``pSQ8imy`=$iBV-7r&HbWkFeDIRKgh?Q z??)R+Nzaah&zg;Bh*b+?47XJJA^hcA2J^`b0vx}BMF91{QF3&d?Fm0JxB^nm5c7lp zMgJ*=9_sM0Fw!%;!>BV-YL2`g{O}G!1p&jVC*1XKuqI{j8Vu9qeb9B2gijQE&524I zU#M`Cq2A!dg*=ZwtG55wmkpBa1#*iJ5V-u0RGZm+pj&dnw#<6nN(c^o^r!hYG|L~5y9ptnp{aSzd9iY0YU&;8 zvMn$lK(2C7cawiSp$W;pt^3yUpzIS05$N(nk)^XBa^XIzhNnn`VaTw=_U7Gi&ov0r z3)mNF@OoQ@*taH|5#1@;rzK@nf4)W}mUzvun5r}~R zr|qO$O1em=C56TV&02X?Lol1`dD+AFB(dYE^T#+`z_zOl-1AHrX*a`uc(x`hY z9CbJ;<1MnOHR!pFURemD2(^+5-rk3*XXiT81|W*(5o6L>>7$;=S6nq^OT_kmu4jQR?e?_c<6@X~g7Xn-_B za(PF5H8}Yy7P|W!-+k2RDhNCXGXNJ{Q+%~d2(g6cA;65n@+TrkhPmJ^*{iU{Rf|VM z6NQ(1AL8SGH7UZYomoZaeN??*(e9(b+Re+m0X2V!B`6?7ef}#3QwkN2pTRN0kCV2=4Oo6Jb_|kA14o_+CU4AHq7Dhg-1}mt=hf{ zGJRewC-5=*F@cUO1;Ayf276J7!F(uLX?^P!`Z6&hPk`2|oxiglB1Nq6ESMRHr3#RF zj<1n{0l9j@VA-R_B(zS4#^wb;3Lx?Ka;C!c@bgoW-;*@zU!6re;8Bsx>6_QSbP3qw zUo_j;3;QEQ?BwllmH|Fs!n`Q>dT8K#pu9m-nvKcid*)y=1Fe|(gtu7?f!n_+;MuCb zvC!n=ZYdbjR|#YW(lZY*QQZAn+S<=xdHVS!eJ5G+xcQ;?Mk3Rup?tdKFa=r<)G?En z09siJxI$hIyruN0Isj+JVI%$o?|M2-h*14_7Q~K>?}l_R{l@dMG6_A@(Z!H5c#k?Q z9+Z=lBij!}u{Eexol+-e_y$^PY^<$Gg8PoL3^#EEnhRXb1N?Gbef>IhJ_)%1BEpuE zkx^&mfhFfGs>ZxYNR2*Xqac`<{?KUZKT6{@Er&(|0LepONNUr(@2GpDm965g617Q!dBR9gXiC=pL zYe4YV?1+u0Qh1RJs!X|>D4g+0ft>5H4v9>%w)Q@xG4NbX4;TiJ`2gDF)hQcGt3c=( zUx|`q`^f$YWkq5ykAK?x-{2C&IVmb1(!l)pvxZ2q|NHS@Wq>#T-%pj>|4uo;uNb}| 
zwcUR|f>%g=_1_P(07+H<{cK<&74UyQ|NnoD(4|5O2S*}_KSK=vU83Rt&Sasd_h*Ax zK@#wfyWt68VVdeaoe2S)a>|AOP5Hpx22i*D-@WbWmA0bs-`w7RCRS$U?z=NczF4)I zbIb1Ou07kX#i>hAuUfa3UTHczHvYvyg!Jvd@AuQw(Y)|%KDl;i*Ltxa@aa#LBozl1 zWL(0#qa&X_*&IW`vb$y?VtPch<~z3xPUOeR+r!-8m|_CR4K8w0JVc-PQxgZkg*S6 zH(;LNR0k`Zhla`QySH3=1a6;Av~L~-+DWF$3I zDEwRiKt+y_!3LOH#y;%ti+~zu*s=s>IF}uR(Pp2X`{FvFL9%n%(VqeGr5di)K1f4D zQEcfDDXnJFt#o+W#I)zgq2!4E9}ESn0h)%%Bre^qQY^8RAewHzaZC>(w=xFh=A8vd z(IdM*RQnSdfTjI#a*_o(2kf!p6iUplUWV0U)jVZ%l-I-f%gX6mM~k{6&h6)d)K~?_ z&Y>vWdqH%b*Q=p&@vuP0K5$^ z&nl1umO}L)(CuP(3kz%Dmuy~UpjoNLklCPgBX7n-+z@2jg`MTv2l0Cl@zSC#`tz83XpKMr259_RS6>J?0ax~*lH#98okmW+vlkKlDt{*R z0y8mF&y+@-ynP*Opwtcpg9GYUX{iL%fyiF|W$SA0LWHP*qR9|sY7Wjrk9aHL?_;Tp zkqs!JP6KG*0cv>j=`5Lo4SX4;)Lx8nY5WzxhVuKALzcG3Uv+Yl0oK-%BhIvfgeRHUrgSgV_?0?!(MQT2;L8Qwiuv zluMl)=;_zf&}4Vr-`1iM5Y3k6+d z70Q01mCUglQKP4yLv`N|UMk0;U62dH)y_rWBV1r}Lf6qTRsm`=!XbRh@lswE7W>$w z66J%NEm6?z9JqPSCI%|9u2PKdaN;}Pg~MJRb~WIN2X9`Z%>EuX-SzemCcY|&il&{+ zjuyTl2}X$%`4=OSz_r$5NcVp5041%`@5Mf%Pp7`MytOYAApLL*aF;N@eXEvT(U{vk zt*{?GDs*F)HdNv(@Hw-5qNzj@81W!Og*e+8)}=2*<)@5SBGqK{tz? z;C7!LZc(6Kfzg)Sa5KinKC(sY)LiUvlNqCn37I1ZTfW@+^T9yo|1Cm{6v%I9tD8oL zTLX7mA4H&n@J3o})~~pqq?{;!+s>kETt1ZV*O!jaWZc2^!&^H zi*2W;2PL4Hef5Fr@5U+wB}_s!1iN%+Z53`R@=wLG@}~;A3NCe15rn0%5NTL|Q+di48nQLuiWteZ1pd!JXE? zk=zF@lFh^qY3%1%l^05<(Ho3sCFw&vg_c9!!XoWMx1b{r3vhCIICe(|>>#8-$Kp#o z#efL*16g5UW8;J2A_Ngxr0wJvc)AQ>^|KjmJ1lqZTo4u}9KoLGcjdU%Ge&Lap!$#l z!A zd8|d&j0KpQY|w?#SBJHI3$`q?9STU(ex(R{WjAn&AHRMbLUwtAdG&ID|LuOaMMF+m zgN4$Em5qSKA?>}*pJAW8EG!qq@6N!&GW>Ew!4ObR5+qJQ34^BqqpROGemxyB#T0ST zu2AgH!9+vCCL!y| z4sF7vbE#)Uw%ma;))1&g4!jL`%GfkYaVHHj`H6!A z5q&FQ?4l8c3l~DMSv+Mcwn&mtH?IBcMlSSoiX3H*B?j6pVNUUeJJLDZ8Ko_VMtgC`tr zor5^+AQ&L90_HAnJE(LD@|JqmYyqURo`xHC+`OfS^H6F*7aL*h7TE=mdHLJ) z*T^AK5fK~XC!aQkojX%KSaG`NMQQ`sgZ7p!{fQuL288wr=;yKbc03Z$?EQ4vy7eL!794;xf?LN#4Kjn^*- znZ+E=;^>1}ItQ>BD1_QhtM@B|08bdUZ{J6ZQpg!($B@}o_()zX#$d?@v?uD{shyOU zG6dq3gQ?G6(5j=~PsWK7)1&R5jF-$Bv0n@Uc@?_wV3!9SyeNqQP25oQ6^!h7#tMri z*;gDPGtY%U<2Jo*0}jzYIjUhPOTEuw>&i9p z!>ki!Naa7gdv^v5VglqljIC=OjYz5EX;^HSgqiGW*N}^6YF%pJQA%>dRGgPU&82`L z90Kgbz{<)ic4yl9Z@jc4+4JC&1L0MdMd&AYiC^CbbR+;sgULw=nGU54ofCkyeN0KZKT- z^@|Wws^IhLjl;=76GgQ~ON&qt9Ge2^s0>L`R*?=~@@L?Ynb>?owlVuTIbUE1y`+P^ zeOrAdD0u|POPi)p&iN|3V!L9zc{=!GE1@z(x@3gnVI!-q;$)=D3}8lS&37CfQ_1Nq z-wPq^a*kTYC*KYOh^>axHI*G49XmR^kn2)_+9e}crS{L_z+$YB;c3?$3=HuY88DW` z`}lojV-upr}K(C4Yz-kT1#=ct}faSiSMmEHLWC zAZ09jRiLtYTG#O=7)KN^e;z?j87h&$_yemG*ieqy zeJ_8&I{qvTD?EL9p?l@4sEo7}4&^zGGucw`Jws*Z!vx+M>yd*5t^qM0wl2Xi776Eua{VzBm>dbJ*aa8SIgry_ckdkWW}3V8Gmqr}d)9{ymd_2rTR2X?j0oZM+%%mz`&V-%W$g z{d^6!>BwFilfoOl4_R=;A{>J^?0u-Hs8TkI|9vwwBWQk-8x?Ba_0F$jC#G=dwN6;fjkv&;%j;gIH5G;cAc-$$OSJ@cM2CP7GtFP zrdMw@4v_O?jmf!MBS%1IW{2+|5z|#x4j2ZmU2lW^^#Xdb0+nvSe5t5%+AT(mTxvr3 zGr{}Hg#KZ+F56ArQ&gD_vvO`IlYf#n7AHm(PVZG zUnPG6kDamIje|)~oC3fI@$b#bAe|qA=6q z?HwP_L|9Zo-LAR@xNPUGDZyhpt&F!9*7mmAbyiq^b%z0 zRiB?@PK{{9(`p_o)ZIuSj#^Ys4Jq{Kd5my~%+b`HZ__GULAnK@j+cIl+7=nZEE(HL zYu6g&zZ`8w6P@*xu2ScYtIo(zc+VOSJUmkI-C9R~qhlKRZTyz$Zy4t8cd4N?AgA)G z<{F>x9~m+CAIxnOAF)1(rL2PNpZ3QA=PF=WPS}U~h-yP+GCHTTcr1z1gTC98z6T=j zFAE(^RSL4QS+q!$@ zi!*k&Bwv!j#KcZ+Ka5XDXYnaTj~g$S{8r&v$Y(Uu-&HYDA~DTM`pt9e@8v0I5Y;*m zD5yi(qJ?uY*D7idnvlrnW!%x-4uUdU*lelYk{0Or-;mUiF(gpRE|N9mrPB zzA6p#QU)rY^m{$%zHWxaWbev@*Z@ZU&fQ;pUf+4L;U(il}UIRt_H3b)FO zD|USCDV08~>nOKJ+v3U~bjep(u)@B*6Amu-8xp2E*I$afg6FO7qbZt0u~CS1r0_sfv~NXcF?2 z?y)neqf`{sET;5IhfuH*>AtEXG$ZrF*eIJ`8CCc&tF^wR-Ol}jtpB`Othf{eX-lD) zVj2pc)=OE`s`Wp#{;gxj115vLsH$546I``>jH2=dQhW01b+j5zvoJ%RsN4KHW((#0 zn!QD<-eAs`N|I$${DTj-D;`PX)s?zkZ_7F@v7KNz(pJ$362nF)hU3Yh4R#Q$EiGXa 
zs33^iv7nHjL6p}>I}V!^gjv-;s%N3qd=5#5;Y{_str_$l^Y=!EhLRDrV+k)`c08iq z42^CIyjF_*)~I4r5VRO~jFCYj$B$rcOAF%Gp~)01#y70gz~6RL#{DqNv_j>Z(B-mVsftP`wKo;D^PM?ZR8vtvYp0XZslzB(l)y#W zG6~pKv{0`2b4Y)Nul5<_9wv6Vj*xlHAb4r4J+^C|vhQ)_@RAu)qCAQ=1X`p@?3~i? ztZ=T)ttpg^S&HfaFlOFume2^A!d(u-QTp+C72p#ybSB59YSJk^P@znp_=1^JQG~^adx>kY zMn+#5UUs)bsL5zu6?FNREvDF`ev+rOGaY_$fL?w`PSJl8T@#oK(tb{4h_>tMFmy*rrba%eR(I>KCSahX-85`)Yz0CCv`e{gH+B6siIJH}Bu zv8Ptqq4|0=?t3>B0jPcSJMp|8#h`-i%*@GTY0i}7QhJcJ?M}6!q;iJDRLxQ+w>^!F z8zK*l9E9T0(dRSaR9~z{JmgWZx|2yt&*!1uvOTbt{vtAVZun)S?(mT`1f#D2c@+ag zLY_17>!qQ3v23cyGr`JcwCjb!+qmh=1r)1L;3uW%6y$t3S`&h>Wc=kFB>vgLig@w~ z7(2Z^N>S=FHoE2QlfIS>vxOs@$t4=q`f}v4Vth8i6V-eSbXf0(XL~Cb*S`gDYfX4>SH9j zuYXNQ;C!@)E1yO2veU$aWpVU^)Q58l?I+@;^@yFh-nQalswYv%e=#mZa38LCjy`7Iba2I7mfJeR89C_aSu; z%1KB0)F5TD$c!?T6TLfkavJoUd4K0`k^c>Q=K`C43#xFJr5W|8{YQ>`{{G}MWZfgX zHNe7JJg-it#9qWfEq2j=9{%DnVy4ukt%3VxNMIZQ;K(#ecDuHntzr3SXO4YhiuqQC zTzzXw{``4Wgs0EfcFpM7><0qg?TP#{gNnNC+--oUkLo|*twBOF-4cATt+qgVS4^r-So{drzy75EGV~)9Tm*c889&C4HlQ3#D@soav=1A4K zj1BCMe0^KQp3`kR7>kC~;<*sjZ8BCTX^y*JlwyxFgLGSk3-yH}>KVl@V@H??MNMc$ zJM>zN&;JI(eZq35p$s0HtqW8Ox1WmhuXckeUI_zJ?EZ$s*rCdXpp(~T1<`u_MNOz} z4E+6admGzEERZx#Xqi06YYu&QGZVurs89aV?GDPhE`uymYg~v1Z0V!4)Tm;h4F;om zyz3ES&TK*Mdlk{3Su%|k>d|)Rf*7F4^FU7*DUqa;l=!zYtsHr{$BQe6} zqNk^~X-!4Novu%xK5={u=fZt)#1pYZQU4vV9=0Odn`48NNq84IkJKuHcD!>DTDNqW zo1i!LHI8p~`*w=5W%CC0HEsa)vYEaP_xBskB(&ORAG{dv-q~do?o+bOZN)-oa9eA*eKXi|u%*)U8vaZd#i5v(JHuQaq$lGA<^$R%F$nNdb8!6)Q?!)x zgMg_-&UYhEEUCRh-qLWo4VY$WAPWFmHSmcsflTm@$x3nefINvI)I@FibkcEa+=S}{ z&dWRDWS)6AeZul-`vgSQ!>{W4rw^InrOaC{!N(sk!gQ0X-%A0VktLs*J2Q z41i!2r^vQjgdk9AI$4RX;9%ad>H6mYU6j${)6=(S8+d$sedC?}9Y>!Pb(0QO78Y&- zDmC313m%Q8Cg%&jd-aAO3NaV(KyE*%l|e1lBta&Hw9Yyt%LVSAAFMkr)=gJ2lAUYr z5H4s$0J!6>WK$W*NtQdjCm+e_3pfK{Wy0^t8oH7S3&Uq=pVFUH9Q74INr%(B^ z6%j2r=NaqM7COQ*4^_MhFfR3<(G&P0mccsS@YSe5ch)e3AE$gl-5c{v6a!N2ih5A> z2*UOfG4yAwlLt?52faVaLj?p{-co)Pg)ZikcmVw<&rWdr301uEAg9y@DLdq_X`|3Q zfNs(zY{2V4^hb%4sg61q(d-gp3^XtLl$s)=rMI7$;@%t>0t4Z*?H6WmO zJvYa(oA1T}^VpurHf@S0V11I2IS@F)`EYP3t}$)^Xi$q{MVj+joRx*_*GuU9lTr2q zF4R66?bA;V`Mz%2-ir|+sER|dMK1Dp!~Ie;gfjh?Kp9374ji-`s8_s24lIpfzwZ0jyZyG?()^^YUc~`3@oV1Q29nV^v=s+*mb);Q^5ofN>nyvakMvHkz6r zM5T~4muK2ke(foC&u~mGTW^K)*yv)7;f6m0s`n+1!+4{Bs$kc1fVz*CPaMyeI53C3 zbe_Bw0?5x#UwgFkthy0ZHTeAxD;?3;yVZCBxh2ux0y+jy8*gcehAWCd!?3#4qAS0t z_H*IX_0RsA80B#qrVMTr=7C_CHP#vX%s3G4I_*<9p`I|*MOx}TCTd`){g`byqGA>L z(cK&knf(oqu^dq6W5~Uar{U-tgDo5<0Rr2-iN}NAWMx04j0|XqM7rryTcP;;+}uzM z7~s9exhG^gtdTSybw{mZ@Llx=I$>i+=RX9;CbnAyoFD?A104P^8@_UrzV-2KlS*Y2 zG4D}Zp;OaM0sP&*7X+w3yl>S2htt?-jiPsCgfvD27OIA)1YFh4fjWO8#OL;%uxG}M z?3Oc(fz_NS#r**%RwI5UU@Z|^aj3KC6&V`b`gn$vA3$O`u?t*~-A*8Fl7XdYb$G#r z6y$7j)G3H&`2^r}B%`$*=R77YWS}vGvx5_IK?Zhqe#E1||I^-=|5Kg6?;kbQG}9zC zGmcPw={32{Ex zZNA_C;q&-BW`1FM;JlXS^M2maV`!*ot%9T2ss`IQMhNQSD zD1kqhO;1fxckd30*aoPWX#}$VWeyTr)vq0Ibr43g%Ei}26RJ{=4v9gdx*Hs8{MW{! 
zcLco^+k86|M))*O1?{fS==eMi*Y?VLc~J0Mq0Cwtuz|1p3U2AbN zOMC^MN8=7MR#WpCO2xV8u-+f5WeL!Bn;3#Dli!yp;npyUzH8&2t$YEoX6Igc z(CYH4z6rUhf#0h30(WhNzR}T9L&3EU8mqW^*71uF)eRaUvBNrXyTVDC)^u(PV~U(_ zLfp5;0WJfIM6$i*GtQg15$RDKuMGbTafN8$P+gl#G%B7%t#1u8!6NQ< zQ-4;%9A@5{c)5 zCZ`lwIMkv=<#Pl7bP>=2DKSu!{sIGwvb0kM2!Y)Yzg>fT(n?FwuV(aZ<}NKo@tZIc zqpltCC)m#3MOn@cXe>x08$p2iQCCmzI<6`>Ki@kc(^-K{#H^tT*td7Du z%!jS|d|`AbkK?vZo%hJFdL840lM5!#9+7V!6KPKy{$^yMz|L)tu&3RV`y#R{vNKbD z&+c%G4n3b9_t~5hj8+Gy0rI83+ zA$$I&))-J7CR6?g_e~vl-q~&w#`5xJy;^nE#00kD{cA)&x7E6;uQhSbb7xJD@@9X` zTvGkme6+wkvHNcEHo=kIGafrC)77|$iw$sLQgwZQF4%7TD-N7U_}b_Kp8MvTZ~FaL z2@A_$94db#<2^h4hOOGe41e#MFRn^k-+en_-fYv0^W94&>q!D?PHp}Whr<~ci2KxA ztaWiY+dAIMZ!(FaZiQ9b{b|H?t4i z`mN#K!L^IO`|dj3s4WXfgWsL1_N9M{VHSL;czv|eOD2Oz@SL%GkjQymC-2uC0Eu6Z1w;$_uu8;qE z!bM3oOKp~Y%8!Z7RXq}xeA4zRJVN?v25x&?C~#Kvh*FrrZT z$k|Ki7rr?!X!6)CPPEakBwAT{GtqcLeC;j&d%{}B0{IFFlb1HO?K`@*D72LiJV6h7 z@Vv-+KldRR?YbxmZ!kUDG|i5V8$Vmz>Zh>vvD@01sJ^FVyAe6ylaR4=c8g1gC4ox1 zNL-xmwxjme5M=aD5^fPtfLA21naLyRA?4lFjm8044aSc>7N6QO50ImgLSFiczBWDcPu#+33kMU9 zgl%C_dLZsC0{ALtZ&!&9Tp(V>t8ZuADRY4!#IN_oz-7~|t9C1X zMhuw!74PsCs2kNl*Smr9-#{y+i14r75x}1NCx^l8IkxHY zIgJY2XX0Hy_#Ia;XT|XzS98YRciu)lJarcu#YTg^$2l_Y4S~Sr&ETc_Vs(aD&rAL@ zpJ7yZH$d@7**Z4C$*s3j)CC`qcc270ipBwfonfKEQs|BDXI|fA{QP@`;%n(W5EaOQ z;@41C#JwqM>hQHECH^{iefG0OW?>fUG2uAYS?j{90$LUnIqY!7@JuM_P6 z*b}d2C(RQHi3-Fp>BgRO4*?4hM~bu~Op(43WR3QqE23zYf$v~A(wyFmzN+uByO6Hc zAP=T@fLLn}vLY7~GcFqH0ozy$HBwTeaL(gr5z2>qJx!$6z8`fO6mqaE*dAiqA-uBH zt75DN7$>4*(b-tmW(ARaB^VFaw?5n%@Lul8xfN(kHShPpaUl1Le%{M3Qn{dB9PSiN z{N29vRjU`z*Lo z&Xl!-J7qPYk|TqVB(W6cjI3E z79eBEjI{MHFJ~^glpQPC^>MoecWPVZ4;44hrMZ7-KUHv0+Zxpmf5)k&)SiuBWR9`j zYs9-sYc8u1BLn>zr2bo9Aymi9cMX{*B!-qG2uf^^yDD8JVk`Ivy>XTJiZy9yLzyoc zM>+@v+!}+ZO(QZ6B^6?un)kuf-1XhqN9z`i3T^@SSYn$zDTh5=XUGU$h!YjqZG=3% zhwYFUfi!b)O#vkGk;tvvc3+f3yP0STJ%IPG(Ohl*3_hkh;;R+AlM2+5_wKf{A6~sr zTZXtSRr5DBLks$e9c?;|~ipyFiBJl`)F!sO&z4w}0jAgFRkQ4)tPYdLu23mc0 zKd4-I&%5|&2(r>PS2*)B_O5KDa(WEl%i!jcwKzVeZU;Rc`Qta60!*yv$066}Z?h?GO+3(ihr@Tqz4!U>XRsXm zJAAI#Ej)^(=C#TPr@{-x+XdY&WSTV3sHytEDP_DK53A<&Y903E5l~iY20lq$-#T6A zvsb++5|yJBW$80T>8E@H6!Y}OpT~Z}A#2{|BpUQa4f(%Fw91*=voqLWo&(a0qR$Is ziAHriciiuFWdfD+V3dB_dbVsQVq<*s3pVv)2H4pt!NEj0#(qhv_eG=0cPR7ah@TjG zc$q#${?;Y?wz;coS+ZEga2YrKZ%*-*cc!XmP=HZsR=ydR+o<^x+9tCf*tNB+uxVq! zZvk)J5p_X^bm10g?HpOl`1bKpboQqDfUv5ct(}G(f4q1-9)C95woc_`)S2xoSm?VY zvFhlQ#MF!d!vc8jZRnVat@_eEgPF&fta5L#1byMi#2W?HUGT71(``54gFApWnb%)0 zgWc$fJedL1pQ-v|vB}7f_QWF*%}#EgrYccVfyK->LW6ws=tefoI5 z*F>`{TKa+zA~^gSC{T&b%I+`$Y0=N|60pK5_`?p}CG&nOHTkh)oq*b?zC(K4x=$VW z@eKDSSbsFnHY0pZKJ@l`_;C^?7VhQ9nA5Muf^nh`-f(>BBU!K`keJ_Wl_OYzI;#VP zsCLKaKSAHRqpA2&MWF-ENw3x)5m-IcTCA)$M!JAh=k@SD3U(rbb?b_R_Pql!_HK#? 
zX<*BJonV4*<>zY+&?Q3~1$d+NOEdeWiQW||KkUTJw7h1UdH@OCm_3;@J%d~1ee}UX z1e#=;@n8a|L_X^QdW6-i(UO!v3GhvfV<<9A6vD`%G~x5p=r?cPSbm?-@qNW-4m8X| zT%L@q?8T^a@6Q9lb=5?A?YTb-X%^U!NlgceQRq~HipD#nPK}DO3^KD66p6;)B1=LX z3NkQ49Yb`TT@k@;5&fPQ8jB&c$GNL6P4y5L4v0JwD)o+7MMN24WjUdEF44feh2(g zE&#+D(3+hxNbuMIU)Y)E9GnwvKM23&@=t06Si6HNxytD_1%`SZn zvF{BeNrwRHYlN)%peGxRF%mREgg=Z4Gg+z=Z&vKzb}L96q(fp>+973CzMutggfb;x z@RgELs9%SRNIJLUD)of!qr;AcYJ3`O9}K>!Jbn}^gxy4T62V)2;rePXA%mcQ%q;wX zK@fA7f;NeAi$Cy45>VVhy2*v7W4Dj!I^)zkE=#dRr4$t#9RIQgH-M_eQvNI}s#j7J ze=nit1n^)k=shjR+_)&qDzP%# z@Pe9OwtzpI4ERae#Sis+4;p^?-~9(;QwA2vf!YoOg%eF_{lFcF%|40D*K+Czk=G>F zBT0$C-l@h@-|*S0cePoDBT)2hUq6;JuOA$R5sNXvAIIFwG(f7f7cl=MYW2~g#Y4!; z6~NpO3JX1_=`~mj@=+y7Dj&Enn=d?XI{^MjG-wk6e?I~{ojA2pA@$&(_;m+u8CUs+Ol5r`sZ(ODw(A=fe0(76mRf5;Z9urNn`+Z%yulwb{b zSEG(euZsY|T^}%bHx{(7VJTXwdVv3TKj&(OXaBbh>o^?5v*?XBzYo35EblGPKx5t) zT6E8yTbayf#{y275T6K4tQ8~zsqIo9OLj&mXbHs49l^G|inQu18lR-+NDHc=5(>|0 zFz*>MsC4Tkk&T|tJPXO248~7F#07wxIPyf>Ez6I?ZK46-u-F~UB1-Z=*MHqrJp3BO z9W5zalI-)tu!ZY-=ZhtWlR%g(OK|K?cuZX{YJ!KvOd0I^Q>n-&GwWC}0V=R?QBgn; z^s!DPD)Y0tj_3#o27`zt&787Tj&+Y~LBgCdMx1W_8qnSDBjA~auK~vv=j=XMzUlq6 z!Sn#X-A?g7%4wBXF7Pe?+4mpH;^CTN@CvKT%gc{x-~k@O13rvBoQf2E9g0(CB3ibs zmVD%WL}J=aEW3lG{Jy+uOO`1;C8(MU+E33B8Chti2erCTQ1doHhuZ|D z3iXDYFrXu3#u#01Pj0tvP;0La8s3e=35Z_1qrVfD$Lv4HWSrM2pw&L8Ih5;Y9OAzj z1L9*Rc|+_&DU%2=ywY!%EHR?HLoc34Z^z;Rk9his{~|4YAMh&pT2Lr~Idd0IPq)PV z`qvl(r|%gaJ{{JJMR?d(1KIr4^bY=4B?on7nu|HOE&p`u@vZS}r{)19k5+YjaM})g z?g3(E1gVMDua(XLKs66z2V_+~MCT$Emyn9xkqVSA6>Q|6 zJ9pqy*zaV_hz|!0$h_MuY0nQ$McytTEy?&Bz_Pm*%xEyIRx~{atpTg zPqKzu7lXN;H(PGO~Ja!wTiD`meL z!{nsKlMsCOKMJjIf7$FQ-H6a4mto7i4cO2anr$PbT}C(pc?ED}J_wTONP8NJYhD&0 z8RE>OqPig>XX?)L^QKl4^T@Lh#GFDSFN;J}(RECJ@GNHKrKMjSPr#ou#Kg)+l`L9~JB`}f+Uom8Um=Z79kYNdczF(TEZ4^hV*YSF zxjD8eK^wVwG3WFFz5Bp$%~q>3Z=>F%42wXVbzlrDM(;h$&qf2Mwa1f!TOSJ*P?7u? 
zR26d7#M5hCfb03Y2j_8yU)SZW9%`)N@S&WfaKoupr$=MLayv*325MS7F%7{;$(UuK zlAGiB2J#`$aq8{UD+!PI6eGz@MV4m}#m3x)7mBzXaP@Kxp@uk!YcxXMe;7U5scEcl zPc*61<1GeRRe5!$q0#iC=E$LZ^|6IF|;;w0j@}ndW-0ZaS1f=I(Wzq^sKv5d<1} zHh_qXz<4*t?%vSMhN))}@2^h@H19*X5w*EViZ8aB=S)9{UnmEY<~D}1(}^?A3l=R( zgBm@km5yAC4d`Xu!YbkA-UcJQy0f$Mm<8++Tq3S!8ms~|Uh64Ruf%E@9_na>mV975 z?_I2(tS3^|kSTL5BjVcOL^Z-fz)p#uTN|EoV=E*iMm=VB2COBfWxygvgyA|22hhOD za`=aXT`3s-;Ybz27*NK8KsX*1>KqRkYQL9mn)&SZr{kk^NkHOKKms|PzXYSbuI^51 zY)%0HVIayD{Eczoe&~Pf!G(_>DMAC~9@_0CZD8ogty*$rJ4y-@u-&~)mlE@FUGf`X zzRpcwHU`ii#8d4voWJhWds{@hRABDPV_Ty4Is!%V5;fQZUx&j)k81B~`|=44LlCa= zeVum%6XS(=zKAPv#z$W?ry!~uTu=pTAm1@uP0d?u!C6aB)J!4%5xhfS(dEMK9330x z%vO+PKybma=8!Lr13kwa0UG!GIxt(9VYJSrem-4eOzm+V$`G>!4oW*F^y%hedmjiK zvFHob()y#e2U0`lPe)}rDF|8P<%5Pg=e zLYP|LQvDT)l-V~)Spi6~jKQqZ9}aI@%D4+YEj%04+7f|U8{x7J!@BKFU`;LKDA>2V zk4}J735sO{%{{Bd6FN-#y8=Jv3z%&6)Nxqd*eP5 z^)wg@7IDPTO(jlUJ4Pok;)qfP8d=jWv#u1ZOnEVuo-7Y=q`AiX9Nwq#aU(jY0J!E= zcLglPi^iapm>4zB)bdm2y(u7U4uVT%LReg3fMi*8xZ4@#8p%X3KCNJ8o)I|OSr&LD zs2fA8tm-*1TDAyjDJ}YdR3Zx&S_J=++=g&Ntknio%LC2nu?dY8KS8iSk{4B68MlTkOz*J}zx0}?pHiy|x=<|rEoWkr5S za4+HX2A!_?!0I&qxQ(Fvvw~{XATph);Se0TVIWJg_p#6!)7X}Vx84ALQQBsjwg~=* zP?e9qEhw*DmwyTa@n%jd23q-b_@RykF4t zWkEy+o^*&B3kb%Zg%|+^qzc_;A))Ek_yK^M)a+&Fo{GW|wHtCgO4=K6CY=OL6Z9UE zC?%Aoqq%5^!wNCYiG928In^CN&MN`o*XY64ylA8`Zi{ug^MdIVZvY0 zAxXP;WJ8K-8Sz_+{9+XCt91~u?7`GBeC%qoDYiBX6k;JE`XrB$Hny`h=uj%O_zEyJ z(FaHn>It#C#(GrU0tDTI)SmZuA5s(Zy~oR-Yty)s3U-;_)Vw6b@|h1o$oN#YNnGx` zlEk@`8He=~@zeUll_=n(5P3;AMglST#!yGo2wImBE{fh8*jT!t4&iC*-}>#(JIF>z z3+0Q1j9vgJvv-32Fur%-=|D{~?l=w_P?GXtGLBzDGJKUx!|ibx`OQ8;CZwj32nM@6 z&etNGsM;j^TJYVE*WjUa5=SO{D%Il0KEv;1sx}L$qnlryeRDFCi7KZm;ufjV=8!4u zIB9u#0|=&1-^Db>{FMu1L6ZVvW7w4ha>w^# z{tdM)3gY_EaU4h53h&J#@p(>AF3L{gh_*ujb;IGcc*&A*P$Kjpvm9@W$;dgB)p%vO z5WQRw074+=dQl^ibqVDMtb=r^laBVBVV0dP$)G@L;N;hcLvmH7`WTti&kCoiyK(S0 zp)(>4qb9tm7a%x76E!TJIM+jmd#M6-u1UX)?M96Py~qu+V4GdTiKLIN&anKbo)|KD z(y(R0gNM4ymgv4i199i*)8A%${);4m-;;KW0EtpNUvw^^=Z z1WAjLXHwG&g-Cq!#R}GKyi$u-i(>{5JR?cdp{0daCk0wzBu-AM8Ay!zSpTVx;291 zP4V+qf2r+FhoevNdOD<0kD&grb=gjl0e%i8C{$!ifm^{ zo-4{n4~sfJM8`lv-IE+;T|LU|xd`4+4HS9MOOM^gEl-f(E~N%8dvkJiIOa5<@s7=2 zINIy(VM26Y(k3yle=@ue#6m4mKRnQB0b?vC9Te`5%a@U#guom05~*7r9DMfU(?#+n znmkQX$5aA&(~?T+dyO?~4*<^~Y8v@$4?g1LF_JU_INTroc0dhnIm8Umfj0e@$P507FrT@eMvg z{3xiH4Nv90d+~}$q#%15*`*znYQC0CP^xqY@5`H-hW~&RK%DxZc=jFA}V zp#n}hdv*M=l;QJ8gu_SWKnySV*S$&>&iyKifl-4q=;XWaZ#Dy0xWu62!1t0XTm%-* z-5jHBrGkLK|NmBC=Re;)x%|&{;s4rJ{`-#%#=j3_Zejep80azlFIWs8`R_94i{M($ zY0frJeEDuIzCF5k&amjoUvnYl+$PLi!+*B(C^#}IzLBIap^JCYmz68d(U*nbXl49%^!NV(CTO9a literal 0 HcmV?d00001 diff --git a/assets/nf-core-proteomicslfq_social_preview.svg b/assets/nf-core-proteomicslfq_social_preview.svg new file mode 100644 index 0000000..78e2e03 --- /dev/null +++ b/assets/nf-core-proteomicslfq_social_preview.svg @@ -0,0 +1,448 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + Proteomics label-free quantification (LFQ) analysis pipeline + proteomicslfq + + + + + + + + + + + + + + + + + + + + + + + + + From ef738fb33434317a4eac633da2b413c5d2f6fadf Mon Sep 17 00:00:00 2001 From: MaxUlysse Date: Wed, 12 Feb 2020 15:45:42 +0100 Subject: [PATCH 043/374] update CHANGELOG --- CHANGELOG.md | 1 + 1 file changed, 1 insertion(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 14b0aab..1a0ee8b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,4 +1,5 @@ # nf-core/proteomicslfq: Changelog ## v1.0dev - [date] + 
Initial release of nf-core/proteomicslfq, created with the [nf-core](http://nf-co.re/) template.

From 09aa1dd43ef7fee1352eab614a6f5a4048e02a6e Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Thu, 13 Feb 2020 10:41:46 +0100
Subject: [PATCH 044/374] Switch to the docker builds of the release (candidate) branch

---
 nextflow.config | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/nextflow.config b/nextflow.config
index e76cf17..c64c9b9 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -91,7 +91,7 @@ params {

 // Container slug. Stable releases should specify release tag!
 // Developmental code should specify :dev
-process.container = 'openms/executables'
+process.container = 'openms/executables:release-2.5.0'

 // Load base.config by default for all pipelines
 includeConfig 'conf/base.config'

From af4bc25dd5fd21bbc4bb12b24d9d5ddbe0f925d9 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Mon, 17 Feb 2020 20:55:48 +0100
Subject: [PATCH 045/374] Added MSstats step. Started to fix the raw file conversion step

---
 docs/usage.md   | 12 +++++++++---
 environment.yml | 10 +++++++---
 main.nf         | 31 +++++++++++++++++++++++++++----
 3 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/docs/usage.md b/docs/usage.md
index 88fbfb0..ee785fd 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -109,20 +109,26 @@ This version number will be logged in reports when you run the pipeline, so that

### `--spectra`

-Use this to specify the location of your input mzML files. For example:
+Use this to specify the location of your input mzML or Thermo RAW files:

```bash
--spectra 'path/to/data/*.mzML'
```

+or
+
+```bash
+--spectra 'path/to/data/*.raw'
+```
+
Please note the following requirements:

1. The path must be enclosed in quotes
-2. The path must have at least one `*` wildcard character
+2. The path must have at least one `*` wildcard character TODO I don't think this is true; it can also be a list! Check.

### `--database`

-If you prefer, you can specify the full path to your fasta input protein database when you run the pipeline:
+Needs to be given to specify the input protein database when you run the pipeline:

```bash
--database '[path to Fasta protein database]'

diff --git a/environment.yml b/environment.yml
index 6b6aac1..b3f90e2 100644
--- a/environment.yml
+++ b/environment.yml
@@ -6,6 +6,10 @@ channels:
 - bioconda
 - defaults
 dependencies:
- # TODO nf-core: Add required software dependencies here
- - fastqc=0.11.8
- - multiqc=1.6
+ - openms=2.5.0
+ - bioconductor-msstats
+ - thermorawfileparser
+ - percolator
+ - msgf_plus
+ - comet-ms
+ #TODO check if third-party tools are in PATH and are found by the OpenMS adapters
\ No newline at end of file

diff --git a/main.nf b/main.nf
index 47d0f3b..bbc1181 100644
--- a/main.nf
+++ b/main.nf
@@ -212,16 +212,21 @@ branched_input.mzML
 */

process raw_file_conversion {

+    container 'docker://quay.io/biocontainers/thermorawfileparser:1.2.1--0'
+
    input:
     file rawfile from branched_input.raw

    output:
     file "*.mzML" into mzmls_converted

-    // TODO use actual ThermoRawfileConverter!!
+
+    // TODO check if this sh script is available with bioconda;
+    // else check if the exe is accessible/in PATH on bioconda and use something like this:
+    // mono ThermoRawfileParser.exe -i=${rawfile} -f=2 -o=./
    script:
    """
-   mv ${rawfile} ${rawfile.baseName}.mzML
+   ThermoRawFileParser.sh -i=${rawfile} -f=2 -o=./
    """
}

@@ -620,11 +625,10 @@ process proteomicslfq {
 publishDir "${params.outdir}/proteomics_lfq", mode: 'copy'

 input:
-    file mzmls from mzmls_plfq.toSortedList({ a, b -> b.baseName <=> a.baseName }).view()
+    file mzmls from mzmls_plfq.toSortedList({ a, b -> b.baseName <=> a.baseName })
     file id_files from id_files_idx_feat_perc_fdr_filter_switched
         .mix(id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch)
         .toSortedList({ a, b -> b.baseName <=> a.baseName })
-        .view()
     file expdes from expdesign
     file fasta from plfq_in_db.mix(plfq_in_db_decoy)

@@ -659,6 +663,25 @@ process proteomicslfq {
 }

+// TODO the script supports a control condition as third argument
+// TODO the second argument can be "pairwise" or, later, a user-defined contrast string
+
+process msstats {
+    container 'docker://quay.io/biocontainers/bioconductor-msstats:3.18.0--r36_0'
+    publishDir "${params.outdir}/msstats", mode: 'copy'
+
+    input:
+    file csv from out_msstats
+
+    output:
+    file "*.pdf"
+    file "*.csv"
+
+    script:
+    """
+    msstats_plfq.R ${csv} || echo "Optional MSstats step failed. Please check logs and re-run or do a manual statistical analysis."
+    """
+}
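The new `msstats` process above invokes the script with only the CSV argument, while its TODOs describe two further positional arguments (contrast type and control condition). Below is a minimal sketch of how they could be wired through, assuming the argument order of the `msstats_plfq.R` script added in the next patch (`<csv> <contrasts> <control> <output prefix>`); `params.msstats_control` is a hypothetical parameter name, not something this patch defines.

```nextflow
// Sketch only, not part of the patch series: a variant of the msstats process
// that exposes the optional script arguments.
params.msstats_control = ''   // hypothetical parameter; '' selects all pairwise contrasts

process msstats_sketch {
    container 'quay.io/biocontainers/bioconductor-msstats:3.18.0--r36_0'
    publishDir "${params.outdir}/msstats", mode: 'copy'

    input:
    file csv from out_msstats

    output:
    file "*.pdf"
    file "*.csv"

    script:
    """
    msstats_plfq.R ${csv} pairwise "${params.msstats_control}" out
    """
}
```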
From e280317321a73884efbc97c19344dd913217aca5 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Mon, 17 Feb 2020 20:56:38 +0100
Subject: [PATCH 046/374] forgot script

---
 bin/msstats_plfq.R | 101 +++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)
 create mode 100755 bin/msstats_plfq.R

diff --git a/bin/msstats_plfq.R b/bin/msstats_plfq.R
new file mode 100755
index 0000000..2121188
--- /dev/null
+++ b/bin/msstats_plfq.R
@@ -0,0 +1,101 @@
+#!/usr/bin/env Rscript
+args = commandArgs(trailingOnly=TRUE)
+
+if (length(args)==0) {
+  stop("At least one argument must be supplied (input file).\n", call.=FALSE)
+}
+if (length(args)<=1) {
+  # contrasts
+  args[2] = "pairwise"
+}
+if (length(args)<=2) {
+  # default control condition
+  args[3] = ""
+}
+if (length(args)<=3) {
+  # default output prefix
+  args[4] = "out"
+}
+
+# load the MSstats library
+require(MSstats)
+
+# read dataframe into MSstats
+data <- read.csv(args[1])
+quant <- OpenMStoMSstatsFormat(data,
+                               removeProtein_with1Feature = FALSE)
+
+# process data
+processed.quant <- dataProcess(quant, censoredInt = 'NA')
+
+lvls <- levels(as.factor(data$Condition))
+
+if (args[2] == "pairwise")
+{
+  if (args[3] == "")
+  {
+    l <- length(lvls)
+    contrast_mat <- matrix(nrow = l * (l-1) / 2, ncol = l)
+    rownames(contrast_mat) <- rep(NA, l * (l-1) / 2)
+    colnames(contrast_mat) <- lvls
+    c <- 1
+    for (i in 1:(l-1))
+    {
+      for (j in (i+1):l)
+      {
+        comparison <- rep(0,l)
+        comparison[i] <- -1
+        comparison[j] <- 1
+        contrast_mat[c,] <- comparison
+        rownames(contrast_mat)[c] <- paste0(lvls[i],"-",lvls[j])
+        c <- c+1
+      }
+    }
+  } else {
+    control <- which(as.character(lvls) == args[3])
+    if (length(control) == 0)
+    {
+      stop("Control condition not part of found levels.\n", call.=FALSE)
+    }
+
+    l <- length(lvls)
+    contrast_mat <- matrix(nrow = l-1, ncol = l)
+    rownames(contrast_mat) <- rep(NA, l-1)
+    colnames(contrast_mat) <- lvls
+    c <- 1
+    for (j in setdiff(1:l,control))
+    {
+      comparison <- rep(0,l)
+      comparison[control] <- -1
+      comparison[j] <- 1
+      contrast_mat[c,] <- comparison
+      rownames(contrast_mat)[c] <- paste0(lvls[control],"-",lvls[j])
+      c <- c+1
+    }
+  }
+}
+
+print ("Contrasts to be tested:")
+print (contrast_mat)
+#TODO allow for user specified contrasts
+
+test.MSstats <- groupComparison(contrast.matrix=contrast_mat, data=processed.quant)
+
+write.csv(test.MSstats$ComparisonResult, "msstats_results.csv")
+
+groupComparisonPlots(data=test.MSstats$ComparisonResult, type="ComparisonPlot",
+                     width=12, height=12,dot.size = 2,ylimUp = 7)
+groupComparisonPlots(data=test.MSstats$ComparisonResult, type="Heatmap",
+                     width=12, height=12,dot.size = 2,ylimUp = 7)
+groupComparisonPlots(data=test.MSstats$ComparisonResult, type="VolcanoPlot",
+                     width=12, height=12,dot.size = 2,ylimUp = 7)
+
+
+#for (comp in rownames(contrast_mat))
+#{
+#  groupComparisonPlots(data=test.MSstats$ComparisonResult, type="ComparisonPlot",
+#                       width=12, height=12,dot.size = 2,ylimUp = 7, sig=1)#,
+#                       which.Comparison = comp,
+#                       address=F)
+#  # try to plot all comparisons
+#}
\ No newline at end of file
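For intuition about the contrast matrix the script builds, here is the pairwise construction re-traced as a small, self-contained Groovy snippet with three made-up condition levels (the names are illustrative only, not from this repository):

```groovy
// Pairwise contrasts as in msstats_plfq.R, for assumed levels A, B, C:
// each row carries -1 for the first level of a pair and +1 for the second.
def lvls = ['A', 'B', 'C']
def rows = [:]
for (i in 0..<lvls.size() - 1) {
    for (j in (i + 1)..<lvls.size()) {
        def row = [0] * lvls.size()
        row[i] = -1
        row[j] = 1
        rows["${lvls[i]}-${lvls[j]}".toString()] = row
    }
}
rows.each { name, row -> println "$name: $row" }
// Prints:
// A-B: [-1, 1, 0]
// A-C: [-1, 0, 1]
// B-C: [0, -1, 1]
```

With a control condition given (say A), the script instead emits one row per remaining level, i.e. A-B = [-1, 1, 0] and A-C = [-1, 0, 1].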
From bdde1c4aa9d0811c3601f7516587b2a99500e3d8 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Mon, 17 Feb 2020 20:58:25 +0100
Subject: [PATCH 047/374] remove travis

---
 .travis.yml | 43 -------------------------------------------
 1 file changed, 43 deletions(-)
 delete mode 100644 .travis.yml

diff --git a/.travis.yml b/.travis.yml
deleted file mode 100644
index f4ffb2d..0000000
--- a/.travis.yml
+++ /dev/null
@@ -1,43 +0,0 @@
-sudo: required
-language: python
-jdk: openjdk8
-services: docker
-python: '3.6'
-cache: pip
-matrix:
-  fast_finish: true
-
-before_install:
-  # PRs to master are only ok if coming from dev branch
-  - '[ $TRAVIS_PULL_REQUEST = "false" ] || [ $TRAVIS_BRANCH != "master" ] || ([ $TRAVIS_PULL_REQUEST_SLUG = $TRAVIS_REPO_SLUG ] && [ $TRAVIS_PULL_REQUEST_BRANCH = "dev" ])'
-  # Pull the docker image first so the test doesn't wait for this
-  #- docker pull nfcore/proteomicslfq:dev
-  - docker pull openms/executables
-  # Fake the tag locally so that the pipeline runs properly
-  # Looks weird when this is :dev to :dev, but makes sense when testing code for a release (:dev to :1.0.1)
-  #- docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev
-
-install:
-  # Install Nextflow
-  - mkdir /tmp/nextflow && cd /tmp/nextflow
-  - wget -qO- get.nextflow.io | bash
-  - sudo ln -s /tmp/nextflow/nextflow /usr/local/bin/nextflow
-  # Install nf-core/tools
-  - pip install --upgrade pip
-  - pip install nf-core
-  # Reset
-  - mkdir ${TRAVIS_BUILD_DIR}/tests && cd ${TRAVIS_BUILD_DIR}/tests
-  # Install markdownlint-cli
-  - sudo apt-get install npm && npm install -g markdownlint-cli
-
-env:
-  - NXF_VER='19.10.0' NXF_ANSI_LOG=0 # Specify a minimum NF version that should be tested and work
-  - NXF_VER='' NXF_ANSI_LOG=0 # Plus: get the latest NF version and check that it works
-
-script:
-  # Lint the pipeline code
-  - nf-core lint ${TRAVIS_BUILD_DIR}
-  # Lint the documentation
-  - markdownlint ${TRAVIS_BUILD_DIR} -c ${TRAVIS_BUILD_DIR}/.github/markdownlint.yml
-  # Run the pipeline with the test profile
-  - nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker

From 788f8d0372abcf0758178e3ec56800c4ba283290 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Mon, 17 Feb 2020 21:36:01 +0100
Subject: [PATCH 048/374] removed docker prefix

---
 main.nf | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/main.nf b/main.nf
index bbc1181..4316071 100644
--- a/main.nf
+++ b/main.nf
@@ -212,7 +212,7 @@ branched_input.mzML
 */

 process raw_file_conversion {
-    container 'docker://quay.io/biocontainers/thermorawfileparser:1.2.1--0'
+    container 'quay.io/biocontainers/thermorawfileparser:1.2.1--0'

     input:
     file rawfile from branched_input.raw

@@ -667,7 +667,7 @@ process proteomicslfq {

 process msstats {
-    container 'docker://quay.io/biocontainers/bioconductor-msstats:3.18.0--r36_0'
+    container 'quay.io/biocontainers/bioconductor-msstats:3.18.0--r36_0'
     publishDir "${params.outdir}/msstats", mode: 'copy'

     input:
     file csv from out_msstats
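A note on the change above: the `docker://` scheme is how Singularity addresses Docker registries, whereas the Docker engine itself expects a bare image path, which is presumably why the prefix had to be dropped. If one wanted to keep such pins out of `main.nf` entirely, a config-side sketch could look like the following (process names taken from the patch, everything else illustrative):

```nextflow
// conf/containers.config, a sketch rather than part of this patch series
process {
    withName: raw_file_conversion {
        container = 'quay.io/biocontainers/thermorawfileparser:1.2.1--0'
    }
    withName: msstats {
        container = 'quay.io/biocontainers/bioconductor-msstats:3.18.0--r36_0'
    }
}
```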
From b04f5a5897812bc78ea9a8ee3e62420028972486 Mon Sep 17 00:00:00 2001
From: jpfeuffer
Date: Tue, 18 Feb 2020 00:17:28 +0100
Subject: [PATCH 049/374] heatmaps only for more than one comparison

---
 bin/msstats_plfq.R | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/bin/msstats_plfq.R b/bin/msstats_plfq.R
index 2121188..d1e46a4 100755
--- a/bin/msstats_plfq.R
+++ b/bin/msstats_plfq.R
@@ -78,18 +78,21 @@ if (args[2] == "pairwise")
 print ("Contrasts to be tested:")
 print (contrast_mat)
 #TODO allow for user specified contrasts
-
 test.MSstats <- groupComparison(contrast.matrix=contrast_mat, data=processed.quant)
+#TODO allow manual input (e.g. proteins of interest)

 write.csv(test.MSstats$ComparisonResult, "msstats_results.csv")

 groupComparisonPlots(data=test.MSstats$ComparisonResult, type="ComparisonPlot",
                      width=12, height=12,dot.size = 2,ylimUp = 7)
-groupComparisonPlots(data=test.MSstats$ComparisonResult, type="Heatmap",
-                     width=12, height=12,dot.size = 2,ylimUp = 7)
 groupComparisonPlots(data=test.MSstats$ComparisonResult, type="VolcanoPlot",
                      width=12, height=12,dot.size = 2,ylimUp = 7)

+if (nrow(contrast_mat) > 1)
+{
+  groupComparisonPlots(data=test.MSstats$ComparisonResult, type="Heatmap",
+                       width=12, height=12,dot.size = 2,ylimUp = 7)
+}

 #for (comp in rownames(contrast_mat))
 #{
 #  groupComparisonPlots(data=test.MSstats$ComparisonResult, type="ComparisonPlot",
 #                       width=12, height=12,dot.size = 2,ylimUp = 7, sig=1)#,
 #                       which.Comparison = comp,
 #                       address=F)
 #  # try to plot all comparisons
-#}
\ No newline at end of file
+#}

From 9fef25b5c0298f2126ccda25dfb6b89dc3ce92de Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Tue, 18 Feb 2020 19:09:08 +0100
Subject: [PATCH 050/374] Added preliminary requirements to environment.yml to build dockerhub container

---
 environment.yml | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/environment.yml b/environment.yml
index b3f90e2..d8d186f 100644
--- a/environment.yml
+++ b/environment.yml
@@ -6,10 +6,15 @@ channels:
 - bioconda
 - defaults
 dependencies:
+  # bioconda
 - openms=2.5.0
 - bioconductor-msstats # will include R
 - thermorawfileparser
 - percolator
 - msgf_plus # will include JRE
 - comet-ms
+  # conda-forge or defaults
+  - python # for plotting Percolator results (TODO)
+  - gnuplot # for plotting IDPEP results (TODO)
+
 #TODO check if third-party tools are in PATH and are found by the OpenMS adapters
\ No newline at end of file

From 0995e8903a60c6e76dfc0a3acc986d7baf5e14d3 Mon Sep 17 00:00:00 2001
From: Zethson
Date: Wed, 19 Feb 2020 22:26:02 +0100
Subject: [PATCH 051/374] [FIX] markdown lint

---
 docs/configuration/adding_your_own.md   | 5 +++++
 docs/configuration/local.md             | 3 +++
 docs/configuration/reference_genomes.md | 2 ++
 3 files changed, 10 insertions(+)

diff --git a/docs/configuration/adding_your_own.md b/docs/configuration/adding_your_own.md
index e7f0f92..4aaf959 100644
--- a/docs/configuration/adding_your_own.md
+++ b/docs/configuration/adding_your_own.md
@@ -9,6 +9,7 @@
If you are the only person to be running this pipeline, you can create your conf A basic configuration comes with the pipeline, which loads the [`conf/base.config`](../../conf/base.config) by default. This means that you only need to configure the specifics for your system and overwrite any defaults that you want to change. ## Cluster Environment + By default, pipeline uses the `local` Nextflow executor - in other words, all jobs are run in the login session. If you're using a simple server, this may be fine. If you're using a compute cluster, this is bad as all jobs will run on the head node. To specify your cluster environment, add the following line to your config file: @@ -30,11 +31,13 @@ process { ## Software Requirements + To run the pipeline, several software packages are required. How you satisfy these requirements is essentially up to you and depends on your system. If possible, we _highly_ recommend using either Docker or Singularity. Please see the [`installation documentation`](../installation.md) for how to run using the below as a one-off. These instructions are about configuring a config file for repeated use. ### Docker + Docker is a great way to run nf-core/proteomicslfq, as it manages all software installations and allows the pipeline to be run in an identical software environment across a range of systems. Nextflow has [excellent integration](https://www.nextflow.io/docs/latest/docker.html) with Docker, and beyond installing the two tools, not much else is required - nextflow will automatically fetch the [nfcore/proteomicslfq](https://hub.docker.com/r/nfcore/proteomicslfq/) image that we have created and is hosted at dockerhub at run time. @@ -50,6 +53,7 @@ Note that the dockerhub organisation name annoyingly can't have a hyphen, so is ### Singularity image + Many HPC environments are not able to run Docker due to security issues. [Singularity](http://singularity.lbl.gov/) is a tool designed to run on such HPC systems which is very similar to Docker. @@ -78,6 +82,7 @@ process.container = "/path/to/nf-core-proteomicslfq.simg" ### Conda + If you're not able to use Docker or Singularity, you can instead use conda to manage the software requirements. To use conda in your own config file, add the following: diff --git a/docs/configuration/local.md b/docs/configuration/local.md index 350d3bb..1faa73d 100644 --- a/docs/configuration/local.md +++ b/docs/configuration/local.md @@ -3,6 +3,7 @@ If running the pipeline in a local environment, we highly recommend using either Docker or Singularity. ## Docker + Docker is a great way to run `nf-core/proteomicslfq`, as it manages all software installations and allows the pipeline to be run in an identical software environment across a range of systems. Nextflow has [excellent integration](https://www.nextflow.io/docs/latest/docker.html) with Docker, and beyond installing the two tools, not much else is required. The `nf-core/proteomicslfq` profile comes with a configuration profile for docker, making it very easy to use. This also comes with the required presets to use the AWS iGenomes resource, meaning that if using common reference genomes you just specify the reference ID and it will be automatically downloaded from AWS S3. @@ -20,10 +21,12 @@ Nextflow will recognise `nf-core/proteomicslfq` and download the pipeline from G For more information about how to work with reference genomes, see [`docs/configuration/reference_genomes.md`](reference_genomes.md). 
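To make the cluster-environment advice from `adding_your_own.md` above concrete, a custom profile might look like the following sketch; the scheduler, queue, and account values are placeholders, not taken from this repository:

```nextflow
// custom.config, illustrative only; adjust to your scheduler
process {
    executor       = 'slurm'            // or 'sge', 'lsf', 'pbspro', ...
    queue          = 'standard'
    clusterOptions = '--account=my_project'
}
```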
### Pipeline versions + The public docker images are tagged with the same version numbers as the code, which you can use to ensure reproducibility. When running the pipeline, specify the pipeline version with `-r`, for example `-r 1.0`. This uses pipeline code and docker image from this tagged version. ## Singularity image + Many HPC environments are not able to run Docker due to security issues. [Singularity](http://singularity.lbl.gov/) is a tool designed to run on such HPC systems which is very similar to Docker. Even better, it can use create images directly from dockerhub. To use the singularity image for a single run, use `-with-singularity`. This will download the docker container from dockerhub and create a singularity image for you dynamically. diff --git a/docs/configuration/reference_genomes.md b/docs/configuration/reference_genomes.md index 3a2c9df..06e3c0d 100644 --- a/docs/configuration/reference_genomes.md +++ b/docs/configuration/reference_genomes.md @@ -8,6 +8,7 @@ See below for instructions on how to do this. Read [Adding your own system](adding_your_own.md) to find out how to set up custom config files. ## Adding paths to a config file + Specifying long paths every time you run the pipeline is a pain. To make this easier, the pipeline comes configured to understand reference genome keywords which correspond to preconfigured paths, meaning that you can just specify `--genome ID` when running the pipeline. @@ -33,6 +34,7 @@ params { You can add as many genomes as you like as long as they have unique IDs. ## illumina iGenomes + To make the use of reference genomes easier, illumina has developed a centralised resource called [iGenomes](https://support.illumina.com/sequencing/sequencing_software/igenome.html). Multiple reference index types are held together with consistent structure for multiple genomes. From 55d671afd185b2eee2a7ffb2a903b7b10097019b Mon Sep 17 00:00:00 2001 From: Zethson Date: Wed, 19 Feb 2020 22:36:49 +0100 Subject: [PATCH 052/374] [FIX] hasExtension --- main.nf | 42 ++++++++++++++++++++---------------------- 1 file changed, 20 insertions(+), 22 deletions(-) diff --git a/main.nf b/main.nf index cac6179..515b742 100644 --- a/main.nf +++ b/main.nf @@ -10,7 +10,6 @@ */ def helpMessage() { - // TODO nf-core: Add to this help message with new command line parameters log.info nfcoreHeader() log.info""" @@ -140,7 +139,7 @@ if (params.help){ // Has the run name been specified by the user? 
// this has the bonus effect of catching both -name and --name custom_runName = params.name -if (!(workflow.runName ==~ /[a-z]+_[a-z]+/)) { +if( !(workflow.runName ==~ /[a-z]+_[a-z]+/) ){ custom_runName = workflow.runName } @@ -670,31 +669,28 @@ process proteomicslfq { // Header log info log.info nfcoreHeader() def summary = [:] -if (workflow.revision) summary['Pipeline Release'] = workflow.revision summary['Run Name'] = custom_runName ?: workflow.runName // TODO nf-core: Report custom parameters here summary['Max Resources'] = "$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job" -if (workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" +if(workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" summary['Output dir'] = params.outdir summary['Launch dir'] = workflow.launchDir summary['Working dir'] = workflow.workDir summary['Script dir'] = workflow.projectDir summary['User'] = workflow.userName -if (workflow.profile.contains('awsbatch')) { - summary['AWS Region'] = params.awsregion - summary['AWS Queue'] = params.awsqueue - summary['AWS CLI'] = params.awscli +if(workflow.profile == 'awsbatch'){ + summary['AWS Region'] = params.awsregion + summary['AWS Queue'] = params.awsqueue } summary['Config Profile'] = workflow.profile -if (params.config_profile_description) summary['Config Description'] = params.config_profile_description -if (params.config_profile_contact) summary['Config Contact'] = params.config_profile_contact -if (params.config_profile_url) summary['Config URL'] = params.config_profile_url -if (params.email || params.email_on_fail) { - summary['E-mail Address'] = params.email - summary['E-mail on failure'] = params.email_on_fail +if(params.config_profile_description) summary['Config Description'] = params.config_profile_description +if(params.config_profile_contact) summary['Config Contact'] = params.config_profile_contact +if(params.config_profile_url) summary['Config URL'] = params.config_profile_url +if(params.email) { + summary['E-mail Address'] = params.email } log.info summary.collect { k,v -> "${k.padRight(18)}: $v" }.join("\n") -log.info "-\033[2m--------------------------------------------------\033[0m-" +log.info "\033[2m----------------------------------------------------\033[0m" // Check the hostnames against configured profiles checkHostname() @@ -716,19 +712,14 @@ ${summary.collect { k,v -> "
<dt>$k</dt><dd><samp>${v ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>" }.join("\n")}
        </dl>
    """.stripIndent()

   return yaml_file
}

/*
 * Parse software version numbers
 */
process scrape_software_versions {
-    publishDir "${params.outdir}/pipeline_info", mode: 'copy',
-        saveAs: {filename ->
-            if (filename.indexOf(".csv") > 0) filename
-            else null
-        }

    output:
-    file 'software_versions_mqc.yaml' into ch_software_versions_yaml
-    file "software_versions.csv"
+    file 'software_versions_mqc.yaml' into software_versions_yaml

    script:
    // TODO nf-core: Get all tools to print their version number here
@@ -758,6 +749,8 @@ process output_documentation {
    markdown_to_html.r $output_docs results_description.html
    """
}
+*/
+

/*
 * Completion e-mail notification
@@ -897,3 +890,8 @@ def checkHostname() {
    }
  }
}
+
+// Check file extension
+def hasExtension(it, extension) {
+    it.toString().toLowerCase().endsWith(extension.toLowerCase())
+}

From 29a221f6310145cc2bba197a156a63f7d5b85785 Mon Sep 17 00:00:00 2001
From: runner
Date: Thu, 20 Feb 2020 15:33:06 +0000
Subject: [PATCH 053/374] Template update for nf-core/tools version 1.9

---
 .github/workflows/ci.yml      |   3 +-
 .github/workflows/linting.yml |  39 ++++++-----
 .gitignore                    |   3 +-
 .travis.yml                   |  47 ----------------
 Dockerfile                    |   2 +-
 README.md                     |  14 +++--
 assets/multiqc_config.yaml    |   4 +-
 bin/markdown_to_html.py       | 100 ++++++++++++++++++++++++++++++++++
 bin/markdown_to_html.r        |  51 -----------------
 conf/test.config              |  26 +++++++++
 docs/usage.md                 |   9 ++-
 environment.yml               |   5 +-
 main.nf                       |  30 +++++-----
 nextflow.config               |   2 +-
 14 files changed, 193 insertions(+), 142 deletions(-)
 delete mode 100644 .travis.yml
 create mode 100755 bin/markdown_to_html.py
 delete mode 100755 bin/markdown_to_html.r
 create mode 100644 conf/test.config

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 704d792..03c888b 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -21,7 +21,8 @@ jobs:
       sudo mv nextflow /usr/local/bin/
     - name: Pull docker image
       run: |
-        docker pull nfcore/proteomicslfq:dev && docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev
+        docker pull nfcore/proteomicslfq:dev
+        docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev
    - name: Run pipeline with test data
      run: |
        # TODO nf-core: You can customise CI pipeline run tests as required
diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml
index 7354dc7..1e0827a 100644
--- a/.github/workflows/linting.yml
+++ b/.github/workflows/linting.yml
@@ -1,26 +1,39 @@ name: nf-core linting
 # This workflow is triggered on pushes and PRs to the repository.
# It runs the `nf-core lint` and markdown lint tests to ensure that the code meets the nf-core guidelines -on: [push, pull_request] +on: + push: + pull_request: + release: + types: [published] jobs: Markdown: - runs-on: ubuntu-18.04 + runs-on: ubuntu-latest steps: - - uses: actions/checkout@v1 + - uses: actions/checkout@v2 - uses: actions/setup-node@v1 with: node-version: '10' - name: Install markdownlint - run: | - npm install -g markdownlint-cli + run: npm install -g markdownlint-cli - name: Run Markdownlint - run: | - markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.github/markdownlint.yml - nf-core: + run: markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.github/markdownlint.yml + YAML: runs-on: ubuntu-latest steps: - uses: actions/checkout@v1 + - uses: actions/setup-node@v1 + with: + node-version: '10' + - name: Install yaml-lint + run: npm install -g yaml-lint + - name: Run yaml-lint + run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml") + nf-core: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v2 - name: Install Nextflow run: | wget -qO- get.nextflow.io | bash @@ -29,13 +42,9 @@ jobs: with: python-version: '3.6' architecture: 'x64' - - name: Install pip - run: | - sudo apt install python3-pip - pip install --upgrade pip - - name: Install nf-core tools + - name: Install dependencies run: | + python -m pip install --upgrade pip pip install nf-core - name: Run nf-core lint - run: | - nf-core lint ${GITHUB_WORKSPACE} + run: nf-core lint ${GITHUB_WORKSPACE} diff --git a/.gitignore b/.gitignore index 0189a44..6354f37 100644 --- a/.gitignore +++ b/.gitignore @@ -3,5 +3,6 @@ work/ data/ results/ .DS_Store -test* +tests/ +testing/ *.pyc diff --git a/.travis.yml b/.travis.yml deleted file mode 100644 index 9a49c43..0000000 --- a/.travis.yml +++ /dev/null @@ -1,47 +0,0 @@ -sudo: required -language: python -jdk: openjdk8 -services: docker -python: '3.6' -cache: pip -matrix: - fast_finish: true - -before_install: - # PRs to master are only ok if coming from dev branch - - '[ $TRAVIS_PULL_REQUEST = "false" ] || [ $TRAVIS_BRANCH != "master" ] || ([ $TRAVIS_PULL_REQUEST_SLUG = $TRAVIS_REPO_SLUG ] && [ $TRAVIS_PULL_REQUEST_BRANCH = "dev" ]) || [ $TRAVIS_PULL_REQUEST_BRANCH = "patch" ]' - # Pull the docker image first so the test doesn't wait for this - - docker pull nfcore/proteomicslfq:dev - # Fake the tag locally so that the pipeline runs properly - # Looks weird when this is :dev to :dev, but makes sense when testing code for a release (:dev to :1.0.1) - - docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev - -install: - # Install Nextflow - - mkdir /tmp/nextflow && cd /tmp/nextflow - - wget -qO- get.nextflow.io | bash - - sudo ln -s /tmp/nextflow/nextflow /usr/local/bin/nextflow - # Install nf-core/tools - - pip install --upgrade pip - - pip install nf-core - # Reset - - mkdir ${TRAVIS_BUILD_DIR}/tests && cd ${TRAVIS_BUILD_DIR}/tests - # Install markdownlint-cli - - sudo apt-get install npm && npm install -g markdownlint-cli - -env: - # Tower token is to inspect runs on https://tower.nf - # Use public mailbox nf-core@mailinator.com to log in: https://www.mailinator.com/v3/index.jsp?zone=public&query=nf-core - # Specify a minimum NF version that should be tested and work - - NXF_VER='19.10.0' TOWER_ACCESS_TOKEN="1c1f493bc2703472d6f1b9f6fb9e9d117abab7b1" - # Plus: get the latest NF version and check that it works - - NXF_VER='' TOWER_ACCESS_TOKEN="1c1f493bc2703472d6f1b9f6fb9e9d117abab7b1" - - -script: - # Lint the pipeline code - - nf-core lint 
${TRAVIS_BUILD_DIR} - # Lint the documentation - - markdownlint ${TRAVIS_BUILD_DIR} -c ${TRAVIS_BUILD_DIR}/.github/markdownlint.yml - # Run the pipeline with the test profile - - nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker -ansi-log false -name proteomicslfq-${TRAVIS_EVENT_TYPE}-${TRAVIS_PULL_REQUEST}-${TRAVIS_COMMIT:0:6}-test-description diff --git a/Dockerfile b/Dockerfile index aeee070..2cb1395 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,4 +1,4 @@ -FROM nfcore/base:1.8 +FROM nfcore/base:1.9 LABEL authors="Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg" \ description="Docker image containing all software requirements for the nf-core/proteomicslfq pipeline" diff --git a/README.md b/README.md index 93974e4..4832f9b 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,6 @@ **Proteomics label-free quantification (LFQ) analysis pipeline using OpenMS and MSstats, with feature quantification, feature summarization, quality control and group-based statistical analysis.**. -[![Build Status](https://travis-ci.com/nf-core/proteomicslfq.svg?branch=master)](https://travis-ci.com/nf-core/proteomicslfq) [![GitHub Actions CI Status](https://github.com/nf-core/proteomicslfq/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/proteomicslfq/actions) [![GitHub Actions Linting Status](https://github.com/nf-core/proteomicslfq/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/proteomicslfq/actions) [![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A519.10.0-brightgreen.svg)](https://www.nextflow.io/) @@ -18,7 +17,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool i. Install [`nextflow`](https://nf-co.re/usage/installation) -ii. Install one of [`docker`](https://docs.docker.com/engine/installation/), [`singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`conda`](https://conda.io/miniconda.html) +ii. Install either [`Docker`](https://docs.docker.com/engine/installation/) or [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/) for full pipeline reproducibility (please only use [`Conda`](https://conda.io/miniconda.html) as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles)) iii. Download the pipeline and test it on a minimal dataset with a single command @@ -26,7 +25,7 @@ iii. Download the pipeline and test it on a minimal dataset with a single comman nextflow run nf-core/proteomicslfq -profile test, ``` -> Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile institute` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment. +> Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile ` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment. iv. Start running your own analysis! @@ -68,6 +67,11 @@ For further information or help, don't hesitate to get in touch on [Slack](https -You can cite the `nf-core` pre-print as follows: +You can cite the `nf-core` publication as follows: -> Ewels PA, Peltzer A, Fillinger S, Alneberg JA, Patel H, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. 
**nf-core: Community curated bioinformatics pipelines**. *bioRxiv*. 2019. p. 610741. [doi: 10.1101/610741](https://www.biorxiv.org/content/10.1101/610741v1). +> **The nf-core framework for community-curated bioinformatics pipelines.** +> +> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. +> +> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x). +> ReadCube: [Full Access Link](https://rdcu.be/b1GjZ) diff --git a/assets/multiqc_config.yaml b/assets/multiqc_config.yaml index 5cc6771..f47b0ad 100644 --- a/assets/multiqc_config.yaml +++ b/assets/multiqc_config.yaml @@ -3,7 +3,9 @@ report_comment: > analysis pipeline. For information about how to interpret these results, please see the documentation. report_section_order: - nf-core/proteomicslfq-software-versions: + software_versions: order: -1000 + nf-core-proteomicslfq-summary: + order: -1001 export_plots: true diff --git a/bin/markdown_to_html.py b/bin/markdown_to_html.py new file mode 100755 index 0000000..57cc426 --- /dev/null +++ b/bin/markdown_to_html.py @@ -0,0 +1,100 @@ +#!/usr/bin/env python +from __future__ import print_function +import argparse +import markdown +import os +import sys + +def convert_markdown(in_fn): + input_md = open(in_fn, mode="r", encoding="utf-8").read() + html = markdown.markdown( + "[TOC]\n" + input_md, + extensions = [ + 'pymdownx.extra', + 'pymdownx.b64', + 'pymdownx.highlight', + 'pymdownx.emoji', + 'pymdownx.tilde', + 'toc' + ], + extension_configs = { + 'pymdownx.b64': { + 'base_path': os.path.dirname(in_fn) + }, + 'pymdownx.highlight': { + 'noclasses': True + }, + 'toc': { + 'title': 'Table of Contents' + } + } + ) + return html + +def wrap_html(contents): + header = """ + + + + + +
+    [HTML <head>/<style> markup stripped in extraction]
+    """
+    footer = """
+    [closing </body> and </html> markup stripped in extraction]
+ + + """ + return header + contents + footer + + +def parse_args(args=None): + parser = argparse.ArgumentParser() + parser.add_argument('mdfile', type=argparse.FileType('r'), nargs='?', + help='File to convert. Defaults to stdin.') + parser.add_argument('-o', '--out', type=argparse.FileType('w'), + default=sys.stdout, + help='Output file name. Defaults to stdout.') + return parser.parse_args(args) + +def main(args=None): + args = parse_args(args) + converted_md = convert_markdown(args.mdfile.name) + html = wrap_html(converted_md) + args.out.write(html) + +if __name__ == '__main__': + sys.exit(main()) diff --git a/bin/markdown_to_html.r b/bin/markdown_to_html.r deleted file mode 100755 index abe1335..0000000 --- a/bin/markdown_to_html.r +++ /dev/null @@ -1,51 +0,0 @@ -#!/usr/bin/env Rscript - -# Command line argument processing -args = commandArgs(trailingOnly=TRUE) -if (length(args) < 2) { - stop("Usage: markdown_to_html.r ", call.=FALSE) -} -markdown_fn <- args[1] -output_fn <- args[2] - -# Load / install packages -if (!require("markdown")) { - install.packages("markdown", dependencies=TRUE, repos='http://cloud.r-project.org/') - library("markdown") -} - -base_css_fn <- getOption("markdown.HTML.stylesheet") -base_css <- readChar(base_css_fn, file.info(base_css_fn)$size) -custom_css <- paste(base_css, " -body { - padding: 3em; - margin-right: 350px; - max-width: 100%; -} -#toc { - position: fixed; - right: 20px; - width: 300px; - padding-top: 20px; - overflow: scroll; - height: calc(100% - 3em - 20px); -} -#toc_header { - font-size: 1.8em; - font-weight: bold; -} -#toc > ul { - padding-left: 0; - list-style-type: none; -} -#toc > ul ul { padding-left: 20px; } -#toc > ul > li > a { display: none; } -img { max-width: 800px; } -") - -markdownToHTML( - file = markdown_fn, - output = output_fn, - stylesheet = custom_css, - options = c('toc', 'base64_images', 'highlight_code') -) diff --git a/conf/test.config b/conf/test.config new file mode 100644 index 0000000..7aa0dde --- /dev/null +++ b/conf/test.config @@ -0,0 +1,26 @@ +/* + * ------------------------------------------------- + * Nextflow config file for running tests + * ------------------------------------------------- + * Defines bundled input files and everything required + * to run a fast and simple test. Use as follows: + * nextflow run nf-core/proteomicslfq -profile test, + */ + +params { + config_profile_name = 'Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function' + // Limit resources so that this can run on GitHub Actions + max_cpus = 2 + max_memory = 6.GB + max_time = 48.h + + // Input data + // TODO nf-core: Specify the paths to your test data on nf-core/test-datasets + // TODO nf-core: Give any required params for the test so that command line flags are not needed + single_end = false + readPaths = [ + ['Testdata', ['https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R1.tiny.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R2.tiny.fastq.gz']], + ['SRR389222', ['https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub2.fastq.gz']] + ] +} diff --git a/docs/usage.md b/docs/usage.md index 81d539a..5eaf1d9 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -94,6 +94,8 @@ Use this parameter to choose a configuration profile. 
Profiles can give configur Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Conda) - see below. +> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported. + The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important! @@ -101,15 +103,16 @@ They are loaded in sequence, so later profiles can overwrite earlier profiles. If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended. -* `conda` - * A generic configuration profile to be used with [conda](https://conda.io/docs/) - * Pulls most software from [Bioconda](https://bioconda.github.io/) * `docker` * A generic configuration profile to be used with [Docker](http://docker.com/) * Pulls software from dockerhub: [`nfcore/proteomicslfq`](http://hub.docker.com/r/nfcore/proteomicslfq/) * `singularity` * A generic configuration profile to be used with [Singularity](http://singularity.lbl.gov/) * Pulls software from DockerHub: [`nfcore/proteomicslfq`](http://hub.docker.com/r/nfcore/proteomicslfq/) +* `conda` + * Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker or Singularity. + * A generic configuration profile to be used with [Conda](https://conda.io/docs/) + * Pulls most software from [Bioconda](https://bioconda.github.io/) * `test` * A profile with a complete configuration for automated testing * Includes links to test data so needs no other parameters diff --git a/environment.yml b/environment.yml index ee31f95..a35b0d6 100644 --- a/environment.yml +++ b/environment.yml @@ -7,8 +7,9 @@ channels: - defaults dependencies: - conda-forge::python=3.7.3 + - conda-forge::markdown=3.1.1 + - conda-forge::pymdown-extensions=6.0 + - conda-forge::pygments=2.5.2 # TODO nf-core: Add required software dependencies here - bioconda::fastqc=0.11.8 - bioconda::multiqc=1.7 - - conda-forge::r-markdown=1.1 - - conda-forge::r-base=3.6.1 diff --git a/main.nf b/main.nf index a0c6d51..eed02bd 100644 --- a/main.nf +++ b/main.nf @@ -23,7 +23,7 @@ def helpMessage() { Mandatory arguments: --reads [file] Path to input data (must be surrounded with quotes) -profile [str] Configuration profile to use. Can use multiple (comma separated) - Available: conda, docker, singularity, test, awsbatch and more + Available: conda, docker, singularity, test, awsbatch, and more Options: --genome [str] Name of iGenomes reference @@ -90,7 +90,8 @@ if (workflow.profile.contains('awsbatch')) { } // Stage config files -ch_multiqc_config = file(params.multiqc_config, checkIfExists: true) +ch_multiqc_config = file("$baseDir/assets/multiqc_config.yaml", checkIfExists: true) +ch_multiqc_custom_config = params.multiqc_config ? 
Channel.fromPath(params.multiqc_config, checkIfExists: true) : Channel.empty() ch_output_docs = file("$baseDir/docs/output.md", checkIfExists: true) /* @@ -153,9 +154,10 @@ log.info "-\033[2m--------------------------------------------------\033[0m-" // Check the hostnames against configured profiles checkHostname() -def create_workflow_summary(summary) { - def yaml_file = workDir.resolve('workflow_summary_mqc.yaml') - yaml_file.text = """ +Channel.from(summary.collect{ [it.key, it.value] }) + .map { k,v -> "
<dt>$k</dt><dd><samp>${v ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>" }
+    .reduce { a, b -> return [a, b].join("\n            ") }
+    .map { x -> """
    id: 'nf-core-proteomicslfq-summary'
    description: " - this information is collected when the pipeline is started."
    section_name: 'nf-core/proteomicslfq Workflow Summary'
    section_href: 'https://github.com/nf-core/proteomicslfq'
    plot_type: 'html'
    data: |
        <dl class=\"dl-horizontal\">
-${summary.collect { k,v -> "<dt>$k</dt><dd><samp>${v ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>" }.join("\n")}
+            $x
        </dl>
- """.stripIndent() - - return yaml_file -} + """.stripIndent() } + .set { ch_workflow_summary } /* * Parse software version numbers @@ -225,11 +225,12 @@ process multiqc { publishDir "${params.outdir}/MultiQC", mode: 'copy' input: - file multiqc_config from ch_multiqc_config + file (multiqc_config) from ch_multiqc_config + file (mqc_custom_config) from ch_multiqc_custom_config.collect().ifEmpty([]) // TODO nf-core: Add in log files from your new processes for MultiQC to find! file ('fastqc/*') from ch_fastqc_results.collect().ifEmpty([]) file ('software_versions/*') from ch_software_versions_yaml.collect() - file workflow_summary from create_workflow_summary(summary) + file workflow_summary from ch_workflow_summary.collectFile(name: "workflow_summary_mqc.yaml") output: file "*multiqc_report.html" into ch_multiqc_report @@ -239,9 +240,10 @@ process multiqc { script: rtitle = custom_runName ? "--title \"$custom_runName\"" : '' rfilename = custom_runName ? "--filename " + custom_runName.replaceAll('\\W','_').replaceAll('_+','_') + "_multiqc_report" : '' + custom_config_file = params.multiqc_config ? "--config $mqc_custom_config" : '' // TODO nf-core: Specify which MultiQC modules to use with -m for a faster run time """ - multiqc -f $rtitle $rfilename --config $multiqc_config . + multiqc -f $rtitle $rfilename $custom_config_file . """ } @@ -259,7 +261,7 @@ process output_documentation { script: """ - markdown_to_html.r $output_docs results_description.html + markdown_to_html.py $output_docs -o results_description.html """ } diff --git a/nextflow.config b/nextflow.config index 83c89c0..7d918af 100644 --- a/nextflow.config +++ b/nextflow.config @@ -17,7 +17,7 @@ params { // Boilerplate options name = false - multiqc_config = "$baseDir/assets/multiqc_config.yaml" + multiqc_config = false email = false email_on_fail = false max_multiqc_email_size = 25.MB From 999f1613b5fd99ed2b19cc1aa7c8ab3c87d7f208 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 21 Feb 2020 11:17:14 +0100 Subject: [PATCH 054/374] Update ci.yml --- .github/workflows/ci.yml | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 03c888b..0bea96e 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -20,9 +20,10 @@ jobs: wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ - name: Pull docker image + # Needs to be changed to nfcore/proteomicslfq:dev or a bioconda/biocontainer run: | - docker pull nfcore/proteomicslfq:dev - docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev + docker pull openms/executables:release-2.5.0:dev + docker tag openms/executables:release-2.5.0:dev openms/executables:release-2.5.0:dev - name: Run pipeline with test data run: | # TODO nf-core: You can customise CI pipeline run tests as required From 1c25ad893fcbd801b83ac1736c211d23911aac67 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 21 Feb 2020 11:19:21 +0100 Subject: [PATCH 055/374] [FIX] their markdown errors --- docs/configuration/adding_your_own.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/docs/configuration/adding_your_own.md b/docs/configuration/adding_your_own.md index 4aaf959..25224c6 100644 --- a/docs/configuration/adding_your_own.md +++ b/docs/configuration/adding_your_own.md @@ -29,7 +29,6 @@ process { } ``` - ## Software Requirements To run the pipeline, several software packages are required. How you satisfy these requirements is essentially up to you and depends on your system. 
If possible, we _highly_ recommend using either Docker or Singularity. @@ -51,7 +50,6 @@ process.container = "nfcore/proteomicslfq" Note that the dockerhub organisation name annoyingly can't have a hyphen, so is `nfcore` and not `nf-core`. - ### Singularity image Many HPC environments are not able to run Docker due to security issues. @@ -80,7 +78,6 @@ singularity.enabled = true process.container = "/path/to/nf-core-proteomicslfq.simg" ``` - ### Conda If you're not able to use Docker or Singularity, you can instead use conda to manage the software requirements. From e8868aff82a3c83a49366cb50e73d989e8d0cd9c Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 21 Feb 2020 11:23:14 +0100 Subject: [PATCH 056/374] Markdown lint --- docs/installation.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/installation.md b/docs/installation.md index 8c2e829..cb02f19 100644 --- a/docs/installation.md +++ b/docs/installation.md @@ -63,6 +63,7 @@ If you would like to make changes to the pipeline, it's best to make a fork on G ## Pipeline configuration + By default, the pipeline loads a basic server configuration [`conf/base.config`](../conf/base.config) This uses a number of sensible defaults for process requirements and is suitable for running on a simple (if powerful!) local server. @@ -76,11 +77,13 @@ Be warned of two important points about this default configuration: * It's expected to use an additional config profile for docker, singularity or conda support. See below. ### Docker + First, install docker on your system: [Docker Installation Instructions](https://docs.docker.com/engine/installation/) Then, running the pipeline with the option `-profile docker` tells Nextflow to enable Docker for this run. An image containing all of the software requirements will be automatically fetched and used from dockerhub (https://hub.docker.com/r/nfcore/proteomicslfq). ### Singularity + If you're not able to use Docker then [Singularity](http://singularity.lbl.gov/) is a great alternative. The process is very similar: running the pipeline with the option `-profile singularity` tells Nextflow to enable singularity for this run. An image containing all of the software requirements will be automatically fetched and used from singularity hub. @@ -99,6 +102,7 @@ nextflow run /path/to/nf-core-proteomicslfq -with-singularity nf-core-proteomics Remember to pull updated versions of the singularity image if you update the pipeline. ### Conda + If you're not able to use Docker _or_ Singularity, you can instead use conda to manage the software requirements. This is slower and less reproducible than the above, but is still better than having to install all requirements yourself! The pipeline ships with a conda environment file and nextflow has built-in support for this. From e9215296f13f7c76935beb5e39c2464f0ba1145e Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 21 Feb 2020 11:27:23 +0100 Subject: [PATCH 057/374] Markdown lint --- docs/installation.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/installation.md b/docs/installation.md index cb02f19..0a48eca 100644 --- a/docs/installation.md +++ b/docs/installation.md @@ -18,6 +18,7 @@ To start using the nf-core/proteomicslfq pipeline, follow the steps below: ## Install NextFlow + Nextflow runs on most POSIX systems (Linux, Mac OSX etc). 
It can be installed by running the following commands: ```bash @@ -38,9 +39,11 @@ See [nextflow.io](https://www.nextflow.io/) for further instructions on how to i ## Install the pipeline ### Automatic + This pipeline itself needs no installation - NextFlow will automatically fetch it from GitHub if `nf-core/proteomicslfq` is specified as the pipeline name. ### Offline + The above method requires an internet connection so that Nextflow can download the pipeline files. If you're running on a system that has no internet connection, you'll need to download and transfer the pipeline files manually: ```bash @@ -61,7 +64,6 @@ export NXF_OFFLINE='TRUE' If you would like to make changes to the pipeline, it's best to make a fork on GitHub and then clone the files. Once cloned you can run the pipeline directly as above. - ## Pipeline configuration By default, the pipeline loads a basic server configuration [`conf/base.config`](../conf/base.config) From c05b4978dc387127aa1130761c9927055f877fb5 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 21 Feb 2020 11:28:59 +0100 Subject: [PATCH 058/374] Markdown lint --- docs/troubleshooting.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index b534072..9fd4da0 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -19,7 +19,6 @@ ERROR ~ Cannot find any reads matching: *{1,2}.fastq.gz Note that if your sample name is "messy" then you have to be very particular with your glob specification. A file name like `L1-1-D-2h_S1_L002_R1_001.fastq.gz` can be difficult enough for a human to read. Specifying `*{1,2}*.gz` wont work give you what you want Whilst `*{R1,R2}*.gz` will. - ## Data organization The pipeline can't take a list of multiple input files - it takes a glob expression. If your input files are scattered in different paths then we recommend that you generate a directory with symlinked files. If running in paired end mode please make sure that your files are sensibly named so that they can be properly paired. See the previous point. From e3ebc25db262909678725f3803de137f663e7067 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 21 Feb 2020 11:33:10 +0100 Subject: [PATCH 059/374] Delete reference_genomes.md --- docs/configuration/reference_genomes.md | 52 ------------------------- 1 file changed, 52 deletions(-) delete mode 100644 docs/configuration/reference_genomes.md diff --git a/docs/configuration/reference_genomes.md b/docs/configuration/reference_genomes.md deleted file mode 100644 index 06e3c0d..0000000 --- a/docs/configuration/reference_genomes.md +++ /dev/null @@ -1,52 +0,0 @@ -# nf-core/proteomicslfq: Reference Genomes Configuration - -The nf-core/proteomicslfq pipeline needs a reference genome for alignment and annotation. - -These paths can be supplied on the command line at run time (see the [usage docs](../usage.md)), -but for convenience it's often better to save these paths in a nextflow config file. -See below for instructions on how to do this. -Read [Adding your own system](adding_your_own.md) to find out how to set up custom config files. - -## Adding paths to a config file - -Specifying long paths every time you run the pipeline is a pain. -To make this easier, the pipeline comes configured to understand reference genome keywords which correspond to preconfigured paths, meaning that you can just specify `--genome ID` when running the pipeline. - -Note that this genome key can also be specified in a config file if you always use the same genome. 
- -To use this system, add paths to your config file using the following template: - -```nextflow -params { - genomes { - 'YOUR-ID' { - fasta = '/genome.fa' - } - 'OTHER-GENOME' { - // [..] - } - } - // Optional - default genome. Ignored if --genome 'OTHER-GENOME' specified on command line - genome = 'YOUR-ID' -} -``` - -You can add as many genomes as you like as long as they have unique IDs. - -## illumina iGenomes - -To make the use of reference genomes easier, illumina has developed a centralised resource called [iGenomes](https://support.illumina.com/sequencing/sequencing_software/igenome.html). -Multiple reference index types are held together with consistent structure for multiple genomes. - -We have put a copy of iGenomes up onto AWS S3 hosting and this pipeline is configured to use this by default. -The hosting fees for AWS iGenomes are currently kindly funded by a grant from Amazon. -The pipeline will automatically download the required reference files when you run the pipeline. -For more information about the AWS iGenomes, see https://ewels.github.io/AWS-iGenomes/ - -Downloading the files takes time and bandwidth, so we recommend making a local copy of the iGenomes resource. -Once downloaded, you can customise the variable `params.igenomes_base` in your custom configuration file to point to the reference location. -For example: - -```nextflow -params.igenomes_base = '/path/to/data/igenomes/' -``` From d6f372fdf2e65296cffed3173da75fe857107ec2 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 21 Feb 2020 11:35:24 +0100 Subject: [PATCH 060/374] Markdown lint --- docs/installation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/installation.md b/docs/installation.md index 0a48eca..91c049d 100644 --- a/docs/installation.md +++ b/docs/installation.md @@ -82,7 +82,7 @@ Be warned of two important points about this default configuration: First, install docker on your system: [Docker Installation Instructions](https://docs.docker.com/engine/installation/) -Then, running the pipeline with the option `-profile docker` tells Nextflow to enable Docker for this run. An image containing all of the software requirements will be automatically fetched and used from dockerhub (https://hub.docker.com/r/nfcore/proteomicslfq). +Then, running the pipeline with the option `-profile docker` tells Nextflow to enable Docker for this run. An image containing all of the software requirements will be automatically fetched and used from [Docker Hub](https://hub.docker.com/r/nfcore/proteomicslfq). 
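For illustration, the Docker setup described above comes down to a single flag on the command line. A minimal sketch (the `--spectra`/`--database` values are placeholders; both parameters are described in `docs/usage.md`):

```bash
# -profile docker makes Nextflow fetch nfcore/proteomicslfq from Docker Hub
# and run every process inside that container.
nextflow run nf-core/proteomicslfq -profile docker \
    --spectra '/data/*.mzML' \
    --database '/data/proteome.fasta'
```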
### Singularity From 710f9e31fc713d3767f75f573eba35a70323c372 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 21 Feb 2020 11:36:13 +0100 Subject: [PATCH 061/374] docker file --- .github/workflows/ci.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 0bea96e..3055e33 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -22,8 +22,8 @@ jobs: - name: Pull docker image # Needs to be changed to nfcore/proteomicslfq:dev or a bioconda/biocontainer run: | - docker pull openms/executables:release-2.5.0:dev - docker tag openms/executables:release-2.5.0:dev openms/executables:release-2.5.0:dev + docker pull openms/executables:release-2.5.0 + docker tag openms/executables:release-2.5.0 openms/executables:release-2.5.0 - name: Run pipeline with test data run: | # TODO nf-core: You can customise CI pipeline run tests as required From 2b3587841101eae72537b366ed8d7bbfa7e373a7 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 21 Feb 2020 11:40:31 +0100 Subject: [PATCH 062/374] Markdown lint --- docs/usage.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/docs/usage.md b/docs/usage.md index f2974e4..a418cf8 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -71,6 +71,7 @@ NXF_OPTS='-Xms1g -Xmx4g' ## Running the pipeline + The typical command for running the pipeline is as follows: ```bash @@ -89,6 +90,7 @@ results # Finished results (configurable, see below) ``` ### Updating the pipeline + When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: ```bash @@ -96,6 +98,7 @@ nextflow pull nf-core/proteomicslfq ``` ### Reproducibility + It's a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. First, go to the [nf-core/proteomicslfq releases page](https://github.com/nf-core/proteomicslfq/releases) and find the latest version number - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`. @@ -132,6 +135,7 @@ If you prefer, you can specify the full path to your fasta input protein databas ``` ### `-profile` + Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. Note that multiple profiles can be loaded, for example: `-profile docker` - the order of arguments is important! If `-profile` is not specified at all the pipeline will be run locally and expects all software to be installed and available on the `PATH`. @@ -259,10 +263,13 @@ MSGFPlus: Maximum number of modifications per peptide. If this value is large, t Note that you can use the same configuration setup to save sets of reference files for your own use, even if they are not part of the iGenomes resource. See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for instructions on where to save such a file. 
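Taken together, the update and reproducibility advice above amounts to two commands. A sketch, assuming a published release tag such as `1.0.0` exists on the releases page:

```bash
# Refresh the locally cached copy of the pipeline...
nextflow pull nf-core/proteomicslfq
# ...then pin an exact release so the run can be reproduced later.
nextflow run nf-core/proteomicslfq -r 1.0.0 -profile docker
```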
## Job resources + ### Automatic resubmission + Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with an error code of `143` (exceeded requested resources) it will automatically resubmit with higher requests (2 x original, then 3 x original). If it still fails after three times then the pipeline is stopped. ### Custom resource requests + Wherever process-specific requirements are set in the pipeline, the default value can be changed by creating a custom config file. See the files hosted at [`nf-core/configs`](https://github.com/nf-core/configs/tree/master/conf) for examples. If you are likely to be running `nf-core` pipelines regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this please can you test that the config file works with your pipeline of choice using the `-c` parameter (see definition below). You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile. @@ -300,6 +307,7 @@ The output directory where the results will be saved. Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run. ### `--email_on_fail` + This works exactly as with `--email`, except emails are only sent if the workflow is not successful. ### `-name` @@ -311,6 +319,7 @@ This is used in the MultiQC report (if not default) and in the summary HTML / e- **NB:** Single hyphen (core Nextflow option) ### `-resume` + Specify this when restarting a pipeline. Nextflow will used cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously. You can also supply a run name to resume a specific run: `-resume [run-name]`. Use the `nextflow log` command to show previous run names. @@ -318,6 +327,7 @@ You can also supply a run name to resume a specific run: `-resume [run-name]`. U **NB:** Single hyphen (core Nextflow option) ### `-c` + Specify the path to a specific config file (this is a core NextFlow command). **NB:** Single hyphen (core Nextflow option) From 0d850aff5d7b8fe17095eaba7dfafa67c0ce6ffa Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 21 Feb 2020 11:42:28 +0100 Subject: [PATCH 063/374] Markdown lint --- docs/troubleshooting.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index 9fd4da0..9975dd5 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -20,9 +20,11 @@ ERROR ~ Cannot find any reads matching: *{1,2}.fastq.gz Note that if your sample name is "messy" then you have to be very particular with your glob specification. A file name like `L1-1-D-2h_S1_L002_R1_001.fastq.gz` can be difficult enough for a human to read. Specifying `*{1,2}*.gz` wont work give you what you want Whilst `*{R1,R2}*.gz` will. ## Data organization + The pipeline can't take a list of multiple input files - it takes a glob expression. 
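A sketch of a well-formed input glob for this pipeline (the path is a placeholder; the single quotes stop the shell from expanding the pattern so that Nextflow receives it intact):

```bash
# One quoted glob expression, not a list of files.
nextflow run nf-core/proteomicslfq -profile docker --spectra '/data/runs/*.mzML'
```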
If your input files are scattered in different paths then we recommend that you generate a directory with symlinked files. If running in paired end mode please make sure that your files are sensibly named so that they can be properly paired. See the previous point. ## Extra resources and getting help + If you still have an issue with running the pipeline then feel free to contact us. Have a look at the [pipeline website](https://github.com/nf-core/proteomicslfq) to find out how. From 7c6c7116bc0fdd3649b10c00e37afaf1a9bb7164 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 21 Feb 2020 11:48:21 +0100 Subject: [PATCH 064/374] Markdown lint --- docs/usage.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index a418cf8..6ae5c46 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -105,7 +105,6 @@ First, go to the [nf-core/proteomicslfq releases page](https://github.com/nf-cor This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. - ## Main arguments ### `--spectra` @@ -122,7 +121,6 @@ The pipeline also dynamically loads configurations from [https://github.com/nf-c Please note the following requirements: - 1. The path must be enclosed in quotes 2. The path must have at least one `*` wildcard character @@ -214,7 +212,7 @@ Percolator provides the possibility to use so called description of correct feat 4 retention time -8 delta_retention_time*delta_mass_calibration +8 delta_retention_time\*delta_mass_calibration ### `--isotope_error_range` @@ -344,6 +342,7 @@ Provide git commit id for custom Institutional configs hosted at `nf-core/config ``` ### `--custom_config_base` + If you're running offline, nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell nextflow where to find them with the @@ -364,14 +363,17 @@ nextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs > files + singularity containers + institutional configs in one go for you, to make this process easier. ### `--max_memory` + Use to set a top-limit for the default memory requirement for each process. Should be a string in the format integer-unit. eg. `--max_memory '8.GB'` ### `--max_time` + Use to set a top-limit for the default time requirement for each process. Should be a string in the format integer-unit. eg. `--max_time '2.h'` ### `--max_cpus` + Use to set a top-limit for the default CPU requirement for each process. Should be a string in the format integer-unit. eg. `--max_cpus 1` From c53945b1713de8ad0743049abb4158b7ae623441 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 21 Feb 2020 11:50:17 +0100 Subject: [PATCH 065/374] trailing spaces --- docs/usage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/usage.md b/docs/usage.md index 6ae5c46..2d265b9 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -192,7 +192,7 @@ Percolator: False discovery rate threshold to define positive examples in traini ### `--test_FDR` -Percolator: False discovery rate threshold for evaluating best cross validation result and reported end result. +Percolator: False discovery rate threshold for evaluating best cross validation result and reported end result. 
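To make the two Percolator thresholds above concrete, a sketch with both tightened to 1% (illustrative values, not recommendations; it assumes the `--train_FDR` parameter that the training-FDR description above belongs to, alongside `--test_FDR`):

```bash
# Positive examples defined at 1% FDR during Percolator training, and the
# best cross-validation result evaluated and reported at 1% FDR as well.
nextflow run nf-core/proteomicslfq -profile docker \
    --train_FDR 0.01 \
    --test_FDR 0.01
```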
### `--FDR_level` From bd007990ea853a98f6641afb9ed3815f8210acc4 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 21 Feb 2020 11:52:18 +0100 Subject: [PATCH 066/374] Markdown lint --- docs/configuration/local.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/configuration/local.md b/docs/configuration/local.md index 1faa73d..a9bcea1 100644 --- a/docs/configuration/local.md +++ b/docs/configuration/local.md @@ -24,7 +24,6 @@ For more information about how to work with reference genomes, see [`docs/config The public docker images are tagged with the same version numbers as the code, which you can use to ensure reproducibility. When running the pipeline, specify the pipeline version with `-r`, for example `-r 1.0`. This uses pipeline code and docker image from this tagged version. - ## Singularity image Many HPC environments are not able to run Docker due to security issues. [Singularity](http://singularity.lbl.gov/) is a tool designed to run on such HPC systems which is very similar to Docker. Even better, it can use create images directly from dockerhub. From 9cc0914347692074e075c6f734ca155cd5c9dee6 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Mon, 24 Feb 2020 12:29:11 +0100 Subject: [PATCH 067/374] Update usage.md --- docs/usage.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/usage.md b/docs/usage.md index 1bf6bc0..462d4ba 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -125,7 +125,6 @@ or The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). - Please note the following requirements: 1. 
The path must be enclosed in quotes From a09373872b3e1579f1434ff6f978f8971606780c Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 25 Feb 2020 11:34:47 +0100 Subject: [PATCH 068/374] Update environment --- bin/plotPercolatorWeights.py | 52 ++++++++++++++++++++++++++++++++++++ environment.yml | 4 +-- 2 files changed, 53 insertions(+), 3 deletions(-) create mode 100644 bin/plotPercolatorWeights.py diff --git a/bin/plotPercolatorWeights.py b/bin/plotPercolatorWeights.py new file mode 100644 index 0000000..44ece9f --- /dev/null +++ b/bin/plotPercolatorWeights.py @@ -0,0 +1,52 @@ +#!/usr/bin/python + +import csv +import os +import numpy as np +import matplotlib.pyplot as plt + +mean_weight = None + +files = [f for f in os.listdir('.') if os.path.isfile(f)] + +count = 0 + +filename = args[0] +f = open(filename, 'r') +reader = csv.reader(f, delimiter='\t') +headers = next(reader, None) +converters = [str.strip] + [float] * (len(headers) - 1) +w1 = next(reader, None) +next(reader, None) +next(reader, None) # 2nd header +w2 = next(reader, None) +next(reader, None) +next(reader, None) # 3rd header +w3 = next(reader, None) + +f1 = np.asarray([float(x) for x in w1]) +f2 = np.asarray([float(x) for x in w2]) +f3 = np.asarray([float(x) for x in w3]) + +mean_weight = (f1+f2+f3)/3.0 + +# set width of bar +barWidth = 0.25 + +# Set position of bar on X axis +r1 = np.asarray(np.arange(len(f1))) +r2 = np.asarray([x + barWidth for x in r1]) +r3 = np.asarray([x + barWidth for x in r2]) + +# Make the plot +plt.bar(r1, mean_weight, color='#7f6d5f', width=barWidth, edgecolor='white', label='var1') +#plt.bar(r2, f2, color='#557f2d', width=barWidth, edgecolor='white', label='var2') +#plt.bar(r3, f3, color='#2d7f5e', width=barWidth, edgecolor='white', label='var3') + +# Add xticks on the middle of the group bars +plt.xlabel('group', fontweight='bold') +plt.xticks([r + barWidth for r in range(len(f1))], headers, rotation=90) + +# Create legend & Show graphic +plt.legend() +plt.show() \ No newline at end of file diff --git a/environment.yml b/environment.yml index 807f9de..03bfa87 100644 --- a/environment.yml +++ b/environment.yml @@ -9,12 +9,10 @@ dependencies: # bioconda - openms-thirdparty=2.5.0 - bioconductor-msstats # will include R - # conda-forge or defaults - - python # for plotting Percolator results (TODO) - - gnuplot # for plotting IDPEP results (TODO) #TODO check if thirdparties are in path and are found by the openms adapters #TODO check if we need the rest here - conda-forge::python=3.7.3 - conda-forge::markdown=3.1.1 - conda-forge::pymdown-extensions=6.0 - conda-forge::pygments=2.5.2 + From c923763614abac14f7f766cb925cee4afaa5059e Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 25 Feb 2020 12:51:51 +0100 Subject: [PATCH 069/374] switch to plfq bioconda container --- .github/workflows/ci.yml | 4 ++-- main.nf | 3 --- nextflow.config | 2 +- 3 files changed, 3 insertions(+), 6 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 3055e33..00a05bf 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -22,8 +22,8 @@ jobs: - name: Pull docker image # Needs to be changed to nfcore/proteomicslfq:dev or a bioconda/biocontainer run: | - docker pull openms/executables:release-2.5.0 - docker tag openms/executables:release-2.5.0 openms/executables:release-2.5.0 + docker pull nfcore/proteomicslfq:dev + docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev - name: Run pipeline with test data run: | # TODO nf-core: You can customise CI pipeline 
run tests as required diff --git a/main.nf b/main.nf index e74d940..7a86de2 100644 --- a/main.nf +++ b/main.nf @@ -211,8 +211,6 @@ branched_input.mzML */ process raw_file_conversion { - container 'quay.io/biocontainers/thermorawfileparser:1.2.1--0' - input: file rawfile from branched_input.raw @@ -666,7 +664,6 @@ process proteomicslfq { // TODO the second argument can be "pairwise" or TODO later a user defined contrast string process msstats { - container 'quay.io/biocontainers/bioconductor-msstats:3.18.0--r36_0' publishDir "${params.outdir}/msstats", mode: 'copy' input: diff --git a/nextflow.config b/nextflow.config index 528c6bb..59dcfa6 100644 --- a/nextflow.config +++ b/nextflow.config @@ -94,7 +94,7 @@ params { // Container slug. Stable releases should specify release tag! // Developmental code should specify :dev -process.container = 'openms/executables:release-2.5.0' +process.container = 'nfcore/proteomicslfq:dev' // Load base.config by default for all pipelines includeConfig 'conf/base.config' From a25cea9ff3aad180b3b6f0c565e107271f890c8f Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 25 Feb 2020 12:53:11 +0100 Subject: [PATCH 070/374] switch to plfq bioconda container --- .github/workflows/ci.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 00a05bf..79caef1 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -22,8 +22,8 @@ jobs: - name: Pull docker image # Needs to be changed to nfcore/proteomicslfq:dev or a bioconda/biocontainer run: | - docker pull nfcore/proteomicslfq:dev - docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev + docker pull nfcore/proteomicslfq + docker tag nfcore/proteomicslfq nfcore/proteomicslfq:dev - name: Run pipeline with test data run: | # TODO nf-core: You can customise CI pipeline run tests as required From 353d6dc2ad96a7b4b68584cd5959270a4227d0db Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 25 Feb 2020 13:24:14 +0100 Subject: [PATCH 071/374] Update dependencies --- environment.yml | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/environment.yml b/environment.yml index 37a87a9..475a6d8 100644 --- a/environment.yml +++ b/environment.yml @@ -6,6 +6,10 @@ channels: - bioconda - defaults dependencies: + # bioconda + - bioconda::openms-thirdparty=2.5.0 + - bioconda::bioconductor-msstats # will include R + #TODO check if thirdparties are in path and are found by the openms adapters - conda-forge::python=3.7.3 - conda-forge::markdown=3.1.1 - conda-forge::pymdown-extensions=6.0 From 05c0aeb4ac4cba098af5b765dac333b5d2efe8bc Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 25 Feb 2020 13:28:14 +0100 Subject: [PATCH 072/374] docker image name --- .github/workflows/ci.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 79caef1..00a05bf 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -22,8 +22,8 @@ jobs: - name: Pull docker image # Needs to be changed to nfcore/proteomicslfq:dev or a bioconda/biocontainer run: | - docker pull nfcore/proteomicslfq - docker tag nfcore/proteomicslfq nfcore/proteomicslfq:dev + docker pull nfcore/proteomicslfq:dev + docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev - name: Run pipeline with test data run: | # TODO nf-core: You can customise CI pipeline run tests as required From baf8365e4ed5a9f1e856544a5ae6cdb3e6af47a2 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 25 
Feb 2020 13:31:51 +0100 Subject: [PATCH 073/374] pin msstats version --- environment.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/environment.yml b/environment.yml index cebc5cc..20dc575 100644 --- a/environment.yml +++ b/environment.yml @@ -8,7 +8,7 @@ channels: dependencies: # bioconda - bioconda::openms-thirdparty=2.5.0 - - bioconda::bioconductor-msstats # will include R + - bioconda::bioconductor-msstats=3.18.0--r36_0 # will include R #TODO check if thirdparties are in path and are found by the openms adapters - conda-forge::python=3.7.3 - conda-forge::markdown=3.1.1 From 398313c7ce7ab27f108de5d5a9a8134dfcb67d6d Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 25 Feb 2020 13:41:09 +0100 Subject: [PATCH 074/374] Update ci.yml --- .github/workflows/ci.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 3055e33..00a05bf 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -22,8 +22,8 @@ jobs: - name: Pull docker image # Needs to be changed to nfcore/proteomicslfq:dev or a bioconda/biocontainer run: | - docker pull openms/executables:release-2.5.0 - docker tag openms/executables:release-2.5.0 openms/executables:release-2.5.0 + docker pull nfcore/proteomicslfq:dev + docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev - name: Run pipeline with test data run: | # TODO nf-core: You can customise CI pipeline run tests as required From 95ed26f279392f2a6496c32f4f1e30fd039ea1ac Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 25 Feb 2020 13:51:08 +0100 Subject: [PATCH 075/374] Update nextflow.config --- nextflow.config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nextflow.config b/nextflow.config index 528c6bb..59dcfa6 100644 --- a/nextflow.config +++ b/nextflow.config @@ -94,7 +94,7 @@ params { // Container slug. Stable releases should specify release tag! // Developmental code should specify :dev -process.container = 'openms/executables:release-2.5.0' +process.container = 'nfcore/proteomicslfq:dev' // Load base.config by default for all pipelines includeConfig 'conf/base.config' From b599a13fd8e9bb0e04315104d0b07d2acbe61831 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 25 Feb 2020 16:36:29 +0100 Subject: [PATCH 076/374] dockerfile --- Dockerfile | 3 +++ environment.yml | 7 ++++--- main.nf | 8 ++++---- 3 files changed, 11 insertions(+), 7 deletions(-) diff --git a/Dockerfile b/Dockerfile index 2cb1395..436dedf 100644 --- a/Dockerfile +++ b/Dockerfile @@ -9,5 +9,8 @@ RUN conda env create -f /environment.yml && conda clean -a # Add conda installation dir to PATH (instead of doing 'conda activate') ENV PATH /opt/conda/envs/nf-core-proteomicslfq-1.0dev/bin:$PATH +# Add jar............. 
+RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/msgf_plus-*/MSGFPlus.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0) + # Dump the details of the installed packages to a file for posterity RUN conda env export --name nf-core-proteomicslfq-1.0dev > nf-core-proteomicslfq-1.0dev.yml diff --git a/environment.yml b/environment.yml index 20dc575..20e666e 100644 --- a/environment.yml +++ b/environment.yml @@ -8,10 +8,11 @@ channels: dependencies: # bioconda - bioconda::openms-thirdparty=2.5.0 - - bioconda::bioconductor-msstats=3.18.0--r36_0 # will include R + - bioconda::bioconductor-msstats=3.18.0 # will include R #TODO check if thirdparties are in path and are found by the openms adapters - - conda-forge::python=3.7.3 - - conda-forge::markdown=3.1.1 + - conda-forge::openjdk=8 + - conda-forge::python=3.8.1 + - conda-forge::markdown=3.2.1 - conda-forge::pymdown-extensions=6.0 - conda-forge::pygments=2.5.2 diff --git a/main.nf b/main.nf index 7a86de2..2544e2b 100644 --- a/main.nf +++ b/main.nf @@ -314,11 +314,11 @@ if (params.search_engine == "msgf") process search_engine_msgf { echo true input: - tuple file(database), file(mzml_file) from searchengine_in_db.mix(searchengine_in_db_decoy).combine(mzmls) + //tuple file(database), file(mzml_file) from searchengine_in_db.mix(searchengine_in_db_decoy).combine(mzmls) // This was another way of handling the combination - //file database from searchengine_in_db.mix(searchengine_in_db_decoy) - //each file(mzml_file) from mzmls + file database from searchengine_in_db.mix(searchengine_in_db_decoy) + each file(mzml_file) from mzmls output: @@ -326,7 +326,7 @@ if (params.search_engine == "msgf") script: """ - MSGFPlusAdapter -in ${mzml_file} \\ + MSGFPlusAdapter -in ${mzml_file} \\ -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ -database ${database} From 699737b1d12789230ab3cc4e06d9208aa2588198 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 25 Feb 2020 20:56:57 +0100 Subject: [PATCH 077/374] terminate --- main.nf | 102 +++++++++++++++++++++++++++++++++----------------------- 1 file changed, 60 insertions(+), 42 deletions(-) diff --git a/main.nf b/main.nf index 2544e2b..ddf749c 100644 --- a/main.nf +++ b/main.nf @@ -247,7 +247,7 @@ process mzml_indexing { //Mix the converted raw data with the already supplied mzMLs and push these to the same channels as before -branched_input_mzMLs.inputIndexedMzML.mix(mzmls_converted).mix(mzmls_indexed).into{mzmls; mzmls_plfq} +branched_input_mzMLs.inputIndexedMzML.mix(mzmls_converted).mix(mzmls_indexed).into{mzmls_comet; mzmls_msgf; mzmls_plfq} if (params.expdesign) @@ -260,9 +260,9 @@ if (params.expdesign) //Fill the channels with empty Channels in case that we want to add decoys. Otherwise fill with output from database. -(searchengine_in_db, pepidx_in_db, plfq_in_db) = ( params.add_decoys - ? [ Channel.empty(), Channel.empty(), Channel.empty() ] - : [ Channel.fromPath(params.database),Channel.fromPath(params.database), Channel.fromPath(params.database) ] ) +(searchengine_in_db_msgf, searchengine_in_db_comet, pepidx_in_db, plfq_in_db) = ( params.add_decoys + ? 
[ Channel.empty(), Channel.empty(), Channel.empty(), Channel.empty() ] + : [ Channel.fromPath(params.database), Channel.fromPath(params.database), Channel.fromPath(params.database), Channel.fromPath(params.database) ] ) //Add decoys if params.add_decoys is set appropriately process generate_decoy_database { @@ -271,7 +271,7 @@ process generate_decoy_database { file(mydatabase) from db_for_decoy_creation output: - file "${database.baseName}_decoy.fasta" into searchengine_in_db_decoy, pepidx_in_db_decoy, plfq_in_db_decoy + file "${database.baseName}_decoy.fasta" into searchengine_in_db_decoy_msgf, searchengine_in_db_decoy_comet, pepidx_in_db_decoy, plfq_in_db_decoy //TODO need to add these channel with .mix(searchengine_in_db_decoy) for example to all subsequent processes that need this... when: @@ -310,57 +310,75 @@ process generate_decoy_database { if (params.search_engine == "msgf") { search_engine_score = "SpecEValue" +} else { + search_engine_score = "expect" +} - process search_engine_msgf { - echo true - input: - //tuple file(database), file(mzml_file) from searchengine_in_db.mix(searchengine_in_db_decoy).combine(mzmls) - - // This was another way of handling the combination - file database from searchengine_in_db.mix(searchengine_in_db_decoy) - each file(mzml_file) from mzmls +process search_engine_msgf { + // --------------------------------------------------------------------------------------------------------------------- + // ------------- WARNING: THIS IS A HACK. IT JUST DOES NOT WORK IF THIS PROCESS IS RETRIED ----------------------------- + // --------------------------------------------------------------------------------------------------------------------- + // I actually dont know, where else this would be needed. + errorStrategy 'terminate' - output: - file "${mzml_file.baseName}.idXML" into id_files + input: + tuple file(database), file(mzml_file) from searchengine_in_db_msgf.mix(searchengine_in_db_decoy_msgf).combine(mzmls_msgf) + + // This was another way of handling the combination + //file database from searchengine_in_db.mix(searchengine_in_db_decoy) + //each file(mzml_file) from mzmls + when: + params.search_engine == "msgf" - script: - """ - MSGFPlusAdapter -in ${mzml_file} \\ - -out ${mzml_file.baseName}.idXML \\ - -threads ${task.cpus} \\ - -database ${database} - """ - } + output: + file "${mzml_file.baseName}.idXML" into id_files_msgf -} else { + script: + """ + MSGFPlusAdapter -in ${mzml_file} \\ + -out ${mzml_file.baseName}.idXML \\ + -threads ${task.cpus} \\ + -database ${database} + """ +} - search_engine_score = "expect" +process search_engine_comet { + + // --------------------------------------------------------------------------------------------------------------------- + // ------------- WARNING: THIS IS A HACK. IT JUST DOES NOT WORK IF THIS PROCESS IS RETRIED ----------------------------- + // --------------------------------------------------------------------------------------------------------------------- + // I actually dont know, where else this would be needed. 
+ errorStrategy 'terminate' + input: + tuple file(database), file(mzml_file) from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet).combine(mzmls_comet) - process search_engine_comet { - echo true - input: - file database from searchengine_in_db.mix(searchengine_in_db_decoy) - each file(mzml_file) from mzmls + //or + //file database from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet) + //each file(mzml_file) from mzmls - output: - file "${mzml_file.baseName}.idXML" into id_files + when: + params.search_engine == "comet" + + output: + file "${mzml_file.baseName}.idXML" into id_files_comet - script: - """ - CometAdapter -in ${mzml_file} \\ - -out ${mzml_file.baseName}.idXML \\ - -threads ${task.cpus} \\ - -database ${database} - """ - } + script: + """ + CometAdapter -in ${mzml_file} \\ + -out ${mzml_file.baseName}.idXML \\ + -threads ${task.cpus} \\ + -database ${database} + """ } + process index_peptides { - echo true input: - each file(id_file) from id_files + //tuple file(database), file(id_file) from id_files_msgf.mix(id_files_comet, Channel.empty()).combine(pepidx_in_db.mix(pepidx_in_db_decoy)) + + each file(id_file) from id_files_msgf.mix(id_files_comet) file database from pepidx_in_db.mix(pepidx_in_db_decoy) output: From b753a69046def1549e4a385c769d8316c95ef81f Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Wed, 26 Feb 2020 00:46:08 +0100 Subject: [PATCH 078/374] Dockerfile --- Dockerfile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Dockerfile b/Dockerfile index 2cb1395..b8ab44e 100644 --- a/Dockerfile +++ b/Dockerfile @@ -6,6 +6,8 @@ LABEL authors="Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg" COPY environment.yml / RUN conda env create -f /environment.yml && conda clean -a +RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/msgf_plus-*/MSGFPlus.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0) + # Add conda installation dir to PATH (instead of doing 'conda activate') ENV PATH /opt/conda/envs/nf-core-proteomicslfq-1.0dev/bin:$PATH From 666be6909fa3554b9ca8e8fb2c8ccc42eda6db2a Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Wed, 26 Feb 2020 11:47:44 +0100 Subject: [PATCH 079/374] Fix jdk for msgf --- environment.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/environment.yml b/environment.yml index 475a6d8..4ecb675 100644 --- a/environment.yml +++ b/environment.yml @@ -10,6 +10,7 @@ dependencies: - bioconda::openms-thirdparty=2.5.0 - bioconda::bioconductor-msstats # will include R #TODO check if thirdparties are in path and are found by the openms adapters + - conda-forge::openjdk=8.0.192 - conda-forge::python=3.7.3 - conda-forge::markdown=3.1.1 - conda-forge::pymdown-extensions=6.0 From 6125f94a7903d4e716b1e7995229b5396bb0dfd3 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 26 Feb 2020 14:38:07 +0100 Subject: [PATCH 080/374] Use only one git actions --- .github/workflows/ci.yml | 44 ++++++++++++++++++++++++++++++++------ .github/workflows/main.yml | 36 ------------------------------- 2 files changed, 38 insertions(+), 42 deletions(-) delete mode 100644 .github/workflows/main.yml diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 00a05bf..b90d7d8 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -1,7 +1,12 @@ name: nf-core CI # This workflow is triggered on pushes and PRs to the repository. 
# It runs the pipeline with the minimal test dataset to check that it completes without any syntax errors -on: [push, pull_request] +on: + push: + branches: + - master + - dev + pull_request: jobs: test: @@ -15,17 +20,44 @@ jobs: nxf_ver: ['19.10.0', ''] steps: - uses: actions/checkout@v2 + - name: Determine tower usage + shell: bash + run: echo "::set-env name=TOWER::`[ -z "$TOWER_ACCESS_TOKEN" ] && echo '' || echo '-with-tower'`" + id: tower_usage + - name: Extract branch name + if: github.event_name == 'push' + shell: bash + run: | + ref=`jq --raw-output .ref "$GITHUB_EVENT_PATH"` + ref=${ref#"/refs/heads/"} + ref=${ref#"/refs/"} + echo "::set-env name=RUN_NAME::$ref" + id: extract_branch + - name: Extract PR number + if: github.event_name == 'pull_request' + shell: bash + run: echo "::set-env name=RUN_NAME::PR`jq --raw-output .pull_request.number "$GITHUB_EVENT_PATH"`" + id: extract_pr - name: Install Nextflow run: | wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ - name: Pull docker image - # Needs to be changed to nfcore/proteomicslfq:dev or a bioconda/biocontainer run: | docker pull nfcore/proteomicslfq:dev docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev - - name: Run pipeline with test data + - name: Run pipeline with test data run: | - # TODO nf-core: You can customise CI pipeline run tests as required - # (eg. adding multiple test runs with different parameters) - nextflow run ${GITHUB_WORKSPACE} -profile test,docker + nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "wHATEVER" -profile test,docker + - uses: actions/upload-artifact@v1 + if: always() + name: Upload results + with: + name: results + path: results + - uses: actions/upload-artifact@v1 + if: always() + name: Upload log + with: + name: nextflow.log + path: .nextflow.log diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml deleted file mode 100644 index 2fbc2cb..0000000 --- a/.github/workflows/main.yml +++ /dev/null @@ -1,36 +0,0 @@ -name: nf-core proteomicslfq CI -#This workflow is triggered on pushes and PRs to the repository. 
-on: [push, pull_request] - -jobs: - github_actions_ci: - runs-on: ubuntu-latest - env: - NXF_ANSI_LOG: 0 - TOWER_ACCESS_TOKEN: ${{ secrets.TOWER_ACCESS_TOKEN }} - steps: - - uses: actions/checkout@v1 - name: Checkout sources - - name: Docker pull OpenMS image - run: docker pull openms/executables - - name: Extract branch name - shell: bash - run: echo "::set-env name=RUN_NAME::`echo ${GITHUB_REPOSITORY//\//_}`-`echo ${GITHUB_HEAD_REF//\//@} | rev | cut -f1 -d@ | rev`-${{ github.event_name }}-`echo ${GITHUB_SHA} | cut -c1-6`" - id: extract_branch - - name: Determine tower usage - shell: bash - run: echo "::set-env name=TOWER::`[ -z "$TOWER_ACCESS_TOKEN" ] && echo '' || echo '-with-tower'`" - id: tower_usage - - name: Install Nextflow - run: | - wget -qO- get.nextflow.io | bash - sudo mv nextflow /usr/local/bin/ - - name: BASIC Run the basic pipeline with the test - run: | - nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "wHATEVER" -profile test,docker - - uses: actions/upload-artifact@v1 - if: always() - name: Upload results - with: - name: results - path: results From 2b4b559541b0e572196fcf73c0560b3dd5f5f5ee Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 26 Feb 2020 15:10:02 +0100 Subject: [PATCH 081/374] Use RUN_NAME --- .github/workflows/ci.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index b90d7d8..c40d653 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -48,7 +48,7 @@ jobs: docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev - name: Run pipeline with test data run: | - nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "wHATEVER" -profile test,docker + nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "$RUN_NAME" -profile test,docker - uses: actions/upload-artifact@v1 if: always() name: Upload results From 74cb880316e526240e8077b7d5ab9865a3aaba6e Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Wed, 26 Feb 2020 17:53:32 +0100 Subject: [PATCH 082/374] fix branch name parsing --- .github/workflows/ci.yml | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index c40d653..bb90a0f 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -29,8 +29,9 @@ jobs: shell: bash run: | ref=`jq --raw-output .ref "$GITHUB_EVENT_PATH"` - ref=${ref#"/refs/heads/"} - ref=${ref#"/refs/"} + ref=${ref#"refs/heads/"} + ref=${ref#"refs/"} + ref=${ref//\//-}" echo "::set-env name=RUN_NAME::$ref" id: extract_branch - name: Extract PR number From 9e1c08f56e58e5ba5f31ace804b24eab00e033f7 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Wed, 26 Feb 2020 18:34:03 +0100 Subject: [PATCH 083/374] too many quotes --- .github/workflows/ci.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index bb90a0f..119875f 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -31,7 +31,7 @@ jobs: ref=`jq --raw-output .ref "$GITHUB_EVENT_PATH"` ref=${ref#"refs/heads/"} ref=${ref#"refs/"} - ref=${ref//\//-}" + ref=${ref//\//-} echo "::set-env name=RUN_NAME::$ref" id: extract_branch - name: Extract PR number From 09f16c72d5b9964b6ffdc7d06716888cd0cfce29 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Wed, 26 Feb 2020 18:41:20 +0100 Subject: [PATCH 084/374] readd nf tower token secret --- .github/workflows/ci.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 119875f..45d3671 100644 --- 
a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -13,6 +13,7 @@ jobs: env: NXF_VER: ${{ matrix.nxf_ver }} NXF_ANSI_LOG: false + TOWER_ACCESS_TOKEN: ${{ secrets.TOWER_ACCESS_TOKEN }} runs-on: ubuntu-latest strategy: matrix: From a769a46d7ec87f2bb9694da4ef6a5eb7b9f79754 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 2 Mar 2020 17:01:58 +0100 Subject: [PATCH 085/374] Add more logging to the results. --- .gitignore | 1 + main.nf | 95 +++++++++++++++++++++++++++++++++++++++++------------- 2 files changed, 73 insertions(+), 23 deletions(-) diff --git a/.gitignore b/.gitignore index 6354f37..334a666 100644 --- a/.gitignore +++ b/.gitignore @@ -6,3 +6,4 @@ results/ tests/ testing/ *.pyc +.dockerignore \ No newline at end of file diff --git a/main.nf b/main.nf index ddf749c..992970f 100644 --- a/main.nf +++ b/main.nf @@ -211,6 +211,8 @@ branched_input.mzML */ process raw_file_conversion { + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + input: file rawfile from branched_input.raw @@ -232,6 +234,8 @@ process raw_file_conversion { */ process mzml_indexing { + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + input: file mzmlfile from branched_input_mzMLs.nonIndexedMzML @@ -241,7 +245,7 @@ process mzml_indexing { script: """ mkdir out - FileConverter -in ${mzmlfile} -out out/${mzmlfile.baseName}.mzML + FileConverter -in ${mzmlfile} -out out/${mzmlfile.baseName}.mzML > ${mzmlfile.baseName}_fileconverter.log """ } @@ -267,6 +271,8 @@ if (params.expdesign) //Add decoys if params.add_decoys is set appropriately process generate_decoy_database { + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + input: file(mydatabase) from db_for_decoy_creation @@ -282,7 +288,8 @@ process generate_decoy_database { DecoyDatabase -in ${mydatabase} \\ -out ${mydatabase.baseName}_decoy.fasta \\ -decoy_string DECOY_ \\ - -decoy_string_position prefix + -decoy_string_position prefix \\ + > ${mydatabase.baseName}_decoy_database.log """ } @@ -316,6 +323,8 @@ if (params.search_engine == "msgf") process search_engine_msgf { + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + // --------------------------------------------------------------------------------------------------------------------- // ------------- WARNING: THIS IS A HACK. IT JUST DOES NOT WORK IF THIS PROCESS IS RETRIED ----------------------------- // --------------------------------------------------------------------------------------------------------------------- @@ -339,12 +348,15 @@ process search_engine_msgf { MSGFPlusAdapter -in ${mzml_file} \\ -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ - -database ${database} + -database ${database} \\ + > ${mzml_file.baseName}_msgf.log """ } process search_engine_comet { + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + // --------------------------------------------------------------------------------------------------------------------- // ------------- WARNING: THIS IS A HACK. 
IT JUST DOES NOT WORK IF THIS PROCESS IS RETRIED ----------------------------- // --------------------------------------------------------------------------------------------------------------------- @@ -368,13 +380,16 @@ process search_engine_comet { CometAdapter -in ${mzml_file} \\ -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ - -database ${database} + -database ${database} \\ + > ${mzml_file.baseName}_comet.log """ } process index_peptides { + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + input: //tuple file(database), file(id_file) from id_files_msgf.mix(id_files_comet, Channel.empty()).combine(pepidx_in_db.mix(pepidx_in_db_decoy)) @@ -389,7 +404,8 @@ process index_peptides { PeptideIndexer -in ${id_file} \\ -out ${id_file.baseName}_idx.idXML \\ -threads ${task.cpus} \\ - -fasta ${database} + -fasta ${database} \\ + > ${id_file.baseName}_index_peptides.log """ } @@ -399,6 +415,8 @@ process index_peptides { process extract_perc_features { + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: file id_file from id_files_idx_ForPerc @@ -413,7 +431,8 @@ process extract_perc_features { """ PSMFeatureExtractor -in ${id_file} \\ -out ${id_file.baseName}_feat.idXML \\ - -threads ${task.cpus} + -threads ${task.cpus} \\ + > ${id_file.baseName}_extract_perc_features.log """ } @@ -421,6 +440,8 @@ process extract_perc_features { //TODO parameterize and find a way to run across all runs merged process percolator { + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: file id_file from id_files_idx_feat @@ -447,13 +468,16 @@ process percolator { -threads ${task.cpus} \\ ${pptdc} \\ -subset-max-train ${params.subset_max_train} \\ - -decoy-pattern ${params.decoy_affix} + -decoy-pattern ${params.decoy_affix} \\ + > ${id_file.baseName}_percolator.log + """ } process idfilter { - - publishDir "${params.outdir}/ids", mode: 'copy' + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' input: file id_file from id_files_idx_feat_perc @@ -469,11 +493,14 @@ process idfilter { IDFilter -in ${id_file} \\ -out ${id_file.baseName}_filter.idXML \\ -threads ${task.cpus} \\ - -score:pep ${params.psm_pep_fdr_cutoff} + -score:pep ${params.psm_pep_fdr_cutoff} \\ + > ${id_file.baseName}_idfilter.log """ } process idscoreswitcher { + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: file id_file from id_files_idx_feat_perc_filter @@ -492,7 +519,8 @@ process idscoreswitcher { -old_score q-value \\ -new_score MS:1001493 \\ -new_score_orientation lower_better \\ - -new_score_type "Posterior Error Probability" + -new_score_type "Posterior Error Probability" \\ + > ${id_file.baseName}_scoreswitcher.log """ } @@ -502,6 +530,8 @@ process idscoreswitcher { // Branch b) Q-values and PEP from OpenMS process fdr { + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: file id_file from id_files_idx_ForIDPEP @@ -519,11 +549,14 @@ process fdr { -threads ${task.cpus} \\ -protein false \\ -algorithm:add_decoy_peptides \\ - -algorithm:add_decoy_proteins + -algorithm:add_decoy_proteins \\ + > ${id_file.baseName}_fdr.log """ } process idscoreswitcher1 { + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: file id_file from id_files_idx_ForIDPEP_fdr @@ -542,11 +575,14 @@ process idscoreswitcher1 { -old_score q-value \\ -new_score ${search_engine_score}_score \\ -new_score_orientation lower_better \\ - 
-new_score_type ${search_engine_score} + -new_score_type ${search_engine_score} \\ + > ${id_file.baseName}_scoreswitcher1.log """ } process idpep { + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: file id_file from id_files_idx_ForIDPEP_fdr_switch @@ -561,11 +597,14 @@ process idpep { """ IDPosteriorErrorProbability -in ${id_file} \\ -out ${id_file.baseName}_idpep.idXML \\ - -threads ${task.cpus} + -threads ${task.cpus} \\ + > ${id_file.baseName}_idpep.log """ } process idscoreswitcher2 { + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: file id_file from id_files_idx_ForIDPEP_fdr_switch_idpep @@ -583,13 +622,15 @@ process idscoreswitcher2 { -threads ${task.cpus} \\ -old_score "Posterior Error Probability" \\ -new_score q-value \\ - -new_score_orientation lower_better + -new_score_orientation lower_better \\ + > ${id_file.baseName}_scoreswitcher2.log """ } -process idfilter2 { - - publishDir "${params.outdir}/ids", mode: 'copy' +process idfilter1 { + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' input: file id_file from id_files_idx_ForIDPEP_fdr_switch_idpep_switch @@ -605,11 +646,14 @@ process idfilter2 { IDFilter -in ${id_file} \\ -out ${id_file.baseName}_filter.idXML \\ -threads ${task.cpus} \\ - -score:pep ${params.psm_pep_fdr_cutoff} + -score:pep ${params.psm_pep_fdr_cutoff} \\ + > ${id_file.baseName}_idfilter1.log """ } process idscoreswitcher3 { + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: file id_file from id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter @@ -627,7 +671,8 @@ process idscoreswitcher3 { -threads ${task.cpus} \\ -old_score q-value \\ -new_score "Posterior Error Probability" \\ - -new_score_orientation lower_better + -new_score_orientation lower_better \\ + > ${id_file.baseName}_scoreswitcher3.log """ } @@ -636,7 +681,8 @@ process idscoreswitcher3 { // Main Branch process proteomicslfq { - + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' publishDir "${params.outdir}/proteomics_lfq", mode: 'copy' input: @@ -673,7 +719,8 @@ process proteomicslfq { -out_msstats out.csv \\ -out_cxml out.consensusXML \\ -proteinFDR ${params.protein_level_fdr_cutoff} \\ - -debug ${params.inf_quant_debug} + -debug ${params.inf_quant_debug} \\ + > proteomicslfq.log """ } @@ -682,6 +729,8 @@ process proteomicslfq { // TODO the second argument can be "pairwise" or TODO later a user defined contrast string process msstats { + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' publishDir "${params.outdir}/msstats", mode: 'copy' input: @@ -693,7 +742,7 @@ process msstats { script: """ - msstats_plfq.R ${csv} || echo "Optional MSstats step failed. Please check logs and re-run or do a manual statistical analysis." + msstats_plfq.R ${csv} > msstats.log || echo "Optional MSstats step failed. Please check logs and re-run or do a manual statistical analysis." 
""" } From 203c778c38067c34e9a5129b3d7adc6c081c59a2 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 2 Mar 2020 17:27:16 +0100 Subject: [PATCH 086/374] Forgot output declarations probably --- main.nf | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/main.nf b/main.nf index 992970f..4f64cde 100644 --- a/main.nf +++ b/main.nf @@ -225,7 +225,7 @@ process raw_file_conversion { // mono ThermoRawfileParser.exe -i=${rawfile} -f=2 -o=./ script: """ - ThermoRawFileParser.sh -i=${rawfile} -f=2 -o=./ + ThermoRawFileParser.sh -i=${rawfile} -f=2 -o=./ > ${rawfile}_conversion.log """ } @@ -241,11 +241,12 @@ process mzml_indexing { output: file "out/*.mzML" into mzmls_indexed + file "*.log" script: """ mkdir out - FileConverter -in ${mzmlfile} -out out/${mzmlfile.baseName}.mzML > ${mzmlfile.baseName}_fileconverter.log + FileConverter -in ${mzmlfile} -out out/${mzmlfile.baseName}.mzML > ${mzmlfile.baseName}_mzmlindexing.log """ } @@ -278,7 +279,7 @@ process generate_decoy_database { output: file "${database.baseName}_decoy.fasta" into searchengine_in_db_decoy_msgf, searchengine_in_db_decoy_comet, pepidx_in_db_decoy, plfq_in_db_decoy - //TODO need to add these channel with .mix(searchengine_in_db_decoy) for example to all subsequent processes that need this... + file "*.log" when: params.add_decoys @@ -342,6 +343,7 @@ process search_engine_msgf { output: file "${mzml_file.baseName}.idXML" into id_files_msgf + file "*.log" script: """ @@ -374,6 +376,7 @@ process search_engine_comet { output: file "${mzml_file.baseName}.idXML" into id_files_comet + file "*.log" script: """ @@ -398,6 +401,7 @@ process index_peptides { output: file "${id_file.baseName}_idx.idXML" into id_files_idx_ForPerc, id_files_idx_ForIDPEP + file "*.log" script: """ @@ -423,6 +427,7 @@ process extract_perc_features { output: file "${id_file.baseName}_feat.idXML" into id_files_idx_feat + file "*.log" when: params.posterior_probabilities == "percolator" @@ -448,6 +453,7 @@ process percolator { output: file "${id_file.baseName}_perc.idXML" into id_files_idx_feat_perc + file "*.log" when: params.posterior_probabilities == "percolator" @@ -484,6 +490,7 @@ process idfilter { output: file "${id_file.baseName}_filter.idXML" into id_files_idx_feat_perc_filter + file "*.log" when: params.posterior_probabilities == "percolator" @@ -507,6 +514,7 @@ process idscoreswitcher { output: file "${id_file.baseName}_switched.idXML" into id_files_idx_feat_perc_fdr_filter_switched + file "*.log" when: params.posterior_probabilities == "percolator" @@ -538,6 +546,7 @@ process fdr { output: file "${id_file.baseName}_fdr.idXML" into id_files_idx_ForIDPEP_fdr + file "*.log" when: params.posterior_probabilities != "percolator" @@ -563,6 +572,7 @@ process idscoreswitcher1 { output: file "${id_file.baseName}_switched.idXML" into id_files_idx_ForIDPEP_fdr_switch + file "*.log" when: params.posterior_probabilities != "percolator" @@ -589,6 +599,7 @@ process idpep { output: file "${id_file.baseName}_idpep.idXML" into id_files_idx_ForIDPEP_fdr_switch_idpep + file "*.log" when: params.posterior_probabilities != "percolator" @@ -611,6 +622,7 @@ process idscoreswitcher2 { output: file "${id_file.baseName}_switched.idXML" into id_files_idx_ForIDPEP_fdr_switch_idpep_switch + file "*.log" when: params.posterior_probabilities != "percolator" @@ -637,6 +649,7 @@ process idfilter1 { output: file "${id_file.baseName}_filter.idXML" into id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter + file "*.log" when: 
params.posterior_probabilities != "percolator" @@ -660,6 +673,7 @@ process idscoreswitcher3 { output: file "${id_file.baseName}_switched.idXML" into id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch + file "*.log" when: params.posterior_probabilities != "percolator" @@ -703,6 +717,7 @@ process proteomicslfq { file "debug_mergedIDsGreedyResolvedFDR.idXML" optional true file "debug_mergedIDsGreedyResolvedFDRFiltered.idXML" optional true file "debug_mergedIDsFDRFilteredStrictlyUniqueResolved.idXML" optional true + file "*.log" script: """ @@ -739,6 +754,7 @@ process msstats { output: file "*.pdf" file "*.csv" + file "*.log" script: """ From 717633b746db6b3b293e2e489f85c8043bba4da5 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 3 Mar 2020 12:49:35 +0100 Subject: [PATCH 087/374] Add option for number hits in the search engines --- main.nf | 2 ++ nextflow.config | 1 + 2 files changed, 3 insertions(+) diff --git a/main.nf b/main.nf index 4f64cde..3f0c14e 100644 --- a/main.nf +++ b/main.nf @@ -351,6 +351,7 @@ process search_engine_msgf { -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ -database ${database} \\ + -matches_per_spec ${num_hits} \\ > ${mzml_file.baseName}_msgf.log """ } @@ -384,6 +385,7 @@ process search_engine_comet { -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ -database ${database} \\ + -num_hits ${num_hits} \\ > ${mzml_file.baseName}_comet.log """ } diff --git a/nextflow.config b/nextflow.config index 59dcfa6..92b8660 100644 --- a/nextflow.config +++ b/nextflow.config @@ -20,6 +20,7 @@ params { protein_inference = "aggregation" psm_pep_fdr_cutoff = 0.10 protein_level_fdr_cutoff = 0.05 + num_hits = 1 // decoys decoy_affix = "DECOY_" From 6da38ff98d314df24e254d4ebb484bf65b9407ae Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 3 Mar 2020 12:51:14 +0100 Subject: [PATCH 088/374] forgot params --- main.nf | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/main.nf b/main.nf index 3f0c14e..e1b7640 100644 --- a/main.nf +++ b/main.nf @@ -351,7 +351,7 @@ process search_engine_msgf { -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ -database ${database} \\ - -matches_per_spec ${num_hits} \\ + -matches_per_spec ${params.num_hits} \\ > ${mzml_file.baseName}_msgf.log """ } @@ -385,7 +385,7 @@ process search_engine_comet { -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ -database ${database} \\ - -num_hits ${num_hits} \\ + -num_hits ${params.num_hits} \\ > ${mzml_file.baseName}_comet.log """ } From b74da63602846d197f7118f168448a8bac90d7bd Mon Sep 17 00:00:00 2001 From: Zethson Date: Wed, 4 Mar 2020 17:20:56 +0100 Subject: [PATCH 089/374] [FEATURE] Software versions --- main.nf | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/main.nf b/main.nf index e1b7640..2e5a8ad 100644 --- a/main.nf +++ b/main.nf @@ -826,11 +826,27 @@ process get_software_versions { file 'software_versions_mqc.yaml' into software_versions_yaml script: - // TODO nf-core: Get all tools to print their version number here """ echo $workflow.manifest.version > v_pipeline.txt echo $workflow.nextflow.version > v_nextflow.txt - echo "foo" > software_versions_mqc.yaml + ThermoRawFileParser.sh --version 2>&1 v_thermorawfileparser.txt + FileConverter 2>&1 | grep Version: > v_fileconverter.txt + DecoyDatabase 2>&1 | grep Version: > v_decoydatabase.txt + MSGFPlusAdapter 2>&1 | grep Version: > v_msgfplusadapter.txt + msgf_plus 2>&1 | grep Release > v_msgfplus.txt + CometAdapter 2>&1 | grep 
Version: > v_cometadapter.txt + comet 2>&1 | grep version > v_comet.txt + PeptideIndexer 2>&1 | grep Version: > v_peptideindexer.txt + PSMFeatureExtractor 2>&1 | grep Version: > v_psmfeatureextractor.txt + PercolatorAdapter 2>&1 | grep Version: > v_percolatoradapter.txt + percolator -h 2>&1 | grep version > v_percolator.txt + IDFilter 2>&1 | grep Version: > v_idfilter.txt + IDScoreSwitcher 2>&1 | grep Version: > v_idscoreswitcher.txt + FalseDiscoveryRate 2>&1 | grep Version: > v_falsediscoveryrate.txt + IDPosteriorErrorProbability 2>&1 | grep Version: > v_idposteriorerrorprobability.txt + IDFilter 2>&1 | grep Version: > v_idfilter.txt + ProteomicsLFQ 2>&1 | grep Version: > v_proteomicslfq.txt + scrape_software_versions.py &> software_versions_mqc.yaml """ } From de69ecd470388c3bd9acfde9985317a2b7055294 Mon Sep 17 00:00:00 2001 From: Zethson Date: Wed, 4 Mar 2020 17:29:31 +0100 Subject: [PATCH 090/374] [FEATURE] msstats software version --- bin/scrape_software_versions.py | 4 ---- main.nf | 1 + 2 files changed, 1 insertion(+), 4 deletions(-) diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py index ca2b897..2d0acb3 100755 --- a/bin/scrape_software_versions.py +++ b/bin/scrape_software_versions.py @@ -7,14 +7,10 @@ regexes = { 'nf-core/proteomicslfq': ['v_pipeline.txt', r"(\S+)"], 'Nextflow': ['v_nextflow.txt', r"(\S+)"], - 'FastQC': ['v_fastqc.txt', r"FastQC v(\S+)"], - 'MultiQC': ['v_multiqc.txt', r"multiqc, version (\S+)"], } results = OrderedDict() results['nf-core/proteomicslfq'] = 'N/A' results['Nextflow'] = 'N/A' -results['FastQC'] = 'N/A' -results['MultiQC'] = 'N/A' # Search each file using its regex for k, v in regexes.items(): diff --git a/main.nf b/main.nf index 2e5a8ad..f8628dd 100644 --- a/main.nf +++ b/main.nf @@ -846,6 +846,7 @@ process get_software_versions { IDPosteriorErrorProbability 2>&1 | grep Version: > v_idposteriorerrorprobability.txt IDFilter 2>&1 | grep Version: > v_idfilter.txt ProteomicsLFQ 2>&1 | grep Version: > v_proteomicslfq.txt + ${workflow.manifest.version} &> v_msstats_plfq.txt scrape_software_versions.py &> software_versions_mqc.yaml """ } From 8f799094b728343e37feba1c63780f7640a42bd8 Mon Sep 17 00:00:00 2001 From: Zethson Date: Wed, 4 Mar 2020 18:24:52 +0100 Subject: [PATCH 091/374] [FEATURE] software versions regex party --- bin/scrape_software_versions.py | 39 ++++++++++++++++++++++++++++++++- main.nf | 2 +- 2 files changed, 39 insertions(+), 2 deletions(-) diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py index 2d0acb3..1b92da7 100755 --- a/bin/scrape_software_versions.py +++ b/bin/scrape_software_versions.py @@ -3,14 +3,51 @@ from collections import OrderedDict import re -# TODO nf-core: Add additional regexes for new tools in process get_software_versions +openms_version_regex = r"[0-9][.][0-9][.][0-9]" + regexes = { 'nf-core/proteomicslfq': ['v_pipeline.txt', r"(\S+)"], 'Nextflow': ['v_nextflow.txt', r"(\S+)"], + 'ThermorawfileParser': ['v_thermorawfileparser.txt', r"(\S+)"], + 'FileConverter': ['v_fileconverter.txt', openms_version_regex], + 'DecoyDatabase': ['v_decoydatabase.txt', openms_version_regex], + 'MSGFPlusAdapter': ['v_msgfplusadapter.txt', openms_version_regex], + 'MSGFPlus': ['v_msgfplus.txt', r"\(([^)]+)\)"], + 'CometAdapter': ['v_cometadapter.txt', openms_version_regex], + 'Comet': ['v_comet.txt', r"\"(.*)\""], + 'PeptideIndexer': ['v_peptideindexer.txt', openms_version_regex], + 'PSMFeatureExtractor': ['v_psmfeatureextractor.txt', openms_version_regex], + 'PercolatorAdapter': 
['v_percolatoradapter.txt', openms_version_regex], + 'Percolator': ['v_percolator.txt', r"[0-9].[0-9]{2}.[0-9]"], + 'IDFilter': ['v_idfilter.txt', openms_version_regex], + 'IDScoreSwitcher': ['v_idscoreswitcher.txt', openms_version_regex], + 'FalseDiscoveryRate': ['v_falsediscoveryrate.txt', openms_version_regex], + 'IDPosteriorErrorProbability': ['v_idposteriorerrorprobability.txt', openms_version_regex], + 'IDFilter': ['v_idfilter.txt', openms_version_regex], + 'ProteomicsLFQ': ['v_proteomicslfq.txt', openms_version_regex], + 'MSstats': ['v_msstats_plfq.txt', r"(\S+)"] } results = OrderedDict() results['nf-core/proteomicslfq'] = 'N/A' results['Nextflow'] = 'N/A' +results['ThermorawfileParser'] = 'N/A' +results['FileConverter'] = 'N/A' +results['DecoyDatabase'] = 'N/A' +results['MSGFPlusAdapter'] = 'N/A' +results['MSGFPlus'] = 'N/A' +results['CometAdapter'] = 'N/A' +results['Comet'] = 'N/A' +results['PeptideIndexer'] = 'N/A' +results['PSMFeatureExtractor'] = 'N/A' +results['PercolatorAdapter'] = 'N/A' +results['Percolator'] = 'N/A' +results['IDFilter'] = 'N/A' +results['IDScoreSwitcher'] = 'N/A' +results['FalseDiscoveryRate'] = 'N/A' +results['IDPosteriorErrorProbability'] = 'N/A' +results['IDFilter'] = 'N/A' +results['ProteomicsLFQ'] = 'N/A' +results['MSstats'] = 'N/A' # Search each file using its regex for k, v in regexes.items(): diff --git a/main.nf b/main.nf index f8628dd..9740499 100644 --- a/main.nf +++ b/main.nf @@ -829,7 +829,7 @@ process get_software_versions { """ echo $workflow.manifest.version > v_pipeline.txt echo $workflow.nextflow.version > v_nextflow.txt - ThermoRawFileParser.sh --version 2>&1 v_thermorawfileparser.txt + ThermoRawFileParser.sh --version > v_thermorawfileparser.txt FileConverter 2>&1 | grep Version: > v_fileconverter.txt DecoyDatabase 2>&1 | grep Version: > v_decoydatabase.txt MSGFPlusAdapter 2>&1 | grep Version: > v_msgfplusadapter.txt From 18d569776d268656c8f267bbfd5ac84ee6f8e703 Mon Sep 17 00:00:00 2001 From: Zethson Date: Thu, 5 Mar 2020 11:37:04 +0100 Subject: [PATCH 092/374] [FIX] actually getting the software versions --- bin/scrape_software_versions.py | 18 -------------- main.nf | 43 ++++++++++++++++++--------------- 2 files changed, 24 insertions(+), 37 deletions(-) diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py index 1b92da7..a474604 100755 --- a/bin/scrape_software_versions.py +++ b/bin/scrape_software_versions.py @@ -9,23 +9,6 @@ 'nf-core/proteomicslfq': ['v_pipeline.txt', r"(\S+)"], 'Nextflow': ['v_nextflow.txt', r"(\S+)"], 'ThermorawfileParser': ['v_thermorawfileparser.txt', r"(\S+)"], - 'FileConverter': ['v_fileconverter.txt', openms_version_regex], - 'DecoyDatabase': ['v_decoydatabase.txt', openms_version_regex], - 'MSGFPlusAdapter': ['v_msgfplusadapter.txt', openms_version_regex], - 'MSGFPlus': ['v_msgfplus.txt', r"\(([^)]+)\)"], - 'CometAdapter': ['v_cometadapter.txt', openms_version_regex], - 'Comet': ['v_comet.txt', r"\"(.*)\""], - 'PeptideIndexer': ['v_peptideindexer.txt', openms_version_regex], - 'PSMFeatureExtractor': ['v_psmfeatureextractor.txt', openms_version_regex], - 'PercolatorAdapter': ['v_percolatoradapter.txt', openms_version_regex], - 'Percolator': ['v_percolator.txt', r"[0-9].[0-9]{2}.[0-9]"], - 'IDFilter': ['v_idfilter.txt', openms_version_regex], - 'IDScoreSwitcher': ['v_idscoreswitcher.txt', openms_version_regex], - 'FalseDiscoveryRate': ['v_falsediscoveryrate.txt', openms_version_regex], - 'IDPosteriorErrorProbability': ['v_idposteriorerrorprobability.txt', openms_version_regex], - 
'IDFilter': ['v_idfilter.txt', openms_version_regex], - 'ProteomicsLFQ': ['v_proteomicslfq.txt', openms_version_regex], - 'MSstats': ['v_msstats_plfq.txt', r"(\S+)"] } results = OrderedDict() results['nf-core/proteomicslfq'] = 'N/A' @@ -47,7 +30,6 @@ results['IDPosteriorErrorProbability'] = 'N/A' results['IDFilter'] = 'N/A' results['ProteomicsLFQ'] = 'N/A' -results['MSstats'] = 'N/A' # Search each file using its regex for k, v in regexes.items(): diff --git a/main.nf b/main.nf index 9740499..fc606a5 100644 --- a/main.nf +++ b/main.nf @@ -821,32 +821,37 @@ Channel.from(summary.collect{ [it.key, it.value] }) * Parse software version numbers */ process get_software_versions { + publishDir "${params.outdir}/pipeline_info", mode: 'copy', + saveAs: { filename -> + if (filename.indexOf(".csv") > 0) filename + else null + } output: - file 'software_versions_mqc.yaml' into software_versions_yaml + file 'software_versions_mqc.yaml' into ch_software_versions_yaml + file "software_versions.csv" script: """ echo $workflow.manifest.version > v_pipeline.txt echo $workflow.nextflow.version > v_nextflow.txt - ThermoRawFileParser.sh --version > v_thermorawfileparser.txt - FileConverter 2>&1 | grep Version: > v_fileconverter.txt - DecoyDatabase 2>&1 | grep Version: > v_decoydatabase.txt - MSGFPlusAdapter 2>&1 | grep Version: > v_msgfplusadapter.txt - msgf_plus 2>&1 | grep Release > v_msgfplus.txt - CometAdapter 2>&1 | grep Version: > v_cometadapter.txt - comet 2>&1 | grep version > v_comet.txt - PeptideIndexer 2>&1 | grep Version: > v_peptideindexer.txt - PSMFeatureExtractor 2>&1 | grep Version: > v_psmfeatureextractor.txt - PercolatorAdapter 2>&1 | grep Version: > v_percolatoradapter.txt - percolator -h 2>&1 | grep version > v_percolator.txt - IDFilter 2>&1 | grep Version: > v_idfilter.txt - IDScoreSwitcher 2>&1 | grep Version: > v_idscoreswitcher.txt - FalseDiscoveryRate 2>&1 | grep Version: > v_falsediscoveryrate.txt - IDPosteriorErrorProbability 2>&1 | grep Version: > v_idposteriorerrorprobability.txt - IDFilter 2>&1 | grep Version: > v_idfilter.txt - ProteomicsLFQ 2>&1 | grep Version: > v_proteomicslfq.txt - ${workflow.manifest.version} &> v_msstats_plfq.txt + ThermoRawFileParser.sh --version &> v_thermorawfileparser.txt + echo \$(FileConverter 2>&1) > v_fileconverter.txt || true + echo \$(DecoyDatabase 2>&1) > v_decoydatabase.txt || true + echo \$(MSGFPlusAdapter 2>&1) > v_msgfplusadapter.txt || true + echo \$(msgf_plus 2>&1) > v_msgfplus.txt || true + echo \$(CometAdapter 2>&1) > v_cometadapter.txt || true + echo \$(comet 2>&1) > v_comet.txt || true + echo \$(PeptideIndexer 2>&1) > v_peptideindexer.txt || true + echo \$(PSMFeatureExtractor 2>&1) > v_psmfeatureextractor.txt || true + echo \$(PercolatorAdapter 2>&1) > v_percolatoradapter.txt || true + percolator -h &> v_percolator.txt + echo \$(IDFilter 2>&1) > v_idfilter.txt || true + echo \$(IDScoreSwitcher 2>&1) > v_idscoreswitcher.txt || true + echo \$(FalseDiscoveryRate 2>&1) > v_falsediscoveryrate.txt || true + echo \$(IDPosteriorErrorProbability 2>&1) > v_idposteriorerrorprobability.txt || true + echo \$(ProteomicsLFQ 2>&1) > v_proteomicslfq.txt || true + echo $workflow.manifest.version &> v_msstats_plfq.txt scrape_software_versions.py &> software_versions_mqc.yaml """ } From 5fd0f08bb26e4e2f90121c9c34a58ec9dc833dda Mon Sep 17 00:00:00 2001 From: Zethson Date: Thu, 5 Mar 2020 11:46:29 +0100 Subject: [PATCH 093/374] [FIX] openms capture regex --- bin/scrape_software_versions.py | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 
deletion(-)

diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py
index a474604..ee835b5 100755
--- a/bin/scrape_software_versions.py
+++ b/bin/scrape_software_versions.py
@@ -3,12 +3,28 @@
 from collections import OrderedDict
 import re

-openms_version_regex = r"[0-9][.][0-9][.][0-9]"
+openms_version_regex = r"([0-9][.][0-9][.][0-9])"

 regexes = {
     'nf-core/proteomicslfq': ['v_pipeline.txt', r"(\S+)"],
     'Nextflow': ['v_nextflow.txt', r"(\S+)"],
     'ThermorawfileParser': ['v_thermorawfileparser.txt', r"(\S+)"],
+    'FileConverter': ['v_fileconverter.txt', openms_version_regex],
+    'DecoyDatabase': ['v_decoydatabase.txt', openms_version_regex],
+    'MSGFPlusAdapter': ['v_msgfplusadapter.txt', openms_version_regex],
+    'MSGFPlus': ['v_msgfplus.txt', r"\(([^)]+)\)"],
+    'CometAdapter': ['v_cometadapter.txt', openms_version_regex],
+    'Comet': ['v_comet.txt', r"\"(.*)\""],
+    'PeptideIndexer': ['v_peptideindexer.txt', openms_version_regex],
+    'PSMFeatureExtractor': ['v_psmfeatureextractor.txt', openms_version_regex],
+    'PercolatorAdapter': ['v_percolatoradapter.txt', openms_version_regex],
+    'Percolator': ['v_percolator.txt', r"([0-9].[0-9]{2}.[0-9])"],
+    'IDFilter': ['v_idfilter.txt', openms_version_regex],
+    'IDScoreSwitcher': ['v_idscoreswitcher.txt', openms_version_regex],
+    'FalseDiscoveryRate': ['v_falsediscoveryrate.txt', openms_version_regex],
+    'IDPosteriorErrorProbability': ['v_idposteriorerrorprobability.txt', openms_version_regex],
+    'ProteomicsLFQ': ['v_proteomicslfq.txt', openms_version_regex],
+    'MSstats': ['v_msstats_plfq.txt', r"(\S+)"]
 }
 results = OrderedDict()
 results['nf-core/proteomicslfq'] = 'N/A'

From 2965eea78ea8ea0b5f310decf97cbab251b7aa41 Mon Sep 17 00:00:00 2001
From: Zethson
Date: Thu, 5 Mar 2020 11:56:22 +0100
Subject: [PATCH 094/374] [FIX] duplicated IDFilter version results

---
 bin/scrape_software_versions.py | 1 -
 1 file changed, 1 deletion(-)

diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py
index ee835b5..dd145d5 100755
--- a/bin/scrape_software_versions.py
+++ b/bin/scrape_software_versions.py
@@ -44,7 +44,6 @@
 results['IDScoreSwitcher'] = 'N/A'
 results['FalseDiscoveryRate'] = 'N/A'
 results['IDPosteriorErrorProbability'] = 'N/A'
-results['IDFilter'] = 'N/A'
 results['ProteomicsLFQ'] = 'N/A'

 # Search each file using its regex

From 31686c41eb002caecd55ea925e302e94f7852c48 Mon Sep 17 00:00:00 2001
From: Zethson
Date: Thu, 5 Mar 2020 14:27:52 +0100
Subject: [PATCH 095/374] [FEATURE] Some random cleanups & a few comet parameters

---
 README.md                       |  2 +-
 bin/scrape_software_versions.py |  2 +-
 conf/test.config                |  1 -
 docs/troubleshooting.md         | 10 +++----
 docs/usage.md                   | 49 +++++++++++++++++++++++----------
 nextflow.config                 |  4 ++-
 6 files changed, 44 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index 4832f9b..5c3191e 100644
--- a/README.md
+++ b/README.md
@@ -32,7 +32,7 @@ iv. Start running your own analysis!

 ```bash
-nextflow run nf-core/proteomicslfq -profile <docker/singularity/conda> --reads '*_R{1,2}.fastq.gz' --genome GRCh37
+nextflow run nf-core/proteomicslfq -profile <docker/singularity/conda> --spectra '*.mzml' --database '*.fasta' --expdesign '*.tsv'
 ```

 See [usage docs](docs/usage.md) for all of the available options when running the pipeline.
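The capture-group change in patch 093 above is easy to miss but load-bearing: the scraping script extracts the first capture group from each match (a detail of the template loop, sketched after patch 097 below), so a regex without parentheses has no group 1 to return. A minimal sketch of the difference, using a hypothetical `Version: 2.5.0` banner line for illustration:

```python
import re

banner = "Version: 2.5.0 (Jan 9 2020)"  # hypothetical OpenMS tool banner

# Before patch 093: the pattern matches, but there is no capture group.
m = re.search(r"[0-9][.][0-9][.][0-9]", banner)
print(m.group(0))   # -> 2.5.0 (whole match only)
# m.group(1) would raise IndexError: no such group

# After patch 093: group 1 holds the version string for extraction.
m = re.search(r"([0-9][.][0-9][.][0-9])", banner)
print(m.group(1))   # -> 2.5.0
```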
diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py
index dd145d5..2e7d048 100755
--- a/bin/scrape_software_versions.py
+++ b/bin/scrape_software_versions.py
@@ -12,7 +12,7 @@
     'FileConverter': ['v_fileconverter.txt', openms_version_regex],
     'DecoyDatabase': ['v_decoydatabase.txt', openms_version_regex],
     'MSGFPlusAdapter': ['v_msgfplusadapter.txt', openms_version_regex],
-    'MSGFPlus': ['v_msgfplus.txt', r"\(([^)]+)\)"],
+    'MSGFPlus': ['v_msgfplus.txt', r"\(([^)]+)\)"], # TODO this results in MSGFPlus vv2017.07.21 -> let's find a smarter regex to scrap the 'v'
     'CometAdapter': ['v_cometadapter.txt', openms_version_regex],
     'Comet': ['v_comet.txt', r"\"(.*)\""],
     'PeptideIndexer': ['v_peptideindexer.txt', openms_version_regex],
diff --git a/conf/test.config b/conf/test.config
index 0e83d4a..3c9074b 100644
--- a/conf/test.config
+++ b/conf/test.config
@@ -17,7 +17,6 @@ params {
   max_time = 48.h

   // Input data
-  // TODO nf-core: Give any required params for the test so that command line flags are not needed
   spectra = [
     'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA1_F1.mzML',
     'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA1_F2.mzML',
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
index 9975dd5..16c43b2 100644
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -7,21 +7,19 @@

If no files, only one input file, or otherwise unexpected files are picked up, then something is wrong with your input file declaration

1. The path must be enclosed in quotes (`'` or `"`)
-2. The path must have at least one `*` wildcard character. This is even if you are only running one paired end sample.
-3. When using the pipeline with paired end data, the path must use `{1,2}` or `{R1,R2}` notation to specify read pairs.
-4. If you are running Single end data make sure to specify `--singleEnd`
+2. The path must have at least one `*` wildcard character.

If the pipeline can't find your files then you will get the following error

```bash
ERROR ~ Cannot find any spectra matching: *.mzml
```

Note that if your sample name is "messy" then you have to be very particular with your glob specification. A file name like `L1-1-D-2h_S1_L002_X1_001.mzml` can be difficult enough for a human to read. Specifying `*{1,2}*.mzml` won't give you what you want, whilst `*{X1,X2}*.mzml` will.

## Data organization

The pipeline can't take a list of multiple input files - it takes a glob expression. If your input files are scattered in different paths then we recommend that you generate a directory with symlinked files.
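The quoting rule above exists because Nextflow, not the shell, must expand the `--spectra` glob. A quick way to preview what a pattern will actually match is to expand it locally first — a small standalone sketch, not part of the pipeline (the default pattern below is only an example location):

```python
#!/usr/bin/env python
import glob
import sys

# Preview a --spectra glob before launching the pipeline.
pattern = sys.argv[1] if len(sys.argv) > 1 else "testdata/*.mzML"
matches = sorted(glob.glob(pattern))
print("{} file(s) match {!r}".format(len(matches), pattern))
for path in matches:
    print("  " + path)
```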
## Extra resources and getting help diff --git a/docs/usage.md b/docs/usage.md index 462d4ba..e71bd8a 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -18,6 +18,9 @@ * [`--variable_mods`](#--variable_mods) * [`--allowed_missed cleavages`](#--allowed_missed_cleavages) * [`--psm_level_fdr_cutoff`](#--psm_level_fdr_cutoff) + * [`--`decoy_search](#--decoy_search) + * [`--mass_type_parent`](#--mass_type_parent) + * [`--mass_type_fragment`](#--mass_type_fragment) * [Protein inference](#Protein-Inference) * [`--protein_level_fdr_cutoff`](#--protein_level_fdr_cutoff) * [`--train_FDR`](#--train_FDR) @@ -68,8 +71,6 @@ It is recommended to limit the Nextflow Java virtual machines memory. We recomme NXF_OPTS='-Xms1g -Xmx4g' ``` - - ## Running the pipeline The typical command for running the pipeline is as follows: @@ -128,7 +129,7 @@ The pipeline also dynamically loads configurations from [https://github.com/nf-c Please note the following requirements: 1. The path must be enclosed in quotes -2. The path must have at least one `*` wildcard character TODO I dont think this is true, can also be a list! check +2. The path must have at least one `*` wildcard character ### `--database` @@ -162,7 +163,7 @@ If `-profile` is not specified at all the pipeline will be run locally and expec ### `--precursor_mass_tolerance` -Specify the precursor mass tolerance used for the comet database search. For High-Resolution instruments a precursor mass tolerance value of 5ppm is recommended. (eg. 5) +Comet: Precursor mass tolerance used for database search. For High-Resolution instruments a precursor mass tolerance value of 5ppm is recommended. (eg. 5) (Comet parameter '-peptide_mass_tolerance') ### `--enzyme` @@ -212,13 +213,13 @@ Percolator: Retention time features are calculated as in Klammer et al. instead Percolator provides the possibility to use so called description of correct features, i.e. features for which desirable values are learnt from the previously identified target PSMs. The absolute value of the difference between desired value and observed value is the used as predictive features. -1 iso-electric point +1 -> iso-electric point -2 mass calibration +2 -> mass calibration -4 retention time +4 -> retention time -8 delta_retention_time\*delta_mass_calibration +8 -> delta_retention_time\*delta_mass_calibration ### `--isotope_error_range` @@ -264,6 +265,32 @@ MSGFPLus: Number of matches per spectrum to be reported (MS-GF+ parameter '-n') MSGFPlus: Maximum number of modifications per peptide. If this value is large, the search may take very long. +### `--decoy_search` + +Comet: Decoy search mode. (Comet parameter '-decoy_search') + +0 -> No decoy search. + +1 -> Concatenated decoy search. Target and decoy entries will be scored against each other and a single result is returned for each spectrum query. + +2 -> Separate decoy search. Target and decoy entries will be scored separately and separate target and decoy search results will be reported. + +### `--mass_type_parent` + +Comet: Controls the mass type, average or monoisotopic, applied to peptide mass calculations. (Comet parameter: '-mass_type_parent') + +0 -> average masses + +1 -> monoisotopic masses + +### `--mass_type_fragment` + +Comet: Controls the mass type, average or monoisotopic, applied to fragment ion calculations. 
(Comet parameter: '--mass_type_fragment') + +0 -> average masses + +1 -> monoisotopic masses + Note that you can use the same configuration setup to save sets of reference files for your own use, even if they are not part of the iGenomes resource. See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for instructions on where to save such a file. ## Job resources @@ -300,8 +327,6 @@ Please make sure to also set the `-w/--work-dir` and `--outdir` parameters to a ## Other command line parameters - - ### `--outdir` The output directory where the results will be saved. @@ -390,7 +415,3 @@ Set to receive plain-text e-mails instead of HTML formatted. ### `--monochrome_logs` Set to disable colourful command line output and live life in monochrome. - -### `--multiqc_config` - -Specify a path to a custom MultiQC configuration file. diff --git a/nextflow.config b/nextflow.config index 92b8660..604d9f7 100644 --- a/nextflow.config +++ b/nextflow.config @@ -57,7 +57,9 @@ params { // Comet flags allowed_missed_cleavages = 1 - // TODO + decoy_search = 0 + mass_type_parent = 1 + mass_type_fragment = 1 // ProteomicsLFQ flags inf_quant_debug = 0 From fdf69ced4d83ede13c5da73e4dfe24d590b1eaa4 Mon Sep 17 00:00:00 2001 From: Zethson Date: Thu, 5 Mar 2020 14:32:31 +0100 Subject: [PATCH 096/374] [FIX] markdown lint --- docs/usage.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index e71bd8a..2ea6569 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -273,7 +273,7 @@ Comet: Decoy search mode. (Comet parameter '-decoy_search') 1 -> Concatenated decoy search. Target and decoy entries will be scored against each other and a single result is returned for each spectrum query. -2 -> Separate decoy search. Target and decoy entries will be scored separately and separate target and decoy search results will be reported. +2 -> Separate decoy search. Target and decoy entries will be scored separately and separate target and decoy search results will be reported. ### `--mass_type_parent` @@ -281,7 +281,7 @@ Comet: Controls the mass type, average or monoisotopic, applied to peptide mass 0 -> average masses -1 -> monoisotopic masses +1 -> monoisotopic masses ### `--mass_type_fragment` @@ -289,7 +289,7 @@ Comet: Controls the mass type, average or monoisotopic, applied to fragment ion 0 -> average masses -1 -> monoisotopic masses +1 -> monoisotopic masses Note that you can use the same configuration setup to save sets of reference files for your own use, even if they are not part of the iGenomes resource. See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for instructions on where to save such a file. 
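A note on the `--description_correct_features` values documented in patch 095 above: they are powers of two, which suggests they are meant to be combined into one integer handed to Percolator (an assumption based on that coding; the docs do not state it explicitly). A sketch of how such flags would combine:

```python
# Percolator description-of-correct-features flags, as listed in the docs above.
ISOELECTRIC_POINT   = 1
MASS_CALIBRATION    = 2
RETENTION_TIME      = 4
DELTA_RT_TIMES_MASS = 8

# Selecting retention time plus the delta feature:
value = RETENTION_TIME | DELTA_RT_TIMES_MASS
print(value)  # -> 12, i.e. --description_correct_features 12
```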
From cf7116da196bde745b8624a93e4db70b3b075eeb Mon Sep 17 00:00:00 2001 From: Zethson Date: Fri, 6 Mar 2020 11:47:22 +0100 Subject: [PATCH 097/374] [FIX] Removed comet parameters again --- bin/scrape_software_versions.py | 2 +- docs/usage.md | 29 ----------------------------- nextflow.config | 3 --- 3 files changed, 1 insertion(+), 33 deletions(-) diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py index 2e7d048..72555ef 100755 --- a/bin/scrape_software_versions.py +++ b/bin/scrape_software_versions.py @@ -12,7 +12,7 @@ 'FileConverter': ['v_fileconverter.txt', openms_version_regex], 'DecoyDatabase': ['v_decoydatabase.txt', openms_version_regex], 'MSGFPlusAdapter': ['v_msgfplusadapter.txt', openms_version_regex], - 'MSGFPlus': ['v_msgfplus.txt', r"\(([^)]+)\)"], # TODO this results in MSGFPlus vv2017.07.21 -> let's find a smarter regex to scrap the 'v' + 'MSGFPlus': ['v_msgfplus.txt', r"\(([^v)]+)\)"], 'CometAdapter': ['v_cometadapter.txt', openms_version_regex], 'Comet': ['v_comet.txt', r"\"(.*)\""], 'PeptideIndexer': ['v_peptideindexer.txt', openms_version_regex], diff --git a/docs/usage.md b/docs/usage.md index 2ea6569..2b2502e 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -18,9 +18,6 @@ * [`--variable_mods`](#--variable_mods) * [`--allowed_missed cleavages`](#--allowed_missed_cleavages) * [`--psm_level_fdr_cutoff`](#--psm_level_fdr_cutoff) - * [`--`decoy_search](#--decoy_search) - * [`--mass_type_parent`](#--mass_type_parent) - * [`--mass_type_fragment`](#--mass_type_fragment) * [Protein inference](#Protein-Inference) * [`--protein_level_fdr_cutoff`](#--protein_level_fdr_cutoff) * [`--train_FDR`](#--train_FDR) @@ -265,32 +262,6 @@ MSGFPLus: Number of matches per spectrum to be reported (MS-GF+ parameter '-n') MSGFPlus: Maximum number of modifications per peptide. If this value is large, the search may take very long. -### `--decoy_search` - -Comet: Decoy search mode. (Comet parameter '-decoy_search') - -0 -> No decoy search. - -1 -> Concatenated decoy search. Target and decoy entries will be scored against each other and a single result is returned for each spectrum query. - -2 -> Separate decoy search. Target and decoy entries will be scored separately and separate target and decoy search results will be reported. - -### `--mass_type_parent` - -Comet: Controls the mass type, average or monoisotopic, applied to peptide mass calculations. (Comet parameter: '-mass_type_parent') - -0 -> average masses - -1 -> monoisotopic masses - -### `--mass_type_fragment` - -Comet: Controls the mass type, average or monoisotopic, applied to fragment ion calculations. (Comet parameter: '--mass_type_fragment') - -0 -> average masses - -1 -> monoisotopic masses - Note that you can use the same configuration setup to save sets of reference files for your own use, even if they are not part of the iGenomes resource. See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for instructions on where to save such a file. 
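For context on patches 091–097: every entry in `regexes` is consumed by one small loop at the bottom of `scrape_software_versions.py` that these commits leave unchanged. Roughly, it looks like the following — a sketch of the nf-core template logic of this era, not a verbatim copy; note the hard-coded `v` prefix, which is what produced the `vv2017.07.21` artifact mentioned in the TODO above:

```python
import re
from collections import OrderedDict

regexes = {
    # Two illustrative entries in the format used above: [version file, regex].
    'Comet': ['v_comet.txt', r"\"(.*)\""],
    'ProteomicsLFQ': ['v_proteomicslfq.txt', r"([0-9][.][0-9][.][0-9])"],
}
results = OrderedDict((tool, 'N/A') for tool in regexes)

# Search each version file with its tool-specific regex.
for tool, (filename, regex) in regexes.items():
    try:
        with open(filename) as handle:
            match = re.search(regex, handle.read())
        if match:
            results[tool] = "v{}".format(match.group(1))  # 'v' + first capture group
    except IOError:
        results[tool] = False
```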
## Job resources diff --git a/nextflow.config b/nextflow.config index 604d9f7..c640a85 100644 --- a/nextflow.config +++ b/nextflow.config @@ -57,9 +57,6 @@ params { // Comet flags allowed_missed_cleavages = 1 - decoy_search = 0 - mass_type_parent = 1 - mass_type_fragment = 1 // ProteomicsLFQ flags inf_quant_debug = 0 From 6e3c323994e5a7196208f61122e88e2c44a5f798 Mon Sep 17 00:00:00 2001 From: Zethson Date: Fri, 6 Mar 2020 11:48:48 +0100 Subject: [PATCH 098/374] [FEATURE] Added to do for comet parameters again --- nextflow.config | 1 + 1 file changed, 1 insertion(+) diff --git a/nextflow.config b/nextflow.config index c640a85..5dd7466 100644 --- a/nextflow.config +++ b/nextflow.config @@ -57,6 +57,7 @@ params { // Comet flags allowed_missed_cleavages = 1 + # TODO // ProteomicsLFQ flags inf_quant_debug = 0 From 70ab85ca4f47b5d78183a2698b9dbc8beee15dd1 Mon Sep 17 00:00:00 2001 From: Zethson Date: Fri, 6 Mar 2020 11:50:23 +0100 Subject: [PATCH 099/374] [FIX] wrong comment character :facepalm: --- nextflow.config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nextflow.config b/nextflow.config index 5dd7466..92b8660 100644 --- a/nextflow.config +++ b/nextflow.config @@ -57,7 +57,7 @@ params { // Comet flags allowed_missed_cleavages = 1 - # TODO + // TODO // ProteomicsLFQ flags inf_quant_debug = 0 From 2910b0e6937c94d01fa7bc8b7f4763ecacaabada Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 24 Mar 2020 21:29:02 +0100 Subject: [PATCH 100/374] Add ptxqc to environment such that container gets built --- environment.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/environment.yml b/environment.yml index f9a8a30..41310be 100644 --- a/environment.yml +++ b/environment.yml @@ -9,8 +9,8 @@ dependencies: # bioconda - bioconda::openms-thirdparty=2.5.0 - bioconda::bioconductor-msstats=3.18.0 # will include R - #TODO check if thirdparties are in path and are found by the openms adapters - - conda-forge::openjdk=8.0.192 + - conda-forge::r-ptxqc # for QC reports + - conda-forge::openjdk=8.0.192 # pin java to 8 for MSGF (otherwise it somehow chooses 11) - conda-forge::python=3.8.1 - conda-forge::markdown=3.2.1 - conda-forge::pymdown-extensions=6.0 From 4e837b0292580e0f560705feefd61119e0f3301c Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Wed, 25 Mar 2020 13:01:41 +0100 Subject: [PATCH 101/374] Test URLS?? 
--- conf/test.config | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/conf/test.config b/conf/test.config index 3c9074b..d0f9b3c 100644 --- a/conf/test.config +++ b/conf/test.config @@ -18,15 +18,15 @@ params { // Input data spectra = [ - 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA1_F1.mzML', - 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA1_F2.mzML', - 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA2_F1.mzML', - 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA2_F2.mzML', - 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA3_F1.mzML', - 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA3_F2.mzML' + 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA1_F1.mzML', + 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA1_F2.mzML', + 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA2_F1.mzML', + 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA2_F2.mzML', + 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA3_F1.mzML', + 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA3_F2.mzML' ] - database = 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/18Protein_SoCe_Tr_detergents_trace_target_decoy.fasta' - expdesign = 'https://github.com/nf-core/test-datasets/raw/proteomicslfq/testdata/BSA_design.tsv' + database = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/18Protein_SoCe_Tr_detergents_trace_target_decoy.fasta' + expdesign = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA_design.tsv' posterior_probabilities = "fit_distributions" search_engine = "msgf" protein_level_fdr_cutoff = 1.0 From 8ba348c97023e1170749ddb93b76b13bde109ad1 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 25 Mar 2020 19:45:42 +0100 Subject: [PATCH 102/374] Add ptxqc support --- main.nf | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/main.nf b/main.nf index fc606a5..c937db2 100644 --- a/main.nf +++ b/main.nf @@ -764,6 +764,26 @@ process msstats { """ } +//TODO allow user config yml (as second arg to the script +) +process ptxlfq { + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + publishDir "${params.outdir}/ptxlfq", mode: 'copy' + + input: + file mzTab from out_mzTab + + output: + file "*.html" + file "*.yml" + + script: + """ + ptxqc.R ${mzTab} > ptxqc.log" + """ +} + //--------------------------------------------------------------- // From b14be179c804b412d564f6e56a01b72fafae1cb3 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 25 Mar 2020 19:48:25 +0100 Subject: [PATCH 103/374] Fixes --- main.nf | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/main.nf b/main.nf index c937db2..0adc44d 100644 --- a/main.nf +++ b/main.nf @@ -765,7 +765,7 @@ process msstats { } //TODO allow user config yml (as second arg to the script -) + process ptxlfq { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' @@ -780,7 +780,7 @@ process ptxlfq { script: """ - ptxqc.R ${mzTab} > ptxqc.log" + ptxqc.R ${mzTab} > ptxqc.log """ } From f9e5a9c473060b3b7e18532ea7a562cba49b0c71 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 25 Mar 2020 19:48:47 +0100 Subject: [PATCH 104/374] Forgot 
script --- bin/ptxqc.R | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100755 bin/ptxqc.R diff --git a/bin/ptxqc.R b/bin/ptxqc.R new file mode 100755 index 0000000..32282db --- /dev/null +++ b/bin/ptxqc.R @@ -0,0 +1,14 @@ +#!/usr/bin/env Rscript + +args = commandArgs(trailingOnly = TRUE) +require(PTXQC) +mztab_file = args[1] +fn = PTXQC:::getReportFilenames(dirname(mztab_file), TRUE, mztab_file) + +yaml_obj = NULL +if (length(args) >= 2) +{ + if (file.exists(args[2]) yaml_obj = yaml::yaml.load_file(args[2]) +} +createReport(txt_folder = NULL, mztab_file = mztab_file, yaml_obj) + From 7fd50236fd8c96c5c8a0b7020e354e8d0cb14767 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 25 Mar 2020 22:07:17 +0100 Subject: [PATCH 105/374] more fixes and update environment to have pinned versions and additional fonts --- Dockerfile | 7 +------ bin/ptxqc.R | 11 +++++++---- environment.yml | 3 ++- main.nf | 9 ++++++--- 4 files changed, 16 insertions(+), 14 deletions(-) diff --git a/Dockerfile b/Dockerfile index ac4f158..bf1d12e 100644 --- a/Dockerfile +++ b/Dockerfile @@ -6,12 +6,7 @@ LABEL authors="Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg" COPY environment.yml / RUN conda env create -f /environment.yml && conda clean -a -RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/msgf_plus-*/MSGFPlus.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0) - -# Add conda installation dir to PATH (instead of doing 'conda activate') -ENV PATH /opt/conda/envs/nf-core-proteomicslfq-1.0dev/bin:$PATH - -# Add jar............. +# OpenMS Adapters need the raw jars of Java based bioconda tools in the PATH. Not the wrappers that conda creates. RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/msgf_plus-*/MSGFPlus.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0) # Dump the details of the installed packages to a file for posterity diff --git a/bin/ptxqc.R b/bin/ptxqc.R index 32282db..26690eb 100755 --- a/bin/ptxqc.R +++ b/bin/ptxqc.R @@ -1,14 +1,17 @@ #!/usr/bin/env Rscript +options(encoding = 'UTF-8') + args = commandArgs(trailingOnly = TRUE) require(PTXQC) mztab_file = args[1] fn = PTXQC:::getReportFilenames(dirname(mztab_file), TRUE, mztab_file) -yaml_obj = NULL -if (length(args) >= 2) -{ - if (file.exists(args[2]) yaml_obj = yaml::yaml.load_file(args[2]) +yaml_obj = list() +if (length(args) >= 2) { + if (file.exists(args[2])) { + yaml_obj = yaml::yaml.load_file(args[2]) + } } createReport(txt_folder = NULL, mztab_file = mztab_file, yaml_obj) diff --git a/environment.yml b/environment.yml index 41310be..45215eb 100644 --- a/environment.yml +++ b/environment.yml @@ -9,7 +9,8 @@ dependencies: # bioconda - bioconda::openms-thirdparty=2.5.0 - bioconda::bioconductor-msstats=3.18.0 # will include R - - conda-forge::r-ptxqc # for QC reports + - conda-forge::r-ptxqc=1.0.2 # for QC reports + - conda-forge::fonts-conda-ecosystem=1 # for the fonts in QC reports - conda-forge::openjdk=8.0.192 # pin java to 8 for MSGF (otherwise it somehow chooses 11) - conda-forge::python=3.8.1 - conda-forge::markdown=3.2.1 diff --git a/main.nf b/main.nf index 0adc44d..fb6fd6c 100644 --- a/main.nf +++ b/main.nf @@ -766,17 +766,20 @@ process msstats { //TODO allow user config yml (as second arg to the script -process ptxlfq { +process ptxqc { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' - publishDir "${params.outdir}/ptxlfq", mode: 'copy' + publishDir "${params.outdir}/ptxqc", mode: 'copy' input: file 
mzTab from out_mzTab output: file "*.html" - file "*.yml" + file "*.yaml" + file "*.Rmd" + file "*.pdf" + file "*.txt" script: """ From a36b7d692f3e66ad8863c468b018b84fe5c5e525 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 25 Mar 2020 22:11:44 +0100 Subject: [PATCH 106/374] add back the conda PATH addition. --- Dockerfile | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Dockerfile b/Dockerfile index bf1d12e..228e563 100644 --- a/Dockerfile +++ b/Dockerfile @@ -6,6 +6,9 @@ LABEL authors="Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg" COPY environment.yml / RUN conda env create -f /environment.yml && conda clean -a +# Add conda installation dir to PATH (instead of doing 'conda activate') +ENV PATH /opt/conda/envs/nf-core-proteomicslfq-1.0dev/bin:$PATH + # OpenMS Adapters need the raw jars of Java based bioconda tools in the PATH. Not the wrappers that conda creates. RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/msgf_plus-*/MSGFPlus.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0) From e0e0c3b30e8b6d45f6cf234cd8721a680d471f63 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sat, 28 Mar 2020 15:36:54 +0100 Subject: [PATCH 107/374] [FEATURE] Add aws test on PRIDE data. --- .github/workflows/awstest.yml | 31 +++++++++++++++++++++++++++++++ conf/test_full.config | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 63 insertions(+) create mode 100644 .github/workflows/awstest.yml create mode 100644 conf/test_full.config diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml new file mode 100644 index 0000000..2d59688 --- /dev/null +++ b/.github/workflows/awstest.yml @@ -0,0 +1,31 @@ +name: nf-core AWS test +# This workflow is triggered on PRs to the master branch. +# It runs the -profile 'test_full' on AWS batch + +on: + pull_request: + branches: + - master + - dev #TODO remove after testing + release: + types: [published] + +jobs: + run-awstest: + name: Run AWS test + runs-on: ubuntu-latest + steps: + - name: Setup Miniconda + uses: goanpeca/setup-miniconda@v1.0.2 + with: + auto-update-conda: true + python-version: 3.7 + - name: Install awscli + run: conda install -c conda-forge awscli + - name: Start AWS batch job + env: + AWS_ACCESS_KEY_ID: ${{secrets.AWS_KEY_ID}} + AWS_SECRET_ACCESS_KEY: ${{secrets.AWS_KEY_SECRET}} + TOWER_ACCESS_TOKEN: ${{secrets.TOWER_ACCESS_TOKEN}} + run: | + aws batch submit-job --region eu-west-1 --job-name nf-core-atacseq --job-queue 'default-8b3836e0-5eda-11ea-96e5-0a2c3f6a2a32' --job-definition nextflow --container-overrides '{"command": ["nf-core/proteomicslfq", "-r '"${GITHUB_SHA}"' -profile test_full --outdir s3://nf-core-awsmegatests/proteomicslfq/results-'"${GITHUB_SHA}"' -w s3://nf-core-awsmegatests/atacseq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' \ No newline at end of file diff --git a/conf/test_full.config b/conf/test_full.config new file mode 100644 index 0000000..a5464d7 --- /dev/null +++ b/conf/test_full.config @@ -0,0 +1,32 @@ +/* + * ------------------------------------------------- + * Nextflow config file for running tests + * ------------------------------------------------- + * Defines bundled input files and everything required + * to run a fast and simple test. 
Use as follows: + * nextflow run nf-core/proteomicslfq -profile test, + */ + +params { + config_profile_name = 'Full test profile' + config_profile_description = 'Real-world sized test dataset to check pipeline function and sanity of results' + + // Input data + spectra = [ + 'ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/12/PXD001819/UPS1_2500amol_R1.raw', + 'ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/12/PXD001819/UPS1_2500amol_R2.raw', + 'ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/12/PXD001819/UPS1_2500amol_R3.raw', + 'ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/12/PXD001819/UPS1_50000amol_R1.raw', + 'ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/12/PXD001819/UPS1_50000amol_R2.raw', + 'ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/12/PXD001819/UPS1_50000amol_R3.raw' + ] + database = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata-aws/uniprot_yeast_reviewed_isoforms_ups1_crap.fasta_td.fasta' + expdesign = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata-aws/experimental_design_short.tsv' + posterior_probabilities = "percolator" + search_engine = "comet" + psm_pep_fdr_cutoff = 0.01 + decoy_affix = "rev" + protein_inference = "bayesian" + targeted_only = "false" + post_processing_tdc = false +} From 574f6758c1d1cfcb5c57b8dc95cc9a103af77c47 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Sun, 29 Mar 2020 17:50:33 +0200 Subject: [PATCH 108/374] Apply suggestions from code review Co-Authored-By: Gisela Gabernet --- .github/workflows/awstest.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml index 2d59688..f6cce6f 100644 --- a/.github/workflows/awstest.yml +++ b/.github/workflows/awstest.yml @@ -3,7 +3,7 @@ name: nf-core AWS test # It runs the -profile 'test_full' on AWS batch on: - pull_request: + push: branches: - master - dev #TODO remove after testing @@ -28,4 +28,4 @@ jobs: AWS_SECRET_ACCESS_KEY: ${{secrets.AWS_KEY_SECRET}} TOWER_ACCESS_TOKEN: ${{secrets.TOWER_ACCESS_TOKEN}} run: | - aws batch submit-job --region eu-west-1 --job-name nf-core-atacseq --job-queue 'default-8b3836e0-5eda-11ea-96e5-0a2c3f6a2a32' --job-definition nextflow --container-overrides '{"command": ["nf-core/proteomicslfq", "-r '"${GITHUB_SHA}"' -profile test_full --outdir s3://nf-core-awsmegatests/proteomicslfq/results-'"${GITHUB_SHA}"' -w s3://nf-core-awsmegatests/atacseq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' \ No newline at end of file + aws batch submit-job --region eu-west-1 --job-name nf-core-proteomicslfq --job-queue 'default-8b3836e0-5eda-11ea-96e5-0a2c3f6a2a32' --job-definition nextflow --container-overrides '{"command": ["nf-core/proteomicslfq", "-r '"${GITHUB_SHA}"' -profile test_full --outdir s3://nf-core-awsmegatests/proteomicslfq/results-'"${GITHUB_SHA}"' -w s3://nf-core-awsmegatests/proteomicslfq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' From abf23b554a5c23d14784f0dcb8d0c4e998bccc5d Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Sun, 29 Mar 2020 17:59:21 +0200 Subject: [PATCH 109/374] register the test_full config --- nextflow.config | 1 + 1 file changed, 1 insertion(+) diff --git a/nextflow.config b/nextflow.config index 92b8660..c143e64 100644 --- a/nextflow.config +++ b/nextflow.config @@ -123,6 +123,7 @@ profiles { singularity.autoMounts = true } test { 
includeConfig 'conf/test.config' } + test_full { includeConfig 'conf/test_full.config' } } // Avoid this error: From 781ac17af95884b5b7c83ae66abdf4ae0a398637 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Sun, 29 Mar 2020 23:56:40 +0200 Subject: [PATCH 110/374] run the test --- .github/workflows/awstest.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml index f6cce6f..4bdee0f 100644 --- a/.github/workflows/awstest.yml +++ b/.github/workflows/awstest.yml @@ -6,7 +6,7 @@ on: push: branches: - master - - dev #TODO remove after testing + - feature/awstests #TODO remove after testing release: types: [published] From efa4c920d05d482ad71fcf24a67669ef9e3a8725 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 30 Mar 2020 12:22:57 +0200 Subject: [PATCH 111/374] [FIX] No error, just skip when having one condition. --- bin/msstats_plfq.R | 123 +++++++++++++++++++++++---------------------- 1 file changed, 64 insertions(+), 59 deletions(-) diff --git a/bin/msstats_plfq.R b/bin/msstats_plfq.R index d1e46a4..4587e36 100755 --- a/bin/msstats_plfq.R +++ b/bin/msstats_plfq.R @@ -1,5 +1,6 @@ #!/usr/bin/env Rscript args = commandArgs(trailingOnly=TRUE) +args[1] = "/Users/pfeuffer/Downloads/out.csv" if (length(args)==0) { stop("At least one argument must be supplied (input file).n", call.=FALSE) @@ -29,19 +30,44 @@ quant <- OpenMStoMSstatsFormat(data, processed.quant <- dataProcess(quant, censoredInt = 'NA') lvls <- levels(as.factor(data$Condition)) - -if (args[2] == "pairwise") +if (length(lvls) == 1) { - if (args[3] == "") + print("Only one condition found. No contrasts to be tested. If this is not the case, please check your experimental design.") +} else { + if (args[2] == "pairwise") { - l <- length(lvls) - contrast_mat <- matrix(nrow = l * (l-1) / 2, ncol = l) - rownames(contrast_mat) <- rep(NA, l * (l-1) / 2) - colnames(contrast_mat) <- lvls - c <- 1 - for (i in 1:(l-1)) + if (args[3] == "") { - for (j in (i+1):l) + l <- length(lvls) + contrast_mat <- matrix(nrow = l * (l-1) / 2, ncol = l) + rownames(contrast_mat) <- rep(NA, l * (l-1) / 2) + colnames(contrast_mat) <- lvls + c <- 1 + for (i in 1:(l-1)) + { + for (j in (i+1):l) + { + comparison <- rep(0,l) + comparison[i] <- -1 + comparison[j] <- 1 + contrast_mat[c,] <- comparison + rownames(contrast_mat)[c] <- paste0(lvls[i],"-",lvls[j]) + c <- c+1 + } + } + } else { + control <- which(as.character(lvls) == args[3]) + if (length(control) == 0) + { + stop("Control condition not part of found levels.n", call.=FALSE) + } + + l <- length(lvls) + contrast_mat <- matrix(nrow = l-1, ncol = l) + rownames(contrast_mat) <- rep(NA, l-1) + colnames(contrast_mat) <- lvls + c <- 1 + for (j in setdiff(1:l,control)) { comparison <- rep(0,l) comparison[i] <- -1 @@ -51,54 +77,33 @@ if (args[2] == "pairwise") c <- c+1 } } - } else { - control <- which(as.character(lvls) == args[3]) - if (length(control) == 0) - { - stop("Control condition not part of found levels.n", call.=FALSE) - } - - l <- length(lvls) - contrast_mat <- matrix(nrow = l-1, ncol = l) - rownames(contrast_mat) <- rep(NA, l-1) - colnames(contrast_mat) <- lvls - c <- 1 - for (j in setdiff(1:l,control)) - { - comparison <- rep(0,l) - comparison[i] <- -1 - comparison[j] <- 1 - contrast_mat[c,] <- comparison - rownames(contrast_mat)[c] <- paste0(lvls[i],"-",lvls[j]) - c <- c+1 - } } + + print ("Contrasts to be tested:") + print (contrast_mat) + #TODO allow for user specified contrasts + test.MSstats <- 
groupComparison(contrast.matrix=contrast_mat, data=processed.quant)
+
+  #TODO allow manual input (e.g. proteins of interest)
+  write.csv(test.MSstats$ComparisonResult, "msstats_results.csv")
+
+  groupComparisonPlots(data=test.MSstats$ComparisonResult, type="ComparisonPlot",
+                     width=12, height=12,dot.size = 2,ylimUp = 7)
+  groupComparisonPlots(data=test.MSstats$ComparisonResult, type="VolcanoPlot",
+                     width=12, height=12,dot.size = 2,ylimUp = 7)
+
+  if (nrow(contrast_mat) > 1)
+  {
+    groupComparisonPlots(data=test.MSstats$ComparisonResult, type="Heatmap",
+                       width=12, height=12,dot.size = 2,ylimUp = 7)
+  }
+
+  #for (comp in rownames(contrast_mat))
+  #{
+  #  groupComparisonPlots(data=test.MSstats$ComparisonResult, type="ComparisonPlot",
+  #                     width=12, height=12,dot.size = 2,ylimUp = 7, sig=1)#,
+  #                     which.Comparison = comp,
+  #                     address=F)
+  #  # try to plot all comparisons
+  #}
 }
-
-print ("Contrasts to be tested:")
-print (contrast_mat)
-#TODO allow for user specified contrasts
-test.MSstats <- groupComparison(contrast.matrix=contrast_mat, data=processed.quant)
-
-#TODO allow manual input (e.g. proteins of interest)
-write.csv(test.MSstats$ComparisonResult, "msstats_results.csv")
-
-groupComparisonPlots(data=test.MSstats$ComparisonResult, type="ComparisonPlot",
-                     width=12, height=12,dot.size = 2,ylimUp = 7)
-groupComparisonPlots(data=test.MSstats$ComparisonResult, type="VolcanoPlot",
-                     width=12, height=12,dot.size = 2,ylimUp = 7)
-
-if (nrow(constrast_mat) > 1)
-{
-  groupComparisonPlots(data=test.MSstats$ComparisonResult, type="Heatmap",
-                       width=12, height=12,dot.size = 2,ylimUp = 7)
-}
-
-#for (comp in rownames(contrast_mat))
-#{
-#  groupComparisonPlots(data=test.MSstats$ComparisonResult, type="ComparisonPlot",
-#                     width=12, height=12,dot.size = 2,ylimUp = 7, sig=1)#,
-#                     which.Comparison = comp,
-#                     address=F)
-#  # try to plot all comparisons
-#}

From 91822f6bd1b4382f83d4a8e32c883c09da1e84a Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Mon, 30 Mar 2020 12:25:57 +0200
Subject: [PATCH 112/374] [FIX] No error, just skip when having one condition.

---
 bin/msstats_plfq.R | 1 -
 1 file changed, 1 deletion(-)

diff --git a/bin/msstats_plfq.R b/bin/msstats_plfq.R
index 4587e36..2139567 100755
--- a/bin/msstats_plfq.R
+++ b/bin/msstats_plfq.R
@@ -1,6 +1,5 @@
 #!/usr/bin/env Rscript
 args = commandArgs(trailingOnly=TRUE)
-args[1] = "/Users/pfeuffer/Downloads/out.csv"

 if (length(args)==0) {
   stop("At least one argument must be supplied (input file).n", call.=FALSE)

From f7b370df459d8b446de5e2e0f7b5603cefb528d1 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Mon, 30 Mar 2020 16:14:15 +0200
Subject: [PATCH 113/374] Add some more parameter stubs, plus an implemented one for skipping msstats.

---
 main.nf         | 15 ++++++++++++++-
 nextflow.config |  8 ++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/main.nf b/main.nf
index fb6fd6c..c785e85 100644
--- a/main.nf
+++ b/main.nf
@@ -30,7 +30,7 @@ def helpMessage() {
     Database Search:
       --search_engine              Which search engine: "comet" (default) or "msgf"
-      --enzyme                     Enzymatic cleavage ('unspecific cleavage', 'Trypsin', see OpenMS enzymes)
+      --enzyme                     Enzymatic cleavage (e.g. 
'unspecific cleavage' or 'Trypsin' [default], see OpenMS enzymes)
       --num_enzyme_termini         Specify the termini where the cleavage rule has to match (default:
                                    'fully' valid: 'semi', 'fully', 'C-term unspecific', 'N-term unspecific')
       --num_hits                   Number of peptide hits per spectrum (PSMs) in output file (default: '1')
@@ -110,6 +110,16 @@ def helpMessage() {
                                     "strictly_unique_peptides" = use peptides mapping to a unique single protein only
                                     "shared_peptides" = use shared peptides only for its best group (by inference score)

+    Statistical post-processing:
+      --skip_post_msstats          Skip MSstats for statistical post-processing?
+      --ref_condition              Instead of all pairwise contrasts, uses the given condition number (corresponding to your experimental design) as a reference and
+                                   creates pairwise contrasts against it (TODO fully implement)
+      --contrasts                  Specify a set of contrasts in a semicolon separated list of R-compatible contrasts with the
+                                   condition numbers as variables (e.g. "1-2;1-3;2-3"). Overwrites "--ref_condition" (TODO fully implement)
+
+    Quality control:
+      --ptxqc_report_layout        Specify a yaml file for the report layout (see PTXQC documentation) (TODO fully implement)
+
     General Options:
       --expdesign                  Path to experimental design file (if not given, it assumes unfractionated, unrelated samples)

diff --git a/nextflow.config b/nextflow.config
index c143e64..8350a10 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -67,6 +67,14 @@ params {
   mass_recalibration = "false"
   transfer_ids = "false"

+  // MSstats
+  skip_post_msstats = false
+  ref_condition = ""
+  contrasts = ""
+
+  // PTXQC
+  ptxqc_report_layout = ""
+
   outdir = './results'

   // Boilerplate options

From 1894eb7b51caf192630a778fae5b2f9adce9d6e2 Mon Sep 17 00:00:00 2001
From: jpfeuffer
Date: Mon, 30 Mar 2020 16:32:49 +0200
Subject: [PATCH 114/374] Use a different tower token for the non-AWS tests

---
 .github/workflows/ci.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 45d3671..21bfd91 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -13,7 +13,7 @@ jobs:
     env:
       NXF_VER: ${{ matrix.nxf_ver }}
       NXF_ANSI_LOG: false
-      TOWER_ACCESS_TOKEN: ${{ secrets.TOWER_ACCESS_TOKEN }}
+      TOWER_ACCESS_TOKEN: ${{ secrets.NONAWS_TOWER_ACCESS_TOKEN }}
    runs-on: ubuntu-latest
    strategy:
      matrix:

From bd6d0dcec1acb5856c6ccd5d25193697db9d3352 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Mon, 30 Mar 2020 18:18:35 +0200
Subject: [PATCH 115/374] Try to update the parameters in the docs.
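The `--contrasts`/`--ref_condition` stubs added in patch 113 above expose the contrast logic that patch 111 restructured in `bin/msstats_plfq.R`. A language-neutral sketch of the all-pairwise matrix that the R loop builds — one row per condition pair, with -1 and +1 marking the two conditions being compared, following the same sign convention as the script (Python purely for illustration):

```python
from itertools import combinations

conditions = ["A", "B", "C"]  # illustrative condition levels

rows = []
for i, j in combinations(range(len(conditions)), 2):
    row = [0] * len(conditions)
    row[i], row[j] = -1, 1    # -1 on the first-named condition, +1 on the second
    rows.append(("{}-{}".format(conditions[i], conditions[j]), row))

for name, row in rows:
    print(name, row)
# A-B [-1, 1, 0]
# A-C [-1, 0, 1]
# B-C [0, -1, 1]
```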
--- docs/usage.md | 82 ++++++++++++++++++++++++++++++++++++++++----------- main.nf | 7 ++--- 2 files changed, 67 insertions(+), 22 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index 2b2502e..fb06ddf 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -10,32 +10,70 @@ * [Main arguments](#main-arguments) * [`--spectra`](#--spectra) * [`--database`](#--database) + * [`--exp_design`](#--exp_design) * [`-profile`](#-profile) -* [Mass Spectrometry Search](#Mass-Spectrometry-Search) - * [`--precursor_mass_tolerance`](#--precursor_mass_tolerance) +* [Decoy database generation](#decoy-database-generation) + * [`--add_decoys`](#--add_decoys) + * [`--decoy_affix`](#-decoy_affix) + * [`--affix_type`](#-profile) +* [Database search](#database-search) + * [`--search_engine`](#--search_engine) * [`--enzyme`](#--enzyme) + * [`--num_enzyme_termini`](#--num_enzyme_termini) * [`--fixed_mods`](#--fixed_mods) * [`--variable_mods`](#--variable_mods) - * [`--allowed_missed cleavages`](#--allowed_missed_cleavages) + * [`--precursor_mass_tolerance`](#--precursor_mass_tolerance) + * [`--allowed_missed_cleavages`](#--allowed_missed_cleavages) + * [`--num_hits`](#--num_hits) * [`--psm_level_fdr_cutoff`](#--psm_level_fdr_cutoff) -* [Protein inference](#Protein-Inference) - * [`--protein_level_fdr_cutoff`](#--protein_level_fdr_cutoff) - * [`--train_FDR`](#--train_FDR) - * [`--test_FDR`](#--test_FDR) - * [`--FDR_level`](#--FDR_level) - * [`--klammer`](#--klammer) - * [`--description_correct_features`](#--description_correct_features) - * [`--isotope_error_range`](#--isotope_error_range) - * [`--fragment_method`](#--fragment_method) - * [`--instrument`](#--instrument) - * [`--protocol`](#--protocol) - * [`--tryptic`](#--tryptic) * [`--min_precursor_charge`](#--min_precursor_charge) * [`--max_precursor_charge`](#--max_precursor_charge) * [`--min_peptide_length`](#--min_peptide_length) * [`--max_peptide_length`](#--max_peptide_length) - * [`--matches_per_spec`](#--matches_per_spec) + * [`--instrument`](#--instrument) + * [`--protocol`](#--protocol) + * [`--fragment_method`](#--fragment_method) + * [`--isotope_error_range`](#--isotope_error_range) * [`--max_mods`](#--max_mods) + * [`--db_debug`](#--db_debug) +* [PSM rescoring](#psm-rescoring) + * [`--posterior_probabilities`](#--posterior_probabilities) + * [`--rescoring_debug`](#--rescoring_debug) + * [`--psm_pep_fdr_cutoff`](#--psm_pep_fdr_cutoff) + * [Percolator specific](#percolator-specific) + * [`--train_FDR`](#--train_FDR) + * [`--test_FDR`](#--test_FDR) + * [`--percolator_fdr_level`](#--percolator_fdr_level) + * [`--post-processing-tdc`](#--post-processing-tdc) + * [`--description_correct_features`](#--description_correct_features) + * [`--generic-feature-set`](#--feature) + * [`--subset-max-train`](#--subset-max-train) + * [`--klammer`](#--klammer) + * [Distribution specific](#distribution-specific) + * [`--outlier_handling`](#--outlier_handling) + * [`--top_hits_only`](#--top_hits_only) + +* [Inference and Quantification](#inference-and-quantification) + * [`--inf_quant_debug`](#--inf_quant_debug) + * [Inference](#inference) + * [`--protein_inference`](#--protein_inference) + * [`--protein_level_fdr_cutoff`](#--protein_level_fdr_cutoff) + + * [Quantification](#quantification) + * [`--transfer_ids`](#--transfer_ids) + * [`--targeted_only`](#--targeted_only) + * [`--mass_recalibration`](#--mass_recalibration) + * [`--psm_pep_fdr_for_quant`](#--psm_pep_fdr_for_quant) + * [`--protein_quantification`](#--protein_quantification) + +* [Statistical 
post-processing](#statistical-post-processing) + * [`--skip_post_msstats`](#--skip_post_msstats) + * [`--ref_condition`](#--ref_condition) + * [`--contrasts`](#--contrasts) + +* [Quality control](#quality-control) + * [`--ptxqc_report_layout`](#--ptxqc_report_layout) + * [Job resources](#job-resources) * [Automatic resubmission](#automatic-resubmission) * [Custom resource requests](#custom-resource-requests) @@ -136,6 +174,14 @@ Needs to be given to specify the input protein database when you run the pipelin --database '[path to Fasta protein database]' ``` +### `--exp_design` + +Path or URL to an experimental design file (if not given, it assumes unfractionated, unrelated samples). See an example here (TODO). + +```bash +--exp_design '[path to experimental design file in OpenMS-style tab separated format]' +``` + ### `-profile` Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. Note that multiple profiles can be loaded, for example: `-profile docker` - the order of arguments is important! @@ -156,11 +202,11 @@ If `-profile` is not specified at all the pipeline will be run locally and expec * A profile with a complete configuration for automated testing * Includes links to test data and therefore doesn't need additional parameters -## Mass Spectrometry Search +## Database search ### `--precursor_mass_tolerance` -Comet: Precursor mass tolerance used for database search. For High-Resolution instruments a precursor mass tolerance value of 5ppm is recommended. (eg. 5) (Comet parameter '-peptide_mass_tolerance') +Precursor mass tolerance used for database search in ppm. TODO parameterize the unit. For High-Resolution instruments a precursor mass tolerance value of 5ppm is recommended. (i.e. 
5) ### `--enzyme` diff --git a/main.nf b/main.nf index c785e85..5721d10 100644 --- a/main.nf +++ b/main.nf @@ -23,6 +23,9 @@ def helpMessage() { --spectra Path to input spectra as mzML or Thermo Raw --database Path to input protein database as fasta + General Options: + --expdesign Path to experimental design file (if not given, it assumes unfractionated, unrelated samples) + Decoy database: --add_decoys Add decoys to the given fasta --decoy_affix The decoy prefix or suffix used or to be used (default: DECOY_) @@ -120,10 +123,6 @@ def helpMessage() { Quality control: --ptxqc_report_layout Specify a yaml file for the report layout (see PTXQC documentation) (TODO fully implement) - General Options: - --expdesign Path to experimental design file (if not given, it assumes unfractionated, unrelated samples) - - Other nextflow options: --outdir The output directory where the results will be saved --email Set this parameter to your e-mail address to get a summary e-mail with details of the From 7818faf9d9cfb3ba591cdc0575f5647cb1dfe756 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 31 Mar 2020 14:10:50 +0200 Subject: [PATCH 116/374] Lint fixes and small channel fixes --- docs/usage.md | 15 +++++---------- main.nf | 23 ++++++++++------------- 2 files changed, 15 insertions(+), 23 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index fb06ddf..63a4cf7 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -15,7 +15,7 @@ * [Decoy database generation](#decoy-database-generation) * [`--add_decoys`](#--add_decoys) * [`--decoy_affix`](#-decoy_affix) - * [`--affix_type`](#-profile) + * [`--affix_type`](#-profile) * [Database search](#database-search) * [`--search_engine`](#--search_engine) * [`--enzyme`](#--enzyme) @@ -52,28 +52,23 @@ * [Distribution specific](#distribution-specific) * [`--outlier_handling`](#--outlier_handling) * [`--top_hits_only`](#--top_hits_only) - * [Inference and Quantification](#inference-and-quantification) * [`--inf_quant_debug`](#--inf_quant_debug) * [Inference](#inference) * [`--protein_inference`](#--protein_inference) * [`--protein_level_fdr_cutoff`](#--protein_level_fdr_cutoff) - * [Quantification](#quantification) * [`--transfer_ids`](#--transfer_ids) * [`--targeted_only`](#--targeted_only) - * [`--mass_recalibration`](#--mass_recalibration) + * [`--mass_recalibration`](#--mass_recalibration) * [`--psm_pep_fdr_for_quant`](#--psm_pep_fdr_for_quant) * [`--protein_quantification`](#--protein_quantification) - * [Statistical post-processing](#statistical-post-processing) - * [`--skip_post_msstats`](#--skip_post_msstats) - * [`--ref_condition`](#--ref_condition) - * [`--contrasts`](#--contrasts) - + * [`--skip_post_msstats`](#--skip_post_msstats) + * [`--ref_condition`](#--ref_condition) + * [`--contrasts`](#--contrasts) * [Quality control](#quality-control) * [`--ptxqc_report_layout`](#--ptxqc_report_layout) - * [Job resources](#job-resources) * [Automatic resubmission](#automatic-resubmission) * [Custom resource requests](#custom-resource-requests) diff --git a/main.nf b/main.nf index 5721d10..527239d 100644 --- a/main.nf +++ b/main.nf @@ -166,10 +166,16 @@ params.outdir = params.outdir ?: { log.warn "No output directory provided. 
Will */ ch_spectra = Channel.fromPath(params.spectra, checkIfExists: true) -ch_database = Channel.fromPath(params.database).set{ db_for_decoy_creation } +ch_db_for_decoy_creation = Channel.fromPath(params.database) // ch_expdesign = Channel.fromPath(params.design, checkIfExists: true) -//use a branch operator for this sort of thing and access the files accordingly! +if (params.expdesign) +{ + Channel + .fromPath(params.expdesign) + .ifEmpty { exit 1, "params.expdesign was empty - no input files supplied" } + .set { ch_expdesign } +} ch_spectra .branch { @@ -264,15 +270,6 @@ process mzml_indexing { branched_input_mzMLs.inputIndexedMzML.mix(mzmls_converted).mix(mzmls_indexed).into{mzmls_comet; mzmls_msgf; mzmls_plfq} -if (params.expdesign) -{ - Channel - .fromPath(params.expdesign) - .ifEmpty { exit 1, "params.expdesign was empty - no input files supplied" } - .set { expdesign } -} - - //Fill the channels with empty Channels in case that we want to add decoys. Otherwise fill with output from database. (searchengine_in_db_msgf, searchengine_in_db_comet, pepidx_in_db, plfq_in_db) = ( params.add_decoys ? [ Channel.empty(), Channel.empty(), Channel.empty(), Channel.empty() ] @@ -284,10 +281,10 @@ process generate_decoy_database { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - file(mydatabase) from db_for_decoy_creation + file(mydatabase) from ch_db_for_decoy_creation output: - file "${database.baseName}_decoy.fasta" into searchengine_in_db_decoy_msgf, searchengine_in_db_decoy_comet, pepidx_in_db_decoy, plfq_in_db_decoy + file "${mydatabase.baseName}_decoy.fasta" into searchengine_in_db_decoy_msgf, searchengine_in_db_decoy_comet, pepidx_in_db_decoy, plfq_in_db_decoy file "*.log" when: From e306a5234509e466d559f9875679e46374f45439 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 31 Mar 2020 14:16:28 +0200 Subject: [PATCH 117/374] forgot rename --- main.nf | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/main.nf b/main.nf index 527239d..cfae5c5 100644 --- a/main.nf +++ b/main.nf @@ -158,7 +158,6 @@ ch_output_docs = Channel.fromPath("$baseDir/docs/output.md") // Validate inputs params.spectra = params.spectra ?: { log.error "No spectra data provided. Make sure you have used the '--spectra' option."; exit 1 }() params.database = params.database ?: { log.error "No protein database provided. Make sure you have used the '--database' option."; exit 1 }() -// params.expdesign = params.expdesign ?: { log.error "No read data privided. Make sure you have used the '--design' option."; exit 1 }() params.outdir = params.outdir ?: { log.warn "No output directory provided. Will put the results into './results'"; return "./results" }() /* @@ -167,7 +166,6 @@ params.outdir = params.outdir ?: { log.warn "No output directory provided. 
Will ch_spectra = Channel.fromPath(params.spectra, checkIfExists: true) ch_db_for_decoy_creation = Channel.fromPath(params.database) -// ch_expdesign = Channel.fromPath(params.design, checkIfExists: true) if (params.expdesign) { @@ -712,7 +710,7 @@ process proteomicslfq { file id_files from id_files_idx_feat_perc_fdr_filter_switched .mix(id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch) .toSortedList({ a, b -> b.baseName <=> a.baseName }) - file expdes from expdesign + file expdes from ch_expdesign file fasta from plfq_in_db.mix(plfq_in_db_decoy) output: From 6dab09e2c5b3ab15b1a2453f4b2185f8e264f7b2 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 1 Apr 2020 18:11:16 +0200 Subject: [PATCH 118/374] started on allowing PRIDE SDRF input. --- main.nf | 132 ++++++++++++++++++++++++++++++++++++++---------- nextflow.config | 3 ++ 2 files changed, 109 insertions(+), 26 deletions(-) diff --git a/main.nf b/main.nf index cfae5c5..720c85e 100644 --- a/main.nf +++ b/main.nf @@ -19,13 +19,16 @@ def helpMessage() { nextflow run nf-core/proteomicslfq --spectra '*.mzML' --database '*.fasta' -profile docker - Mandatory arguments: + Main arguments: + Either: + --sdrf Path to PRIDE Sample to data relation format file + Or: --spectra Path to input spectra as mzML or Thermo Raw - --database Path to input protein database as fasta - - General Options: - --expdesign Path to experimental design file (if not given, it assumes unfractionated, unrelated samples) + --expdesign Path to optional experimental design file (if not given, it assumes unfractionated, unrelated samples) + And: + --database Path to input protein database as fasta + Decoy database: --add_decoys Add decoys to the given fasta --decoy_affix The decoy prefix or suffix used or to be used (default: DECOY_) @@ -155,30 +158,100 @@ if( !(workflow.runName ==~ /[a-z]+_[a-z]+/) ){ // Stage config files ch_output_docs = Channel.fromPath("$baseDir/docs/output.md") + // Validate inputs -params.spectra = params.spectra ?: { log.error "No spectra data provided. Make sure you have used the '--spectra' option."; exit 1 }() +if (!(params.spectra || params.sdrf) || (params.spectra && params.sdrf)) +{ + log.error "EITHER spectra data OR SDRF needs to be provided. Make sure you have used either of those options."; exit 1 +} + params.database = params.database ?: { log.error "No protein database provided. Make sure you have used the '--database' option."; exit 1 }() params.outdir = params.outdir ?: { log.warn "No output directory provided. 
Will put the results into './results'"; return "./results" }() /* - * Create a channel for input read files + * Create a channel for input files */ -ch_spectra = Channel.fromPath(params.spectra, checkIfExists: true) + //Filename FixedModifications VariableModifications Label PrecursorMassTolerance PrecursorMassToleranceUnit FragmentMassTolerance DissociationMethod Enzyme + + +if (!params.sdrf) +{ + ch_spectra = Channel.fromPath(params.spectra, checkIfExists: true) + ch_spectra + .multiMap{ it -> id = UUID.randomUUID().toString() + comet_settings: msgf_settings: tuple(id, + params.fixed_mods, + params.variable_mods, + params.precursor_mass_tolerance, + params.precursor_error_units, + params.fragment_mass_tolerance, + params.dissociation_method, + params.enzyme) + idx_settings: tuple(id, + params.enzyme) + mzmls: tuple(id,it)} + .set{ch_sdrf_config} +} +else +{ + ch_sdrf = Channel.fromPath(params.sdrf, checkIfExists: true) + /* + * STEP 0 - SDRF parsing + */ + process sdrf_parsing { + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + + input: + file sdrf from ch_sdrf + + output: + file "experimental_design.tsv" into ch_expdesign + file "openms.tsv" into ch_sdrf_config_file + + when: + params.sdrf + + script: + """ + python parse_sdrf.py ${sdrf} > sdrf_parsing.log + """ + } + + //TODO use header and ref by col name + ch_sdrf_config_file + .splitCsv(skip: 1) + .multiMap{ row -> id = UUID.randomUUID().toString() + comet_settings: msgf_settings: tuple(id, + row[1], + row[2], + row[3], + row[4], + row[5], + row[6], + row[7]) + idx_settings: tuple(id, + row[7]) + mzmls: tuple(id,row[0])} + .set{ch_sdrf_config} +} + ch_db_for_decoy_creation = Channel.fromPath(params.database) +// overwrite experimental design if given additionally to SDRF +//TODO think about that if (params.expdesign) { Channel .fromPath(params.expdesign) - .ifEmpty { exit 1, "params.expdesign was empty - no input files supplied" } .set { ch_expdesign } } -ch_spectra +ch_sdrf_config.mzmls .branch { - raw: hasExtension(it, 'raw') - mzML: hasExtension(it, 'mzML') + raw: hasExtension(it[1], 'raw') + mzML: hasExtension(it[1], 'mzML') } .set {branched_input} @@ -186,14 +259,14 @@ ch_spectra //TODO we could also check for outdated mzML versions and try to update them branched_input.mzML .branch { - nonIndexedMzML: file(it).withReader { + nonIndexedMzML: file(it[1]).withReader { f = it; 1.upto(5) { if (f.readLine().contains("indexedmzML")) return false; } return true; } - inputIndexedMzML: file(it).withReader { + inputIndexedMzML: file(it[1]).withReader { f = it; 1.upto(5) { if (f.readLine().contains("indexedmzML")) return true; @@ -227,10 +300,10 @@ process raw_file_conversion { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - file rawfile from branched_input.raw + set mzml_id, file(rawfile) from branched_input.raw output: - file "*.mzML" into mzmls_converted + set mzml_id, file("*.mzML") into mzmls_converted // TODO check if this sh script is available with bioconda @@ -250,10 +323,10 @@ process mzml_indexing { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - file mzmlfile from branched_input_mzMLs.nonIndexedMzML + set mzml_id, file(mzmlfile) from branched_input_mzMLs.nonIndexedMzML output: - file "out/*.mzML" into mzmls_indexed + set mzml_id, file("out/*.mzML") into mzmls_indexed file "*.log" script: @@ -326,6 +399,9 @@ if (params.search_engine == "msgf") search_engine_score = "expect" } + + //Filename FixedModifications VariableModifications Label 
PrecursorMassTolerance PrecursorMassToleranceUnit FragmentMassTolerance DissociationMethod Enzyme + process search_engine_msgf { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' @@ -337,7 +413,7 @@ process search_engine_msgf { errorStrategy 'terminate' input: - tuple file(database), file(mzml_file) from searchengine_in_db_msgf.mix(searchengine_in_db_decoy_msgf).combine(mzmls_msgf) + tuple file(database), mzml_id, file(mzml_file), fixed, variable, lab, prec_tol, prec_tol_unit, frag_tol, diss_meth, enzyme from searchengine_in_db_msgf.mix(searchengine_in_db_decoy_msgf).combine(mzmls_msgf.join(ch_sdrf_config.msgf_settings)).view() // This was another way of handling the combination //file database from searchengine_in_db.mix(searchengine_in_db_decoy) @@ -346,7 +422,7 @@ process search_engine_msgf { params.search_engine == "msgf" output: - file "${mzml_file.baseName}.idXML" into id_files_msgf + set mzml_id, file("${mzml_file.baseName}.idXML") into id_files_msgf file "*.log" script: @@ -356,6 +432,8 @@ process search_engine_msgf { -threads ${task.cpus} \\ -database ${database} \\ -matches_per_spec ${params.num_hits} \\ + -fixed_modifications ${fixed} \\ + -variable_modifications ${variable} \\ > ${mzml_file.baseName}_msgf.log """ } @@ -370,7 +448,7 @@ process search_engine_comet { // I actually dont know, where else this would be needed. errorStrategy 'terminate' input: - tuple file(database), file(mzml_file) from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet).combine(mzmls_comet) + tuple file(database), mzml_id, file(mzml_file), from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet).combine(mzmls_comet.join(ch_sdrf_config.comet_settings)).view() //or //file database from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet) @@ -380,7 +458,7 @@ process search_engine_comet { params.search_engine == "comet" output: - file "${mzml_file.baseName}.idXML" into id_files_comet + set mzml_id, file("${mzml_file.baseName}.idXML") into id_files_comet file "*.log" script: @@ -390,6 +468,8 @@ process search_engine_comet { -threads ${task.cpus} \\ -database ${database} \\ -num_hits ${params.num_hits} \\ + -fixed_modifications ${fixed} \\ + -variable_modifications ${variable} \\ > ${mzml_file.baseName}_comet.log """ } @@ -400,13 +480,13 @@ process index_peptides { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - //tuple file(database), file(id_file) from id_files_msgf.mix(id_files_comet, Channel.empty()).combine(pepidx_in_db.mix(pepidx_in_db_decoy)) + tuple file(database), mzml_id, file(id_file) from id_files_msgf.mix(id_files_comet).combine(pepidx_in_db.mix(pepidx_in_db_decoy)) - each file(id_file) from id_files_msgf.mix(id_files_comet) - file database from pepidx_in_db.mix(pepidx_in_db_decoy) + //each mzml_id, file(id_file) from id_files_msgf.mix(id_files_comet) + //file database from pepidx_in_db.mix(pepidx_in_db_decoy) output: - file "${id_file.baseName}_idx.idXML" into id_files_idx_ForPerc, id_files_idx_ForIDPEP + set mzml_id, file("${id_file.baseName}_idx.idXML") into id_files_idx_ForPerc, id_files_idx_ForIDPEP file "*.log" script: diff --git a/nextflow.config b/nextflow.config index 8350a10..d1ba47b 100644 --- a/nextflow.config +++ b/nextflow.config @@ -9,6 +9,7 @@ params { // Workflow flags + sdrf = "" spectra = "data/*.mzML" database = "data/*.fasta" expdesign = "data/*.tsv" @@ -32,6 +33,8 @@ params { precursor_error_units = "ppm" fixed_mods = 'Carbamidomethyl (C)' variable_mods = 'Oxidation (M)' + fragment_mass_tolerance = 5 
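For orientation, a ppm tolerance is relative to the precursor m/z, so the absolute window grows with mass. A minimal sketch (illustrative Python, not pipeline code) of what a setting such as `precursor_mass_tolerance = 5` with `precursor_error_units = "ppm"` means:

```python
# Hypothetical helper, illustration only: convert a ppm tolerance into an
# absolute m/z window around a precursor.
def ppm_window(mz: float, tol_ppm: float) -> tuple:
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta

low, high = ppm_window(1000.0, 5.0)  # 5 ppm at m/z 1000 -> (999.995, 1000.005)
```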
+ dissociation_method = "HCD" // Percolator flags train_FDR = 0.05 From 813b9f37cfb0419990f19b7e7b94d900133a668f Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 1 Apr 2020 18:24:35 +0200 Subject: [PATCH 119/374] min 20.01 --- .github/workflows/ci.yml | 2 +- nextflow.config | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 21bfd91..85546d6 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -18,7 +18,7 @@ jobs: strategy: matrix: # Nextflow versions: check pipeline minimum and current latest - nxf_ver: ['19.10.0', ''] + nxf_ver: ['20.01', ''] steps: - uses: actions/checkout@v2 - name: Determine tower usage diff --git a/nextflow.config b/nextflow.config index d1ba47b..c4414bf 100644 --- a/nextflow.config +++ b/nextflow.config @@ -178,7 +178,7 @@ manifest { homePage = 'https://github.com/nf-core/proteomicslfq' description = 'Proteomics label-free quantification (LFQ) analysis pipeline using OpenMS and MSstats, with feature quantification, feature summarization, quality control and group-based statistical analysis.' mainScript = 'main.nf' - nextflowVersion = '>=19.10.0' + nextflowVersion = '>=20.01' version = '1.0dev' } From 59401c8bdcce475d083f8e86dc682a75c9d57732 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 1 Apr 2020 18:27:12 +0200 Subject: [PATCH 120/374] min 20.01 --- .github/workflows/ci.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 85546d6..80b505e 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -18,7 +18,7 @@ jobs: strategy: matrix: # Nextflow versions: check pipeline minimum and current latest - nxf_ver: ['20.01', ''] + nxf_ver: [''] steps: - uses: actions/checkout@v2 - name: Determine tower usage From e0eeb9d688dd40edeffd9cd8afd4504b49b86c2b Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 1 Apr 2020 18:30:07 +0200 Subject: [PATCH 121/374] fix comet call --- main.nf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/main.nf b/main.nf index 720c85e..8f0f72d 100644 --- a/main.nf +++ b/main.nf @@ -448,7 +448,7 @@ process search_engine_comet { // I actually dont know, where else this would be needed. 
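The search-engine processes in the following hunks consume one long per-file tuple of settings alongside each mzML. As a rough Python analogue (illustration only; the field order is assumed from the `tuple ... from ...` declarations below, and the values are made-up examples):

```python
# Assumed analogue of the per-file search settings carried through the channels;
# names mirror the Nextflow tuple below, values are invented for illustration.
from collections import namedtuple

SearchSettings = namedtuple(
    "SearchSettings",
    ["fixed", "variable", "lab", "prec_tol", "prec_tol_unit", "frag_tol", "diss_meth", "enzyme"],
)

settings = SearchSettings(
    fixed="Carbamidomethyl (C)", variable="Oxidation (M)", lab="",
    prec_tol="5", prec_tol_unit="ppm", frag_tol="5", diss_meth="HCD", enzyme="Trypsin",
)
```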
errorStrategy 'terminate' input: - tuple file(database), mzml_id, file(mzml_file), from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet).combine(mzmls_comet.join(ch_sdrf_config.comet_settings)).view() + tuple file(database), mzml_id, file(mzml_file), fixed, variable, lab, prec_tol, prec_tol_unit, frag_tol, diss_meth, enzyme from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet).combine(mzmls_comet.join(ch_sdrf_config.comet_settings)).view() //or //file database from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet) From b82436e40d5ead0333eddec3ded4612446c6b0c4 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 1 Apr 2020 18:33:25 +0200 Subject: [PATCH 122/374] ok linter, we will test the same version twice --- .github/workflows/ci.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 80b505e..85546d6 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -18,7 +18,7 @@ jobs: strategy: matrix: # Nextflow versions: check pipeline minimum and current latest - nxf_ver: [''] + nxf_ver: ['20.01', ''] steps: - uses: actions/checkout@v2 - name: Determine tower usage From ddde5f78a58209a2f9a1544a945d415200635ab0 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 1 Apr 2020 18:36:16 +0200 Subject: [PATCH 123/374] badge --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 5c3191e..a161d5a 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,7 @@ [![GitHub Actions CI Status](https://github.com/nf-core/proteomicslfq/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/proteomicslfq/actions) [![GitHub Actions Linting Status](https://github.com/nf-core/proteomicslfq/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/proteomicslfq/actions) -[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A519.10.0-brightgreen.svg)](https://www.nextflow.io/) +[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A520.01-brightgreen.svg)](https://www.nextflow.io/) [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](http://bioconda.github.io/) [![Docker](https://img.shields.io/docker/automated/nfcore/proteomicslfq.svg)](https://hub.docker.com/r/nfcore/proteomicslfq) From 436bcdad1e52b4c0a69818bd6cedff027aaef322 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 1 Apr 2020 18:41:25 +0200 Subject: [PATCH 124/374] add 0? 
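The pins above raise the minimum supported Nextflow release. As a toy illustration (not how Nextflow itself validates `nextflowVersion`), dotted version strings can be compared numerically:

```python
# Toy version comparison, illustration only.
def meets_minimum(current: str, minimum: str = "20.01.0") -> bool:
    parse = lambda v: tuple(int(p) for p in v.split("."))
    return parse(current) >= parse(minimum)

assert meets_minimum("20.04.1") is True
assert meets_minimum("19.10.0") is False
```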
--- .github/workflows/ci.yml | 2 +- README.md | 2 +- nextflow.config | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 85546d6..cd9f919 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -18,7 +18,7 @@ jobs: strategy: matrix: # Nextflow versions: check pipeline minimum and current latest - nxf_ver: ['20.01', ''] + nxf_ver: ['20.01.0', ''] steps: - uses: actions/checkout@v2 - name: Determine tower usage diff --git a/README.md b/README.md index a161d5a..132042e 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,7 @@ [![GitHub Actions CI Status](https://github.com/nf-core/proteomicslfq/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/proteomicslfq/actions) [![GitHub Actions Linting Status](https://github.com/nf-core/proteomicslfq/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/proteomicslfq/actions) -[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A520.01-brightgreen.svg)](https://www.nextflow.io/) +[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A520.01.0-brightgreen.svg)](https://www.nextflow.io/) [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](http://bioconda.github.io/) [![Docker](https://img.shields.io/docker/automated/nfcore/proteomicslfq.svg)](https://hub.docker.com/r/nfcore/proteomicslfq) diff --git a/nextflow.config b/nextflow.config index c4414bf..f8aca80 100644 --- a/nextflow.config +++ b/nextflow.config @@ -178,7 +178,7 @@ manifest { homePage = 'https://github.com/nf-core/proteomicslfq' description = 'Proteomics label-free quantification (LFQ) analysis pipeline using OpenMS and MSstats, with feature quantification, feature summarization, quality control and group-based statistical analysis.' 
mainScript = 'main.nf' - nextflowVersion = '>=20.01' + nextflowVersion = '>=20.01.0' version = '1.0dev' } From d71f5e0c6e1341bdc76bc952d93e623ec9ee1489 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 1 Apr 2020 18:52:39 +0200 Subject: [PATCH 125/374] added the currently unused labelling modification --- main.nf | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/main.nf b/main.nf index 8f0f72d..6ee67b7 100644 --- a/main.nf +++ b/main.nf @@ -183,6 +183,7 @@ if (!params.sdrf) comet_settings: msgf_settings: tuple(id, params.fixed_mods, params.variable_mods, + "", //labelling modifications currently not supported params.precursor_mass_tolerance, params.precursor_error_units, params.fragment_mass_tolerance, @@ -230,9 +231,10 @@ else row[4], row[5], row[6], - row[7]) + row[7], + row[8]) idx_settings: tuple(id, - row[7]) + row[8]) mzmls: tuple(id,row[0])} .set{ch_sdrf_config} } @@ -413,7 +415,7 @@ process search_engine_msgf { errorStrategy 'terminate' input: - tuple file(database), mzml_id, file(mzml_file), fixed, variable, lab, prec_tol, prec_tol_unit, frag_tol, diss_meth, enzyme from searchengine_in_db_msgf.mix(searchengine_in_db_decoy_msgf).combine(mzmls_msgf.join(ch_sdrf_config.msgf_settings)).view() + tuple file(database), mzml_id, file(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, diss_meth, enzyme from searchengine_in_db_msgf.mix(searchengine_in_db_decoy_msgf).combine(mzmls_msgf.join(ch_sdrf_config.msgf_settings)).view() // This was another way of handling the combination //file database from searchengine_in_db.mix(searchengine_in_db_decoy) @@ -432,8 +434,8 @@ process search_engine_msgf { -threads ${task.cpus} \\ -database ${database} \\ -matches_per_spec ${params.num_hits} \\ - -fixed_modifications ${fixed} \\ - -variable_modifications ${variable} \\ + -fixed_modifications "${fixed}" \\ + -variable_modifications "${variable}" \\ > ${mzml_file.baseName}_msgf.log """ } @@ -448,7 +450,7 @@ process search_engine_comet { // I actually dont know, where else this would be needed. 
errorStrategy 'terminate' input: - tuple file(database), mzml_id, file(mzml_file), fixed, variable, lab, prec_tol, prec_tol_unit, frag_tol, diss_meth, enzyme from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet).combine(mzmls_comet.join(ch_sdrf_config.comet_settings)).view() + tuple file(database), mzml_id, file(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, diss_meth, enzyme from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet).combine(mzmls_comet.join(ch_sdrf_config.comet_settings)).view() //or //file database from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet) @@ -468,8 +470,8 @@ process search_engine_comet { -threads ${task.cpus} \\ -database ${database} \\ -num_hits ${params.num_hits} \\ - -fixed_modifications ${fixed} \\ - -variable_modifications ${variable} \\ + -fixed_modifications "${fixed}" \\ + -variable_modifications "${variable}" \\ > ${mzml_file.baseName}_comet.log """ } @@ -480,7 +482,7 @@ process index_peptides { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - tuple file(database), mzml_id, file(id_file) from id_files_msgf.mix(id_files_comet).combine(pepidx_in_db.mix(pepidx_in_db_decoy)) + tuple mzml_id, file(id_file), enzyme, file(database) from id_files_msgf.mix(id_files_comet).join(ch_sdrf_config.index_settings).combine(pepidx_in_db.mix(pepidx_in_db_decoy)) //each mzml_id, file(id_file) from id_files_msgf.mix(id_files_comet) //file database from pepidx_in_db.mix(pepidx_in_db_decoy) @@ -495,6 +497,7 @@ process index_peptides { -out ${id_file.baseName}_idx.idXML \\ -threads ${task.cpus} \\ -fasta ${database} \\ + -enzyme "${enzyme}" \\ > ${id_file.baseName}_index_peptides.log """ } From 7ec5f1fed1033b0dae9845b2ec634f59563b8df4 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 1 Apr 2020 18:58:31 +0200 Subject: [PATCH 126/374] fix var name --- main.nf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/main.nf b/main.nf index 6ee67b7..12338d0 100644 --- a/main.nf +++ b/main.nf @@ -482,7 +482,7 @@ process index_peptides { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - tuple mzml_id, file(id_file), enzyme, file(database) from id_files_msgf.mix(id_files_comet).join(ch_sdrf_config.index_settings).combine(pepidx_in_db.mix(pepidx_in_db_decoy)) + tuple mzml_id, file(id_file), enzyme, file(database) from id_files_msgf.mix(id_files_comet).join(ch_sdrf_config.idx_settings).combine(pepidx_in_db.mix(pepidx_in_db_decoy)) //each mzml_id, file(id_file) from id_files_msgf.mix(id_files_comet) //file database from pepidx_in_db.mix(pepidx_in_db_decoy) From 86b14b21ae525d1a305739c7fec9cf3d7c6321f1 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 1 Apr 2020 19:05:02 +0200 Subject: [PATCH 127/374] Pep indexer param --- main.nf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/main.nf b/main.nf index 12338d0..120d7df 100644 --- a/main.nf +++ b/main.nf @@ -497,7 +497,7 @@ process index_peptides { -out ${id_file.baseName}_idx.idXML \\ -threads ${task.cpus} \\ -fasta ${database} \\ - -enzyme "${enzyme}" \\ + -enzyme:name "${enzyme}" \\ > ${id_file.baseName}_index_peptides.log """ } From 7922cba736e8a29bec4633b4a55ef36aafa04d8b Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 1 Apr 2020 19:38:07 +0200 Subject: [PATCH 128/374] stop adding the mzml_id at specific points --- main.nf | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/main.nf b/main.nf index 120d7df..7ad247a 100644 --- a/main.nf +++ 
b/main.nf @@ -512,10 +512,10 @@ process extract_perc_features { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - file id_file from id_files_idx_ForPerc + set mzml_id, file(id_file) from id_files_idx_ForPerc output: - file "${id_file.baseName}_feat.idXML" into id_files_idx_feat + set mzml_id, file("${id_file.baseName}_feat.idXML") into id_files_idx_feat file "*.log" when: @@ -531,14 +531,14 @@ process extract_perc_features { } - +//Note: from here, we do not need any settings anymore. so we can skip adding the mzml_id to the channels //TODO parameterize and find a way to run across all runs merged process percolator { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - file id_file from id_files_idx_feat + set mzml_id, file(id_file) from id_files_idx_feat output: file "${id_file.baseName}_perc.idXML" into id_files_idx_feat_perc @@ -626,12 +626,13 @@ process idscoreswitcher { // --------------------------------------------------------------------- // Branch b) Q-values and PEP from OpenMS +// Note: for IDPEP we never need any file specific settings so we can stop adding the mzml_idto the channels process fdr { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - file id_file from id_files_idx_ForIDPEP + set mzml_id, file(id_file) from id_files_idx_ForIDPEP output: file "${id_file.baseName}_fdr.idXML" into id_files_idx_ForIDPEP_fdr From 6117a71d3d97819e44d4fa20c2d5e22c02a5beba Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 1 Apr 2020 19:47:45 +0200 Subject: [PATCH 129/374] only use filepath in plfq --- main.nf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/main.nf b/main.nf index 7ad247a..6b771d4 100644 --- a/main.nf +++ b/main.nf @@ -790,7 +790,7 @@ process proteomicslfq { publishDir "${params.outdir}/proteomics_lfq", mode: 'copy' input: - file mzmls from mzmls_plfq.toSortedList({ a, b -> b.baseName <=> a.baseName }) + file mzmls from mzmls_plfq.map{it->it[1]}.toSortedList({ a, b -> b.baseName <=> a.baseName }) file id_files from id_files_idx_feat_perc_fdr_filter_switched .mix(id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch) .toSortedList({ a, b -> b.baseName <=> a.baseName }) From 1742c0d3ab07d2f2b9b42139041014f4fc336bdc Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 1 Apr 2020 20:44:06 +0200 Subject: [PATCH 130/374] only use filepath in plfq --- main.nf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/main.nf b/main.nf index 6b771d4..02fb093 100644 --- a/main.nf +++ b/main.nf @@ -790,7 +790,7 @@ process proteomicslfq { publishDir "${params.outdir}/proteomics_lfq", mode: 'copy' input: - file mzmls from mzmls_plfq.map{it->it[1]}.toSortedList({ a, b -> b.baseName <=> a.baseName }) + file mzmls from mzmls_plfq.map{it[1]}.toSortedList({ a, b -> b.baseName <=> a.baseName }) file id_files from id_files_idx_feat_perc_fdr_filter_switched .mix(id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch) .toSortedList({ a, b -> b.baseName <=> a.baseName }) From fa761a1eb71f9e67cac7cf7001bceb71f672040b Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 2 Apr 2020 12:34:22 +0200 Subject: [PATCH 131/374] forgot the script --- bin/parse_sdrf.py | 197 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 197 insertions(+) create mode 100644 bin/parse_sdrf.py diff --git a/bin/parse_sdrf.py b/bin/parse_sdrf.py new file mode 100644 index 0000000..568de80 --- /dev/null +++ b/bin/parse_sdrf.py @@ -0,0 +1,197 @@ +#!/usr/bin/env python3 +import glob 
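For orientation before the script body: SDRF cells describe modifications as `;`-separated key=value pairs (`NT=` name, `MT=` type, `TA=` target amino acid, `PP=` position, `AC=` accession), and the code below pulls them apart with small regexes. A self-contained sketch of that convention:

```python
# Illustration only; the value is a made-up but typical SDRF modification cell.
import re

mod = "NT=Oxidation;MT=Variable;TA=M;AC=UNIMOD:35"
name = re.search("NT=(.+?)(;|$)", mod).group(1)    # -> "Oxidation"
target = re.search("TA=(.+?)(;|$)", mod).group(1)  # -> "M"
supported = "AC=UNIMOD" in mod                     # the script accepts only UNIMOD accessions
```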
+import csv +import re +import pandas as pd +import sys + +#sdrf = pd.read_table("https://raw.githubusercontent.com/bigbio/proteomics-metadata-standard/master/annotated-projects/PXD018117/sdrf.tsv") +if len(sys.argv) != 2: + print('Usage e.g.: python parse_sdrf.py "https://raw.githubusercontent.com/bigbio/proteomics-metadata-standard/master/annotated-projects/PXD018117/sdrf.tsv"') + exit(-1) +sdrf = pd.read_table(sys.argv[1]) +sdrf.columns = map(str.lower, sdrf.columns) # convert column names to lower-case + +def OpenMSifyMods(sdrf_mods): + oms_mods = list() + + for m in sdrf_mods: + if "AC=UNIMOD" not in m: + raise("only UNIMOD modifications supported.") + + name = re.search("NT=(.+?)(;|$)", m).group(1) + name = name.capitalize() + + # workaround for missing PP in some sdrf TODO: fix in sdrf spec? + if re.search("PP=(.+?)[;$]", m) == None: + pp = "Anywhere" + else: + pp = re.search("PP=(.+?)(;|$)", m).group(1) # one of [Anywhere, Protein N-term, Protein C-term, Any N-term, Any C-term + + if re.search("TA=(.+?)(;|$)", m) == None: # TODO: missing in sdrf. + print("Warning no TA= specified. Setting to N-term or C-term if possible: " + m) + if "C-term" in pp: + ta = "C-term" + elif "N-term" in pp: + ta = "N-term" + else: + print("Reassignment not possible. skipping") + pass + else: + ta = re.search("TA=(.+?)(;|$)", m).group(1) # target amino-acid + aa = ta.split(",") # multiply target site e.g., S,T,Y including potentially termini "C-term" + + if pp == "Protein N-term" or pp == "Protein C-term": + for a in aa: + if a == "C-term" or a == "N-term": # no site specificity + oms_mods.append(name + " (" + pp + ")") # any Protein N/C-term + else: + oms_mods.append(name + " (" + pp + " " + a + ")") # specific Protein N/C-term + elif pp == "Any N-term" or pp == "Any C-term": + pp = pp.replace("Any ", "") # in OpenMS we just use N-term and C-term + for a in aa: + if a == "C-term" or aa == "N-term": # no site specificity + oms_mods.append(name + " (" + pp + ")") # any N/C-term + else: + oms_mods.append(name + " (" + pp + " " + a + ")") # specific N/C-term + else: # Anywhere in the peptide + for a in aa: + oms_mods.append(name + " (" + a + ")") # specific site in peptide + + return ",".join(oms_mods) + + +# map filename to tuple of [fixed, variable] mods +mod_cols = [c for ind, c in enumerate(sdrf) if c.startswith('comment[modification parameters')] # columns with modification parameters + + # get factor columns (except constant ones) +factor_cols = [c for ind, c in enumerate(sdrf) if c.startswith('factor value[') and len(sdrf[c].unique()) > 1] + +file2mods = dict() +file2pctol = dict() +file2pctolunit = dict() +file2fragtol = dict() +file2fragtolunit = dict() +file2diss = dict() +file2enzyme = dict() +file2fraction = dict() +file2label = dict() +file2source = dict() +source_name_list = list() +file2combined_factors = dict() +file2technical_rep = dict() +for index, row in sdrf.iterrows(): + ## extract mods + all_mods = list(row[mod_cols]) + var_mods = [m for m in all_mods if 'MT=variable' in m or 'MT=Variable' in m] # workaround for capitalization + var_mods.sort() + fixed_mods = [m for m in all_mods if 'MT=fixed' in m or 'MT=Fixed' in m] # workaround for capitalization + fixed_mods.sort() + raw = row['comment[data file]'] + fixed_mods_string = "" + if fixed_mods != None: + fixed_mods_string = OpenMSifyMods(fixed_mods) + + variable_mods_string = "" + if var_mods != None: + variable_mods_string = OpenMSifyMods(var_mods) + + file2mods[raw] = (fixed_mods_string, variable_mods_string) + + source_name = 
row['source name'] + file2source[raw] = source_name + if not source_name in source_name_list: + source_name_list.append(source_name) + + if 'comment[precursor mass tolerance]' in row: + pc_tol_str = row['comment[precursor mass tolerance]'] + if "ppm" in pc_tol_str or "Da" in pc_tol_str: + pc_tmp = pc_tol_str.split(" ") + file2pctol[raw] = pc_tmp[0] + file2pctolunit[raw] = pc_tmp[1] + else: + print("Invalid precursor mass tolerance set. Assuming 10 ppm.") + file2pctol[raw] = "10" + file2pctolunit[raw] = "ppm" + else: + print("No precursor mass tolerance set. Assuming 10 ppm.") + file2pctol[raw] = "10" + file2pctolunit[raw] = "ppm" + + if 'comment[fragment mass tolerance]' in row: + f_tol_str = row['comment[fragment mass tolerance]'] + f_tol_str.replace("PPM", "ppm") # workaround + if "ppm" in f_tol_str or "Da" in f_tol_str: + f_tmp = f_tol_str.split(" ") + file2fragtol[raw] = f_tmp[0] + file2fragtolunit[raw] = f_tmp[1] + else: + print("Invalid fragment mass tolerance set. Assuming 10 ppm.") + file2fragtol[raw] = "10" + file2fragtolunit[raw] = "ppm" + else: + print("No fragment mass tolerance set. Assuming 10 ppm.") + file2fragtol[raw] = "20" + file2fragtolunit[raw] = "ppm" + + if 'comment[dissociation method]' in row: + diss_method = row['comment[dissociation method]'] + file2diss[raw] = diss_method.toUpper() + else: + print("No dissociation method provided. Assuming HCD.") + file2diss[raw] = 'HCD' + + if 'comment[technical replicate]' in row: + file2technical_rep[raw] = str(row['comment[technical replicate]']) + else: + file2technical_rep[raw] = "1" + + enzyme = re.search("NT=(.+?)(;|$)", row['comment[cleavage agent details]']).group(1) + enzyme = enzyme.capitalize() + if "Trypsin/p" in enzyme: # workaround + enzyme = "Trypsin/P" + file2enzyme[raw] = enzyme + file2fraction[raw] = str(row['comment[fraction identifier]']) + label = re.search("NT=(.+?)(;|$)", row['comment[label]']).group(1) + file2label[raw] = label + + ## extract factors + all_factors = list(row[factor_cols]) + combined_factors = "|".join(all_factors) + if combined_factors == "": + print("No factors specified. Adding dummy factor used as condition.") + combined_factors = "none" + + file2combined_factors[raw] = combined_factors + #print(combined_factors) + +##################### only label-free supported right now + +# output of search settings +f=open("openms.tsv","w+") +OpenMSSearchSettingsHeader = ["Filename", "FixedModifications", "VariableModifications", "Label", "PrecursorMassTolerance", "PrecursorMassToleranceUnit", "FragmentMassTolerance", "DissociationMethod", "Enzyme"] +f.write("\t".join(OpenMSSearchSettingsHeader) + "\n") +for index, row in sdrf.iterrows(): # does only work for label-free not for multiplexed. TODO + raw = row["comment[data file]"] + f.write(raw+"\t"+file2mods[raw][0]+"\t"+file2mods[raw][1] +"\t"+file2label[raw]+"\t"+file2pctol[raw]+"\t"+file2pctolunit[raw]+"\t"+file2fragtol[raw]+"\t"+file2fragtolunit[raw]+"\t"+file2diss[raw]+"\t"+file2enzyme[raw]+"\n") +f.close() + +# output of experimental design +f=open("experimental_design.tsv","w+") +OpenMSExperimentalDesignHeader = ["Fraction_Group", "Fraction", "Spectra_Filepath", "Label", "Sample", "MSstats_Condition", "MSstats_BioReplicate"] +f.write("\t".join(OpenMSExperimentalDesignHeader) + "\n") +for index, row in sdrf.iterrows(): # does only work for label-free not for multiplexed. 
TODO + raw = row["comment[data file]"] + fraction_group = str(1 + source_name_list.index(row["source name"])) # extract fraction group + sample = fraction_group + if 'none' in file2combined_factors[raw]: + # no factor defined use sample as condition + condition = sample + else: + condition = file2combined_factors[raw] + replicate = file2technical_rep[raw] + label = file2label[raw] + if "label free sample" in label: + label = "1" + f.write(fraction_group+"\t"+file2fraction[raw]+"\t"+raw+"\t"+label+"\t"+sample+"\t"+condition+"\t"+replicate+"\n") +f.close() \ No newline at end of file From 3aef88461505771dd0aae65e06b0515d63361533 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 2 Apr 2020 12:56:59 +0200 Subject: [PATCH 132/374] remove defaults for mandatory inputs --- nextflow.config | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/nextflow.config b/nextflow.config index f8aca80..a9498fe 100644 --- a/nextflow.config +++ b/nextflow.config @@ -10,9 +10,9 @@ params { // Workflow flags sdrf = "" - spectra = "data/*.mzML" - database = "data/*.fasta" - expdesign = "data/*.tsv" + spectra = "" + database = "" + expdesign = "" // Tools flags posterior_probabilities = "percolator" From c3ba3ca1426b8878fe4242b1e6540562de349c30 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 2 Apr 2020 13:44:52 +0200 Subject: [PATCH 133/374] require min ver. fix permissions and call of the py script. add pandas to conda env --- bin/parse_sdrf.py | 0 environment.yml | 1 + main.nf | 2 +- nextflow.config | 2 +- 4 files changed, 3 insertions(+), 2 deletions(-) mode change 100644 => 100755 bin/parse_sdrf.py diff --git a/bin/parse_sdrf.py b/bin/parse_sdrf.py old mode 100644 new mode 100755 diff --git a/environment.yml b/environment.yml index 45215eb..47302a0 100644 --- a/environment.yml +++ b/environment.yml @@ -12,6 +12,7 @@ dependencies: - conda-forge::r-ptxqc=1.0.2 # for QC reports - conda-forge::fonts-conda-ecosystem=1 # for the fonts in QC reports - conda-forge::openjdk=8.0.192 # pin java to 8 for MSGF (otherwise it somehow chooses 11) + - conda-forge::pandas=1.0.3 - conda-forge::python=3.8.1 - conda-forge::markdown=3.2.1 - conda-forge::pymdown-extensions=6.0 diff --git a/main.nf b/main.nf index 02fb093..65b9c46 100644 --- a/main.nf +++ b/main.nf @@ -216,7 +216,7 @@ else script: """ - python parse_sdrf.py ${sdrf} > sdrf_parsing.log + parse_sdrf.py ${sdrf} > sdrf_parsing.log """ } diff --git a/nextflow.config b/nextflow.config index a9498fe..3e8c25b 100644 --- a/nextflow.config +++ b/nextflow.config @@ -178,7 +178,7 @@ manifest { homePage = 'https://github.com/nf-core/proteomicslfq' description = 'Proteomics label-free quantification (LFQ) analysis pipeline using OpenMS and MSstats, with feature quantification, feature summarization, quality control and group-based statistical analysis.' 
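The experimental-design output above numbers fraction groups by the first appearance of each `source name`; a minimal re-creation of that bookkeeping (illustrative values, not pipeline code):

```python
# Toy rerun of the fraction-group assignment, with made-up source names.
source_names = ["sample1", "sample1", "sample2", "sample1"]

seen = []
for s in source_names:
    if s not in seen:
        seen.append(s)

fraction_groups = [1 + seen.index(s) for s in source_names]  # -> [1, 1, 2, 1]
```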
mainScript = 'main.nf' - nextflowVersion = '>=20.01.0' + nextflowVersion = '!>=20.01.0' version = '1.0dev' } From bf9e69874b21445deabd8e4eca1db2d8265a461e Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 2 Apr 2020 14:08:38 +0200 Subject: [PATCH 134/374] added click as a dependency --- environment.yml | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/environment.yml b/environment.yml index 47302a0..408e896 100644 --- a/environment.yml +++ b/environment.yml @@ -12,7 +12,8 @@ dependencies: - conda-forge::r-ptxqc=1.0.2 # for QC reports - conda-forge::fonts-conda-ecosystem=1 # for the fonts in QC reports - conda-forge::openjdk=8.0.192 # pin java to 8 for MSGF (otherwise it somehow chooses 11) - - conda-forge::pandas=1.0.3 + - conda-forge::click=7.1.1 # for parse_sdrf.py + - conda-forge::pandas=1.0.3 # for parse_sdrf.py - conda-forge::python=3.8.1 - conda-forge::markdown=3.2.1 - conda-forge::pymdown-extensions=6.0 From af0d27ff315893a5b6652752750415d97275111d Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 2 Apr 2020 16:02:14 +0200 Subject: [PATCH 135/374] update to newest version of the sdrf parse script --- bin/parse_sdrf.py | 346 +++++++++++++++++++++++++--------------------- 1 file changed, 188 insertions(+), 158 deletions(-) diff --git a/bin/parse_sdrf.py b/bin/parse_sdrf.py index 568de80..619bcc2 100755 --- a/bin/parse_sdrf.py +++ b/bin/parse_sdrf.py @@ -1,34 +1,38 @@ #!/usr/bin/env python3 -import glob -import csv import re -import pandas as pd import sys +import logging +import os + +import click +import pandas as pd + +CONTEXT_SETTINGS = dict(help_option_names=['-h', '--help']) + + +@click.group(context_settings=CONTEXT_SETTINGS) +def cli(): + """This is the main tool that give access to all commands and options provided by the mslookup and dleamse algorithms""" -#sdrf = pd.read_table("https://raw.githubusercontent.com/bigbio/proteomics-metadata-standard/master/annotated-projects/PXD018117/sdrf.tsv") -if len(sys.argv) != 2: - print('Usage e.g.: python parse_sdrf.py "https://raw.githubusercontent.com/bigbio/proteomics-metadata-standard/master/annotated-projects/PXD018117/sdrf.tsv"') - exit(-1) -sdrf = pd.read_table(sys.argv[1]) -sdrf.columns = map(str.lower, sdrf.columns) # convert column names to lower-case -def OpenMSifyMods(sdrf_mods): +def openms_ify_mods(sdrf_mods): oms_mods = list() for m in sdrf_mods: if "AC=UNIMOD" not in m: - raise("only UNIMOD modifications supported.") + raise Exception("only UNIMOD modifications supported.") name = re.search("NT=(.+?)(;|$)", m).group(1) name = name.capitalize() - # workaround for missing PP in some sdrf TODO: fix in sdrf spec? - if re.search("PP=(.+?)[;$]", m) == None: + # workaround for missing PP in some sdrf TODO: fix in sdrf spec? + if re.search("PP=(.+?)[;$]", m) is None: pp = "Anywhere" else: - pp = re.search("PP=(.+?)(;|$)", m).group(1) # one of [Anywhere, Protein N-term, Protein C-term, Any N-term, Any C-term + pp = re.search("PP=(.+?)(;|$)", m).group( + 1) # one of [Anywhere, Protein N-term, Protein C-term, Any N-term, Any C-term - if re.search("TA=(.+?)(;|$)", m) == None: # TODO: missing in sdrf. + if re.search("TA=(.+?)(;|$)", m) is None: # TODO: missing in sdrf. print("Warning no TA= specified. Setting to N-term or C-term if possible: " + m) if "C-term" in pp: ta = "C-term" @@ -38,160 +42,186 @@ def OpenMSifyMods(sdrf_mods): print("Reassignment not possible. 
skipping") pass else: - ta = re.search("TA=(.+?)(;|$)", m).group(1) # target amino-acid - aa = ta.split(",") # multiply target site e.g., S,T,Y including potentially termini "C-term" + ta = re.search("TA=(.+?)(;|$)", m).group(1) # target amino-acid + aa = ta.split(",") # multiply target site e.g., S,T,Y including potentially termini "C-term" - if pp == "Protein N-term" or pp == "Protein C-term": + if pp == "Protein N-term" or pp == "Protein C-term": for a in aa: - if a == "C-term" or a == "N-term": # no site specificity - oms_mods.append(name + " (" + pp + ")") # any Protein N/C-term + if a == "C-term" or a == "N-term": # no site specificity + oms_mods.append(name + " (" + pp + ")") # any Protein N/C-term else: - oms_mods.append(name + " (" + pp + " " + a + ")") # specific Protein N/C-term + oms_mods.append(name + " (" + pp + " " + a + ")") # specific Protein N/C-term elif pp == "Any N-term" or pp == "Any C-term": - pp = pp.replace("Any ", "") # in OpenMS we just use N-term and C-term - for a in aa: - if a == "C-term" or aa == "N-term": # no site specificity - oms_mods.append(name + " (" + pp + ")") # any N/C-term + pp = pp.replace("Any ", "") # in OpenMS we just use N-term and C-term + for a in aa: + if a == "C-term" or aa == "N-term": # no site specificity + oms_mods.append(name + " (" + pp + ")") # any N/C-term else: - oms_mods.append(name + " (" + pp + " " + a + ")") # specific N/C-term - else: # Anywhere in the peptide + oms_mods.append(name + " (" + pp + " " + a + ")") # specific N/C-term + else: # Anywhere in the peptide for a in aa: - oms_mods.append(name + " (" + a + ")") # specific site in peptide - + oms_mods.append(name + " (" + a + ")") # specific site in peptide + return ",".join(oms_mods) -# map filename to tuple of [fixed, variable] mods -mod_cols = [c for ind, c in enumerate(sdrf) if c.startswith('comment[modification parameters')] # columns with modification parameters - - # get factor columns (except constant ones) -factor_cols = [c for ind, c in enumerate(sdrf) if c.startswith('factor value[') and len(sdrf[c].unique()) > 1] - -file2mods = dict() -file2pctol = dict() -file2pctolunit = dict() -file2fragtol = dict() -file2fragtolunit = dict() -file2diss = dict() -file2enzyme = dict() -file2fraction = dict() -file2label = dict() -file2source = dict() -source_name_list = list() -file2combined_factors = dict() -file2technical_rep = dict() -for index, row in sdrf.iterrows(): - ## extract mods - all_mods = list(row[mod_cols]) - var_mods = [m for m in all_mods if 'MT=variable' in m or 'MT=Variable' in m] # workaround for capitalization - var_mods.sort() - fixed_mods = [m for m in all_mods if 'MT=fixed' in m or 'MT=Fixed' in m] # workaround for capitalization - fixed_mods.sort() - raw = row['comment[data file]'] - fixed_mods_string = "" - if fixed_mods != None: - fixed_mods_string = OpenMSifyMods(fixed_mods) - - variable_mods_string = "" - if var_mods != None: - variable_mods_string = OpenMSifyMods(var_mods) - - file2mods[raw] = (fixed_mods_string, variable_mods_string) - - source_name = row['source name'] - file2source[raw] = source_name - if not source_name in source_name_list: - source_name_list.append(source_name) - - if 'comment[precursor mass tolerance]' in row: - pc_tol_str = row['comment[precursor mass tolerance]'] - if "ppm" in pc_tol_str or "Da" in pc_tol_str: - pc_tmp = pc_tol_str.split(" ") - file2pctol[raw] = pc_tmp[0] - file2pctolunit[raw] = pc_tmp[1] +def openms_convert(sdrf_file: str = None): + sdrf = pd.read_table(sdrf_file) + sdrf.columns = map(str.lower, 
sdrf.columns) # convert column names to lower-case + + # map filename to tuple of [fixed, variable] mods + mod_cols = [c for ind, c in enumerate(sdrf) if + c.startswith('comment[modification parameters')] # columns with modification parameters + + # get factor columns (except constant ones) + factor_cols = [c for ind, c in enumerate(sdrf) if c.startswith('factor value[') and len(sdrf[c].unique()) > 1] + + file2mods = dict() + file2pctol = dict() + file2pctolunit = dict() + file2fragtol = dict() + file2fragtolunit = dict() + file2diss = dict() + file2enzyme = dict() + file2fraction = dict() + file2label = dict() + file2source = dict() + source_name_list = list() + file2combined_factors = dict() + file2technical_rep = dict() + for index, row in sdrf.iterrows(): + ## extract mods + all_mods = list(row[mod_cols]) + var_mods = [m for m in all_mods if 'MT=variable' in m or 'MT=Variable' in m] # workaround for capitalization + var_mods.sort() + fixed_mods = [m for m in all_mods if 'MT=fixed' in m or 'MT=Fixed' in m] # workaround for capitalization + fixed_mods.sort() + raw = row['comment[data file]'] + fixed_mods_string = "" + if fixed_mods is not None: + fixed_mods_string = openms_ify_mods(fixed_mods) + + variable_mods_string = "" + if var_mods is not None: + variable_mods_string = openms_ify_mods(var_mods) + + file2mods[raw] = (fixed_mods_string, variable_mods_string) + + source_name = row['source name'] + file2source[raw] = source_name + if not source_name in source_name_list: + source_name_list.append(source_name) + + if 'comment[precursor mass tolerance]' in row: + pc_tol_str = row['comment[precursor mass tolerance]'] + if "ppm" in pc_tol_str or "Da" in pc_tol_str: + pc_tmp = pc_tol_str.split(" ") + file2pctol[raw] = pc_tmp[0] + file2pctolunit[raw] = pc_tmp[1] + else: + print("Invalid precursor mass tolerance set. Assuming 10 ppm.") + file2pctol[raw] = "10" + file2pctolunit[raw] = "ppm" else: - print("Invalid precursor mass tolerance set. Assuming 10 ppm.") + print("No precursor mass tolerance set. Assuming 10 ppm.") file2pctol[raw] = "10" file2pctolunit[raw] = "ppm" - else: - print("No precursor mass tolerance set. Assuming 10 ppm.") - file2pctol[raw] = "10" - file2pctolunit[raw] = "ppm" - - if 'comment[fragment mass tolerance]' in row: - f_tol_str = row['comment[fragment mass tolerance]'] - f_tol_str.replace("PPM", "ppm") # workaround - if "ppm" in f_tol_str or "Da" in f_tol_str: - f_tmp = f_tol_str.split(" ") - file2fragtol[raw] = f_tmp[0] - file2fragtolunit[raw] = f_tmp[1] + + if 'comment[fragment mass tolerance]' in row: + f_tol_str = row['comment[fragment mass tolerance]'] + f_tol_str.replace("PPM", "ppm") # workaround + if "ppm" in f_tol_str or "Da" in f_tol_str: + f_tmp = f_tol_str.split(" ") + file2fragtol[raw] = f_tmp[0] + file2fragtolunit[raw] = f_tmp[1] + else: + print("Invalid fragment mass tolerance set. Assuming 10 ppm.") + file2fragtol[raw] = "10" + file2fragtolunit[raw] = "ppm" else: - print("Invalid fragment mass tolerance set. Assuming 10 ppm.") - file2fragtol[raw] = "10" + print("No fragment mass tolerance set. Assuming 10 ppm.") + file2fragtol[raw] = "20" file2fragtolunit[raw] = "ppm" - else: - print("No fragment mass tolerance set. Assuming 10 ppm.") - file2fragtol[raw] = "20" - file2fragtolunit[raw] = "ppm" - - if 'comment[dissociation method]' in row: - diss_method = row['comment[dissociation method]'] - file2diss[raw] = diss_method.toUpper() - else: - print("No dissociation method provided. 
Assuming HCD.") - file2diss[raw] = 'HCD' - - if 'comment[technical replicate]' in row: - file2technical_rep[raw] = str(row['comment[technical replicate]']) - else: - file2technical_rep[raw] = "1" - - enzyme = re.search("NT=(.+?)(;|$)", row['comment[cleavage agent details]']).group(1) - enzyme = enzyme.capitalize() - if "Trypsin/p" in enzyme: # workaround - enzyme = "Trypsin/P" - file2enzyme[raw] = enzyme - file2fraction[raw] = str(row['comment[fraction identifier]']) - label = re.search("NT=(.+?)(;|$)", row['comment[label]']).group(1) - file2label[raw] = label - - ## extract factors - all_factors = list(row[factor_cols]) - combined_factors = "|".join(all_factors) - if combined_factors == "": - print("No factors specified. Adding dummy factor used as condition.") - combined_factors = "none" - - file2combined_factors[raw] = combined_factors - #print(combined_factors) - -##################### only label-free supported right now - -# output of search settings -f=open("openms.tsv","w+") -OpenMSSearchSettingsHeader = ["Filename", "FixedModifications", "VariableModifications", "Label", "PrecursorMassTolerance", "PrecursorMassToleranceUnit", "FragmentMassTolerance", "DissociationMethod", "Enzyme"] -f.write("\t".join(OpenMSSearchSettingsHeader) + "\n") -for index, row in sdrf.iterrows(): # does only work for label-free not for multiplexed. TODO - raw = row["comment[data file]"] - f.write(raw+"\t"+file2mods[raw][0]+"\t"+file2mods[raw][1] +"\t"+file2label[raw]+"\t"+file2pctol[raw]+"\t"+file2pctolunit[raw]+"\t"+file2fragtol[raw]+"\t"+file2fragtolunit[raw]+"\t"+file2diss[raw]+"\t"+file2enzyme[raw]+"\n") -f.close() - -# output of experimental design -f=open("experimental_design.tsv","w+") -OpenMSExperimentalDesignHeader = ["Fraction_Group", "Fraction", "Spectra_Filepath", "Label", "Sample", "MSstats_Condition", "MSstats_BioReplicate"] -f.write("\t".join(OpenMSExperimentalDesignHeader) + "\n") -for index, row in sdrf.iterrows(): # does only work for label-free not for multiplexed. TODO - raw = row["comment[data file]"] - fraction_group = str(1 + source_name_list.index(row["source name"])) # extract fraction group - sample = fraction_group - if 'none' in file2combined_factors[raw]: - # no factor defined use sample as condition - condition = sample - else: - condition = file2combined_factors[raw] - replicate = file2technical_rep[raw] - label = file2label[raw] - if "label free sample" in label: - label = "1" - f.write(fraction_group+"\t"+file2fraction[raw]+"\t"+raw+"\t"+label+"\t"+sample+"\t"+condition+"\t"+replicate+"\n") -f.close() \ No newline at end of file + + if 'comment[dissociation method]' in row: + diss_method = row['comment[dissociation method]'] + file2diss[raw] = diss_method.toUpper() + else: + print("No dissociation method provided. Assuming HCD.") + file2diss[raw] = 'HCD' + + if 'comment[technical replicate]' in row: + file2technical_rep[raw] = str(row['comment[technical replicate]']) + else: + file2technical_rep[raw] = "1" + + enzyme = re.search("NT=(.+?)(;|$)", row['comment[cleavage agent details]']).group(1) + enzyme = enzyme.capitalize() + if "Trypsin/p" in enzyme: # workaround + enzyme = "Trypsin/P" + file2enzyme[raw] = enzyme + file2fraction[raw] = str(row['comment[fraction identifier]']) + label = re.search("NT=(.+?)(;|$)", row['comment[label]']).group(1) + file2label[raw] = label + + ## extract factors + all_factors = list(row[factor_cols]) + combined_factors = "|".join(all_factors) + if combined_factors == "": + print("No factors specified. 
Adding dummy factor used as condition.")
+            combined_factors = "none"
+
+        file2combined_factors[raw] = combined_factors
+        # print(combined_factors)
+
+    ##################### only label-free supported right now
+
+    # output of search settings
+    f = open("openms.tsv", "w+")
+    open_ms_search_settings_header = ["Filename", "FixedModifications", "VariableModifications", "Label",
+                                      "PrecursorMassTolerance", "PrecursorMassToleranceUnit", "FragmentMassTolerance",
+                                      "FragmentMassToleranceUnit", "DissociationMethod", "Enzyme"]
+    f.write("\t".join(open_ms_search_settings_header) + "\n")
+    for index, row in sdrf.iterrows():  # does only work for label-free not for multiplexed. TODO
+        raw = row["comment[data file]"]
+        f.write(raw + "\t" + file2mods[raw][0] + "\t" + file2mods[raw][1] + "\t" + file2label[raw] + "\t" + file2pctol[
+            raw] + "\t" + file2pctolunit[raw] + "\t" + file2fragtol[raw] + "\t" + file2fragtolunit[raw] + "\t" + file2diss[
+            raw] + "\t" + file2enzyme[raw] + "\n")
+    f.close()
+
+    # output of experimental design
+    f = open("experimental_design.tsv", "w+")
+    open_ms_experimental_design_header = ["Fraction_Group", "Fraction", "Spectra_Filepath", "Label", "Sample",
+                                          "MSstats_Condition", "MSstats_BioReplicate"]
+    f.write("\t".join(open_ms_experimental_design_header) + "\n")
+    for index, row in sdrf.iterrows():  # does only work for label-free not for multiplexed. TODO
+        raw = row["comment[data file]"]
+        fraction_group = str(1 + source_name_list.index(row["source name"]))  # extract fraction group
+        sample = fraction_group
+        if 'none' in file2combined_factors[raw]:
+            # no factor defined, use sample as condition
+            condition = sample
+        else:
+            condition = file2combined_factors[raw]
+        replicate = file2technical_rep[raw]
+        label = file2label[raw]
+        if "label free sample" in label:
+            label = "1"
+        f.write(fraction_group + "\t" + file2fraction[
+            raw] + "\t" + raw + "\t" + label + "\t" + sample + "\t" + condition + "\t" + replicate + "\n")
+    f.close()
+
+
+@click.command('convert-openms', short_help='convert sdrf to openms file output')
+@click.option('--sdrf', '-s', help='SDRF file')
+@click.pass_context
+def openms_from_sdrf(ctx, sdrf: str):
+    if sdrf is None:
+        print(ctx.get_help())  # show usage and abort when no SDRF file is given
+        sys.exit(1)
+    openms_convert(sdrf)
+
+
+cli.add_command(openms_from_sdrf)
+
+if __name__ == "__main__":
+    cli()

From 3b24b7d5092897c66e2667793c44545acafeca0c Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Thu, 2 Apr 2020 18:38:16 +0200
Subject: [PATCH 136/374] update parse script. switch to 'path' for the first
add some more params --- bin/parse_sdrf.py | 8 ++++---- main.nf | 39 ++++++++++++++++++++++----------------- 2 files changed, 26 insertions(+), 21 deletions(-) diff --git a/bin/parse_sdrf.py b/bin/parse_sdrf.py index 619bcc2..18fced1 100755 --- a/bin/parse_sdrf.py +++ b/bin/parse_sdrf.py @@ -12,8 +12,7 @@ @click.group(context_settings=CONTEXT_SETTINGS) def cli(): - """This is the main tool that give access to all commands and options provided by the mslookup and dleamse algorithms""" - + """Tool to convert sdrf files into OpenMS config files""" def openms_ify_mods(sdrf_mods): oms_mods = list() @@ -178,13 +177,14 @@ def openms_convert(sdrf_file: str = None): # output of search settings f = open("openms.tsv", "w+") - open_ms_search_settings_header = ["Filename", "FixedModifications", "VariableModifications", "Label", + open_ms_search_settings_header = ["URI", "Filename", "FixedModifications", "VariableModifications", "Label", "PrecursorMassTolerance", "PrecursorMassToleranceUnit", "FragmentMassTolerance", "FragmentMassToleranceUnit", "DissociationMethod", "Enzyme"] f.write("\t".join(open_ms_search_settings_header) + "\n") for index, row in sdrf.iterrows(): # does only work for label-free not for multiplexed. TODO + URI = row["comment[file uri]"] raw = row["comment[data file]"] - f.write(raw + "\t" + file2mods[raw][0] + "\t" + file2mods[raw][1] + "\t" + file2label[raw] + "\t" + file2pctol[ + f.write(URI + "\t" + raw + "\t" + file2mods[raw][0] + "\t" + file2mods[raw][1] + "\t" + file2label[raw] + "\t" + file2pctol[ raw] + "\t" + file2pctolunit[raw] + "\t" + file2fragtol[raw] + "\t" + file2fragtolunit[raw] + "\t" + file2diss[ raw] + "\t" + file2enzyme[raw] + "\n") f.close() diff --git a/main.nf b/main.nf index 65b9c46..e10c401 100644 --- a/main.nf +++ b/main.nf @@ -42,7 +42,10 @@ def helpMessage() { --num_hits Number of peptide hits per spectrum (PSMs) in output file (default: '1') --fixed_mods Fixed modifications ('Carbamidomethyl (C)', see OpenMS modifications) --variable_mods Variable modifications ('Oxidation (M)', see OpenMS modifications) - --precursor_mass_tolerance Mass tolerance of precursor mass (ppm) + --precursor_mass_tolerance Mass tolerance of precursor mass + --precursor_mass_tolerance_unit Da or ppm + --fragment_mass_tolerance Mass tolerance for fragment masses + --fragment_mass_tolerance_unit Da or ppm --allowed_missed_cleavages Allowed missed cleavages --psm_level_fdr_cutoff Identification PSM-level FDR cutoff --min_precursor_charge Minimum precursor ion charge @@ -185,9 +188,10 @@ if (!params.sdrf) params.variable_mods, "", //labelling modifications currently not supported params.precursor_mass_tolerance, - params.precursor_error_units, + params.precursor_mass_tolerance_unit, params.fragment_mass_tolerance, - params.dissociation_method, + params.fragment_mass_tolerance_unit, + params.fragment_method, params.enzyme) idx_settings: tuple(id, params.enzyme) @@ -216,25 +220,26 @@ else script: """ - parse_sdrf.py ${sdrf} > sdrf_parsing.log + parse_sdrf.py convert-openms -s ${sdrf} > sdrf_parsing.log """ } //TODO use header and ref by col name ch_sdrf_config_file - .splitCsv(skip: 1) + .splitCsv(skip: 1, sep: '\t').view() .multiMap{ row -> id = UUID.randomUUID().toString() comet_settings: msgf_settings: tuple(id, - row[1], row[2], row[3], row[4], row[5], row[6], row[7], - row[8]) + row[8], + row[9], + row[10]) idx_settings: tuple(id, - row[8]) + row[10]) mzmls: tuple(id,row[0])} .set{ch_sdrf_config} } @@ -250,7 +255,7 @@ if (params.expdesign) .set { ch_expdesign } } 
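The `TODO use header and ref by col name` above could, in Python terms, look like reading `openms.tsv` through `csv.DictReader`, so that columns are addressed via the header names written by `parse_sdrf.py` instead of brittle positional indices (sketch only, not the Nextflow `splitCsv` call):

```python
# Sketch of header-based access to openms.tsv; column names as emitted by parse_sdrf.py.
import csv

with open("openms.tsv") as handle:
    for row in csv.DictReader(handle, delimiter="\t"):
        enzyme = row["Enzyme"]
        prec_tol = row["PrecursorMassTolerance"]  # robust to column reordering
```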
-ch_sdrf_config.mzmls +ch_sdrf_config.mzmls.view() .branch { raw: hasExtension(it[1], 'raw') mzML: hasExtension(it[1], 'mzML') @@ -302,7 +307,7 @@ process raw_file_conversion { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - set mzml_id, file(rawfile) from branched_input.raw + tuple mzml_id, path(rawfile) from branched_input.raw.view() output: set mzml_id, file("*.mzML") into mzmls_converted @@ -325,7 +330,7 @@ process mzml_indexing { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - set mzml_id, file(mzmlfile) from branched_input_mzMLs.nonIndexedMzML + set mzml_id, path(mzmlfile) from branched_input_mzMLs.nonIndexedMzML output: set mzml_id, file("out/*.mzML") into mzmls_indexed @@ -415,7 +420,7 @@ process search_engine_msgf { errorStrategy 'terminate' input: - tuple file(database), mzml_id, file(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, diss_meth, enzyme from searchengine_in_db_msgf.mix(searchengine_in_db_decoy_msgf).combine(mzmls_msgf.join(ch_sdrf_config.msgf_settings)).view() + tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme from searchengine_in_db_msgf.mix(searchengine_in_db_decoy_msgf).combine(mzmls_msgf.join(ch_sdrf_config.msgf_settings)).view() // This was another way of handling the combination //file database from searchengine_in_db.mix(searchengine_in_db_decoy) @@ -434,8 +439,8 @@ process search_engine_msgf { -threads ${task.cpus} \\ -database ${database} \\ -matches_per_spec ${params.num_hits} \\ - -fixed_modifications "${fixed}" \\ - -variable_modifications "${variable}" \\ + -fixed_modifications ${fixed.tokenize(',').collect { "'${it}'" }.join(" ") } \\ + -variable_modifications ${variable.tokenize(',').collect { "'${it}'" }.join(" ") } \\ > ${mzml_file.baseName}_msgf.log """ } @@ -450,7 +455,7 @@ process search_engine_comet { // I actually dont know, where else this would be needed. 
errorStrategy 'terminate' input: - tuple file(database), mzml_id, file(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, diss_meth, enzyme from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet).combine(mzmls_comet.join(ch_sdrf_config.comet_settings)).view() + tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet).combine(mzmls_comet.join(ch_sdrf_config.comet_settings)).view() //or //file database from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet) @@ -470,8 +475,8 @@ process search_engine_comet { -threads ${task.cpus} \\ -database ${database} \\ -num_hits ${params.num_hits} \\ - -fixed_modifications "${fixed}" \\ - -variable_modifications "${variable}" \\ + -fixed_modifications ${fixed.tokenize(',').collect { "'${it}'" }.join(" ") } \\ + -variable_modifications ${variable.tokenize(',').collect { "'${it}'" }.join(" ") } \\ > ${mzml_file.baseName}_comet.log """ } From 7363a9f0132fad2ccac8d969a28e312dd0607308 Mon Sep 17 00:00:00 2001 From: Leon Bichmann Date: Fri, 3 Apr 2020 09:40:50 +0200 Subject: [PATCH 137/374] initial branch for optional phosphoproteomics adaption --- main.nf | 42 +++++++++++++++++++++++++++++++++--------- 1 file changed, 33 insertions(+), 9 deletions(-) diff --git a/main.nf b/main.nf index 65b9c46..a215380 100644 --- a/main.nf +++ b/main.nf @@ -42,6 +42,7 @@ def helpMessage() { --num_hits Number of peptide hits per spectrum (PSMs) in output file (default: '1') --fixed_mods Fixed modifications ('Carbamidomethyl (C)', see OpenMS modifications) --variable_mods Variable modifications ('Oxidation (M)', see OpenMS modifications) + --phospho_rescoring Phosphosite rescoring using the luciphor algorithm --precursor_mass_tolerance Mass tolerance of precursor mass (ppm) --allowed_missed_cleavages Allowed missed cleavages --psm_level_fdr_cutoff Identification PSM-level FDR cutoff @@ -434,8 +435,9 @@ process search_engine_msgf { -threads ${task.cpus} \\ -database ${database} \\ -matches_per_spec ${params.num_hits} \\ - -fixed_modifications "${fixed}" \\ - -variable_modifications "${variable}" \\ + -variable_modifications ${params.variable_mods.tokenize(',').collect { "'${it}'" }.join(" ") } \\ + -fixed_modifications ${params.fixed_mods.tokenize(',').collect { "'${it}'"}.join(" ")} \\ + -enzyme ${params.enzyme} \\ > ${mzml_file.baseName}_msgf.log """ } @@ -470,8 +472,9 @@ process search_engine_comet { -threads ${task.cpus} \\ -database ${database} \\ -num_hits ${params.num_hits} \\ - -fixed_modifications "${fixed}" \\ - -variable_modifications "${variable}" \\ + -variable_modifications ${params.variable_mods.tokenize(',').collect { "'${it}'" }.join(" ") } \\ + -fixed_modifications ${params.fixed_mods.tokenize(',').collect { "'${it}'"}.join(" ")} \\ + -enzyme ${params.enzyme} \\ > ${mzml_file.baseName}_comet.log """ } @@ -497,7 +500,7 @@ process index_peptides { -out ${id_file.baseName}_idx.idXML \\ -threads ${task.cpus} \\ -fasta ${database} \\ - -enzyme:name "${enzyme}" \\ + -enzyme:name ${enzyme} \\ > ${id_file.baseName}_index_peptides.log """ } @@ -621,8 +624,6 @@ process idscoreswitcher { """ } - - // --------------------------------------------------------------------- // Branch b) Q-values and PEP from OpenMS @@ -780,6 +781,30 @@ process idscoreswitcher3 { """ } +process luciphor { + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + + input: + file id_file from 
id_files_idx_feat_perc_fdr_filter_switched.mix(id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch)
+        file mzml_file_l from mzml_files_luciphor
+
+    output:
+        file "${id_file.baseName}_luciphor.idXML" into id_files_idx_feat_fdr_filter_switched_luciphor
+        file "*.log"
+
+    when:
+        params.phospho_rescoring
+
+    script:
+     """
+     LuciphorAdapter    -id ${id_file} \\
+                        -in ${mzml_file_l} \\
+                        -out ${id_file.baseName}_luciphor.idXML \\
+                        -threads ${task.cpus} \\
+                        > ${id_file.baseName}_luciphor.log
+     """
+}

 // ---------------------------------------------------------------------
 // Main Branch
@@ -791,8 +816,7 @@ process proteomicslfq {

     input:
      file mzmls from mzmls_plfq.map{it[1]}.toSortedList({ a, b -> b.baseName <=> a.baseName })
-     file id_files from id_files_idx_feat_perc_fdr_filter_switched
-         .mix(id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch)
+     file id_files from id_files_idx_feat_fdr_filter_switched_luciphor
          .toSortedList({ a, b -> b.baseName <=> a.baseName })
      file expdes from ch_expdesign
      file fasta from plfq_in_db.mix(plfq_in_db_decoy)

From c91848aeb512c90deaad69542881e707c13647e1 Mon Sep 17 00:00:00 2001
From: Leon Bichmann
Date: Fri, 3 Apr 2020 10:10:04 +0200
Subject: [PATCH 138/374] add luciphor jar to PATH

---
 Dockerfile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Dockerfile b/Dockerfile
index 228e563..5727a98 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -11,6 +11,7 @@ ENV PATH /opt/conda/envs/nf-core-proteomicslfq-1.0dev/bin:$PATH

 # OpenMS Adapters need the raw jars of Java based bioconda tools in the PATH. Not the wrappers that conda creates.
 RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/msgf_plus-*/MSGFPlus.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0)
+RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/LuciPHOr2/Luciphor2.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0)

 # Dump the details of the installed packages to a file for posterity
 RUN conda env export --name nf-core-proteomicslfq-1.0dev > nf-core-proteomicslfq-1.0dev.yml

From e45521ee464a2b10d1be9b2cda87026e4f96f3b1 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Sun, 5 Apr 2020 18:02:54 +0200
Subject: [PATCH 139/374] update sdrf parser with fixes

---
 bin/parse_sdrf.py | 227 ++++++++++++++++++++++++++++++++++++++--------
 main.nf           |  56 +++++++++---
 nextflow.config   |   5 +-
 3 files changed, 232 insertions(+), 56 deletions(-)

diff --git a/bin/parse_sdrf.py b/bin/parse_sdrf.py
index 18fced1..65f264f 100755
--- a/bin/parse_sdrf.py
+++ b/bin/parse_sdrf.py
@@ -9,6 +9,7 @@

 CONTEXT_SETTINGS = dict(help_option_names=['-h', '--help'])

+warnings = dict()

 @click.group(context_settings=CONTEXT_SETTINGS)
 def cli():
@@ -18,8 +19,8 @@ def openms_ify_mods(sdrf_mods):
     oms_mods = list()

     for m in sdrf_mods:
-        if "AC=UNIMOD" not in m:
-            raise Exception("only UNIMOD modifications supported.")
+        if "AC=UNIMOD" not in m and "AC=Unimod" not in m:
+            raise Exception("only UNIMOD modifications supported. " + m)

         name = re.search("NT=(.+?)(;|$)", m).group(1)
         name = name.capitalize()
@@ -32,13 +33,16 @@ def openms_ify_mods(sdrf_mods):
             1)  # one of [Anywhere, Protein N-term, Protein C-term, Any N-term, Any C-term

         if re.search("TA=(.+?)(;|$)", m) is None:  # TODO: missing in sdrf.
-            print("Warning no TA= specified. Setting to N-term or C-term if possible: " + m)
+            warning_message = "Warning no TA= specified. Setting to N-term or C-term if possible."
+ warnings[warning_message] = warnings.get(warning_message, 0) + 1 if "C-term" in pp: ta = "C-term" elif "N-term" in pp: ta = "N-term" else: - print("Reassignment not possible. skipping") + warning_message = "Reassignment not possible. Skipping." + #print(warning_message + " "+ m) + warnings[warning_message] = warnings.get(warning_message, 0) + 1 pass else: ta = re.search("TA=(.+?)(;|$)", m).group(1) # target amino-acid @@ -64,17 +68,32 @@ def openms_ify_mods(sdrf_mods): return ",".join(oms_mods) -def openms_convert(sdrf_file: str = None): +def openms_convert(sdrf_file: str = None, keep_raw: bool = False, onetable : bool = False, legacy : bool = False, verbose: bool = False): + print('PROCESSING: ' + sdrf_file + '"') sdrf = pd.read_table(sdrf_file) + sdrf = sdrf.astype(str) sdrf.columns = map(str.lower, sdrf.columns) # convert column names to lower-case - + # map filename to tuple of [fixed, variable] mods mod_cols = [c for ind, c in enumerate(sdrf) if - c.startswith('comment[modification parameters')] # columns with modification parameters + c.startswith('comment[modification parameters')] # columns with modification parameters # get factor columns (except constant ones) factor_cols = [c for ind, c in enumerate(sdrf) if c.startswith('factor value[') and len(sdrf[c].unique()) > 1] + # get characteristics columns (except constant ones) + characteristics_cols = [c for ind, c in enumerate(sdrf) if c.startswith('characteristics[') and len(sdrf[c].unique()) > 1] + + # remove characteristics columns already present as factor + redundant_characteristics_cols = set() + for c in characteristics_cols: + c_col = sdrf[c] # select characteristics column + for f in factor_cols: # Iterate over all factor columns + f_col = sdrf[f] # select factor column + if c_col.equals(f_col): + redundant_characteristics_cols.add(c) + characteristics_cols = [x for x in characteristics_cols if x not in redundant_characteristics_cols] + file2mods = dict() file2pctol = dict() file2pctolunit = dict() @@ -86,15 +105,19 @@ def openms_convert(sdrf_file: str = None): file2label = dict() file2source = dict() source_name_list = list() + source_name2n_reps =dict() file2combined_factors = dict() file2technical_rep = dict() for index, row in sdrf.iterrows(): ## extract mods all_mods = list(row[mod_cols]) + #print(all_mods) var_mods = [m for m in all_mods if 'MT=variable' in m or 'MT=Variable' in m] # workaround for capitalization var_mods.sort() fixed_mods = [m for m in all_mods if 'MT=fixed' in m or 'MT=Fixed' in m] # workaround for capitalization fixed_mods.sort() + if verbose: + print(row) raw = row['comment[data file]'] fixed_mods_string = "" if fixed_mods is not None: @@ -117,12 +140,14 @@ def openms_convert(sdrf_file: str = None): pc_tmp = pc_tol_str.split(" ") file2pctol[raw] = pc_tmp[0] file2pctolunit[raw] = pc_tmp[1] - else: - print("Invalid precursor mass tolerance set. Assuming 10 ppm.") + else: + warning_message = "Invalid precursor mass tolerance set. Assuming 10 ppm." + warnings[warning_message] = warnings.get(warning_message, 0) + 1 file2pctol[raw] = "10" file2pctolunit[raw] = "ppm" else: - print("No precursor mass tolerance set. Assuming 10 ppm.") + warning_message = "No precursor mass tolerance set. Assuming 10 ppm." + warnings[warning_message] = warnings.get(warning_message, 0) + 1 file2pctol[raw] = "10" file2pctolunit[raw] = "ppm" @@ -134,32 +159,54 @@ def openms_convert(sdrf_file: str = None): file2fragtol[raw] = f_tmp[0] file2fragtolunit[raw] = f_tmp[1] else: - print("Invalid fragment mass tolerance set. 
Assuming 10 ppm.") - file2fragtol[raw] = "10" + warning_message = "Invalid fragment mass tolerance set. Assuming 20 ppm." + warnings[warning_message] = warnings.get(warning_message, 0) + 1 + file2fragtol[raw] = "20" file2fragtolunit[raw] = "ppm" else: - print("No fragment mass tolerance set. Assuming 10 ppm.") + warning_message = "No fragment mass tolerance set. Assuming 20 ppm." + warnings[warning_message] = warnings.get(warning_message, 0) + 1 file2fragtol[raw] = "20" file2fragtolunit[raw] = "ppm" if 'comment[dissociation method]' in row: - diss_method = row['comment[dissociation method]'] - file2diss[raw] = diss_method.toUpper() + diss_method = re.search("NT=(.+?)(;|$)", row['comment[dissociation method]']).group(1) + file2diss[raw] = diss_method.upper() else: - print("No dissociation method provided. Assuming HCD.") + warning_message = "No dissociation method provided. Assuming HCD." + warnings[warning_message] = warnings.get(warning_message, 0) + 1 file2diss[raw] = 'HCD' if 'comment[technical replicate]' in row: - file2technical_rep[raw] = str(row['comment[technical replicate]']) + technical_replicate = str(row['comment[technical replicate]']) + if "not available" in technical_replicate: + file2technical_rep[raw] = "1" + else: + file2technical_rep[raw] = technical_replicate else: file2technical_rep[raw] = "1" + # store highest replicate number for this source name + if source_name in source_name2n_reps: + source_name2n_reps[source_name] = max(int(source_name2n_reps[source_name]), int(file2technical_rep[raw])) + else: + source_name2n_reps[source_name] = int(file2technical_rep[raw]) + enzyme = re.search("NT=(.+?)(;|$)", row['comment[cleavage agent details]']).group(1) enzyme = enzyme.capitalize() if "Trypsin/p" in enzyme: # workaround enzyme = "Trypsin/P" file2enzyme[raw] = enzyme - file2fraction[raw] = str(row['comment[fraction identifier]']) + + if 'comment[fraction identifier]' in row: + fraction = str(row['comment[fraction identifier]']) + if "not available" in fraction: + file2fraction[raw] = "1" + else: + file2fraction[raw] = fraction + else: + file2fraction[raw] = "1" + label = re.search("NT=(.+?)(;|$)", row['comment[label]']).group(1) file2label[raw] = label @@ -167,8 +214,16 @@ def openms_convert(sdrf_file: str = None): all_factors = list(row[factor_cols]) combined_factors = "|".join(all_factors) if combined_factors == "": - print("No factors specified. Adding dummy factor used as condition.") - combined_factors = "none" + # fallback to characteristics (use them as factors) + all_factors = list(row[characteristics_cols]) + combined_factors = "|".join(all_factors) + if combined_factors == "": + warning_message = "No factors specified. Adding dummy factor used as condition." + warnings[warning_message] = warnings.get(warning_message, 0) + 1 + combined_factors = "none" + else: + warning_message = "No factors specified. Adding non-redundant characteristics as factor. Will be used as condition." + warnings[warning_message] = warnings.get(warning_message, 0) + 1 file2combined_factors[raw] = combined_factors # print(combined_factors) @@ -191,34 +246,126 @@ def openms_convert(sdrf_file: str = None): # output of experimental design f = open("experimental_design.tsv", "w+") - open_ms_experimental_design_header = ["Fraction_Group", "Fraction", "Spectra_Filepath", "Label", "Sample", - "MSstats_Condition", "MSstats_BioReplicate"] - f.write("\t".join(open_ms_experimental_design_header) + "\n") - for index, row in sdrf.iterrows(): # does only work for label-free not for multiplexed. 
TODO
-        raw = row["comment[data file]"]
-        fraction_group = str(1 + source_name_list.index(row["source name"]))  # extract fraction group
-        sample = fraction_group
-        if 'none' in file2combined_factors[raw]:
-            # no factor defined use sample as condition
-            condition = sample
-        else:
-            condition = file2combined_factors[raw]
-        replicate = file2technical_rep[raw]
-        label = file2label[raw]
-        if "label free sample" in label:
-            label = "1"
-        f.write(fraction_group + "\t" + file2fraction[
-            raw] + "\t" + raw + "\t" + label + "\t" + sample + "\t" + condition + "\t" + replicate + "\n")
+    raw_ext_regex = re.compile(r"\.raw$", re.IGNORECASE)
+
+    if onetable:
+        if legacy:
+            open_ms_experimental_design_header = ["Fraction_Group", "Fraction", "Spectra_Filepath", "Label", "Sample",
+                                                  "MSstats_Condition", "MSstats_BioReplicate"]
+        else:
+            open_ms_experimental_design_header = ["Fraction_Group", "Fraction", "Spectra_Filepath", "Label",
+                                                  "MSstats_Condition", "MSstats_BioReplicate"]
+        f.write("\t".join(open_ms_experimental_design_header) + "\n")
+
+        for index, row in sdrf.iterrows():  # does only work for label-free not for multiplexed. TODO
+            raw = row["comment[data file]"]
+            source_name = row["source name"]
+            replicate = file2technical_rep[raw]
+
+            # calculate fraction group by counting all technical replicates of the preceding source names
+            source_name_index = source_name_list.index(source_name)
+            offset = 0
+            for i in range(source_name_index):
+                offset = offset + int(source_name2n_reps[source_name_list[i]])
+
+            fraction_group = str(offset + int(replicate))
+            sample = fraction_group
+
+            if 'none' in file2combined_factors[raw]:
+                # no factor defined use sample as condition
+                condition = sample
+            else:
+                condition = file2combined_factors[raw]
+            label = file2label[raw]
+            if "label free sample" in label:
+                label = "1"
+
+            if not keep_raw:
+                out = raw_ext_regex.sub(".mzML", raw)
+            else:
+                out = raw
+
+            if legacy:
+                f.write(fraction_group + "\t" + file2fraction[
+                    raw] + "\t" + out + "\t" + label + "\t" + sample + "\t" + condition + "\t" + replicate + "\n")
+            else:
+                f.write(fraction_group + "\t" + file2fraction[
+                    raw] + "\t" + out + "\t" + label + "\t" + condition + "\t" + replicate + "\n")
+        f.close()
+    else:  # two table format
+        openms_file_header = ["Fraction_Group", "Fraction", "Spectra_Filepath", "Label", "Sample"]
+        f.write("\t".join(openms_file_header) + "\n")
+
+        for index, row in sdrf.iterrows():  # does only work for label-free not for multiplexed. TODO
+            raw = row["comment[data file]"]
+            source_name = row["source name"]
+            replicate = file2technical_rep[raw]
+
+            # calculate fraction group by counting all technical replicates of the preceding source names
+            source_name_index = source_name_list.index(source_name)
+            offset = 0
+            for i in range(source_name_index):
+                offset = offset + int(source_name2n_reps[source_name_list[i]])
+
+            fraction_group = str(offset + int(replicate))
+            sample = fraction_group  # TODO: change this for multiplexed
+
+            label = file2label[raw]
+            if "label free sample" in label:
+                label = "1"
+
+            if not keep_raw:
+                out = raw_ext_regex.sub(".mzML", raw)
+            else:
+                out = raw
+
+            f.write(fraction_group + "\t" + file2fraction[raw] + "\t" + out + "\t" + label + "\t" + sample + "\n")
+
+        # sample table
+        f.write("\n")
+        openms_sample_header = ["Sample", "MSstats_Condition", "MSstats_BioReplicate"]
+        f.write("\t".join(openms_sample_header) + "\n")
+        for index, row in sdrf.iterrows():  # does only work for label-free not for multiplexed. TODO
+            raw = row["comment[data file]"]
+            source_name = row["source name"]
+            replicate = file2technical_rep[raw]
+
+            # calculate fraction group by counting all technical replicates of the preceding source names
+            source_name_index = source_name_list.index(source_name)
+            offset = 0
+            for i in range(source_name_index):
+                offset = offset + int(source_name2n_reps[source_name_list[i]])
+
+            fraction_group = str(offset + int(replicate))
+            sample = fraction_group  # TODO: change this for multiplexed
+
+            if 'none' in file2combined_factors[raw]:
+                # no factor defined use sample as condition
+                condition = sample
+            else:
+                condition = file2combined_factors[raw]
+
+            f.write(sample + "\t" + condition + "\t" + replicate + "\n")
+
+    f.close()
+
+    if len(warnings) != 0:
+        for k, v in warnings.items():
+            print('WARNING: "' + k + '" occurred ' + str(v) + ' times.')
+
+    print("SUCCESS (WARNINGS=" + str(len(warnings)) + "): " + sdrf_file)


 @click.command('convert-openms', short_help='convert sdrf to openms file output')
 @click.option('--sdrf', '-s', help='SDRF file')
+@click.option('--raw', '-r', help='Keep filenames in experimental design output as raw.')
+@click.option('--legacy/--modern', "-l/-m", default=False, help='legacy=Create artificial sample column not needed in OpenMS 2.6.')
+@click.option('--onetable/--twotables', "-t1/-t2", default=False, help='Create one-table or two-tables format.')
+@click.option('--verbose/--quiet', "-v/-q", default=False, help='Output debug information.')
 @click.pass_context
-def openms_from_sdrf(ctx, sdrf: str):
+def openms_from_sdrf(ctx, sdrf: str, raw: bool, onetable : bool, legacy: bool, verbose: bool):
     if sdrf is None:
         help()
-    openms_convert(sdrf)
+    openms_convert(sdrf, raw, onetable, legacy, verbose)


 cli.add_command(openms_from_sdrf)

diff --git a/main.nf b/main.nf
index e10c401..5f2f562 100644
--- a/main.nf
+++ b/main.nf
@@ -38,23 +38,23 @@ def helpMessage() {
       --search_engine               Which search engine: "comet" (default) or "msgf"
       --enzyme                      Enzymatic cleavage (e.g.
'unspecific cleavage' or 'Trypsin' [default], see OpenMS enzymes)
      --num_enzyme_termini          Specify the termini where the cleavage rule has to match (default:
-                                        'fully' valid: 'semi', 'fully', 'C-term unspecific', 'N-term unspecific')
+                                        'fully' valid: 'semi', 'fully')
      --num_hits                    Number of peptide hits per spectrum (PSMs) in output file (default: '1')
      --fixed_mods                  Fixed modifications ('Carbamidomethyl (C)', see OpenMS modifications)
      --variable_mods               Variable modifications ('Oxidation (M)', see OpenMS modifications)
      --precursor_mass_tolerance    Mass tolerance of precursor mass
      --precursor_mass_tolerance_unit Da or ppm
-      --fragment_mass_tolerance     Mass tolerance for fragment masses
-      --fragment_mass_tolerance_unit Da or ppm
-      --allowed_missed_cleavages    Allowed missed cleavages
+      --fragment_mass_tolerance     Mass tolerance for fragment masses (currently only controls Comet's fragment_bin_tol)
+      --fragment_mass_tolerance_unit Da or ppm (currently always ppm)
+      --allowed_missed_cleavages    Allowed missed cleavages
      --psm_level_fdr_cutoff        Identification PSM-level FDR cutoff
      --min_precursor_charge        Minimum precursor ion charge
      --max_precursor_charge        Maximum precursor ion charge
      --min_peptide_length          Minimum peptide length to consider
      --max_peptide_length          Maximum peptide length to consider
-      --instrument                  Type of instrument that generated the data
+      --instrument                  Type of instrument that generated the data (currently only 'high_res' [default] and 'low_res' supported)
      --protocol                    Used labeling or enrichment protocol (if any)
-      --fragment_method             Used fragmentation method
+      --fragment_method             Used fragmentation method (currently unused since we let the search engines consider all MS2 spectra and let them determine from the spectrum metadata)
      --max_mods                    Maximum number of modifications per peptide. If this value is large, the search may take very long
      --db_debug                    Debug level during database search

@@ -220,13 +220,15 @@ else

     script:
     """
-    parse_sdrf.py convert-openms -s ${sdrf} > sdrf_parsing.log
+    ## -t2 since the one-table format parser is broken in OpenMS2.5
+    ## -l for legacy behavior to always add sample columns
+    parse_sdrf.py convert-openms -t2 -l -s ${sdrf} > sdrf_parsing.log
     """
   }

   //TODO use header and ref by col name
   ch_sdrf_config_file
-    .splitCsv(skip: 1, sep: '\t').view()
+    .splitCsv(skip: 1, sep: '\t')
     .multiMap{ row -> id = UUID.randomUUID().toString()
                comet_settings: msgf_settings: tuple(id,
                                                     row[2],
@@ -255,7 +257,7 @@ if (params.expdesign)
         .set { ch_expdesign }
 }

-ch_sdrf_config.mzmls.view()
+ch_sdrf_config.mzmls
 .branch {
        raw: hasExtension(it[1], 'raw')
        mzML: hasExtension(it[1], 'mzML')
@@ -312,10 +314,6 @@ process raw_file_conversion {

     output:
      set mzml_id, file("*.mzML") into mzmls_converted
-
-    // TODO check if this sh script is available with bioconda
-    // else check if the exe is accessible/in PATH on bioconda and use sth like this
-    // mono ThermoRawfileParser.exe -i=${rawfile} -f=2 -o=./

     script:
      """
      ThermoRawFileParser.sh -i=${rawfile} -f=2 -o=./ > ${rawfile}_conversion.log
@@ -397,12 +395,23 @@ process generate_decoy_database {
 //  """
 //}

+if (params.enzyme == "unspecific cleavage")
+{
+  params.num_enzyme_termini = "none"
+}
+
+pepidx_num_enzyme_termini = params.num_enzyme_termini
+if (params.num_enzyme_termini == "fully")
+{
+  pepidx_num_enzyme_termini = "full"
+}
+
 /// Search engine
 // TODO parameterize more
 if (params.search_engine == "msgf")
 {
     search_engine_score = "SpecEValue"
-} else {
+} else { //comet
     search_engine_score = "expect"
 }

@@ -438,9 +447,18 @@ process search_engine_msgf {
             -out ${mzml_file.baseName}.idXML \\
             -threads ${task.cpus} \\
             -database ${database} \\
+            -instrument ${params.instrument} \\
             -matches_per_spec ${params.num_hits} \\
+            -min_precursor_charge ${params.min_precursor_charge} \\
+            -max_precursor_charge ${params.max_precursor_charge} \\
+            -min_peptide_length ${params.min_peptide_length} \\
+            -max_peptide_length ${params.max_peptide_length} \\
+            -tryptic ${params.num_enzyme_termini} \\
+            -precursor_mass_tolerance ${prec_tol} \\
+            -precursor_error_units ${prec_tol_unit} \\
             -fixed_modifications ${fixed.tokenize(',').collect { "'${it}'" }.join(" ") } \\
             -variable_modifications ${variable.tokenize(',').collect { "'${it}'" }.join(" ") } \\
+            -max_mods ${params.max_mods} \\
             > ${mzml_file.baseName}_msgf.log
     """
 }

@@ -468,15 +486,24 @@ process search_engine_comet {
         set mzml_id, file("${mzml_file.baseName}.idXML") into id_files_comet
         file "*.log"

+    //TODO we currently ignore the activation_method param to leave the default "ALL" for max.
compatibility script: """ CometAdapter -in ${mzml_file} \\ -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ -database ${database} \\ + -instrument ${params.instrument} \\ + -allowed_missed_cleavages ${params.allowed_missed_cleavages} \\ -num_hits ${params.num_hits} \\ + -num_enzyme_termini ${params.num_enzyme_termini} \\ + -precursor_charge ${params.min_precursor_charge}:${params.max_precursor_charge} \\ -fixed_modifications ${fixed.tokenize(',').collect { "'${it}'" }.join(" ") } \\ -variable_modifications ${variable.tokenize(',').collect { "'${it}'" }.join(" ") } \\ + -max_variable_mods_in_peptide ${params.max_mods} \\ + -precursor_mass_tolerance ${prec_tol} \\ + -precursor_error_units ${prec_tol_unit} \\ + -fragment_bin_tolerance ${frag_tol} \\ > ${mzml_file.baseName}_comet.log """ } @@ -503,6 +530,7 @@ process index_peptides { -threads ${task.cpus} \\ -fasta ${database} \\ -enzyme:name "${enzyme}" \\ + -enzyme:specificity ${pepidx_num_enzyme_termini} > ${id_file.baseName}_index_peptides.log """ } diff --git a/nextflow.config b/nextflow.config index 3e8c25b..cf92db5 100644 --- a/nextflow.config +++ b/nextflow.config @@ -29,12 +29,13 @@ params { // shared search engine parameters enzyme = 'Trypsin' + num_enzyme_termini = 'fully' precursor_mass_tolerance = 5 precursor_error_units = "ppm" fixed_mods = 'Carbamidomethyl (C)' variable_mods = 'Oxidation (M)' fragment_mass_tolerance = 5 - dissociation_method = "HCD" + dissociation_method = "HCD" //currently unused. hard to find a good logic to beat the defaults // Percolator flags train_FDR = 0.05 @@ -56,7 +57,7 @@ params { min_peptide_length = 6 max_peptide_length = 40 matches_per_spec = 1 - max_mods = 2 + max_mods = 3 // Comet flags allowed_missed_cleavages = 1 From cbca563bbdf0a96382672a36c11e41917f64b64f Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sun, 5 Apr 2020 18:11:05 +0200 Subject: [PATCH 140/374] forgot new renamed unit params --- nextflow.config | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/nextflow.config b/nextflow.config index cf92db5..d128afb 100644 --- a/nextflow.config +++ b/nextflow.config @@ -31,10 +31,11 @@ params { enzyme = 'Trypsin' num_enzyme_termini = 'fully' precursor_mass_tolerance = 5 - precursor_error_units = "ppm" + precursor_mass_tolerance_unit = "ppm" fixed_mods = 'Carbamidomethyl (C)' variable_mods = 'Oxidation (M)' fragment_mass_tolerance = 5 + fragment_mass_tolerance_unit = "ppm" dissociation_method = "HCD" //currently unused. 
hard to find a good logic to beat the defaults // Percolator flags From a6040aa27892f625eebb1c6dd43b2c9f8d439dd7 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 7 Apr 2020 13:29:08 +0200 Subject: [PATCH 141/374] [FIX] forgot enzyme --- main.nf | 2 ++ 1 file changed, 2 insertions(+) diff --git a/main.nf b/main.nf index 5f2f562..8b50422 100644 --- a/main.nf +++ b/main.nf @@ -453,6 +453,7 @@ process search_engine_msgf { -max_precursor_charge ${params.max_precursor_charge} \\ -min_peptide_length ${params.min_peptide_length} \\ -max_peptide_length ${params.max_peptide_length} \\ + -enzyme ${enzyme} \\ -tryptic ${params.num_enzyme_termini} \\ -precursor_mass_tolerance ${prec_tol} \\ -precursor_error_units ${prec_tol_unit} \\ @@ -497,6 +498,7 @@ process search_engine_comet { -allowed_missed_cleavages ${params.allowed_missed_cleavages} \\ -num_hits ${params.num_hits} \\ -num_enzyme_termini ${params.num_enzyme_termini} \\ + -enzyme ${enzyme} \\ -precursor_charge ${params.min_precursor_charge}:${params.max_precursor_charge} \\ -fixed_modifications ${fixed.tokenize(',').collect { "'${it}'" }.join(" ") } \\ -variable_modifications ${variable.tokenize(',').collect { "'${it}'" }.join(" ") } \\ From 67a47d4e820b9ab27d5e510f3903ee80e23e65a3 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 7 Apr 2020 16:57:06 +0200 Subject: [PATCH 142/374] Translate enzymes to MSGF --- main.nf | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/main.nf b/main.nf index 8b50422..67b1999 100644 --- a/main.nf +++ b/main.nf @@ -407,7 +407,6 @@ if (params.num_enzyme_termini == "fully") } /// Search engine -// TODO parameterize more if (params.search_engine == "msgf") { search_engine_score = "SpecEValue" @@ -415,9 +414,6 @@ if (params.search_engine == "msgf") search_engine_score = "expect" } - - //Filename FixedModifications VariableModifications Label PrecursorMassTolerance PrecursorMassToleranceUnit FragmentMassTolerance DissociationMethod Enzyme - process search_engine_msgf { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' @@ -442,6 +438,12 @@ process search_engine_msgf { file "*.log" script: + if (enzyme == 'Trypsin') enzyme = 'Trypsin/P' + else if (enzyme == 'Arg-C') enzyme = 'Arg-C/P' + else if (enzyme == 'Asp-N') enzyme = 'Asp-N/B' + else if (enzyme == 'Chymotrypsin') enzyme = 'Chymotrypsin/P' + else if (enzyme == 'Lys-C') enzyme = 'Lys-C/P' + """ MSGFPlusAdapter -in ${mzml_file} \\ -out ${mzml_file.baseName}.idXML \\ From aa2ddc5fbad932fd8cef5bde83e38cfeef4a7808 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 7 Apr 2020 17:15:24 +0200 Subject: [PATCH 143/374] added some quotes just to be safe --- main.nf | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/main.nf b/main.nf index 67b1999..4e3d7e5 100644 --- a/main.nf +++ b/main.nf @@ -443,19 +443,19 @@ process search_engine_msgf { else if (enzyme == 'Asp-N') enzyme = 'Asp-N/B' else if (enzyme == 'Chymotrypsin') enzyme = 'Chymotrypsin/P' else if (enzyme == 'Lys-C') enzyme = 'Lys-C/P' - + """ MSGFPlusAdapter -in ${mzml_file} \\ -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ - -database ${database} \\ + -database "${database}" \\ -instrument ${params.instrument} \\ -matches_per_spec ${params.num_hits} \\ -min_precursor_charge ${params.min_precursor_charge} \\ -max_precursor_charge ${params.max_precursor_charge} \\ -min_peptide_length ${params.min_peptide_length} \\ -max_peptide_length ${params.max_peptide_length} \\ - -enzyme ${enzyme} \\ + -enzyme "${enzyme}" \\ 
-tryptic ${params.num_enzyme_termini} \\ -precursor_mass_tolerance ${prec_tol} \\ -precursor_error_units ${prec_tol_unit} \\ @@ -495,12 +495,12 @@ process search_engine_comet { CometAdapter -in ${mzml_file} \\ -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ - -database ${database} \\ + -database "${database}" \\ -instrument ${params.instrument} \\ -allowed_missed_cleavages ${params.allowed_missed_cleavages} \\ -num_hits ${params.num_hits} \\ -num_enzyme_termini ${params.num_enzyme_termini} \\ - -enzyme ${enzyme} \\ + -enzyme "${enzyme}" \\ -precursor_charge ${params.min_precursor_charge}:${params.max_precursor_charge} \\ -fixed_modifications ${fixed.tokenize(',').collect { "'${it}'" }.join(" ") } \\ -variable_modifications ${variable.tokenize(',').collect { "'${it}'" }.join(" ") } \\ From 9cfcec9ee39c031b203c3bfd10a9f32eafe7cfd1 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 8 Apr 2020 21:43:49 +0200 Subject: [PATCH 144/374] [FEATURE] SDRF with local folder --- main.nf | 5 +++-- nextflow.config | 1 + 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/main.nf b/main.nf index 5f2f562..747008e 100644 --- a/main.nf +++ b/main.nf @@ -22,6 +22,7 @@ def helpMessage() { Main arguments: Either: --sdrf Path to PRIDE Sample to data relation format file + --root_folder (Optional) If given, looks for the filenames in the SDRF in this folder, locally Or: --spectra Path to input spectra as mzML or Thermo Raw --expdesign Path to optional experimental design file (if not given, it assumes unfractionated, unrelated samples) @@ -226,7 +227,7 @@ else """ } - //TODO use header and ref by col name + //TODO use header and reference by col name instead of index ch_sdrf_config_file .splitCsv(skip: 1, sep: '\t') .multiMap{ row -> id = UUID.randomUUID().toString() @@ -242,7 +243,7 @@ else row[10]) idx_settings: tuple(id, row[10]) - mzmls: tuple(id,row[0])} + mzmls: tuple(id, params.root_folder.empty() ? row[0] : params.root_folder + "/" + row[11])} .set{ch_sdrf_config} } diff --git a/nextflow.config b/nextflow.config index d128afb..fdf6aac 100644 --- a/nextflow.config +++ b/nextflow.config @@ -10,6 +10,7 @@ params { // Workflow flags sdrf = "" + root_folder = "" spectra = "" database = "" expdesign = "" From ede6ab2b81402f26eaae345c29ce09de7c420500 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 8 Apr 2020 22:00:20 +0200 Subject: [PATCH 145/374] little fix and remove view statements --- main.nf | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/main.nf b/main.nf index 747008e..5405bc3 100644 --- a/main.nf +++ b/main.nf @@ -243,7 +243,7 @@ else row[10]) idx_settings: tuple(id, row[10]) - mzmls: tuple(id, params.root_folder.empty() ? row[0] : params.root_folder + "/" + row[11])} + mzmls: tuple(id, params.root_folder.length() == 0 ? 
row[0] : (params.root_folder + "/" + row[1]))} .set{ch_sdrf_config} } @@ -310,7 +310,7 @@ process raw_file_conversion { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - tuple mzml_id, path(rawfile) from branched_input.raw.view() + tuple mzml_id, path(rawfile) from branched_input.raw output: set mzml_id, file("*.mzML") into mzmls_converted @@ -430,7 +430,7 @@ process search_engine_msgf { errorStrategy 'terminate' input: - tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme from searchengine_in_db_msgf.mix(searchengine_in_db_decoy_msgf).combine(mzmls_msgf.join(ch_sdrf_config.msgf_settings)).view() + tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme from searchengine_in_db_msgf.mix(searchengine_in_db_decoy_msgf).combine(mzmls_msgf.join(ch_sdrf_config.msgf_settings)) // This was another way of handling the combination //file database from searchengine_in_db.mix(searchengine_in_db_decoy) @@ -474,7 +474,7 @@ process search_engine_comet { // I actually dont know, where else this would be needed. errorStrategy 'terminate' input: - tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet).combine(mzmls_comet.join(ch_sdrf_config.comet_settings)).view() + tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet).combine(mzmls_comet.join(ch_sdrf_config.comet_settings)) //or //file database from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet) From ea08e5b64dace93237b5dd897dcf0995d4653849 Mon Sep 17 00:00:00 2001 From: yperez Date: Fri, 10 Apr 2020 14:44:01 +0100 Subject: [PATCH 146/374] Change the process name and add lsf support --- main.nf | 86 ++++++++++++++++++++++++------------------------- nextflow.config | 7 ++-- 2 files changed, 48 insertions(+), 45 deletions(-) diff --git a/main.nf b/main.nf index b581857..b901a7a 100644 --- a/main.nf +++ b/main.nf @@ -29,16 +29,16 @@ def helpMessage() { And: --database Path to input protein database as fasta - + Decoy database: --add_decoys Add decoys to the given fasta --decoy_affix The decoy prefix or suffix used or to be used (default: DECOY_) - --affix_type Prefix (default) or suffix (WARNING: Percolator only supports prefices) + --affix_type Prefix (default) or suffix (WARNING: Percolator only supports prefices) Database Search: --search_engine Which search engine: "comet" (default) or "msgf" --enzyme Enzymatic cleavage (e.g. 
'unspecific cleavage' or 'Trypsin' [default], see OpenMS enzymes) - --num_enzyme_termini Specify the termini where the cleavage rule has to match (default: + --num_enzyme_termini Specify the termini where the cleavage rule has to match (default: 'fully' valid: 'semi', 'fully') --num_hits Number of peptide hits per spectrum (PSMs) in output file (default: '1') --fixed_mods Fixed modifications ('Carbamidomethyl (C)', see OpenMS modifications) @@ -50,7 +50,7 @@ def helpMessage() { --allowed_missed_cleavages Allowed missed cleavages --psm_level_fdr_cutoff Identification PSM-level FDR cutoff --min_precursor_charge Minimum precursor ion charge - --max_precursor_charge Maximum precursor ion charge + --max_precursor_charge Maximum precursor ion charge --min_peptide_length Minimum peptide length to consider --max_peptide_length Maximum peptide length to consider --instrument Type of instrument that generated the data (currently only 'high_res' [default] and 'low_res' supported) @@ -64,19 +64,19 @@ def helpMessage() { PSM Rescoring: --posterior_probabilities How to calculate posterior probabilities for PSMs: "percolator" = Re-score based on PSM-feature-based SVM and transform distance - to hyperplane for posteriors + to hyperplane for posteriors "fit_distributions" = Fit positive and negative distributions to scores (similar to PeptideProphet) --rescoring_debug Debug level during PSM rescoring --psm_pep_fdr_cutoff FDR cutoff on PSM level (or potential peptide level; see Percolator options) before going into - feature finding, map alignment and inference. + feature finding, map alignment and inference. Percolator specific: --train_FDR False discovery rate threshold to define positive examples in training. Set to testFDR if 0 --test_FDR False discovery rate threshold for evaluating best cross validation result and reported end result --percolator_fdr_level Level of FDR calculation ('peptide-level-fdrs' or 'psm-level-fdrs') --post-processing-tdc Use target-decoy competition to assign q-values and PEPs. - --description_correct_features Description of correct features for Percolator (0, 1, 2, 4, 8, see Percolator retention time and calibration) + --description_correct_features Description of correct features for Percolator (0, 1, 2, 4, 8, see Percolator retention time and calibration) --generic-feature-set Use only generic (i.e. not search engine specific) features. Generating search engine specific features for common search engines by PSMFeatureExtractor will typically boost the identification rate significantly. --subset-max-train Only train an SVM on a subset of PSMs, and use the resulting score vector to evaluate the other @@ -90,7 +90,7 @@ def helpMessage() { - ignore_extreme_percentiles: ignore everything outside 99th and 1st percentile (also removes equal values like potential censored max values in XTandem) - none: do nothing --top_hits_only Use only the top hits for fitting - + //TODO add more options for rescoring part Inference and Quantification: @@ -110,7 +110,7 @@ def helpMessage() { --mass_recalibration Recalibrates masses to correct for instrument biases. (default: false) TODO must specify true or false - //TODO the following need to be passed still + //TODO the following need to be passed still --psm_pep_fdr_for_quant PSM/peptide level FDR used for quantification (if filtering on protein level is not enough) If Bayesian inference was chosen, this will be a peptide-level FDR and only the best PSMs per peptide will be reported. 
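The PSM-rescoring flags documented in this help text combine on the command line roughly as in the following sketch (illustrative only; the spectra and database paths are placeholders and the values are examples, not values taken from this commit):

```bash
# Sketch: rescore PSMs with Percolator and filter at 1% PSM/peptide FDR
# before feature finding, map alignment and inference (paths are placeholders).
nextflow run nf-core/proteomicslfq -profile docker \
    --spectra 'data/*.mzML' \
    --database 'db/proteome_with_contaminants.fasta' \
    --search_engine comet \
    --posterior_probabilities percolator \
    --psm_pep_fdr_cutoff 0.01
```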
@@ -215,10 +215,10 @@ else output: file "experimental_design.tsv" into ch_expdesign file "openms.tsv" into ch_sdrf_config_file - + when: params.sdrf - + script: """ ## -t2 since the one-table format parser is broken in OpenMS2.5 @@ -290,7 +290,7 @@ branched_input.mzML //This piece only runs on data that is a.) raw and b.) needs conversion -//mzML files will be mixed after this step to provide output for downstream processing - allowing you to even specify mzMLs and RAW files in a mixed mode as input :-) +//mzML files will be mixed after this step to provide output for downstream processing - allowing you to even specify mzMLs and RAW files in a mixed mode as input :-) //GENERAL TODOS @@ -314,7 +314,7 @@ process raw_file_conversion { output: set mzml_id, file("*.mzML") into mzmls_converted - + script: """ ThermoRawFileParser.sh -i=${rawfile} -f=2 -o=./ > ${rawfile}_conversion.log @@ -334,7 +334,7 @@ process mzml_indexing { output: set mzml_id, file("out/*.mzML") into mzmls_indexed file "*.log" - + script: """ mkdir out @@ -350,7 +350,7 @@ branched_input_mzMLs.inputIndexedMzML.mix(mzmls_converted).mix(mzmls_indexed).in //Fill the channels with empty Channels in case that we want to add decoys. Otherwise fill with output from database. (searchengine_in_db_msgf, searchengine_in_db_comet, pepidx_in_db, plfq_in_db) = ( params.add_decoys ? [ Channel.empty(), Channel.empty(), Channel.empty(), Channel.empty() ] - : [ Channel.fromPath(params.database), Channel.fromPath(params.database), Channel.fromPath(params.database), Channel.fromPath(params.database) ] ) + : [ Channel.fromPath(params.database), Channel.fromPath(params.database), Channel.fromPath(params.database), Channel.fromPath(params.database) ] ) //Add decoys if params.add_decoys is set appropriately process generate_decoy_database { @@ -387,7 +387,7 @@ process generate_decoy_database { // output: // file "expdesign.tsv" into expdesign // when: -// !params.expdesign +// !params.expdesign // script: // strng = new File(mymzmls[0].toString()).getParentFile() @@ -427,7 +427,7 @@ process search_engine_msgf { input: tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme from searchengine_in_db_msgf.mix(searchengine_in_db_decoy_msgf).combine(mzmls_msgf.join(ch_sdrf_config.msgf_settings)) - + // This was another way of handling the combination //file database from searchengine_in_db.mix(searchengine_in_db_decoy) //each file(mzml_file) from mzmls @@ -520,10 +520,10 @@ process index_peptides { input: tuple mzml_id, file(id_file), enzyme, file(database) from id_files_msgf.mix(id_files_comet).join(ch_sdrf_config.idx_settings).combine(pepidx_in_db.mix(pepidx_in_db_decoy)) - + //each mzml_id, file(id_file) from id_files_msgf.mix(id_files_comet) //file database from pepidx_in_db.mix(pepidx_in_db_decoy) - + output: set mzml_id, file("${id_file.baseName}_idx.idXML") into id_files_idx_ForPerc, id_files_idx_ForIDPEP file "*.log" @@ -545,10 +545,10 @@ process index_peptides { // Branch a) Q-values and PEP from Percolator -process extract_perc_features { +process extract_percolator_features { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' - + input: set mzml_id, file(id_file) from id_files_idx_ForPerc @@ -564,7 +564,7 @@ process extract_perc_features { PSMFeatureExtractor -in ${id_file} \\ -out ${id_file.baseName}_feat.idXML \\ -threads ${task.cpus} \\ - > ${id_file.baseName}_extract_perc_features.log + > ${id_file.baseName}_extract_percolator_features.log 
""" } @@ -574,7 +574,7 @@ process extract_perc_features { process percolator { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' - + input: set mzml_id, file(id_file) from id_files_idx_feat @@ -635,7 +635,7 @@ process idfilter { process idscoreswitcher { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' - + input: file id_file from id_files_idx_feat_perc_filter @@ -665,10 +665,10 @@ process idscoreswitcher { // Branch b) Q-values and PEP from OpenMS // Note: for IDPEP we never need any file specific settings so we can stop adding the mzml_idto the channels -process fdr { +process fdr_idpep { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' - + input: set mzml_id, file(id_file) from id_files_idx_ForIDPEP @@ -691,10 +691,10 @@ process fdr { """ } -process idscoreswitcher1 { +process idscoreswitcher_idpep_pre { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' - + input: file id_file from id_files_idx_ForIDPEP_fdr @@ -721,7 +721,7 @@ process idscoreswitcher1 { process idpep { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' - + input: file id_file from id_files_idx_ForIDPEP_fdr_switch @@ -741,10 +741,10 @@ process idpep { """ } -process idscoreswitcher2 { +process idscoreswitcher_idpep_post { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' - + input: file id_file from id_files_idx_ForIDPEP_fdr_switch_idpep @@ -767,7 +767,7 @@ process idscoreswitcher2 { """ } -process idfilter1 { +process idfilter_idpep { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' @@ -781,7 +781,7 @@ process idfilter1 { when: params.posterior_probabilities != "percolator" - + script: """ IDFilter -in ${id_file} \\ @@ -792,10 +792,10 @@ process idfilter1 { """ } -process idscoreswitcher3 { +process idscoreswitcher_idpep_postfilter { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' - + input: file id_file from id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter @@ -824,9 +824,9 @@ process idscoreswitcher3 { process proteomicslfq { - publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' publishDir "${params.outdir}/proteomics_lfq", mode: 'copy' - + input: file mzmls from mzmls_plfq.map{it[1]}.toSortedList({ a, b -> b.baseName <=> a.baseName }) file id_files from id_files_idx_feat_perc_fdr_filter_switched @@ -875,13 +875,13 @@ process msstats { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' publishDir "${params.outdir}/msstats", mode: 'copy' - + when: !params.skip_post_msstats input: file csv from out_msstats - + output: file "*.pdf" file "*.csv" @@ -899,10 +899,10 @@ process ptxqc { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' publishDir "${params.outdir}/ptxqc", mode: 'copy' - + input: file mzTab from out_mzTab - + output: file "*.html" file "*.yaml" @@ -997,13 +997,13 @@ process get_software_versions { echo \$(PeptideIndexer 2>&1) > v_peptideindexer.txt || true echo \$(PSMFeatureExtractor 2>&1) > v_psmfeatureextractor.txt || true echo \$(PercolatorAdapter 2>&1) > v_percolatoradapter.txt || true - percolator -h &> v_percolator.txt + percolator -h &> v_percolator.txt echo \$(IDFilter 2>&1) > v_idfilter.txt || true echo \$(IDScoreSwitcher 2>&1) > v_idscoreswitcher.txt || true echo \$(FalseDiscoveryRate 2>&1) > v_falsediscoveryrate.txt || true echo \$(IDPosteriorErrorProbability 2>&1) > 
v_idposteriorerrorprobability.txt || true echo \$(ProteomicsLFQ 2>&1) > v_proteomicslfq.txt || true - echo $workflow.manifest.version &> v_msstats_plfq.txt + echo $workflow.manifest.version &> v_msstats_plfq.txt scrape_software_versions.py &> software_versions_mqc.yaml """ } diff --git a/nextflow.config b/nextflow.config index fdf6aac..fa6cc87 100644 --- a/nextflow.config +++ b/nextflow.config @@ -52,11 +52,11 @@ params { isotope_error_range = "0,1" fragment_method = "from_spectrum" instrument = "high_res" - protocol = "automatic" + protocol = "automatic" tryptic = "non" min_precursor_charge = 2 max_precursor_charge = 3 - min_peptide_length = 6 + min_peptide_length = 6 max_peptide_length = 40 matches_per_spec = 1 max_mods = 3 @@ -136,6 +136,9 @@ profiles { singularity.enabled = true singularity.autoMounts = true } + lsf { + process.executor = 'lsf' + } test { includeConfig 'conf/test.config' } test_full { includeConfig 'conf/test_full.config' } } From d7a5732670de8fb772231d735875b9e238f1973f Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sat, 11 Apr 2020 20:40:00 +0200 Subject: [PATCH 147/374] Extend documentation --- docs/usage.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/docs/usage.md b/docs/usage.md index 63a4cf7..efbd460 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -8,9 +8,14 @@ * [Updating the pipeline](#updating-the-pipeline) * [Reproducibility](#reproducibility) * [Main arguments](#main-arguments) + Either: + * [`--sdrf`](#--sdrf) + * [`--root_folder`](#--root_folder) + Or: * [`--spectra`](#--spectra) - * [`--database`](#--database) * [`--exp_design`](#--exp_design) + And: + * [`--database`](#--database) * [`-profile`](#-profile) * [Decoy database generation](#decoy-database-generation) * [`--add_decoys`](#--add_decoys) From 590a0c9e798a06d2ef310eb7e00df322ece28c76 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sat, 11 Apr 2020 20:41:50 +0200 Subject: [PATCH 148/374] docs --- docs/usage.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/usage.md b/docs/usage.md index efbd460..b71a5d3 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -8,12 +8,15 @@ * [Updating the pipeline](#updating-the-pipeline) * [Reproducibility](#reproducibility) * [Main arguments](#main-arguments) + Either: * [`--sdrf`](#--sdrf) * [`--root_folder`](#--root_folder) + Or: * [`--spectra`](#--spectra) * [`--exp_design`](#--exp_design) + And: * [`--database`](#--database) * [`-profile`](#-profile) From d0404d43ddf586205a6e40f4175ab77b9bf67abc Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 14 Apr 2020 13:21:11 +0200 Subject: [PATCH 149/374] Remove NA proteins from volcano plots --- bin/msstats_plfq.R | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/bin/msstats_plfq.R b/bin/msstats_plfq.R index 2139567..4db657d 100755 --- a/bin/msstats_plfq.R +++ b/bin/msstats_plfq.R @@ -88,7 +88,9 @@ if (length(lvls) == 1) groupComparisonPlots(data=test.MSstats$ComparisonResult, type="ComparisonPlot", width=12, height=12,dot.size = 2,ylimUp = 7) - groupComparisonPlots(data=test.MSstats$ComparisonResult, type="VolcanoPlot", + + test.MSstats$Volcano = test.MSstats$ComparisonResult[!is.na(test.MSstats$ComparisonResult$pvalue),] + groupComparisonPlots(data=test.MSstats$Volcano, type="VolcanoPlot", width=12, height=12,dot.size = 2,ylimUp = 7) if (nrow(constrast_mat) > 1) From 85f15e4c92602bff772977382b7fb35582b30167 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Thu, 16 Apr 2020 16:45:14 +0200 Subject: [PATCH 150/374] Use OMP variable to set threads for 
Percolator

---
 main.nf | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/main.nf b/main.nf
index b901a7a..33c27eb 100644
--- a/main.nf
+++ b/main.nf
@@ -596,7 +596,10 @@ process percolator {
         def pptdc = params.post_processing_tdc ? "" : "-post-processing-tdc"

         """
-        PercolatorAdapter -in ${id_file} \\
+        ## Percolator does not have a threads parameter. Set it via OpenMP env variable,
+        ## to honor threads on clusters
+        OMP_NUM_THREADS=${task.cpus} PercolatorAdapter \\
+            -in ${id_file} \\
             -out ${id_file.baseName}_perc.idXML \\
             -threads ${task.cpus} \\
             ${pptdc} \\

From 5cfa223407a1e2a62cc157575e793238c6394949 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Thu, 16 Apr 2020 20:37:15 +0200
Subject: [PATCH 151/374] extended documentation

---
 docs/usage.md | 296 +++++++++++++++++++++++++++++++++++++++-----------
 main.nf       |   2 +-
 2 files changed, 232 insertions(+), 66 deletions(-)

diff --git a/docs/usage.md b/docs/usage.md
index b71a5d3..293ed37 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -9,14 +9,14 @@
 * [Updating the pipeline](#updating-the-pipeline)
 * [Reproducibility](#reproducibility)
 * [Main arguments](#main-arguments)

-  Either:
+  Either (using a PRIDE Sample to data relation format file):

   * [`--sdrf`](#--sdrf)
   * [`--root_folder`](#--root_folder)

-  Or:
+  Or (using spectrum files and an OpenMS style experimental design):

   * [`--spectra`](#--spectra)
   * [`--exp_design`](#--exp_design)
-
+
   And:

   * [`--database`](#--database)
   * [`-profile`](#-profile)
@@ -28,11 +28,14 @@
 * [`--search_engine`](#--search_engine)
 * [`--enzyme`](#--enzyme)
 * [`--num_enzyme_termini`](#--num_enzyme_termini)
+* [`--num_hits`](#--num_hits)
 * [`--fixed_mods`](#--fixed_mods)
 * [`--variable_mods`](#--variable_mods)
 * [`--precursor_mass_tolerance`](#--precursor_mass_tolerance)
+* [`--precursor_mass_tolerance_unit`](#--precursor_mass_tolerance_unit)
+* [`--fragment_mass_tolerance`](#--fragment_mass_tolerance)
+* [`--fragment_mass_tolerance_unit`](#--fragment_mass_tolerance_unit)
 * [`--allowed_missed_cleavages`](#--allowed_missed_cleavages)
-* [`--num_hits`](#--num_hits)
 * [`--psm_level_fdr_cutoff`](#--psm_level_fdr_cutoff)
 * [`--min_precursor_charge`](#--min_precursor_charge)
 * [`--max_precursor_charge`](#--max_precursor_charge)
@@ -111,7 +114,7 @@ NXF_OPTS='-Xms1g -Xmx4g'

 ## Running the pipeline

-The typical command for running the pipeline is as follows:
+The simplest command for running the pipeline is as follows:

 ```bash
 nextflow run nf-core/proteomicslfq --spectra '*.mzML' --database '*.fasta' -profile docker
 ```

@@ -142,40 +145,59 @@ It's a good idea to specify a pipeline version when running the pipeline on you
 First, go to the [nf-core/proteomicslfq releases page](https://github.com/nf-core/proteomicslfq/releases) and find the latest version number - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`.

-This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future.
+This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. For running a (not necessarily stable) development version of the pipeline you can use the `dev` branch with `-r dev`.

 ## Main arguments

+The input to the pipeline can be specified in two mutually exclusive ways:
+
+ 1. By using a path or URI to a PRIDE Sample to data relation format file (SDRF), e.g.
as part of a submitted and
+   annotated PRIDE experiment (see here for examples). For this case, use:
+
+   ### `--sdrf`
+
+   For the URI or path to the SDRF file. Input files will be downloaded and cached from the URIs specified in the SDRF file.
+   An OpenMS-style experimental design will be generated based on the factor columns of the SDRF. The settings for the
+   following parameters will currently be overwritten by the ones specified in the SDRF:
+
+   - `fixed_mods`,
+   - `variable_mods`,
+   - `precursor_mass_tolerance`,
+   - `precursor_mass_tolerance_unit`,
+   - `fragment_mass_tolerance`,
+   - `fragment_mass_tolerance_unit`,
+   - `fragment_method`,
+   - `enzyme`
+
+   ### `--root_folder`
+
+   This optional parameter can be used to specify a root folder in which the spectrum files specified in the SDRF are searched.
+   It is usually used if you have a local version of the experiment already. Note that this option does not support recursive
+   searching yet.
+
+ 2. By specifying globbing patterns to the input spectrum files in Thermo RAW or mzML format and a manual OpenMS-style
+    experimental design file.
+
+   ### `--spectra`
+
+   Use this to specify the location of your input mzML or Thermo RAW files:
+
+   ```bash
+   --spectra 'path/to/data/*.mzML'
+   ```
+
+   or
+
+   ```bash
+   --spectra 'path/to/data/*.raw'
+   ```
+
-> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported.
+   Please note the following requirements:

-The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation).
+   1. The path must be enclosed in quotes
+   2. The path must have at least one `*` wildcard character

-Please note the following requirements:

-1. The path must be enclosed in quotes
-2. The path must have at least one `*` wildcard character

-### `--database`

-Needs to be given to specify the input protein database when you run the pipeline:

-```bash
---database '[path to Fasta protein database]'
-```

 ### `--exp_design`

 Path or URL to an experimental design file (if not given, it assumes unfractionated, unrelated samples):

 ```bash
 --exp_design '[path to experimental design file in OpenMS-style tab separated format]'
 ```

+### `--database`
+
+Since the database is not included in an SDRF, this parameter always needs to be given to specify the input protein database
+when you run the pipeline. Remember to include contaminants (and decoys if not added in the pipeline with --add_decoys).
+
+```bash
+--database '[path to Fasta protein database]'
+```
+
 ### `-profile`

 Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. Note that multiple profiles can be loaded, for example: `-profile docker` - the order of arguments is important!
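To make the SDRF input mode described above concrete, a minimal invocation could look like this sketch (the SDRF file name, root folder and database path are placeholders, not values from this patch; `--root_folder` can be omitted to download the files from the URIs in the SDRF):

```bash
# Sketch: SDRF-driven run; --root_folder is optional and points to local
# copies of the raw files referenced in the SDRF (all paths are placeholders).
nextflow run nf-core/proteomicslfq -profile docker \
    --sdrf 'experiment.sdrf.tsv' \
    --root_folder '/data/raw' \
    --database 'db/proteome_with_contaminants.fasta' \
    --add_decoys
```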
@@ -205,55 +236,132 @@ If `-profile` is not specified at all the pipeline will be run locally and expec * A profile with a complete configuration for automated testing * Includes links to test data and therefore doesn't need additional parameters +> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported. + +The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). + ## Database search ### `--precursor_mass_tolerance` -Precursor mass tolerance used for database search in ppm. TODO parameterize the unit. For High-Resolution instruments a precursor mass tolerance value of 5ppm is recommended. (i.e. 5) +Precursor mass tolerance used for database search. For High-Resolution instruments a precursor mass tolerance value of 5 ppm is recommended (i.e. 5). See also [`--precursor_mass_tolerance_unit`](#--precursor_mass_tolerance_unit). + +### `--precursor_mass_tolerance` + +Precursor mass tolerance unit used for database search. Possible values are "ppm" (default) and "Da". ### `--enzyme` -Specify which enzymatic restriction should be applied ('unspecific cleavage', 'Trypsin', see OpenMS enzymes) +Specify which enzymatic restriction should be applied, e.g. 'unspecific cleavage', 'Trypsin' (default), see OpenMS [enzymes](https://github.com/OpenMS/OpenMS/blob/develop/share/OpenMS/CHEMISTRY/Enzymes.xml). Note: MSGF does not support extended cutting rules, +as used by default with "Trypsin". I.e. if you specify "Trypsin" with MSGF, it will be automatically converted to "Trypsin/P" = +"Trypsin without proline rule". + +### `--num_enzyme_termini` +Specify the number of termini matching the enzyme cutting rules for a peptide to be considered. Valid values are +"fully" (default), "semi", or "none". + +### `--num_hits` +Specify the maximum number of top peptide candidates per spectrum to be reported by the search engine. Default: 1 ### `--fixed_mods` -Specify which fixed modifications should be applied to the database search (eg. '' or 'Carbamidomethyl (C)', see OpenMS modifications) +Specify which fixed modifications should be applied to the database search (eg. '' or 'Carbamidomethyl (C)', see Unimod modifications +in the style '{unimod name} ({optional term specificity} {optional origin})'). +All possible modifications can be found in the restrictions mentioned in the command line documentation of e.g. [CometAdapter](https://abibuilder.informatik.uni-tuebingen.de/archive/openms/Documentation/release/latest/html/TOPP_CometAdapter.html) (scroll down a bit for the complete set). +Multiple fixed modifications can be specified comma separated (e.g. 'Carbamidomethyl (C),Oxidation (M)') ### `--variable_mods` -Specify which variable modifications should be applied to the database search (eg. 'Oxidation (M)', see OpenMS modifications) - -Multiple fixed or variable modifications can be specified comma separated (e.g. 'Carbamidomethyl (C),Oxidation (M)') +Specify which variable modifications should be applied to the database search (eg. 'Oxidation (M)', see Unimod modifications +in the style '{unimod name} ({optional term specificity} {optional origin})').
+All possible modifications can be found in the restrictions mentioned in the command line documentation of e.g. [CometAdapter](https://abibuilder.informatik.uni-tuebingen.de/archive/openms/Documentation/release/latest/html/TOPP_CometAdapter.html) (scroll down a bit for the complete set). +Multiple variable modifications can be specified comma separated (e.g. 'Carbamidomethyl (C),Oxidation (M)') ### `--allowed_missed_cleavages` -Specify the number of allowed missed enzyme cleavages in a peptide. The parameter is not applied if the no-enzyme option is specified for comet. +Specify the maximum number of allowed missed enzyme cleavages in a peptide. The parameter is not applied if "unspecific cleavage" is specified as the enzyme. -### `--psm_level_fdr_cutoff` +### `--instrument` -Specify the PSM level cutoff for the identification FDR for IDFilter. +Type of instrument that generated the data: 'low_res' (refers to LCQ and LTQ instruments) or 'high_res' (default) -## Protein Inference +### `--protocol` -### `--protein_level_fdr_cutoff` +MSGF only: Labeling or enrichment protocol used, if any -Specify the protein level cutoff for the identification FDR of PLFQ +### `--fragment_method` + +Currently unsupported. Defaults to "ALL" for Comet and "from_spectrum" for MSGF. Should be a sensible default for 99% of the cases. + +### `--isotope_error_range` + +Range of allowed isotope peak errors (MS-GF+ parameter '-ti'). Takes into account the error introduced by choosing a non-monoisotopic peak for fragmentation. Combined with 'precursor_mass_tolerance'/'precursor_error_units', this determines the actual precursor mass tolerance. E.g. for experimental mass 'exp' and calculated mass 'calc', '-precursor_mass_tolerance 20 -precursor_error_units ppm -isotope_error_range -1,2' tests '|exp - calc - n * 1.00335 Da| < 20 ppm' for n = -1, 0, 1, 2. + +### `--fragment_method` + +MSGFPlus: Fragmentation method ('from_spectrum' relies on spectrum meta data and uses CID as fallback option; MS-GF+ parameter '-m') + +### `--min_precursor_charge` + +Minimum precursor ion charge + +### `--max_precursor_charge` + +Maximum precursor ion charge + +### `--min_peptide_length` + +Minimum peptide length to consider (works with MSGF and in newer Comet versions) + +### `--max_peptide_length` + +Maximum peptide length to consider (works with MSGF and in newer Comet versions) + +### `--max_mods` + +Maximum number of modifications per peptide. If this value is large, the search may take very long. + +### `--db_debug` + +Set debug level for the search engines (regulates if intermediate output is kept and if you are going to see the output +of the underlying search engine) + +## PSM Rescoring: + +### `--posterior_probabilities` + +How to calculate posterior probabilities for PSMs: + - "percolator" = Re-score based on PSM-feature-based SVM and transform distance + to hyperplane for posteriors + - "fit_distributions" = Fit positive and negative distributions to scores + (similar to PeptideProphet) + +### `--rescoring_debug` + +Debug level during PSM rescoring for additional text output and keeping temporary files + +### `--psm_pep_fdr_cutoff` + +FDR cutoff on PSM level (or potential peptide level; see Percolator options) +before going into feature finding, map alignment and inference. + +### Percolator specific: ### `--train_FDR` -Percolator: False discovery rate threshold to define positive examples in training. Set to testFDR if 0. +False discovery rate threshold to define positive examples in training.
Set to testFDR if 0 ### `--test_FDR` -Percolator: False discovery rate threshold for evaluating best cross validation result and reported end result. +False discovery rate threshold for evaluating best cross validation result and reported end result -### `--FDR_level` +### `--percolator_fdr_level` -Percolator: Level of FDR calculation ('peptide-level-fdrs', 'psm-level-fdrs', 'protein-level-fdrs'). +Level of FDR calculation ('peptide-level-fdrs' or 'psm-level-fdrs') -### `--klammer` +### `--post-processing-tdc` -Percolator: Retention time features are calculated as in Klammer et al. instead of with Elude. Only available if --description_correct_features is set. +Use target-decoy competition to assign q-values and PEPs. ### `--description_correct_features` @@ -267,52 +375,110 @@ Percolator provides the possibility to use so called description of correct feat 8 -> delta_retention_time\*delta_mass_calibration -### `--isotope_error_range` +### `--generic-feature-set` -Range of allowed isotope peak errors (MS-GF+ parameter '-ti'). Takes into account the error introduced by choosing a non-monoisotopic peak for fragmentation. Combined with 'precursor_mass_tolerance'/'precursor_error_units', this determines the actual precursor mass tolerance. E.g. for experimental mass 'exp' and calculated mass 'calc', '-precursor_mass_tolerance 20 -precursor_error_units ppm -isotope_error_range -1,2' tests '|exp - calc - n * 1.00335 Da| < 20 ppm' for n = -1, 0, 1, 2. +Use only generic (i.e. not search engine specific) features. Generating search engine specific +features for common search engines by PSMFeatureExtractor will typically boost the identification rate significantly. -### `--fragment_method` +### `--subset-max-train` -MSGFPlus: Fragmentation method ('from_spectrum' relies on spectrum meta data and uses CID as fallback option; MS-GF+ parameter '-m') +Only train an SVM on a subset of PSMs, and use the resulting score vector to evaluate the other +PSMs. Recommended when analyzing huge numbers (>1 million) of PSMs. When set to 0, all PSMs are used for training as normal. +Default: 300,000 -### `--instrument` +### `--klammer` -MSGFPlus: Instrument that generated the data ('low_res'/'high_res' refer to LCQ and LTQ instruments; MS-GF+ parameter '-inst') +Retention time features are calculated as in Klammer et al. instead of with Elude. Default: false -### `--protocol` +### Distribution-fitting (IDPEP) specific: -MSGFPlus: Labeling or enrichment protocol used, if any (MS-GF+ parameter '-p') +### `--outlier_handling` -### `--tryptic` +How to handle outliers during fitting: + - ignore_iqr_outliers (default): ignore outliers outside of 3\*IQR from Q1/Q3 for fitting + - set_iqr_to_closest_valid: set IQR-based outliers to the last valid value for fitting + - ignore_extreme_percentiles: ignore everything outside 99th and 1st percentile (also removes equal values like potential censored max values in XTandem) + - none: do nothing -MSGFPlus: Level of cleavage specificity required (MS-GF+ parameter '-ntt') +### `--top_hits_only` -### `--min_precursor_charge` +Use only the top peptide hits per spectrum for fitting. Default: true -MSGFPlus: Minimum precursor ion charge (only used for spectra without charge information; MS-GF+ parameter '-minCharge') +## Inference and Quantification: -### `--max_precursor_charge` +### `--inf_quant_debug` -MSGFPlus: Maximum precursor ion charge (only used for spectra without charge information; MS-GF+ parameter '-maxCharge') +Debug level during inference and quantification. 
(WARNING: Higher than 666 may produce a lot of additional output files) -### `--min_peptide_length` +### Inference: -MSGFPlus: Minimum peptide length to consider (MS-GF+ parameter '-minLength') +### `--protein_inference` -### `--max_peptide_length` +Infer proteins through: + - "aggregation" = aggregates all peptide scores across a protein (by calculating the maximum) (default) + - "bayesian" = computes a posterior probability for every protein based on a Bayesian network + - ("percolator" not yet supported) -MSGFPlus: Maximum peptide length to consider (MS-GF+ parameter '-maxLength') +### `--protein_level_fdr_cutoff` -### `--matches_per_spec` +Protein level FDR cutoff (Note: this affects and chooses the peptides used for quantification). Default: 0.05 -MSGFPLus: Number of matches per spectrum to be reported (MS-GF+ parameter '-n') +### Quantification: -### `--max_mods` +### `--transfer_ids` + +Transfer IDs over aligned samples to increase # of quantifiable features (WARNING: increased memory consumption). (default: "false") + +### `--targeted_only` + +Only looks for quantifiable features at locations with an identified spectrum. (default: "true") + +### `--mass_recalibration` + +Recalibrates masses to correct for instrument biases. (default: "false") + +### `--psm_pep_fdr_for_quant` + +PSM/peptide level FDR used for quantification after inference (*in addition to protein-level filtering*) +If Bayesian inference was chosen, this will be a peptide-level FDR and only the best PSMs per +peptide will be reported. (default: off = 1.0) + +### `--protein_quantification` + +Quantify proteins based on: + - "unique_peptides" = use peptides mapping to single proteins or a group of indistinguishable proteins (according to the set of experimentally identified peptides) + - "strictly_unique_peptides" = use peptides mapping to a unique single protein only + - "shared_peptides" = use shared peptides, too, but only greedily for its best group (by inference score) + +## Statistical post-processing: + +### `--skip_post_msstats` + +Skip MSstats for statistical post-processing? + +### `--ref_condition` + +Instead of all pairwise contrasts (default), uses the given condition name/number (corresponding to your experimental design) as a reference and creates pairwise contrasts against it. (TODO not yet fully implemented) + +### `--contrasts` + +Specify a set of contrasts in a semicolon separated list of R-compatible contrasts with the +condition names/numbers as variables (e.g. "1-2;1-3;2-3"). Overwrites "--ref_condition" (TODO not yet fully implemented) + +## Quality control: + +### `--skip_qc` + +Skip generation of quality control report by PTXQC? default: "true" since it is still unstable + +### `--ptxqc_report_layout` + +Specify a yaml file for the report layout (see PTXQC documentation) (TODO not yet fully implemented) -MSGFPlus: Maximum number of modifications per peptide. If this value is large, the search may take very long. Note that you can use the same configuration setup to save sets of reference files for your own use, even if they are not part of the iGenomes resource. See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for instructions on where to save such a file.
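As a sketch of that approach (the file name and values below are examples only, nothing shipped with the pipeline), frequently reused parameters can be collected in a small custom configuration file and passed to any run with Nextflow's generic `-c` option:

```bash
# Hypothetical my_defaults.config, created by you:
#   params {
#     database   = '/data/fasta/human_swissprot_with_decoys.fasta'
#     fixed_mods = 'Carbamidomethyl (C)'
#     enzyme     = 'Trypsin'
#   }
nextflow run nf-core/proteomicslfq -profile docker -c my_defaults.config --spectra 'raws/*.raw'
```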
+ ## Job resources ### Automatic resubmission diff --git a/main.nf b/main.nf index b901a7a..dc18018 100644 --- a/main.nf +++ b/main.nf @@ -48,7 +48,6 @@ def helpMessage() { --fragment_mass_tolerance Mass tolerance for fragment masses (currently only controls Comets fragment_bin_tol) --fragment_mass_tolerance_unit Da or ppm (currently always ppm) --allowed_missed_cleavages Allowed missed cleavages - --psm_level_fdr_cutoff Identification PSM-level FDR cutoff --min_precursor_charge Minimum precursor ion charge --max_precursor_charge Maximum precursor ion charge --min_peptide_length Minimum peptide length to consider @@ -451,6 +450,7 @@ process search_engine_msgf { -threads ${task.cpus} \\ -database "${database}" \\ -instrument ${params.instrument} \\ + -protocol "${params.protocol}" \\ -matches_per_spec ${params.num_hits} \\ -min_precursor_charge ${params.min_precursor_charge} \\ -max_precursor_charge ${params.max_precursor_charge} \\ From f1f1638513268abb86a4691717f08a5d6aafa1a9 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 16 Apr 2020 20:46:52 +0200 Subject: [PATCH 152/374] lint --- docs/usage.md | 68 +++++++++++++++++++++++++++------------------------ 1 file changed, 36 insertions(+), 32 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index 293ed37..d3fabc6 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -151,53 +151,57 @@ This version number will be logged in reports when you run the pipeline, so that The input to the pipeline can be specified in two mutually exclusive ways: - 1. By using a path or URI to a PRIDE Sample to data relation format file (SDRF), e.g. as part of a submitted and - annotated PRIDE experiment (see here for examples). For this case, use: +============================ - ### `--sdrf` +*a)* Either by using a path or URI to a PRIDE Sample to data relation format file (SDRF), e.g. as part of a submitted and +annotated PRIDE experiment (see here for examples). For this case, use: - For the URI or path to the SDRF file. Input files will be downloaded and cached from the URIs specified in the SDRF file. - An OpenMS-style experimental design will be generated based on the factor columns of the SDRF. The settings for the - following parameters will currently be overwritten by the ones specified in the SDRF: - - `fixed_mods`, - - `variable_mods`, - - `precursor_mass_tolerance`, - - `precursor_mass_tolerance_unit`, - - `fragment_mass_tolerance`, - - `fragment_mass_tolerance_unit`, - - `fragment_method`, - - `enzyme` +### `--sdrf` +For the URI or path to the SDRF file. Input files will be downloaded and cached from the URIs specified in the SDRF file. +An OpenMS-style experimental design will be generated based on the factor columns of the SDRF. The settings for the +following parameters will currently be overwritten by the ones specified in the SDRF: - ### `--root_folder` +* `fixed_mods`, +* `variable_mods`, +* `precursor_mass_tolerance`, +* `precursor_mass_tolerance_unit`, +* `fragment_mass_tolerance`, +* `fragment_mass_tolerance_unit`, +* `fragment_method`, +* `enzyme` - This optional parameter can be used to specify a root folder in which the spectrum files specified in the SDRF are searched. - It is usually used if you have a local version of the experiment already. Note that this option does not support recursive - searching yet. +### `--root_folder` - 2. By specifying globbing patterns to the input spectrum files in Thermo RAW or mzML format and a manual OpenMS-style - experimental design file. 
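To make the two modes concrete, here are two hypothetical invocations (accessions and file names are placeholders):

```bash
# Mode a): everything except the database is derived from the SDRF file.
nextflow run nf-core/proteomicslfq --sdrf experiment.sdrf.tsv --database proteins.fasta -profile docker

# Mode b): explicit spectrum files plus a manual OpenMS-style experimental design.
nextflow run nf-core/proteomicslfq --spectra 'data/*.mzML' --exp_design design.tsv --database proteins.fasta -profile docker
```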
+This optional parameter can be used to specify a root folder in which the spectrum files specified in the SDRF are searched. +It is usually used if you have a local version of the experiment already. Note that this option does not support recursive +searching yet. - ### `--spectra` +============================ - Use this to specify the location of your input mzML or Thermo RAW files: +*b)* By specifying globbing patterns to the input spectrum files in Thermo RAW or mzML format and a manual OpenMS-style +experimental design file. - ```bash - --spectra 'path/to/data/*.mzML' - ``` +### `--spectra` - or +Use this to specify the location of your input mzML or Thermo RAW files: - ```bash - --spectra 'path/to/data/*.raw' - ``` +```bash +--spectra 'path/to/data/*.mzML' +``` + +or - Please note the following requirements: +```bash +--spectra 'path/to/data/*.raw' +``` - 1. The path must be enclosed in quotes - 2. The path must have at least one `*` wildcard character +Please note the following requirements: +1. The path must be enclosed in quotes +2. The path must have at least one `*` wildcard character +============================ ### `--exp_design` From 0488fba579d2a5cb869b8e992e26c00de7f1acda Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 16 Apr 2020 20:47:50 +0200 Subject: [PATCH 153/374] lint --- docs/usage.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index d3fabc6..318f75f 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -151,9 +151,9 @@ This version number will be logged in reports when you run the pipeline, so that The input to the pipeline can be specified in two mutually exclusive ways: -============================ +----- -*a)* Either by using a path or URI to a PRIDE Sample to data relation format file (SDRF), e.g. as part of a submitted and +_a)_ Either by using a path or URI to a PRIDE Sample to data relation format file (SDRF), e.g. as part of a submitted and annotated PRIDE experiment (see here for examples). For this case, use: ### `--sdrf` @@ -177,9 +177,9 @@ This optional parameter can be used to specify a root folder in which the spectr It is usually used if you have a local version of the experiment already. Note that this option does not support recursive searching yet. -============================ +----- -*b)* By specifying globbing patterns to the input spectrum files in Thermo RAW or mzML format and a manual OpenMS-style +_b)_ By specifying globbing patterns to the input spectrum files in Thermo RAW or mzML format and a manual OpenMS-style experimental design file. ### `--spectra` @@ -201,7 +201,7 @@ Please note the following requirements: 1. The path must be enclosed in quotes 2. The path must have at least one `*` wildcard character -============================ +----- ### `--exp_design` From fbbb69882adbd283341c9b127810661b0e7ccf32 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 16 Apr 2020 20:49:23 +0200 Subject: [PATCH 154/374] lint --- docs/usage.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index 318f75f..460b611 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -153,7 +153,7 @@ The input to the pipeline can be specified in two mutually exclusive ways: ----- -_a)_ Either by using a path or URI to a PRIDE Sample to data relation format file (SDRF), e.g. as part of a submitted and +__a)__ Either by using a path or URI to a PRIDE Sample to data relation format file (SDRF), e.g. 
as part of a submitted and annotated PRIDE experiment (see here for examples). For this case, use: ### `--sdrf` @@ -179,7 +179,7 @@ searching yet. ----- -_b)_ By specifying globbing patterns to the input spectrum files in Thermo RAW or mzML format and a manual OpenMS-style +__b)__ By specifying globbing patterns to the input spectrum files in Thermo RAW or mzML format and a manual OpenMS-style experimental design file. ### `--spectra` From 35baf1282352e31ed8ebb982e7f8fa9cbd8fc80c Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 16 Apr 2020 20:50:10 +0200 Subject: [PATCH 155/374] error --- docs/usage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/usage.md b/docs/usage.md index 460b611..74ddf29 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -250,7 +250,7 @@ The pipeline also dynamically loads configurations from [https://github.com/nf-c Precursor mass tolerance used for database search. For High-Resolution instruments a precursor mass tolerance value of 5 ppm is recommended (i.e. 5). See also [`--precursor_mass_tolerance_unit`](#--precursor_mass_tolerance_unit). -### `--precursor_mass_tolerance` +### `--precursor_mass_tolerance_unit` Precursor mass tolerance unit used for database search. Possible values are "ppm" (default) and "Da". From 519119cb7738fa6a6b029da6ad993feae16a0c14 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 16 Apr 2020 20:54:05 +0200 Subject: [PATCH 156/374] fixes --- conf/test_full.config | 2 +- docs/usage.md | 10 +++++++--- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/conf/test_full.config b/conf/test_full.config index a5464d7..d411ded 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -4,7 +4,7 @@ * ------------------------------------------------- * Defines bundled input files and everything required * to run a fast and simple test. Use as follows: - * nextflow run nf-core/proteomicslfq -profile test, + * nextflow run nf-core/proteomicslfq -profile test_full, */ params { diff --git a/docs/usage.md b/docs/usage.md index 74ddf29..574a932 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -228,17 +228,21 @@ If `-profile` is not specified at all the pipeline will be run locally and expec * `docker` * A generic configuration profile to be used with [Docker](http://docker.com/) - * Pulls software from dockerhub: [`nfcore/proteomicslfq`](http://hub.docker.com/r/nfcore/proteomicslfq/) + * Pulls software from Docker Hub: [`nfcore/proteomicslfq`](http://hub.docker.com/r/nfcore/proteomicslfq/) * `singularity` * A generic configuration profile to be used with [Singularity](http://singularity.lbl.gov/) - * Pulls software from DockerHub: [`nfcore/proteomicslfq`](http://hub.docker.com/r/nfcore/proteomicslfq/) + * Pulls software from Docker Hub: [`nfcore/proteomicslfq`](http://hub.docker.com/r/nfcore/proteomicslfq/) * `conda` * Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker or Singularity. * A generic configuration profile to be used with [Conda](https://conda.io/docs/) - * Pulls most software from [Bioconda](https://bioconda.github.io/) + * Pulls most software from the [Bioconda](https://bioconda.github.io/) and [conda-forge](https://conda-forge.org/) channels. 
* `test` * A profile with a complete configuration for automated testing * Includes links to test data and therefore doesn't need additional parameters +* `test_full` + * A profile with a complete configuration for automated testing on AWS + * Includes links to test data and therefore doesn't need additional parameters + * Downloads roughly 9GB of raw data from PRIDE and analyzes it > We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported. From 07185fb604e76054cfe94dd6be94c221255c175a Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 16 Apr 2020 20:56:17 +0200 Subject: [PATCH 157/374] more --- docs/usage.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index 574a932..45cbe55 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -238,11 +238,11 @@ If `-profile` is not specified at all the pipeline will be run locally and expec * Pulls most software from the [Bioconda](https://bioconda.github.io/) and [conda-forge](https://conda-forge.org/) channels. * `test` * A profile with a complete configuration for automated testing - * Includes links to test data and therefore doesn't need additional parameters + * Includes links to test data hosted on GitHub and therefore doesn't need additional parameters * `test_full` * A profile with a complete configuration for automated testing on AWS - * Includes links to test data and therefore doesn't need additional parameters - * Downloads roughly 9GB of raw data from PRIDE and analyzes it + * Includes links to test data on GitHub and PRIDE and therefore doesn't need additional parameters + * Warning: Downloads roughly 9GB of raw data from PRIDE and analyzes it From 7e9e6ad3797a3e04d2a78d40d5081bafaaa7cb19 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 16 Apr 2020 21:04:15 +0200 Subject: [PATCH 158/374] lint --- docs/usage.md | 64 +++++++++++++++++++++++++-------------------------- 1 file changed, 32 insertions(+), 32 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index 45cbe55..fd35363 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -260,15 +260,18 @@ Precursor mass tolerance unit used for database search. Possible values are "ppm ### `--enzyme` -Specify which enzymatic restriction should be applied, e.g. 'unspecific cleavage', 'Trypsin' (default), see OpenMS [enzymes](https://github.com/OpenMS/OpenMS/blob/develop/share/OpenMS/CHEMISTRY/Enzymes.xml). Note: MSGF does not support extended cutting rules, -as used by default with "Trypsin". I.e. if you specify "Trypsin" with MSGF, it will be automatically converted to "Trypsin/P" = -"Trypsin without proline rule". +Specify which enzymatic restriction should be applied, e.g. 'unspecific cleavage', 'Trypsin' (default), see OpenMS +[enzymes](https://github.com/OpenMS/OpenMS/blob/develop/share/OpenMS/CHEMISTRY/Enzymes.xml). Note: MSGF does not support extended +cutting rules, as used by default with "Trypsin". I.e. if you specify "Trypsin" with MSGF, it will be automatically converted to +"Trypsin/P" = "Trypsin without proline rule". ### `--num_enzyme_termini` + Specify the number of termini matching the enzyme cutting rules for a peptide to be considered. Valid values are "fully" (default), "semi", or "none".
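As an illustrative sketch (not a recommendation; all values are examples), the search-related options above might be combined on the command line like this:

```bash
nextflow run nf-core/proteomicslfq -profile docker \
    --spectra 'data/*.mzML' --database proteins.fasta \
    --enzyme 'Trypsin' --num_enzyme_termini fully --allowed_missed_cleavages 1 \
    --precursor_mass_tolerance 5 --precursor_mass_tolerance_unit ppm --num_hits 1
```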
### `--num_hits` + Specify the maximum number of top peptide candidates per spectrum to be reported by the search engine. Default: 1 ### `--fixed_mods` @@ -305,13 +308,9 @@ Currently unsupported. Defaults to "ALL" for Comet and "from_spectrum" for MSGF. Range of allowed isotope peak errors (MS-GF+ parameter '-ti'). Takes into account the error introduced by choosing a non-monoisotopic peak for fragmentation. Combined with 'precursor_mass_tolerance'/'precursor_error_units', this determines the actual precursor mass tolerance. E.g. for experimental mass 'exp' and calculated mass 'calc', '-precursor_mass_tolerance 20 -precursor_error_units ppm -isotope_error_range -1,2' tests '|exp - calc - n * 1.00335 Da| < 20 ppm' for n = -1, 0, 1, 2. -### `--fragment_method` - -MSGFPlus: Fragmentation method ('from_spectrum' relies on spectrum meta data and uses CID as fallback option; MS-GF+ parameter '-m') - ### `--min_precursor_charge` -Minimum precursor ion charge +Minimum precursor ion charge ### `--max_precursor_charge` @@ -334,15 +333,16 @@ Maximum number of modifications per peptide. If this value is large, the search Set debug level for the search engines (regulates if intermediate output is kept and if you are going to see the output of the underlying search engine) -## PSM Rescoring: +## PSM Rescoring ### `--posterior_probabilities` How to calculate posterior probabilities for PSMs: - - "percolator" = Re-score based on PSM-feature-based SVM and transform distance - to hyperplane for posteriors - - "fit_distributions" = Fit positive and negative distributions to scores - (similar to PeptideProphet) + +* "percolator" = Re-score based on PSM-feature-based SVM and transform distance + to hyperplane for posteriors +* "fit_distributions" = Fit positive and negative distributions to scores + (similar to PeptideProphet) ### `--rescoring_debug` @@ -353,7 +353,7 @@ Debug level during PSM rescoring for additional text output and keeping temporar FDR cutoff on PSM level (or potential peptide level; see Percolator options) before going into feature finding, map alignment and inference. -### Percolator specific: +### Percolator specific ### `--train_FDR` @@ -398,40 +398,42 @@ Default: 300,000 Retention time features are calculated as in Klammer et al. instead of with Elude. Default: false -### Distribution-fitting (IDPEP) specific: +### Distribution-fitting (IDPEP) specific ### `--outlier_handling` How to handle outliers during fitting: - - ignore_iqr_outliers (default): ignore outliers outside of 3\*IQR from Q1/Q3 for fitting - - set_iqr_to_closest_valid: set IQR-based outliers to the last valid value for fitting - - ignore_extreme_percentiles: ignore everything outside 99th and 1st percentile (also removes equal values like potential censored max values in XTandem) - - none: do nothing + +* ignore_iqr_outliers (default): ignore outliers outside of 3\*IQR from Q1/Q3 for fitting +* set_iqr_to_closest_valid: set IQR-based outliers to the last valid value for fitting +* ignore_extreme_percentiles: ignore everything outside 99th and 1st percentile (also removes equal values like potential censored max values in XTandem) +* none: do nothing ### `--top_hits_only` Use only the top peptide hits per spectrum for fitting. Default: true -## Inference and Quantification: +## Inference and Quantification ### `--inf_quant_debug` Debug level during inference and quantification. 
(WARNING: Higher than 666 may produce a lot of additional output files) -### Inference: +### Inference ### `--protein_inference` Infer proteins through: - - "aggregation" = aggregates all peptide scores across a protein (by calculating the maximum) (default) - - "bayesian" = computes a posterior probability for every protein based on a Bayesian network - - ("percolator" not yet supported) + +* "aggregation" = aggregates all peptide scores across a protein (by calculating the maximum) (default) +* "bayesian" = computes a posterior probability for every protein based on a Bayesian network +* ("percolator" not yet supported) ### `--protein_level_fdr_cutoff` Protein level FDR cutoff (Note: this affects and chooses the peptides used for quantification). Default: 0.05 -### Quantification: +### Quantification ### `--transfer_ids` Transfer IDs over aligned samples to increase # of quantifiable features (WARNING: increased memory consumption). (default: "false") ### `--targeted_only` Only looks for quantifiable features at locations with an identified spectrum. (default: "true") ### `--mass_recalibration` Recalibrates masses to correct for instrument biases. (default: "false") ### `--psm_pep_fdr_for_quant` PSM/peptide level FDR used for quantification after inference (*in addition to protein-level filtering*) If Bayesian inference was chosen, this will be a peptide-level FDR and only the best PSMs per peptide will be reported. (default: off = 1.0) ### `--protein_quantification` Quantify proteins based on: - - "unique_peptides" = use peptides mapping to single proteins or a group of indistinguishable proteins (according to the set of experimentally identified peptides) - - "strictly_unique_peptides" = use peptides mapping to a unique single protein only - - "shared_peptides" = use shared peptides, too, but only greedily for its best group (by inference score) +* "unique_peptides" = use peptides mapping to single proteins or a group of indistinguishable proteins (according to the set of experimentally identified peptides) +* "strictly_unique_peptides" = use peptides mapping to a unique single protein only +* "shared_peptides" = use shared peptides, too, but only greedily for its best group (by inference score) -## Statistical post-processing: +## Statistical post-processing ### `--skip_post_msstats` Skip MSstats for statistical post-processing? ### `--ref_condition` Instead of all pairwise contrasts (default), uses the given condition name/number (corresponding to your experimental design) as a reference and creates pairwise contrasts against it. (TODO not yet fully implemented) ### `--contrasts` Specify a set of contrasts in a semicolon separated list of R-compatible contrasts with the condition names/numbers as variables (e.g. "1-2;1-3;2-3"). Overwrites "--ref_condition" (TODO not yet fully implemented) -## Quality control: +## Quality control ### `--skip_qc` Skip generation of quality control report by PTXQC? default: "true" since it is still unstable ### `--ptxqc_report_layout` Specify a yaml file for the report layout (see PTXQC documentation) (TODO not yet fully implemented) - Note that you can use the same configuration setup to save sets of reference files for your own use, even if they are not part of the iGenomes resource. See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for instructions on where to save such a file. - ## Job resources ### Automatic resubmission From 37d9180811dd91dcf52d8e2ffdeda072c853c74f Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 16 Apr 2020 21:05:58 +0200 Subject: [PATCH 159/374] finish --- docs/usage.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index fd35363..6eb984c 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -260,7 +260,7 @@ Precursor mass tolerance unit used for database search. Possible values are "ppm ### `--enzyme` -Specify which enzymatic restriction should be applied, e.g. 'unspecific cleavage', 'Trypsin' (default), see OpenMS +Specify which enzymatic restriction should be applied, e.g. 'unspecific cleavage', 'Trypsin' (default), see OpenMS [enzymes](https://github.com/OpenMS/OpenMS/blob/develop/share/OpenMS/CHEMISTRY/Enzymes.xml). Note: MSGF does not support extended cutting rules, as used by default with "Trypsin".
I.e. if you specify "Trypsin" with MSGF, it will be automatically converted to "Trypsin/P" = "Trypsin without proline rule". @@ -437,7 +437,7 @@ Protein level FDR cutoff (Note: this affects and chooses the peptides used for q ### `--transfer_ids` -Transfer IDs over aligned samples to increase # of quantifiable features (WARNING: increased memory consumption). (default: "false") +Transfer IDs over aligned samples to increase # of quantifiable features (WARNING: increased memory consumption). (default: "false") ### `--targeted_only` @@ -456,6 +456,7 @@ peptide will be reported. (default: off = 1.0) ### `--protein_quantification` Quantify proteins based on: + * "unique_peptides" = use peptides mapping to single proteins or a group of indistinguishable proteins (according to the set of experimentally identified peptides) * "strictly_unique_peptides" = use peptides mapping to a unique single protein only * "shared_peptides" = use shared peptides, too, but only greedily for its best group (by inference score) From 4b7a1f4f57d151773bc4a4a3dc83ef2b1fb56b3e Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Apr 2020 01:23:10 +0200 Subject: [PATCH 160/374] more --- docs/usage.md | 16 ++++++++++ main.nf | 2 ++ nextflow.config | 77 ++++++++++++++++++++++--------------------------- 3 files changed, 53 insertions(+), 42 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index 6eb984c..a0ac7c7 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -248,6 +248,22 @@ If `-profile` is not specified at all the pipeline will be run locally and expec The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). + +## Decoy database generation + +### `--add_decoys` + +If decoys were not yet included in the input database, they can be appended by OpenMS DecoyGenerator (TODO allow specifying type). +Default: pseudo-reverse peptides + +### `--decoy_affix` + +Specify the string that was or will be added to the protein accession to label it + +### `--affix_type` + +Is the decoy label a prefix or suffix. Prefix is highly recommended as some tools (e.g. 
Percolator might not work well with suffixes) + ## Database search ### `--precursor_mass_tolerance` diff --git a/main.nf b/main.nf index dc18018..6702a19 100644 --- a/main.nf +++ b/main.nf @@ -463,6 +463,7 @@ process search_engine_msgf { -fixed_modifications ${fixed.tokenize(',').collect { "'${it}'" }.join(" ") } \\ -variable_modifications ${variable.tokenize(',').collect { "'${it}'" }.join(" ") } \\ -max_mods ${params.max_mods} \\ + -db_debug ${params.db_debug} \\ > ${mzml_file.baseName}_msgf.log """ } @@ -509,6 +510,7 @@ process search_engine_comet { -precursor_mass_tolerance ${prec_tol} \\ -precursor_error_units ${prec_tol_unit} \\ -fragment_bin_tolerance ${frag_tol} \\ + -db_debug ${params.db_debug} \\ > ${mzml_file.baseName}_comet.log """ } diff --git a/nextflow.config b/nextflow.config index fa6cc87..ac8f800 100644 --- a/nextflow.config +++ b/nextflow.config @@ -9,77 +9,70 @@ params { // Workflow flags - sdrf = "" - root_folder = "" - spectra = "" - database = "" - expdesign = "" + sdrf = '' + root_folder = '' + spectra = '' + database = '' + expdesign = '' // Tools flags - posterior_probabilities = "percolator" + posterior_probabilities = 'percolator' add_decoys = false - search_engine = "comet" - protein_inference = "aggregation" + search_engine = 'comet' + protein_inference = 'aggregation' psm_pep_fdr_cutoff = 0.10 protein_level_fdr_cutoff = 0.05 - num_hits = 1 // decoys - decoy_affix = "DECOY_" - affix_type = "prefix" + decoy_affix = 'DECOY_' + affix_type = 'prefix' // shared search engine parameters enzyme = 'Trypsin' num_enzyme_termini = 'fully' + allowed_missed_cleavages = 1 precursor_mass_tolerance = 5 - precursor_mass_tolerance_unit = "ppm" + precursor_mass_tolerance_unit = 'ppm' fixed_mods = 'Carbamidomethyl (C)' variable_mods = 'Oxidation (M)' fragment_mass_tolerance = 5 - fragment_mass_tolerance_unit = "ppm" - dissociation_method = "HCD" //currently unused. hard to find a good logic to beat the defaults - - // Percolator flags - train_FDR = 0.05 - test_FDR = 0.05 - FDR_level = 'peptide-level-fdrs' - klammer = false - description_correct_features = 0 - subset_max_train = 0 - post_processing_tdc = false - - // MSGF+ flags - isotope_error_range = "0,1" - fragment_method = "from_spectrum" - instrument = "high_res" - protocol = "automatic" - tryptic = "non" + fragment_mass_tolerance_unit = 'ppm' + dissociation_method = 'HCD' //currently unused. hard to find a good logic to beat the defaults + isotope_error_range = '0,1' + instrument = 'high_res' + protocol = 'automatic' min_precursor_charge = 2 max_precursor_charge = 3 min_peptide_length = 6 max_peptide_length = 40 - matches_per_spec = 1 + num_hits = 1 max_mods = 3 + db_debug = 0 - // Comet flags - allowed_missed_cleavages = 1 - // TODO + // Percolator flags + train_FDR = 0.05 + test_FDR = 0.05 + FDR_level = 'peptide-level-fdrs' + klammer = 'false' + description_correct_features = 0 + subset_max_train = 300000 + post_processing_tdc = 'false' // ProteomicsLFQ flags inf_quant_debug = 0 - protein_inference = "aggregation" - // TODO convert to real flags? 
- targeted_only = "true" - mass_recalibration = "false" - transfer_ids = "false" + protein_inference = 'aggregation' + targeted_only = 'true' + mass_recalibration = 'false' + transfer_ids = 'false' // MSstats skip_post_msstats = false - ref_condition = "" - contrasts = "" + ref_condition = '' + contrasts = '' // PTXQC - ptxqc_report_layout = "" + enable_qc = false + ptxqc_report_layout = '' outdir = './results' From 295615a66a65447540b8e48fa59c25bfc3b958c3 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Apr 2020 01:24:40 +0200 Subject: [PATCH 161/374] more --- docs/usage.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index a0ac7c7..e0066f7 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -248,12 +248,12 @@ If `-profile` is not specified at all the pipeline will be run locally and expec The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). - ## Decoy database generation ### `--add_decoys` -If decoys were not yet included in the input database, they can be appended by OpenMS DecoyGenerator (TODO allow specifying type). +If decoys were not yet included in the input database, they have to be appended by OpenMS DecoyGenerator by adding this flag +(TODO allow specifying type). Default: pseudo-reverse peptides ### `--decoy_affix` @@ -262,7 +262,7 @@ Specify the string that was or will be added to the protein accession to label i ### `--affix_type` -Is the decoy label a prefix or suffix. Prefix is highly recommended as some tools (e.g. Percolator might not work well with suffixes) +Is the decoy label a prefix or suffix. Prefix is highly recommended as some tools (e.g. 
Percolator) might not work well with suffixes ## Database search From 54ed8b2b0edaf10b1811ccfc6600bd47655bbffb Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Apr 2020 01:29:28 +0200 Subject: [PATCH 162/374] little error --- main.nf | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/main.nf b/main.nf index 6702a19..4bcd68d 100644 --- a/main.nf +++ b/main.nf @@ -463,7 +463,7 @@ process search_engine_msgf { -fixed_modifications ${fixed.tokenize(',').collect { "'${it}'" }.join(" ") } \\ -variable_modifications ${variable.tokenize(',').collect { "'${it}'" }.join(" ") } \\ -max_mods ${params.max_mods} \\ - -db_debug ${params.db_debug} \\ + -debug ${params.db_debug} \\ > ${mzml_file.baseName}_msgf.log """ } @@ -510,7 +510,7 @@ process search_engine_comet { -precursor_mass_tolerance ${prec_tol} \\ -precursor_error_units ${prec_tol_unit} \\ -fragment_bin_tolerance ${frag_tol} \\ - -db_debug ${params.db_debug} \\ + -debug ${params.db_debug} \\ > ${mzml_file.baseName}_comet.log """ } From ab42041bead0527be35fa97293aaa154afad9440 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Apr 2020 01:30:55 +0200 Subject: [PATCH 163/374] mention comprehensive tests --- conf/test_full.config | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/conf/test_full.config b/conf/test_full.config index d411ded..97f7785 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -1,9 +1,9 @@ /* * ------------------------------------------------- - * Nextflow config file for running tests + * Nextflow config file for running comprehensive tests * ------------------------------------------------- * Defines bundled input files and everything required - * to run a fast and simple test. Use as follows: + * to run a comprehensive test with data download from PRIDE. Use as follows: * nextflow run nf-core/proteomicslfq -profile test_full, */ From 4797d3bdc987d28f8d378d739644ed5c384776a4 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Apr 2020 01:33:09 +0200 Subject: [PATCH 164/374] mention comprehensive tests --- conf/test_full.config | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/conf/test_full.config b/conf/test_full.config index 97f7785..4d539a1 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -3,8 +3,10 @@ * Nextflow config file for running comprehensive tests * ------------------------------------------------- * Defines bundled input files and everything required - * to run a comprehensive test with data download from PRIDE. Use as follows: + * to run a comprehensive test with data downloaded from PRIDE. Use as follows: * nextflow run nf-core/proteomicslfq -profile test_full, + * + * For a short test of functionality, see the 'test' profile/config. 
*/ params { From e84693938503842961a670bbc556c6db6e89bfc8 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Apr 2020 13:59:20 +0200 Subject: [PATCH 165/374] make QC off by default --- conf/test.config | 3 ++- main.nf | 7 +++++-- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/conf/test.config b/conf/test.config index d0f9b3c..4c47356 100644 --- a/conf/test.config +++ b/conf/test.config @@ -31,5 +31,6 @@ params { search_engine = "msgf" protein_level_fdr_cutoff = 1.0 decoy_affix = "rev" - post_processing_tdc = false + post_processing_tdc = "false" + enable_qc = true } diff --git a/main.nf b/main.nf index 4bcd68d..c2df837 100644 --- a/main.nf +++ b/main.nf @@ -370,8 +370,8 @@ process generate_decoy_database { """ DecoyDatabase -in ${mydatabase} \\ -out ${mydatabase.baseName}_decoy.fasta \\ - -decoy_string DECOY_ \\ - -decoy_string_position prefix \\ + -decoy_string ${params.decoy_affix} \\ + -decoy_string_position ${params.affix_type} \\ > ${mydatabase.baseName}_decoy_database.log """ } @@ -902,6 +902,9 @@ process ptxqc { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' publishDir "${params.outdir}/ptxqc", mode: 'copy' + when: + params.enable_qc + input: file mzTab from out_mzTab From eb6642a2fcde0f13a8c6a3d01c72210bdd9ba097 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Apr 2020 15:44:58 +0200 Subject: [PATCH 166/374] additions to localization --- main.nf | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/main.nf b/main.nf index a215380..10620af 100644 --- a/main.nf +++ b/main.nf @@ -42,7 +42,7 @@ def helpMessage() { --num_hits Number of peptide hits per spectrum (PSMs) in output file (default: '1') --fixed_mods Fixed modifications ('Carbamidomethyl (C)', see OpenMS modifications) --variable_mods Variable modifications ('Oxidation (M)', see OpenMS modifications) - --phospho_rescoring Phosphosite rescoring using the luciphor algorithm + --mod_localization Specify the var. 
modifications whose localizations should be rescored with the luciphor algorithm --precursor_mass_tolerance Mass tolerance of precursor mass (ppm) --allowed_missed_cleavages Allowed missed cleavages --psm_level_fdr_cutoff Identification PSM-level FDR cutoff @@ -786,22 +786,33 @@ process luciphor { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: + tuple file(database), mzml_id, file(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, diss_meth, enzyme from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet).combine(mzmls_comet.join(ch_sdrf_config.luciphor_settings)) file id_file from id_files_idx_feat_perc_fdr_filter_switched.mix(id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch) file mzml_file_l from mzml_files_luciphor output: - file "${id_file.baseName}_switched.idXML" into id_files_idx_feat_fdr_filter_switched_luciphor + file "${id_file.baseName}_luciphor.idXML" into id_files_idx_feat_fdr_filter_switched_luciphor file "*.log" when: - params.phospho_rescoring + params.enable_mod_localization script: """ LuciphorAdapter -id ${id_file} \\ -in ${mzml_file_l} \\ - -out ${id_file.baseName}_switched.idXML \\ + -out ${id_file.baseName}_luciphor.idXML \\ -threads ${task.cpus} \\ + -num_threads ${task.cpus} \\ + -target_modifications ${params.mod_localization.tokenize(',').collect { "'${it}'" }.join(" ") } \\ + -fragment_method ${params.fragment_method} \\ + -fragment_mass_tolerance ${} \\ + -fragment_error_units ${} \\ + -neutral_losses "${params.luciphor_neutral_losses}" \\ + -decoy_mass "${params.luciphor_decoy_mass}" \\ + -decoy_neutral_losses "${params.luciphor_decoy_neutral_losses}" \\ + -max_charge_state ${params.max_precursor_charge} \\ + -max_peptide_length ${params.max_peptide_length} \\ > ${id_file.baseName}_scoreswitcher.log """ } From 09a34129d185c9181f1ccf4e49cd2d7e45869975 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Apr 2020 18:28:26 +0200 Subject: [PATCH 167/374] first draft of optional mod localization --- main.nf | 30 +++++++++++++++--------------- nextflow.config | 4 +++- 2 files changed, 18 insertions(+), 16 deletions(-) diff --git a/main.nf b/main.nf index 4120205..9c6ac29 100644 --- a/main.nf +++ b/main.nf @@ -43,17 +43,13 @@ def helpMessage() { --num_hits Number of peptide hits per spectrum (PSMs) in output file (default: '1') --fixed_mods Fixed modifications ('Carbamidomethyl (C)', see OpenMS modifications) --variable_mods Variable modifications ('Oxidation (M)', see OpenMS modifications) -<<<<<<< HEAD + --enable_mod_localization Enable localization scoring with Luciphor --mod_localization Specify the var. 
modifications whose localizations should be rescored with the luciphor algorithm - --precursor_mass_tolerance Mass tolerance of precursor mass (ppm) - --allowed_missed_cleavages Allowed missed cleavages -======= --precursor_mass_tolerance Mass tolerance of precursor mass --precursor_mass_tolerance_unit Da or ppm --fragment_mass_tolerance Mass tolerance for fragment masses (currently only controls Comets fragment_bin_tol) --fragment_mass_tolerance_unit Da or ppm (currently always ppm) --allowed_missed_cleavages Allowed missed cleavages ->>>>>>> dev --psm_level_fdr_cutoff Identification PSM-level FDR cutoff --min_precursor_charge Minimum precursor ion charge --max_precursor_charge Maximum precursor ion charge @@ -202,6 +198,9 @@ if (!params.sdrf) params.enzyme) idx_settings: tuple(id, params.enzyme) + luciphor_settings: + tuple(id, + params.fragment_method) mzmls: tuple(id,it)} .set{ch_sdrf_config} } @@ -249,6 +248,9 @@ else row[10]) idx_settings: tuple(id, row[10]) + luciphor_settings: + tuple(id, + row[9]) mzmls: tuple(id, params.root_folder.length() == 0 ? row[0] : (params.root_folder + "/" + row[1]))} .set{ch_sdrf_config} } @@ -830,7 +832,7 @@ process luciphor { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - tuple mzml_id, file(mzml_file), file(id_file), diss_meth from mzmls_luciphor.join(id_files_idx_feat_perc_fdr_filter_switched_luciphor.mix(id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch_luciphor)).join(ch_sdrf_config.luciphor_settings) + tuple mzml_id, file(mzml_file), file(id_file), frag_meth from mzmls_luciphor.join(id_files_idx_feat_perc_fdr_filter_switched_luciphor.mix(id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch_luciphor)).join(ch_sdrf_config.luciphor_settings) output: set mzml_id, file("${id_file.baseName}_luciphor.idXML") into id_files_luciphor @@ -849,9 +851,7 @@ process luciphor { -threads ${task.cpus} \\ -num_threads ${task.cpus} \\ -target_modifications ${params.mod_localization.tokenize(',').collect { "'${it}'" }.join(" ") } \\ - -fragment_method ${params.fragment_method} \\ - -fragment_mass_tolerance ${} \\ - -fragment_error_units ${} \\ + -fragment_method ${frag_method} \\ -neutral_losses "${params.luciphor_neutral_losses}" \\ -decoy_mass "${params.luciphor_decoy_mass}" \\ -decoy_neutral_losses "${params.luciphor_decoy_neutral_losses}" \\ @@ -859,6 +859,8 @@ process luciphor { -max_peptide_length ${params.max_peptide_length} \\ > ${id_file.baseName}_scoreswitcher.log """ + // -fragment_mass_tolerance ${} \\ + // -fragment_error_units ${} \\ } // --------------------------------------------------------------------- @@ -868,16 +870,14 @@ process luciphor { // ID files can come directly from the Percolator branch, IDPEP branch or // after optional processing with Luciphor mzmls_plfq - .join(id_files_idx_feat_fdr_filter_switched_luciphor + .join(id_files_luciphor .mix(id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch_plfq) .mix(id_files_idx_feat_perc_fdr_filter_switched_plfq)) - .multiMap - { - it -> + .multiMap{ it -> mzmls: it[1] ids: it[2] } - .set(ch_plfq) + .set{ch_plfq} process proteomicslfq { @@ -887,7 +887,7 @@ process proteomicslfq { ///.toSortedList({ a, b -> b.baseName <=> a.baseName }) input: file(mzmls) from ch_plfq.mzmls.collect().view() - file(mzmls) from ch_plfq.ids.collect().view() + file(id_files) from ch_plfq.ids.collect().view() file expdes from ch_expdesign file fasta from plfq_in_db.mix(plfq_in_db_decoy) diff --git a/nextflow.config b/nextflow.config index fa6cc87..eb7c456 100644 --- 
a/nextflow.config +++ b/nextflow.config @@ -37,7 +37,9 @@ params { variable_mods = 'Oxidation (M)' fragment_mass_tolerance = 5 fragment_mass_tolerance_unit = "ppm" - dissociation_method = "HCD" //currently unused. hard to find a good logic to beat the defaults + fragment_method = "HCD" //currently only used in luciphor. hard to find a good logic to beat the defaults for the engines + enable_mod_localization = false + mod_localization = "Phospho (S),Phospho (T),Phospho (Y)" // Percolator flags train_FDR = 0.05 From e2309aa893e244228c771d407fbb104cc4886014 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Mon, 20 Apr 2020 13:03:31 +0200 Subject: [PATCH 168/374] pptdc true --- conf/test.config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/conf/test.config b/conf/test.config index 4c47356..50c0e86 100644 --- a/conf/test.config +++ b/conf/test.config @@ -31,6 +31,6 @@ params { search_engine = "msgf" protein_level_fdr_cutoff = 1.0 decoy_affix = "rev" - post_processing_tdc = "false" + post_processing_tdc = "true" enable_qc = true } From b3540254d4075ee63996c6ddf54339906d9d6879 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 20 Apr 2020 14:51:04 +0200 Subject: [PATCH 169/374] remove pptdc and set always. Add note on restrictions with percolator. --- conf/test.config | 1 - conf/test_full.config | 1 - docs/usage.md | 17 ++++++++++++----- main.nf | 7 ++----- nextflow.config | 1 - 5 files changed, 14 insertions(+), 13 deletions(-) diff --git a/conf/test.config b/conf/test.config index 50c0e86..2f391c1 100644 --- a/conf/test.config +++ b/conf/test.config @@ -31,6 +31,5 @@ params { search_engine = "msgf" protein_level_fdr_cutoff = 1.0 decoy_affix = "rev" - post_processing_tdc = "true" enable_qc = true } diff --git a/conf/test_full.config b/conf/test_full.config index 4d539a1..9e157ce 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -30,5 +30,4 @@ params { decoy_affix = "rev" protein_inference = "bayesian" targeted_only = "false" - post_processing_tdc = false } diff --git a/docs/usage.md b/docs/usage.md index e0066f7..375def5 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -55,7 +55,6 @@ * [`--train_FDR`](#--train_FDR) * [`--test_FDR`](#--test_FDR) * [`--percolator_fdr_level`](#--percolator_fdr_level) - * [`--post-processing-tdc`](#--post-processing-tdc) * [`--description_correct_features`](#--description_correct_features) * [`--generic-feature-set`](#--feature) * [`--subset-max-train`](#--subset-max-train) @@ -371,6 +370,15 @@ before going into feature finding, map alignment and inference. ### Percolator specific +In the following you can find help for the Percolator specific options that are only used if [`--posterior_probabilities`](#--posterior_probabilities) was set to "percolator". +Note that there are currently some restrictions to the original options of Percolator: + +* no Percolator protein FDR possible (currently OpenMS' FDR is used on protein level) +* no support for separate target and decoy databases (i.e. no min-max q-value calculation or target-decoy competition strategy) +* no support for combined or experiment-wide peptide re-scoring. Currently search results per input file are submitted to Percolator independently. + +With time, some of the limitations might be removed. Pull requests are always welcome. + ### `--train_FDR` False discovery rate threshold to define positive examples in training. 
Set to testFDR if 0 @@ -383,10 +391,6 @@ False discovery rate threshold for evaluating best cross validation result and r Level of FDR calculation ('peptide-level-fdrs' or 'psm-level-fdrs') -### `--post-processing-tdc` - -Use target-decoy competition to assign q-values and PEPs. - ### `--description_correct_features` Percolator provides the possibility to use so called description of correct features, i.e. features for which desirable values are learnt from the previously identified target PSMs. The absolute value of the difference between desired value and observed value is the used as predictive features. @@ -416,6 +420,9 @@ Retention time features are calculated as in Klammer et al. instead of with Elud ### Distribution-fitting (IDPEP) specific +Use this instead of Percolator if there are problems with Percolator (e.g. due to bad separation) or for performance +reasons. + ### `--outlier_handling` How to handle outliers during fitting: diff --git a/main.nf b/main.nf index c2df837..0736cc6 100644 --- a/main.nf +++ b/main.nf @@ -74,7 +74,6 @@ def helpMessage() { --train_FDR False discovery rate threshold to define positive examples in training. Set to testFDR if 0 --test_FDR False discovery rate threshold for evaluating best cross validation result and reported end result --percolator_fdr_level Level of FDR calculation ('peptide-level-fdrs' or 'psm-level-fdrs') - --post-processing-tdc Use target-decoy competition to assign q-values and PEPs. --description_correct_features Description of correct features for Percolator (0, 1, 2, 4, 8, see Percolator retention time and calibration) --generic-feature-set Use only generic (i.e. not search engine specific) features. Generating search engine specific features for common search engines by PSMFeatureExtractor will typically boost the identification rate significantly. @@ -595,17 +594,15 @@ process percolator { log.warn('Klammer will be implicitly off!') } - def pptdc = params.post_processing_tdc ? "" : "-post-processing-tdc" - + // currently post-processing-tdc is always set since we do not support separate TD databases """ PercolatorAdapter -in ${id_file} \\ -out ${id_file.baseName}_perc.idXML \\ -threads ${task.cpus} \\ - ${pptdc} \\ -subset-max-train ${params.subset_max_train} \\ -decoy-pattern ${params.decoy_affix} \\ + -post-processing-tdc \\ > ${id_file.baseName}_percolator.log - """ } diff --git a/nextflow.config b/nextflow.config index ac8f800..7272554 100644 --- a/nextflow.config +++ b/nextflow.config @@ -56,7 +56,6 @@ params { klammer = 'false' description_correct_features = 0 subset_max_train = 300000 - post_processing_tdc = 'false' // ProteomicsLFQ flags inf_quant_debug = 0 From 231db96e4e091e7bfed0356e5a6348e4b96a53c4 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Mon, 20 Apr 2020 15:13:14 +0200 Subject: [PATCH 170/374] Update nextflow.config --- nextflow.config | 3 +++ 1 file changed, 3 insertions(+) diff --git a/nextflow.config b/nextflow.config index e61cc62..3e57aaa 100644 --- a/nextflow.config +++ b/nextflow.config @@ -40,6 +40,9 @@ params { fragment_method = "HCD" //currently only used in luciphor. 
hard to find a good logic to beat the defaults for the engines enable_mod_localization = false mod_localization = "Phospho (S),Phospho (T),Phospho (Y)" + isotope_error_range = '0,1' + instrument = 'high_res' + protocol = 'automatic' min_precursor_charge = 2 max_precursor_charge = 3 min_peptide_length = 6 From debde3e61f5fe3e6bd49a9b540876ebc199dce05 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Mon, 20 Apr 2020 15:13:48 +0200 Subject: [PATCH 171/374] Update nextflow.config --- nextflow.config | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/nextflow.config b/nextflow.config index 3e57aaa..6a15bac 100644 --- a/nextflow.config +++ b/nextflow.config @@ -36,10 +36,10 @@ params { fixed_mods = 'Carbamidomethyl (C)' variable_mods = 'Oxidation (M)' fragment_mass_tolerance = 5 - fragment_mass_tolerance_unit = "ppm" - fragment_method = "HCD" //currently only used in luciphor. hard to find a good logic to beat the defaults for the engines + fragment_mass_tolerance_unit = 'ppm' + fragment_method = 'HCD' //currently only used in luciphor. hard to find a good logic to beat the defaults for the engines enable_mod_localization = false - mod_localization = "Phospho (S),Phospho (T),Phospho (Y)" + mod_localization = 'Phospho (S),Phospho (T),Phospho (Y)' isotope_error_range = '0,1' instrument = 'high_res' protocol = 'automatic' From 0360e67c5e52b1696963a79ffe2da1f14f00be1f Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 21 Apr 2020 20:15:04 +0200 Subject: [PATCH 172/374] some fixes for phospho. adds a dev profile and a dockerfile for nightly containers --- conf/dev.config | 15 +++++++++++++++ dev/Dockerfile | 22 ++++++++++++++++++++++ dev/README.md | 5 +++++ dev/environment-dev.yml | 21 +++++++++++++++++++++ main.nf | 11 +++++++---- nextflow.config | 11 ++++++++++- 6 files changed, 80 insertions(+), 5 deletions(-) create mode 100644 conf/dev.config create mode 100644 dev/Dockerfile create mode 100644 dev/README.md create mode 100644 dev/environment-dev.yml diff --git a/conf/dev.config b/conf/dev.config new file mode 100644 index 0000000..e508461 --- /dev/null +++ b/conf/dev.config @@ -0,0 +1,15 @@ +/* + * ------------------------------------------------- + * Nextflow config file for running with nightly dev containers + * ------------------------------------------------- + * Only overwrites the container. See dev/ folder for building instructions. + * Use as follows: + * nextflow run nf-core/proteomicslfq -profile dev, + */ + +params { + config_profile_name = 'Development profile' + config_profile_description = 'To use nightly development containers' +} + +process.container = "proteomicslfq-dev" diff --git a/dev/Dockerfile b/dev/Dockerfile new file mode 100644 index 0000000..0f4a260 --- /dev/null +++ b/dev/Dockerfile @@ -0,0 +1,22 @@ +from openms/executables:latest + +ENV DEBIAN_FRONTEND=noninteractive +ENV PATH /opt/conda/bin:$PATH + +RUN apt-get install -y wget +RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh && \ + /bin/bash ~/miniconda.sh -b -p /opt/conda && \ + rm ~/miniconda.sh && \ + ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \ + echo ". 
/opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \ + echo "conda activate base" >> ~/.bashrc + +RUN bash -c "conda init" + +# Install the conda environment +COPY environment-dev.yml / +RUN bash -c "conda env create -f /environment-dev.yml && conda clean -a" + +# Add conda installation dir to PATH (instead of doing 'conda activate') +ENV PATH /opt/conda/envs/nf-core-proteomicslfq-1.0dev/bin:$PATH +CMD [/bin/bash] diff --git a/dev/README.md b/dev/README.md new file mode 100644 index 0000000..1bb3354 --- /dev/null +++ b/dev/README.md @@ -0,0 +1,5 @@ +Use this Dockerfile to create a container with nightly openms versions. +``` +docker build -t proteomicslfq-dev . +``` +Use the dev profile to use this container. diff --git a/dev/environment-dev.yml b/dev/environment-dev.yml new file mode 100644 index 0000000..a855628 --- /dev/null +++ b/dev/environment-dev.yml @@ -0,0 +1,21 @@ +# You can use this file to create a conda environment for this pipeline: +# conda env create -f environment.yml +name: nf-core-proteomicslfq-1.0dev +channels: + - conda-forge + - bioconda + - defaults +dependencies: + # bioconda + - bioconda::bioconductor-msstats # will include R + - bioconda::thermorawfileparser + - conda-forge::r-ptxqc=1.0.2 # for QC reports + - conda-forge::fonts-conda-ecosystem=1 # for the fonts in QC reports + - conda-forge::openjdk=8.0.192 # pin java to 8 for MSGF (otherwise it somehow chooses 11) + - conda-forge::click=7.1.1 # for parse_sdrf.py + - conda-forge::pandas=1.0.3 # for parse_sdrf.py + - conda-forge::python=3.8.1 + - conda-forge::markdown=3.2.1 + - conda-forge::pymdown-extensions=6.0 + - conda-forge::pygments=2.5.2 + diff --git a/main.nf b/main.nf index 52ade88..866cd77 100644 --- a/main.nf +++ b/main.nf @@ -831,7 +831,7 @@ process luciphor { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - tuple mzml_id, file(mzml_file), file(id_file), frag_meth from mzmls_luciphor.join(id_files_idx_feat_perc_fdr_filter_switched_luciphor.mix(id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch_luciphor)).join(ch_sdrf_config.luciphor_settings) + tuple mzml_id, file(mzml_file), file(id_file), frag_method from mzmls_luciphor.join(id_files_idx_feat_perc_fdr_filter_switched_luciphor.mix(id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch_luciphor)).join(ch_sdrf_config.luciphor_settings) output: set mzml_id, file("${id_file.baseName}_luciphor.idXML") into id_files_luciphor @@ -843,6 +843,9 @@ process luciphor { script: id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch_plfq = Channel.empty() id_files_idx_feat_perc_fdr_filter_switched_plfq = Channel.empty() + def losses = params.luciphor_neutral_losses ? '-neutral_losses "${params.luciphor_neutral_losses}"' : '' + def dec_mass = params.luciphor_decoy_mass ? '-decoy_mass "${params.luciphor_decoy_mass}"' : '' + def dec_losses = params.luciphor_decoy_neutral_losses ? 
'-decoy_neutral_losses "${params.luciphor_decoy_neutral_losses}' : '' """ LuciphorAdapter -id ${id_file} \\ -in ${mzml_file} \\ @@ -851,9 +854,9 @@ process luciphor { -num_threads ${task.cpus} \\ -target_modifications ${params.mod_localization.tokenize(',').collect { "'${it}'" }.join(" ") } \\ -fragment_method ${frag_method} \\ - -neutral_losses "${params.luciphor_neutral_losses}" \\ - -decoy_mass "${params.luciphor_decoy_mass}" \\ - -decoy_neutral_losses "${params.luciphor_decoy_neutral_losses}" \\ + ${losses} \\ + ${dec_mass} \\ + ${dec_losses} \\ -max_charge_state ${params.max_precursor_charge} \\ -max_peptide_length ${params.max_peptide_length} \\ > ${id_file.baseName}_scoreswitcher.log diff --git a/nextflow.config b/nextflow.config index 6a15bac..46f4888 100644 --- a/nextflow.config +++ b/nextflow.config @@ -5,6 +5,9 @@ * Default config options for all environments. */ +// TODO remove debug +process.cache = 'lenient' + // Global default params, used in configs params { @@ -59,6 +62,11 @@ params { description_correct_features = 0 subset_max_train = 300000 + // Luciphor options + luciphor_neutral_losses = '' + luciphor_decoy_mass = '' + luciphor_decoy_neutral_losses = '' + // ProteomicsLFQ flags inf_quant_debug = 0 protein_inference = 'aggregation' @@ -103,7 +111,7 @@ params { // Container slug. Stable releases should specify release tag! // Developmental code should specify :dev -process.container = 'nfcore/proteomicslfq:dev' +process.container = 'proteomicslfq-dev:latest' // Load base.config by default for all pipelines includeConfig 'conf/base.config' @@ -135,6 +143,7 @@ profiles { } test { includeConfig 'conf/test.config' } test_full { includeConfig 'conf/test_full.config' } + dev { includeConfig 'conf/dev.config' } } // Avoid this error: From 6b01a3d37491f9bb223716cf24bf3d26bcbb0400 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 21 Apr 2020 20:21:40 +0200 Subject: [PATCH 173/374] lint --- dev/README.md | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/dev/README.md b/dev/README.md index 1bb3354..6202e19 100644 --- a/dev/README.md +++ b/dev/README.md @@ -1,5 +1,11 @@ -Use this Dockerfile to create a container with nightly openms versions. -``` +### Build a development container + +Use this Dockerfile to create a container with nightly OpenMS versions. +Depends on the DockerHub builds of [openms/executables](https://hub.docker.com/r/openms/executables). +Build with: + +```bash docker build -t proteomicslfq-dev . ``` -Use the dev profile to use this container. + +Then use the dev profile to use this container. From e4028bc8b6665c8987e1ef636b064c1c7eb97340 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 21 Apr 2020 20:24:08 +0200 Subject: [PATCH 174/374] lint --- dev/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dev/README.md b/dev/README.md index 6202e19..fc8da66 100644 --- a/dev/README.md +++ b/dev/README.md @@ -1,4 +1,4 @@ -### Build a development container +# Build a development container Use this Dockerfile to create a container with nightly OpenMS versions. Depends on the DockerHub builds of [openms/executables](https://hub.docker.com/r/openms/executables). 
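To make the intended workflow of the last three patches concrete, here is a minimal sketch of building and using the nightly container, assuming a local clone of the repository and the standard nf-core `docker` profile from the template; the profile order follows the usage comment in `conf/dev.config`, and the `test` profile is just one possible input source:

```bash
# Build the nightly image from the dev/ folder; the tag must match the
# process.container value hard-coded in conf/dev.config ("proteomicslfq-dev").
cd dev/
docker build -t proteomicslfq-dev .
cd ..

# The dev profile only swaps the container, so it is combined with an
# executor profile (docker) and, here, the bundled small test dataset.
nextflow run main.nf -profile test,dev,docker
```

Since `conf/dev.config` touches nothing but the profile description and `process.container`, all resource and tool settings still come from `conf/base.config` and the other profiles.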
From 58ec75f05339971bc05940de7f59ae9f0d008bd3 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 21 Apr 2020 20:27:10 +0200 Subject: [PATCH 175/374] lint --- dev/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dev/README.md b/dev/README.md index fc8da66..ba349c6 100644 --- a/dev/README.md +++ b/dev/README.md @@ -5,7 +5,7 @@ Depends on the DockerHub builds of [openms/executables](https://hub.docker.com/r Build with: ```bash -docker build -t proteomicslfq-dev . +docker build -t proteomicslfq-dev . ``` Then use the dev profile to use this container. From d2d00515681af4460156bdd939bdcb36428f1b63 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 21 Apr 2020 20:32:03 +0200 Subject: [PATCH 176/374] revert container in main config --- nextflow.config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nextflow.config b/nextflow.config index 46f4888..37a5a77 100644 --- a/nextflow.config +++ b/nextflow.config @@ -111,7 +111,7 @@ params { // Container slug. Stable releases should specify release tag! // Developmental code should specify :dev -process.container = 'proteomicslfq-dev:latest' +process.container = 'nfcore/proteomicslfq:dev' // Load base.config by default for all pipelines includeConfig 'conf/base.config' From b75d9faedcaac6d86043b0deab61ccbd532f274a Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 21 Apr 2020 20:54:47 +0200 Subject: [PATCH 177/374] add a tools folder --- tools/README.md | 4 + .../GroupComparisonPlots.R | 842 ++++++++++++++++++ .../MsstatsInteractive.Rmd | 88 ++ 3 files changed, 934 insertions(+) create mode 100644 tools/README.md create mode 100644 tools/interactive_msstats/GroupComparisonPlots.R create mode 100644 tools/interactive_msstats/MsstatsInteractive.Rmd diff --git a/tools/README.md b/tools/README.md new file mode 100644 index 0000000..d4308f2 --- /dev/null +++ b/tools/README.md @@ -0,0 +1,4 @@ +# Tools for downstream analysis or interactive visualization + +This folder is to collect scripts and apps for non-standard downstream processing protocols +and interactive visualizations. diff --git a/tools/interactive_msstats/GroupComparisonPlots.R b/tools/interactive_msstats/GroupComparisonPlots.R new file mode 100644 index 0000000..827dd54 --- /dev/null +++ b/tools/interactive_msstats/GroupComparisonPlots.R @@ -0,0 +1,842 @@ + +############################################# +## groupComparisonPlots +############################################# + +#' export +#' importFrom gplots heatmap.2 +#' importFrom stats hclust +#' importFrom ggrepel geom_text_repel +#' importFrom marray maPalette + +groupComparisonPlots <- function(data=data, + type=type, + sig=0.05, + FCcutoff=FALSE, + logBase.pvalue=10, + ylimUp=FALSE, + ylimDown=FALSE, + xlimUp=FALSE, + x.axis.size=10, + y.axis.size=10, + dot.size=3, + text.size=4, + text.angle=0, + legend.size=13, + ProteinName=TRUE, + colorkey=TRUE, + numProtein=100, + clustering="both", + width=10, + height=10, + which.Comparison="all", + which.Protein="all", + address="") { + + ## save process output in each step + allfiles <- list.files() + filenaming <- "msstats" + + finalfile <- "msstats.log" + processout <- NULL + + processout <- rbind(processout, + as.matrix(c(" ", " ", "MSstats - groupComparisonPlots function", " "), ncol=1)) + + ## make upper letter + type <- toupper(type) + + if (length(setdiff(type, c("HEATMAP", "VOLCANOPLOT", "COMPARISONPLOT"))) != 0) { + + processout <- rbind(processout, + c(paste0("Input for type=", type, + ". 
However,'type' should be one of Heatmap, VolcanoPlot, ComparisonPlot."))) + write.table(processout, file=finalfile, row.names=FALSE) + + stop(paste0("Input for type=", type, + ". However,'type' should be one of Heatmap, VolcanoPlot, ComparisonPlot.")) + } + + ## check logBase.pvalue is 2,10 or not + if (logBase.pvalue != 2 & logBase.pvalue != 10) { + processout <- rbind(processout, + "ERROR : (-) Logarithm transformation for adjusted p-values : log2 or log10 only - stop") + write.table(processout, file=finalfile, row.names=FALSE) + + stop("Only -log2 or -log10 for logarithm transformation for adjusted p-values are posssible.\n") + } + + if (which.Comparison != "all") { + ## check which.comparison is name of comparison + if (is.character(which.Comparison)) { + + temp.name <- which.Comparison + + ## message if name of comparison is wrong. + if (length(setdiff(temp.name, unique(data$Label))) > 0) { + + processout <- rbind(processout, + paste0("Please check labels of comparions. Result does not have this comparison. - ", + toString(temp.name))) + write.table(processout, file=finalfile, row.names=FALSE) + + stop(paste0("Please check labels of comparions. Result does not have this comparison. - ", + toString(temp.name))) + } + } + + ## check which.comparison is order number of comparison + if (is.numeric(which.Comparison)) { + + temp.name <- levels(data$Label)[which.Comparison] + + ## message if name of comparison is wrong. + if (length(levels(data$Label)) (-log2(FCcutoff))] <- 1 + } + if (colnames(data)[3] == "log10FC") { + data$adj.pvalue[data[, 3] < log10(FCcutoff) & data[, 3] > (-log10(FCcutoff))] <- 1 + } + } + + final <- NULL + + ## based on p-value + for (i in 1:nlevels(data$Label)) { + sub <- data[data$Label == levels(data$Label)[i], ] + + if (logBase.pvalue == 2) { + temp <- -log2(sub$adj.pvalue) * sign(sub[, 3]) + } + + if (logBase.pvalue == 10) { + temp <- -log10(sub$adj.pvalue) * sign(sub[, 3]) + } + + final <- data.frame(cbind(final, temp)) + } + + obj <- final + data$Protein <- factor(data$Protein) + rownames(obj) <- levels(data$Protein) + colnames(obj) <- levels(data$Label) + + ## remove if whole rows or columns are NA + obj <- obj[rowSums(!is.na(obj)) != 0, colSums(!is.na(obj)) != 0] + + ## clustering for order + tempobj <- obj + tempobj[is.na(tempobj)] <- 50 + + if (toupper(clustering) == 'PROTEIN') { + obj <- obj[hclust(dist(tempobj), method="ward.D")$order, ] + } + if (toupper(clustering) == 'COMPARISON') { + obj <- obj[, hclust(dist(t(tempobj)), method="ward.D")$order] + } + if (toupper(clustering) == 'BOTH') { + obj <- obj[hclust(dist(tempobj), method="ward.D")$order, + hclust(dist(t(tempobj)), method="ward.D")$order] + } + if (toupper(clustering) == 'NONE') { + obj <- obj + } + + rm(tempobj) + + ## change the order + ##obj$id <- seq(1:nrow(obj)) + ##obj <- obj[order(obj$id,decreasing=TRUE),] + ##obj <- subset(obj, select=-c(id)) + + ## color scale + blue.red.18 <- maPalette(low="blue", high="red", mid="black", k=12) + my.colors <- blue.red.18 + #my.colors[my.colors=="#FFFFFF"] <- "gold" + my.colors <- c(my.colors, "grey") ## for NA + + ## color scale is fixed with log 10 based. 
then change break for log 2 based + up <- 10 + temp <- 10^(-sort(ceiling(seq(2, up, length=10)[c(1, 2, 3, 5, 10)]), decreasing=TRUE)) + breaks <- c(temp, sig) + + if (logBase.pvalue == 10) { + + neg.breaks <- log(breaks, 10) + my.breaks <- c(neg.breaks, 0, -neg.breaks[6:1], 101) + + } else if (logBase.pvalue == 2) { + + neg.breaks <- log(breaks, 2) + my.breaks <- c(neg.breaks, 0, -neg.breaks[6:1], 101) + } + + ## draw color key + blocks <- c(-breaks, 1, breaks[6:1]) + x.at <- seq(-0.05, 1.05, length.out=13) + + ## maximum number of proteins per heatmap + namepro <- rownames(obj) + totalpro <- length(namepro) + numheatmap <- totalpro %/% numProtein + 1 + + ## If there are the file with the same name, add next numbering at the end of file name + if (address != FALSE) { + allfiles <- list.files() + + num <- 0 + filenaming <- paste0(address, "Heatmap") + finalfile <- paste0(address, "Heatmap.pdf") + + while (is.element(finalfile, allfiles)) { + num <- num + 1 + finalfile <- paste0(paste(filenaming, num, sep="-"), ".pdf") + } + + pdf(finalfile, width=width, height=height) + } + + if (colorkey) { + par(mar=c(3, 3, 3, 3), mfrow=c(3, 1), oma=c(3, 0, 3, 0)) + plot.new() + image(z = matrix(seq(1:(length(my.colors) - 1)), ncol = 1), + col=my.colors[-length(my.colors)], + xaxt = "n", + yaxt = "n") + mtext("Color Key", side=3,line=1, cex=3) + mtext("(sign) Adjusted p-value", side=1, line=3, at=0.5, cex=1.7) + mtext(blocks, side=1, line=1, at=x.at, cex=1) + } + + ## draw heatmap + ## loop for numProtein + for (j in 1:numheatmap) { + + ## get the number proteins needed + if (j != numheatmap) { + tempobj <- obj[((j - 1) * numProtein + 1):(j * numProtein), ] + } else { + tempobj <- obj[((j - 1) * numProtein + 1):nrow(obj), ] + } + + par(oma=c(3, 0, 0, 4)) + heatmap.2(as.matrix(tempobj), + col=my.colors, + Rowv=FALSE, + Colv=FALSE, + dendrogram="none", + breaks=my.breaks, + trace="none", + na.color="grey", ## missing value will be grey + cexCol=(x.axis.size / 10), cexRow=(y.axis.size / 10), # assign text.size as option + key=FALSE, + lhei=c(0.1, 0.9), + lwid=c(0.1, 0.9)) + } + ## end loop for heatmap + + if (address != FALSE) { + dev.off() + } + } + + ####################### + ## VolcanoPlot + ####################### + if (type == "VOLCANOPLOT") { + + ## If there are the file with the same name, add next numbering at the end of file name + if (address != FALSE) { + allfiles <- list.files() + + num <- 0 + filenaming <- paste0(address, "VolcanoPlot") + finalfile <- paste0(address, "VolcanoPlot.pdf") + + while (is.element(finalfile, allfiles)) { + num <- num + 1 + finalfile <- paste0(paste(filenaming, num, sep="-"), ".pdf") + } + + pdf(finalfile, width=width, height=height) + } + + if (logBase.pvalue == 2) { + y.limUp <- 30 + } + + if (logBase.pvalue == 10) { + y.limUp <- 10 + } + + if (is.numeric(ylimUp)) { + y.limUp <- ylimUp + } + + ## remove the result, NA + data <- data[!is.na(data$adj.pvalue), ] + + ## group for coloring dots + if (!FCcutoff) { + data[data$adj.pvalue >= sig, "colgroup"] <- "No-regulation" + data[data$adj.pvalue < sig & data[, 3] > 0, "colgroup"] <- "Up-regulated" + data[data$adj.pvalue < sig & data[, 3] < 0, "colgroup"] <- "Down-regulated" + } + + if (is.numeric(FCcutoff)) { + data$colgroup <- "No-regulation" + + if (colnames(data)[3] == "log2FC") { + data[data$adj.pvalue < sig & data[, 3] > log2(FCcutoff), "colgroup"] <- "Up-regulated" + data[data$adj.pvalue < sig & data[, 3] < (-log2(FCcutoff)), "colgroup"] <- "Down-regulated" + } + + if (colnames(data)[3] == "log10FC") { + 
data[data$adj.pvalue < sig & data[, 3] > log10(FCcutoff), "colgroup"] <- "Up-regulated" + data[data$adj.pvalue < sig & data[, 3] < (-log10(FCcutoff)), "colgroup"] <- "Down-regulated" + } + } + + data$colgroup <- factor(data$colgroup, levels=c("No-regulation", "Down-regulated", "Up-regulated")) + + ## for multiple volcano plots, + for (i in 1:nlevels(data$Label)) { + + sub <- data[data$Label == levels(data$Label)[i], ] + + if (logBase.pvalue == 2) { + sub$adj.pvalue[sub$adj.pvalue < 2^(-y.limUp)] <- 2^(-y.limUp) + } + + if (logBase.pvalue == 10) { + sub$adj.pvalue[sub$adj.pvalue < 10^(-y.limUp)] <- 10^(-y.limUp) + } + + sub <- as.data.frame(sub) + + ## ylimUp + if (logBase.pvalue == 2) { + y.limup <- ceiling(max(-log2(sub[!is.na(sub$adj.pvalue), "adj.pvalue"]))) + if (y.limup < (-log2(sig))) { + y.limup <- (-log2(sig) + 1) ## for too small y.lim + } + } + + if (logBase.pvalue == 10) { + y.limup <- ceiling(max(-log10(sub[!is.na(sub$adj.pvalue), "adj.pvalue"]))) + if (y.limup < (-log10(sig))) { + y.limup <- (-log10(sig) + 1) ## for too small y.lim + } + } + + ## ylimDown + y.limdown <- 0 ## default is zero + if (is.numeric(ylimDown)) { + y.limdown <- ylimDown + } + + ## x.lim + x.lim <- ceiling(max(abs(sub[!is.na(sub[, 3]) & abs(sub[, 3]) != Inf, 3]))) ## log2FC or log10FC + if (x.lim < 3) { + x.lim <- 3 + } + if (is.numeric(xlimUp)) { + x.lim <- xlimUp + } + + ## for assigning x in ggplot2 + subtemp <- sub + colnames(subtemp)[3] <- "logFC" + + if (logBase.pvalue == 2) { + subtemp$log2adjp <- (-log2(subtemp$adj.pvalue)) + } + + if (logBase.pvalue == 10) { + subtemp$log10adjp <- (-log10(subtemp$adj.pvalue)) + } + + ## for x limit for inf or -inf + subtemp$newlogFC <- subtemp$logFC + subtemp[!is.na(subtemp$issue) & + subtemp$issue == "oneConditionMissing" & + subtemp$logFC == Inf, "newlogFC"] <- (x.lim - 0.2) + subtemp[!is.na(subtemp$issue) & + subtemp$issue == "oneConditionMissing" & + subtemp$logFC == (-Inf), "newlogFC"] <- (x.lim - 0.2) * (-1) + + ## add (*) in Protein name for Inf or -Inf + subtemp$Protein <- as.character(subtemp$Protein) + subtemp[!is.na(subtemp$issue) & + subtemp$issue == "oneConditionMissing", "Protein"] <- paste0("*", + subtemp[!is.na(subtemp$issue) & + subtemp$issue == "oneConditionMissing", "Protein"]) + + ## Plotting + if (logBase.pvalue == 2) { + ptemp <- ggplot(aes_string(x='logFC', y='log2adjp', + color='colgroup', + label='Protein', + customdata = 'Protein'), + data=subtemp) + + geom_point(size=dot.size) + + scale_color_manual(values=c("gray65", "blue", "red"), + labels=c("No regulation", "Down-regulated", "Up-regulated")) + + scale_y_continuous('-Log2 (adjusted p-value)', + limits=c(y.limdown, y.limup)) + + labs(title=unique(levels(sub$Label))) + } + + if (logBase.pvalue == 10) { + ptemp <- ggplot(aes_string(x='logFC', y='log10adjp', + color='colgroup', + label='Protein', + customdata = 'Protein'), + data=subtemp) + + geom_point(size=dot.size) + + scale_color_manual(values=c("gray65", "blue", "red"), + labels=c("No regulation", "Down-regulated", "Up-regulated")) + + scale_y_continuous('-Log10 (adjusted p-value)', + limits=c(y.limdown, y.limup)) + + labs(title=unique(levels(sub$Label))) + } + + ## x-axis labeling + if (colnames(sub)[3] == "log2FC") { + ptemp <- ptemp + + scale_x_continuous('Log2 fold change', + limits=c(-x.lim, x.lim)) + } + if (colnames(sub)[3] == "log10FC") { + ptemp <- ptemp + + scale_x_continuous('Log10 fold change', + limits=c(-x.lim, x.lim)) + } + + ## add protein name + if (ProteinName) { + if (length(unique(subtemp$colgroup)) == 1 & 
any(unique(subtemp$colgroup) == 'No-regulation')) { + message(paste0("The volcano plot for ", unique(subtemp$Label), + " does not show the protein names because none of them is significant.")) + } else { + ptemp <- ptemp + + geom_text_repel(data=subtemp[subtemp$colgroup != "No-regulation", ], + aes(label=Protein), + size=text.size, + col='black') + } + } + + ## For legend of linetype for cutoffs + ## first assign line type + ltypes <- c("type1"="twodash", "type2"="dotted") + + ## cutoff lines, FDR only + if (!FCcutoff) { + if (logBase.pvalue == 2) { + sigcut <- data.frame("Protein"='sigline', + "logFC"=seq(-x.lim, x.lim, length.out=20), + "log2adjp"=(-log2(sig)), + "line"='twodash') + + pfinal <- ptemp + + geom_line(data=sigcut, + aes_string(x='logFC', y='log2adjp', linetype='line'), + colour="darkgrey", + size=0.6) + + scale_linetype_manual(values=c('twodash'=6), + labels=c(paste0("Adj p-value cutoff (", sig, ")"))) + + } + + if (logBase.pvalue == 10) { + pfinal <- ptemp + + geom_hline(aes(yintercept=-log10(sig), linetype='Adj p-value cutoff'), + colour="darkgrey", + size=0.6) + + scale_linetype_manual(values = c(6)) + + } + } + + ## cutoff lines, FDR and Fold change cutoff + if (is.numeric(FCcutoff)) { + if (colnames(sub)[3] == "log2FC") { + if (logBase.pvalue == 2) { + + ## three different lines + sigcut <- data.frame("Protein"='sigline', + "logFC"=seq(-x.lim, x.lim, length.out=10), + "log2adjp"=(-log2(sig)), + "line"='twodash') + FCcutpos <- data.frame("Protein"='sigline', + "logFC"=log2(FCcutoff), + "log2adjp"=seq(y.limdown, y.limup, length.out=10), + "line"='dotted') + FCcutneg <- data.frame("Protein"='sigline', + "logFC"=(-log2(FCcutoff)), + "log2adjp"=seq(y.limdown, y.limup, length.out=10), + "line"='dotted') + + ## three lines, with order color first and then assign linetype manual + pfinal <- ptemp + + geom_line(data=sigcut, + aes_string(x='logFC', y='log2adjp', linetype='line'), + colour="darkgrey", + size=0.6) + + geom_line(data=FCcutpos, + aes_string(x='logFC', y='log2adjp', linetype='line'), + colour="darkgrey", + size=0.6) + + geom_line(data=FCcutneg, + aes_string(x='logFC', y='log2adjp', linetype='line'), + colour="darkgrey", + size=0.6) + + scale_linetype_manual(values=c('dotted'=3, 'twodash'=6), + labels=c(paste0("Fold change cutoff (", FCcutoff, ")"), + paste0("Adj p-value cutoff (", sig, ")"))) + + guides(colour=guide_legend(override.aes=list(linetype=0)), + linetype=guide_legend()) + } + + if (logBase.pvalue == 10) { + + ## three different lines + sigcut <- data.frame("Protein"='sigline', + "logFC"=seq(-x.lim, x.lim, length.out=10), + "log10adjp"=(-log10(sig)), + "line"='twodash') + FCcutpos <- data.frame("Protein"='sigline', + "logFC"=log2(FCcutoff), + "log10adjp"=seq(y.limdown, y.limup, length.out=10), + "line"='dotted') + FCcutneg <- data.frame("Protein"='sigline', + "logFC"=(-log2(FCcutoff)), + "log10adjp"=seq(y.limdown, y.limup, length.out=10), + "line"='dotted') + + ## three lines, with order color first and then assign linetype manual + pfinal <- ptemp + + geom_line(data=sigcut, + aes_string(x='logFC', y='log10adjp', linetype='line'), + colour="darkgrey", + size=0.6) + + geom_line(data=FCcutpos, + aes_string(x='logFC', y='log10adjp', linetype='line'), + colour="darkgrey", + size=0.6) + + geom_line(data=FCcutneg, + aes_string(x='logFC', y='log10adjp', linetype='line'), + colour="darkgrey", + size=0.6) + + scale_linetype_manual(values=c('dotted'=3, 'twodash'=6), + labels=c(paste0("Fold change cutoff (", FCcutoff, ")"), + paste0("Adj p-value cutoff (", sig, ")"))) 
+ + guides(colour=guide_legend(override.aes=list(linetype=0)), + linetype=guide_legend()) + } + } + + if (colnames(sub)[3] == "log10FC") { + if (logBase.pvalue == 2) { + + ## three different lines + sigcut <- data.frame("Protein"='sigline', + "logFC"=seq(-x.lim, x.lim, length.out=10), + "log2adjp"=(-log2(sig)), + "line"='twodash') + FCcutpos <- data.frame("Protein"='sigline', + "logFC"=log10(FCcutoff), + "log2adjp"=seq(y.limdown, y.limup, length.out=10), + "line"='dotted') + FCcutneg <- data.frame("Protein"='sigline', + "logFC"=(-log10(FCcutoff)), + "log2adjp"=seq(y.limdown, y.limup, length.out=10), + "line"='dotted') + + ## three lines, with order color first and then assign linetype manual + pfinal <- ptemp + + geom_line(data=sigcut, + aes_string(x='logFC', y='log2adjp', linetype='line'), + colour="darkgrey", + size=0.6) + + geom_line(data=FCcutpos, + aes_string(x='logFC', y='log2adjp', linetype='line'), + colour="darkgrey", + size=0.6) + + geom_line(data=FCcutneg, + aes_string(x='logFC', y='log2adjp', linetype='line'), + colour="darkgrey", + size=0.6) + + scale_linetype_manual(values=c('dotted'=3, 'twodash'=6), + labels=c(paste0("Fold change cutoff (", FCcutoff, ")"), + paste0("Adj p-value cutoff (", sig, ")"))) + + guides(colour=guide_legend(override.aes=list(linetype=0)), + linetype=guide_legend()) + } + + if (logBase.pvalue == 10) { + + ## three different lines + sigcut <- data.frame("Protein"='sigline', + "logFC"=seq(-x.lim, x.lim, length.out=10), + "log10adjp"=(-log10(sig)), + "line"='twodash') + FCcutpos <- data.frame("Protein"='sigline', + "logFC"=log10(FCcutoff), + "log10adjp"=seq(y.limdown, y.limup, length.out=10), + "line"='dotted') + FCcutneg <- data.frame("Protein"='sigline', + "logFC"=(-log10(FCcutoff)), + "log10adjp"=seq(y.limdown, y.limup, length.out=10), + "line"='dotted') + + ## three lines, with order color first and then assign linetype manual + pfinal <- ptemp + + geom_line(data=sigcut, + aes_string(x='logFC', y='log10adjp', linetype='line'), + colour="darkgrey", + size=0.6) + + geom_line(data=FCcutpos, + aes_string(x='logFC', y='log10adjp', linetype='line'), + colour="darkgrey", + size=0.6) + + geom_line(data=FCcutneg, + aes_string(x='logFC', y='log10adjp', linetype='line'), + colour="darkgrey", + size=0.6) + + scale_linetype_manual(values=c('dotted'=3, 'twodash'=6), + labels=c(paste0("Fold change cutoff (", FCcutoff, ")"), + paste0("Adj p-value cutoff (", sig, ")"))) + + guides(colour=guide_legend(override.aes=list(linetype=0)), + linetype=guide_legend()) + } + } + } + + pfinal <- pfinal + + theme( + panel.background = element_rect(fill='white', colour="black"), + panel.grid.minor = element_blank(), + axis.text.x = element_text(size=x.axis.size, colour="black"), + axis.text.y = element_text(size=y.axis.size, colour="black"), + axis.ticks = element_line(colour="black"), + axis.title.x = element_text(size=x.axis.size+5, vjust=-0.4), + axis.title.y = element_text(size=y.axis.size+5, vjust=0.3), + title = element_text(size=x.axis.size+8, vjust=1.5), + legend.position="bottom", + legend.key = element_rect(fill='white', colour='white'), + legend.text = element_text(size=legend.size), + legend.title = element_blank() + ) + + return(pfinal) + } ## end-loop + + if (address!=FALSE) { + dev.off() + } + } + + ####################### + ## Comparison Plot + ####################### + if (type == "COMPARISONPLOT") { + + datatemp <- data[!is.na(data$adj.pvalue), ] + datatemp$Protein <- factor(datatemp$Protein) + + ## choose comparison to draw plots + if (address == FALSE){ ## here 
I used != FALSE, instead of !address. Because address can be logical or characters. + if (which.Protein == 'all') { + stop('** Cannnot generate all comparison plots in a screen. Please set one protein at a time.') + } else if (length(which.Protein) > 1) { + stop('** Cannnot generate multiple comparison plots in a screen. Please set one protein at a time.') + } + } + + ## Then address should not be FALASE. + ## choose Proteins or not + if (which.Protein != "all") { + ## check which.Protein is name of Protein + if (is.character(which.Protein)) { + temp.name <- which.Protein + + ## message if name of Protein is wrong. + if (length(setdiff(temp.name,unique(datatemp$Protein))) > 0) { + stop(paste0("Please check protein name. Data set does not have this protein. - ", + toString(temp.name))) + } + } + + ## check which.Protein is order number of Protein + if (is.numeric(which.Protein)) { + temp.name <- levels(datatemp$Protein)[which.Protein] + + ## message if name of Protein is wrong. + if (length(levels(datatemp$Protein)) < max(which.Protein)) { + stop(paste0("Please check your selection of proteins. There are ", + length(levels(datatemp$Protein))," proteins in this dataset.")) + } + } + + ## use only assigned proteins + datatemp <- datatemp[which(datatemp$Protein %in% temp.name), ] + datatemp$Protein <- factor(datatemp$Protein) + } + + ## If there are the file with the same name, add next numbering at the end of file name + if (address!=FALSE) { + allfiles <- list.files() + + num <- 0 + filenaming <- paste0(address, "ComparisonPlot") + finalfile <- paste0(address, "ComparisonPlot.pdf") + + while (is.element(finalfile, allfiles)) { + num <- num+1 + finalfile <- paste0(paste(filenaming, num, sep="-"), ".pdf") + } + + pdf(finalfile, width=width, height=height) + } + + for (i in 1:nlevels(datatemp$Protein)) { + + sub <- datatemp[datatemp$Protein == levels(datatemp$Protein)[i], ] + #sub$ciw <- qt(1-sig/2,sub$DF)*sub$SE + ## adjust for multiple comparison within protein + sub$ciw <- qt(1 - sig / (2 * nrow(sub)), sub$DF) * sub$SE + + sub <- as.data.frame(sub) + + ## for assigning x in ggplot2 + colnames(sub)[3] <- "logFC" + + ## ylimUp + y.limup <- ceiling(max(sub$logFC + sub$ciw)) + if (is.numeric(ylimUp)) { + y.limup <- ylimUp + } + + ## ylimDown + y.limdown <- floor(min(sub$logFC - sub$ciw)) + if (is.numeric(ylimDown)) { + y.limdown <- ylimDown + } + + ## adjust xthe location for x-axis label + if(text.angle != 0){ + hjust <- 1 + vjust <- 1 + } else { + hjust <- 0.5 + vjust <- 0.5 + } + + ptemp <- ggplot(aes_string(x='Label', y='logFC'), data=sub) + + geom_errorbar(aes(ymax = logFC + ciw, ymin=logFC - ciw), + data=sub, + width=0.1, + colour="red") + + geom_point(size=dot.size, + colour="darkred") + + scale_x_discrete('Comparison') + + geom_hline(yintercept=0, + linetype="twodash", + colour="darkgrey", + size=0.6) + + labs(title=levels(datatemp$Protein)[i]) + + theme( + panel.background = element_rect(fill='white', colour="black"), + panel.grid.major.y = element_line(colour="grey95"), + panel.grid.minor.y = element_blank(), + axis.text.x = element_text(size=x.axis.size, colour="black", + angle=text.angle, hjust=hjust, vjust=vjust), + axis.text.y = element_text(size=y.axis.size, colour="black"), + axis.ticks = element_line(colour="black"), + axis.title.x = element_text(size=x.axis.size+5, vjust=-0.4), + axis.title.y = element_text(size=y.axis.size+5, vjust=0.3), + title = element_text(size=x.axis.size+8, vjust=1.5) + ) + + if (colnames(data)[3] == "log2FC") { + ptemp <- ptemp + + 
scale_y_continuous("Log2-Fold Change",
+                                       limits=c(y.limdown, y.limup))
+            }
+            if (colnames(data)[3] == "log10FC") {
+                ptemp <- ptemp +
+                    scale_y_continuous("Log10-Fold Change",
+                                       limits=c(y.limdown, y.limup))
+            }
+
+            print(ptemp)
+
+            message(paste0("Drew comparison plot for ", unique(sub$Protein),
+                           "(", i, " of ", length(unique(datatemp$Protein)), ")"))
+
+        } ## end-loop
+
+        if (address!=FALSE) {
+            dev.off()
+        }
+    } ## end Comparison plot
+}
\ No newline at end of file
diff --git a/tools/interactive_msstats/MsstatsInteractive.Rmd b/tools/interactive_msstats/MsstatsInteractive.Rmd
new file mode 100644
index 0000000..183c918
--- /dev/null
+++ b/tools/interactive_msstats/MsstatsInteractive.Rmd
@@ -0,0 +1,88 @@
+---
+title: "MSstats Interactive"
+author: "pfeuffer"
+date: "21 4 2020"
+output: html_document
+runtime: shiny
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+
+msstats <- read.csv("/Users/pfeuffer/Downloads/msstats 3/msstats_results.csv", row.names = 1)
+
+library(ggplot2)
+library(plotly)
+library(stringr)
+library(marray) # for maPalette in the heatmap of MSstats
+library(gplots) # for heatmap.2
+
+cleanupPlotlyLegend <- function(myplot)
+{
+  for (i in 1:length(myplot$x$data)){
+    if (!is.null(myplot$x$data[[i]]$name)){
+      myplot$x$data[[i]]$name = gsub("\\(","",str_split(myplot$x$data[[i]]$name,",")[[1]][1])
+    }
+  }
+  return(myplot)
+}
+```
+
+This R Markdown document is a test for interactive MSstats analysis.
+
+
+```{r msstats, echo=FALSE, message=FALSE}
+source('./GroupComparisonPlots.R')
+
+inputPanel(
+  selectInput("comparison_msstats", label = "Comparison:",
+              choices = levels(msstats$Label), selected = 1),
+  sliderInput("sig_adjust", label = "p-value cutoff:",
+              min = 0.001, max = 0.1, value = 0.05, step = 0.01)
+)
+
+fluidRow(
+  column(6,
+         renderPlotly({
+           cleanupPlotlyLegend(
+             ggplotly(
+               groupComparisonPlots(msstats,"VolcanoPlot",ProteinName = F,address = F, which.Comparison = input$comparison_msstats, sig = input$sig_adjust)
+             ) %>%
+               layout(legend = list(
+                 orientation = "h",
+                 x = 0,
+                 y = -0.35,
+                 font=list(
+                   family="sans-serif",
+                   size=10,
+                   color="black"
+                 )
+               ))
+           ) %>%
+             toWebGL() %>%
+             event_register("plotly_selecting")
+         })
+  ),
+  column(6,
+         renderPlot({
+           d <- event_data("plotly_selecting")
+           if (is.null(d)) {
+             groupComparisonPlots(msstats,"Heatmap",ProteinName = F,address = F, sig = input$sig_adjust)
+           } else {
+             msstats %>%
+               subset( Protein %in% d$customdata) %>%
+               groupComparisonPlots("Heatmap",ProteinName = F,address = F, sig = input$sig_adjust)
+           }
+         })
+  )
+)
+
+
+  # DEBUG
+  #renderPrint({
+  #  event_data("plotly_selecting")
+  #})
+
+
+
+```
\ No newline at end of file
From 8b01a11c85ba882468506b4ecc5ac2c7ad9c8bf7 Mon Sep 17 00:00:00 2001
From: yperez
Date: Tue, 21 Apr 2020 22:18:12 +0100
Subject: [PATCH 178/374] remove the parse_sdrf script -> move to conda

---
 Dockerfile        |   2 +-
 bin/parse_sdrf.py | 374 ----------------------------------------------
 environment.yml   |   3 +-
 main.nf           |   2 +-
 4 files changed, 3 insertions(+), 378 deletions(-)
 delete mode 100755 bin/parse_sdrf.py

diff --git a/Dockerfile b/Dockerfile
index 228e563..866fe8b 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,5 +1,5 @@
 FROM nfcore/base:1.9
-LABEL authors="Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg" \
+LABEL authors="Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg, Yasset Perez-Riverol" \
       description="Docker image containing all software requirements for the nf-core/proteomicslfq pipeline"

 # Install the conda environment
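Because the bundled `bin/parse_sdrf.py` is deleted below in favor of the bioconda package, a short sketch of the replacement workflow may help; the package name and pin come from `environment.yml` in this patch, the subcommand and flags from the `main.nf` change further down, and the SDRF filename is a placeholder:

```bash
# Install the packaged converter that replaces bin/parse_sdrf.py
# (pinned to 0.0.2 at this point in the series, bumped shortly after).
conda install -c conda-forge -c bioconda sdrf-pipelines

# The same call the pipeline now issues in main.nf:
# -t2 = two-tables experimental design (the one-table parser is broken in OpenMS 2.5),
# -l  = legacy behavior that always adds the sample column.
parse_sdrf convert-openms -t2 -l -s experiment.sdrf.tsv
```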
diff --git a/bin/parse_sdrf.py b/bin/parse_sdrf.py deleted file mode 100755 index 65f264f..0000000 --- a/bin/parse_sdrf.py +++ /dev/null @@ -1,374 +0,0 @@ -#!/usr/bin/env python3 -import re -import sys -import logging -import os - -import click -import pandas as pd - -CONTEXT_SETTINGS = dict(help_option_names=['-h', '--help']) - -warnings = dict() - -@click.group(context_settings=CONTEXT_SETTINGS) -def cli(): - """Tool to convert sdrf files into OpenMS config files""" - -def openms_ify_mods(sdrf_mods): - oms_mods = list() - - for m in sdrf_mods: - if "AC=UNIMOD" not in m and "AC=Unimod" not in m: - raise Exception("only UNIMOD modifications supported. " + m) - - name = re.search("NT=(.+?)(;|$)", m).group(1) - name = name.capitalize() - - # workaround for missing PP in some sdrf TODO: fix in sdrf spec? - if re.search("PP=(.+?)[;$]", m) is None: - pp = "Anywhere" - else: - pp = re.search("PP=(.+?)(;|$)", m).group( - 1) # one of [Anywhere, Protein N-term, Protein C-term, Any N-term, Any C-term - - if re.search("TA=(.+?)(;|$)", m) is None: # TODO: missing in sdrf. - warning_message = "Warning no TA= specified. Setting to N-term or C-term if possible." - warnings[warning_message] = warnings.get(warning_message, 0) + 1 - if "C-term" in pp: - ta = "C-term" - elif "N-term" in pp: - ta = "N-term" - else: - warning_message = "Reassignment not possible. Skipping." - #print(warning_message + " "+ m) - warnings[warning_message] = warnings.get(warning_message, 0) + 1 - pass - else: - ta = re.search("TA=(.+?)(;|$)", m).group(1) # target amino-acid - aa = ta.split(",") # multiply target site e.g., S,T,Y including potentially termini "C-term" - - if pp == "Protein N-term" or pp == "Protein C-term": - for a in aa: - if a == "C-term" or a == "N-term": # no site specificity - oms_mods.append(name + " (" + pp + ")") # any Protein N/C-term - else: - oms_mods.append(name + " (" + pp + " " + a + ")") # specific Protein N/C-term - elif pp == "Any N-term" or pp == "Any C-term": - pp = pp.replace("Any ", "") # in OpenMS we just use N-term and C-term - for a in aa: - if a == "C-term" or aa == "N-term": # no site specificity - oms_mods.append(name + " (" + pp + ")") # any N/C-term - else: - oms_mods.append(name + " (" + pp + " " + a + ")") # specific N/C-term - else: # Anywhere in the peptide - for a in aa: - oms_mods.append(name + " (" + a + ")") # specific site in peptide - - return ",".join(oms_mods) - - -def openms_convert(sdrf_file: str = None, keep_raw: bool = False, onetable : bool = False, legacy : bool = False, verbose: bool = False): - print('PROCESSING: ' + sdrf_file + '"') - sdrf = pd.read_table(sdrf_file) - sdrf = sdrf.astype(str) - sdrf.columns = map(str.lower, sdrf.columns) # convert column names to lower-case - - # map filename to tuple of [fixed, variable] mods - mod_cols = [c for ind, c in enumerate(sdrf) if - c.startswith('comment[modification parameters')] # columns with modification parameters - - # get factor columns (except constant ones) - factor_cols = [c for ind, c in enumerate(sdrf) if c.startswith('factor value[') and len(sdrf[c].unique()) > 1] - - # get characteristics columns (except constant ones) - characteristics_cols = [c for ind, c in enumerate(sdrf) if c.startswith('characteristics[') and len(sdrf[c].unique()) > 1] - - # remove characteristics columns already present as factor - redundant_characteristics_cols = set() - for c in characteristics_cols: - c_col = sdrf[c] # select characteristics column - for f in factor_cols: # Iterate over all factor columns - f_col = sdrf[f] # select 
factor column - if c_col.equals(f_col): - redundant_characteristics_cols.add(c) - characteristics_cols = [x for x in characteristics_cols if x not in redundant_characteristics_cols] - - file2mods = dict() - file2pctol = dict() - file2pctolunit = dict() - file2fragtol = dict() - file2fragtolunit = dict() - file2diss = dict() - file2enzyme = dict() - file2fraction = dict() - file2label = dict() - file2source = dict() - source_name_list = list() - source_name2n_reps =dict() - file2combined_factors = dict() - file2technical_rep = dict() - for index, row in sdrf.iterrows(): - ## extract mods - all_mods = list(row[mod_cols]) - #print(all_mods) - var_mods = [m for m in all_mods if 'MT=variable' in m or 'MT=Variable' in m] # workaround for capitalization - var_mods.sort() - fixed_mods = [m for m in all_mods if 'MT=fixed' in m or 'MT=Fixed' in m] # workaround for capitalization - fixed_mods.sort() - if verbose: - print(row) - raw = row['comment[data file]'] - fixed_mods_string = "" - if fixed_mods is not None: - fixed_mods_string = openms_ify_mods(fixed_mods) - - variable_mods_string = "" - if var_mods is not None: - variable_mods_string = openms_ify_mods(var_mods) - - file2mods[raw] = (fixed_mods_string, variable_mods_string) - - source_name = row['source name'] - file2source[raw] = source_name - if not source_name in source_name_list: - source_name_list.append(source_name) - - if 'comment[precursor mass tolerance]' in row: - pc_tol_str = row['comment[precursor mass tolerance]'] - if "ppm" in pc_tol_str or "Da" in pc_tol_str: - pc_tmp = pc_tol_str.split(" ") - file2pctol[raw] = pc_tmp[0] - file2pctolunit[raw] = pc_tmp[1] - else: - warning_message = "Invalid precursor mass tolerance set. Assuming 10 ppm." - warnings[warning_message] = warnings.get(warning_message, 0) + 1 - file2pctol[raw] = "10" - file2pctolunit[raw] = "ppm" - else: - warning_message = "No precursor mass tolerance set. Assuming 10 ppm." - warnings[warning_message] = warnings.get(warning_message, 0) + 1 - file2pctol[raw] = "10" - file2pctolunit[raw] = "ppm" - - if 'comment[fragment mass tolerance]' in row: - f_tol_str = row['comment[fragment mass tolerance]'] - f_tol_str.replace("PPM", "ppm") # workaround - if "ppm" in f_tol_str or "Da" in f_tol_str: - f_tmp = f_tol_str.split(" ") - file2fragtol[raw] = f_tmp[0] - file2fragtolunit[raw] = f_tmp[1] - else: - warning_message = "Invalid fragment mass tolerance set. Assuming 20 ppm." - warnings[warning_message] = warnings.get(warning_message, 0) + 1 - file2fragtol[raw] = "20" - file2fragtolunit[raw] = "ppm" - else: - warning_message = "No fragment mass tolerance set. Assuming 20 ppm." - warnings[warning_message] = warnings.get(warning_message, 0) + 1 - file2fragtol[raw] = "20" - file2fragtolunit[raw] = "ppm" - - if 'comment[dissociation method]' in row: - diss_method = re.search("NT=(.+?)(;|$)", row['comment[dissociation method]']).group(1) - file2diss[raw] = diss_method.upper() - else: - warning_message = "No dissociation method provided. Assuming HCD." 
- warnings[warning_message] = warnings.get(warning_message, 0) + 1 - file2diss[raw] = 'HCD' - - if 'comment[technical replicate]' in row: - technical_replicate = str(row['comment[technical replicate]']) - if "not available" in technical_replicate: - file2technical_rep[raw] = "1" - else: - file2technical_rep[raw] = technical_replicate - else: - file2technical_rep[raw] = "1" - - # store highest replicate number for this source name - if source_name in source_name2n_reps: - source_name2n_reps[source_name] = max(int(source_name2n_reps[source_name]), int(file2technical_rep[raw])) - else: - source_name2n_reps[source_name] = int(file2technical_rep[raw]) - - enzyme = re.search("NT=(.+?)(;|$)", row['comment[cleavage agent details]']).group(1) - enzyme = enzyme.capitalize() - if "Trypsin/p" in enzyme: # workaround - enzyme = "Trypsin/P" - file2enzyme[raw] = enzyme - - if 'comment[fraction identifier]' in row: - fraction = str(row['comment[fraction identifier]']) - if "not available" in fraction: - file2fraction[raw] = "1" - else: - file2fraction[raw] = fraction - else: - file2fraction[raw] = "1" - - label = re.search("NT=(.+?)(;|$)", row['comment[label]']).group(1) - file2label[raw] = label - - ## extract factors - all_factors = list(row[factor_cols]) - combined_factors = "|".join(all_factors) - if combined_factors == "": - # fallback to characteristics (use them as factors) - all_factors = list(row[characteristics_cols]) - combined_factors = "|".join(all_factors) - if combined_factors == "": - warning_message = "No factors specified. Adding dummy factor used as condition." - warnings[warning_message] = warnings.get(warning_message, 0) + 1 - combined_factors = "none" - else: - warning_message = "No factors specified. Adding non-redundant characteristics as factor. Will be used as condition." - warnings[warning_message] = warnings.get(warning_message, 0) + 1 - - file2combined_factors[raw] = combined_factors - # print(combined_factors) - - ##################### only label-free supported right now - - # output of search settings - f = open("openms.tsv", "w+") - open_ms_search_settings_header = ["URI", "Filename", "FixedModifications", "VariableModifications", "Label", - "PrecursorMassTolerance", "PrecursorMassToleranceUnit", "FragmentMassTolerance", - "FragmentMassToleranceUnit", "DissociationMethod", "Enzyme"] - f.write("\t".join(open_ms_search_settings_header) + "\n") - for index, row in sdrf.iterrows(): # does only work for label-free not for multiplexed. TODO - URI = row["comment[file uri]"] - raw = row["comment[data file]"] - f.write(URI + "\t" + raw + "\t" + file2mods[raw][0] + "\t" + file2mods[raw][1] + "\t" + file2label[raw] + "\t" + file2pctol[ - raw] + "\t" + file2pctolunit[raw] + "\t" + file2fragtol[raw] + "\t" + file2fragtolunit[raw] + "\t" + file2diss[ - raw] + "\t" + file2enzyme[raw] + "\n") - f.close() - - # output of experimental design - f = open("experimental_design.tsv", "w+") - raw_ext_regex = re.compile(r"\.raw$", re.IGNORECASE) - - if onetable: - if legacy: - open_ms_experimental_design_header = ["Fraction_Group", "Fraction", "Spectra_Filepath", "Label", "Sample", - "MSstats_Condition", "MSstats_BioReplicate"] - else: - open_ms_experimental_design_header = ["Fraction_Group", "Fraction", "Spectra_Filepath", "Label", - "MSstats_Condition", "MSstats_BioReplicate"] - f.write("\t".join(open_ms_experimental_design_header) + "\n") - - for index, row in sdrf.iterrows(): # does only work for label-free not for multiplexed. 
TODO - raw = row["comment[data file]"] - source_name = row["source name"] - replicate = file2technical_rep[raw] - - # calculate fraction group by counting all technical replicates of the preceeding source names - source_name_index = source_name_list.index(source_name) - offset = 0 - for i in range(source_name_index): - offset = offset + int(source_name2n_reps[source_name_list[i]]) - - fraction_group = str(offset + int(replicate)) - sample = fraction_group - - if 'none' in file2combined_factors[raw]: - # no factor defined use sample as condition - condition = sample - else: - condition = file2combined_factors[raw] - label = file2label[raw] - if "label free sample" in label: - label = "1" - - if not keep_raw: - out = raw_ext_regex.sub(".mzML", raw) - else: - out = raw - - if legacy: - f.write(fraction_group + "\t" + file2fraction[ - raw] + "\t" + out + "\t" + label + "\t" + sample + "\t" + condition + "\t" + replicate + "\n") - else: - f.write(fraction_group + "\t" + file2fraction[ - raw] + "\t" + out + "\t" + label + "\t" + condition + "\t" + replicate + "\n") - f.close() - else: # two table format - openms_file_header = ["Fraction_Group", "Fraction", "Spectra_Filepath", "Label", "Sample"] - f.write("\t".join(openms_file_header) + "\n") - - for index, row in sdrf.iterrows(): # does only work for label-free not for multiplexed. TODO - raw = row["comment[data file]"] - source_name = row["source name"] - replicate = file2technical_rep[raw] - - # calculate fraction group by counting all technical replicates of the preceeding source names - source_name_index = source_name_list.index(source_name) - offset = 0 - for i in range(source_name_index): - offset = offset + int(source_name2n_reps[source_name_list[i]]) - - fraction_group = str(offset + int(replicate)) - sample = fraction_group # TODO: change this for multiplexed - - label = file2label[raw] - if "label free sample" in label: - label = "1" - - if not keep_raw: - out = raw_ext_regex.sub(".mzML", raw) - else: - out = raw - - f.write(fraction_group + "\t" + file2fraction[raw] + "\t" + out + "\t" + label + "\t" + sample + "\n") - - # sample table - f.write("\n") - openms_sample_header = ["Sample", "MSstats_Condition", "MSstats_BioReplicate"] - f.write("\t".join(openms_sample_header) + "\n") - for index, row in sdrf.iterrows(): # does only work for label-free not for multiplexed. 
TODO - raw = row["comment[data file]"] - source_name = row["source name"] - replicate = file2technical_rep[raw] - - # calculate fraction group by counting all technical replicates of the preceeding source names - source_name_index = source_name_list.index(source_name) - offset = 0 - for i in range(source_name_index): - offset = offset + int(source_name2n_reps[source_name_list[i]]) - - fraction_group = str(offset + int(replicate)) - sample = fraction_group # TODO: change this for multiplexed - - if 'none' in file2combined_factors[raw]: - # no factor defined use sample as condition - condition = sample - else: - condition = file2combined_factors[raw] - - f.write(sample + "\t" + condition + "\t" + replicate + "\n") - - f.close() - - if len(warnings) != 0: - for k,v in warnings.items(): - print('WARNING: "' + k + '" occured ' + str(v) + ' times.') - - print("SUCCESS (WARNINGS=" + str(len(warnings)) + "): " + sdrf_file) - -@click.command('convert-openms', short_help='convert sdrf to openms file output') -@click.option('--sdrf', '-s', help='SDRF file') -@click.option('--raw', '-r', help='Keep filenames in experimental design output as raw.') -@click.option('--legacy/--modern', "-l/-m", default=False, help='legacy=Create artifical sample column not needed in OpenMS 2.6.') -@click.option('--onetable/--twotables', "-t1/-t2", default=False, help='Create one-table or two-tables format.') -@click.option('--verbose/--quiet', "-v/-q", default=False, help='Output debug information.') -@click.pass_context -def openms_from_sdrf(ctx, sdrf: str, raw: bool, onetable : bool, legacy: bool, verbose: bool): - if sdrf is None: - help() - openms_convert(sdrf, raw, onetable, legacy, verbose) - - -cli.add_command(openms_from_sdrf) - -if __name__ == "__main__": - cli() diff --git a/environment.yml b/environment.yml index 408e896..a67b8fa 100644 --- a/environment.yml +++ b/environment.yml @@ -12,10 +12,9 @@ dependencies: - conda-forge::r-ptxqc=1.0.2 # for QC reports - conda-forge::fonts-conda-ecosystem=1 # for the fonts in QC reports - conda-forge::openjdk=8.0.192 # pin java to 8 for MSGF (otherwise it somehow chooses 11) - - conda-forge::click=7.1.1 # for parse_sdrf.py - - conda-forge::pandas=1.0.3 # for parse_sdrf.py - conda-forge::python=3.8.1 - conda-forge::markdown=3.2.1 - conda-forge::pymdown-extensions=6.0 - conda-forge::pygments=2.5.2 + - bioconda::sdrf-pipelines=0.0.2 diff --git a/main.nf b/main.nf index 0937cdd..fa23017 100644 --- a/main.nf +++ b/main.nf @@ -221,7 +221,7 @@ else """ ## -t2 since the one-table format parser is broken in OpenMS2.5 ## -l for legacy behavior to always add sample columns - parse_sdrf.py convert-openms -t2 -l -s ${sdrf} > sdrf_parsing.log + parse_sdrf convert-openms -t2 -l -s ${sdrf} > sdrf_parsing.log """ } From 848225b4f8db8d6dadd450d337c0dae3722f1989 Mon Sep 17 00:00:00 2001 From: yperez Date: Tue, 21 Apr 2020 22:19:25 +0100 Subject: [PATCH 179/374] remove the parse_sdrf script -> move to conda --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 132042e..ea81e23 100644 --- a/README.md +++ b/README.md @@ -54,7 +54,7 @@ The nf-core/proteomicslfq pipeline comes with documentation about the pipeline, ## Credits -nf-core/proteomicslfq was originally written by Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg. 
+nf-core/proteomicslfq was originally written by Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg, Yasset Perez-Riverol ## Contributions and Support @@ -73,5 +73,5 @@ You can cite the `nf-core` publication as follows: > > Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. > -> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x). +> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x). > ReadCube: [Full Access Link](https://rdcu.be/b1GjZ) From dadc83f9982747727097b578d7161456c2bfff72 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Wed, 22 Apr 2020 18:14:43 +0200 Subject: [PATCH 180/374] Update environment.yml --- environment.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/environment.yml b/environment.yml index a67b8fa..28795ef 100644 --- a/environment.yml +++ b/environment.yml @@ -16,5 +16,5 @@ dependencies: - conda-forge::markdown=3.2.1 - conda-forge::pymdown-extensions=6.0 - conda-forge::pygments=2.5.2 - - bioconda::sdrf-pipelines=0.0.2 + - bioconda::sdrf-pipelines=0.0.3 From 8257623261402030d2663115e95e1f2ccf18e4d6 Mon Sep 17 00:00:00 2001 From: yperez Date: Fri, 24 Apr 2020 09:12:01 +0100 Subject: [PATCH 181/374] sdrf-pipelines -> 0.0.4 --- environment.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/environment.yml b/environment.yml index 28795ef..93bc608 100644 --- a/environment.yml +++ b/environment.yml @@ -16,5 +16,5 @@ dependencies: - conda-forge::markdown=3.2.1 - conda-forge::pymdown-extensions=6.0 - conda-forge::pygments=2.5.2 - - bioconda::sdrf-pipelines=0.0.3 + - bioconda::sdrf-pipelines=0.0.4 From c1cda5450d22da8a548443058abf9252249379d9 Mon Sep 17 00:00:00 2001 From: yperez Date: Sat, 25 Apr 2020 00:13:06 +0100 Subject: [PATCH 182/374] label each process --- conf/base.config | 4 ++-- main.nf | 38 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+), 2 deletions(-) diff --git a/conf/base.config b/conf/base.config index b3b69b0..15137e3 100644 --- a/conf/base.config +++ b/conf/base.config @@ -16,8 +16,8 @@ process { memory = { check_max( 7.GB * task.attempt, 'memory' ) } time = { check_max( 4.h * task.attempt, 'time' ) } - errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' } - maxRetries = 1 + errorStrategy = { task.exitStatus in [143,137,104,134,139,130] ? 
'retry' : 'finish' } + maxRetries = 2 maxErrors = '-1' // Process-specific resource requirements diff --git a/main.nf b/main.nf index fa23017..c7badb7 100644 --- a/main.nf +++ b/main.nf @@ -305,6 +305,8 @@ branched_input.mzML */ process raw_file_conversion { + label 'process_low' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: @@ -324,6 +326,8 @@ process raw_file_conversion { */ process mzml_indexing { + label 'process_low' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: @@ -353,6 +357,8 @@ branched_input_mzMLs.inputIndexedMzML.mix(mzmls_converted).mix(mzmls_indexed).in //Add decoys if params.add_decoys is set appropriately process generate_decoy_database { + label 'process_low' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: @@ -415,6 +421,8 @@ if (params.search_engine == "msgf") process search_engine_msgf { + label 'process_low' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' // --------------------------------------------------------------------------------------------------------------------- @@ -469,6 +477,8 @@ process search_engine_msgf { process search_engine_comet { + label 'process_low' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' // --------------------------------------------------------------------------------------------------------------------- @@ -517,6 +527,8 @@ process search_engine_comet { process index_peptides { + label 'process_low' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: @@ -548,6 +560,8 @@ process index_peptides { process extract_percolator_features { + label 'process_low' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: @@ -574,6 +588,8 @@ process extract_percolator_features { //TODO parameterize and find a way to run across all runs merged process percolator { + label 'process_low' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: @@ -611,6 +627,8 @@ process percolator { process idfilter { + label 'process_low' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' @@ -636,6 +654,8 @@ process idfilter { process idscoreswitcher { + label 'process_low' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: @@ -669,6 +689,8 @@ process idscoreswitcher { // Note: for IDPEP we never need any file specific settings so we can stop adding the mzml_idto the channels process fdr_idpep { + label 'process_low' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: @@ -695,6 +717,8 @@ process fdr_idpep { process idscoreswitcher_idpep_pre { + label 'process_low' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: @@ -722,6 +746,8 @@ process idscoreswitcher_idpep_pre { process idpep { + label 'process_low' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: @@ -745,6 +771,8 @@ process idpep { process idscoreswitcher_idpep_post { + label 'process_low' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: @@ -771,6 +799,8 @@ process idscoreswitcher_idpep_post { process idfilter_idpep { + label 'process_low' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' @@ -796,6 +826,8 @@ process idfilter_idpep { process idscoreswitcher_idpep_postfilter { + label 'process_low' + publishDir 
"${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: @@ -826,6 +858,8 @@ process idscoreswitcher_idpep_postfilter { process proteomicslfq { + label 'process_medium' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' publishDir "${params.outdir}/proteomics_lfq", mode: 'copy' @@ -875,6 +909,8 @@ process proteomicslfq { process msstats { + label 'process_medium' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' publishDir "${params.outdir}/msstats", mode: 'copy' @@ -899,6 +935,8 @@ process msstats { process ptxqc { + label 'process_low' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' publishDir "${params.outdir}/ptxqc", mode: 'copy' From 4d012d798a16bb98fcbfbe6227750393f63f0c2c Mon Sep 17 00:00:00 2001 From: yperez Date: Sat, 25 Apr 2020 00:13:38 +0100 Subject: [PATCH 183/374] label each process --- conf/base.config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/conf/base.config b/conf/base.config index 15137e3..210c2f9 100644 --- a/conf/base.config +++ b/conf/base.config @@ -27,7 +27,7 @@ process { // TODO nf-core: Customise requirements for specific processes. // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors withLabel:process_low { - cpus = { check_max( 2 * task.attempt, 'cpus' ) } + cpus = { check_max( 3 * task.attempt, 'cpus' ) } memory = { check_max( 14.GB * task.attempt, 'memory' ) } time = { check_max( 6.h * task.attempt, 'time' ) } } From c6b5e1685221efa0aa6c6ebc7a7a923ce868f01e Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sat, 25 Apr 2020 17:08:45 +0200 Subject: [PATCH 184/374] adapted labels a little, added very_low specs and single_threaded --- conf/base.config | 27 ++++++++++++++++----------- main.nf | 45 +++++++++++++++++++++++++++++++-------------- 2 files changed, 47 insertions(+), 25 deletions(-) diff --git a/conf/base.config b/conf/base.config index 210c2f9..bce27fb 100644 --- a/conf/base.config +++ b/conf/base.config @@ -12,38 +12,43 @@ process { // TODO nf-core: Check the defaults for all processes - cpus = { check_max( 1 * task.attempt, 'cpus' ) } - memory = { check_max( 7.GB * task.attempt, 'memory' ) } + cpus = { check_max( 2 * task.attempt, 'cpus' ) } + memory = { check_max( 8.GB * task.attempt, 'memory' ) } time = { check_max( 4.h * task.attempt, 'time' ) } - errorStrategy = { task.exitStatus in [143,137,104,134,139,130] ? 'retry' : 'finish' } + errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' } maxRetries = 2 maxErrors = '-1' // Process-specific resource requirements - // NOTE - Only one of the labels below are used in the fastqc process in the main script. - // If possible, it would be nice to keep the same label naming convention when - // adding in your processes. // TODO nf-core: Customise requirements for specific processes. 
// See https://www.nextflow.io/docs/latest/config.html#config-process-selectors + withLabel:process_very_low { + cpus = { check_max( 2 * task.attempt, 'cpus' ) } + memory = { check_max( 6.GB * task.attempt, 'memory' ) } + time = { check_max( 3.h * task.attempt, 'time' ) } + } withLabel:process_low { - cpus = { check_max( 3 * task.attempt, 'cpus' ) } - memory = { check_max( 14.GB * task.attempt, 'memory' ) } + cpus = { check_max( 4 * task.attempt, 'cpus' ) } + memory = { check_max( 10.GB * task.attempt, 'memory' ) } time = { check_max( 6.h * task.attempt, 'time' ) } } withLabel:process_medium { - cpus = { check_max( 6 * task.attempt, 'cpus' ) } - memory = { check_max( 42.GB * task.attempt, 'memory' ) } + cpus = { check_max( 8 * task.attempt, 'cpus' ) } + memory = { check_max( 32.GB * task.attempt, 'memory' ) } time = { check_max( 8.h * task.attempt, 'time' ) } } withLabel:process_high { cpus = { check_max( 12 * task.attempt, 'cpus' ) } - memory = { check_max( 84.GB * task.attempt, 'memory' ) } + memory = { check_max( 64.GB * task.attempt, 'memory' ) } time = { check_max( 10.h * task.attempt, 'time' ) } } withLabel:process_long { time = { check_max( 20.h * task.attempt, 'time' ) } } + withLabel:process_single_thread { + time = { check_max( 1 * task.attempt, 'cpus' ) } + } withName:get_software_versions { cache = false } diff --git a/main.nf b/main.nf index c7badb7..9cc38c1 100644 --- a/main.nf +++ b/main.nf @@ -306,6 +306,7 @@ branched_input.mzML process raw_file_conversion { label 'process_low' + label 'process_single_thread' publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' @@ -357,7 +358,8 @@ branched_input_mzMLs.inputIndexedMzML.mix(mzmls_converted).mix(mzmls_indexed).in //Add decoys if params.add_decoys is set appropriately process generate_decoy_database { - label 'process_low' + label 'process_very_low' + label 'process_single_thread' publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' @@ -421,7 +423,7 @@ if (params.search_engine == "msgf") process search_engine_msgf { - label 'process_low' + label 'process_medium' publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' @@ -477,7 +479,7 @@ process search_engine_msgf { process search_engine_comet { - label 'process_low' + label 'process_medium' publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' @@ -560,7 +562,8 @@ process index_peptides { process extract_percolator_features { - label 'process_low' + label 'process_very_low' + label 'process_single_thread' publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' @@ -588,7 +591,12 @@ process extract_percolator_features { //TODO parameterize and find a way to run across all runs merged process percolator { - label 'process_low' + //TODO Actually it heavily depends on the subset_max_train option and the number of IDs + // would be cool to get an estimate by parsing the number of IDs from previous tools. 
+ label 'process_medium' + //TODO The current percolator version only supports up to 3-fold CV so the following might make sense now + // but in the next version it will have nested CV + cpus { check_max( 3, 'cpus' ) } publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' @@ -627,7 +635,8 @@ process percolator { process idfilter { - label 'process_low' + label 'process_very_low' + label 'process_single_thread' publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' @@ -654,7 +663,8 @@ process idfilter { process idscoreswitcher { - label 'process_low' + label 'process_very_low' + label 'process_single_thread' publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' @@ -689,7 +699,8 @@ process idscoreswitcher { // Note: for IDPEP we never need any file specific settings so we can stop adding the mzml_idto the channels process fdr_idpep { - label 'process_low' + label 'process_very_low' + label 'process_single_thread' publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' @@ -717,7 +728,8 @@ process fdr_idpep { process idscoreswitcher_idpep_pre { - label 'process_low' + label 'process_very_low' + label 'process_single_thread' publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' @@ -747,6 +759,7 @@ process idscoreswitcher_idpep_pre { process idpep { label 'process_low' + // I think Eigen optimization is multi-threaded, so leave threads open publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' @@ -771,7 +784,8 @@ process idpep { process idscoreswitcher_idpep_post { - label 'process_low' + label 'process_very_low' + label 'process_single_thread' publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' @@ -799,7 +813,8 @@ process idscoreswitcher_idpep_post { process idfilter_idpep { - label 'process_low' + label 'process_very_low' + label 'process_single_thread' publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' @@ -826,7 +841,8 @@ process idfilter_idpep { process idscoreswitcher_idpep_postfilter { - label 'process_low' + label 'process_very_low' + label 'process_single_thread' publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' @@ -858,7 +874,7 @@ process idscoreswitcher_idpep_postfilter { process proteomicslfq { - label 'process_medium' + label 'process_high' publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' publishDir "${params.outdir}/proteomics_lfq", mode: 'copy' @@ -935,7 +951,8 @@ process msstats { process ptxqc { - label 'process_low' + label 'process_very_low' + label 'process_single_thread' publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' publishDir "${params.outdir}/ptxqc", mode: 'copy' From 05f6d95cfbc5d1d2fadecd88515b1ba504fb7ce8 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sat, 25 Apr 2020 20:04:17 +0200 Subject: [PATCH 185/374] weird --- conf/test.config | 8 +++++++- nextflow.config | 2 +- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/conf/test.config b/conf/test.config index 2f391c1..16be061 100644 --- a/conf/test.config +++ b/conf/test.config @@ -14,7 +14,7 @@ params { // Limit resources so that this can run on Travis max_cpus = 2 max_memory = 6.GB - max_time = 48.h + max_time = 1.h // Input data spectra = [ @@ -33,3 +33,9 @@ params { decoy_affix = "rev" enable_qc = true } + +process { + cpus = 2 + memory = 6.GB + time = 1.h +} \ No newline at end of file diff --git 
a/nextflow.config b/nextflow.config index 7272554..c13e8fd 100644 --- a/nextflow.config +++ b/nextflow.config @@ -37,7 +37,7 @@ params { variable_mods = 'Oxidation (M)' fragment_mass_tolerance = 5 fragment_mass_tolerance_unit = 'ppm' - dissociation_method = 'HCD' //currently unused. hard to find a good logic to beat the defaults + fragment_method = 'HCD' //currently unused. hard to find a good logic to beat the defaults isotope_error_range = '0,1' instrument = 'high_res' protocol = 'automatic' From a3183f74ad1c777b7567b0bc3dd78cd11b8c55a6 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sat, 25 Apr 2020 20:20:51 +0200 Subject: [PATCH 186/374] omg stupid --- conf/base.config | 2 +- conf/test.config | 6 ------ 2 files changed, 1 insertion(+), 7 deletions(-) diff --git a/conf/base.config b/conf/base.config index bce27fb..2f89b91 100644 --- a/conf/base.config +++ b/conf/base.config @@ -47,7 +47,7 @@ process { time = { check_max( 20.h * task.attempt, 'time' ) } } withLabel:process_single_thread { - time = { check_max( 1 * task.attempt, 'cpus' ) } + cpus = { check_max( 1 * task.attempt, 'cpus' ) } } withName:get_software_versions { cache = false diff --git a/conf/test.config b/conf/test.config index 16be061..1e9b5af 100644 --- a/conf/test.config +++ b/conf/test.config @@ -32,10 +32,4 @@ params { protein_level_fdr_cutoff = 1.0 decoy_affix = "rev" enable_qc = true -} - -process { - cpus = 2 - memory = 6.GB - time = 1.h } \ No newline at end of file From 6395199512ad29abfaf3f4a3f75038c738fdf517 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sat, 25 Apr 2020 20:34:27 +0200 Subject: [PATCH 187/374] remove klammer warning --- nextflow.config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nextflow.config b/nextflow.config index c13e8fd..fb60686 100644 --- a/nextflow.config +++ b/nextflow.config @@ -53,7 +53,7 @@ params { train_FDR = 0.05 test_FDR = 0.05 FDR_level = 'peptide-level-fdrs' - klammer = 'false' + klammer = false description_correct_features = 0 subset_max_train = 300000 From e09c0de765d001e6de45a9e83ced4daed4cf3820 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sun, 26 Apr 2020 21:18:30 +0200 Subject: [PATCH 188/374] sdrf update --- dev/environment-dev.yml | 1 + environment.yml | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/dev/environment-dev.yml b/dev/environment-dev.yml index a855628..9b80e46 100644 --- a/dev/environment-dev.yml +++ b/dev/environment-dev.yml @@ -18,4 +18,5 @@ dependencies: - conda-forge::markdown=3.2.1 - conda-forge::pymdown-extensions=6.0 - conda-forge::pygments=2.5.2 + - bioconda::sdrf-pipelines=0.0.4 diff --git a/environment.yml b/environment.yml index 28795ef..93bc608 100644 --- a/environment.yml +++ b/environment.yml @@ -16,5 +16,5 @@ dependencies: - conda-forge::markdown=3.2.1 - conda-forge::pymdown-extensions=6.0 - conda-forge::pygments=2.5.2 - - bioconda::sdrf-pipelines=0.0.3 + - bioconda::sdrf-pipelines=0.0.4 From ae87c73a78a9bdb035ce81b33523386db4686bc2 Mon Sep 17 00:00:00 2001 From: yperez Date: Mon, 27 Apr 2020 09:49:48 +0100 Subject: [PATCH 189/374] change in some default parameters --- nextflow.config | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/nextflow.config b/nextflow.config index fb60686..e8cdb1e 100644 --- a/nextflow.config +++ b/nextflow.config @@ -30,7 +30,7 @@ params { // shared search engine parameters enzyme = 'Trypsin' num_enzyme_termini = 'fully' - allowed_missed_cleavages = 1 + allowed_missed_cleavages = 2 precursor_mass_tolerance = 5 
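   // the next key sets the unit for the tolerance above; 5 ppm is a common
   // default for high-resolution (e.g. Orbitrap) data, while Da suits low-res traps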
precursor_mass_tolerance_unit = 'ppm' fixed_mods = 'Carbamidomethyl (C)' @@ -42,11 +42,11 @@ params { instrument = 'high_res' protocol = 'automatic' min_precursor_charge = 2 - max_precursor_charge = 3 + max_precursor_charge = 4 min_peptide_length = 6 max_peptide_length = 40 num_hits = 1 - max_mods = 3 + max_mods = 5 db_debug = 0 // Percolator flags From 178e1c4b213f0a54930139b7beefae27ec41f01f Mon Sep 17 00:00:00 2001 From: yperez Date: Mon, 27 Apr 2020 10:12:40 +0100 Subject: [PATCH 190/374] change in some default parameters --- nextflow.config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nextflow.config b/nextflow.config index e8cdb1e..677a02d 100644 --- a/nextflow.config +++ b/nextflow.config @@ -46,7 +46,7 @@ params { min_peptide_length = 6 max_peptide_length = 40 num_hits = 1 - max_mods = 5 + max_mods = 3 db_debug = 0 // Percolator flags From fa7fe463d82f0dab5b67d2d089b89a673c47f2aa Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 27 Apr 2020 12:53:17 +0200 Subject: [PATCH 191/374] Added advanced options to PeptideIndexer --- docs/usage.md | 13 +++++++++++++ main.nf | 10 +++++++++- nextflow.config | 4 ++++ 3 files changed, 26 insertions(+), 1 deletion(-) diff --git a/docs/usage.md b/docs/usage.md index 375def5..4fdbc91 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -47,6 +47,9 @@ * [`--isotope_error_range`](#--isotope_error_range) * [`--max_mods`](#--max_mods) * [`--db_debug`](#--db_debug) +* [Peptide reindexing](#peptide-reindexing) + * [`--IL_equivalent`](#--IL_equivalent) + * [`--allow_unmatched`](#--allow_unmatched) * [PSM rescoring](#psm-rescoring) * [`--posterior_probabilities`](#--posterior_probabilities) * [`--rescoring_debug`](#--rescoring_debug) @@ -348,6 +351,16 @@ Maximum number of modifications per peptide. If this value is large, the search Set debug level for the search engines (regulates if intermediate output is kept and if you are going to see the output of the underlying search engine) +## Peptide reindexing + +### `--IL_equivalent` + +Should isoleucine and leucine be treated interchangeably? Default: true + +### `--allow_unmatched` + +Ignore unmatched peptides (Default: false; only activate if you double-checked all other settings) + ## PSM Rescoring ### `--posterior_probabilities` diff --git a/main.nf b/main.nf index 9cc38c1..559f9da 100644 --- a/main.nf +++ b/main.nf @@ -60,6 +60,10 @@ def helpMessage() { //TODO probably also still some options missing. Try to consolidate them whenever the two search engines share them + Peptide Re-indexing: + --IL_equivalent Should isoleucine and leucine be treated interchangeably? Default: true + --allow_unmatched Ignore unmatched peptides (Default: false; only activate if you double-checked all other settings) + PSM Rescoring: --posterior_probabilities How to calculate posterior probabilities for PSMs: "percolator" = Re-score based on PSM-feature-based SVM and transform distance @@ -544,13 +548,17 @@ process index_peptides { file "*.log" script: + def il = params.IL_equivalent ? '-IL_equivalent' : '' + def allow_um = params.allow_unmatched ? 
'-allow_unmatched' : '' """ PeptideIndexer -in ${id_file} \\ -out ${id_file.baseName}_idx.idXML \\ -threads ${task.cpus} \\ -fasta ${database} \\ -enzyme:name "${enzyme}" \\ - -enzyme:specificity ${pepidx_num_enzyme_termini} + -enzyme:specificity ${pepidx_num_enzyme_termini} \\ + ${il} \\ + ${allow_um} \\ > ${id_file.baseName}_index_peptides.log """ } diff --git a/nextflow.config b/nextflow.config index fb60686..2b85919 100644 --- a/nextflow.config +++ b/nextflow.config @@ -49,6 +49,10 @@ params { max_mods = 3 db_debug = 0 + // PeptideIndexer flags + IL_equivalent = true + allow_unmatched = false + // Percolator flags train_FDR = 0.05 test_FDR = 0.05 From 9aa8e0ac11662e24cc1bcaf8648469d8dac5f37c Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 27 Apr 2020 13:23:32 +0200 Subject: [PATCH 192/374] instead of random UUID, use hash of filepath --- main.nf | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/main.nf b/main.nf index 9cc38c1..1518412 100644 --- a/main.nf +++ b/main.nf @@ -181,7 +181,7 @@ if (!params.sdrf) { ch_spectra = Channel.fromPath(params.spectra, checkIfExists: true) ch_spectra - .multiMap{ it -> id = UUID.randomUUID().toString() + .multiMap{ it -> id = it.md5() comet_settings: msgf_settings: tuple(id, params.fixed_mods, params.variable_mods, @@ -228,7 +228,7 @@ else //TODO use header and reference by col name instead of index ch_sdrf_config_file .splitCsv(skip: 1, sep: '\t') - .multiMap{ row -> id = UUID.randomUUID().toString() + .multiMap{ row -> id = it.md5() comet_settings: msgf_settings: tuple(id, row[2], row[3], From 20e5fad3f5b94dcd0d6a2fe930887b1dca09a96c Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 27 Apr 2020 13:34:48 +0200 Subject: [PATCH 193/374] tostring --- main.nf | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/main.nf b/main.nf index 1518412..352332e 100644 --- a/main.nf +++ b/main.nf @@ -181,7 +181,7 @@ if (!params.sdrf) { ch_spectra = Channel.fromPath(params.spectra, checkIfExists: true) ch_spectra - .multiMap{ it -> id = it.md5() + .multiMap{ it -> id = it.toString().md5() comet_settings: msgf_settings: tuple(id, params.fixed_mods, params.variable_mods, @@ -228,7 +228,7 @@ else //TODO use header and reference by col name instead of index ch_sdrf_config_file .splitCsv(skip: 1, sep: '\t') - .multiMap{ row -> id = it.md5() + .multiMap{ row -> id = it.toString().md5() comet_settings: msgf_settings: tuple(id, row[2], row[3], From 1728f3caf111f212cf63b9be60924ee88ae728a7 Mon Sep 17 00:00:00 2001 From: yperez Date: Mon, 27 Apr 2020 14:39:56 +0100 Subject: [PATCH 194/374] added big-cluster --- conf/big-cluster.config | 59 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+) create mode 100644 conf/big-cluster.config diff --git a/conf/big-cluster.config b/conf/big-cluster.config new file mode 100644 index 0000000..f513ca9 --- /dev/null +++ b/conf/big-cluster.config @@ -0,0 +1,59 @@ +/* + * ------------------------------------------------- + * nf-core/proteomicslfq Nextflow base config file + * ------------------------------------------------- + * A 'blank slate' config file, appropriate for general + * use on most high performace compute environments. + * Assumes that all software is installed and available + * on the PATH. Runs in `local` mode - all jobs will be + * run on the logged in environment. + * + * This configuration is used for big mzML files and datasets where + * the size of the mzML is higher than 10GB. It also contains parameters + * for error handling. 
For example, exit code 130 is also retried,
+ * since LSF uses it as an exit status.
+ */
+
+process {
+
+  cpus = { check_max( 2 * task.attempt, 'cpus' ) }
+  memory = { check_max( 8.GB * task.attempt, 'memory' ) }
+  time = { check_max( 4.h * task.attempt, 'time' ) }
+
+  errorStrategy = { task.exitStatus in [143,137,104,134,139,130] ? 'retry' : 'finish' }
+  maxRetries = 2
+  maxErrors = '-1'
+
+  // Process-specific resource requirements
+
+  withLabel:process_very_low {
+    cpus = { check_max( 2 * task.attempt, 'cpus' ) }
+    memory = { check_max( 6.GB * task.attempt, 'memory' ) }
+    time = { check_max( 3.h * task.attempt, 'time' ) }
+  }
+  withLabel:process_low {
+    cpus = { check_max( 4 * task.attempt, 'cpus' ) }
+    memory = { check_max( 32.GB * task.attempt, 'memory' ) }
+    time = { check_max( 6.h * task.attempt, 'time' ) }
+  }
+  withLabel:process_medium {
+    cpus = { check_max( 8 * task.attempt, 'cpus' ) }
+    memory = { check_max( 64.GB * task.attempt, 'memory' ) }
+    time = { check_max( 8.h * task.attempt, 'time' ) }
+  }
+  withLabel:process_high {
+    cpus = { check_max( 12 * task.attempt, 'cpus' ) }
+    memory = { check_max( 100.GB * task.attempt, 'memory' ) }
+    time = { check_max( 10.h * task.attempt, 'time' ) }
+  }
+  withLabel:process_long {
+    time = { check_max( 20.h * task.attempt, 'time' ) }
+  }
+  withLabel:process_single_thread {
+    cpus = { check_max( 1 * task.attempt, 'cpus' ) }
+  }
+  withName:get_software_versions {
+    cache = false
+  }
+}
+

From fe685fa19d615ab48a649993391da26fef1ebc66 Mon Sep 17 00:00:00 2001
From: jpfeuffer
Date: Mon, 27 Apr 2020 16:10:53 +0200
Subject: [PATCH 195/374] Update main.nf

---
 main.nf | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/main.nf b/main.nf
index 352332e..3117096 100644
--- a/main.nf
+++ b/main.nf
@@ -228,7 +228,7 @@ else
   //TODO use header and reference by col name instead of index
   ch_sdrf_config_file
   .splitCsv(skip: 1, sep: '\t')
-  .multiMap{ row -> id = it.toString().md5()
+  .multiMap{ row -> id = row.toString().md5()
               comet_settings: msgf_settings: tuple(id,
                               row[2],
                               row[3],

From b1171362e1216d54980012e9e557fe465cf5348d Mon Sep 17 00:00:00 2001
From: yperez
Date: Mon, 27 Apr 2020 15:29:23 +0100
Subject: [PATCH 196/374] refactor name

---
 conf/{big-cluster.config => big-nodes.config} | 2 ++
 1 file changed, 2 insertions(+)
 rename conf/{big-cluster.config => big-nodes.config} (97%)

diff --git a/conf/big-cluster.config b/conf/big-nodes.config
similarity index 97%
rename from conf/big-cluster.config
rename to conf/big-nodes.config
index f513ca9..4d0cbf4 100644
--- a/conf/big-cluster.config
+++ b/conf/big-nodes.config
@@ -12,6 +12,8 @@
 * the size of the mzML is higher than 10GB. It also contains parameters
 * for error handling. For example, exit code 130 is also retried,
 * since LSF uses it as an exit status.
+ *
+ * This config is mainly used in the PRIDE LSF cluster.
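+ *
+ * An invocation could look like the following (illustrative, not prescriptive):
+ *   nextflow run nf-core/proteomicslfq -profile docker -c conf/big-nodes.config --sdrf mydata.sdrf.tsv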
*/

From 65941991890f862c697f358c99c887254e387c42 Mon Sep 17 00:00:00 2001
From: yperez
Date: Mon, 27 Apr 2020 15:39:59 +0100
Subject: [PATCH 197/374] refactor name

---
 conf/big-nodes.config | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/conf/big-nodes.config b/conf/big-nodes.config
index 4d0cbf4..f92f903 100644
--- a/conf/big-nodes.config
+++ b/conf/big-nodes.config
@@ -1,14 +1,10 @@
 /*
 * -------------------------------------------------
- * nf-core/proteomicslfq Nextflow base config file
+ * nf-core/proteomicslfq Nextflow big-nodes config file
 * -------------------------------------------------
- * A 'blank slate' config file, appropriate for general
- * use on most high performace compute environments.
- * Assumes that all software is installed and available
- * on the PATH. Runs in `local` mode - all jobs will be
- * run on the logged in environment.
- *
- * This configuration is used for big mzML files and datasets where
+ * A 'big-nodes' config file, appropriate for general
+ * use on most high performance compute environments with datasets containing
+ * big RAW files. This configuration is used for big mzML files and datasets where
 * the size of the mzML is higher than 10GB. It also contains parameters
 * for error handling. For example, exit code 130 is also retried,
 * since LSF uses it as an exit status.

From 796584adace5cd0d43d7512083dae32a07e62b80 Mon Sep 17 00:00:00 2001
From: jpfeuffer
Date: Tue, 28 Apr 2020 14:10:25 +0200
Subject: [PATCH 198/374] Use guessing rule to convert from ppm frag_tol to Comet's bin_tol

---
 main.nf | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/main.nf b/main.nf
index 5fcff82..65c348b 100644
--- a/main.nf
+++ b/main.nf
@@ -508,6 +508,17 @@ process search_engine_comet {
   //TODO we currently ignore the activation_method param to leave the default "ALL" for max. compatibility
   script:
+    if (frag_tol_unit == "ppm") {
+        // Note: This uses an arbitrary rule to decide if it was hi-res or low-res
+        // and uses Comet's defaults for bin size, in case unsupported unit "ppm" was given.
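+        // For intuition (illustrative numbers): 20 ppm at m/z 1000 is 0.02 Da and
+        // falls into the hi-res branch below, while a 500 ppm value (~0.5 Da at
+        // m/z 1000) would trigger the low-res defaults.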
+ if (frag_tol.toDouble() < 50) { + bin_tol = "0.01" + } else { + bin_tol = "1.005" + } + } else { + bin_tol = frag_tol + } """ CometAdapter -in ${mzml_file} \\ -out ${mzml_file.baseName}.idXML \\ @@ -524,7 +535,7 @@ process search_engine_comet { -max_variable_mods_in_peptide ${params.max_mods} \\ -precursor_mass_tolerance ${prec_tol} \\ -precursor_error_units ${prec_tol_unit} \\ - -fragment_bin_tolerance ${frag_tol} \\ + -fragment_bin_tolerance ${bin_tol} \\ -debug ${params.db_debug} \\ > ${mzml_file.baseName}_comet.log """ From 2b0d0e98157523666f55dba0d3cd3ffc8442138f Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 28 Apr 2020 15:30:17 +0200 Subject: [PATCH 199/374] More elaborate conversion --- main.nf | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/main.nf b/main.nf index 65c348b..def851f 100644 --- a/main.nf +++ b/main.nf @@ -46,7 +46,7 @@ def helpMessage() { --precursor_mass_tolerance Mass tolerance of precursor mass --precursor_mass_tolerance_unit Da or ppm --fragment_mass_tolerance Mass tolerance for fragment masses (currently only controls Comets fragment_bin_tol) - --fragment_mass_tolerance_unit Da or ppm (currently always ppm) + --fragment_mass_tolerance_unit Da or ppm (currently always Da) --allowed_missed_cleavages Allowed missed cleavages --min_precursor_charge Minimum precursor ion charge --max_precursor_charge Maximum precursor ion charge @@ -512,19 +512,30 @@ process search_engine_comet { // Note: This uses an arbitrary rule to decide if it was hi-res or low-res // and uses Comet's defaults for bin size, in case unsupported unit "ppm" was given. if (frag_tol.toDouble() < 50) { - bin_tol = "0.01" + bin_tol = "0.03" + bin_offset = "0.0" + if (!params.instrument) + inst = "high_res" } else { - bin_tol = "1.005" + bin_tol = "1.0005" + bin_offset = "0.4" + if (!params.instrument) + inst = "low_res" } + log.warn "The chosen search engine Comet does not support ppm fragment tolerances. We guessed a " + inst + + " instrument and set the fragment_bin_tolerance to " + bin_tol } else { bin_tol = frag_tol + bin_offset = frag_tol.toDouble() < 0.1 ? "0.0" : "0.4" + if (!params.instrument) + inst = frag_tol.toDouble() < 0.1 ? "high_res" : "low_res" } """ CometAdapter -in ${mzml_file} \\ -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ -database "${database}" \\ - -instrument ${params.instrument} \\ + -instrument ${inst} \\ -allowed_missed_cleavages ${params.allowed_missed_cleavages} \\ -num_hits ${params.num_hits} \\ -num_enzyme_termini ${params.num_enzyme_termini} \\ @@ -536,6 +547,7 @@ process search_engine_comet { -precursor_mass_tolerance ${prec_tol} \\ -precursor_error_units ${prec_tol_unit} \\ -fragment_bin_tolerance ${bin_tol} \\ + -fragment_bin_offset ${bin_offset} \\ -debug ${params.db_debug} \\ > ${mzml_file.baseName}_comet.log """ From 881efd75b5fb2236d7c94ef23a4e30b8665e6ea5 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 28 Apr 2020 17:20:40 +0200 Subject: [PATCH 200/374] allow additional openms peak picking --- main.nf | 58 ++++++++++++++++++++++++++++++++++++++----------- nextflow.config | 4 ++++ 2 files changed, 49 insertions(+), 13 deletions(-) diff --git a/main.nf b/main.nf index 5fcff82..2eba2fc 100644 --- a/main.nf +++ b/main.nf @@ -60,6 +60,10 @@ def helpMessage() { //TODO probably also still some options missing. Try to consolidate them whenever the two search engines share them + Peak picking: + --openms_peakpicking Use the OpenMS PeakPicker to ADDITIONALLY pick the spectra before the search. 
This is usually done + during conversion already. Only activate if something goes wrong. + Peptide Re-indexing: --IL_equivalent Should isoleucine and leucine be treated interchangeably? Default: true --allow_unmatched Ignore unmatched peptides (Default: false; only activate if you double-checked all other settings) @@ -294,16 +298,6 @@ branched_input.mzML //This piece only runs on data that is a.) raw and b.) needs conversion //mzML files will be mixed after this step to provide output for downstream processing - allowing you to even specify mzMLs and RAW files in a mixed mode as input :-) - -//GENERAL TODOS -// - Check why we depend on full filepaths and if that is needed -/* Proposition from nextflow gitter https://gitter.im/nextflow-io/nextflow?at=5e25fabea259cb0f0607a1a1 -* -* unless the specific filenames are important (depends on the tool you're using), I usually use the pattern outlined here: -* https://www.nextflow.io/docs/latest/process.html#multiple-input-files -* e.g: file "?????.mzML" from mzmls_plfq.toSortedList() and ProteomicsLFQ -in *.mzML -ids *.id -*/ - /* * STEP 0.1 - Raw file conversion */ @@ -351,7 +345,15 @@ process mzml_indexing { //Mix the converted raw data with the already supplied mzMLs and push these to the same channels as before -branched_input_mzMLs.inputIndexedMzML.mix(mzmls_converted).mix(mzmls_indexed).into{mzmls_comet; mzmls_msgf; mzmls_plfq} +if (params.openms_peakpicking) +{ + branched_input_mzMLs.inputIndexedMzML.mix(mzmls_converted).mix(mzmls_indexed).set{mzmls_pp} +} +else +{ + branched_input_mzMLs.inputIndexedMzML.mix(mzmls_converted).mix(mzmls_indexed).into{mzmls_comet; mzmls_msgf; mzmls_plfq} +} + //Fill the channels with empty Channels in case that we want to add decoys. Otherwise fill with output from database. @@ -387,8 +389,8 @@ process generate_decoy_database { """ } -// Doesnt work. Py script needs all the inputs to be together in a folder -// Wont work with nextflow. It needs to accept a list of paths for the inputs!! +// Doesnt work yet. Maybe point the script to the workspace? +// All the files should be there after collecting. 
//process generate_simple_exp_design_file { // publishDir "${params.outdir}", mode: 'copy' // input: @@ -406,6 +408,35 @@ process generate_decoy_database { // """ //} +process openms_peakpicker { + + label 'process_low' + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + + input: + tuple mzml_id, path(mzml_file) from mzmls_pp + + when: + params.openms_peakpicking + + output: + set mzml_id, file("${mzml_file.baseName}_picked.mzML") into mzmls_comet, mzmls_msgf, mzmls_plfq + file "*.log" + + script: + // TODO maybe allow specifying ms-levels and inmemory + """ + PeakPickerHiRes -in ${mzml_file} \\ + -out ${mzml_file.baseName}_picked.mzML \\ + -threads ${task.cpus} \\ + -debug ${params.pp_debug} \\ + -processOption lowmemory \\ + > ${mzml_file.baseName}_msgf.log + """ +} + + if (params.enzyme == "unspecific cleavage") { params.num_enzyme_termini == "none" @@ -461,6 +492,7 @@ process search_engine_msgf { MSGFPlusAdapter -in ${mzml_file} \\ -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ + -java_memory ${task.memory.toMega()} \\ -database "${database}" \\ -instrument ${params.instrument} \\ -protocol "${params.protocol}" \\ diff --git a/nextflow.config b/nextflow.config index 6c056d6..b3e2c85 100644 --- a/nextflow.config +++ b/nextflow.config @@ -27,6 +27,10 @@ params { decoy_affix = 'DECOY_' affix_type = 'prefix' + // peak picking if used + openms_peakpicking = false + pp_debug = 0 + // shared search engine parameters enzyme = 'Trypsin' num_enzyme_termini = 'fully' From e59432094abb19e0332dbff57a8a23a25365ae47 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 28 Apr 2020 17:41:31 +0200 Subject: [PATCH 201/374] Added better defaults --- main.nf | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/main.nf b/main.nf index def851f..30e0afd 100644 --- a/main.nf +++ b/main.nf @@ -43,15 +43,15 @@ def helpMessage() { --num_hits Number of peptide hits per spectrum (PSMs) in output file (default: '1') --fixed_mods Fixed modifications ('Carbamidomethyl (C)', see OpenMS modifications) --variable_mods Variable modifications ('Oxidation (M)', see OpenMS modifications) - --precursor_mass_tolerance Mass tolerance of precursor mass - --precursor_mass_tolerance_unit Da or ppm - --fragment_mass_tolerance Mass tolerance for fragment masses (currently only controls Comets fragment_bin_tol) - --fragment_mass_tolerance_unit Da or ppm (currently always Da) - --allowed_missed_cleavages Allowed missed cleavages - --min_precursor_charge Minimum precursor ion charge - --max_precursor_charge Maximum precursor ion charge - --min_peptide_length Minimum peptide length to consider - --max_peptide_length Maximum peptide length to consider + --precursor_mass_tolerance Mass tolerance of precursor mass (default: 5) + --precursor_mass_tolerance_unit Da or ppm (default: ppm) + --fragment_mass_tolerance Mass tolerance for fragment masses (currently only controls Comets fragment_bin_tol) (default: 0.03) + --fragment_mass_tolerance_unit Da or ppm (default: Da) + --allowed_missed_cleavages Allowed missed cleavages (default: 2) + --min_precursor_charge Minimum precursor ion charge (default: 2) + --max_precursor_charge Maximum precursor ion charge (default: 4) + --min_peptide_length Minimum peptide length to consider (default: 6) + --max_peptide_length Maximum peptide length to consider (default: 40) --instrument Type of instrument that generated the data (currently only 'high_res' [default] and 'low_res' supported) --protocol Used labeling or enrichment protocol (if 
any) --fragment_method Used fragmentation method (currently unused since we let the search engines consider all MS2 spectra and let them determine from the spectrum metadata) From aaa018c0ae56f5fbd562a0f7778d336fb488ded8 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 28 Apr 2020 17:49:53 +0200 Subject: [PATCH 202/374] fix missing channel when deactivated --- main.nf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/main.nf b/main.nf index 2eba2fc..089468b 100644 --- a/main.nf +++ b/main.nf @@ -352,10 +352,10 @@ if (params.openms_peakpicking) else { branched_input_mzMLs.inputIndexedMzML.mix(mzmls_converted).mix(mzmls_indexed).into{mzmls_comet; mzmls_msgf; mzmls_plfq} + mzmls_pp = Channel.empty() } - //Fill the channels with empty Channels in case that we want to add decoys. Otherwise fill with output from database. (searchengine_in_db_msgf, searchengine_in_db_comet, pepidx_in_db, plfq_in_db) = ( params.add_decoys ? [ Channel.empty(), Channel.empty(), Channel.empty(), Channel.empty() ] From ef925127a65cec1ca84da897e87b901fcc09b2bd Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 28 Apr 2020 17:53:03 +0200 Subject: [PATCH 203/374] adapt defaults accordingly --- nextflow.config | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/nextflow.config b/nextflow.config index 6c056d6..b998fab 100644 --- a/nextflow.config +++ b/nextflow.config @@ -35,8 +35,8 @@ params { precursor_mass_tolerance_unit = 'ppm' fixed_mods = 'Carbamidomethyl (C)' variable_mods = 'Oxidation (M)' - fragment_mass_tolerance = 5 - fragment_mass_tolerance_unit = 'ppm' + fragment_mass_tolerance = 0.03 + fragment_mass_tolerance_unit = 'Da' fragment_method = 'HCD' //currently unused. hard to find a good logic to beat the defaults isotope_error_range = '0,1' instrument = 'high_res' From c80957dfcde1dbc8d16d09ea694496b16a003505 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 28 Apr 2020 17:56:21 +0200 Subject: [PATCH 204/374] rename and mix channels --- main.nf | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/main.nf b/main.nf index 089468b..45bb7ab 100644 --- a/main.nf +++ b/main.nf @@ -421,7 +421,7 @@ process openms_peakpicker { params.openms_peakpicking output: - set mzml_id, file("${mzml_file.baseName}_picked.mzML") into mzmls_comet, mzmls_msgf, mzmls_plfq + set mzml_id, file("${mzml_file.baseName}_picked.mzML") into mzmls_comet_picked, mzmls_msgf_picked, mzmls_plfq_picked file "*.log" script: @@ -469,7 +469,11 @@ process search_engine_msgf { errorStrategy 'terminate' input: - tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme from searchengine_in_db_msgf.mix(searchengine_in_db_decoy_msgf).combine(mzmls_msgf.join(ch_sdrf_config.msgf_settings)) + tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme + from searchengine_in_db_msgf.mix(searchengine_in_db_decoy_msgf) + .combine( + mzmls_msgf.mix(mzmls_msgf_picked) + .join(ch_sdrf_config.msgf_settings)) // This was another way of handling the combination //file database from searchengine_in_db.mix(searchengine_in_db_decoy) @@ -525,7 +529,11 @@ process search_engine_comet { // I actually dont know, where else this would be needed. 
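 // (Toy illustration of the `combine`/`join` pattern in the input below -- not
 //  pipeline code: `join` pairs tuples by their first element, `combine` then
 //  crosses in the database:
 //    Channel.of(['s1','a.mzML']).join(Channel.of(['s1','cfg'])).view()  // [s1, a.mzML, cfg]
 //  )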
errorStrategy 'terminate' input: - tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet).combine(mzmls_comet.join(ch_sdrf_config.comet_settings)) + tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme + from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet) + .combine( + mzmls_comet.mix(mzmls_comet_picked) + .join(ch_sdrf_config.comet_settings)) //or //file database from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet) @@ -920,7 +928,7 @@ process proteomicslfq { publishDir "${params.outdir}/proteomics_lfq", mode: 'copy' input: - file mzmls from mzmls_plfq.map{it[1]}.toSortedList({ a, b -> b.baseName <=> a.baseName }) + file mzmls from mzmls_plfq.mix(mzmls_plfq_picked).map{it[1]}.toSortedList({ a, b -> b.baseName <=> a.baseName }) file id_files from id_files_idx_feat_perc_fdr_filter_switched .mix(id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch) .toSortedList({ a, b -> b.baseName <=> a.baseName }) From 263e6870a350df22ac7b89f5337cd73a41dfb309 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 28 Apr 2020 18:03:58 +0200 Subject: [PATCH 205/374] rename and mix channels2 --- main.nf | 1 + 1 file changed, 1 insertion(+) diff --git a/main.nf b/main.nf index 45bb7ab..fac9e4e 100644 --- a/main.nf +++ b/main.nf @@ -348,6 +348,7 @@ process mzml_indexing { if (params.openms_peakpicking) { branched_input_mzMLs.inputIndexedMzML.mix(mzmls_converted).mix(mzmls_indexed).set{mzmls_pp} + [mzmls_comet, mzmls_msgf, mzmls_plfq] = [Channel.empty(), Channel.empty(), Channel.empty()] } else { From ab4a7e479482a8eaf8be2b362e3ab94405f3e830 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 28 Apr 2020 18:08:51 +0200 Subject: [PATCH 206/374] well now newlines --- main.nf | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/main.nf b/main.nf index fac9e4e..b1479ab 100644 --- a/main.nf +++ b/main.nf @@ -470,11 +470,7 @@ process search_engine_msgf { errorStrategy 'terminate' input: - tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme - from searchengine_in_db_msgf.mix(searchengine_in_db_decoy_msgf) - .combine( - mzmls_msgf.mix(mzmls_msgf_picked) - .join(ch_sdrf_config.msgf_settings)) + tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme from searchengine_in_db_msgf.mix(searchengine_in_db_decoy_msgf).combine(mzmls_msgf.mix(mzmls_msgf_picked).join(ch_sdrf_config.msgf_settings)) // This was another way of handling the combination //file database from searchengine_in_db.mix(searchengine_in_db_decoy) @@ -530,11 +526,7 @@ process search_engine_comet { // I actually dont know, where else this would be needed. 
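 // (Likewise illustrative, not pipeline code: `mix` merges channels item-wise,
 //  which is how picked and unpicked mzMLs reach the same search process:
 //    Channel.of(1, 2).mix(Channel.of(3)).view()   // emits 1, 2, 3 in any order
 //  )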
errorStrategy 'terminate' input: - tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme - from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet) - .combine( - mzmls_comet.mix(mzmls_comet_picked) - .join(ch_sdrf_config.comet_settings)) + tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet).combine(mzmls_comet.mix(mzmls_comet_picked).join(ch_sdrf_config.comet_settings)) //or //file database from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet) From 69429f97ed0b0eeba07335b499f8a1793a76cc9a Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 28 Apr 2020 18:16:33 +0200 Subject: [PATCH 207/374] wrong parentheses? --- main.nf | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/main.nf b/main.nf index b1479ab..b0ec93a 100644 --- a/main.nf +++ b/main.nf @@ -348,7 +348,7 @@ process mzml_indexing { if (params.openms_peakpicking) { branched_input_mzMLs.inputIndexedMzML.mix(mzmls_converted).mix(mzmls_indexed).set{mzmls_pp} - [mzmls_comet, mzmls_msgf, mzmls_plfq] = [Channel.empty(), Channel.empty(), Channel.empty()] + (mzmls_comet, mzmls_msgf, mzmls_plfq) = [Channel.empty(), Channel.empty(), Channel.empty()] } else { @@ -360,7 +360,7 @@ else //Fill the channels with empty Channels in case that we want to add decoys. Otherwise fill with output from database. (searchengine_in_db_msgf, searchengine_in_db_comet, pepidx_in_db, plfq_in_db) = ( params.add_decoys ? [ Channel.empty(), Channel.empty(), Channel.empty(), Channel.empty() ] - : [ Channel.fromPath(params.database), Channel.fromPath(params.database), Channel.fromPath(params.database), Channel.fromPath(params.database) ] ) + : [ Channel.fromPath(params.database), Channel.fromPath(params.database), Channel.fromPath(params.database), Channel.fromPath(params.database) ] ) //Add decoys if params.add_decoys is set appropriately process generate_decoy_database { From f997004e1822663616c3949074c124eecabc9137 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 28 Apr 2020 18:34:01 +0200 Subject: [PATCH 208/374] allow inmemory --- main.nf | 5 +++-- nextflow.config | 1 + 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/main.nf b/main.nf index b0ec93a..278c5c5 100644 --- a/main.nf +++ b/main.nf @@ -426,13 +426,14 @@ process openms_peakpicker { file "*.log" script: - // TODO maybe allow specifying ms-levels and inmemory + // TODO maybe allow specifying ms-levels + in_mem = params.peakpicker_inmemory ? 
"inmemory" : "lowmemory" """ PeakPickerHiRes -in ${mzml_file} \\ -out ${mzml_file.baseName}_picked.mzML \\ -threads ${task.cpus} \\ -debug ${params.pp_debug} \\ - -processOption lowmemory \\ + -processOption ${in_mem} \\ > ${mzml_file.baseName}_msgf.log """ } diff --git a/nextflow.config b/nextflow.config index b3e2c85..27bb558 100644 --- a/nextflow.config +++ b/nextflow.config @@ -29,6 +29,7 @@ params { // peak picking if used openms_peakpicking = false + peakpicking_inmemory = false pp_debug = 0 // shared search engine parameters From bec609dfb2f6d84923970c58c3e60df3a908d036 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 28 Apr 2020 18:54:57 +0200 Subject: [PATCH 209/374] Update main.nf --- main.nf | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/main.nf b/main.nf index 30e0afd..62ecf99 100644 --- a/main.nf +++ b/main.nf @@ -514,13 +514,11 @@ process search_engine_comet { if (frag_tol.toDouble() < 50) { bin_tol = "0.03" bin_offset = "0.0" - if (!params.instrument) - inst = "high_res" + inst = params.instrument ?: "high_res" } else { bin_tol = "1.0005" bin_offset = "0.4" - if (!params.instrument) - inst = "low_res" + inst = params.instrument ?: "low_res" } log.warn "The chosen search engine Comet does not support ppm fragment tolerances. We guessed a " + inst + " instrument and set the fragment_bin_tolerance to " + bin_tol @@ -528,7 +526,11 @@ process search_engine_comet { bin_tol = frag_tol bin_offset = frag_tol.toDouble() < 0.1 ? "0.0" : "0.4" if (!params.instrument) + { inst = frag_tol.toDouble() < 0.1 ? "high_res" : "low_res" + } else { + inst = params.instrument + } } """ CometAdapter -in ${mzml_file} \\ From 0a1301cea8d0e3e53d7cbc46cd770a3ae52850a6 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 28 Apr 2020 18:56:43 +0200 Subject: [PATCH 210/374] Update nextflow.config --- nextflow.config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nextflow.config b/nextflow.config index b998fab..5f40fe9 100644 --- a/nextflow.config +++ b/nextflow.config @@ -39,7 +39,7 @@ params { fragment_mass_tolerance_unit = 'Da' fragment_method = 'HCD' //currently unused. hard to find a good logic to beat the defaults isotope_error_range = '0,1' - instrument = 'high_res' + instrument = '' //auto-determined from tolerances protocol = 'automatic' min_precursor_charge = 2 max_precursor_charge = 4 From 7ced9ee08a18a640f207fe74d1465fbeeec9d248 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 28 Apr 2020 19:01:16 +0200 Subject: [PATCH 211/374] added auto det. of instr. 
for MSGF too --- main.nf | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/main.nf b/main.nf index 62ecf99..ae5fb48 100644 --- a/main.nf +++ b/main.nf @@ -457,12 +457,18 @@ process search_engine_msgf { else if (enzyme == 'Chymotrypsin') enzyme = 'Chymotrypsin/P' else if (enzyme == 'Lys-C') enzyme = 'Lys-C/P' + if ((frag_tol.toDouble() < 50 && frag_tol_unit == "ppm") || (frag_tol.toDouble() < 0.1 && frag_tol_unit == "Da")) + { + inst = params.instrument ?: "high_res" + } else { + inst = params.instrument ?: "low_res" + } """ MSGFPlusAdapter -in ${mzml_file} \\ -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ -database "${database}" \\ - -instrument ${params.instrument} \\ + -instrument ${params.instrument ?: 'high} \\ -protocol "${params.protocol}" \\ -matches_per_spec ${params.num_hits} \\ -min_precursor_charge ${params.min_precursor_charge} \\ From 8771e2c2e2c53fab6c3371812e2d849ee773a2e3 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 28 Apr 2020 19:10:03 +0200 Subject: [PATCH 212/374] oops --- main.nf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/main.nf b/main.nf index ae5fb48..f42b8bd 100644 --- a/main.nf +++ b/main.nf @@ -468,7 +468,7 @@ process search_engine_msgf { -out ${mzml_file.baseName}.idXML \\ -threads ${task.cpus} \\ -database "${database}" \\ - -instrument ${params.instrument ?: 'high} \\ + -instrument ${inst} \\ -protocol "${params.protocol}" \\ -matches_per_spec ${params.num_hits} \\ -min_precursor_charge ${params.min_precursor_charge} \\ From b3dbf5f79fecd4d7eb44b9f53750f9f9da50491a Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 28 Apr 2020 19:29:36 +0200 Subject: [PATCH 213/374] Add description for inmemory --- main.nf | 1 + 1 file changed, 1 insertion(+) diff --git a/main.nf b/main.nf index 278c5c5..9073198 100644 --- a/main.nf +++ b/main.nf @@ -63,6 +63,7 @@ def helpMessage() { Peak picking: --openms_peakpicking Use the OpenMS PeakPicker to ADDITIONALLY pick the spectra before the search. This is usually done during conversion already. Only activate if something goes wrong. + --peakpicker_inmemory Perform OpenMS peakpicking in-memory. Needs at least the size of the mzML file as RAM but is faster. default: false Peptide Re-indexing: --IL_equivalent Should isoleucine and leucine be treated interchangeably? Default: true From bc4346d08d2f7e88f51fd8a80e86f77e62215604 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 28 Apr 2020 19:30:38 +0200 Subject: [PATCH 214/374] unify name --- main.nf | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/main.nf b/main.nf index 9073198..65a9d32 100644 --- a/main.nf +++ b/main.nf @@ -63,7 +63,7 @@ def helpMessage() { Peak picking: --openms_peakpicking Use the OpenMS PeakPicker to ADDITIONALLY pick the spectra before the search. This is usually done during conversion already. Only activate if something goes wrong. - --peakpicker_inmemory Perform OpenMS peakpicking in-memory. Needs at least the size of the mzML file as RAM but is faster. default: false + --peakpicking_inmemory Perform OpenMS peakpicking in-memory. Needs at least the size of the mzML file as RAM but is faster. default: false Peptide Re-indexing: --IL_equivalent Should isoleucine and leucine be treated interchangeably? Default: true @@ -428,7 +428,7 @@ process openms_peakpicker { script: // TODO maybe allow specifying ms-levels - in_mem = params.peakpicker_inmemory ? "inmemory" : "lowmemory" + in_mem = params.peakpicking_inmemory ? 
"inmemory" : "lowmemory" """ PeakPickerHiRes -in ${mzml_file} \\ -out ${mzml_file.baseName}_picked.mzML \\ From f747c5b5d07156be5ff86a9b06537188d27de8e6 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 29 Apr 2020 12:47:28 +0200 Subject: [PATCH 215/374] added MS level option for PP --- main.nf | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/main.nf b/main.nf index bfdf64c..3fa24d5 100644 --- a/main.nf +++ b/main.nf @@ -63,7 +63,8 @@ def helpMessage() { Peak picking: --openms_peakpicking Use the OpenMS PeakPicker to ADDITIONALLY pick the spectra before the search. This is usually done during conversion already. Only activate if something goes wrong. - --peakpicking_inmemory Perform OpenMS peakpicking in-memory. Needs at least the size of the mzML file as RAM but is faster. default: false + --peakpicking_inmemory Perform OpenMS peakpicking in-memory. Needs at least the size of the mzML file as RAM but is faster. default: false + --peakpicking_ms_levels Which MS levels to pick. default: [] which means auto-convert all non-centroided Peptide Re-indexing: --IL_equivalent Should isoleucine and leucine be treated interchangeably? Default: true @@ -427,7 +428,6 @@ process openms_peakpicker { file "*.log" script: - // TODO maybe allow specifying ms-levels in_mem = params.peakpicking_inmemory ? "inmemory" : "lowmemory" """ PeakPickerHiRes -in ${mzml_file} \\ @@ -435,7 +435,8 @@ process openms_peakpicker { -threads ${task.cpus} \\ -debug ${params.pp_debug} \\ -processOption ${in_mem} \\ - > ${mzml_file.baseName}_msgf.log + -algorithm:ms_levels ${params.peakpicking_ms_levels} \\ + > ${mzml_file.baseName}_pp.log """ } From 426ebe003361dfd8630648b320152b3375783118 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Wed, 29 Apr 2020 14:34:04 +0200 Subject: [PATCH 216/374] Update main.nf --- main.nf | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/main.nf b/main.nf index bfdf64c..4082238 100644 --- a/main.nf +++ b/main.nf @@ -1011,7 +1011,9 @@ process msstats { file csv from out_msstats output: - file "*.pdf" + // The generation of the PDFs from MSstats are very unstable, especially with auto-contrasts. + // And users can easily fix anything based on the csv and the included script -> make optional + file "*.pdf" optional true file "*.csv" file "*.log" @@ -1025,7 +1027,7 @@ process msstats { process ptxqc { - label 'process_very_low' + label 'process_low' label 'process_single_thread' publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' From 28a6291c0a18dea6e17c9e0bb50c71f05be6ee17 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Wed, 29 Apr 2020 17:38:42 +0200 Subject: [PATCH 217/374] spelling --- main.nf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/main.nf b/main.nf index 4082238..af4d19b 100644 --- a/main.nf +++ b/main.nf @@ -696,7 +696,7 @@ process percolator { """ ## Percolator does not have a threads parameter. 
Set it via OpenMP env variable, ## to honor threads on clusters - OMP_NUMBER_THREADS=${task.cpus} PercolatorAdapter \\ + OMP_NUM_THREADS=${task.cpus} PercolatorAdapter \\ -in ${id_file} \\ -out ${id_file.baseName}_perc.idXML \\ -threads ${task.cpus} \\ From 1aaa178b6dc8c81300ad51c6a2318edc9caf2868 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Thu, 30 Apr 2020 00:36:22 +0200 Subject: [PATCH 218/374] do not rename mzml in picking --- main.nf | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/main.nf b/main.nf index c79918b..4993b48 100644 --- a/main.nf +++ b/main.nf @@ -424,14 +424,15 @@ process openms_peakpicker { params.openms_peakpicking output: - set mzml_id, file("${mzml_file.baseName}_picked.mzML") into mzmls_comet_picked, mzmls_msgf_picked, mzmls_plfq_picked + set mzml_id, file("out/${mzml_file.baseName}.mzML") into mzmls_comet_picked, mzmls_msgf_picked, mzmls_plfq_picked file "*.log" script: in_mem = params.peakpicking_inmemory ? "inmemory" : "lowmemory" """ + mkdir out PeakPickerHiRes -in ${mzml_file} \\ - -out ${mzml_file.baseName}_picked.mzML \\ + -out out/${mzml_file.baseName}.mzML \\ -threads ${task.cpus} \\ -debug ${params.pp_debug} \\ -processOption ${in_mem} \\ From 448dbe8dcd9b9bdf0a60a614de9d5415824a4b29 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Thu, 30 Apr 2020 00:53:54 +0200 Subject: [PATCH 219/374] default level for pp missing --- nextflow.config | 1 + 1 file changed, 1 insertion(+) diff --git a/nextflow.config b/nextflow.config index 1c6ca72..60c26a9 100644 --- a/nextflow.config +++ b/nextflow.config @@ -30,6 +30,7 @@ params { // peak picking if used openms_peakpicking = false peakpicking_inmemory = false + peakpicking_ms_levels = "[]" // means all/auto pp_debug = 0 // shared search engine parameters From 410a6898a1659ff87891e385530837a9da880515 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Thu, 30 Apr 2020 01:08:41 +0200 Subject: [PATCH 220/374] Update nextflow.config --- nextflow.config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nextflow.config b/nextflow.config index 60c26a9..fc1c9d6 100644 --- a/nextflow.config +++ b/nextflow.config @@ -30,7 +30,7 @@ params { // peak picking if used openms_peakpicking = false peakpicking_inmemory = false - peakpicking_ms_levels = "[]" // means all/auto + peakpicking_ms_levels = '' // means all/auto pp_debug = 0 // shared search engine parameters From 12d0a9563aabd8acbcc014ace7022d97688b47a6 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Thu, 30 Apr 2020 01:10:41 +0200 Subject: [PATCH 221/374] Update main.nf --- main.nf | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/main.nf b/main.nf index 4993b48..ed0e830 100644 --- a/main.nf +++ b/main.nf @@ -429,6 +429,7 @@ process openms_peakpicker { script: in_mem = params.peakpicking_inmemory ? "inmemory" : "lowmemory" + lvls = params.peakpicking_ms_levels ? 
"-algorithm:ms_levels ${params.peakpicking_ms_levels}" : "" """ mkdir out PeakPickerHiRes -in ${mzml_file} \\ @@ -436,7 +437,7 @@ process openms_peakpicker { -threads ${task.cpus} \\ -debug ${params.pp_debug} \\ -processOption ${in_mem} \\ - -algorithm:ms_levels ${params.peakpicking_ms_levels} \\ + ${lvls} \\ > ${mzml_file.baseName}_pp.log """ } From 179927b4d7c1841f0d5c810e5cdc8f6f95b42e3a Mon Sep 17 00:00:00 2001 From: yperez Date: Thu, 30 Apr 2020 10:51:39 +0100 Subject: [PATCH 222/374] CITATIONS has been added, mainly citations: OpenMS, MSstats and ThermoRAWFileParser --- CITATIONS.md | 28 ++++++++++++++++++++++++++++ README.md | 2 ++ 2 files changed, 30 insertions(+) create mode 100644 CITATIONS.md diff --git a/CITATIONS.md b/CITATIONS.md new file mode 100644 index 0000000..83a825d --- /dev/null +++ b/CITATIONS.md @@ -0,0 +1,28 @@ +# nf-core/imcyto: Citations + +## Pipeline tools + +* [Nextflow](https://www.ncbi.nlm.nih.gov/pubmed/28398311/) + > Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311. + +* [OpenMS](https://www.ncbi.nlm.nih.gov/pubmed/27575624/) + > Röst HL., Sachsenberg T., Aiche S., Bielow C., Weisser H., Aicheler F., Andreotti S., Ehrlich HC., Gutenbrunner P., Kenar E., Liang X., Nahnsen S., Nilse L., Pfeuffer J., Rosenberger G., Rurik M., Schmitt U., Veit J., Walzer M., Wojnar D., Wolski WE., Schilling O., Choudhary JS, Malmström L., Aebersold R., Reinert K., Kohlbacher O. (2016). OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nature methods, 13(9), 741–748. https://doi.org/10.1038/nmeth.3959. PubMed PMID: 27575624; PubMed Central PMCID: PMC5617107. + +* [MSstats](https://www.ncbi.nlm.nih.gov/pubmed/24794931/) + > Choi M., Chang CY., Clough T., Broudy D., Killeen T., MacLean B., Vitek O. (2014). MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics (Oxford, England), 30(17), 2524–2526. https://doi.org/10.1093/bioinformatics/btu305. PubMed PMID: 24794931. + +* [ThermoRawFileParser](https://www.ncbi.nlm.nih.gov/pubmed/31755270/) + > Hulstaert N., Shofstahl J., Sachsenberg T., Walzer M., Barsnes H., Martens L., Perez-Riverol, Y. (2020). ThermoRawFileParser: Modular, Scalable, and Cross-Platform RAW File Conversion. Journal of proteome research, 19(1), 537–542. https://doi.org/10.1021/acs.jproteome.9b00328. PubMed PMID: 31755270 + +## Software packaging/containerisation tools + +* [BioContainers](https://www.ncbi.nlm.nih.gov/pubmed/28379341/) + > da Veiga Leprevost F, Grüning BA, Alves Aflitos S, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Vera Alvarez R, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341. + +* [Singularity](https://www.ncbi.nlm.nih.gov/pubmed/28494014/) + > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675. 
+ +* [Conda](https://www.ncbi.nlm.nih.gov/pubmed/29967506/) + > Grüning B., Dale R., Sjödin A., Chapman BA., Rowe J., Tomkins-Tinch CH., Valieris R., Köster J., Bioconda Team (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature methods, 15(7), 475–476. https://doi.org/10.1038/s41592-018-0046-7. PubMed PMID: 29967506. + +* [Docker](https://www.docker.com/) diff --git a/README.md b/README.md index ea81e23..ef8a1c1 100644 --- a/README.md +++ b/README.md @@ -75,3 +75,5 @@ You can cite the `nf-core` publication as follows: > > _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x). > ReadCube: [Full Access Link](https://rdcu.be/b1GjZ) + +An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file. From 1825eb8663f7944fed98fe5abdcaf593888b81be Mon Sep 17 00:00:00 2001 From: yperez Date: Thu, 30 Apr 2020 11:23:10 +0100 Subject: [PATCH 223/374] Update citations --- CITATIONS.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/CITATIONS.md b/CITATIONS.md index 83a825d..daadb68 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -1,4 +1,4 @@ -# nf-core/imcyto: Citations +# nf-core/proteomicslfq: Citations ## Pipeline tools @@ -26,3 +26,4 @@ > Grüning B., Dale R., Sjödin A., Chapman BA., Rowe J., Tomkins-Tinch CH., Valieris R., Köster J., Bioconda Team (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature methods, 15(7), 475–476. https://doi.org/10.1038/s41592-018-0046-7. PubMed PMID: 29967506. * [Docker](https://www.docker.com/) + > Merkel D. (2014). Docker: lightweight Linux containers for consistent development and deployment. Linux journal, 2014(239), 2. From 5780f5c3ff695533eeb09337032ca04accfa0133 Mon Sep 17 00:00:00 2001 From: yperez Date: Thu, 30 Apr 2020 11:28:41 +0100 Subject: [PATCH 224/374] DOI fixed --- CITATIONS.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/CITATIONS.md b/CITATIONS.md index daadb68..05e087c 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -6,13 +6,13 @@ > Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311. * [OpenMS](https://www.ncbi.nlm.nih.gov/pubmed/27575624/) - > Röst HL., Sachsenberg T., Aiche S., Bielow C., Weisser H., Aicheler F., Andreotti S., Ehrlich HC., Gutenbrunner P., Kenar E., Liang X., Nahnsen S., Nilse L., Pfeuffer J., Rosenberger G., Rurik M., Schmitt U., Veit J., Walzer M., Wojnar D., Wolski WE., Schilling O., Choudhary JS, Malmström L., Aebersold R., Reinert K., Kohlbacher O. (2016). OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nature methods, 13(9), 741–748. https://doi.org/10.1038/nmeth.3959. PubMed PMID: 27575624; PubMed Central PMCID: PMC5617107. + > Röst HL., Sachsenberg T., Aiche S., Bielow C., Weisser H., Aicheler F., Andreotti S., Ehrlich HC., Gutenbrunner P., Kenar E., Liang X., Nahnsen S., Nilse L., Pfeuffer J., Rosenberger G., Rurik M., Schmitt U., Veit J., Walzer M., Wojnar D., Wolski WE., Schilling O., Choudhary JS, Malmström L., Aebersold R., Reinert K., Kohlbacher O. (2016). OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nature methods, 13(9), 741–748. doi: 10.1038/nmeth.3959. PubMed PMID: 27575624; PubMed Central PMCID: PMC5617107. 
* [MSstats](https://www.ncbi.nlm.nih.gov/pubmed/24794931/) - > Choi M., Chang CY., Clough T., Broudy D., Killeen T., MacLean B., Vitek O. (2014). MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics (Oxford, England), 30(17), 2524–2526. https://doi.org/10.1093/bioinformatics/btu305. PubMed PMID: 24794931. + > Choi M., Chang CY., Clough T., Broudy D., Killeen T., MacLean B., Vitek O. (2014). MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics (Oxford, England), 30(17), 2524–2526. doi: 10.1093/bioinformatics/btu305. PubMed PMID: 24794931. * [ThermoRawFileParser](https://www.ncbi.nlm.nih.gov/pubmed/31755270/) - > Hulstaert N., Shofstahl J., Sachsenberg T., Walzer M., Barsnes H., Martens L., Perez-Riverol, Y. (2020). ThermoRawFileParser: Modular, Scalable, and Cross-Platform RAW File Conversion. Journal of proteome research, 19(1), 537–542. https://doi.org/10.1021/acs.jproteome.9b00328. PubMed PMID: 31755270 + > Hulstaert N., Shofstahl J., Sachsenberg T., Walzer M., Barsnes H., Martens L., Perez-Riverol, Y. (2020). ThermoRawFileParser: Modular, Scalable, and Cross-Platform RAW File Conversion. Journal of proteome research, 19(1), 537–542. doi: 10.1021/acs.jproteome.9b00328. PubMed PMID: 31755270 ## Software packaging/containerisation tools @@ -23,7 +23,7 @@ > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675. * [Conda](https://www.ncbi.nlm.nih.gov/pubmed/29967506/) - > Grüning B., Dale R., Sjödin A., Chapman BA., Rowe J., Tomkins-Tinch CH., Valieris R., Köster J., Bioconda Team (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature methods, 15(7), 475–476. https://doi.org/10.1038/s41592-018-0046-7. PubMed PMID: 29967506. + > Grüning B., Dale R., Sjödin A., Chapman BA., Rowe J., Tomkins-Tinch CH., Valieris R., Köster J., Bioconda Team (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature methods, 15(7), 475–476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506. * [Docker](https://www.docker.com/) > Merkel D. (2014). Docker: lightweight Linux containers for consistent development and deployment. Linux journal, 2014(239), 2. From 2644297f07dce02ad75db37f328dde4f816dae50 Mon Sep 17 00:00:00 2001 From: yperez Date: Thu, 30 Apr 2020 14:39:51 +0100 Subject: [PATCH 225/374] Added Comet, MS-GF+ and PTXQC --- CITATIONS.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/CITATIONS.md b/CITATIONS.md index 05e087c..ab24dc9 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -14,6 +14,15 @@ * [ThermoRawFileParser](https://www.ncbi.nlm.nih.gov/pubmed/31755270/) > Hulstaert N., Shofstahl J., Sachsenberg T., Walzer M., Barsnes H., Martens L., Perez-Riverol, Y. (2020). ThermoRawFileParser: Modular, Scalable, and Cross-Platform RAW File Conversion. Journal of proteome research, 19(1), 537–542. doi: 10.1021/acs.jproteome.9b00328. PubMed PMID: 31755270 +* [Comet](https://www.ncbi.nlm.nih.gov/pubmed/23148064/) + > Eng JK., Jahan TA., Hoopmann MR. (2013). Comet: an open-source MS/MS sequence database search tool. Proteomics, 13(1), 22–24. doi: 10.1002/pmic.201200439. PubMed PMID: 23148064 + +* [MS-GF+](https://www.ncbi.nlm.nih.gov/pubmed/25358478/) + > Kim S., Pevzner PA. (2014). 
MS-GF+ makes progress towards a universal database search tool for proteomics. Nature communications, 5, 5277. doi: 10.1038/ncomms6277. PubMed PMID: 25358478; PubMed Central PMCID: PMC5036525 + +* [PTXQC](https://www.ncbi.nlm.nih.gov/pubmed/26653327/) + > Bielow C., Mastrobuoni G., Kempa S. (2016). Proteomics Quality Control: Quality Control Software for MaxQuant Results. Journal of proteome research, 15(3), 777–787. doi: 10.1021/acs.jproteome.5b00780. PubMed PMID: 26653327 + ## Software packaging/containerisation tools * [BioContainers](https://www.ncbi.nlm.nih.gov/pubmed/28379341/) From 755e82b462a07c4476080f26ce9ae5576ccdb22a Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 4 May 2020 12:18:40 +0200 Subject: [PATCH 226/374] started on output description --- docs/output.md | 63 +++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 60 insertions(+), 3 deletions(-) diff --git a/docs/output.md b/docs/output.md index fc5bf00..fc198f0 100644 --- a/docs/output.md +++ b/docs/output.md @@ -1,10 +1,67 @@ # nf-core/proteomicslfq: Output -This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline. - - +This document describes the output produced by the pipeline. ## Pipeline overview The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps: + +* (optional) Conversion of spectra data to indexedMzML: Using ThermoRawFileParser if Thermo Raw or using OpenMS' FileConverter if just an index is missing +* (optional) Decoy database generation for the provided DB (fasta) with OpenMS +* Database search with either MSGF+ or Comet through OpenMS adapters +* Re-mapping potentially identified peptides to the database for consistency and error-checking (using OpenMS' PeptideIndexer) +* (Intermediate score switching steps to use appropriate scores for the next step) +* PSM rescoring using PSMFeatureExtractor and Percolator or a PeptideProphet-like distribution fitting approach in OpenMS +* (Intermediate score switching steps to use appropriate scores for the next step) +* PSM/Peptide-level FDR filtering +* Protein inference and labelfree quantification based on MS1 feature detection, alignment and integration with OpenMS' ProteomicsLFQ + +## Output + +Output is by default written to the $NXF_WORKSPACE/results folder. You can change that with TODO +The output consists of the following folders: + +results +├── ids +│   └── [${infile}\*.idXML](#identifications) +├── logs +│   └── ... 
+├── msstats +│   ├── [ComparisonPlot.pdf](#msstats-plots) +│   ├── [VolcanoPlot.pdf](#msstats-plots) +│   ├── [Heatmap.pdf](#msstats-plots) +│   └── [msstats_results.csv](#msstats-table) +├── pipeline_info +│   └── [...](#nextflow-pipeline-info) +├── proteomics_lfq +│   ├── [debug_*.idXML](#debug-output) +│   ├── [out.consensusXML](#consenusxml) +│   ├── [out.csv](#msstats-ready-quantity-table) +│   └── [out.mzTab](#mztab) +└── ptxqc + ├── [report_v1.0.2_out.yaml](#ptxqc-yaml-config) + ├── [report_v1.0.2_out_${hash}.html](#ptxqc-report) + └── [report_v1.0.2_out_${hash}.pdf](#ptxqc-report) + +### Nextflow pipeline info + +### ProteomicsLFQ main output + +#### ConsensusXML + +#### MSstats-redy quantity table + +#### mzTab + +### MSstats output + +#### MSstats table + +#### MSstats plots + +### PTXQC output + +#### PTXQC report + +#### PTXQC yaml config From b461274714f7840a238515e416e436eaa58d4939 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 4 May 2020 12:35:58 +0200 Subject: [PATCH 227/374] test MD --- docs/output.md | 34 +++++++++++++++++++++++++++------- 1 file changed, 27 insertions(+), 7 deletions(-) diff --git a/docs/output.md b/docs/output.md index fc198f0..22cde3f 100644 --- a/docs/output.md +++ b/docs/output.md @@ -23,26 +23,46 @@ Output is by default written to the $NXF_WORKSPACE/results folder. You can chang The output consists of the following folders: results + ├── ids + │   └── [${infile}\*.idXML](#identifications) + ├── logs + │   └── ... + ├── msstats + │   ├── [ComparisonPlot.pdf](#msstats-plots) + │   ├── [VolcanoPlot.pdf](#msstats-plots) + │   ├── [Heatmap.pdf](#msstats-plots) -│   └── [msstats_results.csv](#msstats-table) -├── pipeline_info + +│   └── [msstats\_results.csv](#msstats-table) + +├── pipeline\_info + │   └── [...](#nextflow-pipeline-info) -├── proteomics_lfq -│   ├── [debug_*.idXML](#debug-output) + +├── proteomics\_lfq + +│   ├── [debug\_\*.idXML](#debug-output) + │   ├── [out.consensusXML](#consenusxml) + │   ├── [out.csv](#msstats-ready-quantity-table) + │   └── [out.mzTab](#mztab) + └── ptxqc - ├── [report_v1.0.2_out.yaml](#ptxqc-yaml-config) - ├── [report_v1.0.2_out_${hash}.html](#ptxqc-report) - └── [report_v1.0.2_out_${hash}.pdf](#ptxqc-report) + + ├── [report\_v1.0.2\_out.yaml](#ptxqc-yaml-config) + + ├── [report\_v1.0.2\_out\_${hash}.html](#ptxqc-report) + + └── [report\_v1.0.2\_out\_${hash}.pdf](#ptxqc-report) ### Nextflow pipeline info From a13960fa77722fea023fa2cb1b07a224863dd810 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 4 May 2020 12:38:35 +0200 Subject: [PATCH 228/374] use lists --- docs/output.md | 59 +++++++++++++++++--------------------------------- 1 file changed, 20 insertions(+), 39 deletions(-) diff --git a/docs/output.md b/docs/output.md index 22cde3f..cbd42ea 100644 --- a/docs/output.md +++ b/docs/output.md @@ -24,45 +24,26 @@ The output consists of the following folders: results -├── ids - -│   └── [${infile}\*.idXML](#identifications) - -├── logs - -│   └── ... 
- -├── msstats - -│   ├── [ComparisonPlot.pdf](#msstats-plots) - -│   ├── [VolcanoPlot.pdf](#msstats-plots) - -│   ├── [Heatmap.pdf](#msstats-plots) - -│   └── [msstats\_results.csv](#msstats-table) - -├── pipeline\_info - -│   └── [...](#nextflow-pipeline-info) - -├── proteomics\_lfq - -│   ├── [debug\_\*.idXML](#debug-output) - -│   ├── [out.consensusXML](#consenusxml) - -│   ├── [out.csv](#msstats-ready-quantity-table) - -│   └── [out.mzTab](#mztab) - -└── ptxqc - - ├── [report\_v1.0.2\_out.yaml](#ptxqc-yaml-config) - - ├── [report\_v1.0.2\_out\_${hash}.html](#ptxqc-report) - - └── [report\_v1.0.2\_out\_${hash}.pdf](#ptxqc-report) +* ids + * [${infile}\*.idXML](#identifications) +* logs + * ... +* msstats + * [ComparisonPlot.pdf](#msstats-plots) + * [VolcanoPlot.pdf](#msstats-plots) + * [Heatmap.pdf](#msstats-plots) + * [msstats\_results.csv](#msstats-table) +* pipeline\_info + * [...](#nextflow-pipeline-info) +* proteomics\_lfq + * [debug\_\*.idXML](#debug-output) + * [out.consensusXML](#consenusxml) + * [out.csv](#msstats-ready-quantity-table) + * [out.mzTab](#mztab) +* ptxqc + * [report\_v1.0.2\_out.yaml](#ptxqc-yaml-config) + * [report\_v1.0.2\_out\_${hash}.html](#ptxqc-report) + * [report\_v1.0.2\_out\_${hash}.pdf](#ptxqc-report) ### Nextflow pipeline info From 124668d45331e1aa8bc41170e954fe4248da9020 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 4 May 2020 12:39:38 +0200 Subject: [PATCH 229/374] number items --- docs/output.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/output.md b/docs/output.md index cbd42ea..a1a272e 100644 --- a/docs/output.md +++ b/docs/output.md @@ -7,15 +7,15 @@ This document describes the output produced by the pipeline. The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps: -* (optional) Conversion of spectra data to indexedMzML: Using ThermoRawFileParser if Thermo Raw or using OpenMS' FileConverter if just an index is missing -* (optional) Decoy database generation for the provided DB (fasta) with OpenMS -* Database search with either MSGF+ or Comet through OpenMS adapters -* Re-mapping potentially identified peptides to the database for consistency and error-checking (using OpenMS' PeptideIndexer) -* (Intermediate score switching steps to use appropriate scores for the next step) -* PSM rescoring using PSMFeatureExtractor and Percolator or a PeptideProphet-like distribution fitting approach in OpenMS -* (Intermediate score switching steps to use appropriate scores for the next step) -* PSM/Peptide-level FDR filtering -* Protein inference and labelfree quantification based on MS1 feature detection, alignment and integration with OpenMS' ProteomicsLFQ +1. (optional) Conversion of spectra data to indexedMzML: Using ThermoRawFileParser if Thermo Raw or using OpenMS' FileConverter if just an index is missing +1. (optional) Decoy database generation for the provided DB (fasta) with OpenMS +1. Database search with either MSGF+ or Comet through OpenMS adapters +1. Re-mapping potentially identified peptides to the database for consistency and error-checking (using OpenMS' PeptideIndexer) +1. (Intermediate score switching steps to use appropriate scores for the next step) +1. PSM rescoring using PSMFeatureExtractor and Percolator or a PeptideProphet-like distribution fitting approach in OpenMS +1. (Intermediate score switching steps to use appropriate scores for the next step) +1. PSM/Peptide-level FDR filtering +1. 
Protein inference and labelfree quantification based on MS1 feature detection, alignment and integration with OpenMS' ProteomicsLFQ

 ## Output

From 2c0d57d77638dcb328ae409ee387af32e22e555c Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Mon, 4 May 2020 12:55:24 +0200
Subject: [PATCH 230/374] fill sections

---
 docs/output.md | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/docs/output.md b/docs/output.md
index a1a272e..18fdf0f 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -47,22 +47,54 @@ results

 ### Nextflow pipeline info

+Information about the execution and structure of the pipeline. If run with the corresponding nextflow parameters,
+it can include things like a visual representation of the pipeline and/or an HTML report on the execution with
+info on memory consumption, CPU usage and runtimes.
+
 ### ProteomicsLFQ main output

+The `proteomics\_lfq` folder contains the output of the pipeline without any statistical postprocessing.
+It is available in three different formats.
+
 #### ConsensusXML

+A consensusXML file (TODO link to schema or description) as the closest representation of the internal data
+structures generated by OpenMS. Helpful for debugging and downstream processing with OpenMS tools.
+
 #### MSstats-redy quantity table

+A simple tsv file ready to be read by the OpenMStoMSstats function of the MSstats R package. It should hold
+the same quantities as the consensusXML but rearranged in a "long" table format with additional information
+about the experimental design used by MSstats.
+
 #### mzTab

+A complete mzTab file ready for submission to PRIDE. TODO link to mzTab schema/guide.
+
 ### MSstats output

+The `msstats` folder contains MSstats' post-processed (e.g. imputation, outlier removal) quantities and statistical
+measures of significance for different tested contrasts of the given experimental design. It also includes basic plots of these results.
+The results will only be available if there was more than one condition.
+
 #### MSstats table

+See MSstats vignette.
+
 #### MSstats plots

+See MSstats vignette for Heatmap, VolcanoPlot and ComparisonPlot (per protein).
+
 ### PTXQC output

+If activated, the `ptxqc` folder will contain the report of the PTXQC R package based on the mzTab output of proteomicsLFQ.
+TODO link
+
 #### PTXQC report

+See PTXQC vignette. In the report itself the calculated and visualized QC metrics are actually quite extensively described already.
+
 #### PTXQC yaml config
+
+The default yaml config used to configure the structure of the QC report. In case you need to restructure, please edit this file and 
+re-run PTXQC manually.

From ccdcded4f964df657087ed1602012a046717c832 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Mon, 4 May 2020 14:18:10 +0200
Subject: [PATCH 231/374] trailing space

---
 docs/output.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/output.md b/docs/output.md
index 18fdf0f..80a6fdb 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -96,5 +96,5 @@ See PTXQC vignette. In the report itself the calculated and visualized QC metric
 #### PTXQC yaml config

-The default yaml config used to configure the structure of the QC report. In case you need to restructure, please edit this file and 
+The default yaml config used to configure the structure of the QC report. In case you need to restructure, please edit this file and
 re-run PTXQC manually.
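The folders described in `docs/output.md` above are populated via Nextflow's `publishDir` directive, which `main.nf` attaches to each process. As a minimal sketch of that convention (the process and file names below are invented for illustration; `params.outdir` defaulting to `./results` is an assumption based on the nf-core template):

```groovy
// Sketch: how outputs end up in per-topic subfolders (ids/, logs/, ...)
// under params.outdir, as described in docs/output.md.
process publish_example {

    // Copy matching outputs from the Nextflow work dir into the results tree.
    publishDir "${params.outdir}/ids",  mode: 'copy', pattern: '*.idXML'
    publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log'

    output:
    file "*.idXML"
    file "*.log"

    script:
    """
    touch example.idXML example.log
    """
}
```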
From 6aba7c621b15ae6e26498fa9a3308aa175c79332 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 4 May 2020 16:52:32 +0200 Subject: [PATCH 232/374] missing ID, spelling --- docs/output.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/docs/output.md b/docs/output.md index 80a6fdb..134e415 100644 --- a/docs/output.md +++ b/docs/output.md @@ -51,6 +51,11 @@ Information about the execution and structure of the pipeline. If run with the c it can include things like a visual representation of the pipeline and/or a html report on the execution with info on memory consumption, CPU usage and runtimes. +### Identifications + +Intermediate output for the PSM/peptide-level filtered identifications per raw/mzML file in OpenMS' +internal idXML format. TODO link to schema. + ### ProteomicsLFQ main output The `proteomics\_lfq` folder contains the output of the pipeline without any statistical postprocessing. @@ -61,7 +66,7 @@ And is avaible in three different formats. A consensusXML file (TODO link to schema or description) as the closest representation of the internal data structures generated by OpenMS. Helpful for debugging and downstream processing with OpenMS tools. -#### MSstats-redy quantity table +#### MSstats-ready quantity table A simple tsv file ready to be read by the OpenMStoMSstats function of the MSstats R package. It should hold the same quantities as the consensusXML but rearranged in a "long" table format with additional information From d7c5ae48edc6a6443493e88a63545e0298791215 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 4 May 2020 17:19:32 +0200 Subject: [PATCH 233/374] escape --- docs/output.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/output.md b/docs/output.md index 134e415..26e2b55 100644 --- a/docs/output.md +++ b/docs/output.md @@ -58,7 +58,7 @@ internal idXML format. TODO link to schema. ### ProteomicsLFQ main output -The `proteomics\_lfq` folder contains the output of the pipeline without any statistical postprocessing. +The `proteomics_lfq` folder contains the output of the pipeline without any statistical postprocessing. And is avaible in three different formats. #### ConsensusXML From b28c39e1b2041421db3d1b36628d3c3b30f718f4 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 6 May 2020 10:51:51 +0200 Subject: [PATCH 234/374] first try on consensusID support --- main.nf | 92 +++++++++++++++++++++++++++++++++------------------------ 1 file changed, 53 insertions(+), 39 deletions(-) diff --git a/main.nf b/main.nf index ed0e830..d2de84a 100644 --- a/main.nf +++ b/main.nf @@ -296,7 +296,6 @@ branched_input.mzML //Push raw files through process that does the conversion, everything else directly to downstream Channel with mzMLs - //This piece only runs on data that is a.) raw and b.) needs conversion //mzML files will be mixed after this step to provide output for downstream processing - allowing you to even specify mzMLs and RAW files in a mixed mode as input :-) @@ -358,7 +357,6 @@ else mzmls_pp = Channel.empty() } - //Fill the channels with empty Channels in case that we want to add decoys. Otherwise fill with output from database. (searchengine_in_db_msgf, searchengine_in_db_comet, pepidx_in_db, plfq_in_db) = ( params.add_decoys ? 
[ Channel.empty(), Channel.empty(), Channel.empty(), Channel.empty() ] @@ -442,7 +440,6 @@ process openms_peakpicker { """ } - if (params.enzyme == "unspecific cleavage") { params.num_enzyme_termini == "none" @@ -455,12 +452,8 @@ if (params.num_enzyme_termini == "fully") } /// Search engine -if (params.search_engine == "msgf") -{ - search_engine_score = "SpecEValue" -} else { //comet - search_engine_score = "expect" -} +search_engine_score_msgf = "SpecEValue" +search_engine_score_comet = "expect" process search_engine_msgf { @@ -481,7 +474,7 @@ process search_engine_msgf { //file database from searchengine_in_db.mix(searchengine_in_db_decoy) //each file(mzml_file) from mzmls when: - params.search_engine == "msgf" + params.search_engine.contains("msgf") output: set mzml_id, file("${mzml_file.baseName}.idXML") into id_files_msgf @@ -539,12 +532,8 @@ process search_engine_comet { input: tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet).combine(mzmls_comet.mix(mzmls_comet_picked).join(ch_sdrf_config.comet_settings)) - //or - //file database from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet) - //each file(mzml_file) from mzmls - when: - params.search_engine == "comet" + params.search_engine.contains("comet") output: set mzml_id, file("${mzml_file.baseName}.idXML") into id_files_comet @@ -609,9 +598,6 @@ process index_peptides { input: tuple mzml_id, file(id_file), enzyme, file(database) from id_files_msgf.mix(id_files_comet).join(ch_sdrf_config.idx_settings).combine(pepidx_in_db.mix(pepidx_in_db_decoy)) - //each mzml_id, file(id_file) from id_files_msgf.mix(id_files_comet) - //file database from pepidx_in_db.mix(pepidx_in_db_decoy) - output: set mzml_id, file("${id_file.baseName}_idx.idXML") into id_files_idx_ForPerc, id_files_idx_ForIDPEP file "*.log" @@ -636,7 +622,6 @@ process index_peptides { // --------------------------------------------------------------------- // Branch a) Q-values and PEP from Percolator - process extract_percolator_features { label 'process_very_low' @@ -672,7 +657,7 @@ process percolator { // would be cool to get an estimate by parsing the number of IDs from previous tools. label 'process_medium' //TODO The current percolator version only supports up to 3-fold CV so the following might make sense now - // but in the next version it will have nested CV + // but in the next version it will have nested CV cpus { check_max( 3, 'cpus' ) } publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' @@ -681,33 +666,62 @@ process percolator { set mzml_id, file(id_file) from id_files_idx_feat output: - file "${id_file.baseName}_perc.idXML" into id_files_idx_feat_perc + set mzml_id, file("${id_file.baseName}_perc.idXML") into id_files_idx_feat_perc file "*.log" when: - params.posterior_probabilities == "percolator" + params.posterior_probabilities == "percolator" || params. // NICE-TO-HAVE: the decoy-pattern is automatically detected from PeptideIndexer. // Parse its output and put the correct one here. script: - if (params.klammer && params.description_correct_features == 0) { - log.warn('Klammer was specified, but description of correct features was still 0. 
Please provide a description of correct features greater than 0.') - log.warn('Klammer will be implicitly off!') - } + if (params.klammer && params.description_correct_features == 0) { + log.warn('Klammer was specified, but description of correct features was still 0. Please provide a description of correct features greater than 0.') + log.warn('Klammer will be implicitly off!') + } + + // currently post-processing-tdc is always set since we do not support separate TD databases + """ + ## Percolator does not have a threads parameter. Set it via OpenMP env variable, + ## to honor threads on clusters + OMP_NUM_THREADS=${task.cpus} PercolatorAdapter \\ + -in ${id_file} \\ + -out ${id_file.baseName}_perc.idXML \\ + -threads ${task.cpus} \\ + -subset-max-train ${params.subset_max_train} \\ + -decoy-pattern ${params.decoy_affix} \\ + -post-processing-tdc \\ + > ${id_file.baseName}_percolator.log + """ +} + +process consensusid { + + label 'process_medium' + label 'process_single_thread' + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' + + input: + tuple mzml_id, tuple(id_files_from_ses) from id_files_idx_feat_perc.groupTuple(size: params.search_engine.split(",").size()) + + output: + file "${id_file.baseName}_filter.idXML" into id_files_idx_feat_perc_filter + file "*.log" + + when: + params.search_engine.split(",").size() > 1 + + script: + """ + ConsensusID -in ${id_files_from_ses}.toList().join(" ") \\ + -out ${id_file.baseName}_consensus.idXML \\ + -threads ${task.cpus} \\ + -algorithm best \\ + > ${id_file.baseName}_idfilter.log + """ - // currently post-processing-tdc is always set since we do not support separate TD databases - """ - ## Percolator does not have a threads parameter. Set it via OpenMP env variable, - ## to honor threads on clusters - OMP_NUM_THREADS=${task.cpus} PercolatorAdapter \\ - -in ${id_file} \\ - -out ${id_file.baseName}_perc.idXML \\ - -threads ${task.cpus} \\ - -subset-max-train ${params.subset_max_train} \\ - -decoy-pattern ${params.decoy_affix} \\ - -post-processing-tdc \\ - > ${id_file.baseName}_percolator.log - """ } process idfilter { From 2e660a2182a993a4792b8e3e2fd5f4e0a24ace71 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 6 May 2020 11:15:01 +0200 Subject: [PATCH 235/374] fixes --- main.nf | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/main.nf b/main.nf index 094e50d..01289af 100644 --- a/main.nf +++ b/main.nf @@ -233,7 +233,7 @@ else //TODO use header and reference by col name instead of index ch_sdrf_config_file .splitCsv(skip: 1, sep: '\t') - .multiMap{ row -> id = it.toString().md5() + .multiMap{ row -> id = row.toString().md5() comet_settings: msgf_settings: tuple(id, row[2], row[3], @@ -874,6 +874,11 @@ process idscoreswitcher_idpep_postfilter { """ } +plfq_in_id = params.enable_mod_localization + ? 
Channel.empty() + : id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch_plfq + .mix(id_files_idx_feat_perc_fdr_filter_switched_plfq) + process luciphor { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' @@ -882,15 +887,13 @@ process luciphor { tuple mzml_id, file(mzml_file), file(id_file), frag_method from mzmls_luciphor.join(id_files_idx_feat_perc_fdr_filter_switched_luciphor.mix(id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch_luciphor)).join(ch_sdrf_config.luciphor_settings) output: - set mzml_id, file("${id_file.baseName}_luciphor.idXML") into id_files_luciphor + set mzml_id, file("${id_file.baseName}_luciphor.idXML") into plfq_in_id_luciphor file "*.log" when: params.enable_mod_localization script: - id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch_plfq = Channel.empty() - id_files_idx_feat_perc_fdr_filter_switched_plfq = Channel.empty() def losses = params.luciphor_neutral_losses ? '-neutral_losses "${params.luciphor_neutral_losses}"' : '' def dec_mass = params.luciphor_decoy_mass ? '-decoy_mass "${params.luciphor_decoy_mass}"' : '' def dec_losses = params.luciphor_decoy_neutral_losses ? '-decoy_neutral_losses "${params.luciphor_decoy_neutral_losses}' : '' @@ -920,9 +923,7 @@ process luciphor { // ID files can come directly from the Percolator branch, IDPEP branch or // after optional processing with Luciphor mzmls_plfq - .join(id_files_luciphor - .mix(id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch_plfq) - .mix(id_files_idx_feat_perc_fdr_filter_switched_plfq)) + .join(plfq_in_id.mix(plfq_in_id_luciphor)) .multiMap{ it -> mzmls: it[1] ids: it[2] From bb6e9c7a776329c33c1468f6dee500767903090d Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 6 May 2020 11:18:14 +0200 Subject: [PATCH 236/374] refactor env --- environment.yml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/environment.yml b/environment.yml index 93bc608..ecc7417 100644 --- a/environment.yml +++ b/environment.yml @@ -6,9 +6,9 @@ channels: - bioconda - defaults dependencies: - # bioconda - - bioconda::openms-thirdparty=2.5.0 + - bioconda::openms-thirdparty=2.5.0 # also includes comet, MSGF, Luciphor, TRFP - bioconda::bioconductor-msstats=3.18.0 # will include R + - bioconda::sdrf-pipelines=0.0.4 # for SDRF conversion - conda-forge::r-ptxqc=1.0.2 # for QC reports - conda-forge::fonts-conda-ecosystem=1 # for the fonts in QC reports - conda-forge::openjdk=8.0.192 # pin java to 8 for MSGF (otherwise it somehow chooses 11) @@ -16,5 +16,5 @@ dependencies: - conda-forge::markdown=3.2.1 - conda-forge::pymdown-extensions=6.0 - conda-forge::pygments=2.5.2 - - bioconda::sdrf-pipelines=0.0.4 + From dba684639ffb1468a547fc47b795eef095a09a1f Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 6 May 2020 11:33:47 +0200 Subject: [PATCH 237/374] add localization test to CI --- .github/workflows/ci.yml | 3 +++ conf/test_localize.config | 37 +++++++++++++++++++++++++++++++++++++ nextflow.config | 1 + 3 files changed, 41 insertions(+) create mode 100644 conf/test_localize.config diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index cd9f919..7000150 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -51,6 +51,9 @@ jobs: - name: Run pipeline with test data run: | nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "$RUN_NAME" -profile test,docker + - name: Run pipeline with test data + run: | + nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "$RUN_NAME" -profile test_localize,docker - uses: 
actions/upload-artifact@v1
        if: always()
        name: Upload results
diff --git a/conf/test_localize.config b/conf/test_localize.config
new file mode 100644
index 0000000..077ea42
--- /dev/null
+++ b/conf/test_localize.config
@@ -0,0 +1,37 @@
+/*
+ * -------------------------------------------------
+ *  Nextflow config file for running tests with
+ *  modification localization
+ * -------------------------------------------------
+ * Defines bundled input files and everything required
+ * to run a fast and simple test. Use as follows:
+ *   nextflow run nf-core/proteomicslfq -profile test_localize,<docker/singularity>
+ */
+
+params {
+  config_profile_name = 'Test localization profile'
+  config_profile_description = 'Minimal test dataset to check pipeline function with modification localization'
+
+  // Limit resources so that this can run on GitHub Actions CI
+  max_cpus = 2
+  max_memory = 6.GB
+  max_time = 1.h
+
+  // Input data
+  spectra = [
+    'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA1_F1.mzML',
+    'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA1_F2.mzML',
+    'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA2_F1.mzML',
+    'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA2_F2.mzML',
+    'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA3_F1.mzML',
+    'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA3_F2.mzML'
+  ]
+  database = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/18Protein_SoCe_Tr_detergents_trace_target_decoy.fasta'
+  expdesign = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA_design.tsv'
+  posterior_probabilities = "fit_distributions"
+  enable_mod_localization = true
+  search_engine = "msgf"
+  protein_level_fdr_cutoff = 1.0
+  decoy_affix = "rev"
+  enable_qc = true
+}
\ No newline at end of file
diff --git a/nextflow.config b/nextflow.config
index 8312160..771698b 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -152,6 +152,7 @@ profiles {
     process.executor = 'lsf'
   }
   test { includeConfig 'conf/test.config' }
+  test_localize { includeConfig 'conf/test_localize.config' }
   test_full { includeConfig 'conf/test_full.config' }
   dev { includeConfig 'conf/dev.config' }
 }

From 30a1df2aad7cfd8aacca3eaedc63cc251f238caa Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Wed, 6 May 2020 11:42:14 +0200
Subject: [PATCH 238/374] use matrix

---
 .github/workflows/ci.yml | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 7000150..9b90fe4 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -14,11 +14,13 @@ jobs:
       NXF_VER: ${{ matrix.nxf_ver }}
       NXF_ANSI_LOG: false
       TOWER_ACCESS_TOKEN: ${{ secrets.NONAWS_TOWER_ACCESS_TOKEN }}
+      TEST_PROFILE: ${{ matrix.test_profile }}
     runs-on: ubuntu-latest
     strategy:
       matrix:
        # Nextflow versions: check pipeline minimum and current latest
        nxf_ver: ['20.01.0', '']
+        test_profile: ['test','test_localize']
    steps:
      - uses: actions/checkout@v2
      - name: Determine tower usage
      - name: Run pipeline with test data
        run: |
          nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "$RUN_NAME" -profile
$TEST_PROFILE,docker - uses: actions/upload-artifact@v1 if: always() name: Upload results From bd0c654d989c193832106a5baf968fd6a2b649a9 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Wed, 6 May 2020 13:05:50 +0200 Subject: [PATCH 239/374] we have to find new test data for phospho or fix the luciphor adapter --- .github/workflows/ci.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 9b90fe4..134f8b4 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -20,7 +20,7 @@ jobs: matrix: # Nextflow versions: check pipeline minimum and current latest nxf_ver: ['20.01.0', ''] - test_profile: ['test','test_localize'] + test_profile: ['test'] steps: - uses: actions/checkout@v2 - name: Determine tower usage From e6be91a4d64cf4464ccd715423cea2147e80812e Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 6 May 2020 13:25:55 +0200 Subject: [PATCH 240/374] more changes for multiSEs --- main.nf | 134 ++++++++++++++++++++++++++++---------------------------- 1 file changed, 67 insertions(+), 67 deletions(-) diff --git a/main.nf b/main.nf index d2de84a..2dd5af7 100644 --- a/main.nf +++ b/main.nf @@ -313,7 +313,7 @@ process raw_file_conversion { tuple mzml_id, path(rawfile) from branched_input.raw output: - set mzml_id, file("*.mzML") into mzmls_converted + tuple mzml_id, file("*.mzML") into mzmls_converted script: """ @@ -331,10 +331,10 @@ process mzml_indexing { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - set mzml_id, path(mzmlfile) from branched_input_mzMLs.nonIndexedMzML + tuple mzml_id, path(mzmlfile) from branched_input_mzMLs.nonIndexedMzML output: - set mzml_id, file("out/*.mzML") into mzmls_indexed + tuple mzml_id, file("out/*.mzML") into mzmls_indexed file "*.log" script: @@ -422,7 +422,7 @@ process openms_peakpicker { params.openms_peakpicking output: - set mzml_id, file("out/${mzml_file.baseName}.mzML") into mzmls_comet_picked, mzmls_msgf_picked, mzmls_plfq_picked + tuple mzml_id, file("out/${mzml_file.baseName}.mzML") into mzmls_comet_picked, mzmls_msgf_picked, mzmls_plfq_picked file "*.log" script: @@ -477,7 +477,7 @@ process search_engine_msgf { params.search_engine.contains("msgf") output: - set mzml_id, file("${mzml_file.baseName}.idXML") into id_files_msgf + tuple mzml_id, file("${mzml_file.baseName}.idXML"), val(search_engine_score_comet) into id_files_msgf file "*.log" script: @@ -536,7 +536,7 @@ process search_engine_comet { params.search_engine.contains("comet") output: - set mzml_id, file("${mzml_file.baseName}.idXML") into id_files_comet + tuple mzml_id, file("${mzml_file.baseName}.idXML"), val(search_engine_score_comet) into id_files_comet file "*.log" //TODO we currently ignore the activation_method param to leave the default "ALL" for max. 
compatibility @@ -596,10 +596,10 @@ process index_peptides { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - tuple mzml_id, file(id_file), enzyme, file(database) from id_files_msgf.mix(id_files_comet).join(ch_sdrf_config.idx_settings).combine(pepidx_in_db.mix(pepidx_in_db_decoy)) + tuple mzml_id, file(id_file), search_engine_score, enzyme, file(database) from id_files_msgf.mix(id_files_comet).join(ch_sdrf_config.idx_settings).combine(pepidx_in_db.mix(pepidx_in_db_decoy)) output: - set mzml_id, file("${id_file.baseName}_idx.idXML") into id_files_idx_ForPerc, id_files_idx_ForIDPEP + tuple mzml_id, file("${id_file.baseName}_idx.idXML"), search_engine_score into id_files_idx_ForPerc, id_files_idx_ForIDPEP file "*.log" script: @@ -630,10 +630,10 @@ process extract_percolator_features { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - set mzml_id, file(id_file) from id_files_idx_ForPerc + tuple mzml_id, file(id_file), search_engine_score from id_files_idx_ForPerc output: - set mzml_id, file("${id_file.baseName}_feat.idXML") into id_files_idx_feat + tuple mzml_id, file("${id_file.baseName}_feat.idXML"), search_engine_score into id_files_idx_feat file "*.log" when: @@ -663,14 +663,14 @@ process percolator { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - set mzml_id, file(id_file) from id_files_idx_feat + tuple mzml_id, file(id_file), search_engine_score from id_files_idx_feat output: - set mzml_id, file("${id_file.baseName}_perc.idXML") into id_files_idx_feat_perc + tuple mzml_id, file("${id_file.baseName}_perc.idXML"), search_engine_score into id_files_idx_feat_perc file "*.log" when: - params.posterior_probabilities == "percolator" || params. + params.posterior_probabilities == "percolator" // NICE-TO-HAVE: the decoy-pattern is automatically detected from PeptideIndexer. // Parse its output and put the correct one here. 
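The hunks above and below all serve one channel pattern for multi-search-engine support, which can be sketched roughly as follows (a simplified illustration rather than code from this patch: the channel name `ids_for_consensus` is invented, and the real code additionally carries the score type through the tuples):

```groovy
// Every engine emits (mzml_id, idXML) tuples; mixing them and grouping by
// mzml_id with an exact group size yields one emission per input file that
// holds one idXML per configured engine, ready for per-spectrum combination.
n_engines = params.search_engine.split(",").size()

id_files_msgf
    .mix(id_files_comet)            // interleave per-engine results
    .groupTuple(size: n_engines)    // emit once n_engines items share an mzml_id
    .set { ids_for_consensus }      // -> (mzml_id, [x_msgf.idXML, x_comet.idXML])
```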
@@ -695,35 +695,6 @@ process percolator { """ } -process consensusid { - - label 'process_medium' - label 'process_single_thread' - - publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' - publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' - - input: - tuple mzml_id, tuple(id_files_from_ses) from id_files_idx_feat_perc.groupTuple(size: params.search_engine.split(",").size()) - - output: - file "${id_file.baseName}_filter.idXML" into id_files_idx_feat_perc_filter - file "*.log" - - when: - params.search_engine.split(",").size() > 1 - - script: - """ - ConsensusID -in ${id_files_from_ses}.toList().join(" ") \\ - -out ${id_file.baseName}_consensus.idXML \\ - -threads ${task.cpus} \\ - -algorithm best \\ - > ${id_file.baseName}_idfilter.log - """ - -} - process idfilter { label 'process_very_low' @@ -733,10 +704,10 @@ process idfilter { publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' input: - file id_file from id_files_idx_feat_perc + tuple mzml_id, file(id_file), search_engine_score from id_files_idx_feat_perc output: - file "${id_file.baseName}_filter.idXML" into id_files_idx_feat_perc_filter + tuple mzml_id, file("${id_file.baseName}_filter.idXML"), search_engine_score into id_files_idx_feat_perc_filter file "*.log" when: @@ -760,10 +731,10 @@ process idscoreswitcher { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - file id_file from id_files_idx_feat_perc_filter + tuple mzml_id, file(id_file), search_engine_score from id_files_idx_feat_perc_filter output: - file "${id_file.baseName}_switched.idXML" into id_files_idx_feat_perc_fdr_filter_switched + tuple mzml_id, file("${id_file.baseName}_switched.idXML"), search_engine_score into id_files_idx_feat_perc_fdr_filter_switched file "*.log" when: @@ -787,7 +758,7 @@ process idscoreswitcher { // --------------------------------------------------------------------- // Branch b) Q-values and PEP from OpenMS -// Note: for IDPEP we never need any file specific settings so we can stop adding the mzml_idto the channels +// Note: for IDPEP we never need any file specific settings so we can stop adding the mzml_id to the channels process fdr_idpep { label 'process_very_low' @@ -796,10 +767,10 @@ process fdr_idpep { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - set mzml_id, file(id_file) from id_files_idx_ForIDPEP + tuple mzml_id, file(id_file), search_engine_score from id_files_idx_ForIDPEP output: - file "${id_file.baseName}_fdr.idXML" into id_files_idx_ForIDPEP_fdr + tuple mzml_id, file("${id_file.baseName}_fdr.idXML"), search_engine_score into id_files_idx_ForIDPEP_fdr file "*.log" when: @@ -825,10 +796,10 @@ process idscoreswitcher_idpep_pre { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - file id_file from id_files_idx_ForIDPEP_fdr + tuple mzml_id, file(id_file), search_engine_score from id_files_idx_ForIDPEP_fdr output: - file "${id_file.baseName}_switched.idXML" into id_files_idx_ForIDPEP_fdr_switch + tuple mzml_id, file("${id_file.baseName}_switched.idXML"), search_engine_score into id_files_idx_ForIDPEP_fdr_switch file "*.log" when: @@ -836,14 +807,14 @@ process idscoreswitcher_idpep_pre { script: """ - IDScoreSwitcher -in ${id_file} \\ - -out ${id_file.baseName}_switched.idXML \\ - -threads ${task.cpus} \\ - -old_score q-value \\ - -new_score ${search_engine_score}_score \\ - -new_score_orientation lower_better \\ - -new_score_type ${search_engine_score} \\ - > ${id_file.baseName}_scoreswitcher1.log + 
IDScoreSwitcher -in ${id_file} \\ + -out ${id_file.baseName}_switched.idXML \\ + -threads ${task.cpus} \\ + -old_score q-value \\ + -new_score ${search_engine_score}_score \\ + -new_score_orientation lower_better \\ + -new_score_type ${search_engine_score} \\ + > ${id_file.baseName}_scoreswitcher1.log """ } @@ -855,10 +826,10 @@ process idpep { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - file id_file from id_files_idx_ForIDPEP_fdr_switch + tuple mzml_id, file(id_file), search_engine_score from id_files_idx_ForIDPEP_fdr_switch output: - file "${id_file.baseName}_idpep.idXML" into id_files_idx_ForIDPEP_fdr_switch_idpep + tuple mzml_id, file("${id_file.baseName}_idpep.idXML"), search_engine_score into id_files_idx_ForIDPEP_fdr_switch_idpep file "*.log" when: @@ -881,10 +852,10 @@ process idscoreswitcher_idpep_post { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - file id_file from id_files_idx_ForIDPEP_fdr_switch_idpep + tuple mzml_id, file(id_file), search_engine_score from id_files_idx_ForIDPEP_fdr_switch_idpep output: - file "${id_file.baseName}_switched.idXML" into id_files_idx_ForIDPEP_fdr_switch_idpep_switch + tuple mzml_id, file("${id_file.baseName}_switched.idXML"), search_engine_score into id_files_idx_ForIDPEP_fdr_switch_idpep_switch file "*.log" when: @@ -911,10 +882,10 @@ process idfilter_idpep { publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' input: - file id_file from id_files_idx_ForIDPEP_fdr_switch_idpep_switch + tuple mzml_id, file(id_file), search_engine_score from id_files_idx_ForIDPEP_fdr_switch_idpep_switch output: - file "${id_file.baseName}_filter.idXML" into id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter + tuple mzml_id, file("${id_file.baseName}_filter.idXML"), search_engine_score into id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter file "*.log" when: @@ -938,10 +909,10 @@ process idscoreswitcher_idpep_postfilter { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - file id_file from id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter + tuple mzml_id, file(id_file), search_engine_score from id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter output: - file "${id_file.baseName}_switched.idXML" into id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch + tuple mzml_id, file("${id_file.baseName}_switched.idXML"), search_engine_score into id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch file "*.log" when: @@ -963,6 +934,35 @@ process idscoreswitcher_idpep_postfilter { // --------------------------------------------------------------------- // Main Branch +process consensusid { + + label 'process_medium' + label 'process_single_thread' + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' + + input: + tuple mzml_id, file(id_files_from_ses), search_engine_score from id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter.mix(id_files_idx_feat_perc_fdr_filter_switched).groupTuple(size: params.search_engine.split(",").size()) + + output: + tuple mzml_id, file("${id_file.baseName}_filter.idXML"), search_engine_score into consensusids + file "*.log" + + when: + params.search_engine.split(",").size() > 1 + + script: + """ + ConsensusID -in ${id_files_from_ses}.toList().join(" ") \\ + -out ${id_file.baseName}_consensus.idXML \\ + -threads ${task.cpus} \\ + -algorithm best \\ + > ${id_file.baseName}_idfilter.log + """ + +} + process proteomicslfq { label 
'process_high' From c07d6e1039f768d89818ef4a40a32fca0b55f18e Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 11 May 2020 18:14:19 +0200 Subject: [PATCH 241/374] fix input for phospho --- main.nf | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/main.nf b/main.nf index 225debe..ca1622e 100644 --- a/main.nf +++ b/main.nf @@ -887,14 +887,14 @@ process idfilter { plfq_in_id = params.enable_mod_localization ? Channel.empty() - : id_filtered) + : id_filtered process luciphor { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - tuple mzml_id, file(mzml_file), file(id_file), frag_method from mzmls_luciphor.join(id_files_idx_feat_perc_fdr_filter_switched_luciphor.mix(id_files_idx_ForIDPEP_fdr_switch_idpep_switch_filter_switch_luciphor)).join(ch_sdrf_config.luciphor_settings) + tuple mzml_id, file(mzml_file), file(id_file), frag_method from mzmls_luciphor.join(id_filtered).join(ch_sdrf_config.luciphor_settings) output: set mzml_id, file("${id_file.baseName}_luciphor.idXML") into plfq_in_id_luciphor From 06d246d0f6f1e189c8d65e189f04031a457a155f Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 12 May 2020 00:24:02 +0200 Subject: [PATCH 242/374] more fixes --- main.nf | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/main.nf b/main.nf index ca1622e..1189dc8 100644 --- a/main.nf +++ b/main.nf @@ -481,7 +481,7 @@ process search_engine_msgf { params.search_engine.contains("msgf") output: - tuple mzml_id, file("${mzml_file.baseName}.idXML") into id_files_msgf + tuple mzml_id, file("${mzml_file.baseName}_msgf.idXML") into id_files_msgf file "*.log" script: @@ -499,7 +499,7 @@ process search_engine_msgf { } """ MSGFPlusAdapter -in ${mzml_file} \\ - -out ${mzml_file.baseName}.idXML \\ + -out ${mzml_file.baseName}_msgf.idXML \\ -threads ${task.cpus} \\ -java_memory ${task.memory.toMega()} \\ -database "${database}" \\ @@ -540,7 +540,7 @@ process search_engine_comet { params.search_engine.contains("comet") output: - tuple mzml_id, file("${mzml_file.baseName}.idXML") into id_files_comet + tuple mzml_id, file("${mzml_file.baseName}_comet.idXML") into id_files_comet file "*.log" //TODO we currently ignore the activation_method param to leave the default "ALL" for max. 
compatibility @@ -571,7 +571,7 @@ process search_engine_comet { } """ CometAdapter -in ${mzml_file} \\ - -out ${mzml_file.baseName}.idXML \\ + -out ${mzml_file.baseName}_comet.idXML \\ -threads ${task.cpus} \\ -database "${database}" \\ -instrument ${inst} \\ @@ -600,7 +600,7 @@ process index_peptides { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - tuple mzml_id, file(id_file), enzyme, file(database) from id_files_msgf.mix(id_files_comet).join(ch_sdrf_config.idx_settings).combine(pepidx_in_db.mix(pepidx_in_db_decoy)) + tuple mzml_id, file(id_file), val(enzyme), file(database) from id_files_msgf.mix(id_files_comet).combine(ch_sdrf_config.idx_settings, by: 0).combine(pepidx_in_db.mix(pepidx_in_db_decoy)).view() output: tuple mzml_id, file("${id_file.baseName}_idx.idXML") into id_files_idx_ForPerc, id_files_idx_ForIDPEP, id_files_idx_ForIDPEP_noFDR @@ -670,7 +670,7 @@ process percolator { tuple mzml_id, file(id_file) from id_files_idx_feat output: - tuple mzml_id, file("${id_file.baseName}_perc.idXML"), "MS:1001491" into id_files_perc, id_files_perc_consID + tuple mzml_id, file("${id_file.baseName}_perc.idXML"), val("MS:1001491") into id_files_perc, id_files_perc_consID file "*.log" when: @@ -748,7 +748,7 @@ process idpep { tuple mzml_id, file(id_file) from id_files_idx_ForIDPEP_FDR.mix(id_files_idx_ForIDPEP_noFDR) output: - tuple mzml_id, file("${id_file.baseName}_idpep.idXML"), "q-value_score" into id_files_idpep, id_files_idpep_consID + tuple mzml_id, file("${id_file.baseName}_idpep.idXML"), val("q-value_score") into id_files_idpep, id_files_idpep_consID file "*.log" when: @@ -807,10 +807,10 @@ process consensusid { publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' input: - tuple mzml_id, file(id_files_from_ses), qval_score from id_files_idpep_consID.mix(id_files_perc_consID).groupTuple(size: params.search_engine.split(",").size()) + tuple mzml_id, file(id_files_from_ses), val(qval_score) from id_files_idpep_consID.mix(id_files_perc_consID).groupTuple(size: params.search_engine.split(",").size()) output: - tuple mzml_id, file("${id_file.baseName}_filter.idXML") into consensusids + tuple mzml_id, file("${mzml_id}_consensus.idXML") into consensusids file "*.log" when: @@ -818,7 +818,7 @@ process consensusid { script: """ - ConsensusID -in ${id_files_from_ses}.toList().join(" ") \\ + ConsensusID -in ${id_files_from_ses} \\ -out ${mzml_id}_consensus.idXML \\ -per_spectrum \\ -threads ${task.cpus} \\ @@ -872,7 +872,7 @@ process idfilter { tuple mzml_id, file(id_file) from id_files_noConsID_qval.mix(consensusids_fdr) output: - tuple mzml_id, file("${id_file.baseName}_filter.idXML") into id_filtered + tuple mzml_id, file("${id_file.baseName}_filter.idXML") into id_filtered, id_filtered_luciphor file "*.log" script: @@ -894,7 +894,7 @@ process luciphor { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - tuple mzml_id, file(mzml_file), file(id_file), frag_method from mzmls_luciphor.join(id_filtered).join(ch_sdrf_config.luciphor_settings) + tuple mzml_id, file(mzml_file), file(id_file), frag_method from mzmls_luciphor.join(id_filtered_luciphor).join(ch_sdrf_config.luciphor_settings) output: set mzml_id, file("${id_file.baseName}_luciphor.idXML") into plfq_in_id_luciphor From b218dbfb961e69887b91a9a0d324089fbd5ac4b1 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 13 May 2020 23:20:59 +0200 Subject: [PATCH 243/374] fix one engine case --- main.nf | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff 
--git a/main.nf b/main.nf
index 1189dc8..ae53464 100644
--- a/main.nf
+++ b/main.nf
@@ -703,6 +703,10 @@ process percolator {
 // ---------------------------------------------------------------------
 // Branch b) Q-values and PEP from OpenMS

+if(params.posterior_probabilities != "percolator" && params.search_engine.split(",").size() == 1)
+{
+    id_files_idx_ForIDPEP_noFDR = Channel.empty()
+}
 process fdr_idpep {
     label 'process_very_low'
@@ -732,11 +736,7 @@ process fdr_idpep {
     """
 }

-if (params.search_engine.split(",").size() != 1)
-{
-    id_files_idx_ForIDPEP_fdr = Channel.empty()
-}
-
+//idpep picks the best scores for each search engine automatically. No switching needed after FDR.
 process idpep {
     label 'process_low'

From ac51e9c51fa1aef46458a7720f865298cc22e0c5 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Mon, 18 May 2020 20:10:10 +0200
Subject: [PATCH 244/374] more docu

---
 docs/usage.md   | 43 ++++++++++++++++++++++++++++++++++++++++++-
 main.nf         | 19 ++++++++++---------
 nextflow.config |  2 ++
 3 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/docs/usage.md b/docs/usage.md
index 4fdbc91..268bd92 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -25,7 +25,7 @@
   * [`--decoy_affix`](#-decoy_affix)
   * [`--affix_type`](#-profile)
 * [Database search](#database-search)
-  * [`--search_engine`](#--search_engine)
+  * [`--search_engines`](#--search_engines)
   * [`--enzyme`](#--enzyme)
   * [`--num_enzyme_termini`](#--num_enzyme_termini)
   * [`--num_hits`](#--num_hits)
@@ -65,6 +65,10 @@
 * [Distribution specific](#distribution-specific)
   * [`--outlier_handling`](#--outlier_handling)
   * [`--top_hits_only`](#--top_hits_only)
+* [ConsensusID](#consensusid)
+  * [`--consensusid_algorithm`](#--consensusid_algorithm)
+  * [`--min_consensus_support`](#--min_consensus_support)
+  * [`--consensusid_considered_top_hits`](#--consensusid_considered_top_hits)
 * [Inference and Quantification](#inference-and-quantification)
   * [`--inf_quant_debug`](#--inf_quant_debug)
   * [Inference](#inference)
@@ -268,6 +272,15 @@ Is the decoy label a prefix or suffix. Prefix is highly recommended as some tool

 ## Database search

+### `--search_engines`
+
+A comma-separated list of search engines to run in parallel on each mzML file. Currently supported: comet and msgf (default: comet).
+If more than one search engine is given, results are combined based on posterior error probabilities (see the different types
+of estimation procedures under [`--posterior_probabilities`](#--posterior_probabilities)). Combination is done with
+[ConsensusID](https://abibuilder.informatik.uni-tuebingen.de/archive/openms/Documentation/release/latest/html/TOPP_ConsensusID.html).
+See also its corresponding [`--consensusid_algorithm`](#--consensusid_algorithm) parameter for different combination strategies.
+Combinations may profit from an increased [`--num_hits`](#--num_hits) parameter.
+
 ### `--precursor_mass_tolerance`

 Precursor mass tolerance used for database search. For High-Resolution instruments a precursor mass tolerance value of 5 ppm is recommended (i.e. 5). See also [`--precursor_mass_tolerance_unit`](#--precursor_mass_tolerance_unit).

@@ -449,6 +462,34 @@ How to handle outliers during fitting:

 Use only the top peptide hits per spectrum for fitting. Default: true

+## ConsensusID
+
+The following parameters are only used when more than one search engine was specified in the [`--search_engines`](#--search_engines)
+parameter for combination.
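For orientation, a run that exercises these combination options might use a small custom config such as the following sketch (hypothetical values chosen for illustration, not recommended defaults; all parameter names are documented in this usage guide):

```groovy
// Hypothetical excerpt, e.g. passed to the pipeline via `-c consensus.config`.
params {
    search_engines                  = "comet,msgf"  // two parallel searches per mzML
    num_hits                        = 10            // extra runner-up hits help the combination
    posterior_probabilities        = "percolator"   // PEP estimation for the PEP-based algorithms
    consensusid_algorithm           = "PEPMatrix"   // sequence-similarity-aware combination
    min_consensus_support           = 0.5           // require support from half of the other runs
    consensusid_considered_top_hits = 10            // bound the all-vs-all comparisons
}
```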
+
+### `--consensusid_algorithm`
+
+Specifies how search engine results are combined: ConsensusID offers several algorithms that can aggregate results from multiple peptide identification engines ("search engines") into consensus identifications - typically one per MS2 spectrum. This works especially well for search engines that provide more than one peptide hit per spectrum, i.e. that report not just the best hit, but also a list of runner-up candidates with corresponding scores.
+
+The available algorithms are:
+
+* PEPMatrix: Scoring based on posterior error probabilities (PEPs) and peptide sequence similarities. This algorithm uses a substitution matrix to score the similarity of sequences not listed by all search engines. It requires PEPs as the scores for all peptide hits.
+* PEPIons: Scoring based on posterior error probabilities (PEPs) and fragment ion similarities ("shared peak count"). This algorithm, too, requires PEPs as scores.
+* best: For each peptide ID, this uses the best score of any search engine as the consensus score.
+* worst: For each peptide ID, this uses the worst score of any search engine as the consensus score.
+* average: For each peptide ID, this uses the average score of all search engines as the consensus score.
+* ranks: Calculates a consensus score based on the ranks of peptide IDs in the results of different search engines. The final score is in the range (0, 1], with 1 being the best score.
+
+To make scores comparable, for best, worst and average, PEPs are used as well. Peptide IDs are only considered the same if they map to exactly the same sequence (including modifications and their localization). Also isobaric amino acids are (for now) only considered equal with the PEPMatrix/PEPIons algorithms.
+
+### `--min_consensus_support`
+
+This allows filtering of peptide hits based on agreement between search engines. Every peptide sequence in the analysis has been identified by at least one search run. This parameter defines which fraction (between 0 and 1) of the remaining search runs must "support" a peptide identification that should be kept. The meaning of "support" differs slightly between algorithms: For best, worst, average and rank, each search run supports peptides that it has also identified among its top considered\_hits candidates. So min\_support simply gives the fraction of additional search engines that must have identified a peptide. (For example, if there are three search runs, and only peptides identified by at least two of them should be kept, set min\_support to 0.5.) For the similarity-based algorithms PEPMatrix and PEPIons, the "support" for a peptide is the average similarity of the most-similar peptide from each (other) search run. (In the context of the JPR publication, this is the average of the similarity scores used in the consensus score calculation for a peptide.) Note: For most of the subsequent algorithms, only the best identification per spectrum is used.
+
+### `--consensusid_considered_top_hits`
+
+Limits the number of alternative peptide hits considered per spectrum/feature for each identification run. This helps to reduce runtime, especially for the PEPMatrix and PEPIons algorithms, which involve costly "all vs. all" comparisons of peptide hits per spectrum across engines.
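For reference, the `consensusid` process in `main.nf` assembles a ConsensusID call of roughly the following shape; the input/output file names and the thread count are placeholders, while the flags mirror the three parameters documented above:

```bash
# Sketch of the per-mzML ConsensusID invocation built by the consensusid process
ConsensusID -in sample_comet_perc.idXML sample_msgf_perc.idXML \
            -out sample_consensus.idXML \
            -per_spectrum \
            -algorithm PEPMatrix \
            -filter:min_support 0.5 \
            -filter:considered_hits 0 \
            -threads 4
```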
+ ## Inference and Quantification ### `--inf_quant_debug` diff --git a/main.nf b/main.nf index ae53464..90d13c0 100644 --- a/main.nf +++ b/main.nf @@ -36,7 +36,7 @@ def helpMessage() { --affix_type Prefix (default) or suffix (WARNING: Percolator only supports prefices) Database Search: - --search_engine Which search engine: "comet" (default) or "msgf" + --search_engines Which search engine: "comet" (default) or "msgf" --enzyme Enzymatic cleavage (e.g. 'unspecific cleavage' or 'Trypsin' [default], see OpenMS enzymes) --num_enzyme_termini Specify the termini where the cleavage rule has to match (default: 'fully' valid: 'semi', 'fully') @@ -478,7 +478,7 @@ process search_engine_msgf { //file database from searchengine_in_db.mix(searchengine_in_db_decoy) //each file(mzml_file) from mzmls when: - params.search_engine.contains("msgf") + params.search_engines.contains("msgf") output: tuple mzml_id, file("${mzml_file.baseName}_msgf.idXML") into id_files_msgf @@ -537,7 +537,7 @@ process search_engine_comet { tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet).combine(mzmls_comet.mix(mzmls_comet_picked).join(ch_sdrf_config.comet_settings)) when: - params.search_engine.contains("comet") + params.search_engines.contains("comet") output: tuple mzml_id, file("${mzml_file.baseName}_comet.idXML") into id_files_comet @@ -703,7 +703,7 @@ process percolator { // --------------------------------------------------------------------- // Branch b) Q-values and PEP from OpenMS -if(params.posterior_probabilities != "percolator" && params.search_engine.split(",").size() == 1) +if(params.posterior_probabilities != "percolator" && params.search_engines.split(",").size() == 1) { id_files_idx_ForIDPEP_noFDR = Channel.empty() } @@ -722,7 +722,7 @@ process fdr_idpep { file "*.log" when: - params.posterior_probabilities != "percolator" && params.search_engine.split(",").size() == 1 + params.posterior_probabilities != "percolator" && params.search_engines.split(",").size() == 1 script: """ @@ -758,7 +758,8 @@ process idpep { """ IDPosteriorErrorProbability -in ${id_file} \\ -out ${id_file.baseName}_idpep.idXML \\ - -threads ${task.cpus} \\ + -fit_algorithm:outlier_handling ${params.outlier_handling} \\ + -threads ${task.cpus} \\ > ${id_file.baseName}_idpep.log """ } @@ -782,7 +783,7 @@ process idscoreswitcher_to_qval { file "*.log" when: - params.search_engine.split(",").size() == 1 + params.search_engines.split(",").size() == 1 script: """ @@ -807,14 +808,14 @@ process consensusid { publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' input: - tuple mzml_id, file(id_files_from_ses), val(qval_score) from id_files_idpep_consID.mix(id_files_perc_consID).groupTuple(size: params.search_engine.split(",").size()) + tuple mzml_id, file(id_files_from_ses), val(qval_score) from id_files_idpep_consID.mix(id_files_perc_consID).groupTuple(size: params.search_engines.split(",").size()) output: tuple mzml_id, file("${mzml_id}_consensus.idXML") into consensusids file "*.log" when: - params.search_engine.split(",").size() > 1 + params.search_engines.split(",").size() > 1 script: """ diff --git a/nextflow.config b/nextflow.config index 771698b..88e1fe1 100644 --- a/nextflow.config +++ b/nextflow.config @@ -64,6 +64,8 @@ params { IL_equivalent = true allow_unmatched = false + // IDPEP flags + outlier_handling = "none" // Percolator flags train_FDR = 0.05 test_FDR = 0.05 From 
fe24fb58898b4f29b51a4ab4e0c889ec63b6f3f9 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 18 May 2020 20:16:55 +0200 Subject: [PATCH 245/374] more doc --- main.nf | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/main.nf b/main.nf index 90d13c0..3e6ae55 100644 --- a/main.nf +++ b/main.nf @@ -103,6 +103,11 @@ def helpMessage() { //TODO add more options for rescoring part + ConsensusID: + --consensusid_algorithm Choose method to combine probabilities from multiple search engines (if used). Valid: best, worst, average, rank, PEPMatrix, PEPIons (Default: best) + --min_consensus_support Choose ratio of ADDITIONAL evidence for a peptide ID of a spectrum. Varies across methods. See documentation for further info. (Default: 0) + --consensusid_considered_top_hits Number of top hits per spectrum considered for consensus scoring. (Default: 0 = all) + Inference and Quantification: --inf_quant_debug Debug level during inference and quantification. (WARNING: Higher than 666 may produce a lot of additional output files) From 53c084ad7b0c3ceef678dd4451ca7447dbbe6fe2 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 18 May 2020 20:21:23 +0200 Subject: [PATCH 246/374] added to nextflow config --- nextflow.config | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/nextflow.config b/nextflow.config index 88e1fe1..be73be3 100644 --- a/nextflow.config +++ b/nextflow.config @@ -21,7 +21,7 @@ params { // Tools flags posterior_probabilities = 'percolator' add_decoys = false - search_engine = 'comet' + search_engines = 'comet' protein_inference = 'aggregation' psm_pep_fdr_cutoff = 0.10 protein_level_fdr_cutoff = 0.05 @@ -66,6 +66,7 @@ params { // IDPEP flags outlier_handling = "none" + // Percolator flags train_FDR = 0.05 test_FDR = 0.05 @@ -74,6 +75,11 @@ params { description_correct_features = 0 subset_max_train = 300000 + // ConsensusID + consensusid_algorithm = 'best' + min_consensus_support = 0 + consensusid_considered_top_hits = 0 + // Luciphor options luciphor_neutral_losses = '' luciphor_decoy_mass = '' From 184da051356d8055abd4bee8160dfb7619a96e5b Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 18 May 2020 20:22:08 +0200 Subject: [PATCH 247/374] md lint --- docs/usage.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index 268bd92..aa4ebd4 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -276,8 +276,8 @@ Is the decoy label a prefix or suffix. Prefix is highly recommended as some tool A comma-separated list of search engines to run in parallel on each mzML file. Currently supported: comet and msgf (default: comet) If more than one search engine is given, results are combined based on posterior error probabilities (see the different types -of estimation procedures under [`--posterior_probabilities`](#--posterior_probabilities)). Combination is done with -[CometAdapter](https://abibuilder.informatik.uni-tuebingen.de/archive/openms/Documentation/release/latest/html/TOPP_CometAdapter.html). +of estimation procedures under [`--posterior_probabilities`](#--posterior_probabilities)). Combination is done with +[ConsensusID](https://abibuilder.informatik.uni-tuebingen.de/archive/openms/Documentation/release/latest/html/TOPP_ConsensusID.html). See also its corresponding [`--consensusid_algorithm`](#--consensusid_algorithm) parameter for different combination strategies. Combinations may profit from an increased [`--num_hits`](#--num_hits) parameter. 
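Taken together, the `nextflow.config` defaults added above (`best`, `0`, `0`) can be overridden per run. A hedged sketch with purely illustrative values; the paths are placeholders:

```bash
# Switch to a similarity-based combination with stricter agreement filtering
nextflow run nf-core/proteomicslfq -profile docker \
    --sdrf experiment.sdrf.tsv \
    --database target_decoy.fasta \
    --search_engines 'comet,msgf' \
    --consensusid_algorithm PEPIons \
    --min_consensus_support 0.5 \
    --consensusid_considered_top_hits 10
```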
From bdbdd63b342fbbeda7685ef98b6187a648a645e2 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Mon, 18 May 2020 20:35:29 +0200
Subject: [PATCH 248/374] added params to step

---
 main.nf | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/main.nf b/main.nf
index 3e6ae55..0447b8c 100644
--- a/main.nf
+++ b/main.nf
@@ -828,7 +828,9 @@ process consensusid {
         -out ${mzml_id}_consensus.idXML \\
         -per_spectrum \\
         -threads ${task.cpus} \\
-        -algorithm best \\
+        -algorithm ${params.consensusid_algorithm} \\
+        -filter:min_support ${params.min_consensus_support} \\
+        -filter:considered_hits ${params.consensusid_considered_top_hits} \\
         > ${mzml_id}_consensusID.log
     """

From b54430d49099b1bc68063f16fe5c61f90e1bf076 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Tue, 19 May 2020 12:11:08 +0200
Subject: [PATCH 249/374] fix error after param rename. remove view.

---
 main.nf | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/main.nf b/main.nf
index 0447b8c..39b92d2 100644
--- a/main.nf
+++ b/main.nf
@@ -605,7 +605,7 @@ process index_peptides {
     publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log'
 
     input:
-    tuple mzml_id, file(id_file), val(enzyme), file(database) from id_files_msgf.mix(id_files_comet).combine(ch_sdrf_config.idx_settings, by: 0).combine(pepidx_in_db.mix(pepidx_in_db_decoy)).view()
+    tuple mzml_id, file(id_file), val(enzyme), file(database) from id_files_msgf.mix(id_files_comet).combine(ch_sdrf_config.idx_settings, by: 0).combine(pepidx_in_db.mix(pepidx_in_db_decoy))
 
     output:
     tuple mzml_id, file("${id_file.baseName}_idx.idXML") into id_files_idx_ForPerc, id_files_idx_ForIDPEP, id_files_idx_ForIDPEP_noFDR
@@ -810,7 +810,6 @@ process consensusid {
     label 'process_single_thread'
 
     publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log'
-    publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML'
 
     input:
     tuple mzml_id, file(id_files_from_ses), val(qval_score) from id_files_idpep_consID.mix(id_files_perc_consID).groupTuple(size: params.search_engines.split(",").size())
@@ -839,7 +838,6 @@ process consensusid {
 process fdr_consensusid {
 
     label 'process_medium'
-    //TODO could be easily parallelized
     label 'process_single_thread'
 
     publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log'
@@ -853,7 +851,7 @@ process fdr_consensusid {
     file "*.log"
 
     when:
-    params.search_engine.split(",").size() > 1
+    params.search_engines.split(",").size() > 1
 
     script:
     """

From 6e13ab88c63c0112d063245b55c01a463c1e7658 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Tue, 19 May 2020 12:14:25 +0200
Subject: [PATCH 250/374] added script to tools folder to visualize search engine scores

---
 .../visualize_search_engine_scores.py | 98 +++++++++++++++++++
 1 file changed, 98 insertions(+)
 create mode 100644 tools/search_engine_scores/visualize_search_engine_scores.py

diff --git a/tools/search_engine_scores/visualize_search_engine_scores.py b/tools/search_engine_scores/visualize_search_engine_scores.py
new file mode 100644
index 0000000..d9bada4
--- /dev/null
+++ b/tools/search_engine_scores/visualize_search_engine_scores.py
@@ -0,0 +1,98 @@
+from pyopenms import *
+from typing import List
+import numpy as np  # needed for the np.log10 calls below
+import pandas as pd
+import plotly.express as px
+import plotly
+
+prots = [] # type: List[ProteinIdentification]
+peps = [] # type: List[PeptideIdentification]
+IdXMLFile().load("/Users/pfeuffer/Downloads/debugPLFQ/BSAConsensus/consensus_eval/UPS1_50000amol_R1_comet_idx_feat_perc.xml", prots, peps)
+
+cols = ["comet", "comet_xcorr", "comet_pep", "msgf", "msgf_raw", "msgf_pep", "comet_seq", "msgf_seq", 'target_decoy_comet', 'target_decoy_msgf']
"msgf_raw", "msgf_pep", "comet_seq", "msgf_seq", 'target_decoy_comet', 'target_decoy_msgf'] + +d = {} +for pep in peps: + best_hit = pep.getHits()[0] # type: PeptideHit + d[pep.getMetaValue("spectrum_reference")] = { + 'comet': best_hit.getMetaValue("COMET:lnExpect"), + 'comet_xcorr': best_hit.getMetaValue("MS:1002252"), + 'comet_pep': best_hit.getScore(), + 'msgf': None, + 'msgf_raw': None, + 'msgf_pep': None, + 'comet_seq': best_hit.getSequence().toString(), + 'msgf_seq': None, + 'target_decoy_comet': best_hit.getMetaValue("target_decoy"), + 'target_decoy_msgf': None, + } + + +prots = [] # type: List[ProteinIdentification] +peps = [] # type: List[PeptideIdentification] +IdXMLFile().load("/Users/pfeuffer/Downloads/debugPLFQ/BSAConsensus/consensus_eval/UPS1_50000amol_R1_msgf_idx_feat_perc.xml", prots, peps) + +for pep in peps: + best_hit = pep.getHits()[0] # type: PeptideHit + spec_ref = pep.getMetaValue("spectrum_reference") + if spec_ref in d: + mydict = d[spec_ref] + mydict["msgf"] = np.log10(best_hit.getMetaValue("MS:1002052")) + mydict["msgf_seq"] = best_hit.getSequence().toString() + mydict["msgf_raw"] = best_hit.getMetaValue("MS:1002049") + mydict["msgf_pep"] = best_hit.getScore() + mydict["target_decoy_msgf"] = best_hit.getMetaValue("target_decoy") + + else: + d[spec_ref] = { + 'comet': None, + 'comet_pep': None, + 'msgf': best_hit.getMetaValue("MS:1002052"), # MSGF SpecEval + 'msgf_raw': best_hit.getMetaValue("MS:1002049"), + 'msgf_pep': best_hit.getScore(), + 'comet_seq': None, + 'msgf_seq': best_hit.getSequence().toString(), + 'target_decoy_comet': None, + 'target_decoy_msgf': best_hit.getMetaValue("target_decoy") + } + + +df = pd.DataFrame.from_dict(d, orient="index", columns=cols) +df['same_seq'] = df.apply(lambda x: x['comet_seq'] == x['msgf_seq'], axis=1) + + +def getTDstatus(x): + if x['same_seq']: + return x['target_decoy_comet'] + else: + if x['target_decoy_comet'] is not None and x['target_decoy_msgf'] is not None: + if x['target_decoy_comet'] == x['target_decoy_msgf']: + return x['target_decoy_comet'] + else: + return 'mixed' + elif x['target_decoy_comet'] is not None: + return x['target_decoy_comet'] + else: + return x['target_decoy_msgf'] + + +df['target_decoy'] = df.apply(getTDstatus, axis=1) +df['pep_max'] = df.apply(lambda x: max(x["comet_pep"],x["msgf_pep"]), axis=1) + +print(df['comet_seq'].head(10)) +print(df['msgf_seq'].head(10)) + +df.to_csv("/Users/pfeuffer/Downloads/debugPLFQ/BSAConsensus/consensus_eval/pandas.csv") +#tidy_df = df.melt(id_vars=[]) +#print(tidy_df.head()) +fig = px.scatter(df, x='comet_xcorr', y='msgf_raw', color='same_seq', symbol='target_decoy', + hover_data=["target_decoy_comet", "comet_seq", "msgf_seq"], + marginal_x="violin", + marginal_y="violin", ) +fig.show() +plotly.offline.plot(fig, "/Users/pfeuffer/Downloads/debugPLFQ/BSAConsensus/consensus_eval/plotly.html") + +fig = px.violin(df, x='target_decoy', y='pep_max', color='target_decoy') +fig.show() + + From 0d2e36bffa6c908dba31a400d8b0b5e5b6b5f492 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sun, 24 May 2020 21:09:04 +0200 Subject: [PATCH 251/374] allow same name mzMLs for local sdrf input --- main.nf | 6 +++++- nextflow.config | 1 + 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/main.nf b/main.nf index b6c07db..a7b4c5c 100644 --- a/main.nf +++ b/main.nf @@ -259,7 +259,11 @@ else luciphor_settings: tuple(id, row[9]) - mzmls: tuple(id, params.root_folder.length() == 0 ? row[0] : (params.root_folder + "/" + row[1]))} + mzmls: tuple(id, !params.root_folder ? 
+ row[0] : + params.root_folder + "/" + (params.local_input_type ? + row[1].take(row[1].lastIndexOf('.')) + '.' + params.local_input_type : + row[1]))} .set{ch_sdrf_config} } diff --git a/nextflow.config b/nextflow.config index 771698b..b396fb3 100644 --- a/nextflow.config +++ b/nextflow.config @@ -14,6 +14,7 @@ params { // Workflow flags sdrf = '' root_folder = '' + local_input_type = '' spectra = '' database = '' expdesign = '' From 1c85910435c4a2e050231bc612d929942eeff438 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sun, 24 May 2020 21:28:45 +0200 Subject: [PATCH 252/374] add docu for new feature --- docs/usage.md | 7 +++++++ main.nf | 3 ++- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/docs/usage.md b/docs/usage.md index 4fdbc91..6b9a601 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -12,6 +12,7 @@ Either (using a PRIDE Sample to data relation format file): * [`--sdrf`](#--sdrf) * [`--root_folder`](#--root_folder) + * [`--local_input_type`](#--rlocal_input_type) Or (using spectrum files and an OpenMS style experimental design): * [`--spectra`](#--spectra) @@ -179,6 +180,12 @@ This optional parameter can be used to specify a root folder in which the spectr It is usually used if you have a local version of the experiment already. Note that this option does not support recursive searching yet. +### `--local_input_type` + +If the above [`--root_folder`](#--root_folder) was given to load local input files, this overwrites the file type/extension of +the filename as specified in the SDRF. Usually used in case you have an mzML-converted version of the files already. Needs to be +one of 'mzML' or 'raw' (the letter cases should match your files exactly). + ----- __b)__ By specifying globbing patterns to the input spectrum files in Thermo RAW or mzML format and a manual OpenMS-style diff --git a/main.nf b/main.nf index a7b4c5c..6da8d93 100644 --- a/main.nf +++ b/main.nf @@ -23,6 +23,7 @@ def helpMessage() { Either: --sdrf Path to PRIDE Sample to data relation format file --root_folder (Optional) If given, looks for the filenames in the SDRF in this folder, locally + --local_input_type (Optional) If given and 'root_folder' was specified, it overwrites the filetype in the SDRF for local lookup and matches only the basename. Or: --spectra Path to input spectra as mzML or Thermo Raw --expdesign Path to optional experimental design file (if not given, it assumes unfractionated, unrelated samples) @@ -56,7 +57,7 @@ def helpMessage() { --max_peptide_length Maximum peptide length to consider (default: 40) --instrument Type of instrument that generated the data (currently only 'high_res' [default] and 'low_res' supported) --protocol Used labeling or enrichment protocol (if any) - --fragment_method Used fragmentation method (currently unused since we let the search engines consider all MS2 spectra and let them determine from the spectrum metadata) + --fragment_method Used fragmentation method (currently unused since we let the search engines consider all MS2 spectra and let them determine from the spectrum metadata) --max_mods Maximum number of modifications per peptide. 
If this value is large, the search may take very long --db_debug Debug level during database search From 92bb6091a254ee6921052561c1dc4790e6417f0c Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Mon, 25 May 2020 00:42:54 +0200 Subject: [PATCH 253/374] spell --- docs/usage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/usage.md b/docs/usage.md index 6b9a601..78f1338 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -12,7 +12,7 @@ Either (using a PRIDE Sample to data relation format file): * [`--sdrf`](#--sdrf) * [`--root_folder`](#--root_folder) - * [`--local_input_type`](#--rlocal_input_type) + * [`--local_input_type`](#--local_input_type) Or (using spectrum files and an OpenMS style experimental design): * [`--spectra`](#--spectra) From bdb19a9f01fe790b426588c0e1ec227d11a8aa48 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 5 Jun 2020 10:45:06 +0200 Subject: [PATCH 254/374] use a nightly openms conda package on dev --- environment.yml | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/environment.yml b/environment.yml index ecc7417..b950387 100644 --- a/environment.yml +++ b/environment.yml @@ -1,20 +1,27 @@ # You can use this file to create a conda environment for this pipeline: # conda env create -f environment.yml name: nf-core-proteomicslfq-1.0dev + channels: + - bgruening - conda-forge - bioconda - - defaults dependencies: - - bioconda::openms-thirdparty=2.5.0 # also includes comet, MSGF, Luciphor, TRFP - - bioconda::bioconductor-msstats=3.18.0 # will include R - - bioconda::sdrf-pipelines=0.0.4 # for SDRF conversion - - conda-forge::r-ptxqc=1.0.2 # for QC reports - - conda-forge::fonts-conda-ecosystem=1 # for the fonts in QC reports - - conda-forge::openjdk=8.0.192 # pin java to 8 for MSGF (otherwise it somehow chooses 11) + - bgruening::openms + - bioconda::thermorawfileparser + - bioconda::msgf_plus + - bioconda::comet-ms + - bioconda::luciphor2 + - bioconda::percolator + - bioconda::bioconductor-msstats=3.20.0 # will include R + - bioconda::sdrf-pipelines # for SDRF conversion + - conda-forge::r-ptxqc # for QC reports + - conda-forge::fonts-conda-ecosystem # for the fonts in QC reports + - conda-forge::openjdk=8.* # pin java to 8 for MSGF (otherwise it somehow chooses 11) - conda-forge::python=3.8.1 - conda-forge::markdown=3.2.1 - conda-forge::pymdown-extensions=6.0 - conda-forge::pygments=2.5.2 + From b1f8b3524e15c88852510d815c62bdb08495c2c7 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 5 Jun 2020 13:37:40 +0200 Subject: [PATCH 255/374] [FEATURE] Build docker images in every PR --- .github/workflows/ci.yml | 51 ++++++++++++++++++++++++++++++++++++---- 1 file changed, 46 insertions(+), 5 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 134f8b4..bcff598 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -41,15 +41,25 @@ jobs: if: github.event_name == 'pull_request' shell: bash run: echo "::set-env name=RUN_NAME::PR`jq --raw-output .pull_request.number "$GITHUB_EVENT_PATH"`" - id: extract_pr - - name: Install Nextflow - run: | - wget -qO- get.nextflow.io | bash - sudo mv nextflow /usr/local/bin/ + id: extract_pr + - name: Check if Dockerfile or Conda environment changed + uses: technote-space/get-diff-action@v1 + with: + PREFIX_FILTER: | + Dockerfile + environment.yml + - name: Build new docker image + if: env.GIT_DIFF + run: docker build --no-cache . 
-t nfcore/proteomicslfq:dev - name: Pull docker image + if: ${{ !env.GIT_DIFF }} run: | docker pull nfcore/proteomicslfq:dev docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev + - name: Install Nextflow + run: | + wget -qO- get.nextflow.io | bash + sudo mv nextflow /usr/local/bin/ - name: Run pipeline with test data run: | nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "$RUN_NAME" -profile $TEST_PROFILE,docker @@ -65,3 +75,34 @@ jobs: with: name: nextflow.log path: .nextflow.log + + push_dockerhub: + name: Push new Docker image to Docker Hub + runs-on: ubuntu-latest + # Only run if the tests passed + needs: test + # Only run for the nf-core repo, for releases and merged PRs + if: ${{ github.repository == 'nf-core/proteomicslfq' && (github.event_name == 'release' || github.event_name == 'push') }} + env: + DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }} + DOCKERHUB_PASS: ${{ secrets.DOCKERHUB_PASS }} + steps: + - name: Check out pipeline code + uses: actions/checkout@v2 + + - name: Build new docker image + run: docker build --no-cache . -t nfcore/proteomicslfq:latest + + - name: Push Docker image to DockerHub (dev) + if: ${{ github.event_name == 'push' }} + run: | + echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin + docker tag nfcore/proteomicslfq:latest nfcore/proteomicslfq:dev + docker push nfcore/proteomicslfq:dev + - name: Push Docker image to DockerHub (release) + if: ${{ github.event_name == 'release' }} + run: | + echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin + docker push nfcore/proteomicslfq:latest + docker tag nfcore/proteomicslfq:latest nfcore/proteomicslfq:${{ github.ref }} + docker push nfcore/proteomicslfq:${{ github.ref }} From 8850058f35370c550a24cee44e3b46ce5f3ed32b Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 5 Jun 2020 13:43:43 +0200 Subject: [PATCH 256/374] try to trigger image change --- Dockerfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Dockerfile b/Dockerfile index fa33460..531d339 100644 --- a/Dockerfile +++ b/Dockerfile @@ -9,7 +9,7 @@ RUN conda env create -f /environment.yml && conda clean -a # Add conda installation dir to PATH (instead of doing 'conda activate') ENV PATH /opt/conda/envs/nf-core-proteomicslfq-1.0dev/bin:$PATH -# OpenMS Adapters need the raw jars of Java based bioconda tools in the PATH. Not the wrappers that conda creates. +# OpenMS Adapters need the raw jars of Java-based bioconda tools in the PATH. Not the wrappers that conda creates. RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/msgf_plus-*/MSGFPlus.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0) RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/LuciPHOr2/Luciphor2.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0) From 6687febb183d9c6ed01797e253b8b436a8360284 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 5 Jun 2020 14:06:06 +0200 Subject: [PATCH 257/374] [FIX] luciphor move in Dockerfile --- Dockerfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Dockerfile b/Dockerfile index 531d339..782e176 100644 --- a/Dockerfile +++ b/Dockerfile @@ -11,7 +11,7 @@ ENV PATH /opt/conda/envs/nf-core-proteomicslfq-1.0dev/bin:$PATH # OpenMS Adapters need the raw jars of Java-based bioconda tools in the PATH. Not the wrappers that conda creates. 
RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/msgf_plus-*/MSGFPlus.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0) -RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/LuciPHOr2/Luciphor2.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0) +RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/luciphor2-*/luciphor2.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0) # Dump the details of the installed packages to a file for posterity RUN conda env export --name nf-core-proteomicslfq-1.0dev > nf-core-proteomicslfq-1.0dev.yml From dc93c014387cb357a00fc7c9d49c27a95f22ca6a Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 9 Jun 2020 14:03:03 +0200 Subject: [PATCH 258/374] try installing ptxqc from github --- Dockerfile | 9 +++++++++ environment.yml | 5 ++--- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/Dockerfile b/Dockerfile index 782e176..d58db8a 100644 --- a/Dockerfile +++ b/Dockerfile @@ -13,5 +13,14 @@ ENV PATH /opt/conda/envs/nf-core-proteomicslfq-1.0dev/bin:$PATH RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/msgf_plus-*/MSGFPlus.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0) RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/luciphor2-*/luciphor2.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0) +# ------------- Parts for dev-only (to have nightly versions of some tools) -------------# + +Rscript < nf-core-proteomicslfq-1.0dev.yml diff --git a/environment.yml b/environment.yml index b950387..c0b704f 100644 --- a/environment.yml +++ b/environment.yml @@ -15,7 +15,8 @@ dependencies: - bioconda::percolator - bioconda::bioconductor-msstats=3.20.0 # will include R - bioconda::sdrf-pipelines # for SDRF conversion - - conda-forge::r-ptxqc # for QC reports + # TODO for release, pin to a version. 
For now, install in Dockerfile from github + #- conda-forge::r-ptxqc # for QC reports - conda-forge::fonts-conda-ecosystem # for the fonts in QC reports - conda-forge::openjdk=8.* # pin java to 8 for MSGF (otherwise it somehow chooses 11) - conda-forge::python=3.8.1 @@ -23,5 +24,3 @@ dependencies: - conda-forge::pymdown-extensions=6.0 - conda-forge::pygments=2.5.2 - - From f7dfe3cf772321459d650f21630563b3e1360f3a Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 9 Jun 2020 14:22:32 +0200 Subject: [PATCH 259/374] forgot RUN --- Dockerfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Dockerfile b/Dockerfile index d58db8a..c1d6d6f 100644 --- a/Dockerfile +++ b/Dockerfile @@ -15,7 +15,7 @@ RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/luciphor2-*/luciphor # ------------- Parts for dev-only (to have nightly versions of some tools) -------------# -Rscript < Date: Tue, 9 Jun 2020 14:36:39 +0200 Subject: [PATCH 260/374] no heredoc --- Dockerfile | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/Dockerfile b/Dockerfile index c1d6d6f..825dc95 100644 --- a/Dockerfile +++ b/Dockerfile @@ -15,12 +15,7 @@ RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/luciphor2-*/luciphor # ------------- Parts for dev-only (to have nightly versions of some tools) -------------# -RUN Rscript < nf-core-proteomicslfq-1.0dev.yml From 765f38c79473cc7555e66304d66aeb49c0bfc5f4 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 9 Jun 2020 14:50:46 +0200 Subject: [PATCH 261/374] pipe --- Dockerfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Dockerfile b/Dockerfile index 825dc95..2d340aa 100644 --- a/Dockerfile +++ b/Dockerfile @@ -15,7 +15,7 @@ RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/luciphor2-*/luciphor # ------------- Parts for dev-only (to have nightly versions of some tools) -------------# -RUN Rscript 'if (!require(devtools, quietly = TRUE)) install.packages("devtools"); library("devtools"); install_github("cbielow/PTXQC", build_vignettes = FALSE, dependencies = TRUE)' +RUN echo 'if (!require(devtools, quietly = TRUE)) install.packages("devtools"); library("devtools"); install_github("cbielow/PTXQC", build_vignettes = FALSE, dependencies = TRUE)' | Rscript # Dump the details of the installed packages to a file for posterity RUN conda env export --name nf-core-proteomicslfq-1.0dev > nf-core-proteomicslfq-1.0dev.yml From a796ed01cd5ba9a8f654fdbf930686d8abb7a1bb Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 9 Jun 2020 15:12:43 +0200 Subject: [PATCH 262/374] option --- Dockerfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Dockerfile b/Dockerfile index 2d340aa..846032d 100644 --- a/Dockerfile +++ b/Dockerfile @@ -15,7 +15,7 @@ RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/luciphor2-*/luciphor # ------------- Parts for dev-only (to have nightly versions of some tools) -------------# -RUN echo 'if (!require(devtools, quietly = TRUE)) install.packages("devtools"); library("devtools"); install_github("cbielow/PTXQC", build_vignettes = FALSE, dependencies = TRUE)' | Rscript +RUN echo 'if (!require(devtools, quietly = TRUE)) install.packages("devtools"); library("devtools"); install_github("cbielow/PTXQC", build_vignettes = FALSE, dependencies = TRUE)' | Rscript -e # Dump the details of the installed packages to a file for posterity RUN conda env export --name nf-core-proteomicslfq-1.0dev > nf-core-proteomicslfq-1.0dev.yml From 
adb4d0b699de346950a969968a1d6a7bc445716e Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 9 Jun 2020 15:36:02 +0200 Subject: [PATCH 263/374] ... --- Dockerfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Dockerfile b/Dockerfile index 846032d..c758bed 100644 --- a/Dockerfile +++ b/Dockerfile @@ -15,7 +15,7 @@ RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/luciphor2-*/luciphor # ------------- Parts for dev-only (to have nightly versions of some tools) -------------# -RUN echo 'if (!require(devtools, quietly = TRUE)) install.packages("devtools"); library("devtools"); install_github("cbielow/PTXQC", build_vignettes = FALSE, dependencies = TRUE)' | Rscript -e +RUN Rscript -e 'if (!require(devtools, quietly = TRUE)) install.packages("devtools"); library("devtools"); install_github("cbielow/PTXQC", build_vignettes = FALSE, dependencies = TRUE)' # Dump the details of the installed packages to a file for posterity RUN conda env export --name nf-core-proteomicslfq-1.0dev > nf-core-proteomicslfq-1.0dev.yml From 63770e945c723e96e8d217183872520a76e9ff76 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 9 Jun 2020 15:53:04 +0200 Subject: [PATCH 264/374] add cran mirror --- Dockerfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Dockerfile b/Dockerfile index c758bed..2b0403c 100644 --- a/Dockerfile +++ b/Dockerfile @@ -15,7 +15,7 @@ RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/luciphor2-*/luciphor # ------------- Parts for dev-only (to have nightly versions of some tools) -------------# -RUN Rscript -e 'if (!require(devtools, quietly = TRUE)) install.packages("devtools"); library("devtools"); install_github("cbielow/PTXQC", build_vignettes = FALSE, dependencies = TRUE)' +RUN Rscript -e 'if (!require(devtools, quietly = TRUE)) install.packages("devtools", repos = "http://cran.us.r-project.org"); library("devtools"); install_github("cbielow/PTXQC", build_vignettes = FALSE, dependencies = TRUE)' # Dump the details of the installed packages to a file for posterity RUN conda env export --name nf-core-proteomicslfq-1.0dev > nf-core-proteomicslfq-1.0dev.yml From 4be67cd4e9bc23c6710f2896409d352cc311a424 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 9 Jun 2020 16:29:17 +0200 Subject: [PATCH 265/374] install devtools via conda --- Dockerfile | 2 +- environment.yml | 6 ++++-- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/Dockerfile b/Dockerfile index 2b0403c..dedbffa 100644 --- a/Dockerfile +++ b/Dockerfile @@ -15,7 +15,7 @@ RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/luciphor2-*/luciphor # ------------- Parts for dev-only (to have nightly versions of some tools) -------------# -RUN Rscript -e 'if (!require(devtools, quietly = TRUE)) install.packages("devtools", repos = "http://cran.us.r-project.org"); library("devtools"); install_github("cbielow/PTXQC", build_vignettes = FALSE, dependencies = TRUE)' +RUN Rscript -e 'if (library("devtools"); install_github("cbielow/PTXQC", build_vignettes = FALSE, dependencies = TRUE)' # Dump the details of the installed packages to a file for posterity RUN conda env export --name nf-core-proteomicslfq-1.0dev > nf-core-proteomicslfq-1.0dev.yml diff --git a/environment.yml b/environment.yml index c0b704f..ee1f9d9 100644 --- a/environment.yml +++ b/environment.yml @@ -7,6 +7,7 @@ channels: - conda-forge - bioconda dependencies: + # TODO fix versions for release - bgruening::openms - bioconda::thermorawfileparser - bioconda::msgf_plus @@ -14,10 
+15,11 @@ dependencies: - bioconda::luciphor2 - bioconda::percolator - bioconda::bioconductor-msstats=3.20.0 # will include R - - bioconda::sdrf-pipelines # for SDRF conversion + - bioconda::sdrf-pipelines=0.0.5 # for SDRF conversion # TODO for release, pin to a version. For now, install in Dockerfile from github #- conda-forge::r-ptxqc # for QC reports - - conda-forge::fonts-conda-ecosystem # for the fonts in QC reports + - conda-forge::r-devtools=2.3.0 + - conda-forge::fonts-conda-ecosystem=1 # for the fonts in QC reports - conda-forge::openjdk=8.* # pin java to 8 for MSGF (otherwise it somehow chooses 11) - conda-forge::python=3.8.1 - conda-forge::markdown=3.2.1 From 0ee7868cc12db14cf4b2ce8d232291fb5fd9056b Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 9 Jun 2020 16:44:39 +0200 Subject: [PATCH 266/374] fix --- Dockerfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Dockerfile b/Dockerfile index dedbffa..8ec0e54 100644 --- a/Dockerfile +++ b/Dockerfile @@ -15,7 +15,7 @@ RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/luciphor2-*/luciphor # ------------- Parts for dev-only (to have nightly versions of some tools) -------------# -RUN Rscript -e 'if (library("devtools"); install_github("cbielow/PTXQC", build_vignettes = FALSE, dependencies = TRUE)' +RUN Rscript -e 'library("devtools"); install_github("cbielow/PTXQC", build_vignettes = FALSE, dependencies = TRUE)' # Dump the details of the installed packages to a file for posterity RUN conda env export --name nf-core-proteomicslfq-1.0dev > nf-core-proteomicslfq-1.0dev.yml From 2199e0a4a7c932dbc164198a6dcfdeab59883b73 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 9 Jun 2020 17:02:58 +0200 Subject: [PATCH 267/374] add pandoc --- environment.yml | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/environment.yml b/environment.yml index ee1f9d9..218524b 100644 --- a/environment.yml +++ b/environment.yml @@ -16,9 +16,11 @@ dependencies: - bioconda::percolator - bioconda::bioconductor-msstats=3.20.0 # will include R - bioconda::sdrf-pipelines=0.0.5 # for SDRF conversion - # TODO for release, pin to a version. For now, install in Dockerfile from github + # ---- TODO for release, pin to a version. 
For now, install in Dockerfile from github #- conda-forge::r-ptxqc # for QC reports - conda-forge::r-devtools=2.3.0 + - conda-forge::pandoc=2.9.2.1 + # ------------------------------------------------------------------------------------ - conda-forge::fonts-conda-ecosystem=1 # for the fonts in QC reports - conda-forge::openjdk=8.* # pin java to 8 for MSGF (otherwise it somehow chooses 11) - conda-forge::python=3.8.1 From 4d04d846da43ca22699557974f9f1b6d6fd2095f Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 9 Jun 2020 18:51:36 +0200 Subject: [PATCH 268/374] revert --- Dockerfile | 4 ---- environment.yml | 6 +----- 2 files changed, 1 insertion(+), 9 deletions(-) diff --git a/Dockerfile b/Dockerfile index 8ec0e54..782e176 100644 --- a/Dockerfile +++ b/Dockerfile @@ -13,9 +13,5 @@ ENV PATH /opt/conda/envs/nf-core-proteomicslfq-1.0dev/bin:$PATH RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/msgf_plus-*/MSGFPlus.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0) RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/luciphor2-*/luciphor2.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0) -# ------------- Parts for dev-only (to have nightly versions of some tools) -------------# - -RUN Rscript -e 'library("devtools"); install_github("cbielow/PTXQC", build_vignettes = FALSE, dependencies = TRUE)' - # Dump the details of the installed packages to a file for posterity RUN conda env export --name nf-core-proteomicslfq-1.0dev > nf-core-proteomicslfq-1.0dev.yml diff --git a/environment.yml b/environment.yml index 218524b..9cd931a 100644 --- a/environment.yml +++ b/environment.yml @@ -16,11 +16,7 @@ dependencies: - bioconda::percolator - bioconda::bioconductor-msstats=3.20.0 # will include R - bioconda::sdrf-pipelines=0.0.5 # for SDRF conversion - # ---- TODO for release, pin to a version. 
For now, install in Dockerfile from github - #- conda-forge::r-ptxqc # for QC reports - - conda-forge::r-devtools=2.3.0 - - conda-forge::pandoc=2.9.2.1 - # ------------------------------------------------------------------------------------ + - conda-forge::r-ptxqc=1.0.5 # for QC reports - conda-forge::fonts-conda-ecosystem=1 # for the fonts in QC reports - conda-forge::openjdk=8.* # pin java to 8 for MSGF (otherwise it somehow chooses 11) - conda-forge::python=3.8.1 From 25de7756c1cbf65f3948fa28f5e5784e32f67bbc Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 9 Jun 2020 19:25:38 +0200 Subject: [PATCH 269/374] conda fix --- environment.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/environment.yml b/environment.yml index 9cd931a..925ecce 100644 --- a/environment.yml +++ b/environment.yml @@ -17,6 +17,7 @@ dependencies: - bioconda::bioconductor-msstats=3.20.0 # will include R - bioconda::sdrf-pipelines=0.0.5 # for SDRF conversion - conda-forge::r-ptxqc=1.0.5 # for QC reports + - conda-forge::xorg-libxt=1.2.0 # until this R fix is merged: https://github.com/conda-forge/r-base-feedstock/pull/128 - conda-forge::fonts-conda-ecosystem=1 # for the fonts in QC reports - conda-forge::openjdk=8.* # pin java to 8 for MSGF (otherwise it somehow chooses 11) - conda-forge::python=3.8.1 From c609fa657b9a0278fd930b668675c02814e3be74 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 9 Jun 2020 22:17:05 +0200 Subject: [PATCH 270/374] wrong search engine for mini test --- conf/test.config | 2 +- conf/test_full.config | 2 +- conf/test_localize.config | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/conf/test.config b/conf/test.config index 1e9b5af..901ab90 100644 --- a/conf/test.config +++ b/conf/test.config @@ -28,7 +28,7 @@ params { database = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/18Protein_SoCe_Tr_detergents_trace_target_decoy.fasta' expdesign = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA_design.tsv' posterior_probabilities = "fit_distributions" - search_engine = "msgf" + search_engines = "msgf" protein_level_fdr_cutoff = 1.0 decoy_affix = "rev" enable_qc = true diff --git a/conf/test_full.config b/conf/test_full.config index 9e157ce..74b9051 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -25,7 +25,7 @@ params { database = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata-aws/uniprot_yeast_reviewed_isoforms_ups1_crap.fasta_td.fasta' expdesign = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata-aws/experimental_design_short.tsv' posterior_probabilities = "percolator" - search_engine = "comet" + search_engines = "comet" psm_pep_fdr_cutoff = 0.01 decoy_affix = "rev" protein_inference = "bayesian" diff --git a/conf/test_localize.config b/conf/test_localize.config index 077ea42..d846bb6 100644 --- a/conf/test_localize.config +++ b/conf/test_localize.config @@ -30,7 +30,7 @@ params { expdesign = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA_design.tsv' posterior_probabilities = "fit_distributions" enable_mod_localization = true - search_engine = "msgf" + search_engines = "msgf" protein_level_fdr_cutoff = 1.0 decoy_affix = "rev" enable_qc = true From 18bf4f90290cf66a8a7ecd4894b8b577f45fb332 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 6 Jul 2020 15:47:54 +0200 Subject: [PATCH 271/374] [FEATURE] Added json schema --- nextflow_schema.json | 588 
+++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 588 insertions(+) create mode 100644 nextflow_schema.json diff --git a/nextflow_schema.json b/nextflow_schema.json new file mode 100644 index 0000000..b6be87d --- /dev/null +++ b/nextflow_schema.json @@ -0,0 +1,588 @@ +{ + "$schema": "https://json-schema.org/draft-07/schema", + "$id": "https://raw.githubusercontent.com/nf-core/proteomicslfq/master/nextflow_schema.json", + "title": "nf-core/proteomicslfq pipeline parameters", + "description": "Proteomics label-free quantification (LFQ) analysis pipeline using OpenMS and MSstats, with feature quantification, feature summarization, quality control and group-based statistical analysis.", + "type": "object", + "properties": { + "Input/output options": { + "type": "object", + "properties": { + "outdir": { + "type": "string", + "description": "The output directory where the results will be saved.", + "default": "./results", + "fa_icon": "fas fa-folder-open", + "help_text": "" + }, + "email": { + "type": "string", + "description": "Email address for completion summary.", + "fa_icon": "fas fa-envelope", + "help_text": "An email address to send a summary email to when the pipeline is completed.", + "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$" + } + }, + "fa_icon": "fas fa-terminal", + "description": "Define where the pipeline should find input data and save output data." + }, + "Generic options": { + "type": "object", + "properties": { + "help": { + "type": "boolean", + "description": "Display help text.", + "hidden": true, + "fa_icon": "fas fa-question-circle", + "default": false + }, + "name": { + "type": "string", + "description": "Workflow name.", + "fa_icon": "fas fa-fingerprint", + "hidden": true, + "help_text": "A custom name for the pipeline run. Unlike the core nextflow `-name` option with one hyphen this parameter can be reused multiple times, for example if using `-resume`. Passed through to steps such as MultiQC and used for things like report filenames and titles." + }, + "email_on_fail": { + "type": "string", + "description": "Email address for completion summary, only when pipeline fails.", + "fa_icon": "fas fa-exclamation-triangle", + "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$", + "hidden": true, + "help_text": "An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully." + }, + "plaintext_email": { + "type": "boolean", + "description": "Send plain-text email instead of HTML.", + "fa_icon": "fas fa-remove-format", + "hidden": true, + "default": false, + "help_text": "" + }, + "monochrome_logs": { + "type": "boolean", + "description": "Do not use coloured log outputs.", + "fa_icon": "fas fa-palette", + "hidden": true, + "default": false, + "help_text": "" + }, + "tracedir": { + "type": "string", + "description": "Directory to keep pipeline Nextflow logs and reports.", + "default": "${params.outdir}/pipeline_info", + "fa_icon": "fas fa-cogs", + "hidden": true, + "help_text": "" + } + }, + "fa_icon": "fas fa-file-import", + "description": "Less common options for the pipeline, typically set in a config file.", + "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`." 
+ }, + "Max job request options": { + "type": "object", + "properties": { + "max_cpus": { + "type": "integer", + "description": "Maximum number of CPUs that can be requested for any single job.", + "default": 16, + "fa_icon": "fas fa-microchip", + "hidden": true, + "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" + }, + "max_memory": { + "type": "string", + "description": "Maximum amount of memory that can be requested for any single job.", + "default": "128.GB", + "fa_icon": "fas fa-memory", + "hidden": true, + "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" + }, + "max_time": { + "type": "string", + "description": "Maximum amount of time that can be requested for any single job.", + "default": "240.h", + "fa_icon": "far fa-clock", + "hidden": true, + "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`" + } + }, + "fa_icon": "fab fa-acquisitions-incorporated", + "description": "Set the top limit for requested resources for any single job.", + "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause the Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. See [the nf-core website](https://nf-co.re/usage/configuration) for details." + }, + "Institutional config options": { + "type": "object", + "properties": { + "custom_config_version": { + "type": "string", + "description": "Git commit id for Institutional configs.", + "default": "master", + "hidden": true, + "fa_icon": "fas fa-users-cog", + "help_text": "" + }, + "custom_config_base": { + "type": "string", + "description": "Base directory for Institutional configs.", + "default": "https://raw.githubusercontent.com/nf-core/configs/master", + "hidden": true, + "help_text": "If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.", + "fa_icon": "fas fa-users-cog" + }, + "hostnames": { + "type": "string", + "description": "Institutional configs hostname.", + "hidden": true, + "fa_icon": "fas fa-users-cog", + "help_text": "" + }, + "config_profile_description": { + "type": "string", + "description": "Institutional config description.", + "hidden": true, + "fa_icon": "fas fa-users-cog", + "help_text": "" + }, + "config_profile_contact": { + "type": "string", + "description": "Institutional config contact information.", + "hidden": true, + "fa_icon": "fas fa-users-cog", + "help_text": "" + }, + "config_profile_url": { + "type": "string", + "description": "Institutional config URL link.", + "hidden": true, + "fa_icon": "fas fa-users-cog", + "help_text": "" + } + }, + "fa_icon": "fas fa-university", + "description": "Parameters used to describe centralised config profiles. 
These should not be edited.", + "help_text": "The centralised nf-core configuration profiles use a handful of pipeline parameters to describe themselves. This information is then printed to the Nextflow log when you run a pipeline. You should not need to change these values when you run a pipeline." + }, + "Main parameters (SDRF)": { + "type": "object", + "description": "", + "default": "", + "properties": { + "sdrf": { + "type": "string", + "fa_icon": "fas fa-vials" + }, + "root_folder": { + "type": "string", + "fa_icon": "fas fa-folder" + }, + "local_input_type": { + "type": "string", + "fa_icon": "fas fa-file-invoice" + } + }, + "fa_icon": "far fa-chart-bar" + }, + "Main parameters (TSV)": { + "type": "object", + "description": "", + "default": "", + "properties": { + "expdesign": { + "type": "string", + "fa_icon": "fas fa-file-csv" + }, + "spectra": { + "type": "string", + "fa_icon": "fas fa-copy" + } + }, + "fa_icon": "far fa-chart-bar" + }, + "Protein database": { + "type": "object", + "description": "", + "default": "", + "properties": { + "database": { + "type": "string", + "fa_icon": "fas fa-file" + }, + "add_decoys": { + "type": "string", + "fa_icon": "fas fa-coins" + }, + "decoy_affix": { + "type": "string", + "default": "DECOY_", + "fa_icon": "fas fa-font" + }, + "affix_type": { + "type": "string", + "default": "prefix", + "fa_icon": "fas fa-list-ol" + } + }, + "fa_icon": "fas fa-database", + "required": [ + "database" + ] + }, + "Spectrum preprocessing": { + "type": "object", + "description": "", + "default": "", + "properties": { + "openms_peakpicking": { + "type": "string", + "fa_icon": "far fa-check-square" + }, + "peakpicking_inmemory": { + "type": "string", + "fa_icon": "far fa-check-square" + }, + "peakpicking_ms_levels": { + "type": "string", + "fa_icon": "fas fa-font" + } + }, + "fa_icon": "far fa-chart-bar" + }, + "Database search": { + "type": "object", + "description": "", + "default": "", + "properties": { + "search_engines": { + "type": "string", + "default": "comet", + "fa_icon": "fas fa-tasks" + }, + "enzyme": { + "type": "string", + "default": "Trypsin", + "fa_icon": "fas fa-list-ol" + }, + "num_enzyme_termini": { + "type": "string", + "fa_icon": "fas fa-list-ol" + }, + "allowed_missed_cleavages": { + "type": "integer", + "default": 2, + "fa_icon": "fas fa-sliders-h" + }, + "precursor_mass_tolerance": { + "type": "integer", + "default": 5, + "fa_icon": "fas fa-sliders-h" + }, + "precursor_mass_tolerance_unit": { + "type": "string", + "default": "ppm", + "fa_icon": "fas fa-sliders-h" + }, + "fragment_mass_tolerance": { + "type": "number", + "default": 0.03, + "fa_icon": "fas fa-sliders-h" + }, + "fragment_mass_tolerance_unit": { + "type": "string", + "default": "Da", + "fa_icon": "fas fa-list-ol" + }, + "fixed_mods": { + "type": "string", + "default": "Carbamidomethyl (C)", + "fa_icon": "fas fa-tasks" + }, + "variable_mods": { + "type": "string", + "default": "Oxidation (M)", + "fa_icon": "fas fa-tasks" + }, + "fragment_method": { + "type": "string", + "default": "HCD", + "fa_icon": "fas fa-list-ol" + }, + "isotope_error_range": { + "type": "string", + "default": "0,1", + "fa_icon": "fas fa-tasks" + }, + "instrument": { + "type": "string", + "fa_icon": "fas fa-list-ol" + }, + "protocol": { + "type": "string", + "default": "automatic", + "fa_icon": "fas fa-list-ol" + }, + "min_precursor_charge": { + "type": "integer", + "default": 2, + "fa_icon": "fas fa-sliders-h" + }, + "max_precursor_charge": { + "type": "integer", + "default": 4, + "fa_icon": "fas 
fa-sliders-h" + }, + "min_peptide_length": { + "type": "integer", + "default": 6, + "fa_icon": "fas fa-sliders-h" + }, + "max_peptide_length": { + "type": "integer", + "default": 40, + "fa_icon": "fas fa-sliders-h" + }, + "num_hits": { + "type": "integer", + "default": 1, + "fa_icon": "fas fa-sliders-h" + }, + "max_mods": { + "type": "integer", + "default": 3, + "fa_icon": "fas fa-sliders-h" + }, + "db_debug": { + "type": "integer", + "default": 0, + "fa_icon": "fas fa-bug" + } + }, + "fa_icon": "fas fa-search" + }, + "Modification localization": { + "type": "object", + "description": "", + "default": "", + "properties": { + "enable_mod_localization": { + "type": "string", + "fa_icon": "fas fa-toggle-on" + }, + "mod_localization": { + "type": "string", + "default": "Phospho (S),Phospho (T),Phospho (Y)", + "fa_icon": "fas fa-tasks" + } + }, + "fa_icon": "fas fa-search-location" + }, + "Peptide re-indexing": { + "type": "object", + "description": "", + "default": "", + "properties": { + "allow_unmatched": { + "type": "string", + "fa_icon": "far fa-check-square" + }, + "IL_equivalent": { + "type": "string", + "default": "true", + "fa_icon": "far fa-check-square" + } + }, + "fa_icon": "fas fa-project-diagram" + }, + "PSM re-scoring (general)": { + "type": "object", + "description": "", + "default": "", + "properties": { + "posterior_probabilities": { + "type": "string", + "fa_icon": "fas fa-list-ol" + }, + "psm_pep_fdr_cutoff": { + "type": "number", + "default": 0.1, + "fa_icon": "fas fa-filter" + }, + "pp_debug": { + "type": "integer", + "default": 0, + "fa_icon": "fas fa-bug" + } + }, + "fa_icon": "fas fa-star-half-alt" + }, + "PSM re-scoring (Percolator)": { + "type": "object", + "description": "", + "default": "", + "properties": { + "FDR_level": { + "type": "string", + "default": "peptide-level-fdrs", + "fa_icon": "fas fa-list-ol" + }, + "train_FDR": { + "type": "number", + "default": 0.05, + "fa_icon": "fas fa-sliders-h" + }, + "test_FDR": { + "type": "number", + "default": 0.05, + "fa_icon": "fas fa-sliders-h" + }, + "subset_max_train": { + "type": "integer", + "default": 300000, + "fa_icon": "fas fa-sliders-h" + }, + "klammer": { + "type": "string", + "fa_icon": "far fa-check-square" + }, + "description_correct_features": { + "type": "integer", + "default": 0, + "fa_icon": "fas fa-list-ol" + } + }, + "fa_icon": "fas fa-star-half" + }, + "PSM re-scoring (distribution fitting)": { + "type": "object", + "description": "", + "default": "", + "properties": { + "outlier_handling": { + "type": "string", + "default": "none", + "fa_icon": "fas fa-list-ol" + }, + "new_param_2": { + "type": "string", + "description": "", + "default": "" + } + }, + "fa_icon": "far fa-star-half" + }, + "protein_inference": { + "type": "string", + "default": "aggregation", + "fa_icon": "fas fa-list-ol" + }, + "protein_level_fdr_cutoff": { + "type": "number", + "default": 0.05, + "fa_icon": "fas fa-filter" + }, + "consensusid_algorithm": { + "type": "string", + "default": "best", + "fa_icon": "fas fa-list-ol" + }, + "min_consensus_support": { + "type": "integer", + "default": 0, + "fa_icon": "fas fa-filter" + }, + "consensusid_considered_top_hits": { + "type": "integer", + "default": 0, + "fa_icon": "fas fa-sliders-h" + }, + "luciphor_neutral_losses": { + "type": "string", + "fa_icon": "fas fa-font", + "hidden": true + }, + "luciphor_decoy_mass": { + "type": "number", + "fa_icon": "fas fa-font", + "hidden": true + }, + "luciphor_decoy_neutral_losses": { + "type": "string", + "fa_icon": "fas fa-font", + "hidden": true 
+ }, + "inf_quant_debug": { + "type": "integer", + "default": 0, + "fa_icon": "fas fa-bug" + }, + "targeted_only": { + "type": "string", + "default": "true", + "fa_icon": "far fa-check-square" + }, + "mass_recalibration": { + "type": "string", + "fa_icon": "far fa-check-square" + }, + "transfer_ids": { + "type": "string", + "fa_icon": "far fa-check-square" + }, + "skip_post_msstats": { + "type": "string", + "fa_icon": "fas fa-forward" + }, + "ref_condition": { + "type": "string", + "fa_icon": "fas fa-font" + }, + "contrasts": { + "type": "string", + "fa_icon": "fas fa-font" + }, + "enable_qc": { + "type": "string", + "fa_icon": "fas fa-toggle-on" + }, + "ptxqc_report_layout": { + "type": "string", + "fa_icon": "far fa-file" + }, + "Consensus ID": { + "type": "object", + "description": "", + "default": "", + "properties": {}, + "fa_icon": "fas fa-code-branch" + }, + "Protein inference": { + "type": "object", + "description": "", + "default": "", + "properties": {}, + "fa_icon": "fab fa-hubspot" + }, + "Protein quantification": { + "type": "object", + "description": "", + "default": "", + "properties": {}, + "fa_icon": "fas fa-braille" + }, + "Statistical post-processing": { + "type": "object", + "description": "", + "default": "", + "properties": {}, + "fa_icon": "fab fa-r-project" + }, + "Quality control": { + "type": "object", + "description": "", + "default": "", + "properties": {}, + "fa_icon": "fas fa-file-medical-alt" + } + } +} \ No newline at end of file From 87c63ae77538f9b0cb7d8967353168278a2b5096 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Mon, 6 Jul 2020 16:35:07 +0200 Subject: [PATCH 272/374] smaller fixes --- nextflow_schema.json | 176 +++++++++++++++++++++---------------------- 1 file changed, 88 insertions(+), 88 deletions(-) diff --git a/nextflow_schema.json b/nextflow_schema.json index b6be87d..49f4ac3 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -373,6 +373,21 @@ "type": "string", "default": "Phospho (S),Phospho (T),Phospho (Y)", "fa_icon": "fas fa-tasks" + }, + "luciphor_neutral_losses": { + "type": "string", + "fa_icon": "fas fa-font", + "hidden": true + }, + "luciphor_decoy_mass": { + "type": "number", + "fa_icon": "fas fa-font", + "hidden": true + }, + "luciphor_decoy_neutral_losses": { + "type": "string", + "fa_icon": "fas fa-font", + "hidden": true } }, "fa_icon": "fas fa-search-location" @@ -462,126 +477,111 @@ "type": "string", "default": "none", "fa_icon": "fas fa-list-ol" - }, - "new_param_2": { - "type": "string", - "description": "", - "default": "" } }, "fa_icon": "far fa-star-half" }, - "protein_inference": { - "type": "string", - "default": "aggregation", - "fa_icon": "fas fa-list-ol" - }, - "protein_level_fdr_cutoff": { - "type": "number", - "default": 0.05, - "fa_icon": "fas fa-filter" - }, - "consensusid_algorithm": { - "type": "string", - "default": "best", - "fa_icon": "fas fa-list-ol" - }, - "min_consensus_support": { - "type": "integer", - "default": 0, - "fa_icon": "fas fa-filter" - }, - "consensusid_considered_top_hits": { - "type": "integer", - "default": 0, - "fa_icon": "fas fa-sliders-h" - }, - "luciphor_neutral_losses": { - "type": "string", - "fa_icon": "fas fa-font", - "hidden": true - }, - "luciphor_decoy_mass": { - "type": "number", - "fa_icon": "fas fa-font", - "hidden": true - }, - "luciphor_decoy_neutral_losses": { - "type": "string", - "fa_icon": "fas fa-font", - "hidden": true - }, - "inf_quant_debug": { - "type": "integer", - "default": 0, - "fa_icon": "fas fa-bug" - }, - "targeted_only": { - "type": 
"string", - "default": "true", - "fa_icon": "far fa-check-square" - }, - "mass_recalibration": { - "type": "string", - "fa_icon": "far fa-check-square" - }, - "transfer_ids": { - "type": "string", - "fa_icon": "far fa-check-square" - }, - "skip_post_msstats": { - "type": "string", - "fa_icon": "fas fa-forward" - }, - "ref_condition": { - "type": "string", - "fa_icon": "fas fa-font" - }, - "contrasts": { - "type": "string", - "fa_icon": "fas fa-font" - }, - "enable_qc": { - "type": "string", - "fa_icon": "fas fa-toggle-on" - }, - "ptxqc_report_layout": { - "type": "string", - "fa_icon": "far fa-file" - }, "Consensus ID": { "type": "object", "description": "", "default": "", - "properties": {}, + "properties": { + "consensusid_algorithm": { + "type": "string", + "default": "best", + "fa_icon": "fas fa-list-ol" + }, + "consensusid_considered_top_hits": { + "type": "integer", + "default": 0, + "fa_icon": "fas fa-sliders-h" + }, + "min_consensus_support": { + "type": "integer", + "default": 0, + "fa_icon": "fas fa-filter" + } + }, "fa_icon": "fas fa-code-branch" }, "Protein inference": { "type": "object", "description": "", "default": "", - "properties": {}, + "properties": { + "protein_inference": { + "type": "string", + "default": "aggregation", + "fa_icon": "fas fa-list-ol" + }, + "protein_level_fdr_cutoff": { + "type": "number", + "default": 0.05, + "fa_icon": "fas fa-filter" + } + }, "fa_icon": "fab fa-hubspot" }, "Protein quantification": { "type": "object", "description": "", "default": "", - "properties": {}, + "properties": { + "mass_recalibration": { + "type": "string", + "fa_icon": "far fa-check-square" + }, + "transfer_ids": { + "type": "string", + "fa_icon": "far fa-check-square" + }, + "targeted_only": { + "type": "string", + "default": "true", + "fa_icon": "far fa-check-square" + }, + "inf_quant_debug": { + "type": "integer", + "default": 0, + "fa_icon": "fas fa-bug" + } + }, "fa_icon": "fas fa-braille" }, "Statistical post-processing": { "type": "object", "description": "", "default": "", - "properties": {}, + "properties": { + "skip_post_msstats": { + "type": "string", + "fa_icon": "fas fa-forward" + }, + "ref_condition": { + "type": "string", + "fa_icon": "fas fa-font" + }, + "contrasts": { + "type": "string", + "fa_icon": "fas fa-font" + } + }, "fa_icon": "fab fa-r-project" }, "Quality control": { "type": "object", "description": "", "default": "", - "properties": {}, + "properties": { + "enable_qc": { + "type": "string", + "fa_icon": "fas fa-toggle-on" + }, + "ptxqc_report_layout": { + "type": "string", + "fa_icon": "far fa-file" + } + }, "fa_icon": "fas fa-file-medical-alt" } } From e4f35c7db44944aa47cf0e09a30d521dea38a176 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 7 Jul 2020 13:31:01 +0200 Subject: [PATCH 273/374] Start copying over descriptions from usage.md --- nextflow_schema.json | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/nextflow_schema.json b/nextflow_schema.json index 49f4ac3..2dc079c 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -166,20 +166,26 @@ }, "Main parameters (SDRF)": { "type": "object", - "description": "", + "description": "The input to the pipeline can be specified in two **mutually exclusive** ways:\nHere by using a path or URI to a PRIDE Sample to data relation format file (SDRF), e.g. as part of a submitted and\nannotated PRIDE experiment (see here TODO for examples). 
Alternatively, you can use the [TSV options](#main_parameters__tsv_)", "default": "", "properties": { "sdrf": { "type": "string", - "fa_icon": "fas fa-vials" + "description": "The URI or path to the SDRF file", + "fa_icon": "fas fa-vials", + "help_text": "The URI or path to the SDRF file. Input files will be downloaded and cached from the URIs specified in the SDRF file.\nAn OpenMS-style experimental design will be generated based on the factor columns of the SDRF. The settings for the\nfollowing parameters will currently be overwritten by the ones specified in the SDRF:\n\n* `fixed_mods`,\n* `variable_mods`,\n* `precursor_mass_tolerance`,\n* `precursor_mass_tolerance_unit`,\n* `fragment_mass_tolerance`,\n* `fragment_mass_tolerance_unit`,\n* `fragment_method`,\n* `enzyme`" }, "root_folder": { "type": "string", - "fa_icon": "fas fa-folder" + "description": "Root folder in which the spectrum files specified in the SDRF are searched", + "fa_icon": "fas fa-folder", + "help_text": "This optional parameter can be used to specify a root folder in which the spectrum files specified in the SDRF are searched.\nIt is usually used if you have a local version of the experiment already. Note that this option does not support recursive\nsearching yet." }, "local_input_type": { "type": "string", - "fa_icon": "fas fa-file-invoice" + "description": "Overwrite the file type/extension of the filename as specified in the SDRF", + "fa_icon": "fas fa-file-invoice", + "help_text": "If the above [`--root_folder`](#params_root_folder) was given to load local input files, this overwrites the file type/extension of\nthe filename as specified in the SDRF. Usually used in case you have an mzML-converted version of the files already. Needs to be\none of 'mzML' or 'raw' (the letter cases should match your files exactly)." } }, "fa_icon": "far fa-chart-bar" From bca805985f2083b7054bb2405c6101904aac5957 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 14 Jul 2020 21:46:26 +0200 Subject: [PATCH 274/374] Filled more parameter --- nextflow_schema.json | 61 +++++++++++++++++++++++++++++++------------- 1 file changed, 43 insertions(+), 18 deletions(-) diff --git a/nextflow_schema.json b/nextflow_schema.json index 2dc079c..74ed6be 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -166,7 +166,7 @@ }, "Main parameters (SDRF)": { "type": "object", - "description": "The input to the pipeline can be specified in two **mutually exclusive** ways:\nHere by using a path or URI to a PRIDE Sample to data relation format file (SDRF), e.g. as part of a submitted and\nannotated PRIDE experiment (see here TODO for examples). Alternatively, you can use the [TSV options](#main_parameters__tsv_)", + "description": "The input to the pipeline can be specified in two **mutually exclusive** ways:\nHere by using a path or URI to a PRIDE Sample to Data Relation Format file (SDRF), e.g. as part of a submitted and\nannotated PRIDE experiment (see [here](https://github.com/bigbio/proteomics-metadata-standard/tree/master/annotated-projects) for examples). Alternatively, you can use the [TSV options](#main_parameters__tsv_)", "default": "", "properties": { "sdrf": { @@ -192,7 +192,7 @@ }, "Main parameters (TSV)": { "type": "object", - "description": "", + "description": "The input to the pipeline can be specified in two **mutually exclusive** ways:\nHere by specifying globbing patterns to the input spectrum files in Thermo RAW or mzML format and a manual OpenMS-style experimental design file. 
Alternatively, you can use the [SDRF options](#main_parameters__sdrf_)", "default": "", "properties": { "expdesign": { "type": "string", "fa_icon": "fas fa-file" }, "spectra": { "type": "string", - "fa_icon": "fas fa-copy" + "description": "Location of mzML or Thermo RAW files", + "fa_icon": "fas fa-copy", + "help_text": "Use this to specify the location of your input mzML or Thermo RAW files:\n\n```bash\n--spectra 'path/to/data/*.mzML'\n```\n\nor\n\n```bash\n--spectra 'path/to/data/*.raw'\n```\n\nPlease note the following requirements:\n\n1. The path must be enclosed in quotes\n2. The path must have at least one `*` wildcard character" } }, "fa_icon": "far fa-chart-bar" }, "Protein database": { "type": "object", - "description": "", + "description": "Settings that relate to the mandatory protein database and the optional generation of decoy entries.", "default": "", "properties": { "database": { "type": "string", - "fa_icon": "fas fa-file" + "description": "The `fasta` protein database used during database search.", + "fa_icon": "fas fa-file", + "help_text": "Since the database is not included in an SDRF, this parameter always needs to be given to specify the input protein database\nwhen you run the pipeline. Remember to include contaminants (and decoys if not added in the pipeline with \\-\\-add-decoys)\n\n```bash\n--database '[path to Fasta protein database]'\n```" }, "add_decoys": { "type": "string", - "fa_icon": "fas fa-coins" + "description": "Generate and append decoys to the given protein database", + "fa_icon": "fas fa-coins", + "help_text": "If decoys were not yet included in the input database, they have to be appended by OpenMS DecoyGenerator by adding this flag (TODO allow specifying type).\nDefault: pseudo-reverse peptides" }, "decoy_affix": { "type": "string", + "description": "Pre- or suffix of decoy proteins in their accession", "default": "DECOY_", - "fa_icon": "fas fa-font" + "fa_icon": "fas fa-font", + "help_text": "If [`--add-decoys`](#params_add_decoys) was set, this setting is used during generation and passed to all tools that need decoy information.\n If decoys were appended to the database externally, this setting needs to match the used affix. (While OpenMS tools can infer the affix automatically, some third-party tools might not.)\nTypical values are 'rev', 'decoy', 'dec'. Look for them in your database." }, "affix_type": { "type": "string", + "description": "Location of the decoy marker string in the fasta accession. Before (prefix) or after (suffix)", "default": "prefix", - "fa_icon": "fas fa-list-ol" + "fa_icon": "fas fa-list-ol", + "help_text": "Prefix is highly recommended. Only in case an external tool marked decoys with a suffix, e.g. `sp|Q12345|ProteinA_DECOY`, change this parameter to suffix." } }, "fa_icon": "fas fa-database", @@ -237,20 +247,26 @@ }, "Spectrum preprocessing": { "type": "object", - "description": "", + "description": "In case you start from profile mode mzMLs or the internal preprocessing during conversion with the ThermoRawFileParser fails (e.g. due to new instrument types), preprocessing has to be performed with OpenMS. Use this section to configure.", "default": "", "properties": { "openms_peakpicking": { - "type": "string", - "fa_icon": "far fa-check-square" + "type": "boolean", + "description": "Activate OpenMS-internal peak picking", + "fa_icon": "far fa-check-square", + "help_text": "Activate OpenMS-internal peak picking with the tool PeakPickerHiRes. Skips already picked spectra."
}, + "peakpicking_inmemory": { + "type": "boolean", + "description": "Perform peakpicking in memory", + "fa_icon": "far fa-check-square", + "help_text": "Perform peakpicking in memory. Use only if problems occur." }, "peakpicking_ms_levels": { "type": "string", - "fa_icon": "fas fa-font" + "description": "Which MS levels to pick as comma separated list. Leave empty for auto-detection.", + "fa_icon": "fas fa-font", + "help_text": "Which MS levels to pick as comma separated list, e.g. `--peakpicking_ms_levels 1,2`. Leave empty for auto-detection." } }, "fa_icon": "far fa-chart-bar" }, @@ -262,22 +278,31 @@ "properties": { "search_engines": { "type": "string", + "description": "A comma separated list of search engines. Valid: comet, msgf", "default": "comet", - "fa_icon": "fas fa-tasks" + "fa_icon": "fas fa-tasks", + "help_text": "A comma-separated list of search engines to run in parallel on each mzML file. Currently supported: comet and msgf (default: comet).\nIf more than one search engine is given, results are combined based on posterior error probabilities (see the different types\nof estimation procedures under [`--posterior_probabilities`](#--posterior_probabilities)). Combination is done with\n[ConsensusID](https://abibuilder.informatik.uni-tuebingen.de/archive/openms/Documentation/release/latest/html/TOPP_ConsensusID.html).\nSee also its corresponding [`--consensusid_algorithm`](#--consensusid_algorithm) parameter for different combination strategies.\nCombinations may profit from an increased [`--num_hits`](#--num_hits) parameter." }, "enzyme": { "type": "string", + "description": "The enzyme to be used for in-silico digestion", "default": "Trypsin", - "fa_icon": "fas fa-list-ol" + "fa_icon": "fas fa-list-ol", + "help_text": "Specify which enzymatic restriction should be applied, e.g. 'unspecific cleavage', 'Trypsin' (default), see OpenMS\n[enzymes](https://github.com/OpenMS/OpenMS/blob/develop/share/OpenMS/CHEMISTRY/Enzymes.xml). Note: MSGF does not support extended\ncutting rules, as used by default with `Trypsin`. I.e. if you specify `Trypsin` with MSGF, it will be automatically converted to\n`Trypsin/P`= 'Trypsin without proline rule'." }, "num_enzyme_termini": { "type": "string", - "fa_icon": "fas fa-list-ol" + "description": "Specify the amount of termini matching the enzyme cutting rules for a peptide to be considered. Valid values are `fully` (default), `semi`, or `none`", + "default": "fully", + "fa_icon": "fas fa-list-ol", + "help_text": "" }, "allowed_missed_cleavages": { "type": "integer", + "description": "Specify the maximum number of allowed missed enzyme cleavages in a peptide.
The parameter is not applied if `unspecific cleavage` is specified as enzyme.", "default": 2, - "fa_icon": "fas fa-sliders-h" + "fa_icon": "fas fa-sliders-h", + "help_text": "" }, "precursor_mass_tolerance": { "type": "integer", From ad43956968b741ed4e56ca62deab31da7a1ca0ad Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 15 Jul 2020 16:52:49 +0200 Subject: [PATCH 275/374] [FIX] Switch scores before luciphor --- main.nf | 64 +++++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 51 insertions(+), 13 deletions(-) diff --git a/main.nf b/main.nf index 052a94a..3962305 100644 --- a/main.nf +++ b/main.nf @@ -680,7 +680,7 @@ process percolator { tuple mzml_id, file(id_file) from id_files_idx_feat output: - tuple mzml_id, file("${id_file.baseName}_perc.idXML"), val("MS:1001491") into id_files_perc, id_files_perc_consID + tuple mzml_id, file("${id_file.baseName}_perc.idXML"), val("MS:1001491"), val("pep") into id_files_perc, id_files_perc_consID file "*.log" when: @@ -758,7 +758,7 @@ process idpep { tuple mzml_id, file(id_file) from id_files_idx_ForIDPEP_FDR.mix(id_files_idx_ForIDPEP_noFDR) output: - tuple mzml_id, file("${id_file.baseName}_idpep.idXML"), val("q-value_score") into id_files_idpep, id_files_idpep_consID + tuple mzml_id, file("${id_file.baseName}_idpep.idXML"), val("q-value_score"), val("Posterior Error Probability") into id_files_idpep, id_files_idpep_consID file "*.log" when: @@ -769,7 +769,7 @@ process idpep { IDPosteriorErrorProbability -in ${id_file} \\ -out ${id_file.baseName}_idpep.idXML \\ -fit_algorithm:outlier_handling ${params.outlier_handling} \\ - -threads ${task.cpus} \\ + -threads ${task.cpus} \\ > ${id_file.baseName}_idpep.log """ } @@ -786,10 +786,10 @@ process idscoreswitcher_to_qval { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - tuple mzml_id, file(id_file), qval_score from id_files_idpep.mix(id_files_perc) + tuple mzml_id, file(id_file), val(qval_score), val(pep_score) from id_files_idpep.mix(id_files_perc) output: - tuple mzml_id, file("${id_file.baseName}_switched.idXML") into id_files_noConsID_qval + tuple mzml_id, file("${id_file.baseName}_switched.idXML"), val(pep_score) into id_files_noConsID_qval file "*.log" when: @@ -816,17 +816,20 @@ process consensusid { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + // we can drop qval_score in this branch since we have to recalculate FDR anyway input: - tuple mzml_id, file(id_files_from_ses), val(qval_score) from id_files_idpep_consID.mix(id_files_perc_consID).groupTuple(size: params.search_engines.split(",").size()) + tuple mzml_id, file(id_files_from_ses), val(qval_score), val(pep_score) from id_files_idpep_consID.mix(id_files_perc_consID).groupTuple(size: params.search_engines.split(",").size()) output: - tuple mzml_id, file("${mzml_id}_consensus.idXML") into consensusids + tuple mzml_id, file("${mzml_id}_consensus.idXML"), val(pep_score_first) into consensusids file "*.log" when: params.search_engines.split(",").size() > 1 script: + // pep scores have to be the same. Otherwise the tool fails anyway. 
+ pep_score_first = pep_score[0] """ ConsensusID -in ${id_files_from_ses} \\ -out ${mzml_id}_consensus.idXML \\ @@ -849,10 +852,10 @@ process fdr_consensusid { publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' input: - tuple mzml_id, file(id_file) from consensusids + tuple mzml_id, file(id_file), val(pep_score) from consensusids output: - tuple mzml_id, file("${id_file.baseName}_fdr.idXML") into consensusids_fdr + tuple mzml_id, file("${id_file.baseName}_fdr.idXML"), val(pep_score) into consensusids_fdr file "*.log" when: @@ -880,10 +883,10 @@ process idfilter { publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' input: - tuple mzml_id, file(id_file) from id_files_noConsID_qval.mix(consensusids_fdr) + tuple mzml_id, file(id_file), val(pep_score) from id_files_noConsID_qval.mix(consensusids_fdr) output: - tuple mzml_id, file("${id_file.baseName}_filter.idXML") into id_filtered, id_filtered_luciphor + tuple mzml_id, file("${id_file.baseName}_filter.idXML"), val(pep_score) into id_filtered, id_filtered_luciphor file "*.log" script: @@ -900,12 +903,46 @@ plfq_in_id = params.enable_mod_localization ? Channel.empty() : id_filtered +// TODO make luciphor pick its own score so we can skip this step +process idscoreswitcher_for_luciphor { + + label 'process_very_low' + label 'process_single_thread' + + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + + input: + tuple mzml_id, file(id_file), val(pep_score) from id_filtered_luciphor + + output: + tuple mzml_id, file("${id_file.baseName}_pep.idXML") into id_filtered_luciphor_pep + file "*.log" + + when: + params.enable_mod_localization + + script: + """ + IDScoreSwitcher -in ${id_file} \\ + -out ${id_file.baseName}_pep.idXML \\ + -threads ${task.cpus} \\ + -old_score "q-value" \\ + -new_score "${pep_score}_score" \\ + -new_score_type "Posterior Error Probability" \\ + -new_score_orientation lower_better \\ + > ${id_file.baseName}_switch_pep_for_luciphor.log + + """ +} + process luciphor { + label 'process_medium' + publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - tuple mzml_id, file(mzml_file), file(id_file), frag_method from mzmls_luciphor.join(id_filtered_luciphor).join(ch_sdrf_config.luciphor_settings) + tuple mzml_id, file(mzml_file), file(id_file), frag_method from mzmls_luciphor.join(id_filtered_luciphor_pep).join(ch_sdrf_config.luciphor_settings) output: set mzml_id, file("${id_file.baseName}_luciphor.idXML") into plfq_in_id_luciphor @@ -931,7 +968,8 @@ process luciphor { ${dec_losses} \\ -max_charge_state ${params.max_precursor_charge} \\ -max_peptide_length ${params.max_peptide_length} \\ - > ${id_file.baseName}_scoreswitcher.log + -debug ${params.localization_debug} \\ + > ${id_file.baseName}_luciphor.log """ // -fragment_mass_tolerance ${} \\ // -fragment_error_units ${} \\ From b144f428d2716717e90c5b0e4fad36198838668e Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 16 Jul 2020 11:25:20 +0200 Subject: [PATCH 276/374] use latest sdrf --- environment.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/environment.yml b/environment.yml index 925ecce..7644a2f 100644 --- a/environment.yml +++ b/environment.yml @@ -15,7 +15,7 @@ dependencies: - bioconda::luciphor2 - bioconda::percolator - bioconda::bioconductor-msstats=3.20.0 # will include R - - bioconda::sdrf-pipelines=0.0.5 # for SDRF conversion + - bioconda::sdrf-pipelines # for SDRF conversion - conda-forge::r-ptxqc=1.0.5 # for QC reports - conda-forge::xorg-libxt=1.2.0 # until this 
R fix is merged: https://github.com/conda-forge/r-base-feedstock/pull/128 - conda-forge::fonts-conda-ecosystem=1 # for the fonts in QC reports From 1d49a5105f9ebff5affc7c7792b69b67d4a4e8e9 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 16 Jul 2020 11:52:01 +0200 Subject: [PATCH 277/374] use latest sdrf. use version --- environment.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/environment.yml b/environment.yml index 7644a2f..c629fbc 100644 --- a/environment.yml +++ b/environment.yml @@ -15,7 +15,7 @@ dependencies: - bioconda::luciphor2 - bioconda::percolator - bioconda::bioconductor-msstats=3.20.0 # will include R - - bioconda::sdrf-pipelines # for SDRF conversion + - bioconda::sdrf-pipelines=0.0.9 # for SDRF conversion - conda-forge::r-ptxqc=1.0.5 # for QC reports - conda-forge::xorg-libxt=1.2.0 # until this R fix is merged: https://github.com/conda-forge/r-base-feedstock/pull/128 - conda-forge::fonts-conda-ecosystem=1 # for the fonts in QC reports From 5157f9eca673cf884f5d4026b703a539005cce32 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Jul 2020 13:17:31 +0200 Subject: [PATCH 278/374] Some fixes and add test data --- .github/workflows/ci.yml | 2 +- conf/test_localize.config | 18 +++--------------- main.nf | 28 +++++++++++++--------------- nextflow.config | 4 +--- 4 files changed, 18 insertions(+), 34 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index bcff598..9f385dd 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -20,7 +20,7 @@ jobs: matrix: # Nextflow versions: check pipeline minimum and current latest nxf_ver: ['20.01.0', ''] - test_profile: ['test'] + test_profile: ['test', 'test_localize'] steps: - uses: actions/checkout@v2 - name: Determine tower usage diff --git a/conf/test_localize.config b/conf/test_localize.config index d846bb6..d9cdf1c 100644 --- a/conf/test_localize.config +++ b/conf/test_localize.config @@ -18,20 +18,8 @@ params { max_time = 1.h // Input data - spectra = [ - 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA1_F1.mzML', - 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA1_F2.mzML', - 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA2_F1.mzML', - 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA2_F2.mzML', - 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA3_F1.mzML', - 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA3_F2.mzML' - ] - database = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/18Protein_SoCe_Tr_detergents_trace_target_decoy.fasta' - expdesign = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA_design.tsv' - posterior_probabilities = "fit_distributions" + sdrf = '' + database = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/phospho/pools_crap_targetdecoy.fasta' enable_mod_localization = true - search_engines = "msgf" - protein_level_fdr_cutoff = 1.0 - decoy_affix = "rev" - enable_qc = true + search_engines = "comet,msgf" } \ No newline at end of file diff --git a/main.nf b/main.nf index 3962305..80155b1 100644 --- a/main.nf +++ b/main.nf @@ -680,7 +680,7 @@ process percolator { tuple mzml_id, file(id_file) from id_files_idx_feat output: - tuple mzml_id, file("${id_file.baseName}_perc.idXML"), val("MS:1001491"), val("pep") into 
id_files_perc, id_files_perc_consID + tuple mzml_id, file("${id_file.baseName}_perc.idXML"), val("MS:1001491") into id_files_perc, id_files_perc_consID file "*.log" when: @@ -758,7 +758,7 @@ process idpep { tuple mzml_id, file(id_file) from id_files_idx_ForIDPEP_FDR.mix(id_files_idx_ForIDPEP_noFDR) output: - tuple mzml_id, file("${id_file.baseName}_idpep.idXML"), val("q-value_score"), val("Posterior Error Probability") into id_files_idpep, id_files_idpep_consID + tuple mzml_id, file("${id_file.baseName}_idpep.idXML"), val("q-value_score") into id_files_idpep, id_files_idpep_consID file "*.log" when: @@ -786,10 +786,10 @@ process idscoreswitcher_to_qval { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - tuple mzml_id, file(id_file), val(qval_score), val(pep_score) from id_files_idpep.mix(id_files_perc) + tuple mzml_id, file(id_file), val(qval_score) from id_files_idpep.mix(id_files_perc) output: - tuple mzml_id, file("${id_file.baseName}_switched.idXML"), val(pep_score) into id_files_noConsID_qval + tuple mzml_id, file("${id_file.baseName}_switched.idXML") into id_files_noConsID_qval file "*.log" when: @@ -818,18 +818,16 @@ process consensusid { // we can drop qval_score in this branch since we have to recalculate FDR anyway input: - tuple mzml_id, file(id_files_from_ses), val(qval_score), val(pep_score) from id_files_idpep_consID.mix(id_files_perc_consID).groupTuple(size: params.search_engines.split(",").size()) + tuple mzml_id, file(id_files_from_ses), val(qval_score) from id_files_idpep_consID.mix(id_files_perc_consID).groupTuple(size: params.search_engines.split(",").size()) output: - tuple mzml_id, file("${mzml_id}_consensus.idXML"), val(pep_score_first) into consensusids + tuple mzml_id, file("${mzml_id}_consensus.idXML") into consensusids file "*.log" when: params.search_engines.split(",").size() > 1 script: - // pep scores have to be the same. Otherwise the tool fails anyway. 
- pep_score_first = pep_score[0] """ ConsensusID -in ${id_files_from_ses} \\ -out ${mzml_id}_consensus.idXML \\ @@ -852,10 +850,10 @@ process fdr_consensusid { publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' input: - tuple mzml_id, file(id_file), val(pep_score) from consensusids + tuple mzml_id, file(id_file) from consensusids output: - tuple mzml_id, file("${id_file.baseName}_fdr.idXML"), val(pep_score) into consensusids_fdr + tuple mzml_id, file("${id_file.baseName}_fdr.idXML") into consensusids_fdr file "*.log" when: @@ -883,10 +881,10 @@ process idfilter { publishDir "${params.outdir}/ids", mode: 'copy', pattern: '*.idXML' input: - tuple mzml_id, file(id_file), val(pep_score) from id_files_noConsID_qval.mix(consensusids_fdr) + tuple mzml_id, file(id_file) from id_files_noConsID_qval.mix(consensusids_fdr) output: - tuple mzml_id, file("${id_file.baseName}_filter.idXML"), val(pep_score) into id_filtered, id_filtered_luciphor + tuple mzml_id, file("${id_file.baseName}_filter.idXML") into id_filtered, id_filtered_luciphor file "*.log" script: @@ -912,7 +910,7 @@ process idscoreswitcher_for_luciphor { publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' input: - tuple mzml_id, file(id_file), val(pep_score) from id_filtered_luciphor + tuple mzml_id, file(id_file) from id_filtered_luciphor output: tuple mzml_id, file("${id_file.baseName}_pep.idXML") into id_filtered_luciphor_pep @@ -927,7 +925,7 @@ process idscoreswitcher_for_luciphor { -out ${id_file.baseName}_pep.idXML \\ -threads ${task.cpus} \\ -old_score "q-value" \\ - -new_score "${pep_score}_score" \\ + -new_score "Posterior Error Probability_score" \\ -new_score_type "Posterior Error Probability" \\ -new_score_orientation lower_better \\ > ${id_file.baseName}_switch_pep_for_luciphor.log @@ -968,7 +966,7 @@ process luciphor { ${dec_losses} \\ -max_charge_state ${params.max_precursor_charge} \\ -max_peptide_length ${params.max_peptide_length} \\ - -debug ${params.localization_debug} \\ + -debug ${params.luciphor_debug} \\ > ${id_file.baseName}_luciphor.log """ // -fragment_mass_tolerance ${} \\ diff --git a/nextflow.config b/nextflow.config index 849b0fc..d400706 100644 --- a/nextflow.config +++ b/nextflow.config @@ -5,9 +5,6 @@ * Default config options for all environments. */ -// TODO remove debug -process.cache = 'lenient' - // Global default params, used in configs params { @@ -85,6 +82,7 @@ params { luciphor_neutral_losses = '' luciphor_decoy_mass = '' luciphor_decoy_neutral_losses = '' + luciphor_debug = 0 // ProteomicsLFQ flags inf_quant_debug = 0 From b0f7a6b68d1863b60beff1cbf1eea38b86a039ba Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Jul 2020 13:19:44 +0200 Subject: [PATCH 279/374] minor fixes --- conf/test_localize.config | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/conf/test_localize.config b/conf/test_localize.config index d9cdf1c..7adb624 100644 --- a/conf/test_localize.config +++ b/conf/test_localize.config @@ -5,12 +5,12 @@ * ------------------------------------------------- * Defines bundled input files and everything required * to run a fast and simple test. 
Use as follows: - * nextflow run nf-core/proteomicslfq -profile test_phospho, + * nextflow run nf-core/proteomicslfq -profile test_localize, */ params { - config_profile_name = 'Test profile' - config_profile_description = 'Minimal test dataset to check pipeline function' + config_profile_name = 'Test phospho-localization profile' + config_profile_description = 'Minimal test dataset to check pipeline function for phospho-localization, SDRF parsing and ConsensusID.' // Limit resources so that this can run on Travis max_cpus = 2 @@ -18,7 +18,7 @@ params { max_time = 1.h // Input data - sdrf = '' + sdrf = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/phospho/test_phospho.sdrf' database = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/phospho/pools_crap_targetdecoy.fasta' enable_mod_localization = true search_engines = "comet,msgf" From 9774dfbb9c8f42055767e8441f7250f33bcce08b Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Jul 2020 16:20:23 +0200 Subject: [PATCH 280/374] try to gather logs of failed steps --- .github/workflows/ci.yml | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 9f385dd..57f2567 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -60,9 +60,21 @@ jobs: run: | wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ - - name: Run pipeline with test data + - name: Run pipeline with test data run: | nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "$RUN_NAME" -profile $TEST_PROFILE,docker + - name: Gather failed logs + if: failed() || cancelled() + run: | + mkdir failed_logs + failed=$(grep "FAILED" results/pipeline_info/execution_trace.txt | cut -f 2) + while read -r line ; do cp $(ls work/${line}*/*.log) failed_logs/ ; done <<< "$failed" + - uses: actions/upload-artifact@v1 + if: failed() || cancelled() + name: Upload failed logs + with: + name: failed_logs + path: failed_logs - uses: actions/upload-artifact@v1 if: always() name: Upload results @@ -76,6 +88,7 @@ jobs: name: nextflow.log path: .nextflow.log + push_dockerhub: name: Push new Docker image to Docker Hub runs-on: ubuntu-latest From f8da9fae9259e5fc5ef982769d13112c2749f788 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Jul 2020 16:21:16 +0200 Subject: [PATCH 281/374] raise luciphor debug level --- conf/test_localize.config | 1 + 1 file changed, 1 insertion(+) diff --git a/conf/test_localize.config b/conf/test_localize.config index 7adb624..c5985cf 100644 --- a/conf/test_localize.config +++ b/conf/test_localize.config @@ -22,4 +22,5 @@ params { database = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/phospho/pools_crap_targetdecoy.fasta' enable_mod_localization = true search_engines = "comet,msgf" + luciphor_debug = 42 } \ No newline at end of file From 686454753f29f1f677995a8df7e6850aebde150b Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Jul 2020 16:22:41 +0200 Subject: [PATCH 282/374] failsafe for no logs present --- .github/workflows/ci.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 57f2567..e2fc2ee 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -68,7 +68,7 @@ jobs: run: | mkdir failed_logs failed=$(grep "FAILED" results/pipeline_info/execution_trace.txt | cut -f 2) - while read -r line ; do cp $(ls work/${line}*/*.log) failed_logs/ ; done <<< 
"$failed" + while read -r line ; do cp $(ls work/${line}*/*.log) failed_logs/ | true ; done <<< "$failed" - uses: actions/upload-artifact@v1 if: failed() || cancelled() name: Upload failed logs From 952a99a66fbba09e0d870215492f64bedf2c5a88 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Jul 2020 16:41:25 +0200 Subject: [PATCH 283/374] ... --- .github/workflows/ci.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index e2fc2ee..6919cb0 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -64,13 +64,13 @@ jobs: run: | nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "$RUN_NAME" -profile $TEST_PROFILE,docker - name: Gather failed logs - if: failed() || cancelled() + if: ${{ failed() }} || $ {{ cancelled() }} run: | mkdir failed_logs failed=$(grep "FAILED" results/pipeline_info/execution_trace.txt | cut -f 2) while read -r line ; do cp $(ls work/${line}*/*.log) failed_logs/ | true ; done <<< "$failed" - uses: actions/upload-artifact@v1 - if: failed() || cancelled() + if: ${{ failed() }} || $ {{ cancelled() }} name: Upload failed logs with: name: failed_logs From fbb98d1247f8c81be1ea362aa1d134a846b5aeef Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Jul 2020 16:43:48 +0200 Subject: [PATCH 284/374] ... --- .github/workflows/ci.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 6919cb0..51043e2 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -64,13 +64,13 @@ jobs: run: | nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "$RUN_NAME" -profile $TEST_PROFILE,docker - name: Gather failed logs - if: ${{ failed() }} || $ {{ cancelled() }} + if: failure() || cancelled() run: | mkdir failed_logs failed=$(grep "FAILED" results/pipeline_info/execution_trace.txt | cut -f 2) while read -r line ; do cp $(ls work/${line}*/*.log) failed_logs/ | true ; done <<< "$failed" - uses: actions/upload-artifact@v1 - if: ${{ failed() }} || $ {{ cancelled() }} + if: failure() || cancelled() name: Upload failed logs with: name: failed_logs From f677b3d0228b8445f76c9a3cd96117c0dfdee803 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Jul 2020 17:45:58 +0200 Subject: [PATCH 285/374] also localize for pyrophospho --- conf/test_localize.config | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/conf/test_localize.config b/conf/test_localize.config index c5985cf..e838bb8 100644 --- a/conf/test_localize.config +++ b/conf/test_localize.config @@ -21,6 +21,7 @@ params { sdrf = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/phospho/test_phospho.sdrf' database = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/phospho/pools_crap_targetdecoy.fasta' enable_mod_localization = true - search_engines = "comet,msgf" + mod_localization = 'pyrophospho (S),Phospho (S),Phospho (T),Phospho (Y)' + search_engines = 'comet,msgf' luciphor_debug = 42 } \ No newline at end of file From 5bd2266749ef06d5483f7f66243b0e5e82f11107 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Jul 2020 18:50:23 +0200 Subject: [PATCH 286/374] remove pyrophospho for localization --- conf/test_localize.config | 1 - 1 file changed, 1 deletion(-) diff --git a/conf/test_localize.config b/conf/test_localize.config index e838bb8..c38f315 100644 --- a/conf/test_localize.config +++ b/conf/test_localize.config @@ -21,7 +21,6 @@ params { sdrf = 
'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/phospho/test_phospho.sdrf' database = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/phospho/pools_crap_targetdecoy.fasta' enable_mod_localization = true - mod_localization = 'pyrophospho (S),Phospho (S),Phospho (T),Phospho (Y)' search_engines = 'comet,msgf' luciphor_debug = 42 } \ No newline at end of file From f8b1488f3cb8451d1c110ebcf48ff61630d5606d Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 17 Jul 2020 22:04:41 +0200 Subject: [PATCH 287/374] For consensusID use the loosest cutting rules always --- main.nf | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/main.nf b/main.nf index 80155b1..8eb1b7a 100644 --- a/main.nf +++ b/main.nf @@ -579,6 +579,19 @@ process search_engine_comet { inst = params.instrument } } + + // for consensusID the cutting rules need to be the same. So we adapt to the loosest rules from MSGF + // TODO find another solution. In ProteomicsLFQ we re-run PeptideIndexer (remove??) and if we + // e.g. add XTandem, after running ConsensusID it will lose the auto-detection ability for the + // XTandem specific rules. + if (params.search_engines.contains("msgf")) + { + if (enzyme == 'Trypsin') enzyme = 'Trypsin/P' + else if (enzyme == 'Arg-C') enzyme = 'Arg-C/P' + else if (enzyme == 'Asp-N') enzyme = 'Asp-N/B' + else if (enzyme == 'Chymotrypsin') enzyme = 'Chymotrypsin/P' + else if (enzyme == 'Lys-C') enzyme = 'Lys-C/P' + } """ CometAdapter -in ${mzml_file} \\ -out ${mzml_file.baseName}_comet.idXML \\ From 137358b503db8d2e3dd6bc293d358d0f588d663e Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sat, 18 Jul 2020 00:24:00 +0200 Subject: [PATCH 288/374] Apply the same for peptide indexer --- main.nf | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/main.nf b/main.nf index 8eb1b7a..dd87f2a 100644 --- a/main.nf +++ b/main.nf @@ -632,6 +632,15 @@ process index_peptides { script: def il = params.IL_equivalent ? '-IL_equivalent' : '' def allow_um = params.allow_unmatched ? '-allow_unmatched' : '' + // see comment in CometAdapter. Alternative here in PeptideIndexer is to let it auto-detect the enzyme by not specifying. + if (params.search_engines.contains("msgf")) + { + if (enzyme == 'Trypsin') enzyme = 'Trypsin/P' + else if (enzyme == 'Arg-C') enzyme = 'Arg-C/P' + else if (enzyme == 'Asp-N') enzyme = 'Asp-N/B' + else if (enzyme == 'Chymotrypsin') enzyme = 'Chymotrypsin/P' + else if (enzyme == 'Lys-C') enzyme = 'Lys-C/P' + } """ PeptideIndexer -in ${id_file} \\ -out ${id_file.baseName}_idx.idXML \\ From 8cb766dd90b2c9164542bcfb24a618766f2c3176 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Mon, 27 Jul 2020 15:35:09 +0200 Subject: [PATCH 289/374] Force rebuild of container --- environment.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/environment.yml b/environment.yml index c629fbc..195a21c 100644 --- a/environment.yml +++ b/environment.yml @@ -7,7 +7,7 @@ channels: - conda-forge - bioconda dependencies: - # TODO fix versions for release + # TODO fix versions for release (and also for develop, as soon as we have official nightly conda packages, with e.g. 
a date as version) - bgruening::openms - bioconda::thermorawfileparser - bioconda::msgf_plus From 8c9c5a5c50e42ad6383ae7a6cb632cdc292bdbc2 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Mon, 27 Jul 2020 21:26:00 +0200 Subject: [PATCH 290/374] more output --- main.nf | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/main.nf b/main.nf index dd87f2a..4ed2a42 100644 --- a/main.nf +++ b/main.nf @@ -697,6 +697,7 @@ process percolator { cpus { check_max( 3, 'cpus' ) } publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + publishDir "${params.outdir}/raw_ids", mode: 'copy', pattern: '*.idXML' input: tuple mzml_id, file(id_file) from id_files_idx_feat @@ -775,6 +776,7 @@ process idpep { // I think Eigen optimization is multi-threaded, so leave threads open publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + publishDir "${params.outdir}/raw_ids", mode: 'copy', pattern: '*.idXML' input: tuple mzml_id, file(id_file) from id_files_idx_ForIDPEP_FDR.mix(id_files_idx_ForIDPEP_noFDR) @@ -791,7 +793,7 @@ process idpep { IDPosteriorErrorProbability -in ${id_file} \\ -out ${id_file.baseName}_idpep.idXML \\ -fit_algorithm:outlier_handling ${params.outlier_handling} \\ - -threads ${task.cpus} \\ + -threads ${task.cpus} \\ > ${id_file.baseName}_idpep.log """ } @@ -837,6 +839,7 @@ process consensusid { label 'process_single_thread' publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' + publishDir "${params.outdir}/consensus_ids", mode: 'copy', pattern: '*.idXML' // we can drop qval_score in this branch since we have to recalculate FDR anyway input: From fbe4f740a2008ecefa3346f590b024170d0bd909 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 29 Jul 2020 19:04:21 +0200 Subject: [PATCH 291/374] adapted to new schema --- nextflow_schema.json | 59 +++++++++++++++++++++++++++++--------------- 1 file changed, 39 insertions(+), 20 deletions(-) diff --git a/nextflow_schema.json b/nextflow_schema.json index 74ed6be..04aa85e 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -4,8 +4,9 @@ "title": "nf-core/proteomicslfq pipeline parameters", "description": "Proteomics label-free quantification (LFQ) analysis pipeline using OpenMS and MSstats, with feature quantification, feature summarization, quality control and group-based statistical analysis.", "type": "object", - "properties": { - "Input/output options": { + "definitions": { + "input_output_options": { + "title": "Input/output options", "type": "object", "properties": { "outdir": { @@ -26,7 +27,8 @@ "fa_icon": "fas fa-terminal", "description": "Define where the pipeline should find input data and save output data." }, - "Generic options": { + "generic_options": { + "title": "Generic options", "type": "object", "properties": { "help": { @@ -80,7 +82,8 @@ "description": "Less common options for the pipeline, typically set in a config file.", "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`." 
}, - "Max job request options": { + "max_job_request_options": { + "title": "Max job request options", "type": "object", "properties": { "max_cpus": { @@ -112,7 +115,8 @@ "description": "Set the top limit for requested resources for any single job.", "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause the Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. See [the nf-core website](https://nf-co.re/usage/configuration) for details." }, - "Institutional config options": { + "institutional_config_options": { + "title": "Institutional config options", "type": "object", "properties": { "custom_config_version": { @@ -164,7 +168,8 @@ "description": "Parameters used to describe centralised config profiles. These should not be edited.", "help_text": "The centralised nf-core configuration profiles use a handful of pipeline parameters to describe themselves. This information is then printed to the Nextflow log when you run a pipeline. You should not need to change these values when you run a pipeline." }, - "Main parameters (SDRF)": { + "main_parameters__sdrf_": { + "title": "Main parameters (SDRF)", "type": "object", "description": "The input to the pipeline can be specified in two **mutually exclusive** ways:\nHere by using a path or URI to a PRIDE Sample to Data Relation Format file (SDRF), e.g. as part of a submitted and\nannotated PRIDE experiment (see [here](https://github.com/bigbio/proteomics-metadata-standard/tree/master/annotated-projects) for examples). Alternatively, you can use the [TSV options](#main_parameters__tsv_)", "default": "", @@ -190,7 +195,8 @@ }, "fa_icon": "far fa-chart-bar" }, - "Main parameters (TSV)": { + "main_parameters__tsv_": { + "title": "Main parameters (TSV)", "type": "object", "description": "The input to the pipeline can be specified in two **mutually exclusive** ways:\nHere by specifying globbing patterns to the input spectrum files in Thermo RAW or mzML format and a manual OpenMS-style experimental design file. Alternatively, you can use the [SDRF options](#main_parameters__sdrf_)", "default": "", @@ -208,7 +214,8 @@ }, "fa_icon": "far fa-chart-bar" }, - "Protein database": { + "protein_database": { + "title": "Protein database", "type": "object", "description": "Settings that relate to the mandatory protein database and the optional generation of decoy entries.", "default": "", @@ -245,7 +252,8 @@ "database" ] }, - "Spectrum preprocessing": { + "spectrum_preprocessing": { + "title": "Spectrum preprocessing", "type": "object", "description": "In case you start from profile mode mzMLs or the internal preprocessing during conversion with the ThermoRawFileParser fails (e.g. due to new instrument types), preprocessing has to be performed with OpenMS. 
Use this section to configure.", "default": "", @@ -271,7 +279,8 @@ }, "fa_icon": "far fa-chart-bar" }, - "Database search": { + "database_search": { + "title": "Database search", "type": "object", "description": "", "default": "", @@ -391,7 +400,8 @@ }, "fa_icon": "fas fa-search" }, - "Modification localization": { + "modification_localization": { + "title": "Modification localization", "type": "object", "description": "", "default": "", @@ -423,7 +433,8 @@ }, "fa_icon": "fas fa-search-location" }, - "Peptide re-indexing": { + "peptide_re_indexing": { + "title": "Peptide re-indexing", "type": "object", "description": "", "default": "", @@ -440,7 +451,8 @@ }, "fa_icon": "fas fa-project-diagram" }, - "PSM re-scoring (general)": { + "psm_re_scoring__general_": { + "title": "PSM re-scoring (general)", "type": "object", "description": "", "default": "", @@ -462,7 +474,8 @@ }, "fa_icon": "fas fa-star-half-alt" }, - "PSM re-scoring (Percolator)": { + "psm_re_scoring__percolator_": { + "title": "PSM re-scoring (Percolator)", "type": "object", "description": "", "default": "", @@ -499,7 +512,8 @@ }, "fa_icon": "fas fa-star-half" }, - "PSM re-scoring (distribution fitting)": { + "psm_re_scoring__distribution_fitting_": { + "title": "PSM re-scoring (distribution fitting)", "type": "object", "description": "", "default": "", @@ -512,7 +526,8 @@ }, "fa_icon": "far fa-star-half" }, - "Consensus ID": { + "consensus_id": { + "title": "Consensus ID", "type": "object", "description": "", "default": "", @@ -535,7 +550,8 @@ }, "fa_icon": "fas fa-code-branch" }, - "Protein inference": { + "protein_inference": { + "title": "Protein inference", "type": "object", "description": "", "default": "", @@ -553,7 +569,8 @@ }, "fa_icon": "fab fa-hubspot" }, - "Protein quantification": { + "protein_quantification": { + "title": "Protein Quantification", "type": "object", "description": "", "default": "", @@ -579,7 +596,8 @@ }, "fa_icon": "fas fa-braille" }, - "Statistical post-processing": { + "statistical_post_processing": { + "title": "Statistical post-processing", "type": "object", "description": "", "default": "", @@ -599,7 +617,8 @@ }, "fa_icon": "fab fa-r-project" }, - "Quality control": { + "quality_control": { + "title": "Quality control", "type": "object", "description": "", "default": "", From 31b2943db3461fc9af296e9843e664ee72e2326e Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 29 Jul 2020 19:20:34 +0200 Subject: [PATCH 292/374] added allOf section --- nextflow_schema.json | 63 ++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 61 insertions(+), 2 deletions(-) diff --git a/nextflow_schema.json b/nextflow_schema.json index 04aa85e..90b8c94 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -268,7 +268,7 @@ "type": "boolean", "description": "Perform peakpicking in memory", "fa_icon": "far fa-check-square", - "help_text": "Perform peakpicking in memory. Use only if problems occur.", + "help_text": "Perform peakpicking in memory. Use only if problems occur." 
}, "peakpicking_ms_levels": { "type": "string", @@ -634,5 +634,64 @@ }, "fa_icon": "fas fa-file-medical-alt" } - } + }, + "allOf": [ + { + "$ref": "#/definitions/input_output_options" + }, + { + "$ref": "#/definitions/generic_options" + }, + { + "$ref": "#/definitions/max_job_request_options" + }, + { + "$ref": "#/definitions/institutional_config_options" + }, + { + "$ref": "#/definitions/main_parameters__sdrf_" + }, + { + "$ref": "#/definitions/main_parameters__tsv_" + }, + { + "$ref": "#/definitions/protein_database" + }, + { + "$ref": "#/definitions/spectrum_preprocessing" + }, + { + "$ref": "#/definitions/database_search" + }, + { + "$ref": "#/definitions/modification_localization" + }, + { + "$ref": "#/definitions/peptide_re_indexing" + }, + { + "$ref": "#/definitions/psm_re_scoring__general_" + }, + { + "$ref": "#/definitions/psm_re_scoring__percolator_" + }, + { + "$ref": "#/definitions/psm_re_scoring__distribution_fitting_" + }, + { + "$ref": "#/definitions/consensus_id" + }, + { + "$ref": "#/definitions/protein_inference" + }, + { + "$ref": "#/definitions/protein_quantification" + }, + { + "$ref": "#/definitions/statistical_post_processing" + }, + { + "$ref": "#/definitions/quality_control" + } + ] } \ No newline at end of file From cda327925f4001963e2957166a67aa0854429d1e Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 30 Jul 2020 18:39:05 +0200 Subject: [PATCH 293/374] Finished json schema. --- docs/usage.md | 2 +- main.nf | 1 + nextflow_schema.json | 146 ++++++++++++++++++++++++++++++++++--------- 3 files changed, 119 insertions(+), 30 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index d7da2f4..5e42974 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -510,7 +510,7 @@ Debug level during inference and quantification. (WARNING: Higher than 666 may p Infer proteins through: * "aggregation" = aggregates all peptide scores across a protein (by calculating the maximum) (default) -* "bayesian" = computes a posterior probability for every protein based on a Bayesian network +* "bayesian" = computes a posterior probability for every protein based on a Bayesian network (i.e. using Epifany) * ("percolator" not yet supported) ### `--protein_level_fdr_cutoff` diff --git a/main.nf b/main.nf index 052a94a..068b9e3 100644 --- a/main.nf +++ b/main.nf @@ -984,6 +984,7 @@ process proteomicslfq { -targeted_only ${params.targeted_only} \\ -mass_recalibration ${params.mass_recalibration} \\ -transfer_ids ${params.transfer_ids} \\ + -protein_quantification ${params.protein_quant} \\ -out out.mzTab \\ -threads ${task.cpus} \\ -out_msstats out.csv \\ diff --git a/nextflow_schema.json b/nextflow_schema.json index 90b8c94..7378f2c 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -294,10 +294,10 @@ }, "enzyme": { "type": "string", - "description": "The enzyme to be used for in-silico digestion", + "description": "The enzyme to be used for in-silico digestion, in 'OpenMS format'", "default": "Trypsin", "fa_icon": "fas fa-list-ol", - "help_text": "Specify which enzymatic restriction should be applied, e.g. 'unspecific cleavage', 'Trypsin' (default), see OpenMS\n[enzymes](https://github.com/OpenMS/OpenMS/blob/develop/share/OpenMS/CHEMISTRY/Enzymes.xml). Note: MSGF does not support extended\ncutting rules, as used by default with `Trypsin`. I.e. if you specify `Trypsin with MSGF, it will be automatically converted to\n`Trypsin/P`= 'Trypsin without proline rule'." + "help_text": "Specify which enzymatic restriction should be applied, e.g. 
'unspecific cleavage', 'Trypsin' (default), see OpenMS\n[enzymes](https://github.com/OpenMS/OpenMS/blob/develop/share/OpenMS/CHEMISTRY/Enzymes.xml). Note: MSGF does not support extended\ncutting rules, as used by default with `Trypsin`. I.e. if you specify `Trypsin` with MSGF, it will be automatically converted to\n`Trypsin/P`= 'Trypsin without proline rule'." }, "num_enzyme_termini": { "type": "string", @@ -315,85 +315,112 @@ }, "precursor_mass_tolerance": { "type": "integer", + "description": "Precursor mass tolerance used for database search. For High-Resolution instruments a precursor mass tolerance value of 5 ppm is recommended (i.e. 5). See also [`--precursor_mass_tolerance_unit`](#--precursor_mass_tolerance_unit).", "default": 5, "fa_icon": "fas fa-sliders-h" }, "precursor_mass_tolerance_unit": { "type": "string", + "description": "Precursor mass tolerance unit used for database search. Possible values are 'ppm' (default) and 'Da'.", "default": "ppm", "fa_icon": "fas fa-sliders-h" }, "fragment_mass_tolerance": { "type": "number", + "description": "Fragment mass tolerance used for database search. The default of 0.03 Da is for high-resolution instruments.", "default": 0.03, - "fa_icon": "fas fa-sliders-h" + "fa_icon": "fas fa-sliders-h", + "help_text": "Caution: for Comet we are estimating the `fragment_bin_tolerance` parameter based on this automatically." }, "fragment_mass_tolerance_unit": { "type": "string", + "description": "Fragment mass tolerance unit used for database search. Possible values are 'ppm' (default) and 'Da'.", "default": "Da", - "fa_icon": "fas fa-list-ol" + "fa_icon": "fas fa-list-ol", + "help_text": "Caution: for Comet we are estimating the `fragment_bin_tolerance` parameter based on this automatically." }, "fixed_mods": { "type": "string", + "description": "A comma-separated list of fixed modifications with their Unimod name to be searched during database search", "default": "Carbamidomethyl (C)", - "fa_icon": "fas fa-tasks" + "fa_icon": "fas fa-tasks", + "help_text": "Specify which fixed modifications should be applied to the database search (eg. '' or 'Carbamidomethyl (C)', see Unimod modifications\nin the style '({unimod name} ({optional term specificity} {optional origin})').\nAll possible modifications can be found in the restrictions mentioned in the command line documentation of e.g. [CometAdapter](https://abibuilder.informatik.uni-tuebingen.de/archive/openms/Documentation/release/latest/html/TOPP_CometAdapter.html) (scroll down a bit for the complete set).\nMultiple fixed modifications can be specified comma separated (e.g. 'Carbamidomethyl (C),Oxidation (M)').\nFixed modifications need to be found at every matching amino acid for a peptide to be reported." }, "variable_mods": { "type": "string", + "description": "A comma-separated list of variable modifications with their Unimod name to be searched during database search", "default": "Oxidation (M)", - "fa_icon": "fas fa-tasks" + "fa_icon": "fas fa-tasks", + "help_text": "Specify which variable modifications should be applied to the database search (eg. '' or 'Oxidation (M)', see Unimod modifications\nin the style '({unimod name} ({optional term specificity} {optional origin})').\nAll possible modifications can be found in the restrictions mentioned in the command line documentation of e.g. 
[CometAdapter](https://abibuilder.informatik.uni-tuebingen.de/archive/openms/Documentation/release/latest/html/TOPP_CometAdapter.html) (scroll down a bit for the complete set).\nMultiple variable modifications can be specified comma separated (e.g. 'Carbamidomethyl (C),Oxidation (M)').\nVariable modifications may or may not be found at matching amino acids for a peptide to be reported." }, "fragment_method": { "type": "string", + "description": "The fragmentation method used during tandem MS. (MS/MS or MS2).", "default": "HCD", - "fa_icon": "fas fa-list-ol" + "fa_icon": "fas fa-list-ol", + "help_text": "Currently unsupported. Defaults to `ALL` for Comet and `from_spectrum`, for MSGF. Should be a sensible default for 99% of the cases.", + "hidden": true }, "isotope_error_range": { "type": "string", + "description": "Comma-separated range of integers with allowed isotope peak errors for precursor tolerance (MS-GF+ parameter '-ti'). E.g. -1,2", "default": "0,1", - "fa_icon": "fas fa-tasks" + "fa_icon": "fas fa-tasks", + "help_text": "Range of integers with allowed isotope peak errors (MS-GF+ parameter '-ti'). Takes into account the error introduced by choosing a non-monoisotopic peak for fragmentation. Combined with 'precursor_mass_tolerance'/'precursor_error_units', this determines the actual precursor mass tolerance. E.g. for experimental mass 'exp' and calculated mass 'calc', '-precursor_mass_tolerance 20 -precursor_error_units ppm -isotope_error_range -1,2' tests '|exp - calc - n * 1.00335 Da| < 20 ppm' for n = -1, 0, 1, 2." }, "instrument": { "type": "string", - "fa_icon": "fas fa-list-ol" + "description": "Type of instrument that generated the data. 'low_res' or 'high_res' (default; refers to LCQ and LTQ instruments)", + "default": "high_res", + "fa_icon": "fas fa-list-ol", + "help_text": "" }, "protocol": { "type": "string", + "description": "MSGF only: Labeling or enrichment protocol used, if any. Default: automatic", "default": "automatic", - "fa_icon": "fas fa-list-ol" + "fa_icon": "fas fa-list-ol", + "help_text": "" }, "min_precursor_charge": { "type": "integer", + "description": "Minimum precursor ion charge. Omit the '+'", "default": 2, "fa_icon": "fas fa-sliders-h" }, "max_precursor_charge": { "type": "integer", + "description": "Maximum precursor ion charge. Omit the '+'", "default": 4, "fa_icon": "fas fa-sliders-h" }, "min_peptide_length": { "type": "integer", + "description": "Minimum peptide length to consider (works with MSGF and in newer Comet versions)", "default": 6, "fa_icon": "fas fa-sliders-h" }, "max_peptide_length": { "type": "integer", + "description": "Maximum peptide length to consider (works with MSGF and in newer Comet versions)", "default": 40, "fa_icon": "fas fa-sliders-h" }, "num_hits": { "type": "integer", + "description": "Specify the maximum number of top peptide candidates per spectrum to be reported by the search engine. Default: 1", "default": 1, "fa_icon": "fas fa-sliders-h" }, "max_mods": { "type": "integer", + "description": "Maximum number of modifications per peptide. If this value is large, the search may take very long.", "default": 3, "fa_icon": "fas fa-sliders-h" }, "db_debug": { "type": "integer", + "description": "Debug level when running the database search. 
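As an aside, a hypothetical invocation wiring up the database-search parameters documented above could look as follows; the parameter names are taken from this schema, while the SDRF file name and the tolerance values are purely illustrative:

```bash
# Sketch only: --sdrf and all values are placeholders, not a recommendation.
nextflow run nf-core/proteomicslfq -profile docker \
    --sdrf experiment.sdrf.tsv \
    --precursor_mass_tolerance 10 --precursor_mass_tolerance_unit ppm \
    --fragment_mass_tolerance 0.02 --fragment_mass_tolerance_unit Da \
    --fixed_mods 'Carbamidomethyl (C)' \
    --variable_mods 'Oxidation (M)' \
    --max_precursor_charge 4 --max_mods 3
```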
Logs become more verbose and at '>5' temporary files are kept.",
        "default": 0,
        "fa_icon": "fas fa-bug"
      }
    },
@@ -403,31 +430,38 @@
    "modification_localization": {
      "title": "Modification localization",
      "type": "object",
-      "description": "",
+      "description": "Settings for calculating a localization probability with LucXor for modifications with multiple candidate amino acids in a peptide.",
      "default": "",
      "properties": {
        "enable_mod_localization": {
          "type": "string",
+          "description": "Turn the mechanism on.",
          "fa_icon": "fas fa-toggle-on"
        },
        "mod_localization": {
          "type": "string",
+          "description": "Which variable modifications to use for scoring their localization.",
          "default": "Phospho (S),Phospho (T),Phospho (Y)",
          "fa_icon": "fas fa-tasks"
        },
        "luciphor_neutral_losses": {
          "type": "string",
+          "description": "List of neutral losses to consider for mod. localization.",
          "fa_icon": "fas fa-font",
+          "help_text": "List the types of neutral losses that you want to consider. The residue field is case sensitive. For example: lower case 'sty' implies that the neutral loss can only occur if the specified modification is present.\nSyntax: 'NL = <RESIDUES> -<NEUTRAL_LOSS_MOLECULAR_FORMULA> <MASS_LOST>'\n(default: '[sty -H3PO4 -97.97690]')",
          "hidden": true
        },
        "luciphor_decoy_mass": {
          "type": "number",
+          "description": "How much to add to an amino acid to make it a decoy for mod. localization.",
          "fa_icon": "fas fa-font",
          "hidden": true
        },
        "luciphor_decoy_neutral_losses": {
          "type": "string",
+          "description": "List of neutral losses to consider for mod. localization from an internally generated decoy sequence.",
          "fa_icon": "fas fa-font",
+          "help_text": "For handling the neutral loss from a decoy sequence. The syntax for this is identical to that of the normal neutral losses given above except that the residue is always 'X'. Syntax: DECOY_NL = X -<NEUTRAL_LOSS_MOLECULAR_FORMULA> <MASS_LOST> (default: '[X -H3PO4 -97.97690]')",
          "hidden": true
        }
      },
@@ -441,10 +475,13 @@
      "properties": {
        "allow_unmatched": {
          "type": "string",
+          "description": "Do not fail if there are some unmatched peptides. Only activate as last resort, if you know that the rest of your settings are fine!",
+          "default": "false",
          "fa_icon": "far fa-check-square"
        },
        "IL_equivalent": {
          "type": "string",
+          "description": "Should isoleucine and leucine be treated interchangeably when mapping search engine hits to the database? Default: true",
          "default": "true",
          "fa_icon": "far fa-check-square"
        }
      },
@@ -454,20 +491,23 @@
    "psm_re_scoring__general_": {
      "title": "PSM re-scoring (general)",
      "type": "object",
-      "description": "",
+      "description": "Choose between different rescoring/posterior probability calculation methods and set them up.",
      "default": "",
      "properties": {
        "posterior_probabilities": {
          "type": "string",
+          "description": "How to calculate posterior probabilities for PSMs:\n\n* 'percolator' = Re-score based on PSM-feature-based SVM and transform distance\n to hyperplane for posteriors\n* 'fit_distributions' = Fit positive and negative distributions to scores\n (similar to PeptideProphet)",
          "fa_icon": "fas fa-list-ol"
        },
        "psm_pep_fdr_cutoff": {
          "type": "number",
+          "description": "FDR cutoff on PSM level (or potential peptide level; see Percolator options) before going into feature finding, map alignment and inference.",
          "default": 0.1,
          "fa_icon": "fas fa-filter"
        },
        "pp_debug": {
          "type": "integer",
+          "description": "Debug level when running the re-scoring. 
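For illustration, enabling the modification localization machinery described in this group might look like the following sketch; the SDRF name is a placeholder, the boolean value is an assumption (the schema types the flag as a string), and the modification list simply repeats the schema default:

```bash
# Hypothetical example; assumes the default Phospho localization targets.
nextflow run nf-core/proteomicslfq -profile docker \
    --sdrf experiment.sdrf.tsv \
    --enable_mod_localization true \
    --mod_localization 'Phospho (S),Phospho (T),Phospho (Y)'
```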
Logs become more verbose and at '>5' temporary files are kept.", "default": 0, "fa_icon": "fas fa-bug" } @@ -477,37 +517,45 @@ "psm_re_scoring__percolator_": { "title": "PSM re-scoring (Percolator)", "type": "object", - "description": "", + "description": "In the following you can find help for the Percolator specific options that are only used if [`--posterior_probabilities`](#--posterior_probabilities) was set to 'percolator'.\nNote that there are currently some restrictions to the original options of Percolator:\n\n* no Percolator protein FDR possible (currently OpenMS' FDR is used on protein level)\n* no support for separate target and decoy databases (i.e. no min-max q-value calculation or target-decoy competition strategy)\n* no support for combined or experiment-wide peptide re-scoring. Currently search results per input file are submitted to Percolator independently.", "default": "", "properties": { "FDR_level": { "type": "string", + "description": "Calculate FDR on PSM ('psm-level-fdrs') or peptide level ('peptide-level-fdrs')?", "default": "peptide-level-fdrs", "fa_icon": "fas fa-list-ol" }, "train_FDR": { "type": "number", + "description": "The FDR cutoff to be used during training of the SVM.", "default": 0.05, "fa_icon": "fas fa-sliders-h" }, "test_FDR": { "type": "number", + "description": "The FDR cutoff to be used during testing of the SVM.", "default": 0.05, "fa_icon": "fas fa-sliders-h" }, "subset_max_train": { "type": "integer", + "description": "Only train an SVM on a subset of PSMs, and use the resulting score vector to evaluate the other PSMs. Recommended when analyzing huge numbers (>1 million) of PSMs. When set to 0, all PSMs are used for training as normal. This is a runtime vs. discriminability tradeoff. Default: 300,000", "default": 300000, "fa_icon": "fas fa-sliders-h" }, "klammer": { "type": "string", - "fa_icon": "far fa-check-square" + "description": "Retention time features are calculated as in Klammer et al. instead of with Elude. Default: false", + "fa_icon": "far fa-check-square", + "hidden": true }, "description_correct_features": { "type": "integer", + "description": "Use additional features whose values are learnt by correct entries. See help text. Default: 0 = none", "default": 0, - "fa_icon": "fas fa-list-ol" + "fa_icon": "fas fa-list-ol", + "help_text": "Percolator provides the possibility to use so called description of correct features, i.e. features for which desirable values are learnt from the previously identified target PSMs. The absolute value of the difference between desired value and observed value is then used as predictive features.\n\n1 -> iso-electric point\n\n2 -> mass calibration\n\n4 -> retention time\n\n8 -> `delta_retention_time * delta_mass_calibration`" } }, "fa_icon": "fas fa-star-half" @@ -515,11 +563,12 @@ "psm_re_scoring__distribution_fitting_": { "title": "PSM re-scoring (distribution fitting)", "type": "object", - "description": "", + "description": "Use this instead of Percolator if there are problems with Percolator (e.g. 
due to bad separation) or for performance reasons",
      "default": "",
      "properties": {
        "outlier_handling": {
          "type": "string",
+          "description": "How to handle outliers during fitting:\n\n* ignore_iqr_outliers (default): ignore outliers outside of `3*IQR` from Q1/Q3 for fitting\n* set_iqr_to_closest_valid: set IQR-based outliers to the last valid value for fitting\n* ignore_extreme_percentiles: ignore everything outside 99th and 1st percentile (also removes equal values like potential censored max values in XTandem)\n* none: do nothing",
          "default": "none",
          "fa_icon": "fas fa-list-ol"
        }
      },
@@ -534,18 +583,24 @@
      "properties": {
        "consensusid_algorithm": {
          "type": "string",
+          "description": "How to combine the probabilities from the single search engines: best, combine using a sequence similarity-matrix (PEPMatrix), combine using shared ion count of peptides (PEPIons). See help for further info.",
          "default": "best",
-          "fa_icon": "fas fa-list-ol"
+          "fa_icon": "fas fa-list-ol",
+          "help_text": "Specifies how search engine results are combined: ConsensusID offers several algorithms that can aggregate results from multiple peptide identification engines ('search engines') into consensus identifications - typically one per MS2 spectrum. This works especially well for search engines that provide more than one peptide hit per spectrum, i.e. that report not just the best hit, but also a list of runner-up candidates with corresponding scores.\n\nThe available algorithms are:\n\n* PEPMatrix: Scoring based on posterior error probabilities (PEPs) and peptide sequence similarities. This algorithm uses a substitution matrix to score the similarity of sequences not listed by all search engines. It requires PEPs as the scores for all peptide hits.\n* PEPIons: Scoring based on posterior error probabilities (PEPs) and fragment ion similarities ('shared peak count'). This algorithm, too, requires PEPs as scores.\n* best: For each peptide ID, this uses the best score of any search engine as the consensus score.\n* worst: For each peptide ID, this uses the worst score of any search engine as the consensus score.\n* average: For each peptide ID, this uses the average score of all search engines as the consensus score.\n* ranks: Calculates a consensus score based on the ranks of peptide IDs in the results of different search engines. The final score is in the range (0, 1], with 1 being the best score.\n\nTo make scores comparable, for best, worst and average, PEPs are used as well. Peptide IDs are only considered the same if they map to exactly the same sequence (including modifications and their localization). Also isobaric amino acids are (for now) only considered equal with the PEPMatrix/PEPIons algorithms."
        },
        "consensusid_considered_top_hits": {
          "type": "integer",
+          "description": "Only use the top N hits per search engine and spectrum for combination. Default: 0 = all",
          "default": 0,
-          "fa_icon": "fas fa-sliders-h"
+          "fa_icon": "fas fa-sliders-h",
+          "help_text": "Limits the number of alternative peptide hits considered per spectrum/feature for each identification run. This helps to reduce runtime, especially for the PEPMatrix and PEPIons algorithms, which involve costly 'all vs. all' comparisons of peptide hits per spectrum across engines."
        },
        "min_consensus_support": {
          "type": "integer",
+          "description": "A threshold for the ratio of occurrence/similarity scores of a peptide in other runs, to be reported. 
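Putting the two rescoring routes and the ConsensusID options above together, a hedged example of switching from Percolator to distribution fitting with a similarity-based consensus might be:

```bash
# Sketch: values are illustrative; 'fit_distributions', 'ignore_iqr_outliers'
# and 'PEPMatrix' are the option names documented in this schema.
nextflow run nf-core/proteomicslfq -profile docker \
    --sdrf experiment.sdrf.tsv \
    --posterior_probabilities fit_distributions \
    --outlier_handling ignore_iqr_outliers \
    --consensusid_algorithm PEPMatrix
```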
See help.", "default": 0, - "fa_icon": "fas fa-filter" + "fa_icon": "fas fa-filter", + "help_text": "This allows filtering of peptide hits based on agreement between search engines. Every peptide sequence in the analysis has been identified by at least one search run. This parameter defines which fraction (between 0 and 1) of the remaining search runs must 'support' a peptide identification that should be kept. The meaning of 'support' differs slightly between algorithms: For best, worst, average and rank, each search run supports peptides that it has also identified among its top `consensusid_considered_top_hits` candidates. So `min_consensus_support` simply gives the fraction of additional search engines that must have identified a peptide. (For example, if there are three search runs, and only peptides identified by at least two of them should be kept, set `min_support` to 0.5.) For the similarity-based algorithms PEPMatrix and PEPIons, the 'support' for a peptide is the average similarity of the most-similar peptide from each (other) search run. (In the context of the JPR publication, this is the average of the similarity scores used in the consensus score calculation for a peptide.) Note: For most of the subsequent algorithms, only the best identification per spectrum is used." } }, "fa_icon": "fas fa-code-branch" @@ -553,18 +608,22 @@ "protein_inference": { "title": "Protein inference", "type": "object", - "description": "", + "description": "To group proteins, calculate scores on the protein (group) level and to potentially modify associations from peptides to proteins.", "default": "", "properties": { "protein_inference": { "type": "string", + "description": "The inference method to use. 'aggregation' (default) or 'bayesian'.", "default": "aggregation", - "fa_icon": "fas fa-list-ol" + "fa_icon": "fas fa-list-ol", + "help_text": "Infer proteins through:\n\n* 'aggregation' = aggregates all peptide scores across a protein (by calculating the maximum) (default)\n* 'bayesian' = computes a posterior probability for every protein based on a Bayesian network (i.e. using Epifany)\n* ('percolator' not yet supported)" }, "protein_level_fdr_cutoff": { "type": "number", + "description": "The experiment-wide protein (group)-level FDR cutoff. Default: 0.05", "default": 0.05, - "fa_icon": "fas fa-filter" + "fa_icon": "fas fa-filter", + "help_text": "This can be protein level if 'strictly_unique_peptides' are used for protein quantification. See [`--protein_quant`](#params_protein_quant)" } }, "fa_icon": "fab fa-hubspot" @@ -575,21 +634,40 @@ "description": "", "default": "", "properties": { - "mass_recalibration": { + "protein_quant": { "type": "string", + "description": "Quantify proteins based on:\n\n* 'unique_peptides' = use peptides mapping to single proteins or a group of indistinguishable proteins (according to the set of experimentally identified peptides)\n* 'strictly_unique_peptides' = use peptides mapping to a unique single protein only\n* 'shared_peptides' = use shared peptides, too, but only greedily for its best group (by inference score)", + "default": "unique_peptides", + "enum": ["unique_peptides","strictly_unique_peptides","shared_peptides"], "fa_icon": "far fa-check-square" }, - "transfer_ids": { + "quantification_method": { "type": "string", + "description": "Currently UNSUPPORTED in this workflow. 
Choose between feature-based quantification based on integrated MS1 signals ('feature_intensity'; default) or spectral counting of PSMs (spectral_counting).",
+        "default": "feature_intensity",
+        "enum": ["feature_intensity", "spectral_counting"],
+        "fa_icon": "far fa-check-square"
+      },
+      "mass_recalibration": {
+        "type": "boolean",
+        "description": "Recalibrates masses based on precursor mass deviations to correct for instrument biases. (default: 'false')",
+        "fa_icon": "far fa-check-square"
+      },
+      "transfer_ids": {
+        "type": "boolean",
+        "description": "Transfer IDs over aligned samples to increase the number of quantifiable features (WARNING: increased memory consumption). (default: 'false')",
+        "default": false,
        "fa_icon": "far fa-check-square"
      },
      "targeted_only": {
-        "type": "string",
-        "default": "true",
+        "type": "boolean",
+        "description": "Only looks for quantifiable features at locations with an identified spectrum. Set to false to include unidentified features so they can be linked to identified ones (=match between runs)",
+        "default": true,
        "fa_icon": "far fa-check-square"
      },
      "inf_quant_debug": {
        "type": "integer",
+        "description": "Debug level when running inference and quantification. Logs become more verbose and at '>666' potentially very large temporary files are kept.",
        "default": 0,
        "fa_icon": "fas fa-bug"
      }
    },
@@ -600,18 +678,24 @@
      "title": "Statistical post-processing",
      "type": "object",
-      "description": "",
+      "description": "Parameters for the R script using MSstats for statistical post processing and quantification visualization.",
      "default": "",
      "properties": {
        "skip_post_msstats": {
-          "type": "string",
+          "type": "boolean",
+          "description": "Skip MSstats for statistical post-processing?",
+          "default": false,
          "fa_icon": "fas fa-forward"
        },
        "ref_condition": {
          "type": "string",
+          "description": "Instead of all pairwise contrasts (default), uses the given condition name/number (corresponding to your experimental design) as a reference and creates pairwise contrasts against it. (TODO not yet fully implemented)",
+          "default": "",
          "fa_icon": "fas fa-font"
        },
        "contrasts": {
          "type": "string",
+          "description": "Allows full control over contrasts by specifying a set of contrasts in a semicolon separated list of R-compatible contrasts with the condition names/numbers as variables (e.g. `1-2;1-3;2-3`). Overwrites '--ref_condition' (TODO not yet fully implemented)",
+          "default": "",
          "fa_icon": "fas fa-font"
        }
      },
@@ -624,11 +708,15 @@
      "default": "",
      "properties": {
        "enable_qc": {
-          "type": "string",
+          "type": "boolean",
+          "description": "Enable generation of quality control report by PTXQC? 
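For the MSstats options above, a hypothetical contrast specification could look like this; the contrast string mirrors the `1-2;1-3;2-3` example in the schema, and the condition numbers are placeholders for whatever your experimental design defines:

```bash
# Sketch: either give explicit contrasts ...
nextflow run nf-core/proteomicslfq -profile docker --sdrf experiment.sdrf.tsv \
    --contrasts '1-2;1-3;2-3'
# ... or compare every condition against one reference condition instead.
nextflow run nf-core/proteomicslfq -profile docker --sdrf experiment.sdrf.tsv \
    --ref_condition 1
```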
default: 'false' since it is still unstable", + "default": false, "fa_icon": "fas fa-toggle-on" }, "ptxqc_report_layout": { "type": "string", + "description": "Specify a yaml file for the report layout (see PTXQC documentation) (TODO not yet fully implemented)", + "default": "", "fa_icon": "far fa-file" } }, From 9e796fe9a44f989068dff44a7acf8441f21c96b6 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 30 Jul 2020 18:47:16 +0200 Subject: [PATCH 294/374] Used schema builder to consolidate schema --- nextflow_schema.json | 83 +++++++++++++++++++------------------------- 1 file changed, 36 insertions(+), 47 deletions(-) diff --git a/nextflow_schema.json b/nextflow_schema.json index 7378f2c..c4b5cf1 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -13,8 +13,7 @@ "type": "string", "description": "The output directory where the results will be saved.", "default": "./results", - "fa_icon": "fas fa-folder-open", - "help_text": "" + "fa_icon": "fas fa-folder-open" }, "email": { "type": "string", @@ -35,8 +34,7 @@ "type": "boolean", "description": "Display help text.", "hidden": true, - "fa_icon": "fas fa-question-circle", - "default": false + "fa_icon": "fas fa-question-circle" }, "name": { "type": "string", @@ -57,25 +55,20 @@ "type": "boolean", "description": "Send plain-text email instead of HTML.", "fa_icon": "fas fa-remove-format", - "hidden": true, - "default": false, - "help_text": "" + "hidden": true }, "monochrome_logs": { "type": "boolean", "description": "Do not use coloured log outputs.", "fa_icon": "fas fa-palette", - "hidden": true, - "default": false, - "help_text": "" + "hidden": true }, "tracedir": { "type": "string", "description": "Directory to keep pipeline Nextflow logs and reports.", "default": "${params.outdir}/pipeline_info", "fa_icon": "fas fa-cogs", - "hidden": true, - "help_text": "" + "hidden": true } }, "fa_icon": "fas fa-file-import", @@ -124,8 +117,7 @@ "description": "Git commit id for Institutional configs.", "default": "master", "hidden": true, - "fa_icon": "fas fa-users-cog", - "help_text": "" + "fa_icon": "fas fa-users-cog" }, "custom_config_base": { "type": "string", @@ -139,29 +131,25 @@ "type": "string", "description": "Institutional configs hostname.", "hidden": true, - "fa_icon": "fas fa-users-cog", - "help_text": "" + "fa_icon": "fas fa-users-cog" }, "config_profile_description": { "type": "string", "description": "Institutional config description.", "hidden": true, - "fa_icon": "fas fa-users-cog", - "help_text": "" + "fa_icon": "fas fa-users-cog" }, "config_profile_contact": { "type": "string", "description": "Institutional config contact information.", "hidden": true, - "fa_icon": "fas fa-users-cog", - "help_text": "" + "fa_icon": "fas fa-users-cog" }, "config_profile_url": { "type": "string", "description": "Institutional config URL link.", "hidden": true, - "fa_icon": "fas fa-users-cog", - "help_text": "" + "fa_icon": "fas fa-users-cog" } }, "fa_icon": "fas fa-university", @@ -304,14 +292,17 @@ "description": "Specify the amount of termini matching the enzyme cutting rules for a peptide to be considered. Valid values are `fully` (default), `semi`, or `none`", "default": "fully", "fa_icon": "fas fa-list-ol", - "help_text": "" + "enum": [ + "fully", + "semi", + "none" + ] }, "allowed_missed_cleavages": { "type": "integer", "description": "Specify the maximum number of allowed missed enzyme cleavages in a peptide. 
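To illustrate the digestion settings documented here, one hedged example (the enzyme name comes from the OpenMS enzyme list referenced earlier; all values are illustrative):

```bash
# Hypothetical semi-tryptic search with up to two missed cleavages.
nextflow run nf-core/proteomicslfq -profile docker \
    --sdrf experiment.sdrf.tsv \
    --enzyme 'Trypsin/P' \
    --num_enzyme_termini semi \
    --allowed_missed_cleavages 2
```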
The parameter is not applied if `unspecific cleavage` is specified as enzyme.", "default": 2, - "fa_icon": "fas fa-sliders-h", - "help_text": "" + "fa_icon": "fas fa-sliders-h" }, "precursor_mass_tolerance": { "type": "integer", @@ -372,15 +363,13 @@ "type": "string", "description": "Type of instrument that generated the data. 'low_res' or 'high_res' (default; refers to LCQ and LTQ instruments)", "default": "high_res", - "fa_icon": "fas fa-list-ol", - "help_text": "" + "fa_icon": "fas fa-list-ol" }, "protocol": { "type": "string", "description": "MSGF only: Labeling or enrichment protocol used, if any. Default: automatic", "default": "automatic", - "fa_icon": "fas fa-list-ol", - "help_text": "" + "fa_icon": "fas fa-list-ol" }, "min_precursor_charge": { "type": "integer", @@ -421,7 +410,6 @@ "db_debug": { "type": "integer", "description": "Debug level when running the database search. Logs become more verbose and at '>5' temporary files are kept.", - "default": 0, "fa_icon": "fas fa-bug" } }, @@ -508,7 +496,6 @@ "pp_debug": { "type": "integer", "description": "Debug level when running the re-scoring. Logs become more verbose and at '>5' temporary files are kept.", - "default": 0, "fa_icon": "fas fa-bug" } }, @@ -545,7 +532,7 @@ "fa_icon": "fas fa-sliders-h" }, "klammer": { - "type": "string", + "type": "boolean", "description": "Retention time features are calculated as in Klammer et al. instead of with Elude. Default: false", "fa_icon": "far fa-check-square", "hidden": true @@ -553,7 +540,6 @@ "description_correct_features": { "type": "integer", "description": "Use additional features whose values are learnt by correct entries. See help text. Default: 0 = none", - "default": 0, "fa_icon": "fas fa-list-ol", "help_text": "Percolator provides the possibility to use so called description of correct features, i.e. features for which desirable values are learnt from the previously identified target PSMs. The absolute value of the difference between desired value and observed value is then used as predictive features.\n\n1 -> iso-electric point\n\n2 -> mass calibration\n\n4 -> retention time\n\n8 -> `delta_retention_time * delta_mass_calibration`" } @@ -586,19 +572,22 @@ "description": "How to combine the probabilities from the single search engines: best, combine using a sequence similarity-matrix (PEPMatrix), combine using shared ion count of peptides (PEPIons). See help for further info.", "default": "best", "fa_icon": "fas fa-list-ol", - "help_text": "Specifies how search engine results are combined: ConsensusID offers several algorithms that can aggregate results from multiple peptide identification engines ('search engines') into consensus identifications - typically one per MS2 spectrum. This works especially well for search engines that provide more than one peptide hit per spectrum, i.e. that report not just the best hit, but also a list of runner-up candidates with corresponding scores.\n\nThe available algorithms are:\n\n* PEPMatrix: Scoring based on posterior error probabilities (PEPs) and peptide sequence similarities. This algorithm uses a substitution matrix to score the similarity of sequences not listed by all search engines. It requires PEPs as the scores for all peptide hits.\n* PEPIons: Scoring based on posterior error probabilities (PEPs) and fragment ion similarities ('shared peak count'). 
This algorithm, too, requires PEPs as scores.\n* best: For each peptide ID, this uses the best score of any search engine as the consensus score.\n* worst: For each peptide ID, this uses the worst score of any search engine as the consensus score.\n* average: For each peptide ID, this uses the average score of all search engines as the consensus score.\n* ranks: Calculates a consensus score based on the ranks of peptide IDs in the results of different search engines. The final score is in the range (0, 1], with 1 being the best score.\n\nTo make scores comparable, for best, worst and average, PEPs are used as well. Peptide IDs are only considered the same if they map to exactly the same sequence (including modifications and their localization). Also isobaric amino acids are (for now) only considered equal with the PEPMatrix/PEPIons algorithms.",
+        "enum": [
+          "best",
+          "PEPMatrix",
+          "PEPIons"
+        ]
      },
      "consensusid_considered_top_hits": {
        "type": "integer",
        "description": "Only use the top N hits per search engine and spectrum for combination. Default: 0 = all",
-        "default": 0,
        "fa_icon": "fas fa-sliders-h",
        "help_text": "Limits the number of alternative peptide hits considered per spectrum/feature for each identification run. This helps to reduce runtime, especially for the PEPMatrix and PEPIons algorithms, which involve costly 'all vs. all' comparisons of peptide hits per spectrum across engines."
      },
      "min_consensus_support": {
        "type": "integer",
        "description": "A threshold for the ratio of occurrence/similarity scores of a peptide in other runs, to be reported. See help.",
-        "default": 0,
        "fa_icon": "fas fa-filter",
        "help_text": "This allows filtering of peptide hits based on agreement between search engines. 
Every peptide sequence in the analysis has been identified by at least one search run. This parameter defines which fraction (between 0 and 1) of the remaining search runs must 'support' a peptide identification that should be kept. The meaning of 'support' differs slightly between algorithms: For best, worst, average and rank, each search run supports peptides that it has also identified among its top `consensusid_considered_top_hits` candidates. So `min_consensus_support` simply gives the fraction of additional search engines that must have identified a peptide. (For example, if there are three search runs, and only peptides identified by at least two of them should be kept, set `min_support` to 0.5.) For the similarity-based algorithms PEPMatrix and PEPIons, the 'support' for a peptide is the average similarity of the most-similar peptide from each (other) search run. (In the context of the JPR publication, this is the average of the similarity scores used in the consensus score calculation for a peptide.) Note: For most of the subsequent algorithms, only the best identification per spectrum is used." } @@ -638,15 +627,22 @@ "type": "string", "description": "Quantify proteins based on:\n\n* 'unique_peptides' = use peptides mapping to single proteins or a group of indistinguishable proteins (according to the set of experimentally identified peptides)\n* 'strictly_unique_peptides' = use peptides mapping to a unique single protein only\n* 'shared_peptides' = use shared peptides, too, but only greedily for its best group (by inference score)", "default": "unique_peptides", - "enum": ["unique_peptides","strictly_unique_peptides","shared_peptides"], - "fa_icon": "far fa-check-square" + "enum": [ + "unique_peptides", + "strictly_unique_peptides", + "shared_peptides" + ], + "fa_icon": "fas fa-list-ol" }, "quantification_method": { "type": "string", "description": "Currently UNSUPPORTED in this workflow. Choose between feature-based quantification based on integrated MS1 signals ('feature_intensity'; default) or spectral counting of PSMs (spectral_counting).", "default": "feature_intensity", - "enum": ["feature_intensity", "spectral_counting"], - "fa_icon": "far fa-check-square" + "enum": [ + "feature_intensity", + "spectral_counting" + ], + "fa_icon": "fas fa-list-ol" }, "mass_recalibration": { "type": "boolean", @@ -656,7 +652,6 @@ "transfer_ids": { "type": "boolean", "description": "Transfer IDs over aligned samples to increase the number of quantifiable features (WARNING: increased memory consumption). (default: 'false')", - "default": false, "fa_icon": "far fa-check-square" }, "targeted_only": { @@ -668,7 +663,6 @@ "inf_quant_debug": { "type": "integer", "description": "Debug level when running the re-scoring. Logs become more verbose and at '>666' potentially very large temporary files are kept.", - "default": 0, "fa_icon": "fas fa-bug" } }, @@ -683,19 +677,16 @@ "skip_post_msstats": { "type": "boolean", "description": "Skip MSstats for statistical post-processing?", - "default": false, "fa_icon": "fas fa-forward" }, "ref_condition": { "type": "string", "description": "Instead of all pairwise contrasts (default), uses the given condition name/number (corresponding to your experimental design) as a reference and creates pairwise contrasts against it. 
(TODO not yet fully implemented)", - "default": "", "fa_icon": "fas fa-font" }, "contrasts": { "type": "string", "description": "Allows full control over contrasts by specifying a set of contrasts in a semicolon seperated list of R-compatible contrasts with the condition names/numbers as variables (e.g. `1-2;1-3;2-3`). Overwrites '--ref_condition' (TODO not yet fully implemented)", - "default": "", "fa_icon": "fas fa-font" } }, @@ -710,13 +701,11 @@ "enable_qc": { "type": "boolean", "description": "Enable generation of quality control report by PTXQC? default: 'false' since it is still unstable", - "default": false, "fa_icon": "fas fa-toggle-on" }, "ptxqc_report_layout": { "type": "string", "description": "Specify a yaml file for the report layout (see PTXQC documentation) (TODO not yet fully implemented)", - "default": "", "fa_icon": "far fa-file" } }, From 0ff16f5deb0cd4dd785c360a5f2675d79e1faee2 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 30 Jul 2020 19:38:04 +0200 Subject: [PATCH 295/374] Added missing params to workflow --- main.nf | 1 + nextflow.config | 2 ++ 2 files changed, 3 insertions(+) diff --git a/main.nf b/main.nf index 068b9e3..94fc0e0 100644 --- a/main.nf +++ b/main.nf @@ -981,6 +981,7 @@ process proteomicslfq { -design ${expdes} \\ -fasta ${fasta} \\ -protein_inference ${params.protein_inference} \\ + -quantification_method ${params.quantification_method} \\ -targeted_only ${params.targeted_only} \\ -mass_recalibration ${params.mass_recalibration} \\ -transfer_ids ${params.transfer_ids} \\ diff --git a/nextflow.config b/nextflow.config index 849b0fc..d4a7796 100644 --- a/nextflow.config +++ b/nextflow.config @@ -89,6 +89,8 @@ params { // ProteomicsLFQ flags inf_quant_debug = 0 protein_inference = 'aggregation' + protein_quant = 'unique_peptides' + quantification_method = 'feature_intensity' targeted_only = 'true' mass_recalibration = 'false' transfer_ids = 'false' From ec197dcdddb69ac54504a390d9822d4406fabab6 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 30 Jul 2020 20:00:35 +0200 Subject: [PATCH 296/374] Fix some lint stuff --- .github/workflows/awsfulltest.yml | 31 +++++++++++++++++++++++++++++++ .github/workflows/awstest.yml | 29 ++++++++++++++++++----------- nextflow.config | 1 + 3 files changed, 50 insertions(+), 11 deletions(-) create mode 100644 .github/workflows/awsfulltest.yml diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml new file mode 100644 index 0000000..4bdee0f --- /dev/null +++ b/.github/workflows/awsfulltest.yml @@ -0,0 +1,31 @@ +name: nf-core AWS test +# This workflow is triggered on PRs to the master branch. 
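A note on the `aws batch submit-job` calls in the workflows below: the `--container-overrides` JSON is built by alternating single and double quotes, so that shell variables are expanded inside an otherwise literally quoted string. A minimal sketch of the pattern (the variable value is illustrative):

```bash
# Close the single quotes, expand the variable inside double quotes,
# then reopen single quotes -- the shell concatenates the three pieces
# into one JSON argument.
GITHUB_SHA=abc1234
echo '{"command": ["-r '"${GITHUB_SHA}"'"]}'
# prints: {"command": ["-r abc1234"]}
```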
+# It runs the -profile 'test_full' on AWS batch + +on: + push: + branches: + - master + - feature/awstests #TODO remove after testing + release: + types: [published] + +jobs: + run-awstest: + name: Run AWS test + runs-on: ubuntu-latest + steps: + - name: Setup Miniconda + uses: goanpeca/setup-miniconda@v1.0.2 + with: + auto-update-conda: true + python-version: 3.7 + - name: Install awscli + run: conda install -c conda-forge awscli + - name: Start AWS batch job + env: + AWS_ACCESS_KEY_ID: ${{secrets.AWS_KEY_ID}} + AWS_SECRET_ACCESS_KEY: ${{secrets.AWS_KEY_SECRET}} + TOWER_ACCESS_TOKEN: ${{secrets.TOWER_ACCESS_TOKEN}} + run: | + aws batch submit-job --region eu-west-1 --job-name nf-core-proteomicslfq --job-queue 'default-8b3836e0-5eda-11ea-96e5-0a2c3f6a2a32' --job-definition nextflow --container-overrides '{"command": ["nf-core/proteomicslfq", "-r '"${GITHUB_SHA}"' -profile test_full --outdir s3://nf-core-awsmegatests/proteomicslfq/results-'"${GITHUB_SHA}"' -w s3://nf-core-awsmegatests/proteomicslfq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml index 4bdee0f..422f5d9 100644 --- a/.github/workflows/awstest.yml +++ b/.github/workflows/awstest.yml @@ -1,17 +1,16 @@ name: nf-core AWS test -# This workflow is triggered on PRs to the master branch. -# It runs the -profile 'test_full' on AWS batch +# This workflow is triggered on push to the master branch. +# It runs the -profile 'test' on AWS batch on: push: branches: - - master - - feature/awstests #TODO remove after testing - release: - types: [published] + - master + - dev # just for testing purposes, to be removed jobs: run-awstest: + if: github.repository == 'nf-core/proteomicslfq' name: Run AWS test runs-on: ubuntu-latest steps: @@ -24,8 +23,16 @@ jobs: run: conda install -c conda-forge awscli - name: Start AWS batch job env: - AWS_ACCESS_KEY_ID: ${{secrets.AWS_KEY_ID}} - AWS_SECRET_ACCESS_KEY: ${{secrets.AWS_KEY_SECRET}} - TOWER_ACCESS_TOKEN: ${{secrets.TOWER_ACCESS_TOKEN}} - run: | - aws batch submit-job --region eu-west-1 --job-name nf-core-proteomicslfq --job-queue 'default-8b3836e0-5eda-11ea-96e5-0a2c3f6a2a32' --job-definition nextflow --container-overrides '{"command": ["nf-core/proteomicslfq", "-r '"${GITHUB_SHA}"' -profile test_full --outdir s3://nf-core-awsmegatests/proteomicslfq/results-'"${GITHUB_SHA}"' -w s3://nf-core-awsmegatests/proteomicslfq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' + AWS_ACCESS_KEY_ID: ${{secrets.AWS_ACCESS_KEY_ID}} + AWS_SECRET_ACCESS_KEY: ${{secrets.AWS_SECRET_ACCESS_KEY}} + TOWER_ACCESS_TOKEN: ${{secrets.AWS_TOWER_TOKEN}} + #AWS_JOB_DEFINITION: ${{secrets.AWS_JOB_DEFINITION}} + AWS_JOB_QUEUE: ${{secrets.AWS_JOB_QUEUE}} + AWS_S3_BUCKET: ${{secrets.AWS_S3_BUCKET}} + run: | # Submits job to AWS batch using a 'nextflow-4GiB' job definition. Setting JVM options to "-XX:+UseG1GC" for more efficient garbage collection when staging remote files. 
+ aws batch submit-job \ + --region eu-west-1 \ + --job-name nf-core-atacseq \ + --job-queue $AWS_JOB_QUEUE \ + --job-definition nextflow-4GiB \ + --container-overrides '{"command": ["nf-core/proteomicslfq", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/proteomicslfq/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/proteomicslfq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}, {"name": "NXF_OPTS", "value": "-XX:+UseG1GC"}]}' diff --git a/nextflow.config b/nextflow.config index d4a7796..af146aa 100644 --- a/nextflow.config +++ b/nextflow.config @@ -12,6 +12,7 @@ process.cache = 'lenient' params { // Workflow flags + input = '' //TODO unused. Maybe we should allow SDRF and Raw and MzML as input and then pick the right mode from the file ending. sdrf = '' root_folder = '' local_input_type = '' From efb7c2b6232f05cd1c86600188c7e65f7fe98105 Mon Sep 17 00:00:00 2001 From: nf-core-bot Date: Thu, 30 Jul 2020 20:09:30 +0000 Subject: [PATCH 297/374] Template update for nf-core/tools version 1.10.1 --- .github/.dockstore.yml | 5 + .github/CONTRIBUTING.md | 50 +- .github/ISSUE_TEMPLATE/bug_report.md | 56 ++- .github/ISSUE_TEMPLATE/feature_request.md | 30 +- .github/PULL_REQUEST_TEMPLATE.md | 29 +- .github/markdownlint.yml | 4 - .github/workflows/awsfulltest.yml | 40 ++ .github/workflows/awstest.yml | 40 ++ .github/workflows/branch.yml | 36 ++ .github/workflows/ci.yml | 55 +++ .github/workflows/linting.yml | 61 +++ .github/workflows/push_dockerhub.yml | 39 ++ .gitignore | 5 +- .travis.yml | 42 -- CHANGELOG.md | 14 +- CODE_OF_CONDUCT.md | 8 +- Dockerfile | 18 +- LICENSE | 2 +- README.md | 69 ++- assets/email_template.html | 2 + assets/email_template.txt | 29 +- assets/multiqc_config.yaml | 6 +- assets/nf-core-proteomicslfq_logo.png | Bin 0 -> 11668 bytes assets/sendmail_template.txt | 17 + .../scrape_software_versions.cpython-37.pyc | Bin 1283 -> 0 bytes bin/markdown_to_html.py | 91 ++++ bin/markdown_to_html.r | 51 -- bin/scrape_software_versions.py | 53 +- conf/awsbatch.config | 18 - conf/base.config | 37 +- conf/igenomes.config | 452 ++++++++++++++---- conf/test.config | 11 +- conf/test_full.config | 22 + docs/README.md | 20 +- docs/configuration/adding_your_own.md | 86 ---- docs/configuration/local.md | 47 -- docs/configuration/reference_genomes.md | 50 -- docs/images/nf-core-proteomicslfq_logo.png | Bin 0 -> 20821 bytes docs/installation.md | 113 ----- docs/output.md | 52 +- docs/troubleshooting.md | 30 -- docs/usage.md | 259 ++-------- environment.yml | 8 +- main.nf | 305 ++++++------ nextflow.config | 62 ++- nextflow_schema.json | 259 ++++++++++ 46 files changed, 1610 insertions(+), 1073 deletions(-) create mode 100644 .github/.dockstore.yml create mode 100644 .github/workflows/awsfulltest.yml create mode 100644 .github/workflows/awstest.yml create mode 100644 .github/workflows/branch.yml create mode 100644 .github/workflows/ci.yml create mode 100644 .github/workflows/linting.yml create mode 100644 .github/workflows/push_dockerhub.yml delete mode 100644 .travis.yml create mode 100644 assets/nf-core-proteomicslfq_logo.png delete mode 100644 bin/__pycache__/scrape_software_versions.cpython-37.pyc create mode 100755 bin/markdown_to_html.py delete mode 100755 bin/markdown_to_html.r delete mode 100644 conf/awsbatch.config create mode 100644 conf/test_full.config delete mode 100644 docs/configuration/adding_your_own.md delete mode 100644 docs/configuration/local.md delete mode 100644 
docs/configuration/reference_genomes.md create mode 100644 docs/images/nf-core-proteomicslfq_logo.png delete mode 100644 docs/installation.md delete mode 100644 docs/troubleshooting.md create mode 100644 nextflow_schema.json diff --git a/.github/.dockstore.yml b/.github/.dockstore.yml new file mode 100644 index 0000000..030138a --- /dev/null +++ b/.github/.dockstore.yml @@ -0,0 +1,5 @@ +# Dockstore config version, not pipeline version +version: 1.2 +workflows: + - subclass: nfl + primaryDescriptorPath: /nextflow.config diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 23aeaa4..e095919 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -1,45 +1,57 @@ # nf-core/proteomicslfq: Contributing Guidelines -Hi there! Many thanks for taking an interest in improving nf-core/proteomicslfq. +Hi there! +Many thanks for taking an interest in improving nf-core/proteomicslfq. -We try to manage the required tasks for nf-core/proteomicslfq using GitHub issues, you probably came to this page when creating one. Please use the pre-filled template to save time. +We try to manage the required tasks for nf-core/proteomicslfq using GitHub issues, you probably came to this page when creating one. +Please use the pre-filled template to save time. -However, don't be put off by this template - other more general issues and suggestions are welcome! Contributions to the code are even more welcome ;) +However, don't be put off by this template - other more general issues and suggestions are welcome! +Contributions to the code are even more welcome ;) -> If you need help using or modifying nf-core/proteomicslfq then the best place to go is the Gitter chatroom where you can ask us questions directly: https://gitter.im/nf-core/Lobby +> If you need help using or modifying nf-core/proteomicslfq then the best place to ask is on the nf-core Slack [#proteomicslfq](https://nfcore.slack.com/channels/proteomicslfq) channel ([join our Slack here](https://nf-co.re/join/slack)). ## Contribution workflow -If you'd like to write some code for nf-core/proteomicslfq, the standard workflow -is as follows: -1. Check that there isn't already an issue about your idea in the - [nf-core/proteomicslfq issues](https://github.com/nf-core/proteomicslfq/issues) to avoid - duplicating work. +If you'd like to write some code for nf-core/proteomicslfq, the standard workflow is as follows: + +1. Check that there isn't already an issue about your idea in the [nf-core/proteomicslfq issues](https://github.com/nf-core/proteomicslfq/issues) to avoid duplicating work * If there isn't one already, please create one so that others know you're working on this -2. Fork the [nf-core/proteomicslfq repository](https://github.com/nf-core/proteomicslfq) to your GitHub account +2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/proteomicslfq repository](https://github.com/nf-core/proteomicslfq) to your GitHub account 3. Make the necessary changes / additions within your forked repository -4. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged. - -If you're not used to this workflow with git, you can start with some [basic docs from GitHub](https://help.github.com/articles/fork-a-repo/) or even their [excellent interactive tutorial](https://try.github.io/). +4. 
Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged
+If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/).
 ## Tests
-When you create a pull request with changes, [Travis CI](https://travis-ci.org/) will run automatic tests.
+
+When you create a pull request with changes, [GitHub Actions](https://github.com/features/actions) will run automatic tests.
 Typically, pull-requests are only fully reviewed when these tests are passing, though of course we can help out before then.
 
 There are typically two types of tests that run:
 
 ### Lint Tests
-The nf-core has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to.
+
+`nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to.
 To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint ` command.
 
 If any failures or warnings are encountered, please follow the listed URL for more documentation.
 
 ### Pipeline Tests
-Each nf-core pipeline should be set up with a minimal set of test-data.
-Travis CI then runs the pipeline on this data to ensure that it exists successfully.
+
+Each `nf-core` pipeline should be set up with a minimal set of test-data.
+`GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully.
 If there are any failures then the automated tests fail.
-These tests are run both with the latest available version of Nextflow and also the minimum required version that is stated in the pipeline code.
+These tests are run both with the latest available version of `Nextflow` and also the minimum required version that is stated in the pipeline code.
+
+## Patch
+
+:warning: Only in the unlikely and regretful event of a release happening with a bug.
+
+* On your own fork, make a new branch `patch` based on `upstream/master`.
+* Fix the bug, and bump version (X.Y.Z+1).
+* A PR should be made on `master` from patch to directly fix this particular bug.
 
 ## Getting help
-For further information/help, please consult the [nf-core/proteomicslfq documentation](https://github.com/nf-core/proteomicslfq#documentation) and don't hesitate to get in touch on the [nf-core/proteomicslfq pipeline channel](https://nfcore.slack.com/channels/nf-core/proteomicslfq) on [Slack](https://nf-co.re/join/slack/).
+
+For further information/help, please consult the [nf-core/proteomicslfq documentation](https://nf-co.re/proteomicslfq/docs) and don't hesitate to get in touch on the nf-core Slack [#proteomicslfq](https://nfcore.slack.com/channels/proteomicslfq) channel ([join our Slack here](https://nf-co.re/join/slack)).
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
index 5b1a680..55a2590 100644
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -1,31 +1,45 @@
+
+
+## Description of the bug
 
-#### Describe the bug
-A clear and concise description of what the bug is.
+
+
+## Steps to reproduce
 
-#### Steps to reproduce
 Steps to reproduce the behaviour:
-1. Command line: `nextflow run ...`
-2. See error: _Please provide your error message_
 
-#### Expected behaviour
-A clear and concise description of what you expected to happen.
+1. Command line:
+2. 
See error: + +## Expected behaviour + + + +## System + +- Hardware: +- Executor: +- OS: +- Version + +## Nextflow Installation + +- Version: -#### System: - - Hardware: [e.g. HPC, Desktop, Cloud...] - - Executor: [e.g. slurm, local, awsbatch...] - - OS: [e.g. CentOS Linux, macOS, Linux Mint...] - - Version [e.g. 7, 10.13.6, 18.3...] +## Container engine -#### Nextflow Installation: - - Version: [e.g. 0.31.0] +- Engine: +- version: +- Image tag: -#### Container engine: - - Engine: [e.g. Conda, Docker or Singularity] - - version: [e.g. 1.0.0] - - Image tag: [e.g. nfcore/proteomicslfq:1.0.0] +## Additional context -#### Additional context -Add any other context about the problem here. + diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md index 1f025b7..3697545 100644 --- a/.github/ISSUE_TEMPLATE/feature_request.md +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -1,16 +1,26 @@ + + +## Is your feature request related to a problem? Please describe + + + + + +## Describe the solution you'd like + + -#### Is your feature request related to a problem? Please describe. -A clear and concise description of what the problem is. -Ex. I'm always frustrated when [...] +## Describe alternatives you've considered -#### Describe the solution you'd like -A clear and concise description of what you want to happen. + -#### Describe alternatives you've considered -A clear and concise description of any alternative solutions or features you've considered. +## Additional context -#### Additional context -Add any other context about the feature request here. + diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 97678e3..d213dd4 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -1,15 +1,20 @@ -Many thanks to contributing to nf-core/proteomicslfq! + ## PR checklist - - [ ] This comment contains a description of changes (with reason) - - [ ] If you've fixed a bug or added code that should be tested, add tests! - - [ ] If necessary, also make a PR on the [nf-core/proteomicslfq branch on the nf-core/test-datasets repo]( https://github.com/nf-core/test-datasets/pull/new/nf-core/proteomicslfq) - - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`). - - [ ] Make sure your code lints (`nf-core lint .`). - - [ ] Documentation in `docs` is updated - - [ ] `CHANGELOG.md` is updated - - [ ] `README.md` is updated - -**Learn more about contributing:** https://github.com/nf-core/proteomicslfq/tree/master/.github/CONTRIBUTING.md + +- [ ] This comment contains a description of changes (with reason) +- [ ] `CHANGELOG.md` is updated +- [ ] If you've fixed a bug or added code that should be tested, add tests! 
+- [ ] Documentation in `docs` is updated +- [ ] If necessary, also make a PR on the [nf-core/proteomicslfq branch on the nf-core/test-datasets repo](https://github.com/nf-core/test-datasets/pull/new/nf-core/proteomicslfq) diff --git a/.github/markdownlint.yml b/.github/markdownlint.yml index e052a63..96b12a7 100644 --- a/.github/markdownlint.yml +++ b/.github/markdownlint.yml @@ -1,9 +1,5 @@ # Markdownlint configuration file default: true, line-length: false -no-multiple-blanks: 0 -blanks-around-headers: false -blanks-around-lists: false -header-increment: false no-duplicate-header: siblings_only: true diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml new file mode 100644 index 0000000..282744b --- /dev/null +++ b/.github/workflows/awsfulltest.yml @@ -0,0 +1,40 @@ +name: nf-core AWS full size tests +# This workflow is triggered on push to the master branch. +# It runs the -profile 'test_full' on AWS batch + +on: + release: + types: [published] + +jobs: + run-awstest: + name: Run AWS full tests + if: github.repository == 'nf-core/proteomicslfq' + runs-on: ubuntu-latest + steps: + - name: Setup Miniconda + uses: goanpeca/setup-miniconda@v1.0.2 + with: + auto-update-conda: true + python-version: 3.7 + - name: Install awscli + run: conda install -c conda-forge awscli + - name: Start AWS batch job + # TODO nf-core: You can customise AWS full pipeline tests as required + # Add full size test data (but still relatively small datasets for few samples) + # on the `test_full.config` test runs with only one set of parameters + # Then specify `-profile test_full` instead of `-profile test` on the AWS batch command + env: + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} + TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }} + AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }} + AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }} + AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }} + run: | + aws batch submit-job \ + --region eu-west-1 \ + --job-name nf-core-proteomicslfq \ + --job-queue $AWS_JOB_QUEUE \ + --job-definition $AWS_JOB_DEFINITION \ + --container-overrides '{"command": ["nf-core/proteomicslfq", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/proteomicslfq/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/proteomicslfq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml new file mode 100644 index 0000000..4f3e0fc --- /dev/null +++ b/.github/workflows/awstest.yml @@ -0,0 +1,40 @@ +name: nf-core AWS test +# This workflow is triggered on push to the master branch. 
# It runs the -profile 'test' on AWS batch

on:
  push:
    branches:
      - master

jobs:
  run-awstest:
    name: Run AWS tests
    if: github.repository == 'nf-core/proteomicslfq'
    runs-on: ubuntu-latest
    steps:
      - name: Setup Miniconda
        uses: goanpeca/setup-miniconda@v1.0.2
        with:
          auto-update-conda: true
          python-version: 3.7
      - name: Install awscli
        run: conda install -c conda-forge awscli
      - name: Start AWS batch job
        # TODO nf-core: You can customise CI pipeline run tests as required
        # For example: adding multiple test runs with different parameters
        # Remember that you can parallelise this by using strategy.matrix
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }}
          AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }}
          AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }}
          AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
        run: |
          aws batch submit-job \
            --region eu-west-1 \
            --job-name nf-core-proteomicslfq \
            --job-queue $AWS_JOB_QUEUE \
            --job-definition $AWS_JOB_DEFINITION \
            --container-overrides '{"command": ["nf-core/proteomicslfq", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/proteomicslfq/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/proteomicslfq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml
new file mode 100644
index 0000000..517f099
--- /dev/null
+++ b/.github/workflows/branch.yml
@@ -0,0 +1,36 @@
+name: nf-core branch protection
+# This workflow is triggered on PRs to master branch on the repository
+# It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev`
+on:
+  pull_request:
+    branches: [master]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      # PRs to the nf-core repo master branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
+      - name: Check PRs
+        if: github.repository == 'nf-core/proteomicslfq'
+        run: |
+          { [[ ${{github.event.pull_request.head.repo.full_name}} == nf-core/proteomicslfq ]] && [[ $GITHUB_HEAD_REF = "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]]
+
+
+      # If the above check failed, post a comment on the PR explaining the failure
+      - name: Post PR comment
+        if: failure()
+        uses: mshick/add-pr-comment@v1
+        with:
+          message: |
+            Hi @${{ github.event.pull_request.user.login }},
+
+            It looks like this pull-request has been made against the ${{github.event.pull_request.head.repo.full_name}} `master` branch.
+            The `master` branch on nf-core repositories should always contain code from the latest release.
+            Because of this, PRs to `master` are only allowed if they come from the ${{github.event.pull_request.head.repo.full_name}} `dev` branch.
+
+            You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page.
+
+            Thanks again for your contribution! 
+ repo-token: ${{ secrets.GITHUB_TOKEN }} + allow-repeats: false + diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml new file mode 100644 index 0000000..9580460 --- /dev/null +++ b/.github/workflows/ci.yml @@ -0,0 +1,55 @@ +name: nf-core CI +# This workflow runs the pipeline with the minimal test dataset to check that it completes without any syntax errors +on: + push: + branches: + - dev + pull_request: + release: + types: [published] + +jobs: + test: + name: Run workflow tests + # Only run on push if this is the nf-core dev branch (merged PRs) + if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/proteomicslfq') }} + runs-on: ubuntu-latest + env: + NXF_VER: ${{ matrix.nxf_ver }} + NXF_ANSI_LOG: false + strategy: + matrix: + # Nextflow versions: check pipeline minimum and current latest + nxf_ver: ['19.10.0', ''] + steps: + - name: Check out pipeline code + uses: actions/checkout@v2 + + - name: Check if Dockerfile or Conda environment changed + uses: technote-space/get-diff-action@v1 + with: + PREFIX_FILTER: | + Dockerfile + environment.yml + + - name: Build new docker image + if: env.GIT_DIFF + run: docker build --no-cache . -t nfcore/proteomicslfq:dev + + - name: Pull docker image + if: ${{ !env.GIT_DIFF }} + run: | + docker pull nfcore/proteomicslfq:dev + docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev + + - name: Install Nextflow + run: | + wget -qO- get.nextflow.io | bash + sudo mv nextflow /usr/local/bin/ + + - name: Run pipeline with test data + # TODO nf-core: You can customise CI pipeline run tests as required + # For example: adding multiple test runs with different parameters + # Remember that you can parallelise this by using strategy.matrix + run: | + nextflow run ${GITHUB_WORKSPACE} -profile test,docker diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml new file mode 100644 index 0000000..eb66c14 --- /dev/null +++ b/.github/workflows/linting.yml @@ -0,0 +1,61 @@ +name: nf-core linting +# This workflow is triggered on pushes and PRs to the repository. 
+# It runs the `nf-core lint` and markdown lint tests to ensure that the code meets the nf-core guidelines
+on:
+  push:
+  pull_request:
+  release:
+    types: [published]
+
+jobs:
+  Markdown:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - uses: actions/setup-node@v1
+        with:
+          node-version: '10'
+      - name: Install markdownlint
+        run: npm install -g markdownlint-cli
+      - name: Run Markdownlint
+        run: markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.github/markdownlint.yml
+  YAML:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v1
+      - uses: actions/setup-node@v1
+        with:
+          node-version: '10'
+      - name: Install yaml-lint
+        run: npm install -g yaml-lint
+      - name: Run yaml-lint
+        run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml")
+  nf-core:
+    runs-on: ubuntu-latest
+    steps:
+
+      - name: Check out pipeline code
+        uses: actions/checkout@v2
+
+      - name: Install Nextflow
+        run: |
+          wget -qO- get.nextflow.io | bash
+          sudo mv nextflow /usr/local/bin/
+
+      - uses: actions/setup-python@v1
+        with:
+          python-version: '3.6'
+          architecture: 'x64'
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install nf-core
+
+      - name: Run nf-core lint
+        env:
+          GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }}
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }}
+        run: nf-core lint ${GITHUB_WORKSPACE}
+
diff --git a/.github/workflows/push_dockerhub.yml b/.github/workflows/push_dockerhub.yml
new file mode 100644
index 0000000..6f7b031
--- /dev/null
+++ b/.github/workflows/push_dockerhub.yml
@@ -0,0 +1,40 @@
+name: nf-core Docker push
+# This builds the docker image and pushes it to DockerHub
+# Runs on nf-core repo releases and push event to 'dev' branch (PR merges)
+on:
+  push:
+    branches:
+      - dev
+  release:
+    types: [published]
+
+jobs:
+  push_dockerhub:
+    name: Push new Docker image to Docker Hub
+    runs-on: ubuntu-latest
+    # Only run for the nf-core repo, for releases and merged PRs
+    if: ${{ github.repository == 'nf-core/proteomicslfq' }}
+    env:
+      DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
+      DOCKERHUB_PASS: ${{ secrets.DOCKERHUB_PASS }}
+    steps:
+      - name: Check out pipeline code
+        uses: actions/checkout@v2
+
+      - name: Build new docker image
+        run: docker build --no-cache .
-t nfcore/proteomicslfq:latest + + - name: Push Docker image to DockerHub (dev) + if: ${{ github.event_name == 'push' }} + run: | + echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin + docker tag nfcore/proteomicslfq:latest nfcore/proteomicslfq:dev + docker push nfcore/proteomicslfq:dev + + - name: Push Docker image to DockerHub (release) + if: ${{ github.event_name == 'release' }} + run: | + echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin + docker push nfcore/proteomicslfq:latest + docker tag nfcore/proteomicslfq:latest nfcore/proteomicslfq:${{ github.event.release.tag_name }} + docker push nfcore/proteomicslfq:${{ github.event.release.tag_name }} diff --git a/.gitignore b/.gitignore index 46f69e4..aa4bb5b 100644 --- a/.gitignore +++ b/.gitignore @@ -3,4 +3,7 @@ work/ data/ results/ .DS_Store -tests/test_data +tests/ +testing/ +testing* +*.pyc diff --git a/.travis.yml b/.travis.yml deleted file mode 100644 index 23265b8..0000000 --- a/.travis.yml +++ /dev/null @@ -1,42 +0,0 @@ -sudo: required -language: python -jdk: openjdk8 -services: docker -python: '3.6' -cache: pip -matrix: - fast_finish: true - -before_install: - # PRs to master are only ok if coming from dev branch - - '[ $TRAVIS_PULL_REQUEST = "false" ] || [ $TRAVIS_BRANCH != "master" ] || ([ $TRAVIS_PULL_REQUEST_SLUG = $TRAVIS_REPO_SLUG ] && [ $TRAVIS_PULL_REQUEST_BRANCH = "dev" ])' - # Pull the docker image first so the test doesn't wait for this - - docker pull nfcore/proteomicslfq:dev - # Fake the tag locally so that the pipeline runs properly - # Looks weird when this is :dev to :dev, but makes sense when testing code for a release (:dev to :1.0.1) - - docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev - -install: - # Install Nextflow - - mkdir /tmp/nextflow && cd /tmp/nextflow - - wget -qO- get.nextflow.io | bash - - sudo ln -s /tmp/nextflow/nextflow /usr/local/bin/nextflow - # Install nf-core/tools - - pip install --upgrade pip - - pip install nf-core - # Reset - - mkdir ${TRAVIS_BUILD_DIR}/tests && cd ${TRAVIS_BUILD_DIR}/tests - # Install markdownlint-cli - - sudo apt-get install npm && npm install -g markdownlint-cli - -env: - - NXF_VER='0.32.0' # Specify a minimum NF version that should be tested and work - - NXF_VER='' # Plus: get the latest NF version and check that it works - -script: - # Lint the pipeline code - - nf-core lint ${TRAVIS_BUILD_DIR} - # Lint the documentation - - markdownlint ${TRAVIS_BUILD_DIR} -c ${TRAVIS_BUILD_DIR}/.github/markdownlint.yml - # Run the pipeline with the test profile - - nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker diff --git a/CHANGELOG.md b/CHANGELOG.md index 14b0aab..8f785de 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,4 +1,16 @@ # nf-core/proteomicslfq: Changelog +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) +and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). + ## v1.0dev - [date] -Initial release of nf-core/proteomicslfq, created with the [nf-core](http://nf-co.re/) template. + +Initial release of nf-core/proteomicslfq, created with the [nf-core](https://nf-co.re/) template. 
+
+### `Added`
+
+### `Fixed`
+
+### `Dependencies`
+
+### `Deprecated`
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
index 2109619..405fb1b 100644
--- a/CODE_OF_CONDUCT.md
+++ b/CODE_OF_CONDUCT.md
@@ -34,13 +34,13 @@ This Code of Conduct applies both within project spaces and in public spaces whe
 
 ## Enforcement
 
-Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on the [Gitter channel](https://gitter.im/nf-core/Lobby). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
+Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team on [Slack](https://nf-co.re/join/slack). The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
 
 Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
 
 ## Attribution
 
-This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version]
+This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [https://www.contributor-covenant.org/version/1/4/code-of-conduct/][version]
 
-[homepage]: http://contributor-covenant.org
-[version]: http://contributor-covenant.org/version/1/4/
+[homepage]: https://contributor-covenant.org
+[version]: https://www.contributor-covenant.org/version/1/4/code-of-conduct/
diff --git a/Dockerfile b/Dockerfile
index 96056e8..eaab6a6 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,7 +1,17 @@
-FROM nfcore/base
-LABEL authors="The Heumos Brothers - Simon and Lukas" \
-      description="Docker image containing all requirements for nf-core/proteomicslfq pipeline"
+FROM nfcore/base:1.10.1
+LABEL authors="Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg" \
+      description="Docker image containing all software requirements for the nf-core/proteomicslfq pipeline"
 
+# Install the conda environment
 COPY environment.yml /
-RUN conda env create -f /environment.yml && conda clean -a
+RUN conda env create --quiet -f /environment.yml && conda clean -a
+
+# Add conda installation dir to PATH (instead of doing 'conda activate')
 ENV PATH /opt/conda/envs/nf-core-proteomicslfq-1.0dev/bin:$PATH
+
+# Dump the details of the installed packages to a file for posterity
+RUN conda env export --name nf-core-proteomicslfq-1.0dev > nf-core-proteomicslfq-1.0dev.yml
+
+# Instruct R processes to use these empty files instead of clashing with a local version
+RUN touch .Rprofile
+RUN touch .Renviron
diff --git a/LICENSE b/LICENSE
index a4a5bdb..c14b073 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) The Heumos Brothers - Simon and Lukas
+Copyright (c) Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
diff --git a/README.md b/README.md
index 0a94e86..2150f86 100644
--- a/README.md
+++ b/README.md
@@ -1,30 +1,69 @@
-# nf-core/proteomicslfq
+# ![nf-core/proteomicslfq](docs/images/nf-core-proteomicslfq_logo.png)
 
 **Proteomics label-free quantification (LFQ) analysis pipeline using OpenMS and MSstats, with feature quantification, feature summarization, quality control and group-based statistical analysis.**
 
-[![Build Status](https://travis-ci.com/nf-core/proteomicslfq.svg?branch=master)](https://travis-ci.com/nf-core/proteomicslfq)
-[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A50.32.0-brightgreen.svg)](https://www.nextflow.io/)
+[![GitHub Actions CI Status](https://github.com/nf-core/proteomicslfq/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/proteomicslfq/actions)
+[![GitHub Actions Linting Status](https://github.com/nf-core/proteomicslfq/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/proteomicslfq/actions)
+[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A519.10.0-brightgreen.svg)](https://www.nextflow.io/)
 
-[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](http://bioconda.github.io/)
+[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](https://bioconda.github.io/)
 [![Docker](https://img.shields.io/docker/automated/nfcore/proteomicslfq.svg)](https://hub.docker.com/r/nfcore/proteomicslfq)
+[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23proteomicslfq-4A154B?logo=slack)](https://nfcore.slack.com/channels/proteomicslfq)
 
 ## Introduction
 
-The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker / singularity containers making installation trivial and results highly reproducible.
+The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker containers making installation trivial and results highly reproducible.
+
+## Quick Start
+
+1. Install [`nextflow`](https://nf-co.re/usage/installation)
+
+2. Install either [`Docker`](https://docs.docker.com/engine/installation/) or [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/) for full pipeline reproducibility _(please only use [`Conda`](https://conda.io/miniconda.html) as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_
+
+3. Download the pipeline and test it on a minimal dataset with a single command:
+
+    ```bash
+    nextflow run nf-core/proteomicslfq -profile test,<docker/singularity/conda/institute>
+    ```
+
+    > Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.
+
+4. Start running your own analysis!
+
+
+
+    ```bash
+    nextflow run nf-core/proteomicslfq -profile <docker/singularity/conda/institute> --input '*_R{1,2}.fastq.gz' --genome GRCh37
+    ```
+
+See [usage docs](docs/usage.md) for all of the available options when running the pipeline.
 
 ## Documentation
 
-The nf-core/proteomicslfq pipeline comes with documentation about the pipeline, found in the `docs/` directory:
-1. [Installation](docs/installation.md)
-2. Pipeline configuration
-    * [Local installation](docs/configuration/local.md)
-    * [Adding your own system](docs/configuration/adding_your_own.md)
-    * [Reference genomes](docs/configuration/reference_genomes.md)
-3. [Running the pipeline](docs/usage.md)
-4. [Output and how to interpret the results](docs/output.md)
-5. [Troubleshooting](docs/troubleshooting.md)
+The nf-core/proteomicslfq pipeline comes with documentation about the pipeline, which you can read at [https://nf-co.re/proteomicslfq/docs](https://nf-co.re/proteomicslfq/docs) or find in the [`docs/` directory](docs).
 
 ## Credits
-nf-core/proteomicslfq was originally written by The Heumos Brothers - Simon and Lukas.
+
+nf-core/proteomicslfq was originally written by Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg.
+
+## Contributions and Support
+
+If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).
+
+For further information or help, don't hesitate to get in touch on the [Slack `#proteomicslfq` channel](https://nfcore.slack.com/channels/proteomicslfq) (you can join with [this invite](https://nf-co.re/join/slack)).
+
+## Citation
+
+
+
+You can cite the `nf-core` publication as follows:
+
+> **The nf-core framework for community-curated bioinformatics pipelines.**
+>
+> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
+>
+> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).
+> ReadCube: [Full Access Link](https://rdcu.be/b1GjZ)
diff --git a/assets/email_template.html b/assets/email_template.html
index 9898556..ba4fb4c 100644
--- a/assets/email_template.html
+++ b/assets/email_template.html
@@ -11,6 +11,8 @@
 
+
+<img src="cid:nfcoreproteomicslfqlogo">
 
 <h1>nf-core/proteomicslfq v${version}</h1>
 <h2>Run Name: $runName</h2>
 
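Note that the e-mail templates in this patch (the HTML one above and the plain-text one that follows) are rendered as Groovy templates, so placeholders such as `${version}` and `$runName` are substituted from a binding map at send time. A minimal sketch of that rendering step, assuming Groovy's standard `SimpleTemplateEngine` and an illustrative, incomplete binding (the real values are assembled by the pipeline in `main.nf`):

```groovy
import groovy.text.SimpleTemplateEngine

// Illustrative binding only -- the pipeline supplies these (and more,
// e.g. success, duration, errorReport) from the workflow metadata.
def binding = [version: '1.0dev', runName: 'example_run']

def engine = new SimpleTemplateEngine()
def template = engine.createTemplate(new File('assets/email_template.html').text)

// make() substitutes ${version} / $runName; toString() yields the final body
println template.make(binding).toString()
```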
diff --git a/assets/email_template.txt b/assets/email_template.txt
index e40f0a2..95765b1 100644
--- a/assets/email_template.txt
+++ b/assets/email_template.txt
@@ -1,6 +1,12 @@
-========================================
- nf-core/proteomicslfq v${version}
-========================================
+----------------------------------------------------
+                                        ,--./,-.
+        ___     __   __   __   ___     /,-._.--~\\
+  |\\ | |__  __ /  ` /  \\ |__) |__         }  {
+  | \\| |       \\__, \\__/ |  \\ |___     \\`-._,-`-,
+                                        `._,._,'
+  nf-core/proteomicslfq v${version}
+----------------------------------------------------
+
 Run Name: $runName
 
 <% if (success){
@@ -17,23 +23,6 @@ ${errorReport}
 }
 %>
-<% if (!success){
-    out << """####################################################
-## nf-core/proteomicslfq execution completed unsuccessfully! ##
-####################################################
-The exit status of the task that caused the workflow execution to fail was: $exitStatus.
-The full error message was:
-
-${errorReport}
-"""
-} else {
-    out << "## nf-core/proteomicslfq execution completed successfully! ##"
-}
-%>
-
-
-
-
 The workflow was completed at $dateComplete (duration: $duration)
 
 The command used to launch the workflow was as follows:
diff --git a/assets/multiqc_config.yaml b/assets/multiqc_config.yaml
index da4b15b..f47b0ad 100644
--- a/assets/multiqc_config.yaml
+++ b/assets/multiqc_config.yaml
@@ -3,5 +3,9 @@ report_comment: >
     analysis pipeline. For information about how to interpret these results, please see the
     documentation.
 report_section_order:
-    nf-core/proteomicslfq-software-versions:
+    software_versions:
         order: -1000
+    nf-core-proteomicslfq-summary:
+        order: -1001
+
+export_plots: true
diff --git a/assets/nf-core-proteomicslfq_logo.png b/assets/nf-core-proteomicslfq_logo.png
new file mode 100644
index 0000000000000000000000000000000000000000..b7651c75ac5dc3298f6c264a3d42e2ce252f6499
GIT binary patch
literal 11668
[11668 bytes of base85-encoded binary PNG data omitted]
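The `sendmail_template.txt` hunk below inlines the logo as a base64 MIME part; the Groovy chain `encodeBase64().toString().tokenize('\n')*.toList()*.collate(76)...` exists to fold the encoded string into the 76-character lines that RFC 2045 permits for base64 bodies. A standalone sketch of the same folding, assuming the logo path used by the template:

```groovy
// Base64-encode the logo and fold the result into lines of at most 76
// characters, as required for base64 content in MIME messages (RFC 2045).
def b64 = new File('assets/nf-core-proteomicslfq_logo.png')
        .bytes
        .encodeBase64()
        .toString()

def folded = b64.toList()   // split into single-character strings
        .collate(76)        // group into runs of at most 76 chars
        *.join()            // re-join each run into one line
        .join('\n')         // one folded line per run

assert folded.readLines().every { it.size() <= 76 }
```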
diff --git a/assets/sendmail_template.txt b/assets/sendmail_template.txt
--- a/assets/sendmail_template.txt
+++ b/assets/sendmail_template.txt
+Content-Disposition: inline; filename="nf-core-proteomicslfq_logo.png"
+
+<% out << new File("$baseDir/assets/nf-core-proteomicslfq_logo.png").
+    bytes.
+    encodeBase64().
+    toString().
+    tokenize( '\n' )*.
+    toList()*.
+    collate( 76 )*.
+    collect { it.join() }.
+    flatten().
+    join( '\n' ) %>
+
 <% if (mqcFile){
 def mqcFileObj = new File("$mqcFile")
diff --git a/bin/__pycache__/scrape_software_versions.cpython-37.pyc b/bin/__pycache__/scrape_software_versions.cpython-37.pyc
deleted file mode 100644
index 8e730b9709745db1041eee1a29bd1223ac31485d..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001

literal 1283
[1283 bytes of base85-encoded compiled Python bytecode omitted]

diff --git a/bin/markdown_to_html.py b/bin/markdown_to_html.py
new file mode 100755
index 0000000..a26d1ff
--- /dev/null
+++ b/bin/markdown_to_html.py
@@ -0,0 +1,91 @@
+#!/usr/bin/env python
+from __future__ import print_function
+import argparse
+import markdown
+import os
+import sys
+import io
+
+
+def convert_markdown(in_fn):
+    input_md = io.open(in_fn, mode="r", encoding="utf-8").read()
+    html = markdown.markdown(
+        "[TOC]\n" + input_md,
+        extensions=["pymdownx.extra", "pymdownx.b64", "pymdownx.highlight", "pymdownx.emoji", "pymdownx.tilde", "toc"],
+        extension_configs={
+            "pymdownx.b64": {"base_path": os.path.dirname(in_fn)},
+            "pymdownx.highlight": {"noclasses": True},
+            "toc": {"title": "Table of Contents"},
+        },
+    )
+    return html
+
+
+def wrap_html(contents):
+    header = """
+
+
+
+
+
+ """ + footer = """ +
+ + + """ + return header + contents + footer + + +def parse_args(args=None): + parser = argparse.ArgumentParser() + parser.add_argument("mdfile", type=argparse.FileType("r"), nargs="?", help="File to convert. Defaults to stdin.") + parser.add_argument( + "-o", "--out", type=argparse.FileType("w"), default=sys.stdout, help="Output file name. Defaults to stdout." + ) + return parser.parse_args(args) + + +def main(args=None): + args = parse_args(args) + converted_md = convert_markdown(args.mdfile.name) + html = wrap_html(converted_md) + args.out.write(html) + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/bin/markdown_to_html.r b/bin/markdown_to_html.r deleted file mode 100755 index abe1335..0000000 --- a/bin/markdown_to_html.r +++ /dev/null @@ -1,51 +0,0 @@ -#!/usr/bin/env Rscript - -# Command line argument processing -args = commandArgs(trailingOnly=TRUE) -if (length(args) < 2) { - stop("Usage: markdown_to_html.r ", call.=FALSE) -} -markdown_fn <- args[1] -output_fn <- args[2] - -# Load / install packages -if (!require("markdown")) { - install.packages("markdown", dependencies=TRUE, repos='http://cloud.r-project.org/') - library("markdown") -} - -base_css_fn <- getOption("markdown.HTML.stylesheet") -base_css <- readChar(base_css_fn, file.info(base_css_fn)$size) -custom_css <- paste(base_css, " -body { - padding: 3em; - margin-right: 350px; - max-width: 100%; -} -#toc { - position: fixed; - right: 20px; - width: 300px; - padding-top: 20px; - overflow: scroll; - height: calc(100% - 3em - 20px); -} -#toc_header { - font-size: 1.8em; - font-weight: bold; -} -#toc > ul { - padding-left: 0; - list-style-type: none; -} -#toc > ul ul { padding-left: 20px; } -#toc > ul > li > a { display: none; } -img { max-width: 800px; } -") - -markdownToHTML( - file = markdown_fn, - output = output_fn, - stylesheet = custom_css, - options = c('toc', 'base64_images', 'highlight_code') -) diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py index 02de9a5..f2c7f23 100755 --- a/bin/scrape_software_versions.py +++ b/bin/scrape_software_versions.py @@ -5,35 +5,50 @@ # TODO nf-core: Add additional regexes for new tools in process get_software_versions regexes = { - 'nf-core/proteomicslfq': ['v_pipeline.txt', r"(\S+)"], - 'Nextflow': ['v_nextflow.txt', r"(\S+)"], - 'FastQC': ['v_fastqc.txt', r"FastQC v(\S+)"], - 'MultiQC': ['v_multiqc.txt', r"multiqc, version (\S+)"], + "nf-core/proteomicslfq": ["v_pipeline.txt", r"(\S+)"], + "Nextflow": ["v_nextflow.txt", r"(\S+)"], + "FastQC": ["v_fastqc.txt", r"FastQC v(\S+)"], + "MultiQC": ["v_multiqc.txt", r"multiqc, version (\S+)"], } results = OrderedDict() -results['nf-core/proteomicslfq'] = 'N/A' -results['Nextflow'] = 'N/A' -results['FastQC'] = 'N/A' -results['MultiQC'] = 'N/A' +results["nf-core/proteomicslfq"] = 'N/A' +results["Nextflow"] = 'N/A' +results["FastQC"] = 'N/A' +results["MultiQC"] = 'N/A' # Search each file using its regex for k, v in regexes.items(): - with open(v[0]) as x: - versions = x.read() - match = re.search(v[1], versions) - if match: - results[k] = "v{}".format(match.group(1)) + try: + with open(v[0]) as x: + versions = x.read() + match = re.search(v[1], versions) + if match: + results[k] = "v{}".format(match.group(1)) + except IOError: + results[k] = False + +# Remove software set to false in results +for k in list(results): + if not results[k]: + del results[k] # Dump to YAML -print (''' -id: 'nf-core/proteomicslfq-software-versions' +print( + """ +id: 'software_versions' section_name: 'nf-core/proteomicslfq 
Software Versions'
 section_href: 'https://github.com/nf-core/proteomicslfq'
 plot_type: 'html'
 description: 'are collected at run time from the software output.'
 data: |
     <dl class="dl-horizontal">
-''')
-for k,v in results.items():
-    print("    <dt>{}</dt><dd><samp>{}</samp></dd>".format(k,v))
-print ("    </dl>")
+"""
+)
+for k, v in results.items():
+    print("    <dt>{}</dt><dd><samp>{}</samp></dd>".format(k, v))
+print("    </dl>")
+
+# Write out regexes as csv file:
+with open("software_versions.csv", "w") as f:
+    for k, v in results.items():
+        f.write("{}\t{}\n".format(k, v))
diff --git a/conf/awsbatch.config b/conf/awsbatch.config
deleted file mode 100644
index 14af586..0000000
--- a/conf/awsbatch.config
+++ /dev/null
@@ -1,18 +0,0 @@
-/*
- * -------------------------------------------------
- *  Nextflow config file for running on AWS batch
- * -------------------------------------------------
- * Base config needed for running with -profile awsbatch
- */
-params {
-  config_profile_name = 'AWSBATCH'
-  config_profile_description = 'AWSBATCH Cloud Profile'
-  config_profile_contact = 'Alexander Peltzer (@apeltzer)'
-  config_profile_url = 'https://aws.amazon.com/de/batch/'
-}
-
-aws.region = params.awsregion
-process.executor = 'awsbatch'
-process.queue = params.awsqueue
-executor.awscli = '/home/ec2-user/miniconda/bin/aws'
-params.tracedir = './'
diff --git a/conf/base.config b/conf/base.config
index b0c95d5..11e9d90 100644
--- a/conf/base.config
+++ b/conf/base.config
@@ -13,22 +13,39 @@ process {
 
   // TODO nf-core: Check the defaults for all processes
   cpus = { check_max( 1 * task.attempt, 'cpus' ) }
-  memory = { check_max( 8.GB * task.attempt, 'memory' ) }
-  time = { check_max( 2.h * task.attempt, 'time' ) }
+  memory = { check_max( 7.GB * task.attempt, 'memory' ) }
+  time = { check_max( 4.h * task.attempt, 'time' ) }
 
   errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' }
  maxRetries = 1
  maxErrors = '-1'
 
  // Process-specific resource requirements
+  // NOTE - Only one of the labels below is used in the fastqc process in the main script.
+  // If possible, it would be nice to keep the same label naming convention when
+  // adding in your processes.
  // TODO nf-core: Customise requirements for specific processes.
// See https://www.nextflow.io/docs/latest/config.html#config-process-selectors -} - -params { - // Defaults only, expecting to be overwritten - max_memory = 128.GB - max_cpus = 16 - max_time = 240.h - igenomes_base = 's3://ngi-igenomes/igenomes/' + withLabel:process_low { + cpus = { check_max( 2 * task.attempt, 'cpus' ) } + memory = { check_max( 14.GB * task.attempt, 'memory' ) } + time = { check_max( 6.h * task.attempt, 'time' ) } + } + withLabel:process_medium { + cpus = { check_max( 6 * task.attempt, 'cpus' ) } + memory = { check_max( 42.GB * task.attempt, 'memory' ) } + time = { check_max( 8.h * task.attempt, 'time' ) } + } + withLabel:process_high { + cpus = { check_max( 12 * task.attempt, 'cpus' ) } + memory = { check_max( 84.GB * task.attempt, 'memory' ) } + time = { check_max( 10.h * task.attempt, 'time' ) } + } + withLabel:process_long { + time = { check_max( 20.h * task.attempt, 'time' ) } + } + withName:get_software_versions { + cache = false + } + } diff --git a/conf/igenomes.config b/conf/igenomes.config index d19e61f..caeafce 100644 --- a/conf/igenomes.config +++ b/conf/igenomes.config @@ -9,139 +9,413 @@ params { // illumina iGenomes reference file paths - // TODO nf-core: Add new reference types and strip out those that are not needed genomes { 'GRCh37' { - bed12 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt" + mito_name = "MT" + macs_gsize = "2.7e9" + blacklist = "${baseDir}/assets/blacklists/GRCh37-blacklist.bed" + } + 'GRCh38' { + fasta = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "2.7e9" + blacklist = "${baseDir}/assets/blacklists/hg38-blacklist.bed" } 'GRCm38' { - bed12 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf" - star = 
"${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/README.txt" + mito_name = "MT" + macs_gsize = "1.87e9" + blacklist = "${baseDir}/assets/blacklists/GRCm38-blacklist.bed" } 'TAIR10' { - bed12 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/README.txt" + mito_name = "Mt" } 'EB2' { - bed12 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/README.txt" } 'UMD3.1' { - bed12 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa" - gtf = 
"${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/README.txt" + mito_name = "MT" } 'WBcel235' { - bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.bed" + mito_name = "MtDNA" + macs_gsize = "9e7" } 'CanFam3.1' { - bed12 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/README.txt" + mito_name = "MT" } 'GRCz10' { - bed12 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.gtf" 
- star = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.bed" + mito_name = "MT" } 'BDGP6' { - bed12 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.bed" + mito_name = "M" + macs_gsize = "1.2e8" } 'EquCab2' { - bed12 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/README.txt" + mito_name = "MT" } 'EB1' { - bed12 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/STARIndex/" + fasta = 
"${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/README.txt" } 'Galgal4' { - bed12 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.bed" + mito_name = "MT" } 'Gm01' { - bed12 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/README.txt" } 'Mmul_1' { - bed12 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BWAIndex/genome.fa" + bowtie2 = 
"${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/README.txt" + mito_name = "MT" } 'IRGSP-1.0' { - bed12 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.bed" + mito_name = "Mt" } 'CHIMP2.1.4' { - bed12 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/README.txt" + mito_name = "MT" } 'Rnor_6.0' { - bed12 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BWAIndex/genome.fa" + bowtie2 = 
"${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.bed" + mito_name = "MT" } 'R64-1-1' { - bed12 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.bed" + mito_name = "MT" + macs_gsize = "1.2e7" } 'EF2' { - bed12 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/README.txt" + mito_name = "MT" + macs_gsize = "1.21e7" } 'Sbi1' { - bed12 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BWAIndex/genome.fa" + bowtie2 = 
"${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/README.txt" } 'Sscrofa10.2' { - bed12 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/README.txt" + mito_name = "MT" } 'AGPv3' { - bed12 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.bed" - fasta = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta/genome.fa" - gtf = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.gtf" - star = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/STARIndex/" + fasta = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.bed" + mito_name = "Mt" + } + 'hg38' { + fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "2.7e9" + blacklist = "${baseDir}/assets/blacklists/hg38-blacklist.bed" + } + 'hg19' { + fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa" + bwa = 
"${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "2.7e9" + blacklist = "${baseDir}/assets/blacklists/hg19-blacklist.bed" + } + 'mm10' { + fasta = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "1.87e9" + blacklist = "${baseDir}/assets/blacklists/mm10-blacklist.bed" + } + 'bosTau8' { + fasta = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.bed" + mito_name = "chrM" + } + 'ce10' { + fasta = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "9e7" + } + 'canFam3' { + fasta = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.gtf" + bed12 = 
"${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/README.txt" + mito_name = "chrM" + } + 'danRer10' { + fasta = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "1.37e9" + } + 'dm6' { + fasta = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "1.2e8" + } + 'equCab2' { + fasta = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/README.txt" + mito_name = "chrM" + } + 'galGal4' { + fasta = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/README.txt" + mito_name = "chrM" + } + 'panTro4' { + fasta = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/STARIndex/" + bismark = 
"${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/README.txt" + mito_name = "chrM" + } + 'rn6' { + fasta = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.bed" + mito_name = "chrM" + } + 'sacCer3' { + fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BismarkIndex/" + readme = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "1.2e7" + } + 'susScr3' { + fasta = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/README.txt" + mito_name = "chrM" } } } diff --git a/conf/test.config b/conf/test.config index 6d1c793..4e44772 100644 --- a/conf/test.config +++ b/conf/test.config @@ -4,19 +4,22 @@ * ------------------------------------------------- * Defines bundled input files and everything required * to run a fast and simple test. 
 * Use as follows:
- *   nextflow run nf-core/proteomicslfq -profile test
+ *   nextflow run nf-core/proteomicslfq -profile test,<docker/singularity/conda/institute>
 */

params {
-  // Limit resources so that this can run on Travis
+  config_profile_name = 'Test profile'
+  config_profile_description = 'Minimal test dataset to check pipeline function'
+
+  // Limit resources so that this can run on GitHub Actions
  max_cpus = 2
  max_memory = 6.GB
  max_time = 48.h
+
  // Input data
  // TODO nf-core: Specify the paths to your test data on nf-core/test-datasets
  // TODO nf-core: Give any required params for the test so that command line flags are not needed
-  singleEnd = false
-  readPaths = [
+  single_end = false
+  input_paths = [
    ['Testdata', ['https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R1.tiny.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R2.tiny.fastq.gz']],
    ['SRR389222', ['https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub2.fastq.gz']]
  ]
diff --git a/conf/test_full.config b/conf/test_full.config
new file mode 100644
index 0000000..bef80aa
--- /dev/null
+++ b/conf/test_full.config
@@ -0,0 +1,22 @@
+/*
+ * -------------------------------------------------
+ * Nextflow config file for running full-size tests
+ * -------------------------------------------------
+ * Defines bundled input files and everything required
+ * to run a full size pipeline test. Use as follows:
+ *   nextflow run nf-core/proteomicslfq -profile test_full,<docker/singularity/conda/institute>
+ */
+
+params {
+  config_profile_name = 'Full test profile'
+  config_profile_description = 'Full test dataset to check pipeline function'
+
+  // Input data for full size test
+  // TODO nf-core: Specify the paths to your full test data (on nf-core/test-datasets or directly in repositories, e.g. SRA)
+  // TODO nf-core: Give any required params for the test so that command line flags are not needed
+  single_end = false
+  input_paths = [
+    ['Testdata', ['https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R1.tiny.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R2.tiny.fastq.gz']],
+    ['SRR389222', ['https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub2.fastq.gz']]
+  ]
+}
diff --git a/docs/README.md b/docs/README.md
index adc99bd..bba9686 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,12 +1,12 @@
 # nf-core/proteomicslfq: Documentation
 
-The nf-core/proteomicslfq documentation is split into the following files:
-
-1. [Installation](installation.md)
-2. Pipeline configuration
-  * [Local installation](configuration/local.md)
-  * [Adding your own system](configuration/adding_your_own.md)
-  * [Reference genomes](configuration/reference_genomes.md)
-3. [Running the pipeline](usage.md)
-4. [Output and how to interpret the results](output.md)
-5. [Troubleshooting](troubleshooting.md)
+The nf-core/proteomicslfq documentation is split into the following pages:
+
+
+
+* [Usage](usage.md)
+  * An overview of how the pipeline works, how to run it and a description of all of the different command-line flags.
+* [Output](output.md)
+  * An overview of the different results produced by the pipeline and how to interpret them.
+
+You can find a lot more documentation about installing, configuring and running nf-core pipelines on the website: [https://nf-co.re](https://nf-co.re)
diff --git a/docs/configuration/adding_your_own.md b/docs/configuration/adding_your_own.md
deleted file mode 100644
index e7f0f92..0000000
--- a/docs/configuration/adding_your_own.md
+++ /dev/null
@@ -1,86 +0,0 @@
-# nf-core/proteomicslfq: Configuration for other clusters
-
-It is entirely possible to run this pipeline on other clusters, though you will need to set up your own config file so that the pipeline knows how to work with your cluster.
-
-> If you think that there are other people using the pipeline who would benefit from your configuration (eg. other common cluster setups), please let us know. We can add a new configuration and profile which can be used by specifying `-profile <name>` when running the pipeline. The config file will then be hosted at `nf-core/configs` and will be pulled automatically before the pipeline is executed.
-
-If you are the only person to be running this pipeline, you can create your config file as `~/.nextflow/config` and it will be applied every time you run Nextflow. Alternatively, save the file anywhere and reference it when running the pipeline with `-c path/to/config` (see the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more).
-
-A basic configuration comes with the pipeline, which loads the [`conf/base.config`](../../conf/base.config) by default. This means that you only need to configure the specifics for your system and overwrite any defaults that you want to change.
-
-## Cluster Environment
-By default, the pipeline uses the `local` Nextflow executor - in other words, all jobs are run in the login session. If you're using a simple server, this may be fine. If you're using a compute cluster, this is bad as all jobs will run on the head node.
-
-To specify your cluster environment, add the following line to your config file:
-
-```nextflow
-process.executor = 'YOUR_SYSTEM_TYPE'
-```
-
-Many different cluster types are supported by Nextflow. For more information, please see the [Nextflow documentation](https://www.nextflow.io/docs/latest/executor.html).
-
-Note that you may need to specify cluster options, such as a project or queue. To do so, use the `clusterOptions` config option:
-
-```nextflow
-process {
-  executor = 'SLURM'
-  clusterOptions = '-A myproject'
-}
-```
-
-
-## Software Requirements
-To run the pipeline, several software packages are required. How you satisfy these requirements is essentially up to you and depends on your system. If possible, we _highly_ recommend using either Docker or Singularity.
-
-Please see the [`installation documentation`](../installation.md) for how to run using the below as a one-off. These instructions are about configuring a config file for repeated use.
-
-### Docker
-Docker is a great way to run nf-core/proteomicslfq, as it manages all software installations and allows the pipeline to be run in an identical software environment across a range of systems.
-
-Nextflow has [excellent integration](https://www.nextflow.io/docs/latest/docker.html) with Docker, and beyond installing the two tools, not much else is required - nextflow will automatically fetch the [nfcore/proteomicslfq](https://hub.docker.com/r/nfcore/proteomicslfq/) image that we have created and is hosted at dockerhub at run time.
-
-To add docker support to your own config file, add the following:
-
-```nextflow
-docker.enabled = true
-process.container = "nfcore/proteomicslfq"
-```
-
-Note that the dockerhub organisation name annoyingly can't have a hyphen, so is `nfcore` and not `nf-core`.
-
-
-### Singularity image
-Many HPC environments are not able to run Docker due to security issues.
-[Singularity](http://singularity.lbl.gov/) is a tool designed to run on such HPC systems which is very similar to Docker.
-
-To specify singularity usage in your pipeline config file, add the following:
-
-```nextflow
-singularity.enabled = true
-process.container = "nf-core/proteomicslfq"
-```
-
-If you intend to run the pipeline offline, nextflow will not be able to automatically download the singularity image for you.
-Instead, you'll have to do this yourself manually first, transfer the image file and then point to that.
-
-First, pull the image file where you have an internet connection:
-
-```bash
-singularity pull --name nf-core-proteomicslfq.simg nf-core/proteomicslfq
-```
-
-Then transfer this file and point the config file to the image:
-
-```nextflow
-singularity.enabled = true
-process.container = "/path/to/nf-core-proteomicslfq.simg"
-```
-
-
-### Conda
-If you're not able to use Docker or Singularity, you can instead use conda to manage the software requirements.
-To use conda in your own config file, add the following:
-
-```nextflow
-process.conda = "$baseDir/environment.yml"
-```
diff --git a/docs/configuration/local.md b/docs/configuration/local.md
deleted file mode 100644
index 350d3bb..0000000
--- a/docs/configuration/local.md
+++ /dev/null
@@ -1,47 +0,0 @@
-# nf-core/proteomicslfq: Local Configuration
-
-If running the pipeline in a local environment, we highly recommend using either Docker or Singularity.
-
-## Docker
-Docker is a great way to run `nf-core/proteomicslfq`, as it manages all software installations and allows the pipeline to be run in an identical software environment across a range of systems.
-
-Nextflow has [excellent integration](https://www.nextflow.io/docs/latest/docker.html) with Docker, and beyond installing the two tools, not much else is required. The `nf-core/proteomicslfq` pipeline comes with a configuration profile for docker, making it very easy to use. This also comes with the required presets to use the AWS iGenomes resource, meaning that if using common reference genomes you just specify the reference ID and it will be automatically downloaded from AWS S3.
-
-First, install docker on your system: [Docker Installation Instructions](https://docs.docker.com/engine/installation/)
-
-Then, simply run the analysis pipeline:
-
-```bash
-nextflow run nf-core/proteomicslfq -profile docker --genome '' --design ''
-```
-
-Nextflow will recognise `nf-core/proteomicslfq` and download the pipeline from GitHub. The `-profile docker` configuration lists the [nf-core/proteomicslfq](https://hub.docker.com/r/nfcore/proteomicslfq/) image that we have created and is hosted at dockerhub, and this is downloaded.
-
-For more information about how to work with reference genomes, see [`docs/configuration/reference_genomes.md`](reference_genomes.md).
-
-### Pipeline versions
-The public docker images are tagged with the same version numbers as the code, which you can use to ensure reproducibility. When running the pipeline, specify the pipeline version with `-r`, for example `-r 1.0`. This uses pipeline code and docker image from this tagged version.
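As an editorial aside, a fully pinned invocation of the kind described above might look like the following sketch (the release tag, genome ID and design path are placeholders for illustration, not values from this patch):

```bash
# pin the pipeline code and docker image to one tagged release for reproducibility
nextflow run nf-core/proteomicslfq -r 1.0 -profile docker \
    --genome 'GRCh37' --design '/path/to/design.tsv'
```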
-
-
-## Singularity image
-Many HPC environments are not able to run Docker due to security issues. [Singularity](http://singularity.lbl.gov/) is a tool designed to run on such HPC systems which is very similar to Docker. Even better, it can create images directly from dockerhub.
-
-To use the singularity image for a single run, use `-with-singularity`. This will download the docker container from dockerhub and create a singularity image for you dynamically.
-
-If you intend to run the pipeline offline, nextflow will not be able to automatically download the singularity image for you. Instead, you'll have to do this yourself manually first, transfer the image file and then point to that.
-
-First, pull the image file where you have an internet connection:
-
-> NB: The "tag" at the end of this command corresponds to the pipeline version.
-> Here, we're pulling the docker image for version 1.0 of the nf-core/proteomicslfq pipeline
-> Make sure that this tag corresponds to the version of the pipeline that you're using
-
-```bash
-singularity pull --name nf-core-proteomicslfq-1.0.img docker://nf-core/proteomicslfq:1.0
-```
-
-Then transfer this file and run the pipeline with this path:
-
-```bash
-nextflow run /path/to/nf-core-proteomicslfq -with-singularity /path/to/nf-core-proteomicslfq-1.0.img
-```
diff --git a/docs/configuration/reference_genomes.md b/docs/configuration/reference_genomes.md
deleted file mode 100644
index 3a2c9df..0000000
--- a/docs/configuration/reference_genomes.md
+++ /dev/null
@@ -1,50 +0,0 @@
-# nf-core/proteomicslfq: Reference Genomes Configuration
-
-The nf-core/proteomicslfq pipeline needs a reference genome for alignment and annotation.
-
-These paths can be supplied on the command line at run time (see the [usage docs](../usage.md)),
-but for convenience it's often better to save these paths in a nextflow config file.
-See below for instructions on how to do this.
-Read [Adding your own system](adding_your_own.md) to find out how to set up custom config files.
-
-## Adding paths to a config file
-Specifying long paths every time you run the pipeline is a pain.
-To make this easier, the pipeline comes configured to understand reference genome keywords which correspond to preconfigured paths, meaning that you can just specify `--genome ID` when running the pipeline.
-
-Note that this genome key can also be specified in a config file if you always use the same genome.
-
-To use this system, add paths to your config file using the following template:
-
-```nextflow
-params {
-  genomes {
-    'YOUR-ID' {
-      fasta = '<path to fasta>/genome.fa'
-    }
-    'OTHER-GENOME' {
-      // [..]
-    }
-  }
-  // Optional - default genome. Ignored if --genome 'OTHER-GENOME' specified on command line
-  genome = 'YOUR-ID'
-}
-```
-
-You can add as many genomes as you like as long as they have unique IDs.
-
-## illumina iGenomes
-To make the use of reference genomes easier, illumina has developed a centralised resource called [iGenomes](https://support.illumina.com/sequencing/sequencing_software/igenome.html).
-Multiple reference index types are held together with consistent structure for multiple genomes.
-
-We have put a copy of iGenomes up onto AWS S3 hosting and this pipeline is configured to use this by default.
-The hosting fees for AWS iGenomes are currently kindly funded by a grant from Amazon.
-The pipeline will automatically download the required reference files when you run the pipeline.
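To sketch how the two pieces above combine in practice (the genome ID and config path are illustrative only):

```bash
# use a custom config that defines 'YOUR-ID' and, optionally, a local igenomes_base
nextflow run nf-core/proteomicslfq -profile docker --genome 'YOUR-ID' -c /path/to/genomes.config
```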
-For more information about the AWS iGenomes, see https://ewels.github.io/AWS-iGenomes/
-
-Downloading the files takes time and bandwidth, so we recommend making a local copy of the iGenomes resource.
-Once downloaded, you can customise the variable `params.igenomes_base` in your custom configuration file to point to the reference location.
-For example:
-
-```nextflow
-params.igenomes_base = '/path/to/data/igenomes/'
-```
diff --git a/docs/images/nf-core-proteomicslfq_logo.png b/docs/images/nf-core-proteomicslfq_logo.png
new file mode 100644
index 0000000000000000000000000000000000000000..465ca25d947e6dce2b026d8c77f805aa39aceba5
GIT binary patch
literal 20821
[binary image data for docs/images/nf-core-proteomicslfq_logo.png omitted]

literal 0
HcmV?d00001

diff --git a/docs/installation.md b/docs/installation.md
deleted file mode 100644
index 8c2e829..0000000
--- a/docs/installation.md
+++ /dev/null
@@ -1,113 +0,0 @@
-# nf-core/proteomicslfq: Installation
-
-To start using the nf-core/proteomicslfq pipeline, follow the steps below:
-
-
-
-* [Install NextFlow](#install-nextflow)
-* [Install the pipeline](#install-the-pipeline)
-  * [Automatic](#automatic)
-  * [Offline](#offline)
-  * [Development](#development)
-* [Pipeline configuration](#pipeline-configuration)
-  * [Docker](#docker)
-  * [Singularity](#singularity)
-  * [Conda](#conda)
-  * [Configuration profiles](#configuration-profiles)
-* [Reference genomes](#reference-genomes)
-
-
-## Install NextFlow
-Nextflow runs on most POSIX systems (Linux, Mac OSX etc). It can be installed by running the following commands:
-
-```bash
-# Make sure that Java v8+ is installed:
-java -version
-
-# Install Nextflow
-curl -fsSL get.nextflow.io | bash
-
-# Add Nextflow binary to your PATH:
-mv nextflow ~/bin/
-# OR system-wide installation:
-# sudo mv nextflow /usr/local/bin
-```
-
-See [nextflow.io](https://www.nextflow.io/) for further instructions on how to install and configure Nextflow.
-
-## Install the pipeline
-
-### Automatic
-This pipeline itself needs no installation - NextFlow will automatically fetch it from GitHub if `nf-core/proteomicslfq` is specified as the pipeline name.
-
-### Offline
-The above method requires an internet connection so that Nextflow can download the pipeline files.
-If you're running on a system that has no internet connection, you'll need to download and transfer the pipeline files manually:
-
-```bash
-wget https://github.com/nf-core/proteomicslfq/archive/master.zip
-mkdir -p ~/my-pipelines/nf-core/
-unzip master.zip -d ~/my-pipelines/nf-core/
-cd ~/my_data/
-nextflow run ~/my-pipelines/nf-core/proteomicslfq-master
-```
-
-To stop nextflow from looking for updates online, you can tell it to run in offline mode by specifying the following environment variable in your ~/.bashrc file:
-
-```bash
-export NXF_OFFLINE='TRUE'
-```
-
-### Development
-
-If you would like to make changes to the pipeline, it's best to make a fork on GitHub and then clone the files. Once cloned you can run the pipeline directly as above.
-
-
-## Pipeline configuration
-By default, the pipeline loads a basic server configuration [`conf/base.config`](../conf/base.config)
-This uses a number of sensible defaults for process requirements and is suitable for running
-on a simple (if powerful!) local server.
-
-Be warned of two important points about this default configuration:
-
-1. The default profile uses the `local` executor
-  * All jobs are run in the login session. If you're using a simple server, this may be fine. If you're using a compute cluster, this is bad as all jobs will run on the head node.
-  * See the [nextflow docs](https://www.nextflow.io/docs/latest/executor.html) for information about running with other hardware backends. Most job scheduler systems are natively supported.
-2. Nextflow will expect all software to be installed and available on the `PATH`
-  * It's expected to use an additional config profile for docker, singularity or conda support. See below.
-
-### Docker
-First, install docker on your system: [Docker Installation Instructions](https://docs.docker.com/engine/installation/)
-
-Then, running the pipeline with the option `-profile docker` tells Nextflow to enable Docker for this run. An image containing all of the software requirements will be automatically fetched and used from dockerhub (https://hub.docker.com/r/nfcore/proteomicslfq).
-
-### Singularity
-If you're not able to use Docker then [Singularity](http://singularity.lbl.gov/) is a great alternative.
-The process is very similar: running the pipeline with the option `-profile singularity` tells Nextflow to enable singularity for this run. An image containing all of the software requirements will be automatically fetched and used from singularity hub.
-
-If running offline with Singularity, you'll need to download and transfer the Singularity image first:
-
-```bash
-singularity pull --name nf-core-proteomicslfq.simg nf-core/proteomicslfq
-```
-
-Once transferred, use `-with-singularity` and specify the path to the image file:
-
-```bash
-nextflow run /path/to/nf-core-proteomicslfq -with-singularity nf-core-proteomicslfq.simg
-```
-
-Remember to pull updated versions of the singularity image if you update the pipeline.
-
-### Conda
-If you're not able to use Docker _or_ Singularity, you can instead use conda to manage the software requirements.
-This is slower and less reproducible than the above, but is still better than having to install all requirements yourself!
-The pipeline ships with a conda environment file and nextflow has built-in support for this.
-To use it first ensure that you have conda installed (we recommend [miniconda](https://conda.io/miniconda.html)), then follow the same pattern as above and use the flag `-profile conda`
-
-### Configuration profiles
-
-See [`docs/configuration/adding_your_own.md`](configuration/adding_your_own.md)
-
-## Reference genomes
-
-See [`docs/configuration/reference_genomes.md`](configuration/reference_genomes.md)
diff --git a/docs/output.md b/docs/output.md
index be99ac1..cd5f280 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -2,40 +2,56 @@
 This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.
 
+The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
+
 ## Pipeline overview
+
 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
 
-* [FastQC](#fastqc) - read quality control
-* [MultiQC](#multiqc) - aggregate report, describing results of the whole pipeline
+* [FastQC](#fastqc) - Read quality control
+* [MultiQC](#multiqc) - Aggregate report describing results from the whole pipeline
+* [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
 
 ## FastQC
 
-[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your reads. It provides information about the quality score distribution across your reads, the per base sequence content (%T/A/G/C). You get information about adapter contamination and other overrepresented sequences.
-For further reading and documentation see the [FastQC help](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).
+[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences.
 
-> **NB:** The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality. To see how your reads look after trimming, look at the FastQC reports in the `trim_galore` directory.
+For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).
 
-**Output directory: `results/fastqc`**
+**Output files:**
 
-* `sample_fastqc.html`
-  * FastQC report, containing quality metrics for your untrimmed raw fastq files
-* `zips/sample_fastqc.zip`
-  * zip file containing the FastQC report, tab-delimited data file and plot images
+* `fastqc/`
+  * `*_fastqc.html`: FastQC report containing quality metrics for your untrimmed raw fastq files.
+* `fastqc/zips/`
+  * `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images.
 
+> **NB:** The FastQC plots displayed in the MultiQC report show _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality.
 
 ## MultiQC
 
-[MultiQC](http://multiqc.info) is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.
-The pipeline has special steps which allow the software versions used to be reported in the MultiQC output for future traceability.
+[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarizing all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.
+
+The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability.
+
+For more information about how to use MultiQC reports, see [https://multiqc.info](https://multiqc.info).
+
+**Output files:**
+
+* `multiqc/`
+  * `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
+  * `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
+  * `multiqc_plots/`: directory containing static images from the report in various formats.
+
+## Pipeline information
 
-**Output directory: `results/multiqc`**
+[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.
 
-* `Project_multiqc_report.html`
-  * MultiQC report - a standalone HTML file that can be viewed in your web browser
-* `Project_multiqc_data/`
-  * Directory containing parsed statistics from the different tools used in the pipeline
+**Output files:**
 
-For more information about how to use MultiQC reports, see http://multiqc.info
+* `pipeline_info/`
+  * Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`.
+  * Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.csv`.
+  * Documentation for interpretation of results in HTML format: `results_description.html`.
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
deleted file mode 100644
index b534072..0000000
--- a/docs/troubleshooting.md
+++ /dev/null
@@ -1,30 +0,0 @@
-# nf-core/proteomicslfq: Troubleshooting
-
-
-
-## Input files not found
-
-If no files, only one input file, or only read one and not read two are picked up, then something is wrong with your input file declaration:
-
-1. The path must be enclosed in quotes (`'` or `"`)
-2. The path must have at least one `*` wildcard character. This applies even if you are only running one paired end sample.
-3. When using the pipeline with paired end data, the path must use `{1,2}` or `{R1,R2}` notation to specify read pairs.
-4. If you are running single-end data, make sure to specify `--singleEnd`
-
-If the pipeline can't find your files then you will get the following error:
-
-```bash
-ERROR ~ Cannot find any reads matching: *{1,2}.fastq.gz
-```
-
-Note that if your sample name is "messy" then you have to be very particular with your glob specification. A file name like `L1-1-D-2h_S1_L002_R1_001.fastq.gz` can be difficult enough for a human to read. Specifying `*{1,2}*.gz` won't give you what you want, whilst `*{R1,R2}*.gz` will.
-
-
-## Data organization
-The pipeline can't take a list of multiple input files - it takes a glob expression. If your input files are scattered in different paths then we recommend that you generate a directory with symlinked files, as in the sketch below.
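A minimal sketch of that symlink approach (sample names and paths are made up for illustration):

```bash
# collect FastQ files scattered across locations into one input directory
mkdir -p fastq_input
ln -s /data/run1/sampleA_R1.fastq.gz /data/run1/sampleA_R2.fastq.gz fastq_input/
ln -s /data/run2/sampleB_R1.fastq.gz /data/run2/sampleB_R2.fastq.gz fastq_input/
# then point the pipeline at the collected files
nextflow run nf-core/proteomicslfq --reads 'fastq_input/*_R{1,2}.fastq.gz' -profile docker
```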
-If running in paired end mode please make sure that your files are sensibly named so that they can be properly paired. See the previous point.
-
-## Extra resources and getting help
-If you still have an issue with running the pipeline then feel free to contact us.
-Have a look at the [pipeline website](https://github.com/nf-core/proteomicslfq) to find out how.
-
-If you have problems that are related to Nextflow and not our pipeline then check out the [Nextflow gitter channel](https://gitter.im/nextflow-io/nextflow) or the [google group](https://groups.google.com/forum/#!forum/nextflow).
diff --git a/docs/usage.md b/docs/usage.md
index 39ef37c..e0eea66 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -1,61 +1,15 @@
 # nf-core/proteomicslfq: Usage
 
-## Table of contents
-
-
-
-* [Table of contents](#table-of-contents)
-* [Introduction](#introduction)
-* [Running the pipeline](#running-the-pipeline)
-  * [Updating the pipeline](#updating-the-pipeline)
-  * [Reproducibility](#reproducibility)
-* [Main arguments](#main-arguments)
-  * [`-profile`](#-profile)
-  * [`--reads`](#--reads)
-  * [`--singleEnd`](#--singleend)
-* [Reference genomes](#reference-genomes)
-  * [`--genome` (using iGenomes)](#--genome-using-igenomes)
-  * [`--fasta`](#--fasta)
-  * [`--igenomesIgnore`](#--igenomesignore)
-* [Job resources](#job-resources)
-  * [Automatic resubmission](#automatic-resubmission)
-  * [Custom resource requests](#custom-resource-requests)
-* [AWS Batch specific parameters](#aws-batch-specific-parameters)
-  * [`--awsqueue`](#--awsqueue)
-  * [`--awsregion`](#--awsregion)
-* [Other command line parameters](#other-command-line-parameters)
-  * [`--outdir`](#--outdir)
-  * [`--email`](#--email)
-  * [`-name`](#-name)
-  * [`-resume`](#-resume)
-  * [`-c`](#-c)
-  * [`--custom_config_version`](#--custom_config_version)
-  * [`--custom_config_base`](#--custom_config_base)
-  * [`--max_memory`](#--max_memory)
-  * [`--max_time`](#--max_time)
-  * [`--max_cpus`](#--max_cpus)
-  * [`--plaintext_email`](#--plaintext_email)
-  * [`--monochrome_logs`](#--monochrome_logs)
-  * [`--multiqc_config`](#--multiqc_config)
-
-
 
 ## Introduction
 
-Nextflow handles job submissions on SLURM or other environments, and supervises running the jobs. Thus the Nextflow process must run until the pipeline is finished. We recommend that you run the process in the background through `screen` / `tmux` or a similar tool. Alternatively you can run nextflow within a cluster job submitted to your job scheduler.
-
-It is recommended to limit the Nextflow Java virtual machine's memory. We recommend adding the following line to your environment (typically in `~/.bashrc` or `~/.bash_profile`):
-
-```bash
-NXF_OPTS='-Xms1g -Xmx4g'
-```
-
+
 ## Running the pipeline
+
 The typical command for running the pipeline is as follows:
 
 ```bash
-nextflow run nf-core/proteomicslfq --reads '*_R{1,2}.fastq.gz' -profile docker
+nextflow run nf-core/proteomicslfq --input '*_R{1,2}.fastq.gz' -profile docker
 ```
 
 This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
 
@@ -70,6 +24,7 @@
 results         # Finished results (configurable, see below)
 ```
 
 ### Updating the pipeline
+
 When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since.
To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: ```bash @@ -77,206 +32,90 @@ nextflow pull nf-core/proteomicslfq ``` ### Reproducibility + It's a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. First, go to the [nf-core/proteomicslfq releases page](https://github.com/nf-core/proteomicslfq/releases) and find the latest version number - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`. This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. +## Core Nextflow arguments -## Main arguments +> **NB:** These options are part of Nextflow and use a _single_ hyphen (pipeline parameters use a double-hyphen). ### `-profile` -Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. Note that multiple profiles can be loaded, for example: `-profile docker` - the order of arguments is important! - -If `-profile` is not specified at all the pipeline will be run locally and expects all software to be installed and available on the `PATH`. - -* `awsbatch` - * A generic configuration profile to be used with AWS Batch. -* `conda` - * A generic configuration profile to be used with [conda](https://conda.io/docs/) - * Pulls most software from [Bioconda](https://bioconda.github.io/) -* `docker` - * A generic configuration profile to be used with [Docker](http://docker.com/) - * Pulls software from dockerhub: [`nfcore/proteomicslfq`](http://hub.docker.com/r/nfcore/proteomicslfq/) -* `singularity` - * A generic configuration profile to be used with [Singularity](http://singularity.lbl.gov/) - * Pulls software from DockerHub -* `test` - * A profile with a complete configuration for automated testing - * Includes links to test data so needs no other parameters - - - -### `--reads` -Use this to specify the location of your input FastQ files. For example: -```bash ---reads 'path/to/data/sample_*_{1,2}.fastq' -``` +Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. -Please note the following requirements: +Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Conda) - see below. -1. The path must be enclosed in quotes -2. The path must have at least one `*` wildcard character -3. When using the pipeline with paired end data, the path must use `{1,2}` notation to specify read pairs. +> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported. -If left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz` - -### `--singleEnd` -By default, the pipeline expects paired-end data. If you have single-end data, you need to specify `--singleEnd` on the command line when you launch the pipeline. A normal glob pattern, enclosed in quotation marks, can then be used for `--reads`. 
For example: - -```bash ---singleEnd --reads '*.fastq' -``` +The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). -It is not possible to run a mixture of single-end and paired-end files in one run. +Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important! +They are loaded in sequence, so later profiles can overwrite earlier profiles. +If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended. -## Reference genomes +* `docker` + * A generic configuration profile to be used with [Docker](https://docker.com/) + * Pulls software from Docker Hub: [`nfcore/proteomicslfq`](https://hub.docker.com/r/nfcore/proteomicslfq/) +* `singularity` + * A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/) + * Pulls software from Docker Hub: [`nfcore/proteomicslfq`](https://hub.docker.com/r/nfcore/proteomicslfq/) +* `conda` + * Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker or Singularity. + * A generic configuration profile to be used with [Conda](https://conda.io/docs/) + * Pulls most software from [Bioconda](https://bioconda.github.io/) +* `test` + * A profile with a complete configuration for automated testing + * Includes links to test data so needs no other parameters -The pipeline config files come bundled with paths to the illumina iGenomes reference index files. If running with docker or AWS, the configuration is set up to use the [AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/) resource. +### `-resume` -### `--genome` (using iGenomes) -There are 31 different species supported in the iGenomes references. To run the pipeline, you must specify which to use with the `--genome` flag. +Specify this when restarting a pipeline. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously. -You can find the keys to specify the genomes in the [iGenomes config file](../conf/igenomes.config). Common genomes that are supported are: +You can also supply a run name to resume a specific run: `-resume [run-name]`. Use the `nextflow log` command to show previous run names. -* Human - * `--genome GRCh37` -* Mouse - * `--genome GRCm38` -* _Drosophila_ - * `--genome BDGP6` -* _S. cerevisiae_ - * `--genome 'R64-1-1'` +### `-c` -> There are numerous others - check the config file for more. +Specify the path to a specific config file (this is a core NextFlow command). See the [nf-core website documentation](https://nf-co.re/usage/configuration) for more information. -Note that you can use the same configuration setup to save sets of reference files for your own use, even if they are not part of the iGenomes resource. See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for instructions on where to save such a file. +#### Custom resource requests -The syntax for this reference configuration is as follows: +Each step in the pipeline has a default set of requirements for number of CPUs, memory and time.
For most of the steps in the pipeline, if the job exits with an error code of `143` (exceeded requested resources) it will automatically resubmit with higher requests (2 x original, then 3 x original). If it still fails after three times then the pipeline is stopped. - +Whilst these default requirements will hopefully work for most people with most data, you may find that you want to customise the compute resources that the pipeline requests. You can do this by creating a custom config file. For example, to give the workflow process `star` 32GB of memory, you could use the following config: ```nextflow -params { - genomes { - 'GRCh37' { - fasta = '' // Used if no star index given - } - // Any number of additional genomes, key is used with --genome +process { + withName: star { + memory = 32.GB } } ``` - -### `--fasta` -If you prefer, you can specify the full path to your reference genome when you run the pipeline: - -```bash ---fasta '[path to Fasta reference]' -``` - -### `--igenomesIgnore` -Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`. - -## Job resources -### Automatic resubmission -Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with an error code of `143` (exceeded requested resources) it will automatically resubmit with higher requests (2 x original, then 3 x original). If it still fails after three times then the pipeline is stopped. - -### Custom resource requests -Wherever process-specific requirements are set in the pipeline, the default value can be changed by creating a custom config file. See the files hosted at [`nf-core/configs`](https://github.com/nf-core/configs/tree/master/conf) for examples. +See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more information. If you are likely to be running `nf-core` pipelines regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this please test that the config file works with your pipeline of choice using the `-c` parameter (see definition above). You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile. -If you have any questions or issues please send us a message on [`Slack`](https://nf-core-invite.herokuapp.com/). - -## AWS Batch specific parameters -Running the pipeline on AWS Batch requires a couple of specific parameters to be set according to your AWS Batch configuration. Please use the `-awsbatch` profile and then specify all of the following parameters. -### `--awsqueue` -The JobQueue that you intend to use on AWS Batch. -### `--awsregion` -The AWS region to run your job in. Default is set to `eu-west-1` but can be adjusted to your needs. - -Please make sure to also set the `-w/--work-dir` and `--outdir` parameters to a S3 storage bucket of your choice - you'll get an error message notifying you if you didn't. - -## Other command line parameters - - - -### `--outdir` -The output directory where the results will be saved.
- -### `--email` -Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run. - -### `-name` -Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic. - -This is used in the MultiQC report (if not default) and in the summary HTML / e-mail (always). +If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs). -**NB:** Single hyphen (core Nextflow option) +### Running in the background -### `-resume` -Specify this when restarting a pipeline. Nextflow will used cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously. +Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished. -You can also supply a run name to resume a specific run: `-resume [run-name]`. Use the `nextflow log` command to show previous run names. +The Nextflow `-bg` flag launches Nextflow in the background, detached from your terminal so that the workflow does not stop if you log out of your session. The logs are saved to a file. -**NB:** Single hyphen (core Nextflow option) - -### `-c` -Specify the path to a specific config file (this is a core NextFlow command). +Alternatively, you can use `screen` / `tmux` or a similar tool to create a detached session which you can log back into at a later time. +Some HPC setups also allow you to run nextflow within a cluster job submitted to your job scheduler (from where it submits more jobs). -**NB:** Single hyphen (core Nextflow option) - -Note - you can use this to override pipeline defaults. -### `--custom_config_version` -Provide git commit id for custom Institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. Default is set to `master`. -```bash -## Download and use config file with following git commid id ---custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96 -``` +#### Nextflow memory requirements -### `--custom_config_base` -If you're running offline, nextflow will not be able to fetch the institutional config files -from the internet. If you don't need them, then this is not a problem. If you do need them, -you should download the files from the repo and tell nextflow where to find them with the -`custom_config_base` option. For example: +In some cases, the Nextflow Java virtual machines can start to request a large amount of memory. +We recommend adding the following line to your environment to limit this (typically in `~/.bashrc` or `~/.bash_profile`): ```bash -## Download and unzip the config files -cd /path/to/my/configs -wget https://github.com/nf-core/configs/archive/master.zip -unzip master.zip - -## Run the pipeline -cd /path/to/my/data -nextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/ +NXF_OPTS='-Xms1g -Xmx4g' ``` - -> Note that the nf-core/tools helper package has a `download` command to download all required pipeline -> files + singularity containers + institutional configs in one go for you, to make this process easier. -### `--max_memory` -Use to set a top-limit for the default memory requirement for each process. -Should be a string in the format integer-unit. eg.
`--max_memory '8.GB'` - -### `--max_time` -Use to set a top-limit for the default time requirement for each process. -Should be a string in the format integer-unit. eg. `--max_time '2.h'` - -### `--max_cpus` -Use to set a top-limit for the default CPU requirement for each process. -Should be a string in the format integer-unit. eg. `--max_cpus 1` - -### `--plaintext_email` -Set to receive plain-text e-mails instead of HTML formatted. - -### `--monochrome_logs` -Set to disable colourful command line output and live life in monochrome. - -### `--multiqc_config` -Specify a path to a custom MultiQC configuration file. diff --git a/environment.yml b/environment.yml index 6b6aac1..a35b0d6 100644 --- a/environment.yml +++ b/environment.yml @@ -6,6 +6,10 @@ channels: - bioconda - defaults dependencies: + - conda-forge::python=3.7.3 + - conda-forge::markdown=3.1.1 + - conda-forge::pymdown-extensions=6.0 + - conda-forge::pygments=2.5.2 # TODO nf-core: Add required software dependencies here - - fastqc=0.11.8 - - multiqc=1.6 + - bioconda::fastqc=0.11.8 + - bioconda::multiqc=1.7 diff --git a/main.nf b/main.nf index 4550fb7..b06326b 100644 --- a/main.nf +++ b/main.nf @@ -9,7 +9,6 @@ ---------------------------------------------------------------------------------------- */ - def helpMessage() { // TODO nf-core: Add to this help message with new command line parameters log.info nfcoreHeader() @@ -19,42 +18,45 @@ def helpMessage() { The typical command for running the pipeline is as follows: - nextflow run nf-core/proteomicslfq --reads '*_R{1,2}.fastq.gz' -profile docker + nextflow run nf-core/proteomicslfq --input '*_R{1,2}.fastq.gz' -profile docker Mandatory arguments: - --reads Path to input data (must be surrounded with quotes) - -profile Configuration profile to use. Can use multiple (comma separated) - Available: conda, docker, singularity, awsbatch, test and more. + --input [file] Path to input data (must be surrounded with quotes) + -profile [str] Configuration profile to use. Can use multiple (comma separated) + Available: conda, docker, singularity, test, awsbatch, and more Options: - --genome Name of iGenomes reference - --singleEnd Specifies that the input is single end reads + --genome [str] Name of iGenomes reference + --single_end [bool] Specifies that the input is single-end reads - References If not specified in the configuration file or you wish to overwrite any of the references. - --fasta Path to Fasta reference + References If not specified in the configuration file or you wish to overwrite any of the references + --fasta [file] Path to fasta reference Other options: - --outdir The output directory where the results will be saved - --email Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits - --maxMultiqcEmailFileSize Theshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: 25MB) - -name Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic. + --outdir [file] The output directory where the results will be saved + --publish_dir_mode [str] Mode for publishing results in the output directory. 
Available: symlink, rellink, link, copy, copyNoFollow, move (Default: copy) + --email [email] Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits + --email_on_fail [email] Same as --email, except only send mail if the workflow is not successful + --max_multiqc_email_size [str] Threshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: 25MB) + -name [str] Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic AWSBatch options: - --awsqueue The AWSBatch JobQueue that needs to be set when running on AWSBatch - --awsregion The AWS Region for your AWS Batch job to run on + --awsqueue [str] The AWSBatch JobQueue that needs to be set when running on AWSBatch + --awsregion [str] The AWS Region for your AWS Batch job to run on + --awscli [str] Path to the AWS CLI tool """.stripIndent() } -/* - * SET UP CONFIGURATION VARIABLES - */ - -// Show help emssage -if (params.help){ +// Show help message +if (params.help) { helpMessage() exit 0 } +/* + * SET UP CONFIGURATION VARIABLES + */ + // Check if genome exists in the config file if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) { exit 1, "The provided genome '${params.genome}' is not available in the iGenomes file. Currently the available genomes are ${params.genomes.keySet().join(", ")}" @@ -62,101 +64,104 @@ if (params.genomes && params.genome && !params.genomes.containsKey(params.genome // TODO nf-core: Add any reference files that are needed // Configurable reference genomes -fasta = params.genome ? params.genomes[ params.genome ].fasta ?: false : false -if ( params.fasta ){ - fasta = file(params.fasta) - if( !fasta.exists() ) exit 1, "Fasta file not found: ${params.fasta}" -} // // NOTE - THIS IS NOT USED IN THIS PIPELINE, EXAMPLE ONLY -// If you want to use the above in a process, define the following: +// If you want to use the channel below in a process, define the following: // input: -// file fasta from fasta +// file fasta from ch_fasta // - +params.fasta = params.genome ? params.genomes[ params.genome ].fasta ?: false : false +if (params.fasta) { ch_fasta = file(params.fasta, checkIfExists: true) } // Has the run name been specified by the user? -// this has the bonus effect of catching both -name and --name +// this has the bonus effect of catching both -name and --name custom_runName = params.name -if( !(workflow.runName ==~ /[a-z]+_[a-z]+/) ){ - custom_runName = workflow.runName +if (!(workflow.runName ==~ /[a-z]+_[a-z]+/)) { + custom_runName = workflow.runName } - -if( workflow.profile == 'awsbatch') { - // AWSBatch sanity checking - if (!params.awsqueue || !params.awsregion) exit 1, "Specify correct --awsqueue and --awsregion parameters on AWSBatch!" - if (!workflow.workDir.startsWith('s3') || !params.outdir.startsWith('s3')) exit 1, "Specify S3 URLs for workDir and outdir parameters on AWSBatch!" - // Check workDir/outdir paths to be S3 buckets if running on AWSBatch - // related: https://github.com/nextflow-io/nextflow/issues/813 - if (!workflow.workDir.startsWith('s3:') || !params.outdir.startsWith('s3:')) exit 1, "Workdir or Outdir not on S3 - specify S3 Buckets for each to run on AWSBatch!" 
+// Check AWS batch settings +if (workflow.profile.contains('awsbatch')) { + // AWSBatch sanity checking + if (!params.awsqueue || !params.awsregion) exit 1, "Specify correct --awsqueue and --awsregion parameters on AWSBatch!" + // Check outdir paths to be S3 buckets if running on AWSBatch + // related: https://github.com/nextflow-io/nextflow/issues/813 + if (!params.outdir.startsWith('s3:')) exit 1, "Outdir not on S3 - specify S3 Bucket to run on AWSBatch!" + // Prevent trace files to be stored on S3 since S3 does not support rolling files. + if (params.tracedir.startsWith('s3:')) exit 1, "Specify a local tracedir or run without trace! S3 cannot be used for tracefiles." } // Stage config files -ch_multiqc_config = Channel.fromPath(params.multiqc_config) -ch_output_docs = Channel.fromPath("$baseDir/docs/output.md") +ch_multiqc_config = file("$baseDir/assets/multiqc_config.yaml", checkIfExists: true) +ch_multiqc_custom_config = params.multiqc_config ? Channel.fromPath(params.multiqc_config, checkIfExists: true) : Channel.empty() +ch_output_docs = file("$baseDir/docs/output.md", checkIfExists: true) +ch_output_docs_images = file("$baseDir/docs/images/", checkIfExists: true) /* * Create a channel for input read files */ -if(params.readPaths){ - if(params.singleEnd){ +if (params.input_paths) { + if (params.single_end) { Channel - .from(params.readPaths) - .map { row -> [ row[0], [file(row[1][0])]] } - .ifEmpty { exit 1, "params.readPaths was empty - no input files supplied" } - .into { read_files_fastqc; read_files_trimming } + .from(params.input_paths) + .map { row -> [ row[0], [ file(row[1][0], checkIfExists: true) ] ] } + .ifEmpty { exit 1, "params.input_paths was empty - no input files supplied" } + .into { ch_read_files_fastqc; ch_read_files_trimming } } else { Channel - .from(params.readPaths) - .map { row -> [ row[0], [file(row[1][0]), file(row[1][1])]] } - .ifEmpty { exit 1, "params.readPaths was empty - no input files supplied" } - .into { read_files_fastqc; read_files_trimming } + .from(params.input_paths) + .map { row -> [ row[0], [ file(row[1][0], checkIfExists: true), file(row[1][1], checkIfExists: true) ] ] } + .ifEmpty { exit 1, "params.input_paths was empty - no input files supplied" } + .into { ch_read_files_fastqc; ch_read_files_trimming } } } else { Channel - .fromFilePairs( params.reads, size: params.singleEnd ? 1 : 2 ) - .ifEmpty { exit 1, "Cannot find any reads matching: ${params.reads}\nNB: Path needs to be enclosed in quotes!\nIf this is single-end data, please specify --singleEnd on the command line." } - .into { read_files_fastqc; read_files_trimming } + .fromFilePairs(params.input, size: params.single_end ? 1 : 2) + .ifEmpty { exit 1, "Cannot find any reads matching: ${params.input}\nNB: Path needs to be enclosed in quotes!\nIf this is single-end data, please specify --single_end on the command line." } + .into { ch_read_files_fastqc; ch_read_files_trimming } } - // Header log info log.info nfcoreHeader() def summary = [:] +if (workflow.revision) summary['Pipeline Release'] = workflow.revision summary['Run Name'] = custom_runName ?: workflow.runName // TODO nf-core: Report custom parameters here -summary['Reads'] = params.reads +summary['Reads'] = params.input summary['Fasta Ref'] = params.fasta -summary['Data Type'] = params.singleEnd ? 'Single-End' : 'Paired-End' +summary['Data Type'] = params.single_end ? 
'Single-End' : 'Paired-End' summary['Max Resources'] = "$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job" -if(workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" +if (workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" summary['Output dir'] = params.outdir summary['Launch dir'] = workflow.launchDir summary['Working dir'] = workflow.workDir summary['Script dir'] = workflow.projectDir summary['User'] = workflow.userName -if(workflow.profile == 'awsbatch'){ - summary['AWS Region'] = params.awsregion - summary['AWS Queue'] = params.awsqueue +if (workflow.profile.contains('awsbatch')) { + summary['AWS Region'] = params.awsregion + summary['AWS Queue'] = params.awsqueue + summary['AWS CLI'] = params.awscli } summary['Config Profile'] = workflow.profile -if(params.config_profile_description) summary['Config Description'] = params.config_profile_description -if(params.config_profile_contact) summary['Config Contact'] = params.config_profile_contact -if(params.config_profile_url) summary['Config URL'] = params.config_profile_url -if(params.email) { - summary['E-mail Address'] = params.email - summary['MultiQC maxsize'] = params.maxMultiqcEmailFileSize +if (params.config_profile_description) summary['Config Profile Description'] = params.config_profile_description +if (params.config_profile_contact) summary['Config Profile Contact'] = params.config_profile_contact +if (params.config_profile_url) summary['Config Profile URL'] = params.config_profile_url +summary['Config Files'] = workflow.configFiles.join(', ') +if (params.email || params.email_on_fail) { + summary['E-mail Address'] = params.email + summary['E-mail on failure'] = params.email_on_fail + summary['MultiQC maxsize'] = params.max_multiqc_email_size } log.info summary.collect { k,v -> "${k.padRight(18)}: $v" }.join("\n") -log.info "\033[2m----------------------------------------------------\033[0m" +log.info "-\033[2m--------------------------------------------------\033[0m-" // Check the hostnames against configured profiles checkHostname() -def create_workflow_summary(summary) { - def yaml_file = workDir.resolve('workflow_summary_mqc.yaml') - yaml_file.text = """ +Channel.from(summary.collect{ [it.key, it.value] }) + .map { k,v -> "
<dt>$k</dt><dd><samp>${v ?: 'N/A'}</samp></dd>
" } + .reduce { a, b -> return [a, b].join("\n ") } + .map { x -> """ id: 'nf-core-proteomicslfq-summary' description: " - this information is collected when the pipeline is started." section_name: 'nf-core/proteomicslfq Workflow Summary' @@ -164,21 +169,24 @@ def create_workflow_summary(summary) { plot_type: 'html' data: |
<dl class=\"dl-horizontal\"> -${summary.collect { k,v -> "<dt>$k</dt><dd><samp>${v ?: 'N/A'}</samp></dd>" }.join("\n")} + $x </dl>
- """.stripIndent() - - return yaml_file -} - + """.stripIndent() } + .set { ch_workflow_summary } /* * Parse software version numbers */ process get_software_versions { + publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode, + saveAs: { filename -> + if (filename.indexOf(".csv") > 0) filename + else null + } output: - file 'software_versions_mqc.yaml' into software_versions_yaml + file 'software_versions_mqc.yaml' into ch_software_versions_yaml + file "software_versions.csv" script: // TODO nf-core: Get all tools to print their version number here @@ -187,82 +195,81 @@ process get_software_versions { echo $workflow.nextflow.version > v_nextflow.txt fastqc --version > v_fastqc.txt multiqc --version > v_multiqc.txt - scrape_software_versions.py > software_versions_mqc.yaml + scrape_software_versions.py &> software_versions_mqc.yaml """ } - - /* * STEP 1 - FastQC */ process fastqc { tag "$name" - publishDir "${params.outdir}/fastqc", mode: 'copy', - saveAs: {filename -> filename.indexOf(".zip") > 0 ? "zips/$filename" : "$filename"} + label 'process_medium' + publishDir "${params.outdir}/fastqc", mode: params.publish_dir_mode, + saveAs: { filename -> + filename.indexOf(".zip") > 0 ? "zips/$filename" : "$filename" + } input: - set val(name), file(reads) from read_files_fastqc + set val(name), file(reads) from ch_read_files_fastqc output: - file "*_fastqc.{zip,html}" into fastqc_results + file "*_fastqc.{zip,html}" into ch_fastqc_results script: """ - fastqc -q $reads + fastqc --quiet --threads $task.cpus $reads """ } - - /* * STEP 2 - MultiQC */ process multiqc { - publishDir "${params.outdir}/MultiQC", mode: 'copy' + publishDir "${params.outdir}/MultiQC", mode: params.publish_dir_mode input: - file multiqc_config from ch_multiqc_config + file (multiqc_config) from ch_multiqc_config + file (mqc_custom_config) from ch_multiqc_custom_config.collect().ifEmpty([]) // TODO nf-core: Add in log files from your new processes for MultiQC to find! - file ('fastqc/*') from fastqc_results.collect().ifEmpty([]) - file ('software_versions/*') from software_versions_yaml - file workflow_summary from create_workflow_summary(summary) + file ('fastqc/*') from ch_fastqc_results.collect().ifEmpty([]) + file ('software_versions/*') from ch_software_versions_yaml.collect() + file workflow_summary from ch_workflow_summary.collectFile(name: "workflow_summary_mqc.yaml") output: - file "*multiqc_report.html" into multiqc_report + file "*multiqc_report.html" into ch_multiqc_report file "*_data" + file "multiqc_plots" script: rtitle = custom_runName ? "--title \"$custom_runName\"" : '' rfilename = custom_runName ? "--filename " + custom_runName.replaceAll('\\W','_').replaceAll('_+','_') + "_multiqc_report" : '' + custom_config_file = params.multiqc_config ? "--config $mqc_custom_config" : '' // TODO nf-core: Specify which MultiQC modules to use with -m for a faster run time """ - multiqc -f $rtitle $rfilename --config $multiqc_config . + multiqc -f $rtitle $rfilename $custom_config_file . 
""" } - - /* * STEP 3 - Output Description HTML */ process output_documentation { - publishDir "${params.outdir}/pipeline_info", mode: 'copy' + publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode input: file output_docs from ch_output_docs + file images from ch_output_docs_images output: file "results_description.html" script: """ - markdown_to_html.r $output_docs results_description.html + markdown_to_html.py $output_docs -o results_description.html """ } - - /* * Completion e-mail notification */ @@ -270,8 +277,8 @@ workflow.onComplete { // Set up the e-mail variables def subject = "[nf-core/proteomicslfq] Successful: $workflow.runName" - if(!workflow.success){ - subject = "[nf-core/proteomicslfq] FAILED: $workflow.runName" + if (!workflow.success) { + subject = "[nf-core/proteomicslfq] FAILED: $workflow.runName" } def email_fields = [:] email_fields['version'] = workflow.manifest.version @@ -289,21 +296,20 @@ workflow.onComplete { email_fields['summary']['Date Completed'] = workflow.complete email_fields['summary']['Pipeline script file path'] = workflow.scriptFile email_fields['summary']['Pipeline script hash ID'] = workflow.scriptId - if(workflow.repository) email_fields['summary']['Pipeline repository Git URL'] = workflow.repository - if(workflow.commitId) email_fields['summary']['Pipeline repository Git Commit'] = workflow.commitId - if(workflow.revision) email_fields['summary']['Pipeline Git branch/tag'] = workflow.revision - if(workflow.container) email_fields['summary']['Docker image'] = workflow.container + if (workflow.repository) email_fields['summary']['Pipeline repository Git URL'] = workflow.repository + if (workflow.commitId) email_fields['summary']['Pipeline repository Git Commit'] = workflow.commitId + if (workflow.revision) email_fields['summary']['Pipeline Git branch/tag'] = workflow.revision email_fields['summary']['Nextflow Version'] = workflow.nextflow.version email_fields['summary']['Nextflow Build'] = workflow.nextflow.build email_fields['summary']['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp - // TODO nf-core: If not using MultiQC, strip out this code (including params.maxMultiqcEmailFileSize) + // TODO nf-core: If not using MultiQC, strip out this code (including params.max_multiqc_email_size) // On success try attach the multiqc report def mqc_report = null try { if (workflow.success) { - mqc_report = multiqc_report.getVal() - if (mqc_report.getClass() == ArrayList){ + mqc_report = ch_multiqc_report.getVal() + if (mqc_report.getClass() == ArrayList) { log.warn "[nf-core/proteomicslfq] Found multiple reports from process 'multiqc', will use only one" mqc_report = mqc_report[0] } @@ -312,6 +318,12 @@ workflow.onComplete { log.warn "[nf-core/proteomicslfq] Could not attach MultiQC report to summary email" } + // Check if we are only sending emails on failure + email_address = params.email + if (!params.email && params.email_on_fail && !workflow.success) { + email_address = params.email_on_fail + } + // Render the TXT template def engine = new groovy.text.GStringTemplateEngine() def tf = new File("$baseDir/assets/email_template.txt") @@ -324,82 +336,93 @@ workflow.onComplete { def email_html = html_template.toString() // Render the sendmail template - def smail_fields = [ email: params.email, subject: subject, email_txt: email_txt, email_html: email_html, baseDir: "$baseDir", mqcFile: mqc_report, mqcMaxSize: params.maxMultiqcEmailFileSize.toBytes() ] + def smail_fields = [ email: email_address, subject: subject, email_txt: 
email_txt, email_html: email_html, baseDir: "$baseDir", mqcFile: mqc_report, mqcMaxSize: params.max_multiqc_email_size.toBytes() ] def sf = new File("$baseDir/assets/sendmail_template.txt") def sendmail_template = engine.createTemplate(sf).make(smail_fields) def sendmail_html = sendmail_template.toString() // Send the HTML e-mail - if (params.email) { + if (email_address) { try { - if( params.plaintext_email ){ throw GroovyException('Send plaintext e-mail, not HTML') } - // Try to send HTML e-mail using sendmail - [ 'sendmail', '-t' ].execute() << sendmail_html - log.info "[nf-core/proteomicslfq] Sent summary e-mail to $params.email (sendmail)" + if (params.plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') } + // Try to send HTML e-mail using sendmail + [ 'sendmail', '-t' ].execute() << sendmail_html + log.info "[nf-core/proteomicslfq] Sent summary e-mail to $email_address (sendmail)" } catch (all) { - // Catch failures and try with plaintext - [ 'mail', '-s', subject, params.email ].execute() << email_txt - log.info "[nf-core/proteomicslfq] Sent summary e-mail to $params.email (mail)" + // Catch failures and try with plaintext + def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ] + if ( mqc_report.size() <= params.max_multiqc_email_size.toBytes() ) { + mail_cmd += [ '-A', mqc_report ] + } + mail_cmd.execute() << email_html + log.info "[nf-core/proteomicslfq] Sent summary e-mail to $email_address (mail)" } } // Write summary e-mail HTML to a file - def output_d = new File( "${params.outdir}/pipeline_info/" ) - if( !output_d.exists() ) { - output_d.mkdirs() + def output_d = new File("${params.outdir}/pipeline_info/") + if (!output_d.exists()) { + output_d.mkdirs() } - def output_hf = new File( output_d, "pipeline_report.html" ) + def output_hf = new File(output_d, "pipeline_report.html") output_hf.withWriter { w -> w << email_html } - def output_tf = new File( output_d, "pipeline_report.txt" ) + def output_tf = new File(output_d, "pipeline_report.txt") output_tf.withWriter { w -> w << email_txt } - c_reset = params.monochrome_logs ? '' : "\033[0m"; - c_purple = params.monochrome_logs ? '' : "\033[0;35m"; c_green = params.monochrome_logs ? '' : "\033[0;32m"; + c_purple = params.monochrome_logs ? '' : "\033[0;35m"; c_red = params.monochrome_logs ? '' : "\033[0;31m"; - if(workflow.success){ - log.info "${c_purple}[nf-core/proteomicslfq]${c_green} Pipeline complete${c_reset}" + c_reset = params.monochrome_logs ? '' : "\033[0m"; + + if (workflow.stats.ignoredCount > 0 && workflow.success) { + log.info "-${c_purple}Warning, pipeline completed, but with errored process(es) ${c_reset}-" + log.info "-${c_red}Number of ignored errored process(es) : ${workflow.stats.ignoredCount} ${c_reset}-" + log.info "-${c_green}Number of successfully ran process(es) : ${workflow.stats.succeedCount} ${c_reset}-" + } + + if (workflow.success) { + log.info "-${c_purple}[nf-core/proteomicslfq]${c_green} Pipeline completed successfully${c_reset}-" } else { checkHostname() - log.info "${c_purple}[nf-core/proteomicslfq]${c_red} Pipeline completed with errors${c_reset}" + log.info "-${c_purple}[nf-core/proteomicslfq]${c_red} Pipeline completed with errors${c_reset}-" } } -def nfcoreHeader(){ +def nfcoreHeader() { // Log colors ANSI codes - c_reset = params.monochrome_logs ? '' : "\033[0m"; - c_dim = params.monochrome_logs ? '' : "\033[2m"; c_black = params.monochrome_logs ? '' : "\033[0;30m"; - c_green = params.monochrome_logs ? 
'' : "\033[0;32m"; - c_yellow = params.monochrome_logs ? '' : "\033[0;33m"; c_blue = params.monochrome_logs ? '' : "\033[0;34m"; - c_purple = params.monochrome_logs ? '' : "\033[0;35m"; c_cyan = params.monochrome_logs ? '' : "\033[0;36m"; + c_dim = params.monochrome_logs ? '' : "\033[2m"; + c_green = params.monochrome_logs ? '' : "\033[0;32m"; + c_purple = params.monochrome_logs ? '' : "\033[0;35m"; + c_reset = params.monochrome_logs ? '' : "\033[0m"; c_white = params.monochrome_logs ? '' : "\033[0;37m"; + c_yellow = params.monochrome_logs ? '' : "\033[0;33m"; - return """ ${c_dim}----------------------------------------------------${c_reset} + return """ -${c_dim}--------------------------------------------------${c_reset}- ${c_green},--.${c_black}/${c_green},-.${c_reset} ${c_blue} ___ __ __ __ ___ ${c_green}/,-._.--~\'${c_reset} ${c_blue} |\\ | |__ __ / ` / \\ |__) |__ ${c_yellow}} {${c_reset} ${c_blue} | \\| | \\__, \\__/ | \\ |___ ${c_green}\\`-._,-`-,${c_reset} ${c_green}`._,._,\'${c_reset} ${c_purple} nf-core/proteomicslfq v${workflow.manifest.version}${c_reset} - ${c_dim}----------------------------------------------------${c_reset} + -${c_dim}--------------------------------------------------${c_reset}- """.stripIndent() } -def checkHostname(){ +def checkHostname() { def c_reset = params.monochrome_logs ? '' : "\033[0m" def c_white = params.monochrome_logs ? '' : "\033[0;37m" def c_red = params.monochrome_logs ? '' : "\033[1;91m" def c_yellow_bold = params.monochrome_logs ? '' : "\033[1;93m" - if(params.hostnames){ + if (params.hostnames) { def hostname = "hostname".execute().text.trim() params.hostnames.each { prof, hnames -> hnames.each { hname -> - if(hostname.contains(hname) && !workflow.profile.contains(prof)){ + if (hostname.contains(hname) && !workflow.profile.contains(prof)) { log.error "====================================================\n" + " ${c_red}WARNING!${c_reset} You are running with `-profile $workflow.profile`\n" + " but your machine hostname is ${c_white}'$hostname'${c_reset}\n" + diff --git a/nextflow.config b/nextflow.config index bd585a6..898f213 100644 --- a/nextflow.config +++ b/nextflow.config @@ -10,30 +10,36 @@ params { // Workflow flags // TODO nf-core: Specify your pipeline's command line flags - reads = "data/*{1,2}.fastq.gz" - singleEnd = false + genome = false + input = "data/*{1,2}.fastq.gz" + single_end = false outdir = './results' + publish_dir_mode = 'copy' // Boilerplate options name = false - multiqc_config = "$baseDir/conf/multiqc_config.yaml" + multiqc_config = false email = false - maxMultiqcEmailFileSize = 25.MB + email_on_fail = false + max_multiqc_email_size = 25.MB plaintext_email = false monochrome_logs = false help = false - igenomes_base = "./iGenomes" + igenomes_base = 's3://ngi-igenomes/igenomes/' tracedir = "${params.outdir}/pipeline_info" - clusterOptions = false - awsqueue = false - awsregion = 'eu-west-1' - igenomesIgnore = false + igenomes_ignore = false custom_config_version = 'master' custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" hostnames = false config_profile_description = false config_profile_contact = false config_profile_url = false + + // Defaults only, expecting to be overwritten + max_memory = 128.GB + max_cpus = 16 + max_time = 240.h + } // Container slug. Stable releases should specify release tag! 
@@ -51,19 +57,35 @@ try { } profiles { - awsbatch { includeConfig 'conf/awsbatch.config' } conda { process.conda = "$baseDir/environment.yml" } debug { process.beforeScript = 'echo $HOSTNAME' } - docker { docker.enabled = true } - singularity { singularity.enabled = true } + docker { + docker.enabled = true + // Avoid this error: + // WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. + // Testing this in nf-core after discussion here https://github.com/nf-core/tools/pull/351 + // once this is established and works well, nextflow might implement this behavior as new default. + docker.runOptions = '-u \$(id -u):\$(id -g)' + } + singularity { + singularity.enabled = true + singularity.autoMounts = true + } test { includeConfig 'conf/test.config' } } // Load igenomes.config if required -if(!params.igenomesIgnore){ +if (!params.igenomes_ignore) { includeConfig 'conf/igenomes.config' } +// Export these variables to prevent local Python/R libraries from conflicting with those in the container +env { + PYTHONNOUSERSITE = 1 + R_PROFILE_USER = "/.Rprofile" + R_ENVIRON_USER = "/.Renviron" +} + // Capture exit codes from upstream processes when piping process.shell = ['/bin/bash', '-euo', 'pipefail'] @@ -86,20 +108,20 @@ dag { manifest { name = 'nf-core/proteomicslfq' - author = 'The Heumos Brothers - Simon and Lukas' + author = 'Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg' homePage = 'https://github.com/nf-core/proteomicslfq' description = 'Proteomics label-free quantification (LFQ) analysis pipeline using OpenMS and MSstats, with feature quantification, feature summarization, quality control and group-based statistical analysis.' mainScript = 'main.nf' - nextflowVersion = '>=0.32.0' + nextflowVersion = '>=19.10.0' version = '1.0dev' } // Function to ensure that resource requirements don't go beyond // a maximum limit def check_max(obj, type) { - if(type == 'memory'){ + if (type == 'memory') { try { - if(obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) + if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) return params.max_memory as nextflow.util.MemoryUnit else return obj @@ -107,9 +129,9 @@ def check_max(obj, type) { println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" return obj } - } else if(type == 'time'){ + } else if (type == 'time') { try { - if(obj.compareTo(params.max_time as nextflow.util.Duration) == 1) + if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) return params.max_time as nextflow.util.Duration else return obj @@ -117,7 +139,7 @@ def check_max(obj, type) { println " ### ERROR ### Max time '${params.max_time}' is not valid! 
Using default value: $obj" return obj } - } else if(type == 'cpus'){ + } else if (type == 'cpus') { try { return Math.min( obj, params.max_cpus as int ) } catch (all) { diff --git a/nextflow_schema.json b/nextflow_schema.json new file mode 100644 index 0000000..25df34d --- /dev/null +++ b/nextflow_schema.json @@ -0,0 +1,259 @@ +{ + "$schema": "https://json-schema.org/draft-07/schema", + "$id": "https://raw.githubusercontent.com/nf-core/proteomicslfq/master/nextflow_schema.json", + "title": "nf-core/proteomicslfq pipeline parameters", + "description": "Proteomics label-free quantification (LFQ) analysis pipeline using OpenMS and MSstats, with feature quantification, feature summarization, quality control and group-based statistical analysis.", + "type": "object", + "definitions": { + "input_output_options": { + "title": "Input/output options", + "type": "object", + "fa_icon": "fas fa-terminal", + "description": "Define where the pipeline should find input data and save output data.", + "required": [ + "input" + ], + "properties": { + "input": { + "type": "string", + "fa_icon": "fas fa-dna", + "description": "Input FastQ files.", + "help_text": "Use this to specify the location of your input FastQ files. For example:\n\n```bash\n--input 'path/to/data/sample_*_{1,2}.fastq'\n```\n\nPlease note the following requirements:\n\n1. The path must be enclosed in quotes\n2. The path must have at least one `*` wildcard character\n3. When using the pipeline with paired end data, the path must use `{1,2}` notation to specify read pairs.\n\nIf left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz`" + }, + "single_end": { + "type": "boolean", + "description": "Specifies that the input is single-end reads.", + "fa_icon": "fas fa-align-center", + "help_text": "By default, the pipeline expects paired-end data. If you have single-end data, you need to specify `--single_end` on the command line when you launch the pipeline. A normal glob pattern, enclosed in quotation marks, can then be used for `--input`. For example:\n\n```bash\n--single_end --input '*.fastq'\n```\n\nIt is not possible to run a mixture of single-end and paired-end files in one run." + }, + "outdir": { + "type": "string", + "description": "The output directory where the results will be saved.", + "default": "./results", + "fa_icon": "fas fa-folder-open" + }, + "email": { + "type": "string", + "description": "Email address for completion summary.", + "fa_icon": "fas fa-envelope", + "help_text": "Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.", + "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$" + } + } + }, + "reference_genome_options": { + "title": "Reference genome options", + "type": "object", + "fa_icon": "fas fa-dna", + "description": "Options for the reference genome indices used to align reads.", + "properties": { + "genome": { + "type": "string", + "description": "Name of iGenomes reference.", + "fa_icon": "fas fa-book", + "help_text": "If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. `--genome GRCh38`.\n\nSee the [nf-core website docs](https://nf-co.re/usage/reference_genomes) for more details." 
+ }, + "fasta": { + "type": "string", + "fa_icon": "fas fa-font", + "description": "Path to FASTA genome file.", + "help_text": "If you have no genome reference available, the pipeline can build one using a FASTA file. This requires additional time and resources, so it's better to use a pre-built index if possible." + }, + "igenomes_base": { + "type": "string", + "description": "Directory / URL base for iGenomes references.", + "default": "s3://ngi-igenomes/igenomes/", + "fa_icon": "fas fa-cloud-download-alt", + "hidden": true + }, + "igenomes_ignore": { + "type": "boolean", + "description": "Do not load the iGenomes reference config.", + "fa_icon": "fas fa-ban", + "hidden": true, + "help_text": "Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`." + } + } + }, + "generic_options": { + "title": "Generic options", + "type": "object", + "fa_icon": "fas fa-file-import", + "description": "Less common options for the pipeline, typically set in a config file.", + "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.", + "properties": { + "help": { + "type": "boolean", + "description": "Display help text.", + "hidden": true, + "fa_icon": "fas fa-question-circle" + }, + "publish_dir_mode": { + "type": "string", + "default": "copy", + "hidden": true, + "description": "Method used to save pipeline results to output directory.", + "help_text": "The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.", + "fa_icon": "fas fa-copy", + "enum": [ + "symlink", + "rellink", + "link", + "copy", + "copyNoFollow", + "move" + ] + }, + "name": { + "type": "string", + "description": "Workflow name.", + "fa_icon": "fas fa-fingerprint", + "hidden": true, + "help_text": "A custom name for the pipeline run. Unlike the core nextflow `-name` option with one hyphen this parameter can be reused multiple times, for example if using `-resume`. Passed through to steps such as MultiQC and used for things like report filenames and titles." + }, + "email_on_fail": { + "type": "string", + "description": "Email address for completion summary, only when pipeline fails.", + "fa_icon": "fas fa-exclamation-triangle", + "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$", + "hidden": true, + "help_text": "This works exactly as with `--email`, except emails are only sent if the workflow is not successful." + }, + "plaintext_email": { + "type": "boolean", + "description": "Send plain-text email instead of HTML.", + "fa_icon": "fas fa-remove-format", + "hidden": true, + "help_text": "Set to receive plain-text e-mails instead of HTML formatted." + }, + "max_multiqc_email_size": { + "type": "string", + "description": "File size limit when attaching MultiQC reports to summary emails.", + "default": "25.MB", + "fa_icon": "fas fa-file-upload", + "hidden": true, + "help_text": "If file generated by pipeline exceeds the threshold, it will not be attached."
+ }, + "monochrome_logs": { + "type": "boolean", + "description": "Do not use coloured log outputs.", + "fa_icon": "fas fa-palette", + "hidden": true, + "help_text": "Set to disable colourful command line output and live life in monochrome." + }, + "multiqc_config": { + "type": "string", + "description": "Custom config file to supply to MultiQC.", + "fa_icon": "fas fa-cog", + "hidden": true + }, + "tracedir": { + "type": "string", + "description": "Directory to keep pipeline Nextflow logs and reports.", + "default": "${params.outdir}/pipeline_info", + "fa_icon": "fas fa-cogs", + "hidden": true + } + } + }, + "max_job_request_options": { + "title": "Max job request options", + "type": "object", + "fa_icon": "fab fa-acquisitions-incorporated", + "description": "Set the top limit for requested resources for any single job.", + "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause the Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. See [the nf-core website](https://nf-co.re/usage/configuration) for details.", + "properties": { + "max_cpus": { + "type": "integer", + "description": "Maximum number of CPUs that can be requested for any single job.", + "default": 16, + "fa_icon": "fas fa-microchip", + "hidden": true, + "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" + }, + "max_memory": { + "type": "string", + "description": "Maximum amount of memory that can be requested for any single job.", + "default": "128.GB", + "fa_icon": "fas fa-memory", + "hidden": true, + "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" + }, + "max_time": { + "type": "string", + "description": "Maximum amount of time that can be requested for any single job.", + "default": "240.h", + "fa_icon": "far fa-clock", + "hidden": true, + "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`" + } + } + }, + "institutional_config_options": { + "title": "Institutional config options", + "type": "object", + "fa_icon": "fas fa-university", + "description": "Parameters used to describe centralised config profiles. These should not be edited.", + "help_text": "The centralised nf-core configuration profiles use a handful of pipeline parameters to describe themselves. This information is then printed to the Nextflow log when you run a pipeline. You should not need to change these values when you run a pipeline.", + "properties": { + "custom_config_version": { + "type": "string", + "description": "Git commit id for Institutional configs.", + "default": "master", + "hidden": true, + "fa_icon": "fas fa-users-cog", + "help_text": "Provide git commit id for custom Institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. 
Default: `master`.\n\n```bash\n## Download and use config file with following git commit id\n--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96\n```" + }, + "custom_config_base": { + "type": "string", + "description": "Base directory for Institutional configs.", + "default": "https://raw.githubusercontent.com/nf-core/configs/master", + "hidden": true, + "help_text": "If you're running offline, nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell nextflow where to find them with the `custom_config_base` option. For example:\n\n```bash\n## Download and unzip the config files\ncd /path/to/my/configs\nwget https://github.com/nf-core/configs/archive/master.zip\nunzip master.zip\n\n## Run the pipeline\ncd /path/to/my/data\nnextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/\n```\n\n> Note that the nf-core/tools helper package has a `download` command to download all required pipeline files + singularity containers + institutional configs in one go for you, to make this process easier.", + "fa_icon": "fas fa-users-cog" + }, + "hostnames": { + "type": "string", + "description": "Institutional configs hostname.", + "hidden": true, + "fa_icon": "fas fa-users-cog" + }, + "config_profile_description": { + "type": "string", + "description": "Institutional config description.", + "hidden": true, + "fa_icon": "fas fa-users-cog" + }, + "config_profile_contact": { + "type": "string", + "description": "Institutional config contact information.", + "hidden": true, + "fa_icon": "fas fa-users-cog" + }, + "config_profile_url": { + "type": "string", + "description": "Institutional config URL link.", + "hidden": true, + "fa_icon": "fas fa-users-cog" + } + } + } + }, + "allOf": [ + { + "$ref": "#/definitions/input_output_options" + }, + { + "$ref": "#/definitions/reference_genome_options" + }, + { + "$ref": "#/definitions/generic_options" + }, + { + "$ref": "#/definitions/max_job_request_options" + }, + { + "$ref": "#/definitions/institutional_config_options" + } + ] +} From 2f98ffc201c82c1d3acd40e3abf5972afb9b7985 Mon Sep 17 00:00:00 2001 From: nf-core-bot Date: Fri, 31 Jul 2020 13:24:54 +0200 Subject: [PATCH 298/374] Template update for nf-core/tools version 1.10.2 --- .github/CONTRIBUTING.md | 2 +- .github/workflows/awsfulltest.yml | 2 +- .github/workflows/branch.yml | 1 + .github/workflows/linting.yml | 9 ++++- .github/workflows/push_dockerhub.yml | 53 ++++++++++++++-------------- Dockerfile | 2 +- conf/test_full.config | 2 +- docs/usage.md | 2 +- main.nf | 2 +- 9 files changed, 42 insertions(+), 33 deletions(-) diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index e095919..3edff13 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -46,7 +46,7 @@ These tests are run both with the latest available version of `Nextflow` and als ## Patch -: warning: Only in the unlikely and regretful event of a release happening with a bug. +:warning: Only in the unlikely and regretful event of a release happening with a bug. * On your own fork, make a new branch `patch` based on `upstream/master`. * Fix the bug, and bump version (X.Y.Z+1). 
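A sketch of that patch workflow in git terms, assuming the usual `upstream`/`origin` remote naming:

```bash
git fetch upstream
git checkout -b patch upstream/master
# fix the bug, bump the version from X.Y.Z to X.Y.Z+1, then:
git commit -am "Fix release bug and bump patch version"
git push origin patch
```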
diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml index 282744b..5c9e9ea 100644 --- a/.github/workflows/awsfulltest.yml +++ b/.github/workflows/awsfulltest.yml @@ -21,7 +21,7 @@ jobs: run: conda install -c conda-forge awscli - name: Start AWS batch job # TODO nf-core: You can customise AWS full pipeline tests as required - # Add full size test data (but still relatively small datasets for few samples) + # Add full size test data (but still relatively small datasets for few samples) # on the `test_full.config` test runs with only one set of parameters # Then specify `-profile test_full` instead of `-profile test` on the AWS batch command env: diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml index 517f099..49657fe 100644 --- a/.github/workflows/branch.yml +++ b/.github/workflows/branch.yml @@ -17,6 +17,7 @@ jobs: # If the above check failed, post a comment on the PR explaining the failure + # NOTE - this doesn't currently work if the PR is coming from a fork, due to limitations in GitHub actions secrets - name: Post PR comment if: failure() uses: mshick/add-pr-comment@v1 diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml index eb66c14..8e8d5bb 100644 --- a/.github/workflows/linting.yml +++ b/.github/workflows/linting.yml @@ -57,5 +57,12 @@ jobs: GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} - run: nf-core lint ${GITHUB_WORKSPACE} + run: nf-core -l lint_log.txt lint ${GITHUB_WORKSPACE} + + - name: Upload linting log file artifact + if: ${{ always() }} + uses: actions/upload-artifact@v2 + with: + name: linting-log-file + path: lint_log.txt diff --git a/.github/workflows/push_dockerhub.yml b/.github/workflows/push_dockerhub.yml index 6f7b031..b84523a 100644 --- a/.github/workflows/push_dockerhub.yml +++ b/.github/workflows/push_dockerhub.yml @@ -8,32 +8,33 @@ on: release: types: [published] -push_dockerhub: - name: Push new Docker image to Docker Hub - runs-on: ubuntu-latest - # Only run for the nf-core repo, for releases and merged PRs - if: ${{ github.repository == 'nf-core/proteomicslfq' }} - env: - DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }} - DOCKERHUB_PASS: ${{ secrets.DOCKERHUB_PASS }} - steps: - - name: Check out pipeline code - uses: actions/checkout@v2 +jobs: + push_dockerhub: + name: Push new Docker image to Docker Hub + runs-on: ubuntu-latest + # Only run for the nf-core repo, for releases and merged PRs + if: ${{ github.repository == 'nf-core/proteomicslfq' }} + env: + DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }} + DOCKERHUB_PASS: ${{ secrets.DOCKERHUB_PASS }} + steps: + - name: Check out pipeline code + uses: actions/checkout@v2 - - name: Build new docker image - run: docker build --no-cache . -t nfcore/proteomicslfq:latest + - name: Build new docker image + run: docker build --no-cache . 
-t nfcore/proteomicslfq:latest - - name: Push Docker image to DockerHub (dev) - if: ${{ github.event_name == 'push' }} - run: | - echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin - docker tag nfcore/proteomicslfq:latest nfcore/proteomicslfq:dev - docker push nfcore/proteomicslfq:dev + - name: Push Docker image to DockerHub (dev) + if: ${{ github.event_name == 'push' }} + run: | + echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin + docker tag nfcore/proteomicslfq:latest nfcore/proteomicslfq:dev + docker push nfcore/proteomicslfq:dev - - name: Push Docker image to DockerHub (release) - if: ${{ github.event_name == 'release' }} - run: | - echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin - docker push nfcore/proteomicslfq:latest - docker tag nfcore/proteomicslfq:latest nfcore/proteomicslfq:${{ github.event.release.tag_name }} - docker push nfcore/proteomicslfq:${{ github.event.release.tag_name }} + - name: Push Docker image to DockerHub (release) + if: ${{ github.event_name == 'release' }} + run: | + echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin + docker push nfcore/proteomicslfq:latest + docker tag nfcore/proteomicslfq:latest nfcore/proteomicslfq:${{ github.event.release.tag_name }} + docker push nfcore/proteomicslfq:${{ github.event.release.tag_name }} diff --git a/Dockerfile b/Dockerfile index eaab6a6..ad7ed4a 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,4 +1,4 @@ -FROM nfcore/base:1.10.1 +FROM nfcore/base:1.10.2 LABEL authors="Julianus Pfeuffer, Lukas Heumos, Leon Bichmann, Timo Sachsenberg" \ description="Docker image containing all software requirements for the nf-core/proteomicslfq pipeline" diff --git a/conf/test_full.config b/conf/test_full.config index bef80aa..7021b8f 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -15,7 +15,7 @@ params { // TODO nf-core: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA) // TODO nf-core: Give any required params for the test so that command line flags are not needed single_end = false - readPaths = [ + input_paths = [ ['Testdata', ['https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R1.tiny.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R2.tiny.fastq.gz']], ['SRR389222', ['https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub2.fastq.gz']] ] diff --git a/docs/usage.md b/docs/usage.md index e0eea66..4a22e5b 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -80,7 +80,7 @@ You can also supply a run name to resume a specific run: `-resume [run-name]`. U ### `-c` -Specify the path to a specific config file (this is a core NextFlow command). See the [nf-core website documentation](https://nf-co.re/usage/configuration) for more information. +Specify the path to a specific config file (this is a core Nextflow command). See the [nf-core website documentation](https://nf-co.re/usage/configuration) for more information. 
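As an illustration only (not part of the patch above): a custom config file passed with `-c` could look like the following minimal sketch. The `percolator` process name and the `max_memory` parameter exist in this pipeline; the concrete values here are invented.

```nextflow
// custom.config -- run-specific overrides, applied with:
//   nextflow run nf-core/proteomicslfq -c custom.config ...
params {
    max_memory = 64.GB      // cap the per-job memory requests
}

process {
    withName: percolator {
        cpus = 8            // give the Percolator step more cores
    }
}
```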
#### Custom resource requests diff --git a/main.nf b/main.nf index b06326b..590f477 100644 --- a/main.nf +++ b/main.nf @@ -127,7 +127,7 @@ def summary = [:] if (workflow.revision) summary['Pipeline Release'] = workflow.revision summary['Run Name'] = custom_runName ?: workflow.runName // TODO nf-core: Report custom parameters here -summary['Reads'] = params.input +summary['Input'] = params.input summary['Fasta Ref'] = params.fasta summary['Data Type'] = params.single_end ? 'Single-End' : 'Paired-End' summary['Max Resources'] = "$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job" From a811c2d5d62c7f3f5b85d6805df06f16877326c8 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 31 Jul 2020 13:47:37 +0200 Subject: [PATCH 299/374] Add input param --- nextflow.config | 3 +++ 1 file changed, 3 insertions(+) diff --git a/nextflow.config b/nextflow.config index b681587..fc9b323 100644 --- a/nextflow.config +++ b/nextflow.config @@ -8,6 +8,9 @@ // Global default params, used in configs params { + // Nf-core lint required params that are unused + input = 'foobarbaz' + // Workflow flags sdrf = '' root_folder = '' From dc9a547174348261a9715626c28eed6bd58b5a5b Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 31 Jul 2020 14:14:57 +0200 Subject: [PATCH 300/374] actions lint --- .github/workflows/ci.yml | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index bcff598..cb2aced 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -4,9 +4,10 @@ name: nf-core CI on: push: branches: - - master - dev pull_request: + release: + types: [published] jobs: test: From 37c3098e18607c7b425122e6758d6ce3e38beaa9 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 31 Jul 2020 14:30:22 +0200 Subject: [PATCH 301/374] Make awsfulltest test full --- .github/workflows/awsfulltest.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml index 5c9e9ea..aa1608f 100644 --- a/.github/workflows/awsfulltest.yml +++ b/.github/workflows/awsfulltest.yml @@ -37,4 +37,4 @@ jobs: --job-name nf-core-proteomicslfq \ --job-queue $AWS_JOB_QUEUE \ --job-definition $AWS_JOB_DEFINITION \ - --container-overrides '{"command": ["nf-core/proteomicslfq", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/proteomicslfq/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/proteomicslfq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' + --container-overrides '{"command": ["nf-core/proteomicslfq", "-r '"${GITHUB_SHA}"' -profile test_full --outdir s3://'"${AWS_S3_BUCKET}"'/proteomicslfq/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/proteomicslfq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' From 4398fbee08dcdbe221a95a2396c773acfbfc59a4 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 31 Jul 2020 14:52:41 +0200 Subject: [PATCH 302/374] update conda versions --- environment.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/environment.yml b/environment.yml index 60cce1e..00294ca 100644 --- a/environment.yml +++ b/environment.yml @@ -13,7 +13,7 @@ dependencies: - bioconda::comet-ms - bioconda::luciphor2 - bioconda::percolator - - bioconda::bioconductor-msstats=3.20.0 # will include R + - bioconda::bioconductor-msstats=3.20.1 # 
will include R - bioconda::sdrf-pipelines=0.0.9 # for SDRF conversion - conda-forge::r-ptxqc=1.0.5 # for QC reports - conda-forge::xorg-libxt=1.2.0 # until this R fix is merged: https://github.com/conda-forge/r-base-feedstock/pull/128 From df729ab3883fb787cb6e801b60527f66ea35c50a Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 31 Jul 2020 16:26:49 +0200 Subject: [PATCH 303/374] update pymdown --- environment.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/environment.yml b/environment.yml index 00294ca..3d22958 100644 --- a/environment.yml +++ b/environment.yml @@ -21,6 +21,6 @@ dependencies: - conda-forge::openjdk=8.* # pin java to 8 for MSGF (otherwise it somehow chooses 11) - conda-forge::python=3.8.1 - conda-forge::markdown=3.2.1 - - conda-forge::pymdown-extensions=6.0 + - conda-forge::pymdown-extensions=7.1 - conda-forge::pygments=2.5.2 From e307938ccc65073fbbbcd95dc25959fc1df06cb2 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Sun, 2 Aug 2020 16:31:07 +0200 Subject: [PATCH 304/374] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 22fd0c1..62860b8 100644 --- a/README.md +++ b/README.md @@ -40,7 +40,7 @@ See [usage docs](docs/usage.md) for all of the available options when running th ## Documentation -The nf-core/proteomicslfq pipeline comes with documentation about the pipeline which you can read at [https://nf-core/proteomicslfq/docs](https://nf-core/proteomicslfq/docs) or find in the [`docs/` directory](docs). +The nf-core/proteomicslfq pipeline comes with documentation about the pipeline which you can read at [https://nf-co.re/proteomicslfq/docs](https://nf-core/proteomicslfq/docs) or find in the [`docs/` directory](docs). From a8813ffc9b18aae204bdc9d1a217f7603d39eca0 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Sun, 2 Aug 2020 16:33:22 +0200 Subject: [PATCH 305/374] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 62860b8..673bc67 100644 --- a/README.md +++ b/README.md @@ -40,7 +40,7 @@ See [usage docs](docs/usage.md) for all of the available options when running th ## Documentation -The nf-core/proteomicslfq pipeline comes with documentation about the pipeline which you can read at [https://nf-co.re/proteomicslfq/docs](https://nf-core/proteomicslfq/docs) or find in the [`docs/` directory](docs). +The nf-core/proteomicslfq pipeline comes with documentation about the pipeline which you can read at [https://nf-co.re/proteomicslfq/docs](https://nf-co.re/proteomicslfq/docs) or find in the [`docs/` directory](docs). From 79d8da2ca20decaf7b45b1e445e6bdbd81f1145d Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sun, 2 Aug 2020 16:58:28 +0200 Subject: [PATCH 306/374] Fix error introduced during merge --- nextflow_schema.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nextflow_schema.json b/nextflow_schema.json index 4283f0b..ecb71a3 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -148,7 +148,7 @@ "default": "master", "hidden": true, "fa_icon": "fas fa-users-cog", - "help_text": "Provide git commit id for custom Institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. 
Default: `master`.\n\n```bash\n## Download and use config file with following git commit id\n--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96\n```" + "help_text": "Provide git commit id for custom Institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. Default: `master`.\n\n```bash\n## Download and use config file with following git commit id\n--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96\n```", "fa_icon": "fas fa-users-cog" }, "custom_config_base": { From d79c5d5d67b19eb06e6b427307b577eedd13dc43 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Sun, 2 Aug 2020 17:08:45 +0200 Subject: [PATCH 307/374] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 673bc67..da0acd2 100644 --- a/README.md +++ b/README.md @@ -36,11 +36,11 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool nextflow run nf-core/proteomicslfq -profile --spectra '*.mzml' --database '*.fasta' --expdesign '*.tsv' ``` -See [usage docs](docs/usage.md) for all of the available options when running the pipeline. +See [usage docs](https://nf-co.re/proteomicslfq/usage) for all of the available options when running the pipeline. ## Documentation -The nf-core/proteomicslfq pipeline comes with documentation about the pipeline which you can read at [https://nf-co.re/proteomicslfq/docs](https://nf-co.re/proteomicslfq/docs) or find in the [`docs/` directory](docs). +The nf-core/proteomicslfq pipeline comes with documentation about the pipeline which you can read at [https://nf-co.re/proteomicslfq](https://nf-co.re/proteomicslfq) or partly find in the [`docs/` directory](docs). From 1a4fbec4c90d63202b045a358e32ab7f187f5b42 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Mon, 3 Aug 2020 18:26:42 +0200 Subject: [PATCH 308/374] Update README.md --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index da0acd2..475f55d 100644 --- a/README.md +++ b/README.md @@ -36,7 +36,8 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool nextflow run nf-core/proteomicslfq -profile --spectra '*.mzml' --database '*.fasta' --expdesign '*.tsv' ``` -See [usage docs](https://nf-co.re/proteomicslfq/usage) for all of the available options when running the pipeline. +See [usage docs](https://nf-co.re/proteomicslfq/usage) for all of the available options when running the pipeline. Or configure the pipeline via +[nf-core launch](https://nf-co.re/launch) from the web or the command line. ## Documentation From 9d29e227837ffeb81c23fd5af0cc110d88779ec6 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Wed, 12 Aug 2020 13:43:13 +0200 Subject: [PATCH 309/374] Try new openms conda channel --- environment.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/environment.yml b/environment.yml index 195a21c..4d59a7b 100644 --- a/environment.yml +++ b/environment.yml @@ -3,12 +3,12 @@ name: nf-core-proteomicslfq-1.0dev channels: - - bgruening + - openms - conda-forge - bioconda dependencies: # TODO fix versions for release (and also for develop, as soon as we have official nightly conda packages, with e.g. 
a date as version) - - bgruening::openms + - openms::openms=2.6.0pre # nightly version - bioconda::thermorawfileparser - bioconda::msgf_plus - bioconda::comet-ms From 72ccac2887ba2c61ac641a54336e60a1b7392082 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Sun, 16 Aug 2020 13:43:54 +0200 Subject: [PATCH 310/374] Modified Comet params to new definitions --- main.nf | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/main.nf b/main.nf index 4ed2a42..e8b0688 100644 --- a/main.nf +++ b/main.nf @@ -559,18 +559,18 @@ process search_engine_comet { // Note: This uses an arbitrary rule to decide if it was hi-res or low-res // and uses Comet's defaults for bin size, in case unsupported unit "ppm" was given. if (frag_tol.toDouble() < 50) { - bin_tol = "0.03" + bin_tol = "0.015" bin_offset = "0.0" inst = params.instrument ?: "high_res" } else { - bin_tol = "1.0005" + bin_tol = "0.50025" bin_offset = "0.4" inst = params.instrument ?: "low_res" } log.warn "The chosen search engine Comet does not support ppm fragment tolerances. We guessed a " + inst + " instrument and set the fragment_bin_tolerance to " + bin_tol } else { - bin_tol = frag_tol + bin_tol = frag_tol.toDouble() / 2.0 bin_offset = frag_tol.toDouble() < 0.1 ? "0.0" : "0.4" if (!params.instrument) { @@ -598,7 +598,7 @@ process search_engine_comet { -threads ${task.cpus} \\ -database "${database}" \\ -instrument ${inst} \\ - -allowed_missed_cleavages ${params.allowed_missed_cleavages} \\ + -missed_cleavages ${params.allowed_missed_cleavages} \\ -num_hits ${params.num_hits} \\ -num_enzyme_termini ${params.num_enzyme_termini} \\ -enzyme "${enzyme}" \\ @@ -608,7 +608,7 @@ process search_engine_comet { -max_variable_mods_in_peptide ${params.max_mods} \\ -precursor_mass_tolerance ${prec_tol} \\ -precursor_error_units ${prec_tol_unit} \\ - -fragment_bin_tolerance ${bin_tol} \\ + -fragment_mass_tolerance ${bin_tol} \\ -fragment_bin_offset ${bin_offset} \\ -debug ${params.db_debug} \\ > ${mzml_file.baseName}_comet.log From 3b9ccb30c682f5857150b7f0498214c4ee3cfe6c Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Sun, 16 Aug 2020 14:42:47 +0200 Subject: [PATCH 311/374] Adapt renamed percolator params --- main.nf | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/main.nf b/main.nf index e8b0688..547f441 100644 --- a/main.nf +++ b/main.nf @@ -88,9 +88,9 @@ def helpMessage() { --test_FDR False discovery rate threshold for evaluating best cross validation result and reported end result --percolator_fdr_level Level of FDR calculation ('peptide-level-fdrs' or 'psm-level-fdrs') --description_correct_features Description of correct features for Percolator (0, 1, 2, 4, 8, see Percolator retention time and calibration) - --generic-feature-set Use only generic (i.e. not search engine specific) features. Generating search engine specific + --generic_feature_set Use only generic (i.e. not search engine specific) features. Generating search engine specific features for common search engines by PSMFeatureExtractor will typically boost the identification rate significantly. - --subset-max-train Only train an SVM on a subset of PSMs, and use the resulting score vector to evaluate the other + --subset_max_train Only train an SVM on a subset of PSMs, and use the resulting score vector to evaluate the other PSMs. Recommended when analyzing huge numbers (>1 million) of PSMs. When set to 0, all PSMs are used for training as normal. --klammer Retention time features are calculated as in Klammer et al. 
instead of with Elude @@ -725,9 +725,9 @@ process percolator { -in ${id_file} \\ -out ${id_file.baseName}_perc.idXML \\ -threads ${task.cpus} \\ - -subset-max-train ${params.subset_max_train} \\ - -decoy-pattern ${params.decoy_affix} \\ - -post-processing-tdc \\ + -subset_max_train ${params.subset_max_train} \\ + -decoy_pattern ${params.decoy_affix} \\ + -post_processing_tdc \\ -score_type pep \\ > ${id_file.baseName}_percolator.log """ From bda2ee1135b39e04dc072a4cec32df48465adc34 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 18 Aug 2020 17:26:39 +0200 Subject: [PATCH 312/374] Use input parameter and decide mode based on extension. --- main.nf | 25 ++++++++----- nextflow.config | 4 +-- nextflow_schema.json | 86 +++++++++++++++++++++++++++----------------- 3 files changed, 70 insertions(+), 45 deletions(-) diff --git a/main.nf b/main.nf index 025a727..6da3f8d 100644 --- a/main.nf +++ b/main.nf @@ -20,13 +20,14 @@ def helpMessage() { nextflow run nf-core/proteomicslfq --spectra '*.mzML' --database '*.fasta' -profile docker Main arguments: - Either: - --sdrf Path to PRIDE Sample to data relation format file + --input Path/URI to PRIDE Sample to data relation format file (SDRF) OR path to input spectra as mzML or Thermo Raw + + For SDRF: --root_folder (Optional) If given, looks for the filenames in the SDRF in this folder, locally --local_input_type (Optional) If given and 'root_folder' was specified, it overwrites the filetype in the SDRF for local lookup and matches only the basename. - Or: - --spectra Path to input spectra as mzML or Thermo Raw - --expdesign Path to optional experimental design file (if not given, it assumes unfractionated, unrelated samples) + + For mzML/raw files: + --expdesign (Optional) Path to an experimental design file (if not given, it assumes unfractionated, unrelated samples) And: --database Path to input protein database as fasta @@ -193,10 +194,16 @@ ch_output_docs = file("$baseDir/docs/output.md", checkIfExists: true) ch_output_docs_images = file("$baseDir/docs/images/", checkIfExists: true) -// Validate inputs -if (!(params.spectra || params.sdrf) || (params.spectra && params.sdrf)) -{ - log.error "EITHER spectra data OR SDRF needs to be provided. Make sure you have used either of those options."; exit 1 +params.sdrf = "" +params.spectra = "" + +// Validate input +if (params.input.endsWith("sdrf") || params.input.endsWith("SDRF")) { + params.sdrf = params.input +} else if (params.input.endsWith("mzml") || params.input.endsWith("mzML") || params.input.endsWith("raw") || params.input.endsWith("RAW")) { + params.spectra = params.input +} else { + log.error "EITHER spectra data (mzML/raw) OR an SDRF needs to be provided as input."; exit 1 } params.database = params.database ?: { log.error "No protein database provided. Make sure you have used the '--database' option."; exit 1 }() diff --git a/nextflow.config b/nextflow.config index 1192027..760df48 100644 --- a/nextflow.config +++ b/nextflow.config @@ -12,11 +12,9 @@ params { input = 'foobarbaz' // Workflow flags - input = '' //TODO unused. Maybe we should allow SDRF and Raw and MzML as input and then pick the right mode from the file ending. 
- sdrf = '' + input = '' // the sdrf and spectra parameters are inferred from this one root_folder = '' local_input_type = '' - spectra = '' database = '' expdesign = '' diff --git a/nextflow_schema.json b/nextflow_schema.json index ecb71a3..0d2056f 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -16,9 +16,9 @@ "properties": { "input": { "type": "string", - "fa_icon": "fas fa-dna", - "description": "Input FastQ files.", - "help_text": "Use this to specify the location of your input FastQ files. For example:\n\n```bash\n--input 'path/to/data/sample_*_{1,2}.fastq'\n```\n\nPlease note the following requirements:\n\n1. The path must be enclosed in quotes\n2. The path must have at least one `*` wildcard character\n3. When using the pipeline with paired end data, the path must use `{1,2}` notation to specify read pairs.\n\nIf left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz`" + "fa_icon": "fas fa-vials", + "description": "URI/path to an [SDRF](https://github.com/bigbio/proteomics-metadata-standard/tree/master/annotated-projects) file **OR** globbing pattern for URIs/paths of mzML or Thermo RAW files", + "help_text": "The input to the pipeline can be specified in two **mutually exclusive** ways:\n - using a path or URI to a PRIDE Sample to Data Relation Format file (SDRF), e.g. as part of a submitted and\nannotated PRIDE experiment (see [here](https://github.com/bigbio/proteomics-metadata-standard/tree/master/annotated-projects) for examples). Input files will be downloaded and cached from the URIs specified in the SDRF file.\nAn OpenMS-style experimental design will be generated based on the factor columns of the SDRF. The settings for the\nfollowing parameters will currently be overwritten by the ones specified in the SDRF:\n\n * `fixed_mods`,\n * `variable_mods`,\n * `precursor_mass_tolerance`,\n * `precursor_mass_tolerance_unit`,\n * `fragment_mass_tolerance`,\n * `fragment_mass_tolerance_unit`,\n * `fragment_method`,\n * `enzyme`\n - by specifying globbing patterns to the input spectrum files in Thermo RAW or mzML format (e.g. `/data/experiment{1,2,3}_rep*.mzML`). An experimental design should be provided with the `expdesign` parameter." }, "outdir": { "type": "string", @@ -148,8 +148,7 @@ "default": "master", "hidden": true, "fa_icon": "fas fa-users-cog", - "help_text": "Provide git commit id for custom Institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. Default: `master`.\n\n```bash\n## Download and use config file with following git commit id\n--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96\n```", - "fa_icon": "fas fa-users-cog" + "help_text": "Provide git commit id for custom Institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. Default: `master`.\n\n```bash\n## Download and use config file with following git commit id\n--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96\n```" }, "custom_config_base": { "type": "string", @@ -188,15 +187,9 @@ "main_parameters__sdrf_": { "title": "Main parameters (SDRF)", "type": "object", - "description": "The input to the pipeline can be specified in two **mutually exclusive** ways:\nHere by using a path or URI to a PRIDE Sample to Data Relation Format file (SDRF), e.g. as part of a submitted and\nannotated PRIDE experiment (see [here](https://github.com/bigbio/proteomics-metadata-standard/tree/master/annotated-projects) for examples). 
Alternatively, you can use the [TSV options](#main_parameters__tsv_)",
+        "description": "In case your input was an SDRF file, the following optional parameters can be set.",
         "default": "",
         "properties": {
-            "sdrf": {
-                "type": "string",
-                "description": "The URI or path to the SDRF file",
-                "fa_icon": "fas fa-vials",
-                "help_text": "The URI or path to the SDRF file. Input files will be downloaded and cached from the URIs specified in the SDRF file.\nAn OpenMS-style experimental design will be generated based on the factor columns of the SDRF. The settings for the\nfollowing parameters will currently be overwritten by the ones specified in the SDRF:\n\n* `fixed_mods`,\n* `variable_mods`,\n* `precursor_mass_tolerance`,\n* `precursor_mass_tolerance_unit`,\n* `fragment_mass_tolerance`,\n* `fragment_mass_tolerance_unit`,\n* `fragment_method`,\n* `enzyme`"
-            },
             "root_folder": {
                 "type": "string",
                 "description": "Root folder in which the spectrum files specified in the SDRF are searched",
@@ -212,21 +205,16 @@
         },
         "fa_icon": "far fa-chart-bar"
     },
-    "main_parameters__tsv_": {
-        "title": "Main parameters (TSV)",
+    "main_parameters__spectra_files_": {
+        "title": "Main parameters (spectra files)",
         "type": "object",
-        "description": "The input to the pipeline can be specified in two **mutually exclusive** ways:\nHere by specifying globbing patterns to the input spectrum files in Thermo RAW or mzML format and a manual OpenMS-style experimental design file. Alternatively, you can use the [SDRF options](#main_parameters__sdrf_)",
+        "description": "In case your input was a globbing pattern to spectrum files in Thermo RAW or mzML format, you can specify a manual OpenMS-style experimental design file here.",
         "default": "",
         "properties": {
             "expdesign": {
                 "type": "string",
+                "description": "A tab-separated experimental design file in OpenMS' own format (TODO link). All input files need to be present as a row with exactly the same names. If no design is given, unrelated, unfractionated runs are assumed.",
                 "fa_icon": "fas fa-file-csv"
-            },
-            "spectra": {
-                "type": "string",
-                "description": "Location of mzML or Thermo RAW files",
-                "fa_icon": "fas fa-copy",
-                "help_text": "Use this to specify the location of your input mzML or Thermo RAW files:\n\n```bash\n--spectra 'path/to/data/*.mzML'\n```\n\nor\n\n```bash\n--spectra 'path/to/data/*.raw'\n```\n\nPlease note the following requirements:\n\n1. The path must be enclosed in quotes\n2. The path must have at least one `*` wildcard character"
             }
         },
         "fa_icon": "far fa-chart-bar"
@@ -244,7 +232,7 @@
                 "help_text": "Since the database is not included in an SDRF, this parameter always needs to be given to specify the input protein database\nwhen you run the pipeline. Remember to include contaminants (and decoys if not added in the pipeline with \\-\\-add-decoys)\n\n```bash\n--database '[path to Fasta protein database]'\n```"
             },
             "add_decoys": {
-                "type": "string",
+                "type": "boolean",
                 "description": "Generate and append decoys to the given protein database",
                 "fa_icon": "fas fa-coins",
                 "help_text": "If decoys were not yet included in the input database, they have to be appended by OpenMS DecoyGenerator by adding this flag (TODO allow specifying type).\nDefault: pseudo-reverse peptides"
@@ -343,7 +331,11 @@
                 "type": "string",
                 "description": "Precursor mass tolerance unit used for database search. 
Possible values are 'ppm' (default) and 'Da'.", "default": "ppm", - "fa_icon": "fas fa-sliders-h" + "fa_icon": "fas fa-sliders-h", + "enum": [ + "Da", + "ppm" + ] }, "fragment_mass_tolerance": { "type": "number", @@ -357,7 +349,11 @@ "description": "Fragment mass tolerance unit used for database search. Possible values are 'ppm' (default) and 'Da'.", "default": "Da", "fa_icon": "fas fa-list-ol", - "help_text": "Caution: for Comet we are estimating the `fragment_bin_tolerance` parameter based on this automatically." + "help_text": "Caution: for Comet we are estimating the `fragment_bin_tolerance` parameter based on this automatically.", + "enum": [ + "Da", + "ppm" + ] }, "fixed_mods": { "type": "string", @@ -451,7 +447,7 @@ "default": "", "properties": { "enable_mod_localization": { - "type": "string", + "type": "boolean", "description": "Turn the mechanism on.", "fa_icon": "fas fa-toggle-on" }, @@ -494,13 +490,21 @@ "type": "string", "description": "Do not fail if there are some unmatched peptides. Only activate as last resort, if you know that the rest of your settings are fine!", "default": "false", - "fa_icon": "far fa-check-square" + "fa_icon": "far fa-check-square", + "enum": [ + "false", + "true" + ] }, "IL_equivalent": { "type": "string", "description": "Should isoleucine and leucine be treated interchangeably when mapping search engine hits to the database? Default: true", "default": "true", - "fa_icon": "far fa-check-square" + "fa_icon": "far fa-check-square", + "enum": [ + "true", + "false" + ] } }, "fa_icon": "fas fa-project-diagram" @@ -514,7 +518,12 @@ "posterior_probabilities": { "type": "string", "description": "How to calculate posterior probabilities for PSMs:\n\n* 'percolator' = Re-score based on PSM-feature-based SVM and transform distance\n to hyperplane for posteriors\n* 'fit_distributions' = Fit positive and negative distributions to scores\n (similar to PeptideProphet)", - "fa_icon": "fas fa-list-ol" + "fa_icon": "fas fa-list-ol", + "default": "percolator", + "enum": [ + "percolator", + "fit_distributions" + ] }, "psm_pep_fdr_cutoff": { "type": "number", @@ -540,7 +549,11 @@ "type": "string", "description": "Calculate FDR on PSM ('psm-level-fdrs') or peptide level ('peptide-level-fdrs')?", "default": "peptide-level-fdrs", - "fa_icon": "fas fa-list-ol" + "fa_icon": "fas fa-list-ol", + "enum": [ + "peptide-level-fdrs", + "psm-level-fdrs" + ] }, "train_FDR": { "type": "number", @@ -585,7 +598,13 @@ "type": "string", "description": "How to handle outliers during fitting:\n\n* ignore_iqr_outliers (default): ignore outliers outside of `3*IQR` from Q1/Q3 for fitting\n* set_iqr_to_closest_valid: set IQR-based outliers to the last valid value for fitting\n* ignore_extreme_percentiles: ignore everything outside 99th and 1st percentile (also removes equal values like potential censored max values in XTandem)\n* none: do nothing", "default": "none", - "fa_icon": "fas fa-list-ol" + "fa_icon": "fas fa-list-ol", + "enum": [ + "none", + "ignore_iqr_outliers", + "set_iqr_to_closest_valid", + "ignore_extreme_percentiles" + ] } }, "fa_icon": "far fa-star-half" @@ -671,7 +690,8 @@ "feature_intensity", "spectral_counting" ], - "fa_icon": "fas fa-list-ol" + "fa_icon": "fas fa-list-ol", + "hidden": true }, "mass_recalibration": { "type": "boolean", @@ -758,7 +778,7 @@ "$ref": "#/definitions/main_parameters__sdrf_" }, { - "$ref": "#/definitions/main_parameters__tsv_" + "$ref": "#/definitions/main_parameters__spectra_files_" }, { "$ref": "#/definitions/protein_database" @@ -800,4 +820,4 @@ 
"$ref": "#/definitions/quality_control" } ] -} +} \ No newline at end of file From 55e2397c3bf79c67484ced30db07ed881ed0bcb4 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 18 Aug 2020 17:38:20 +0200 Subject: [PATCH 313/374] change tests accordingly. Use toLower for extension check --- conf/test.config | 2 +- conf/test_full.config | 2 +- conf/test_localize.config | 2 +- main.nf | 4 ++-- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/conf/test.config b/conf/test.config index 901ab90..eb847b9 100644 --- a/conf/test.config +++ b/conf/test.config @@ -17,7 +17,7 @@ params { max_time = 1.h // Input data - spectra = [ + input = [ 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA1_F1.mzML', 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA1_F2.mzML', 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA2_F1.mzML', diff --git a/conf/test_full.config b/conf/test_full.config index e3bf16f..1c64a05 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -14,7 +14,7 @@ params { config_profile_description = 'Real-world sized test dataset to check pipeline function and sanity of results' // Input data - spectra = [ + input = [ 'ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/12/PXD001819/UPS1_2500amol_R1.raw', 'ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/12/PXD001819/UPS1_2500amol_R2.raw', 'ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/12/PXD001819/UPS1_2500amol_R3.raw', diff --git a/conf/test_localize.config b/conf/test_localize.config index d846bb6..4e3e7cd 100644 --- a/conf/test_localize.config +++ b/conf/test_localize.config @@ -18,7 +18,7 @@ params { max_time = 1.h // Input data - spectra = [ + input = [ 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA1_F1.mzML', 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA1_F2.mzML', 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA2_F1.mzML', diff --git a/main.nf b/main.nf index 6da3f8d..218ce2d 100644 --- a/main.nf +++ b/main.nf @@ -198,9 +198,9 @@ params.sdrf = "" params.spectra = "" // Validate input -if (params.input.endsWith("sdrf") || params.input.endsWith("SDRF")) { +if (params.input.toLowerCase().endsWith("sdrf")) { params.sdrf = params.input -} else if (params.input.endsWith("mzml") || params.input.endsWith("mzML") || params.input.endsWith("raw") || params.input.endsWith("RAW")) { +} else if (params.input.toLowerCase().endsWith("mzml") || params.input.toLowerCase().endsWith("raw")) { params.spectra = params.input } else { log.error "EITHER spectra data (mzML/raw) OR an SDRF needs to be provided as input."; exit 1 From ba825e6f62ac27499ecbefb2583f391704e7c037 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 18 Aug 2020 17:50:13 +0200 Subject: [PATCH 314/374] Try as if input is always a list --- main.nf | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/main.nf b/main.nf index 218ce2d..868f8c7 100644 --- a/main.nf +++ b/main.nf @@ -194,13 +194,13 @@ ch_output_docs = file("$baseDir/docs/output.md", checkIfExists: true) ch_output_docs_images = file("$baseDir/docs/images/", checkIfExists: true) -params.sdrf = "" -params.spectra = "" +params.sdrf = [] +params.spectra = [] // Validate input -if (params.input.toLowerCase().endsWith("sdrf")) { +if (params.input[0].toLowerCase().endsWith("sdrf")) { params.sdrf = params.input -} else if (params.input.toLowerCase().endsWith("mzml") || 
params.input.toLowerCase().endsWith("raw")) { +} else if (params.input[0].toLowerCase().endsWith("mzml") || params.input[0].toLowerCase().endsWith("raw")) { params.spectra = params.input } else { log.error "EITHER spectra data (mzML/raw) OR an SDRF needs to be provided as input."; exit 1 From 4601ac9e27efdd4730330ff21da5e356d2c0abee Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 18 Aug 2020 17:54:53 +0200 Subject: [PATCH 315/374] Implement sugestion --- main.nf | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/main.nf b/main.nf index 868f8c7..3bd3b4d 100644 --- a/main.nf +++ b/main.nf @@ -193,14 +193,17 @@ if (workflow.profile.contains('awsbatch')) { ch_output_docs = file("$baseDir/docs/output.md", checkIfExists: true) ch_output_docs_images = file("$baseDir/docs/images/", checkIfExists: true) - -params.sdrf = [] -params.spectra = [] - // Validate input -if (params.input[0].toLowerCase().endsWith("sdrf")) { +if (isCollectionOrArray(params.input)) +{ + tocheck = params.input[0] +} else { + tocheck = params.input +} + +if (tocheck.toLowerCase().endsWith("sdrf")) { params.sdrf = params.input -} else if (params.input[0].toLowerCase().endsWith("mzml") || params.input[0].toLowerCase().endsWith("raw")) { +} else if (tocheck.toLowerCase().endsWith("mzml") || tocheck.toLowerCase().endsWith("raw")) { params.spectra = params.input } else { log.error "EITHER spectra data (mzML/raw) OR an SDRF needs to be provided as input."; exit 1 @@ -1381,3 +1384,7 @@ def checkHostname() { def hasExtension(it, extension) { it.toString().toLowerCase().endsWith(extension.toLowerCase()) } + +boolean isCollectionOrArray(object) { + [Collection, Object[]].any { it.isAssignableFrom(object.getClass()) } +} From fb78319aef95af8f3d5ea17c84dc90f2cc963de8 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 18 Aug 2020 18:26:21 +0200 Subject: [PATCH 316/374] Last changes. Avoid equally named params and param groups. --- nextflow_schema.json | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/nextflow_schema.json b/nextflow_schema.json index 0d2056f..01c4e22 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -642,8 +642,8 @@ }, "fa_icon": "fas fa-code-branch" }, - "protein_inference": { - "title": "Protein inference", + "protein_inference_": { + "title": "Protein inference ", "type": "object", "description": "To group proteins, calculate scores on the protein (group) level and to potentially modify associations from peptides to proteins.", "default": "", @@ -653,7 +653,11 @@ "description": "The inference method to use. 'aggregation' (default) or 'bayesian'.", "default": "aggregation", "fa_icon": "fas fa-list-ol", - "help_text": "Infer proteins through:\n\n* 'aggregation' = aggregates all peptide scores across a protein (by calculating the maximum) (default)\n* 'bayesian' = computes a posterior probability for every protein based on a Bayesian network (i.e. using Epifany)\n* ('percolator' not yet supported)" + "help_text": "Infer proteins through:\n\n* 'aggregation' = aggregates all peptide scores across a protein (by calculating the maximum) (default)\n* 'bayesian' = compute a posterior probability for every protein based on a Bayesian network (i.e. using Epifany)\n* ('percolator' not yet supported)\n\n**Note:** If protein grouping is performed also depends on the `protein_quant` parameter (i.e. 
if peptides have to be unique or unique to a group only)", + "enum": [ + "aggregation", + "bayesian" + ] }, "protein_level_fdr_cutoff": { "type": "number", @@ -808,7 +812,7 @@ "$ref": "#/definitions/consensus_id" }, { - "$ref": "#/definitions/protein_inference" + "$ref": "#/definitions/protein_inference_" }, { "$ref": "#/definitions/protein_quantification" From db5e50df0c7eec58b4177af9d6452d11b7bd50e2 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 18 Aug 2020 18:45:28 +0200 Subject: [PATCH 317/374] Update Readme for recent param changes --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 475f55d..aec78dd 100644 --- a/README.md +++ b/README.md @@ -33,7 +33,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool ```bash - nextflow run nf-core/proteomicslfq -profile --spectra '*.mzml' --database '*.fasta' --expdesign '*.tsv' + nextflow run nf-core/proteomicslfq -profile --input '*.mzml' --database 'myProteinDB.fasta' --expdesign 'myDesign.tsv' ``` See [usage docs](https://nf-co.re/proteomicslfq/usage) for all of the available options when running the pipeline. Or configure the pipeline via From 2585bbaa096dcad3b4217d59b104696755309e6e Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 18 Aug 2020 18:47:29 +0200 Subject: [PATCH 318/374] Reformat Readme codeboxes for websites --- README.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index aec78dd..5cbb30a 100644 --- a/README.md +++ b/README.md @@ -33,7 +33,11 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool ```bash - nextflow run nf-core/proteomicslfq -profile --input '*.mzml' --database 'myProteinDB.fasta' --expdesign 'myDesign.tsv' + nextflow run nf-core/proteomicslfq \ + -profile \ + --input '*.mzml' \ + --database 'myProteinDB.fasta' \ + --expdesign 'myDesign.tsv' ``` See [usage docs](https://nf-co.re/proteomicslfq/usage) for all of the available options when running the pipeline. Or configure the pipeline via From cdfcd0dd1e989c50d201720c35189487c9ae266d Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 19 Aug 2020 11:30:52 +0200 Subject: [PATCH 319/374] Allow tsv for sdrf extension --- main.nf | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/main.nf b/main.nf index b15fa32..1e90ac7 100644 --- a/main.nf +++ b/main.nf @@ -201,10 +201,10 @@ if (isCollectionOrArray(params.input)) tocheck = params.input } -if (tocheck.toLowerCase().endsWith("sdrf")) { - params.sdrf = params.input +if (tocheck.toLowerCase().endsWith("sdrf") || tocheck.toLowerCase().endsWith("tsv")) { + sdrf_file = params.input } else if (tocheck.toLowerCase().endsWith("mzml") || tocheck.toLowerCase().endsWith("raw")) { - params.spectra = params.input + spectra_files = params.input } else { log.error "EITHER spectra data (mzML/raw) OR an SDRF needs to be provided as input."; exit 1 } @@ -219,9 +219,9 @@ params.outdir = params.outdir ?: { log.warn "No output directory provided. 
Will //Filename FixedModifications VariableModifications Label PrecursorMassTolerance PrecursorMassToleranceUnit FragmentMassTolerance DissociationMethod Enzyme -if (!params.sdrf) +if (!sdrf_file) { - ch_spectra = Channel.fromPath(params.spectra, checkIfExists: true) + ch_spectra = Channel.fromPath(spectra_files, checkIfExists: true) ch_spectra .multiMap{ it -> id = it.toString().md5() comet_settings: msgf_settings: tuple(id, @@ -244,7 +244,7 @@ if (!params.sdrf) } else { - ch_sdrf = Channel.fromPath(params.sdrf, checkIfExists: true) + ch_sdrf = Channel.fromPath(sdrf_file, checkIfExists: true) /* * STEP 0 - SDRF parsing */ @@ -260,7 +260,7 @@ else file "openms.tsv" into ch_sdrf_config_file when: - params.sdrf + sdrf_file script: """ From 56c93dfd522399527a436068e43a04ca393ff674 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 19 Aug 2020 11:51:18 +0200 Subject: [PATCH 320/374] Init sdrf variable before --- main.nf | 2 ++ 1 file changed, 2 insertions(+) diff --git a/main.nf b/main.nf index 1e90ac7..b578dd4 100644 --- a/main.nf +++ b/main.nf @@ -201,6 +201,8 @@ if (isCollectionOrArray(params.input)) tocheck = params.input } +sdrf_file = null + if (tocheck.toLowerCase().endsWith("sdrf") || tocheck.toLowerCase().endsWith("tsv")) { sdrf_file = params.input } else if (tocheck.toLowerCase().endsWith("mzml") || tocheck.toLowerCase().endsWith("raw")) { From 8768b4f86effef04b225eb7291da7ecc5fb632a0 Mon Sep 17 00:00:00 2001 From: Gisela Gabernet Garriga Date: Mon, 24 Aug 2020 09:40:25 +0200 Subject: [PATCH 321/374] fix awstest --- .github/workflows/awstest.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml index 4f3e0fc..21fb8b7 100644 --- a/.github/workflows/awstest.yml +++ b/.github/workflows/awstest.yml @@ -37,4 +37,4 @@ jobs: --job-name nf-core-proteomicslfq \ --job-queue $AWS_JOB_QUEUE \ --job-definition $AWS_JOB_DEFINITION \ - --container-overrides '{"command": ["nf-core/proteomicslfq", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/proteomicslfq/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/proteomicslfq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}'proteomicslfq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' + --container-overrides '{"command": ["nf-core/proteomicslfq", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/proteomicslfq/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/proteomicslfq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' From 6b7fe99352e6f00855d033cb7ae10b7d81a5550f Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Mon, 24 Aug 2020 15:15:20 +0200 Subject: [PATCH 322/374] Try new mzTab --- environment.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/environment.yml b/environment.yml index ff911f8..90b624f 100644 --- a/environment.yml +++ b/environment.yml @@ -7,7 +7,7 @@ channels: - bioconda dependencies: # TODO fix versions for release (and also for develop, as soon as we have official nightly conda packages, with e.g. 
a date as version) - - openms::openms=2.6.0pre # nightly version + - openms::openms=2.6.0pre # nightly version of 2.6 - bioconda::thermorawfileparser - bioconda::msgf_plus - bioconda::comet-ms From a1750153f94806e3ea705726cf3bd6771445aa54 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 25 Aug 2020 22:39:44 +0200 Subject: [PATCH 323/374] [FEATURE] create a folder per config in the artifact download for better inspection --- .github/workflows/ci.yml | 4 ++-- conf/test_localize.config | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 138d783..1570c88 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -63,7 +63,7 @@ jobs: sudo mv nextflow /usr/local/bin/ - name: Run pipeline with test data run: | - nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "$RUN_NAME" -profile $TEST_PROFILE,docker + nextflow run ${GITHUB_WORKSPACE} "$TOWER" -name "$RUN_NAME" -profile $TEST_PROFILE,docker --outdir ${TEST_PROFILE}_results - name: Gather failed logs if: failure() || cancelled() run: | @@ -81,7 +81,7 @@ jobs: name: Upload results with: name: results - path: results + path: ${TEST_PROFILE}_results - uses: actions/upload-artifact@v1 if: always() name: Upload log diff --git a/conf/test_localize.config b/conf/test_localize.config index df84f00..81159f0 100644 --- a/conf/test_localize.config +++ b/conf/test_localize.config @@ -22,5 +22,5 @@ params { database = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/phospho/pools_crap_targetdecoy.fasta' enable_mod_localization = true search_engines = 'comet,msgf' - luciphor_debug = 42 + enable_qc = true } \ No newline at end of file From 666b999e4319b885577fa5a704f362636a2c6716 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 25 Aug 2020 22:50:07 +0200 Subject: [PATCH 324/374] env syntax --- .github/workflows/ci.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 1570c88..0e13fce 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -81,7 +81,7 @@ jobs: name: Upload results with: name: results - path: ${TEST_PROFILE}_results + path: ${{ env.TEST_PROFILE }}_results - uses: actions/upload-artifact@v1 if: always() name: Upload log From dc28edb700d7890b3584f324108b6e2b0e417131 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Tue, 25 Aug 2020 23:55:19 +0200 Subject: [PATCH 325/374] create multiple artifacts --- .github/workflows/ci.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 0e13fce..335dd98 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -80,7 +80,7 @@ jobs: if: always() name: Upload results with: - name: results + name: ${{ env.TEST_PROFILE }}_results path: ${{ env.TEST_PROFILE }}_results - uses: actions/upload-artifact@v1 if: always() From 324a8009bdc0d0d45a17bd09416356cc0e7da5c9 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 26 Aug 2020 11:13:55 +0200 Subject: [PATCH 326/374] try adding the MSstats output parser again --- bin/msstats_plfq.R | 59 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+) diff --git a/bin/msstats_plfq.R b/bin/msstats_plfq.R index 4db657d..b370aa0 100755 --- a/bin/msstats_plfq.R +++ b/bin/msstats_plfq.R @@ -93,6 +93,7 @@ if (length(lvls) == 1) groupComparisonPlots(data=test.MSstats$Volcano, type="VolcanoPlot", width=12, height=12,dot.size = 2,ylimUp = 7) 
+ # Otherwise it fails since the behaviour is undefined if (nrow(constrast_mat) > 1) { groupComparisonPlots(data=test.MSstats$ComparisonResult, type="Heatmap", @@ -107,4 +108,62 @@ if (length(lvls) == 1) # address=F) # # try to plot all comparisons #} + + + # annotate how often the protein was quantified in each condition (NA values introduced by merge of completely missing are set to 1.0) + ############ also calculate missingness on condition level + + # input: ProcessedData matrix of MSstats + # output: + # calculate fraction of na in condition (per protein) + # Groups: PROTEIN [762] + # PROTEIN `1` `2` + # + # 1 sp|A1ANS1|HTPG_PELPD 0 0.5 + # 2 sp|A2I7N3|SPA37_BOVIN 0 0.5 + # 3 sp|A2VDF0|FUCM_HUMAN 0 0.5 + # 4 sp|A6ND91|ASPD_HUMAN 0.5 0.5 + # 5 sp|A7E3W2|LG3BP_BOVIN 0.5 0.5 + # 6 sp|B8FGT4|ATPB_DESAA 0 0.5 + + getMissingInCondition <- function(processedData) + { + p <- processedData + + # count number of samples per condition + n_samples = p %>% group_by(GROUP) %>% summarize(n_samples = length(unique((as.numeric(SUBJECT))))) + + p <- p %>% + filter(!is.na(INTENSITY)) %>% # remove rows with INTENSITY=NA + select(PROTEIN, GROUP, SUBJECT) %>% + distinct() %>% + group_by(PROTEIN, GROUP) %>% + summarize(non_na = n()) # count non-NA values for this protein and condition + + p <- left_join(p, n_samples) %>% + mutate(missingInCondition = 1 - non_na/n_samples) # calculate fraction of missing values in condition + + # create one column for every condition containing the missingness + p <- spread(data = p[,c("PROTEIN", "GROUP", "missingInCondition")], key = GROUP, value = missingInCondition) + return(p) + } + + print ("WTH") + mic <- getMissingInCondition(processed.quant$ProcessedData) + + test.MSstats$ComparisonResult <- merge(x=test.MSstats$ComparisonResult, y=mic, by.x="Protein", by.y="PROTEIN") + print ("WTH") + commoncols <- intersect(colnames(mic), colnames(test.MSstats$ComparisonResult)) + print ("WTH") + test.MSstats$ComparisonResult[, commoncols]<-test.MSstats$ComparisonResult %>% select(commoncols) %>% mutate_all(funs(replace(., is.na(.), 1))) + print ("WTH") + #write comparison to CSV (one CSV per contrast) + writeComparisonToCSV <- function(DF) + { + write.table(DF, file=paste0("comparison_",unique(DF$Label),".csv"), quote=FALSE, sep='\t', row.names = FALSE) + return(DF) + } + print ("WTH") + test.MSstats$ComparisonResult %>% group_by(Label) %>% do(writeComparisonToCSV(as.data.frame(.))) + } From 4b67c1e7651826dc4a4bf58d9e3390beeb50cc8a Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 26 Aug 2020 14:01:08 +0200 Subject: [PATCH 327/374] Fixed the script --- bin/msstats_plfq.R | 191 ++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 171 insertions(+), 20 deletions(-) diff --git a/bin/msstats_plfq.R b/bin/msstats_plfq.R index b370aa0..43a1081 100755 --- a/bin/msstats_plfq.R +++ b/bin/msstats_plfq.R @@ -1,27 +1,40 @@ #!/usr/bin/env Rscript args = commandArgs(trailingOnly=TRUE) -if (length(args)==0) { - stop("At least one argument must be supplied (input file).n", call.=FALSE) -} -if (length(args)<=1) { - # contrasts - args[2] = "pairwise" +usage <- "Rscript msstats_plfq.R input.csv input.mztab [list of contrasts or 'pairwise'] [default control condition or ''] [output prefix]" +if (length(args)<2) { + print(usage) + stop("At least the first two arguments must be supplied (input csv and input mzTab).n", call.=FALSE) } if (length(args)<=2) { - # default control condition - args[3] = "" + # contrasts + args[3] = "pairwise" } if (length(args)<=3) { + # default control 
condition
+  args[4] = ""
+}
+if (length(args)<=4) {
   # default output prefix
-  args[4] = "out"
+  args[5] = "msstats"
 }
 
+csv_input <- args[1]
+mzTab_input <- args[2]
+contrast_str <- args[3]
+control_str <- args[4]
+out_prefix <- args[5]
+folder <- dirname(mzTab_input)
+filename <- basename(mzTab_input)
+mzTab_output <- paste0(folder,'/',out_prefix,filename)
+
 # load the MSstats library
 require(MSstats)
+require(dplyr)
+require(tidyr)
 
 # read dataframe into MSstats
-data <- read.csv(args[1])
+data <- read.csv(csv_input)
 quant <- OpenMStoMSstatsFormat(data,
                                removeProtein_with1Feature = FALSE)
@@ -33,9 +46,9 @@ if (length(lvls) == 1)
 {
   print("Only one condition found. No contrasts to be tested. If this is not the case, please check your experimental design.")
 } else {
-  if (args[2] == "pairwise")
+  if (contrast_str == "pairwise")
   {
-    if (args[3] == "")
+    if (control_str == "")
     {
       l <- length(lvls)
       contrast_mat <- matrix(nrow = l * (l-1) / 2, ncol = l)
@@ -55,7 +68,7 @@ if (length(lvls) == 1)
       }
     }
   } else {
-    control <- which(as.character(lvls) == args[3])
+    control <- which(as.character(lvls) == control_str)
     if (length(control) == 0)
     {
       stop("Control condition not part of found levels.n", call.=FALSE)
@@ -76,6 +89,9 @@ if (length(lvls) == 1)
         c <- c+1
       }
     }
+  } else {
+    print("Specific contrasts not supported yet.")
+    quit(save="no", status=1) # abort with a non-zero status; base R has no exit()
+  }
 
   print ("Contrasts to be tested:")
@@ -94,7 +110,7 @@ if (length(lvls) == 1)
   test.MSstats$Volcano = test.MSstats$ComparisonResult[!is.na(test.MSstats$ComparisonResult$pvalue),]
   groupComparisonPlots(data=test.MSstats$Volcano, type="VolcanoPlot",
                        width=12, height=12,dot.size = 2,ylimUp = 7)
-
+  
   # Otherwise it fails since the behaviour is undefined
   if (nrow(contrast_mat) > 1)
   {
     groupComparisonPlots(data=test.MSstats$ComparisonResult, type="Heatmap",
                         width=12, height=12,dot.size = 2,ylimUp = 7)
@@ -148,22 +164,157 @@ if (length(lvls) == 1)
     return(p)
   }
 
-  print ("WTH")
   mic <- getMissingInCondition(processed.quant$ProcessedData)
 
   test.MSstats$ComparisonResult <- merge(x=test.MSstats$ComparisonResult, y=mic, by.x="Protein", by.y="PROTEIN")
-  print ("WTH")
+
   commoncols <- intersect(colnames(mic), colnames(test.MSstats$ComparisonResult))
-  print ("WTH")
-  test.MSstats$ComparisonResult[, commoncols]<-test.MSstats$ComparisonResult %>% select(commoncols) %>% mutate_all(funs(replace(., is.na(.), 1)))
-  print ("WTH")
+
+  test.MSstats$ComparisonResult[, commoncols]<-test.MSstats$ComparisonResult %>% select(all_of(commoncols)) %>% mutate_all(list(replace = function(x){replace(x, is.na(x), 1)}))
+
   #write comparison to CSV (one CSV per contrast)
   writeComparisonToCSV <- function(DF)
   {
     write.table(DF, file=paste0("comparison_",unique(DF$Label),".csv"), quote=FALSE, sep='\t', row.names = FALSE)
     return(DF)
   }
-  print ("WTH")
+
   test.MSstats$ComparisonResult %>% group_by(Label) %>% do(writeComparisonToCSV(as.data.frame(.)))
+  #replace quants in mzTab
+  ################# MzTab
+  # find start of the section
+  startSection <- function(file, section.identifier) {
+    data <- file(file, "r")
+    row = 0
+    while (TRUE) {
+      row = row + 1
+      line = readLines(data, n=1)
+      if (substr(line, 1, 3)==section.identifier) {
+        break
+      }
+    }
+    close(data)
+    return (row)
+  }
+
+  # find start of the mzTab section tables
+  MTD.first_row <- startSection(mzTab_input, "MTD")
+  PRT.first_row <- startSection(mzTab_input, "PRH")
+  PEP.first_row <- startSection(mzTab_input, "PEH")
+  PSM.first_row <- startSection(mzTab_input, "PSH")
+
+  # read entire mzTab and extract protein data
+  MTD <- read.table(mzTab_input, sep="\t",
+                    skip=MTD.first_row-1,
+                    nrows=PRT.first_row - MTD.first_row - 1 -1, # one extra empty line
+                    fill=TRUE,
+                    header=TRUE,
+                    quote="",
na.strings=c("null","NA"), + stringsAsFactors=FALSE, + check.names=FALSE) + + + PRT <- read.table(mzTab_input, sep="\t", + skip=PRT.first_row-1, + nrows=PEP.first_row - PRT.first_row - 1 -1, # one extra empty line + fill=TRUE, + header=TRUE, + quote="", + na.strings=c("null","NA"), + stringsAsFactors=FALSE, + check.names=FALSE) + + noquant <- as.logical(PRT[,"opt_global_result_type"] == 'protein_details') + PRT_skipped <- PRT[noquant,] + PRT <- PRT[!noquant,] + + PEP <- read.table(mzTab_input, sep="\t", + skip=PEP.first_row-1, + nrows=PSM.first_row - PEP.first_row - 1 - 1, # one extra empty line + fill=TRUE, + header=TRUE, + quote="", + na.strings=c("null","NA"), + stringsAsFactors=FALSE, + check.names=FALSE) + + PSM <- read.table(mzTab_input, sep="\t", + skip=PSM.first_row-1, + fill=TRUE, + header=TRUE, + quote="", + na.strings=c("null","NA"), + stringsAsFactors=FALSE, + check.names=FALSE) + + #### Insert quantification data from MSstats into PRT section + # first we create a run level protein table form MSstats output + # then we merge the values into the mzTab PRT table + + + # Input: MSstats RunLevelData + # Output: Run level quantification + # Create a run level protein table + # PROTEIN `1` `2` `3` `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15` `16` `17` `18` `19` `20` + # + # 1 sp|A1ANS1|HTPG_PELPD 24.2 24.9 22.8 25.3 24.7 22.9 24.6 25.1 24.0 22.1 25.0 24.3 23.6 NA NA NA NA NA NA NA + # 2 sp|A2I7N1|SPA35_BOVIN 22.9 23.6 22.4 23.8 23.4 NA 23.6 23.9 22.5 NA 23.7 23.5 22.5 22.5 23.0 23.0 22.6 22.2 22.1 22.8 + getRunLevelQuant <- function(runLevelData) + { + runlevel.long <- tibble(RUN=as.numeric(runLevelData$RUN), PROTEIN=runLevelData$Protein, INTENSITY=runLevelData$LogIntensities) + runlevel.wide <- spread(data = runlevel.long, key = RUN, value = INTENSITY) + return(runlevel.wide) + } + quant.runLevel=getRunLevelQuant(processed.quant$RunlevelData) + colnames(quant.runLevel)[1] = "accession" + + quant.runLevel$accession<-as.character(quant.runLevel$accession) + + for (col_nr in seq(from=2, to=length(colnames(quant.runLevel)))) + { + colnames(quant.runLevel)[col_nr]=(paste0("protein_abundance_assay[", colnames(quant.runLevel)[col_nr] , "]")) + } + + # TODO: check if assays in MzTab match to runs. Also true for fractionated data? + + # clear old quant values from ProteinQuantifier + PRT[,grepl( "protein_abundance_assay" , names(PRT))] = NA + PRT[,grepl( "protein_abundance_study_variable" , names(PRT))] = NA + + # merge in quant.runLevel values into PRT + PRT_assay_cols <- grepl("protein_abundance_assay", names(PRT)) + PRT_stdv_cols <- grepl("protein_abundance_study_variable", names(PRT)) + RL_assay_cols <- grepl("protein_abundance_assay", names(quant.runLevel)) + + for (acc in quant.runLevel$accession) + { + q<-which(quant.runLevel$accession==acc) + + # acc from MSstats might be a group e.g., "A;B" so + # we check the single leader protein in mzTab PRT$accession against both A and B + w<-which(PRT$accession %in% strsplit(acc, ";", fixed=TRUE)[[1]]) + + if (length(w) == 0) + { + # TODO: check why not all summarized protein accessions are in the mzTab. Minimum number of peptides/features different? 
+ print(paste("Warning: ", acc, " not in mzTab but reported by MSstats")) + } + else + { + PRT[w, PRT_assay_cols] <- quant.runLevel[q, RL_assay_cols] + PRT[w, PRT_stdv_cols] <- quant.runLevel[q, RL_assay_cols] # we currently store same data in stdv and assay column + } + } + + write.table(MTD, mzTab_output, sep = "\t", quote=FALSE, row.names = FALSE, na = "null") + write("",file=mzTab_output,append=TRUE) + suppressWarnings(write.table(PRT_skipped, mzTab_output, sep = "\t", quote=FALSE, row.names = FALSE, append=TRUE, na = "null")) + suppressWarnings(write.table(PRT, mzTab_output, sep = "\t", col.names=FALSE, quote=FALSE, row.names = FALSE, append=TRUE, na = "null")) + write("",file=mzTab_output,append=TRUE) + suppressWarnings(write.table(PEP, mzTab_output, sep = "\t", quote=FALSE, row.names = FALSE, append=TRUE, na = "null")) + write("",file=mzTab_output,append=TRUE) + suppressWarnings(write.table(PSM, mzTab_output, sep = "\t", quote=FALSE, row.names = FALSE, append=TRUE, na = "null")) + } From ba0e482ec3196ff56de0704501541b0d42780db8 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 26 Aug 2020 14:23:34 +0200 Subject: [PATCH 328/374] adapted main.nf for updated script --- main.nf | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/main.nf b/main.nf index b578dd4..1ffea60 100644 --- a/main.nf +++ b/main.nf @@ -1099,17 +1099,19 @@ process msstats { input: file csv from out_msstats + file mztab from out_mztab output: // The generation of the PDFs from MSstats are very unstable, especially with auto-contrasts. // And users can easily fix anything based on the csv and the included script -> make optional file "*.pdf" optional true file "*.csv" + file "*.mzTab" file "*.log" script: """ - msstats_plfq.R ${csv} > msstats.log || echo "Optional MSstats step failed. Please check logs and re-run or do a manual statistical analysis." + msstats_plfq.R ${csv} ${mztab} > msstats.log || echo "Optional MSstats step failed. Please check logs and re-run or do a manual statistical analysis." """ } From d7bffad263a7bb67b3ef363dabba6671ff2d07f9 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 26 Aug 2020 14:53:03 +0200 Subject: [PATCH 329/374] case --- main.nf | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/main.nf b/main.nf index 1ffea60..51182ca 100644 --- a/main.nf +++ b/main.nf @@ -1050,7 +1050,7 @@ process proteomicslfq { file fasta from plfq_in_db.mix(plfq_in_db_decoy) output: - file "out.mzTab" into out_mzTab + file "out.mzTab" into out_mztab_plfq, out_mztab_msstats file "out.consensusXML" into out_consensusXML file "out.csv" into out_msstats file "debug_mergedIDs.idXML" optional true @@ -1099,7 +1099,7 @@ process msstats { input: file csv from out_msstats - file mztab from out_mztab + file mztab from out_mztab_msstats output: // The generation of the PDFs from MSstats are very unstable, especially with auto-contrasts. @@ -1129,7 +1129,7 @@ process ptxqc { params.enable_qc input: - file mzTab from out_mzTab + file mzTab from out_mztab_plfq output: file "*.html" From 7e71168e7f11f9d4a7cef0ca273e00a354fbc7e2 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Wed, 26 Aug 2020 16:30:08 +0200 Subject: [PATCH 330/374] make mztab output optional as well. MSstats might fail for very small datasets. 
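As a reading aid for the contrast-matrix construction in the `msstats_plfq.R` changes above: the pairwise branch allocates `l * (l-1) / 2` rows, one per pair of conditions, while the control branch emits one row per non-control condition against the control. For three hypothetical conditions A, B and C, the pairwise matrix handed to MSstats would therefore look roughly like this (an illustrative sketch, with +1/-1 marking the two conditions compared in each row, not output of the script):

```latex
% Illustrative pairwise contrast matrix for three hypothetical conditions A, B, C
\begin{array}{l|rrr}
             & A & B  & C  \\ \hline
\text{A-B}   & 1 & -1 & 0  \\
\text{A-C}   & 1 & 0  & -1 \\
\text{B-C}   & 0 & 1  & -1
\end{array}
```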
--- bin/msstats_plfq.R | 2 +- main.nf | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/bin/msstats_plfq.R b/bin/msstats_plfq.R index 43a1081..85563f0 100755 --- a/bin/msstats_plfq.R +++ b/bin/msstats_plfq.R @@ -108,7 +108,7 @@ if (length(lvls) == 1) test.MSstats$Volcano = test.MSstats$ComparisonResult[!is.na(test.MSstats$ComparisonResult$pvalue),] groupComparisonPlots(data=test.MSstats$Volcano, type="VolcanoPlot", width=12, height=12,dot.size = 2,ylimUp = 7) - + # Otherwise it fails since the behaviour is undefined if (nrow(contrast_mat) > 1) { diff --git a/main.nf b/main.nf index 51182ca..cbf6c2a 100644 --- a/main.nf +++ b/main.nf @@ -1105,8 +1105,8 @@ process msstats { // The generation of the PDFs from MSstats are very unstable, especially with auto-contrasts. // And users can easily fix anything based on the csv and the included script -> make optional file "*.pdf" optional true + file "*.mzTab" optional true file "*.csv" - file "*.mzTab" file "*.log" script: From 4955c82ca5378970d02598107ca703887cf1ecf6 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Wed, 26 Aug 2020 21:53:02 +0200 Subject: [PATCH 331/374] Fix versions of tools --- environment.yml | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/environment.yml b/environment.yml index 90b624f..845bc04 100644 --- a/environment.yml +++ b/environment.yml @@ -8,11 +8,11 @@ channels: dependencies: # TODO fix versions for release (and also for develop, as soon as we have official nightly conda packages, with e.g. a date as version) - openms::openms=2.6.0pre # nightly version of 2.6 - - bioconda::thermorawfileparser - - bioconda::msgf_plus - - bioconda::comet-ms - - bioconda::luciphor2 - - bioconda::percolator + - bioconda::thermorawfileparser=1.2.3-1 + - bioconda::msgf_plus=2020.08.05-0 + - bioconda::comet-ms=2019015-0 + - bioconda::luciphor2=2020_04_03-0 + - bioconda::percolator=3.5-0 - bioconda::bioconductor-msstats=3.20.1 # will include R - bioconda::sdrf-pipelines=0.0.9 # for SDRF conversion - conda-forge::r-ptxqc=1.0.5 # for QC reports From 89a2e90b7134300d9ef7791bbed0b1a192a2dc41 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Wed, 26 Aug 2020 21:55:05 +0200 Subject: [PATCH 332/374] update percolator threads --- main.nf | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/main.nf b/main.nf index b578dd4..ebd8af7 100644 --- a/main.nf +++ b/main.nf @@ -718,9 +718,8 @@ process percolator { //TODO Actually it heavily depends on the subset_max_train option and the number of IDs // would be cool to get an estimate by parsing the number of IDs from previous tools. 
label 'process_medium' - //TODO The current percolator version only supports up to 3-fold CV so the following might make sense now - // but in the next version it will have nested CV - cpus { check_max( 3, 'cpus' ) } + //Since percolator 3.5 it allows for 27 parallel tasks + cpus { check_max( 27, 'cpus' ) } publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log' publishDir "${params.outdir}/raw_ids", mode: 'copy', pattern: '*.idXML' From da4177c87fd0ffb51a50b658c8ce7d8e3d0728fe Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Wed, 26 Aug 2020 21:56:53 +0200 Subject: [PATCH 333/374] Update environment.yml --- environment.yml | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/environment.yml b/environment.yml index 845bc04..08691b2 100644 --- a/environment.yml +++ b/environment.yml @@ -8,11 +8,11 @@ channels: dependencies: # TODO fix versions for release (and also for develop, as soon as we have official nightly conda packages, with e.g. a date as version) - openms::openms=2.6.0pre # nightly version of 2.6 - - bioconda::thermorawfileparser=1.2.3-1 - - bioconda::msgf_plus=2020.08.05-0 - - bioconda::comet-ms=2019015-0 - - bioconda::luciphor2=2020_04_03-0 - - bioconda::percolator=3.5-0 + - bioconda::thermorawfileparser=1.2.3 + - bioconda::msgf_plus=2020.08.05 + - bioconda::comet-ms=2019015 + - bioconda::luciphor2=2020_04_03 + - bioconda::percolator=3.5 - bioconda::bioconductor-msstats=3.20.1 # will include R - bioconda::sdrf-pipelines=0.0.9 # for SDRF conversion - conda-forge::r-ptxqc=1.0.5 # for QC reports From 374b0607b438f5426303fe6398493e5a16bc2c23 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Thu, 27 Aug 2020 12:35:50 +0200 Subject: [PATCH 334/374] Try to remove java pinning. Since we are not using the full openms-thirdparty, maybe there are no conflicts now? --- environment.yml | 1 - 1 file changed, 1 deletion(-) diff --git a/environment.yml b/environment.yml index 08691b2..475e2be 100644 --- a/environment.yml +++ b/environment.yml @@ -18,7 +18,6 @@ dependencies: - conda-forge::r-ptxqc=1.0.5 # for QC reports - conda-forge::xorg-libxt=1.2.0 # until this R fix is merged: https://github.com/conda-forge/r-base-feedstock/pull/128 - conda-forge::fonts-conda-ecosystem=1 # for the fonts in QC reports - - conda-forge::openjdk=8.* # pin java to 8 for MSGF (otherwise it somehow chooses 11) - conda-forge::python=3.8.1 - conda-forge::markdown=3.2.1 - conda-forge::pymdown-extensions=7.1 From 6d675edbe9828f58de206dd239e7f75d924d4e77 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 27 Aug 2020 17:28:14 +0200 Subject: [PATCH 335/374] fix some linter errors. 
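For context on the `cpus = { check_max( 27, 'cpus' ) }` directive in the Percolator change above: nf-core pipelines ship a `check_max` helper in `nextflow.config` that clamps per-process resource requests to the profile-wide ceilings, so the 27 requested here is only an upper bound. A condensed sketch of the helper's CPU branch, paraphrased from the standard nf-core template rather than copied from this repository (the real function also handles `memory` and `time`):

```groovy
// Simplified sketch of the nf-core check_max helper (CPU branch only).
// A request of 27 CPUs is capped at params.max_cpus, e.g. the 16 set in
// conf/base.config, so smaller machines can still schedule the task.
def check_max(obj, type) {
    if (type == 'cpus') {
        try {
            return Math.min(obj as int, params.max_cpus as int)
        } catch (all) {
            println "WARNING: max_cpus is not valid, using the original request: $obj"
            return obj
        }
    }
    return obj
}
```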
--- main.nf | 22 +++++++++++++++++++++- nextflow.config | 3 --- nextflow_schema.json | 12 +++++++++--- 3 files changed, 30 insertions(+), 7 deletions(-) diff --git a/main.nf b/main.nf index ebd8af7..dab1927 100644 --- a/main.nf +++ b/main.nf @@ -1129,7 +1129,7 @@ process ptxqc { file mzTab from out_mzTab output: - file "*.html" + file "*.html" into ch_ptxqc_report file "*.yaml" file "*.Rmd" file "*.pdf" @@ -1288,6 +1288,20 @@ workflow.onComplete { email_fields['summary']['Nextflow Build'] = workflow.nextflow.build email_fields['summary']['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp + // On success try attach the multiqc report + def mqc_report = null + try { + if (workflow.success) { + mqc_report = ch_ptxqc_report.getVal() + if (mqc_report.getClass() == ArrayList) { + log.warn "[nf-core/proteomicslfq] Found multiple reports from process 'ptxqc', will use only one" + mqc_report = mqc_report[0] + } + } + } catch (all) { + log.warn "[nf-core/proteomicslfq] Could not attach PTXQC report to summary email" + } + // Check if we are only sending emails on failure email_address = params.email if (!params.email && params.email_on_fail && !workflow.success) { @@ -1404,11 +1418,17 @@ def checkHostname() { } } + +//--------------------------------------------------------------- // +//---------------------- Utility functions --------------------- // +//--------------------------------------------------------------- // + // Check file extension def hasExtension(it, extension) { it.toString().toLowerCase().endsWith(extension.toLowerCase()) } +// Check class of an Object for "List" type boolean isCollectionOrArray(object) { [Collection, Object[]].any { it.isAssignableFrom(object.getClass()) } } diff --git a/nextflow.config b/nextflow.config index aa35c34..ab2a02a 100644 --- a/nextflow.config +++ b/nextflow.config @@ -109,15 +109,12 @@ params { // Boilerplate options name = false - multiqc_config = false email = false email_on_fail = false plaintext_email = false monochrome_logs = false help = false - igenomes_base = 's3://ngi-igenomes/igenomes/' tracedir = "${params.outdir}/pipeline_info" - igenomes_ignore = false custom_config_version = 'master' custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" hostnames = false diff --git a/nextflow_schema.json b/nextflow_schema.json index 01c4e22..745870c 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -23,7 +23,7 @@ "outdir": { "type": "string", "description": "The output directory where the results will be saved.", - "default": "./results", + "default": "./results", "fa_icon": "fas fa-folder-open" }, "email": { @@ -96,7 +96,7 @@ "tracedir": { "type": "string", "description": "Directory to keep pipeline Nextflow logs and reports.", - "default": "${params.outdir}/pipeline_info", + "default": "${params.outdir}/pipeline_info", "fa_icon": "fas fa-cogs", "hidden": true } @@ -153,7 +153,7 @@ "custom_config_base": { "type": "string", "description": "Base directory for Institutional configs.", - "default": "https://raw.githubusercontent.com/nf-core/configs/master", + "default": "https://raw.githubusercontent.com/nf-core/configs/master", "hidden": true, "help_text": "If you're running offline, nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell nextflow where to find them with the `custom_config_base` option. 
For example:\n\n```bash\n## Download and unzip the config files\ncd /path/to/my/configs\nwget https://github.com/nf-core/configs/archive/master.zip\nunzip master.zip\n\n## Run the pipeline\ncd /path/to/my/data\nnextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/\n```\n\n> Note that the nf-core/tools helper package has a `download` command to download all required pipeline files + singularity containers + institutional configs in one go for you, to make this process easier.", "fa_icon": "fas fa-users-cog" @@ -476,6 +476,12 @@ "fa_icon": "fas fa-font", "help_text": "For handling the neutral loss from a decoy sequence. The syntax for this is identical to that of the normal neutral losses given above except that the residue is always 'X'. Syntax: DECOY_NL = X - (default: '[X -H3PO4 -97.97690]')", "hidden": true + }, + "luciphor_debug": { + "type": "integer", + "fa_icon": "fas fa-bug", + "description": "Debug level for Luciphor step. Increase for verbose logging and keeping temp files.", + "hidden": true } }, "fa_icon": "fas fa-search-location" From 0d70f443757efdd6951f54eb8b6e148357230426 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 27 Aug 2020 17:31:55 +0200 Subject: [PATCH 336/374] fix some linter errors. --- nextflow.config | 5 ----- 1 file changed, 5 deletions(-) diff --git a/nextflow.config b/nextflow.config index ab2a02a..acd2124 100644 --- a/nextflow.config +++ b/nextflow.config @@ -167,11 +167,6 @@ profiles { dev { includeConfig 'conf/dev.config' } } -// Load igenomes.config if required -if (!params.igenomes_ignore) { - includeConfig 'conf/igenomes.config' -} - // Export these variables to prevent local Python/R libraries from conflicting with those in the container env { PYTHONNOUSERSITE = 1 From 502dd43faf252ac09843c0e2f6418f702f48aede Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Thu, 27 Aug 2020 17:37:56 +0200 Subject: [PATCH 337/374] fix some linter errors. --- conf/base.config | 1 - 1 file changed, 1 deletion(-) diff --git a/conf/base.config b/conf/base.config index 2f89b91..ad69cd0 100644 --- a/conf/base.config +++ b/conf/base.config @@ -59,5 +59,4 @@ params { max_memory = 128.GB max_cpus = 16 max_time = 240.h - igenomes_base = 's3://ngi-igenomes/igenomes/' } From ea944c1fa81e2ec5e879befc839ddaac91df993a Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Sat, 5 Sep 2020 17:21:43 +0200 Subject: [PATCH 338/374] Minor changes to Comet tolerance conversion --- main.nf | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/main.nf b/main.nf index 621655f..3a974da 100644 --- a/main.nf +++ b/main.nf @@ -580,27 +580,30 @@ process search_engine_comet { file "*.log" //TODO we currently ignore the activation_method param to leave the default "ALL" for max. compatibility + //Note: OpenMS CometAdapter will double the number that is passed to fragment_mass_tolerance to "convert" + // it to a fragment_bin_tolerance script: if (frag_tol_unit == "ppm") { // Note: This uses an arbitrary rule to decide if it was hi-res or low-res - // and uses Comet's defaults for bin size, in case unsupported unit "ppm" was given. + // and uses Comet's defaults for bin size (i.e. by passing 0.5*default to the Adapter), in case unsupported unit "ppm" was given. 
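// For intuition on the ppm heuristic below (illustrative values, assuming a fragment m/z of 1000 as a round example):
// absolute tolerance [Da] = m/z * ppm / 1e6, so 20 ppm corresponds to ~0.02 Da (high-res territory)
// while 500 ppm corresponds to ~0.5 Da (low-res), which motivates the 50 ppm cut-off.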
if (frag_tol.toDouble() < 50) { - bin_tol = "0.015" - bin_offset = "0.0" + bin_tol = 0.015 + bin_offset = 0.0 inst = params.instrument ?: "high_res" } else { - bin_tol = "0.50025" - bin_offset = "0.4" + bin_tol = 0.50025 + bin_offset = 0.4 inst = params.instrument ?: "low_res" } log.warn "The chosen search engine Comet does not support ppm fragment tolerances. We guessed a " + inst + " instrument and set the fragment_bin_tolerance to " + bin_tol } else { - bin_tol = frag_tol.toDouble() / 2.0 - bin_offset = frag_tol.toDouble() < 0.1 ? "0.0" : "0.4" + //TODO expose the fragment_bin_offset parameter of comet + bin_tol = frag_tol.toDouble() + bin_offset = bin_tol <= 0.05 ? 0.0 : 0.4 if (!params.instrument) { - inst = frag_tol.toDouble() < 0.1 ? "high_res" : "low_res" + inst = bin_tol <= 0.05 ? "high_res" : "low_res" } else { inst = params.instrument } From d6f03cef8c504d2420542bb1acc4c1dd1e7c7bc9 Mon Sep 17 00:00:00 2001 From: yperez Date: Mon, 7 Sep 2020 19:19:15 +0100 Subject: [PATCH 339/374] Conda create timeOut increased --- nextflow.config | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/nextflow.config b/nextflow.config index acd2124..4b49b3a 100644 --- a/nextflow.config +++ b/nextflow.config @@ -144,7 +144,10 @@ try { } profiles { - conda { process.conda = "$baseDir/environment.yml" } + conda { + process.conda = "$baseDir/environment.yml" + conda.createTimeout = '1 h' + } debug { process.beforeScript = 'echo $HOSTNAME' } docker { docker.enabled = true From 8124b70d29a38532d47fc39d8362d146ec13e831 Mon Sep 17 00:00:00 2001 From: yperez Date: Mon, 7 Sep 2020 19:26:01 +0100 Subject: [PATCH 340/374] quantification method exposed --- main.nf | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/main.nf b/main.nf index 621655f..67343fc 100644 --- a/main.nf +++ b/main.nf @@ -21,11 +21,11 @@ def helpMessage() { Main arguments: --input Path/URI to PRIDE Sample to data relation format file (SDRF) OR path to input spectra as mzML or Thermo Raw - - For SDRF: + + For SDRF: --root_folder (Optional) If given, looks for the filenames in the SDRF in this folder, locally --local_input_type (Optional) If given and 'root_folder' was specified, it overwrites the filetype in the SDRF for local lookup and matches only the basename. - + For mzML/raw files: --expdesign (Optional) Path to an experimental design file (if not given, it assumes unfractionated, unrelated samples) @@ -121,6 +121,7 @@ def helpMessage() { --protein_level_fdr_cutoff Protein level FDR cutoff (this affects and chooses the peptides used for quantification) Quantification: + --quantification_method Quantification method supported by proteomicslfq ('feature_intensity' or 'spectral_counting', default: 'feature_intensity') --transfer_ids Transfer IDs over aligned samples to increase # of quantifiable features (WARNING: increased memory consumption). (default: false) TODO must specify true or false --targeted_only Only ID based quantification. (default: true) TODO must specify true or false @@ -608,7 +609,7 @@ process search_engine_comet { // for consensusID the cutting rules need to be the same. So we adapt to the loosest rules from MSGF // TODO find another solution. In ProteomicsLFQ we re-run PeptideIndexer (remove??) and if we - // e.g. add XTandem, after running ConsensusID it will lose the auto-detection ability for the + // e.g. add XTandem, after running ConsensusID it will lose the auto-detection ability for the // XTandem specific rules. 
if (params.search_engines.contains("msgf")) { @@ -1431,6 +1432,6 @@ def hasExtension(it, extension) { } // Check class of an Object for "List" type -boolean isCollectionOrArray(object) { +boolean isCollectionOrArray(object) { [Collection, Object[]].any { it.isAssignableFrom(object.getClass()) } } From 053aa3daeb6a3d77f9663e9cab31356e7305f822 Mon Sep 17 00:00:00 2001 From: yperez Date: Mon, 7 Sep 2020 19:27:46 +0100 Subject: [PATCH 341/374] added LSF timeout error --- conf/base.config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/conf/base.config b/conf/base.config index ad69cd0..b642b42 100644 --- a/conf/base.config +++ b/conf/base.config @@ -16,7 +16,7 @@ process { memory = { check_max( 8.GB * task.attempt, 'memory' ) } time = { check_max( 4.h * task.attempt, 'time' ) } - errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' } + errorStrategy = { task.exitStatus in [140,143,137,104,134,139] ? 'retry' : 'finish' } maxRetries = 2 maxErrors = '-1' From fd3185cb28e855d4922f800ebec79f55890911db Mon Sep 17 00:00:00 2001 From: yperez Date: Mon, 7 Sep 2020 20:08:04 +0100 Subject: [PATCH 342/374] json updated --- nextflow_schema.json | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/nextflow_schema.json b/nextflow_schema.json index 745870c..dac7bf8 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -694,7 +694,7 @@ }, "quantification_method": { "type": "string", - "description": "Currently UNSUPPORTED in this workflow. Choose between feature-based quantification based on integrated MS1 signals ('feature_intensity'; default) or spectral counting of PSMs (spectral_counting).", + "description": "Choose between feature-based quantification based on integrated MS1 signals ('feature_intensity'; default) or spectral counting of PSMs (spectral_counting).", "default": "feature_intensity", "enum": [ "feature_intensity", @@ -830,4 +830,4 @@ "$ref": "#/definitions/quality_control" } ] -} \ No newline at end of file +} From d5f18d66be1c28eb0ec21cb11903619503f3bc69 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Mon, 7 Sep 2020 20:11:38 +0100 Subject: [PATCH 343/374] Update conf/base.config Co-authored-by: jpfeuffer --- conf/base.config | 1 + 1 file changed, 1 insertion(+) diff --git a/conf/base.config b/conf/base.config index b642b42..b2f1c4d 100644 --- a/conf/base.config +++ b/conf/base.config @@ -16,6 +16,7 @@ process { memory = { check_max( 8.GB * task.attempt, 'memory' ) } time = { check_max( 4.h * task.attempt, 'time' ) } + // nf-core defaults plus 140 (LSF out-of-time) errorStrategy = { task.exitStatus in [140,143,137,104,134,139] ? 
'retry' : 'finish' } maxRetries = 2 maxErrors = '-1' From 21d92ea4eb5a36b01ba23ad6394635bab96459e5 Mon Sep 17 00:00:00 2001 From: yperez Date: Mon, 7 Sep 2020 20:23:19 +0100 Subject: [PATCH 344/374] skip MSstats when spectral_counting in quantitation_method --- main.nf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/main.nf b/main.nf index 67343fc..d936316 100644 --- a/main.nf +++ b/main.nf @@ -1095,7 +1095,7 @@ process msstats { publishDir "${params.outdir}/msstats", mode: 'copy' when: - !params.skip_post_msstats + !params.skip_post_msstats && params.quantification_method == "feature_intensity" input: file csv from out_msstats From efece6c260ca6631afafd7a4f8f3ccac77567fdd Mon Sep 17 00:00:00 2001 From: yperez Date: Mon, 7 Sep 2020 20:25:08 +0100 Subject: [PATCH 345/374] skip MSstats when spectral_counting in quantitation_method --- nextflow_schema.json | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/nextflow_schema.json b/nextflow_schema.json index dac7bf8..9491864 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -700,8 +700,7 @@ "feature_intensity", "spectral_counting" ], - "fa_icon": "fas fa-list-ol", - "hidden": true + "fa_icon": "fas fa-list-ol" }, "mass_recalibration": { "type": "boolean", From f4671a40e3a0176278287a57f28482de6b81b61c Mon Sep 17 00:00:00 2001 From: yperez Date: Mon, 7 Sep 2020 20:31:35 +0100 Subject: [PATCH 346/374] Add test for spectral counting --- .github/workflows/ci.yml | 6 +++--- conf/test_speccount.config | 36 ++++++++++++++++++++++++++++++++++++ 2 files changed, 39 insertions(+), 3 deletions(-) create mode 100644 conf/test_speccount.config diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 335dd98..d877a36 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -21,12 +21,12 @@ jobs: matrix: # Nextflow versions: check pipeline minimum and current latest nxf_ver: ['20.01.0', ''] - test_profile: ['test', 'test_localize'] + test_profile: ['test', 'test_localize', 'test_speccount'] steps: - uses: actions/checkout@v2 - name: Determine tower usage shell: bash - run: echo "::set-env name=TOWER::`[ -z "$TOWER_ACCESS_TOKEN" ] && echo '' || echo '-with-tower'`" + run: echo "::set-env name=TOWER::`[ -z "$TOWER_ACCESS_TOKEN" ] && echo '' || echo '-with-tower'`" id: tower_usage - name: Extract branch name if: github.event_name == 'push' @@ -37,7 +37,7 @@ jobs: ref=${ref#"refs/"} ref=${ref//\//-} echo "::set-env name=RUN_NAME::$ref" - id: extract_branch + id: extract_branch - name: Extract PR number if: github.event_name == 'pull_request' shell: bash diff --git a/conf/test_speccount.config b/conf/test_speccount.config new file mode 100644 index 0000000..f517d3b --- /dev/null +++ b/conf/test_speccount.config @@ -0,0 +1,36 @@ +/* + * ------------------------------------------------- + * Nextflow config file for running tests + * ------------------------------------------------- + * Defines bundled input files and everything required + * to run a fast and simple test. 
Use as follows: + * nextflow run nf-core/proteomicslfq -profile test, + */ + +params { + config_profile_name = 'Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function' + + // Limit resources so that this can run on Travis + max_cpus = 2 + max_memory = 6.GB + max_time = 1.h + + // Input data + input = [ + 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA1_F1.mzML', + 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA1_F2.mzML', + 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA2_F1.mzML', + 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA2_F2.mzML', + 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA3_F1.mzML', + 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA3_F2.mzML' + ] + database = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/18Protein_SoCe_Tr_detergents_trace_target_decoy.fasta' + expdesign = 'https://raw.githubusercontent.com/nf-core/test-datasets/proteomicslfq/testdata/BSA_design.tsv' + posterior_probabilities = "fit_distributions" + quantification_method="spectral_counting" + search_engines = "msgf" + protein_level_fdr_cutoff = 1.0 + decoy_affix = "rev" + enable_qc = true +} From 99ec4ff0cd2465dfc95eb7a50d9eca299a3786db Mon Sep 17 00:00:00 2001 From: yperez Date: Mon, 7 Sep 2020 20:38:29 +0100 Subject: [PATCH 347/374] Add test for spectral counting --- nextflow.config | 1 + 1 file changed, 1 insertion(+) diff --git a/nextflow.config b/nextflow.config index 4b49b3a..1fafa0f 100644 --- a/nextflow.config +++ b/nextflow.config @@ -167,6 +167,7 @@ profiles { test { includeConfig 'conf/test.config' } test_localize { includeConfig 'conf/test_localize.config' } test_full { includeConfig 'conf/test_full.config' } + test_speccount { includeConfig 'conf/test_speccount.config' } dev { includeConfig 'conf/dev.config' } } From a19d0929f1ab63a6a53f11db7e80ea33705119fb Mon Sep 17 00:00:00 2001 From: yperez Date: Mon, 7 Sep 2020 20:42:13 +0100 Subject: [PATCH 348/374] Add test for spectral counting --- conf/test_speccount.config | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/conf/test_speccount.config b/conf/test_speccount.config index f517d3b..7049c5f 100644 --- a/conf/test_speccount.config +++ b/conf/test_speccount.config @@ -8,8 +8,8 @@ */ params { - config_profile_name = 'Test profile' - config_profile_description = 'Minimal test dataset to check pipeline function' + config_profile_name = 'Spectral Counting Quantification Method' + config_profile_description = 'Minimal test dataset to check pipeline function using spectral counting' // Limit resources so that this can run on Travis max_cpus = 2 From 0435839c05c10ad1fcf49b7b7c3ad94cfd88bfa9 Mon Sep 17 00:00:00 2001 From: yperez Date: Mon, 7 Sep 2020 22:01:07 +0100 Subject: [PATCH 349/374] Add test for spectral counting --- .github/workflows/ci.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index d877a36..d5d500f 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -68,7 +68,7 @@ jobs: if: failure() || cancelled() run: | mkdir failed_logs - failed=$(grep "FAILED" results/pipeline_info/execution_trace.txt | cut -f 2) + failed=$(grep "FAILED" ${TEST_PROFILE}_results/pipeline_info/execution_trace.txt | cut -f 2) while read -r line ; do cp $(ls 
work/${line}*/*.log) failed_logs/ | true ; done <<< "$failed" - uses: actions/upload-artifact@v1 if: failure() || cancelled() From 6896b5e68c4bbad3681995bdeda26105ee6f22e9 Mon Sep 17 00:00:00 2001 From: yperez Date: Mon, 7 Sep 2020 22:43:46 +0100 Subject: [PATCH 350/374] spectral counting conditional in proteomicslfq --- main.nf | 27 ++++++++++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/main.nf b/main.nf index d936316..41cc5d2 100644 --- a/main.nf +++ b/main.nf @@ -1062,8 +1062,28 @@ process proteomicslfq { file "*.log" script: - """ - ProteomicsLFQ -in ${(mzmls as List).join(' ')} \\ + if(params.quantification_method == "spectral_counting") + """ + ProteomicsLFQ -in ${(mzmls as List).join(' ')} \\ + -ids ${(id_files as List).join(' ')} \\ + -design ${expdes} \\ + -fasta ${fasta} \\ + -protein_inference ${params.protein_inference} \\ + -quantification_method ${params.quantification_method} \\ + -targeted_only ${params.targeted_only} \\ + -mass_recalibration ${params.mass_recalibration} \\ + -transfer_ids ${params.transfer_ids} \\ + -protein_quantification ${params.protein_quant} \\ + -out out.mzTab \\ + -threads ${task.cpus} \\ + -out_cxml out.consensusXML \\ + -proteinFDR ${params.protein_level_fdr_cutoff} \\ + -debug ${params.inf_quant_debug} \\ + > proteomicslfq.log + """ + else + """ + ProteomicsLFQ -in ${(mzmls as List).join(' ')} \\ -ids ${(id_files as List).join(' ')} \\ -design ${expdes} \\ -fasta ${fasta} \\ @@ -1080,7 +1100,8 @@ process proteomicslfq { -proteinFDR ${params.protein_level_fdr_cutoff} \\ -debug ${params.inf_quant_debug} \\ > proteomicslfq.log - """ + """ + } From 11ac18c87f81b4cf802356deecfdd7f5e52ee2b5 Mon Sep 17 00:00:00 2001 From: yperez Date: Tue, 8 Sep 2020 09:55:18 +0100 Subject: [PATCH 351/374] spectral counting conditional in proteomicslfq --- main.nf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/main.nf b/main.nf index 41cc5d2..e811eac 100644 --- a/main.nf +++ b/main.nf @@ -1052,7 +1052,7 @@ process proteomicslfq { output: file "out.mzTab" into out_mztab_plfq, out_mztab_msstats file "out.consensusXML" into out_consensusXML - file "out.csv" into out_msstats + file "out.csv" optional true into out_msstats file "debug_mergedIDs.idXML" optional true file "debug_mergedIDs_inference.idXML" optional true file "debug_mergedIDsGreedyResolved.idXML" optional true From 5ab4f87f8f12365e145f6f324ec40bb85b585ee1 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Tue, 8 Sep 2020 15:20:28 +0200 Subject: [PATCH 352/374] Remove upper limit for significance plots. I had examples where you were missing important proteins. 
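Stepping back briefly: patches 350/351 above and patch 353 below converge on a common Nextflow idiom for outputs that only exist under certain parameter settings, namely building the tool's argument string with a ternary and declaring the matching output `optional true`. A minimal standalone sketch of that idiom (process name, input file and the trimmed argument list are hypothetical, not the pipeline's full invocation):

```groovy
// Sketch: conditionally emitted tool argument plus an optional output.
// Without 'optional true' on out.csv, Nextflow would fail the task whenever
// the quantification method is spectral_counting and the file is never written.
process proteomicslfq_sketch {
    output:
    file "out.mzTab"
    file "out.csv" optional true   // only produced in feature_intensity mode

    script:
    def msstats_arg = params.quantification_method == "feature_intensity" ? "-out_msstats out.csv" : ""
    """
    ProteomicsLFQ -in in.mzML -out out.mzTab ${msstats_arg}
    """
}
```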
--- bin/msstats_plfq.R | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/bin/msstats_plfq.R b/bin/msstats_plfq.R index 85563f0..0338564 100755 --- a/bin/msstats_plfq.R +++ b/bin/msstats_plfq.R @@ -103,23 +103,23 @@ if (length(lvls) == 1) write.csv(test.MSstats$ComparisonResult, "msstats_results.csv") groupComparisonPlots(data=test.MSstats$ComparisonResult, type="ComparisonPlot", - width=12, height=12,dot.size = 2,ylimUp = 7) + width=12, height=12,dot.size = 2) test.MSstats$Volcano = test.MSstats$ComparisonResult[!is.na(test.MSstats$ComparisonResult$pvalue),] groupComparisonPlots(data=test.MSstats$Volcano, type="VolcanoPlot", - width=12, height=12,dot.size = 2,ylimUp = 7) + width=12, height=12,dot.size = 2) # Otherwise it fails since the behaviour is undefined if (nrow(contrast_mat) > 1) { groupComparisonPlots(data=test.MSstats$ComparisonResult, type="Heatmap", - width=12, height=12,dot.size = 2,ylimUp = 7) + width=12, height=12,dot.size = 2) } #for (comp in rownames(contrast_mat)) #{ # groupComparisonPlots(data=test.MSstats$ComparisonResult, type="ComparisonPlot", - # width=12, height=12,dot.size = 2,ylimUp = 7, sig=1)#, + # width=12, height=12,dot.size = 2, sig=1)#, # which.Comparison = comp, # address=F) # # try to plot all comparisons From c0d90679a5b2acc7838e39e3f2c0199af2237739 Mon Sep 17 00:00:00 2001 From: yperez Date: Tue, 8 Sep 2020 16:02:54 +0100 Subject: [PATCH 353/374] spectral counting conditional in proteomicslfq --- main.nf | 57 +++++++++++++++++++-------------------------------------- 1 file changed, 19 insertions(+), 38 deletions(-) diff --git a/main.nf b/main.nf index e811eac..47183b0 100644 --- a/main.nf +++ b/main.nf @@ -1062,44 +1062,25 @@ process proteomicslfq { file "*.log" script: - if(params.quantification_method == "spectral_counting") - """ - ProteomicsLFQ -in ${(mzmls as List).join(' ')} \\ - -ids ${(id_files as List).join(' ')} \\ - -design ${expdes} \\ - -fasta ${fasta} \\ - -protein_inference ${params.protein_inference} \\ - -quantification_method ${params.quantification_method} \\ - -targeted_only ${params.targeted_only} \\ - -mass_recalibration ${params.mass_recalibration} \\ - -transfer_ids ${params.transfer_ids} \\ - -protein_quantification ${params.protein_quant} \\ - -out out.mzTab \\ - -threads ${task.cpus} \\ - -out_cxml out.consensusXML \\ - -proteinFDR ${params.protein_level_fdr_cutoff} \\ - -debug ${params.inf_quant_debug} \\ - > proteomicslfq.log - """ - else - """ - ProteomicsLFQ -in ${(mzmls as List).join(' ')} \\ - -ids ${(id_files as List).join(' ')} \\ - -design ${expdes} \\ - -fasta ${fasta} \\ - -protein_inference ${params.protein_inference} \\ - -quantification_method ${params.quantification_method} \\ - -targeted_only ${params.targeted_only} \\ - -mass_recalibration ${params.mass_recalibration} \\ - -transfer_ids ${params.transfer_ids} \\ - -protein_quantification ${params.protein_quant} \\ - -out out.mzTab \\ - -threads ${task.cpus} \\ - -out_msstats out.csv \\ - -out_cxml out.consensusXML \\ - -proteinFDR ${params.protein_level_fdr_cutoff} \\ - -debug ${params.inf_quant_debug} \\ - > proteomicslfq.log + def msstats_present = params.quantification_method == "feature_intensity" ? 
'-out_msstats out.csv' : '' + """ + ProteomicsLFQ -in ${(mzmls as List).join(' ')} \\ + -ids ${(id_files as List).join(' ')} \\ + -design ${expdes} \\ + -fasta ${fasta} \\ + -protein_inference ${params.protein_inference} \\ + -quantification_method ${params.quantification_method} \\ + -targeted_only ${params.targeted_only} \\ + -mass_recalibration ${params.mass_recalibration} \\ + -transfer_ids ${params.transfer_ids} \\ + -protein_quantification ${params.protein_quant} \\ + -out out.mzTab \\ + -threads ${task.cpus} \\ + ${msstats_present} \\ + -out_cxml out.consensusXML \\ + -proteinFDR ${params.protein_level_fdr_cutoff} \\ + -debug ${params.inf_quant_debug} \\ + > proteomicslfq.log """ } From d901dcca83e10195c3f5637bcdec6c77ba705786 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Tue, 8 Sep 2020 20:43:47 +0100 Subject: [PATCH 354/374] Update nextflow_schema.json Co-authored-by: jpfeuffer --- nextflow_schema.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nextflow_schema.json b/nextflow_schema.json index 9491864..a73e771 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -694,7 +694,7 @@ }, "quantification_method": { "type": "string", - "description": "Choose between feature-based quantification based on integrated MS1 signals ('feature_intensity'; default) or spectral counting of PSMs (spectral_counting).", + "description": "Choose between feature-based quantification based on integrated MS1 signals ('feature_intensity'; default) or spectral counting of PSMs ('spectral_counting'). WARNING: 'spectral_counting' is not compatible with our MSstats step yet. MSstats will therefore be disabled automatically with that choice.", "default": "feature_intensity", "enum": [ "feature_intensity", From 4e6db84c525ae594e0c77eb61cee63b074074127 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 11 Sep 2020 19:21:27 +0200 Subject: [PATCH 355/374] Add force flag to comet, fix missing email param --- main.nf | 3 ++- nextflow.config | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/main.nf b/main.nf index 621655f..d0e879a 100644 --- a/main.nf +++ b/main.nf @@ -637,6 +637,7 @@ process search_engine_comet { -fragment_mass_tolerance ${bin_tol} \\ -fragment_bin_offset ${bin_offset} \\ -debug ${params.db_debug} \\ + -force \\ > ${mzml_file.baseName}_comet.log """ } @@ -1322,7 +1323,7 @@ workflow.onComplete { def email_html = html_template.toString() // Render the sendmail template - def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, baseDir: "$baseDir", mqcFile: mqc_report ] + def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, baseDir: "$baseDir", mqcFile: mqc_report, mqcMaxSize: params.max_multiqc_email_size.toBytes() ] def sf = new File("$baseDir/assets/sendmail_template.txt") def sendmail_template = engine.createTemplate(sf).make(smail_fields) def sendmail_html = sendmail_template.toString() diff --git a/nextflow.config b/nextflow.config index acd2124..f6aea60 100644 --- a/nextflow.config +++ b/nextflow.config @@ -111,6 +111,7 @@ params { name = false email = false email_on_fail = false + max_multiqc_email_size = 25.MB plaintext_email = false monochrome_logs = false help = false From 538b14276609d8d20d8f65547d68cc7cc5dcef08 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Fri, 11 Sep 2020 19:30:11 +0200 Subject: [PATCH 356/374] changed schema to add qc email size --- nextflow_schema.json | 10 +++++++++- 1 file changed, 9 
insertions(+), 1 deletion(-)

diff --git a/nextflow_schema.json b/nextflow_schema.json
index a73e771..f1312bc 100644
--- a/nextflow_schema.json
+++ b/nextflow_schema.json
@@ -86,6 +86,14 @@
             "hidden": true,
             "help_text": "Set to receive plain-text e-mails instead of HTML formatted."
         },
+        "max_multiqc_email_size": {
+            "type": "string",
+            "default": "25 MB",
+            "fa_icon": "fas fa-file-upload",
+            "description": "File size limit when attaching QC reports to summary emails.",
+            "help_text": "If file generated by pipeline exceeds the threshold, it will not be attached.",
+            "hidden": true
+        },
         "monochrome_logs": {
             "type": "boolean",
             "description": "Do not use coloured log outputs.",
@@ -829,4 +837,4 @@
             "$ref": "#/definitions/quality_control"
         }
     ]
-}
+}
\ No newline at end of file
From e5067e1ea4d460024077bf0528a6a03d001545d2 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Sat, 12 Sep 2020 15:30:29 +0200
Subject: [PATCH 357/374] try to fix email function without qc. fix peakpicking channel

---
 main.nf | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/main.nf b/main.nf
index 08208d6..217d5f5 100644
--- a/main.nf
+++ b/main.nf
@@ -1028,7 +1028,7 @@ process luciphor {
 // Join mzmls and ids by UID specified per mzml file in the beginning.
 // ID files can come directly from the Percolator branch, IDPEP branch or
 // after optional processing with Luciphor
-mzmls_plfq
+mzmls_plfq.mix(mzmls_plfq_picked)
   .join(plfq_in_id.mix(plfq_in_id_luciphor))
   .multiMap{ it ->
     mzmls: it[1]
@@ -1295,7 +1295,7 @@ workflow.onComplete {
     email_fields['summary']['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp

     // On success try attach the multiqc report
-    def mqc_report = null
+    def mqc_report = ""
     try {
         if (workflow.success) {
             mqc_report = ch_ptxqc_report.getVal()
@@ -1341,7 +1341,7 @@ workflow.onComplete {
     } catch (all) {
         // Catch failures and try with plaintext
         def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ]
-        if ( mqc_report.size() <= params.max_multiqc_email_size.toBytes() ) {
+        if ( mqc_report != "" && mqc_report.size() <= params.max_multiqc_email_size.toBytes() ) {
           mail_cmd += [ '-A', mqc_report ]
         }
         mail_cmd.execute() << email_html
From 6c8500434f728ec0f3bac99291cf91b4144218b3 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Sat, 12 Sep 2020 17:01:12 +0200
Subject: [PATCH 358/374] more fixes to the email function

---
 main.nf | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/main.nf b/main.nf
index 217d5f5..5f84228 100644
--- a/main.nf
+++ b/main.nf
@@ -1147,6 +1147,10 @@ process ptxqc {
     """
 }

+if (!params.enable_qc)
+{
+    ch_ptxqc_report = Channel.empty()
+}

 //--------------------------------------------------------------- //
@@ -1297,13 +1301,16 @@ workflow.onComplete {
     // On success try attach the multiqc report
     def mqc_report = ""
     try {
-        if (workflow.success) {
+        if (workflow.success && ch_ptxqc_report.println()) {
             mqc_report = ch_ptxqc_report.getVal()
             if (mqc_report.getClass() == ArrayList) {
                 log.warn "[nf-core/proteomicslfq] Found multiple reports from process 'ptxqc', will use only one"
                 mqc_report = mqc_report[0]
             }
         }
+        else {
+            mqc_report = ""
+        }
     } catch (all) {
         log.warn "[nf-core/proteomicslfq] Could not attach PTXQC report to summary email"
     }
From dd2acf44297e9141344a8d137c200036f140ad24 Mon Sep 17 00:00:00 2001
From: jpfeuffer
Date: Fri, 2 Oct 2020 20:31:20 +0200
Subject: [PATCH 359/374] Fix openms+thirdparty versions

---
 environment.yml | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git
a/environment.yml b/environment.yml index 475e2be..c1a8053 100644 --- a/environment.yml +++ b/environment.yml @@ -6,13 +6,8 @@ channels: - conda-forge - bioconda dependencies: - # TODO fix versions for release (and also for develop, as soon as we have official nightly conda packages, with e.g. a date as version) - - openms::openms=2.6.0pre # nightly version of 2.6 - - bioconda::thermorawfileparser=1.2.3 - - bioconda::msgf_plus=2020.08.05 - - bioconda::comet-ms=2019015 - - bioconda::luciphor2=2020_04_03 - - bioconda::percolator=3.5 + - bioconda::openms=2.6.0 + - bioconda::openms-thirdparty=2.6.0 - bioconda::bioconductor-msstats=3.20.1 # will include R - bioconda::sdrf-pipelines=0.0.9 # for SDRF conversion - conda-forge::r-ptxqc=1.0.5 # for QC reports From 0f1d9972bf50fb9034172de0148e8c9b27f0adb8 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 2 Oct 2020 20:35:59 +0200 Subject: [PATCH 360/374] Add workflow graph --- docs/images/proteomicslfq.svg | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 docs/images/proteomicslfq.svg diff --git a/docs/images/proteomicslfq.svg b/docs/images/proteomicslfq.svg new file mode 100644 index 0000000..3e866e0 --- /dev/null +++ b/docs/images/proteomicslfq.svg @@ -0,0 +1,3 @@ + + +
+[docs/images/proteomicslfq.svg: workflow graph; text labels in the drawing: Raw file conversion/Indexing; ID Comet; ID MSGF; mix; if single-engine; if multi-engine; merge; FDR (if !multi); Distribution-based PEP; Percolator; ConsensusID; combined FDR; Switch to q-value/FDR; IDFilter; Luciphor (if localize); Quantification + Inference and experiment-wide FDR filter; MSstats; PTXQC; fallback text: "Viewer does not support full SVG 1.1"]
\ No newline at end of file From 40d17bb23daef5867004d4ceff2161f785c74c61 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 2 Oct 2020 20:38:13 +0200 Subject: [PATCH 361/374] Add graph to text --- docs/output.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/output.md b/docs/output.md index 26e2b55..1007599 100644 --- a/docs/output.md +++ b/docs/output.md @@ -17,6 +17,9 @@ and processes data using the following steps: 1. PSM/Peptide-level FDR filtering 1. Protein inference and labelfree quantification based on MS1 feature detection, alignment and integration with OpenMS' ProteomicsLFQ +A rough visualization follows: +![proteomicslfq workflow](./images/proteomicslfq.svg) + ## Output Output is by default written to the $NXF_WORKSPACE/results folder. You can change that with TODO From ec2efa6b203e3c24d591d759a79f9de3278aa4a9 Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 2 Oct 2020 21:06:55 +0200 Subject: [PATCH 362/374] Update output.md --- docs/output.md | 47 +++++++++++++++++++++++++++-------------------- 1 file changed, 27 insertions(+), 20 deletions(-) diff --git a/docs/output.md b/docs/output.md index 1007599..bec1328 100644 --- a/docs/output.md +++ b/docs/output.md @@ -9,44 +9,47 @@ and processes data using the following steps: 1. (optional) Conversion of spectra data to indexedMzML: Using ThermoRawFileParser if Thermo Raw or using OpenMS' FileConverter if just an index is missing 1. (optional) Decoy database generation for the provided DB (fasta) with OpenMS -1. Database search with either MSGF+ or Comet through OpenMS adapters -1. Re-mapping potentially identified peptides to the database for consistency and error-checking (using OpenMS' PeptideIndexer) -1. (Intermediate score switching steps to use appropriate scores for the next step) +1. Database search with either MSGF+ and/or Comet through OpenMS adapters +1. Re-mapping potentially identified peptides to the input database for consistency and error-checking (using OpenMS' PeptideIndexer) 1. PSM rescoring using PSMFeatureExtractor and Percolator or a PeptideProphet-like distribution fitting approach in OpenMS -1. (Intermediate score switching steps to use appropriate scores for the next step) -1. PSM/Peptide-level FDR filtering -1. Protein inference and labelfree quantification based on MS1 feature detection, alignment and integration with OpenMS' ProteomicsLFQ +1. If multiple search engines were chosen, the results are combined with OpenMS' ConsensusID +1. If multiple search engines were chosen, a combined FDR is calculated +1. Single run PSM/Peptide-level FDR filtering +1. If localization of modifications was requested, Luciphor2 is applied via the OpenMS adapter +1. Protein inference and labelfree quantification based on spectral counting or MS1 feature detection, alignment and integration with OpenMS' ProteomicsLFQ. Performs an additional experiment-wide FDR filter on protein (and if requested peptide/PSM-level). A rough visualization follows: + ![proteomicslfq workflow](./images/proteomicslfq.svg) ## Output -Output is by default written to the $NXF_WORKSPACE/results folder. You can change that with TODO +Output is by default written to the $NXF_WORKSPACE/results folder. The output consists of the following folders: results * ids - * [${infile}\*.idXML](#identifications) -* logs - * ... 
+ * [\*.idXML](#identifications) +* logs (extended log files for all steps) + * \*.log * msstats * [ComparisonPlot.pdf](#msstats-plots) * [VolcanoPlot.pdf](#msstats-plots) * [Heatmap.pdf](#msstats-plots) * [msstats\_results.csv](#msstats-table) -* pipeline\_info + * [msstats_out.mzTab](#msstats-mztab) +* pipeline\_info (general nextflow infos) * [...](#nextflow-pipeline-info) * proteomics\_lfq * [debug\_\*.idXML](#debug-output) * [out.consensusXML](#consenusxml) * [out.csv](#msstats-ready-quantity-table) * [out.mzTab](#mztab) -* ptxqc - * [report\_v1.0.2\_out.yaml](#ptxqc-yaml-config) - * [report\_v1.0.2\_out\_${hash}.html](#ptxqc-report) - * [report\_v1.0.2\_out\_${hash}.pdf](#ptxqc-report) +* ptxqc (quality control) + * [report\_vX.X.X\_out.yaml](#ptxqc-yaml-config) + * [report\_vX.X.X\_out\_${hash}.html](#ptxqc-report) + * [report\_vX.X.X\_out\_${hash}.pdf](#ptxqc-report) ### Nextflow pipeline info @@ -85,22 +88,26 @@ The `msstats` folder contains MSstats' post-processed (e.g. imputation, outlier measures of significance for different tested contrasts of the given experimental design. It also includes basic plots of these results. The results will only be available if there was more than one condition. +#### MSstats mzTab + +The mzTab from the proteomics_lfq folder with replaced normalized and imputed quantities from MSstats. Might contain less quantities since +MSstats filters proteins with too many missing values. + #### MSstats table -See MSstats vignette. +See [MSstats vignette](https://www.bioconductor.org/packages/release/bioc/vignettes/MSstats/inst/doc/MSstats.html). #### MSstats plots -See MSstats vignette for Heatmap, VolcanoPlot and ComparisonPlot (per protein). +See [MSstats vignette](https://www.bioconductor.org/packages/release/bioc/vignettes/MSstats/inst/doc/MSstats.html) for groupComparisonPlots (Heatmap, VolcanoPlot and ComparisonPlot (per protein)). ### PTXQC output -If activated, the `ptxqc` folder will contain the report of the PTXQC R package based on the mzTab output of proteomicsLFQ. -TODO link +If activated, the `ptxqc` folder will contain the report of the [PTXQC R package](https://cran.r-project.org/web/packages/PTXQC/index.html) based on the mzTab output of proteomicsLFQ. #### PTXQC report -See PTXQC vignette. In the report itself the calculated and visualized QC metrics are actually quite extensively described already. +See [PTXQC vignette](https://cran.r-project.org/web/packages/PTXQC/index.html). In the report itself the calculated and visualized QC metrics are actually quite extensively described already. #### PTXQC yaml config From cc9ee11211e82a9bb86a26de7198f88e7e6cf33f Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Fri, 2 Oct 2020 21:09:46 +0200 Subject: [PATCH 363/374] Update output.md --- docs/output.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/output.md b/docs/output.md index bec1328..5821261 100644 --- a/docs/output.md +++ b/docs/output.md @@ -22,10 +22,10 @@ A rough visualization follows: ![proteomicslfq workflow](./images/proteomicslfq.svg) -## Output +## Output structure Output is by default written to the $NXF_WORKSPACE/results folder. 
-The output consists of the following folders: +The output consists of the following folders (follow the links for a more detailed description): results @@ -51,6 +51,7 @@ results * [report\_vX.X.X\_out\_${hash}.html](#ptxqc-report) * [report\_vX.X.X\_out\_${hash}.pdf](#ptxqc-report) +## Output description ### Nextflow pipeline info Information about the execution and structure of the pipeline. If run with the corresponding nextflow parameters, From 93e8d3ff6d36e9799db7a6f0543df103fdeb6baa Mon Sep 17 00:00:00 2001 From: jpfeuffer Date: Sat, 3 Oct 2020 15:14:37 +0200 Subject: [PATCH 364/374] Update to latest python stuff --- environment.yml | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/environment.yml b/environment.yml index c1a8053..6fd7c6a 100644 --- a/environment.yml +++ b/environment.yml @@ -13,8 +13,8 @@ dependencies: - conda-forge::r-ptxqc=1.0.5 # for QC reports - conda-forge::xorg-libxt=1.2.0 # until this R fix is merged: https://github.com/conda-forge/r-base-feedstock/pull/128 - conda-forge::fonts-conda-ecosystem=1 # for the fonts in QC reports - - conda-forge::python=3.8.1 - - conda-forge::markdown=3.2.1 - - conda-forge::pymdown-extensions=7.1 - - conda-forge::pygments=2.5.2 + - conda-forge::python=3.8.5 + - conda-forge::markdown=3.2.2 + - conda-forge::pymdown-extensions=8.0.1 + - conda-forge::pygments=2.7.1 From b65c4463c3ba2eced18c20e30d2326f4cb74d6f2 Mon Sep 17 00:00:00 2001 From: Julianus Pfeuffer Date: Sat, 3 Oct 2020 17:03:29 +0200 Subject: [PATCH 365/374] Removed TODOs --- .github/workflows/awsfulltest.yml | 4 ---- .github/workflows/awstest.yml | 3 --- README.md | 7 ++----- conf/base.config | 2 -- docs/README.md | 2 -- docs/output.md | 2 -- docs/usage.md | 2 -- main.nf | 1 - 8 files changed, 2 insertions(+), 21 deletions(-) diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml index aa1608f..6a9a16d 100644 --- a/.github/workflows/awsfulltest.yml +++ b/.github/workflows/awsfulltest.yml @@ -20,10 +20,6 @@ jobs: - name: Install awscli run: conda install -c conda-forge awscli - name: Start AWS batch job - # TODO nf-core: You can customise AWS full pipeline tests as required - # Add full size test data (but still relatively small datasets for few samples) - # on the `test_full.config` test runs with only one set of parameters - # Then specify `-profile test_full` instead of `-profile test` on the AWS batch command env: AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml index 21fb8b7..69b08a4 100644 --- a/.github/workflows/awstest.yml +++ b/.github/workflows/awstest.yml @@ -21,9 +21,6 @@ jobs: - name: Install awscli run: conda install -c conda-forge awscli - name: Start AWS batch job - # TODO nf-core: You can customise CI pipeline run tests as required - # For example: adding multiple test runs with different parameters - # Remember that you can parallelise this by using strategy.matrix env: AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} diff --git a/README.md b/README.md index 5cbb30a..b39f398 100644 --- a/README.md +++ b/README.md @@ -30,8 +30,6 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool 4. Start running your own analysis! 
- - ```bash nextflow run nf-core/proteomicslfq \ -profile \ @@ -47,7 +45,7 @@ See [usage docs](https://nf-co.re/proteomicslfq/usage) for all of the available The nf-core/proteomicslfq pipeline comes with documentation about the pipeline which you can read at [https://nf-co.re/proteomicslfq](https://nf-co.re/proteomicslfq) or partly find in the [`docs/` directory](docs). - +It performs conversion to indexed mzML, database search (with multiple search engines), re-scoring (with e.g. Percolator), merging, FDR filtering, modification localization with Luciphor2 (e.g. phospho-sites), protein inference and grouping as well as label-free quantification by either spectral counting or feature-based alignment and integration. Downstream processing includes statistical post-processing with MSstats and quality control with PTXQC. For more info, see the [output docs](docs/output.md). ## Credits @@ -61,8 +59,7 @@ For further information or help, don't hesitate to get in touch on the [Slack `# ## Citation - - +If you use nf-core/proteomicslfq for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) You can cite the `nf-core` publication as follows: diff --git a/conf/base.config b/conf/base.config index b2f1c4d..9f80138 100644 --- a/conf/base.config +++ b/conf/base.config @@ -11,7 +11,6 @@ process { - // TODO nf-core: Check the defaults for all processes cpus = { check_max( 2 * task.attempt, 'cpus' ) } memory = { check_max( 8.GB * task.attempt, 'memory' ) } time = { check_max( 4.h * task.attempt, 'time' ) } @@ -22,7 +21,6 @@ process { maxErrors = '-1' // Process-specific resource requirements - // TODO nf-core: Customise requirements for specific processes. // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors withLabel:process_very_low { cpus = { check_max( 2 * task.attempt, 'cpus' ) } diff --git a/docs/README.md b/docs/README.md index bba9686..fa91132 100644 --- a/docs/README.md +++ b/docs/README.md @@ -2,8 +2,6 @@ The nf-core/proteomicslfq documentation is split into the following pages: - - * [Usage](usage.md) * An overview of how the pipeline works, how to run it and a description of all of the different command-line flags. * [Output](output.md) diff --git a/docs/output.md b/docs/output.md index aaf371a..4565802 100644 --- a/docs/output.md +++ b/docs/output.md @@ -4,8 +4,6 @@ This document describes the output produced by the pipeline. Most of the plots a The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory. 
- -
 ## Pipeline overview
 The pipeline is built using [Nextflow](https://www.nextflow.io/)
diff --git a/docs/usage.md b/docs/usage.md
index d35b9d2..86c986b 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -2,8 +2,6 @@
 ## Introduction
- -
 ## Running the pipeline
 The most simple command for running the pipeline is as follows:
diff --git a/main.nf b/main.nf
index 5f84228..fd9d0c4 100644
--- a/main.nf
+++ b/main.nf
@@ -1163,7 +1163,6 @@ log.info nfcoreHeader()
 def summary = [:]
 if (workflow.revision) summary['Pipeline Release'] = workflow.revision
 summary['Run Name'] = custom_runName ?: workflow.runName
-// TODO nf-core: Report custom parameters here
 summary['Max Resources'] = "$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job"
 if (workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container"
 summary['Output dir'] = params.outdir
From d7860b5bca8dca6e11d77cfc9db6cf00efbe850b Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Sat, 3 Oct 2020 17:13:17 +0200
Subject: [PATCH 366/374] Added links

---
 docs/output.md | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/docs/output.md b/docs/output.md
index 5821261..fe2a7b7 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -61,7 +61,7 @@ info on memory consumption, CPU usage and runtimes.
 ### Identifications
 Intermediate output for the PSM/peptide-level filtered identifications per raw/mzML file in OpenMS'
-internal idXML format. TODO link to schema.
+internal [idXML](https://github.com/OpenMS/OpenMS/blob/develop/share/OpenMS/SCHEMAS/IdXML_1_5.xsd) format.
 ### ProteomicsLFQ main output
@@ -70,7 +70,7 @@ And is available in three different formats.
 #### ConsensusXML
-A consensusXML file (TODO link to schema or description) as the closest representation of the internal data
+A [consensusXML](https://github.com/OpenMS/OpenMS/blob/develop/share/OpenMS/SCHEMAS/ConsensusXML_1_7.xsd) file as the closest representation of the internal data
 structures generated by OpenMS. Helpful for debugging and downstream processing with OpenMS tools.
 #### MSstats-ready quantity table
@@ -81,18 +81,17 @@ about the experimental design used by MSstats.
 #### mzTab
-A complete mzTab file ready for submission to PRIDE. TODO link to mzTab schema/guide.
+A complete [mzTab](https://github.com/HUPO-PSI/mzTab) file ready for submission to [PRIDE](https://www.ebi.ac.uk/pride/).
 ### MSstats output
-The `msstats` folder contains MSstats' post-processed (e.g. imputation, outlier removal) quantities and statistical
+The `msstats` folder contains [MSstats](https://github.com/MeenaChoi/MSstats)' post-processed (e.g. imputation, outlier removal) quantities and statistical
 measures of significance for different tested contrasts of the given experimental design. It also includes basic
 plots of these results. The results will only be available if there was more than one condition.
 #### MSstats mzTab
-The mzTab from the proteomics_lfq folder with replaced normalized and imputed quantities from MSstats. Might contain less quantities since
-MSstats filters proteins with too many missing values.
+The [mzTab](https://github.com/HUPO-PSI/mzTab) from the proteomics_lfq folder with replaced normalized and imputed quantities from MSstats. This might contain fewer quantities since MSstats filters proteins with too many missing values.
From d3cda97e66af4697ff742fadfc52190721e5be9e Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Mon, 5 Oct 2020 13:26:52 +0200
Subject: [PATCH 367/374] Update version numbers for release

---
 .github/workflows/ci.yml | 4 ++--
 Dockerfile               | 4 ++--
 environment.yml          | 2 +-
 nextflow.config          | 4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index d5d500f..7fd63b7 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -51,12 +51,12 @@ jobs:
            environment.yml
      - name: Build new docker image
        if: env.GIT_DIFF
-        run: docker build --no-cache . -t nfcore/proteomicslfq:dev
+        run: docker build --no-cache . -t nfcore/proteomicslfq:1.0.0
      - name: Pull docker image
        if: ${{ !env.GIT_DIFF }}
        run: |
          docker pull nfcore/proteomicslfq:dev
-          docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:dev
+          docker tag nfcore/proteomicslfq:dev nfcore/proteomicslfq:1.0.0
      - name: Install Nextflow
        run: |
          wget -qO- get.nextflow.io | bash
diff --git a/Dockerfile b/Dockerfile
index c73e069..baccf5a 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -7,14 +7,14 @@ COPY environment.yml /
RUN conda env create --quiet -f /environment.yml && conda clean -a

# Add conda installation dir to PATH (instead of doing 'conda activate')
-ENV PATH /opt/conda/envs/nf-core-proteomicslfq-1.0dev/bin:$PATH
+ENV PATH /opt/conda/envs/nf-core-proteomicslfq-1.0.0/bin:$PATH

# OpenMS Adapters need the raw jars of Java-based bioconda tools in the PATH. Not the wrappers that conda creates.
RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/msgf_plus-*/MSGFPlus.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0)
RUN cp $(find /opt/conda/envs/nf-core-proteomicslfq-*/share/luciphor2-*/luciphor2.jar -maxdepth 0) $(find /opt/conda/envs/nf-core-proteomicslfq-*/bin/ -maxdepth 0)

# Dump the details of the installed packages to a file for posterity
-RUN conda env export --name nf-core-proteomicslfq-1.0dev > nf-core-proteomicslfq-1.0dev.yml
+RUN conda env export --name nf-core-proteomicslfq-1.0.0 > nf-core-proteomicslfq-1.0.0.yml

# Instruct R processes to use these empty files instead of clashing with a local version
RUN touch .Rprofile
diff --git a/environment.yml b/environment.yml
index 6fd7c6a..05bd1b1 100644
--- a/environment.yml
+++ b/environment.yml
@@ -1,6 +1,6 @@
# You can use this file to create a conda environment for this pipeline:
# conda env create -f environment.yml
-name: nf-core-proteomicslfq-1.0dev
+name: nf-core-proteomicslfq-1.0.0
channels:
  - openms
  - conda-forge
diff --git a/nextflow.config b/nextflow.config
index 1244012..7dafd1d 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -132,7 +132,7 @@ params {

// Container slug. Stable releases should specify release tag!
// Developmental code should specify :dev
-process.container = 'nfcore/proteomicslfq:dev'
+process.container = 'nfcore/proteomicslfq:1.0.0'

// Load base.config by default for all pipelines
includeConfig 'conf/base.config'
@@ -206,7 +206,7 @@ manifest {
  description = 'Proteomics label-free quantification (LFQ) analysis pipeline using OpenMS and MSstats, with feature quantification, feature summarization, quality control and group-based statistical analysis.'
  mainScript = 'main.nf'
  nextflowVersion = '!>=20.01.0'
-  version = '1.0dev'
+  version = '1.0.0'
}

// Function to ensure that resource requirements don't go beyond
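For reference, the release artifacts versioned here can be fetched or rebuilt with the same commands the CI workflow and Dockerfile use:

```bash
# Pull the tagged release container, or rebuild it from the repo root:
docker pull nfcore/proteomicslfq:1.0.0
docker build --no-cache . -t nfcore/proteomicslfq:1.0.0

# Alternatively, create the pinned conda environment directly:
conda env create -f environment.yml   # env name: nf-core-proteomicslfq-1.0.0
```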
From 51e75f522e4b1ac5aa1eebf4e4df643201c4a09d Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Tue, 13 Oct 2020 11:07:20 +0200
Subject: [PATCH 368/374] review comments

---
 CHANGELOG.md                 | 15 ++++++++++++++-
 bin/plotPercolatorWeights.py |  2 +-
 conf/big-nodes.config        |  1 -
 conf/dev.config              |  2 +-
 conf/test.config             |  4 ++--
 conf/test_full.config        |  2 +-
 conf/test_localize.config    |  4 ++--
 conf/test_speccount.config   |  2 +-
 docs/output.md               |  2 +-
 docs/usage.md                |  2 +-
 10 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8f785de..af26124 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,12 +3,25 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## v1.0dev - [date]
+## v1.0.0 - [14.10.2020]

Initial release of nf-core/proteomicslfq, created with the [nf-core](https://nf-co.re/) template.

### `Added`

+The initial version of the pipeline features the following steps:
+
+ - (optional) Conversion of spectra data to indexedMzML: Using ThermoRawFileParser if Thermo Raw or using OpenMS' FileConverter if just an index is missing
+ - (optional) Decoy database generation for the provided DB (fasta) with OpenMS
+ - Database search with either MSGF+ and/or Comet through OpenMS adapters
+ - Re-mapping potentially identified peptides to the input database for consistency and error-checking (using OpenMS' PeptideIndexer)
+ - PSM rescoring using PSMFeatureExtractor and Percolator or a PeptideProphet-like distribution fitting approach in OpenMS
+ - If multiple search engines were chosen, the results are combined with OpenMS' ConsensusID
+ - If multiple search engines were chosen, a combined FDR is calculated
+ - Single run PSM/Peptide-level FDR filtering
+ - If localization of modifications was requested, Luciphor2 is applied via the OpenMS adapter
+ - Protein inference and labelfree quantification based on spectral counting or MS1 feature detection, alignment and integration with OpenMS' ProteomicsLFQ. Performs an additional experiment-wide FDR filter on protein level (and if requested peptide/PSM level).
+
### `Fixed`

### `Dependencies`

diff --git a/bin/plotPercolatorWeights.py b/bin/plotPercolatorWeights.py
index 44ece9f..c21f5a4 100644
--- a/bin/plotPercolatorWeights.py
+++ b/bin/plotPercolatorWeights.py
@@ -49,4 +49,4 @@

# Create legend & Show graphic
plt.legend()
-plt.show()
\ No newline at end of file
+plt.show()
diff --git a/conf/big-nodes.config b/conf/big-nodes.config
index f92f903..1dc0a97 100644
--- a/conf/big-nodes.config
+++ b/conf/big-nodes.config
@@ -54,4 +54,3 @@ process {
    cache = false
  }
}
-
diff --git a/conf/dev.config b/conf/dev.config
index e508461..aef4458 100644
--- a/conf/dev.config
+++ b/conf/dev.config
@@ -4,7 +4,7 @@
 * -------------------------------------------------
 * Only overwrites the container. See dev/ folder for building instructions.
 * Use as follows:
- * nextflow run nf-core/proteomicslfq -profile dev,
+ * nextflow run nf-core/proteomicslfq -profile dev,<docker/singularity/conda>
 */

params {
diff --git a/conf/test.config b/conf/test.config
index eb847b9..d67132d 100644
--- a/conf/test.config
+++ b/conf/test.config
@@ -4,7 +4,7 @@
 * -------------------------------------------------
 * Defines bundled input files and everything required
 * to run a fast and simple test. Use as follows:
- * nextflow run nf-core/proteomicslfq -profile test,
+ * nextflow run nf-core/proteomicslfq -profile test,<docker/singularity/conda>
 */

params {
@@ -32,4 +32,4 @@ params {
  protein_level_fdr_cutoff = 1.0
  decoy_affix = "rev"
  enable_qc = true
-}
\ No newline at end of file
+}
diff --git a/conf/test_full.config b/conf/test_full.config
index 1c64a05..1b38869 100644
--- a/conf/test_full.config
+++ b/conf/test_full.config
@@ -4,7 +4,7 @@
 * -------------------------------------------------
 * Defines bundled input files and everything required
 * to run a comprehensive test with data downloaded from PRIDE. Use as follows:
- * nextflow run nf-core/proteomicslfq -profile test_full,
+ * nextflow run nf-core/proteomicslfq -profile test_full,<docker/singularity/conda>
 *
 * For a short test of functionality, see the 'test' profile/config.
 */
diff --git a/conf/test_localize.config b/conf/test_localize.config
index 81159f0..0e7ef1e 100644
--- a/conf/test_localize.config
+++ b/conf/test_localize.config
@@ -5,7 +5,7 @@
 * -------------------------------------------------
 * Defines bundled input files and everything required
 * to run a fast and simple test. Use as follows:
- * nextflow run nf-core/proteomicslfq -profile test_localize,
+ * nextflow run nf-core/proteomicslfq -profile test_localize,<docker/singularity/conda>
 */

params {
@@ -23,4 +23,4 @@ params {
  enable_mod_localization = true
  search_engines = 'comet,msgf'
  enable_qc = true
-}
\ No newline at end of file
+}
diff --git a/conf/test_speccount.config b/conf/test_speccount.config
index 7049c5f..af6ca95 100644
--- a/conf/test_speccount.config
+++ b/conf/test_speccount.config
@@ -4,7 +4,7 @@
 * -------------------------------------------------
 * Defines bundled input files and everything required
 * to run a fast and simple test. Use as follows:
- * nextflow run nf-core/proteomicslfq -profile test,
+ * nextflow run nf-core/proteomicslfq -profile test,<docker/singularity/conda>
 */

params {
diff --git a/docs/output.md b/docs/output.md
index e2aa277..0ac1895 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -67,7 +67,7 @@ results

### Identifications

Intermediate output for the PSM/peptide-level filtered identifications per raw/mzML file in OpenMS'
-internal [idXML](https://github.com/OpenMS/OpenMS/blob/develop/share/OpenMS/SCHEMAS/IdXML_1_5.xsd) format. TODO link to schema.
+internal [idXML](https://github.com/OpenMS/OpenMS/blob/develop/share/OpenMS/SCHEMAS/IdXML_1_5.xsd) format.
diff --git a/docs/usage.md b/docs/usage.md
index 86c986b..4e33993 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -7,7 +7,7 @@
The most simple command for running the pipeline is as follows:

```bash
-nextflow run nf-core/proteomicslfq --spectra '*.mzML' --database '*.fasta' -profile docker
+nextflow run nf-core/proteomicslfq --input '*.mzML' --database '*.fasta' -profile docker
```

This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
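The profile placeholders fixed above are meant to be combined with an execution backend; the backend choice here is illustrative:

```bash
# Fast functional test with the bundled test data and the docker backend:
nextflow run nf-core/proteomicslfq -profile test,docker

# Same idea for the localization test, which enables enable_mod_localization
# and the two-engine setup (comet,msgf) shown in conf/test_localize.config:
nextflow run nf-core/proteomicslfq -profile test_localize,docker
```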
From d9a3314b3fce39b2fae8df6e15ada9202731dda0 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Tue, 13 Oct 2020 11:34:21 +0200
Subject: [PATCH 369/374] new overview svg

---
 docs/images/proteomicslfq.svg | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/images/proteomicslfq.svg b/docs/images/proteomicslfq.svg
index 3e866e0..480530d 100644
--- a/docs/images/proteomicslfq.svg
+++ b/docs/images/proteomicslfq.svg
@@ -1,3 +1,3 @@
-[minified single-line SVG omitted: old workflow overview diagram]
\ No newline at end of file
+[minified single-line SVG omitted: updated workflow overview diagram with node labels: Input (Spectra or Raw) + Design, Database, Raw file conversion/Indexing (if necessary), Decoy generation (if requested), ID Comet / ID MSGF, Percolator / Distribution-based PEP, FDR (if !multi-engine), ConsensusID + combined FDR (if multi-engine), Switch to q-value/FDR, IDFilter, Luciphor (if localize), Quantification + Inference and experiment-wide FDR filter, MSstats, PTXQC]
\ No newline at end of file

From c52f654798925d5ed9e2ed4bd596d05a5bd68677 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Tue, 13 Oct 2020 12:38:41 +0200
Subject: [PATCH 370/374] indent with spaces

---
 CHANGELOG.md | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index af26124..dc9d814 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -11,16 +11,16 @@

The initial version of the pipeline features the following steps:

- - (optional) Conversion of spectra data to indexedMzML: Using ThermoRawFileParser if Thermo Raw or using OpenMS' FileConverter if just an index is missing
- - (optional) Decoy database generation for the provided DB (fasta) with OpenMS
- - Database search with either MSGF+ and/or Comet through OpenMS adapters
- - Re-mapping potentially identified peptides to the input database for consistency and error-checking (using OpenMS' PeptideIndexer)
- - PSM rescoring using PSMFeatureExtractor and Percolator or a PeptideProphet-like distribution fitting approach in OpenMS
- - If multiple search engines were chosen, the results are combined with OpenMS' ConsensusID
- - If multiple search engines were chosen, a combined FDR is calculated
- - Single run PSM/Peptide-level FDR filtering
- - If localization of modifications was requested, Luciphor2 is applied via the OpenMS adapter
- - Protein inference and labelfree quantification based on spectral counting or MS1 feature detection, alignment and integration with OpenMS' ProteomicsLFQ. Performs an additional experiment-wide FDR filter on protein level (and if requested peptide/PSM level).
+  - (optional) Conversion of spectra data to indexedMzML: Using ThermoRawFileParser if Thermo Raw or using OpenMS' FileConverter if just an index is missing
+  - (optional) Decoy database generation for the provided DB (fasta) with OpenMS
+  - Database search with either MSGF+ and/or Comet through OpenMS adapters
+  - Re-mapping potentially identified peptides to the input database for consistency and error-checking (using OpenMS' PeptideIndexer)
+  - PSM rescoring using PSMFeatureExtractor and Percolator or a PeptideProphet-like distribution fitting approach in OpenMS
+  - If multiple search engines were chosen, the results are combined with OpenMS' ConsensusID
+  - If multiple search engines were chosen, a combined FDR is calculated
+  - Single run PSM/Peptide-level FDR filtering
+  - If localization of modifications was requested, Luciphor2 is applied via the OpenMS adapter
+  - Protein inference and labelfree quantification based on spectral counting or MS1 feature detection, alignment and integration with OpenMS' ProteomicsLFQ. Performs an additional experiment-wide FDR filter on protein level (and if requested peptide/PSM level).
From eb5f7a004c01ed948a24be4c05d2e2309df94c24 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Thu, 15 Oct 2020 17:18:45 +0200
Subject: [PATCH 371/374] little fix in the schema

---
 nextflow_schema.json | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/nextflow_schema.json b/nextflow_schema.json
index f1312bc..54bbba3 100644
--- a/nextflow_schema.json
+++ b/nextflow_schema.json
@@ -716,9 +716,14 @@
            "fa_icon": "far fa-check-square"
        },
        "transfer_ids": {
-            "type": "boolean",
-            "description": "Transfer IDs over aligned samples to increase the number of quantifiable features (WARNING: increased memory consumption). (default: 'false')",
-            "fa_icon": "far fa-check-square"
+            "type": "string",
+            "description": "How to transfer IDs over aligned samples to increase the number of quantifiable features (WARNING: increased memory consumption). 'mean' uses the mean of the retention times of the identified and matched features to look for unidentified features in the rest of the maps [if at least 50% of the aligned maps had the same identification]. 'false' does nothing. (default: 'false')",
+            "default": "false",
+            "enum": [
+                "false",
+                "mean"
+            ],
+            "fa_icon": "fas fa-list-ol"
        },
        "targeted_only": {
            "type": "boolean",
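In run terms, the new string-typed option pairs naturally with `targeted_only`; a hedged example invocation (input globs are illustrative, flag values come from the schema above):

```bash
# Enable match-between-runs-style requantification using the 'mean' RT
# strategy described above; expect higher memory use and runtime:
nextflow run nf-core/proteomicslfq -profile docker \
    --input '*.mzML' --database 'mydb.fasta' \
    --targeted_only false --transfer_ids mean
```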
From ff170b2ddec0c8a0a0020784d87376e3917f2f15 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Sun, 18 Oct 2020 11:16:20 +0200
Subject: [PATCH 372/374] review suggestions

---
 CHANGELOG.md          |   2 +-
 conf/big-nodes.config |   2 +-
 conf/igenomes.config  | 421 ------------------------------------------
 docs/output.md        |   2 +-
 nextflow.config       |   3 +
 nextflow_schema.json  |   6 +-
 6 files changed, 9 insertions(+), 427 deletions(-)
 delete mode 100644 conf/igenomes.config

diff --git a/CHANGELOG.md b/CHANGELOG.md
index dc9d814..e82beb6 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,7 +3,7 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## v1.0.0 - [14.10.2020]
+## v1.0.0 - Lovely Logan [17.10.2020]

Initial release of nf-core/proteomicslfq, created with the [nf-core](https://nf-co.re/) template.

diff --git a/conf/big-nodes.config b/conf/big-nodes.config
index 1dc0a97..c259445 100644
--- a/conf/big-nodes.config
+++ b/conf/big-nodes.config
@@ -3,7 +3,7 @@
 * nf-core/proteomicslfq Nextflow big-nodes config file
 * -------------------------------------------------
 * A 'big-nodes' config file, appropriate for general
- * use on most high performace compute environments with datasets with big RAW
+ * use on most high performance compute environments with datasets with big RAW
 * files. This configuration is used for big mzML files and datasets where
 * the size of the mzML is higher than 10GB. It also contains parameters
 * for error handling. For example, errorStrategyError = 130 is used also
diff --git a/conf/igenomes.config b/conf/igenomes.config
deleted file mode 100644
index caeafce..0000000
--- a/conf/igenomes.config
+++ /dev/null
@@ -1,421 +0,0 @@
-[421 deleted lines omitted for brevity: the unused iGenomes reference block, i.e. params.genomes entries with fasta/bwa/bowtie2/star/bismark/gtf/bed12/readme/mito_name/macs_gsize/blacklist paths for GRCh37, GRCh38, GRCm38, TAIR10, EB2, UMD3.1, WBcel235, CanFam3.1, GRCz10, BDGP6, EquCab2, EB1, Galgal4, Gm01, Mmul_1, IRGSP-1.0, CHIMP2.1.4, Rnor_6.0, R64-1-1, EF2, Sbi1, Sscrofa10.2, AGPv3, hg38, hg19, mm10, bosTau8, ce10, canFam3, danRer10, dm6, equCab2, galGal4, panTro4, rn6, sacCer3 and susScr3]
diff --git a/docs/output.md b/docs/output.md
index 0ac1895..cd62537 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -72,7 +72,7 @@ internal [idXML](https://github.com/OpenMS/OpenMS/blob/develop/share/OpenMS/SCHE

### ProteomicsLFQ main output

The `proteomics_lfq` folder contains the output of the pipeline without any statistical postprocessing.
-And is avaible in three different formats.
+It is available in three different formats:
diff --git a/nextflow.config b/nextflow.config
index 7dafd1d..17e05e7 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -158,6 +158,9 @@ profiles {
    // once this is established and works well, nextflow might implement this behavior as new default.
    docker.runOptions = '-u \$(id -u):\$(id -g)'
  }
+  podman {
+    podman.enabled = true
+  }
  singularity {
    singularity.enabled = true
    singularity.autoMounts = true
diff --git a/nextflow_schema.json b/nextflow_schema.json
index 54bbba3..9a5f9a7 100644
--- a/nextflow_schema.json
+++ b/nextflow_schema.json
@@ -702,7 +702,7 @@
        },
        "quantification_method": {
            "type": "string",
-            "description": "Choose between feature-based quantification based on integrated MS1 signals ('feature_intensity'; default) or spectral counting of PSMs ('spectral_counting'). WARNING: 'spectral_counting' is not compatible with our MSstats step yet. MSstats will therefore be disabled automatically with that choice.",
+            "description": "Choose between feature-based quantification based on integrated MS1 signals ('feature_intensity'; default) or spectral counting of PSMs ('spectral_counting'). **WARNING:** 'spectral_counting' is not compatible with our MSstats step yet. MSstats will therefore be disabled automatically with that choice.",
            "default": "feature_intensity",
            "enum": [
                "feature_intensity",
@@ -717,7 +717,7 @@
        },
        "transfer_ids": {
            "type": "string",
-            "description": "How to transfer IDs over aligned samples to increase the number of quantifiable features (WARNING: increased memory consumption). 'mean' uses the mean of the retention times of the identified and matched features to look for unidentified features in the rest of the maps [if at least 50% of the aligned maps had the same identification]. 'false' does nothing. (default: 'false')",
+            "description": "Tries a targeted requantification in files where an ID is missing, based on aggregate properties (e.g. RT) of the features in other aligned files (e.g. 'mean' of RT). (**WARNING:** increased memory consumption and runtime). 'false' turns this feature off. (default: 'false')",
            "default": "false",
            "enum": [
                "false",
                "mean"
            ],
            "fa_icon": "fas fa-list-ol"
        },
        "targeted_only": {
            "type": "boolean",
-            "description": "Only looks for quantifiable features at locations with an identified spectrum. Set to false to include unidentified features so they can be linked to identified ones (=match between runs)",
+            "description": "Only looks for quantifiable features at locations with an identified spectrum. Set to false to include unidentified features so they can be linked and matched to identified ones (= match between runs). (default: 'true')",
            "default": true,
            "fa_icon": "far fa-check-square"
        },
From 433470afa1746060270c7affc6daeb25a351c31c Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Sun, 18 Oct 2020 11:30:53 +0200
Subject: [PATCH 373/374] Known issues

---
 CHANGELOG.md |  6 +++++-
 main.nf      | 13 +++++++------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index e82beb6..c2520ca 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,7 +3,7 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## v1.0.0 - Lovely Logan [17.10.2020]
+## v1.0.0 - Lovely Logan [18.10.2020]

Initial release of nf-core/proteomicslfq, created with the [nf-core](https://nf-co.re/) template.

@@ -22,6 +22,10 @@
  - If localization of modifications was requested, Luciphor2 is applied via the OpenMS adapter
  - Protein inference and labelfree quantification based on spectral counting or MS1 feature detection, alignment and integration with OpenMS' ProteomicsLFQ. Performs an additional experiment-wide FDR filter on protein level (and if requested peptide/PSM level).

+### `Known issues`
+
+If you experience nextflow running forever after a failed step, try setting errorStrategy = terminate. See https://github.com/nextflow-io/nextflow/issues/1457
+
### `Fixed`

### `Dependencies`

diff --git a/main.nf b/main.nf
index 689eb6c..8096a19 100644
--- a/main.nf
+++ b/main.nf
@@ -503,10 +503,10 @@ process search_engine_msgf {
  publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log'

  // ---------------------------------------------------------------------------------------------------------------------
-  // ------------- WARNING: THIS IS A HACK. IT JUST DOES NOT WORK IF THIS PROCESS IS RETRIED -----------------------------
+  // ------------- WARNING: If you experience nextflow running forever after a failure, set the following ----------------
  // ---------------------------------------------------------------------------------------------------------------------
-  // I actually dont know, where else this would be needed.
-  errorStrategy 'terminate'
+  // This is probably true for other processes as well. See https://github.com/nextflow-io/nextflow/issues/1457
+  // errorStrategy 'terminate'

  input:
    tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme from searchengine_in_db_msgf.mix(searchengine_in_db_decoy_msgf).combine(mzmls_msgf.mix(mzmls_msgf_picked).join(ch_sdrf_config.msgf_settings))

@@ -566,10 +566,11 @@ process search_engine_comet {
  publishDir "${params.outdir}/logs", mode: 'copy', pattern: '*.log'

  // ---------------------------------------------------------------------------------------------------------------------
-  // ------------- WARNING: THIS IS A HACK. IT JUST DOES NOT WORK IF THIS PROCESS IS RETRIED -----------------------------
+  // ------------- WARNING: If you experience nextflow running forever after a failure, set the following ----------------
  // ---------------------------------------------------------------------------------------------------------------------
-  // I actually dont know, where else this would be needed.
-  errorStrategy 'terminate'
+  // This is probably true for other processes as well. See https://github.com/nextflow-io/nextflow/issues/1457
+  //errorStrategy 'terminate'
+
  input:
    tuple file(database), mzml_id, path(mzml_file), fixed, variable, label, prec_tol, prec_tol_unit, frag_tol, frag_tol_unit, diss_meth, enzyme from searchengine_in_db_comet.mix(searchengine_in_db_decoy_comet).combine(mzmls_comet.mix(mzmls_comet_picked).join(ch_sdrf_config.comet_settings))
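The workaround recorded in the changelog can also be applied without editing `main.nf`, via a small override config (a sketch; the file name is made up):

```bash
# Make every process abort the run on failure instead of hanging
# (see the nextflow issue linked above):
cat > terminate-on-error.config <<'EOF'
process {
  errorStrategy = 'terminate'
}
EOF

nextflow run nf-core/proteomicslfq -profile test,docker -c terminate-on-error.config
```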
From 7d888dd5b4ac3cb910d5e9ad2848698690ecdee5 Mon Sep 17 00:00:00 2001
From: Julianus Pfeuffer
Date: Sun, 18 Oct 2020 11:33:56 +0200
Subject: [PATCH 374/374] no bare url

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index c2520ca..c2dbc3c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -24,7 +24,7 @@

### `Known issues`

-If you experience nextflow running forever after a failed step, try setting errorStrategy = terminate. See https://github.com/nextflow-io/nextflow/issues/1457
+If you experience nextflow running forever after a failed step, try setting errorStrategy = terminate. See the corresponding [nextflow issue](https://github.com/nextflow-io/nextflow/issues/1457).

### `Fixed`