Add trimming, fastqc, software versions, email notifications from nf-core #14

Open
wants to merge 36 commits into
base: dev
36 commits
ef80511
Add trimming, fastqc, software versions, email notifications
olgabot Apr 29, 2019
f67cda6
Change order of test
olgabot Apr 29, 2019
4ed19f0
Add workflow summary from nf-core
olgabot Apr 29, 2019
32b5673
Add configuration folder
olgabot Apr 29, 2019
b2a7805
add resource requirements for fastp
olgabot Apr 29, 2019
b4aae64
nf-core/nf-large-assembly --> czbiohub/nf-kmer-simlarity
olgabot Apr 29, 2019
c5ec8cb
Add manifest
olgabot Apr 29, 2019
9fc69c1
Add fastp, fastqc, multiqc, R to dockerfile
olgabot Apr 29, 2019
0567c46
Set paramteer defaults
olgabot Apr 29, 2019
90c205e
use separate containers for processes as they are smaller
olgabot Apr 29, 2019
ab7539b
Use czbiohub/nf-kmer-similiarty container
olgabot Apr 30, 2019
353fce6
Add default outdir
olgabot Apr 30, 2019
a408777
Use czbiohub/nf-kmer-similiarty container for ALL
olgabot May 1, 2019
71fcb60
use docker profile for local testing
olgabot May 1, 2019
22f2c13
Copy utility scripts to docker image
olgabot May 1, 2019
2317840
add more padding to visual output for workflow summary
olgabot May 2, 2019
61c2294
Add assets e.g. email template
olgabot May 3, 2019
fe18699
Update conda, init conda
olgabot May 3, 2019
e255df8
use modern conda activate
olgabot May 3, 2019
e6126c6
Protect glob with quotes'
olgabot May 3, 2019
1135c7c
Always use czbiohub/nf-kmer-similarity container
olgabot May 3, 2019
93db8be
use conda to install khmer and sourmash dependencies
olgabot May 3, 2019
a7f44c4
Use just docker mode for testing
olgabot May 3, 2019
2a715d3
use just 'r' for environment name'
olgabot May 3, 2019
06ae43d
Install everything using conda
olgabot May 6, 2019
5af176e
No activating environments
olgabot May 6, 2019
1b9a2c7
install gcc
olgabot May 7, 2019
a5afb91
Don't require PRs to master from dev
olgabot May 21, 2019
739b4da
Add test with docker
olgabot May 21, 2019
a613853
Don't lint pipeline code
olgabot May 21, 2019
387c23c
add trimming to tests
olgabot May 21, 2019
9d010ac
initial commit of making trimming an option
olgabot May 21, 2019
24df9bc
Unassign read files trimming
olgabot May 21, 2019
16bd7a2
Add output.md file
olgabot May 21, 2019
1a23de1
fix output of trimmed reads
olgabot Jun 7, 2019
f479289
don't change folder
olgabot Jun 28, 2019
6 changes: 1 addition & 5 deletions .travis.yml
@@ -8,8 +8,6 @@ matrix:
  fast_finish: true

  before_install:
- # PRs to master are only ok if coming from dev branch
- - '[ $TRAVIS_PULL_REQUEST = "false" ] || [ $TRAVIS_BRANCH != "master" ] || ([ $TRAVIS_PULL_REQUEST_SLUG = $TRAVIS_REPO_SLUG ] && [ $TRAVIS_PULL_REQUEST_BRANCH = "dev" ])'
  # Pull the docker image first so the test doesn't wait for this
  - docker pull czbiohub/nf-kmer-similarity
  # Fake the tag locally so that the pipeline runs properly
@@ -23,14 +21,12 @@ install:
  # Install nf-core/tools
  - pip install nf-core
  # Reset
- - mkdir ${TRAVIS_BUILD_DIR}/tests && cd ${TRAVIS_BUILD_DIR}/tests
+ # - mkdir ${TRAVIS_BUILD_DIR}/tests && cd ${TRAVIS_BUILD_DIR}/tests

  env:
  - NXF_VER='19.03.0-edge' # Specify a minimum NF version that should be tested and work
  - NXF_VER='' # Plus: get the latest NF version and check that it works

  script:
- # Lint the pipeline code
- - nf-core lint ${TRAVIS_BUILD_DIR}
  # Run the pipeline with the test profile
  - nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker
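
Note on the removed `before_install` guard: the one-line shell test passes when the build is not a pull request, when the target branch is not `master`, or when the pull request comes from the repository's own `dev` branch. The Python sketch below restates that boolean logic for readability only. The function name `pr_allowed` and its parameter names are hypothetical; they stand in for the Travis environment variables named in the comments.

```python
def pr_allowed(pull_request: str, branch: str, pr_slug: str,
               repo_slug: str, pr_branch: str) -> bool:
    """Restate the removed Travis guard as a boolean expression.

    pull_request -> $TRAVIS_PULL_REQUEST ("false" or the PR number)
    branch       -> $TRAVIS_BRANCH (for PR builds, the target branch)
    pr_slug      -> $TRAVIS_PULL_REQUEST_SLUG
    repo_slug    -> $TRAVIS_REPO_SLUG
    pr_branch    -> $TRAVIS_PULL_REQUEST_BRANCH
    """
    not_a_pull_request = pull_request == "false"
    not_targeting_master = branch != "master"
    from_own_dev_branch = pr_slug == repo_slug and pr_branch == "dev"
    return not_a_pull_request or not_targeting_master or from_own_dev_branch
```

With the guard deleted, a pull request to `master` from any branch or fork is no longer rejected at this step.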
32 changes: 20 additions & 12 deletions Dockerfile
@@ -1,53 +1,61 @@
FROM continuumio/anaconda3
FROM nfcore/base
MAINTAINER [email protected]

# Suggested tags from https://microbadger.com/labels
ARG VCS_REF
LABEL org.label-schema.vcs-ref=$VCS_REF \
org.label-schema.vcs-url="e.g. https://github.com/czbiohub/nf-kmer-similarity"
org.label-schema.vcs-url="https://github.com/czbiohub/nf-kmer-similarity"


WORKDIR /home

USER root

# Add user "main" because that's what is expected by this image
RUN useradd -ms /bin/bash main
# RUN useradd -ms /bin/bash main


ENV PACKAGES zlib1g git g++ make ca-certificates gcc zlib1g-dev libc6-dev procps

### don't modify things below here for version updates etc.

WORKDIR /home

ENV PACKAGES zlib1g git g++ make ca-certificates gcc zlib1g-dev libc6-dev procps
RUN apt-get update && \
apt-get install -y --no-install-recommends ${PACKAGES} && \
apt-get clean

RUN conda install --yes Cython bz2file pytest numpy matplotlib scipy sphinx alabaster
# Set always yes
RUN conda config --set always_yes yes --set changeps1 no

RUN cd /home && \
git clone https://github.com/dib-lab/khmer.git -b master && \
cd khmer && \
python3 setup.py install
RUN conda config --add channels conda-forge
RUN conda config --add channels bioconda

RUN conda update --yes -n base -c defaults conda && conda init $(basename $SHELL) && exec $SHELL
# Use conda to install khmer and sourmash scientific python dependencies
# RUN conda install --yes Cython bz2file pytest numpy matplotlib scipy sphinx alabaster khmer

COPY environment.yml /
RUN conda env create -f /environment.yml && conda clean -a
ENV PATH /opt/conda/envs/czbiohub-nf-kmer-similarity-0.1/bin:$PATH

# Check that khmer was installed properly
RUN trim-low-abund.py --help
RUN trim-low-abund.py --version

RUN conda install --channel bioconda --yes sourmash

# Required for multiprocessing of 10x bam file
# RUN pip install pathos bamnostic

# ENV SOURMASH_VERSION master
RUN cd /home && \
git clone https://github.com/dib-lab/sourmash.git && \
cd sourmash && \
python3 setup.py install
pip install .

RUN which -a python3
RUN python3 --version
RUN sourmash info
COPY docker/sysctl.conf /etc/sysctl.conf

# Copy utility scripts to docker image
COPY bin/* /usr/local/bin/
21 changes: 13 additions & 8 deletions Makefile
@@ -22,39 +22,44 @@ run_ndnd_local:
sudo nextflow run main.nf -work-dir ${HOME}/pure-scratch/nextflow/ \
-process.executor='local'

test_docker:
nextflow run -profile test,docker main.nf -ansi-log false
nextflow run -profile test,docker main.nf --no_trimming

test_sra:
nextflow run main.nf --sra "SRP016501" -profile local \
nextflow run main.nf --sra "SRP016501"\
--ksizes 11 \
--log2_sketch_sizes 2 \
--molecules dna
--molecules dna \
-profile docker,local


test_samplescsv:
nextflow run main.nf --ksizes 3,9 --log2_sketch_sizes 2,3 \
--outdir testing-output/samplescsv/ \
--molecules dna,protein \
--samples testing/samples.csv \
-profile local
-profile docker

test_read_pairs:
nextflow run main.nf \
-latest \
--ksizes 3,9 \
--log2_sketch_sizes 2,4 \
--molecules dna,protein \
--read_pairs testing/fastqs/*{1,2}.fastq.gz \
-profile local
--read_pairs 'testing/fastqs/*{1,2}.fastq.gz' \
-profile docker -dump-channels

test_fastas:
nextflow run main.nf \
--ksizes 3,9 \
--log2_sketch_sizes 2,4 \
--molecules dna,protein \
--fastas testing/fastas/*.fasta \
-profile local
--fastas 'testing/fastas/*.fasta' \
-profile docker


test: test_sra test_samplescsv test_read_pairs test_fastas
test: test_read_pairs test_fastas test_samplescsv test_sra
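
Note on the quoted globs in `test_read_pairs` and `test_fastas`: without quotes, the shell expands `*.fasta` before `nextflow` starts, so the option receives only the first matching file and the rest become stray arguments. With quotes, the pattern reaches the program intact, presumably so that the pipeline can expand it itself. The Python sketch below illustrates the difference using `sh` and `printf`. The helper names and the files `a.fasta` and `b.fasta` are hypothetical, and it assumes a POSIX `sh` is on the `PATH`.

```python
import os
import subprocess
import tempfile


def argv_after_shell(command_tail: str, cwd: str) -> list[str]:
    """Let sh process `command_tail` and return one entry per resulting argument."""
    result = subprocess.run(
        ["sh", "-c", f"printf '%s\\n' {command_tail}"],
        cwd=cwd, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def demo() -> tuple[list[str], list[str]]:
    """Compare an unquoted and a quoted glob in a directory with two FASTA files."""
    with tempfile.TemporaryDirectory() as workdir:
        for name in ("a.fasta", "b.fasta"):
            open(os.path.join(workdir, name), "w").close()
        unquoted = argv_after_shell("--fastas *.fasta", workdir)
        quoted = argv_after_shell("--fastas '*.fasta'", workdir)
    return unquoted, quoted
```

In the unquoted case the program sees `--fastas` followed by two separate file names; in the quoted case it sees `--fastas` followed by the single literal pattern.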



Binary file added assets/NGI_logo.png
Binary file added assets/SciLifeLab_logo.png
11 changes: 11 additions & 0 deletions assets/biotypes_header.txt
@@ -0,0 +1,11 @@
# id: 'biotype-counts'
# section_name: 'Biotype Counts'
# description: "shows reads overlapping genomic features of different biotypes,
# counted by <a href='http://bioinf.wehi.edu.au/featureCounts'>featureCounts</a>."
# plot_type: 'bargraph'
# anchor: 'featurecounts_biotype'
# pconfig:
# id: "featureCounts_biotype_plot"
# title: "featureCounts: Biotypes"
# xlab: "# Reads"
# cpswitch_counts_label: "Number of Reads"
71 changes: 71 additions & 0 deletions assets/email_template.html
@@ -0,0 +1,71 @@
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">

<meta name="description" content="czbiohub/rnaseq: a bioinformatics best-practice analysis pipeline used for RNA sequencing data at the National Genomics Infrastructure at SciLifeLab Stockholm, Sweden.">
<title>czbiohub/rnaseq Pipeline Report</title>
</head>
<body>
<div style="font-family: Helvetica, Arial, sans-serif; padding: 30px; max-width: 800px; margin: 0 auto;">

<img src="cid:czbiohubrnaseqlogo">

<h1>czbiohub/rnaseq: version ${version}</h1>
<h2>Run Name: $runName</h2>

<% if (!success){
out << """
<div style="color: #a94442; background-color: #f2dede; border-color: #ebccd1; padding: 15px; margin-bottom: 20px; border: 1px solid transparent; border-radius: 4px;">
<h4 style="margin-top:0; color: inherit;">czbiohub/rnaseq execution completed unsuccessfully!</h4>
<p>The exit status of the task that caused the workflow execution to fail was: <code>$exitStatus</code>.</p>
<p>The full error message was:</p>
<pre style="white-space: pre-wrap; overflow: visible; margin-bottom: 0;">${errorReport}</pre>
</div>
"""
} else if(skipped_poor_alignment.size() > 0) {
out << """
<div style="color: #856404; background-color: #fff3cd; border-color: #ffeeba; padding: 15px; margin-bottom: 20px; border: 1px solid transparent; border-radius: 4px;">
<h4 style="margin-top:0; color: inherit;">czbiohub/rnaseq execution completed with warnings!</h4>
<p>The pipeline finished successfully, but the following samples were skipped due to very low alignment (&lt; 5%):</p>
<ul>
<li><code>${skipped_poor_alignment.join('</code></li><li><code>')}</code></li>
</ul>
<p>
</div>
"""
} else {
out << """
<div style="color: #3c763d; background-color: #dff0d8; border-color: #d6e9c6; padding: 15px; margin-bottom: 20px; border: 1px solid transparent; border-radius: 4px;">
czbiohub/rnaseq execution completed successfully!
</div>
"""
}
%>

<p>The workflow was completed at <strong>$dateComplete</strong> (duration: <strong>$duration</strong>)</p>
<p>The command used to launch the workflow was as follows:</p>
<pre style="white-space: pre-wrap; overflow: visible; background-color: #ededed; padding: 15px; border-radius: 4px; margin-bottom:30px;">$commandLine</pre>

<h3>Pipeline Configuration:</h3>
<table style="width:100%; max-width:100%; border-spacing: 0; border-collapse: collapse; border:0; margin-bottom: 30px;">
<tbody style="border-bottom: 1px solid #ddd;">
<% out << summary.collect{ k,v -> "<tr><th style='text-align:left; padding: 8px 0; line-height: 1.42857143; vertical-align: top; border-top: 1px solid #ddd;'>$k</th><td style='text-align:left; padding: 8px; line-height: 1.42857143; vertical-align: top; border-top: 1px solid #ddd;'><pre style='white-space: pre-wrap; overflow: visible;'>$v</pre></td></tr>" }.join("\n") %>
</tbody>
</table>

<p>czbiohub/rnaseq is a bioinformatics best-practice analysis pipeline used for RNA sequencing data at the National Genomics Infrastructure at SciLifeLab Stockholm, Sweden.</p>
<p>The pipeline uses Nextflow, a bioinformatics workflow tool. It pre-processes raw data from FastQ inputs, aligns the reads and performs extensive quality-control on the results.</p>
<p>For more information, please see the pipeline homepage: <a href="https://github.com/czbiohub/rnaseq">https://github.com/czbiohub/rnaseq</a></p>

<hr style="height: 3px; padding: 0; margin: 24px 0; background-color: #e1e4e8; border: 0;">

<img src="cid:scilifelablogo">
<img src="cid:ngilogo">

</div>

</body>
</html>
62 changes: 62 additions & 0 deletions assets/email_template.txt
@@ -0,0 +1,62 @@
========================================
czbiohub/rnaseq: version ${version}
========================================
Run Name: $runName

<% if (success){
out << "## czbiohub/rnaseq execution completed successfully! ##"
} else {
out << """####################################################
## czbiohub/rnaseq execution completed unsuccessfully! ##
####################################################
The exit status of the task that caused the workflow execution to fail was: $exitStatus.
The full error message was:

${errorReport}
"""
} %>


<% if (!success){
out << """####################################################
## czbiohub/rnaseq execution completed unsuccessfully! ##
####################################################
The exit status of the task that caused the workflow execution to fail was: $exitStatus.
The full error message was:

${errorReport}
"""
} else if(skipped_poor_alignment.size() > 0) {
out << """##################################################
## czbiohub/rnaseq execution completed with warnings ##
##################################################
The pipeline finished successfully, but the following samples were skipped,
due to very low alignment (less than 5%):

- ${skipped_poor_alignment.join("\n - ")}
"""
} else {
out << "## czbiohub/rnaseq execution completed successfully! ##"
}
%>




The workflow was completed at $dateComplete (duration: $duration)

The command used to launch the workflow was as follows:

$commandLine



Pipeline Configuration:
-----------------------
<% out << summary.collect{ k,v -> " - $k: $v" }.join("\n") %>


--
czbiohub/rnaseq is a bioinformatics best-practice analysis pipeline used for RNA sequencing data at the National Genomics Infrastructure at SciLifeLab Stockholm, Sweden.
The pipeline uses Nextflow, a bioinformatics workflow tool. It pre-processes raw data from FastQ inputs, aligns the reads and performs extensive quality-control on the results.
For more information, please see the pipeline homepage: https://github.com/czbiohub/rnaseq
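
Note on the status block in the email templates: the `if (!success)` / `else if (skipped_poor_alignment.size() > 0)` / `else` chain picks exactly one of three outcomes, and failure takes precedence over the skipped-sample warning. The Python sketch below restates that selection. The function name `completion_status` and the returned labels are hypothetical and are not part of the templates.

```python
def completion_status(success: bool, skipped_poor_alignment: list[str]) -> str:
    """Return which of the three template branches would be rendered."""
    if not success:
        # Failure wins even if some samples were also skipped.
        return "unsuccessful"
    if len(skipped_poor_alignment) > 0:
        # The run finished, but at least one sample was skipped.
        return "warnings"
    return "successful"
```

Because the plain-text template also contains an earlier two-way `if (success)` block, a rendered text email would show a status message twice.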
11 changes: 11 additions & 0 deletions assets/heatmap_header.txt
@@ -0,0 +1,11 @@
#id: 'sample-similarity'
#section_name: 'edgeR: Sample Similarity'
#description: "is generated from normalised gene counts through
# <a href='https://bioconductor.org/packages/release/bioc/html/edgeR.html' target='_blank'>edgeR</a>.
# Euclidean distances between log<sub>2</sub> normalised CPM values are then calculated and clustered."
#plot_type: 'heatmap'
#anchor: 'ngi_rnaseq-sample_similarity'
#pconfig:
# title: 'edgeR: Euclidean distances'
# xlab: True
# reverseColors: True
11 changes: 11 additions & 0 deletions assets/mdsplot_header.txt
@@ -0,0 +1,11 @@
#id: 'edgeR-sample-distances'
#section_name: 'MDS Plot'
#description: "show relatedness between samples in a project.
# These values are calculated using <a href='https://bioconductor.org/packages/release/bioc/html/edgeR.html'>edgeR</a>
# in the <a href='https://github.com/czbiohub/rnaseq/blob/master/bin/edgeR_heatmap_MDS.r'><code>edgeR_heatmap_MDS.r</code></a> script."
#plot_type: 'scatter'
#anchor: 'ngi_rnaseq-mds_plot'
#pconfig:
# xlab: 'Leading'
# title: 'MDS Plot'
# ylab: 'logFC'
20 changes: 20 additions & 0 deletions assets/multiqc_config.yaml
@@ -0,0 +1,20 @@
extra_fn_clean_exts:
- _R1
- _R2
- .hisat
report_comment: >
This report has been generated by the <a href="https://github.com/czbiohub/rnaseq" target="_blank">czbiohub/rnaseq</a>
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://github.com/czbiohub/rnaseq/blob/master/docs/output.md" target="_blank">documentation</a>.
swedac_accredited: true
top_modules:
- 'edgeR-sample-distances'
- 'sample-similarity'
- 'DupRadar'
- 'biotype-counts'

report_section_order:
software_versions:
order: -1000
czbiohub-rnaseq-summary:
order: -1100
Binary file added assets/nfcore-rnaseq_logo.png
53 changes: 53 additions & 0 deletions assets/sendmail_template.txt
@@ -0,0 +1,53 @@
To: $email
Subject: $subject
Mime-Version: 1.0
Content-Type: multipart/related;boundary="czbiohubmimeboundary"

--czbiohubmimeboundary
Content-Type: text/html; charset=utf-8

$email_html

--czbiohubmimeboundary
Content-Type: image/png;name="czbiohub-rnaseq_logo.png"
Content-Transfer-Encoding: base64
Content-ID: <czbiohubrnaseqlogo>
Content-Disposition: inline; filename="czbiohub-rnaseq_logo.png"

<% out << new File("$baseDir/assets/czbiohub-rnaseq_logo.png").
bytes.
encodeBase64().
toString().
tokenize( '\n' )*.
toList()*.
collate( 76 )*.
collect { it.join() }.
flatten().
join( '\n' ) %>

<%
if (mqcFile){
def mqcFileObj = new File("$mqcFile")
if (mqcFileObj.length() < mqcMaxSize){
out << """
--czbiohubmimeboundary
Content-Type: text/html; name=\"multiqc_report\"
Content-Transfer-Encoding: base64
Content-ID: <mqcreport>
Content-Disposition: attachment; filename=\"${mqcFileObj.getName()}\"

${mqcFileObj.
bytes.
encodeBase64().
toString().
tokenize( '\n' )*.
toList()*.
collate( 76 )*.
collect { it.join() }.
flatten().
join( '\n' )}
"""
}}
%>

--czbiohubmimeboundary--
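
Note on the attachment encoding in `sendmail_template.txt`: the Groovy chain (`encodeBase64()`, `collate( 76 )`, `join( '\n' )`) appears to base64-encode the file and break the result into lines of at most 76 characters, which is the line-length limit MIME sets for base64 bodies. The Python sketch below shows the same idea as an illustration only. The function name `encode_attachment` is hypothetical and is not part of the pipeline.

```python
import base64


def encode_attachment(data: bytes, width: int = 76) -> str:
    """Base64-encode `data` and wrap the text at `width` characters per line."""
    encoded = base64.b64encode(data).decode("ascii")
    # Slice the single long string into fixed-width chunks, one per line.
    chunks = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return "\n".join(chunks)
```

A decoder that ignores line breaks, such as `base64.b64decode` with its default settings, recovers the original bytes from the wrapped text.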