fix: Illumina PE read headers for IRMA
peterk87 committed Nov 1, 2024
1 parent 10bb2e1 commit e02722a
Showing 3 changed files with 22 additions and 5 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,10 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [[3.5.3](https://github.com/CFIA-NCFAD/nf-flu/releases/tag/3.5.3)] - 2024-11-01
+
+This patch release fixes an issue ([#22](https://github.com/peterk87/nf-flu/issues/22)) with Illumina paired-end read analysis by IRMA producing empty consensus sequences when the forward and reverse reads do not contain "1:N:0:." or "2:N:0:." in the FASTQ header lines.
+
## [[3.5.2](https://github.com/CFIA-NCFAD/nf-flu/releases/tag/3.5.2)] - 2024-10-18

This patch release fixes a few issues when running the pipeline.
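To check whether a sample's reads lack the read-number field described above, one can count the FASTQ records whose header has no `1:N:` or `2:N:` field. The sketch below is a hypothetical helper (`count_headers_missing_read_number` is not part of nf-flu); it assumes strict 4-line FASTQ records and uses the same `[12]:N:` pattern as the module.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper, not part of nf-flu.
# Reads uncompressed FASTQ on stdin and prints how many records have a header
# line without the "1:N:" or "2:N:" read-number field that IRMA relies on.
# Assumes strict 4-line records (header, sequence, "+", qualities), so only
# line 1 of every record (NR % 4 == 1) is inspected.
count_headers_missing_read_number() {
  awk 'NR % 4 == 1 && $0 !~ /[12]:N:/ { n++ } END { print n + 0 }'
}
```

For example, `zcat sample_R1.fastq.gz | count_headers_missing_read_number` (the file name is a placeholder) prints `0` when every header already carries the field; a non-zero count means those headers depend on the header rewrite in `CAT_ILLUMINA_FASTQ`.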
21 changes: 17 additions & 4 deletions modules/local/cat_illumina_fastq.nf
@@ -59,34 +59,47 @@ process CAT_ILLUMINA_FASTQ {
// append 1:N:0:. or 2:N:0:. to forward and reverse reads if "[12]:N:.*"
// not present in the FASTQ header for compatability with IRMA assembly
"""
+function modify_fastq_header() {
+local replacement="\$1"
+awk -v repl="\$replacement" '
+NR % 4 == 1 {
+# Only process the first line of each 4-line block
+if (\$0 ~ /^@/ && \$0 !~ /[12]:N:.*/) {
+sub(/\\s*\$/, " " repl ":N:0:."); # Append " <replacement>:N:0:."
+}
+}
+{ print }
+'
+}
touch ${prefix}_1.merged.fastq.gz
touch ${prefix}_2.merged.fastq.gz
if [[ ${read1.size} > 0 ]]; then
cat ${read1.join(' ')} \\
-| perl -ne 'if (\$_ =~ /^@.* .*/ && !(\$_ =~ /^@.* [12]:N:.*/)){ chomp \$_; print "\$_ 1:N:0:.\n"; } else { print "\$_"; }' \\
+| modify_fastq_header 1 \\
| gzip -ck \\
>> ${prefix}_1.merged.fastq.gz
fi
if [[ ${read1gz.size} > 0 ]]; then
zcat ${read1gz.join(' ')} \\
-| perl -ne 'if (\$_ =~ /^@.* .*/ && !(\$_ =~ /^@.* [12]:N:.*/)){ chomp \$_; print "\$_ 1:N:0:.\n"; } else { print "\$_"; }' \\
+| modify_fastq_header 1 \\
| gzip -ck \\
>> ${prefix}_1.merged.fastq.gz
fi
if [[ ${read2.size} > 0 ]]; then
cat ${read2.join(' ')} \\
-| perl -ne 'if (\$_ =~ /^@.* .*/ && !(\$_ =~ /^@.* [12]:N:.*/)){ chomp \$_; print "\$_ 2:N:0:.\n"; } else { print "\$_"; }' \\
+| modify_fastq_header 2 \\
| gzip -ck \\
>> ${prefix}_2.merged.fastq.gz
fi
if [[ ${read2gz.size} > 0 ]]; then
zcat ${read2gz.join(' ')} \\
-| perl -ne 'if (\$_ =~ /^@.* .*/ && !(\$_ =~ /^@.* [12]:N:.*/)){ chomp \$_; print "\$_ 2:N:0:.\n"; } else { print "\$_"; }' \\
+| modify_fastq_header 2 \\
| gzip -ck \\
>> ${prefix}_2.merged.fastq.gz
fi
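The new `modify_fastq_header` function can be tried outside Nextflow. The sketch below is the same awk program with the Nextflow escaping removed (`\$` becomes `$`) and one assumed change: `\s` is documented as a GNU awk extension, so the bracket expression `[ \t]` (space or tab) stands in for it here to keep the sketch usable with other awk implementations. Compared with the removed Perl one-liner, note that the Perl filter only rewrote `@` lines that already contained a space (`/^@.* .*/`), so a bare header such as `@read1` passed through unchanged; the awk version appends the field to the first line of every 4-line record that lacks `[12]:N:`.

```shell
#!/usr/bin/env bash
# Sketch of the header rewrite added in this commit, run outside Nextflow.
# Assumption: [ \t] replaces the GNU-awk-only \s used in the module.
# Usage: modify_fastq_header <1|2> < reads.fastq > fixed.fastq
modify_fastq_header() {
  local replacement="$1"
  awk -v repl="$replacement" '
    NR % 4 == 1 {
      # Header line of each 4-line record: unless a "1:N:" or "2:N:" field
      # is already present, strip trailing spaces/tabs and append
      # " <repl>:N:0:."
      if ($0 ~ /^@/ && $0 !~ /[12]:N:/) {
        sub(/[ \t]*$/, " " repl ":N:0:.")
      }
    }
    # Print every line (modified header or untouched sequence/quality line)
    { print }
  '
}
```

For example, piping a record whose header is `@M00001:1:FC:1:1101:100:200` through `modify_fastq_header 1` yields the header `@M00001:1:FC:1:1101:100:200 1:N:0:.`, while a header that already ends in `2:N:0:ACGT` is left unchanged regardless of the argument.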
2 changes: 1 addition & 1 deletion nextflow.config
@@ -155,7 +155,7 @@ manifest {
description = 'Influenza A virus genome assembly pipeline'
homePage = 'https://github.com/CFIA-NCFAD/nf-flu'
author = 'Peter Kruczkiewicz, Hai Nguyen'
-version = '3.5.2'
+version = '3.5.3'
nextflowVersion = '!>=22.10.1'
mainScript = 'main.nf'
doi = '10.5281/zenodo.13892044'