Add concat fastqs from SRA manifest #227

Merged
merged 20 commits into from
Oct 23, 2023
@@ -0,0 +1,15 @@
version: 1.2
workflows:
- name: main
  subclass: Galaxy
  publish: true
  primaryDescriptorPath: /sra-manifest-to-concatenated-fastqs.ga
  testParameterFiles:
  - /sra-manifest-to-concatenated-fastqs-tests.yml
  authors:
  - name: Lucille Delisle
    orcid: 0000-0002-1964-4960
  - name: Pierre Osteil
    orcid: 0000-0002-5832-6703
  - name: Wolfgang Maier
    orcid: 0000-0002-9464-6640
@@ -0,0 +1,5 @@
version: '0.1'
registries:
- url: https://workflowhub.eu
  project: iwc
  workflow: sra-manifest-to-concatenated-fastqs/main
@@ -0,0 +1,4 @@
# Changelog

## [0.1] 2023-10-09
First release.
@@ -0,0 +1,26 @@
# SRA manifest to concatenated fastqs

This workflow takes as input an SRA manifest from SRA Run Selector (or any tabular file with a header line) and generates a single fastq or a pair of fastqs per sample, where samples are identified by a column of the manifest chosen by the user.

## Input dataset

- The workflow needs a single input: a tabular file with one column containing the SRA IDs and one column containing the sample identifiers. The tabular file must have a header line.

## Input values

- Column number with SRA ID: the number of the column containing the SRA IDs (usually column 1).
- Column number with final identifier: the number of the column whose values are used to label the collection items (one item per sample).
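The two input values are 1-based column numbers into a tabular file whose first line is a header. A minimal sketch of how such a manifest could be read, assuming it is tab-separated (the function name `read_manifest` and the sample data are hypothetical, not part of the workflow):

```python
import csv
import io


def read_manifest(text: str, sra_col: int, id_col: int) -> list[tuple[str, str]]:
    """Return (SRA ID, sample identifier) pairs from a tab-separated manifest.

    Column numbers are 1-based, like the workflow's input values, and the
    first line is skipped because the manifest must have a header line.
    """
    # QUOTE_NONE treats quote characters literally, as in a Galaxy tabular.
    rows = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    next(rows, None)  # drop the header line
    pairs = []
    for row in rows:
        if not row:
            continue  # ignore empty lines
        pairs.append((row[sra_col - 1], row[id_col - 1]))
    return pairs


# Hypothetical two-run manifest where both runs belong to the same sample.
manifest = "Run\tSample Name\nRUN_A\tsample 1\nRUN_B\tsample 1\n"
print(read_manifest(manifest, sra_col=1, id_col=2))
```

Several runs can share one sample identifier; those are the runs whose fastqs end up concatenated.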

## Processing

- The workflow downloads fastqs with fasterq-dump (one job per SRA ID).
- Fastqs belonging to the same sample are then concatenated.
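The concatenation step can be illustrated in plain Python: runs are grouped by sample identifier in manifest order, and the fastqs of a sample's runs are appended byte by byte. This is a sketch of the idea only, not what the Galaxy workflow executes internally; `group_runs_by_sample` and `concatenate_fastqs` are hypothetical names. Plain byte concatenation also works for gzip-compressed fastqs, because a sequence of gzip members is itself a valid gzip file.

```python
import shutil
import tempfile
from collections import defaultdict
from pathlib import Path


def group_runs_by_sample(pairs):
    """Map each sample identifier to its SRA IDs, in manifest order."""
    groups = defaultdict(list)
    for sra_id, sample in pairs:
        groups[sample].append(sra_id)
    return dict(groups)


def concatenate_fastqs(inputs, output):
    """Append the bytes of every file in `inputs` to `output`, in order."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as handle:
                shutil.copyfileobj(handle, out)


if __name__ == "__main__":
    # Hypothetical runs: two belong to sample S1, one to sample S2.
    print(group_runs_by_sample([("RUN_A", "S1"), ("RUN_B", "S2"), ("RUN_C", "S1")]))

    tmp = Path(tempfile.mkdtemp())
    (tmp / "RUN_A.fastq").write_text("@a\nACGT\n+\nIIII\n")
    (tmp / "RUN_C.fastq").write_text("@c\nTTGA\n+\nIIII\n")
    concatenate_fastqs([tmp / "RUN_A.fastq", tmp / "RUN_C.fastq"], tmp / "S1.fastq")
    print((tmp / "S1.fastq").read_text())
```

For paired-end data the same concatenation would be applied separately to the forward files and to the reverse files, keeping the run order identical in both.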

## Outputs

- There are 2 outputs: one collection of paired-end datasets and one collection of single-read datasets.
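One way to picture the split into two outputs: a sample with two fastqs goes to the paired collection, with elements named `forward` and `reverse` as in the test file, and a sample with one fastq goes to the single-read collection. The sketch below assumes the layout is decided by the number of fastqs per sample; that rule and the name `split_by_layout` are illustrative assumptions, not taken from the workflow.

```python
def split_by_layout(files_per_sample):
    """Split samples into paired-end and single-read groups.

    ASSUMPTION: `files_per_sample` maps a sample identifier to a list of
    fastq paths, two (forward, reverse) for paired-end data and one for
    single reads.
    """
    paired, single = {}, {}
    for sample, files in files_per_sample.items():
        if len(files) == 2:
            paired[sample] = {"forward": files[0], "reverse": files[1]}
        elif len(files) == 1:
            single[sample] = files[0]
        else:
            raise ValueError(f"unexpected number of fastqs for {sample}: {len(files)}")
    return paired, single


# Hypothetical file names.
paired, single = split_by_layout({"S1": ["S1_1.fastq", "S1_2.fastq"], "S2": ["S2.fastq"]})
print(paired)
print(single)
```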

## Warning

- We assume that the sample name does not contain '___', as this string is used to join and later split the SRA ID and the sample name.
- All characters in the sample name that are not letters, digits or '_' are converted to '-'.
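The two warnings can be made concrete with a short sketch of the naming convention: the sample name is sanitized, then joined to the SRA ID with '___', and the combined identifier is split again on the first '___'. This assumes "letters" means ASCII letters and that sanitizing happens before joining; the function names are hypothetical and the actual workflow may perform these steps with different Galaxy tools.

```python
import re

SEPARATOR = "___"  # string used to join an SRA ID and its sample name


def sanitize_sample_name(name: str) -> str:
    """Replace every character that is not an ASCII letter, digit or '_' by '-'."""
    return re.sub(r"[^A-Za-z0-9_]", "-", name)


def join_ids(sra_id: str, sample: str) -> str:
    """Combine an SRA ID and a sanitized sample name into one identifier."""
    clean = sanitize_sample_name(sample)
    if SEPARATOR in clean:
        # '_' survives sanitizing, so '___' in a name would break the split.
        raise ValueError(f"sample name must not contain {SEPARATOR!r}: {sample!r}")
    return f"{sra_id}{SEPARATOR}{clean}"


def split_ids(joined: str) -> tuple[str, str]:
    """Recover (SRA ID, sanitized sample name) from an identifier built by join_ids."""
    sra_id, sample = joined.split(SEPARATOR, 1)
    return sra_id, sample


# Hypothetical SRA ID and sample name.
combined = join_ids("RUN1", "sample 1.A")
print(combined)             # RUN1___sample-1-A
print(split_ids(combined))  # ('RUN1', 'sample-1-A')
```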
@@ -0,0 +1,46 @@
- doc: Test for sra-manifest-to-concatenated-fastqs.ga
  job:
    SRA_manifest:
      class: File
      path: test-data/SRA.txt
    Column number with SRA ID: 1
    Column number with final identifier: 22
  outputs:
    paired_output:
      element_tests:
        GSM461177-:
          element_tests:
            forward:
              asserts:
                has_size:
                  value: 294000000
                  delta: 30000000
            reverse:
              asserts:
                has_size:
                  value: 307000000
                  delta: 30000000
        GSM461178:
          element_tests:
            forward:
              asserts:
                has_size:
                  value: 178000000
                  delta: 10000000
            reverse:
              asserts:
                has_size:
                  value: 205000000
                  delta: 20000000
    single_output:
      element_tests:
        GSM461176:
          asserts:
            has_size:
              value: 139000000
              delta: 10000000
        GSM461179-ID:
          asserts:
            has_size:
              value: 298000000
              delta: 30000000
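To read the `has_size` assertions in the test file above: as we understand Galaxy's assertion semantics, a check with `value` and `delta` passes when the output size in bytes lies within `value - delta` to `value + delta`, bounds included. The sketch below recomputes the accepted ranges from the test's numbers; the helper names are hypothetical and the inclusive bounds are an assumption.

```python
# Expected sizes (value, delta) in bytes, copied from the test file above.
EXPECTED = {
    "paired_output/GSM461177-/forward": (294_000_000, 30_000_000),
    "paired_output/GSM461177-/reverse": (307_000_000, 30_000_000),
    "paired_output/GSM461178/forward": (178_000_000, 10_000_000),
    "paired_output/GSM461178/reverse": (205_000_000, 20_000_000),
    "single_output/GSM461176": (139_000_000, 10_000_000),
    "single_output/GSM461179-ID": (298_000_000, 30_000_000),
}


def size_range(value: int, delta: int) -> tuple[int, int]:
    """Lowest and highest accepted size for a has_size check."""
    return value - delta, value + delta


def passes_has_size(actual: int, value: int, delta: int) -> bool:
    """ASSUMPTION: both bounds are inclusive."""
    low, high = size_range(value, delta)
    return low <= actual <= high


for element, (value, delta) in EXPECTED.items():
    low, high = size_range(value, delta)
    print(f"{element}: {low:,} to {high:,} bytes")
```

Size checks with a tolerance are a pragmatic choice here: re-downloaded fastqs can differ slightly in size, for example with tool version changes, while a gross error such as a missing run would still fall outside the range.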