Skip to content

Commit

Permalink
Update env yaml to create self-contained FluViewer install in one step (
Browse files Browse the repository at this point in the history
#36)

* Update env yaml and README

* updates
  • Loading branch information
dfornika authored Jul 6, 2024
1 parent 04be147 commit 455ae76
Show file tree
Hide file tree
Showing 2 changed files with 21 additions and 13 deletions.
30 changes: 17 additions & 13 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -86,22 +86,17 @@ flowchart TD
1. Create a virtual environment and install the necessary dependencies using the YAML file provided in this repository. For example, if using conda:

```
conda create -n FluViewer -f environment.yaml
conda env create -n fluviewer -f environment.yaml
```

2. Activate the FluViewer environment created in the previous step. For example, if using conda:
...or using mamba:

```
conda activate FluViewer
mamba env create -n fluviewer -f environment.yaml
```

3. Install the latest version of FluViewer from this repo.

```
pip3 install git+https://github.com/BCCDC-PHL/FluViewer.git
```

4. Download and unzip the default FluViewer DB (FluViewer_db.fa.gz) provided in [the BCCDC-PHL/FluViewer-db](https://github.com/BCCDC-PHL/FluViewer-db) repository. Custom DBs can be created and used as well (instructions below).
2. Download and unzip the default FluViewer DB (FluViewer_db.fa.gz) provided in the [BCCDC-PHL/FluViewer-db](https://github.com/BCCDC-PHL/FluViewer-db) repository.
Custom DBs can be created and used as well (instructions below).

## Usage

Expand Down Expand Up @@ -155,10 +150,13 @@ optional arguments:
## FluViewer Database

FluViewer requires a curated FASTA file "database" of IAV reference sequences. Headers for these sequences must be formatted and annotated as follows:

```
>unique_id|strain_name(strain_subtype)|sequence_segment|sequence_subtype
```

Here are some example entries:

```
>CY230322|A/Washington/32/2017(H3N2)|PB2|none
TCAATTATATTCAGCATGGAAAGAATAAAAGAACTACGGAATCTAATGTCGCAGTCTCGCACTCGCGA...
Expand All @@ -169,13 +167,18 @@ CAAAAGCAACAAAAATGAAGGCAATACTAGTAGTTCTGCTATATACATTTACAACCGCAAATGCAGACA...
>MH669720|A/Iowa/52/2018(H3N2)|NA|N2
AGGAAAGATGAATCCAAATCAAAAGATAATAACGATTGGCTCTGTTTCTCTCACCATTTCCACAATATG...
```
For HA and NA segments, strain_subtype should reflect the HA and NA subtypes of the isolate (eg H1N1), but sequence_subtype should only indicate the HA or NA subtype of the segment sequence of the entry (eg H1 for an HA sequence or N1 for an NA sequence).

For internal segments (i.e. PB2, PB1, PA, NP, M, and NS), strain_subtype should reflect the HA/NA subtypes of the isolate, but 'none' should be entered for sequence_subtype. If strain_subtype is unknown, 'none' should be entered there as well.
For HA and NA segments, strain_subtype should reflect the HA and NA subtypes of the isolate (eg H1N1), but sequence_subtype should only
indicate the HA or NA subtype of the segment sequence of the entry (eg H1 for an HA sequence or N1 for an NA sequence).

For internal segments (i.e. PB2, PB1, PA, NP, M, and NS), strain_subtype should reflect the HA/NA subtypes of the isolate, but 'none'
should be entered for sequence_subtype. If strain_subtype is unknown, 'none' should be entered there as well.

FluViewer will only accept reference sequences composed entirely of uppercase canonical nucleotides (i.e. A, T, G, and C).

During analysis, FluViewer will check if a BLAST database has been built based on the fasta file that is supplied with the `-d` (or `--db`) flag, by looking for the `.nhr`, `.nin` and `.nsq` BLAST database files associated with the fasta database. If any of those files are not found, the BLAST database will be built using `makeblastdb`. FluViewer expects that it will be able to write those files alongside the fasta database when this occurs.
During analysis, FluViewer will check if a BLAST database has been built based on the fasta file that is supplied with the `-d` (or `--db`) flag,
by looking for the `.nhr`, `.nin` and `.nsq` BLAST database files associated with the fasta database. If any of those files are not found,
the BLAST database will be built using `makeblastdb`. FluViewer expects that it will be able to write those files alongside the fasta database when this occurs.

## FluViewer Output

Expand All @@ -187,6 +190,7 @@ FluViewer generates four main output files for each library:
4. Depth of coverage plots for each segment: `<out_name>_depth_of_cov.png`

Headers in the FASTA file have the following format:

```
>output_name|segment|subject
```
Expand Down
4 changes: 4 additions & 0 deletions environment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,5 +12,9 @@ dependencies:
- spades=3.15.3
- clustalw=2.1
- freebayes=1.3.6
- python=3
- pip
- pandas=2.0.3
- seaborn=0.12.2
- pip:
- git+https://github.com/BCCDC-PHL/[email protected]

0 comments on commit 455ae76

Please sign in to comment.