Commit 4e176e0: docs update
tkchafin committed Nov 19, 2024
1 parent a49bbd3
Showing 1 changed file with 25 additions and 1 deletion: docs/usage.md
The BlobToolKit pipeline can be run in many different ways. The default way requ…

It is a good idea to put a date suffix for each database location so you know at a glance whether you are using the latest version. We are using the `YYYY_MM` format as we do not expect the databases to be updated more frequently than once a month. However, feel free to use `DATE=YYYY_MM_DD` or a different format if you prefer.
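If you prefer not to type the suffix by hand, the current month can be generated with `date`; this is a convenience sketch, not something the pipeline requires:

```bash
# Generate a YYYY_MM suffix from the current date
# (use +%Y_%m_%d instead for a YYYY_MM_DD suffix)
DATE=$(date +%Y_%m)
echo "$DATE"
```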

Note that all input databases may optionally be passed to the pipeline compressed as `.tar.gz`; the pipeline will handle decompression.
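As an illustration, a run could then point the database parameters directly at the archives. The parameter names below are assumptions based on the pipeline's schema; verify them against `nextflow run sanger-tol/blobtoolkit --help` for your installed version:

```bash
# Hypothetical invocation passing the databases as .tar.gz archives;
# parameter names are assumptions - check --help for your version
nextflow run sanger-tol/blobtoolkit \
    --taxdump /path/to/databases/taxdump_2024_10.tar.gz \
    --blastn /path/to/databases/nt_2024_10.tar.gz \
    --busco /path/to/databases/busco_2024_10.tar.gz \
    ...
```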

#### 1. NCBI taxdump database

Create the database directory, retrieve and decompress the NCBI taxonomy:

```bash
DATE=2024_10
TAXDUMP=/path/to/databases/taxdump_${DATE}
TAXDUMP_TAR=/path/to/databases/taxdump_${DATE}.tar.gz
mkdir -p "$TAXDUMP"
curl -L ftp://ftp.ncbi.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.tar.gz -o "$TAXDUMP_TAR"
tar -xzf "$TAXDUMP_TAR" -C "$TAXDUMP"
```

#### 2. NCBI nucleotide BLAST database
Create the database directory and move into the directory:
```bash
DATE=2024_10
NT=/path/to/databases/nt_${DATE}
NT_TAR=/path/to/databases/nt_${DATE}.tar.gz
mkdir -p "$NT"
cd "$NT"
```
```bash
# (the start of the nt download loop is truncated in this diff)
done
wget "https://ftp.ncbi.nlm.nih.gov/blast/db/v5/taxdb.tar.gz" &&
tar -xf taxdb.tar.gz -C "$NT" &&
rm taxdb.tar.gz

# Compress the finished database and remove the uncompressed copy;
# -C keeps the archive rooted at nt_${DATE}/ rather than the full path
cd ..
tar -cvzf "$NT_TAR" -C "$(dirname "$NT")" "$(basename "$NT")"
rm -r "$NT"
```
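Before deleting an uncompressed copy, it can be worth listing the archive to confirm the directory sits at the expected path inside it. A self-contained sketch of that compress-and-verify pattern on a throwaway directory (`nt_demo` is a stand-in for the real `nt_${DATE}` directory):

```bash
# Build a throwaway directory with one file in it
tmp=$(mktemp -d)
mkdir -p "$tmp/nt_demo"
echo "demo" > "$tmp/nt_demo/file.txt"

# -C archives nt_demo/ relative to its parent, so the tarball
# unpacks to nt_demo/ rather than to the full absolute path
tar -czf "$tmp/nt_demo.tar.gz" -C "$tmp" nt_demo

# Only remove the original once the archive lists the expected layout
status="verification failed"
if tar -tzf "$tmp/nt_demo.tar.gz" | grep -q '^nt_demo/file.txt$'; then
    rm -r "$tmp/nt_demo"
    status="archive verified"
fi
echo "$status"
rm -rf "$tmp"
```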

#### 3. UniProt reference proteomes database
Create the database directory and move into the directory:
```bash
DATE=2024_10
UNIPROT=/path/to/databases/uniprot_${DATE}
UNIPROT_TAR=/path/to/databases/uniprot_${DATE}.tar.gz
mkdir -p "$UNIPROT"
cd "$UNIPROT"
```
```bash
diamond makedb -p 16 --in reference_proteomes.fasta.gz --taxonmap reference_prot… # (rest of the command truncated in this diff)
# clean up
mv extract/{README,STATS} .
rm -r extract
rm -r "$TAXDUMP"

# Compress the final database and remove the uncompressed copy
cd ..
tar -cvzf "$UNIPROT_TAR" -C "$(dirname "$UNIPROT")" "$(basename "$UNIPROT")"
rm -r "$UNIPROT"
```

#### 4. BUSCO databases
Create the database directory and move into the directory:
```bash
DATE=2024_10
BUSCO=/path/to/databases/busco_${DATE}
BUSCO_TAR=/path/to/databases/busco_${DATE}.tar.gz
mkdir -p "$BUSCO"
cd "$BUSCO"
```
Expand All @@ -181,6 +199,12 @@ If you have [GNU parallel](https://www.gnu.org/software/parallel/) installed, yo
find v5/data -name "*.tar.gz" | parallel "cd {//}; tar -xzf {/}"
```
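Without GNU parallel, a plain shell loop does the same job sequentially. The sketch below demonstrates it on a throwaway stand-in for `v5/data` so it is self-contained; in practice you would run the `find` over the real `v5/data` directory:

```bash
# Build a small stand-in for v5/data containing one compressed lineage
tmp=$(mktemp -d)
mkdir -p "$tmp/v5/data/lineage_demo"
( cd "$tmp/v5/data/lineage_demo" \
    && echo "x" > dataset.txt \
    && tar -czf dataset.tar.gz dataset.txt \
    && rm dataset.txt )

# Sequential equivalent of the parallel one-liner:
# extract each archive in the directory where it was found
find "$tmp/v5/data" -name "*.tar.gz" | while read -r f; do
    tar -xzf "$f" -C "$(dirname "$f")"
done

extracted="no"
[ -f "$tmp/v5/data/lineage_demo/dataset.txt" ] && extracted="yes"
echo "$extracted"
rm -rf "$tmp"
```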

Finally, re-compress the database directory and remove the uncompressed copy:
```bash
cd ..
tar -cvzf "$BUSCO_TAR" -C "$(dirname "$BUSCO")" "$(basename "$BUSCO")"
rm -r "$BUSCO"
```

## Changes from Snakemake to Nextflow

### Commands
