version 2023.04 Creating BAM files from non-sensitive fragment data (FinaleDB frag.tsv.bgz or fragment-coordinate BED/BEDPE files) using sequences extracted from a reference genome.
Make sure all of the following dependencies are installed before running the program:
- samtools version 1.7 or higher;
- bedtools version v2.30.0 or higher;
- awk version 20200816 or higher;
- gunzip (gzip) version 1.6 or higher;
- Python version 3.10 or higher, only if you install it as a Python package;
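A quick way to confirm that the command-line dependencies listed above are on your PATH is sketched below. The helper name check_tools is hypothetical and not part of fragmentstein, and the sketch only checks that each tool is present, not its version.

```shell
# Report which of the given command-line tools are missing from PATH.
# check_tools is a hypothetical helper, not part of fragmentstein.
check_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Print the missing tools (if any) as a space-separated list.
    echo "${missing# }"
}

missing_tools=$(check_tools samtools bedtools awk gunzip)
if [ -n "$missing_tools" ]; then
    echo "Missing: $missing_tools"
else
    echo "All required tools found"
fi
```

Versions still need to be checked by hand, for example with samtools --version and bedtools --version.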
To install fragmentstein from the Python Package Index (PyPI):
pip install fragmentstein
Optional: you can install it in a dedicated Python environment:
conda create -n fragmentstein python=3.10 samtools bedtools -c bioconda
conda activate fragmentstein
pip install fragmentstein
You can also use the Mamba package manager:
mamba create -n fragmentstein python=3.10 samtools bedtools -c bioconda
mamba activate fragmentstein
pip install fragmentstein
Afterwards, you can use fragmentstein directly from your shell:
fragmentstein -h
Alternatively, you can install it from source by cloning the repository. This does not install the dependencies.
git clone https://github.com/uzh-dqbm-cmi/fragmentstein
cd fragmentstein
Add the directory that contains fragmentstein.sh (./scripts) to your PATH. From the root of the cloned repository, append the export line to your ~/.bashrc or ~/.zshrc using the following command:
echo "export PATH=\"$(pwd)/scripts:\$PATH\"" >> ~/.bashrc
After opening a new shell (or sourcing the file), the fragmentstein.sh script should be available:
fragmentstein.sh -h
The following example shows how to do a test run from the root of the cloned repository:
mkdir results
fragmentstein.sh -i tests/data/test_sample1.tsv.bgz -o results/test_sample1.bam \
-g tests/data/resources/test_ref_hg38.fna -c tests/data/resources/test_ref.chrom.sizes
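After the test run, the output can be sanity-checked with samtools. The sketch below assumes samtools is installed and that the BAM file was written to results/test_sample1.bam; inspect_bam is a hypothetical helper name, not part of fragmentstein.

```shell
# Quick sanity checks on a BAM file; inspect_bam is a hypothetical helper.
inspect_bam() {
    bam=$1
    if ! command -v samtools >/dev/null 2>&1 || [ ! -f "$bam" ]; then
        echo "skipping: samtools or $bam not found"
        return 0
    fi
    # quickcheck verifies that the file has a valid header and EOF block.
    samtools quickcheck "$bam" && echo "BAM looks intact"
    # Count the records, then show the first two alignments.
    samtools view -c "$bam"
    samtools view "$bam" | head -n 2
}

inspect_bam results/test_sample1.bam
```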
You can also install the Python wrapper from source. First, install Poetry, the Python dependency management and packaging tool:
curl -sSL https://install.python-poetry.org | python3 -
Then install the fragmentstein Python wrapper from the root of the cloned repository:
poetry install
To run the tests, use the following command:
poetry run pytest
Required arguments
-i or --input: Path to a FinaleDB frag.tsv.bgz file, or to a .bed or .bedpe file. Expected are either a 6-column BED file or a 10-column paired-end BEDPE file.
-g or --genome: Path to the reference genome fasta file.
-c or --chrom_sizes: Chromosome sizes file.
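If you do not already have a chromosome sizes file for your reference, one common way to derive it is from the FASTA index created by samtools faidx. The sketch below assumes that fragmentstein accepts the usual two-column, tab-separated layout (sequence name, length); the helper name and file names are placeholders.

```shell
# Derive a two-column chrom.sizes file (name<TAB>length) from a FASTA index.
# Assumption: fragmentstein accepts this two-column layout.
fai_to_chrom_sizes() {
    # The first two columns of a .fai file are sequence name and length.
    cut -f1,2 "$1"
}

# Typical use (requires samtools; file names are placeholders):
#   samtools faidx genome.fna
#   fai_to_chrom_sizes genome.fna.fai > genome.chrom.sizes
```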
Optional arguments
-o or --output: Path and file name of the output BAM file. By default, the .tsv.gz part of the input extension is replaced with .bam.
-r or --read_length: Both the forward and the reverse read of a fragment will have this length unless the fragment is shorter than the read length. Default: 101.
-qf or --map_quality_filter: Minimum mapping quality. Setting it to '0' accepts all fragments. Default: 30.
-qd or --map_quality_default: Mapping quality to assign, for example when it is missing from the input files or when you want to override it for downstream analyses. Default: 60.
-bq or --base_quality: Base quality as an ASCII character (Phred-scaled quality + 33). Default: F (quality: 37).
-N or --replace_incomplete_nucleotides: Replace all incompletely specified nucleotides with N.
-s or --sort: Sort the output BAM file by coordinate. No value has to be specified, just type -s for sorting.
-t or --threads: Number of parallel threads to be used when possible. Default: 1.
--temp: Temporary folder in which to store intermediate files. Default: same folder as the output file.
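Two of the options above involve a little arithmetic, illustrated in the sketch below. It is not fragmentstein's own code: the helper names are hypothetical, and the read-length rule assumes that fragment length is end minus start and that reads of shorter fragments are capped at the fragment length.

```shell
# Convert a Phred+33 quality character to its numeric quality score.
phred33_to_quality() {
    # printf with a leading quote yields the character's ASCII code.
    code=$(printf '%d' "'$1")
    echo $((code - 33))
}

# Read length used for a fragment: the requested length, capped at the
# fragment length (assumed here to be end - start).
effective_read_length() {
    read_length=$1; start=$2; end=$3
    fragment_length=$((end - start))
    if [ "$fragment_length" -lt "$read_length" ]; then
        echo "$fragment_length"
    else
        echo "$read_length"
    fi
}

phred33_to_quality F                   # prints 37
effective_read_length 101 1000 1167    # prints 101
effective_read_length 101 1000 1080    # prints 80
```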
Fragmentstein is developed and maintained by Zsolt Balázs and Todor Gitchev. To reference the tool, please cite our paper.