Releases: BigDataBiology/argNorm
Version 0.6.0
The main change is the addition of GROOT support.
Full Changelog:
- argNorm supports the GROOT v1.1.2 ARG annotation tool: https://github.com/will-rowe/groot
- GROOT support is via the `GrootNormalizer` (for use in Python scripts) and the `groot` tool parameter with the `groot-db`, `groot-core-db`, `groot-argannot`, `groot-card`, and `groot-resfinder` db parameters in the CLI.
Other
- `__version__` attribute added to the package (accessible as `argnorm.__version__` or `argnorm.lib.__version__`)
- Use atomic writing for outputs (https://github.com/untitaker/python-atomicwrites/tree/master)
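To illustrate what atomic writing of outputs means, here is a minimal standard-library sketch. It is not the python-atomicwrites API linked above, and `atomic_write_text` is a hypothetical helper name; the idea is only that the output is written to a temporary file first and then renamed into place, so a crash never leaves a half-written output file.

```python
import os
import tempfile


def atomic_write_text(path, text):
    """Write text to path via a temporary file and a final rename.

    Hypothetical helper for illustration; not part of argNorm.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the destination so the rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        # Replace the destination in a single rename step.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Readers of the destination path see either the previous complete file or the new complete file, never a partial one.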
funcscan integration
- argNorm has been included as an nf-core module: https://nf-co.re/modules/argnorm/
- argNorm will also be available on the funcscan pipeline: nf-core/funcscan#410
DB harmonisation
- SARG db link was changed in `crude_db_harmonisation` to https://raw.githubusercontent.com/xinehc/args_oap/a3e5cff4a6c09f81e4834cfd9a31e6ce7d678d71/src/args_oap/db/sarg.fasta as the old link (Galaxy instance, http://smile.hku.hk/SARGs) is down
- RGI outputs in `crude_db_harmonisation` are concatenated so frequencies of `perfect`, `strict`, and `loose` hits can be calculated from the concatenated file
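As a rough sketch of how hit frequencies could be tallied across several RGI outputs, the snippet below counts values of a cut-off column over multiple tab-separated tables. The column name `Cut_Off`, the helper `cutoff_frequencies`, and the sample rows are assumptions made for illustration; this is not the code in `crude_db_harmonisation`.

```python
import csv
import io
from collections import Counter


def cutoff_frequencies(tsv_texts, column="Cut_Off"):
    """Count cut-off categories across several tab-separated tables.

    Hypothetical helper; the column name is an assumption.
    """
    counts = Counter()
    for text in tsv_texts:
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for row in reader:
            counts[row[column]] += 1
    return counts


# Toy tables standing in for per-database RGI outputs (made-up rows).
table_a = "ORF_ID\tCut_Off\ngene_a\tPerfect\ngene_b\tLoose\n"
table_b = "ORF_ID\tCut_Off\ngene_c\tLoose\ngene_d\tStrict\n"

frequencies = cutoff_frequencies([table_a, table_b])
for category in ("Perfect", "Strict", "Loose"):
    print(category, frequencies[category])
```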
Version 0.5.0
Updated the drug categorization and improved manual curation
USER-FACING CHANGES
Improved drug categorization
- `drugs_to_drug_classes()` also uses the 'has_part' ARO relationship now to get drug classes for antibiotic mixtures. In case of antibiotic mixtures, the drug classes of the drugs associated with 'has_part' are returned rather than 'antibiotic mixture' (ARO:3000707).
- 'antibiotic mixture' will not be reported as a drug class; rather, the individual antibiotic classes making up the antibiotic mixture will be reported.
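The mixture handling can be pictured with a toy ontology. The sketch below is an assumption-laden illustration, not argNorm's implementation: the term names (`drug_x`, `class_x`, `mixture_xy`, ...) are made up, each term has a single parent (the real ARO is a graph), and `toy_drug_classes` is a hypothetical function.

```python
# Toy stand-in for ARO 'is_a' links; names are illustrative, not real terms.
IS_A = {
    "drug_x": "class_x",
    "drug_y": "class_y",
    "class_x": "antibiotic molecule",
    "class_y": "antibiotic molecule",
    "mixture_xy": "antibiotic mixture",
    "antibiotic mixture": "antibiotic molecule",
}
# Toy 'has_part' links: a mixture points at its component drugs.
HAS_PART = {"mixture_xy": ["drug_x", "drug_y"]}


def toy_drug_class(term):
    """Walk 'is_a' links up to the immediate child of 'antibiotic molecule'."""
    while IS_A[term] != "antibiotic molecule":
        term = IS_A[term]
    return term


def toy_drug_classes(drugs):
    """Return drug classes, expanding mixtures into their parts first."""
    classes = []
    for drug in drugs:
        # A mixture is replaced by its components; other drugs pass through.
        for part in HAS_PART.get(drug, [drug]):
            drug_class = toy_drug_class(part)
            if drug_class not in classes:
                classes.append(drug_class)
    return classes


print(toy_drug_classes(["mixture_xy"]))  # ['class_x', 'class_y']
```

Without the 'has_part' expansion, the same walk would report 'antibiotic mixture' for `mixture_xy`.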
Improved manual curation
- manual curation (argannot): `(Tet)tetH:EF460464:6286-7839:1554` was incorrectly annotated as ARO:3004797, which is a beta-lactamase, due to a loose RGI hit. This was manually curated to ARO:3000175.
- Improved curation:
  - resfinder_curation: grdA_1_QJX10702 -> 3007380 & EstDL136_1_JN242251 -> 3000557
  - megares_curation: MEG_2865|Drugs|Phenicol|Chloramphenicol_hydrolase|ESTD -> 3000557
Bugfixes
- `confers_resistance_to()` now gets drugs information even if it is encoded at a higher level in the ARO. For example, OXA-19 previously only returned cephalosporin and penam, but now will also return oxacillin (from AMR gene family).
- `drugs_to_drug_classes()` now correctly only returns the immediate child of 'antibiotic molecule' as the drug class (this was previously not the case for certain corner cases).
- Inconsistent ARO versions in deeparg, megares, resfinderfg & sarg curation: ARO:3004445 -> ARO:3005440. The ARO number for the RSA2 gene changed in the ARO, but the version of ARO bundled with argNorm was out of sync.
INTERNAL CHANGES
- AROs were previously handled as integers in the `get_aro_mapping_table()` function, which caused problems for ARO numbers with leading zeros, such as 'ARO:0010004'. To fix this, AROs are now treated as strings so leading zeros are preserved.
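The leading-zero problem is easy to reproduce in isolation. The sketch below uses a made-up two-row table (the gene names are placeholders) and is not the actual `get_aro_mapping_table()` code; it only shows why an integer round trip loses information.

```python
import csv
import io

# Toy mapping table; gene names are placeholders.
table = "gene\tARO\ngene_a\t0010004\ngene_b\t3000175\n"
rows = list(csv.DictReader(io.StringIO(table), delimiter="\t"))

# Converting to int drops the leading zeros.
as_int = [f"ARO:{int(row['ARO'])}" for row in rows]
# Keeping the value as a string preserves them.
as_str = [f"ARO:{row['ARO']}" for row in rows]

print(as_int)  # ['ARO:10004', 'ARO:3000175']
print(as_str)  # ['ARO:0010004', 'ARO:3000175']
```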
Version 0.4.0
Major changes:
- Bundle a specific version of ARO with the package instead of downloading it from the internet (ensures reproducibility)
- Add missing ARO mappings to manual curation.
- Command line tool accepts database/tool names in a case-independent way (by @sebastianLedzianowski)
- `lib.map_to_aro` returns `None` if there is no mapping (raises an exception if the name is missing)
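One common way to accept names case-independently in a command line tool is to lowercase the value before validating it. The sketch below shows this with `argparse`; the parser, the `--db` flag, and the list of choices are illustrative assumptions and not argNorm's actual CLI definition.

```python
import argparse

# Illustrative database names; not the authoritative argNorm list.
DB_CHOICES = ["argannot", "deeparg", "megares", "resfinder", "sarg"]


def build_parser():
    """Build a toy parser that lowercases --db before checking choices."""
    parser = argparse.ArgumentParser(prog="toy-normalizer")
    # argparse applies `type` first, then validates against `choices`.
    parser.add_argument("--db", type=str.lower, choices=DB_CHOICES)
    return parser


args = build_parser().parse_args(["--db", "ResFinder"])
print(args.db)  # resfinder
```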
Version 0.3.0
Main changes are updates to the Resfinder and ARG-ANNOT mappings
Detailed changes
Handling gene clusters & reverse complements in resfinder
- Resfinder has gene clusters which can't be passed through RGI using 'contig' mode.
- Gene clusters were identified and were manually assigned ARO numbers.
- A separate file with manual curation for gene clusters and RCs was created, and their AROs were updated after concatenating RGI results and genes not in RGI results.
- 40 gene clusters present.
- 9 genes in reverse complement form also present.
Using amino acid file for argannot & resfinder rather than nucleotide file
- ARG-ANNOT and Resfinder are comprised of coding sequences. The data wasn't being handled properly before as contig mode was used when passing coding sequences to RGI. Now, the amino acid versions of ARG-ANNOT & Resfinder are used with protein mode when running the database in RGI.
- ARG-ANNOT AA file is available online. Resfinder AA file is generated using biopython.
- One-to-many ARO mappings, such as NG_047831:101-955 to Erm(K) and almG in ARG-ANNOT, are eliminated because protein mode is used
- A total of 10 ARO mappings changed in ARG-ANNOT
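For readers unfamiliar with what generating an amino acid file from coding sequences involves, here is a standard-library sketch of translation and reverse complementing. The release used biopython for this step; the code below is a simplified stand-in (standard codon table, no ambiguity codes beyond mapping unknown codons to `X`), and the function names and the example sequence are made up.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(codon) for codon in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, to_stop=True):
    """Translate a coding sequence codon by codon.

    Trailing bases that do not form a full codon are ignored.
    """
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        amino_acid = CODON_TABLE.get(seq[i:i + 3], "X")
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


# A gene stored in reverse complement form must be flipped before translation.
stored = "TTAGGCCAT"  # made-up sequence
print(translate(reverse_complement(stored)))  # MA
```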
argnorm.lib: Making argNorm more usable as a library
- Introduce `argnorm.lib` module
- Users can import the `map_to_aro` function from `argnorm.lib`. The function takes a gene name as input, maps the gene to the ARO and returns a pronto term object with the ARO mapping.
- The `get_aro_mapping_table` function, previously within the BaseNormalizer class, has also been moved to `lib.py` to give users the ability to access the mapping tables being used for normalization.
- With the introduction of `lib.py`, users will be able to access core mapping utilities through `argnorm.lib`, drug categorization through `argnorm.drug_categorization`, and the traditional normalizers through `argnorm.normalizers`.
Version 0.2.0
ARO Mapping & Normalization
- Updated mappings and manual curation tables for latest RGI
- Harmonized ResFinderFG support
- Removed python syntax in output
Drug Categorization
- Improved drug categorization by using superclasses whenever direct drug categorization is not possible
- Added better column headings for drug categorization (confers_resistance_to and resistance_to_drug_class)
Testing
- Improved pytest testing
- Added integration tests