Fast-tracking to v0.0.34 from v0.0.20
Enhancements
Pruning Support:
- Enabled pruning in bnmf, wnmf, and nmf_recommender.
- Added pruning of additional matrices (e.g., MASK) based on X.
- Included pruned_cols and pruned_rows in saved outputs.
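The entries above do not spell out the pruning rule itself. The following is a minimal sketch that assumes pruning removes all-zero rows and columns of X. It applies the same row and column selection to a companion matrix such as MASK and records which rows and columns were dropped. The helper name prune_with_X and the boolean-vector form of pruned_rows/pruned_cols are hypothetical, not T-ELF's API.

```python
import numpy as np

def prune_with_X(X, *others):
    """Drop all-zero rows/columns of X (assumed pruning rule) and apply the
    same selection to companion matrices of the same shape, e.g. MASK.

    Hypothetical helper: returns pruned X, pruned companions, and boolean
    vectors marking removed rows/columns."""
    X = np.asarray(X)
    pruned_rows = ~X.any(axis=1)   # True where the whole row is zero
    pruned_cols = ~X.any(axis=0)   # True where the whole column is zero
    keep = np.ix_(~pruned_rows, ~pruned_cols)
    X_pruned = X[keep]
    others_pruned = [np.asarray(M)[keep] for M in others]
    return X_pruned, others_pruned, pruned_rows, pruned_cols

X = np.array([[1, 0, 2],
              [0, 0, 0],
              [3, 0, 4]])
MASK = np.ones_like(X, dtype=bool)
X_p, (MASK_p,), pruned_rows, pruned_cols = prune_with_X(X, MASK)
print(X_p.tolist())          # [[1, 2], [3, 4]]
print(pruned_rows.tolist())  # [False, True, False]
print(pruned_cols.tolist())  # [False, True, False]
```

Keeping the removed positions lets factors computed on the pruned matrix be mapped back to the original row and column layout afterwards.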
Matrix Factorization:
- Introduced new submodule BNMFk under NMFk, used with nmf_method='bnmf'.
- Added WEIGHT and MASK keys for WNMFk and BNMFk.
- Implemented matrix deletion in subroutines to reduce memory consumption.
- Added factor_thresholding parameter to threshold NMFk factors into boolean matrices. Options include:
  - coord_desc_thresh
  - WH_thresh
- Introduced factor_thresholding_obj_params for configuring thresholding subroutines.
- Added clustering_method parameter with options:
  - kmeans
  - bool or boolean (both are equivalent)
- Introduced clustering_obj_params to configure clustering subroutines.
- Added new perturbation type for boolean matrices: perturb_type='boolean' or perturb_type='bool'.
- Updated examples to reflect new boolean-specific features.
- Improved path compatibility by using os.path.join.
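With the new MASK key, only some entries of X are known. The Bug Fixes section notes that the error is now computed only over known locations. Below is a minimal sketch of such a masked error. It assumes a relative Frobenius-norm error and a boolean MASK in which True marks a known entry. The helper masked_relative_error is hypothetical and not part of T-ELF.

```python
import numpy as np

def masked_relative_error(X, W, H, MASK):
    """Relative Frobenius error of W @ H against X, counting only entries
    where MASK is True (known locations). Hypothetical helper."""
    MASK = np.asarray(MASK, dtype=bool)
    residual = np.where(MASK, X - W @ H, 0.0)  # unknown entries contribute 0
    known_X = np.where(MASK, X, 0.0)
    return np.linalg.norm(residual) / np.linalg.norm(known_X)

# The bottom-right entry of X is unknown (stored as a placeholder 0).
X = np.array([[1.0, 2.0], [3.0, 0.0]])
MASK = np.array([[True, True], [True, False]])
W = np.eye(2)
H = np.array([[1.0, 2.0], [3.0, 4.0]])

print(masked_relative_error(X, W, H, MASK))                # 0.0
print(masked_relative_error(X, W, H, np.ones_like(MASK)))  # ~1.069, penalised by the placeholder
```

Without the mask, the placeholder value at the unknown position inflates the error even though the reconstruction matches every known entry.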
Thresholding and Clustering:
- Added factor_thresholding_H_regression with options:
  - otsu_thresh
  - coord_desc_thresh
  - kmeans_thresh
- Default factor_thresholding_H_regression set to kmeans_thresh.
- Default factor_thresholding set to otsu_thresh.
- Introduced factor_thresholding_H_regression_obj_params to configure its parameters.
- Added K-means-based boolean thresholding for W and H matrices:
  - Clusters the values in each row of W and H into two groups; the boolean threshold is the midpoint of the two cluster centroids.
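The K-means thresholding described above can be sketched as follows. This version assumes a plain one-dimensional, two-cluster k-means per row, written by hand with NumPy. T-ELF may use a library k-means with a different initialization. The names kmeans_row_thresholds and booleanize_rows are hypothetical.

```python
import numpy as np

def kmeans_row_thresholds(A, max_iter=100):
    """Per-row threshold: split the row's values into two groups with 1-D
    2-means and return the midpoint of the two centroids."""
    A = np.asarray(A, dtype=float)
    thresholds = np.empty(A.shape[0])
    for i, row in enumerate(A):
        lo, hi = row.min(), row.max()        # start centroids at the extremes
        for _ in range(max_iter):
            upper = row > (lo + hi) / 2.0    # assign each value to the nearer centroid
            new_lo = row[~upper].mean()      # lower group always holds the minimum
            new_hi = row[upper].mean() if upper.any() else hi  # empty only for constant rows
            if new_lo == lo and new_hi == hi:
                break                        # assignments stopped changing
            lo, hi = new_lo, new_hi
        thresholds[i] = (lo + hi) / 2.0
    return thresholds

def booleanize_rows(A):
    """Threshold each row of A (e.g. W, or H) at its own k-means midpoint."""
    A = np.asarray(A, dtype=float)
    return A > kmeans_row_thresholds(A)[:, None]

W = np.array([[0.1, 0.2, 0.9, 1.0],
              [0.0, 0.0, 0.0, 5.0]])
print(kmeans_row_thresholds(W))  # approximately [0.55, 2.5]
print(booleanize_rows(W))
```

Each row gets its own threshold, so a row with uniformly small values is not wiped out by a single global cutoff.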
Hardware and Device Management:
- Added device parameter to NMFk for GPU management:
  - device=-1: Use all GPUs.
  - device=0: Use the GPU with ID 0.
  - device=[0,1,...]: Use a specific list of GPUs.
  - Negative values other than -1: Use (number of GPUs + device + 1).
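Below is a sketch of how the device values listed above could map to GPU IDs. It assumes that "(number of GPUs + device + 1)" is a count of GPUs taken from ID 0 upward. That reading agrees with device=-1 selecting all GPUs, but it is an interpretation. The helper resolve_devices is hypothetical and not NMFk's internal code.

```python
def resolve_devices(device, n_gpus):
    """Map an NMFk-style device argument to a list of GPU IDs.

    Hypothetical helper; negative values are read as selecting the first
    (n_gpus + device + 1) GPUs, so device=-1 selects all of them."""
    if isinstance(device, (list, tuple)):
        return list(device)                   # explicit list of GPU IDs
    if device >= 0:
        if device >= n_gpus:
            raise ValueError(f"GPU {device} not available ({n_gpus} GPUs)")
        return [device]                       # a single GPU by ID
    count = n_gpus + device + 1               # device=-1 gives all GPUs
    if count < 1:
        raise ValueError(f"device={device} selects no GPUs out of {n_gpus}")
    return list(range(count))

print(resolve_devices(-1, 4))      # [0, 1, 2, 3]
print(resolve_devices(0, 4))       # [0]
print(resolve_devices([0, 2], 4))  # [0, 2]
print(resolve_devices(-2, 4))      # [0, 1, 2]
```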
Hierarchical NMFk (HNMFk) Improvements:
- Added new variables for nodes:
  - parent_node_factors_path
  - parent_node_k
  - factors_path
- Enabled dynamic renaming of paths when loading HNMFk models from different directories.
- Improved decomposition behavior:
  - Nodes with fewer samples than the sample threshold no longer decompose unnecessarily.
- Added signature, centroid, and probabilities from parent nodes to child nodes.
- Introduced graph iterator methods for navigating to specific nodes by name.
- Updated node naming conventions to use ancestor-based indexing.
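Each HNMFk node works on a subset of its parent's samples. Mapping a node's local sample positions back to the full dataset therefore means composing index arrays down the ancestor chain. The Bug Fixes section notes that HNMFk now returns these total-data indices rather than "indices of indices". Below is a minimal sketch of that composition with made-up index arrays. The helper to_total_indices is hypothetical.

```python
import numpy as np

def to_total_indices(local_chain, n_total):
    """Compose per-level sample indices into indices of the full dataset.

    local_chain[0] holds the child's positions within the root's samples,
    local_chain[1] the grandchild's positions within the child's samples, etc.
    Hypothetical helper."""
    idx = np.arange(n_total)          # the root node sees every sample
    for local in local_chain:
        idx = idx[np.asarray(local)]  # re-express this level in total-data terms
    return idx

child_local = [1, 3, 4, 7, 8]   # child owns these of the root's 10 samples
grandchild_local = [0, 2, 4]    # grandchild owns these of the child's 5 samples

# "Indices of indices" would be [0, 2, 4]; the total-data indices are:
print(to_total_indices([child_local, grandchild_local], 10).tolist())  # [1, 4, 8]
```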
Result Storage:
- Added W_all to saved outputs of NMFk.
Installation and Documentation
- Migrated to a new installation system using pip and Poetry.
- Added a post-installation script to simplify setup on different systems.
- Updated documentation for new installation methods on Chicoma and Darwin.
Bug Fixes
- Corrected HNMFk to return indices into the total data instead of indices of indices.
- Corrected naming inconsistencies in pruning variables in NMFk.
- Fixed error calculation to consider only known locations when masking is applied.
- Resolved GPU transfer conflicts when using MASK.
- Fixed default device parameter in NMFk to be -1 (use all devices).
- Addressed issues in WNMFk and BNMFk examples.
- Fixed checkpointing bugs:
  - Checkpoint saving is now enabled by default.
  - Resolved issues when loading an HNMFk model during an ongoing process.
- Fixed scalar addition error with sparse matrices in kl_mu.
- Resolved dependency conflicts with numpy and numba.
. - Updated HPC documentation for T-ELF installation.
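The kl_mu fix listed above is stated in one line. scipy.sparse rejects adding a nonzero scalar to a sparse matrix with NotImplementedError. One common way around this in the standard Lee and Seung KL multiplicative update is to evaluate the ratio X / (WH) only at X's stored entries, since it is zero everywhere else. The sketch below shows that technique. It is not T-ELF's kl_mu code, and the function name kl_mu_update_H and the eps value are assumptions.

```python
import numpy as np
import scipy.sparse as sp

def kl_mu_update_H(X, W, H, eps=1e-10):
    """One KL-divergence multiplicative update of H for sparse X.

    The ratio X / (W @ H) is only non-zero where X is, so it is evaluated at
    X's stored entries instead of adding eps to a sparse matrix."""
    X = sp.coo_matrix(X)
    # (W @ H)[i, j] at each stored (i, j), without forming the dense product
    wh = np.sum(W[X.row] * H[:, X.col].T, axis=1)
    ratio = sp.csr_matrix((X.data / (wh + eps), (X.row, X.col)), shape=X.shape)
    numer = (ratio.T @ W).T                    # equals W.T @ ratio, dense (k, n)
    denom = W.sum(axis=0)[:, None] + eps       # W.T @ ones, shape (k, 1)
    return H * numer / denom

X = sp.random(6, 5, density=0.4, format="csr", random_state=0)
try:
    X + 1e-10                                  # scipy refuses nonzero scalars
except NotImplementedError:
    print("sparse + scalar is not supported")

rng = np.random.default_rng(0)
W, H = rng.random((6, 2)), rng.random((2, 5))
H_new = kl_mu_update_H(X, W, H)
```

Working only on stored entries also avoids materializing the dense m-by-n product W @ H, which matters for large, very sparse inputs.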