Refactor neighbors-based metrics to use `NeighborsResults` #129

adamgayoso · 2023-12-27T22:26:07Z

- Refactor neighbors-based metrics to use `NeighborsResults`. This is simpler for users, since there is no longer any confusion over whether to pass distances or connectivities, and it reduces the overhead of converting back and forth between sparse and dense representations of the neighbor results.
- Compute UMAP-based connectivities in this package instead of via scanpy, with a more efficient implementation.
- Remove an unnecessary `vmap` in the PCR regression.
- Update notebooks.
- Drop usage of private scanpy functions. Scanpy can likely be dropped as a dependency in a future release if PCA with implicit centering for sparse matrices makes it into sklearn.
- Bump version to 0.5.0.
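PCR (principal component regression) regresses principal components on a covariate such as batch. A minimal NumPy sketch (NumPy rather than JAX, and not this package's actual code) of why a per-component `vmap` can be unnecessary: a single least-squares solve with a 2-D right-hand side fits every component at once. The function names and the variance-weighted aggregation in `pcr_score` are assumptions for illustration, following the usual scib-style definition; `covariate` may be 1-D or a 2-D one-hot encoding of a categorical batch.

```python
import numpy as np


def pcr_r2_per_component(pcs: np.ndarray, covariate: np.ndarray) -> np.ndarray:
    """R^2 of regressing every principal component on the covariate.

    pcs: (n_obs, n_components). One lstsq call with a 2-D right-hand side
    solves all per-component regressions together, so no map/vmap is needed.
    """
    n_obs = pcs.shape[0]
    # Design matrix: intercept column plus the covariate column(s).
    design = np.column_stack([np.ones(n_obs), covariate])
    # coef has one column of regression coefficients per component.
    coef, *_ = np.linalg.lstsq(design, pcs, rcond=None)
    residuals = pcs - design @ coef
    ss_res = (residuals**2).sum(axis=0)
    ss_tot = ((pcs - pcs.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot


def pcr_score(pcs: np.ndarray, covariate: np.ndarray, explained_variance: np.ndarray) -> float:
    """Hypothetical aggregate: R^2 per component weighted by explained variance."""
    r2 = pcr_r2_per_component(pcs, covariate)
    return float(np.sum(explained_variance * r2) / np.sum(explained_variance))
```

Solving all components in one call also lets the linear-algebra backend factor the design matrix once instead of once per component.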

Fixes #109: LISI now always uses the dense (indices, distances) representation of the neighbors. The issue arose when the approximate neighbors method assigned a distance of zero to non-self cells.
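A small NumPy sketch of one plausible way a zero distance to a non-self cell gets lost in a sparse kNN graph while the dense `(indices, distances)` arrays keep it. Many sparse routines treat a stored zero as "no edge" (`scipy.sparse.csr_matrix.eliminate_zeros()` drops such entries, for example); `drop_explicit_zeros` below mimics that. Both helpers are hypothetical and only illustrate the failure mode, not the code in this PR.

```python
import numpy as np


def dense_knn_to_csr(indices: np.ndarray, distances: np.ndarray):
    """Return CSR-style (data, col_indices, indptr) arrays for a kNN graph.

    Every row has exactly k neighbors, so indptr is a regular stride.
    """
    n_obs, k = indices.shape
    indptr = np.arange(0, n_obs * k + 1, k)
    return distances.ravel().copy(), indices.ravel().copy(), indptr


def drop_explicit_zeros(data, col_indices, indptr):
    """Mimic a sparse 'eliminate zeros' step: remove stored entries equal to 0."""
    keep = data != 0
    # Count surviving entries per row (rows are non-empty, so reduceat is safe).
    row_counts = np.add.reduceat(keep.astype(np.int64), indptr[:-1])
    new_indptr = np.concatenate([[0], np.cumsum(row_counts)])
    return data[keep], col_indices[keep], new_indptr


# Cells 0 and 1 are exact duplicates, so the search reports a distance of zero
# between them (as well as the usual zero distance of each cell to itself).
indices = np.array([[0, 1, 2], [1, 0, 2], [2, 1, 0]])
distances = np.array([[0.0, 0.0, 1.5], [0.0, 0.0, 2.0], [0.0, 1.0, 1.5]])
data, cols, indptr = dense_knn_to_csr(indices, distances)
data2, cols2, indptr2 = drop_explicit_zeros(data, cols, indptr)
print(np.diff(indptr2))  # stored neighbors per row → [1 1 2]
```

Rows 0 and 1 silently lose their duplicate neighbor together with the self-edge, while the dense arrays still report three neighbors per cell.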


codecov · 2023-12-28T04:58:22Z

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (9bd6efb) 93.77% compared to head (a22c3e7) 91.00%.


```diff
@@            Coverage Diff             @@
##             main     #129      +/-   ##
==========================================
- Coverage   93.77%   91.00%   -2.77%     
==========================================
  Files          25       25              
  Lines         931      956      +25     
==========================================
- Hits          873      870       -3     
- Misses         58       86      +28
```

| File | Coverage Δ | |
|---|---|---|
| `src/scib_metrics/_graph_connectivity.py` | `100.00% <100.00%> (ø)` | |
| `src/scib_metrics/_lisi.py` | `100.00% <100.00%> (ø)` | |
| `src/scib_metrics/benchmark/_core.py` | `98.60% <100.00%> (-0.03%)` | ⬇️ |
| `src/scib_metrics/nearest_neighbors/__init__.py` | `100.00% <100.00%> (ø)` | |
| `src/scib_metrics/nearest_neighbors/_jax.py` | `100.00% <100.00%> (ø)` | |
| `src/scib_metrics/nearest_neighbors/_pynndescent.py` | `100.00% <100.00%> (ø)` | |
| `src/scib_metrics/utils/_diffusion_nn.py` | `80.76% <100.00%> (-11.69%)` | ⬇️ |
| `src/scib_metrics/utils/_lisi.py` | `100.00% <100.00%> (ø)` | |
| `src/scib_metrics/utils/_pcr.py` | `92.59% <100.00%> (-0.75%)` | ⬇️ |
| `src/scib_metrics/_nmi_ari.py` | `93.87% <88.88%> (+0.12%)` | ⬆️ |
... and 2 more

... and 1 file with indirect coverage changes

jan-engelmann

LGTM!
Looked through and ran the tests.

With 500,000 cells, most of the runtime is now spent in pynndescent (86%) and in `fuzzy_simplicial_set` (4%); see the profiling results below. Tests are passing as well. `test_kmeans` failed once for me locally but has passed ever since.

Let me know if I should have a look at anything specific!

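```python
import numpy as np
from scib_metrics.nearest_neighbors import pynndescent

K = 90  # number of neighbors
N = 500_000  # number of cells
Q = 30  # number of features per cell
X = np.random.randn(N, Q)

# Approximate kNN search; returns a NeighborsResults object
neigh_result = pynndescent(X, n_neighbors=K)
# Keep the first K neighbors (a no-op here, since K neighbors were requested)
neigh_result = neigh_result.subset_neighbors(n=K)
# Sparse kNN graphs derived from the dense (indices, distances) arrays
new_connect = neigh_result.knn_graph_connectivities
new_dist = neigh_result.knn_graph_distances
```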

Profiling Results
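For context on the `fuzzy_simplicial_set` step mentioned above, a simplified NumPy sketch of UMAP-style connectivities: for each cell, find `rho` (distance to the nearest non-self neighbor) and, by bisection, `sigma` such that the non-self memberships sum to log2(k), then symmetrize with the fuzzy union A + Aᵀ - A∘Aᵀ. This follows the published UMAP algorithm rather than the code in this PR, and the function names are hypothetical. It assumes the self-neighbor sits in column 0, uses local_connectivity = 1 and set_op_mix_ratio = 1, builds a dense n×n matrix for readability, and omits UMAP's minimum-sigma safeguard and early stopping.

```python
import numpy as np


def smooth_knn_sigmas(distances: np.ndarray, n_iter: int = 64):
    """Per-cell (rho, sigma), a simplified take on UMAP's smooth_knn_dist.

    distances: (n_obs, k) kNN distances, assumed sorted with self in column 0.
    """
    n_obs, k = distances.shape
    nonself = distances[:, 1:]
    target = np.log2(k)
    # rho: smallest positive distance (local_connectivity = 1), 0 if none.
    positive = np.where(nonself > 0, nonself, np.inf)
    rho = positive.min(axis=1)
    rho[np.isinf(rho)] = 0.0
    d = np.maximum(nonself - rho[:, None], 0.0)

    # Bisection on sigma; the membership sum is non-decreasing in sigma.
    lo = np.zeros(n_obs)
    hi = np.full(n_obs, np.inf)
    mid = np.ones(n_obs)
    for _ in range(n_iter):
        psum = np.exp(-d / mid[:, None]).sum(axis=1)
        too_big = psum > target
        hi = np.where(too_big, mid, hi)
        lo = np.where(too_big, lo, mid)
        # Double sigma until the target is bracketed, then halve the bracket.
        mid = np.where(np.isinf(hi), mid * 2.0, (lo + hi) / 2.0)
    return rho, mid


def umap_connectivities(indices: np.ndarray, distances: np.ndarray) -> np.ndarray:
    """Symmetric fuzzy kNN graph (dense here for readability)."""
    n_obs, k = indices.shape
    rho, sigma = smooth_knn_sigmas(distances)
    d = np.maximum(distances - rho[:, None], 0.0)
    weights = np.exp(-d / sigma[:, None])
    # Self-edges get zero membership.
    weights[indices == np.arange(n_obs)[:, None]] = 0.0
    a = np.zeros((n_obs, n_obs))
    a[np.repeat(np.arange(n_obs), k), indices.ravel()] = weights.ravel()
    # Fuzzy set union: a + a^T - a * a^T (elementwise product).
    return a + a.T - a * a.T
```

A real implementation would keep `a` sparse (at most n·(k-1) non-zeros before symmetrization), which is where avoiding dense/sparse round-trips pays off.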

adamgayoso · 2024-01-04T17:23:36Z

@jan-engelmann thanks! I will make a release soon

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Engelmann <[email protected]>

adamgayoso added 6 commits December 27, 2023 16:52

refactor treatment of neighbors

9d1e943

fix typing

db62285

use uns for neigh results

7edd232

remove unnecessary astype

3dffad5

actually subset neighbors in benchmarker

de79a18

fix guarantee over self as first neighbor

e552250

fix umap dep

cb5142a

adamgayoso mentioned this pull request Dec 27, 2023

Speedup Benchmarker.prepare (compute_connectivities_umap) #128

Closed

adamgayoso and others added 7 commits December 27, 2023 18:16

fix lisi with self edges

667f410

finalize and add test

9ce729d

[pre-commit.ci] auto fixes from pre-commit.com hooks

1a5d0f9

for more information, see https://pre-commit.ci

fix test

78aaca2

fix test

578e5c1

finalize

125ef84

docstring

b241ec7

adamgayoso added 6 commits December 28, 2023 16:10

finalize

96becb4

fix test

c0d39d7

test 3.11

6c1f706

kmeans test tolerance due to flakiness

e866f3f

kmeans test tolerance due to flakiness

a6e51a7

fix leiden

7cceca9

adamgayoso requested a review from martinkim0 December 28, 2023 17:11

bump version

890387f

jan-engelmann approved these changes Dec 29, 2023


adamgayoso added 3 commits January 4, 2024 10:15

lisi test

a9a4bd8

Update _lisi.py

1201b1b

fix lisi

a22c3e7

adamgayoso merged commit 26bc3a5 into main Jan 4, 2024
7 checks passed

adamgayoso deleted the neighbors branch January 4, 2024 17:32
