clustering coefficient update #1909
base: master
Conversation
- the path-based algorithm has room for optimisation
- we need a benchmark to decide whether it is worth keeping the set-based algorithm at all
- the filtering of nodes for the batch versions is unnecessarily inefficient (no need for creating subgraph views)
- python wrappers should raise proper errors instead of panicking
if all_src_nodes == false {
    (nodes, src_nodes) = filter_nodes(graph, &v);
    g = graph.subgraph(nodes);
} else {
    g = graph.subgraph(graph.nodes());
}
This filtering is rather inefficient. The most efficient option is probably a Vec<bool> marking the src_nodes.
.filter_map(|nb| match g.has_edge(nb[0].id(), nb[1].id()) {
    true => Some(1),
    false => match g.has_edge(nb[1].id(), nb[0].id()) {
        true => Some(1),
        false => None,
    },
})
this should use the internal ids, not global ids (much more efficient as this version incurs unnecessary hash map lookups)
There is also the option of considering nodes in degree order which eliminates the triple-counting of triangles and reduces the number of existence checks by quite a lot. We can then simply use atomic accumulators to keep track of the number of triangles at each node.
if all_src_nodes == false {
    (nodes, src_nodes) = filter_nodes(graph, &v);
    g = graph.subgraph(nodes);
} else {
    g = graph.subgraph(graph.nodes());
}
Same problem as the other version: the filtering should be more efficient.
@@ -0,0 +1,40 @@
use crate::{core::entities::nodes::node_ref::AsNodeRef, db::api::view::*};
I don't think the filter_nodes bit is necessary
@@ -76,6 +77,7 @@ mod triangle_count_tests {
    prelude::NO_PROPS,
    test_storage,
};
use tracing::info;
not used?
///
/// # Returns
/// the local clustering coefficient of node v in g.
pub fn local_clustering_coefficient_batch_path<G: StaticGraphViewOps, V: AsNodeRef>(
We need some benchmarks: is this actually much slower than the set-based version?
) -> AlgorithmResult<DynamicGraph, f64, OrderedFloat<f64>> {
    local_clustering_coefficient_batch_intersection_rs(
        &graph.graph,
        process_node_param(v).unwrap(),
this needs to return a PyResult!
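A generic sketch of the fix, in plain Rust (the `parse_node_param` helper and its error type are illustrative stand-ins, not the actual raphtory or pyo3 code): replace the `.unwrap()` with error propagation, so the wrapper can map the error into a Python exception instead of panicking.

```rust
// Illustrative stand-in for the node-parameter parsing step; in the real
// wrapper the Err would be mapped into a PyResult / PyErr at the pyo3 layer.
fn parse_node_param(raw: &str) -> Result<u64, String> {
    raw.parse::<u64>()
        .map_err(|e| format!("invalid node parameter {raw:?}: {e}"))
}

fn main() {
    // Valid input parses; bad input surfaces as an Err the caller can
    // translate into a Python exception, instead of a panic.
    assert_eq!(parse_node_param("7"), Ok(7));
    assert!(parse_node_param("not-a-node").is_err());
    println!("ok");
}
```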
Alternatively, we can push all of this to the pyo3 layer by adding a struct/enum that implements FromPyObject, instead of the process_node_param function.
What changes were proposed in this pull request?
Refactor global and local clustering coefficient. Add two variants of batch local clustering coefficient.
Why are the changes needed?
It's currently extremely inefficient to run LCC on a group of nodes. The batch versions should do a better job of parallelizing the process and reducing overhead.
Does this PR introduce any user-facing change? If yes is this documented?
'clustering_coefficient' is renamed to 'global_clustering_coefficient'. All of the clustering coefficient variants have been moved to a submodule of 'metrics' called 'clustering_coefficient'. The new batch implementations have corresponding docstrings.
How was this patch tested?
The two methods were tested for parity against the existing implementation in Rust and Python.
Are there any further changes required?
Currently working on an approximate version that uses HyperLogLog.