Commit
Parallel loading from parquet and Pandas (#1732)
* start implementing parallel df loading (most of the infrastructure is there but still need to update all the loaders)
* start implementing the parallel loaders
* implement parallel loading from DfView
* PropCols needs to return empty rows if not specified so the zipping doesn't terminate early
* remove the len method from PropCol as it is not used anymore
* fix merge issue
* need to sort test output as order is no longer guaranteed
* add missing feature tags
* GID node state should implement Ord
* make it possible to compare NodeState with dict
* add sort_by_id for NodeState
* clean up error handling and make missing values an error again
* fix all the tests so they do not rely on insertion order which is no longer preserved
* one more order-dependent test
* resolve all nodes first
* try to drop the pair lock earlier
* try chunking by min of src/dst to reduce contention
* pull the edge initialisation out of the node locks
* num_shards exposed
* try to improve the contention
* expose number of shards to python
* try to fix the decontention-sort
* add jemallocator to fix some weirdness
* snmalloc for slightly better performance and hopefully better compatibility
* hopefully fix the python import error
* fix python take 2
* last try
* just on macos for now until we figure out what is going on
* Revert the pre-sorting of the updates as it doesn't seem to help
* clean up the handling of num_shards
* remove unused method and bump the chunk size up in the pandas loader for a bit more speed
* fix merge error and clean up allocator dependency management
* fix dead code warnings
* fix random breakage in async_graphql
* no more debug symbols in the CI to hopefully save some disk space
* fix the nextest invocation
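Several of the items above (chunked loading, `num_shards`, tests that must sort their output) follow from writing into sharded storage from multiple threads. The following is a minimal sketch of that general pattern using only the Rust standard library. It is not the code from this commit: the `ShardedEdges` type, its methods, and the `src % num_shards` placement rule are all hypothetical names and choices made for illustration.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical sharded edge store (illustration only, not the project's type).
/// Each shard has its own lock, so threads touching different shards do not block each other.
struct ShardedEdges {
    shards: Vec<Mutex<Vec<(u64, u64)>>>,
}

impl ShardedEdges {
    fn new(num_shards: usize) -> Self {
        assert!(num_shards > 0, "need at least one shard");
        Self {
            shards: (0..num_shards).map(|_| Mutex::new(Vec::new())).collect(),
        }
    }

    fn num_shards(&self) -> usize {
        self.shards.len()
    }

    /// Load edges in parallel, one scoped thread per chunk of the input.
    /// Assumption: an edge is stored in the shard chosen by `src % num_shards`.
    fn load_parallel(&self, edges: &[(u64, u64)], chunk_size: usize) {
        thread::scope(|s| {
            for chunk in edges.chunks(chunk_size.max(1)) {
                s.spawn(move || {
                    for &(src, dst) in chunk {
                        let shard = (src as usize) % self.num_shards();
                        // Hold the shard lock only for the duration of the push.
                        self.shards[shard].lock().unwrap().push((src, dst));
                    }
                });
            }
        });
    }

    /// Threads interleave freely, so insertion order is not preserved.
    /// Callers that need a deterministic view sort explicitly.
    fn sorted_edges(&self) -> Vec<(u64, u64)> {
        let mut all: Vec<(u64, u64)> = self
            .shards
            .iter()
            .flat_map(|shard| shard.lock().unwrap().to_vec())
            .collect();
        all.sort();
        all
    }
}
```

With a store like this, a test that compares against the raw insertion order can fail intermittently, while one that compares `sorted_edges()` against a sorted copy of the input stays deterministic.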
47e3329
Possible performance regression was detected for benchmark 'Rust Benchmark'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 2.

| Benchmark suite | Current | Previous | Ratio |
|---|---|---|---|
| lotr_graph/iterate nodes | 13015 ns/iter (± 55) | 4433 ns/iter (± 111) | 2.94 |
| lotr_graph_window_100/iterate nodes | 13211 ns/iter (± 55) | 4446 ns/iter (± 133) | 2.97 |
| lotr_graph_window_10/iterate nodes | 14933 ns/iter (± 133) | 5861 ns/iter (± 130) | 2.55 |
| lotr_graph_subgraph_10pc/num_nodes | 31550 ns/iter (± 857) | 15384 ns/iter (± 1182) | 2.05 |
| lotr_graph_subgraph_10pc/iterate nodes | 11722 ns/iter (± 67) | 3163 ns/iter (± 85) | 3.71 |
| lotr_graph_subgraph_10pc/iterate edges | 18143 ns/iter (± 94) | 8892 ns/iter (± 224) | 2.04 |
| lotr_graph_subgraph_10pc_windowed/iterate nodes | 11840 ns/iter (± 120) | 3375 ns/iter (± 41) | 3.51 |
| lotr_graph_subgraph_10pc_windowed/iterate edges | 17553 ns/iter (± 57) | 8390 ns/iter (± 178) | 2.09 |
| lotr_graph_window_50_layered/iterate nodes | 16348 ns/iter (± 20) | 7938 ns/iter (± 82) | 2.06 |
This comment was automatically generated by workflow using github-action-benchmark.
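The ratios in the alert are consistent with dividing the current time by the previous time and flagging any benchmark whose ratio exceeds the threshold of 2. The sketch below reproduces that arithmetic. `BenchRow` and `regressions` are hypothetical names, and the comparison rule is an assumption inferred from the reported numbers rather than taken from the tool's source.

```rust
/// One benchmark row: name plus ns/iter for the current and previous runs.
/// Hypothetical type for illustration.
struct BenchRow {
    name: &'static str,
    current_ns: f64,
    previous_ns: f64,
}

impl BenchRow {
    /// For a "smaller is better" metric, a ratio above 1 means the current run is slower.
    fn ratio(&self) -> f64 {
        self.current_ns / self.previous_ns
    }
}

/// Return the name and ratio of every row whose ratio exceeds `threshold`.
/// Assumption: the alert condition is a strict `ratio > threshold`.
fn regressions(rows: &[BenchRow], threshold: f64) -> Vec<(&'static str, f64)> {
    rows.iter()
        .filter(|row| row.ratio() > threshold)
        .map(|row| (row.name, row.ratio()))
        .collect()
}
```

For example, `lotr_graph/iterate nodes` gives 13015 / 4433, which rounds to 2.94 and is above the threshold, so it is reported.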