Parallel loading from parquet and Pandas (#1732)
* start implementing parallel df loading (most of the infrastructure is there but still need to update all the loaders)

* start implementing the parallel loaders

* implement parallel loading from DfView

* PropCols needs to return empty rows if not specified so the zipping doesn't terminate early

* remove the len method from PropCol as it is not used anymore

* fix merge issue

* need to sort test output as order is no longer guaranteed

* add missing feature tags

* GID node state should implement Ord

* make it possible to compare NodeState with dict

* add sort_by_id for NodeState

* clean up error handling and make missing values an error again

* fix all the tests so they do not rely on insertion order which is no longer preserved

* one more order-dependent test

* resolve all nodes first

* try to drop the pair lock earlier

* try chunking by min of src/dst to reduce contention

* pull the edge initialisation out of the node locks

* num_shards exposed

* try to improve the contention

* expose number of shards to python

* try to fix the decontention-sort

* add jemallocator to fix some weirdness

* snmalloc for slightly better performance and hopefully better compatibility

* hopefully fix the python import error

* fix python take 2

* last try

* just on macos for now until we figure out what is going on

* Revert the pre-sorting of the updates as it doesn't seem to help

* clean up the handling of num_shards

* remove unused method and bump the chunk size up in the pandas loader for a bit more speed

* fix merge error and clean up allocator dependency management

* fix dead code warnings

* fix random breakage in async_graphql

* no more debug symbols in the CI to hopefully save some disk space

* fix the nextest invocation
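
The note above about PropCols returning empty rows reflects a general property of zipped iterators: the zip stops as soon as any one input runs out. The following is a minimal sketch of that idea, not the code from this commit. The names `column_or_empty` and `zip_rows` and the use of `i64` values are hypothetical. It assumes an unspecified column should yield an endless stream of `None` so that the row count is driven by the id column alone.

```rust
/// Hypothetical stand-in for an optional property column.
/// A missing column yields `None` forever instead of yielding nothing,
/// so zipping it with other columns never cuts the rows short.
pub fn column_or_empty<'a>(
    col: Option<&'a [i64]>,
) -> Box<dyn Iterator<Item = Option<i64>> + 'a> {
    match col {
        // Column present: one `Some(value)` per row.
        Some(values) => Box::new(values.iter().copied().map(Some)),
        // Column absent: an unbounded stream of empty rows.
        None => Box::new(std::iter::repeat(None::<i64>)),
    }
}

/// Pair every id with its (possibly missing) property value.
/// The length of the result is bounded by `ids`, because `zip`
/// stops at the shorter of its two inputs.
pub fn zip_rows(ids: &[u64], col: Option<&[i64]>) -> Vec<(u64, Option<i64>)> {
    ids.iter().copied().zip(column_or_empty(col)).collect()
}
```

If the missing column were modelled as an empty iterator instead, `zip_rows` would return no rows at all whenever the column was not specified, which is the early termination the note describes.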
ljeub-pometry authored Sep 3, 2024
1 parent fcf885a commit 47e3329
Showing 31 changed files with 1,657 additions and 1,409 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test_rust_disk_storage_workflow.yml
@@ -67,7 +67,7 @@ jobs:
 RUSTFLAGS: -Awarnings ${{ matrix.flags }}
 TEMPDIR: ${{ runner.temp }}
 run: |
-cargo nextest run --all --no-default-features --features "storage"
+cargo nextest run --all --no-default-features --features "storage" --cargo-profile test-ci
 - name: Check all features
 env:
 RUSTFLAGS: -Awarnings
2 changes: 1 addition & 1 deletion .github/workflows/test_rust_workflow.yml
@@ -55,7 +55,7 @@ jobs:
 RUSTFLAGS: -Awarnings
 TEMPDIR: ${{ runner.temp }}
 run: |
-cargo nextest run --all --no-default-features
+cargo nextest run --all --no-default-features --cargo-profile test-ci
 doc-test:
 if: ${{ !inputs.skip_tests }}
 name: "Doc tests"

1 comment on commit 47e3329

@github-actions
Contributor

⚠️ Performance Alert ⚠️

A possible performance regression was detected for benchmark 'Rust Benchmark'.
The benchmark results for this commit are worse than the previous results by more than the threshold ratio of 2.

| Benchmark suite | Current: 47e3329 | Previous: fcf885a | Ratio |
|---|---|---|---|
| lotr_graph/iterate nodes | 13015 ns/iter (± 55) | 4433 ns/iter (± 111) | 2.94 |
| lotr_graph_window_100/iterate nodes | 13211 ns/iter (± 55) | 4446 ns/iter (± 133) | 2.97 |
| lotr_graph_window_10/iterate nodes | 14933 ns/iter (± 133) | 5861 ns/iter (± 130) | 2.55 |
| lotr_graph_subgraph_10pc/num_nodes | 31550 ns/iter (± 857) | 15384 ns/iter (± 1182) | 2.05 |
| lotr_graph_subgraph_10pc/iterate nodes | 11722 ns/iter (± 67) | 3163 ns/iter (± 85) | 3.71 |
| lotr_graph_subgraph_10pc/iterate edges | 18143 ns/iter (± 94) | 8892 ns/iter (± 224) | 2.04 |
| lotr_graph_subgraph_10pc_windowed/iterate nodes | 11840 ns/iter (± 120) | 3375 ns/iter (± 41) | 3.51 |
| lotr_graph_subgraph_10pc_windowed/iterate edges | 17553 ns/iter (± 57) | 8390 ns/iter (± 178) | 2.09 |
| lotr_graph_window_50_layered/iterate nodes | 16348 ns/iter (± 20) | 7938 ns/iter (± 82) | 2.06 |
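
The Ratio column appears to be the current time divided by the previous time, and a row is flagged when that ratio exceeds the threshold of 2. The sketch below reproduces that arithmetic under this assumption. The function names `slowdown_ratio` and `exceeds_threshold` are hypothetical and are not part of github-action-benchmark.

```rust
/// Ratio of the current timing to the previous one.
/// Values above 1.0 mean the current commit is slower.
pub fn slowdown_ratio(current_ns: f64, previous_ns: f64) -> f64 {
    current_ns / previous_ns
}

/// Assumed alert rule: flag a benchmark when its slowdown ratio
/// is strictly greater than the configured threshold.
pub fn exceeds_threshold(current_ns: f64, previous_ns: f64, threshold: f64) -> bool {
    slowdown_ratio(current_ns, previous_ns) > threshold
}
```

For the first row, 13015 / 4433 rounds to 2.94, which matches the reported ratio. The closest row to the cutoff is lotr_graph_subgraph_10pc/iterate edges, where 18143 / 8892 rounds to 2.04.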

This comment was automatically generated by workflow using github-action-benchmark.
