#1: Add FileStorage struct #10

Merged
merged 8 commits into from
Mar 31, 2024

Collaborator

@twitu twitu commented Mar 7, 2024

The FileStorage struct retains the core functionality of the Kotlin implementation (#1):

  • Read data from a file into a mapping from ResourceId to value
  • Write a mapping of generic key-value pairs to a file
  • Verify the file version
  • Delete the file when dropped
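
The four responsibilities listed above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the `String` keys and values, the `read`/`write` method names, and the use of `Drop` for deletion are assumptions made for the example, and the on-disk layout (a `version 2` header followed by `key:value` lines) is inferred from the sample output shown later in this thread.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io::{self, ErrorKind};
use std::path::{Path, PathBuf};

// Assumed layout: a "version 2" header line, then one "key:value" per line.
const STORAGE_VERSION: u32 = 2;

/// Hypothetical, simplified stand-in for the PR's `FileStorage`.
pub struct FileStorage {
    path: PathBuf,
}

impl FileStorage {
    pub fn new(path: &Path) -> Self {
        Self { path: path.to_path_buf() }
    }

    /// Write the mapping to the file, version header first.
    pub fn write(&self, data: &BTreeMap<String, String>) -> io::Result<()> {
        let mut out = format!("version {}\n", STORAGE_VERSION);
        for (key, value) in data {
            out.push_str(&format!("{}:{}\n", key, value));
        }
        fs::write(&self.path, out)
    }

    /// Read the file back, rejecting a missing or mismatched version header.
    pub fn read(&self) -> io::Result<BTreeMap<String, String>> {
        let text = fs::read_to_string(&self.path)?;
        let mut lines = text.lines();
        let expected = format!("version {}", STORAGE_VERSION);
        if lines.next() != Some(expected.as_str()) {
            return Err(io::Error::new(ErrorKind::InvalidData, "unexpected storage version"));
        }
        let mut data = BTreeMap::new();
        for line in lines {
            let (key, value) = line
                .split_once(':')
                .ok_or_else(|| io::Error::new(ErrorKind::InvalidData, "malformed entry"))?;
            data.insert(key.to_string(), value.to_string());
        }
        Ok(data)
    }
}

impl Drop for FileStorage {
    /// Delete the backing file when the storage goes out of scope, ignoring errors.
    fn drop(&mut self) {
        let _ = fs::remove_file(&self.path);
    }
}
```

Tying deletion to `Drop` is only one reading of "delete file when dropped"; an explicit method would make the deletion easier to see at the call site.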

Collaborator Author

twitu commented Mar 8, 2024

@Pushkarm029 the changes look good. Please move the logic into a single file in a bin directory. That way cargo knows it's meant to be compiled as an executable. Something like -

fs-storage/
   src/
       bin/
           cli.rs

https://doc.rust-lang.org/cargo/guide/project-layout.html

The main function can choose to read or write based on the passed arguments. You can consider using clap (or a similar library) to handle CLI arguments.
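
As a rough illustration of a single binary that dispatches on its arguments, here is a sketch that uses only the standard library in place of clap, so that it stays dependency-free. The `Command` enum, the `parse_args` function, and the `key:value,key:value` argument format are assumptions for the example; the format mirrors the `write` invocation shown later in this thread.

```rust
use std::path::PathBuf;

/// Hypothetical command set for a single `cli` binary.
#[derive(Debug, PartialEq)]
pub enum Command {
    Read { path: PathBuf },
    Write { path: PathBuf, pairs: Vec<(String, String)> },
}

/// Parse the arguments (without the program name) into a `Command`.
pub fn parse_args(args: &[String]) -> Result<Command, String> {
    match args {
        [cmd, path] if cmd == "read" => Ok(Command::Read { path: PathBuf::from(path) }),
        [cmd, path, pairs] if cmd == "write" => {
            // Split "a:1,b:2" into [("a", "1"), ("b", "2")], failing on the first bad pair.
            let pairs = pairs
                .split(',')
                .map(|pair| {
                    pair.split_once(':')
                        .map(|(k, v)| (k.to_string(), v.to_string()))
                        .ok_or_else(|| format!("expected key:value, got '{}'", pair))
                })
                .collect::<Result<Vec<_>, _>>()?;
            Ok(Command::Write { path: PathBuf::from(path), pairs })
        }
        _ => Err("usage: cli read <path> | cli write <path> <k:v,k:v,...>".to_string()),
    }
}
```

A `main` in `src/bin/cli.rs` could collect `std::env::args().skip(1)` into a `Vec<String>`, pass it to `parse_args`, and match on the result. With clap, the enum would typically be derived instead of parsed by hand.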

@kirillt kirillt changed the title Add FileStorage struct #1: Add FileStorage struct Mar 11, 2024
Member

kirillt commented Mar 11, 2024

@twitu it seems that we need to rebase this branch

Member

kirillt commented Mar 11, 2024

The write sample produces an empty folder:

[kirill@lenovo debug]$ ./write /tmp/fs-storage-test
Storage directory created successfully at /tmp/fs-storage-test
Enter a key-value pair (key=value), or enter 'done' to finish:
1=a
Enter a key-value pair (key=value), or enter 'done' to finish:
2=b
Enter a key-value pair (key=value), or enter 'done' to finish:
c=3
Enter a key-value pair (key=value), or enter 'done' to finish:
done
Key-Value Pairs:
1: a
2: b
c: 3
[kirill@lenovo debug]$ ls -lah /tmp/fs-storage-test/
total 0
drwxr-xr-x.  2 kirill kirill  40 Mar 11 10:56 .
drwxrwxrwt. 29 root   root   920 Mar 11 10:56 ..

Comment on lines 19 to 22
pub struct FileStorage {
    log_prefix: String,
    label: String,
    path: PathBuf,
    timestamp: SystemTime,
}
Member

What about an API like this?

pub struct Mapping<K,V> {
    timestamp: SystemTime,
    values: HashMap<K, V>
}

pub struct Storage<K,V> {
    label: String,
    path: PathBuf,
    mapping: Option<Mapping<K,V>> // `None` means not loaded yet
}

impl Storage {
...
    fn new(label: String, path: &Path) -> Self { ... }
    fn load(self: Self) -> Self { ... }
    fn update(self: Self) -> Self { ... }
...
}

Or something like this:

pub struct Mapping<K,V> {
    timestamp: SystemTime,
    values: HashMap<K, V>
}

pub struct Storage<K,V> {
    label: String,
    path: PathBuf,
    mapping: Mapping<K,V>
}

impl Storage {
...
    fn provide(label: String, path: &Path) -> Self { ... }
    fn update(self: Self) -> Self { ... }
...
}
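
To make the first proposal concrete, here is a compilable sketch of the `Option<Mapping>` variant. The `load_with` and `is_loaded` helpers are hypothetical names added for illustration: `load_with` stands in for real file reading by accepting the values directly, so the example shows only how the "not loaded yet" state is represented.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

pub struct Mapping<K, V> {
    pub timestamp: SystemTime,
    pub values: HashMap<K, V>,
}

pub struct Storage<K, V> {
    pub label: String,
    pub path: PathBuf,
    pub mapping: Option<Mapping<K, V>>, // `None` means not loaded yet
}

impl<K, V> Storage<K, V> {
    /// Create a storage handle without touching the filesystem.
    pub fn new(label: String, path: &Path) -> Self {
        Self { label, path: path.to_path_buf(), mapping: None }
    }

    /// Hypothetical placeholder for `load`: installs the given values
    /// and stamps them with the current time instead of reading a file.
    pub fn load_with(mut self, values: HashMap<K, V>) -> Self {
        self.mapping = Some(Mapping { timestamp: SystemTime::now(), values });
        self
    }

    /// True once a mapping has been installed.
    pub fn is_loaded(&self) -> bool {
        self.mapping.is_some()
    }
}
```

In the second proposal, `provide` would construct and load in one step, so `mapping` would not need to be an `Option`.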

Collaborator Author

I've updated the API to remove the callback style and just return the HashMap directly. Currently I don't see how the additional complexity of wrapping the HashMap in a Mapping will help.

I think the current state of the PR is a good first version of a functional file storage implementation. I propose we merge it and add more features as separate issues. I prefer not to keep PRs open for too long because they lead to merge conflicts.

Member

Agreed about the conflicts. But if the changes are small, it's better to make them in the same PR to avoid excessive task management.

The Mapping wrapper is useful only because it also contains the timestamp. We could use a tuple, but a named structure is better. The rest looks good for now.

Member

kirillt commented Mar 11, 2024

Good job so far. Apart from the comments above, here are a couple of other things to do:

  • Port functionality from BaseStorage.kt. It seems to me that it should be a trait.
  • Port the Monoid structure used during conflict resolution. It's a naive approach, but it's simpler than CRDT.
  • Think about how we will implement asynchronous work with the filesystem.
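
For readers unfamiliar with the Monoid idea mentioned above, the following sketch shows how it could resolve conflicts between two versions of a table. It is not a port of the Kotlin code: the trait shape, the `Score` value type, and the "keep the maximum" rule are assumptions chosen only to illustrate an associative `combine` with a `neutral` element.

```rust
use std::collections::BTreeMap;

/// A monoid: an associative `combine` with a `neutral` identity element.
pub trait Monoid {
    fn neutral() -> Self;
    fn combine(&self, other: &Self) -> Self;
}

/// Hypothetical value type where a conflict resolves to the larger score.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Score(pub u32);

impl Monoid for Score {
    fn neutral() -> Self {
        Score(0)
    }
    fn combine(&self, other: &Self) -> Self {
        Score(self.0.max(other.0))
    }
}

/// Merge two versions of a table key by key. A key present on only one
/// side is combined with the neutral element, which leaves it unchanged.
pub fn merge<V: Monoid + Clone>(
    a: &BTreeMap<String, V>,
    b: &BTreeMap<String, V>,
) -> BTreeMap<String, V> {
    let mut merged = a.clone();
    for (key, value) in b {
        let combined = match merged.get(key) {
            Some(existing) => existing.combine(value),
            None => V::neutral().combine(value),
        };
        merged.insert(key.clone(), combined);
    }
    merged
}
```

Because `combine` is associative, the order in which several versions are merged does not change the result for `Score`, which is what makes the approach simple compared with a CRDT.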

@kirillt kirillt requested a review from tareknaser March 11, 2024 15:13
Collaborator

tareknaser left a comment

Thanks for putting this together

Left some comments

}

if let Err(err) = write_to_file(kv_pairs, storage_path) {
    println!("Error writing to file: {}", err);
Collaborator

Errors should go to stderr using eprintln!()

Collaborator Author

Since this is a dev CLI, errors are unwrapped in the new implementation.

Collaborator

"Since this is a dev CLI, errors are unwrapped in the new implementation"

I second this.
I know it's a grey area in Rust where people generally take sides. For our case (a CLI), I think we should never panic.

I never want to see this output as a user when running a CLI command:

thread 'main' panicked at fs-storage/src/bin/cli.rs:54:18:
Failed to read JSON file: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I prefer passing down the error or using anyhow::Context.
Here is a reference from the "Command Line Applications in Rust" book:
https://rust-cli.github.io/book/tutorial/errors.html

What do you think?
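
A sketch of the "pass the error down" approach discussed above, under stated assumptions: the `CliError` type and the `read_storage` and `run` functions are hypothetical, and a small hand-written error type stands in for `anyhow::Context` so the example needs no external crates.

```rust
use std::fmt;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical error type that keeps a human-readable context message.
#[derive(Debug)]
pub struct CliError {
    context: String,
    source: io::Error,
}

impl fmt::Display for CliError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.context, self.source)
    }
}

impl std::error::Error for CliError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        Some(&self.source)
    }
}

/// Read the storage file, attaching the path to any I/O error instead of panicking.
pub fn read_storage(path: &Path) -> Result<String, CliError> {
    fs::read_to_string(path).map_err(|source| CliError {
        context: format!("Failed to read file {}", path.display()),
        source,
    })
}

/// Run the command and return a process exit code.
/// Errors go to stderr via `eprintln!`, as suggested in the review.
pub fn run(path: &Path) -> i32 {
    match read_storage(path) {
        Ok(contents) => {
            print!("{}", contents);
            0
        }
        Err(err) => {
            eprintln!("Error: {}", err);
            1
        }
    }
}
```

A `main` function could then call `std::process::exit(run(path))`, so a missing file produces a one-line message and a non-zero exit code rather than a panic with a backtrace hint.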

@tareknaser
Collaborator

If the "Verify build" CI test is blocking this, you can cherry-pick ba41bd8 to fix it.

@twitu twitu requested a review from kirillt March 14, 2024 05:16
Member

kirillt commented Mar 16, 2024

@twitu The samples work, thanks. I've noticed one more thing to fix:

$ cargo run -- write /tmp/x a:1,b:2,c:3,d:4
...
$ cat /tmp/x
version 2
a:1
c:3
b:2
d:4
$ cargo run -- write /tmp/x a:1,b:2,c:3,d:4
...
$ cat /tmp/x
version 2
b:2
a:1
d:4
c:3
  • Storages should be persisted deterministically, i.e. the storage files must be equal for equal KV tables. In Rust, iterating through BTreeMap is deterministic.
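
The shuffled output above is what iterating a `HashMap` looks like, since its iteration order is unspecified and can differ between runs. A minimal sketch of the suggested fix, assuming string keys and values and a hypothetical `serialize` function: collecting the entries into a `BTreeMap` before writing yields key-sorted, repeatable output.

```rust
use std::collections::{BTreeMap, HashMap};

/// Serialize entries in key order so that equal tables give equal files.
/// The "version 2" header and "key:value" lines mirror the output shown above.
pub fn serialize(entries: &HashMap<String, String>) -> String {
    // BTreeMap iterates in ascending key order, regardless of insertion order.
    let sorted: BTreeMap<&String, &String> = entries.iter().collect();
    let mut out = String::from("version 2\n");
    for (key, value) in sorted {
        out.push_str(&format!("{}:{}\n", key, value));
    }
    out
}
```

Storing the data in a `BTreeMap` from the start would avoid the extra collection step; the conversion here only shows that the ordering comes from the map type.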

Member

kirillt commented Mar 16, 2024

I've converted #10 (comment) into separate issues:

But let's have deterministic output in this PR.

Also, it seems that this branch and main have the same commits but with different hashes. This causes duplicate commits during rebase. We can probably squash, though.

Collaborator

tareknaser left a comment

I left some minor comments to make cargo clippy happy but LGTM
Thanks

Collaborator

tareknaser left a comment

I gave the binary a spin locally and everything works as expected. Thanks for adding support for JSON files!

Left some comments
We also need to make sure the CI is green before merging. Currently the benchmarks one is failing but I added a suggestion to fix it

@@ -0,0 +1,3 @@
[toolchain]
version = "1.75.0"
Collaborator

Is there a reason for this specific version?

Collaborator Author

The toolchain is what the compiler and tooling for this workspace will default to. Is there any requirement to support older versions? Using older versions can make upgrading libraries difficult later. Unless there's a specific requirement to support older versions, we should go ahead with the latest and greatest, especially because we plan to pull in the tokio and async ecosystem, which is still evolving.

@tareknaser
Collaborator

Let's also rebase the branch on top of main

Benchmark for 3ef631d

| Test | Base | PR | % |
|---|---|---|---|
| ../test-assets/lena.jpg/compute_bytes | 13.7±0.12µs | 13.3±0.18µs | -2.92% |
| ../test-assets/test.pdf/compute_bytes | 106.7±2.75µs | 107.6±0.79µs | +0.84% |
| compute_bytes_large/compute_bytes | 134.8±1.11µs | 134.1±1.80µs | -0.52% |
| compute_bytes_medium/compute_bytes | 27.5±0.19µs | 27.5±0.40µs | 0.00% |
| compute_bytes_small/compute_bytes | 127.1±1.05ns | 127.5±1.47ns | +0.31% |
| index_build/index_build/../test-assets/ | 158.6±0.71µs | 160.0±4.56µs | +0.88% |

Benchmark for 5590529

| Test | Base | PR | % |
|---|---|---|---|
| ../test-assets/lena.jpg/compute_bytes | 13.3±0.04µs | 13.3±0.33µs | 0.00% |
| ../test-assets/test.pdf/compute_bytes | 107.9±0.77µs | 107.6±0.64µs | -0.28% |
| compute_bytes_large/compute_bytes | 136.1±1.70µs | 140.9±1.54µs | +3.53% |
| compute_bytes_medium/compute_bytes | 27.0±0.25µs | 29.5±0.39µs | +9.26% |
| compute_bytes_small/compute_bytes | 127.3±1.88ns | 127.5±1.35ns | +0.16% |
| index_build/index_build/../test-assets/ | 158.4±7.90µs | 156.3±0.94µs | -1.33% |

Benchmark for 49a5a8b

| Test | Base | PR | % |
|---|---|---|---|
| ../test-assets/lena.jpg/compute_bytes | 13.4±0.12µs | 13.3±0.12µs | -0.75% |
| ../test-assets/test.pdf/compute_bytes | 111.3±0.62µs | 110.7±0.46µs | -0.54% |
| compute_bytes_large/compute_bytes | 173.1±1.22µs | 142.5±2.48µs | -17.68% |
| compute_bytes_medium/compute_bytes | 27.6±0.41µs | 27.0±0.34µs | -2.17% |
| compute_bytes_small/compute_bytes | 127.7±4.08ns | 127.7±1.95ns | 0.00% |
| index_build/index_build/../test-assets/ | 159.0±1.11µs | 159.6±1.04µs | +0.38% |

Benchmark for 4d167a1

| Test | Base | PR | % |
|---|---|---|---|
| ../test-assets/lena.jpg/compute_bytes | 13.3±0.07µs | 13.3±0.42µs | 0.00% |
| ../test-assets/test.pdf/compute_bytes | 112.8±0.37µs | 110.4±0.84µs | -2.13% |
| compute_bytes_large/compute_bytes | 138.8±1.20µs | 139.6±2.19µs | +0.58% |
| compute_bytes_medium/compute_bytes | 27.5±0.27µs | 27.6±0.44µs | +0.36% |
| compute_bytes_small/compute_bytes | 127.1±0.90ns | 128.3±5.14ns | +0.94% |
| index_build/index_build/../test-assets/ | 156.1±3.95µs | 155.5±1.24µs | -0.38% |

Benchmark for 2379496

| Test | Base | PR | % |
|---|---|---|---|
| ../test-assets/lena.jpg/compute_bytes | 13.3±0.09µs | 13.4±0.25µs | +0.75% |
| ../test-assets/test.pdf/compute_bytes | 107.2±0.69µs | 108.2±2.40µs | +0.93% |
| compute_bytes_large/compute_bytes | 134.2±0.77µs | 134.2±1.37µs | 0.00% |
| compute_bytes_medium/compute_bytes | 26.8±0.45µs | 28.1±0.45µs | +4.85% |
| compute_bytes_small/compute_bytes | 127.6±3.28ns | 128.3±5.55ns | +0.55% |
| index_build/index_build/../test-assets/ | 156.7±0.87µs | 156.4±2.37µs | -0.19% |

Co-Authored-by: Ishan Bhanuka <[email protected]>
Co-Authored-by: Pushkar Mishra <[email protected]>
Co-Authored-by: Tarek <[email protected]>
Co-Authored-by: Kirill Taran <[email protected]>
Benchmark for 62802d1

| Test | Base | PR | % |
|---|---|---|---|
| ../test-assets/lena.jpg/compute_bytes | 13.3±0.11µs | 14.2±0.28µs | +6.77% |
| ../test-assets/test.pdf/compute_bytes | 217.0±0.93µs | 109.5±0.96µs | -49.54% |
| compute_bytes_large/compute_bytes | 136.7±1.57µs | 140.5±1.55µs | +2.78% |
| compute_bytes_medium/compute_bytes | 26.8±0.26µs | 27.5±0.22µs | +2.61% |
| compute_bytes_small/compute_bytes | 127.1±1.33ns | 130.2±13.99ns | +2.44% |
| index_build/index_build/../test-assets/ | 159.9±2.88µs | 157.6±0.65µs | -1.44% |

Collaborator Author

twitu commented Mar 25, 2024

Fixed the commit history.

I also experimented with creating a mapping struct in a separate branch.

pub struct Mapping<K, V> {
    timestamp: SystemTime,
    values: HashMap<K, V>,
}

It adds the extra complexity of generics and also a few unresolved problems around the consistency of the timestamp. Basically, we now have two timestamps in the system: one for the mapping and one for the file storage struct. This leads to duplication without a clear distinction between the two or a clear relation between them.

Secondly, the save function doesn't actually use the timestamp in the mapping, since it reads the current time.

    /// Write data to file
    ///
    /// Data is a key-value mapping between [ResourceId] and a generic Value
    pub fn save<K, V>(&mut self, value_by_id: Mapping<K, V>) -> Result<()>

The mapping struct is only used in one place where it is returned by the load function.

    fn load<K, V>(&mut self) -> Result<Mapping<K, V>>

We can consider adding it later when it has more use cases.

This is a good first version of the FileStorage logic, and we can merge it once @Pushkarm029 gives the CLI a final refactor.

This was linked to issues Mar 25, 2024
@kirillt kirillt requested a review from tareknaser March 26, 2024 10:29
Signed-off-by: Pushkar Mishra <[email protected]>
Signed-off-by: Pushkar Mishra <[email protected]>
Benchmark for ec54217

| Test | Base | PR | % |
|---|---|---|---|
| ../test-assets/lena.jpg/compute_bytes | 13.4±0.18µs | 13.3±0.09µs | -0.75% |
| ../test-assets/test.pdf/compute_bytes | 107.8±0.41µs | 108.6±1.02µs | +0.74% |
| compute_bytes_large/compute_bytes | 137.3±1.58µs | 136.8±1.30µs | -0.36% |
| compute_bytes_medium/compute_bytes | 27.5±0.12µs | 27.6±0.95µs | +0.36% |
| compute_bytes_small/compute_bytes | 127.2±2.04ns | 128.0±2.73ns | +0.63% |
| index_build/index_build/../test-assets/ | 157.6±1.35µs | 157.0±0.83µs | -0.38% |

Benchmark for a48c673

| Test | Base | PR | % |
|---|---|---|---|
| ../test-assets/lena.jpg/compute_bytes | 13.3±0.05µs | 13.3±0.04µs | 0.00% |
| ../test-assets/test.pdf/compute_bytes | 110.5±0.59µs | 317.5±0.66µs | +187.33% |
| compute_bytes_large/compute_bytes | 139.3±1.77µs | 416.8±2.85µs | +199.21% |
| compute_bytes_medium/compute_bytes | 28.9±0.45µs | 27.5±0.21µs | -4.84% |
| compute_bytes_small/compute_bytes | 128.7±8.09ns | 127.4±1.27ns | -1.01% |
| index_build/index_build/../test-assets/ | 159.1±1.78µs | 159.3±0.45µs | +0.13% |

Collaborator

tareknaser left a comment

Looks good to me!
Thank you

All the comments here are nits. Feel free to ignore them

Benchmark for 63e38a5

| Test | Base | PR | % |
|---|---|---|---|
| ../test-assets/lena.jpg/compute_bytes | 13.5±0.13µs | 14.3±0.09µs | +5.93% |
| ../test-assets/test.pdf/compute_bytes | 153.6±1.16µs | 110.9±0.66µs | -27.80% |
| compute_bytes_large/compute_bytes | 210.5±1.75µs | 136.9±1.54µs | -34.96% |
| compute_bytes_medium/compute_bytes | 27.6±0.61µs | 28.8±0.70µs | +4.35% |
| compute_bytes_small/compute_bytes | 127.5±4.29ns | 127.9±1.65ns | +0.31% |
| index_build/index_build/../test-assets/ | 157.2±0.66µs | 156.4±1.08µs | -0.51% |

Benchmark for a78f010

| Test | Base | PR | % |
|---|---|---|---|
| ../test-assets/lena.jpg/compute_bytes | 13.5±0.10µs | 13.3±0.09µs | -1.48% |
| ../test-assets/test.pdf/compute_bytes | 111.1±2.95µs | 181.4±3.53µs | +63.28% |
| compute_bytes_large/compute_bytes | 276.9±3.34µs | 139.8±2.15µs | -49.51% |
| compute_bytes_medium/compute_bytes | 26.9±0.45µs | 27.0±1.20µs | +0.37% |
| compute_bytes_small/compute_bytes | 127.7±4.75ns | 128.4±3.52ns | +0.55% |
| index_build/index_build/../test-assets/ | 161.7±5.07µs | 161.2±4.67µs | -0.31% |

Benchmark for 20ec58e

| Test | Base | PR | % |
|---|---|---|---|
| ../test-assets/lena.jpg/compute_bytes | 13.3±0.12µs | 13.3±0.14µs | 0.00% |
| ../test-assets/test.pdf/compute_bytes | 109.9±0.72µs | 110.8±2.27µs | +0.82% |
| compute_bytes_large/compute_bytes | 138.0±0.80µs | 140.2±1.46µs | +1.59% |
| compute_bytes_medium/compute_bytes | 27.5±0.21µs | 27.5±0.25µs | 0.00% |
| compute_bytes_small/compute_bytes | 127.1±0.90ns | 127.8±1.45ns | +0.55% |
| index_build/index_build/../test-assets/ | 160.9±2.20µs | 160.8±0.85µs | -0.06% |

Benchmark for 02c30c8

| Test | Base | PR | % |
|---|---|---|---|
| ../test-assets/lena.jpg/compute_bytes | 13.3±0.13µs | 13.4±0.06µs | +0.75% |
| ../test-assets/test.pdf/compute_bytes | 107.5±1.49µs | 124.7±0.65µs | +16.00% |
| compute_bytes_large/compute_bytes | 134.6±0.69µs | 170.1±0.86µs | +26.37% |
| compute_bytes_medium/compute_bytes | 27.6±0.41µs | 27.0±1.03µs | -2.17% |
| compute_bytes_small/compute_bytes | 127.2±2.08ns | 127.8±1.54ns | +0.47% |
| index_build/index_build/../test-assets/ | 159.7±1.00µs | 158.7±0.96µs | -0.63% |

Benchmark for 4606153

| Test | Base | PR | % |
|---|---|---|---|
| ../test-assets/lena.jpg/compute_bytes | 13.3±0.04µs | 13.4±0.22µs | +0.75% |
| ../test-assets/test.pdf/compute_bytes | 110.6±0.57µs | 108.9±0.84µs | -1.54% |
| compute_bytes_large/compute_bytes | 136.5±2.05µs | 137.0±1.57µs | +0.37% |
| compute_bytes_medium/compute_bytes | 27.8±0.18µs | 26.9±0.30µs | -3.24% |
| compute_bytes_small/compute_bytes | 127.5±1.92ns | 128.4±7.36ns | +0.71% |
| index_build/index_build/../test-assets/ | 158.8±6.51µs | 157.8±2.37µs | -0.63% |

@twitu twitu merged commit 4856876 into main Mar 31, 2024
2 checks passed
tareknaser added a commit to tareknaser/ark-rust that referenced this pull request Apr 6, 2024
* Add FileStorage logic, example and documentation

Co-Authored-by: Ishan Bhanuka <[email protected]>
Co-Authored-by: Pushkar Mishra <[email protected]>
Co-Authored-by: Tarek <[email protected]>
Co-Authored-by: Kirill Taran <[email protected]>

* refactor done

Signed-off-by: Pushkar Mishra <[email protected]>

* fix cargo.toml

Signed-off-by: Pushkar Mishra <[email protected]>

* Update fs-storage/src/file_storage.rs

Co-authored-by: Tarek Elsayed <[email protected]>

* Update fs-storage/src/file_storage.rs

Co-authored-by: Tarek Elsayed <[email protected]>

* Update fs-storage/src/file_storage.rs

Co-authored-by: Tarek Elsayed <[email protected]>

* Add doc comment for erase

* feat(fs-storage): refactor CLI write cmd to accept key-value pairs

Signed-off-by: Tarek <[email protected]>

---------

Signed-off-by: Pushkar Mishra <[email protected]>
Signed-off-by: Tarek <[email protected]>
Co-authored-by: Pushkar Mishra <[email protected]>
Co-authored-by: Tarek <[email protected]>
Co-authored-by: Kirill Taran <[email protected]>
Co-authored-by: Tarek Elsayed <[email protected]>
tareknaser added a commit to tareknaser/ark-rust that referenced this pull request Apr 9, 2024
@kirillt kirillt deleted the file-storage branch April 12, 2024 12:49
Labels
None yet
Projects
Status: Review
Development

Successfully merging this pull request may close these issues.

Sample for storages
fs-storage: Implement storages in Rust
4 participants