Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added 'MlockManager' to prevent early 'munlock'ing #34

Closed
wants to merge 1 commit into from

Conversation

DrPeterVanNostrand
Copy link
Contributor

@DrPeterVanNostrand DrPeterVanNostrand commented Sep 14, 2018

  • Added MlockManager to secrets module.
  • Prevents unnecessary/duplicate calls to mlock for a page in memory, stops pages from being unlocked when they still contain secret values.
  • Added dependency page_size.

Closes issue #31

/// Round `ptr` down to the nearest page boundry (i.e returns the first address in the page that
/// contains `ptr`).
fn get_page_addr(ptr: *const u8) -> *const u8 {
let offset = ptr as usize % *PAGE_SIZE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this calculation could be incorrect if the memory referenced by ptr is not allocated page-aligned (by something like valloc, etc.). In other words, I'm not sure that pages always necessarily line up with 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I'm sure that this doesn't generalize at all. My thoughts were that ptr will only ever point to an Fr or to an [Fr], which are aligned because Fr is just a wrapper around [u64; 4].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If anyone knows of a better way to get the first address in a page, I'm happy to change to this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you can micro-optimize that into a single bitmask instead of modulo by calculating the inverse, if we are assuming PAGE_SIZE is always a power of 2, which the compiler cannot assume at compile. But since that's a lot harder to read, prime case for annoying bugs and not on the hot path, I suggest you do not follow my suggestions =).

@c0gent
Copy link
Contributor

c0gent commented Sep 15, 2018

Code like this can be highly prone to nasty bugs. I'd recommend some significantly more comprehensive testing. Perhaps @mbr could suggest some things.

@DrPeterVanNostrand
Copy link
Contributor Author

DrPeterVanNostrand commented Sep 17, 2018

I'd recommend some significantly more comprehensive testing

I was considering doing a multithreaded test, where for example, I would create 5 threads each of which creates its own SecretKey, then the main thread asserts that the locked page counter is accurate. But anyone who has any suggestions for more/better tests, I'm happy to implement them.

@afck
Copy link
Collaborator

afck commented Sep 17, 2018

Since we're already building a dedicated MlockManager, should it maybe even deal with the allocation itself? I.e. should it allocate dedicated locked pages for secrets and hand out space in them on request? Because currently we're probably mlocking waaay more memory than strictly required: our secret data is scattered all over the place and wherever a bit of it happens to be allocated, we mlock the whole page.
Locked memory should probably be treated as a scarce resource, since locking causes overhead and has a limit.

(I'm beginning to have doubts whether it's worth all that trouble and whether it even makes sense to have this enabled by default. Some of the comments here make a few good points against it.)

Copy link
Contributor

@mbr mbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicks aside, I am not sure this is the way to go - "locked memory efficiency" is going to be random, right? At worst, we will lock PAGE_SIZE memory for each secret?

A tiny allocator might be a better alternative, I think.

/// Round `ptr` down to the nearest page boundry (i.e returns the first address in the page that
/// contains `ptr`).
fn get_page_addr(ptr: *const u8) -> *const u8 {
let offset = ptr as usize % *PAGE_SIZE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you can micro-optimize that into a single bitmask instead of modulo by calculating the inverse, if we are assuming PAGE_SIZE is always a power of 2, which the compiler cannot assume at compile. But since that's a lot harder to read, prime case for annoying bugs and not on the hot path, I suggest you do not follow my suggestions =).

}

/// Returns the total number of pages currently locked into RAM.
#[cfg(test)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function might be useful outside testing for logging purposes.

@mbr
Copy link
Contributor

mbr commented Sep 17, 2018

An inefficient allocator can be somewhat concise, here's my take. Please keep in mind that this is a 1 hour effort, so treat it like a sketch: https://gist.github.com/mbr/9d2b8481a2668dfde1829f1fa7e10a72

@mbr
Copy link
Contributor

mbr commented Sep 17, 2018

Since we're already building a dedicated MlockManager, should it maybe even deal with the allocation itself? I.e. should it allocate dedicated locked pages for secrets and hand out space in them on request? Because currently we're probably mlocking waaay more memory than strictly required: our secret data is scattered all over the place and wherever a bit of it happens to be allocated, we mlock the whole page.
Locked memory should probably be treated as a scarce resource, since locking causes overhead and has a limit.

I should refresh the reviews page while writing reviews, but yes, I agree. See the gist for a proposed solution-like.

Limiting to a specific size also has the advantage of being able to lock it in advance and take action immediately, reducing the likely of runtime errors ("secure memory leaks" excepted).

@afck
Copy link
Collaborator

afck commented Sep 18, 2018

So we'd have a static global (or thread-local) Slab<Fr> and use that everywhere?

I guess we could add convenience methods to get whole contiguous slices, and make the Slab automatically grow (until it reaches the limit)?

@mbr
Copy link
Contributor

mbr commented Sep 18, 2018

So we'd have a static global (or thread-local) Slab and use that everywhere?

That's one way of doing it. Two choices here, if we use a Mutex internally, we get a Send + Sync slab, with a Cell a Slab that is limited to a single thread. It's basically the same kind of distinction that exists between Rc and Arc.

Those are the options for per-thread or per-program instances. If we can keep it to a reasonable amount of slabs, we could go a little more fine-grained (the further down we go, the higher the chance we waste valuable lockable memory though).

I guess we could add convenience methods to get whole contiguous slices, and make the Slab automatically grow (until it reaches the limit)?

You can turn it into a real allocator; I just chose that particular implementation because it was fast to write, simple and hopefully readable.

For slices, we'd need an extra type.

Another helpful thing would probably Deref implementations, making the SlabItems fairly seamless.

@afck
Copy link
Collaborator

afck commented Jun 11, 2019

Closing in favor of #42.

@afck afck closed this Jun 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants