
context: introduce alternate global context scheme based on #346 #539

Open · wants to merge 6 commits into master
Conversation

apoelstra (Member)
This implements, but does not use, the new global context scheme described in #346. As described in #538, we want to actually use the global context API throughout the entire codebase, rather than maintaining parallel global-context and non-global-context APIs.

This is in its own commit so that we can argue about it independently
of the rest of the PR. I believe this is correct and safe because the
docs for `AtomicPtr` say that (a) it is `#[repr(C)]` and (b) it has the
same in-memory representation as a `*mut T`.
This covers both the std and no-std cases. This is perhaps an unconventional
strategy; see rust-bitcoin#346 for
discussion leading up to it.
Weirdly, we have no tests that don't require std or alloc. We should
revisit this. In many cases this is because of the dependency on
rand-std, but we should try to narrow that down to specific tests.
This commit should probably be removed before merging; it can be part
of the next PR.
@apoelstra apoelstra mentioned this pull request Nov 25, 2022
@Kixunil (Collaborator) left a comment

I believe this code is unsound. First, how I understand what the code is doing:

There are two contexts: one gets re-randomized, the other doesn't. The pointer to the currently usable context is stored in secp256k1_context_signing_1. When the context is being re-randomized, the code atomically sets a "busy" flag, switches the pointers, modifies the context, then restores the pointers and finally unsets the busy flag.

The problem with this is that when setting the busy flag we don't know if the context is still being used by another thread (for reading). So it could happen that another thread holds a pointer it obtained earlier while the context is being modified.

The only possible fix I see here is to implement an RW lock: count the number of threads using the primary context and only rerandomize if there are none. It kinda worries me though that this may significantly decrease the frequency of rerandomizations. One thing we should take care of is to ensure that the rerandomized context is used for signing only; verification should always use the alternate context.

I'm also thinking about an alternative design:

  • Have 3 contexts: sig1, sig2, verify
  • verify is constant and accessed from the appropriate operations
  • A thread accessing a signing context tries sig1 and, if it's busy, goes for sig2
  • The seed is 64B; rerandomization writes the remaining 32B into a (now locked) global buffer
  • A thread that is releasing a read lock tries to obtain the write lock for sig2 to rerandomize it
  • If successful, it rerandomizes it with the seed obtained from the global buffer
  • There needs to be some synchronization to ensure only one context is locked.
  • This whole thing should probably be modeled as a queue or something similar.

The expected outcome is that while it may happen that two threads reuse the same context, at least they wouldn't use the default, completely non-randomized one.
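The "try sig1, fall back to sig2" step could be sketched with per-context try-locks (hypothetical names; the queue/synchronization machinery is omitted):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// One try-lock per signing context: false = free, true = busy.
static SIG_LOCKS: [AtomicBool; 2] = [AtomicBool::new(false), AtomicBool::new(false)];

/// Try sig1 first, then sig2; return the index of the slot we grabbed.
fn acquire_signing_slot() -> Option<usize> {
    for (i, lock) in SIG_LOCKS.iter().enumerate() {
        if lock
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
        {
            return Some(i);
        }
    }
    // Both busy: per the comment above, two threads reusing a context is
    // tolerated, since neither context is the non-randomized default.
    None
}

fn release_signing_slot(i: usize) {
    SIG_LOCKS[i].store(false, Ordering::Release);
}
```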


thread_local! {
    pub static CONTEXT: RefCell<Secp256k1<All>> = RefCell::new(Secp256k1::new());
}
Kixunil (Collaborator):

We could actually use a pure UnsafeCell and just make sure we don't access it twice from our methods nor call any user-provided callbacks. Maybe not worth optimizing, though.
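A minimal sketch of that UnsafeCell variant (with a stand-in `Context` type instead of `Secp256k1<All>`; the soundness argument is exactly the one stated above):

```rust
use std::cell::UnsafeCell;

// Stand-in for Secp256k1<All>; purely illustrative.
struct Context {
    randomized: bool,
}

thread_local! {
    // UnsafeCell skips RefCell's runtime borrow counter.
    static CONTEXT: UnsafeCell<Context> = UnsafeCell::new(Context { randomized: false });
}

// Sound only because this function never re-enters itself and the closure is
// never a user-provided callback that could call back into it, so the
// &mut reference is never aliased.
fn with_context<R>(f: impl FnOnce(&mut Context) -> R) -> R {
    CONTEXT.with(|c| f(unsafe { &mut *c.get() }))
}
```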

apoelstra (Member, Author):

I think we should do this, but not on the first iteration. RefCell will let us do development "in debug mode" :)

///
/// Safety: you are being given a raw pointer. Do not mutate through it. Do not
/// move it, or any reference to it, across thread boundaries.
pub unsafe fn with_global_context<F: FnOnce(*const ffi::Context) -> R, R>(f: F) -> R {
Kixunil (Collaborator):

Why not use &ffi::Context instead?

apoelstra (Member, Author):

Oh, neat! I spent so long going back and forth on ffi::Context vs Secp256k1<All> vs some custom type that I didn't even consider the pointer type.

This is a good idea.
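For illustration, the reference-taking variant might look like this (with a stand-in `Context` type, since the real one lives in `ffi`):

```rust
// Stand-in for ffi::Context; purely illustrative.
struct Context;

static CTX: Context = Context;

// Handing out &Context instead of *const ffi::Context lets the borrow
// checker enforce the "don't mutate, don't move" part of the safety
// contract, shrinking what the caller must uphold manually.
fn with_global_context<F: FnOnce(&Context) -> R, R>(f: F) -> R {
    f(&CTX)
}
```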

// The reason is that a panic at this point in the code would leave
// `LOCK` at true (as well as both `signing_1` and `signing_2` pointing
// at the same, never-rerandomized, place) and therefore prevent any
// future rerandomizations from succeeding.
Kixunil (Collaborator):

We could still panic after unlocking 🤷‍♂️

apoelstra (Member, Author):

Hah, good idea! Again, I was so worked up over an unrelated thing (whether to call randomize or let the user pass in an arbitrary-maybe-panicking closure) that I didn't consider this.

Code review ftw.


unsafe {
// Obtain a pointer to the alternate context.
let alt_ctx = ffi::secp256k1_context_signing_2.load(Ordering::Acquire);
Kixunil (Collaborator):

This is never modified, so we shouldn't need an atomic.

apoelstra (Member, Author):

Ah, right. It was modified in the first iteration of this, and in my on-paper design (when I was trying to use AtomicPtr::swap on two AtomicPtrs directly, which annoyingly you cannot do).

But as you noticed in your above comment, this whole scheme doesn't seem to work, so it's a moot point.

@apoelstra (Member, Author)

The problem with this is that when setting the busy flag we don't know if the context is still being used by another thread (for reading). So it could happen that another thread holds a pointer it obtained earlier while the context is being modified.

Ah, yes, you're correct. The pointers are okay but the actual context object may be seen while it's being modified. Dammit.

The only possible fix I see here is to implement an RW lock

OK, this isn't too bad. I was scared of doing this because it could mean that reads might fail due to contention, but now I see that this won't be an issue.

I'll think a bit about whether having a rwlock would let me get rid of the second context object and the pointer-swaps, which would be nice (and might make this upstreamable).

It kinda worries me though that this may significantly decrease the frequency of rerandomizations

So, in theory, if you have multiple threads constantly signing in a tight loop, it would reduce the frequency to 0 (since the RW lock's reader count would never reach zero). It's not really a fair comparison, but this is the same as the current situation ;). Less unfairly, I don't think this will be a very common case -- even if you are signing in a multi-threaded context, I'd expect contention to happen only in short bursts, and if you can pull a rerandomization off even once in 100 signatures that offers significant protection. (It's hard to quantify this stuff because the attack model is so murky: "the attacker can see everything you're doing with the key...except not the actual key". But "typical" timing attacks on not-egregiously-broken code take well over 100 signatures to get enough signal to usefully weaken a key.)

In practice I think it'll be really rare to be signing in more than one thread anyway, and we already plan to have verification use a different context (the existing _no_precomp one), with no synchronization. But if we can solve this, we should.

I'm also thinking about an alternative design:

Heh, I like this in principle, but (a) we have no real locking facilities other than spinlocks, which we've decided to eschew, and (b) even having unbounded queues will be difficult.

@Kixunil (Collaborator) commented Nov 27, 2022

having a rwlock would let me get rid of the second context object

I don't see how. I'm not suggesting a traditional RW lock, but rather falling back to the read-only context when the lock is held for writing. This read-only context must exist; otherwise we would have to wait, which we don't want.

But "typical" timing attacks on not-egregiously-broken code take well over 100 signatures to get enough signal to usefully weaken a key.

That's pretty comforting, but I was pointing at the fallbacks: if signing falls back to the constant, attacker-known context enough times, the attacker might extract the key. Ironically, the more often re-randomization happens, the more likely this is to trigger. My idea with three contexts prevents that. (We could have two, but then verifications would cause contention.)

even having unbounded queues will be difficult.

I meant a bounded queue containing two items. But I've since tried to think it through and came up with a design without a queue:

Data structures

Contexts

  • extern "C" static mut GLOBAL_SIGN_CONTEXTS: [ffi::Context; 2];
  • extern "C" static GLOBAL_VERIFY_CONTEXT: ffi::Context;
  • static CONTEXTS_DIRTY: [AtomicBool; 2]; (should have better perf than a single AtomicU8, but I think some platforms need to lock a chunk of memory, so maybe not)

Main lock

The main lock is an AtomicUsize which stores two flags in its lowest bits and the reader count in the remaining bits. Thus atomically adding or subtracting 4 modifies the counter.
The two flags are:

  • Which context is active - least significant (0b1)
  • Needs swap - second least significant (0b10)
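Decoding the main-lock word described above (illustrative helpers, not actual crate code):

```rust
// Bit 0: which of the two signing contexts is active.
fn active_context(lock: usize) -> usize {
    lock & 0b1
}

// Bit 1: a context swap is pending.
fn needs_swap(lock: usize) -> bool {
    lock & 0b10 != 0
}

// Bits 2 and up: the reader count (hence add/subtract 4 per reader).
fn reader_count(lock: usize) -> usize {
    lock >> 2
}
```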

Rerandomization lock

A simple AtomicBool: true is locked, false is unlocked.

Seed

An [AtomicU8; 32] stores the seed for the RNG. Notably, it doesn't matter that a thread may read "inconsistent" content, because it's all random data: if the array is being overwritten while being read, it cannot worsen entropy, and the exact data doesn't matter.

RNG

The RNG is HMAC(seed, counter), where each rerandomization bumps the counter. The counter is behind the rerandomization lock and doesn't need to be atomic.

Signing thread operations

  1. Add 4 with Acquire to the main lock, assigning the previous lock value to last_lock
  2. Panic if last_lock >= usize::MAX / 2 - having that many threads should be impossible, so if this happens it's because of a bug
  3. let ctx = &GLOBAL_SIGN_CONTEXTS[usize::from(last_lock & 1)];
  4. Sign with ctx
  5. CONTEXTS_DIRTY[usize::from(last_lock & 1)].store(true, Relaxed);
  6. Subtract 4 with Relaxed from the main lock, assigning the current lock value to last_lock
  7. if last_lock & !1 == 0b10 { MAIN_LOCK.compare_exchange(last_lock, (last_lock & 1) ^ 1, Relaxed, Relaxed) }
  8. If the CAS in the previous step succeeded, attempt to re-randomize
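One possible reading of these steps in code (taking step 6's "current lock value" to mean the value after the subtraction, and with plain statics standing in for the FFI contexts; the signing and rerandomization calls are elided):

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

// Stand-ins for the real statics; illustrative only.
static MAIN_LOCK: AtomicUsize = AtomicUsize::new(0);
const CLEAN: AtomicBool = AtomicBool::new(false);
static CONTEXTS_DIRTY: [AtomicBool; 2] = [CLEAN; 2];

fn sign_with_global(do_sign: impl FnOnce(usize)) {
    // 1. Register as a reader; the counter lives in bits 2 and up.
    let last_lock = MAIN_LOCK.fetch_add(4, Ordering::Acquire);
    // 2. An absurd reader count can only be a bug.
    assert!(last_lock < usize::MAX / 2, "impossible reader count");
    // 3. The least significant bit selects the active context.
    let idx = last_lock & 1;
    // 4. Sign with GLOBAL_SIGN_CONTEXTS[idx] (elided).
    do_sign(idx);
    // 5. Mark that context as needing rerandomization.
    CONTEXTS_DIRTY[idx].store(true, Ordering::Relaxed);
    // 6. Unregister as a reader, keeping the post-subtraction value.
    let last_lock = MAIN_LOCK.fetch_sub(4, Ordering::Relaxed) - 4;
    // 7. If no readers remain and a swap is pending, flip the active-context
    //    bit and clear the needs-swap flag in a single CAS.
    if last_lock & !1 == 0b10 {
        let swapped = MAIN_LOCK
            .compare_exchange(last_lock, (last_lock & 1) ^ 1, Ordering::Relaxed, Ordering::Relaxed)
            .is_ok();
        // 8. On success this thread would attempt rerandomization (elided).
        let _ = swapped;
    }
}
```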

Reseeding

Public reseeding function takes 64B seed.

  1. RERAND_LOCK.compare_exchange(false, true, Acquire, Relaxed)
  2. If successful rerandomize if needed using first 32B and release the lock with Release
  3. Unconditionally overwrite the global seed with the remaining 32B

Alternatively, we could take 32B, first overwrite the seed, and then call rerandomization. Probably simpler code, but it calls into HMAC even in the hypothetical case when supplying 64B is cheaper.
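The 64B variant above could be sketched as follows (hypothetical names; the rerandomization call itself is elided):

```rust
use std::sync::atomic::{AtomicBool, AtomicU8, Ordering};

// Stand-ins for the rerandomization lock and global seed; illustrative only.
static RERAND_LOCK: AtomicBool = AtomicBool::new(false);
const ZERO: AtomicU8 = AtomicU8::new(0);
static SEED: [AtomicU8; 32] = [ZERO; 32];

pub fn reseed(entropy: &[u8; 64]) {
    // 1. Try to take the rerandomization lock.
    if RERAND_LOCK
        .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
        .is_ok()
    {
        // 2. Rerandomize using entropy[..32] if needed (elided), then release.
        RERAND_LOCK.store(false, Ordering::Release);
    }
    // 3. Unconditionally overwrite the global seed with the remaining 32
    //    bytes. Torn writes are fine: the contents are random either way.
    for (slot, byte) in SEED.iter().zip(&entropy[32..]) {
        slot.store(*byte, Ordering::Relaxed);
    }
}
```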

Rerandomization

  1. (Unless previous operations already locked it) RERAND_LOCK.compare_exchange(false, true, Acquire, Relaxed)
  2. let prev_main = MAIN_LOCK.fetch_and(!0b10, Relaxed) - clear the needs-swap flag to avoid a swap while this thread is mutating
  3. If prev_main & 0b10 == 0b10, the context was already rerandomized and doesn't need to be again. Restore the needs-swap flag first, release RERAND_LOCK, try to swap if prev_main & !0b11 == 0, try rerandomizing again if the swap succeeded, return otherwise.
  4. If !CONTEXTS_DIRTY[usize::from((prev_main & 0b01) ^ 1)].load(Relaxed), release RERAND_LOCK with Relaxed and return
  5. Rerandomize GLOBAL_SIGN_CONTEXTS[usize::from((prev_main & 0b01) ^ 1)]
  6. CONTEXTS_DIRTY[usize::from((prev_main & 0b01) ^ 1)].store(false, Relaxed)
  7. let prev_main = MAIN_LOCK.fetch_or(0b10, Release)
  8. RERAND_LOCK.store(false, Release)
  9. MAIN_LOCK.compare_exchange((prev_main & 1) | 0b10, (prev_main & 1) ^ 1, Relaxed, Relaxed) - we held the lock, so another thread may have failed at rerandomizing
  10. If it succeeded, repeat rerandomization

@apoelstra (Member, Author)

Just letting you know that I haven't forgotten about this, but I've read it twice and need to read it some more times before I can grok everything that's going on.

@Kixunil (Collaborator) commented Nov 29, 2022

If it helps, the main idea is for signing threads to use whichever of the two contexts is current while the other is available for rerandomization. When there's no contention, the contexts are swapped so that the used one can be re-randomized.
