-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Saffron] Add update function #3004
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3004 +/- ##
==========================================
- Coverage 76.80% 76.72% -0.08%
==========================================
Files 261 261
Lines 61882 62079 +197
==========================================
+ Hits 47530 47633 +103
- Misses 14352 14446 +94 ☔ View full report in Codecov by Sentry. |
saffron/e2e-test.sh
Outdated
fi | ||
} | ||
|
||
perturb_bytes() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow, no it wasn't. I have replaced it with a python script instead because perl is like hieroglyphics to me. But actually this makes me realize that the user will should keep all the commitments on their end, not just the folded commitment with powers of alpha
. As of right now, they cannot update them without recomputing which is dumb.
saffron/e2e-test.sh
Outdated
ENCODED_FILE="${INPUT_FILE%.*}.bin" | ||
DECODED_FILE="${INPUT_FILE%.*}-decoded${INPUT_FILE##*.}" | ||
PERTURBED_FILE="${INPUT_FILE%.*}_perturbed${INPUT_FILE##*.}" | ||
ENCODED_DIFF_FILE="${ENCODED_FILE%.*}_diff.bin" | ||
DECODED_PERTURBED_FILE="${PERTURBED_FILE}-decoded${INPUT_FILE##*.}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you want to remove the extension? i.e. ${PERTURBED_FILE%.*}
DECODED_PERTURBED_FILE="${PERTURBED_FILE%.*}-decoded${PERTURBED_FILE##*.}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok all the file names should be fixed now
saffron/e2e-test.sh
Outdated
ENCODED_FILE="${INPUT_FILE%.*}.bin" | ||
DECODED_FILE="${INPUT_FILE%.*}-decoded${INPUT_FILE##*.}" | ||
PERTURBED_FILE="${INPUT_FILE%.*}_perturbed${INPUT_FILE##*.}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: unify the usage of "-" and "_".
saffron/e2e-test.sh
Outdated
perturb_bytes "$INPUT_FILE" "$PERTURBED_FILE" 0.1 | ||
|
||
echo "Calculating diff for upated $INPUT_FILE (stored updated data in $PERTURBED_FILE)" | ||
cargo run --release --bin saffron calculate-diff --old "$INPUT_FILE" --new "$PERTURBED_FILE" -o "$ENCODED_DIFF_FILE" $SRS_ARG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cargo run --release --bin saffron calculate-diff --old "$INPUT_FILE" --new "$PERTURBED_FILE" -o "$ENCODED_DIFF_FILE" $SRS_ARG | |
cargo run --release --bin saffron calculate-diff --old "$INPUT_FILE" --new "$PERTURBED_FILE" -o "$ENCODED_DIFF_FILE" "$SRS_ARG" |
(and the others below, to make shellcheck happy)
saffron/src/blob.rs
Outdated
.map(|i| { | ||
d.get(&i) | ||
.copied() | ||
.unwrap_or(<G as AffineRepr>::ScalarField::zero()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.unwrap_or(<G as AffineRepr>::ScalarField::zero()) | |
.unwrap_or(G::ScalarField::zero()) |
And you can remove 7 | use ark_ec::AffineRepr;
#[serde(bound = "F: CanonicalDeserialize + CanonicalSerialize")] | ||
pub struct Diff<F> { | ||
#[serde_as(as = "Vec<HashMap<_, o1_utils::serialization::SerdeAs>>")] | ||
pub evaluation_diffs: Vec<HashMap<usize, F>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that a vector is more efficient, if you don't need to access by the modified index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming that updates can be sparse enough that using a full vector here would be overkill?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use a vector of type Vec<(idx, F)>, or a linear memory layout where idx and F are inlined (tuples are not, I think).
saffron/src/utils.rs
Outdated
#[serde_as] | ||
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)] | ||
#[serde(bound = "F: CanonicalDeserialize + CanonicalSerialize")] | ||
pub struct Diff<F> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we can restrict it to PrimeField
directly.
saffron/src/utils.rs
Outdated
.par_iter() | ||
.zip(old_elems) | ||
.map(|(n, o)| { | ||
n.iter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to optimize further by keeping the hashmap, you can also parallelize here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a rayon
pro, but I have a bad feeling about using par_iter
with something that can be really long (like thousands of elements) where the work is trivial (in this case subtracting field elems). On top of that, multiple nested par_iter
.
I don't have a problem using par_iter
on the chunks of size domain_size
since even a 500MB
blob is like 260 chunks (i.e. polynomials) and the work per chunk can be substantial.
If you know a reason I shouldn't worry about this, can you post a doc link? otherwise I would test it / time it before agreeing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would simply benchmark. I did this in this branch: 3e583bb.
You can run using: cargo run --bin benchmark_diff --release -p saffron -- 22
for polynomials of size 2^22.
You get the following results:
> cargo run --bin benchmark_diff --release -p saffron -- 20
Time for difference: 16.75714ms
Time for difference in parallel: 5.640703ms
> cargo run --bin benchmark_diff --release -p saffron -- 16
Time for difference: 1.151862ms
Time for difference in parallel: 2.718978ms
> cargo run --bin benchmark_diff --release -p saffron -- 18
Time for difference: 4.807731ms
Time for difference in parallel: 2.612072ms
> cargo run --bin benchmark_diff --release -p saffron -- 17
Time for difference: 2.232708ms
Time for difference in parallel: 2.420316ms
This shows that for a domain size of 16 or 17, it is less efficient.
However, the difference is not negligible from 2^18.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would skip the inner parallelization for now as also, it might depend on how dense the polynomials/states are.
|
||
fn random_perturbation(threshold: f64, data: &[u8]) -> Vec<u8> { | ||
let mut rng = rand::thread_rng(); | ||
data.iter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
par_iter
also here I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See this comment
saffron/e2e-test.sh
Outdated
@@ -11,8 +11,34 @@ SRS_ARG="" | |||
if [ $# -eq 2 ]; then | |||
SRS_ARG="--srs-filepath $2" | |||
fi | |||
|
|||
ENCODED_FILE="${INPUT_FILE%.*}.bin" | |||
DECODED_FILE="${INPUT_FILE%.*}-decoded${INPUT_FILE##*.}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this doesn't handle the missing dot. And what about file without dot?
saffron/src/blob.rs
Outdated
domain: &Radix2EvaluationDomain<G::ScalarField>, | ||
diff: Diff<G::ScalarField>, | ||
) { | ||
let updates: Vec<(usize, PolyComm<G>, DensePolynomial<G::ScalarField>)> = diff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not what we want to do, it is not efficient. You only want to use the Lagrange polynomials.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is what I figured as well, but I saw while working on this that it isn't part of the SRS
(just the commitments). I didn't see another type in the codebase that manages this cache, am i missing something?
If we need to make something, this update needs the lagrange polynomials in the monomial basis since this is what the data is stored in. If it's an expensive one time set up (seems like it to me), can I suggest leaving a TODO and a follow up PR which:
- creates a type that manages the cache of lagrange polynomials for a given domain
- provied (de)serialize instances
- add a cli arg to load the cache from a file, apply it where necessary
- add a script that generates it and serializes it, cache it for CI like we do with srs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to store the polynomials in the monomial basis, this has to be fixed as said in DM. You keep the evaluations. From there, you simply commit using the SRS on the diff.
saffron/src/blob.rs
Outdated
domain: &Radix2EvaluationDomain<G::ScalarField>, | ||
diff: Diff<G::ScalarField>, | ||
) { | ||
let updates: Vec<(usize, PolyComm<G>, DensePolynomial<G::ScalarField>)> = diff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add also some timers, activated while in debug more. We want to know how fast it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comments.
Thanks for the thorough review! Your comments made me realize I need to do a little work RE managing commitments on the user and server side. I would like to put this back in draft for now, and submit that work as a separate PR before continuing here. In the meantime can you address questions about |
…r throw exception
@dannywillems I managed to correct my errors and get this working, am currently breaking it up into smaller pieces. Starting with #3005 and #3006. Going to close this as review is no longer needed. I will cherry-pick the scripts and some of the CLI stuff in more follow ups |
Summary
The user needs to be able to update data with the storage provider. This is done by posting "diff" objects, not by simply posting the raw updated data. This PR implements this feature.
Changes
Diff
type and utils function for computing aDiff
over binary data.FieldBlob
type. Add tests for random updates on random data.calculate-diff
: used by the user to calculate theDiff
object using theold
andnew
files as input 6eeb13eupdate
: Ask the storage provider to update some given data using theDiff
object calculated by(1)
61bb85f