-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prefetch JUMPDESTs through RPC #427
base: develop
Are you sure you want to change the base?
Conversation
7ad89de
to
18d5da4
Compare
6255f00
to
6613802
Compare
bdcf935
to
270256f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Einar! Aside from the inline comments, a miss in this PR is the application of JD pre-fetching for jerigon
as well, not only the native tracer.
I haven't tested it yet as I write this, but I think your current branch would crash on any jerigon payload (from the ZeroTracer) because the RPC will provide a Vec<ZeroTxResult>
which contains TxnInfo
, i.e. with the jumpdest table field already processed, as opposed to the TxnMeta
from which you're starting the processing.
I think we're also missing fallback mechanisms, wherever the procedure fails for some reason, then we should default to the current behavior, i.e. no provided JDT and let the prover do the work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also need to address the format change on all hardcoded witnesses, either by re-generating them, or adding default empty JDTs to each of them, and leaving the prover handle them internally as before.
Indeed. Which solution do you prefer and who could I talk to about this CI setup? |
I'm not sure I understand the relation between the broken tests and the CI. We don't have to update them all (especially because some dev blocks are from an outdated devchain IIRC), but at least have them have proper format so that they can run correctly. If your question was how to test this logic of JDT pre-fetching in the CI, we could do this with the |
@einar-polygon Has it been tested against some endpoint? Should I run it against a few dozen blocks to see how it behaves? (ideally after conflicts are addressed) |
I checked that the JUMPDEST_tables were identical for the small Calcun block and a few others against the paid Quicknode endpoint. But it would be good to have a set of interesting blocks with exceptions, reverts and what not. Here is a start: I will add a bash script to iterate over them while we test. |
I'll launch a run against the devchain once it's rebased (it has a conflicting Erigon version that requires some latest PRs here) |
a0f0438
to
05e065a
Compare
I got a panic with your endpoint on block 0x13adb68
|
So some more data with your endpoint. I tried 140 consecutive blocks independently.
|
a9b13a3
to
e351ff9
Compare
zero/src/rpc/jerigon.rs
Outdated
code_db: Default::default(), | ||
}, | ||
other_data, | ||
}) | ||
} | ||
|
||
/// Processes the transactions in the given block and updates the code db. | ||
pub async fn process_transactions<ProviderT, TransportT>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think having two very similar public process_transactions
(one in native one in jerigon) doing same work and returning different result is a good idea. Could we unify the logic for both ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we end up moving the fetching logic to Jerigon, this would become moot.
The idea was to reuse the same overall structure from Native but fetch exactly the data needed for Jerigon instead. The alternative, if I understand you correctly, would be that each function includes a dispatching mechanism based on a tag and then return a tagged value. I don't have a preference to either, but I would like some more inputs on this before rewriting it, @Nashtare @muursh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking if we could refactor and unify the logic somehow in the common module both jerigon and native would use to retrieve the transaction data. At least, you should rename this function not to introduce confusion with the native tracer process_transactions
.
|
||
/// Processes the transaction with the given transaction hash and updates the | ||
/// accounts state. | ||
pub async fn process_transaction<ProviderT, TransportT>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question about duplicated code. Could we unify the logic (e.g. if tx_trace
is passed use if, if not go fetch it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid point, let's keep the discussion in comment above.
.gitignore
Outdated
@@ -23,3 +23,4 @@ | |||
################# | |||
/proofs | |||
/**/*.zkproof | |||
/witnesses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed.
trace!( | ||
"code: {:?}, code_addr: {:?} <============", | ||
&code, | ||
&code_addr | ||
); | ||
trace!("code_map: {:?}", &code_map); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to keep this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to keep all tracing until we are happy with the computed jumpdest tables. It doesn't takes longer to re-add it than to remove it.
#[derive(PartialEq, Eq, Debug, Clone, Serialize, Deserialize, Default)] | ||
pub struct ContextJumpDests(pub HashMap<usize, BTreeSet<usize>>); | ||
|
||
/// The result after proving a `JumpDestTableWitness`. | ||
#[derive(PartialEq, Eq, Debug, Clone, Serialize, Deserialize, Default)] | ||
pub(crate) struct JumpDestTableProcessed(HashMap<usize, Vec<usize>>); | ||
|
||
/// Map `CodeAddress -> (Context -> [JumpDests])` | ||
#[derive(PartialEq, Eq, Debug, Clone, Serialize, Deserialize, Default)] | ||
pub struct JumpDestTableWitness(HashMap<H256, ContextJumpDests>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had already asked but I think my comment got resolved, why do we need actual wrappers here instead of type aliases? Most of the implementation below would come for free instead of having to reimplement everything. Or if it's more convenient for merging tables we could just expose the inner map like what ContextJumpDests
is doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here was my reponse:
When I was diving into this, it made it more clear what the purpose of both types was. But otherwise it provides a bit of encapsulation and type safety.
I have a mild preference for this New Type pattern over type aliases, but it is not the hill I am going to die on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's keep it.
On another note, why do we need to expose the inner map in ContextJumpDests
? Also it is public but only used in this crate I think?
info!("Generating JUMPDEST tables"); | ||
// dbg!(&self.inputs.jumpdest_table); | ||
// w for witness | ||
// let txn_idx = self.next_txn_index - 1; | ||
// let rpcw = self.inputs.jumpdest_tables[txn_idx].as_ref();contract_code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: to remove
let rpc: Option<JumpDestTableProcessed> = rpcw | ||
.as_ref() | ||
.map(|jdt| set_jumpdest_analysis_inputs_rpc(jdt, &self.inputs.contract_code)); | ||
|
||
let sims = simulate_cpu_and_get_user_jumps("terminate_common", self); | ||
|
||
let (sim, simw): (Option<JumpDestTableProcessed>, Option<JumpDestTableWitness>) = | ||
sims.map_or_else(|| (None, None), |(sim, simw)| (Some(sim), Some(simw))); | ||
|
||
if let (Some(rw), Some(sw)) = (rpcw, simw) | ||
&& rw != &sw | ||
{ | ||
info!("SIMW {}", sw); | ||
info!("RPCW {}", rw); | ||
assert_eq!(rw, &sw); | ||
} | ||
|
||
info!("SIM {:#?}", sim); | ||
info!("RPC {:#?}", rpc); | ||
|
||
self.jumpdest_table = if rpc.is_some() { rpc } else { sim }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would need to change, as there is no simulation if the table exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will indeed be removed. I just need to keep it around until we are happy with the results, so I can use the simulation output as ground truth.
zero/src/rpc/jumpdest.rs
Outdated
ensure!(entry.stack.as_ref().is_some(), "No evm stack found."); | ||
// We reverse the stack, so the order matches our assembly code. | ||
let evm_stack: Vec<_> = entry.stack.as_ref().unwrap().iter().rev().collect(); | ||
let operands_used = 2; // actually 6 or 7. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elaborated.
zero/src/rpc/jumpdest.rs
Outdated
|
||
if evm_stack.len() < operands_used { | ||
trace!( "Opcode {op} expected {operands_used} operands at the EVM stack, but only {} were found.", evm_stack.len()); | ||
// maybe increment ctx here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it an outdated comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asked on slack.
// We reverse the stack, so the order matches our assembly code. | ||
let evm_stack: Vec<_> = entry.stack.as_ref().unwrap().iter().rev().collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is called 4 times, I think we could do it just once ahead for clarity before the match statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is only called once per StructLog
entry. The reason I don't just do it once, and for all op
s, is that for most op
s, we do not need to do anything, so reversing the stack would be wasteful.
zero/src/rpc/jumpdest.rs
Outdated
/// Provides a way to check in constant time if an address points to a | ||
/// precompile. | ||
fn precompiles() -> &'static HashSet<Address> { | ||
static PRECOMPILES: OnceLock<HashSet<Address>> = OnceLock::new(); | ||
PRECOMPILES.get_or_init(|| { | ||
HashSet::<Address>::from_iter((1..=0xa).map(|x| Address::from(U160::from(x)))) | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is network specific, i.e. see trace_decoder/src/core.rs
related code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rg "precompile" -i trace_decoder
trace_decoder/src/core.rs
499: const PRECOMPILE_ADDRESSES: Range<alloy::primitives::Address> =
503: if receipt.status || !PRECOMPILE_ADDRESSES.contains(&addr.compat()) {
I don't see anything network specific here, but we could at least share the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done sharing.
@Nashtare @einar-polygon Just to summarize our previous discussion (#427 (comment)), we will remove jerigon changes and keep JUMPDEST generation only with native tracer? |
@einar-polygon I briefly looked at some Eth mainnet blocks, it seems there are a few contract deployments that return a valid, but empty JDT, which would then cause the prover to fail as missing proofs. |
Moving this to slack |
Please share block numbers. |
Tried from 20727640 to 20727650. It's visible with block
|
From the discussion on Slack: This PR should add a CLI parameter allowing switching between RPC and Simulation (i.e. setting
|
zero/src/rpc/mod.rs
Outdated
pub enum JumpdestSrc { | ||
Simulation, | ||
Zero, | ||
Jerigon, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of confusing. Zero and Jerigon sound like they are the same source (or what else do you mean by Zero?) also we default jerigon nodes to Simulation so why do we have a Jerigon option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. How about:
pub enum JumpdestSrc {
ProverSimulation,
ClientFetchedStructlogs,
ServerFetchedStructlogs, // later
Serverside, // later
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atanmarko What do you think about these names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if they are convenient to type into cli, but OK
{ | ||
let tx_traces = tx_trace | ||
.iter() | ||
.map(|(h, t)| (Address::from(h.to_fixed_bytes()), t.clone())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cloning TxnTrace
is not a good idea. Lot of overhead.
There could be 2 solutions:
- Define
generate_jumpdest_table
with:
tx_traces: impl Iterator<Item = (&'a Address, &'a TxnTrace)>,
- Convert
tx_trace
H160 to Address when you initially create it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confusing here is that we use both type Address = H160
and alloy_primitives::Address
. But anyway, conversion should be done so that there is no unnecessary cloning of data.
resolves #290