Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interpreter for witgen effects #2301

Merged
merged 27 commits into from
Jan 13, 2025
Merged

Interpreter for witgen effects #2301

merged 27 commits into from
Jan 13, 2025

Conversation

pacheco
Copy link
Collaborator

@pacheco pacheco commented Jan 2, 2025

Implementation of an interpreter for jit witgen effects.
Performance seems to be around 0.20x of the compiled jit code (poseidon benchmark).

  • Disabled by default, can be enabled setting POWDR_JIT_INTERPRETER=1.
  • Also introduces a way to disable JIT in general, with POWDR_JIT_DISABLE=1.

}

// Execute the machine effects for the given the parameters
pub fn call<Q: QueryCallback<T>>(&self, params: WitgenFunctionParams<'_, T>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use a higher-level interface than WitgenFunctionParams? They require unsafe code to be used and are tailored towards an FFI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think CompactDataRef<'_, T>, would be the right thing to pass

idx
};

// load known inputs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like this code would be perfect inside a function called load_known_inputs ;)

RPNExpression { elems }
}

fn to_rpn_inner(&self, elems: &mut Vec<RPNExpressionElem<T, V>>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you worried about machine stack size or is the RPN transformation a performance optimization?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from some profiling, the "evaluate expression" code was a hot spot... RPN was quite a bit faster.
I'm not sure if its just due to avoiding recursion or because the whole expression is kept in a vec (avoiding pointers).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thought, about the same, locality might be a thing.

Could you move this function to the interpreter, though? I think SymbolicExpression should not care that there is a thing called RPNExpressionElem.

}
}

// the following functions come from the interface.rs file also included in the compiled jit code
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean it's duplicated? Any way it could be reused?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, I was lazy because the include was not part of the module tree, let me try and improve this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just use CompactDataRef

Copy link
Collaborator

@Schaeff Schaeff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some mostly non-blocking comments

)
.unwrap()
let compiled_jit =
!matches!(std::env::var("POWDR_JIT_INTERPRETER"), Ok(val) if val == "1");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't particularly like the use of environment variables. Could we move them as far up as possible? In my opinion, it should just be a configuration setting in the constructor of FuncitonCache.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i removed the other env var (for fully disabling jit), can we just keep this one as is for the moment (while interpreter is fully integrated) for easy testing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I would prefer to remove the interpreter from function_cache for the time being...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, removed it completely

@chriseth
Copy link
Member

chriseth commented Jan 3, 2025

While this is a great piece of software, I think we should actually just not at all integrate it for now. The FunctionCache is still specialized for the BlockMachineProcessor (i.e. it does not support dynamic machines), and it will be even harder to disentangle if we also have to deal with the interpreter as well.

) -> Self {
let mut actions = vec![];
let mut var_idx = HashMap::new();
let mut vars = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var_counter?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic moved to struct

let idx = map_var_idx(var);
actions.push(InterpreterAction::AssignExpression(
idx,
e.map_variables(&mut map_var_idx).to_rpn(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to perform variable mapping and rpn conversion in a single struct function where the variable mapping is internal state of the struct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

match action {
InterpreterAction::AssignExpression(idx, e) => {
let val = self.evaluate_expression(&mut eval_stack, &vars, e);
assert!(vars[*idx].replace(val).is_none());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this runtime assertion, we can actually check in the conversion inside the new function if each variable is written to exactly once (and only read after it has been written).

let mut arg_values: Vec<_> = arguments
.iter()
.map(|a| match a {
MachineCallArgument::Unknown(v) => (Some(v), Default::default()),
Copy link
Member

@chriseth chriseth Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this Some(v) used anywhere?

Do you think there is a way to directly store a &mut of the variable inside args instead of having to construct this second arg_values vector?

Then we also don't need the second "write output variables" step. Of course you need to first store a Some(0) - but maybe we could also just initialize all the variables with 0 once we checked consistency (i.e. all variables written exactly once and only read after written)?

Copy link
Collaborator Author

@pacheco pacheco Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed things around a bit here: the knowns/unknowns are transformed into variable indexes (known arguments generate a previous assignment action)

for the second part your comment, i managed to do it, but the compiler didn't like it (needed a little unsafe)

}
}

// evaluates an expression using the provided variable values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// evaluates an expression using the provided variable values
/// Evaluates an expression using the provided variable values


// evaluates an expression using the provided variable values
fn evaluate_expression(
&self,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self is unused as far as I can see. can this be a function on RPNExpresssion?

BinaryOperator::Mul => left * right,
BinaryOperator::Div => left / right,
BinaryOperator::IntegerDiv => T::from(
left.to_integer().try_into_u64().unwrap()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have to turn this into arbitrary integer. Or (maybe better) we require division on FieldElement::Integer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where's the right place to put the Div bound on FieldElement::Integer, if that's what you mean

};

use num_traits::Zero;
use powdr_number::FieldElement;

use crate::witgen::range_constraints::RangeConstraint;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RPNExpressionElem<T: FieldElement, S> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this to the interpreter?

/// A value that is known at run-time, defined through a complex expression
/// involving known cells or variables and compile-time constants.
/// Each of the sub-expressions can have its own range constraint.
#[derive(Debug, Clone)]
#[derive(Debug, Clone, PartialEq, Eq)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PartialEq is a recursive operation - where do you need it? Maybe it can be done based on pointers?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although maybe the PartialEq operation of Arc shortcuts if the pointers are the same...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mm I don't recall why I added it, ill see if its needed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Comment on lines 274 to 282
InterpreterAction::ReadCell(idx, _) => {
set.insert(*idx);
}
InterpreterAction::ReadParam(idx, _) => {
set.insert(*idx);
}
InterpreterAction::AssignExpression(idx, _) => {
set.insert(*idx);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
InterpreterAction::ReadCell(idx, _) => {
set.insert(*idx);
}
InterpreterAction::ReadParam(idx, _) => {
set.insert(*idx);
}
InterpreterAction::AssignExpression(idx, _) => {
set.insert(*idx);
}
InterpreterAction::ReadCell(idx, _) | InterpreterAction::ReadParam(idx, _) | InterpreterAction::AssignExpression(idx, _) => {
set.insert(*idx);
}

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.

Benchmark suite Current: 6ff230c Previous: 0b25a80 Ratio
jit-benchmark/sqrt_879882356 18085 ns/iter (± 39) 2603 ns/iter (± 1) 6.95
jit-benchmark/sqrt_1882356 13943 ns/iter (± 28) 2086 ns/iter (± 1) 6.68
jit-benchmark/sqrt_1187956 13613 ns/iter (± 17) 2063 ns/iter (± 1) 6.60
jit-benchmark/sqrt_56 7049 ns/iter (± 6) 1229 ns/iter (± 2) 5.74
jit-benchmark/sort_33 386485 ns/iter (± 466) 71906 ns/iter (± 99) 5.37
jit-benchmark/sort_100 1507571 ns/iter (± 1204) 269179 ns/iter (± 387) 5.60
jit-benchmark/sort_300 6501239 ns/iter (± 4108) 1035405 ns/iter (± 1164) 6.28
jit-benchmark/sort_900 34087991 ns/iter (± 26144) 4374283 ns/iter (± 7843) 7.79
jit-benchmark/sort_2700 222377682 ns/iter (± 104891) 20974189 ns/iter (± 70438) 10.60

This comment was automatically generated by workflow using github-action-benchmark.

log::trace!("Calling cargo...");
let r = powdr_jit_compiler::call_cargo(&code);
log::trace!("Done compiling, took {:.2}s", start.elapsed().as_secs_f32());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, must've been an oversight

@@ -227,7 +225,7 @@ extern "C" fn witgen(
/// Returns an iterator over all variables written to in the effect.
/// The flag indicates if the variable is the return value of a machine call and thus needs
/// to be declared mutable.
fn written_vars_in_effect<T: FieldElement>(
pub fn written_vars_in_effect<T: FieldElement>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is going to be pub, then maybe turn it into a function of Effect?

use std::collections::{BTreeSet, HashMap};

/// Witgen effects compiled into interpreter instructions.
pub struct EffectsInterpreter<T: FieldElement> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InterpreterEffects? InterpreterCode?

I am actually thinking about splitting up Effects into Effects (used at inference time) and Code (used for the final code), since they do not really overlap that much any more. Just as a side note...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sorry, this is the actual interpreter. all good :)

params: &mut [LookupCell<T>],
data: CompactDataRef<'_, T>,
) {
let mut vars = vec![T::zero(); self.var_count];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could actually keep that one in the state - but can be done in another PR. There is also no need to reset it to zero between calls, since we checked that nothing is read before it is written.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, i've done exactly this in the other branch I have

vars[*idx] = val;
}
InterpreterAction::ReadCell(idx, c) => {
let cell_offset: usize = c.row_offset.try_into().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to convert data.row_offset to i32 instead, do the addition in i32 and then convert back to usize.

vars[*idx] = get_param(params, *i);
}
InterpreterAction::WriteCell(idx, c) => {
let cell_offset: usize = c.row_offset.try_into().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@chriseth
Copy link
Member

chriseth commented Jan 9, 2025

Only some cosmetic comments left. It looks great!
Could you add some tests to the interpreter? Something like running the block machine processor on "../test_data/pil/binary.pil" and executing one block?

@chriseth
Copy link
Member

chriseth commented Jan 9, 2025

Or poseidon for that matter.

@chriseth
Copy link
Member

chriseth commented Jan 9, 2025

although it would be nice to have a submachine call in it.

@@ -60,6 +60,27 @@ pub struct Assertion<T: FieldElement, V> {
pub expected_equal: bool,
}

impl<T: FieldElement> Effect<T, Variable> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way it is between struct Assertion and its impl. Can you move it up to effect?

data.append_new_rows(31);
let data_ref = CompactDataRef::new(&mut data, 0);
interpreter.call(&mutable_state, &mut param_lookups, data_ref);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an assertion about the outputs?

@chriseth chriseth added this pull request to the merge queue Jan 13, 2025
Merged via the queue into main with commit 3eba5c4 Jan 13, 2025
16 checks passed
@chriseth chriseth deleted the jit-interpreter branch January 13, 2025 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants