Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracing API #32

Merged
merged 9 commits into from
Jul 25, 2018
Merged

Tracing API #32

merged 9 commits into from
Jul 25, 2018

Conversation

chfast
Copy link
Member

@chfast chfast commented May 11, 2018

Fixes #22.

Copy link
Member

@axic axic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has a serious lack of comments :)

size_t code_offset,
enum evmc_status_code status_code,
int64_t gas_left,
const struct evmc_uint256be* top_stack_item,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this have a stack count? Does depth above mean stack height or call depth?


typedef void (*evmc_trace_callback)(struct evmc_tracer_context* context,
int depth,
int step,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be called pc?

typedef void (*evmc_trace_callback)(struct evmc_tracer_context* context,
int depth,
int step,
size_t code_offset,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this pc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with "pc" name I have is not everyone knows exactly that it is exactly. I think code_offest === pc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what is code_offset and what is step ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The step is the number of executed instructions so far (only increases).
The code_offset is the position in the code where we are currently. It's different than the step because of jump and push instructions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

execution_step perhaps? Not sure, step, without any documentation gave me the wrong indication about what it is.

Why don't you want to have a documentation above it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also remove this because this can be implemented easily on the client side.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add documentation soon. I just need to finish other PRs first.

@chfast
Copy link
Member Author

chfast commented May 15, 2018

Has a serious lack of comments :)

Intentionally. I expect this to change a lot in the first phase. And I want this to be clear enough without comments.

size_t changed_memory_size,
const uint8_t* changed_memory);

typedef void (*evmc_set_tracer)(struct evmc_instance* instance,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this return a pointer to previously set callback?
Why do you need to declare a pointer to function (evmc_set_tracer) if it's just to declare one function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it can return previous callback, but it would be hard to also return the context, so you will not be able to restore previous state anyway.

(If I understand the question) The evmc_instance is a set of function pointers. It is easier to read the spec when function pointer types are declared separately. I forgot the _fn suffix in this type name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I didn't notice that it's a part of evmc_instance, sorry

@chfast chfast force-pushed the tracing branch 3 times, most recently from ce65d2d to 99f8c03 Compare June 18, 2018 11:50
@chfast
Copy link
Member Author

chfast commented Jun 18, 2018

  • Documentation added.
  • Depth param removed. This can be taken easily from the message.
  • @cdetrio, @ehildenb can you check English in docs?

@chfast
Copy link
Member Author

chfast commented Jun 18, 2018

The PoC is here: ethereum/aleth#5065

I believe the API is good enough to be merged as is. That would allow me to focus on geth support for EVMC.

@chfast chfast requested review from axic and gumb0 June 18, 2018 15:05
* @param context The pointer to the Client-side tracing context. This allows to
* implement the tracer in OOP manner.
* @param step The instruction counter for the current message.
* @param code_offset The instruction possition in the code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: position

*
* @param context The pointer to the Client-side tracing context. This allows to
* implement the tracer in OOP manner.
* @param step The instruction counter for the current message.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe to clarify it instead of "current message" call it " The instruction counter after the instruction execution."
And also below add "after the instruction execution" for code_offset, stack_num_items, memory_size

@chfast
Copy link
Member Author

chfast commented Jun 19, 2018

I changed the documentation.

* The callback to trace instructions execution in an EVM.
*
* This function informs the Client what instruction has been executed in the EVM implementation
* and what are the results of the execution this particular instruction.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

execution of this or of executing this

* @param status_code The status code of the instruction execution.
* @param gas_left The amount of the gas left after the instruction execution.
* @param stack_num_items The current EVM stack height after the instruction execution.
* @param pushed_stack_item The top EVM stack item pushed as the result of the instruction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So based on these two the client needs to apply the stack changes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any comments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the tracer want to know what's on stack it has to track the changes. This is even more complicated when you consider SWAP and DUP instructions - the tracer would have to reapply SWAP and DUP semantics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure there aren't ever going to be instructions pushing more than one item?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory we have them now in form of DUP and SWAP.
In practice, not likely needed. We can also change the API in future. I will check WASM instructions.

* the instruction execution.
* @param changed_memory_size The size of the memory area modified as the result of
* the instruction execution.
* @param changed_memory The pointer to the memory area modified as the result of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed if one can calculate that via changed_memory_offset?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any comments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot calculate it via changed_memory_offset without having the base memory pointer. We can return the base memory pointer instead of the pointer to the changed area. However that would put higher requirements on EVM memory layout.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably best (esp for other implementations) to directly pass back the changed memory. Not all language give you access to memory via pointers, so having an explicit argument present here will make it easier to use as a guide for implementing traces in other languages.

My question is how will you handle it if there are multiple disjoint changed regions of memory? What if they are very spaced out (as is often the case when using the "hash of variable name" scheme for deciding where to place things)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

directly pass back the changed memory

I'm not sure what you exactly mean by this.

My question is how will you handle it if there are multiple disjoint changed regions of memory?

You don't. There should be a way to report without information about memory. Something like changed_memory_size == -1. I'm leaving it unanswered because I don't have good solution at the moment. I prefer do defer the decision until we have the use case for it.

/**
* Sets the EVM instruction tracer.
*
* When the tracer is set in the EVM instance, the EVM SHOULD callback the tracer with information
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

callback -> call or call back

*
* If the EVM does not support this feature the pointer can be NULL.
*/
evmc_set_tracer_fn set_tracer;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be unset after its being set?

I don't fully remember (and seemingly our documentation isn't clear enough): can an evmc_instance reused for multiple executions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Answered in the latest commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That commit looks good.

@chfast
Copy link
Member Author

chfast commented Jun 20, 2018

I pushed additional commit with improved documentation. I will squeeze them later (easier to review changes now).

* This counter starts from 0 for every message passed to
* ::evmc_execute_fn.
* @param code_offset The current instruction position in the code.
* @param status_code The status code of the instruction execution.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is useful (99% of the time the value is 0). And the Client can get it from evm_result in the end.

* implement the tracer in OOP manner.
* @param step The instruction counter: number of instructions executed.
* This counter starts from 0 for every message passed to
* ::evmc_execute_fn.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why call this step instead of the more conventional programCounter? Is this to facilitate ewasm?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the program counter. Program counter points the instruction being executed, so it's code_offset in this case. The step is the trace entry index.

I'm ok with renaming any of those if you have better ideas for names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about tracing_step or trace_index ? Why is it needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Precisely I wanted the "instruction counter" per current message (i.e. EVM call). This is not not needed (because the Host can count tracing callback calls on its own) UNLESS the VM will not report every instruction.
I propose to rename it to instruction_counter or remove it for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's why trace_counter/trace_step/trace_index is the appropriate name.

I'm not sure this is a useful field though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean this case where VM only traces storage instructions:

PUSH 0
SLOAD
PUSH 9
SSTORE

When counting the number of traces you should get:

(0, SLOAD, ...)
(1, SSTORE, ...)

When counting number of instructions you should get:

(1, SLOAD, ...)
(3, SSTORE, ...)

In the second case the Host can figure out that some executed instructions are missing in the trace. In the first case the counter is redundant as the Host can count it itself.

I'm going to remove the param.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean this case where VM only traces storage instructions:

You mean a specific VM won't support tracing every instruction?

I have no problem keeping it, but step seems to be rather confusing. Consider some of the suggestions above.

@chfast
Copy link
Member Author

chfast commented Jun 25, 2018

@axic @ehildenb Whould you like to re-review this again (including review comments)?

@axic
Copy link
Member

axic commented Jul 13, 2018

Rebased.

@@ -9,6 +9,8 @@ struct examplevm
{
struct evmc_instance instance;
int verbose;
evmc_trace_callback trace_callback;
Copy link

@jwasinger jwasinger Jul 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be useful to have another callback to allow the EVM to log text back to the client.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of a separate callback, actually it could just be added as an additional parameter to evmc_trace_callback.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, EVM can just do this via stderr.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C++ std::cerr works. We could add a logger method to capture VM logs nicely on the Host side. But that is different feature and should be done independently.

@chfast chfast force-pushed the tracing branch 2 times, most recently from 95d1f23 to 925b714 Compare July 18, 2018 11:30
* The pointer MUST NOT be stored by the Client.
* The Client MUST NOT assume that
* `changed_memory - changed_memory_offset` is a valid base pointer
* of the VM memory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad we discussed these on Gitter, much more cleaner now.

@axic axic merged commit 332d035 into master Jul 25, 2018
@axic axic deleted the tracing branch July 25, 2018 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants