Node Unlinkable after WASM error #2242

Open
eosusa opened this issue Feb 19, 2024 · 5 comments
Labels
more-info waiting for submitter to reply with more information

Comments

@eosusa

eosusa commented Feb 19, 2024

Both v4.0.4 and v5.0.0 nodes run perfectly fine, then out of nowhere hit a WASM processing error like the one below and go into a permanent unlinkable-blocks state. Simply restarting the node lets it immediately detect the proper fork, resync to the correct chain, and resume normal operations. This has occurred on public and internal API nodes as well as private internal block producers.

error 2024-02-19T05:28:49.918 nodeos    controller.cpp:2126           apply_block          ] *trace: {"id":"97dcf4e62dfcdd73518b66ff3f31567d4abd5996672eca1c99a428c4564a4c99","block_num":293696060,"block_time":"2024-02-19T05:28:50.000","producer_block_id":"1181723c55186bf4c41aea954f655b8d38ccd532a740dafe9328a67bf52c009f","elapsed":2915,"net_usage":116,"scheduled":false,"action_traces":[{"action_ordinal":1,"creator_action_ordinal":0,"closest_unnotified_ancestor_action_ordinal":0,"receiver":"play.mars","act":{"account":"play.mars","name":"run","authorization":[{"actor":"zhangzeren11","permission":"active"},{"actor":"fkkv42gatzdt","permission":"active"}],"data":"90d3cf8609b2215c"},"context_free":false,"elapsed":2871,"console":"","trx_id":"97dcf4e62dfcdd73518b66ff3f31567d4abd5996672eca1c99a428c4564a4c99","block_num":293696060,"block_time":"2024-02-19T05:28:50.000","producer_block_id":"1181723c55186bf4c41aea954f655b8d38ccd532a740dafe9328a67bf52c009f","account_ram_deltas":[],"except":{"code":3070002,"name":"wasm_execution_error","message":"Runtime Error Processing WASM","stack":[{"context":{"level":"error","file":"eos-vm.cpp","line":153,"method":"apply","hostname":"","thread_name":"nodeos","timestamp":"2024-02-19T05:28:49.918"},"format":"access violation","data":{}},{"context":{"level":"warn","file":"apply_context.cpp","line":124,"method":"exec_one","hostname":"","thread_name":"nodeos","timestamp":"2024-02-19T05:28:49.918"},"format":"pending console output: ${console}","data":{"console":""}}]},"error_code":"10000000000000000000","return_value":""}],"failed_dtrx_trace":null,"except":{"code":3070002,"name":"wasm_execution_error","message":"Runtime Error Processing WASM","stack":[{"context":{"level":"error","file":"eos-vm.cpp","line":153,"method":"apply","hostname":"","thread_name":"nodeos","timestamp":"2024-02-19T05:28:49.918"},"format":"access violation","data":{}},{"context":{"level":"warn","file":"apply_context.cpp","line":124,"method":"exec_one","hostname":"","thread_name":"nodeos","timestamp":"2024-02-19T05:28:49.918"},"format":"pending console output: ${console}","data":{"console":""}}]},"error_code":"10000000000000000000"}
error 2024-02-19T05:28:49.918 nodeos    controller.cpp:2171           apply_block          ] e.to_detail_string(): 3070002 wasm_execution_error: Runtime Error Processing WASM
warn  2024-02-19T05:28:49.918 nodeos    controller.cpp:2273           push_block           ] 3070002 wasm_execution_error: Runtime Error Processing WASM
error 2024-02-19T05:28:49.918 nodeos    producer_plugin.cpp:608       operator()           ] Exception on block 293696060: 3070002 wasm_execution_error: Runtime Error Processing WASM
error 2024-02-19T05:28:49.918 nodeos    net_plugin.cpp:3336           process_signed_block ] bad block exception connection 2: #293696060 55186bf4c41aea95...: Runtime Error Processing WASM (3070002)

@heifner
Member

heifner commented Feb 19, 2024

Starting in 4.0, we broadcast blocks after validating only the block header. When we broadcast a block, we add it to the net_plugin list of accepted blocks. However, if that block then fails when it is applied, it is never removed from the net_plugin accepted block list, so the net_plugin will never attempt to apply the block again. We need to remove blocks from the accepted block list when they are rejected.
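
To illustrate the proposal above, here is a minimal sketch of an accepted-block list that forgets a block once it is rejected. It is not the net_plugin code: `accepted_block_list`, `add`, `reject`, and `should_process` are hypothetical names chosen for illustration, and block ids are modeled as plain strings.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the net_plugin's list of accepted blocks.
class accepted_block_list {
public:
   // Record a block once its header validates and it is broadcast.
   // Returns false if the id was already in the list.
   bool add(const std::string& block_id) {
      assert(!block_id.empty()); // an empty id is a caller bug in this sketch
      return ids_.insert(block_id).second;
   }

   // Forget a block whose application failed, so a later copy of the
   // same block is not skipped as "already accepted".
   void reject(const std::string& block_id) { ids_.erase(block_id); }

   bool contains(const std::string& block_id) const {
      return ids_.count(block_id) != 0;
   }

private:
   std::unordered_set<std::string> ids_;
};

// A block is only worth processing if it is not already in the list.
inline bool should_process(const accepted_block_list& list,
                           const std::string& block_id) {
   return !list.contains(block_id);
}
```

Without the `reject` call, `should_process` keeps returning false for the failed block, which matches the stuck state described in this issue. Whether a rejected block should be retried at all is a separate question.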

@spoonincode
Member

However, if that block then fails when it is applied, it is never removed from the net_plugin accepted block list, so the net_plugin will never attempt to apply the block again. We need to remove blocks from the accepted block list when they are rejected.

If a block fails consensus rules it's not clear why it should ever be tried again.

@eosusa
Author

eosusa commented Feb 24, 2024

I get the general idea, but when the node is simply stopped and restarted, it knows to ask for that previous block, and then all is well. So it's in a state that's recoverable, and at least on startup the net plugin knows to ask for that block again instead of the next block, which would again be unlinkable.

@bhazzard added the more-info label (waiting for submitter to reply with more information) and removed the triage label on Feb 27, 2024
@bhazzard

Waiting for more reports to decide how to proceed. Some ideas discussed:

  • Automatically shut down in this scenario (but this masks an underlying issue)
  • Provide more descriptive errors to disambiguate the different causes of this issue.

@spoonincode
Member

For anyone experiencing seemingly spurious access violation errors, I'd be curious to hear how running with this change (which will be in 6.0) changes behavior:

diff --git a/libraries/eos-vm/include/eosio/vm/allocator.hpp b/libraries/eos-vm/include/eosio/vm/allocator.hpp
index 6ed2669..1943990 100644
--- a/libraries/eos-vm/include/eosio/vm/allocator.hpp
+++ b/libraries/eos-vm/include/eosio/vm/allocator.hpp
@@ -9,6 +9,7 @@
 #include <cstring>
 #include <map>
 #include <set>
+#include <span>
 #include <memory>
 #include <mutex>
 #include <utility>
@@ -372,6 +373,8 @@ namespace eosio { namespace vm {
 
       const void* get_code_start() const { return _code_base; }
 
+      std::span<std::byte> get_code_span() const {return {(std::byte*)_code_base, _code_size};}
+
       /* different semantics than free,
        * the memory must be at the end of the most recently allocated block.
        */
@@ -524,5 +527,7 @@ namespace eosio { namespace vm {
       inline T* create_pointer(uint32_t offset) { return reinterpret_cast<T*>(raw + offset); }
       inline int32_t get_current_page() const { return page; }
       bool is_in_region(char* p) { return p >= raw && p < raw + max_memory; }
+
+      std::span<std::byte> get_span() const {return {(std::byte*)raw, max_memory};}
    };
 }} // namespace eosio::vm
diff --git a/libraries/eos-vm/include/eosio/vm/execution_context.hpp b/libraries/eos-vm/include/eosio/vm/execution_context.hpp
index 7bdf8d1..6003ea2 100644
--- a/libraries/eos-vm/include/eosio/vm/execution_context.hpp
+++ b/libraries/eos-vm/include/eosio/vm/execution_context.hpp
@@ -357,11 +357,11 @@ namespace eosio { namespace vm {
 
                   vm::invoke_with_signal_handler([&]() {
                      result = execute<sizeof...(Args)>(args_raw, fn, this, base_type::linear_memory(), stack);
-                  }, &handle_signal);
+                  }, &handle_signal, {_mod->allocator.get_code_span(),  base_type::get_wasm_allocator()->get_span()});
                } else {
                   vm::invoke_with_signal_handler([&]() {
                      result = execute<sizeof...(Args)>(args_raw, fn, this, base_type::linear_memory(), stack);
-                  }, &handle_signal);
+                  }, &handle_signal, {_mod->allocator.get_code_span(),  base_type::get_wasm_allocator()->get_span()});
                }
             }
          } catch(wasm_exit_exception&) {
@@ -799,7 +799,7 @@ namespace eosio { namespace vm {
             setup_locals(func_index);
             vm::invoke_with_signal_handler([&]() {
                execute(visitor);
-            }, &handle_signal);
+            }, &handle_signal, {_mod->allocator.get_code_span(),  base_type::get_wasm_allocator()->get_span()});
          }
 
          if (_mod->get_function_type(func_index).return_count && !_state.exiting) {
diff --git a/libraries/eos-vm/include/eosio/vm/signals.hpp b/libraries/eos-vm/include/eosio/vm/signals.hpp
index 05bbdab..7387dea 100644
--- a/libraries/eos-vm/include/eosio/vm/signals.hpp
+++ b/libraries/eos-vm/include/eosio/vm/signals.hpp
@@ -7,6 +7,7 @@
 #include <cstdlib>
 #include <exception>
 #include <utility>
+#include <span>
 #include <signal.h>
 #include <setjmp.h>
 
@@ -16,6 +17,9 @@ namespace eosio { namespace vm {
    __attribute__((visibility("default")))
    inline thread_local std::atomic<sigjmp_buf*> signal_dest{nullptr};
 
+   __attribute__((visibility("default")))
+   inline thread_local std::vector<std::span<std::byte>> protected_memory_ranges;
+
    // Fixes a duplicate symbol build issue when building with `-fvisibility=hidden`
    __attribute__((visibility("default")))
    inline thread_local std::exception_ptr saved_exception{nullptr};
@@ -25,7 +29,20 @@ namespace eosio { namespace vm {
 
    inline void signal_handler(int sig, siginfo_t* info, void* uap) {
       sigjmp_buf* dest = std::atomic_load(&signal_dest);
-      if (dest) {
+
+      auto in_protected_range = [&]() {
+         //empty protection list means legacy catch-all behavior; useful for some of the old tests
+         if(protected_memory_ranges.empty())
+            return true;
+
+         for(const std::span<std::byte>& range : protected_memory_ranges) {
+            if(info->si_addr >= range.data() && info->si_addr < range.data() + range.size())
+               return true;
+         }
+         return false;
+      };
+
+      if (dest && in_protected_range()) {
          siglongjmp(*dest, sig);
       } else {
          struct sigaction* prev_action;
@@ -98,7 +115,9 @@ namespace eosio { namespace vm {
       sigaddset(&sa.sa_mask, SIGPROF);
       sa.sa_flags = SA_NODEFER | SA_SIGINFO;
       sigaction(SIGSEGV, &sa, &prev_signal_handler<SIGSEGV>);
+#ifndef __linux__
       sigaction(SIGBUS, &sa, &prev_signal_handler<SIGBUS>);
+#endif
       sigaction(SIGFPE, &sa, &prev_signal_handler<SIGFPE>);
    }
 
@@ -114,16 +133,17 @@ namespace eosio { namespace vm {
    /// with non-trivial destructors, then it must mask the relevant signals
    /// during the lifetime of these objects or the behavior is undefined.
    ///
-   /// signals handled: SIGSEGV, SIGBUS, SIGFPE
+   /// signals handled: SIGSEGV, SIGBUS (except on Linux), SIGFPE
    ///
    // Make this noinline to prevent possible corruption of the caller's local variables.
    // It's unlikely, but I'm not sure that it can definitely be ruled out if both
    // this and f are inlined and f modifies locals from the caller.
    template<typename F, typename E>
-   [[gnu::noinline]] auto invoke_with_signal_handler(F&& f, E&& e) {
+   [[gnu::noinline]] auto invoke_with_signal_handler(F&& f, E&& e, const std::vector<std::span<std::byte>>& protect_ranges) {
       setup_signal_handler();
       sigjmp_buf dest;
       sigjmp_buf* volatile old_signal_handler = nullptr;
+      protected_memory_ranges = protect_ranges;
       int sig;
       if((sig = sigsetjmp(dest, 1)) == 0) {
          // Note: Cannot use RAII, as non-trivial destructors w/ longjmp
diff --git a/libraries/eos-vm/tests/signals_tests.cpp b/libraries/eos-vm/tests/signals_tests.cpp
index 15e5caf..6dfba49 100644
--- a/libraries/eos-vm/tests/signals_tests.cpp
+++ b/libraries/eos-vm/tests/signals_tests.cpp
@@ -15,7 +15,7 @@ TEST_CASE("Testing signals", "[invoke_with_signal_handler]") {
          std::raise(SIGSEGV);
       }, [](int sig) {
          throw test_exception{};
-      });
+      }, {});
    } catch(test_exception&) {
       okay = true;
    }
@@ -25,7 +25,7 @@ TEST_CASE("Testing signals", "[invoke_with_signal_handler]") {
 TEST_CASE("Testing throw", "[signal_handler_throw]") {
    CHECK_THROWS_AS(eosio::vm::invoke_with_signal_handler([](){
       eosio::vm::throw_<eosio::vm::wasm_exit_exception>( "Exiting" );
-   }, [](int){}), eosio::vm::wasm_exit_exception);
+   }, [](int){}, {}), eosio::vm::wasm_exit_exception);
 }
 
 static volatile sig_atomic_t sig_handled;
@@ -54,9 +54,11 @@ TEST_CASE("Test signal handler forwarding", "[signal_handler_forward]") {
       sig_handled = 0;
       std::raise(SIGSEGV);
       CHECK(sig_handled == 42 + SIGSEGV);
+#ifndef __linux__
       sig_handled = 0;
       std::raise(SIGBUS);
       CHECK(sig_handled == 42 + SIGBUS);
+#endif
       sig_handled = 0;
       std::raise(SIGFPE);
       CHECK(sig_handled == 42 + SIGFPE);
@@ -73,9 +75,11 @@ TEST_CASE("Test signal handler forwarding", "[signal_handler_forward]") {
       sig_handled = 0;
       std::raise(SIGSEGV);
       CHECK(sig_handled == 142 + SIGSEGV);
+#ifndef __linux__
       sig_handled = 0;
       std::raise(SIGBUS);
       CHECK(sig_handled == 142 + SIGBUS);
+#endif
       sig_handled = 0;
       std::raise(SIGFPE);
       CHECK(sig_handled == 142 + SIGFPE);
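
The core of the patch above is the address check in `signal_handler`: a fault is only turned into a WASM error if the faulting address lies inside one of the registered ranges (the module's code and the WASM linear memory). Otherwise it is forwarded to the previous handler. Below is a standalone sketch of just that check, written so it can be run without eos-vm. `mem_range` and `make_range` are hypothetical stand-ins for the `std::span<std::byte>` values used in the patch, and the function takes the address as a parameter instead of reading `info->si_addr`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for std::span<std::byte>: a base address and a length.
struct mem_range {
   const void* data;
   std::size_t size;
};

// Build a range from a base pointer and a byte count (hypothetical helper).
inline mem_range make_range(const void* base, std::size_t size) {
   assert(base != nullptr || size == 0); // a non-empty range needs a real base
   return {base, size};
}

// True if addr lies in [data, data + size) of any registered range.
// As in the patch, an empty list keeps the legacy catch-all behavior.
inline bool in_protected_range(const void* addr,
                               const std::vector<mem_range>& ranges) {
   if (ranges.empty())
      return true;

   const auto a = reinterpret_cast<std::uintptr_t>(addr);
   for (const mem_range& r : ranges) {
      const auto begin = reinterpret_cast<std::uintptr_t>(r.data);
      // Compare offsets rather than end pointers to avoid overflow.
      if (a >= begin && a - begin < r.size)
         return true;
   }
   return false;
}
```

With this check, a SIGSEGV raised at an address outside the registered ranges (for example, from a bug in host code) is no longer reported as a contract "access violation".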

@spoonincode spoonincode removed their assignment Mar 22, 2024
Projects
Status: Todo