Roudi Fatal error: Trying to convert a pointer to an index which is not aligned to the array #2380
Comments
@jmyvalour are you using iceoryx standalone or in combination with Cyclone DDS?
@elfenpiff I am using iceoryx in standalone mode.
Here are the build constants used: 2024-12-03 09:05:46.777 [Trace]: Iceoryx contants is:
@jmyvalour The ProcessIntrospection has a hard-coded maximum number of processes, which is 300. Are more than 300 processes active in your system?
@elfenpiff thank you for your reply. We have 26 processes per environment, with two environments running at the same time on the same server, so a total of 52 processes.
@jmyvalour, it absolutely makes sense that this happens when a process stops/starts, since the process introspection, which detects and publishes those events, fails. It seems like the mempool tries to access element 31257 despite there being only 31256 elements in there - like an off-by-one error?
@jmyvalour something really weird is happening here that does not make sense at all. It seems like the pointer inside the internal sample of the process introspection was somehow corrupted. The weird part is that this seems to happen only on your side and has never occurred anywhere else. I think a memory corruption in the sample would have been detected quickly, since it is a central construct in iceoryx. I think I need more details to grasp what is going on here.
Thank you very much for your answer, your time, and the clarification. That is really helpful for understanding the underlying problem here.
Regarding your comment: "It seems like the mempool tries to access element 31257 despite there being only 31256 elements in there - like an off-by-one error?"
You are correct. I think the publisher is trying to write at address 0x7367884ef378 (memory pool start address 0x7367884e795f, chunk size 31256). The check (0x7367884ef378 - 0x7367884e795f) % 31256 should give us 0 (index 1), but 0x7367884ef378 - 0x7367884e795f gives 31257, so we are indeed one byte past the address of the mem_pool[1] element (the pointer should be 0x7367884ef377 for index 1). For some reason unknown so far, the ProcessIntrospection chunk is not at the correct place...
I don't see anything out of the ordinary. Our setup was running fine: if we let one environment run, it is fine for the whole day. As soon as we start/restart the second environment, the error triggers. Sometimes we have: probably because we are killing the process?
Could it be that one of our components somehow corrupts the internal mempool memory and triggers this error in the roudi main app? I don't see how that could be possible, but you never know. I will keep you informed of the progress. Might it be good to update to the latest release and give it another try?
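The arithmetic in the comment above can be checked with a small standalone sketch. This is not the iceoryx implementation of MemPool::pointerToIndex; the helper name pointerToIndexSketch and the constant names are made up for illustration, and only the three numbers are taken from the fatal log. It assumes a pointer is valid only if its byte offset from the base address is an exact multiple of the item size.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper for illustration only (not the iceoryx code):
// returns the element index if `pointer` lies on an item boundary
// relative to `base`, otherwise an empty optional.
constexpr std::optional<std::uint64_t>
pointerToIndexSketch(std::uint64_t base, std::uint64_t itemSize, std::uint64_t pointer)
{
    if (itemSize == 0 || pointer < base)
    {
        return std::nullopt;
    }
    const std::uint64_t offset = pointer - base;
    if (offset % itemSize != 0)
    {
        return std::nullopt; // misaligned: the case reported in the log
    }
    return offset / itemSize;
}

// Values copied from the fatal log message in this issue.
constexpr std::uint64_t kBase = 0x7367884e795fULL;
constexpr std::uint64_t kItemSize = 31256;
constexpr std::uint64_t kPointer = 0x7367884ef378ULL;

// The offset is 31257 bytes, one byte more than a single item.
static_assert(kPointer - kBase == 31257, "offset taken from the log values");
// So the logged pointer does not map to any index ...
static_assert(!pointerToIndexSketch(kBase, kItemSize, kPointer).has_value(),
              "logged pointer is misaligned");
// ... while the address one byte lower maps to index 1.
static_assert(pointerToIndexSketch(kBase, kItemSize, kPointer - 1).value() == 1,
              "pointer - 1 is the start of element 1");
```

The static_assert lines are evaluated at compile time, so the file compiling with a C++17 compiler is the check. Note that 31257 here is a byte offset, not an element count: the pointer is one byte past the start of element 1.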
I have a hard time reproducing it outside the environment with a minimal example. However, I rebuilt iceoryx with the tests enabled and got some interesting failures using the following: ./posh/test/posh_moduletests --gtest_filter="ChunkSender*" : all other tests are OK.
It is even more interesting because we recently changed MAX_CHUNKS_ALLOCATED_PER_PUBLISHER_SIMULTANEOUSLY from 16 to 32 after we got a fatal error on reaching this limit, and I wonder whether the problem could be related. I always get this warning during the test:
Adding the test reports:
Note: Google Test filter = ChunkSender*
Log start
2024-12-03 16:12:41.795 [Warn ]: Mempool [m_chunkSize = 176, numberOfChunks = 20, used_chunks = 20 ] has no more space left
Log end
iceoryx_posh/test/moduletests/test_popo_chunk_sender.cpp:226: Failure
Log start
2024-12-03 16:12:41.795 [Warn ]: Mempool [m_chunkSize = 176, numberOfChunks = 20, used_chunks = 20 ] has no more space left
Log end
iceoryx_posh/test/moduletests/test_popo_chunk_sender.cpp:228: Failure
Log start
2024-12-03 16:12:41.795 [Warn ]: Mempool [m_chunkSize = 176, numberOfChunks = 20, used_chunks = 20 ] has no more space left
Log end
[ FAILED ] ChunkSender_test.allocate_Overflow (10 ms)
/iceoryx_posh/test/moduletests/test_popo_chunk_sender.cpp:776: Failure
I reverted the build to MAX_CHUNKS_ALLOCATED_PER_PUBLISHER_SIMULTANEOUSLY=16 and all tests are now passing. Is there a hard-coded limit somewhere that prevents the tests from passing with MAX_CHUNKS_ALLOCATED_PER_PUBLISHER_SIMULTANEOUSLY=32? Could that be related to our problem here?
We can try to reproduce with the reverted value and see if this happens again. The thing is that we changed these values because it was hard for our system to run without them, so we can't be sure that is going to work. We will try to find the maximum value for which the tests pass and give it a try.
Regardless of the problem: when reaching this MAX constant, does this mean we are doing too many memory loans (try allocate) before calling the actual publish of the chunk (which would free the chunk, from my understanding)? Meaning that, to mitigate this constraint, we should try to send as many chunks as possible before creating new ones?
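The question about loaning versus publishing can be made concrete with a toy model. This is only a sketch of the understanding described in the comment above, not iceoryx code: the class LoanCounterSketch and its methods are hypothetical, and it assumes that each loan occupies one slot under the per-publisher cap and that publishing a chunk gives that slot back.

```cpp
#include <cassert>
#include <cstddef>

// Toy model (assumption, not the iceoryx implementation): a publisher may
// hold at most `maxLoans` chunks that were loaned but not yet published.
class LoanCounterSketch
{
  public:
    explicit constexpr LoanCounterSketch(std::size_t maxLoans)
        : m_maxLoans(maxLoans)
    {
    }

    // Returns false when the cap on simultaneously held chunks is reached.
    constexpr bool tryLoan()
    {
        if (m_inUse >= m_maxLoans)
        {
            return false;
        }
        ++m_inUse;
        return true;
    }

    // Publishing hands the chunk over and frees one slot under the cap.
    constexpr bool publish()
    {
        if (m_inUse == 0)
        {
            return false;
        }
        --m_inUse;
        return true;
    }

    constexpr std::size_t inUse() const
    {
        return m_inUse;
    }

  private:
    std::size_t m_maxLoans;
    std::size_t m_inUse{0};
};

// Compile-time walk-through with a cap of 16, the value used in this thread.
constexpr bool demoCapBehaviour()
{
    LoanCounterSketch publisher(16);
    for (int i = 0; i < 16; ++i)
    {
        if (!publisher.tryLoan())
        {
            return false;
        }
    }
    if (publisher.tryLoan()) // the 17th loan without a publish must fail
    {
        return false;
    }
    if (!publisher.publish()) // publishing one chunk frees a slot
    {
        return false;
    }
    return publisher.tryLoan(); // so the next loan succeeds again
}

static_assert(demoCapBehaviour(), "loan cap behaves as described");
```

Under these assumptions, interleaving loan and publish keeps the number of held chunks low, whereas loaning many chunks up front and publishing them later is what runs into the cap.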
@jmyvalour yes, there is, and this is on our technical debt list. The constants in: The idea after the refactoring is that the dependencies between the constants are described mathematically, so that you can never create an invalid configuration. Currently, we are focusing on iceoryx2 and do not have the capacity to do such a refactoring. Btw. we are offering commercial support and could fix it via a contract if you like ([email protected]); this is how we finance the open source work.
Hello @elfenpiff, and thank you for your answer. For now I won't have the time to focus on making the configuration values constexpr and checking that they are mathematically correct, but I see the point here, and that would be a great upgrade (at least for feeling more secure when reaching some limits and updating the config). We reverted to MAX_CHUNKS_ALLOCATED_PER_PUBLISHER_SIMULTANEOUSLY=16 for now, and that fixed the reported problem. I will try to find the time to revisit this issue at a later stage and report back. Maybe we could close this issue and open a new one related to "static" configuration checking when a PR is created? Thank you for your support,
In the meantime, there is the option to run multiple RouDi instances in parallel. See also this example: The fully flexible solution which can be controlled with an There is also the option to build iceoryx with a different resource prefix, which has a similar effect to running multiple RouDi instances in parallel. But with this, you have to compile iceoryx with different
Required information
Operating system:
Ubuntu 24.04 LTS
Compiler version:
Clang 19
Eclipse iceoryx version:
v2.90.0 commit b9ab7ee
Observed result or behaviour:
Hello,
We are starting roudi as an external application and then starting a few components. After a while of stopping and restarting components, the roudi application exits with an error and the following trace:
2024-12-02 22:53:55.122 [Fatal]: Trying to convert a pointer to an index which is not aligned to the array! Base address: 0x7367884e795f; item size: 31256; pointer address: 0x7367884ef378
2024-12-02 22:53:55.122 [Fatal]: iceoryx_posh/source/mepoo/mem_pool.cpp:122 [PANIC] Invalid access
It occurs in the freeChunk function, called from iox::roudi::ProcessIntrospection<iox::popo::PublisherPortUser>::send().
This does not occur when we are not restarting the components. From what I understand, the roudi app is not part of the sending/receiving process but only manages the shared memory and provides information for the subscribers/publishers to operate. (I was looking into the components to see whether they could send a bad message that could trigger this error. Is it even possible to trigger this message in the roudi main app from a user application?)
Looking at the backtrace below, it looks like it is the ProcessIntrospection<iox::popo::PublisherPortUser> that is trying to publish a message and fails to do so.
Any hint, help would be greatly appreciated, getting confused here.
Thank you for your help,
Best regards,
Conditions where it occurred / Performed steps:
Start and Stop publisher / server & client / subscriber a few times
Additional helpful information
Here is the backtrace from where this is hit in the roudi app:
#1 0x000055555563bc9e in void iox::er::panic<char const (&) [15]>(iox::er::SourceLocation const&, char const (&) [15]) ()
#2 0x000055555563b4c6 in void iox::er::forwardPanic<char const (&) [15]>(iox::er::SourceLocation const&, char const (&) [15]) ()
#3 0x000055555563b16f in iox::mepoo::MemPool::pointerToIndex(void const*, unsigned long, void const*) ()
#4 0x000055555563b1c1 in iox::mepoo::MemPool::freeChunk(void const*) ()
#5 0x000055555563c6e1 in iox::mepoo::SharedChunk::freeChunk() ()
#6 0x000055555563f42c in iox::popo::ChunkSender<iox::popo::ChunkSenderData<32u, iox::popo::ChunkDistributorData<iox::DefaultChunkDistributorConfig, iox::popo::ThreadSafePolicy, iox::popo::ChunkQueuePusher<iox::popo::ChunkQueueData<iox::DefaultChunkQueueConfig, iox::popo::ThreadSafePolicy> > > > >::send(iox::mepoo::ChunkHeader*) ()
#7 0x000055555563ece0 in iox::popo::PublisherPortUser::sendChunk(iox::mepoo::ChunkHeader*) ()
#8 0x00005555555e9efc in iox::roudi::ProcessIntrospection<iox::popo::PublisherPortUser>::send() ()
#9 0x00005555555ea38e in void iox::storable_function<128ul, void ()>::invoke<iox::storable_function<128ul, void ()>::invoke<iox::roudi::ProcessIntrospection<iox::popo::PublisherPortUser>, void>(iox::roudi::ProcessIntrospection<iox::popo::PublisherPortUser>&, void (iox::roudi::ProcessIntrospection<iox::popo::PublisherPortUser>::*)())::{lambda()#1}>(void*) ()
#10 0x00005555555eabe2 in iox::concurrent::detail::PeriodicTask<iox::storable_function<128ul, void ()> >::run() ()
#11 0x00005555555eae2b in void* std::__1::__thread_proxy[abi:ne190103]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (iox::concurrent::detail::PeriodicTask<iox::storable_function<128ul, void ()> >::*)() noexcept, iox::concurrent::detail::PeriodicTask<iox::storable_function<128ul, void ()> >*> >(void*) ()
#12 0x00007ffff7a9ca94 in start_thread (arg=) at ./nptl/pthread_create.c:447
#13 0x00007ffff7b29c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78