Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Native MarkSweep is not marking objects atomically. #1254

Closed
wks opened this issue Jan 7, 2025 · 2 comments · Fixed by #1255
Closed

Native MarkSweep is not marking objects atomically. #1254

wks opened this issue Jan 7, 2025 · 2 comments · Fixed by #1255

Comments

@wks
Copy link
Collaborator

wks commented Jan 7, 2025

In src/policy/marksweepspace/native_ms/global.rs:

    fn trace_object<Q: ObjectQueue>(
        &self,
        queue: &mut Q,
        object: ObjectReference,
    ) -> ObjectReference {
        // ...
        if !VM::VMObjectModel::LOCAL_MARK_BIT_SPEC.is_marked::<VM>(object, Ordering::SeqCst) {
            VM::VMObjectModel::LOCAL_MARK_BIT_SPEC.mark::<VM>(object, Ordering::SeqCst);
            let block = Block::containing(object);
            block.set_state(BlockState::Marked);
            queue.enqueue(object);
        }
        object
    }

The marking is non-atomic. If two GC workers attempt to mark the same object, it will be marked twice, and will be enqueued twice. It has two problems.

Firstly, enqueuing twice will cause the same object to be scanned twice, resulting in more work packets to be generated to do duplicated jobs.

Secondly, some VM binding make assumptions that each live object is scanned exactly once during full-heap GC (or at most once during nursery GC). Such behavior is usually related to the behaviors of the existing GCs of the VM. One example is CRuby. It attempts to clean up some data inside objects during the mark phase of a GC. When using MMTk binding, it does so when scanning an object. But when using MMTk with the MarkSweep plan and multiple GC workers, two GC workers will attempt to clean up the same object. The data structure being cleaned is not thread-safe, and it will crash the VM.

I think it is a reasonable to assume that a GC enqueues each object and scans each object at most once, unless there is a compelling reason not to do so. MarkSweep is the only plan that marks object non-atomically, and I think it is a mistake.

Related discussions: https://mmtk.zulipchat.com/#narrow/channel/313365-mmtk-ruby/topic/MMTk.20and.20marking.20id_table.20entries.2E

@qinsoon
Copy link
Member

qinsoon commented Jan 7, 2025

It is intended for performance. See the discussion in #313.

However, what is strange is that for Immix, we actually atomically mark objects. In Java MMTk, marking for Immix and mark sweep is non atomic.

@wks
Copy link
Collaborator Author

wks commented Jan 7, 2025

I think it is worth adding a Cargo feature or a constant in the VMBinding trait to control this behavior, as @steveblackburn mentioned in this comment. VM bindings like CRuby can enable this feature to ensure we don't enqueue an object multiple times.

And this is not about duplicated edges, but duplicated nodes. The CRuby binding does not use slot-enqueuing, but may actually visit an edge (as in object graph edge represented as fields, visited in scan_object_and_trace_edges) multiple times due to calling both gc_mark_children and gc_update_object_reference in the binding. If we make the marking atomic, it will ensure each node (object) is enqueued at most once, and therefore scanned at most once.

wks added a commit to wks/mmtk-core that referenced this issue Jan 7, 2025
Added a constant `VMBinding::UNIQUE_OBJECT_ENQUEUING`.  When set to
true, MMTk will guarantee that each object is enqueued at most once in
each GC.  This can be useful for VMs that piggyback on object scanning
to visit objects during GC.

Implementation-wise, the mark bit is set atomically when
`VMBinding::UNIQUE_OBJECT_ENQUEUING` is true.  This PR only affects the
native MarkSweep space.  Other spaces already do this atomically.

Fixes: mmtk#1254
github-merge-queue bot pushed a commit that referenced this issue Jan 9, 2025
Added a constant `VMBinding::UNIQUE_OBJECT_ENQUEUING`. When set to true,
MMTk will guarantee that each object is enqueued at most once in each
GC. This can be useful for VMs that piggyback on object scanning to
visit objects during GC.

Implementation-wise, the mark bit is set atomically when
`VMBinding::UNIQUE_OBJECT_ENQUEUING` is true. This PR only affects the
native MarkSweep space. Other spaces already do this atomically.

Fixes: #1254
@wks wks closed this as completed in #1255 Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants