Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jit: Threads: Add RmwCmpxchg #282

Merged
merged 1 commit into from
Aug 30, 2024

Conversation

matetokodi
Copy link
Contributor

Add RmwCmpxchg

Depends on #280

@zherczeg
Copy link
Collaborator

Please update the patch, so I can review the actual changes.

sljit_s32 type = SLJIT_ARGS3V(P, P, P);
sljit_s32 faddr GET_FUNC_ADDR(sljit_sw, atomicRmwCmpxchg64);

sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R0, 0, SLJIT_EXTRACT_REG(addr.memArg.arg), 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that if R0 is overwritten and used by srcExpectedArgPair or srcValueArgPair, then an incorrect value is written. I would copy the arguments to memory, if needed, and then initialize R0 to R2 after that.

@matetokodi matetokodi force-pushed the jit_threads_rmw_cmpxchg branch 2 times, most recently from 982e877 to c7c1d23 Compare August 21, 2024 06:10
src/jit/MemoryInl.h Show resolved Hide resolved
src/jit/MemoryInl.h Show resolved Hide resolved
sljit_emit_op2(compiler, SLJIT_ADD, SLJIT_R2, 0, kFrameReg, 0, SLJIT_IMM, srcValueArgPair.arg1w - WORD_LOW_OFFSET);
}

sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R0, 0, SLJIT_EXTRACT_REG(addr.memArg.arg), 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If addr.memArg.arg is SLJIT_R1 or SLJIT_R2, the value has been overwritten. This should happen before the first add operation, and this should also be checked: addr.memArg.arg != SLJIT_R0

@@ -1147,9 +1214,14 @@ static void emitAtomic(sljit_compiler* compiler, Instruction* instr)
case ByteCode::I64AtomicRmwAndOpcode:
case ByteCode::I64AtomicRmwOrOpcode:
case ByteCode::I64AtomicRmwXorOpcode:
case ByteCode::I64AtomicRmwXchgOpcode: {
case ByteCode::I64AtomicRmwXchgOpcode:
case ByteCode::I64AtomicRmwCmpxchgOpcode: {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the first case, the if operation below is unnecessary.

AtomicRmw* rmwOperation = reinterpret_cast<AtomicRmw*>(instr->byteCode());
offset = rmwOperation->offset();
if (operation != OP_CMPXCHG) {
AtomicRmw* rmwOperation = reinterpret_cast<AtomicRmw*>(instr->byteCode());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the copy of the old code, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is; the previous cases are int this early return block if (operation != OP_CMPXCHG), which is followed by the code for Cmpxchg


compareFalse = sljit_emit_cmp(compiler, SLJIT_NOT_EQUAL, SLJIT_TMP_DEST_REG, 0, srcExpectedReg, 0);
if (!(operationSize & SLJIT_32) && operationSize != SLJIT_MOV32) {
compareTopFalse = sljit_emit_cmp(compiler, SLJIT_NOT_EQUAL, SLJIT_IMM, 0, tmpReg, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand something. You check that the upper 32 bit must be zero, but don't check the bits of the lower 32 bit for 8/16 bit operations. Furthermore this should happen before sljit_emit_atomic_load since srcExpectedPair.arg2 should not change. Then tmpReg is changed to srcValue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lower part does not need to be checked for zeroes, because when loading a smaller value, the rest of the register is zeroed out.

It also cannot happen before the load, because the loaded value is required for the result; It would be possible to not the atomic load if the upper half is not zero, but a load would still be required for the result later, and it would complicate the code, I find this way cleaner.

if (srcExpectedArgPair.arg1 != SLJIT_MEM1(kFrameReg)) {
sljit_emit_op1(compiler, SLJIT_MOV, dstArgPair.arg1, dstArgPair.arg1w, SLJIT_MEM1(kContextReg), OffsetOfContextField(tmp1) + WORD_LOW_OFFSET);
sljit_emit_op1(compiler, SLJIT_MOV, dstArgPair.arg2, dstArgPair.arg2w, SLJIT_MEM1(kContextReg), OffsetOfContextField(tmp1) + WORD_HIGH_OFFSET);
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is not right here. Is the expected value overwritten by the result? This should not happen, since the expected value is a read-only data.

Probably we should copy expected value to destination first (if needed), then do the update using the destination, and do nothing if destnation is a memory location.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, std::atomic<T>::compare_exchange_weak modifies the expected value to the value loaded from memory if the comparison fails

I have adjusted it so we use dst for the expected value parameter if it is memory, and use context tmp if it is not.

if (operation != OP_XCHG) {
sljit_emit_op2(compiler, operation, srcReg, 0, SLJIT_TMP_DEST_REG, 0, srcReg, 0);

if (!(operationSize & SLJIT_32) && operationSize != SLJIT_MOV32) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be done before sljit_emit_atomic_load

sljit_emit_atomic_store(compiler, operationSize | SLJIT_SET_ATOMIC_STORED, srcReg, SLJIT_EXTRACT_REG(addr.memArg.arg), SLJIT_TMP_DEST_REG);

compareFalse = sljit_emit_cmp(compiler, SLJIT_NOT_EQUAL, SLJIT_TMP_DEST_REG, 0, srcExpectedReg, 0);
sljit_emit_op1(compiler, SLJIT_MOV, tmpReg, 0, srcValue.arg, srcValue.argw);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be done before sljit_emit_atomic_load as well (this is correct in the non SLJIT_32BIT_ARCHITECTURE case)

} else if (srcExpectedArgPair.arg1 != SLJIT_MEM1(kFrameReg)) {
sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_MEM1(kContextReg), OffsetOfContextField(tmp1) + WORD_LOW_OFFSET, srcExpectedArgPair.arg1, srcExpectedArgPair.arg1w);
sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_MEM1(kContextReg), OffsetOfContextField(tmp1) + WORD_HIGH_OFFSET, srcExpectedArgPair.arg2, srcExpectedArgPair.arg2w);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if both if-s failed?

This should look like this:

if dst is SLJIT_MEM1(kFrameReg)
  if dst != expected arg
   move expected to dst
else
  move expected to context field tmp1

Copy link
Collaborator

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close to ready. Please change draft mode to to ready.

sljit_s32 faddr GET_FUNC_ADDR(sljit_sw, atomicRmwCmpxchg64);

if (srcExpectedArgPair.arg1 == SLJIT_MEM1(kFrameReg)) {
if (dstArgPair.arg1 != srcExpectedArgPair.arg1 || dstArgPair.arg1w != srcExpectedArgPair.arg1w) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these cannot partly overlap, one if is enough.

if (!(operationSize & SLJIT_32) && operationSize != SLJIT_MOV32) {
compareTopFalse = sljit_emit_cmp(compiler, SLJIT_NOT_EQUAL, SLJIT_IMM, 0, srcExpectedPair.arg2, srcExpectedPair.arg2w);
}
sljit_emit_op1(compiler, SLJIT_MOV, tmpReg, 0, srcValue.arg, srcValue.argw);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a note. It looks like webassembly is sensitive to the expected value upper bits (must be 0), but does not care about the srcValue upper bits. Does not make much sense.

Signed-off-by: Máté Tokodi [email protected]
@matetokodi matetokodi marked this pull request as ready for review August 27, 2024 11:24
Copy link
Collaborator

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@clover2123 clover2123 merged commit 238685b into Samsung:main Aug 30, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants