
Jit threading #273

Merged (1 commit, Jul 24, 2024)

Conversation

matetokodi (Contributor):
Initial version; it contains loads and stores only (no RMW, wait, notify, or fence).

#if defined(WALRUS_32)
if (!(info & Instruction::kIs32Bit)) {
info = Instruction::kIsCallback;
}
Collaborator:
Only I64AtomicLoadOpcode needs a callback. The rest store at most 32 bits, so they are supported on 32-bit systems.
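Roughly something like this (a sketch only; it assumes the byte code opcode is available here as opcode, as in the surrounding switch):

#if defined(WALRUS_32)
    /* Sketch: only the 64-bit atomic load needs the callback path on 32-bit targets. */
    if (opcode == ByteCode::I64AtomicLoadOpcode) {
        info = Instruction::kIsCallback;
    }
#endif /* WALRUS_32 */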


Instruction* instr = compiler->append(byteCode, group, opcode, 2, 0);
#if defined(WALRUS_32)
if (!(info & Instruction::kIs32Bit)) {
Collaborator:
Same here.

};

MemAddress(uint8_t baseReg, uint8_t offsetReg, uint8_t sourceReg)
: options(0)
MemAddress(uint8_t baseReg, uint8_t offsetReg, uint8_t sourceReg, uint32_t options = 0)
Collaborator:
Options should be the first argument.

if (size > 1 && (options & CheckNaturalAlignment) && (memArg.argw & (size - 1)) != 0) {
context->appendTrapJump(ExecutionContext::UnalignedAtomicError, sljit_emit_jump(compiler, SLJIT_NOT_EQUAL));
}
#endif /* ENABLE_EXTENDED_FEATURES */
Collaborator:
This should be done after the if (UNLIKELY(offset > context->maximumMemorySize - size)) { check, and it should emit no instructions if there is an error. Maybe the two 'if' parts can be combined.

Contributor Author:
How could the two ifs be combined? The alignment check is guarded by ENABLE_EXTENDED_FEATURES, while the other if is always included.

Collaborator:
Ok, that is ugly. Then duplicate the if body. I hope the compiler notices that.

@@ -173,6 +182,16 @@ void MemAddress::check(sljit_compiler* compiler, Operand* offsetOperand, sljit_u
memArg.arg = SLJIT_MEM1(baseReg);
}
#endif /* SLJIT_32BIT_ARCHITECTURE */

#if defined(ENABLE_EXTENDED_FEATURES)
if (size > 1 && (options & CheckNaturalAlignment)) {
Collaborator:
Maybe we should not set CheckNaturalAlignment when size == 1. An ASSERT can be added at the beginning of check.
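For example (a sketch only; ASSERT is assumed to be the project's existing assertion macro):

/* Sketch: at the start of MemAddress::check, an alignment request implies a multi-byte access. */
ASSERT(!(options & CheckNaturalAlignment) || size > 1);

/* With callers only setting the flag for size > 1, the check itself can drop the size test (sketch): */
if (options & CheckNaturalAlignment) {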

case ByteCode::I32Store16Opcode:
opcode = SLJIT_MOV32_U16;
size = 2;
break;
#if defined(ENABLE_EXTENDED_FEATURES)
case ByteCode::I64AtomicStoreOpcode:
Collaborator:
This needs a special case on 32-bit (only). Calling a helper could be a good approach.

Contributor Author:
This case was already handled via a callback; I have now guarded it so it is not included in 32-bit mode.

@@ -923,6 +991,186 @@ static void emitMemory(sljit_compiler* compiler, Instruction* instr)
}
}

#if defined(ENABLE_EXTENDED_FEATURES)
Collaborator:
Do we need this code at this point?

Contributor Author:
I'm not entirely sure which part of it you mean. The callbacks are already used on 32-bit, and the switches set the parameters for checking the memory and for deciding between a native load/store and a callback.

Collaborator:
The case ByteCode::I64AtomicLoad8UOpcode: is handled by emitLoad, and most of the others are as well. This code also has a case ByteCode::I64AtomicLoad8UOpcode:, and that is not correct. The only one that needs emulation is the 64-bit load/store on 32-bit.

@matetokodi force-pushed the jit_threads_load_store_only branch 2 times, most recently from d21fdad to 5d798ae on July 16, 2024 09:22
@@ -1237,6 +1237,12 @@ void JITCompiler::compileFunction(JITFunction* jitFunc, bool isExternal)
emitStackInit(m_compiler, item->asInstruction());
break;
}
#if defined(ENABLE_EXTENDED_FEATURES)
case Instruction::Atomic: {
emitAtomic(m_compiler, item->asInstruction());
Collaborator:
Do we need this at this point?

#if defined(ENABLE_EXTENDED_FEATURES)
if (options & CheckNaturalAlignment) {
if (SLJIT_IS_MEM2(memArg.arg)) {
sljit_emit_op2(compiler, SLJIT_ADD, TMP_REG1, 0, baseReg, 0, offsetReg, 0);
Collaborator:
Where does TMP_REG1 come from? If you compute the address anyway, you could also overwrite memArg.arg with SLJIT_MEM1.
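Something like this (a sketch; it reuses the names from the hunk above and assumes memArg.argw can simply be reset here):

/* Sketch: materialize base + offset once, then address through a single register. */
sljit_emit_op2(compiler, SLJIT_ADD, TMP_REG1, 0, baseReg, 0, offsetReg, 0);
memArg.arg = SLJIT_MEM1(TMP_REG1);
memArg.argw = 0;
/* The alignment test below can then use TMP_REG1 unconditionally. */
sljit_emit_op2u(compiler, SLJIT_AND | SLJIT_SET_Z, TMP_REG1, 0, SLJIT_IMM, size - 1);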

sljit_emit_op2(compiler, SLJIT_ADD, TMP_REG1, 0, baseReg, 0, offsetReg, 0);
}
sljit_emit_op2u(compiler, SLJIT_AND | SLJIT_SET_Z, (SLJIT_IS_MEM2(memArg.arg) ? TMP_REG1 : baseReg), 0, SLJIT_IMM, size - 1);
context->appendTrapJump(ExecutionContext::UnalignedAtomicError, sljit_emit_jump(compiler, SLJIT_NOT_EQUAL));
Collaborator:
SLJIT_NOT_ZERO sounds better

@@ -233,6 +263,9 @@ static void emitLoad(sljit_compiler* compiler, Instruction* instr)
opcode = (instr->info() & Instruction::kHasFloatOperand) ? SLJIT_MOV_F64 : SLJIT_MOV;
size = 8;
break;
#if defined(ENABLE_EXTENDED_FEATURES)
case ByteCode::I32AtomicLoadOpcode:
Collaborator:
This should set CheckNaturalAlignment.

I would create an options = 0 variable at the beginning of the function and overwrite it in these cases.
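In other words, something along these lines (a sketch; the opcode and size values shown are just illustrative):

uint32_t options = 0; /* Sketch: declared near the top of emitLoad, overwritten only by the atomic cases. */

/* ... inside the opcode switch ... */
#if defined(ENABLE_EXTENDED_FEATURES)
case ByteCode::I32AtomicLoadOpcode:
    options |= MemAddress::CheckNaturalAlignment;
    opcode = SLJIT_MOV32;
    size = 4;
    break;
#endif /* ENABLE_EXTENDED_FEATURES */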

group = Instruction::Atomic;

Instruction* instr = compiler->append(byteCode, group, opcode, 1, 1);
#if defined(WALRUS_32)
Collaborator:
SLJIT has a define for 32/64-bit CPUs; it is already used in this code.
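That is, the guard the patch already uses elsewhere (sketch):

#if (defined SLJIT_32BIT_ARCHITECTURE && SLJIT_32BIT_ARCHITECTURE)
/* 32-bit-only path, e.g. the callback setup. */
#endif /* SLJIT_32BIT_ARCHITECTURE */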

sljit_s32 faddr = GET_FUNC_ADDR(sljit_sw, atomicRmwLoad64);

sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R0, 0, addr.baseReg, 0);
sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R1, 0, SLJIT_IMM, size);
Collaborator:
size? It should always be 64 bits.

sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R1, 0, SLJIT_IMM, size);
sljit_emit_icall(compiler, SLJIT_CALL, type, SLJIT_IMM, faddr);
sljit_emit_op1(compiler, SLJIT_MOV, dst.arg2, dst.arg2w, SLJIT_R1, 0);
sljit_emit_op1(compiler, SLJIT_MOV, dst.arg1, dst.arg1w, SLJIT_R0, 0);
Collaborator:
Not every CPU does things this way.

@matetokodi force-pushed the jit_threads_load_store_only branch 2 times, most recently from c9b79ff to 95abf1d on July 18, 2024 08:57
Collaborator @zherczeg left a comment:
Only the callback part is not correct

@@ -215,14 +246,26 @@ void MemAddress::load(sljit_compiler* compiler)
#endif /* HAS_SIMD */
}

#if defined(ENABLE_EXTENDED_FEATURES) && (defined SLJIT_32BIT_ARCHITECTURE && SLJIT_32BIT_ARCHITECTURE)
static void atomicRmwLoad64(std::atomic<int64_t>& shared, uint64_t* result, int32_t size)
Collaborator:
Do we need a size here? Only 64-bit loads need callbacks.

@@ -439,6 +539,24 @@ static void emitLoad(sljit_compiler* compiler, Instruction* instr)
#endif /* HAS_SIMD */

#if (defined SLJIT_32BIT_ARCHITECTURE && SLJIT_32BIT_ARCHITECTURE)
#if defined(ENABLE_EXTENDED_FEATURES)
if (isAtomic && opcode == SLJIT_MOV) {
Collaborator:
If you add a helper function that is called directly for I64AtomicLoadOpcode, then you don't need the isAtomic boolean.
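Roughly (a sketch; emitAtomicLoad64 is the helper that appears later in the patch):

#if defined(ENABLE_EXTENDED_FEATURES) && (defined SLJIT_32BIT_ARCHITECTURE && SLJIT_32BIT_ARCHITECTURE)
case ByteCode::I64AtomicLoadOpcode:
    /* Sketch: the 64-bit atomic load is the only load that needs emulation on 32-bit,
       so dispatch to the dedicated helper and skip the shared tail of emitLoad. */
    emitAtomicLoad64(compiler, instr);
    return;
#endif /* ENABLE_EXTENDED_FEATURES && SLJIT_32BIT_ARCHITECTURE */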

sljit_s32 type = SLJIT_ARGS3V(P, P, W);
sljit_s32 faddr = GET_FUNC_ADDR(sljit_sw, atomicRmwLoad64);

sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R0, 0, addr.baseReg, 0);
Collaborator:
You need to use addr.memArg here, similar to the other cases.

@@ -358,6 +439,53 @@ static void emitLoad(sljit_compiler* compiler, Instruction* instr)
size = 8;
break;
#endif /* HAS_SIMD */
#if defined(ENABLE_EXTENDED_FEATURES)
case ByteCode::I32AtomicLoadOpcode: {
Collaborator:
The original code reused the regular opcodes. The only thing that needs to be added is options |= MemAddress::CheckNaturalAlignment; and a fallthrough. Is it better this way?
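That is, something like this (a sketch; the regular I32LoadOpcode case name is assumed here):

#if defined(ENABLE_EXTENDED_FEATURES)
case ByteCode::I32AtomicLoadOpcode:
    options |= MemAddress::CheckNaturalAlignment;
    /* fall through into the regular 32-bit load case */
#endif /* ENABLE_EXTENDED_FEATURES */
case ByteCode::I32LoadOpcode:
    /* ... existing non-atomic handling sets opcode and size ... */
    break;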

Contributor Author:
I feel this way is less visually busy, but I can change it back if you wish.

Collaborator:
You could introduce a macro that does nothing if the extension is not defined, and otherwise expands to the necessary lines. This way each new opcode requires only one line.
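For example (a sketch; the macro name and the FALLTHROUGH annotation are hypothetical):

#if defined(ENABLE_EXTENDED_FEATURES)
#define ATOMIC_LOAD_CASE(name)                        \
    case ByteCode::name:                              \
        options |= MemAddress::CheckNaturalAlignment; \
        FALLTHROUGH;
#else /* !ENABLE_EXTENDED_FEATURES */
#define ATOMIC_LOAD_CASE(name)
#endif /* ENABLE_EXTENDED_FEATURES */

/* Usage: one line per atomic opcode, falling through into the regular case. */
ATOMIC_LOAD_CASE(I32AtomicLoadOpcode)
case ByteCode::I32LoadOpcode: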

Contributor Author:
I have changed it back. I feel a macro would have to be very complex to handle the different cases: the CheckNaturalAlignment flag is not always present, and the callbacks differ for the various sizes and load/store variants.

When I originally changed it there was more code there that is no longer present, so on a second look it's not too busy.

*result = shared.load(std::memory_order_relaxed);
}

static void emitAtomicLoad64(sljit_compiler* compiler, Instruction* instr)
Collaborator:
Could we combine this with store to reduce code size?
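One possible shape (a sketch; the shared helper's name and its bool parameter are hypothetical):

#if defined(ENABLE_EXTENDED_FEATURES) && (defined SLJIT_32BIT_ARCHITECTURE && SLJIT_32BIT_ARCHITECTURE)
static void emitAtomicLoadStore64(sljit_compiler* compiler, Instruction* instr, bool isStore)
{
    /* Sketch: both directions share the address check and the callback setup;
       only the callback target and the direction of the value move differ. */
    sljit_sw faddr;
    if (isStore) {
        faddr = GET_FUNC_ADDR(sljit_sw, atomicRmwStore64);
    } else {
        faddr = GET_FUNC_ADDR(sljit_sw, atomicRmwLoad64);
    }
    /* ... shared MemAddress::check, argument moves, then
       sljit_emit_icall(compiler, SLJIT_CALL, type, SLJIT_IMM, faddr) ... */
}
#endif /* ENABLE_EXTENDED_FEATURES && SLJIT_32BIT_ARCHITECTURE */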

@@ -215,11 +246,61 @@ void MemAddress::load(sljit_compiler* compiler)
#endif /* HAS_SIMD */
}

#if defined(ENABLE_EXTENDED_FEATURES) && (defined SLJIT_32BIT_ARCHITECTURE && SLJIT_32BIT_ARCHITECTURE)
static void atomicRmwLoad64(std::atomic<int64_t>& shared, uint64_t* result)
Collaborator:
Is it valid to pass a pointer as std::atomic<int64_t>&? For me it would be safer to pass the pointer and convert it to this special type.
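That is, roughly (a sketch of passing a raw pointer and converting inside the helper):

static void atomicRmwLoad64(void* shared, uint64_t* result)
{
    /* Sketch: take the address as a plain pointer and convert it here,
       instead of binding a std::atomic<int64_t>& at the call boundary. */
    std::atomic<int64_t>* atomicShared = reinterpret_cast<std::atomic<int64_t>*>(shared);
    *result = static_cast<uint64_t>(atomicShared->load(std::memory_order_relaxed));
}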

@matetokodi force-pushed the jit_threads_load_store_only branch 3 times, most recently from 3ad4f2c to cef8b1a on July 18, 2024 12:42
MemoryStore* storeOperation = reinterpret_cast<MemoryStore*>(instr->byteCode());
offset = storeOperation->offset();
} else {
ASSERT_NOT_REACHED();
Collaborator:
These are too long for me. Just use } else { and add ASSERT(instr->opcode() == ByteCode::I64AtomicStoreOpcode);
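That is (a sketch; the MemoryLoad cast in the load branch is assumed by symmetry with the MemoryStore one above):

if (instr->opcode() == ByteCode::I64AtomicLoadOpcode) {
    MemoryLoad* loadOperation = reinterpret_cast<MemoryLoad*>(instr->byteCode());
    offset = loadOperation->offset();
} else {
    ASSERT(instr->opcode() == ByteCode::I64AtomicStoreOpcode);
    MemoryStore* storeOperation = reinterpret_cast<MemoryStore*>(instr->byteCode());
    offset = storeOperation->offset();
}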

Collaborator @zherczeg left a comment:
The patch should be converted to ready for review.

faddr = GET_FUNC_ADDR(sljit_sw, atomicRmwStore64);
}

if (valueArgPair.arg1 != SLJIT_MEM1(kFrameReg)) {
Collaborator:
This should be done only for stores, am I right? It could be moved into the else above.

@matetokodi marked this pull request as ready for review on July 19, 2024 03:37
Collaborator @zherczeg left a comment:
LGTM

@matetokodi changed the title from "WIP: jit threading" to "Jit threading" on Jul 23, 2024
Collaborator @clover2123 left a comment:
LGTM

If you update/complete the JIT thread operations in the following patches, please verify them by running the test cases (TCs) located in test/extended.

@clover2123 merged commit f5eca2d into Samsung:main on Jul 24, 2024
14 checks passed
Collaborator @zherczeg:

Thank you. We are testing it, although the tests are not fully running.
