Will core 0 fence and then core 1 do fence.i make instruction fetch get the same instruction memory in a multicore system #1544

fanghuaqi · 2024-07-18T03:12:13Z

I have a question about these lines of the Zfencei spec:

Lines 69 to 79 in 3c1d602

    
    The FENCE.I instruction is used to synchronize the instruction and data
    streams. RISC-V does not guarantee that stores to instruction memory
    will be made visible to instruction fetches on a RISC-V hart until that
    hart executes a FENCE.I instruction. A FENCE.I instruction ensures that
    a subsequent instruction fetch on a RISC-V hart will see any previous
    data stores already visible to the same RISC-V hart. FENCE.I does _not_
    ensure that other RISC-V harts' instruction fetches will observe the
    local hart's stores in a multiprocessor system. To make a store to
    instruction memory visible to all RISC-V harts, the writing hart also
    has to execute a data FENCE before requesting that all remote RISC-V
    harts execute a FENCE.I.

As the spec states:

> To make a store to instruction memory visible to all RISC-V harts, the writing hart also has to execute a data FENCE before requesting that all remote RISC-V harts execute a FENCE.I.

Updated question:

In a multicore system, e.g. with 4 cores, suppose core 0 loads instructions from external memory and writes them into its data cache, and then executes a FENCE on core 0 only. Will the other cores, 1-3, after they each execute a FENCE.I, see the same instructions at the same memory addresses as core 0 sees? In other words, will instruction fetches on cores 1-3 be able to see the data in core 0's data cache?

There is a discussion about this in the Linux implementation, see


fanghuaqi · 2024-07-18T03:30:17Z

Here is an updated version:

In a multicore system, e.g. with 4 cores, suppose core 0 loads instructions from external memory and writes them into its data cache, and then executes a FENCE on core 0 only. Will the other cores, 1-3, after they each execute a FENCE.I, see the same instructions at the same memory addresses as core 0 sees? In other words, will instruction fetches on cores 1-3 be able to see the data in core 0's data cache?

Actually, per our understanding of the ISA spec, we feel the answer is NO, i.e., instruction fetches on cores 1-3 cannot see the data in core 0's data cache. The right sequence should be:

1. core 0 loads the instructions from external SD card memory and writes them into its DCache
2. core 0 then executes a FENCE.I to make sure its own ICache is synced with its DCache, and core 0's DCache is flushed to main memory
3. core 0 then executes a FENCE as a barrier to make sure the preceding operations are visible to memory
4. core 0 then asks the other cores, 1-3, to execute a FENCE.I
5. cores 1-3, after executing the FENCE.I, re-fetch the instructions from main memory to get the latest instructions

Can you help to confirm that our understanding is correct?

Many Thanks

gfavor · 2024-07-18T05:45:45Z

On Wed, Jul 17, 2024 at 8:30 PM Huaqi Fang wrote:

> Here is an updated version:
>
> In a multicore system, e.g. with 4 cores, suppose core 0 loads instructions from external memory and writes them into its data cache, and then executes a FENCE on core 0 only. Will the other cores, 1-3, after they each execute a FENCE.I, see the same instructions at the same memory addresses as core 0 sees? In other words, will instruction fetches on cores 1-3 be able to see the data in core 0's data cache?

The missing piece in the preceding (which corresponds to "the writing hart also has to execute a data FENCE *before requesting that all remote RISC-V harts* execute a FENCE.I") is to perform the memory access(es) after the FENCE that cause a request to be sent to all the remote harts. All this of course presumes data cache coherency and hence no need for CBOs to explicitly push the written instruction out of core 0's data cache.

> Actually, per our understanding of the ISA spec, we feel the answer is NO, i.e., instruction fetches on cores 1-3 cannot see the data in core 0's data cache. The right sequence should be:
>
> 1. core 0 loads the instructions from external SD card memory and writes them into its DCache
> 2. core 0 then executes a FENCE.I to make sure its own ICache is synced with its DCache, and core 0's DCache is flushed to main memory

The FENCE.I doesn't cause core 0's DCache to be flushed. It just causes its ICache and instruction fetch/etc. to become synchronized or consistent with the instruction written into its DCache.

> 3. core 0 then executes a FENCE as a barrier to make sure the preceding operations are visible to memory

As noted above the FENCE is to order the write of the instruction into core 0's DCache with the sending of a request (e.g. an IPI) to all remote harts. That ensures that the write is globally visible before any of the remote harts receive the request to do a FENCE.I (assuming all DCaches are coherent with each other as noted above).

> 4. core 0 then asks the other cores, 1-3, to execute a FENCE.I
> 5. cores 1-3, after executing the FENCE.I, re-fetch the instructions from main memory to get the latest instructions

Yes. The FENCE.I in a typical implementation flushes the fetch/decode pipeline of the core and, if the ICache is not coherent with all the DCaches in the system, also flushes the ICache. (Details will vary depending on the details of an implementation.)

Greg
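The ordering described in this reply can be sketched in C. This is only an illustration under stated assumptions, not code from the spec or from Linux: all names (`code_buffer`, `sync_request`, `writer_publish`, `remote_sync`) are hypothetical, a shared flag stands in for the IPI, the data caches are assumed coherent, and the remote side reads the buffer as data in place of a real instruction fetch. On a non-RISC-V host the fences fall back to C11 stand-ins so the sketch still compiles and runs; on RISC-V, `fence.i` needs a toolchain targeting Zifencei.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical and chosen for illustration only. */
#define CODE_WORDS 4

static uint32_t code_buffer[CODE_WORDS]; /* stands in for instruction memory */
static atomic_int sync_request;          /* stands in for an IPI to remote harts */

/* Data FENCE: orders earlier accesses before later ones. It flushes nothing. */
static inline void data_fence(void) {
#if defined(__riscv)
    __asm__ volatile("fence rw, rw" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst); /* host stand-in */
#endif
}

/* FENCE.I: makes this hart's instruction fetch consistent with the stores
 * already visible to this hart. It does not flush any data cache. */
static inline void instruction_fence(void) {
#if defined(__riscv)
    __asm__ volatile("fence.i" ::: "memory");
#else
    atomic_signal_fence(memory_order_seq_cst); /* compiler barrier only */
#endif
}

/* Writing hart (core 0): store the instructions, FENCE, then send the request. */
void writer_publish(const uint32_t *insns, size_t n) {
    assert(n <= CODE_WORDS);
    memcpy(code_buffer, insns, n * sizeof insns[0]); /* 1. write new instructions */
    data_fence();                                    /* 2. order them before the request */
    atomic_store_explicit(&sync_request, 1,
                          memory_order_relaxed);     /* 3. ask remote harts to FENCE.I */
}

/* Remote hart (cores 1-3): returns 0 if no request has arrived yet. Otherwise
 * executes FENCE.I and copies the buffer into out, which models the
 * subsequent instruction fetch. */
int remote_sync(uint32_t *out, size_t n) {
    assert(n <= CODE_WORDS);
    if (!atomic_load_explicit(&sync_request, memory_order_acquire))
        return 0;
    instruction_fence();
    memcpy(out, code_buffer, n * sizeof out[0]);
    return 1;
}
```

The key point is the position of `data_fence()`: it sits between the stores of the new instructions and the store that triggers the request, so a remote hart that observes the request can also observe the instructions. If the writing hart wants to execute the new code itself, it would additionally execute its own FENCE.I.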

fanghuaqi · 2024-07-18T06:33:09Z

Hi @gfavor , thanks for your reply.

Below is a piece of the latest discussion, quoted from the mailing list thread at https://lore.kernel.org/lkml/032536BCDC0EB6C4+dc9fc383-d69c-4cb0-b66d-f4e32c29ab67@nucleisys.com/T/#md476f6dadc6bb3184699c24d936f94fa1c7a9722:

> > Finally,
> >
> > Riscv spec describe the fence.i instruction as following:
> >
> > The FENCE.I instruction is used to synchronize the instruction and data streams. RISC-V does not guarantee that stores to instruction memory will be made visible to instruction fetches on a RISC-V hart until that hart executes a FENCE.I instruction. A FENCE.I instruction ensures that a subsequent instruction fetch on a RISC-V hart will see any previous data stores already visible to the same RISC-V hart. FENCE.I does not ensure that other RISC-V harts' instruction fetches will observe the local hart’s stores in a multiprocessor system.
> >
> > From this description, fence.i instruction only applies to local core, making instruction fetch can see any previous data stores on the same core.
>
> Not on the same core, it is said: "A FENCE.I instruction ensures that a subsequent instruction fetch on a RISC-V hart will see any previous data stores already visible to the same RISC-V hart".
>
> In other words, any store that is in the dcache of core0 should be seen by the instruction fetcher of any other core right? Since any core should be able to see what is in the other core's dcache right (ie the dcaches are coherent)? If your instruction fetcher on the other cores does not see the data, a simple memory barrier on core0 should make it visible, no need to flush the core0 dcache.

The commit [1] author (Alexandre Ghiti) thinks that "if your instruction fetcher on the other cores does not see the data, a simple memory barrier on core0 should make it visible, no need to flush the core0 dcache." We, however, think that a simple memory barrier is not enough: core 0 needs to execute a fence.i to flush its dcache, and only then can the instruction fetchers of the other cores see the data. Could you help us confirm which understanding is correct?

cc @palmer-dabbelt the commit co-author

[1] torvalds/linux@01261e2

Thanks
Huaqi

gfavor · 2024-07-18T19:14:08Z

> The commit [1] author (Alexandre Ghiti) thinks that "if your instruction fetcher on the other cores does not see the data, a simple memory barrier on core0 should make it visible, no need to flush the core0 dcache."

A memory barrier (aka FENCE) does not make specified prior memory accesses globally visible. It only ensures an order in which preceding and following memory accesses eventually become globally visible. And FENCE instructions do not cause flushing actions on caches.

> but we thought we need to do a fence.i to flush the core0 dcache not just a simple memory barrier,

FENCE.I does not cause flushing actions on DCaches. It only establishes consistency between the I side of a core and its DCache.

Greg

fanghuaqi · 2024-07-19T03:38:56Z

Hi Greg, thanks for your reply, we will continue to discuss with the patch author.

fanghuaqi changed the title ~~Will fence.i followed by fence will make instruction fetch get the same instruction memory in a multicore system~~ Will fence and then fence.i make instruction fetch get the same instruction memory in a multicore system Jul 18, 2024

fanghuaqi changed the title ~~Will fence and then fence.i make instruction fetch get the same instruction memory in a multicore system~~ Will core 0 fence and then core 1 do fence.i make instruction fetch get the same instruction memory in a multicore system Jul 18, 2024
