Add script to identify instructions type based on its arguments. #453

AFOliveira · 2025-02-03T15:47:34Z

Based on last meeting, I created a script that can identify instructions type (e.g. I-type, R-type) based on what arguments each instruction has. I have a couple questions:

Where should this sit at? Probably at backends?
To move forward I would need to play a bit with the erb templates and ruby, but I can't seem to figure where the json.dump comes from.

RV64::
+
[wavedrom, ,svg,subs='attributes',width="100%"]
....
<%= JSON.dump inst.wavedrom_desc(64) %>
....
====

Right now, outputs for I extensions are:

arch/inst/I/slli.yaml: Instruction type: Unknown
arch/inst/I/ld.yaml: Instruction type: I-Type
arch/inst/I/sllw.yaml: Instruction type: R-Type
arch/inst/I/ecall.yaml: Instruction type: Unknown
arch/inst/I/lw.yaml: Instruction type: I-Type
arch/inst/I/ori.yaml: Instruction type: I-Type
arch/inst/I/addw.yaml: Instruction type: R-Type
arch/inst/I/auipc.yaml: Instruction type: U-Type
arch/inst/I/srliw.yaml: Instruction type: R-Type
arch/inst/I/sb.yaml: Instruction type: S-Type
arch/inst/I/sltiu.yaml: Instruction type: I-Type
arch/inst/I/sraw.yaml: Instruction type: R-Type
arch/inst/I/slt.yaml: Instruction type: R-Type
arch/inst/I/bgeu.yaml: Instruction type: B-Type
arch/inst/I/addiw.yaml: Instruction type: I-Type
arch/inst/I/srlw.yaml: Instruction type: R-Type
arch/inst/I/addi.yaml: Instruction type: I-Type
arch/inst/I/lbu.yaml: Instruction type: I-Type
arch/inst/I/blt.yaml: Instruction type: B-Type
arch/inst/I/xor.yaml: Instruction type: R-Type
arch/inst/I/mret.yaml: Instruction type: Unknown
arch/inst/I/srl.yaml: Instruction type: R-Type
arch/inst/I/lb.yaml: Instruction type: I-Type
arch/inst/I/add.yaml: Instruction type: R-Type
arch/inst/I/andi.yaml: Instruction type: I-Type
arch/inst/I/srli.yaml: Instruction type: Unknown
arch/inst/I/sd.yaml: Instruction type: S-Type
arch/inst/I/bltu.yaml: Instruction type: B-Type
arch/inst/I/lui.yaml: Instruction type: U-Type
arch/inst/I/sh.yaml: Instruction type: S-Type
arch/inst/I/slliw.yaml: Instruction type: R-Type
arch/inst/I/slti.yaml: Instruction type: I-Type
arch/inst/I/beq.yaml: Instruction type: B-Type
arch/inst/I/jalr.yaml: Instruction type: I-Type
arch/inst/I/sra.yaml: Instruction type: R-Type
arch/inst/I/sw.yaml: Instruction type: S-Type
arch/inst/I/xori.yaml: Instruction type: I-Type
arch/inst/I/subw.yaml: Instruction type: R-Type
arch/inst/I/lhu.yaml: Instruction type: I-Type
arch/inst/I/sltu.yaml: Instruction type: R-Type
arch/inst/I/fence.yaml: Instruction type: Unknown
arch/inst/I/bge.yaml: Instruction type: B-Type
arch/inst/I/srai.yaml: Instruction type: Unknown
arch/inst/I/wfi.yaml: Instruction type: Unknown
arch/inst/I/or.yaml: Instruction type: R-Type
arch/inst/I/ebreak.yaml: Instruction type: Unknown
arch/inst/I/jal.yaml: Instruction type: J-Type
arch/inst/I/lh.yaml: Instruction type: I-Type
arch/inst/I/sub.yaml: Instruction type: R-Type
arch/inst/I/sraiw.yaml: Instruction type: R-Type
arch/inst/I/bne.yaml: Instruction type: B-Type
arch/inst/I/lwu.yaml: Instruction type: I-Type
arch/inst/I/sll.yaml: Instruction type: R-Type
arch/inst/I/and.yaml: Instruction type: R-Type

Signed-off-by: Afonso Oliveira <[email protected]>

ThinkOpenly · 2025-02-03T19:04:24Z

@AFOliveira, could you describe how you see this being used?

One of my motivations in including instruction format information was to be able to automatically validate that the defined instructions actually comply with their specified formats.

Arguably, this script does the converse: determines the format based on the instruction definition. Would we use this to ease the burden of seeding the YAML files with their respective formats, mostly as a one-off?

I can see adding a new enum-type field to the instructions, with a set of defined formats, ideally limited by the bits in the encoding (or maybe just CI validated to match that). And, further CI validation that the fields match the format.

Is that where we are heading?

AFOliveira · 2025-02-04T09:57:16Z

@AFOliveira, could you describe how you see this being used?

@ThinkOpenly Main point of this PR is to continue the discussion we had on the meeting. It can definitely be a one time writing "helper", but we need to agree on how to represent this. I suggest type : I , if that is fine I can put that in the schema and generate all the yaml replacement for that.

How we will use this in the future: first and probably easier use case is instruction documentation, i.e. when we generate a instruction, we can have "hints" for what each hardocoded field means. The example below is super specific to instruction, but even something as "opcode", "funct7", "funct3" would be great.

Then sure, I can also create the "inverse" script and do the validation based on instruction type, that should be quite smooth after we have the types merged in the yaml.

AFOliveira · 2025-02-04T18:27:43Z

@ThinkOpenly @dhower-qc If you can please review the first extensions mock-up it would be great! The only odd thing is that yaml libraries seem to prefer to have "" on the match strings. This does not affect the data, so I believe it won't be a problem. I can also filter them out if you prefer.

arch/inst/I/srliw.yaml

ThinkOpenly · 2025-02-04T22:21:34Z

arch/inst/I/addi.yaml

 description: Add an immediate to the value in rs1, and store the result in rd
 definedBy: I
 assembly: xd, xs1, imm
 encoding:
-  match: -----------------000-----0010011
+  match: "-----------------000-----0010011"


so the quotes are only needed in some places and not everywhere?

The quotes serve to force the field to be a string, I don't think we need them, but there is also no reason not to have them. Aditionally, all python parsers seem to place them, so I would need to filter them out in some ad-hoc way.

Yea, the quotes are (mostly) optional here. YAML type deduction is a suggested part of the spec. Most parsers do it in a predictable way but occasionally you run into issues. There are cases where the quotes are always needed (e.g., if match doesn't contain any '-', so that it's not interpreted as a number).

This highlights the tradeoff between YAML and JSON as the storage format; JSON is unambiguous but has lots of extra quotes, doesn't allow comments, etc. YAML is easier to read and format, but is actually an ambiguous spec when it comes to types. Paring YAML with JSON schema was my attempt to get the best of both.

Given that explanation (thanks, @dhower-qc!), the potential for ambiguity without quotes, and the known need for quotes in some cases, we should probably put quotes everywhere (at last for the "match" field, that is) for consistency. This is obviously not a high priority.

Yes, a pre-commit hook is formatting the yaml files with a tool prettier. The behavior of quoting could be changed if needed https://prettier.io/docs/options/#quote-props

My assumption with the existing default setting as-needed is if the match value was all digits, the author would put it in quotes and that would remain unchanged by prettier.

Presumably the check-jsonschema pre-commit hook would fail if prettier managed to format a string into a number of vice versa.

The case we don't want it prettier fighting with check-jsonchema.

Can we then run prettier on all instructions to get them all sorted and we solve it entirely?

pre-commit run --all-files prettier but I just ran it on your branch and it left quotes as is.

But doesn't it apply the changes to all files? Just so we don't run in this type of changes in commits like mine and already have it all sorted.

If you run that command with -v verbose you'll see it processing all the file but there are no changes:

[email protected]:~/rvi/repos/riscv-software-src/riscv-unified-db on  main [!] via  v22.13.1 via 🐍 v3.12.4 via 💎 v3.2.3 ❯ pre-commit run -v --all-files prettier | head prettier.................................................................Passed - hook id: prettier - duration: 1.4s arch/inst/B/slli.uw.yaml 14ms (unchanged) cfgs/qc_iu/arch_overlay/inst/Xqci/qc.insbprh.yaml 4ms (unchanged) arch/inst/F/fmaxm.s.yaml 6ms (unchanged) arch/inst/V/vmin.vv.yaml 4ms (unchanged) arch/inst/V/vluxseg3ei32.v.yaml 3ms (unchanged) arch/inst/V/vdivu.vx.yaml 5ms (unchanged)

I don't understand why prettier put the quotes in for addi.yaml to begin with.

I also can't find anything in prettier docs that would lead me to believe it can figure out as-needed for these files. I don't think is it capable of accessing the #yaml-language-server comment at the top that references the schema file, but if it could, it would not "need" to add quotes.

The quotes are actually being added by the python yaml package that I am using to parse the yamls, if we decide on removing them, I'll simply clean all the outputs once I'm done with adding all instruction formats.

ThinkOpenly · 2025-02-04T22:22:08Z

arch/inst/I/addiw.yaml

@@ -4,12 +4,15 @@ $schema: "inst_schema.json#"
 kind: instruction
 name: addiw
 long_name: Add immediate word
-description: Add an immediate to the 32-bit value in rs1, and store the sign extended result in rd
+type: I-Type
+description:


Showing my ignorance here, but does this need the | at the end like other multiline field values? This occurs elsewhere as well.

Without the |, this still gets interpreted as a string since there is no empty line in the description text. | (or >) is required if there were an empty line. Also, I believe that with no marker, the string is interpreted using something closer to the > (folded) block style that keeps around extra whitespace.

In general, I think it's going to be best practice to consistently use | here.

Yes, this was done by the pre-commit and I did not check it afterwards, hence the issue. I'll fix it on next commit and pay more attention

Here's prettier cfg for setting that made this change https://prettier.io/docs/options/#print-width

@kbroch-rivosinc can we set prettier to have the | character when it does this specific behaviour?

Without the |, this still gets interpreted as a string since there is no empty line in the description text. | (or >) is required if there were an empty line. Also, I believe that with no marker, the string is interpreted using something closer to the > (folded) block style that keeps around extra whitespace.

My reading of the YAML spec is that | denotes a "literal block scalar" that preserves all linebreaks. > denotes a "folded block scalar" (as you said), but this can fold lines where two non-space characters are separated by a single space.

Here, if we want that long line to be "preserved" when the YAML is processed, we'd want to use >.

There's probably grounds here to open an issue, right? sS we can have a discussion with a slighly larger group, and maybe migrate all multi-line descriptions into >

ThinkOpenly · 2025-02-04T22:25:08Z

arch/inst/I/ebreak.yaml

@@ -4,6 +4,7 @@ $schema: "inst_schema.json#"
 kind: instruction
 name: ebreak
 long_name: Breakpoint exception
+type: Unknown


Shall we just manually set this to "I-type"?

I'm not sure, is it? I found some cases like this one in which I could not find information on what type the instruction actually is. Moreover, the fences have the same problem

Yea, this is an interesting one. It has the same general structure as an I-type, but the immediate is actually an opcode. I think this is a different (unnamed in the spec) instruction format.

Pulling out the spec... ;-)

Section 2.8:

SYSTEM instructions are used to access system functionality that might require privileged access and are encoded using the I-type instruction format.

EBREAK is a SYSTEM instruction.

fences have the same problem

Fences are arguably I-type as well, given their 4-4-4-5-3-5-7 field layout only matches I-type's 12-5-3-5-7.

Although, I'm not saying that the application of formats in the spec is not sloppy at times. If we have a reasonably supportable case for reclassifying some instructions into new formats (or sub-formats?), we can take that forward for discussion to the spec owners. If we think that's a possibility, maybe we put a mark on the wall now, in the form of an issue here.

Any idea on how to define those as for type? Should I insert "form: System"?

The snippet from section 2.8 is saying that these are "I-type" format.

ThinkOpenly · 2025-02-04T22:25:44Z

arch/inst/I/ecall.yaml

@@ -4,6 +4,7 @@ $schema: "inst_schema.json#"
 kind: instruction
 name: ecall
 long_name: Environment call
+type: Unknown


ECALL is also a SYSTEM instruction.

schemas/inst_schema.json

ThinkOpenly · 2025-02-04T22:48:18Z

schemas/inst_schema.json

@@ -182,6 +180,19 @@
          "type": "string",
          "description": "Short description of the instruction"
        },
+        "type": {


If we hew closely to the ISA spec, the field name should probably be "format", especially if we carry the "-type" suffix into every format name, below.

The spec calls these, generally, "instruction formats", but calls each instance "X-type". (I presume there is some logic to this.)

Should we make this field required, or does that just break everything until all of the instructions have a "format" ?

I can make it required, but then this PR will have to put a type or format into all of the instructions, which actually seems feasible.

That's what I suspected. I think we're headed that way, anyway. Do you need help?

(If we choose not to do it now, let's create a reminder issue to "make the 'format' field required".)

Yes, we can sort it all out in this PR! I'll try to see how much of it I can automate (my guess is most of it) and then I'll need help for reviewing the whole thing and handling weirder types such as system calls and fences, is that alright?

Works for me. As for SYSTEM and FENCE instructions, they should all be I-type, I believe.

There is a problem with this types being all I-types. How shall we do verification on them? Some of them will be all hardcoded and there is still a way to check if all bits still align; however, I had a vision of more profound verification. Moreover, if we want sub-fields appearing in instruction documentation, we would need to specify the sub-types of all instructions, otherwise we will never get the correct "caption" on every field. For the first approach, I think I won't delve in this, but what if we define all sub-type of each type in a next PR? The ISA already specifies them as specilization of the main type. so actually giving them names shouldn't be a big problem. We can see how to do this here in the UDB and if it makes sense, this would be a great discussion to have with the ISA commitee at it's time, right?

Yeah, having all the fixed bits hardcoded in a "match" string for each instruction doesn't allow for that sort of field-level verification where the field value (or part of the field value!) is fixed. I avoided this in my Sail-to-JSON backend and left everything as individual fields, for good or bad. It's certainly handy to have a "match" string, but it's also pretty easy to compute a match string based on the individual field representations.

The more I look at this, the more I think we should probably just have all instructions as all fields described. This would only be possible with accurate instruction types and fields, that would precisely describe what every bit of every instruction is.

schemas/inst_schema.json

dhower-qc · 2025-02-05T11:37:08Z

@ThinkOpenly @dhower-qc If you can please review the first extensions mock-up it would be great! The only odd thing is that yaml libraries seem to prefer to have "" on the match strings. This does not affect the data, so I believe it won't be a problem. I can also filter them out if you prefer.

I'll try to get this this later today. I have some thoughts after a quick pass, but it will take me a bit to organize them.

Signed-off-by: Afonso Oliveira <[email protected]>

…al/ Zabha/ Zalasr/ Zawrs/ Zbkb/ Zbkx/ Zfbfmin/ Zfh/ Zicbom/ Zicboz/ Zicfilp/ Zicfiss/ Zicond/ Zicsr/ Zifencei/ Zimop/ Zk Zks/ Signed-off-by: Afonso Oliveira <[email protected]>

AFOliveira · 2025-02-10T14:55:18Z

@ThinkOpenly All but Compressed and Vector instructions should be now done. Also updated the schema.

Signed-off-by: Afonso Oliveira <[email protected]>

dhower-qc · 2025-02-10T16:07:22Z

Warning: I'm going to blow this PR up 💣

I'm wondering if we can align this with the goals of #285, which are more expansive than identifying the "format" of an instruction.

As the discussion here highlights, "format" is too ambiguous to truly capture an instruction type (e.g., ecall is "I-type" even though it has no immediate). For each field in an encoding, we need to identify:

Location
Type (opcode/variable)
- Recall that for opcodes, we want to capture the opcode field name (e.g., 'opcode', 'funct3', etc.)
- For variables, we want to capture the name and traits (e.g., 'signed', 'shift', 'not')
Relocation, if applicable

With this in mind, "format" actually maps to multiple combinations of the above.

There are many ways to represent this. A two initial options for discussion:

Option 1 - insts point to specific format

# ld.yaml
name: ld
# ...
encoding:
  match: -----------------011-----0000011
  format: { $ref: iformats/itype_R_RISCV_PCREL_LO12_I.yaml# }

# iformats/itype_R_RISCV_PCREL_LO12_I.yaml
common_name: I-type
fields:
  - name: imm
    location: 31-20
    type: signed immediate
    relocation: { $ref: relocations/R_RISCV_PCREL_LO12_I.yaml# }
  - name: xs1
    location: 19-15
    type: xregister
  - name: xd
    location: 11-7
    type: xregister
  - name: opcode
    location: 6-0
    type: opcode
  - name: funct3
    location: 14-12
    type: opcode

Option 2 - insts pick their fields

# ld.yaml
name: ld
# ...
encoding:
  match: -----------------011-----0000011
  format: I-type # still needed, unless we have a map of "common name" to field list elsewhere
  fields:
    - { $ref: ifields/itype_pc_rel_imm.yaml# }
    - { $ref: ifields/xs1.yaml# }
    - { $ref: ifields/xd.yaml# }
    - { $ref: ifields/major_opcode.yaml# }
    - { $ref: ifields/funct3.yaml# }

---
# ifields/itype_pc_rel_imm.yaml
- name: imm
  location: 31-20
  type: signed immediate
  relocation: { $ref: relocations/R_RISCV_PCREL_LO12_I.yaml# }

---
# ifields/xs1.yaml
- name: xs1
  location: 19-15
  type: xregister

---
# ifields/xd.yaml#
- name: xd
  location: 11-7
  type: xregister

---
# ifields/major_opcode.yam
- name: opcode
  location: 6-0
  type: opcode

--
# ifields/funct3.yaml
- name: funct3
  location: 14-12
  type: opcode

Option 1 winds up with more "iformats" since you need a new format for every variation. Option avoids that problem, but doesn't completely remove data duplication in the files since you will have a lot of instructions with the same field list. FWIW, Option 2 gives a little more information to the human reader.

@drom might have opinions.

ThinkOpenly · 2025-02-10T19:36:32Z

I'm in the "Option 1" camp. If we think that formats are important (and the ISA spec does treat them as important), then every instruction must comply with a format, even with idiosyncrasies. We should not give instructions the freedom to "build their own" format with a list of fields that may or may not be part of the format but the layout works because "everything fits". (This is what Sail does today, and I think we have a nice opportunity to improve significantly on that.)

I like the compartmentalization to be at a high enough level that the format definitions could be hardcoded in the database. So, instead of iformats/itype_R_RISCV_PCREL_LO12_I.yaml, the "itype" part would be its own directory: iformats/itype/R_RISCV_PCREL_LO12_I.yaml.

ThinkOpenly · 2025-02-10T19:48:11Z

I also think we should still have the format as a required field for each instruction. This can be used to validate the instruction definition matches the layout of the fields, and it eases displaying the instruction format as part of the documentation/web reference.

I didn't review all 415(!) files changed here, and it's an open question as to whether we need to retain type.py (maybe not), but I like the changes to the YAML files and would like to see those changes merged, at least after we also get the vector instructions covered.

dhower-qc · 2025-02-10T20:05:04Z

I like the compartmentalization to be at a high enough level that the format definitions could be hardcoded in the database. So, instead of iformats/itype_R_RISCV_PCREL_LO12_I.yaml, the "itype" part would be its own directory: iformats/itype/R_RISCV_PCREL_LO12_I.yaml.

I like that.

I also think we should still have the format as a required field for each instruction.

Agreed.

AFOliveira · 2025-02-11T09:31:31Z

I also prefer option 1; however, I'm still not sure how this would look. It seems to me that the task is going to be defining every subset of each instruction type and defining what fields it has. Moreover, how do we identify the fields hardcoded? Because sometimes it is the same type, but the field actually means different things, right? Take fence as an example, it is I-type but the imm section is not an immediate at all.

Should we define a subtype for each meaning of the fields? I.e. this would be

  # fence.yaml
    - format: I-type
      subformat: { $ref: iformats/itype_fence.yaml# }

and then for the fields, how do they get hardcoded? Can we specify them? And then which ones would we specify and which ones we wouldn't? Probably should do for all, even for opcodes

#iformats/itype_fence.yaml

  - name: imm
    location: 31-20
    type: signed immediate
    value: 0x0

dhower-qc · 2025-02-11T15:57:48Z

OK, let's go with Option 1.

I was thinking that formats never have hardcoded values (though they might include excluded values with not:, as you see frequently in the C extension). You can use the mask for values. For your fence example, I think it would like:

# inst/I/fence.yaml
name: fence
encoding: 0000-------------000-----0001111
format: { $ref: iformats/itype/fence.yaml# }

# inst/I/fence.tso.yaml
name: fence
encoding: 0001-------------000-----0001111
format: { $ref: iformats/itype/fence.yaml# }

# iformats/itype/fence.yaml
common_name: I-type  # this is mostly informative, since it's too ambiguous to mean much
fields:
  - name: fm
    location: 31-28
    type: opcode
  - name: pi
    location: 27
    type: fence set
  # ...

AFOliveira · 2025-02-11T16:12:48Z

Next question is: is this going to be a requirement to have this PR upstream? In theory, having instruction types on their own would already be a nice increment to what we have now and allow for some more verification; however, proposed changes aren't really minimal, and I don't think there is a way to automate, so we first need to figure how many different subtypes we have and describe them all - and I'll have a really hard/long time doing that on my own :)

dhower-qc · 2025-02-11T16:34:52Z

Next question is: is this going to be a requirement to have this PR upstream? In theory, having instruction types on their own would already be a nice increment to what we have now and allow for some more verification; however, proposed changes aren't really minimal, and I don't think there is a way to automate, so we first need to figure how many different subtypes we have and describe them all - and I'll have a really hard/long time doing that on my own :)

Yes, this is not a small task. And it's hard to see a "transition" period where we would support old and new schemas. If you can get things started, and get a better idea of the scope of work, then I can work on finding some help.

ThinkOpenly · 2025-02-11T19:15:01Z

Next question is: is this going to be a requirement to have this PR upstream? In theory, having instruction types on their own would already be a nice increment to what we have now and allow for some more verification; however, proposed changes aren't really minimal, and I don't think there is a way to automate, so we first need to figure how many different subtypes we have and describe them all - and I'll have a really hard/long time doing that on my own :)

Yes, this is not a small task. And it's hard to see a "transition" period where we would support old and new schemas. If you can get things started, and get a better idea of the scope of work, then I can work on finding some help.

If I understand the semantics of $ref correctly, it just gets replaced with the target contents. So, the example above:

# ld.yaml
name: ld
# ...
encoding:
  match: -----------------011-----0000011
  format: { $ref: iformats/itype_R_RISCV_PCREL_LO12_I.yaml# }

Is exactly the same and can be implemented as:

# ld.yaml
name: ld
# ...
encoding:
  match: -----------------011-----0000011
  format:
    common_name: I-type
    fields:
      - name: imm
        location: 31-20
...

Is that correct?

If so, it might be an "easier" task to first convert everything from the current format to the second format, above, meaning change this:

encoding:
  match: -----------------000-----0001111
  variables:
    - name: fm
      location: 31-28
    - name: pred
      location: 27-24
    - name: succ
      location: 23-20
    - name: rs1
      location: 19-15
    - name: rd
      location: 11-7

to this:

encoding:
  match: -----------------000-----0001111
  format:
    common_name: I-type
    fields:
      - name: fm
        location: 31-28
      - name: pred
        location: 27-24
      - name: succ
        location: 23-20
      - name: rs1
        location: 19-15
      - name: rd
        location: 11-7

Then, we can coalesce common format representations (and add types and relocations) at will.

AFOliveira · 2025-02-11T19:38:26Z

@ThinkOpenly Yes, that was what I was asking. Can we first get all the formats upstream and only then open another PR on the next changes?

dhower-qc · 2025-02-11T19:43:53Z

I like this line of thinking, but I think there are a few pitfalls as proposed. For example, we'll need to know opcode vs. non-opcode so the generators know if the field is a variable for IDL code / wavedrom diagramming, etc. That said, we might be able to tolerate some missing info for a period if it helps.

I'm wondering just how many formats we really have. Thinking aloud here, this should cover G:

I-type, pc-relative offset
I-type, sign-extended immediate (no relocation)
I-type, CSR (no relocation) (arguably same as prior, but want a different name for the field)
I-type, CSRI (no relocation)
I-type, fence
I-type, opcode (ecall/ebreak)
R-type
R-type, FP rm (arguably same as prior, but want different name for 'funct3')
R-type, FP cvt/sqrt/class/mv
R-type, RV64 shifts
R-type, RV32 shifts
R-type, LR
R-type, SC / AMO
R4-type
U-type, pc-relative offset
U-type, immediate (no relocation)
J-type
S-type
B-type

That's 19. Say C adds another 10. A few for vector. And some misc here and there...

@AFOliveira , how many of these are your scripts already looking for?

AFOliveira · 2025-02-11T19:52:02Z

@AFOliveira , how many of these are your scripts already looking for?

I only have the base I-type, R-type etc... I'm afraid there are more than those 19, per example, I think V extension also adds a variety of unnamed types. There's also R-type FP in which RS2 is hardcoded. I think the hardest part here is having that definitive list, since the ISA is not super clear on them, after that I can see how much is actually automatable, but I would guess I could figure it out.

ThinkOpenly · 2025-02-11T20:03:17Z

This could benefit from iterations, I think. However, there is value in getting the basic types (6 for "I": R, I, S, B, U, J; and 9 for "C": CR, CI, CSS, CIW, CL, CS, CA, CB, CJ) added. These are the only types (AFAIK) defined in the ISA. Adding them will be non-controversial (if that is possible).

I worry about running too far ahead of the ISA in these kinds of canonical areas.

Other formats as described should probably be considered sub-types and non-canonical, and maybe wait for the next iteration.

Add script to figure instructions type based on its arguments.

6938420

Signed-off-by: Afonso Oliveira <[email protected]>

AFOliveira requested review from ThinkOpenly and james-ball-qualcomm February 3, 2025 15:47

ThinkOpenly requested changes Feb 4, 2025

View reviewed changes

Updated JSON schema to require format for all instructions

1ea910d

Signed-off-by: Afonso Oliveira <[email protected]>

AFOliveira force-pushed the AFOliveira/instructionType branch from 3a5c55a to 1ea910d Compare February 10, 2025 10:05

add type.py functionalities

94b3441

Signed-off-by: Afonso Oliveira <[email protected]>

AFOliveira force-pushed the AFOliveira/instructionType branch from 83f357e to 0451c11 Compare February 10, 2025 11:04

AFOliveira added 2 commits February 10, 2025 14:32

add type.py with all formats, but Vector

07547b0

Signed-off-by: Afonso Oliveira <[email protected]>

add format to A/ B D/ F/ H/ I/ M/ Q/ S Sdext/ Smdbltrp/ Smrnmi/ Svinv…

beeecc1

…al/ Zabha/ Zalasr/ Zawrs/ Zbkb/ Zbkx/ Zfbfmin/ Zfh/ Zicbom/ Zicboz/ Zicfilp/ Zicfiss/ Zicond/ Zicsr/ Zifencei/ Zimop/ Zk Zks/ Signed-off-by: Afonso Oliveira <[email protected]>

AFOliveira force-pushed the AFOliveira/instructionType branch from 11ab941 to beeecc1 Compare February 10, 2025 14:54

add type.py with all formats, but Vector

087153d

Signed-off-by: Afonso Oliveira <[email protected]>

Add script to identify instructions type based on its arguments. #453

Are you sure you want to change the base?

Add script to identify instructions type based on its arguments. #453

Conversation

AFOliveira commented Feb 3, 2025

ThinkOpenly commented Feb 3, 2025

AFOliveira commented Feb 4, 2025

AFOliveira commented Feb 4, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AFOliveira Feb 11, 2025 • edited Loading

Choose a reason for hiding this comment

dhower-qc commented Feb 5, 2025

AFOliveira commented Feb 10, 2025

dhower-qc commented Feb 10, 2025

Option 1 - insts point to specific format

Option 2 - insts pick their fields

ThinkOpenly commented Feb 10, 2025

ThinkOpenly commented Feb 10, 2025

dhower-qc commented Feb 10, 2025

AFOliveira commented Feb 11, 2025

dhower-qc commented Feb 11, 2025

AFOliveira commented Feb 11, 2025

dhower-qc commented Feb 11, 2025

ThinkOpenly commented Feb 11, 2025

AFOliveira commented Feb 11, 2025

dhower-qc commented Feb 11, 2025

AFOliveira commented Feb 11, 2025

ThinkOpenly commented Feb 11, 2025

AFOliveira Feb 11, 2025 •

edited

Loading