Skip to content

Add type instructions and encodings as a separate schema #686

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

AFOliveira
Copy link
Collaborator

Add instruction “type” schema & link sub‐types (closes #676)

  • inst_schema.json:
    • Introduce type object under encoding with a $ref to external inst_type/...#.yaml files
  • inst_type_schema.json:
    • Import of encoding schema for defining RV32/RV64 encodings for instruction types
  • Lays groundwork for sub‐types in R/I/B/S/U formats as discussed in Add sub-types to existing instruction formats #655

…ific encoding instead of pre-defining an encoding in the yaml file.

Signed-off-by: Afonso Oliveira <[email protected]>
…fic encoding instead of pre-defining an encoding in the instruction yaml file.

Signed-off-by: Afonso Oliveira <[email protected]>
Copy link
Collaborator

@dhower-qc dhower-qc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need sub_type?

@AFOliveira
Copy link
Collaborator Author

Do we also need sub_type?

I though you commented against sub_type in #655 (comment). However, now I see @ThinkOpenly proposed format and sub-type. However, with the URI path of /type/sub-type, isn't it redundant to have both?

Copy link
Collaborator

@ThinkOpenly ThinkOpenly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question: is the purpose of this PR to define a schema just for subtypes, or for both instruction types and subtypes?

I think we need to distinctly define the top-level types that are well-defined in the spec, and additionally define subtypes (that aren't well-defined in the spec).

"$ref": {
"type": "string",
"format": "uri-reference",
"pattern": "^inst_type/[A-Z]/[a-zA-Z0-9_]+\\.yaml#.*$",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "pattern" won't accommodate "R4-type" as a top-level format, and we could consider a more restrictive naming scheme for the "$ref", such as "at least starts with a capital letter":

"pattern": "^inst_type/[A-Z]+[a-zA-Z0-9_]*]/[A-Z]+[a-zA-Z0-9_]*\\.yaml#.*$",

(But, we're inventing that convention.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the last thing to do, do we really want it to start with inst_type? where will this files stay within UDB?

@@ -190,6 +203,25 @@
"$ref": "schema_defs.json#/$defs/requires_entry",
"description": "Extension(s) that defines the instruction"
},
"type": {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuing my type/subtype thought, this field would be for the top-level "type", and we'd add an additional field, "subtype", which defines the exact type of the instruction encoding.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then type wouldn't be any reference, but just a string? (i.e. "I")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a reference to generic definitions of the top-level instruction formats.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to explicitly mention "type", given that subtype presumably points to its parent?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have resolved this too early. It would be convenient to have the top-level type here. One-stop shopping for anyone looking at the instruction, rather than having to look at every subtype to find the type and make sure they all agree. We could add validation that everything matches, too.

There currently isn't a pointer from subtype to parent. That's probably a good add, and offers the ability to do additional validation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that having a parent in the sub-type if we did not describe the main type in the instructions, otherwise how can it not be redundant information?

Copy link
Collaborator

@ThinkOpenly ThinkOpenly May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that having a parent in the sub-type if we did not describe the main type in the instructions, otherwise how can it not be redundant information?

I'm not following. I propose having some indication of the top-level type in the instruction and in each subtype. (They do not necessarily need to be the same thing, although they could be.) In this way, both have "full information" at hand to facilitate validation, rather than needing an additional level of indirection in order to determine the top-level format.

Is that redundant? Yes, but it allows each to be standalone (for some definition of that term).

If you remove the top-level type from the instruction, you'd need to look at every subtype (usually one, but sometimes 2 [RV32, RV64]) to find the top-level type, and then make sure they all agree.

If you remove the top-level type from the subtype, you'd need to go find all of the instructions that use the subtype to find the top-level type, and then make sure they all agree.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But isn't what you are proposing precisely the opposite of what was decided on the definedBy field? In which we removed the B extension that implied Zbx extensions? I notice that the relationship between extensions and types is not the same, but they close enough to me in this terms. I don't have any strong opinion on this, it just seems counter-intuitive and against the previous line of thought

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see definedBy differently. That attribute a fairly static description of precisely which extension(s) actually define the instruction, rather than just include it -- which would be somewhat dynamic. It would be an ongoing maintenance effort (nightmare?) as new umbrella extensions are added to go through the entire extensions hierarchy and add that new extension to the definedBy for newly included instructions. I was trying to avoid that error-prone update requirement.

For types/subtypes, that is absolutely static. So, the convenience/cost ratio of any redundancy is pretty high. I described the pros/cons above.

Is your preference to only define the top-level type in subtypes, and not in the instruction itself?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with that distinction between definedBy and type — the comparison makes total sense. I also don’t think the choice is hard to reverse, so let’s go with your approach for now, and if it makes more sense to switch later, that’s totally fine.

….json since now it is common to multiple files

Signed-off-by: Afonso Oliveira <[email protected]>
…l parameter of inst_data

Signed-off-by: Afonso Oliveira <[email protected]>
Copy link
Collaborator

@ThinkOpenly ThinkOpenly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we move this along. Seems like an awesome start.

@ThinkOpenly
Copy link
Collaborator

I suggest we move this along. Seems like an awesome start.

After "field" moves from "inst_schema" to "schema_defs", of course.

@AFOliveira
Copy link
Collaborator Author

I know I am delaying @Unique-Usman work, I'm just super busy, I'll try to either fix this today or monday

@dhower-qc
Copy link
Collaborator

@AFOliveira, if you OK with I can finish this one up.

@AFOliveira
Copy link
Collaborator Author

@AFOliveira, if you OK with I can finish this one up.

Be my guest :) Thanks!

@dhower-qc
Copy link
Collaborator

I still need to update the Ruby classes, but I pushed the schema updates for review.

format:
type: { "$ref": "inst_type/R.yaml#" }
subtype: { "$ref": "inst_subtype/R/R-x.yaml#" }
encoding: 0000100----------000-----0111011
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to keep this? This might be a longer discussion.
Is there a way to parameterize the type and/or subtype such that the "opcodes" that are known to the instruction can be associated with the opcode fields in the type/subtype, rather than by extraction from this "encoding" bit string? Put differently, can we put here "this is an R-type, R-x-subtype instruction where funct7=0b0000100, funct3=0b000, opcode=0b0111011"?

And eventually, perhaps: "funct7=ADD.UW, funct3=ADD.UW, opcode=OP-32", or something like that, to match the spec?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. Now that we are enumerating all the opcodes, we can assign them individually. One question is whether we want to go the $inherits route or keep it as a $ref for the subtype. $inherits is probably a better human description, and lets us use JSON Schema validation on the resolved file. The downside is that we are only using $inherits right now for profiles, so this would make that custom-to-UDB operator pervasive, which some might not like.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here's what it could look like with $inherits:

format:
  $inherits: "inst_subtype/R/R-x.yaml#"
  opcodes:
    funct7:
      display_name: ADD.UW
      value: 0b0000100
    funct3:
      display_name: ADD.UW
      value: 0b000
    opcode:
      display_name: OP-32
      value: 0b0111011

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Significantly better.

However, for at least the "opcode" values, these are shared among potentially many instructions. Should we / can we create a database for them? I guess that would be realized with another $ref?

opcode: { "$ref": "opcodes/OP-32.yaml#" }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We support multiple inheritance, so it could look like this:

format:
  $inherits:
    - inst_subtype/R/R-x.yaml#
    - inst_opcodes/OP-32.yaml#
  opcodes:
    funct7:
      display_name: ADD.UW
      value: 0b0000100
    funct3:
      display_name: ADD.UW
      value: 0b000

Comment on lines +7 to +8
data:
location: 14-12
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortuntely, "funct3" is not always at this location. It depends on the format. For example, CI, CSS, CIW, CL, CS all have a "funct3" at 15-13.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you prefer we just drop the indirection or call it something like R-funct3?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about something like this?

  • arch/inst_type/R.yaml (unchanged)
  • arch/inst_subtype/R-x.yaml:
    data:
      type: { "$ref": "inst_type/R.yaml#" }
      subtype: { "$ref": "inst_subtype/R/R-x.yaml#" }
      $inherits:
        - inst_type/R.yaml#/opcodes
        - inst_var/xs2.yaml#/data
        - inst_var/xs1.yaml#/data
        - inst_var/xd.yaml#/data
  • arch/inst/Zba/add.uw.yaml:
    format:
      $inherits: inst_subtype/R/R-x.yaml#
      opcodes:
        funct7:
          display_name: ADD.UW
          value: 0b0000100
        funct3:
          display_name: ADD.UW
          value: 0b000
        opcode: { $inherits: inst_opcode/OP-32.yaml#/data }
  • arch/inst_var/xs2.yaml:
    data:
      $inherits: inst_type/R.yaml#/variables/rs2
      type: { "$ref": "inst_var_type/x_src_reg.yaml#" }

etc.

So, R-x subtype inherits opcodes from the base type, R.
And xs2 inherits attributes (just "location") from the base type R's rs2.
(I removed the top-level "inherits: OP-32" from "add.uw.yaml". Was that needed?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add type to instruction schema
3 participants