From 4c26154ed93a73049a7ccf65f952a1e486b54558 Mon Sep 17 00:00:00 2001 From: Yang Yujie Date: Mon, 5 Jun 2023 18:10:00 +0800 Subject: [PATCH] Release v2.10, based on the original LoongArch ELF ABI document --- CONTRIBUTING.md | 39 +++ Makefile | 24 ++ README.md | 91 ++++++ VERSION | 7 + la-abi.adoc | 37 +++ ladwarf.adoc | 108 ++++++++ laelf.adoc | 713 ++++++++++++++++++++++++++++++++++++++++++++++++ lapcs.adoc | 687 ++++++++++++++++++++++++++++++++++++++++++++++ 8 files changed, 1706 insertions(+) create mode 100644 CONTRIBUTING.md create mode 100644 Makefile create mode 100644 README.md create mode 100644 VERSION create mode 100644 la-abi.adoc create mode 100644 ladwarf.adoc create mode 100644 laelf.adoc create mode 100644 lapcs.adoc diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000..0e252a1 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,39 @@ +# Contributing to the Application Binary Interface + +## Thank you for taking the time to contribute! + +We accept bug fixes and feature proposals through two means: by filing an issue, +or by submitting a pull request (PR). + +## Create an issue + +You can create an issue at https://github.com/loongson/la-abi-specs/issues to +submit bug reports or make proposals. + +## Pull request + +To contribute fixes or improvements, you are welcome to submit a pull request +on https://github.com/loongson/la-abi-specs/pull. The workflow for submitting +a pull request is as follows: + +### Sign Contributor License Agreement (CLA) + +Contributors must sign CLA before their pull requests can be merged. Please +contact wuqingling@loongson.cn regarding how to sign the CLA. + +### Make the actual pull request + +Follow [Github pull requests documentation](https://docs.github.com/en/pull-requests) +to submit the PR. + +### Review of pull request + +Pull requests need to be reviewed before they can be merged. Anyone can review +the requests, but approval from at least one Loongson reviewer is required. 
+ +### Merging the change + +Pull request can be merged once the change has been reviewed properly, which +can only be done by one of the administrators. If your change hasn't been merged +for more than a week after it has been accepted, leave a comment on the pull +request. diff --git a/Makefile b/Makefile new file mode 100644 index 0000000..c32d468 --- /dev/null +++ b/Makefile @@ -0,0 +1,24 @@ +SRC = lapcs.adoc laelf.adoc ladwarf.adoc +PDF = la-abi.pdf + +PDF_THEME = themes/la-abi-pdf.yml + +.PHONY: all clean + +$(PDF): $(PDF:.pdf=.adoc) $(SRC) $(PDF_THEME) + asciidoctor-pdf \ + -a compress \ + -a date="$(DATE)" \ + -a monthyear="$(MONTHYEAR)" \ + -a pdf-style="$(PDF_THEME)" \ + -a pdf-fontsdir=fonts \ + -v \ + $< -o $@ + +html: $(patsubst %.adoc, %.html, $(SRC)) + +%.html: %.adoc + asciidoctor $^ -o $@ + +clean: + -rm -rf $(patsubst %.adoc, %.html, $(SRC)) diff --git a/README.md b/README.md new file mode 100644 index 0000000..94ac1bc --- /dev/null +++ b/README.md @@ -0,0 +1,91 @@ +# Application Binary Interface for the LoongArch™ Architecture + +This is the official documentation of the Application Binary Interface +for the LoongArch™ Architecture. + +## Releases + +The latest ABI documentation releases are available at +https://github.com/loongson/la-abi-specs and are licensed under the Creative +Commons Attribution-NonCommercial-NoDerivatives 4.0 International +(CC BY-NC-ND 4.0) License. + +## Defect reports + +Please report defects in or enhancements to the specifications in this folder to +the [issue tracker page on GitHub](https://github.com/loongson/la-abi-specs/issues). + +## List of documents + +specification | latest +--- | --- +Procedure Call Standard for the LoongArch™ Architecture | [lapcs](lapcs.adoc) +ELF for the LoongArch™ Architecture | [laelf](laelf.adoc) +DWARF for the LoongArch™ Architecture | [ladwarf](ladwarf.adoc) + +## Contributing + +Please refer to the contribution guidelines in [CONTRIBUTING](CONTRIBUTING.md). 
+ +## License + +The ABI documents and their source files are currently licensed under the +Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International +(CC BY-NC-ND 4.0) License. Contributions to these files are accepted under +the same license. + +To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ +or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA. + +## Revision History + +Legacy versions of the documents were released at the GitHub repository +[LoongArch-Documentation](https://github.com/loongson/LoongArch-Documentation). +These include versions 1.00, 2.00 and 2.01. + +All changes to the documents in a subsequent release should be declared in their +respective change history sections. Timestamps in the form `YYYYMMDD` should be used +for versioning of the individual documents in this folder, and a global version +number that corresponds to a combination of individual document versions will still +be assigned to every new release of this collection. + +This global version number will continue to follow the legacy versioning scheme, +where a change of the major version (currently 1 and 2) could potentially degrade +binary compatibility between objects conforming to these specifications, while a +change to the two-digit minor version signifies other bugfixes and improvements. + +Please note that we do not expect the major version to change at any time in the +foreseeable future, and the minor version may increase by more than 1 in a new public +release for project management purposes. + +- **v1.00** + + * Add register usage convention, data type conventions and the list of ELF relocation types. + +- **v2.00** + + * Add description of ILP32 data model. + * Add description of return value register aliases. + * Add relocation types with direct immediate-filling semantics. + * Add ABI version porting guidelines for toolchain implementations. + * Add link to SysV gABI documentation. 
+ * Adjust asciidoc code style. + +- **v2.01** + + * Adjust description of ABI type encoding scheme. + * Add header for all tables. + +- **v2.10** + + * Split the original psABI documentation (v2.01) into the `lapcs` and `laelf` documents. + * Add the *DWARF standard for the LoongArch™ architecture* (`ladwarf`) document. + * Differentiate machine data types with the C/C++ types. + * Clarify parameter passing rules for small `struct`s that contain both floating-point and integer members. + * Clarify parameter passing rules for `struct`s that contain zero-length arrays or bitfields. + +## I18n + +This specification is written in both English and Chinese. In the event of any +inconsistency between the same document version in two languages, the Chinese +version shall prevail. diff --git a/VERSION b/VERSION new file mode 100644 index 0000000..40077fa --- /dev/null +++ b/VERSION @@ -0,0 +1,7 @@ +Application Binary Interface for the LoongArch™ Architecture, version 2.10 + +List of documents: + +* Procedure Call Standard for the LoongArch™ Architecture, version 20230519 +* ELF for the LoongArch™ Architecture, version 20230519 +* DWARF for the LoongArch™ Architecture, version 20230425 diff --git a/la-abi.adoc b/la-abi.adoc new file mode 100644 index 0000000..2763609 --- /dev/null +++ b/la-abi.adoc @@ -0,0 +1,37 @@ += Application Binary Interface for the LoongArch™ Architecture +Version 2.10 +Copyright © Loongson Technology 2023. All rights reserved. +:toc: macro +:toclevels: 3 +:toctitle: +:doctype: article +:icons: font + +toc::[] + +== Preamble + +This is the official documentation of the Application Binary Interface +for the LoongArch™ Architecture. + +The latest ABI documentation releases are available at +https://github.com/loongson/la-abi-specs and are licensed under the Creative +Commons Attribution-NonCommercial-NoDerivatives 4.0 International +(CC BY-NC-ND 4.0) License. 
+ +To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ +or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA. + +This specification is written in both English and Chinese. In the event of any +inconsistency between the same document version in two languages, the Chinese +version shall prevail. + +:sectnums: +<<< +include::lapcs.adoc[] + +<<< +include::laelf.adoc[] + +<<< +include::ladwarf.adoc[] diff --git a/ladwarf.adoc b/ladwarf.adoc new file mode 100644 index 0000000..9d9b4d9 --- /dev/null +++ b/ladwarf.adoc @@ -0,0 +1,108 @@ += DWARF for the LoongArch™ Architecture +Version 20230425 + +Copyright © Loongson Technology 2023. All rights reserved. + +== Abstract + +This document describes the use of the DWARF debugging information format +in the Application Binary Interface (ABI) for the LoongArch architecture. + +== Keywords + +LoongArch, DWARF, Stack frame, CFA, CIE + +== Version History + +[%header,cols="^2,8"] +|==== +|Version +^|Description + +|20230425 +|initial version. +|==== + +== Overview + +The DWARF debugging format for LoongArch follows the _DWARF Standard_ <<dwarfstd>>. +This specification only describes LoongArch-specific definitions. + +== Terms and Abbreviations + +**DWARF** + +Debugging With Attributed Record Formats. + +== LoongArch-specific DWARF Definitions + +=== DWARF Register Numbers + +The DWARF Standard suggests that the mapping from a DWARF register name to a +target register number should be defined by the ABI for the target architecture. +DWARF register names are encoded as unsigned LEB128 integers. + +The table below lists the mapping from DWARF register numbers to LoongArch64 +registers. 
+ +.Mapping from DWARF register numbers to LoongArch64 registers +[%header,cols="^1,^1,^2"] +[width=80%] +|=== +| DWARF Register Number | LoongArch64 Register Name | Description + +| 0 - 31 | `$r0` - `$r31` | General-purpose Register +| 32 - 63 | `$f0` - `$f31` | Floating-point Register +| 64 - | | Reserved for future standard extensions +|=== + + +=== CFA (Canonical Frame Address) + +The term Canonical Frame Address (CFA) is defined in _DWARF Debugging Information Format Version 5_ <<dwarf5>>, §6.4 Call Frame Information. + +This ABI adopts the typical definition of `CFA` given there: + + the CFA is defined to be the value of the stack pointer at the call site in the + previous frame (which may be different from its value on entry to the current frame). + +The position of `CFA` in the LoongArch frame structure is shown below: + + | | + | previous frame | + |________________| + CFA----->|________________|<------ previous sp + |_______ra_______| + |_______fp_______| + | | + | current frame | + |________________| + |________________|<------ current sp + + +=== CIE (Common Information Entry) + +The `$r1` register is used to store the return address of the function, +and the value of the return address register field in the `CIE` structure is `1`. + +The default `CFA` register at the function entry is `$r3`, and the `initial_instructions` +field in the `CIE` structure can define `3` as the default `CFA` register. + +=== Call frame instructions + +This ABI uses the existing definitions in the DWARF Standard. + + +=== DWARF expression operations + +This ABI uses the existing definitions in the DWARF Standard. 
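The DWARF register-number mapping and the CIE values described in the sections above can be sketched in C as follows. This is a minimal illustration rather than part of the specification: the function name `la_dwarf_reg_name` and the two constants are hypothetical names introduced here, and the numeric values come directly from the mapping table and the CIE section.

```c
#include <stdio.h>

/* Values quoted from the CIE section; the constant names are hypothetical. */
enum {
    LA_DWARF_RA_REG  = 1, /* return address register field: $r1 */
    LA_DWARF_CFA_REG = 3  /* default CFA register at function entry: $r3 */
};

/*
 * Hypothetical helper: write the LoongArch64 register name that corresponds
 * to a DWARF register number into buf.
 * Returns 0 on success, or -1 for numbers 64 and above, which the mapping
 * table reserves for future standard extensions.
 */
int la_dwarf_reg_name(unsigned regno, char *buf, size_t len)
{
    if (regno <= 31) {
        /* 0 - 31: general-purpose registers $r0 - $r31 */
        snprintf(buf, len, "$r%u", regno);
        return 0;
    }
    if (regno <= 63) {
        /* 32 - 63: floating-point registers $f0 - $f31 */
        snprintf(buf, len, "$f%u", regno - 32);
        return 0;
    }
    return -1; /* reserved range */
}
```

A consumer such as an unwinder could use a helper like this only for diagnostics; the numeric register numbers remain the authoritative encoding.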
+ + +[bibliography] +== References + +* [[[dwarfstd]]] DWARF Standard, +https://dwarfstd.org/ + +* [[[dwarf5]]] DWARF Debugging Information Format Version 5, +https://dwarfstd.org/doc/DWARF5.pdf + + diff --git a/laelf.adoc b/laelf.adoc new file mode 100644 index 0000000..21b4329 --- /dev/null +++ b/laelf.adoc @@ -0,0 +1,713 @@ += ELF for the LoongArch™ Architecture +Version 20230519 + +Copyright © Loongson Technology 2023. All rights reserved. + +== Abstract + +This document describes the use of the ELF binary file format in the Application +Binary Interface (ABI) of the LoongArch Architecture. + +== Keywords + +LoongArch, ELF, ABI, SysV gABI, ELF header, Relocations + +== Version History + +[%header,cols="^2,8"] +|==== +|Version +^|Description + +|20230519 +|initial version, derived from the original __LoongArch ELF psABI__ document. +|==== + +== Introduction + +This specification provides the processor-specific definitions required by +ELF for LoongArch-based systems. + +All common ELF definitions referenced in this section +can be found in http://www.sco.com/developers/gabi/latest/contents.html[the latest SysV gABI specification]. + +== Terms and Abbreviations + +**ELF** + +Executable and Linking Format + +**SysV gABI** + +Generic System V Application Binary Interface + +**PC** + +Program Counter + +**GOT** + +Global Offset Table + +**PLT** + +Procedure Linkage Table + +**TLS** + +Thread-Local Storage + +== ELF Header +=== e_machine: Identifies the machine + +An object file conforming to this specification must have the value `EM_LOONGARCH (258, 0x102)`. + +=== e_flags: Identifies ABI type and version + +.ABI-related bits in `e_flags` +[%header,cols="^1,^1,^1,^1"] +|==== +| Bit 31 - 8 | Bit 7 - 6 | Bit 5 - 3 | Bit 2 - 0 +| (reserved) | ABI version | ABI extension | Base ABI Modifier +|==== + +The ABI type of an ELF object is uniquely identified by `EI_CLASS` and `e_flags[7:0]` in its header. 
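The `e_flags` bit layout in the table above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the `la_abi_fields` struct and the `la_decode_e_flags` function are hypothetical names introduced here, and only the bit positions are taken from the table.

```c
#include <stdint.h>

/* Hypothetical container for the ABI-related fields of e_flags. */
struct la_abi_fields {
    unsigned version;   /* e_flags[7:6], ABI version */
    unsigned extension; /* e_flags[5:3], ABI extension */
    unsigned modifier;  /* e_flags[2:0], base ABI modifier */
};

/* Split e_flags[7:0] into its three fields; bits 31 - 8 are reserved and ignored. */
struct la_abi_fields la_decode_e_flags(uint32_t e_flags)
{
    struct la_abi_fields f;
    f.version   = (e_flags >> 6) & 0x3;
    f.extension = (e_flags >> 3) & 0x7;
    f.modifier  = e_flags & 0x7;
    return f;
}
```

For example, an `e_flags` value of `0x43` decodes to ABI version 1, ABI extension 0 and base ABI modifier 3. A full ABI check would also need `EI_CLASS`, which this sketch leaves out.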
+ +Within this combination, `EI_CLASS` and `e_flags[2:0]` correspond to the **base ABI** type, +where the expression of C integral and pointer types (data model) is uniquely determined by +`EI_CLASS` value, and `e_flags[2:0]` represents additional properties of the base ABI type, +including the FP calling convention. We refer to `e_flags[2:0]` as the **base ABI modifier**. + +As a result, programs in `lp64*` / `ilp32*` ABI should only be encoded with ELF64 / ELF32 +object files, respectively. + +`0x0` `0x4` `0x5` `0x6` `0x7` are reserved values for `e_flags[2:0]`. + +.Base ABI types +[%header,cols="^1m,^1m,^3m,^3"] +|=== +|Name +|EI_CLASS | Base ABI Modifier (e_flags[2:0]) +|Description + +|lp64s | ELFCLASS64 | 0x1 +|Uses 64-bit GPRs and the stack for parameter passing. +Data model is `LP64`, where `long` and pointers are 64-bit while `int` is 32-bit. + +|lp64f | ELFCLASS64 | 0x2 +|Uses 64-bit GPRs, 32-bit FPRs and the stack for parameter passing. +Data model is `LP64`, where `long` and pointers are 64-bit while `int` is 32-bit. + +|lp64d | ELFCLASS64 | 0x3 +|Uses 64-bit GPRs, 64-bit FPRs and the stack for parameter passing. +Data model is `LP64`, where `long` and pointers are 64-bit while `int` is 32-bit. + +|ilp32s | ELFCLASS32 | 0x1 +|Uses 32-bit GPRs and the stack for parameter passing. +Data model is `ILP32`, where `int`, `long` and pointers are 32-bit. + +|ilp32f | ELFCLASS32 | 0x2 +|Uses 32-bit GPRs, 32-bit FPRs and the stack for parameter passing. +Data model is `ILP32`, where `int`, `long` and pointers are 32-bit. + +|ilp32d | ELFCLASS32 | 0x3 +|Uses 32-bit GPRs, 64-bit FPRs and the stack for parameter passing. +Data model is `ILP32`, where `int`, `long` and pointers are 32-bit. +|=== + +`e_flags[5:3]` correspond to the ABI extension type. + +.ABI extension types +[%header,cols="^1m,^1,^3"] +|=== +|Name +|e_flags[5:3] +|Description + +|base +|`0x0` +|No extra ABI features. 
+ +| +|`0x1` - `0x7` +|(reserved) +|=== + +[[abi-versioning]] +`e_flags[7:6]` marks the ABI version of an ELF object. + +.ABI version +[%header,cols="^1,^1,^5"] +|=== +|ABI version +|Value +|Description + +|`v0` +|`0x0` +|Uses stack-operand-based relocation types. + +|`v1` +|`0x1` +|Supports relocation types that write directly to immediate slots. Can be implemented separately, without compatibility with v0. + +| +|`0x2` `0x3` +|Reserved. +|=== + +=== EI_CLASS: File class + +.ELF file classes +[%header,cols="^1m,^1m,^3"] +|=== +|EI_CLASS +|Value +|Description + +|ELFCLASS32 +|1 +|ELF32 object file + +|ELFCLASS64 +|2 +|ELF64 object file +|=== + +== Relocations + +.ELF relocation types +[%header,cols="^1,^4m,^4,^4"] +|=== +|Enum +|ELF reloc type +|Usage +|Detail + +|0 +|R_LARCH_NONE +| +| + +|1 +|R_LARCH_32 +|Runtime address resolving +|`+*(int32_t *) PC = RtAddr + A+` + +|2 +|R_LARCH_64 +|Runtime address resolving +|`+*(int64_t *) PC = RtAddr + A+` + +|3 +|R_LARCH_RELATIVE +|Runtime fixup for load-address +|`+*(void **) PC = B + A+` + +|4 +|R_LARCH_COPY +|Runtime memory copy in executable +|`+memcpy (PC, RtAddr, sizeof (sym))+` + +|5 +|R_LARCH_JUMP_SLOT +|Runtime PLT supporting +|_implementation-defined_ + +|6 +|R_LARCH_TLS_DTPMOD32 +|Runtime relocation for TLS-GD +|`+*(int32_t *) PC = ID of module defining sym+` + +|7 +|R_LARCH_TLS_DTPMOD64 +|Runtime relocation for TLS-GD +|`+*(int64_t *) PC = ID of module defining sym+` + +|8 +|R_LARCH_TLS_DTPREL32 +|Runtime relocation for TLS-GD +|`+*(int32_t *) PC = DTV-relative offset for sym+` + +|9 +|R_LARCH_TLS_DTPREL64 +|Runtime relocation for TLS-GD +|`+*(int64_t *) PC = DTV-relative offset for sym+` + +|10 +|R_LARCH_TLS_TPREL32 +|Runtime relocation for TLS-IE +|`+*(int32_t *) PC = T+` + +|11 +|R_LARCH_TLS_TPREL64 +|Runtime relocation for TLS-IE +|`+*(int64_t *) PC = T+` + +|12 +|R_LARCH_IRELATIVE +|Runtime local indirect function resolving +|`+*(void **) PC = (((void *)(*)()) (B + A)) ()+` + +4+|... Reserved for dynamic linker. 
+ +|20 +|R_LARCH_MARK_LA +|Mark la.abs +|Load absolute address for static link. + +|21 +|R_LARCH_MARK_PCREL +|Mark external label branch +|Access PC relative address for static link. + +|22 +|R_LARCH_SOP_PUSH_PCREL +|Push PC-relative offset +|`+push (S - PC + A)+` + +|23 +|R_LARCH_SOP_PUSH_ABSOLUTE +|Push constant or absolute address +|`+push (S + A)+` + +|24 +|R_LARCH_SOP_PUSH_DUP +|Duplicate stack top +|`+opr1 = pop (), push (opr1), push (opr1)+` + +|25 +|R_LARCH_SOP_PUSH_GPREL +|Push for access GOT entry +|`+push (G)+` + +|26 +|R_LARCH_SOP_PUSH_TLS_TPREL +|Push for TLS-LE +|`+push (T)+` + +|27 +|R_LARCH_SOP_PUSH_TLS_GOT +|Push for TLS-IE +|`+push (IE)+` + +|28 +|R_LARCH_SOP_PUSH_TLS_GD +|Push for TLS-GD +|`+push (GD)+` + +|29 +|R_LARCH_SOP_PUSH_PLT_PCREL +|Push for external function calling +|`+push (PLT - PC)+` + +|30 +|R_LARCH_SOP_ASSERT +|Assert stack top +|`+assert (pop ())+` + +|31 +|R_LARCH_SOP_NOT +|Stack top operation +|`+push (!pop ())+` + +|32 +|R_LARCH_SOP_SUB +|Stack top operation +|`+opr2 = pop (), opr1 = pop (), push (opr1 - opr2)+` + +|33 +|R_LARCH_SOP_SL +|Stack top operation +|`+opr2 = pop (), opr1 = pop (), push (opr1 << opr2)+` + +|34 +|R_LARCH_SOP_SR +|Stack top operation +|`+opr2 = pop (), opr1 = pop (), push (opr1 >> opr2)+` + +|35 +|R_LARCH_SOP_ADD +|Stack top operation +|`+opr2 = pop (), opr1 = pop (), push (opr1 + opr2)+` + +|36 +|R_LARCH_SOP_AND +|Stack top operation +|`+opr2 = pop (), opr1 = pop (), push (opr1 & opr2)+` + +|37 +|R_LARCH_SOP_IF_ELSE +|Stack top operation +|`+opr3 = pop (), opr2 = pop (), opr1 = pop (), push (opr1 ? opr2 : opr3)+` + +|38 +|R_LARCH_SOP_POP_32_S_10_5 +|Instruction imm-field relocation +|`+opr1 = pop (), (*(uint32_t *) PC) [14 ... 10] = opr1 [4 ... 0]+` + +with check 5-bit signed overflow + +|39 +|R_LARCH_SOP_POP_32_U_10_12 +|Instruction imm-field relocation +|`+opr1 = pop (), (*(uint32_t *) PC) [21 ... 10] = opr1 [11 ... 
0]+` + +with check 12-bit unsigned overflow + +|40 +|R_LARCH_SOP_POP_32_S_10_12 +|Instruction imm-field relocation +|`+opr1 = pop (), (*(uint32_t *) PC) [21 ... 10] = opr1 [11 ... 0]+` + +with check 12-bit signed overflow + +|41 +|R_LARCH_SOP_POP_32_S_10_16 +|Instruction imm-field relocation +|`+opr1 = pop (), (*(uint32_t *) PC) [25 ... 10] = opr1 [15 ... 0]+` + +with check 16-bit signed overflow + +|42 +|R_LARCH_SOP_POP_32_S_10_16_S2 +|Instruction imm-field relocation +|`+opr1 = pop (), (*(uint32_t *) PC) [25 ... 10] = opr1 [17 ... 2]+` + +with check 18-bit signed overflow and 4-bit aligned + +|43 +|R_LARCH_SOP_POP_32_S_5_20 +|Instruction imm-field relocation +|`+opr1 = pop (), (*(uint32_t *) PC) [24 ... 5] = opr1 [19 ... 0]+` + +with check 20-bit signed overflow + +|44 +|R_LARCH_SOP_POP_32_S_0_5_10_16_S2 +|Instruction imm-field relocation +|`+opr1 = pop (), (*(uint32_t *) PC) [4 ... 0] = opr1 [22 ... 18],+` + +`+(*(uint32_t *) PC) [25 ... 10] = opr1 [17 ... 2]+` + +with check 23-bit signed overflow and 4-bit aligned + +|45 +|R_LARCH_SOP_POP_32_S_0_10_10_16_S2 +|Instruction imm-field relocation +|`+opr1 = pop (), (*(uint32_t *) PC) [9 ... 0] = opr1 [27 ... 18],+` + +`+(*(uint32_t *) PC) [25 ... 10] = opr1 [17 ... 
2]+` + +with check 28-bit signed overflow and 4-bit aligned + +|46 +|R_LARCH_SOP_POP_32_U +|Instruction fixup +|`+(*(uint32_t *) PC) = pop ()+` + +with check 32-bit unsigned overflow + +|47 +|R_LARCH_ADD8 +|8-bit in-place addition +|`+*(int8_t *) PC += S + A+` + +|48 +|R_LARCH_ADD16 +|16-bit in-place addition +|`+*(int16_t *) PC += S + A+` + +|49 +|R_LARCH_ADD24 +|24-bit in-place addition +|`+*(int24_t *) PC += S + A+` + +|50 +|R_LARCH_ADD32 +|32-bit in-place addition +|`+*(int32_t *) PC += S + A+` + +|51 +|R_LARCH_ADD64 +|64-bit in-place addition +|`+*(int64_t *) PC += S + A+` + +|52 +|R_LARCH_SUB8 +|8-bit in-place subtraction +|`+*(int8_t *) PC -= S + A+` + +|53 +|R_LARCH_SUB16 +|16-bit in-place subtraction +|`+*(int16_t *) PC -= S + A+` + +|54 +|R_LARCH_SUB24 +|24-bit in-place subtraction +|`+*(int24_t *) PC -= S + A+` + +|55 +|R_LARCH_SUB32 +|32-bit in-place subtraction +|`+*(int32_t *) PC -= S + A+` + +|56 +|R_LARCH_SUB64 +|64-bit in-place subtraction +|`+*(int64_t *) PC -= S + A+` + +|57 +|R_LARCH_GNU_VTINHERIT +|GNU C++ vtable hierarchy +| + +|58 +|R_LARCH_GNU_VTENTRY +|GNU C++ vtable member usage +| + +4+|... Reserved + +|64 +|R_LARCH_B16 +|18-bit PC-relative jump +|`+(*(uint32_t *) PC) [25 ... 10] = (S+A-PC) [17 ... 2]+` + +with check 18-bit signed overflow and 4-bit aligned + +|65 +|R_LARCH_B21 +|23-bit PC-relative jump +|`+(*(uint32_t *) PC) [4 ... 0] = (S+A-PC) [22 ... 18],+` + +`+(*(uint32_t *) PC) [25 ... 10] = (S+A-PC) [17 ... 2]+` + +with check 23-bit signed overflow and 4-bit aligned + +|66 +|R_LARCH_B26 +|28-bit PC-relative jump +|`+(*(uint32_t *) PC) [9 ... 0] = (S+A-PC) [27 ... 18],+` + +`+(*(uint32_t *) PC) [25 ... 10] = (S+A-PC) [17 ... 2]+` + +with check 28-bit signed overflow and 4-bit aligned + +|67 +|R_LARCH_ABS_HI20 +| [31 ... 12] bits of 32/64-bit absolute address +|`+(*(uint32_t *) PC) [24 ... 5] = (S+A) [31 ... 12]+` + +|68 +|R_LARCH_ABS_LO12 +|[11 ... 0] bits of 32/64-bit absolute address +|`+(*(uint32_t *) PC) [21 ... 
10] = (S+A) [11 ... 0]+` + +|69 +|R_LARCH_ABS64_LO20 +|[51 ... 32] bits of 64-bit absolute address +|`+(*(uint32_t *) PC) [24 ... 5] = (S+A) [51 ... 32]+` + +|70 +|R_LARCH_ABS64_HI12 +|[63 ... 52] bits of 64-bit absolute address +|`+(*(uint32_t *) PC) [21 ... 10] = (S+A) [63 ... 52]+` + +|71 +|R_LARCH_PCALA_HI20 +|[31 ... 12] bits of 32/64-bit PC-relative offset +|`+(*(uint32_t *) PC) [24 ... 5] = (((S+A) & ~0xfff) - (PC & ~0xfff)) [31 ... 12]+` + +`+Note: The lower 12 bits are not included when calculating the PC-relative offset.+` + +|72 +|R_LARCH_PCALA_LO12 +|[11 ... 0] bits of 32/64-bit address +|`+(*(uint32_t *) PC) [21 ... 10] = (S+A) [11 ... 0]+` + +|73 +|R_LARCH_PCALA64_LO20 +|[51 ... 32] bits of 64-bit PC-relative offset +|`+(*(uint32_t *) PC) [24 ... 5] = (S+A - (PC & ~0xffffffff)) [51 ... 32]+` + +|74 +|R_LARCH_PCALA64_HI12 +|[63 ... 52] bits of 64-bit PC-relative offset +|`+(*(uint32_t *) PC) [21 ... 10] = (S+A - (PC & ~0xffffffff)) [63 ... 52]+` + +|75 +|R_LARCH_GOT_PC_HI20 +|[31 ... 12] bits of 32/64-bit PC-relative offset to GOT entry +|`+(*(uint32_t *) PC) [24 ... 5] = (((GP+G) & ~0xfff) - (PC & ~0xfff)) [31 ... 12]+` + +|76 +|R_LARCH_GOT_PC_LO12 +|[11 ... 0] bits of 32/64-bit GOT entry address +|`+(*(uint32_t *) PC) [21 ... 10] = (GP+G) [11 ... 0]+` + +|77 +|R_LARCH_GOT64_PC_LO20 +|[51 ... 32] bits of 64-bit PC-relative offset to GOT entry +|`+(*(uint32_t *) PC) [24 ... 5] = (GP+G - (PC & ~0xffffffff)) [51 ... 32]+` + +|78 +|R_LARCH_GOT64_PC_HI12 +|[63 ... 52] bits of 64-bit PC-relative offset to GOT entry +|`+(*(uint32_t *) PC) [21 ... 10] = (GP+G - (PC & ~0xffffffff)) [63 ... 52]+` + +|79 +|R_LARCH_GOT_HI20 +|[31 ... 12] bits of 32/64-bit GOT entry absolute address +|`+(*(uint32_t *) PC) [24 ... 5] = (GP+G) [31 ... 12]+` + +|80 +|R_LARCH_GOT_LO12 +|[11 ... 0] bits of 32/64-bit GOT entry absolute address +|`+(*(uint32_t *) PC) [21 ... 10] = (GP+G) [11 ... 0]+` + +|81 +|R_LARCH_GOT64_LO20 +|[51 ... 
32] bits of 64-bit GOT entry absolute address +|`+(*(uint32_t *) PC) [24 ... 5] = (GP+G) [51 ... 32]+` + +|82 +|R_LARCH_GOT64_HI12 +|[63 ... 52] bits of 64-bit GOT entry absolute address +|`+(*(uint32_t *) PC) [21 ... 10] = (GP+G) [63 ... 52]+` + +|83 +|R_LARCH_TLS_LE_HI20 +|[31 ... 12] bits of TLS LE 32/64-bit offset from TP register +|`+(*(uint32_t *) PC) [24 ... 5] = T [31 ... 12]+` + +|84 +|R_LARCH_TLS_LE_LO12 +|[11 ... 0] bits of TLS LE 32/64-bit offset from TP register +|`+(*(uint32_t *) PC) [21 ... 10] = T [11 ... 0]+` + +|85 +|R_LARCH_TLS_LE64_LO20 +|[51 ... 32] bits of TLS LE 64-bit offset from TP register +|`+(*(uint32_t *) PC) [24 ... 5] = T [51 ... 32]+` + +|86 +|R_LARCH_TLS_LE64_HI12 +|[63 ... 52] bits of TLS LE 64-bit offset from TP register +|`+(*(uint32_t *) PC) [21 ... 10] = T [63 ... 52]+` + +|87 +|R_LARCH_TLS_IE_PC_HI20 +|[31 ... 12] bits of 32/64-bit PC-relative offset to TLS IE GOT entry +|`+(*(uint32_t *) PC) [24 ... 5] = (((GP+IE) & ~0xfff) - (PC & ~0xfff)) [31 ... 12]+` + +|88 +|R_LARCH_TLS_IE_PC_LO12 +|[11 ... 0] bits of 32/64-bit TLS IE GOT entry address +|`+(*(uint32_t *) PC) [21 ... 10] = (GP+IE) [11 ... 0]+` + +|89 +|R_LARCH_TLS_IE64_PC_LO20 +|[51 ... 32] bits of 64-bit PC-relative offset to TLS IE GOT entry +|`+(*(uint32_t *) PC) [24 ... 5] = (GP+IE - (PC & ~0xffffffff)) [51 ... 32]+` + +|90 +|R_LARCH_TLS_IE64_PC_HI12 +|[63 ... 52] bits of 64-bit PC-relative offset to TLS IE GOT entry +|`+(*(uint32_t *) PC) [21 ... 10] = (GP+IE - (PC & ~0xffffffff)) [63 ... 52]+` + +|91 +|R_LARCH_TLS_IE_HI20 +|[31 ... 12] bits of 32/64-bit TLS IE GOT entry absolute address +|`+(*(uint32_t *) PC) [24 ... 5] = (GP+IE) [31 ... 12]+` + +|92 +|R_LARCH_TLS_IE_LO12 +|[11 ... 0] bits of 32/64-bit TLS IE GOT entry absolute address +|`+(*(uint32_t *) PC) [21 ... 10] = (GP+IE) [11 ... 0]+` + +|93 +|R_LARCH_TLS_IE64_LO20 +|[51 ... 32] bits of 64-bit TLS IE GOT entry absolute address +|`+(*(uint32_t *) PC) [24 ... 5] = (GP+IE) [51 ... 
32]+` + +|94 +|R_LARCH_TLS_IE64_HI12 +|[63 ... 52] bits of 64-bit TLS IE GOT entry absolute address +|`+(*(uint32_t *) PC) [21 ... 10] = (GP+IE) [63 ... 52]+` + +|95 +|R_LARCH_TLS_LD_PC_HI20 +|[31 ... 12] bits of 32/64-bit PC-relative offset to TLS LD GOT entry +|`+(*(uint32_t *) PC) [24 ... 5] = (((GP+GD) & ~0xfff) - (PC & ~0xfff)) [31 ... 12]+` + +|96 +|R_LARCH_TLS_LD_HI20 +|[31 ... 12] bits of 32/64-bit TLS LD GOT entry absolute address +|`+(*(uint32_t *) PC) [24 ... 5] = (GP+IE) [31 ... 12]+` + +|97 +|R_LARCH_TLS_GD_PC_HI20 +|[31 ... 12] bits of 32/64-bit PC-relative offset to TLS GD GOT entry +|`+(*(uint32_t *) PC) [24 ... 5] = (((GP+GD) & ~0xfff) - (PC & ~0xfff)) [31 ... 12]+` + +|98 +|R_LARCH_TLS_GD_HI20 +|[31 ... 12] bits of 32/64-bit TLS GD GOT entry absolute address +|`+(*(uint32_t *) PC) [24 ... 5] = (GP+IE) [31 ... 12]+` + +|99 +|R_LARCH_32_PCREL +|32-bit PC relative +|`+(*(uint32_t *) PC) = (S+A-PC) [31 ... 0]+` + +|100 +|R_LARCH_RELAX +|Instruction can be relaxed, paired with a normal relocation at the same address +| + +|101 +|R_LARCH_DELETE +|The instruction should be deleted at link time. +| + +|102 +|R_LARCH_ALIGN +|Alignment statement. The addend indicates the number of bytes occupied by nop instructions at the relocation offset. The alignment boundary is specified by the addend rounded up to the next power of two. +| + +|103 +|R_LARCH_PCREL20_S2 +|22-bit PC-relative offset +|`+(*(uint32_t *) PC) [24 ... 5] = (S + A - PC) [21 ... 2]+` + +|104 +|R_LARCH_CFA +|Canonical Frame Address +| + +|105 +|R_LARCH_ADD6 +|low 6-bit in-place addition +|`+(*(int8_t *) PC) += ((S + A) & 0x3f)+` + +|106 +|R_LARCH_SUB6 +|low 6-bit in-place subtraction +|`+(*(int8_t *) PC) -= ((S + A) & 0x3f)+` + +|107 +|R_LARCH_ADD_ULEB128 +|ULEB128 in-place addition +|`+(*(uleb128 *) PC) += S + A+` + +|108 +|R_LARCH_SUB_ULEB128 +|ULEB128 in-place subtraction +|`+(*(uleb128 *) PC) -= S + A+` + +|109 +|R_LARCH_64_PCREL +|64-bit PC relative +|`+(*(uint64_t *) PC) = (S+A-PC) [63 ... 
0]+` +|=== + +[bibliography] +== References + +* [[[SysVelf]]] __System V Application Binary Interface - DRAFT__, +10 Jun. 2013, http://www.sco.com/developers/gabi/latest/contents.html diff --git a/lapcs.adoc b/lapcs.adoc new file mode 100644 index 0000000..2ec1e59 --- /dev/null +++ b/lapcs.adoc @@ -0,0 +1,687 @@ += Procedure Call Standard for the LoongArch™ Architecture +Version 20230519 + +Copyright © Loongson Technology 2023. All rights reserved. + +== Abstract + +This document describes the Procedure Call Standard used by the Application +Binary Interface (ABI) of the LoongArch Architecture. + +== Keywords + +LoongArch, Procedure call, Calling conventions, Data layout + +== Version History + +[%header,cols="^2,8"] +|==== +|Version +^|Description + +|20230519 +|initial version, derived from the original __LoongArch ELF psABI__ document. +|==== + +== Introduction + +This document defines the constraints on the program contexts exchanged between +the caller and called subroutines or a subroutine and the execution environment. +The subroutines following these constraints can be compiled and assembled separately +and work together in the same program. The terms "subroutine", "function" and "procedure" +may be used interchangeably throughout this document. + +That includes constraints on: + +- The initial program context created by the caller for the callee. +- The program context when the callee finishes its execution and returns to the caller. +- How subroutine arguments and return values should be encoded in these program contexts. +- How certain global states may be accessed and preserved by all subroutines. + +However, this document does not formally define how entities of standard programming +languages other than ISO C should be represented in the machine's program context, and +these language bindings should be described separately if needed. + +== Terms and Abbreviations + +*FP* + +Floating-point. + +*GPR* + +General-purpose register. 
+ +*FPR* + +Floating-point register. + +*FPU* + +Floating-point unit, containing the floating-point registers. + +*GAR* + +General-purpose argument register, belonging to a fixed subset of GPRs. + +*FAR* + +Floating-point argument register, belonging to a fixed subset of FPRs. + +*GRLEN* + +The bit-width of a general-purpose register of the current ABI variant. + +*FRLEN* + +The bit-width of a floating-point register of the current ABI variant. + + +== Processor Architecture + +=== The registers + +All LoongArch machines have 32 general-purpose registers and optionally 32 +floating-point registers. Some of these registers may be used for passing +arguments and return values between the caller and callee subroutines. + +The bit-width of both general-purpose registers and floating-point registers +may be either 32- or 64-bit, depending on whether the machine implements the LA32 +or LA64 instruction set, and whether it has a single- or double-precision FPU. + +NOTE: In the following text, we use the term "temporary registers" to +refer to caller-saved registers and "static registers" to refer to callee-saved registers. 
+ +==== General-purpose registers + +.General-purpose register convention +[%header,cols="^2,^2,^5,^3"] +|=== +|Name +|Alias +|Meaning +|Preserved across calls + +|`$r0` +|`$zero` +|Constant zero +|(Constant) + +|`$r1` +|`$ra` +|Return address +|No + +|`$r2` +|`$tp` +|Thread pointer +|(Non-allocatable) + +|`$r3` +|`$sp` +|Stack pointer +|Yes + +|`$r4 - $r5` +|`$a0 - $a1` +|Argument registers / return value registers +|No + +|`$r6 - $r11` +|`$a2 - $a7` +|Argument registers +|No + +|`$r12 - $r20` +|`$t0 - $t8` +|Temporary registers +|No + +|`$r21` +| +|Reserved +|(Non-allocatable) + +|`$r22` +|`$fp / $s9` +|Frame pointer / Static register +|Yes + +|`$r23 - $r31` +|`$s0 - $s8` +|Static registers +|Yes +|=== + +==== Floating-point registers + +.Floating-point register convention +[%header,cols="^2,^2,^5,^3"] +|=== +|Name +|Alias +|Meaning +|Preserved across calls + +|`$f0 - $f1` +|`$fa0 - $fa1` +|Argument registers / return value registers +|No + +|`$f2 - $f7` +|`$fa2 - $fa7` +|Argument registers +|No + +|`$f8 - $f23` +|`$ft0 - $ft15` +|Temporary registers +|No + +|`$f24 - $f31` +|`$fs0 - $fs7` +|Static registers +|Yes +|=== + +=== The memory and the byte order + +The memory is byte-addressable for LoongArch machines, and the ordering of the bytes +in machine-supported multi-byte data types is *little-endian*. That is, the least +significant byte of a data object is at the lowest byte address the data object +occupies in memory. + +The least significant byte of a 32-bit GPR / 64-bit GPR / 32-bit FPR / 64-bit FPR +is defined as storing the lowest byte of the data loaded from the memory with one +`ld.w` / `ld.d` / `fld.s` / `fld.d` instruction. This byte order is also respected +by other instructions that move typed data across registers such as `movgr2fr.d`. + +In this document, when referring to a data object (of any type) stored in a register, +it is assumed that the object begins at the least significant byte of the register +with no lower-byte padding. 
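As an illustration of the little-endian byte order described above, the following C sketch assembles a 32-bit value from four consecutive bytes in memory. It is a minimal, host-independent model and not part of the specification; the function names `la_load_le32` and `la_lowest_addr_byte` are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical model of a 32-bit little-endian load: the byte at the lowest
 * address (p[0]) becomes the least significant byte of the result.
 */
uint32_t la_load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/*
 * The least significant byte of the value, which is the byte stored at the
 * lowest address the object occupies in memory.
 */
unsigned char la_lowest_addr_byte(uint32_t value)
{
    return (unsigned char)(value & 0xff);
}
```

With the bytes `0x78 0x56 0x34 0x12` stored at increasing addresses, this model yields the value `0x12345678`, whose least significant byte `0x78` is the one at the lowest address.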
+
+=== The base ABI variants
+
+Depending on the bit-width of the general-purpose registers and the floating-point
+registers, different ABI variants can be adopted to keep arguments and return
+values in registers whenever possible.
+
+[[base-abi-types]]
+.Base ABI types
+[%header,cols="^1m,5"]
+|===
+|Name
+^|Description
+
+|lp64s
+|Uses 64-bit GARs and the stack for passing arguments and return values.
+Data model is <<dm-lp64,LP64>> for programming languages.
+
+|lp64f
+|Uses 64-bit GARs, 32-bit FARs and the stack for passing arguments and return values.
+Data model is <<dm-lp64,LP64>> for programming languages.
+
+|lp64d
+|Uses 64-bit GARs, 64-bit FARs and the stack for passing arguments and return values.
+Data model is <<dm-lp64,LP64>> for programming languages.
+
+|ilp32s
+|Uses 32-bit GARs and the stack for passing arguments and return values.
+Data model is <<dm-ilp32,ILP32>> for programming languages.
+
+|ilp32f
+|Uses 32-bit GARs, 32-bit FARs and the stack for passing arguments and return values.
+Data model is <<dm-ilp32,ILP32>> for programming languages.
+
+|ilp32d
+|Uses 32-bit GARs, 64-bit FARs and the stack for passing arguments and return values.
+Data model is <<dm-ilp32,ILP32>> for programming languages.
+|===
+
+Different ABI variants are not expected to be compatible, and linking objects
+built for different variants may result in linker errors or run-time failures.
+
+== Data Representation
+
+This specification defines machine data types that represent ISO C's scalar,
+aggregate (structure and array) and union data types, as well as their layout
+within the program context when passed as arguments and return values of procedures.
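
As a compact recap of the base ABI types table above, the register widths of each variant can be written down as a small lookup table. This is only a sketch: the struct, field and function names are hypothetical, and an `frlen` of 0 is used here to mean that no FARs take part in argument passing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical summary record; the field names are illustrative only. */
struct base_abi {
    const char *name; /* base ABI type name */
    int grlen;        /* width of a GAR in bits */
    int frlen;        /* width of an FAR in bits, 0 if FARs are not used */
};

static const struct base_abi base_abis[] = {
    { "lp64s",  64,  0 },
    { "lp64f",  64, 32 },
    { "lp64d",  64, 64 },
    { "ilp32s", 32,  0 },
    { "ilp32f", 32, 32 },
    { "ilp32d", 32, 64 },
};

/* Look up a base ABI type by name; returns NULL for an unknown name. */
static const struct base_abi *find_base_abi(const char *name)
{
    for (size_t i = 0; i < sizeof base_abis / sizeof base_abis[0]; i++) {
        if (strcmp(base_abis[i].name, name) == 0)
            return &base_abis[i];
    }
    return NULL;
}
```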
+
+=== Fundamental types
+
+.Byte size and byte alignment of the fundamental data (scalar) types
+[%header,cols="^2,^5,^3,^6,^4"]
+|===
+|Class
+|Machine type
+|Size (bytes)
+|Natural alignment (bytes)
+|Note
+
+.8+| Integral       | Unsigned byte           | 1  | 1  .2+| Character
+                    | Signed byte             | 1  | 1
+                    | Unsigned half-word      | 2  | 2  |
+                    | Signed half-word        | 2  | 2  |
+                    | Unsigned word           | 4  | 4  |
+                    | Signed word             | 4  | 4  |
+                    | Unsigned double-word    | 8  | 8  |
+                    | Signed double-word      | 8  | 8  |
+
+.2+| Pointer        | 32-bit data pointer     | 4  | 4  |
+                    | 64-bit data pointer     | 8  | 8  |
+
+.3+| Floating Point | Single precision (fp32) | 4  | 4  .3+| IEEE 754-2008
+                    | Double precision (fp64) | 8  | 8
+                    | Quad-precision (fp128)  | 16 | 16
+|===
+
+NOTE: In the following text, the term "integral object" or
+"integral type" also covers the pointers.
+
+[[int_ext_rules]]
+When passed in registers as subroutine arguments or return values,
+unsigned integral objects are zero-extended and signed integral
+objects are sign-extended if the containing register is wider than
+the object.
+
+One exception to the above rule is that in the *LP64D* ABI, unsigned words,
+such as those representing `unsigned int` in <<dm-lp64,C>>,
+are stored in general-purpose registers as proper _sign extensions_ of
+their 32-bit values.
+
+=== Structures, arrays and unions
+
+The following conventional rules are respected:
+
+* Structures, arrays and unions assume the alignment of their most strictly
+aligned components (i.e. with the largest natural alignment).
+
+* The size of any object is always a multiple of its alignment.
+Tail padding is added to aggregates and unions where necessary
+to comply with this rule. The state of the padding bits is not defined.
+
+* Each member within a structure or an array is consecutively
+assigned to the lowest available offset with the appropriate alignment,
+in the order of definition.
+
+Structures and unions may be passed in registers as arguments or return values.
+The layout rules of their members within the registers are described
+in the following section.
+
+=== Bit-fields
+
+Structures and unions may include bit-fields, which are integral values of
+a declared integral type with a specified bit-width. The specified bit-width
+of a bit-field may not be greater than the width of its declared type.
+
+A bit-field must be contained in a block of memory that is appropriate to
+store its declared type, but it can share an addressable byte with
+other members of the struct or union.
+
+When determining the alignment and size of the structure or the union,
+only the member bit-fields' declared integral types are considered, and
+their specified widths are irrelevant.
+
+It is possible to define unnamed bit-fields in C. The declared types of these
+bit-fields do not affect the alignment of a structure or union.
+
+Zero-length bit-fields defined in C and C++ should be ignored and take up
+no storage space in memory or in the registers.
+
+
+== Subroutine Calling Sequence
+
+A subroutine as described in this specification may have zero or more
+*arguments* and one *return value*. Each argument or return value has
+exactly one of the machine data types.
+
+The standard calling requirements apply only to functions exported to link editors
+and dynamic loaders. Local functions that are not reachable from other compilation
+units may use other calling conventions.
+
+Empty structure / union arguments and return values should simply be ignored by C
+compilers that support them as a non-standard extension.
+
+=== The registers
+
+The rationale of the LoongArch procedure calling convention is to pass
+arguments and return values in registers whenever possible, so that
+memory access and/or cache usage can be reduced to improve program performance.
+
+The registers that can be used for passing arguments and returning values are
+the *argument registers*, which include:
+
+* *GARs*: 8 general-purpose registers `$a0` - `$a7`, where `$a0` and `$a1` are
+also used for returning values.
+
+* *FARs*: 8 floating-point registers `$fa0` - `$fa7`, where `$fa0` and `$fa1`
+are also used for returning values.
+
+An argument is passed using the stack only when no appropriate argument register
+is available.
+
+Subroutines should ensure that the initial values of the general-purpose registers
+`$s0` - `$s9` and floating-point registers `$fs0` - `$fs7` are preserved across
+the call.
+
+At the entry of a procedure call, the return address of the call site is stored
+in `$ra`. A jump to this address should be the last instruction executed
+in the called procedure.
+
+=== The stack
+
+Each called subroutine in a program may have a stack frame on the run-time stack.
+A stack frame is a contiguous block of memory with the following layout:
+
+[caption=]
+[%header,cols="^1,^2,^1"]
+|===
+|Position |Content |Frame
+
+|incoming `$sp` +
+(high address)
+|_(optional padding)_ +
+incoming stack arguments
+|Previous
+
+|
+|... +
+saved registers +
+local variables +
+paddings
+.2+|Current
+
+|outgoing `$sp` +
+(low address)
+|_(optional padding)_ +
+outgoing stack arguments
+|===
+
+The stack frame is allocated by subtracting a positive value from the stack
+pointer `$sp`. Upon procedure entry, the stack pointer is required to be
+divisible by 16, ensuring a 16-byte alignment of the frame.
+
+The first argument object passed on the stack (which may be the argument itself
+or its on-stack portion) is located at offset 0 of the incoming stack pointer;
+the following argument objects are stored at the lowest subsequent addresses that
+meet their respective alignment requirements.
+
+Procedures must not assume the persistence of on-stack data whose
+addresses lie below the stack pointer.
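
The alignment rules above can be sketched in C as follows. The helper names `align_up`, `frame_size` and `next_stack_arg_offset` are hypothetical; the sketch only shows the rounding arithmetic implied by the 16-byte stack alignment and by placing each stack argument at the lowest suitably aligned offset, and it assumes every alignment is a power of two.

```c
#include <stdint.h>

/* Round x up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Hypothetical: amount to subtract from $sp for a frame that needs
 * bytes_needed bytes, keeping $sp divisible by 16. */
static uint64_t frame_size(uint64_t bytes_needed)
{
    return align_up(bytes_needed, 16);
}

/* Hypothetical: offset from the incoming $sp of the next stack argument
 * object, given the first free offset and the object's required alignment. */
static uint64_t next_stack_arg_offset(uint64_t first_free, uint64_t alignment)
{
    return align_up(first_free, alignment);
}
```

With these helpers, a frame that needs 40 bytes is allocated as 48 bytes, and an object that requires 16-byte alignment placed after an 8-byte stack argument starts at offset 16.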
+
+
+=== Passing arguments
+
+When determining the layout of argument data, the arguments should be assigned to
+their locations in the program context sequentially, in the order they appear in
+the argument list.
+
+The location of an argument passed by value may be one of the following:
+
+1. An argument register.
+2. A pair of argument registers with adjacent numbers.
+3. A GAR and an FAR.
+4. A contiguous block of memory in the stack arguments region, with a constant
+offset from the caller's outgoing `$sp`.
+5. A combination of 1 and 4.
+
+The on-stack part of a structure or scalar argument is aligned to
+the greater of the type alignment and GRLEN bits, except when this alignment
+is larger than the 16-byte stack alignment. In that case, the on-stack part of
+the argument should be 16-byte-aligned.
+
+In a procedure call, GARs / FARs are generally only used for passing
+non-floating-point / floating-point argument data, respectively.
+However, the floating-point member of a structure or union argument,
+or a floating-point argument wider than FRLEN, may be passed in a GAR.
+For example, a quadruple-precision floating-point argument may be passed or
+returned in a pair of GARs if the GARs are 64-bit wide; otherwise it is
+passed or returned on the stack.
+
+NOTE: Currently, the following detailed description of parameter passing rules
+is only guaranteed to cover the `lp64d` and `lp64s` variants, that is, `GRLEN` is
+`64` and `FRLEN` is `64` or `0`.
+
+NOTE: In the following text, w~arg~ denotes the size of the
+argument object in bits.
+
+==== Scalars of fundamental types
+
+There are two cases:
+
+*0 < w~arg~ ≤ GRLEN*
+
+* The argument is passed in a single argument register, or on the stack by value
+if none is available.
+
+* An fp32 / fp64 argument is passed in an FAR if there is one available.
+Otherwise, it is passed in a GAR, or on the stack if none of the GARs are
+available.
+When passed in registers or on the stack, fp32 / fp64 arguments
+narrower than GRLEN bits are widened to GRLEN bits, with the upper bits undefined.
+
+* An integral argument is passed in a GAR if there is one available.
+Otherwise, it is passed on the stack. If the argument is narrower than the
+containing GAR, the <<int_ext_rules,integral extension rule>>
+applies.
+
+*GRLEN < w~arg~ ≤ 2 × GRLEN*
+
+* The argument is passed in a pair of GARs with adjacent numbers, with the
+lower-ordered GRLEN bits in the lower-numbered register. If only one GAR
+is available, the lower-ordered GRLEN bits are passed in this register
+and the most-significant GRLEN bits are passed on the stack. If no GAR is
+available, the whole argument is passed on the stack.
+
+==== Structures
+
+The storage location of structures during argument passing is also determined
+by w~arg~ and GRLEN:
+
+NOTE: Zero-length array members in structures should be ignored by both C and
+C++ compilers.
+
+*w~arg~ > 2 × GRLEN*
+
+* The structure is passed by reference, i.e. replaced in the argument list
+with its memory address. If there is an available GAR, the address is passed
+in the GAR, otherwise it is passed on the stack.
+
+*0 < w~arg~ ≤ GRLEN*
+
+* If the structure contains only one or two fp32 / fp64 members,
+and the same number of FARs are available, the members are passed in
+the FARs respectively, with the first member in the lower-numbered
+register.
+
+* If the structure contains an fp32 / fp64 member and an integral
+member, and an FAR and a GAR are available, the fp32 / fp64 member
+of the structure is passed in the FAR, and the integral member of the
+structure is passed in the GAR. If no FAR but one GAR is available,
+the structure is passed in the GAR. If no GAR is available,
+it is passed on the stack.
+
+* In other cases, if there is an available GAR, the structure is passed in
+this register, otherwise it is passed on the stack.
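
To make the `0 < w~arg~ ≤ GRLEN` case concrete, the C sketch below declares one structure for each of the three bullets above, assuming `lp64d` (GRLEN and FRLEN both 64) and that all argument registers are still free. The structure names and the `W_ARG` macro are hypothetical, and the comments only restate where the rules above would place each argument.

```c
#include <stdint.h>

/* Two fp32 members: each member is passed in its own FAR, with the first
 * member in the lower-numbered register. */
struct two_floats { float x; float y; };

/* One fp32 member and one integral member: the fp32 member is passed in an
 * FAR and the integral member in a GAR. */
struct float_and_int { float f; int32_t i; };

/* Integral members only: the whole structure is passed in a single GAR. */
struct four_shorts { int16_t a, b, c, d; };

/* Size of an argument object in bits (w_arg in the text). */
#define W_ARG(type) (sizeof(type) * 8)
```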
+
+*GRLEN < w~arg~ ≤ 2 × GRLEN*
+
+* If the structure contains only two fp64 members, or an fp64 member and
+an fp32 member, they are passed in a pair of adjacent FARs. If the FAR pair
+is not available, the structure is passed in a pair of GARs. In both cases,
+the first member is stored in the lower-numbered register. If only one GAR
+is available, the second member is passed on the stack instead. If no GAR
+is available, the structure is passed on the stack.
+
+* If the structure matches any of the following descriptions:
+
+** consists of only one fp128 member
+** consists of one fp64 member and two adjacent fp32 members
+** consists of 3-4 fp32 members
+
++
+The argument is passed in a pair of available GARs, with the lower-ordered
+bits in the lower-numbered register. If only one GAR is available, only the
+lower-ordered bits are passed in the GAR, while the rest of the data is
+passed on the stack. If no GAR is available, the structure is passed on
+the stack.
+
+* If the structure has only one fp32 / fp64 member and one integral
+member, the floating-point member of the structure is passed in
+the FAR, and the integral member is passed in the GAR. If no FAR
+is available, the members are passed in two adjacent GARs separately.
+If only one GAR is available, the first member is passed in the GAR
+and the second one is passed on the stack. If neither a GAR nor an FAR
+is available, the whole structure is passed on the stack.
+
+* Otherwise, the structure is passed in a pair of available GARs, with
+the least significant bits stored in the lower-numbered register. If only
+one GAR is available, the high-ordered bits are passed on the stack instead.
+If no GAR is available, the whole structure is passed on the stack.
+
+==== Unions
+
+A union is passed only in GARs or on the stack, depending on its width.
+
+*w~arg~ > 2 × GRLEN*
+
+* The union is passed by reference and is replaced in the argument list with
+its memory address.
+If there is an available GAR, the address is passed in
+the GAR; otherwise, it is passed on the stack.
+
+*0 < w~arg~ ≤ GRLEN*
+
+* The argument is passed in a GAR, or on the stack by value if no GAR is available.
+
+*GRLEN < w~arg~ ≤ 2 × GRLEN*
+
+* The argument is passed in a pair of available GARs, with the low-order bits
+in the lower-numbered GAR and the high-order bits in the higher-numbered GAR.
+If only one GAR is available, the low-order bits are in the GAR and the high-order
+bits are on the stack. The argument is passed on the stack when no GAR is available.
+
+==== Complex floating-point numbers
+
+A complex floating-point number, or a structure containing just one complex
+fp32 / fp64 number, is passed as though it were a structure containing two
+fp32 / fp64 members.
+
+==== Variadic arguments
+
+A variadic argument list can appear at the end of a procedure's argument list.
+It contains argument objects whose number and types are not statically
+declared with the procedure itself.
+
+A variadic argument's location is also decided using its bit-width. If one
+of the variadic arguments is passed on the stack, all subsequent arguments
+should also be passed on the stack. The variadic arguments never occupy the
+FARs.
+
+*w~arg~ > 2 × GRLEN*
+
+* The arguments are passed by reference and are replaced in the argument list
+with the address. If there is an available GAR, the reference is passed in
+the GAR; it is passed on the stack if no GAR is available.
+
+*0 < w~arg~ ≤ GRLEN*
+
+* The variadic arguments are passed in a GAR, or on the stack by value if no
+GAR is available.
+
+*GRLEN < w~arg~ ≤ 2 × GRLEN*
+
+* An argument object in the variadic argument list with 2 × GRLEN alignment
+and size (e.g. an fp128 object) is passed in a pair of adjacent available GARs
+of which the first register is even-numbered. If only one GAR is available,
+the argument is passed on the stack, and this GAR is not used for passing
+subsequent argument objects.
+
+* For other types of argument objects, the variadic arguments are passed in a
+pair of GARs. If only one GAR is available, the low-order bits are in the GAR
+and the high-order bits are on the stack.
+
+* If no GAR is available, the argument object is passed on the stack by value.
+
+=== Returning
+
+In general, `$a0` and `$a1` are used for returning non-floating-point values,
+while `$fa0` and `$fa1` are used for returning floating-point values.
+
+Values are returned in the same manner as the first named argument
+of the same type would be passed. If such an argument would have
+been passed by reference, the caller should allocate memory for the
+return value and pass the address as an implicit first argument
+that is stored in `$a0`.
+
+[[C-data-types]]
+== Appendix: C data types and machine data types
+
+NOTE: For all base ABI types of LoongArch, the `char` data type in C is
+signed by default.
+
+[[dm-lp64]]
+.LP64 data model (base ABI types: `lp64d` `lp64f` `lp64s`)
+[%header,cols="^1,^1"]
+|===
+|Scalar type
+|Machine type
+
+|`bool` / `_Bool`
+|Unsigned byte
+
+|`unsigned char` / `char`
+|Unsigned / signed byte
+
+|`unsigned short` / `short`
+|Unsigned / signed half-word
+
+|`unsigned int` / `int`
+|Unsigned / signed word
+
+|`unsigned long` / `long`
+|Unsigned / signed double-word
+
+|`unsigned long long` / `long long`
+|Unsigned / signed double-word
+
+|pointer types
+|64-bit data pointer
+
+|`float`
+|Single precision (IEEE754)
+
+|`double`
+|Double precision (IEEE754)
+
+|`long double`
+|Quadruple precision (IEEE754)
+|===
+
+[[dm-ilp32]]
+.ILP32 data model (base ABI types: `ilp32d` `ilp32f` `ilp32s`)
+[%header,cols="^1,^1"]
+|===
+|Scalar type
+|Machine type
+
+|`bool` / `_Bool`
+|Unsigned byte
+
+|`unsigned char` / `char`
+|Unsigned / signed byte
+
+|`unsigned short` / `short`
+|Unsigned / signed half-word
+
+|`unsigned int` / `int`
+|Unsigned / signed word
+
+|`unsigned long` / `long`
+|Unsigned / signed word
+
+|`unsigned long long` / `long long`
+|Unsigned / signed double-word
+
+|pointer types
+|32-bit data pointer
+
+|`float`
+|Single precision (IEEE754)
+
+|`double`
+|Double precision (IEEE754)
+
+|`long double`
+|Quadruple precision (IEEE754)
+|===
+
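
To connect the LP64 mapping above with the register extension rules from the Fundamental types section, the sketch below computes the bit pattern that a 64-bit GAR would hold for a few narrow integral arguments. The function names are hypothetical, and the `unsigned int` case follows the exception this document states for the LP64D ABI; it is an illustration of the rules, not compiler output.

```c
#include <stdint.h>

/* Unsigned byte (`unsigned char`): zero-extended to 64 bits. */
static uint64_t gar_from_u8(uint8_t v)
{
    return (uint64_t)v;
}

/* Signed half-word (`short`): sign-extended to 64 bits. */
static uint64_t gar_from_i16(int16_t v)
{
    return (uint64_t)(int64_t)v;
}

/* Unsigned word (`unsigned int`) under the stated exception: held as the
 * sign extension of its 32-bit value. */
static uint64_t gar_from_u32_sext(uint32_t v)
{
    /* Flip bit 31, then subtract 2^31. This is well-defined unsigned
     * arithmetic that copies bit 31 into bits 32..63. */
    return ((uint64_t)v ^ UINT64_C(0x80000000)) - UINT64_C(0x80000000);
}
```

Under these assumptions, an `unsigned int` with bit 31 set, such as `0x80000000`, occupies the register as `0xFFFFFFFF80000000`, while an `unsigned char` of `0xFF` occupies it as `0x00000000000000FF`.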