diff --git a/docs/CMakeLists.txt b/docs/CMakeLists.txt index 52dce7b4e9..b7391befb8 100644 --- a/docs/CMakeLists.txt +++ b/docs/CMakeLists.txt @@ -33,7 +33,6 @@ install(FILES manual/dynamic_specs.md manual/file_triggers.md manual/format.md - manual/hregions.md manual/index.md manual/large_files.md manual/lua.md diff --git a/docs/manual/format.md b/docs/manual/format.md index 0325c4123e..a0f37feef5 100644 --- a/docs/manual/format.md +++ b/docs/manual/format.md @@ -2,13 +2,13 @@ layout: default title: rpm.org - RPM Package format --- + # Package format -This document describes the RPM file format version 3.0, which is used -by RPM versions 2.1 and greater. The format is subject to change, and -you should not assume that this document is kept up to date with the -latest RPM code. That said, the 3.0 format should not change for -quite a while, and when it does, it will not be 3.0 anymore :-). +This document describes the RPM file format version 4.0. The format +is subject to change, and you should not assume that this document is +kept up to date with the latest RPM code. With that said, the basic +principles have not changed and are not likely to change significantly over time. \warning In any case, THE PROPER WAY TO ACCESS THESE STRUCTURES IS THROUGH THE RPM LIBRARY!! @@ -23,17 +23,20 @@ package file is divided in 4 logical sections: . Payload -- compressed archive of the file(s) in the package (aka "payload") ``` -All 2 and 4 byte "integer" quantities (int16 and int32) are stored in -network byte order. When data is presented, the first number is the -byte number, or address, in hex, followed by the byte values in hex, -followed by character "translations" (where appropriate). +All multi-byte "integer" quantities (int16, int32 and int64) are stored +in network byte order (big-endian). When data is presented, the first +number is the byte number, or address, in hex, followed by the byte +values in hex, followed by character "translations" (where appropriate).
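The big-endian convention described above can be sketched with a few C helpers. They decode int16, int32 and int64 values one byte at a time, so they work on any host endianness and any buffer alignment. They are then applied to the first ten bytes of the example Lead from rpm-2.1.2-1.i386.rpm. The names `be16`, `be32`, `be64`, `struct lead_prefix` and `parse_lead_prefix` are hypothetical illustrations, not part of the RPM API. As the warning above says, real code should go through the RPM library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode big-endian (network byte order) integers byte by byte; this
 * is independent of host endianness and buffer alignment. */
static inline uint16_t be16(const unsigned char *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

static inline uint32_t be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static inline uint64_t be64(const unsigned char *p) {
    return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/* Hypothetical decoded view of Lead bytes 0-9. */
struct lead_prefix {
    uint32_t magic;      /* bytes 0-3: 0xedabeedb identifies an RPM file */
    unsigned ver_major;  /* byte 4: always 3 for legacy reasons */
    unsigned ver_minor;  /* byte 5: always 0 */
    uint16_t type;       /* bytes 6-7: 0 == binary, 1 == source */
    uint16_t archnum;    /* bytes 8-9: e.g. 1 == i386 */
};

/* p must point to at least 10 readable bytes. */
static inline struct lead_prefix parse_lead_prefix(const unsigned char *p) {
    assert(p != NULL);
    struct lead_prefix l;
    l.magic = be32(p);
    l.ver_major = p[4];
    l.ver_minor = p[5];
    l.type = be16(p + 6);
    l.archnum = be16(p + 8);
    return l;
}
```

Applied to the bytes `ed ab ee db 03 00 00 00 00 01`, `parse_lead_prefix()` yields magic 0xedabeedb, version 3.0, type 0 (binary) and architecture 1 (i386).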
## Lead The Lead is basically for file(1). All the information contained in the Lead is duplicated or superceded by information in the Header. Much of the info in the Lead was used in old versions of RPM but is -now ignored. The Lead is stored as a C structure: +now ignored. The details here are left for historical reasons, but +current and future development should use the Header structure instead. + +The Lead is stored as a C structure: \code struct rpmlead { @@ -48,31 +51,31 @@ struct rpmlead { }; \endcode -and is illustrated with one pulled from the rpm-2.1.2-1.i386.rpm -package: +and is illustrated with one pulled from the rpm-2.1.2-1.i386.rpm package: ``` 00000000: ed ab ee db 03 00 00 00 ``` -The first 4 bytes (0-3) are "magic" used to uniquely identify an RPM -package. It is used by RPM and file(1). The next two bytes (4, 5) -are int8 quantities denoting the "major" and "minor" RPM file format -version. This package is in 3.0 format. The following 2 bytes (6-7) -form an int16 which indicates the package type. As of this writing -there are only two types: 0 == binary, 1 == source. +The first 4 bytes (0-3) are the "magic" number used to uniquely +identify a file as an RPM package. It is used by RPM and file(1). +The next two bytes (4, 5) are int8 quantities denoting the "major" +and "minor" RPM file format version. For legacy reasons, this +version is always "3.0" (major version "3", minor version "0"), even +with packages built by RPM 4.0+ (referred to as RPM v4 packages). The +following 2 bytes (6-7) form an int16 which indicates the package type. +As of this writing there are only two types: 0 == binary, 1 == source. ``` 00000008: 00 01 72 70 6d 2d 32 2e ..rpm-2. ``` The next two bytes (8-9) form an int16 that indicates the architecture -the package was built for. While this is used by file(1), the true -architecture is stored as a string in the Header. See, lib/misc.c for -a list of architecture->int16 translations. In this case, 1 == i386. 
-Starting with byte 10 and extending to byte 75, are 65 characters and -a null byte which contain the familiar "name-version-release" of the -package, padded with null (0) bytes. +that the package was built for. While this is used by file(1), the +true architecture is stored as a string in the Header. In this case, +1 == i386. Starting with byte 10 and extending to byte 75, are 65 +characters and a null byte which contain the familiar +"name-version-release" of the package, padded with null (0) bytes. ``` 00000010: 31 2e 32 2d 31 00 00 00 1.2-1... @@ -88,85 +91,74 @@ package, padded with null (0) bytes. Bytes 76-77 ("00 01" above) form an int16 that indicates the OS the package was built for. In this case, 1 == Linux. The next 2 bytes (78-79) form an int16 that indicates the signature type. This tells -RPM what to expect in the Signature. For version 3.0 packages, this -is 5, which indicates the new "Header-style" signatures. +RPM what to expect in the Signature. This is generally expected to +be 5, which indicates the use of "Header-style" signatures. ``` 00000050: 04 00 00 00 68 e6 ff bf ........ 00000058: ab ad 00 08 3c eb ff bf ........ ``` -The remaining 16 bytes (80-95) are currently unused and are reserved -for future expansion. +The remaining 16 bytes (80-95) are unused. ## Signature -A 3.0 format signature (denoted by signature type 5 in the Lead), uses -the same structure as the Header. For historical reasons, this -structure is called a "header structure", which can be confusing since -it is used for both the Header and the Signature. The details of the -header structure are given below, and you'll want to read them so the -rest of this makes sense. The tags for the Signature are defined in -lib/signature.h. - -The Signature can contain multiple signatures, of different types. 
-There are currently only three types, each with its own tag in the -header structure: - -``` - Name Tag Header Type - ---- ---- ----------- - SIZE 1000 INT_32 - MD5 1001 BIN - PGP 1002 BIN -``` - -The MD5 signature is 16 bytes, and the PGP signature varies with -the size of the PGP key used to sign the package. - -As of RPM 2.1, all packages carry at least SIZE and MD5 signatures, -and the Signature section is padded to a multiple of 8 bytes. +"Header-style" signatures use the same structure as the Header. For +historical reasons, this structure is called a "header structure", +which can be confusing since it is used for both the Header and the +Signature. The details of the header structure are given below, and +you'll want to read them so the rest of this makes sense. The tags +for the Signature are defined in include/rpm/rpmtag.h. + +The Signature can contain multiple different types of signatures, +stored under unique tags (just like the Header). Details about these +tags and the information they store can be found [here](signatures_digests.md). + +RPM v4 packages are expected to contain at least one of the SHA1HEADER +or SHA256HEADER tags, providing a cryptographic digest of the main +header, and may contain one or both of the PAYLOADDIGEST and +PAYLOADDIGESTALT tags, providing a cryptographic digest of the package +payload in the compressed and uncompressed forms, respectively. + +If the package has been cryptographically signed using OpenPGP, an +RSAHEADER or DSAHEADER tag ought to be present, which contains an +OpenPGP signature of the package header. Which tag is present +depends on which of the two (supported) OpenPGP algorithms was used +at signing time. Using a key based upon the RSA algorithm to sign +the package will result in the signature being stored in the +RSAHEADER tag, whereas the use of the EdDSA (ed25519) algorithm will +use the DSAHEADER tag instead. 
Older packages may use the +now-considered-obsolete DSA algorithm, and in that case the signature +would be stored in the DSAHEADER tag. + +As the package header itself contains a checksum of the payload (as +of RPM 4.14+), the header signature is sufficient to establish +cryptographic provenance of the package. + +Other signature tags which may be present are considered legacy and +their use is discouraged if a more modern option is available. ## Header The Header contains all the information about a package: name, version, file list, etc. It uses the same "header structure" as the -Signature, which is described in detail below. A complete list of the -tags for the Header would take too much space to list here, and the -list grows fairly frequently. For the complete list see lib/rpmlib.h -in the RPM sources. - -## Payload - -The Payload is currently a cpio archive, gzipped by default. The cpio archive -type used is SVR4 with a CRC checksum. - -As cpio is limited to 4 GB (32 bit unsigned) file sizes RPM since -version 4.12 uses a stripped down version of cpio for packages with -files > 4 GB. This format uses `07070X` as magic bytes and the file -header otherwise only contains the index number of the file in the RPM -header as 8 byte hex string. The file metadata that is normally found -in a cpio file header - including the file name - is completely -omitted as it is stored in the RPM header already. - -To use a different compression method when building new packages with -`rpmbuild(8)`, define the `%_binary_payload` or `%_source_payload` macros for -the binary or source packages, respectively. These macros accept an -[RPM IO mode string](https://ftp.osuosl.org/pub/rpm/api/4.17.0/group__rpmio.html#example-mode-strings) -(only `w` mode). +Signature, which is described in further detail below. A complete +list of the tags for the Header would take too much space to list +here, and the list grows fairly frequently. 
For the complete list +see include/rpm/rpmtag.h in the RPM sources. -## The Header Structure +### The Header Structure The header structure is a little complicated, but actually performs a -very simple function. It acts almost like a small database in that it -allows you to store and retrieve arbitrary data with a key called a -"tag". When a header structure is written to disk, the data is -written in network byte order, and when it is read from disk, is is -converted to host byte order. +very simple function. It acts almost like a small database in that +it allows you to store and retrieve arbitrary data with a key called +a "tag". When a header structure is written to disk, the data is +written in network byte order (big-endian), and when it is read from +disk, it is converted to host byte order. -Along with the tag and the data, a data "type" is stored, which indicates, -obviously, the type of the data associated with the tag. There are -currently 9 types: +Along with the tag and the data, a data "type" is stored, which +indicates, obviously, the type of the data associated with the tag. +There are currently 9 types: ``` Type Number @@ -178,7 +170,7 @@ currently 9 types: INT32 4 INT64 5 STRING 6 - BIN 7 + BIN 7 STRING_ARRAY 8 I18NSTRING_TYPE 9 ``` @@ -264,3 +256,101 @@ could start at byte 589, byte that is an improper boundary for an INT32. As a result, 3 null bytes are inserted and the date for the SIZE actually starts at byte 592: "00 09 9b 31", which is 629553). +### Immutable header regions + +One useful feature of RPM is the ability to preserve the original +header from a package, so that metadata can be verified separately +from the payload, e.g. using signatures saved in the rpm database. +This ability was added in RPM 4.0.1 with the concept of immutable +header regions. + +A short description of the implementation is as follows.
+As described above, an rpm header has three sections: +``` + 1) intro (# entries in index, # bytes of data) + 2) index 16 byte entries, one per tag, big endian + 3) data tag values, properly aligned, big endian +``` +Representing sections in the header (ignoring the intro) with +``` + A,B,C index entries sorted by tag number + a,b,c variable length entry data + | boundary between index/data +``` +a header with 3 tag/value pairs (A,a) can be represented something like +``` + ABC|abc +``` +The change is to introduce a new tag that keeps track of a contiguous +region (i.e. the original header). Representing the boundaries with +square/angle brackets, an "immutable region" in the header thus becomes +``` + [ABC|abc] +``` +or more generally (spaces added for clarity) +``` + [ABC> QRS | [DEF> QRS | > QRS | < QRS XYZ | QRS D | 4 GB. This format uses `07070X` as magic bytes and the file +header otherwise only contains the index number of the file in the +RPM header as 8 byte hex string. The file metadata that is normally +found in a cpio file header - including the file name - is completely +omitted as it is stored in the RPM header already. + +To use a different compression method when building new packages with +`rpmbuild(8)`, define the `%_binary_payload` or `%_source_payload` +macros for the binary or source packages, respectively. These macros +accept an [RPM IO mode string](https://ftp.osuosl.org/pub/rpm/api/4.17.0/group__rpmio.html#example-mode-strings) +(only `w` mode). diff --git a/docs/manual/hregions.md b/docs/manual/hregions.md deleted file mode 100644 index 681f4aa135..0000000000 --- a/docs/manual/hregions.md +++ /dev/null @@ -1,91 +0,0 @@ ---- -layout: default -title: rpm.org - Immutable header regions ---- -# Immutable header regions in rpm-4.0.1 and later - -The header data structure has changed in rpm-4.0.[12] to preserve the -original header from a package. 
The goal is to keep the original -header intact so that metadata can be verified separately from the -payload by the RHN up2date client and by the rpm command line verify -mode using signatures saved in the rpm database. I believe the change -is entirely forward and backward compatible, and will not require -any artifacts like changing the version number of packaging or -adding an "rpmlib(...)" tracking dependency. We'll see ... - -Here's a short description of the change. An rpm header has three sections: -``` - 1) intro (# entries in index, # bytes of data) - 2) index 16 byte entries, one per tag, big endian - 3) data tag values, properly aligned, big endian -``` - -Representing sections in the header (ignoring the intro) with -``` - A,B,C index entries sorted by tag number - a,b,c variable length entry data - | boundary between index/data -``` -a header with 3 tag/value pairs (A,a) can be represented something like -``` - ABC|abc -``` - -The change is to introduce a new tag that keeps track of a contiguous -region (i.e. the original header). Representing the boundaries with -square/angle brackets, an "immutable region" in the header thus becomes -``` - [ABC|abc] -``` -or more generally (spaces added for clarity) -``` - [ABC> QRS | [DEF> QRS | > QRS | < QRS XYZ | QRS D | 4GB. +Dsaheader | 267 | bin | OpenPGP DSA or EdDSA signature of the header (if thus signed) +Rsaheader | 268 | bin | OpenPGP RSA signature of the header (if thus signed). +Sha256header | 273 | string | SHA256 digest of the header. Payloaddigest | 5092 | string array | Cryptographic digest of the compressed payload. Payloaddigestalgo | 5093 | int32 | ID of the payload digest algorithm. Payloaddigestalt | 5097 | string array | Cryptographic digest of the uncompressed payload. -Rsaheader | 268 | bin | OpenPGP RSA signature of the header (if thus signed). -Sha1header | 269 | string | SHA1 digest of the header. -Sha256header | 273 | string | SHA256 digest of the header. 
-Siggpg | 262 | bin | OpenPGP DSA signature of the header+payload (if thus signed). -Sigmd5 | 261 | bin | MD5 digest of the header+payload. -Sigpgp | 259 | bin | OpenPGP RSA signature of the header+payload (if thus signed). -Sigsize | 257 | int32 | Header+payload size. +Sigsize | 257 | int32 | Deprecated: Header+payload size. +Sigpgp | 259 | bin | Deprecated: OpenPGP RSA signature of the header+payload (if thus signed). +Sigmd5 | 261 | bin | Deprecated: MD5 digest of the header+payload. +Siggpg | 262 | bin | Deprecated: OpenPGP DSA signature of the header+payload (if thus signed). +Sha1header | 269 | string | Deprecated: SHA1 digest of the header. +Longsigsize | 270 | int64 | Deprecated: Header+payload size if > 4GB. ## Installed package headers only diff --git a/include/rpm/rpmtag.h b/include/rpm/rpmtag.h index dec9c9244c..65c295e327 100644 --- a/include/rpm/rpmtag.h +++ b/include/rpm/rpmtag.h @@ -435,7 +435,7 @@ typedef enum rpmSigTag_e { RPMSIGTAG_RESERVEDSPACE = 1008,/*!< internal space reserved for signatures */ RPMSIGTAG_BADSHA1_1 = RPMTAG_BADSHA1_1, /*!< internal Broken SHA1, take 1. */ RPMSIGTAG_BADSHA1_2 = RPMTAG_BADSHA1_2, /*!< internal Broken SHA1, take 2. */ - RPMSIGTAG_DSA = RPMTAG_DSAHEADER, /*!< internal DSA header signature. */ + RPMSIGTAG_DSA = RPMTAG_DSAHEADER, /*!< internal DSA or EdDSA header signature. */ RPMSIGTAG_RSA = RPMTAG_RSAHEADER, /*!< internal RSA header signature. */ RPMSIGTAG_SHA1 = RPMTAG_SHA1HEADER, /*!< internal sha1 header digest. */ RPMSIGTAG_LONGSIZE = RPMTAG_LONGSIGSIZE, /*!< internal Header+Payload size (64bit) in bytes. */
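The header structure that stores the signature and header tags above can be illustrated with a minimal C sketch. It decodes one 16-byte index entry, which holds four big-endian int32 fields: tag, type, offset into the data section, and count. The names `hdr_be32`, `struct hdr_entry` and `hdr_read_entry` are hypothetical and are not the RPM library API. The sketch assumes the blob starts at the intro (index entry count, then data length). In a package file, each header structure is preceded by an 8-byte magic (`8e ad e8 01 00 00 00 00`), which is assumed to have been skipped already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header structure integers are stored big-endian (network byte order). */
static inline uint32_t hdr_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decoded form of one 16-byte index entry. */
struct hdr_entry {
    uint32_t tag;    /* e.g. 268 == RSAHEADER */
    uint32_t type;   /* e.g. 4 == INT32, 6 == STRING, 7 == BIN */
    int32_t offset;  /* byte offset of the value within the data section */
    uint32_t count;  /* number of values of that type */
};

/* Hypothetical helper: decode index entry i and point *data at the data
 * section, which directly follows the index. Returns 0 on success, or -1
 * if i is out of range or the blob is too short for the sizes the intro
 * declares. */
static inline int hdr_read_entry(const unsigned char *blob, size_t len,
                                 uint32_t i, struct hdr_entry *out,
                                 const unsigned char **data) {
    assert(blob != NULL && out != NULL && data != NULL);
    if (len < 8)
        return -1;
    uint32_t il = hdr_be32(blob);      /* number of index entries */
    uint32_t dl = hdr_be32(blob + 4);  /* number of data bytes */
    if (i >= il || (len - 8) / 16 < il || len - 8 - 16 * (size_t)il < dl)
        return -1;
    const unsigned char *p = blob + 8 + 16 * (size_t)i;
    out->tag = hdr_be32(p);
    out->type = hdr_be32(p + 4);
    out->offset = (int32_t)hdr_be32(p + 8);
    out->count = hdr_be32(p + 12);
    *data = blob + 8 + 16 * (size_t)il;
    return 0;
}
```

For a one-entry blob whose single INT32 value is `00 09 9b 31`, reading four bytes at `data + offset` gives 629553. This matches the SIZE example in format.md. A real reader would also check the alignment and bounds of each entry's value, which this sketch leaves out.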