Skip to content

feat: Add recommendation for detecting base64-encoded messages #154

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
92 changes: 92 additions & 0 deletions changes/2020-07-13_detect-base64-encoded-messages/change.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
[//]: # "Copyright Amazon.com Inc. or its affiliates. All Rights Reserved."
[//]: # "SPDX-License-Identifier: CC-BY-SA-4.0"

# Detect Base64-encoded Messages

## Affected Features

This serves as a reference of all features that this change affects.

| Feature |
| --------------------------------------- |
| [Decrypt](../../client-apis/decrypt.md) |

## Affected Specifications

This serves as a reference of all specification documents that this change affects.

| Specification |
| --------------------------------------- |
| [Decrypt](../../client-apis/decrypt.md) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the SHA be here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the example here, which only includes the SHA in "Affected Features" but not in "Affected Specifications": https://raw.githubusercontent.com/awslabs/aws-encryption-sdk-specification/master/changes/2020-05-13_remove-keyring-trace/change.md

@mattsb42-aws Do you think we need to use permalinks in both places? Whatever we decide should be written down in #137.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pattern that I've been following is to use permalinks when referencing features that are being removed or referencing specific lines in files. I think that a standard relative link is appropriate here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my comment on #137, switching the existing permalink back to relative.


## Affected Implementations

This serves as a reference for all implementations that this change affects.

| Language | Repository |
| ---------- | ------------------------------------------------------------------------------------- |
| C | [aws-encryption-sdk-c](https://github.com/aws/aws-encryption-sdk-c) |
| Java | [aws-encryption-sdk-java](https://github.com/aws/aws-encryption-sdk-java) |
| Javascript | [aws-encryption-sdk-javascript](https://github.com/aws/aws-encryption-sdk-javascript) |
| Python | [aws-encryption-sdk-python](https://github.com/aws/aws-encryption-sdk-python) |

## Definitions

### Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
in this document are to be interpreted as described in [RFC 2119](https://tools.ietf.org/html/rfc2119).

## Summary

When a decryption attempt is made on the Base64-encoding of a valid message,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a decryption attempt is made on the Base64-encoding of a valid message

Does this mean a user is trying to decrypt a Base64-string, the utf-8 encoding of a Base64-string, or both/either?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UTF-8 encoding of the string - decrypt accepts a message as a byte sequence after all.

FWIW, a Base64 string only uses ASCII characters so nearly all encodings will use the same byte values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decrypt accepts a message as a byte sequence after all

The spec must acknowledge dynamically typed languages though, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely! But it MUST interpret whatever input it is given as a sequence of bytes somehow. If it's idiomatic to accept a string because it quacks like a byte sequence, the operation has to be clear about how it will interpret it as bytes: whether it will use a particular hard-coded encoding like UTF-8, or rely on an encoding attached to the string value, etc.

In a particular language implementation, I would expect the documentation to cover this if it's not obvious by context.

implementations SHOULD detect this and provide a more specific error message.

## Motivation

The message format is a binary format
but users commonly attempt to decrypt the Base64 encoding of this data.
Because the first two bytes of the message format have a very limited set of possible values
(currently they are in fact fixed),
the first two bytes of the Base64 encoding of a valid message are also simple to recognize.

## Drawbacks

If the version value ever advances beyond the hex value `40`,
it will not be possible to catch this error for all supported versions
since `41` would then conflict with the Base64-encoded value for `1`.
This is acceptable given it is a best-effort guard,
and also extremely unlikely
given how rarely we intended to change the version.

## Security Implications

This change SHOULD NOT have any security implications.

## Operational Implications

This change should have positive operational impact
because it removes ambiguity on a common failure mode,
reducing potential for customer confusion
and increasing the likelihood that customers can
diagnose this root cause without needing to contact us.

## Guide-level Explanation

When a decryption attempt is made on the Base64-encoding of a valid message,
implementations SHOULD detect this and provide a more specific error message on a best-effort basis.

## Reference-level Explanation

Implementations SHOULD detect the first two bytes of the Base64 encoding of any supported message [versions](../data-format/message-header.md#version-1)
and [types](../data-format/message-header.md#type)
and fail with a more specific error message.
In particular, the hex values to detect for the current set of versions and types are:

| Version and type (hex) | Base64 encoding (ascii) | Base64 encoding (hex) |
| ---------------------- | ----------------------- | --------------------- |
| 01 80 | A Y ... | 41 59 ... |

Note that the bytes that follow the initial two in the Base64 encoding
partially depend on subsequent bytes in the binary message format
and hence are not as predictable.
Comment on lines +81 to +92
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than stating a set of static values here, we should state how to find those values and provide an example of what they might be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to at least keep the table so that each implementation doesn't have to go through that process themselves.

30 changes: 30 additions & 0 deletions client-apis/decrypt.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,16 @@

0.1.0-preview

### Changelog

- 0.2.0

- [Detect Base64-encoded Messages](../changes/2020-07-13_detect-base64-encoded-messages/change.md)

- 0.1.0-preview

- Initial record

## Implementations

- [C](https://github.com/aws/aws-encryption-sdk-c/blob/master/source/session_decrypt.c)
Expand Down Expand Up @@ -52,6 +62,26 @@ Each key in the encrypted data key list is an encrypted version of the single pl
The encryption context is the additional authenticated data that was used during encryption.
The algorithm suite ID refers to the algorithm suite used to encrypt the message and is required to decrypt the encrypted message.

#### Encrypted Message Format

The message format is a binary format, but it is a common mistake for users to attempt decryption on the Base64 encoding of this data instead.
Because the first two bytes of the message format have a very limited set of possible values
(currently they are in fact fixed),
the first two bytes of the Base64 encoding of a valid message are also simple to recognize.

To make diagnosing this mistake easier, implementations SHOULD detect the first two bytes of the Base64 encoding of any supported message [versions](../data-format/message-header.md#version-1)
and [types](../data-format/message-header.md#type)
and fail with a more specific error message.
In particular, the hex values to detect for the current set of versions and types are:

| Version and type (hex) | Base64 encoding (ascii) | Base64 encoding (hex) |
| ---------------------- | ----------------------- | --------------------- |
| 01 80 | A Y ... | 41 59 ... |

Note that the bytes that follow the initial two in the Base64 encoding
partially depend on subsequent bytes in the binary message format
and hence are not as predictable.

### Cryptographic Materials Manager

A CMM that implements the [CMM interface](../framework/cmm-interface.md).
Expand Down