Recommended Approach to Handle Serializing / Deserializing Bytes with Different Encodings? #635

Open
whatamithinking opened this issue May 21, 2024 · 2 comments

@whatamithinking

OpenAPI / JSON Schema allow several different encodings for bytes (base64, base32, base32hex, etc.).
apischema does not seem to support these out of the box and instead base64-encodes all bytes.
What is the recommended approach to supporting these?

I have a workaround (see below), but it might help others if the other encodings were built in, maybe in a way that avoids creating custom types as I have below.

import base64
import apischema


class Base32HexBytes(bytes): ...


class Base64Bytes(bytes): ...


class Base32Bytes(bytes): ...


class Base16Bytes(bytes): ...


class Base32HexStr(str): ...


class Base64Str(str): ...


class Base32Str(str): ...


class Base16Str(str): ...


def _serialize_base_32_hex_bytes(data: Base32HexBytes) -> str:
    return base64.b32hexencode(data).decode()


def _serialize_base_64_bytes(data: Base64Bytes) -> str:
    return base64.b64encode(data).decode()


def _serialize_base_32_bytes(data: Base32Bytes) -> str:
    return base64.b32encode(data).decode()


def _serialize_base_16_bytes(data: Base16Bytes) -> str:
    # Base16 must use b16encode (the original called b32encode here)
    return base64.b16encode(data).decode()


def _deserialize_base_32_hex_bytes(data: Base32HexStr) -> str:
    return base64.b32hexdecode(data).decode()


def _deserialize_base_64_bytes(data: Base64Str) -> str:
    return base64.b64decode(data).decode()


def _deserialize_base_32_bytes(data: Base32Str) -> str:
    return base64.b32decode(data).decode()


def _deserialize_base_16_bytes(data: Base16Str) -> str:
    return base64.b16decode(data).decode()


apischema.serializer(_serialize_base_32_hex_bytes, source=Base32HexBytes)
apischema.serializer(_serialize_base_64_bytes, source=Base64Bytes)
apischema.serializer(_serialize_base_32_bytes, source=Base32Bytes)
apischema.serializer(_serialize_base_16_bytes, source=Base16Bytes)

apischema.deserializer(
    apischema.conversions.Conversion(base64.b32hexdecode, str, Base32HexBytes)
)
apischema.deserializer(
    apischema.conversions.Conversion(_deserialize_base_32_hex_bytes, Base32HexStr, str)
)
apischema.deserializer(
    apischema.conversions.Conversion(base64.b64decode, str, Base64Bytes)
)
apischema.deserializer(
    apischema.conversions.Conversion(_deserialize_base_64_bytes, Base64Str, str)
)
apischema.deserializer(
    apischema.conversions.Conversion(base64.b32decode, str, Base32Bytes)
)
apischema.deserializer(
    apischema.conversions.Conversion(_deserialize_base_32_bytes, Base32Str, str)
)
apischema.deserializer(
    apischema.conversions.Conversion(base64.b16decode, str, Base16Bytes)
)
apischema.deserializer(
    apischema.conversions.Conversion(_deserialize_base_16_bytes, Base16Str, str)
)
@wyfo
Owner

wyfo commented May 21, 2024

Bytes have a default (de)serializer registered on purpose, to make things convenient for users, since base64 is AFAIK the "standard" way of doing it.

If your API uses another format, you can register a different (de)serializer and override the default one. That's the "recommended approach": leveraging apischema's adaptability.

Or you can also use different types, as you're doing in your code, if you're mixing encodings. But the purpose of apischema is not to embed these custom types in its own code; I don't want a pydantic-like mess with dozens of predefined conversions provided when 3 lines of code are enough, thanks to the conversion feature. apischema should only support standard types, with an appropriate default (de)serializer.

The fact that you opened the issue may highlight the lack of a dedicated example in the documentation, or maybe a FAQ. I will keep this issue open as a reminder.

@whatamithinking
Author

I think I understand where you are coming from, but it seems like a library with so much focus on OpenAPI should support the different encodings that the standard calls out, which are finite and supported by the base64 module in the Python standard library. Adding support for these encodings internally, one way or another, would allow things to "just work" from the perspective of newcomers and could make apischema more of the go-to for anyone solely focused on working with OpenAPI, as I am.
