Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add transformer for JSON strings #7

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Conversation

tiurin
Copy link

@tiurin tiurin commented Jan 29, 2025

Motivation

Some services response contains JSON strings within JSON strings, e.g. Cloudwatch Logs for EventBridge Pipes events.

We do have first level JSON strings parsed but we don't go recursively into parsed JSONs. This leads to the need of extracting and parsing such payloads separately in the test because otherwise only regex transformers can be applied to such string. Ideally, snapshot library should take care of this. See corresponding discussion (thanks @gregfurman for inspiration!).

Changes

Starting to parse nested JSONs in common transformation flow is a breaking change for existing recorded snapshots. This PR proposes non-breaking approach by introducing a separate transformer that can be applied on demand. This draft shows a usage example: https://github.com/localstack/localstack-ext/pull/3971

@pytest.mark.parametrize(
"input_value,transformed_value",
[
pytest.param('{"a": "b"}', {"a": "b"}, id="simple_json_object"),
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: add test case for empty value

Copy link
Member

@joe4dev joe4dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for thinking about a backward-compatible way of adding such a neat feature, making lists of JSON-encoded snapshots much easier to handle 👏 .

As discussed, the main open part is handling the special edge case of empty strings. Otherwise, I added some suggestions and ideas for future iterations 📈

Idea 💡: One of the disadvantages of automatic parsing is that we are loosing structural parity information, we can:

  • a) highlight this in our snapshot testing guidelines
  • b) implement a technical solution at some point such as using some kind of metadata to store the raw value or structural information (e.g., key__META__: {"json": True})

Sidenotes (as discussed and partly related):

  • Clarify dev installation instructions with troubleshooting for the weird editable install issue (separate quick docs PR)
  • Improve Utility inheritance structure such that LocalStack Core only adds AWS-specific transformer utils (e.g., lambda_api) and utils such as sorting are implemented in the snapshot library

@pytest.mark.parametrize(
"input_value,transformed_value",
[
pytest.param('{"a": "b"}', {"a": "b"}, id="simple_json_object"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great parametrization coverage 💯
I like the inline id within pytest.param rather having a disconnected id list

class JsonStringTransformer:
"""
Parses JSON string at key.
Additionally, attempts to parse any JSON strings inside the parsed JSON
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs: Are there any limitations (e.g., we're losing parity information that the original response is actually JSON-encoded and should have some tests covering that)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could provide a simple example (e.g., the first test case) to clarify usage and foster easy adoption.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, highlighting the difference to the default behavior might be most relevant, clarifying that it adds extra coverage for JSON lists starting with [ such as [{},{}]

json_value = json.loads(input_data)
input_data = self._transform_nested(json_value)
except JSONDecodeError:
SNAPSHOT_LOGGER.debug(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good log level choice given nested parsing is best effort and we might encounter false positives here 👏

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we think it's worth logging here rather than continuing silently similar to the top-level behavior here:

pass # parsing error can be ignored

if k == self.key and isinstance(v, str):
try:
SNAPSHOT_LOGGER.debug(f"Replacing string value of {k} with parsed JSON")
json_value = json.loads(v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed, we're missing a test and fix for empty strings here "" to avoid verbose exception logging


key: str

def __init__(self, key: str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea: To facilitate an easy transition, it could be nice to have an easy way to apply this transformer globally, for example, when not providing a key.

sidenote: it ultimately comes down to separating anchors (e.g., JSON path selectors) from actual transformer functionality, but that's far out of scope :)

@joe4dev
Copy link
Member

joe4dev commented Feb 4, 2025

👍 I tested this change against the localstack-ext PR https://github.com/localstack/localstack-ext/pull/3971 with the following changes (not necessary anymore once the Utility re-use hierarchy is fixed) and can confirm it works as expected (apart from the minor empty string stack trace):

         snapshot.add_transformer(
-            snapshot.transform.sorting(
+            SortingTransformer(
                 "events",
                 lambda event: (event["message"]["executionId"], event["message"]["messageType"]),
             )
         )
-        snapshot.add_transformer(snapshot.transform.json_string("payload"))
+        # snapshot.add_transformer(snapshot.transform.json_string("payload"))
+        snapshot.add_transformer(JsonStringTransformer("payload"))

✅ There is no regression (identity check) using this transformer via snapshot.add_transformer(JsonStringTransformer("Document")) in the X-Ray test tests.aws.services.lambda_.test_lambda_xray.TestLambdaXrayIntegration.test_basic_xray_integration (X-Ray commonly uses nested JSON documents, but that's already handled by the default transformer)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants