Ponder Implicitly Removes Null Bytes from Event Arguments #1456

shrugs · 2025-01-20T22:32:41Z

Version

0.8.28

Current behavior

In indexing ENS, some labels with null bytes are emitted, but these labels cannot be stored in postgres (which is likely the reason ponder trims them to begin with). Unfortunately this removes information from the application-layer who might handle these cases differently. In ENS's case, it's important that 'vitalik\x00'.eth and vitalik.eth are differentiable.

In this specific case, we can use const hadNullBytes = labelhash(name) !== label as a proxy for null-byte detection, since the label argument in that event is just labelhash(name) and will be different for the original name value that includes the null byte.

More information is available here, where I debugged this behavior 👇

namehash/ensnode#36

Expected behavior

I believe the current behavior of ponder is extremely reasonable, but I worry that the developer integrating ponder loses this information and may accidentally index the wrong data. In our case we were treating these two names as identical, which messed up the resulting data (and would have perhaps enabled someone resolving vitalik.eth to receive the imposter's information. We assumed we were getting the raw argument data from ponder, intending to handle the null-byte ourselves, but were actually receiving strings with null bytes already removed.

It seems that the default behavior of postgres is to throw an error if a null byte is inserted, though I'm not sure why this didn't cause subgraph indexers from failing to index these names in the original postmortem.

https://ens.mirror.xyz/9GN77d-MqGvRypm72FcwgxlUnPSuKWhG3rWxddHhRwM

In any case, I think that the default behavior, where postgres throws an unrecoverable error, forcing the user to handle this case, is likely the correct path. Ponder can/should provide null-byte helper functions to make this easier on users, for example a containsNullCharacters or removeNullCharacters helper fn.

Steps to reproduce

You don't really need to reproduce this, but here's an offending tx: 0x535bf909ec258054cb03ab03498b4de0b365e1037a43fcdfbdcb28202382a9cd which you can access via ponder with the following config:

EthRegistrarControllerOld: {
      network: "mainnet",
      abi: EthRegistrarControllerOld,
      address: "0x283Af0B28c62C092C9727F1Ee09c02CA627EB7F5",
      startBlock: 16369364,
      endBlock: 16369364 + 1,
    },

and you can log the arguments provided to your NameRegistered event handler to see that the null byte is removed.

the relevant line in ponder is https://github.com/ponder-sh/ponder/blob/178bcbfd36b5eb418ee118c0e6b02a46c9c7ed6d/packages/core/src/sync/events.ts#L505C25-L505C45

Link to repository

https://github.com/namehash/ensnode

Anything else?

No response

The text was updated successfully, but these errors were encountered:

typedarray · 2025-01-20T23:13:41Z

Hi, thanks for opening. Was not aware of the historical context here, thanks for sharing.

We shipped this change ~2 weeks ago in 0.8.18, indeed by removing the null characters from decoded event arguments (event.args) before passing them to userland.

I can definitely see how this makes it difficult to achieve the desired behavior here for ENS, so I'm open to a different solution.

For context, we received several reports (both ENS and non-ENS apps) running into this issue because they were trying to store user-provided text in a postgres TEXT column. They had no idea where these characters came from. To put it another way - if we don't remove them, any ponder app that stores decoded event args containing user-provided bytes or string values is essentially vulnerable to the postgres insertion error crashing their app.

So, it's a bit of a dilemma. Couple ideas:

Add a global configuration option removeNullBytes which defaults to true
Remove the built-in sanitization but improve observability around this, eg log a warning when a decoded arg contains null bytes, and improve the postgres insertion error with a link to the docs
Change some behavior at the query builder / driver layer ?

My instinct is that ENS is the outlier here, so I struggle with any option that doesn't sanitize by default. That direction regresses to every ponder app having to manually sanitize event.args before writing to the database. If we can find examples of protocols other than ENS where null bytes need special treatment, that would be helpful.

Appreciate your take!

shrugs · 2025-01-20T23:50:41Z

It's a bit strict of me but I think the postgres error should be intentional behavior. It does mean user-education & burden on devs to sanitize user-provided strings, which is definitely an overhead.

I don't love a configuration flag, in a code-over-config way, it'd be more explicit for someone in userland to say .values({ comment: removeNullCharacters(event.args.comment) }) than a config laying about in a different file. It specifies which arguments are user-provided and that they explicitly no longer contain null bytes.

I agree that ENS is an outlier, but it does seem that storing user-provided strings is pretty universal. The unique aspect here is that in ENS these strings are hyper specific unique values and identity matters, versus something like the zora on-chain comments.

I'll admit that if postgres threw an error, it'd be a really shitty experience to not know about this and then have your index crash randomly in the future because of a user providing a malformed string. So ensuring that doesn't happen would be quite nice for DX — that's a point for the config option w/ defaults. That said the alternative (bad data because dev didn't know about null character removal) is a much worse type of bug (implicit, difficult to track down) that could occur in the future as well, depending on how the dev is using those user-provided values. Failing hard and fast seems preferable in this case.

But if ENS-style apps are a clear outlier, then the config + documentation seems like a reasonable choice.
A balanced option would be a WARNING log, like you mentioned, though in a long-running index those logs disappear quite quickly — any ideas for ensuring the dev becomes aware of the warning?

github-project-automation bot moved this to Todo in Ponder Roadmap Jan 20, 2025

github-project-automation bot added this to Ponder Roadmap Jan 20, 2025

shrugs mentioned this issue Jan 20, 2025

issues surfaced by diff tool at block 21M namehash/ensnode#36

Closed

5 tasks

lightwalker-eth mentioned this issue Jan 22, 2025

Index labels with null bytes namehash/ensnode#45

Open

holic mentioned this issue Feb 4, 2025

fix(store-sync): skip invalid utf-8 characters in strings before inserting into postgres latticexyz/mud#3562

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ponder Implicitly Removes Null Bytes from Event Arguments #1456

Ponder Implicitly Removes Null Bytes from Event Arguments #1456

shrugs commented Jan 20, 2025 •

edited

Loading

typedarray commented Jan 20, 2025

shrugs commented Jan 20, 2025

Ponder Implicitly Removes Null Bytes from Event Arguments #1456

Ponder Implicitly Removes Null Bytes from Event Arguments #1456

Comments

shrugs commented Jan 20, 2025 • edited Loading

Version

Current behavior

Expected behavior

Steps to reproduce

Link to repository

Anything else?

typedarray commented Jan 20, 2025

shrugs commented Jan 20, 2025

shrugs commented Jan 20, 2025 •

edited

Loading