Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DynamoDB CDC: Improve type mappings #25

Closed
amotl opened this issue Aug 19, 2024 · 11 comments
Closed

DynamoDB CDC: Improve type mappings #25

amotl opened this issue Aug 19, 2024 · 11 comments

Comments

@amotl
Copy link
Member

amotl commented Aug 19, 2024

Problem

Validation failed for data: Cannot cast expression `'[]'` of type `text` to `text_array`

Details

@wierdvanderhaar reported that when the CDC transformer processes an UPDATE event, generating an SQL statement like that

UPDATE "demo" SET data['items'] = '[]'

an exception is raised.

sqlalchemy.exc.ProgrammingError: (crate.client.exceptions.ProgrammingError) ColumnValidationException[Validation failed for data: Cannot cast expression `'[]'` of type `text` to `text_array`]
@amotl
Copy link
Member Author

amotl commented Aug 19, 2024

Hi. I guess this change might be the root cause of the issue?

It pretends to leave it to CrateDB to auto-cast values. Maybe that does not work for empty arrays with CrateDB?

@amotl
Copy link
Member Author

amotl commented Aug 19, 2024

With a demo table in CrateDB like that...

create table foo (data OBJECT(DYNAMIC));
insert into foo (data) values ('{"items": ["foo", "bar"]}');

This statement works,

UPDATE "foo" SET data['items'] = [];

while this statement doesn't. It raises the same error message like presented in the OP.

UPDATE "foo" SET data['items'] = '[]';
ColumnValidationException[Validation failed for data: Cannot cast expression `'[]'` of type `text` to `text_array`]

@amotl
Copy link
Member Author

amotl commented Aug 19, 2024

@wierdvanderhaar: Can I humbly ask you check if downgrading to commons-codec 0.0.6 could be an easy option for you?

Version 0.0.7 includes a change "DynamoDB: Fixed a syntax issue with text data type in UPDATE statements", which is probably responsible for this hiccup.

@amotl
Copy link
Member Author

amotl commented Aug 20, 2024

This patch includes a proposal to apply better type mapping for UPDATE statements.

@wierdvanderhaar
Copy link

@amotl Thank you for your effort and input. The downgrade to [https://commons-codec.readthedocs.io/changes.html#v0-0-6](common-codec 0.0.6) is not really an option here because that version 0.0.7 fixed an issue raised in this https://github.com/daq-tools/commons-codec/pull/19.

In general, we will need to do more to apply correct type mapping.

@amotl
Copy link
Member Author

amotl commented Aug 20, 2024

... carrying over two more reports by @hlcianfagna from #24 (review). Thanks!

Top-level arrays

We need a test case for the situation of a first level field being an array of objects, for instance

"testarrayobj": [{"field1":1},{"field2":"abc"}]

This cast is successful in CrateDB:

'[{"field1":1},{"field2":"abc"}]'::ARRAY(OBJECT)

String Set

Another situation is the one of a DynamoDB String Set for which CrateDB may have already got an ARRAY(TEXT) defined, but the actual payload may have a single string, for the INSERT or UPDATE to succed we need to wrap this in [ ].

@amotl amotl changed the title DynamoDB CDC: Validation failed for data: Cannot cast expression '[]' of type text to text_array DynamoDB CDC: Improve type mappings Aug 20, 2024
@amotl amotl transferred this issue from crate/cratedb-toolkit Aug 20, 2024
@amotl
Copy link
Member Author

amotl commented Aug 23, 2024

We need a test case for the situation of a first level field being an array of objects [...]

GH-34 may improve this situation already, including test cases. Do you agree?

Another situation is the one of a DynamoDB String Set for which CrateDB may have already got an ARRAY(TEXT) defined, but the actual payload may have a single string, for the INSERT or UPDATE to succed we need to wrap this in [ ].

That would be a serious anomaly to fix up, mostly because a stateless transformation routine will not know about the fact that CrateDB already may have something stored as a certain type.

If we do an inquiry upfront inserting, the situation may change, but a) this adds additional complexity and b) might produce edge case situations when NO data has been inserted beforehand, and may create new kinds of ambiguities. That doesn't mean we will not implement it as advised in order to gain DWIM, but I still wanted to mention it that it's not straight-forward.

If you have any idea how to solve it more elegantly, please let me know.

@amotl
Copy link
Member Author

amotl commented Aug 28, 2024

I think we are in a reasonable better position now, after improving the DynamoDB translator implementation on behalf of the latest releases. In this spirit, we may close this issue, and open different / more specific ones when applicable.

@wierdvanderhaar / @hlcianfagna: Please 👍 if you agree.

@amotl
Copy link
Member Author

amotl commented Aug 28, 2024

Just using the chance to convey a big learning for me, which you may also like, if you didn't know yet. It is about how to use parameterized queries together with SQL CASTs, which might not be possible using the more ad hoc value::TYPE syntax.

if isinstance(value, dict):
rval = f"CAST(:{column} AS OBJECT)"
elif isinstance(value, list) and value and isinstance(value[0], dict):
rval = f"CAST(:{column} AS OBJECT[])"

I picked it up from some MSSQL or Oracle page I guess and discussed with @seut that it would be worth to also bring it more highlighted to our documentation in one way or another. It is just a small yet important detail, I believe.

/cc @matkuliak

@amotl
Copy link
Member Author

amotl commented Aug 28, 2024

Closing this now. Please re-open when applicable.

@amotl amotl closed this as completed Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants