Enable Automatic Schema Retrieval in DDL for Kafka Sources Using Confluent Schema Registry #692

Open
hazelnut-99 opened this issue Jul 22, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@hazelnut-99

When creating a new source connection through the web UI and selecting Avro as the data format with Confluent Schema Registry as the schema type, users can omit specifying the schema, as it is automatically loaded from the Confluent Schema Registry.

However, when a source is defined with DDL inside a pipeline, the schema currently has to be specified explicitly. For instance, the following DDL statement:

CREATE TABLE my_kafka_source WITH (
    'connector' = 'kafka',
    'avro.confluent_schema_registry' = 'true',
    'bootstrap_servers' = 'my_server',
    'schema_registry.endpoint' = 'my_endpoint',
    'type' = 'source',
    'topic' = 'my_topic',
    'bad_data' = 'drop',
    'source.offset' = 'latest',
    'source.read_mode' = 'read_committed',
    'sink.commit_mode' = 'at_least_once',
    'format' = 'avro'
);

leads to an error when subsequently trying to query the table:

SELECT my_field FROM my_kafka_source;

Error: Schema error: No field named my_field.
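
For comparison, here is a sketch of the form that the explicit-schema requirement implies: the same table with its column declared up front. The column type is an assumption for illustration only (`TEXT` is a guess; the real type of `my_field` lives in the registry), and the connection options are the placeholders from the statement above.

```sql
-- Workaround sketch: declare the fields by hand instead of relying on the registry.
-- ASSUMPTION: my_field is a string in the registered Avro schema; adjust the type to match.
CREATE TABLE my_kafka_source (
    my_field TEXT
) WITH (
    'connector' = 'kafka',
    'avro.confluent_schema_registry' = 'true',
    'bootstrap_servers' = 'my_server',
    'schema_registry.endpoint' = 'my_endpoint',
    'type' = 'source',
    'topic' = 'my_topic',
    'format' = 'avro'
);

-- With the column declared, the planner can resolve the field.
SELECT my_field FROM my_kafka_source;
```

The drawback is that the column list has to be kept in sync with the registered schema by hand, which is what this request aims to avoid.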

It would be nice if ad-hoc DDL inside a pipeline definition supported automatic schema retrieval from the Confluent Schema Registry, as the web UI already does.

@mwylde mwylde added the enhancement New feature or request label Jul 23, 2024
@mwylde
Member

mwylde commented Jul 23, 2024

Agreed, this would be a great feature. It's a bit tricky because the schema is needed for planning, so this would make SQL planning depend on the schema registry. The schema might also change, which means that the same query might plan today but fail tomorrow. The user also wouldn't get any feedback about what the schema is. I think these issues are surmountable, but they will require some design work.
