This repository has been archived by the owner on Sep 23, 2024. It is now read-only.

schema is an invalid JSON Schema instance #94

Open
Hatko opened this issue May 24, 2021 · 7 comments
Labels
bug Something isn't working

Comments

@Hatko

Hatko commented May 24, 2021

I'm not sure if this issue should be reported here, but any help will be highly appreciated.

Describe the bug
When I run Meltano with tap-postgres against tables from my DB, I get this error:

CRITICAL ('`schema` is an invalid JSON Schema instance: {"type": "SCHEMA", "stream": "public-blogPosts", "schema": {"type": "object", "properties": {"slug": {"type": ["string"], "maxLength": 255}, "title": {"type": ["null", "string"], "maxLength": 255}, "body": {"type": ["null", "string"]}, "tags": {"type": ["null", "array"], "items": {"$ref": "#/definitions/sdc_recursive_string_array"}}, "image": {"type": ["null", "string"]}, "created_at": {"type": ["null", "string"], "format": "date-time"}, "updated_at": {"type": ["null", "string"], "format": "date-time"}}, "definitions": {"sdc_recursive_integer_array": {"type": ["null", "integer", "array"], "items": {"$ref": "#/definitions/sdc_recursive_integer_array"}}, "sdc_recursive_number_array": {"type": ["null", "number", "array"], "items": {"$ref": "#/definitions/sdc_recursive_number_array"}}, "sdc_recursive_string_array": {"type": ["null", "string", "array"], "items": {"$ref": "#/definitions/sdc_recursive_string_array"}}, "sdc_recursive_boolean_array": {"type": ["null", "boolean", "array"], "items": {"$ref": "#/definitions/sdc_recursive_boolean_array"}}, "sdc_recursive_timestamp_array": {"type": ["null", "string", "array"], "format": "date-time", "items": {"$ref": "#/definitions/sdc_recursive_timestamp_array"}}, "sdc_recursive_object_array": {"type": ["null", "object", "array"], "items": {"$ref": "#/definitions/sdc_recursive_object_array"}}}}, "key_properties": ["slug"], "bookmark_properties": []}\n', '`$ref` path "{\'type\': [\'null\', \'string\', \'array\'], \'items\': {\'$ref\': \'#/definitions/sdc_recursive_string_array\'}}" is recursive')

To Reproduce
Steps to reproduce the behavior:

  1. Follow the tutorial from meltano
  2. Run the pipeline: meltano elt tap-postgres target-postgres --job_id=gitlab-to-postgres

Expected behavior
Data is imported

Your environment

  • Version of tap: [?]
  • Version of python [e.g. 3.7.9]

Hatko added the bug label on May 24, 2021
@Hatko
Author

Hatko commented May 24, 2021

After further investigation, it seems that I get this error for tables that have columns of an array type.
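
To illustrate why array columns trigger the "is recursive" message, here is a minimal sketch that walks a schema shaped like the one in the error output and reports which entries under "definitions" refer back to themselves through "$ref". The helper names `find_refs` and `recursive_definitions` are hypothetical and are not part of tap-postgres, Meltano, or any target; the sketch only assumes local references of the form "#/definitions/<name>".

```python
from typing import Any, Dict, List, Set

PREFIX = "#/definitions/"


def find_refs(node: Any) -> List[str]:
    """Collect every "$ref" target found anywhere inside a schema fragment."""
    refs: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                refs.append(value)
            else:
                refs.extend(find_refs(value))
    elif isinstance(node, list):
        for item in node:
            refs.extend(find_refs(item))
    return refs


def recursive_definitions(schema: Dict[str, Any]) -> Set[str]:
    """Return the names of definitions that can reach themselves via $ref."""
    definitions = schema.get("definitions", {})

    def reaches(start: str, current: str, seen: Set[str]) -> bool:
        for ref in find_refs(definitions.get(current, {})):
            if not ref.startswith(PREFIX):
                continue  # only local definition references are considered
            name = ref[len(PREFIX):]
            if name == start:
                return True
            if name not in seen:
                seen.add(name)
                if reaches(start, name, seen):
                    return True
        return False

    return {name for name in definitions if reaches(name, name, {name})}


# Trimmed-down version of the schema from the error message above.
schema = {
    "type": "object",
    "properties": {
        "slug": {"type": ["string"], "maxLength": 255},
        "tags": {
            "type": ["null", "array"],
            "items": {"$ref": "#/definitions/sdc_recursive_string_array"},
        },
    },
    "definitions": {
        "sdc_recursive_string_array": {
            "type": ["null", "string", "array"],
            "items": {"$ref": "#/definitions/sdc_recursive_string_array"},
        }
    },
}

print(recursive_definitions(schema))
```

A consumer that tries to inline every "$ref" eagerly would never finish expanding such a definition, which is presumably why it is rejected instead.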

@aaronsteers

aaronsteers commented May 25, 2021

@Hatko and Pipelinewise team: this issue has come up several times recently while debugging in our Slack.
Instead of emitting...

{"type": ["null", "array"], "items": {"$ref": "#/definitions/sdc_recursive_string_array"}},

could this be changed to...

{"type": ["null", "array"], "items": {"type": "string"}}

?

Conversation link in the Meltano Slack: https://meltano.slack.com/archives/C01TCRBBJD7/p1621970218073800
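
As a rough sketch of what the suggested change would look like if applied to a schema after the fact (for example in a step between tap and target), the snippet below replaces each recursive array reference with a flat element schema. This is not a change that exists in tap-postgres; the function name `inline_array_refs` and the `ELEMENT_SCHEMAS` mapping are assumptions made for illustration, built from the definition names that appear in the error output.

```python
import copy
from typing import Any, Dict

PREFIX = "#/definitions/"

# Assumed mapping: definition name from the error output -> flat element schema.
ELEMENT_SCHEMAS: Dict[str, Dict[str, Any]] = {
    "sdc_recursive_integer_array": {"type": "integer"},
    "sdc_recursive_number_array": {"type": "number"},
    "sdc_recursive_string_array": {"type": "string"},
    "sdc_recursive_boolean_array": {"type": "boolean"},
    "sdc_recursive_timestamp_array": {"type": "string", "format": "date-time"},
    "sdc_recursive_object_array": {"type": "object"},
}


def inline_array_refs(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the schema with recursive array refs replaced by flat types."""
    result = copy.deepcopy(schema)
    for prop in result.get("properties", {}).values():
        items = prop.get("items") if isinstance(prop, dict) else None
        if not isinstance(items, dict):
            continue
        ref = items.get("$ref", "")
        name = ref[len(PREFIX):] if ref.startswith(PREFIX) else ""
        if name in ELEMENT_SCHEMAS:
            prop["items"] = dict(ELEMENT_SCHEMAS[name])
    # Drop the definitions that were inlined; keep any unrelated ones.
    definitions = result.get("definitions", {})
    for name in ELEMENT_SCHEMAS:
        definitions.pop(name, None)
    if not definitions:
        result.pop("definitions", None)
    return result


schema = {
    "type": "object",
    "properties": {
        "slug": {"type": ["string"], "maxLength": 255},
        "tags": {
            "type": ["null", "array"],
            "items": {"$ref": "#/definitions/sdc_recursive_string_array"},
        },
    },
    "definitions": {
        "sdc_recursive_string_array": {
            "type": ["null", "string", "array"],
            "items": {"$ref": "#/definitions/sdc_recursive_string_array"},
        }
    },
}

flat = inline_array_refs(schema)
print(flat["properties"]["tags"])
```

One trade-off to keep in mind: with a flat element type, values from multi-dimensional array columns (arrays of arrays) would no longer validate against the schema, whereas the recursive definition allows them.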

@Samira-El
Contributor

Hey folks, can you provide the DDL for a table that's causing the issue?

@socketbox

socketbox commented Jun 29, 2021

Hi there!

I've attached an example table that cannot be parsed by the AdSwerve BigQuery target/loader.

Here's the error:

tap-postgres--telemetry (out)    | {"type": "RECORD", "stream": "public-nutritionfacts_channelstatistics", "record": {"anon_visitors_with_logs": null, "birth_year_stats_learners_id": null, "birth_year_stats_non_learners_id": null, "channel_id": "a9b25ac981", "gender_stats_learners_id": null, "gender_stats_non_learners_id": null, "id": 196, "pingback_id": 88456, "popular_counts": [148, 87, 71, 68, 67, 62, 60, 59, 58, 56, 53, 52, 51, 51, 51, 51, 50, 49, 49, 48, 48, 48, 47, 47, 46, 46, 46, 46, 46, 45, 45, 45, 45, 44, 44, 44, 43, 43, 43, 43, 43, 43, 43, 43, 41, 41, 41, 41, 41, 41], "popular_ids": ["68770fb21f", "c5f86f9eeb", "de25ff7354", "ab4080f3ae", "bfe54c8616", "f75cebb06c", "ccad5c6e3b", "164ef090e3", "aff3c12082", "c354384854", "2d64c10c14", "106160699a", "1d2b921ec6", "6e0b3d41d9", "a08a37526d", "bf41e2189d", "ae498332d6", "524ae4fbb4", "8dede44b0e", "526cd2b76a", "8b9955d8d4", "ddcb669f6a", "4b690798fc", "bfcac40564", "ae67a9b8ec", "bae9e238a4", "c3f4a1fdf0", "c43a01a3a9", "c73f23f556", "19eeb24a4c", "570a45419d", "8bd746f710", "d74f658aea", "1a70bc4927", "200a333f5a", "ff75462038", "04e8310d53", "20e81e3d5c", "535654245c", "8248b708ad", "8379ddb0ab", "866d5f190b", "867b9d82b1", "fc20e9e999", "3ccef806f0", "4b31ac99b3", "58b302ba4b", "61ea0cb2cc", "b0c3656b26", "f63ffee36c"], "sess_anon_count": 482, "sess_anon_count_no_visitor_id": null, "sess_anon_time": 1094, "sess_kinds": {"audio": 945, "document": 5834}, "sess_user_count": 6297, "sess_user_time": 23319, "statspingback_id": null, "storage": 174, "summ_complete": 1260, "summ_started": 1845, "updated": "2018-08-22T00:00:00+00:00", "users_with_logs": null, "version": 11}, "version": 1624940197788, "time_extracted": "2021-06-29T04:16:37.788165Z"}
meltano                          | DEBUG Deleted configuration at /project/.meltano/run/elt/telemetry-el/e92e3c6a-5e89-4965-96d1-96b36180f3e8/target.config.json
meltano                          | DEBUG Deleted configuration at /project/.meltano/run/elt/telemetry-el/e92e3c6a-5e89-4965-96d1-96b36180f3e8/tap.config.json
meltano                          | ERROR Loading failed (2): CRITICAL ['Traceback (most recent call last):\n', '  File "/project/.meltano/loaders/target-bigquery--telemetry/venv/lib/python3.7/site-packages/target_bigquery/__init__.py", line 93, in main\n    for state in state_iterator:\n', '  File "/project/.meltano/loaders/target-bigquery--telemetry/venv/lib/python3.7/site-packages/target_bigquery/process.py", line 40, in process\n    for s in handler.handle_record_message(msg):\n', '  File "/project/.meltano/loaders/target-bigquery--telemetry/venv/lib/python3.7/site-packages/target_bigquery/processhandler.py", line 110, in handle_record_message\n    new_rec = filter_by_schema(schema, msg.record)\n', '  File "/project/.meltano/loaders/target-bigquery--telemetry/venv/lib/python3.7/site-packages/target_bigquery/schema.py", line 75, in filter\n    record[key])  # adswerve fix to match schema field name\n', '  File "/project/.meltano/loaders/target-bigquery--telemetry/venv/lib/python3.7/site-packages/target_bigquery/schema.py", line 84, in filter\n    prop_type, _ = get_type(props)\n', '  File "/project/.meltano/loaders/target-bigquery--telemetry/venv/lib/python3.7/site-packages/target_bigquery/schema.py", line 23, in get_type\n    f"\'type\' or \'anyOf\' are required fields in property: {property}"\n', "ValueError: 'type' or 'anyOf' are required fields in property: {'$ref': '#/definitions/sdc_recursive_string_array'}\n"]
meltano                          | DEBUG ELT could not be completed: Loader failed
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/meltano/cli/elt.py", line 237, in _run_elt
    await _run_extract_load(elt_context, output_logger)
  File "/usr/local/lib/python3.7/site-packages/meltano/cli/elt.py", line 280, in _run_extract_load
    loader_out=loader_out_writer,
  File "/usr/local/lib/python3.7/site-packages/meltano/core/runner/singer.py", line 263, in run
    loader_out=loader_out,
  File "/usr/local/lib/python3.7/site-packages/meltano/core/runner/singer.py", line 233, in invoke
    raise RunnerError("Loader failed", {PluginType.LOADERS: target_code})
meltano.core.runner.RunnerError: Loader failed

ddl.sql.txt
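
The ValueError in the log above is raised because the property handed to `get_type` contains only a "$ref" and no "type" or "anyOf". A minimal sketch of one way a loader could cope is to resolve the local reference a single step before reading the type. The functions `resolve_local_ref` and `property_types` below are hypothetical and are not the target's actual code; the sketch assumes local "#/..." references and ignores JSON Pointer escape sequences.

```python
from typing import Any, Dict, List


def resolve_local_ref(root: Dict[str, Any], prop: Dict[str, Any]) -> Dict[str, Any]:
    """Follow a local "#/..." $ref one step; return prop unchanged if it has none."""
    ref = prop.get("$ref")
    if ref is None:
        return prop
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled in this sketch: {ref}")
    node: Any = root
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def property_types(root: Dict[str, Any], prop: Dict[str, Any]) -> List[str]:
    """Return the non-null types declared for a property, resolving $ref first."""
    resolved = resolve_local_ref(root, prop)
    if "type" not in resolved and "anyOf" not in resolved:
        raise ValueError(
            f"'type' or 'anyOf' are required fields in property: {prop}"
        )
    declared = resolved.get("type", [])
    if isinstance(declared, str):
        declared = [declared]
    return [t for t in declared if t != "null"]


root = {
    "type": "object",
    "properties": {
        "popular_ids": {
            "type": ["null", "array"],
            "items": {"$ref": "#/definitions/sdc_recursive_string_array"},
        }
    },
    "definitions": {
        "sdc_recursive_string_array": {
            "type": ["null", "string", "array"],
            "items": {"$ref": "#/definitions/sdc_recursive_string_array"},
        }
    },
}

items = root["properties"]["popular_ids"]["items"]
print(property_types(root, items))
```

Because the reference is followed only one step, the self-referencing definition does not cause an endless expansion. The resolved element may still be either a string or a nested array, so a loader would have to decide how to map that to a column type.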

@Somtom

Somtom commented Aug 15, 2021

> AdSwerve BigQuery target/loader.

Having the same issue with BigQuery

@Samira-El
Contributor

Hi all, I used the provided DDL statement to try to reproduce this issue. I ran the same source with pipelinewise-target-postgres and pipelinewise-target-snowflake and didn't get any error. Both targets are able to handle this schema and create the equivalent columns.

I'm not familiar with Meltano or the targets that are facing a problem here, so I'm wondering how these targets are using $ref.

@spacecowboy

6 participants