Possible to extend to data vault 2.0 with Spark and/or BigQuery as target? #31
Hey @dkapitan, thanks for your interest in diepvries! Correct me if I'm wrong, but what you'd like is the ability for diepvries to generate SQL loading statements for dialects other than Snowflake, e.g. BigQuery and Databricks? Do you also have other architectural changes in mind, e.g. structures in the raw vault that can't be loaded by diepvries? To keep you up to date with our vision for the future of this framework: we are currently experimenting with using SQLAlchemy for all generated SQL statements, in order to support other SQL dialects. Using SQLAlchemy doesn't automatically solve all problems, as some SQL statements differ between dialects and are not always 100% supported (e.g.
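As a small illustration of how generated SQL differs between dialects, here is a minimal Python sketch of identifier quoting: Snowflake and PostgreSQL use ANSI double quotes, while BigQuery and Spark SQL use backticks. The `QUOTE_CHARS` table and the `quote_identifier`/`qualified_name` helpers are hypothetical and not part of diepvries; SQLAlchemy handles quoting inside each dialect's own compiler.

```python
# Hypothetical per-dialect quote characters (not a diepvries or SQLAlchemy API).
QUOTE_CHARS = {
    "snowflake": '"',
    "postgresql": '"',
    "bigquery": "`",
    "spark": "`",
}


def quote_identifier(name: str, dialect: str) -> str:
    """Wrap a single identifier in the quote character of the given dialect."""
    quote = QUOTE_CHARS.get(dialect)
    if quote is None:
        raise ValueError(f"unsupported dialect: {dialect!r}")
    if quote in name:
        # Escaping rules differ per engine, so this sketch rejects such names.
        raise ValueError(f"identifier contains {quote!r}: {name!r}")
    return f"{quote}{name}{quote}"


def qualified_name(parts: list, dialect: str) -> str:
    """Quote each part of a dotted name, e.g. schema and table."""
    return ".".join(quote_identifier(part, dialect) for part in parts)


print(qualified_name(["raw_vault", "h_customer"], "bigquery"))
print(qualified_name(["raw_vault", "h_customer"], "snowflake"))
```

Note that in Snowflake a quoted identifier is also case-sensitive, which is one more reason a statement generator has to be dialect-aware rather than emitting one SQL string for every engine.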
Hi @matthieucan, thanks for your response. Regarding your questions:
That is indeed what I am looking for.
For now, I just want to follow along with … Going with SQLAlchemy makes perfect sense to me. Please let me know if there is any way I can help. For example, an overview of the key differences between dialects seems useful for refining our needs. I have worked a lot with BigQuery and am getting into Apache Spark more and more, so I could help with those.
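An overview of dialect differences could start with column types. Below is a hedged sketch that renders `CREATE TABLE` DDL for a simple hub in Snowflake, BigQuery and Spark/Delta Lake. The `TYPE_MAP` table, the `hub_ddl` helper, the column names and the type choices are illustrative assumptions, not what diepvries generates.

```python
# Illustrative logical-to-physical type mapping per target engine (assumption).
TYPE_MAP = {
    "snowflake": {"string": "VARCHAR", "timestamp": "TIMESTAMP_NTZ"},
    "bigquery": {"string": "STRING", "timestamp": "TIMESTAMP"},
    "spark": {"string": "STRING", "timestamp": "TIMESTAMP"},
}

# Hypothetical hub layout: hash key, business key and metadata columns.
HUB_COLUMNS = [
    ("h_customer_hashkey", "string"),
    ("customer_id", "string"),
    ("r_timestamp", "timestamp"),
    ("r_source", "string"),
]


def hub_ddl(table: str, columns: list, dialect: str) -> str:
    """Render CREATE TABLE DDL for a hub in the given dialect."""
    types = TYPE_MAP[dialect]
    column_lines = ",\n".join(
        f"  {name} {types[logical_type]} NOT NULL" for name, logical_type in columns
    )
    ddl = f"CREATE TABLE IF NOT EXISTS {table} (\n{column_lines}\n)"
    if dialect == "spark":
        # On Databricks the storage format is declared explicitly.
        ddl += " USING DELTA"
    return ddl


print(hub_ddl("raw_vault.h_customer", HUB_COLUMNS, "spark"))
```

Keeping a logical type per column and resolving it per dialect means the Raw Vault model is described once, and only the mapping table grows when a new target engine is added.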
Understood!
While we can't commit to any timeline for this implementation, you might find inspiration in this file in a proof-of-concept branch: https://github.com/PicnicSupermarket/diepvries/blob/matthieucan/sqlalchemy/src/diepvries/test_sqlalchemy.py
Hello all, one question: how is the port to SQLAlchemy going? I wanted to test it with a PostgreSQL database.
Hi, unfortunately this has not been implemented.
Seems like I am the first one to post an issue here 😄. I am very interested in diepvries and am looking into it for one of my Dutch clients. We are considering the Data Vault 2.0 pattern for the data lake.
As described in this article, there are a couple of issues that need solving before data vault can be used on data lake technologies like Databricks/Delta Lake or Google BigQuery. The most fundamental one is the limited or non-existent support for constraints and foreign keys.
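Without enforced foreign keys, referential integrity has to be checked after loading rather than by the engine. A minimal sketch of such a check is an anti-join that lists link rows whose hub hash key is missing from the hub. It runs here against an in-memory SQLite database only so the example is self-contained; the table and column names are hypothetical.

```python
import sqlite3

# Plain ANSI anti-join; on a real target only the table references would
# need schema/dataset qualification. Table names are hypothetical.
ORPHAN_CHECK_SQL = """
SELECT l.h_customer_hashkey
FROM l_order_customer AS l
LEFT JOIN h_customer AS h
  ON l.h_customer_hashkey = h.h_customer_hashkey
WHERE h.h_customer_hashkey IS NULL
"""


def find_orphans(conn: sqlite3.Connection) -> list:
    """Return hub hash keys referenced by the link but missing from the hub."""
    return [row[0] for row in conn.execute(ORPHAN_CHECK_SQL)]


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory raw vault with one orphaned link row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE h_customer (h_customer_hashkey TEXT);
        CREATE TABLE l_order_customer (
            l_order_customer_hashkey TEXT,
            h_customer_hashkey TEXT
        );
        INSERT INTO h_customer VALUES ('aaa');
        INSERT INTO l_order_customer VALUES ('l1', 'aaa'), ('l2', 'bbb');
        """
    )
    return conn


print(find_orphans(demo_connection()))  # ['bbb']
```

Such a check could run as a post-load test and fail the pipeline, or merely report, depending on how strict the Raw Vault needs to be.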
The article proposes using deterministic hash keys, which I can see working. So my question is: do you think it is feasible to extend diepvries so that it can generate DDL for the Raw Vault target schema (our current scope, so not full DV 2.0) for different target engines, most notably Databricks/Delta Lake and BigQuery?
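A deterministic hash key is typically computed by normalizing the business key values, joining them with a delimiter and hashing the result, so every engine derives the same key from the same input without sequences or lookups. The sketch below assumes trim-and-upper-case normalization, a `|` delimiter and MD5; these are common Data Vault 2.0 conventions, not necessarily the ones diepvries or the article use, and `hash_key` is a hypothetical helper.

```python
import hashlib
from typing import Optional


def hash_key(*business_keys: Optional[str], delimiter: str = "|") -> str:
    """Deterministic MD5 hash key from one or more business key values."""
    # Normalize: treat missing values as empty, trim whitespace, upper-case.
    normalized = [(key or "").strip().upper() for key in business_keys]
    # The delimiter keeps ("a", "b") and ("ab",) from producing the same payload.
    payload = delimiter.join(normalized)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


print(hash_key(" customer-42 "))
print(hash_key("order-7", "customer-42"))
```

To get identical keys inside the warehouse, the SQL side must apply the same normalization, and the built-in functions differ slightly: Snowflake's `MD5` and Spark's `md5` return a lowercase hex string, while BigQuery's `MD5` returns `BYTES` and needs wrapping in `TO_HEX` to match.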
Any suggestions are welcome, and if this is a workable solution we would be willing and able to contribute to diepvries.