The migrator does not take advantage of the ScyllaDB driver #163

Closed
julienrf opened this issue Jun 26, 2024 · 4 comments
Labels: enhancement (New feature or request)

@julienrf (Collaborator)

We communicate with ScyllaDB via the spark-cassandra-connector, which uses the Apache Cassandra driver under the hood (version 4.13 at the time of writing).

This prevents us from taking advantage of ScyllaDB-specific driver features such as shard awareness.

We should consider swapping the Cassandra driver for the ScyllaDB driver. We would probably have to change the spark-cassandra-connector itself, though.
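As a rough, hypothetical illustration (not a confirmed approach), a first attempt at a dependency-level swap could look like the sbt sketch below: exclude the upstream driver pulled in by the connector and add ScyllaDB's fork of the Java driver. The versions are illustrative, and as noted above a plain exclusion is probably not enough without changes in the connector itself.

```scala
// Hypothetical build.sbt sketch: swap the upstream Apache Cassandra driver for
// ScyllaDB's shard-aware fork at the dependency level. Versions are illustrative
// and this alone may not be sufficient without changes in the connector itself.
libraryDependencies ++= Seq(
  ("com.datastax.spark" %% "spark-cassandra-connector" % "3.5.0")
    .exclude("com.datastax.oss", "java-driver-core"),
  // ScyllaDB's fork of the 4.x Java driver, published under the com.scylladb group
  "com.scylladb" % "java-driver-core" % "4.13.0.0"
)
```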

@tarzanek (Contributor) commented Jun 26, 2024

So we'd first need to merge into our fork ( https://github.com/scylladb/spark-cassandra-connector/ ) and release Spark connectors built on the shard-aware driver, e.g. https://github.com/tarzanek/spark-cassandra-connector/tree/v3.0.0-scylla
Up to 3.1 I checked and made sure we have a matching version of the respective driver in https://github.com/scylladb/java-driver/tags (the 3.0 branch above is a bad example, but we have the respective version now; back then it was missing). R&D is also releasing this, and AFAIK only Spark 3.5.1 upgrades the driver (and even then to a version that should already be part of the released ones).

That said, the simple connector patch above might not be everything: special extensions will need changes similar to scylladb/java-driver#156 to be usable from or leveraged in RDDs (BYPASS CACHE being the most important one; see the sketch below).
For all of the above we'd need some QA, which I think is the biggest blocker right now.
Technically the first step should be doable, so if we released it now, it would be without support.
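To make the BYPASS CACHE point concrete, here is a minimal sketch of the kind of statement the connector's RDD scan path would need to emit; the connector does not generate this clause today, so the example issues it by hand through CassandraConnector. Host, keyspace, and table names are placeholders.

```scala
import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.SparkConf

// Minimal sketch (placeholder host/keyspace/table): issue a ScyllaDB-specific
// BYPASS CACHE scan by hand. For RDD reads to benefit, the connector's generated
// token-range scans would need to append the same clause.
val sparkConf = new SparkConf().set("spark.cassandra.connection.host", "127.0.0.1")
CassandraConnector(sparkConf).withSessionDo { session =>
  // BYPASS CACHE tells ScyllaDB not to populate its cache during this full scan.
  session.execute("SELECT * FROM my_keyspace.my_table BYPASS CACHE")
}
```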

julienrf added the enhancement (New feature or request) label on Jul 10, 2024
@guy9 (Collaborator) commented Jul 14, 2024

Thanks @tarzanek, @julienrf. Let's hold off on this until I verify that we have the resources from the QA team.

julienrf self-assigned this on Aug 25, 2024
@julienrf (Collaborator, Author) commented Aug 26, 2024

The first step to move forward is to publish our fork of spark-cassandra-connector, so that the Migrator can use it directly instead of building it from source.

I suggest creating a branch scylla-4.x in our repository with the following changes:

  • Rename organization from com.datastax to com.scylladb
  • Rename the artifact from spark-cassandra-connector to spark-scylladb-connector (see the sketch after this list)
  • Document the motivation for the fork and the branching policy
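As a purely indicative sketch (the real spark-cassandra-connector build spans several sub-projects), the renaming could boil down to sbt settings along these lines:

```scala
// Indicative sbt settings for the renamed fork (hypothetical; the actual build
// definition is more involved).
ThisBuild / organization := "com.scylladb"

lazy val connector = (project in file("connector"))
  .settings(
    name := "spark-scylladb-connector"
  )
```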

I implemented these changes in my fork. @tarzanek, please review this branch and push it to the ScyllaDB fork if you approve it.

The next steps will be as follows (also tracked in scylladb/spark-scylladb-connector#6):

  • Rename our repository fork from spark-cassandra-connector to spark-scylladb-connector
  • Fast-forward merge the branch feature/track-token-ranges-3.5.0 into scylla-4.x and document the feature in the README
  • Set up a release process in the ScyllaDB fork
  • Release version 4.0.0
  • Use it in the Migrator in place of our submodule (see the dependency sketch below)

And then we will be able to gradually introduce more features to the connector.
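Once the fork is published, the Migrator side would presumably reduce to an ordinary dependency instead of a source-built submodule, roughly like this (hypothetical coordinates and version, following the renaming and the 4.0.0 release proposed above):

```scala
// Hypothetical Migrator build.sbt change: depend on the published fork instead
// of building the submodule from source. Coordinates and version follow the
// proposal above and are not yet released.
libraryDependencies += "com.scylladb" %% "spark-scylladb-connector" % "4.0.0"
```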
