Reduce DC build time and complexity by getting rid of Scala and Spark #1890
I have prepared a branch to illustrate what would be removed... https://github.com/datacleaner/DataCleaner/compare/remove-scala?expand=1
The branch looks more or less good to me, although it's been so long since I even looked at Java that I'm not sure I'd take my own word for it. 😊 Buuuut it's mostly code removal, so as long as it builds and all the runners still work, I guess we're good. I can't test right now, but I'll try to see if I can get some time for it later today or tomorrow. However, I don't see deletions of the
Regarding pulling it into its own extension, I guess it would take quite a bit of refactoring to allow runners, especially ones that need to change the system in such a major way, in extensions? If I remember correctly (which is not a given), we/you tried something like that originally, but ended up doing it this way because such a fundamental change to running just got too hard without quite a bit of coupling. But maybe the Scala parts themselves could be kept in the extension, while the base of the runner was kept here? Admittedly that DOES sound like a bit of a strange design, but if we think the Spark runner still has value to users, it would be a shame to lose it.
I just realized that the branch is in no way ready to go :) I mean, there are a bunch of components that are just no longer included, and I guess we just didn't have integration tests for those, but they would disappear from the product if we merged that branch. I think they're not too hard to reproduce, though, so that's definitely the next step if we want to complete this issue. Regarding the Spark runner: I agree it's probably not going to be easy to make it a proper extension. I was more thinking that we could make it a separate distribution. A bit like I mean, the other thing is that Spark has moved on massively since this was built. I think everything will break and have to be partially rewritten if we just upgrade Spark to the latest version. But I think it's time that it does get upgraded somehow.
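(To give a feel for the kind of rewrite a Spark upgrade implies, here is a minimal sketch, not taken from the DataCleaner codebase: the Spark 1.x-era entry point that older integrations were typically written against, next to the SparkSession entry point that later Spark versions expect. The class and application names below are hypothetical.)

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

// Hypothetical illustration only - not DataCleaner code.
public class SparkEntryPointSketch {

    // Spark 1.x style: SparkConf + JavaSparkContext, with RDD-centric APIs.
    static JavaSparkContext legacyContext() {
        SparkConf conf = new SparkConf().setAppName("datacleaner-spark-job");
        return new JavaSparkContext(conf);
    }

    // Spark 2+/3 style: SparkSession is the single entry point, and most
    // newer APIs are Dataset/DataFrame-based rather than RDD-based, which is
    // why a straight version bump tends to require rewriting the runner code.
    static SparkSession modernSession() {
        return SparkSession.builder()
                .appName("datacleaner-spark-job")
                .getOrCreate();
    }
}
```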
Ah yeah, I remember most of the Spark components being reasonably simple. How about I finally get back to contributing (and re-jiggle my Java experience a bit) by taking at least some of them on? But it might be a few days before I get started.
Yeah give it a shot! I'm ready to cheer you on! For my part I'm gonna then look into some of the more simple-n-stupid Scala-to-Java conversions in the non-Spark areas of the code. Like the HTML rendering module and more. |
#1890: Converted Value Distribution renderer to Java
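(As a rough illustration of what these renderer conversions amount to, here is a hedged Java sketch. The class name and the map-based input are assumptions for illustration only, stand-ins for whatever result and fragment types the real DataCleaner renderer API uses; this is not the actual converted code.)

```java
import java.util.Map;

// Hypothetical sketch of a Scala-to-Java renderer conversion.
// The Map<String, Integer> parameter stands in for the real value
// distribution result type consumed by the actual renderer.
public class ValueDistributionHtmlRendererSketch {

    /** Renders a simple HTML list of value counts. */
    public String render(Map<String, Integer> valueCounts) {
        StringBuilder html = new StringBuilder("<ul class=\"value-distribution\">");
        for (Map.Entry<String, Integer> entry : valueCounts.entrySet()) {
            html.append("<li>")
                .append(escape(entry.getKey()))
                .append(": ")
                .append(entry.getValue())
                .append("</li>");
        }
        return html.append("</ul>").toString();
    }

    // Minimal HTML escaping so rendered values cannot break the markup.
    private static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```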
Hi all,
I am picking up DC development for a bit, after a long hiatus, and coming back to this project is making me realize how long and complex our build has become. I would like to make the DC build (and thereby the overall developer experience) much nicer by simplifying it. Right now I am spending a lot of time just getting it to compile on my fresh installation, and the main culprit is something that I've noticed before: Scala, and to some extent also the Spark module. So I would suggest simplifying the developer experience by: