Move to Barrier Executors #19

Open
dmmiller612 opened this issue Nov 28, 2018 · 1 comment
Labels
enhancement New feature or request

Comments

@dmmiller612
Contributor

With Spark 2.4.0, Barrier Executors (barrier execution mode) were added to ensure that tasks run at the same time. We should add this for training in SparkFlow.
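For reference, a rough sketch of what a barrier stage looks like in PySpark 2.4+. It uses `RDD.barrier()` and `BarrierTaskContext`, which are part of PySpark. Everything else is an assumption made for illustration: `sgd_step`, `train_partition`, `run_demo`, and the toy linear model are hypothetical and are not SparkFlow code.

```python
def sgd_step(weights, rows, lr=0.1):
    """One pass of plain SGD for a linear model y ~ w . x (pure Python)."""
    for features, label in rows:
        pred = sum(w * x for w, x in zip(weights, features))
        err = pred - label
        weights = [w - lr * err * x for w, x in zip(weights, features)]
    return weights


def train_partition(iterator):
    """Hypothetical per-partition training function run inside a barrier stage."""
    # Imported lazily so sgd_step can be used on a machine without Spark.
    from pyspark import BarrierTaskContext

    ctx = BarrierTaskContext.get()
    rows = list(iterator)
    # Block until every task in the stage has reached this point,
    # so all partitions start training together.
    ctx.barrier()
    n_features = len(rows[0][0]) if rows else 0
    weights = sgd_step([0.0] * n_features, rows)
    # Second sync point: no task leaves before all have finished training.
    ctx.barrier()
    yield ctx.partitionId(), weights


def run_demo():
    """Local demo; needs PySpark >= 2.4 and at least 2 free task slots."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[2]")
        .appName("barrier-sketch")
        .getOrCreate()
    )
    data = [
        ([1.0, 0.0], 1.0),
        ([0.0, 1.0], 2.0),
        ([1.0, 1.0], 3.0),
        ([2.0, 1.0], 4.0),
    ]
    rdd = spark.sparkContext.parallelize(data, 2)
    # barrier() marks the stage so that all of its tasks are launched together.
    results = rdd.barrier().mapPartitions(train_partition).collect()
    spark.stop()
    return results
```

Calling `run_demo()` returns one `(partition_id, weights)` pair per partition. A barrier stage needs as many free task slots as there are partitions, which is why the demo uses `local[2]` with 2 partitions.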

dmmiller612 added the enhancement label on Nov 28, 2018
@PowerToThePeople111

PowerToThePeople111 commented Jul 3, 2019

I would love that. It seems to me that not running the computations in parallel is the reason models could underperform: some partitions finish last and therefore have the biggest impact on the final model (https://www.youtube.com/watch?v=nNrdv45O3pE at 15:00). At least, this is what I understood when listening to this interesting talk.

BTW: Should all partitions of the training data be the same size? Are there any guarantees on how close the model performance of this async training is to that of normal training, or other estimates that would help me get a grasp on the impact of this execution model? For example: given the same initial weights, would different model-building processes with the same data lead to very different final weights just because other processes running on the workers might cause partitions to finish in different orders?
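To make the ordering question concrete, here is a toy simulation in plain Python. It is only a sketch under strong assumptions: a one-weight linear model, squared loss, and whole partitions applied one after another to a shared weight. It is not Hogwild itself (which interleaves updates at a much finer granularity) and not SparkFlow's code. All names are hypothetical.

```python
from itertools import permutations


def apply_partition(weight, partition, lr=0.1):
    """Apply SGD updates for one partition to the shared weight."""
    for x, y in partition:
        grad = (weight * x - y) * x  # gradient of 0.5 * (w*x - y)^2
        weight -= lr * grad
    return weight


def train_in_order(partitions, order, initial_weight=0.0, lr=0.1):
    """Process partitions in the given arrival order, starting from the same weight."""
    weight = initial_weight
    for idx in order:
        weight = apply_partition(weight, partitions[idx], lr)
    return weight


# Two tiny partitions of (x, y) pairs; same data, same initial weight.
partitions = [[(1.0, 1.0)], [(2.0, 0.0)]]

# Final weight for every possible order in which partitions could finish.
finals = {
    order: train_in_order(partitions, order)
    for order in permutations(range(len(partitions)))
}

for order, w in finals.items():
    print(order, w)
```

In this toy setup the two orders end at different weights even though the data and the starting point are identical, because each update depends on the weight left behind by the previous one. How large that gap is for a real model is exactly the open question.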

Edit: OK, I found the paper: https://papers.nips.cc/paper/4390-hogwild-a-lock-free-approach-to-parallelizing-stochastic-gradient-descent.pdf

PS: I love your API and the fact that you decided to make it work so seamlessly with Spark pipelines.
